64 architecture processors may contain design defects or errors known as errata. Current characterized errata are available on request.
Intel 64 architecture-enabled BIOS. Performance will vary depending on your hardware and software configurations. Consult with your system vendor for more information.
Enabling Execute Disable Bit functionality requires a PC with a processor with Execute Disable Bit capability and a supporting operating system. Check with your PC manufacturer on whether your system delivers Execute Disable Bit functionality.
Intel, Pentium, Intel Xeon, Intel NetBurst, Intel Core, Intel Core Solo, Intel Core Duo, Intel Core 2 Duo, Intel Core 2 Extreme, Intel Pentium D, Itanium, Intel SpeedStep, MMX, Intel Atom, and VTune are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
* Other names and brands may be claimed as the property of others.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an ordering number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's website at http://www.intel.com
Copyright 1997-2010 Intel Corporation
CONTENTS
PAGE
CHAPTER 1
ABOUT THIS MANUAL
1.1 PROCESSORS COVERED IN THIS MANUAL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
1.2 OVERVIEW OF THE SYSTEM PROGRAMMING GUIDE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3
1.3 NOTATIONAL CONVENTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-6
1.3.1 Bit and Byte Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-6
1.3.2 Reserved Bits and Software Compatibility. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-7
1.3.3 Instruction Operands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-8
1.3.4 Hexadecimal and Binary Numbers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-8
1.3.5 Segmented Addressing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-8
1.3.6 Syntax for CPUID, CR, and MSR Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-9
1.3.7 Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-10
1.4 RELATED LITERATURE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-11
CHAPTER 2
SYSTEM ARCHITECTURE OVERVIEW
2.1 OVERVIEW OF THE SYSTEM-LEVEL ARCHITECTURE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
2.1.1 Global and Local Descriptor Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
2.1.1.1 Global and Local Descriptor Tables in IA-32e Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
2.1.2 System Segments, Segment Descriptors, and Gates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
2.1.2.1 Gates in IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-6
2.1.3 Task-State Segments and Task Gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-6
2.1.3.1 Task-State Segments in IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7
2.1.4 Interrupt and Exception Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7
2.1.4.1 Interrupt and Exception Handling IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7
2.1.5 Memory Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8
2.1.5.1 Memory Management in IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8
2.1.6 System Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9
2.1.6.1 System Registers in IA-32e Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9
2.1.7 Other System Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10
2.2 MODES OF OPERATION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10
2.3 SYSTEM FLAGS AND FIELDS IN THE EFLAGS REGISTER. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-12
2.3.1 System Flags and Fields in IA-32e Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-15
2.4 MEMORY-MANAGEMENT REGISTERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-15
2.4.1 Global Descriptor Table Register (GDTR). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-16
2.4.2 Local Descriptor Table Register (LDTR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-16
2.4.3 IDTR Interrupt Descriptor Table Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17
2.4.4 Task Register (TR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17
2.5 CONTROL REGISTERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17
2.5.1 CPUID Qualification of Control Register Flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-26
2.6 EXTENDED CONTROL REGISTERS (INCLUDING THE XFEATURE_ENABLED_MASK REGISTER) . . . . . . . . . . . 2-26
2.7 SYSTEM INSTRUCTION SUMMARY. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-27
2.7.1 Loading and Storing System Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-29
2.7.2 Verifying of Access Privileges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-30
2.7.3 Loading and Storing Debug Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-31
2.7.4 Invalidating Caches and TLBs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-31
2.7.5 Controlling the Processor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-31
2.7.6 Reading Performance-Monitoring and Time-Stamp Counters . . . . . . . . . . . . . . . . . . . . . 2-32
2.7.6.1 Reading Counters in 64-Bit Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-33
2.7.7 Reading and Writing Model-Specific Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-33
2.7.7.1 Reading and Writing Model-Specific Registers in 64-Bit Mode. . . . . . . . . . . . . . . . . . 2-34
2.7.8 Enabling Processor Extended States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-34
CHAPTER 3
PROTECTED-MODE MEMORY MANAGEMENT
3.1 MEMORY MANAGEMENT OVERVIEW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1
3.2 USING SEGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3
3.2.1 Basic Flat Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3
3.2.2 Protected Flat Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-4
3.2.3 Multi-Segment Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-5
3.2.4 Segmentation in IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6
3.2.5 Paging and Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-7
3.3 PHYSICAL ADDRESS SPACE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-7
3.3.1 Intel 64 Processors and Physical Address Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-8
3.4 LOGICAL AND LINEAR ADDRESSES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-8
3.4.1 Logical Address Translation in IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-9
3.4.2 Segment Selectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-9
3.4.3 Segment Registers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-10
3.4.4 Segment Loading Instructions in IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-12
3.4.5 Segment Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-13
3.4.5.1 Code- and Data-Segment Descriptor Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-16
3.5 SYSTEM DESCRIPTOR TYPES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-18
3.5.1 Segment Descriptor Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-20
3.5.2 Segment Descriptor Tables in IA-32e Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-22
CHAPTER 4
PAGING
4.1 PAGING MODES AND CONTROL BITS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1
4.1.1 Three Paging Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
4.1.2 Paging-Mode Enabling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4
4.1.3 Paging-Mode Modifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5
4.1.4 Enumeration of Paging Features by CPUID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-6
4.2 HIERARCHICAL PAGING STRUCTURES: AN OVERVIEW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-7
4.3 32-BIT PAGING. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-9
4.4 PAE PAGING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-16
4.4.1 PDPTE Registers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-16
4.4.2 Linear-Address Translation with PAE Paging. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-18
4.5 IA-32E PAGING. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-24
4.6 ACCESS RIGHTS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-35
4.7 PAGE-FAULT EXCEPTIONS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-37
4.8 ACCESSED AND DIRTY FLAGS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-39
4.9 PAGING AND MEMORY TYPING. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-40
4.9.1 Paging and Memory Typing When the PAT is Not Supported (Pentium Pro and Pentium II Processors) . . . . . . . . . . . 4-40
4.9.2 Paging and Memory Typing When the PAT is Supported (Pentium III and More Recent Processor Families) . . . . . . . . . . . 4-40
4.9.3 Caching Paging-Related Information about Memory Typing . . . . . . . . . . . . . . . . . . . . . . .4-41
4.10 CACHING TRANSLATION INFORMATION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-41
4.10.1 Process-Context Identifiers (PCIDs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-42
4.10.2 Translation Lookaside Buffers (TLBs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-43
4.10.2.1 Page Numbers, Page Frames, and Page Offsets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-43
4.10.2.2 Caching Translations in TLBs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-44
4.10.2.3 Details of TLB Use. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-44
4.10.2.4 Global Pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-45
4.10.3 Paging-Structure Caches. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-45
4.10.3.1 Caches for Paging Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-45
4.10.3.2 Using the Paging-Structure Caches to Translate Linear Addresses . . . . . . . . . . . . .4-48
4.10.3.3 Multiple Cached Entries for a Single Paging-Structure Entry . . . . . . . . . . . . . . . . . . . . 4-49
4.10.4 Invalidation of TLBs and Paging-Structure Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-50
4.10.4.1 Operations that Invalidate TLBs and Paging-Structure Caches . . . . . . . . . . . . . . . . .4-50
4.10.4.2 Recommended Invalidation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-52
4.10.4.3 Optional Invalidation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-53
4.10.4.4 Delayed Invalidation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-55
4.10.5 Propagation of Paging-Structure Changes to Multiple Processors . . . . . . . . . . . . . . . . . 4-56
4.11 INTERACTIONS WITH VIRTUAL-MACHINE EXTENSIONS (VMX) . . . . . . . . . . . . . . . . . . . . . . . 4-57
4.11.1 VMX Transitions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-57
4.11.2 VMX Support for Address Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-58
4.12 USING PAGING FOR VIRTUAL MEMORY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-58
4.13 MAPPING SEGMENTS TO PAGES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-59
CHAPTER 5
PROTECTION
5.1 ENABLING AND DISABLING SEGMENT AND PAGE PROTECTION. . . . . . . . . . . . . . . . . . . . . . . 5-1
5.2 FIELDS AND FLAGS USED FOR SEGMENT-LEVEL AND PAGE-LEVEL PROTECTION . . . . . . . . . . . . . . . . . . . . . 5-2
5.2.1 Code Segment Descriptor in 64-bit Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-5
5.3 LIMIT CHECKING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6
5.3.1 Limit Checking in 64-bit Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-7
5.4 TYPE CHECKING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-7
5.4.1 Null Segment Selector Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-9
5.4.1.1 NULL Segment Checking in 64-bit Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-9
5.5 PRIVILEGE LEVELS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-9
5.6 PRIVILEGE LEVEL CHECKING WHEN ACCESSING DATA SEGMENTS . . . . . . . . . . . . . . . . . . . 5-12
5.6.1 Accessing Data in Code Segments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-14
5.7 PRIVILEGE LEVEL CHECKING WHEN LOADING THE SS REGISTER. . . . . . . . . . . . . . . . . . . . . 5-14
5.8 PRIVILEGE LEVEL CHECKING WHEN TRANSFERRING PROGRAM CONTROL BETWEEN CODE SEGMENTS . . . . . . . . . . . 5-14
5.8.1 Direct Calls or Jumps to Code Segments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-15
5.8.1.1 Accessing Nonconforming Code Segments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-16
5.8.1.2 Accessing Conforming Code Segments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-17
5.8.2 Gate Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-18
5.8.3 Call Gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-19
5.8.3.1 IA-32e Mode Call Gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-20
5.8.4 Accessing a Code Segment Through a Call Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-22
5.8.5 Stack Switching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-25
5.8.5.1 Stack Switching in 64-bit Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-28
5.8.6 Returning from a Called Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-28
5.8.7 Performing Fast Calls to System Procedures with the SYSENTER and SYSEXIT Instructions . . . . . . . . . . . 5-30
5.8.7.1 SYSENTER and SYSEXIT Instructions in IA-32e Mode. . . . . . . . . . . . . . . . . . . . . . . . . . 5-31
5.8.8 Fast System Calls in 64-bit Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-32
5.9 PRIVILEGED INSTRUCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-33
5.10 POINTER VALIDATION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-34
5.10.1 Checking Access Rights (LAR Instruction). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-35
5.10.2 Checking Read/Write Rights (VERR and VERW Instructions) . . . . . . . . . . . . . . . . . . . . . . 5-36
5.10.3 Checking That the Pointer Offset Is Within Limits (LSL Instruction). . . . . . . . . . . . . . . . 5-36
5.10.4 Checking Caller Access Privileges (ARPL Instruction) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-37
5.10.5 Checking Alignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-39
5.11 PAGE-LEVEL PROTECTION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-39
5.11.1 Page-Protection Flags. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-40
5.11.2 Restricting Addressable Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-40
5.11.3 Page Type. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-40
5.11.4 Combining Protection of Both Levels of Page Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-41
5.11.5 Overrides to Page Protection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-41
5.12 COMBINING PAGE AND SEGMENT PROTECTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-41
5.13 PAGE-LEVEL PROTECTION AND EXECUTE-DISABLE BIT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-43
5.13.1 Detecting and Enabling the Execute-Disable Capability . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-43
5.13.2 Execute-Disable Page Protection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-44
5.13.3 Reserved Bit Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-45
5.13.4 Exception Handling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-47
CHAPTER 6
INTERRUPT AND EXCEPTION HANDLING
6.1 INTERRUPT AND EXCEPTION OVERVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1
6.2 EXCEPTION AND INTERRUPT VECTORS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2
6.3 SOURCES OF INTERRUPTS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2
6.3.1 External Interrupts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2
6.3.2 Maskable Hardware Interrupts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-4
6.3.3 Software-Generated Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5
6.4 SOURCES OF EXCEPTIONS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5
6.4.1 Program-Error Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5
6.4.2 Software-Generated Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6
6.4.3 Machine-Check Exceptions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6
6.5 EXCEPTION CLASSIFICATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6
6.6 PROGRAM OR TASK RESTART . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-7
6.7 NONMASKABLE INTERRUPT (NMI) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-8
6.7.1 Handling Multiple NMIs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-9
6.8 ENABLING AND DISABLING INTERRUPTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-9
6.8.1 Masking Maskable Hardware Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-9
6.8.2 Masking Instruction Breakpoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-10
6.8.3 Masking Exceptions and Interrupts When Switching Stacks. . . . . . . . . . . . . . . . . . . . . . . 6-11
6.9 PRIORITY AMONG SIMULTANEOUS EXCEPTIONS AND INTERRUPTS . . . . . . . . . . . . . . . . . . 6-11
6.10 INTERRUPT DESCRIPTOR TABLE (IDT). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-12
6.11 IDT DESCRIPTORS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-14
6.12 EXCEPTION AND INTERRUPT HANDLING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-15
6.12.1 Exception- or Interrupt-Handler Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-16
6.12.1.1 Protection of Exception- and Interrupt-Handler Procedures . . . . . . . . . . . . . . . . . . . 6-18
6.12.1.2 Flag Usage By Exception- or Interrupt-Handler Procedure. . . . . . . . . . . . . . . . . . . . . 6-19
6.12.2 Interrupt Tasks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-20
6.13 ERROR CODE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-21
6.14 EXCEPTION AND INTERRUPT HANDLING IN 64-BIT MODE. . . . . . . . . . . . . . . . . . . . . . . . . . . 6-22
6.14.1 64-Bit Mode IDT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-23
6.14.2 64-Bit Mode Stack Frame . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-24
6.14.3 IRET in IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-25
6.14.4 Stack Switching in IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-25
6.14.5 Interrupt Stack Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-26
6.15 EXCEPTION AND INTERRUPT REFERENCE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-27
Interrupt 0 - Divide Error Exception (#DE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-28
Interrupt 1 - Debug Exception (#DB) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-29
Interrupt 2 - NMI Interrupt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-30
Interrupt 3 - Breakpoint Exception (#BP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-31
Interrupt 4 - Overflow Exception (#OF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-32
Interrupt 5 - BOUND Range Exceeded Exception (#BR) . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-33
Interrupt 6 - Invalid Opcode Exception (#UD) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-34
Interrupt 7 - Device Not Available Exception (#NM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-36
Interrupt 8 - Double Fault Exception (#DF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-38
Interrupt 9 - Coprocessor Segment Overrun . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-41
Interrupt 10 - Invalid TSS Exception (#TS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-42
Interrupt 11 - Segment Not Present (#NP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-46
Interrupt 12 - Stack Fault Exception (#SS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-48
Interrupt 13 - General Protection Exception (#GP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-50
Interrupt 14 - Page-Fault Exception (#PF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-54
Interrupt 16 - x87 FPU Floating-Point Error (#MF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-58
Interrupt 17 - Alignment Check Exception (#AC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-60
Interrupt 18 - Machine-Check Exception (#MC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-62
Interrupt 19 - SIMD Floating-Point Exception (#XM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-64
Interrupts 32 to 255 - User Defined Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-67
CHAPTER 7
TASK MANAGEMENT
7.1 TASK MANAGEMENT OVERVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-1
7.1.1 Task Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-1
7.1.2 Task State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-2
7.1.3 Executing a Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-3
7.2 TASK MANAGEMENT DATA STRUCTURES. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-4
7.2.1 Task-State Segment (TSS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-4
7.2.2 TSS Descriptor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-7
7.2.3 TSS Descriptor in 64-bit mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-8
7.2.4 Task Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-9
7.2.5 Task-Gate Descriptor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-11
7.3 TASK SWITCHING. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-12
7.4 TASK LINKING. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-16
7.4.1 Use of Busy Flag To Prevent Recursive Task Switching. . . . . . . . . . . . . . . . . . . . . . . . . . .7-18
7.4.2 Modifying Task Linkages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-18
7.5 TASK ADDRESS SPACE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-19
7.5.1 Mapping Tasks to the Linear and Physical Address Spaces . . . . . . . . . . . . . . . . . . . . . . . .7-19
7.5.2 Task Logical Address Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-20
7.6 16-BIT TASK-STATE SEGMENT (TSS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-21
7.7 TASK MANAGEMENT IN 64-BIT MODE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-22
CHAPTER 8
MULTIPLE-PROCESSOR MANAGEMENT
8.1 LOCKED ATOMIC OPERATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-2
8.1.1 Guaranteed Atomic Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-3
8.1.2 Bus Locking. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-4
8.1.2.1 Automatic Locking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-4
8.1.2.2 Software Controlled Bus Locking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-5
8.1.3 Handling Self- and Cross-Modifying Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-6
8.1.4 Effects of a LOCK Operation on Internal Processor Caches . . . . . . . . . . . . . . . . . . . . . . . . 8-7
8.2 MEMORY ORDERING. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-8
8.2.1 Memory Ordering in the Intel Pentium and Intel486 Processors . . . . . . . . . . . . . . . . . 8-8
8.2.2 Memory Ordering in P6 and More Recent Processor Families . . . . . . . . . . . . . . . . . . . . . . 8-9
8.2.3 Examples Illustrating the Memory-Ordering Principles. . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-11
8.2.3.1 Assumptions, Terminology, and Notation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-12
8.2.3.2 Neither Loads Nor Stores Are Reordered with Like Operations . . . . . . . . . . . . . . . . 8-13
8.2.3.3 Stores Are Not Reordered With Earlier Loads. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-13
8.2.3.4 Loads May Be Reordered with Earlier Stores to Different Locations . . . . . . . . . . . 8-14
8.2.3.5 Intra-Processor Forwarding Is Allowed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-15
8.2.3.6 Stores Are Transitively Visible. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-15
8.2.3.7 Stores Are Seen in a Consistent Order by Other Processors . . . . . . . . . . . . . . . . . . . 8-16
8.2.3.8 Locked Instructions Have a Total Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-17
8.2.3.9 Loads and Stores Are Not Reordered with Locked Instructions . . . . . . . . . . . . . . . . 8-17
8.2.4 Out-of-Order Stores For String Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-18
8.2.4.1 Memory-Ordering Model for String Operations on Write-back (WB) Memory . . . . 8-19
8.2.4.2 Examples Illustrating Memory-Ordering Principles for String Operations. . . . . . . . 8-20
8.2.5 Strengthening or Weakening the Memory-Ordering Model . . . . . . . . . . . . . . . . . . . . . . . . 8-23
8.3 SERIALIZING INSTRUCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-25
8.4 MULTIPLE-PROCESSOR (MP) INITIALIZATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-27
8.4.1 BSP and AP Processors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-27
8.4.2 MP Initialization Protocol Requirements and Restrictions . . . . . . . . . . . . . . . . . . . . . . . . . 8-28
8.4.3 MP Initialization Protocol Algorithm for Intel Xeon Processors . . . . . . . . . . . . . . . . . . . . 8-28
8.4.4 MP Initialization Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-30
8.4.4.1 Typical BSP Initialization Sequence. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-31
8.4.4.2 Typical AP Initialization Sequence. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-33
8.4.5 Identifying Logical Processors in an MP System. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-33
8.5 INTEL
82489DX EXTERNAL APIC, THE APIC, THE XAPIC, AND THE X2APIC. . . . 10-5
10.4 LOCAL APIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-6
10.4.1 The Local APIC Block Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-6
10.4.2 Presence of the Local APIC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-10
10.4.3 Enabling or Disabling the Local APIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-10
10.4.4 Local APIC Status and Location. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-11
10.4.5 Relocating the Local APIC Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-12
10.4.6 Local APIC ID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-12
10.4.7 Local APIC State. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-13
10.4.7.1 Local APIC State After Power-Up or Reset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-14
10.4.7.2 Local APIC State After It Has Been Software Disabled . . . . . . . . . . . . . . . . . . . . . . . 10-14
10.4.7.3 Local APIC State After an INIT Reset (Wait-for-SIPI State) . . . . . . . . . . . . . . . . . . 10-15
10.4.7.4 Local APIC State After It Receives an INIT-Deassert IPI . . . . . . . . . . . . . . . . . . . . . . 10-15
10.4.8 Local APIC Version Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-15
10.5 HANDLING LOCAL INTERRUPTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-16
10.5.1 Local Vector Table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-16
10.5.2 Valid Interrupt Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-20
10.5.3 Error Handling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-20
10.5.4 APIC Timer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-22
10.5.5 Local Interrupt Acceptance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-23
10.6 ISSUING INTERPROCESSOR INTERRUPTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-23
10.6.1 Interrupt Command Register (ICR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-24
10.6.2 Determining IPI Destination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-29
10.6.2.1 Physical Destination Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-30
10.6.2.2 Logical Destination Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-31
10.6.2.3 Broadcast/Self Delivery Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-33
10.6.2.4 Lowest Priority Delivery Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-33
10.6.3 IPI Delivery and Acceptance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-34
10.7 SYSTEM AND APIC BUS ARBITRATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-35
10.8 HANDLING INTERRUPTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-35
10.8.1 Interrupt Handling with the Pentium 4 and Intel Xeon Processors . . . . . . . . . . . . . . . 10-36
10.8.2 Interrupt Handling with the P6 Family and Pentium Processors . . . . . . . . . . . . . . . . . 10-37
10.8.3 Interrupt, Task, and Processor Priority. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-38
10.8.3.1 Task and Processor Priorities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-39
10.8.4 Interrupt Acceptance for Fixed Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-40
10.8.5 Signaling Interrupt Servicing Completion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-42
10.8.6 Task Priority in IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-43
10.8.6.1 Interaction of Task Priorities between CR8 and APIC . . . . . . . . . . . . . . . . . . . . . . . . 10-43
10.9 SPURIOUS INTERRUPT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-44
10.10 APIC BUS MESSAGE PASSING MECHANISM AND PROTOCOL (P6 FAMILY, PENTIUM PROCESSORS) . . . . . . . . . 10-45
10.10.1 Bus Message Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-46
10.11 MESSAGE SIGNALLED INTERRUPTS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-46
10.11.1 Message Address Register Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-47
10.11.2 Message Data Register Format. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-48
10.12 EXTENDED XAPIC (X2APIC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-50
10.12.1 Detecting and Enabling x2APIC Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-50
10.12.1.1 Instructions to Access APIC Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-51
10.12.1.2 x2APIC Register Address Space. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-52
10.12.1.3 Reserved Bit Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-55
10.12.2 x2APIC Register Availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-55
10.12.3 MSR Access in x2APIC Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-56
10.12.4 VM-Exit Controls for MSRs and x2APIC Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-56
10.12.5 x2APIC State Transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-57
10.12.5.1 x2APIC States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-57
x2APIC After RESET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-58
x2APIC Transitions From x2APIC Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-59
x2APIC Transitions From Disabled Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-59
State Changes From xAPIC Mode to x2APIC Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-59
10.12.6 System Software Transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-59
10.12.7 CPUID Extensions And Topology Enumeration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-60
10.12.7.1 Consistency of APIC IDs and CPUID. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-61
10.12.8 Error Handling in x2APIC Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-61
10.12.9 ICR Operation in x2APIC Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-62
10.12.10 Determining IPI Destination in x2APIC Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-63
10.12.10.1 Logical Destination Mode in x2APIC Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-63
10.12.10.2 Deriving Logical x2APIC ID from the Local x2APIC ID . . . . . . . . . . . . . . . . . . . . . . . . . 10-65
10.12.11 SELF IPI Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-65
CHAPTER 11
MEMORY CACHE CONTROL
11.1 INTERNAL CACHES, TLBS, AND BUFFERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-1
11.2 CACHING TERMINOLOGY. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-7
11.3 METHODS OF CACHING AVAILABLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-8
11.3.1 Buffering of Write Combining Memory Locations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-11
11.3.2 Choosing a Memory Type. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-12
11.3.3 Code Fetches in Uncacheable Memory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-13
11.4 CACHE CONTROL PROTOCOL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-13
11.5 CACHE CONTROL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-14
11.5.1 Cache Control Registers and Bits. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-15
11.5.2 Precedence of Cache Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-20
11.5.2.1 Selecting Memory Types for Pentium Pro and Pentium II Processors. . . . . . . . . . 11-20
11.5.2.2 Selecting Memory Types for Pentium III and More Recent Processor Families. . 11-22
11.5.2.3 Writing Values Across Pages with Different Memory Types . . . . . . . . . . . . . . . . . . 11-23
11.5.3 Preventing Caching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-24
11.5.4 Disabling and Enabling the L3 Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-25
11.5.5 Cache Management Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-25
11.5.6 L1 Data Cache Context Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-26
11.5.6.1 Adaptive Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-26
11.5.6.2 Shared Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-26
11.6 SELF-MODIFYING CODE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-27
11.7 IMPLICIT CACHING (PENTIUM 4, INTEL XEON, AND P6 FAMILY PROCESSORS) . . . . . . . . . . . . . . . . . 11-27
11.8 EXPLICIT CACHING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-28
11.9 INVALIDATING THE TRANSLATION LOOKASIDE BUFFERS (TLBS). . . . . . . . . . . . . . . . . . . 11-29
11.10 STORE BUFFER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-29
11.11 MEMORY TYPE RANGE REGISTERS (MTRRS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-30
11.11.1 MTRR Feature Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-32
11.11.2 Setting Memory Ranges with MTRRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-33
11.11.2.1 IA32_MTRR_DEF_TYPE MSR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-33
11.11.2.2 Fixed Range MTRRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-34
11.11.2.3 Variable Range MTRRs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-34
11.11.2.4 System-Management Range Register Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-37
11.11.3 Example Base and Mask Calculations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-38
11.11.3.1 Base and Mask Calculations for Greater-Than 36-bit Physical Address Support . . . . 11-40
11.11.4 Range Size and Alignment Requirement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-41
11.11.4.1 MTRR Precedences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-41
11.11.5 MTRR Initialization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-41
11.11.6 Remapping Memory Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-42
11.11.7 MTRR Maintenance Programming Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-42
11.11.7.1 MemTypeGet() Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-42
11.11.7.2 MemTypeSet() Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-44
11.11.8 MTRR Considerations in MP Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-46
11.11.9 Large Page Size Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-47
11.12 PAGE ATTRIBUTE TABLE (PAT). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-48
11.12.1 Detecting Support for the PAT Feature. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-48
11.12.2 IA32_PAT MSR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-49
11.12.3 Selecting a Memory Type from the PAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-50
11.12.4 Programming the PAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-50
11.12.5 PAT Compatibility with Earlier IA-32 Processors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-52
CHAPTER 12
INTEL
MMX
TECHNOLOGY. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-1
14.1.1 Software Interface For Initiating Performance State Transitions . . . . . . . . . . . . . . . . . 14-1
14.2 P-STATE HARDWARE COORDINATION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-2
14.3 SYSTEM SOFTWARE CONSIDERATIONS AND OPPORTUNISTIC PROCESSOR PERFORMANCE OPERATION . . . . . . . . . 14-4
14.3.1 Intel Dynamic Acceleration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-4
14.3.2 System Software Interfaces for Opportunistic Processor Performance Operation . 14-4
14.3.2.1 Discover Hardware Support and Enabling of Opportunistic Processor Operation 14-5
14.3.2.2 OS Control of Opportunistic Processor Performance Operation . . . . . . . . . . . . . . . . 14-5
14.3.2.3 Required Changes to OS Power Management P-state Policy. . . . . . . . . . . . . . . . . . . 14-6
14.3.2.4 Application Awareness of Opportunistic Processor Operation (Optional). . . . . . . . 14-7
14.3.3 Intel Turbo Boost Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-8
14.3.4 Performance and Energy Bias Hint support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-8
14.4 MWAIT EXTENSIONS FOR ADVANCED POWER MANAGEMENT . . . . . . . . . . . . . . . . . . . . . . . 14-9
14.5 THERMAL MONITORING AND PROTECTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-10
14.5.1 Catastrophic Shutdown Detector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-12
14.5.2 Thermal Monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-12
14.5.2.1 Thermal Monitor 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-12
14.5.2.2 Thermal Monitor 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-12
14.5.2.3 Two Methods for Enabling TM2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-13
14.5.2.4 Performance State Transitions and Thermal Monitoring. . . . . . . . . . . . . . . . . . . . . . 14-14
14.5.2.5 Thermal Status Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-14
14.5.2.6 Adaptive Thermal Monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-16
14.5.3 Software Controlled Clock Modulation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-16
14.5.4 Detection of Thermal Monitor and Software Controlled Clock Modulation Facilities . . . . . . . 14-18
14.5.5 On Die Digital Thermal Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-18
14.5.5.1 Digital Thermal Sensor Enumeration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-18
14.5.5.2 Reading the Digital Sensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-19
CHAPTER 15
MACHINE-CHECK ARCHITECTURE
15.1 MACHINE-CHECK ARCHITECTURE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-1
15.2 COMPATIBILITY WITH PENTIUM PROCESSOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-2
15.3 MACHINE-CHECK MSRS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-2
15.3.1 Machine-Check Global Control MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-3
15.3.1.1 IA32_MCG_CAP MSR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-3
15.3.1.2 IA32_MCG_STATUS MSR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-5
15.3.1.3 IA32_MCG_CTL MSR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-6
15.3.2 Error-Reporting Register Banks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-6
15.3.2.1 IA32_MCi_CTL MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-6
15.3.2.2 IA32_MCi_STATUS MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-7
15.3.2.3 IA32_MCi_ADDR MSRs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-11
15.3.2.4 IA32_MCi_MISC MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-12
15.3.2.5 IA32_MCi_CTL2 MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-13
15.3.2.6 IA32_MCG Extended Machine Check State MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-15
15.3.3 Mapping of the Pentium Processor Machine-Check Errors to the Machine-Check Architecture . . . . 15-17
15.4 ENHANCED CACHE ERROR REPORTING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-18
15.5 CORRECTED MACHINE CHECK ERROR INTERRUPT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-18
15.5.1 CMCI Local APIC Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-19
15.5.2 System Software Recommendation for Managing CMCI and Machine Check Resources . . . . 15-21
15.5.2.1 CMCI Initialization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-21
15.5.2.2 CMCI Threshold Management. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-22
15.5.2.3 CMCI Interrupt Handler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-23
15.6 RECOVERY OF UNCORRECTED RECOVERABLE (UCR) ERRORS . . . . . . . . . . . . . . . . . . . . . . 15-23
15.6.1 Detection of Software Error Recovery Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-24
15.6.2 UCR Error Reporting and Logging. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-24
15.6.3 UCR Error Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-25
15.6.4 UCR Error Overwrite Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-27
15.7 MACHINE-CHECK AVAILABILITY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-28
15.8 MACHINE-CHECK INITIALIZATION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-28
15.9 INTERPRETING THE MCA ERROR CODES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-29
15.9.1 Simple Error Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-30
15.9.2 Compound Error Codes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-30
15.9.2.1 Correction Report Filtering (F) Bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-31
15.9.2.2 Transaction Type (TT) Sub-Field. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-31
15.9.2.3 Level (LL) Sub-Field. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-32
15.9.2.4 Request (RRRR) Sub-Field. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-32
15.9.2.5 Bus and Interconnect Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-33
15.9.2.6 Memory Controller Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-34
15.9.3 Architecturally Defined UCR Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-34
15.9.3.1 Architecturally Defined SRAO Errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-34
15.9.3.2 Architecturally Defined SRAR Errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-36
15.9.4 Multiple MCA Errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-38
15.9.5 Machine-Check Error Codes Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-39
15.10 GUIDELINES FOR WRITING MACHINE-CHECK SOFTWARE . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-39
15.10.1 Machine-Check Exception Handler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-40
15.10.2 Pentium Processor Machine-Check Exception Handling . . . . . . . . . . . . . . . . . . . . . . . . . 15-41
15.10.3 Logging Correctable Machine-Check Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-42
15.10.4 Machine-Check Software Handler Guidelines for Error Recovery . . . . . . . . . . . . . . . . 15-44
15.10.4.1 Machine-Check Exception Handler for Error Recovery . . . . . . . . . . . . . . . . . . . . . . . 15-44
15.10.4.2 Corrected Machine-Check Handler for Error Recovery . . . . . . . . . . . . . . . . . . . . . . . 15-50
CHAPTER 16
DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER
16.1 OVERVIEW OF DEBUG SUPPORT FACILITIES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-1
16.2 DEBUG REGISTERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-2
16.2.1 Debug Address Registers (DR0-DR3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-4
16.2.2 Debug Registers DR4 and DR5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-4
16.2.3 Debug Status Register (DR6) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-4
16.2.4 Debug Control Register (DR7). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-5
16.2.5 Breakpoint Field Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-6
16.2.6 Debug Registers and Intel 64 Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-8
16.3 DEBUG EXCEPTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-9
16.3.1 Debug Exception (#DB): Interrupt Vector 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-9
16.3.1.1 Instruction-Breakpoint Exception Condition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-10
16.3.1.2 Data Memory and I/O Breakpoint Exception Conditions. . . . . . . . . . . . . . . . . . . . . . . 16-12
16.3.1.3 General-Detect Exception Condition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-12
16.3.1.4 Single-Step Exception Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-12
16.3.1.5 Task-Switch Exception Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-13
16.3.2 Breakpoint Exception (#BP) - Interrupt Vector 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-13
16.4 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING OVERVIEW. . . . . . . . . . . . . . 16-14
16.4.1 IA32_DEBUGCTL MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-14
16.4.2 Monitoring Branches, Exceptions, and Interrupts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-16
16.4.3 Single-Stepping on Branches, Exceptions, and Interrupts . . . . . . . . . . . . . . . . . . . . . . . . 16-16
16.4.4 Branch Trace Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-17
16.4.5 Branch Trace Store (BTS). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-17
16.4.6 CPL-Qualified Branch Trace Mechanism. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-18
16.4.7 Freezing LBR and Performance Counters on PMI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-18
16.4.8 LBR Stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-19
16.4.8.1 LBR Stack and Intel 64 Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-20
16.4.8.2 LBR Stack and IA-32 Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-20
16.4.8.3 Last Exception Records and Intel 64 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-21
16.4.9 BTS and DS Save Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-21
16.4.9.1 DS Save Area and IA-32e Mode Operation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-25
16.4.9.2 Setting Up the DS Save Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-28
16.4.9.3 Setting Up the BTS Buffer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-29
16.4.9.4 Setting Up CPL-Qualified BTS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-30
16.4.9.5 Writing the DS Interrupt Service Routine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-31
16.5 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING (INTEL CORE 2 DUO AND INTEL ATOM
CORE I7 PROCESSOR FAMILY) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-33
16.6.1 LBR Stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-34
16.6.2 Filtering of Last Branch Records . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-35
16.7 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING (PROCESSORS BASED ON INTEL NETBURST MICROARCHITECTURE). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-36
16.7.1 MSR_DEBUGCTLA MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-37
16.7.2 LBR Stack for Processors Based on Intel NetBurst Microarchitecture . . . . . . . . . . . . 16-38
16.7.3 Last Exception Records . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-40
16.8 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING (INTEL CORE SOLO AND INTEL CORE DUO PROCESSORS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-41
16.9 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING (PENTIUM M PROCESSORS). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-43
16.10 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING (P6 FAMILY PROCESSORS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-45
16.10.1 DEBUGCTLMSR Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-45
16.10.2 Last Branch and Last Exception MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-46
16.10.3 Monitoring Branches, Exceptions, and Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-47
16.11 TIME-STAMP COUNTER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-48
16.11.1 Invariant TSC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-49
16.11.2 IA32_TSC_AUX Register and RDTSCP Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-50
CHAPTER 17
8086 EMULATION
17.1 REAL-ADDRESS MODE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-1
17.1.1 Address Translation in Real-Address Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-3
17.1.2 Registers Supported in Real-Address Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-4
17.1.3 Instructions Supported in Real-Address Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-4
17.1.4 Interrupt and Exception Handling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-6
17.2 VIRTUAL-8086 MODE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-8
17.2.1 Enabling Virtual-8086 Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-9
17.2.2 Structure of a Virtual-8086 Task. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-9
17.2.3 Paging of Virtual-8086 Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-10
17.2.4 Protection within a Virtual-8086 Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-11
17.2.5 Entering Virtual-8086 Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-11
17.2.6 Leaving Virtual-8086 Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-14
17.2.7 Sensitive Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-15
17.2.8 Virtual-8086 Mode I/O. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-15
17.2.8.1 I/O-Port-Mapped I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-15
17.2.8.2 Memory-Mapped I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-16
17.2.8.3 Special I/O Buffers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-16
17.3 INTERRUPT AND EXCEPTION HANDLING IN VIRTUAL-8086 MODE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-16
17.3.1 Class 1 - Hardware Interrupt and Exception Handling in Virtual-8086 Mode. . . . . . 17-18
17.3.1.1 Handling an Interrupt or Exception Through a Protected-Mode Trap or Interrupt Gate . . . . . . 17-18
17.3.1.2 Handling an Interrupt or Exception With an 8086 Program Interrupt or Exception Handler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-20
17.3.1.3 Handling an Interrupt or Exception Through a Task Gate . . . . . . . . . . . . . . . . . . . . 17-21
17.3.2 Class 2 - Maskable Hardware Interrupt Handling in Virtual-8086 Mode Using the Virtual Interrupt Mechanism. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-22
17.3.3 Class 3 - Software Interrupt Handling in Virtual-8086 Mode . . . . . . . . . . . . . . . . . . . . 17-24
17.3.3.1 Method 1: Software Interrupt Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-27
17.3.3.2 Methods 2 and 3: Software Interrupt Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-28
17.3.3.3 Method 4: Software Interrupt Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-28
17.3.3.4 Method 5: Software Interrupt Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-28
17.3.3.5 Method 6: Software Interrupt Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-29
17.4 PROTECTED-MODE VIRTUAL INTERRUPTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-30
CHAPTER 18
MIXING 16-BIT AND 32-BIT CODE
18.1 DEFINING 16-BIT AND 32-BIT PROGRAM MODULES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-2
18.2 MIXING 16-BIT AND 32-BIT OPERATIONS WITHIN A CODE SEGMENT . . . . . . . . . . . . . . . . . 18-2
18.3 SHARING DATA AMONG MIXED-SIZE CODE SEGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-4
18.4 TRANSFERRING CONTROL AMONG MIXED-SIZE CODE SEGMENTS . . . . . . . . . . . . . . . . . . . . 18-4
18.4.1 Code-Segment Pointer Size. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-5
18.4.2 Stack Management for Control Transfer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-5
18.4.2.1 Controlling the Operand-Size Attribute For a Call . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-7
18.4.2.2 Passing Parameters With a Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-8
18.4.3 Interrupt Control Transfers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-8
18.4.4 Parameter Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-8
18.4.5 Writing Interface Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-9
CHAPTER 19
ARCHITECTURE COMPATIBILITY
19.1 PROCESSOR FAMILIES AND CATEGORIES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-1
19.2 RESERVED BITS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-2
19.3 ENABLING NEW FUNCTIONS AND MODES. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-2
19.4 DETECTING THE PRESENCE OF NEW FEATURES THROUGH SOFTWARE . . . . . . . . . . . . . . 19-3
19.5 INTEL MMX TECHNOLOGY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-3
19.6 STREAMING SIMD EXTENSIONS (SSE). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-3
19.7 STREAMING SIMD EXTENSIONS 2 (SSE2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-4
19.8 STREAMING SIMD EXTENSIONS 3 (SSE3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-4
19.9 ADDITIONAL STREAMING SIMD EXTENSIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-4
19.10 INTEL HYPER-THREADING TECHNOLOGY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-5
19.11 MULTI-CORE TECHNOLOGY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-5
19.12 SPECIFIC FEATURES OF DUAL-CORE PROCESSOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-5
19.13 NEW INSTRUCTIONS IN THE PENTIUM AND LATER IA-32 PROCESSORS . . . . . . . . . . . . . . 19-5
19.13.1 Instructions Added Prior to the Pentium Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-6
19.14 OBSOLETE INSTRUCTIONS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-7
19.15 UNDEFINED OPCODES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-7
19.16 NEW FLAGS IN THE EFLAGS REGISTER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-7
19.16.1 Using EFLAGS Flags to Distinguish Between 32-Bit IA-32 Processors . . . . . . . . . . . . . 19-8
19.17 STACK OPERATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-8
19.17.1 PUSH SP. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-8
19.17.2 EFLAGS Pushed on the Stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-9
19.18 X87 FPU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-9
19.18.1 Control Register CR0 Flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-9
19.18.2 x87 FPU Status Word . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-10
19.18.2.1 Condition Code Flags (C0 through C3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-10
19.18.2.2 Stack Fault Flag. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-11
19.18.3 x87 FPU Control Word . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-11
19.18.4 x87 FPU Tag Word. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-11
19.18.5 Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-12
19.18.5.1 NaNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-12
19.18.5.2 Pseudo-zero, Pseudo-NaN, Pseudo-infinity, and Unnormal Formats . . . . . . . . . . . 19-12
19.18.6 Floating-Point Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-13
19.18.6.1 Denormal Operand Exception (#D) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-13
19.18.6.2 Numeric Overflow Exception (#O). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-13
19.18.6.3 Numeric Underflow Exception (#U) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-14
19.18.6.4 Exception Precedence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-14
19.18.6.5 CS and EIP For FPU Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-14
19.18.6.6 FPU Error Signals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-14
19.18.6.7 Assertion of the FERR# Pin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-15
19.18.6.8 Invalid Operation Exception On Denormals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-15
19.18.6.9 Alignment Check Exceptions (#AC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-16
19.18.6.10 Segment Not Present Exception During FLDENV . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-16
19.18.6.11 Device Not Available Exception (#NM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-16
19.18.6.12 Coprocessor Segment Overrun Exception. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-16
19.18.6.13 General Protection Exception (#GP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-16
19.18.6.14 Floating-Point Error Exception (#MF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-16
19.18.7 Changes to Floating-Point Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-17
19.18.7.1 FDIV, FPREM, and FSQRT Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-17
19.18.7.2 FSCALE Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-17
19.18.7.3 FPREM1 Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-17
19.18.7.4 FPREM Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-17
19.18.7.5 FUCOM, FUCOMP, and FUCOMPP Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-17
19.18.7.6 FPTAN Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-18
19.18.7.7 Stack Overflow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-18
19.18.7.8 FSIN, FCOS, and FSINCOS Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-18
19.18.7.9 FPATAN Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-18
19.18.7.10 F2XM1 Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-18
19.18.7.11 FLD Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-18
19.18.7.12 FXTRACT Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-19
19.18.7.13 Load Constant Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-19
19.18.7.14 FSETPM Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-19
19.18.7.15 FXAM Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-20
19.18.7.16 FSAVE and FSTENV Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-20
19.18.8 Transcendental Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-20
19.18.9 Obsolete Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-20
19.18.10 WAIT/FWAIT Prefix Differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-21
19.18.11 Operands Split Across Segments and/or Pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-21
19.18.12 FPU Instruction Synchronization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-21
19.19 SERIALIZING INSTRUCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-21
19.20 FPU AND MATH COPROCESSOR INITIALIZATION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-22
19.20.1 Intel
CORE SOLO AND INTEL CORE DUO PROCESSORS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-14
30.4 PERFORMANCE MONITORING (PROCESSORS BASED ON INTEL CORE MICROARCHITECTURE). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-16
30.4.1 Fixed-function Performance Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-18
30.4.2 Global Counter Control Facilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-19
30.4.3 At-Retirement Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-21
30.4.4 Precise Event Based Sampling (PEBS). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-22
30.4.4.1 Setting up the PEBS Buffer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-23
30.4.4.2 PEBS Record Format. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-23
30.4.4.3 Writing a PEBS Interrupt Service Routine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-23
30.5 PERFORMANCE MONITORING (PROCESSORS BASED ON INTEL ATOM MICROARCHITECTURE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-25
30.6 PERFORMANCE MONITORING FOR PROCESSORS BASED ON INTEL MICROARCHITECTURE CODENAME NEHALEM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-26
30.6.1 Enhancements of Performance Monitoring in the Processor Core . . . . . . . . . . . . . . . . 30-27
30.6.1.1 Precise Event Based Sampling (PEBS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-27
30.6.1.2 Load Latency Performance Monitoring Facility. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-32
30.6.1.3 Off-core Response Performance Monitoring in the Processor Core. . . . . . . . . . . . 30-34
30.6.2 Performance Monitoring Facility in the Uncore. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-37
30.6.2.1 Uncore Performance Monitoring Management Facility. . . . . . . . . . . . . . . . . . . . . . . . 30-37
30.6.2.2 Uncore Performance Event Configuration Facility. . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-40
30.6.2.3 Uncore Address/Opcode Match MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-42
30.6.3 Intel Xeon Processor 7500 Series Performance Monitoring Facility . . . . . . . . . . . . . . 30-43
30.7 PERFORMANCE MONITORING FOR PROCESSORS BASED ON NEXT GENERATION INTEL PROCESSOR (CODENAMED WESTMERE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-46
30.8 PERFORMANCE MONITORING (PROCESSORS BASED ON INTEL NETBURST MICROARCHITECTURE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-46
30.8.1 ESCR MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-50
30.8.2 Performance Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-52
30.8.3 CCCR MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-53
30.8.4 Debug Store (DS) Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-55
30.8.5 Programming the Performance Counters for Non-Retirement Events. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-56
30.8.5.1 Selecting Events to Count . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-56
30.8.5.2 Filtering Events. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-58
30.8.5.3 Starting Event Counting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-60
30.8.5.4 Reading a Performance Counter's Count . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-60
30.8.5.5 Halting Event Counting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-61
30.8.5.6 Cascading Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-61
30.8.5.7 Extended Cascading. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-62
30.8.5.8 Generating an Interrupt on Overflow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-64
30.8.5.9 Counter Usage Guideline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-64
30.8.6 At-Retirement Counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-65
30.8.6.1 Using At-Retirement Counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-66
30.8.6.2 Tagging Mechanism for Front_end_event . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-67
30.8.6.3 Tagging Mechanism For Execution_event . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-67
30.8.6.4 Tagging Mechanism for Replay_event. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-68
30.8.7 Precise Event-Based Sampling (PEBS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-68
30.8.7.1 Detection of the Availability of the PEBS Facilities . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-69
30.8.7.2 Setting Up the DS Save Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-69
30.8.7.3 Setting Up the PEBS Buffer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-69
30.8.7.4 Writing a PEBS Interrupt Service Routine. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-69
30.8.7.5 Other DS Mechanism Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-70
30.8.8 Operating System Implications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-70
30.9 PERFORMANCE MONITORING AND INTEL HYPER-THREADING TECHNOLOGY IN PROCESSORS BASED ON INTEL NETBURST MICROARCHITECTURE . . . . . . . . . . . . . . . . . 30-70
30.9.1 ESCR MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-71
30.9.2 CCCR MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-72
30.9.3 IA32_PEBS_ENABLE MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-74
30.9.4 Performance Monitoring Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-74
30.10 COUNTING CLOCKS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-76
30.10.1 Non-Halted Clockticks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-77
30.10.2 Non-Sleep Clockticks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-78
30.10.3 Incrementing the Time-Stamp Counter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-79
30.10.4 Non-Halted Reference Clockticks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-79
30.10.5 Cycle Counting and Opportunistic Processor Operation . . . . . . . . . . . . . . . . . . . . . . . . . 30-79
30.11 PERFORMANCE MONITORING, BRANCH PROFILING AND SYSTEM EVENTS . . . . . . . . . . 30-80
30.12 PERFORMANCE MONITORING AND DUAL-CORE TECHNOLOGY. . . . . . . . . . . . . . . . . . . . . . 30-81
30.13 PERFORMANCE MONITORING ON 64-BIT INTEL XEON PROCESSOR MP WITH UP TO 8-
MBYTE L3 CACHE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-81
30.14 PERFORMANCE MONITORING ON L3 AND CACHING BUS CONTROLLER SUB-SYSTEMS . 30-
86
30.14.1 Overview of Performance Monitoring with L3/Caching Bus Controller . . . . . . . . . . . 30-88
30.14.2 GBSQ Event Interface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-89
30.14.3 GSNPQ Event Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-91
30.14.4 FSB Event Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-93
30.14.4.1 FSB Sub-Event Mask Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-94
30.14.5 Common Event Control Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-95
30.15 PERFORMANCE MONITORING (P6 FAMILY PROCESSOR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-95
30.15.1 PerfEvtSel0 and PerfEvtSel1 MSRs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-96
30.15.2 PerfCtr0 and PerfCtr1 MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-98
30.15.3 Starting and Stopping the Performance-Monitoring Counters . . . . . . . . . . . . . . . . . . . 30-98
30.15.4 Event and Time-Stamp Monitoring Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-99
30.15.5 Monitoring Counter Overflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-99
30.16 PERFORMANCE MONITORING (PENTIUM PROCESSORS) . . . . . . . . . . . . . . . . . . . . . . . . . . 30-100
30.16.1 Control and Event Select Register (CESR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-100
30.16.2 Use of the Performance-Monitoring Pins. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-102
30.16.3 Events Counted . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-102
APPENDIX A
PERFORMANCE-MONITORING EVENTS
A.1 ARCHITECTURAL PERFORMANCE-MONITORING EVENTS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-1
A.2 PERFORMANCE MONITORING EVENTS FOR INTEL CORE PROCESSOR (CODENAMED WESTMERE) . . . . . . . . . . . . A-53
A.4 PERFORMANCE MONITORING EVENTS FOR INTEL XEON CORE 2 EXTREME PROCESSORS QX 9000 SERIES . . . . . . . . . . . A-108
A.5 PERFORMANCE MONITORING EVENTS FOR INTEL XEON CORE 2 DUO PROCESSORS. . . . . . . . . . . . . . . . . . . . . A-108
A.6 PERFORMANCE MONITORING EVENTS FOR INTEL ATOM PROCESSORS. . . . . . . . . A-153
A.7 PERFORMANCE MONITORING EVENTS FOR INTEL CORE SOLO AND INTEL CORE DUO PROCESSORS. . . . . . . . . . . . . . . . . . . . . A-176
A.8 PENTIUM 4 AND INTEL XEON PROCESSOR PERFORMANCE-MONITORING EVENTS. . . A-185
A.9 PERFORMANCE MONITORING EVENTS FOR INTEL PENTIUM M PROCESSORS . . . . . . . . . . . . . . . . . . . . . . . . . A-234
A.10 P6 FAMILY PROCESSOR PERFORMANCE-MONITORING EVENTS . . . . . . . . . . . . . . . . . . . . A-237
A.11 PENTIUM PROCESSOR PERFORMANCE-MONITORING EVENTS. . . . . . . . . . . . . . . . . . . . . . . . . . . A-255
APPENDIX B
MODEL-SPECIFIC REGISTERS (MSRS)
B.1 ARCHITECTURAL MSRS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-2
B.2 MSRS IN THE INTEL CORE 2 PROCESSOR FAMILY. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-40
B.3 MSRS IN THE INTEL ATOM PROCESSOR FAMILY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-61
B.4 MSRS IN THE INTEL 4 AND INTEL XEON PROCESSORS . . . . . . . . . . . . . . . . . . B-121
B.6.1 MSRs Unique to Intel Xeon Processor MP with L3 Cache . . . . . . . . . . . . . . . . . . . . . . . . B-161
B.7 MSRS IN INTEL CORE SOLO AND INTEL CORE DUO PROCESSORS . . . . . . . . . . B-164
B.8 MSRS IN THE PENTIUM M PROCESSOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-177
B.9 MSRS IN THE P6 FAMILY PROCESSORS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-187
B.10 MSRS IN PENTIUM PROCESSORS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-199
APPENDIX C
MP INITIALIZATION FOR P6 FAMILY PROCESSORS
C.1 OVERVIEW OF THE MP INITIALIZATION PROCESS FOR P6 FAMILY PROCESSORS . . . . . . . C-1
C.2 MP INITIALIZATION PROTOCOL ALGORITHM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-2
C.2.1 Error Detection and Handling During the MP Initialization Protocol . . . . . . . . . . . . . . . . . .C-4
APPENDIX D
PROGRAMMING THE LINT0 AND LINT1 INPUTS
D.1 CONSTANTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-1
D.2 LINT[0:1] PINS PROGRAMMING PROCEDURE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-1
APPENDIX E
INTERPRETING MACHINE-CHECK ERROR CODES
E.1 INCREMENTAL DECODING INFORMATION: PROCESSOR FAMILY 06H MACHINE ERROR
CODES FOR MACHINE CHECK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-1
E.2 INCREMENTAL DECODING INFORMATION: INTEL CORE 2 PROCESSOR FAMILY MACHINE
ERROR CODES FOR MACHINE CHECK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-5
E.2.1 Model-Specific Machine Check Error Codes for Intel Xeon Processor 7400 Series . . . .E-9
E.2.1.1 Processor Machine Check Status Register
Incremental MCA Error Code Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .E-9
E.2.2 Intel Xeon Processor 7400 Model Specific Error Code Field. . . . . . . . . . . . . . . . . . . . . . . E-10
E.2.2.1 Processor Model Specific Error Code Field
Type B: Bus and Interconnect Error. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-10
E.2.2.2 Processor Model Specific Error Code Field
Type C: Cache Bus Controller Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-10
E.3 INCREMENTAL DECODING INFORMATION: PROCESSOR FAMILY WITH CPUID
DISPLAYFAMILY_DISPLAYMODEL SIGNATURE 06_1AH, MACHINE ERROR CODES FOR
MACHINE CHECK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-11
E.3.1 QPI Machine Check Errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-12
E.3.2 Internal Machine Check Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-13
E.3.3 Memory Controller Errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-14
E.4 INCREMENTAL DECODING INFORMATION: PROCESSOR FAMILY 0FH MACHINE ERROR CODES
FOR MACHINE CHECK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-15
E.4.1 Model-Specific Machine Check Error Codes for Intel Xeon Processor MP 7100 Series . E-16
E.4.1.1 Processor Machine Check Status Register
MCA Error Code Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-18
E.4.2 Other_Info Field (all MCA Error Types) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-19
E.4.3 Processor Model Specific Error Code Field. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-21
E.4.3.1 MCA Error Type A: L3 Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-21
E.4.3.2 Processor Model Specific Error Code Field
Type B: Bus and Interconnect Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-21
E.4.3.3 Processor Model Specific Error Code Field
Type C: Cache Bus Controller Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-23
APPENDIX F
APIC BUS MESSAGE FORMATS
F.1 BUS MESSAGE FORMATS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . F-1
F.2 EOI MESSAGE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . F-1
F.2.1 Short Message . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . F-2
F.2.2 Non-focused Lowest Priority Message. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . F-3
F.2.3 APIC Bus Status Cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . F-5
APPENDIX G
VMX CAPABILITY REPORTING FACILITY
G.1 BASIC VMX INFORMATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-1
G.2 RESERVED CONTROLS AND DEFAULT SETTINGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-2
G.3 VM-EXECUTION CONTROLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-3
G.3.1 Pin-Based VM-Execution Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-3
G.3.2 Primary Processor-Based VM-Execution Controls. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-4
G.3.3 Secondary Processor-Based VM-Execution Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-5
G.4 VM-EXIT CONTROLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-6
G.5 VM-ENTRY CONTROLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-7
G.6 MISCELLANEOUS DATA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-7
G.7 VMX-FIXED BITS IN CR0. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-8
G.8 VMX-FIXED BITS IN CR4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-9
G.9 VMCS ENUMERATION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-9
G.10 VPID AND EPT CAPABILITIES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-9
APPENDIX H
FIELD ENCODING IN VMCS
H.1 16-BIT FIELDS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-1
H.1.1 16-Bit Control Field. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-1
H.1.2 16-Bit Guest-State Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-1
H.1.3 16-Bit Host-State Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-2
H.2 64-BIT FIELDS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-2
H.2.1 64-Bit Control Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-3
H.2.2 64-Bit Read-Only Data Field. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-4
H.2.3 64-Bit Guest-State Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-4
H.2.4 64-Bit Host-State Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-5
H.3 32-BIT FIELDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-6
H.3.1 32-Bit Control Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-6
H.3.2 32-Bit Read-Only Data Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-7
H.3.3 32-Bit Guest-State Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-7
H.3.4 32-Bit Host-State Field. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-9
H.4 NATURAL-WIDTH FIELDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-9
H.4.1 Natural-Width Control Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-9
H.4.2 Natural-Width Read-Only Data Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-10
H.4.3 Natural-Width Guest-State Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-10
H.4.4 Natural-Width Host-State Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-11
APPENDIX I
VMX BASIC EXIT REASONS
FIGURES
Figure 1-1. Bit and Byte Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-7
Figure 1-2. Syntax for CPUID, CR, and MSR Data Presentation. . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-10
Figure 2-1. IA-32 System-Level Registers and Data Structures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3
Figure 2-2. System-Level Registers and Data Structures in IA-32e Mode . . . . . . . . . . . . . . . . . . . 2-4
Figure 2-3. Transitions Among the Processor's Operating Modes . . . . . . . . . . . . . . . . . . . . . . . . . .2-11
Figure 2-4. System Flags in the EFLAGS Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-13
Figure 2-5. Memory Management Registers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-16
Figure 2-6. Control Registers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-19
Figure 2-7. XFEATURE_ENABLED_MASK Register (XCR0) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-26
Figure 3-1. Segmentation and Paging. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2
Figure 3-2. Flat Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-4
Figure 3-3. Protected Flat Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-4
Figure 3-4. Multi-Segment Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6
Figure 3-5. Logical Address to Linear Address Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-9
Figure 3-6. Segment Selector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-10
Figure 3-7. Segment Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-11
Figure 3-8. Segment Descriptor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-13
Figure 3-9. Segment Descriptor When Segment-Present Flag Is Clear. . . . . . . . . . . . . . . . . . . . . .3-15
Figure 3-10. Global and Local Descriptor Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-20
Figure 3-11. Pseudo-Descriptor Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-22
Figure 4-1. Enabling and Changing Paging Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4
Figure 4-2. Linear-Address Translation to a 4-KByte Page using 32-Bit Paging. . . . . . . . . . . . .4-11
Figure 4-3. Linear-Address Translation to a 4-MByte Page using 32-Bit Paging . . . . . . . . . . . .4-11
Figure 4-4. Formats of CR3 and Paging-Structure Entries with 32-Bit Paging . . . . . . . . . . . . . .4-15
Figure 4-5. Linear-Address Translation to a 4-KByte Page using PAE Paging . . . . . . . . . . . . . . .4-18
Figure 4-6. Linear-Address Translation to a 2-MByte Page using PAE Paging. . . . . . . . . . . . . . .4-18
Figure 4-7. Formats of CR3 and Paging-Structure Entries with PAE Paging . . . . . . . . . . . . . . . .4-23
Figure 4-8. Linear-Address Translation to a 4-KByte Page using IA-32e Paging . . . . . . . . . . . .4-26
Figure 4-9. Linear-Address Translation to a 2-MByte Page using IA-32e Paging . . . . . . . . . . . .4-27
Figure 4-10. Linear-Address Translation to a 1-GByte Page using IA-32e Paging . . . . . . . . . . . .4-28
Figure 4-11. Formats of CR3 and Paging-Structure Entries with IA-32e Paging. . . . . . . . . . . . . .4-36
Figure 4-12. Page-Fault Error Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-38
Figure 4-13. Memory Management Convention That Assigns a Page Table
to Each Segment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-60
Figure 5-1. Descriptor Fields Used for Protection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4
Figure 5-2. Descriptor Fields with Flags used in IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6
Figure 5-3. Protection Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-10
Figure 5-4. Privilege Check for Data Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-12
Figure 5-5. Examples of Accessing Data Segments From Various Privilege Levels . . . . . . . . . .5-13
Figure 5-6. Privilege Check for Control Transfer Without Using a Gate . . . . . . . . . . . . . . . . . . . . .5-16
Figure 5-7. Examples of Accessing Conforming and Nonconforming Code Segments From Various
Privilege Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-17
Figure 5-8. Call-Gate Descriptor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-19
Figure 5-9. Call-Gate Descriptor in IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-21
Figure 5-10. Call-Gate Mechanism. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-22
Figure 5-11. Privilege Check for Control Transfer with Call Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-23
Figure 5-12. Example of Accessing Call Gates At Various Privilege Levels . . . . . . . . . . . . . . . . . . .5-25
Figure 5-13. Stack Switching During an Interprivilege-Level Call . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-27
Figure 5-14. MSRs Used by SYSCALL and SYSRET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-33
Figure 5-15. Use of RPL to Weaken Privilege Level of Called Procedure. . . . . . . . . . . . . . . . . . . . .5-38
Figure 6-1. Relationship of the IDTR and IDT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-14
Figure 6-2. IDT Gate Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-15
Figure 6-3. Interrupt Procedure Call . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-16
Figure 6-4. Stack Usage on Transfers to Interrupt and Exception-Handling Routines. . . . . . . 6-18
Figure 6-5. Interrupt Task Switch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-21
Figure 6-6. Error Code. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-22
Figure 6-7. 64-Bit IDT Gate Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-23
Figure 6-8. IA-32e Mode Stack Usage After Privilege Level Change . . . . . . . . . . . . . . . . . . . . . . . 6-26
Figure 6-9. Page-Fault Error Code. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-55
Figure 7-1. Structure of a Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-2
Figure 7-2. 32-Bit Task-State Segment (TSS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-5
Figure 7-3. TSS Descriptor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-7
Figure 7-4. Format of TSS and LDT Descriptors in 64-bit Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-9
Figure 7-5. Task Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-10
Figure 7-6. Task-Gate Descriptor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-11
Figure 7-7. Task Gates Referencing the Same Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-12
Figure 7-8. Nested Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-17
Figure 7-9. Overlapping Linear-to-Physical Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-20
Figure 7-10. 16-Bit TSS Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-22
Figure 7-11. 64-Bit TSS Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-24
Figure 8-1. Example of Write Ordering in Multiple-Processor Systems . . . . . . . . . . . . . . . . . . . . . 8-11
Figure 8-2. Interpretation of APIC ID in Early MP Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-35
Figure 8-3. Local APICs and I/O APIC in MP System Supporting Intel HT Technology. . . . . . . . 8-39
Figure 8-4. IA-32 Processor with Two Logical Processors Supporting Intel HT Technology . 8-40
Figure 8-5. Generalized Four level Interpretation of the APIC ID . . . . . . . . . . . . . . . . . . . . . . . . . . 8-50
Figure 8-6. Conceptual Five-level Topology and 32-bit APIC ID Composition . . . . . . . . . . . . . . . 8-50
Figure 8-7. Topological Relationships between Hierarchical IDs in a Hypothetical MP Platform. 8-53
Figure 9-1. Contents of CR0 Register after Reset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-5
Figure 9-2. Version Information in the EDX Register after Reset. . . . . . . . . . . . . . . . . . . . . . . . . . . 9-5
Figure 9-3. Processor State After Reset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-21
Figure 9-4. Constructing Temporary GDT and Switching to Protected Mode (Lines 162-172 of
List File) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-31
Figure 9-5. Moving the GDT, IDT, and TSS from ROM to RAM (Lines 196-261 of List File). . . 9-32
Figure 9-6. Task Switching (Lines 282-296 of List File). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-33
Figure 9-7. Applying Microcode Updates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-37
Figure 9-8. Microcode Update Write Operation Flow [1] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-60
Figure 9-9. Microcode Update Write Operation Flow [2] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-61
Figure 10-1. Relationship of Local APIC and I/O APIC In Single-Processor Systems. . . . . . . . . . . 10-3
Figure 10-2. Local APICs and I/O APIC When Intel Xeon Processors Are Used in Multiple-
Processor Systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-4
Figure 10-3. Local APICs and I/O APIC When P6 Family Processors Are Used in Multiple-Processor
Systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-4
Figure 10-4. Local APIC Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-7
Figure 10-5. IA32_APIC_BASE MSR (APIC_BASE_MSR in P6 Family) . . . . . . . . . . . . . . . . . . . . . . . 10-12
Figure 10-6. Local APIC ID Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-13
Figure 10-7. Local APIC Version Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-16
Figure 10-8. Local Vector Table (LVT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-18
Figure 10-9. Error Status Register (ESR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-21
Figure 10-10. Divide Configuration Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-22
Figure 10-11. Initial Count and Current Count Registers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-23
Figure 10-12. Interrupt Command Register (ICR). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-25
Figure 10-13. Logical Destination Register (LDR). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-31
Figure 10-14. Destination Format Register (DFR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-31
Figure 10-15. Arbitration Priority Register (APR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-34
Figure 10-16. Interrupt Acceptance Flow Chart for the Local APIC (Pentium 4 and Intel Xeon
Processors) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-36
Figure 10-17. Interrupt Acceptance Flow Chart for the Local APIC (P6 Family and Pentium
Processors) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-37
Figure 10-18. Task Priority Register (TPR). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-39
Figure 10-19. Processor Priority Register (PPR). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-40
Figure 10-20. IRR, ISR and TMR Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-41
Figure 10-21. EOI Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-42
Figure 10-22. CR8 Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-43
Figure 10-23. Spurious-Interrupt Vector Register (SVR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-45
Figure 10-24. Layout of the MSI Message Address Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-47
Figure 10-25. Layout of the MSI Message Data Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-49
Figure 10-26. IA32_APIC_BASE MSR Supporting x2APIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-51
Figure 10-27. Local x2APIC State Transitions with IA32_APIC_BASE, INIT, and RESET . . . . . . 10-58
Figure 10-28. Error Status Register (ESR) in x2APIC Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-62
Figure 10-29. Interrupt Command Register (ICR) in x2APIC Mode . . . . . . . . . . . . . . . . . . . . . . . . . . 10-63
Figure 10-30. Logical Destination Register in x2APIC Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-64
Figure 10-31. SELF IPI register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-65
Figure 11-1. Cache Structure of the Pentium 4 and Intel Xeon Processors . . . . . . . . . . . . . . . . . .11-1
Figure 11-2. Cache Structure of the Intel Core i7 Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-2
Figure 11-3. Cache-Control Registers and Bits Available in Intel 64 and IA-32 Processors . . 11-16
Figure 11-4. Mapping Physical Memory With MTRRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-31
Figure 11-5. IA32_MTRRCAP Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-32
Figure 11-6. IA32_MTRR_DEF_TYPE MSR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-33
Figure 11-7. IA32_MTRR_PHYSBASEn and IA32_MTRR_PHYSMASKn Variable-Range Register
Pair. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-36
Figure 11-8. IA32_SMRR_PHYSBASE and IA32_SMRR_PHYSMASK SMRR Pair. . . . . . . . . . . . . 11-38
Figure 11-9. IA32_PAT MSR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-49
Figure 12-1. Mapping of MMX Registers to Floating-Point Registers . . . . . . . . . . . . . . . . . . . . . . . .12-2
Figure 12-2. Mapping of MMX Registers to x87 FPU Data Register Stack . . . . . . . . . . . . . . . . . . .12-7
Figure 13-1. Example of Saving the x87 FPU, MMX, SSE, SSE2, SSE3, and SSSE3 State During an
Operating-System Controlled Task Switch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-11
Figure 13-2. Future Layout of XSAVE/XRSTOR Area and XSTATE_BV with Five Sets of Processor
State Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-14
Figure 13-3. OS Enabling of Processor Extended State Support . . . . . . . . . . . . . . . . . . . . . . . . . . 13-17
Figure 13-4. Application Detection of New Instruction Extensions and Processor Extended State . . . 13-19
Figure 14-1. IA32_MPERF MSR and IA32_APERF MSR for P-state Coordination . . . . . . . . . . . . .14-2
Figure 14-2. IA32_PERF_CTL Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-6
Figure 14-3. Periodic Query of Activity Ratio of Opportunistic Processor Operation . . . . . . . . .14-7
Figure 14-4. IA32_ENERGY_PERF_BIAS Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-9
Figure 14-5. Processor Modulation Through Stop-Clock Mechanism. . . . . . . . . . . . . . . . . . . . . . . 14-11
Figure 14-6. MSR_THERM2_CTL Register On Processors with CPUID Family/Model/Stepping
Signature Encoded as 0x69n or 0x6Dn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-13
Figure 14-7. MSR_THERM2_CTL Register for Supporting TM2. . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-14
Figure 14-8. IA32_THERM_STATUS MSR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-15
Figure 14-9. IA32_THERM_INTERRUPT MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-15
Figure 14-10. IA32_CLOCK_MODULATION MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-17
Figure 14-11. IA32_THERM_STATUS Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-19
Figure 14-12. IA32_THERM_INTERRUPT Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-21
Figure 15-1. Machine-Check MSRs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-3
Figure 15-2. IA32_MCG_CAP Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-4
Figure 15-3. IA32_MCG_STATUS Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-5
Figure 15-4. IA32_MCi_CTL Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-6
Figure 15-5. IA32_MCi_STATUS Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-8
Figure 15-6. IA32_MCi_ADDR MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-12
Figure 15-7. UCR Support in IA32_MCi_MISC Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-13
Figure 15-8. IA32_MCi_CTL2 Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-14
Figure 15-9. CMCI Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-19
Figure 15-10. Local APIC CMCI LVT Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-20
Figure 16-1. Debug Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-3
Figure 16-2. DR6/DR7 Layout on Processors Supporting Intel 64 Technology . . . . . . . . . . . . . . 16-9
Figure 16-3. IA32_DEBUGCTL MSR for Processors based
on Intel Core microarchitecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-15
Figure 16-4. 64-bit Address Layout of LBR MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-20
Figure 16-5. DS Save Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-23
Figure 16-6. 32-bit Branch Trace Record Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-24
Figure 16-7. PEBS Record Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-25
Figure 16-8. IA-32e Mode DS Save Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-26
Figure 16-9. 64-bit Branch Trace Record Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-27
Figure 16-10. 64-bit PEBS Record Format. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-27
Figure 16-11. IA32_DEBUGCTL MSR for Processors based
on Intel microarchitecture (Nehalem). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-34
Figure 16-12. MSR_DEBUGCTLA MSR for Pentium 4 and Intel Xeon Processors . . . . . . . . . . . . . 16-38
Figure 16-13. LBR MSR Branch Record Layout for the Pentium 4
and Intel Xeon Processor Family . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-40
Figure 16-14. IA32_DEBUGCTL MSR for Intel Core Solo
and Intel Core Duo Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-42
Figure 16-15. LBR Branch Record Layout for the Intel Core Solo
and Intel Core Duo Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-43
Figure 16-16. MSR_DEBUGCTLB MSR for Pentium M Processors. . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-44
Figure 16-17. LBR Branch Record Layout for the Pentium M Processor . . . . . . . . . . . . . . . . . . . . . 16-45
Figure 16-18. DEBUGCTLMSR Register (P6 Family Processors) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-46
Figure 17-1. Real-Address Mode Address Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-4
Figure 17-2. Interrupt Vector Table in Real-Address Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-7
Figure 17-3. Entering and Leaving Virtual-8086 Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-13
Figure 17-4. Privilege Level 0 Stack After Interrupt or
Exception in Virtual-8086 Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-19
Figure 17-5. Software Interrupt Redirection Bit Map in TSS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-27
Figure 18-1. Stack after Far 16- and 32-Bit Calls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-6
Figure 19-1. I/O Map Base Address Differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-34
Figure 20-1. Interaction of a Virtual-Machine Monitor and Guests . . . . . . . . . . . . . . . . . . . . . . . . . . 20-3
Figure 21-1. States of VMCS X . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-3
Figure 25-1. Formats of EPTP and EPT Paging-Structure Entries. . . . . . . . . . . . . . . . . . . . . . . . . . 25-11
Figure 26-1. SMRAM Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-6
Figure 26-2. SMM Revision Identifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-18
Figure 26-3. Auto HALT Restart Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-19
Figure 26-4. SMBASE Relocation Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-20
Figure 26-5. I/O Instruction Restart Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-21
Figure 27-1. VMX Transitions and States of VMCS in a Logical Processor . . . . . . . . . . . . . . . . . . . 27-4
Figure 28-1. Virtual TLB Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-7
Figure 29-1. Host External Interrupts and Guest Virtual Interrupts . . . . . . . . . . . . . . . . . . . . . . . . .29-5
Figure 30-1. Layout of IA32_PERFEVTSELx MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .30-4
Figure 30-2. Layout of IA32_FIXED_CTR_CTRL MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .30-7
Figure 30-3. Layout of IA32_PERF_GLOBAL_CTRL MSR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .30-8
Figure 30-4. Layout of IA32_PERF_GLOBAL_STATUS MSR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .30-9
Figure 30-5. Layout of IA32_PERF_GLOBAL_OVF_CTRL MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .30-9
Figure 30-6. Layout of IA32_PERFEVTSELx MSRs Supporting Architectural Performance
Monitoring Version 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-10
Figure 30-7. Layout of IA32_FIXED_CTR_CTRL MSR Supporting Architectural Performance
Monitoring Version 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-11
Figure 30-8. Layout of Global Performance Monitoring Control MSR . . . . . . . . . . . . . . . . . . . . . . 30-12
Figure 30-9. Layout of MSR_PERF_FIXED_CTR_CTRL MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-19
Figure 30-10. Layout of MSR_PERF_GLOBAL_CTRL MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-20
Figure 30-11. Layout of MSR_PERF_GLOBAL_STATUS MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-20
Figure 30-12. Layout of MSR_PERF_GLOBAL_OVF_CTRL MSR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-21
Figure 30-13. Layout of IA32_PEBS_ENABLE MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-28
Figure 30-14. PEBS Programming Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-30
Figure 30-15. Layout of MSR_PEBS_LD_LAT MSR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-34
Figure 30-16. Layout of MSR_OFFCORE_RSP_0 and MSR_OFFCORE_RSP_1 to Configure Off-core
Response Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-35
Figure 30-17. Layout of MSR_UNCORE_PERF_GLOBAL_CTRL MSR. . . . . . . . . . . . . . . . . . . . . . . . . 30-38
Figure 30-18. Layout of MSR_UNCORE_PERF_GLOBAL_STATUS MSR. . . . . . . . . . . . . . . . . . . . . . 30-39
Figure 30-19. Layout of MSR_UNCORE_PERF_GLOBAL_OVF_CTRL MSR . . . . . . . . . . . . . . . . . . . 30-39
Figure 30-20. Layout of MSR_UNCORE_PERFEVTSELx MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-40
Figure 30-21. Layout of MSR_UNCORE_FIXED_CTR_CTRL MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-41
Figure 30-22. Layout of MSR_UNCORE_ADDR_OPCODE_MATCH MSR . . . . . . . . . . . . . . . . . . . . . . 30-42
Figure 30-23. Distributed Units of the Uncore of Intel Xeon Processor 7500 Series. . . . . . . . . 30-44
Figure 30-24. Event Selection Control Register (ESCR) for Pentium 4
and Intel Xeon Processors without Intel HT Technology Support . . . . . . . . . . . . . 30-51
Figure 30-25. Performance Counter (Pentium 4 and Intel Xeon Processors) . . . . . . . . . . . . . . . . 30-53
Figure 30-26. Counter Configuration Control Register (CCCR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-54
Figure 30-27. Effects of Edge Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-60
Figure 30-28. Event Selection Control Register (ESCR) for the Pentium 4 Processor, Intel Xeon
Processor and Intel Xeon Processor MP Supporting Hyper-Threading Technology . . . . . . 30-71
Figure 30-29. Counter Configuration Control Register (CCCR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-73
Figure 30-30. Layout of IA32_PERF_CAPABILITIES MSR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-81
Figure 30-31. Block Diagram of 64-bit Intel Xeon Processor MP with 8-MByte L3. . . . . . . . . . . 30-82
Figure 30-32. MSR_IFSB_IBUSQx, Addresses: 107CCH and 107CDH. . . . . . . . . . . . . . . . . . . . . . . . 30-83
Figure 30-33. MSR_IFSB_ISNPQx, Addresses: 107CEH and 107CFH. . . . . . . . . . . . . . . . . . . . . . . . 30-84
Figure 30-34. MSR_EFSB_DRDYx, Addresses: 107D0H and 107D1H. . . . . . . . . . . . . . . . . . . . . . . 30-85
Figure 30-35. MSR_IFSB_CTL6, Address: 107D2H;
MSR_IFSB_CNTR7, Address: 107D3H. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-86
Figure 30-36. Block Diagram of Intel Xeon Processor 7400 Series . . . . . . . . . . . . . . . . . . . . . . . . . 30-87
Figure 30-37. Block Diagram of Intel Xeon Processor 7100 Series . . . . . . . . . . . . . . . . . . . . . . . . . 30-88
Figure 30-38. MSR_EMON_L3_CTR_CTL0/1, Addresses: 107CCH/107CDH . . . . . . . . . . . . . . . . . 30-90
Figure 30-39. MSR_EMON_L3_CTR_CTL2/3, Addresses: 107CEH/107CFH. . . . . . . . . . . . . . . . . . 30-93
Figure 30-40. MSR_EMON_L3_CTR_CTL4/5/6/7, Addresses: 107D0H-107D3H. . . . . . . . . . . . . 30-94
Figure 30-41. PerfEvtSel0 and PerfEvtSel1 MSRs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-97
Figure 30-42. CESR MSR (Pentium Processor Only). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-101
Figure C-1. MP System With Multiple Pentium III Processors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-3
TABLES
Table 2-1. Action Taken By x87 FPU Instructions for Different
Combinations of EM, MP, and TS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-21
Table 2-2. Summary of System Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-28
Table 3-1. Code- and Data-Segment Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-17
Table 3-2. System-Segment and Gate-Descriptor Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-19
Table 4-1. Properties of Different Paging Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3
Table 4-2. Paging Structures in the Different Paging Modes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-9
Table 4-3. Use of CR3 with 32-Bit Paging. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10
Table 4-4. Format of a 32-Bit Page-Directory Entry that Maps a 4-MByte Page. . . . . . . . . . . 4-12
Table 4-5. Format of a 32-Bit Page-Directory Entry that References a Page Table. . . . . . . . 4-13
Table 4-6. Format of a 32-Bit Page-Table Entry that Maps a 4-KByte Page. . . . . . . . . . . . . . . 4-14
Table 4-7. Use of CR3 with PAE Paging. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-16
Table 4-8. Format of a PAE Page-Directory-Pointer-Table Entry (PDPTE) . . . . . . . . . . . . . . . . . 4-17
Table 4-9. Format of a PAE Page-Directory Entry that Maps a 2-MByte Page . . . . . . . . . . . . . 4-20
Table 4-10. Format of a PAE Page-Directory Entry that References a Page Table . . . . . . . . . . 4-21
Table 4-11. Format of a PAE Page-Table Entry that Maps a 4-KByte Page . . . . . . . . . . . . . . . . . 4-22
Table 4-12. Use of CR3 with IA-32e Paging and CR3.PCIDE = 0. . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-24
Table 4-13. Use of CR3 with IA-32e Paging and CR3.PCIDE = 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-25
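Table 4-14. Format of an IA-32e PML4 Entry (PML4E) that References a Page-Directory-Pointer
Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-29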
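Table 4-15. Format of an IA-32e Page-Directory-Pointer-Table Entry (PDPTE) that Maps a
1-GByte Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-29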
Table 4-17. Format of an IA-32e Page-Directory Entry that Maps a 2-MByte Page . . . . . . . . . 4-31
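Table 4-16. Format of an IA-32e Page-Directory-Pointer-Table Entry (PDPTE) that References a
Page Directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-31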
Table 4-18. Format of an IA-32e Page-Directory Entry that References a Page Table . . . . . . 4-33
Table 4-19. Format of an IA-32e Page-Table Entry that Maps a 4-KByte Page . . . . . . . . . . . . . 4-34
Table 5-1. Privilege Check Rules for Call Gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-23
Table 5-2. 64-Bit-Mode Stack Layout After CALLF with CPL Change. . . . . . . . . . . . . . . . . . . . . . 5-28
Table 5-3. Combined Page-Directory and Page-Table Protection . . . . . . . . . . . . . . . . . . . . . . . . . 5-42
Table 5-4. Extended Feature Enable MSR (IA32_EFER) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-43
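Table 5-5. IA-32e Mode Page Level Protection Matrix
with Execute-Disable Bit Capability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-44
Table 5-6. Legacy PAE-Enabled 4-KByte Page Level Protection Matrix
with Execute-Disable Bit Capability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-45
Table 5-7. Legacy PAE-Enabled 2-MByte Page Level Protection
with Execute-Disable Bit Capability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-45
Table 5-8. IA-32e Mode Page Level Protection Matrix with Execute-Disable Bit Capability
Enabled . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-46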
Table 5-9. Reserved Bit Checking With Execute-Disable Bit Capability Not Enabled. . . . . . . . 5-47
Table 6-1. Protected-Mode Exceptions and Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-3
Table 6-2. Priority Among Simultaneous Exceptions and Interrupts . . . . . . . . . . . . . . . . . . . . . . 6-11
Table 6-3. Debug Exception Conditions and Corresponding Exception Classes. . . . . . . . . . . . . 6-29
Table 6-4. Interrupt and Exception Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-38
Table 6-5. Conditions for Generating a Double Fault . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-39
Table 6-6. Invalid TSS Conditions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-42
Table 6-7. Alignment Requirements by Data Type. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-60
Table 6-8. SIMD Floating-Point Exceptions Priority . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-65
Table 7-1. Exception Conditions Checked During a Task Switch . . . . . . . . . . . . . . . . . . . . . . . . . . 7-15
Table 7-2. Effect of a Task Switch on Busy Flag, NT Flag,
Previous Task Link Field, and TS Flag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-17
Table 8-1. Initial APIC IDs for the Logical Processors in a System that has Four Intel Xeon MP
Processors Supporting Intel Hyper-Threading Technology . . . . . . . . . . . . . . . . . . . . 8-53
Table 8-2. Initial APIC IDs for the Logical Processors in a System that has Two Physical
Processors Supporting Dual-Core and Intel Hyper-Threading Technology . . . . . . . . . 8-54
Table 8-3. Example of Possible x2APIC ID Assignment in a System that has Two Physical
Processors Supporting x2APIC and Intel Hyper-Threading Technology . . . . . . . . . . . 8-54
Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT . . . . . . . . . . . . . . . . . . . . . 9-2
Table 9-2. Recommended Settings of EM and MP Flags on IA-32 Processors . . . . . . . . . . . . . . . 9-7
Table 9-3. Software Emulation Settings of EM, MP, and NE Flags . . . . . . . . . . . . . . . . . . . . . . . . . . 9-8
Table 9-4. Main Initialization Steps in STARTUP.ASM Source Listing . . . . . . . . . . . . . . . . . . . . . .9-21
Table 9-5. Relationship Between BLD Item and ASM Source File. . . . . . . . . . . . . . . . . . . . . . . . . .9-35
Table 9-6. Microcode Update Field Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-38
Table 9-7. Microcode Update Format. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-40
Table 9-8. Extended Processor Signature Table Header Structure . . . . . . . . . . . . . . . . . . . . . . . .9-41
Table 9-9. Processor Signature Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-41
Table 9-10. Processor Flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-43
Table 9-11. Microcode Update Signature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-48
Table 9-12. Microcode Update Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-55
Table 9-13. Parameters for the Presence Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-56
Table 9-14. Parameters for the Write Update Data Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-57
Table 9-15. Parameters for the Control Update Sub-function. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-62
Table 9-17. Parameters for the Read Microcode Update Data Function . . . . . . . . . . . . . . . . . . . .9-63
Table 9-16. Mnemonic Values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-63
Table 9-18. Return Code Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-65
Table 10-1. Local APIC Register Address Map. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-8
Table 10-2. ESR Flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-21
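Table 10-3. Valid Combinations for the Pentium 4 and Intel Xeon Processors
Local xAPIC Interrupt Command Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-28
Table 10-4. Valid Combinations for the P6 Family Processors
Local APIC Interrupt Command Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-29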
Table 10-5. x2APIC Operating Mode Configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-51
Table 10-6. Local APIC Register Address Map Supported by x2APIC. . . . . . . . . . . . . . . . . . . . . . 10-52
Table 10-7. MSR/MMIO Interface of a Local x2APIC in Different Modes of Operation . . . . . . 10-56
Table 11-1. Characteristics of the Caches, TLBs, Store Buffer, and
Write Combining Buffer in Intel 64 and IA-32 Processors . . . . . . . . . . . . . . . . . . . . 11-2
Table 11-2. Memory Types and Their Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-9
Table 11-3. Methods of Caching Available in Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium
M, Pentium 4, Intel Xeon, P6 Family, and Pentium Processors . . . . . . . . . . . . . . . . 11-10
Table 11-4. MESI Cache Line States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-14
Table 11-5. Cache Operating Modes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-17
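Table 11-6. Effective Page-Level Memory Type for Pentium Pro and
Pentium II Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-21
Table 11-7. Effective Page-Level Memory Types for Pentium III and More Recent Processor
Families . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-22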
Table 11-8. Memory Types That Can Be Encoded in MTRRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-30
Table 11-9. Address Mapping for Fixed-Range MTRRs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-35
Table 11-10. Memory Types That Can Be Encoded With PAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-49
Table 11-11. Selection of PAT Entries with PAT, PCD, and PWT Flags . . . . . . . . . . . . . . . . . . . . . 11-50
Table 11-12. Memory Type Setting of PAT Entries Following a Power-up or Reset. . . . . . . . . 11-50
Table 12-1. Action Taken By MMX Instructions
for Different Combinations of EM, MP and TS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-1
Table 12-2. Effects of MMX Instructions on x87 FPU State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-3
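Table 12-3. Effect of the MMX, x87 FPU, and FXSAVE/FXRSTOR Instructions on the
x87 FPU Tag Word . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-4
Table 13-1. Action Taken for Combinations of OSFXSR, OSXMMEXCPT, SSE, SSE2, SSE3, EM, MP,
and TS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-4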
Table 13-2. Action Taken for Combinations of OSFXSR, SSSE3, SSE4, EM, and TS . . . . . . . . . . 13-5
Table 13-3. XSAVE Header Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-14
Table 13-4. XRSTOR Action on MXCSR, x87 FPU, XMM Register. . . . . . . . . . . . . . . . . . . . . . . . . . 13-16
Table 13-5. XSAVE Action on MXCSR, x87 FPU, XMM Register . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-16
Table 14-1. On-Demand Clock Modulation Duty Cycle Field Encoding. . . . . . . . . . . . . . . . . . . . . . 14-17
Table 15-1. Bits 54:53 in IA32_MCi_STATUS MSRs
when IA32_MCG_CAP[11] = 1 and UC = 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-9
Table 15-2. Overwrite Rules for Enabled Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-11
Table 15-3. Address Mode in IA32_MCi_MISC[8:6] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-13
Table 15-4. Extended Machine Check State MSRs
in Processors Without Support for Intel 64 Architecture . . . . . . . . . . . . . . . . . . . . 15-15
Table 15-5. Extended Machine Check State MSRs
in Processors With Support for Intel 64 Architecture . . . . . . . . . . . . . . . . . . . . . . . 15-16
Table 15-6. MC Error Classifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-26
Table 15-7. Overwrite Rules for UC, CE, and UCR Errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-27
Table 15-8. IA32_MCi_STATUS [15:0] Simple Error Code Encoding . . . . . . . . . . . . . . . . . . . . . . . . . 15-30
Table 15-9. IA32_MCi_STATUS [15:0] Compound Error Code Encoding . . . . . . . . . . . . . . . . . . . . . 15-31
Table 15-10. Encoding for TT (Transaction Type) Sub-Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-32
Table 15-11. Level Encoding for LL (Memory Hierarchy Level) Sub-Field . . . . . . . . . . . . . . . . . . . 15-32
Table 15-12. Encoding of Request (RRRR) Sub-Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-32
Table 15-13. Encodings of PP, T, and II Sub-Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-33
Table 15-14. Encodings of MMM and CCCC Sub-Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-34
Table 15-15. MCA Compound Error Code Encoding for SRAO Errors . . . . . . . . . . . . . . . . . . . . . . . . 15-35
Table 15-16. IA32_MCi_STATUS Values for SRAO Errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-35
Table 15-17. IA32_MCG_STATUS Flag Indication for SRAO Errors . . . . . . . . . . . . . . . . . . . . . . . . . 15-36
Table 15-18. MCA Compound Error Code Encoding for SRAR Errors . . . . . . . . . . . . . . . . . . . . . . . . 15-36
Table 15-19. IA32_MCi_STATUS Values for SRAR Errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-37
Table 15-20. IA32_MCG_STATUS Flag Indication for SRAR Errors. . . . . . . . . . . . . . . . . . . . . . . . . . 15-37
Table 16-1. Breakpoint Examples. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-7
Table 16-2. Debug Exception Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-10
Table 16-3. LBR Stack Size and TOS Pointer Range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-19
Table 16-4. IA32_DEBUGCTL Flag Encodings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-29
Table 16-5. CPL-Qualified Branch Trace Store Encodings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-30
Table 16-6. IA32_LASTBRANCH_x_FROM_IP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-35
Table 16-7. IA32_LASTBRANCH_x_TO_IP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-35
Table 16-8. LBR Stack Size and TOS Pointer Range. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-35
Table 16-9. MSR_LBR_SELECT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-35
Table 16-10. LBR MSR Stack Size and TOS Pointer Range for the Pentium 4 and the
Intel Xeon Processor Family . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-39
Table 17-1. Real-Address Mode Exceptions and Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-8
Table 17-2. Software Interrupt Handling Methods While in Virtual-8086 Mode. . . . . . . . . . . . 17-26
Table 18-1. Characteristics of 16-Bit and 32-Bit Program Modules . . . . . . . . . . . . . . . . . . . . . . . . 18-1
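Table 19-1. New Instruction in the Pentium Processor and
Later IA-32 Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-6
Table 19-2. Recommended Values of the EM, MP, and NE Flags for Intel486 SX
Microprocessor/Intel 487 SX Math Coprocessor System . . . . . . . . . . . . . . . . . . . . . 19-22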
Table 19-3. EM and MP Flag Interpretation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-23
Table 21-1. Format of the VMCS Region. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .21-3
Table 21-2. Format of Access Rights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .21-6
Table 21-3. Format of Interruptibility State. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .21-8
Table 21-4. Format of Pending-Debug-Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .21-9
Table 21-5. Definitions of Pin-Based VM-Execution Controls. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-12
Table 21-6. Definitions of Primary Processor-Based VM-Execution Controls . . . . . . . . . . . . . . 21-13
Table 21-7. Definitions of Secondary Processor-Based VM-Execution Controls . . . . . . . . . . . 21-15
Table 21-8. Format of Extended-Page-Table Pointer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-20
Table 21-9. Definitions of VM-Exit Controls. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-21
Table 21-10. Format of an MSR Entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-23
Table 21-11. Definitions of VM-Entry Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-24
Table 21-12. Format of the VM-Entry Interruption-Information Field . . . . . . . . . . . . . . . . . . . . . . 21-25
Table 21-13. Format of Exit Reason. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-27
Table 21-14. Format of the VM-Exit Interruption-Information Field. . . . . . . . . . . . . . . . . . . . . . . . 21-28
Table 21-15. Format of the IDT-Vectoring Information Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-29
Table 21-16. Structure of VMCS Component Encoding. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-32
Table 24-1. Exit Qualification for Debug Exceptions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .24-6
Table 24-2. Exit Qualification for Task Switch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .24-6
Table 24-3. Exit Qualification for Control-Register Accesses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .24-8
Table 24-4. Exit Qualification for MOV DR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .24-9
Table 24-5. Exit Qualification for I/O Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .24-9
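Table 24-6. Exit Qualification for APIC-Access VM Exits from Linear Accesses and Guest-Physical
Accesses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-10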
Table 24-7. Exit Qualification for EPT Violations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-11
Table 24-8. Format of the VM-Exit Instruction-Information Field as Used for INS and OUTS . . 24-18
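Table 24-9. Format of the VM-Exit Instruction-Information Field as Used for LIDT, LGDT, SIDT, or
SGDT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-19
Table 24-10. Format of the VM-Exit Instruction-Information Field as Used for LLDT, LTR, SLDT, and
STR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-21
Table 24-11. Format of the VM-Exit Instruction-Information Field as Used for VMCLEAR, VMPTRLD,
VMPTRST, and VMXON . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-22
Table 24-12. Format of the VM-Exit Instruction-Information Field as Used for VMREAD and
VMWRITE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-23
Table 24-13. Format of the VM-Exit Instruction-Information Field as Used for INVEPT and INVVPID
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-25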
Table 25-1. Format of an EPT PML4 Entry (PML4E) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .25-5
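Table 25-2. Format of an EPT Page-Directory-Pointer-Table Entry (PDPTE) that Maps a 1-GByte
Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-6
Table 25-3. Format of an EPT Page-Directory-Pointer-Table Entry (PDPTE) that References an
EPT Page Directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-7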
Table 25-4. Format of an EPT Page-Directory Entry (PDE) that Maps a 2-MByte Page. . . . . . .25-8
Table 25-5. Format of an EPT Page-Directory Entry (PDE) that References an EPT Page Table . . 25-9
Table 25-6. Format of an EPT Page-Table Entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-10
Table 26-1. SMRAM State Save Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .26-6
Table 26-2. Processor Signatures and 64-bit SMRAM State Save Map Format. . . . . . . . . . . . . .26-9
Table 26-3. SMRAM State Save Map for Intel 64 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . .26-9
Table 26-4. Processor Register Initialization in SMM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-13
Table 26-5. I/O Instruction Information in the SMM State Save Map . . . . . . . . . . . . . . . . . . . . . . 26-16
Table 26-6. I/O Instruction Type Encodings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-17
Table 26-7. Auto HALT Restart Flag Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-19
Table 26-8. I/O Instruction Restart Field Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-21
Table 26-9. Exit Qualification for SMIs That Arrive Immediately After the Retirement of an I/O Instruction . . . . 26-28
Table 26-10. Format of MSEG Header . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-35
Table 27-1. Operating Modes for Host and Guest Environments. . . . . . . . . . . . . . . . . . . . . . . . . . 27-18
Table 30-1. UMask and Event Select Encodings for Pre-Defined Architectural Performance Events . . . . 30-13
Table 30-2. Core Specificity Encoding within a Non-Architectural Umask. . . . . . . . . . . . . . . . . . 30-15
Table 30-3. Agent Specificity Encoding within a Non-Architectural Umask . . . . . . . . . . . . . . . . 30-15
Table 30-4. HW Prefetch Qualification Encoding within a Non-Architectural Umask. . . . . . . . 30-16
Table 30-5. MESI Qualification Definitions within a Non-Architectural Umask. . . . . . . . . . . . . . 30-16
Table 30-6. Bus Snoop Qualification Definitions within a Non-Architectural Umask . . . . . . . . 30-17
Table 30-7. Snoop Type Qualification Definitions within a Non-Architectural Umask. . . . . . . 30-17
Table 30-8. Association of Fixed-Function Performance Counters with Architectural Performance Events . . . . 30-18
Table 30-10. PEBS Performance Events for Intel Core Microarchitecture. . . . . . . . . . . . . . . . . . . 30-22
Table 30-9. At-Retirement Performance Events for Intel Core Microarchitecture. . . . . . . . . . 30-22
Table 30-11. Requirements to Program PEBS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-24
Table 30-12. PEBS Record Format for Intel Core i7 Processor Family . . . . . . . . . . . . . . . . . . . . . . 30-29
Table 30-13. Data Source Encoding for Load Latency Record. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-33
Table 30-14. Off-Core Response Event Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-35
Table 30-15. MSR_OFFCORE_RSP_Z Bit Field Definition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-35
Table 30-16. Opcode Field Encoding for MSR_UNCORE_ADDR_OPCODE_MATCH. . . . . . . . . . . . 30-42
Table 30-17. Uncore PMU MSR Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-44
Table 30-18. Performance Counter MSRs and Associated CCCR and ESCR MSRs (Pentium 4 and Intel Xeon Processors) . . . . 30-47
Table 30-19. Event Example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-56
Table 30-20. CCR Names and Bit Positions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-62
Table 30-21. Effect of Logical Processor and CPL Qualification for Logical-Processor-Specific (TS) Events . . . . 30-75
Table 30-22. Effect of Logical Processor and CPL Qualification for Non-logical-Processor-specific (TI) Events . . . . 30-76
Table A-1. Architectural Performance Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-2
Table A-2. Non-Architectural Performance Events In the Processor Core for Intel Core i7 Processor and Intel Xeon Processor 5500 Series . . . . A-3
Table A-3. Non-Architectural Performance Events In the Processor Uncore for Intel Core i7 Processor and Intel Xeon Processor 5500 Series . . . . A-32
Table A-4. Non-Architectural Performance Events In Next Generation Processor Core (Codenamed Westmere) . . . . A-54
Table A-5. Non-Architectural Performance Events In the Processor Uncore for Next Generation Intel Processor (Codenamed Westmere) . . . . A-82
Table A-6. Non-Architectural Performance Events for Processors based on Enhanced Intel Core Microarchitecture . . . . A-108
Table A-7. Fixed-Function Performance Counter and Pre-defined Performance Events . . . . A-109
Table A-8. Non-Architectural Performance Events in Processors Based on Intel Core Microarchitecture . . . . A-110
Table A-9. Non-Architectural Performance Events for Intel Atom Processors . . . . . . . . . . . . A-153
Table A-10. Non-Architectural Performance Events in Intel Core Solo and Intel Core Duo Processors . . . . A-176
Table A-11. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting . . . . A-185
Table A-12. Performance Monitoring Events For Intel NetBurst Microarchitecture for At-Retirement Counting . . . . A-217
Table A-13. Intel NetBurst Microarchitecture Model-Specific Performance Monitoring Events (For Model Encoding 3, 4 or 6) . . . . A-224
Table A-15. List of Metrics Available for Execution Tagging (For Execution Event Only) . . . . A-225
Table A-14. List of Metrics Available for Front_end Tagging (For Front_end Event Only) . . . . A-225
Table A-16. List of Metrics Available for Replay Tagging (For Replay Event Only) . . . . A-226
Table A-17. Event Mask Qualification for Logical Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-228
Table A-18. Performance Monitoring Events on Intel Pentium M Processors . . . . A-234
Table A-19. Performance Monitoring Events Modified on Intel Pentium M Processors . . . . A-236
Table A-20. Events That Can Be Counted with the P6 Family Performance-Monitoring Counters . . . . A-238
Table A-21. Events That Can Be Counted with Pentium Processor Performance-Monitoring Counters . . . . A-255
Table B-1. CPUID Signature Values of DisplayFamily_DisplayModel . . . . . . . . . . . . . . . . . . . . . . . . B-1
Table B-2. IA-32 Architectural MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3
Table B-3. MSRs in Processors Based on Intel Core Microarchitecture . . . . . . . . . . . . . . . . . . . . .B-41
Table B-4. MSRs in Intel Atom Processor Family . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .B-61
Table B-5. MSRs in Processors Based on Intel Microarchitecture codename Nehalem . . . . . .B-76
Table B-6. Additional MSRs in Intel Xeon Processor 5500 and 3400 Series. . . . . . . . . . . . . . . .B-95
Table B-7. Additional MSRs in Intel Xeon Processor 7500 Series. . . . . . . . . . . . . . . . . . . . . . . . . .B-98
Table B-8. Additional MSRs supported by Next Generation Intel Processors (Codenamed Westmere) . . . . B-120
Table B-9. MSRs in the Pentium 4 and Intel Xeon Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . B-121
Table B-10. MSRs Unique to 64-bit Intel Xeon Processor MP with Up to an 8 MB L3 Cache . . . . B-161
Table B-11. MSRs Unique to Intel Xeon Processor 7100 Series . . . . . . . . . . . . . . . . . . . . . . . . . . B-163
Table B-12. MSRs in Intel Core Solo, Intel Core Duo Processors, and Dual-Core Intel Xeon Processor LV . . . . B-164
Table B-13. MSRs in Pentium M Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-178
Table B-14. MSRs in the P6 Family Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-187
Table B-15. MSRs in the Pentium Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-199
Table C-1. Boot Phase IPI Message Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-2
Table E-1. CPUID DisplayFamily_DisplayModel Signatures for Family 6 . . . . . . . . . . . . . . . . . . . . E-1
Table E-2. Incremental Decoding Information: Processor Family 06H Machine Error Codes For Machine Check . . . . E-2
Table E-3. CPUID DisplayFamily_DisplayModel Signatures for Processors Based on Intel Core Microarchitecture . . . . E-5
Table E-4. Incremental Bus Error Codes of Machine Check for Processors Based on Intel Core Microarchitecture . . . . E-6
Table E-5. Incremental MCA Error Code Types for Intel Xeon Processor 7400. . . . . . . . . . . . . . E-9
Table E-6. Type B Bus and Interconnect Error Codes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-10
Table E-7. Type C Cache Bus Controller Error Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-10
Table E-8. QPI Machine Check Error codes for IA32_MC0_STATUS and IA32_MC1_STATUS . . . . E-12
Table E-9. QPI Machine Check Error codes for IA32_MC0_MISC and IA32_MC1_MISC. . . . . . . E-13
Table E-10. Machine Check Error codes for IA32_MC7_STATUS. . . . . . . . . . . . . . . . . . . . . . . . . . . . E-13
Table E-11. Incremental Memory Controller Error Codes of Machine Check for IA32_MC8_STATUS . . . . E-14
Table E-12. Incremental Memory Controller Error Codes of Machine Check for IA32_MC8_MISC . . . . E-15
Table E-13. Incremental Decoding Information: Processor Family 0FH Machine Error Codes For Machine Check . . . . E-15
Table E-14. MCi_STATUS Register Bit Definition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-17
Table E-15. Incremental MCA Error Code for Intel Xeon Processor MP 7100 . . . . . . . . . . . . . . . E-18
Table E-16. Other Information Field Bit Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-20
Table E-17. Type A: L3 Error Codes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-21
Table E-18. Type B Bus and Interconnect Error Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-22
Table E-19. Type C Cache Bus Controller Error Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-23
Table E-20. Decoding Family 0FH Machine Check Codes for Cache Hierarchy Errors . . . . . . . . E-24
Table F-1. EOI Message (14 Cycles). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .F-1
Table F-2. Short Message (21 Cycles) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .F-2
Table F-3. Non-Focused Lowest Priority Message (34 Cycles). . . . . . . . . . . . . . . . . . . . . . . . . . . . . .F-3
Table F-4. APIC Bus Status Cycles Interpretation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .F-5
Table G-1. Memory Types Used For VMCS Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-2
Table H-1. Encoding for 32-Bit Control Fields (0000_00xx_xxxx_xxx0B) . . . . . . . . . . . . . . . . . H-1
Table H-2. Encodings for 16-Bit Guest-State Fields (0000_10xx_xxxx_xxx0B) . . . . . . . . . . . . H-1
Table H-3. Encodings for 16-Bit Host-State Fields (0000_11xx_xxxx_xxx0B) . . . . . . . . . . . . . H-2
Table H-4. Encodings for 64-Bit Control Fields (0010_00xx_xxxx_xxxAb). . . . . . . . . . . . . . . . . H-3
Table H-5. Encodings for 64-Bit Read-Only Data Field (0010_01xx_xxxx_xxxAb) . . . . . . . . . H-4
Table H-6. Encodings for 64-Bit Guest-State Fields (0010_10xx_xxxx_xxxAb) . . . . . . . . . . . . H-4
Table H-7. Encodings for 64-Bit Host-State Fields (0010_11xx_xxxx_xxxAb) . . . . . . . . . . . . . H-5
Table H-8. Encodings for 32-Bit Control Fields (0100_00xx_xxxx_xxx0B) . . . . . . . . . . . . . . . . H-6
Table H-9. Encodings for 32-Bit Read-Only Data Fields (0100_01xx_xxxx_xxx0B) . . . . . . . . H-7
Table H-10. Encodings for 32-Bit Guest-State Fields (0100_10xx_xxxx_xxx0B) . . . . H-7
Table H-11. Encoding for 32-Bit Host-State Field (0100_11xx_xxxx_xxx0B) . . . . . . . . . . . . . . . H-9
Table H-12. Encodings for Natural-Width Control Fields (0110_00xx_xxxx_xxx0B) . . . . . . . . . H-9
Table H-13. Encodings for Natural-Width Read-Only Data Fields (0110_01xx_xxxx_xxx0B) H-10
Table H-14. Encodings for Natural-Width Guest-State Fields (0110_10xx_xxxx_xxx0B) . . . H-10
Table H-15. Encodings for Natural-Width Host-State Fields (0110_11xx_xxxx_xxx0B) . . . . H-11
Table I-1. Basic Exit Reasons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . I-1
CHAPTER 1
ABOUT THIS MANUAL
The Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A:
System Programming Guide, Part 1 (order number 253668) and the Intel 64 and
IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming
Guide, Part 2 (order number 253669) are part of a set that describes the architecture
and programming environment of Intel 64 and IA-32 Architecture processors. The
other volumes in this set are:
• Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic
Architecture (order number 253665).
• Intel 64 and IA-32 Architectures Software Developer's Manual, Volumes
2A & 2B: Instruction Set Reference (order numbers 253666 and 253667).
The Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1,
describes the basic architecture and programming environment of Intel 64 and IA-32
processors. The Intel 64 and IA-32 Architectures Software Developer's Manual,
Volumes 2A & 2B, describe the instruction set of the processor and the opcode
structure. These volumes apply to application programmers and to programmers who
write operating systems or executives. The Intel 64 and IA-32 Architectures
Software Developer's Manual, Volumes 3A & 3B, describe the operating-system support
environment of Intel 64 and IA-32 processors. These volumes target operating-system
and BIOS designers. In addition, Intel 64 and IA-32 Architectures Software
Developer's Manual, Volume 3B, addresses the programming environment for
classes of software that host operating systems.
1.1 PROCESSORS COVERED IN THIS MANUAL
This manual set includes information pertaining primarily to the most recent Intel
64 and IA-32 processors, which include:
Pentium processors
P6 family processors
Pentium 4 processors
Pentium M processors
Intel Xeon processors
Pentium D processors
Pent ium
Xeon
processors
I nt el
Xeon
processor LV
I nt el
Xeon
Xeon
Xeon
Xeon
Pent ium
Xeon
Xeon
Intel Core™2 Extreme processor QX9000 and X9000 series
Intel Core™2 Quad processor Q9000 series
Intel Core™2 Duo processor E8000, T9000 series
Intel Atom™ processor family
Intel Core™ i7 processor
Intel Core™ i5 processor
P6 family processors are I A- 32 processors based on t he P6 family microarchit ect ure.
This includes t he Pent ium
I I , Pent ium
III Xeon
processors.
The Pent ium
4, Pent ium
Xeon
processors are
based on t he I nt el Net Burst
microarchit ect ure. I nt el Xeon processor 5000, 7100
series are based on t he I nt el Net Burst
microarchit ect ure.
The I nt el
Core Duo, I nt el
processor LV
are based on an improved Pent ium
Xeon
Pent ium
dual- core, I nt el
Core2 Duo, I nt el
Core2
Ext reme processors are based on I nt el
Core microarchit ect ure.
The I nt el
Xeon
Core
TM
2 Quad processor
Q9000 series, and I nt el
Core
TM
2 Ext reme processors QX9000, X9000 series, I nt el
Core
TM
2 processor E8000 series are based on Enhanced I nt el
Core
TM
microarchit ec-
t ure.
The Intel Atom™ processor family is based on the Intel Atom™ microarchitecture
and supports Intel 64 architecture.
The Intel Core™ i7 processor and the Intel Core™ i5 processor are based on the
Intel microarchitecture codename Nehalem and support Intel 64 architecture.
Processors based on the Next Generation Intel Processor, codenamed Westmere,
support Intel 64 architecture.
P6 family, Pent ium
M, I nt el
Core Solo, I nt el
processor LV, and early generat ions of Pent ium 4 and I nt el Xeon
processors support I A- 32 archit ect ure. The I nt el
Xeon
Core2 Duo, I nt el
Dual- Core
processor, newer generat ions of Pent ium 4 and I nt el Xeon processor family support
I nt el
Hyper-Threading Technology.
Chapter 9: Processor Management and Initialization. Defines the state of an
Intel 64 or IA-32 processor after reset initialization. This chapter also explains how to
set up an Intel 64 or IA-32 processor for real-address mode operation and protected-
mode operation, and how to switch between modes.
Chapter 10: Advanced Programmable Interrupt Controller (APIC).
Describes the programming interface to the local APIC and gives an overview of the
interface between the local APIC and the I/O APIC.
Chapter 11: Memory Cache Control. Describes the general concept of caching
and the caching mechanisms supported by the Intel 64 or IA-32 architectures. This
chapter also describes the memory type range registers (MTRRs) and how they can
be used to map memory types of physical memory. Information on using the new
cache control and memory streaming instructions introduced with the Pentium III,
Pentium 4, and Intel Xeon processors is also given.
Chapter 12
Intel 64 and IA-32 Architectures Software Developer's Manual (in five volumes)
http://developer.intel.com/products/processor/manuals/index.htm
Intel Processor Identification with the CPUID Instruction, AP-485
http://www.intel.com/design/processor/applnots/241618.htm
Intel Hyper-Threading Technology (Intel HT Technology):
http://developer.intel.com/technology/hyperthread/
CHAPTER 2
SYSTEM ARCHITECTURE OVERVIEW
IA-32 architecture (beginning with the Intel386 processor family) provides extensive
support for operating-system and system-development software. This support offers
multiple modes of operation, which include:
• Real mode, protected mode, virtual 8086 mode, and system management mode.
These are sometimes referred to as legacy modes.
Intel 64 architecture supports almost all the system programming facilities available
in IA-32 architecture and extends them to a new operating mode (IA-32e mode) that
supports a 64-bit programming environment. IA-32e mode allows software to
operate in one of two sub-modes:
• 64-bit mode supports 64-bit OS and 64-bit applications
• Compatibility mode allows most legacy software to run; it co-exists with 64-bit
applications under a 64-bit OS.
The IA-32 system-level architecture includes features to assist in the following
operations:
• Memory management
• Protection of software modules
• Multitasking
• Exception and interrupt handling
• Multiprocessing
• Cache management
• Hardware resource and power management
• Debugging and performance monitoring
This chapter provides a description of each part of this architecture. It also describes
the system registers that are used to set up and control the processor at the system
level and gives a brief overview of the processor's system-level (operating system)
instructions.
Many features of the system-level architecture are used only by system programmers.
However, application programmers may need to read this chapter and the
following chapters in order to create a reliable and secure environment for
application programs.
This overview and most subsequent chapters of this book focus on protected-mode
operation of the IA-32 architecture. IA-32e mode operation of the Intel 64
architecture, as it differs from protected mode operation, is also described.
All Intel 64 and IA-32 processors enter real-address mode following a power-up or
reset (see Chapter 9, "Processor Management and Initialization"). Software then
initiates the switch from real-address mode to protected mode. If IA-32e mode
operation is desired, software also initiates a switch from protected mode to IA-32e
mode.
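The relationship between these operating modes can be sketched as a small classifier. This is only an illustration: the helper name `classify_mode` and the `cpu_mode` enumeration are hypothetical, the inputs are assumed to be register snapshots obtained elsewhere, and the bit positions used (CR0.PE, EFLAGS.VM, IA32_EFER.LMA, and the L bit of the code-segment descriptor) are taken from the architecture's register definitions rather than from this overview. System management mode is left out of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE    (1u << 0)    /* CR0 bit 0: protection enable */
#define EFLAGS_VM (1u << 17)   /* EFLAGS bit 17: virtual-8086 mode */
#define EFER_LMA  (1u << 10)   /* IA32_EFER bit 10: IA-32e mode active */

typedef enum {
    MODE_REAL_ADDRESS,
    MODE_PROTECTED,
    MODE_VIRTUAL_8086,
    MODE_COMPATIBILITY,   /* IA-32e sub-mode */
    MODE_64BIT            /* IA-32e sub-mode */
} cpu_mode;

/* Hypothetical helper: classify the operating mode from register snapshots.
 * cs_l is the L bit of the current code-segment descriptor. */
static cpu_mode classify_mode(uint32_t cr0, uint32_t eflags,
                              uint64_t efer, bool cs_l)
{
    if (!(cr0 & CR0_PE))
        return MODE_REAL_ADDRESS;            /* state after power-up or reset */
    if (efer & EFER_LMA)
        return cs_l ? MODE_64BIT : MODE_COMPATIBILITY;
    if (eflags & EFLAGS_VM)
        return MODE_VIRTUAL_8086;
    return MODE_PROTECTED;
}
```

The order of the tests mirrors the switching sequence described above: protection must be enabled before IA-32e mode can become active.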
2.1 OVERVIEW OF THE SYSTEM-LEVEL ARCHITECTURE
System-level architecture consists of a set of registers, data structures, and
instructions designed to support basic system-level operations such as memory
management, interrupt and exception handling, task management, and control of multiple
processors.
Figure 2-1 provides a summary of system registers and data structures that apply
to 32-bit modes. System registers and data structures that apply to IA-32e mode are
shown in Figure 2-2.
Figure 2-1. IA-32 System-Level Registers and Data Structures
[Figure not reproduced. It shows the EFLAGS register, control registers CR0 through CR4,
XCR0 (XFEM), the task register, GDTR, LDTR, and IDTR; the GDT, LDT, and IDT with their
segment, TSS, LDT, and gate descriptors; task-state segments; and the translation of a
linear address (Dir, Table, Offset) through the page directory and page table to a
physical address. The page mapping example is for 4-KByte pages and the normal 32-bit
physical address size.]
Figure 2-2. System-Level Registers and Data Structures in IA-32e Mode
[Figure not reproduced. It shows RFLAGS, control registers CR0 through CR4 and CR8,
XCR0 (XFEM), the task register, GDTR, LDTR, and IDTR; the GDT, LDT, and IDT with their
descriptors and gates; the current TSS with its interrupt stack table (IST); and the
translation of a linear address (PML4, Dir. Pointer, Directory, Table, Offset) through
the PML4, page-directory-pointer table, page directory, and page table to a physical
address. The page mapping example is for 4-KByte pages and 40-bit physical address size.]
2.1.1 Global and Local Descriptor Tables
When operating in protected mode, all memory accesses pass through either the
global descriptor table (GDT) or an optional local descriptor table (LDT) as shown in
Figure 2-1. These tables contain entries called segment descriptors. Segment
descriptors provide the base address of segments as well as access rights, type, and
usage information.
Each segment descriptor has an associated segment selector. A segment selector
provides the software that uses it with an index into the GDT or LDT (the offset of its
associated segment descriptor), a global/local flag (determines whether the selector
points to the GDT or the LDT), and access rights information.
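The three parts of a segment selector can be pulled apart with a few shifts and masks. The sketch below assumes the standard 16-bit selector layout (index in bits 15:3, table indicator in bit 2, requested privilege level in bits 1:0); the type and function names are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

typedef struct {
    uint16_t index;  /* bits 15:3: descriptor number within the table */
    uint8_t  ti;     /* bit 2: 0 selects the GDT, 1 selects the LDT */
    uint8_t  rpl;    /* bits 1:0: requested privilege level */
} selector_fields;

/* Split a 16-bit segment selector into its fields. */
static selector_fields decode_selector(uint16_t sel)
{
    selector_fields f;
    f.index = (uint16_t)(sel >> 3);
    f.ti    = (uint8_t)((sel >> 2) & 1u);
    f.rpl   = (uint8_t)(sel & 3u);
    return f;
}

/* Byte offset of the selected 8-byte descriptor from the table base
 * (index * 8, which is the selector with its low three bits cleared). */
static uint32_t descriptor_offset(uint16_t sel)
{
    return (uint32_t)(sel & 0xFFF8u);
}
```

For example, selector 0x002B decodes to index 5 in the GDT with a requested privilege level of 3, and its descriptor sits 40 bytes past the table base.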
To access a byte in a segment, a segment selector and an offset must be supplied.
The segment selector provides access to the segment descriptor for the segment (in
the GDT or LDT). From the segment descriptor, the processor obtains the base
address of the segment in the linear address space. The offset then provides the
location of the byte relative to the base address. This mechanism can be used to
access any valid code, data, or stack segment, provided the segment is accessible
from the current privilege level (CPL) at which the processor is operating. The CPL is
defined as the protection level of the currently executing code segment.
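As a rough illustration of the base-plus-offset step, the following sketch decodes a legacy 8-byte segment descriptor and forms a linear address. It assumes the standard descriptor bit layout (described in detail in the segmentation chapter, not in this overview); the names `segment_info`, `decode_descriptor`, and `segment_to_linear` are hypothetical. Type checks, expand-down segments, and privilege checks are deliberately omitted.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t base;     /* linear address of the first byte of the segment */
    uint32_t limit;    /* highest valid offset, after granularity scaling */
    uint8_t  dpl;      /* descriptor privilege level */
    bool     present;  /* P flag */
} segment_info;

/* Decode an 8-byte code/data segment descriptor held in a 64-bit value. */
static segment_info decode_descriptor(uint64_t d)
{
    segment_info s;
    uint32_t raw_limit = (uint32_t)(d & 0xFFFFu)            /* limit 15:0  */
                       | (uint32_t)((d >> 32) & 0xF0000u);  /* limit 19:16 */
    s.base = (uint32_t)((d >> 16) & 0xFFFFFFu)              /* base 23:0   */
           | (uint32_t)((d >> 32) & 0xFF000000u);           /* base 31:24  */
    /* G flag (bit 55): limit is counted in 4-KByte units when set. */
    s.limit   = ((d >> 55) & 1u) ? ((raw_limit << 12) | 0xFFFu) : raw_limit;
    s.dpl     = (uint8_t)((d >> 45) & 3u);
    s.present = ((d >> 47) & 1u) != 0;
    return s;
}

/* Form base + offset; fails if the segment is absent or the offset is
 * beyond the limit. Returns true and stores the linear address on success. */
static bool segment_to_linear(const segment_info *s, uint32_t offset,
                              uint32_t *linear)
{
    if (!s->present || offset > s->limit)
        return false;
    *linear = s->base + offset;
    return true;
}
```

With the common flat descriptor value 0x00CF9A000000FFFF, the decoded base is 0 and the limit is 0xFFFFFFFF, so every offset maps to the identical linear address.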
See Figure 2-1. The solid arrows in the figure indicate a linear address, dashed lines
indicate a segment selector, and the dotted arrows indicate a physical address. For
simplicity, many of the segment selectors are shown as direct pointers to a segment.
However, the actual path from a segment selector to its associated segment is always
through a GDT or LDT.
The linear address of the base of the GDT is contained in the GDT register (GDTR);
the linear address of the LDT is contained in the LDT register (LDTR).
2.1.1.1 Global and Local Descriptor Tables in IA-32e Mode
GDTR and LDTR registers are expanded to 64 bits wide in both IA-32e sub-modes
(64-bit mode and compatibility mode). For more information, see Section 3.5.2,
"Segment Descriptor Tables in IA-32e Mode."
Global and local descriptor tables are expanded in 64-bit mode to support 64-bit base
addresses (16-byte LDT descriptors hold a 64-bit base address and various
attributes). In compatibility mode, descriptors are not expanded.
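A minimal sketch of how a 64-bit base address could be reassembled from a 16-byte system descriptor (such as an LDT descriptor) follows. It assumes the usual layout in which the low 8 bytes keep the legacy base fields and the low 32 bits of the high 8 bytes carry base bits 63:32; the function name is hypothetical.

```c
#include <stdint.h>

/* Rebuild the 64-bit base from the two quadwords of a 16-byte descriptor.
 * low  = bytes 0..7  (legacy format: base 23:0 and base 31:24)
 * high = bytes 8..15 (bits 31:0 hold base 63:32; the rest is reserved) */
static uint64_t system_descriptor_base64(uint64_t low, uint64_t high)
{
    uint64_t base = ((low >> 16) & 0xFFFFFFull)      /* base 23:0  */
                  | ((low >> 32) & 0xFF000000ull);   /* base 31:24 */
    base |= (high & 0xFFFFFFFFull) << 32;            /* base 63:32 */
    return base;
}
```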
2.1.2 System Segments, Segment Descriptors, and Gates
Besides code, data, and stack segments that make up the execution environment of
a program or procedure, the architecture defines two system segments: the task-
state segment (TSS) and the LDT. The GDT is not considered a segment because it is
not accessed by means of a segment selector and segment descriptor. TSSs and LDTs
have segment descriptors defined for them.
The architecture also defines a set of special descriptors called gates (call gates,
interrupt gates, trap gates, and task gates). These provide protected gateways to
system procedures and handlers that may operate at a different privilege level than
application programs and most procedures. For example, a CALL to a call gate can
provide access to a procedure in a code segment that is at the same or a numerically
lower privilege level (more privileged) than the current code segment. To access a
procedure through a call gate, the calling procedure¹ supplies the selector for the call
gate. The processor then performs an access rights check on the call gate, comparing
the CPL with the privilege level of the call gate and the destination code segment
pointed to by the call gate.
If access to the destination code segment is allowed, the processor gets the segment
selector for the destination code segment and an offset into that code segment from
the call gate. If the call requires a change in privilege level, the processor also
switches to the stack for the targeted privilege level. The segment selector for the
new stack is obtained from the TSS for the currently running task. Gates also
facilitate transitions between 16-bit and 32-bit code segments, and vice versa.
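The access-rights comparison for a CALL through a call gate can be summarized in a few lines. This is a simplified sketch: it assumes the rules given in the protection chapter (the larger of the CPL and the gate selector's RPL must not exceed the gate's DPL, and the destination code segment's DPL must not exceed the CPL), ignores present bits and type checks, and uses hypothetical function names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Privilege check for a CALL through a call gate.
 * cpl        : current privilege level of the caller
 * rpl        : requested privilege level of the call-gate selector
 * gate_dpl   : DPL of the call-gate descriptor
 * target_dpl : DPL of the destination code segment */
static bool call_gate_access_ok(uint8_t cpl, uint8_t rpl,
                                uint8_t gate_dpl, uint8_t target_dpl)
{
    uint8_t effective = (cpl > rpl) ? cpl : rpl;   /* weaker of the two */
    if (effective > gate_dpl)
        return false;               /* caller may not use this gate */
    return target_dpl <= cpl;       /* same or more privileged target */
}

/* A stack switch happens only when the call actually raises privilege,
 * that is, for a nonconforming target with a numerically lower DPL. */
static bool needs_stack_switch(uint8_t cpl, uint8_t target_dpl,
                               bool target_conforming)
{
    return !target_conforming && target_dpl < cpl;
}
```

For instance, application code at CPL 3 can reach a privilege-level-0 procedure through a gate whose DPL is 3, and that call switches to the level-0 stack taken from the TSS.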
2.1.2.1 Gates in IA-32e Mode
In IA-32e mode, the following descriptors are 16-byte descriptors (expanded to allow
a 64-bit base): LDT descriptors, 64-bit TSSs, call gates, interrupt gates, and trap
gates.
Call gates facilitate transitions between 64-bit mode and compatibility mode. Task
gates are not supported in IA-32e mode. On privilege level changes, stack segment
selectors are not read from the TSS. Instead, they are set to NULL.
2.1.3 Task-State Segments and Task Gates
The TSS (see Figure 2-1) defines the state of the execution environment for a task.
It includes the state of general-purpose registers, segment registers, the EFLAGS
register, the EIP register, and segment selectors with stack pointers for three stack
segments (one stack for each privilege level). The TSS also includes the segment
selector for the LDT associated with the task and the base address of the paging-
structure hierarchy.
All program execution in protected mode happens within the context of a task (called
the current task). The segment selector for the TSS for the current task is stored in
the task register. The simplest method for switching to a task is to make a call or
jump to the new task. Here, the segment selector for the TSS of the new task is given
in the CALL or JMP instruction. In switching tasks, the processor performs the
following actions:
1. Stores the state of the current task in the current TSS.
2. Loads the task register with the segment selector for the new task.
3. Accesses the new TSS through a segment descriptor in the GDT.
4. Loads the state of the new task from the new TSS into the general-purpose
registers, the segment registers, the LDTR, control register CR3 (base address of
the paging-structure hierarchy), the EFLAGS register, and the EIP register.
5. Begins execution of the new task.
¹ The word "procedure" is commonly used in this document as a general term for a logical unit or
block of code (such as a program, procedure, function, or routine).
A task can also be accessed through a task gate. A task gate is similar to a call gate,
except that it provides access (through a segment selector) to a TSS rather than a
code segment.
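The five task-switch actions listed above can be modeled with a toy simulation. Everything here is an assumption made for illustration: the structures `toy_tss` and `toy_cpu` are simplified stand-ins rather than the real TSS layout, the GDT lookup is reduced to an array indexed by the selector's index field, and all protection and busy-flag checks are skipped.

```c
#include <stdint.h>
#include <string.h>

#define MAX_DESCRIPTORS 8

typedef struct {            /* simplified stand-in for a 32-bit TSS */
    uint32_t gpr[8];
    uint16_t seg[6];
    uint32_t eflags, eip, cr3;
    uint16_t ldt_selector;
} toy_tss;

typedef struct {            /* simplified processor state */
    uint32_t gpr[8];
    uint16_t seg[6];
    uint32_t eflags, eip, cr3;
    uint16_t ldtr;
    uint16_t task_register;
} toy_cpu;

/* Model of a task switch. tss_by_index[i] stands for the TSS reached
 * through GDT descriptor i. Returns 0 on success, -1 on a bad selector. */
static int toy_task_switch(toy_cpu *cpu, toy_tss *tss_by_index[MAX_DESCRIPTORS],
                           uint16_t new_selector)
{
    unsigned old_index = cpu->task_register >> 3;
    unsigned new_index = new_selector >> 3;
    if (old_index >= MAX_DESCRIPTORS || new_index >= MAX_DESCRIPTORS)
        return -1;
    toy_tss *old_tss = tss_by_index[old_index];
    toy_tss *new_tss = tss_by_index[new_index];   /* step 3: find the new TSS */
    if (old_tss == NULL || new_tss == NULL)
        return -1;

    /* Step 1: store the state of the current task in the current TSS. */
    memcpy(old_tss->gpr, cpu->gpr, sizeof cpu->gpr);
    memcpy(old_tss->seg, cpu->seg, sizeof cpu->seg);
    old_tss->eflags = cpu->eflags;
    old_tss->eip    = cpu->eip;

    /* Step 2: load the task register with the new selector. */
    cpu->task_register = new_selector;

    /* Step 4: load the new task's state, including LDTR and CR3. */
    memcpy(cpu->gpr, new_tss->gpr, sizeof cpu->gpr);
    memcpy(cpu->seg, new_tss->seg, sizeof cpu->seg);
    cpu->ldtr   = new_tss->ldt_selector;
    cpu->cr3    = new_tss->cr3;
    cpu->eflags = new_tss->eflags;
    cpu->eip    = new_tss->eip;

    /* Step 5: execution would resume at cpu->eip in the new task. */
    return 0;
}
```

In this model the outgoing task's LDT selector and CR3 value are not written back, on the assumption that they are fixed fields set up by software when the TSS is created.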
2.1.3.1 Task-State Segments in IA-32e Mode
Hardware task switches are not supported in IA-32e mode. However, TSSs continue
to exist. The base address of a TSS is specified by its descriptor.
A 64-bit TSS holds the following information that is important to 64-bit operation:
• Stack pointer addresses for each privilege level
• Pointer addresses for the interrupt stack table
• Offset address of the IO-permission bitmap (from the TSS base)
The task register is expanded to hold 64-bit base addresses in IA-32e mode. See
also: Section 7.7, "Task Management in 64-bit Mode."
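The contents of a 64-bit TSS can be pictured as a packed C structure. The field order and sizes below follow the commonly documented 104-byte layout (three privilege-level stack pointers, seven interrupt-stack-table pointers, and the I/O map base); the type name `tss64`, the helper `io_bitmap_address`, and the use of the GCC/Clang `packed` attribute are assumptions of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the 64-bit TSS layout; packed because RSP0 starts at offset 4. */
typedef struct __attribute__((packed)) {
    uint32_t reserved0;
    uint64_t rsp[3];       /* RSP0..RSP2: stack pointers for privilege levels 0-2 */
    uint64_t reserved1;
    uint64_t ist[7];       /* IST1..IST7: interrupt stack table pointers */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base;   /* offset of the I/O-permission bitmap from the TSS base */
} tss64;

_Static_assert(sizeof(tss64) == 104, "64-bit TSS is 104 bytes");
_Static_assert(offsetof(tss64, iomap_base) == 102, "I/O map base at offset 102");

/* Linear address of the I/O-permission bitmap, given the TSS base address
 * taken from the TSS descriptor. */
static uint64_t io_bitmap_address(uint64_t tss_base, const tss64 *tss)
{
    return tss_base + tss->iomap_base;
}
```

The compile-time assertions document the expected size and guard against accidental padding if the structure is edited.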
2.1.4 Interrupt and Exception Handling
External interrupts, software interrupts and exceptions are handled through the
interrupt descriptor table (IDT). The IDT stores a collection of gate descriptors that
provide access to interrupt and exception handlers. Like the GDT, the IDT is not a
segment. The linear address for the base of the IDT is contained in the IDT register
(IDTR).
Gate descriptors in the IDT can be interrupt, trap, or task gate descriptors. To access
an interrupt or exception handler, the processor first receives an interrupt vector
(interrupt number) from internal hardware, an external interrupt controller, or from
software by means of an INT, INTO, INT 3, or BOUND instruction. The interrupt
vector provides an index into the IDT. If the selected gate descriptor is an interrupt
gate or a trap gate, the associated handler procedure is accessed in a manner similar
to calling a procedure through a call gate. If the descriptor is a task gate, the handler
is accessed through a task switch.
2.1.4.1 Interrupt and Exception Handling IA-32e Mode
In IA-32e mode, interrupt descriptors are expanded to 16 bytes to support 64-bit
base addresses. This is true for 64-bit mode and compatibility mode.
The IDTR register is expanded to hold a 64-bit base address. Task gates are not
supported.
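How an interrupt vector indexes the IDT can be sketched as simple address arithmetic: 8-byte gate descriptors in protected mode and 16-byte descriptors in IA-32e mode. The function and type names below are hypothetical, and the 4-bit gate type values (5 for task gates, 6 and 0EH for interrupt gates, 7 and 0FH for trap gates) are taken from the descriptor type encodings rather than from this overview.

```c
#include <stdbool.h>
#include <stdint.h>

/* Linear address of the gate descriptor selected by an interrupt vector. */
static uint64_t idt_gate_address(uint64_t idt_base, uint8_t vector, bool ia32e)
{
    uint64_t entry_size = ia32e ? 16u : 8u;
    return idt_base + (uint64_t)vector * entry_size;
}

typedef enum { GATE_TASK, GATE_INTERRUPT, GATE_TRAP, GATE_OTHER } gate_kind;

/* Classify a gate by the 4-bit type field of its descriptor. */
static gate_kind classify_gate(uint8_t type, bool ia32e)
{
    switch (type & 0xFu) {
    case 0x5: return ia32e ? GATE_OTHER : GATE_TASK;      /* no task gates in IA-32e mode */
    case 0x6: return ia32e ? GATE_OTHER : GATE_INTERRUPT; /* 16-bit interrupt gate */
    case 0x7: return ia32e ? GATE_OTHER : GATE_TRAP;      /* 16-bit trap gate */
    case 0xE: return GATE_INTERRUPT;                      /* 32-bit or 64-bit */
    case 0xF: return GATE_TRAP;                           /* 32-bit or 64-bit */
    default:  return GATE_OTHER;
    }
}
```

For example, with an IDT based at 0x1000, the descriptor for vector 14 is found at 0x1070 in protected mode and at 0x10E0 in IA-32e mode.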
2.1.5 Memory Management
System architecture supports either direct physical addressing of memory or virtual
memory (through paging). When physical addressing is used, a linear address is
treated as a physical address. When paging is used, all code, data, stack, and system
segments (including the GDT and IDT) can be paged with only the most recently
accessed pages being held in physical memory.
The location of pages (sometimes called page frames) in physical memory is
contained in the paging structures. These structures reside in physical memory (see
Figure 2-1 for the case of 32-bit paging).
The base physical address of the paging-structure hierarchy is contained in control
register CR3. The entries in the paging structures determine the physical address of
the base of a page frame, access rights and memory management information.
To use this paging mechanism, a linear address is broken into parts. The parts
provide separate offsets into the paging structures and the page frame. A system can
have a single hierarchy of paging structures or several. For example, each task can
have its own hierarchy.
2.1.5.1 Memory Management in IA-32e Mode
In IA-32e mode, physical memory pages are managed by a set of system data
structures. In compatibility mode and 64-bit mode, four levels of system data structures
are used. These include:
• The page map level 4 (PML4): An entry in a PML4 table contains the physical
  address of the base of a page directory pointer table, access rights, and memory
  management information. The base physical address of the PML4 is stored in
  CR3.
• A set of page directory pointer tables: An entry in a page directory pointer
  table contains the physical address of the base of a page directory table, access
  rights, and memory management information.
• Sets of page directories: An entry in a page directory table contains the
  physical address of the base of a page table, access rights, and memory
  management information.
• Sets of page tables: An entry in a page table contains the physical address of
  a page frame, access rights, and memory management information.
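The four-level walk can be pictured by splitting a linear address into one index per
level. The following is a minimal sketch, assuming 4-KByte pages with 9 index bits
per level and a 12-bit page offset; those field widths are not given in this section
and are stated here as assumptions. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Indices selected by one linear address at each of the four levels.
 * Assumption: 4-KByte pages, 9 index bits per level, 12-bit page offset. */
struct ia32e_indices {
    unsigned pml4;    /* bits 47:39, selects a PML4 entry                    */
    unsigned pdpt;    /* bits 38:30, selects a page-directory-pointer entry  */
    unsigned pd;      /* bits 29:21, selects a page-directory entry          */
    unsigned pt;      /* bits 20:12, selects a page-table entry              */
    unsigned offset;  /* bits 11:0,  byte offset within the page frame       */
};

static struct ia32e_indices split_linear_address(uint64_t la)
{
    struct ia32e_indices ix;
    ix.pml4   = (unsigned)((la >> 39) & 0x1FF);
    ix.pdpt   = (unsigned)((la >> 30) & 0x1FF);
    ix.pd     = (unsigned)((la >> 21) & 0x1FF);
    ix.pt     = (unsigned)((la >> 12) & 0x1FF);
    ix.offset = (unsigned)(la & 0xFFF);
    return ix;
}

/* Usage example: build an address from known indices and split it again. */
static void split_linear_address_example(void)
{
    uint64_t la = (1ULL << 39) | (2ULL << 30) | (3ULL << 21) | (4ULL << 12) | 5ULL;
    struct ia32e_indices ix = split_linear_address(la);
    assert(ix.pml4 == 1 && ix.pdpt == 2 && ix.pd == 3 && ix.pt == 4);
    assert(ix.offset == 5);
}
```

Each index selects an entry in the structure whose base address came from the entry
at the previous level, starting from the PML4 base in CR3.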
2.1.6 System Registers
To assist in initializing the processor and controlling system operations, the system
architecture provides system flags in the EFLAGS register and several system
registers:
• The system flags and IOPL field in the EFLAGS register control task and mode
  switching, interrupt handling, instruction tracing, and access rights. See also:
  Section 2.3, "System Flags and Fields in the EFLAGS Register."
• The control registers (CR0, CR2, CR3, and CR4) contain a variety of flags and
  data fields for controlling system-level operations. Other flags in these registers
  are used to indicate support for specific processor capabilities within the
  operating system or executive. See also: Section 2.5, "Control Registers."
• The debug registers (not shown in Figure 2-1) allow the setting of breakpoints for
  use in debugging programs and systems software. See also: Chapter 16,
  "Debugging, Profiling Branches and Time-Stamp Counter."
• The GDTR, LDTR, and IDTR registers contain the linear addresses and sizes
  (limits) of their respective tables. See also: Section 2.4, "Memory-Management
  Registers."
• The task register contains the linear address and size of the TSS for the current
  task. See also: Section 2.4, "Memory-Management Registers."
• Model-specific registers (not shown in Figure 2-1).
The model-specific registers (MSRs) are a group of registers available primarily to
operating-system or executive procedures (that is, code running at privilege level 0).
These registers control items such as the debug extensions, the performance-monitoring
counters, the machine-check architecture, and the memory type ranges
(MTRRs).
The number and function of these registers vary among different members of the
Intel 64 and IA-32 processor families. See also: Section 9.4, "Model-Specific
Registers (MSRs)," and Appendix B, "Model-Specific Registers (MSRs)."
Most systems restrict access to system registers (other than the EFLAGS register) by
application programs. Systems can be designed, however, where all programs and
procedures run at the most privileged level (privilege level 0). In such a case,
application programs would be allowed to modify the system registers.
2.1.6.1 System Registers in IA-32e Mode
In IA-32e mode, the four system-descriptor-table registers (GDTR, IDTR, LDTR, and
TR) are expanded in hardware to hold 64-bit base addresses. EFLAGS becomes the
64-bit RFLAGS register. CR0-CR4 are expanded to 64 bits. CR8 becomes available.
CR8 provides read-write access to the task priority register (TPR) so that the
operating system can control the priority classes of external interrupts.
In 64-bit mode, debug registers DR0-DR7 are 64 bits. In compatibility mode,
address-matching in DR0-DR3 is also done at 64-bit granularity.
On systems that support IA-32e mode, the extended feature enable register
(IA32_EFER) is available. This model-specific register controls activation of IA-32e
mode and other IA-32e mode operations. In addition, there are several
model-specific registers that govern IA-32e mode instructions:
• IA32_KernelGSbase: Used by the SWAPGS instruction.
• IA32_LSTAR: Used by the SYSCALL instruction.
• IA32_SYSCALL_FLAG_MASK: Used by the SYSCALL instruction.
• IA32_STAR_CS: Used by the SYSCALL and SYSRET instructions.
2.1.7 Other System Resources
Besides the system registers and data structures described in the previous sections,
system architecture provides the following additional resources:
• Operating system instructions (see also: Section 2.7, "System Instruction
  Summary").
• Performance-monitoring counters (not shown in Figure 2-1).
• Internal caches and buffers (not shown in Figure 2-1).
Performance-monitoring counters are event counters that can be programmed to
count processor events such as the number of instructions decoded, the number of
interrupts received, or the number of cache loads. See also: Section 20,
"Introduction to Virtual-Machine Extensions."
The processor provides several internal caches and buffers. The caches are used to
store both data and instructions. The buffers are used to store things like decoded
addresses to system and application segments and write operations waiting to be
performed. See also: Chapter 11, "Memory Cache Control."
2.2 MODES OF OPERATION
The IA-32 architecture supports three operating modes and one quasi-operating mode:
• Protected mode: This is the native operating mode of the processor. It
  provides a rich set of architectural features, flexibility, high performance, and
  backward compatibility with the existing software base.
• Real-address mode: This operating mode provides the programming
  environment of the Intel 8086 processor, with a few extensions (such as the
  ability to switch to protected or system management mode).
• System management mode (SMM): SMM is a standard architectural feature
  in all IA-32 processors, beginning with the Intel386 SL processor. This mode
  provides an operating system or executive with a transparent mechanism for
  implementing power management and OEM differentiation features. SMM is
  entered through activation of an external system interrupt pin (SMI#), which
  generates a system management interrupt (SMI). In SMM, the processor
  switches to a separate address space while saving the context of the currently
  running program or task. SMM-specific code may then be executed transparently.
  Upon returning from SMM, the processor is placed back into its state prior to the
  SMI.
• Virtual-8086 mode: In protected mode, the processor supports a
  quasi-operating mode known as virtual-8086 mode. This mode allows the processor
  to execute 8086 software in a protected, multitasking environment.
Intel 64 architecture supports all operating modes of IA-32 architecture and IA-32e
modes:
• IA-32e mode: In IA-32e mode, the processor supports two sub-modes:
  compatibility mode and 64-bit mode. 64-bit mode provides 64-bit linear
  addressing and support for physical address space larger than 64 GBytes.
  Compatibility mode allows most legacy protected-mode applications to run
  unchanged.
Figure 2-3 shows how the processor moves between operating modes.
The processor is placed in real-address mode following power-up or a reset. The PE
flag in control register CR0 then controls whether the processor is operating in
real-address or protected mode. See also: Section 9.9, "Mode Switching," and Section
4.1.2, "Paging-Mode Enabling."
Figure 2-3. Transitions Among the Processor's Operating Modes
[Figure: state diagram of real-address mode, protected mode, virtual-8086 mode,
IA-32e mode, and system management mode. Transition labels: Reset, PE=1, PE=0,
VM=1, VM=0, SMI#, RSM, and, for entry to IA-32e mode, LME=1 with CR0.PG=1 (see
Section 9.8.5). The return path from IA-32e mode is marked "See Section 9.8.5.4."]
The VM flag in the EFLAGS register determines whether the processor is operating in
protected mode or virtual-8086 mode. Transitions between protected mode and
virtual-8086 mode are generally carried out as part of a task switch or a return from
an interrupt or exception handler. See also: Section 17.2.5, "Entering Virtual-8086
Mode."
The LMA bit (IA32_EFER.LMA[bit 10]) determines whether the processor is
operating in IA-32e mode. When running in IA-32e mode, 64-bit or compatibility
sub-mode operation is determined by the CS.L bit of the code segment. The processor
enters IA-32e mode from protected mode by enabling paging and setting the
LME bit (IA32_EFER.LME[bit 8]). See also: Chapter 9, "Processor Management and
Initialization."
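The mode rules above can be summarized as a small decision function. This is only a
sketch under stated assumptions: the register values are passed in as plain integers
(real code would have to read them at privilege level 0), system management mode is
not modeled, and the names cpu_mode and classify_mode are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE     (1u << 0)     /* protection enable, bit 0 of CR0      */
#define EFLAGS_VM  (1u << 17)    /* virtual-8086 mode, bit 17 of EFLAGS  */
#define EFER_LMA   (1ull << 10)  /* IA-32e mode active, bit 10 of EFER   */

enum cpu_mode {
    MODE_REAL_ADDRESS,
    MODE_PROTECTED,
    MODE_VIRTUAL_8086,
    MODE_COMPATIBILITY,
    MODE_64BIT
};

/* Classify the operating mode from the controlling bits.
 * Assumption: SMM is ignored; `cs_l` is the L bit of the current code segment. */
static enum cpu_mode classify_mode(uint32_t cr0, uint32_t eflags,
                                   uint64_t efer, bool cs_l)
{
    if (efer & EFER_LMA)
        return cs_l ? MODE_64BIT : MODE_COMPATIBILITY;
    if (!(cr0 & CR0_PE))
        return MODE_REAL_ADDRESS;
    return (eflags & EFLAGS_VM) ? MODE_VIRTUAL_8086 : MODE_PROTECTED;
}

/* Usage example: the state after reset, then after PE is set. */
static void classify_mode_example(void)
{
    assert(classify_mode(0, 0, 0, false) == MODE_REAL_ADDRESS);
    assert(classify_mode(CR0_PE, 0, 0, false) == MODE_PROTECTED);
}
```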
The processor switches to SMM whenever it receives an SMI while the processor is in
real-address, protected, virtual-8086, or IA-32e modes. Upon execution of the RSM
instruction, the processor always returns to the mode it was in when the SMI
occurred.
2.3 SYSTEM FLAGS AND FIELDS IN THE EFLAGS REGISTER
The system flags and IOPL field of the EFLAGS register control I/O, maskable
hardware interrupts, debugging, task switching, and the virtual-8086 mode (see
Figure 2-4). Only privileged code (typically operating system or executive code)
should be allowed to modify these bits.
The system flags and IOPL are:
TF Trap (bit 8): Set to enable single-step mode for debugging; clear to
disable single-step mode. In single-step mode, the processor generates a
debug exception after each instruction. This allows the execution state of a
program to be inspected after each instruction. If an application program
sets the TF flag using a POPF, POPFD, or IRET instruction, a debug exception
is generated after the instruction that follows the POPF, POPFD, or IRET.
IF Interrupt enable (bit 9): Controls the response of the processor to
maskable hardware interrupt requests (see also: Section 6.3.2, "Maskable
Hardware Interrupts"). The flag is set to respond to maskable hardware
interrupts; cleared to inhibit maskable hardware interrupts. The IF flag does
not affect the generation of exceptions or nonmaskable interrupts (NMI
interrupts). The CPL, IOPL, and the state of the VME flag in control register
CR4 determine whether the IF flag can be modified by the CLI, STI, POPF,
POPFD, and IRET instructions.
IOPL I/O privilege level field (bits 12 and 13): Indicates the I/O privilege
level (IOPL) of the currently running program or task. The CPL of the
currently running program or task must be less than or equal to the IOPL to
access the I/O address space. This field can only be modified by the POPF
and IRET instructions when operating at a CPL of 0.
The IOPL is also one of the mechanisms that controls the modification of the
IF flag and the handling of interrupts in virtual-8086 mode when virtual
mode extensions are in effect (when CR4.VME = 1). See also: Chapter 13,
"Input/Output," in the Intel 64 and IA-32 Architectures Software
Developer's Manual, Volume 1.
NT Nested task (bit 14): Controls the chaining of interrupted and called
tasks. The processor sets this flag on calls to a task initiated with a CALL
instruction, an interrupt, or an exception. It examines and modifies this flag
on returns from a task initiated with the IRET instruction. The flag can be
explicitly set or cleared with the POPF/POPFD instructions; however,
changing the state of this flag can generate unexpected exceptions in
application programs.
See also: Section 7.4, "Task Linking."
Figure 2-4. System Flags in the EFLAGS Register
[Figure: bit layout of the EFLAGS register. System flags and fields marked in the
figure: ID Identification Flag, VIP Virtual Interrupt Pending, VIF Virtual Interrupt
Flag, AC Alignment Check, VM Virtual-8086 Mode, RF Resume Flag, NT Nested Task
Flag, IOPL I/O Privilege Level, IF Interrupt Enable Flag, TF Trap Flag. The figure
also shows the positions of OF, DF, SF, ZF, AF, PF, and CF and distinguishes
"Reserved" bits from "Reserved (set to 0)" bits.]
RF Resume (bit 16): Controls the processor's response to instruction-breakpoint
conditions. When set, this flag temporarily disables debug exceptions
(#DB) from being generated for instruction breakpoints (although other
exception conditions can cause an exception to be generated). When clear,
instruction breakpoints will generate debug exceptions.
The primary function of the RF flag is to allow the restarting of an instruction
following a debug exception that was caused by an instruction breakpoint
condition. Here, debug software must set this flag in the EFLAGS image on
the stack just prior to returning to the interrupted program with IRETD (to
prevent the instruction breakpoint from causing another debug exception).
The processor then automatically clears this flag after the instruction
returned to has been successfully executed, enabling instruction breakpoint
faults again.
See also: Section 16.3.1.1, "Instruction-Breakpoint Exception Condition."
VM Virtual-8086 mode (bit 17): Set to enable virtual-8086 mode; clear to
return to protected mode.
See also: Section 17.2.1, "Enabling Virtual-8086 Mode."
AC Alignment check (bit 18): Set this flag and the AM flag in control register
CR0 to enable alignment checking of memory references; clear the AC flag
and/or the AM flag to disable alignment checking. An alignment-check
exception is generated when reference is made to an unaligned operand,
such as a word at an odd byte address or a doubleword at an address which
is not an integral multiple of four. Alignment-check exceptions are generated
only in user mode (privilege level 3). Memory references that default to
privilege level 0, such as segment descriptor loads, do not generate this
exception even when caused by instructions executed in user mode.
The alignment-check exception can be used to check alignment of data. This
is useful when exchanging data with processors which require all data to be
aligned. The alignment-check exception can also be used by interpreters to
flag some pointers as special by misaligning the pointer. This eliminates the
overhead of checking each pointer and only handles the special pointer when
used.
VIF Virtual Interrupt (bit 19): Contains a virtual image of the IF flag. This
flag is used in conjunction with the VIP flag. The processor only recognizes
the VIF flag when either the VME flag or the PVI flag in control register CR4 is
set and the IOPL is less than 3. (The VME flag enables the virtual-8086 mode
extensions; the PVI flag enables the protected-mode virtual interrupts.)
See also: Section 17.3.3.5, "Method 6: Software Interrupt Handling," and
Section 17.4, "Protected-Mode Virtual Interrupts."
VIP Virtual interrupt pending (bit 20): Set by software to indicate that an
interrupt is pending; cleared to indicate that no interrupt is pending. This flag
is used in conjunction with the VIF flag. The processor reads this flag but
never modifies it. The processor only recognizes the VIP flag when either the
VME flag or the PVI flag in control register CR4 is set and the IOPL is less than
3. The VME flag enables the virtual-8086 mode extensions; the PVI flag
enables the protected-mode virtual interrupts.
See also: Section 17.3.3.5, "Method 6: Software Interrupt Handling," and
Section 17.4, "Protected-Mode Virtual Interrupts."
ID Identification (bit 21): The ability of a program or procedure to set or
clear this flag indicates support for the CPUID instruction.
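The bit positions listed above translate directly into masks. The sketch below shows
one way to extract the IOPL field and apply the CPL/IOPL comparison described for
I/O access. It works on an EFLAGS value supplied as a plain integer. The I/O
permission bit map in the TSS is deliberately not modeled, and the helper names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* System flags and fields, using the bit numbers given in the list above. */
#define EFLAGS_TF          (1u << 8)
#define EFLAGS_IF          (1u << 9)
#define EFLAGS_IOPL_SHIFT  12
#define EFLAGS_IOPL_MASK   (3u << EFLAGS_IOPL_SHIFT)   /* bits 12 and 13 */
#define EFLAGS_NT          (1u << 14)
#define EFLAGS_RF          (1u << 16)
#define EFLAGS_VM          (1u << 17)
#define EFLAGS_AC          (1u << 18)
#define EFLAGS_VIF         (1u << 19)
#define EFLAGS_VIP         (1u << 20)
#define EFLAGS_ID          (1u << 21)

/* Extract the two-bit I/O privilege level (0..3). */
static unsigned eflags_iopl(uint32_t eflags)
{
    return (eflags & EFLAGS_IOPL_MASK) >> EFLAGS_IOPL_SHIFT;
}

/* True when CPL <= IOPL, the comparison stated for the IOPL field.
 * Assumption: this sketch stops at that comparison and ignores any other
 * mechanism that may grant I/O access. */
static bool iopl_allows_io(unsigned cpl, uint32_t eflags)
{
    return cpl <= eflags_iopl(eflags);
}

/* Usage example: IF set with IOPL = 3 versus IOPL = 0. */
static void eflags_example(void)
{
    uint32_t user_io_ok = EFLAGS_IF | (3u << EFLAGS_IOPL_SHIFT);
    uint32_t user_io_no = EFLAGS_IF;
    assert(eflags_iopl(user_io_ok) == 3);
    assert(iopl_allows_io(3, user_io_ok));
    assert(!iopl_allows_io(3, user_io_no));
}
```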
2.3.1 System Flags and Fields in IA-32e Mode
In 64-bit mode, the RFLAGS register expands to 64 bits with the upper 32 bits
reserved. System flags in RFLAGS (64-bit mode) or EFLAGS (compatibility mode)
are shown in Figure 2-4.
In IA-32e mode, the processor does not allow the VM bit to be set because
virtual-8086 mode is not supported (attempts to set the bit are ignored). Also, the
processor will not set the NT bit. The processor does, however, allow software to set
the NT bit (note that an IRET causes a general protection fault in IA-32e mode if the
NT bit is set).
In IA-32e mode, the SYSCALL/SYSRET instructions have a programmable method of
specifying which bits are cleared in RFLAGS/EFLAGS. These instructions save/restore
EFLAGS/RFLAGS.
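The programmable clearing of flag bits can be pictured as a simple mask operation. The
sketch below assumes that each bit set in the mask value (held in the
IA32_SYSCALL_FLAG_MASK register named in Section 2.1.6.1) clears the corresponding
RFLAGS bit on SYSCALL and that all other bits are left unchanged. It models only that
masking step, not the saving of the original flags, and the function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* RFLAGS value after the SYSCALL flag mask is applied.
 * Assumption: a 1 bit in `flag_mask` clears the matching RFLAGS bit. */
static uint64_t rflags_after_syscall_mask(uint64_t rflags, uint64_t flag_mask)
{
    return rflags & ~flag_mask;
}

/* Usage example: mask out IF (bit 9) and TF (bit 8) on kernel entry. */
static void rflags_mask_example(void)
{
    uint64_t mask = (1ull << 9) | (1ull << 8);
    assert(rflags_after_syscall_mask(0x302ull, mask) == 0x002ull);
}
```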
2.4 MEMORY-MANAGEMENT REGISTERS
The processor provides four memory-management registers (GDTR, LDTR, IDTR,
and TR) that specify the locations of the data structures which control segmented
memory management (see Figure 2-5). Special instructions are provided for loading
and storing these registers.
2.4.1 Global Descriptor Table Register (GDTR)
The GDTR register holds the base address (32 bits in protected mode; 64 bits in
IA-32e mode) and the 16-bit table limit for the GDT. The base address specifies the
linear address of byte 0 of the GDT; the table limit specifies the number of bytes in
the table.
The LGDT and SGDT instructions load and store the GDTR register, respectively. On
power up or reset of the processor, the base address is set to the default value of 0
and the limit is set to 0FFFFH. A new base address must be loaded into the GDTR as
part of the processor initialization process for protected-mode operation.
See also: Section 3.5.1, "Segment Descriptor Tables."
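The register layout in Figure 2-5 (16-bit limit in bits 15:0, base address above it)
suggests the memory image that software prepares before loading a table register. The
sketch below packs such an image into a byte buffer. It assumes little-endian byte
order for the in-memory operand and a 6-byte (32-bit base) or 10-byte (64-bit base)
size. The function names are hypothetical, and the sketch does not execute LGDT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack a table limit and base address into `out`, limit first.
 * Returns the number of bytes written: 6 for a 32-bit base, 10 for a
 * 64-bit base. Assumption: little-endian layout, limit in bits 15:0. */
static size_t pack_table_register(uint8_t out[10], uint16_t limit,
                                  uint64_t base, int base_bits)
{
    size_t n = 0;
    assert(base_bits == 32 || base_bits == 64);
    out[n++] = (uint8_t)(limit & 0xFF);
    out[n++] = (uint8_t)(limit >> 8);
    for (int i = 0; i < base_bits / 8; i++)
        out[n++] = (uint8_t)(base >> (8 * i));
    return n;
}

/* Usage example: the reset defaults stated above (base 0, limit 0FFFFH). */
static void pack_table_register_example(void)
{
    uint8_t image[10];
    size_t n = pack_table_register(image, 0xFFFF, 0, 32);
    assert(n == 6);
    assert(image[0] == 0xFF && image[1] == 0xFF && image[2] == 0x00);
}
```

The same layout applies to the IDTR, which Figure 2-5 shows with identical fields.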
2.4.2 Local Descriptor Table Register (LDTR)
The LDTR register holds the 16-bit segment selector, base address (32 bits in
protected mode; 64 bits in IA-32e mode), segment limit, and descriptor attributes
for the LDT. The base address specifies the linear address of byte 0 of the LDT
segment; the segment limit specifies the number of bytes in the segment. See also:
Section 3.5.1, "Segment Descriptor Tables."
The LLDT and SLDT instructions load and store the segment selector part of the LDTR
register, respectively. The segment that contains the LDT must have a segment
descriptor in the GDT. When the LLDT instruction loads a segment selector in the
LDTR, the base address, limit, and descriptor attributes from the LDT descriptor are
automatically loaded in the LDTR.
When a task switch occurs, the LDTR is automatically loaded with the segment
selector and descriptor for the LDT for the new task. The contents of the LDTR are not
automatically saved prior to writing the new LDT information into the register.
On power up or reset of the processor, the segment selector and base address are set
to the default value of 0 and the limit is set to 0FFFFH.
Figure 2-5. Memory Management Registers
[Figure: The system table registers GDTR and IDTR each hold a 32(64)-bit linear base
address in bits 47(79):16 and a 16-bit table limit in bits 15:0. The system segment
registers (task register and LDTR) each hold a 16-bit segment selector together with
automatically loaded segment descriptor fields: a 32(64)-bit linear base address, a
segment limit, and attributes.]
2.4.3 IDTR Interrupt Descriptor Table Register
The IDTR register holds the base address (32 bits in protected mode; 64 bits in
IA-32e mode) and 16-bit table limit for the IDT. The base address specifies the linear
address of byte 0 of the IDT; the table limit specifies the number of bytes in the table.
The LIDT and SIDT instructions load and store the IDTR register, respectively. On
power up or reset of the processor, the base address is set to the default value of 0
and the limit is set to 0FFFFH. The base address and limit in the register can then be
changed as part of the processor initialization process.
See also: Section 6.10, "Interrupt Descriptor Table (IDT)."
2.4.4 Task Register (TR)
The task register holds the 16-bit segment selector, base address (32 bits in
protected mode; 64 bits in IA-32e mode), segment limit, and descriptor attributes
for the TSS of the current task. The selector references the TSS descriptor in the GDT.
The base address specifies the linear address of byte 0 of the TSS; the segment limit
specifies the number of bytes in the TSS. See also: Section 7.2.4, "Task Register."
The LTR and STR instructions load and store the segment selector part of the task
register, respectively. When the LTR instruction loads a segment selector in the task
register, the base address, limit, and descriptor attributes from the TSS descriptor
are automatically loaded into the task register. On power up or reset of the processor,
the base address is set to the default value of 0 and the limit is set to 0FFFFH.
When a task switch occurs, the task register is automatically loaded with the
segment selector and descriptor for the TSS for the new task. The contents of the
task register are not automatically saved prior to writing the new TSS information
into the register.
2.5 CONTROL REGISTERS
Control registers (CR0, CR1, CR2, CR3, and CR4; see Figure 2-6) determine the
operating mode of the processor and the characteristics of the currently executing task.
These registers are 32 bits in all 32-bit modes and compatibility mode.
In 64-bit mode, control registers are expanded to 64 bits. The MOV CRn instructions
are used to manipulate the register bits. Operand-size prefixes for these instructions
are ignored. The following is also true:
• Bits 63:32 of CR0 and CR4 are reserved and must be written with zeros. Writing
  a nonzero value to any of the upper 32 bits results in a general-protection
  exception, #GP(0).
• All 64 bits of CR2 are writable by software.
• Bits 51:40 of CR3 are reserved and must be 0.
• The MOV CRn instructions do not check that addresses written to CR2 and CR3
  are within the linear-address or physical-address limitations of the
  implementation.
• Register CR8 is available in 64-bit mode only.
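System software that builds control-register values can check the 64-bit mode rules
above before executing MOV CRn. The sketch below covers only the two reserved-bit
rules in this list: bits 63:32 of CR0 and CR4, and bits 51:40 of CR3. Other reserved
bits in the lower halves of these registers are not modeled, and the function names
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR3_RESERVED_51_40  (0xFFFull << 40)   /* bits 51:40 of CR3 */

/* True if writing `value` to CR0 or CR4 in 64-bit mode would raise #GP(0)
 * because one of the upper 32 bits is nonzero. */
static bool cr0_cr4_upper_bits_fault(uint64_t value)
{
    return (value >> 32) != 0;
}

/* True if bits 51:40 of a proposed CR3 value are all zero, as required. */
static bool cr3_reserved_bits_clear(uint64_t cr3)
{
    return (cr3 & CR3_RESERVED_51_40) == 0;
}

/* Usage example: a typical CR0 value and a CR3 value with a stray high bit. */
static void control_register_checks_example(void)
{
    assert(!cr0_cr4_upper_bits_fault(0x80050033ull));
    assert(cr0_cr4_upper_bits_fault(1ull << 32));
    assert(!cr3_reserved_bits_clear(1ull << 40));
}
```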
The control registers are summarized below, and each architecturally defined control
field in these control registers is described individually. In Figure 2-6, the width of
the register in 64-bit mode is indicated in parentheses (except for CR0).
• CR0: Contains system control flags that control operating mode and states of
  the processor.
• CR1: Reserved.
• CR2: Contains the page-fault linear address (the linear address that caused a
  page fault).
• CR3: Contains the physical address of the base of the paging-structure
  hierarchy and two flags (PCD and PWT). Only the most-significant bits (less the
  lower 12 bits) of the base address are specified; the lower 12 bits of the address
  are assumed to be 0. The first paging structure must thus be aligned to a page
  (4-KByte) boundary. The PCD and PWT flags control caching of that paging
  structure in the processor's internal data caches (they do not control TLB caching
  of page-directory information).
  When using the physical address extension, the CR3 register contains the base
  address of the page-directory-pointer table. In IA-32e mode, the CR3 register
  contains the base address of the PML4 table.
  See also: Chapter 4, "Paging."
• CR4: Contains a group of flags that enable several architectural extensions,
  and indicate operating system or executive support for specific processor
  capabilities. The control registers can be read and loaded (or modified) using the
  move-to-or-from-control-registers forms of the MOV instruction. In protected mode,
  the MOV instructions allow the control registers to be read or loaded (at privilege
  level 0 only). This restriction means that application programs or
  operating-system procedures (running at privilege levels 1, 2, or 3) are prevented
  from reading or loading the control registers.
• CR8: Provides read and write access to the Task Priority Register (TPR). It
  specifies the priority threshold value that operating systems use to control the
  priority class of external interrupts allowed to interrupt the processor. This
  register is available only in 64-bit mode. However, interrupt filtering continues to
  apply in compatibility mode.
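The CR3 description above (a page-aligned base with the lower 12 bits assumed to be 0,
plus the PCD and PWT flags) can be illustrated with a few masks. The sketch assumes
PWT is bit 3 and PCD is bit 4, positions taken from the layout in Figure 2-6 rather
than from the text, and it applies to the page-aligned case described here. The
helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR3_PWT         (1ull << 3)   /* assumed position, per Figure 2-6 */
#define CR3_PCD         (1ull << 4)   /* assumed position, per Figure 2-6 */
#define CR3_LOW12_MASK  0xFFFull      /* lower 12 bits, not part of the base */

/* Physical base address of the first paging structure: the CR3 value with
 * its lower 12 bits taken as 0. */
static uint64_t cr3_paging_base(uint64_t cr3)
{
    return cr3 & ~CR3_LOW12_MASK;
}

static bool cr3_pwt(uint64_t cr3) { return (cr3 & CR3_PWT) != 0; }
static bool cr3_pcd(uint64_t cr3) { return (cr3 & CR3_PCD) != 0; }

/* Usage example: base 0x1000 with PCD set and PWT clear. */
static void cr3_example(void)
{
    uint64_t cr3 = 0x1000ull | CR3_PCD;
    assert(cr3_paging_base(cr3) == 0x1000ull);
    assert(cr3_pcd(cr3) && !cr3_pwt(cr3));
}
```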
When loading a control register, reserved bits should always be set to the values
previously read. The flags in control registers are:
PG Paging (bit 31 of CR0): Enables paging when set; disables paging when
clear. When paging is disabled, all linear addresses are treated as physical
addresses. The PG flag has no effect if the PE flag (bit 0 of register CR0) is
not also set; setting the PG flag when the PE flag is clear causes a
general-protection exception (#GP). See also: Chapter 4, "Paging."
On Intel 64 processors, enabling and disabling IA-32e mode operation also
requires modifying CR0.PG.
CD Cache Disable (bit 30 of CR0): When the CD and NW flags are clear,
caching of memory locations for the whole of physical memory in the
processor's internal (and external) caches is enabled. When the CD flag is
set, caching is restricted as described in Table 11-5. To prevent the processor
from accessing and updating its caches, the CD flag must be set and the
caches must be invalidated so that no cache hits can occur.
See also: Section 11.5.3, "Preventing Caching," and Section 11.5, "Cache
Control."
Figure 2-6. Control Registers
[Figure: bit layouts of CR0 through CR4. CR0 shows the PG, CD, NW, AM, WP, NE, ET,
TS, EM, MP, and PE flags. CR1 is reserved. CR2 holds the page-fault linear address.
CR3 (PDBR) holds the page-directory base and the PCD and PWT flags. CR4 shows the
VME, PVI, TSD, DE, PSE, PAE, MCE, PGE, PCE, OSFXSR, OSXMMEXCPT, VMXE, SMXE,
PCIDE, and OSXSAVE flags. Register widths are marked 31(63):0, and the figure
distinguishes "Reserved" bits from "Reserved (set to 0)" bits.]
NW Not Write-through (bit 29 of CR0): When the NW and CD flags are
clear, write-back (for Pentium 4, Intel Xeon, P6 family, and Pentium
processors) or write-through (for Intel486 processors) is enabled for writes that hit
the cache and invalidation cycles are enabled. See Table 11-5 for detailed
information about the effect of the NW flag on caching for other settings of
the CD and NW flags.
AM Alignment Mask (bit 18 of CR0): Enables automatic alignment checking
when set; disables alignment checking when clear. Alignment checking is
performed only when the AM flag is set, the AC flag in the EFLAGS register is
set, CPL is 3, and the processor is operating in either protected or
virtual-8086 mode.
WP Write Protect (bit 16 of CR0): When set, inhibits supervisor-level
procedures from writing into read-only pages; when clear, allows supervisor-level
procedures to write into read-only pages (regardless of the U/S bit setting;
see Section 4.1.3 and Section 4.6). This flag facilitates implementation of the
copy-on-write method of creating a new process (forking) used by operating
systems such as UNIX.
NE Numeric Error (bit 5 of CR0): Enables the native (internal) mechanism
for reporting x87 FPU errors when set; enables the PC-style x87 FPU error
reporting mechanism when clear. When the NE flag is clear and the IGNNE#
input is asserted, x87 FPU errors are ignored. When the NE flag is clear and
the IGNNE# input is deasserted, an unmasked x87 FPU error causes the
processor to assert the FERR# pin to generate an external interrupt and to
stop instruction execution immediately before executing the next waiting
floating-point instruction or WAIT/FWAIT instruction.
The FERR# pin is intended to drive an input to an external interrupt
controller (the FERR# pin emulates the ERROR# pin of the Intel 287 and
Intel 387 DX math coprocessors). The NE flag, IGNNE# pin, and FERR# pin
are used with external logic to implement PC-style error reporting. Using
FERR# and IGNNE# to handle floating-point exceptions is deprecated by
modern operating systems; this non-native approach also limits newer
processors to operate with one logical processor active.
See also: "Software Exception Handling" in Chapter 8, "Programming with
the x87 FPU," and Appendix A, "EFLAGS Cross-Reference," in the Intel 64
and IA-32 Architectures Software Developer's Manual, Volume 1.
ET Extension Type (bit 4 of CR0): Reserved in the Pentium 4, Intel Xeon, P6
family, and Pentium processors. In the Pentium 4, Intel Xeon, and P6 family
processors, this flag is hardcoded to 1. In the Intel386 and Intel486
processors, this flag indicates support of Intel 387 DX math coprocessor
instructions when set.
TS Task Switched (bit 3 of CR0). Allows the saving of the x87
FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch to be
delayed until an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction is
actually executed by the new task. The processor sets this flag on every task
switch and tests it when executing x87
FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instructions.
If the TS flag is set and the EM flag (bit 2 of CR0) is clear, a
device-not-available exception (#NM) is raised prior to the execution of any
x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction, with the exception
of PAUSE, PREFETCHh, SFENCE, LFENCE, MFENCE, MOVNTI, CLFLUSH,
CRC32, and POPCNT. See the paragraph below for the special case of the
WAIT/FWAIT instructions.
If the TS flag is set and the MP flag (bit 1 of CR0) and EM flag are clear, an
#NM exception is not raised prior to the execution of an x87 FPU
WAIT/FWAIT instruction.
If the EM flag is set, the setting of the TS flag has no effect on the
execution of x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instructions.
Table 2-1 shows the actions taken when the processor encounters an x87
FPU instruction based on the settings of the TS, EM, and MP flags. Tables 12-1
and 13-1 show the actions taken when the processor encounters an
MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction.
The processor does not automatically save the context of the x87 FPU, XMM,
and MXCSR registers on a task switch. Instead, it sets the TS flag, which
causes the processor to raise an #NM exception whenever it encounters an
x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction in the instruction
stream for the new task (with the exception of the instructions listed above).
The fault handler for the #NM exception can then be used to clear the TS flag
(with the CLTS instruction) and save the context of the x87 FPU, XMM, and MXCSR
registers. If the task never encounters an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4
instruction, the x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context is never saved.
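As an illustration only, the interaction of the EM, MP, and TS flags for x87 FPU instructions (summarized in Table 2-1) can be modeled in C. This is a sketch of the decision logic, not processor or operating-system code; the names fpu_action, x87_fp_action, and x87_wait_action are hypothetical and do not appear in this manual.

```c
#include <stdbool.h>

/* Outcome of attempting to execute an x87 FPU instruction. */
typedef enum { ACTION_EXECUTE, ACTION_NM_EXCEPTION } fpu_action;

/* Floating-point instructions raise #NM if EM is set or if TS is set.
 * The MP flag does not influence this case. */
fpu_action x87_fp_action(bool em, bool mp, bool ts)
{
    (void)mp;
    return (em || ts) ? ACTION_NM_EXCEPTION : ACTION_EXECUTE;
}

/* WAIT/FWAIT raises #NM only when both MP and TS are set.
 * The EM flag does not influence this case. */
fpu_action x87_wait_action(bool em, bool mp, bool ts)
{
    (void)em;
    return (mp && ts) ? ACTION_NM_EXCEPTION : ACTION_EXECUTE;
}
```

For example, with EM = 0, MP = 1, and TS = 1, both functions return ACTION_NM_EXCEPTION, which is the combination an operating system can use to delay saving the x87 FPU context until the new task actually needs it.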
Table 2-1. Action Taken By x87 FPU Instructions for Different
Combinations of EM, MP, and TS

   CR0 Flags        x87 FPU Instruction Type
EM   MP   TS    Floating-Point     WAIT/FWAIT
0    0    0     Execute            Execute
0    0    1     #NM exception      Execute
0    1    0     Execute            Execute
0    1    1     #NM exception      #NM exception
1    0    0     #NM exception      Execute
1    0    1     #NM exception      Execute
1    1    0     #NM exception      Execute
1    1    1     #NM exception      #NM exception

EM Emulation (bit 2 of CR0). Indicates that the processor does not have an
internal or external x87 FPU when set; indicates an x87 FPU is present when
clear. This flag also affects the execution of
MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instructions.
When the EM flag is set, execution of an x87 FPU instruction generates a
device-not-available exception (#NM). This flag must be set when the
processor does not have an internal x87 FPU or is not connected to an
external math coprocessor. Setting this flag forces all floating-point
instructions to be handled by software emulation. Table 9-2 shows the
recommended setting of this flag, depending on the IA-32 processor and x87 FPU
or math coprocessor present in the system. Table 2-1 shows the interaction
of the EM, MP, and TS flags.
Also, when the EM flag is set, execution of an MMX instruction causes an
invalid-opcode exception (#UD) to be generated (see Table 12-1). Thus, if an
IA-32 or Intel 64 processor incorporates MMX technology, the EM flag must
be set to 0 to enable execution of MMX instructions.
Similarly for SSE/SSE2/SSE3/SSSE3/SSE4 extensions, when the EM flag is
set, execution of most SSE/SSE2/SSE3/SSSE3/SSE4 instructions causes an
invalid-opcode exception (#UD) to be generated (see Table 13-1). If an IA-32
or Intel 64 processor incorporates the SSE/SSE2/SSE3/SSSE3/SSE4
extensions, the EM flag must be set to 0 to enable execution of these extensions.
SSE/SSE2/SSE3/SSSE3/SSE4 instructions not affected by the EM flag
include: PAUSE, PREFETCHh, SFENCE, LFENCE, MFENCE, MOVNTI, CLFLUSH,
CRC32, and POPCNT.
MP Monitor Coprocessor (bit 1 of CR0). Controls the interaction of the
WAIT (or FWAIT) instruction with the TS flag (bit 3 of CR0). If the MP flag is
set, a WAIT instruction generates a device-not-available exception (#NM) if
the TS flag is also set. If the MP flag is clear, the WAIT instruction ignores the
setting of the TS flag. Table 9-2 shows the recommended setting of this flag,
depending on the IA-32 processor and x87 FPU or math coprocessor present
in the system. Table 2-1 shows the interaction of the MP, EM, and TS flags.
PE Protection Enable (bit 0 of CR0). Enables protected mode when set;
enables real-address mode when clear. This flag does not enable paging
directly. It only enables segment-level protection. To enable paging, both the
PE and PG flags must be set.
See also: Section 9.9, "Mode Switching."
PCD Page-level Cache Disable (bit 4 of CR3). Controls caching of the first
paging structure of the current paging-structure hierarchy. When the PCD
flag is set, caching of the page-directory is prevented; when the flag is clear,
the page-directory can be cached. This flag affects only the processor's
internal caches (both L1 and L2, when present). The processor ignores this
flag if paging is not used (the PG flag in register CR0 is clear) or the CD
(cache disable) flag in CR0 is set.
See also: Chapter 11, "Memory Cache Control" (for more about the use of
the PCD flag) and Section 4.9, "Paging and Memory Typing" (for a discussion
of a companion PCD flag in page-directory and page-table entries).
PWT Page-level Write-Through (bit 3 of CR3). Controls the write-through or
write-back caching policy of the first paging structure of the current
paging-structure hierarchy. When the PWT flag is set, write-through caching is
enabled; when the flag is clear, write-back caching is enabled. This flag
affects only internal caches (both L1 and L2, when present). The processor
ignores this flag if paging is not used (the PG flag in register CR0 is clear) or
the CD (cache disable) flag in CR0 is set.
See also: Section 11.5, "Cache Control" (for more information about the use
of this flag), and Section 4.9, "Paging and Memory Typing" (for a discussion
of a companion PWT flag in the page-directory and page-table entries).
VME Virtual-8086 Mode Extensions (bit 0 of CR4). Enables interrupt- and
exception-handling extensions in virtual-8086 mode when set; disables the
extensions when clear. Use of the virtual mode extensions can improve the
performance of virtual-8086 applications by eliminating the overhead of
calling the virtual-8086 monitor to handle interrupts and exceptions that
occur while executing an 8086 program and, instead, redirecting the
interrupts and exceptions back to the 8086 program's handlers. It also provides
hardware support for a virtual interrupt flag (VIF) to improve reliability of
running 8086 programs in multitasking and multiple-processor environments.
See also: Section 17.3, "Interrupt and Exception Handling in Virtual-8086
Mode."
PVI Protected-Mode Virtual Interrupts (bit 1 of CR4). Enables hardware
support for a virtual interrupt flag (VIF) in protected mode when set; disables
the VIF flag in protected mode when clear.
See also: Section 17.4, "Protected-Mode Virtual Interrupts."
TSD Time Stamp Disable (bit 2 of CR4). Restricts the execution of the
RDTSC instruction (including the RDTSCP instruction if
CPUID.80000001H:EDX[27] = 1) to procedures running at privilege level 0
when set; allows the RDTSC instruction (including the RDTSCP instruction if
CPUID.80000001H:EDX[27] = 1) to be executed at any privilege level when
clear.
DE Debugging Extensions (bit 3 of CR4). References to debug registers
DR4 and DR5 cause an undefined opcode (#UD) exception to be generated
when set; when clear, the processor aliases references to registers DR4 and DR5
for compatibility with software written to run on earlier IA-32 processors.
See also: Section 16.2.2, "Debug Registers DR4 and DR5."
PSE Page Size Extensions (bit 4 of CR4). Enables 4-MByte pages with 32-bit
paging when set; restricts 32-bit paging to pages of 4 KBytes when clear.
See also: Section 4.3, "32-Bit Paging."
PAE Physical Address Extension (bit 5 of CR4). When set, enables paging
to produce physical addresses with more than 32 bits. When clear, restricts
physical addresses to 32 bits. PAE must be set before entering IA-32e mode.
See also: Chapter 4, "Paging."
MCE Machine-Check Enable (bit 6 of CR4). Enables the machine-check
exception when set; disables the machine-check exception when clear.
See also: Chapter 15, "Machine-Check Architecture."
PGE Page Global Enable (bit 7 of CR4). (Introduced in the P6 family
processors.) Enables the global page feature when set; disables the global page
feature when clear. The global page feature allows frequently used or shared
pages to be marked as global to all users (done with the global flag, bit 8, in
a page-directory or page-table entry). Global pages are not flushed from the
translation-lookaside buffer (TLB) on a task switch or a write to register CR3.
When enabling the global page feature, paging must be enabled (by setting
the PG flag in control register CR0) before the PGE flag is set. Reversing this
sequence may affect program correctness, and processor performance will
be impacted.
See also: Section 4.10, "Caching Translation Information."
PCE Performance-Monitoring Counter Enable (bit 8 of CR4). Enables
execution of the RDPMC instruction for programs or procedures running at
any protection level when set; the RDPMC instruction can be executed only at
protection level 0 when clear.
OSFXSR
Operating System Support for FXSAVE and FXRSTOR instructions
(bit 9 of CR4). When set, this flag: (1) indicates to software that the
operating system supports the use of the FXSAVE and FXRSTOR instructions, (2)
enables the FXSAVE and FXRSTOR instructions to save and restore the
contents of the XMM and MXCSR registers along with the contents of the x87
FPU and MMX registers, and (3) enables the processor to execute
SSE/SSE2/SSE3/SSSE3/SSE4 instructions, with the exception of PAUSE,
PREFETCHh, SFENCE, LFENCE, MFENCE, MOVNTI, CLFLUSH, CRC32, and
POPCNT.
If this flag is clear, the FXSAVE and FXRSTOR instructions will save and
restore the contents of the x87 FPU and MMX registers, but they may not
save and restore the contents of the XMM and MXCSR registers. Also, the
processor will generate an invalid opcode exception (#UD) if it attempts to
execute any SSE/SSE2/SSE3 instruction, with the exception of PAUSE,
PREFETCHh, SFENCE, LFENCE, MFENCE, MOVNTI, CLFLUSH, CRC32, and
POPCNT. The operating system or executive must explicitly set this flag.
NOTE
The CPUID feature flag FXSR indicates availability of the
FXSAVE/FXRSTOR instructions. The OSFXSR bit provides operating
system software with a means of enabling FXSAVE/FXRSTOR to
save/restore the contents of the x87 FPU, XMM, and MXCSR registers.
Consequently, the OSFXSR bit indicates that the operating system
provides context switch support for SSE/SSE2/SSE3/SSSE3/SSE4.
OSXMMEXCPT
Operating System Support for Unmasked SIMD Floating-Point Exceptions
(bit 10 of CR4). When set, indicates that the operating system
supports the handling of unmasked SIMD floating-point exceptions through
an exception handler that is invoked when a SIMD floating-point exception
(#XF) is generated. SIMD floating-point exceptions are only generated by
SSE/SSE2/SSE3/SSE4.1 SIMD floating-point instructions.
The operating system or executive must explicitly set this flag. If this flag is
not set, the processor will generate an invalid opcode exception (#UD)
whenever it detects an unmasked SIMD floating-point exception.
VMXE
VMX-Enable Bit (bit 13 of CR4). Enables VMX operation when set. See
Chapter 20, "Introduction to Virtual-Machine Extensions."
SMXE
SMX-Enable Bit (bit 14 of CR4). Enables SMX operation when set. See
Chapter 6, "Safer Mode Extensions Reference," of the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 2B.
PCIDE
PCID-Enable Bit (bit 17 of CR4). Enables process-context identifiers
(PCIDs) when set. See Section 4.10.1, "Process-Context Identifiers
(PCIDs)." Can be set only in IA-32e mode (if IA32_EFER.LMA = 1).
OSXSAVE
XSAVE and Processor Extended States-Enable Bit (bit 18 of CR4).
When set, this flag: (1) indicates (via CPUID.01H:ECX.OSXSAVE[bit 27])
that the operating system supports the use of the XGETBV, XSAVE, and
XRSTOR instructions by general software; (2) enables the XSAVE and
XRSTOR instructions to save and restore the x87 FPU state (including MMX
registers), the SSE state (XMM registers and MXCSR), along with other
processor extended states enabled in the XFEATURE_ENABLED_MASK
register (XCR0); (3) enables the processor to execute XGETBV and XSETBV
instructions in order to read and write XCR0. See Section 2.6 and Chapter
13, "System Programming for Instruction Set Extensions and Processor
Extended States."
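The CR4 bit positions listed above lend themselves to a set of mask constants. The sketch below defines them and models two of the access rules stated in this section (TSD for RDTSC and PCE for RDPMC). It operates on a CR4 value passed in as a plain integer and does not read the register; the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CR4 flag masks, using the bit positions given in this section. */
#define CR4_VME        (UINT32_C(1) << 0)
#define CR4_PVI        (UINT32_C(1) << 1)
#define CR4_TSD        (UINT32_C(1) << 2)
#define CR4_DE         (UINT32_C(1) << 3)
#define CR4_PSE        (UINT32_C(1) << 4)
#define CR4_PAE        (UINT32_C(1) << 5)
#define CR4_MCE        (UINT32_C(1) << 6)
#define CR4_PGE        (UINT32_C(1) << 7)
#define CR4_PCE        (UINT32_C(1) << 8)
#define CR4_OSFXSR     (UINT32_C(1) << 9)
#define CR4_OSXMMEXCPT (UINT32_C(1) << 10)
#define CR4_VMXE       (UINT32_C(1) << 13)
#define CR4_SMXE       (UINT32_C(1) << 14)
#define CR4_PCIDE      (UINT32_C(1) << 17)
#define CR4_OSXSAVE    (UINT32_C(1) << 18)

/* RDTSC is restricted to privilege level 0 when TSD is set. */
bool rdtsc_allowed(uint32_t cr4, unsigned cpl)
{
    return cpl == 0 || (cr4 & CR4_TSD) == 0;
}

/* RDPMC is allowed at any level when PCE is set, else only at level 0. */
bool rdpmc_allowed(uint32_t cr4, unsigned cpl)
{
    return cpl == 0 || (cr4 & CR4_PCE) != 0;
}
```

Note the opposite polarity of the two flags: TSD restricts an instruction when set, whereas PCE permits one when set.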
TPL Task Priority Level (bits 3:0 of CR8). This sets the threshold value
corresponding to the highest-priority interrupt to be blocked. A value of 0 means
all interrupts are enabled. This field is available in 64-bit mode. A value of 15
means all interrupts will be disabled.
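A minimal sketch of the TPL threshold test follows. It rests on an assumption that is not stated in this section: that an interrupt's priority class is bits 7:4 of its vector, as defined for the local APIC. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: priority class = vector[7:4], per the local APIC definition. */
static unsigned priority_class(uint8_t vector)
{
    return vector >> 4;
}

/* Returns true if an interrupt with this vector is blocked by the TPL,
 * that is, if its priority class does not exceed the threshold. */
bool tpl_blocks(uint8_t vector, unsigned tpl)
{
    return priority_class(vector) <= (tpl & 0xFu);
}
```

Under this model a TPL of 15 blocks every vector, and a TPL of 0 leaves every vector from 16 upward deliverable, which matches the two boundary cases described above.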
2.5.1 CPUID Qualification of Control Register Flags
Not all flags in control register CR4 are implemented on all processors. With the
exception of the PCE flag, they can be qualified with the CPUID instruction to
determine if they are implemented on the processor before they are used.
The CR8 register is available on processors that support Intel 64 architecture.
2.6 EXTENDED CONTROL REGISTERS (INCLUDING THE
XFEATURE_ENABLED_MASK REGISTER)
If CPUID.01H:ECX.XSAVE[bit 26] is 1, the processor supports one or more
extended control registers (XCRs). Currently, the only such register defined is
XCR0, the XFEATURE_ENABLED_MASK register. This register specifies the set of
processor states that the operating system enables on that processor, e.g. x87 FPU
state, SSE state, and other processor extended states that Intel 64 architecture
may introduce in the future. The OS programs XCR0 to reflect the features it
supports.
Figure 2-7. XFEATURE_ENABLED_MASK Register (XCR0)
[Figure: XCR0 bit layout. Bit 63: reserved for XCR0 bit vector expansion.
Bits 62:2: reserved / future processor extended states (must be 0).
Bit 1: SSE state. Bit 0: x87 FPU/MMX state (must be 1).]
Software can access XCR0 only if CR4.OSXSAVE[bit 18] = 1. (This bit is also readable
as CPUID.01H:ECX.OSXSAVE[bit 27].) The layout of XCR0 is architected to allow
software to use CPUID leaf function 0DH to enumerate the set of bits that the
processor supports in XCR0 (see the CPUID instruction in the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 2A). Each processor state (x87 FPU
state, SSE state, or a future processor extended state) is represented by a bit in
XCR0. The OS can enable future processor extended states in a forward manner by
specifying the appropriate bit mask value using the XSETBV instruction according to
the results of the CPUID leaf 0DH.
With the exception of bit 63, each bit in the XFEATURE_ENABLED_MASK register
(XCR0) corresponds to a subset of the processor states. XCR0 thus provides space
for up to 63 sets of processor state extensions. Bit 63 of XCR0 is reserved for future
expansion and will not represent a processor extended state.
Currently, the XFEATURE_ENABLED_MASK register (XCR0) has two processor states
defined, with up to 61 bits reserved for future processor extended states:
XCR0.X87 (bit 0): If 1, indicates x87 FPU state (including MMX register states) is
supported in the processor. Bit 0 must be 1. An attempt to write 0 causes a #GP
exception.
XCR0.SSE (bit 1): If 1, indicates MXCSR and XMM registers (XMM0-XMM15 in
64-bit mode, otherwise XMM0-XMM7) are supported by XSAVE/XRSTOR in the
processor.
Any attempt to set a reserved bit (as determined by the contents of EAX and EDX
after executing CPUID with EAX=0DH, ECX=0H) in the XFEATURE_ENABLED_MASK
register for a given processor will result in a #GP exception. An attempt to write 0 to
XFEATURE_ENABLED_MASK.x87 (bit 0) will result in a #GP exception.
If a bit in the XFEATURE_ENABLED_MASK register is 1, the XSAVE instruction can
selectively (in conjunction with a save mask) save a partial or full set of processor
states to memory (see the XSAVE instruction in the Intel 64 and IA-32 Architectures
Software Developer's Manual, Volume 2B).
After reset, all bits (except bit 0) in the XFEATURE_ENABLED_MASK register (XCR0)
are cleared to zero. XCR0[0] is set to 1.
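The two #GP conditions described in this section can be expressed as a small validity check. The sketch below models only those two rules; it does not execute XSETBV or CPUID, and the supported-bit mask is passed in by the caller (on real hardware it would come from EDX:EAX after CPUID with EAX=0DH, ECX=0H). The names xcr0_write_faults, XCR0_X87, and XCR0_SSE are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define XCR0_X87 (UINT64_C(1) << 0)  /* x87 FPU/MMX state, must be 1 */
#define XCR0_SSE (UINT64_C(1) << 1)  /* MXCSR and XMM registers */

/* Returns true if writing new_value to XCR0 would raise #GP under the two
 * rules in this section. supported_mask holds the bits the processor
 * reports as supported (hypothetical parameter supplied by the caller). */
bool xcr0_write_faults(uint64_t new_value, uint64_t supported_mask)
{
    if ((new_value & XCR0_X87) == 0) {
        return true;            /* bit 0 must be written as 1 */
    }
    if ((new_value & ~supported_mask) != 0) {
        return true;            /* attempt to set a reserved bit */
    }
    return false;
}
```

With a supported mask of 3 (x87 and SSE only), the values 1 and 3 are accepted, while 2 (bit 0 clear) and 7 (reserved bit 2 set) are rejected.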
2.7 SYSTEM INSTRUCTION SUMMARY
System instructions handle system-level functions such as loading system registers,
managing the cache, managing interrupts, or setting up the debug registers. Many of
these instructions can be executed only by operating-system or executive
procedures (that is, procedures running at privilege level 0). Others can be executed at
any privilege level and are thus available to application programs.
Table 2-2 lists the system instructions and indicates whether they are available and
useful for application programs. These instructions are described in the Intel 64
and IA-32 Architectures Software Developer's Manual, Volumes 2A & 2B.
Table 2-2. Summary of System Instructions

Instruction     Description                           Useful to     Protected from
                                                      Application?  Application?
LLDT            Load LDT Register                     No            Yes
SLDT            Store LDT Register                    No            No
LGDT            Load GDT Register                     No            Yes
SGDT            Store GDT Register                    No            No
LTR             Load Task Register                    No            Yes
STR             Store Task Register                   No            No
LIDT            Load IDT Register                     No            Yes
SIDT            Store IDT Register                    No            No
MOV CRn         Load and store control registers      No            Yes
SMSW            Store MSW                             Yes           No
LMSW            Load MSW                              No            Yes
CLTS            Clear TS flag in CR0                  No            Yes
ARPL            Adjust RPL                            Yes [1, 5]    No
LAR             Load Access Rights                    Yes           No
LSL             Load Segment Limit                    Yes           No
VERR            Verify for Reading                    Yes           No
VERW            Verify for Writing                    Yes           No
MOV DRn         Load and store debug registers        No            Yes
INVD            Invalidate cache, no writeback        No            Yes
WBINVD          Invalidate cache, with writeback      No            Yes
INVLPG          Invalidate TLB entry                  No            Yes
HLT             Halt Processor                        No            Yes
LOCK (Prefix)   Bus Lock                              Yes           No
RSM             Return from system management mode    No            Yes
RDMSR [3]       Read Model-Specific Registers         No            Yes
WRMSR [3]       Write Model-Specific Registers        No            Yes
RDPMC [4]       Read Performance-Monitoring Counter   Yes           Yes [2]
RDTSC [3]       Read Time-Stamp Counter               Yes           Yes [2]
RDTSCP [7]      Read Serialized Time-Stamp Counter    Yes           Yes [2]
XGETBV          Return the state of the               Yes           No
                XFEATURE_ENABLED_MASK register
XSETBV          Enable one or more processor          No [6]        Yes
                extended states

NOTES:
1. Useful to application programs running at a CPL of 1 or 2.
2. The TSD and PCE flags in control register CR4 control access to these instructions by application
programs running at a CPL of 3.
3. These instructions were introduced into the IA-32 Architecture with the Pentium processor.
4. This instruction was introduced into the IA-32 Architecture with the Pentium Pro processor and
the Pentium processor with MMX technology.
5. This instruction is not supported in 64-bit mode.
6. Application uses XGETBV to query which set of processor extended states are enabled.
7. RDTSCP is introduced in Intel Core i7 processor.

2.7.1 Loading and Storing System Registers
The GDTR, LDTR, IDTR, and TR registers each have a load and store instruction for
loading data into and storing data from the register:
LGDT (Load GDTR Register). Loads the GDT base address and limit from
memory into the GDTR register.
SGDT (Store GDTR Register). Stores the GDT base address and limit from
the GDTR register into memory.
LIDT (Load IDTR Register). Loads the IDT base address and limit from
memory into the IDTR register.
SIDT (Store IDTR Register). Stores the IDT base address and limit from the
IDTR register into memory.
LLDT (Load LDT Register). Loads the LDT segment selector and segment
descriptor from memory into the LDTR. (The segment selector operand can also
be located in a general-purpose register.)
SLDT (Store LDT Register). Stores the LDT segment selector from the LDTR
register into memory or a general-purpose register.
LTR (Load Task Register). Loads segment selector and segment descriptor
for a TSS from memory into the task register. (The segment selector operand can
also be located in a general-purpose register.)
STR (Store Task Register). Stores the segment selector for the current task
TSS from the task register into memory or a general-purpose register.
The LMSW (load machine status word) and SMSW (store machine status word)
instructions operate on bits 0 through 15 of control register CR0. These instructions
are provided for compatibility with the 16-bit Intel 286 processor. Programs written
to run on 32-bit IA-32 processors should not use these instructions. Instead, they
should access the control register CR0 using the MOV instruction.
The CLTS (clear TS flag in CR0) instruction is provided for use in handling a
device-not-available exception (#NM) that occurs when the processor attempts to
execute a floating-point instruction when the TS flag is set. This instruction allows
the TS flag to be cleared after the x87 FPU context has been saved, preventing
further #NM exceptions. See Section 2.5, "Control Registers," for more information
on the TS flag.
The control registers (CR0, CR1, CR2, CR3, CR4, and CR8) are loaded using the MOV
instruction. The instruction loads a control register from a general-purpose register
or stores the content of a control register in a general-purpose register.
2.7.2 Verifying of Access Privileges
The processor provides several instructions for examining segment selectors
and segment descriptors to determine if access to their associated segments
is allowed. These instructions duplicate some of the automatic access rights
and type checking done by the processor, thus allowing operating-system or
executive software to prevent exceptions from being generated.
The ARPL (adjust RPL) instruction adjusts the RPL (requestor privilege level)
of a segment selector to match that of the program or procedure that
supplied the segment selector. See Section 5.10.4, "Checking Caller Access
Privileges (ARPL Instruction)," for a detailed explanation of the function and
use of this instruction. Note that ARPL is not supported in 64-bit mode.
The LAR (load access rights) instruction verifies the accessibility of a
specified segment and loads access rights information from the segment's
segment descriptor into a general-purpose register. Software can then
examine the access rights to determine if the segment type is compatible
with its intended use. See Section 5.10.1, "Checking Access Rights (LAR
Instruction)," for a detailed explanation of the function and use of this
instruction.
The LSL (load segment limit) instruction verifies the accessibility of a
specified segment and loads the segment limit from the segment's segment
descriptor into a general-purpose register. Software can then compare the
segment limit with an offset into the segment to determine whether the
offset lies within the segment. See Section 5.10.3, "Checking That the
Pointer Offset Is Within Limits (LSL Instruction)," for a detailed explanation
of the function and use of this instruction.
The VERR (verify for reading) and VERW (verify for writing) instructions
verify if a selected segment is readable or writable, respectively, at a given
CPL. See Section 5.10.2, "Checking Read/Write Rights (VERR and VERW
Instructions)," for a detailed explanation of the function and use of these
instructions.
2.7.3 Loading and Storing Debug Registers
Internal debugging facilities in the processor are controlled by a set of 8 debug
registers (DR0-DR7). The MOV instruction allows setup data to be loaded to and stored
from these registers.
On processors that support Intel 64 architecture, debug registers DR0-DR7 are 64
bits. In 32-bit modes and compatibility mode, writes to a debug register fill the upper
32 bits with zeros. Reads return the lower 32 bits. In 64-bit mode, the upper 32 bits
of DR6-DR7 are reserved and must be written with zeros. Writing one to any of the
upper 32 bits causes an exception, #GP(0).
In 64-bit mode, MOV DRn instructions read or write all 64 bits of a debug register
(operand-size prefixes are ignored). All 64 bits of DR0-DR3 are writable by software.
However, MOV DRn instructions do not check that addresses written to DR0-DR3 are
in the limits of the implementation. Address matching is supported only on valid
addresses generated by the processor implementation.
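The width rules above can be summarized in a short model. This is a sketch of the stated behavior only; it does not touch the debug registers, it ignores the DR4/DR5 aliasing controlled by CR4.DE, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* 64-bit mode: a write to DR6 or DR7 with any of the upper 32 bits set
 * raises #GP(0). DR0-DR3 accept all 64 bits. */
bool dr_write_faults_64bit(unsigned dr_index, uint64_t value)
{
    bool reserved_upper = (dr_index == 6 || dr_index == 7);
    return reserved_upper && (value >> 32) != 0;
}

/* 32-bit and compatibility modes: a write zero-fills the upper 32 bits. */
uint64_t dr_after_legacy_write(uint32_t value)
{
    return (uint64_t)value;
}

/* 32-bit and compatibility modes: a read returns the lower 32 bits. */
uint32_t dr_legacy_read(uint64_t dr_value)
{
    return (uint32_t)dr_value;
}
```

For instance, writing 0x100000400 to DR7 in 64-bit mode faults under this model, while the same value written to DR0 does not.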
2.7.4 Invalidating Caches and TLBs
The processor provides several instructions for use in explicitly invalidating its caches
and TLB entries. The INVD (invalidate cache with no writeback) instruction
invalidates all data and instruction entries in the internal caches and sends a signal to the
external caches indicating that they should also be invalidated.
The WBINVD (invalidate cache with writeback) instruction performs the same
function as the INVD instruction, except that it writes back modified lines in its internal
caches to memory before it invalidates the caches. After invalidating the internal
caches, WBINVD signals external caches to write back modified data and invalidate
their contents.
The INVLPG (invalidate TLB entry) instruction invalidates (flushes) the TLB entry for
a specified page.
2.7.5 Controlling the Processor
The HLT (halt processor) instruction stops the processor until an enabled interrupt
(such as NMI or SMI, which are normally enabled), a debug exception, the BINIT#
signal, the INIT# signal, or the RESET# signal is received. The processor generates a
special bus cycle to indicate that the halt mode has been entered.
Hardware may respond to this signal in a number of ways. An indicator light on the
front panel may be turned on. An NMI interrupt for recording diagnostic information
may be generated. Reset initialization may be invoked (note that the BINIT# pin was
introduced with the Pentium Pro processor). If any non-wake events are pending
during shutdown, they will be handled after the wake event from shutdown is
processed (for example, A20M# interrupts).
The LOCK prefix invokes a locked (atomic) read-modify-write operation when
modifying a memory operand. This mechanism is used to allow reliable communications
between processors in multiprocessor systems, as described below:
In the Pentium processor and earlier IA-32 processors, the LOCK prefix causes
the processor to assert the LOCK# signal during the instruction. This always
causes an explicit bus lock to occur.
In the Pentium 4, Intel Xeon, and P6 family processors, the locking operation is
handled with either a cache lock or bus lock. If a memory access is cacheable and
affects only a single cache line, a cache lock is invoked and the system bus and
the actual memory location in system memory are not locked during the
operation. Here, other Pentium 4, Intel Xeon, or P6 family processors on the bus
write back any modified data and invalidate their caches as necessary to
maintain system memory coherency. If the memory access is not cacheable
and/or it crosses a cache line boundary, the processor's LOCK# signal is asserted
and the processor does not respond to requests for bus control during the locked
operation.
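The cache-lock versus bus-lock rule for Pentium 4, Intel Xeon, and P6 family processors reduces to two tests: whether the access is cacheable and whether it stays within one cache line. The sketch below models that decision. The cache line size is left as a parameter because this section does not give one, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { LOCK_CACHE, LOCK_BUS } lock_kind;

/* True if the byte range [addr, addr + size) spans more than one line. */
bool crosses_cache_line(uint64_t addr, size_t size, size_t line_size)
{
    assert(size > 0 && line_size > 0);
    uint64_t first_line = addr / line_size;
    uint64_t last_line = (addr + size - 1) / line_size;
    return first_line != last_line;
}

/* A cache lock is used only for a cacheable access confined to a single
 * cache line; every other case falls back to asserting LOCK#. */
lock_kind lock_kind_for(bool cacheable, uint64_t addr, size_t size,
                        size_t line_size)
{
    if (cacheable && !crosses_cache_line(addr, size, line_size)) {
        return LOCK_CACHE;
    }
    return LOCK_BUS;
}
```

Assuming a 64-byte line, an 8-byte locked access at offset 56 stays within one line and is handled with a cache lock, while the same access at offset 60 crosses into the next line and requires a bus lock. Keeping locked operands naturally aligned avoids the second case.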
The RSM (return from SMM) instruction restores the processor (from a context
dump) to the state it was in prior to a system management mode (SMM) interrupt.
2.7.6 Reading Performance-Monitoring and Time-Stamp Counters
The RDPMC (read performance-monitoring counter) and RDTSC (read time-stamp
counter) instructions allow application programs to read the processor's
performance-monitoring and time-stamp counters, respectively. Processors based on Intel
NetBurst microarchitecture have eighteen 40-bit performance-monitoring
counters; P6 family processors have two 40-bit counters. Intel Atom processors
and most of the processors based on the Intel Core microarchitecture support two
types of performance monitoring counters: two programmable performance
counters similar to those available in the P6 family, and three fixed-function
performance monitoring counters.
The programmable performance counters can support counting either the occurrence
or duration of events. Events that can be monitored on programmable counters
generally are model specific (except for architectural performance events
enumerated by CPUID leaf 0AH); they may include the number of instructions decoded,
interrupts received, or the number of cache loads. Individual counters can be set up
to monitor different events. Use the system instruction WRMSR to set up values in
IA32_PERFEVTSEL0/1 (for Intel Atom, Intel Core 2, Intel Core Duo, and Intel
Pentium M processors); in one of the 45 ESCRs and one of the 18 CCCR MSRs (for
Pentium 4 and Intel Xeon processors); or in the PerfEvtSel0 or the PerfEvtSel1 MSR
(for the P6 family processors). The RDPMC instruction loads the current count from
the selected counter into the EDX:EAX registers.
Fixed-function performance counters record only specific events that are defined in
Chapter 20, "Introduction to Virtual-Machine Extensions," and the width/number of
fixed-function counters are enumerated by CPUID leaf 0AH.
The time-stamp counter is a model-specific 64-bit counter that is reset to zero each
time the processor is reset. If not reset, the counter will increment ~9.5 x 10^16
times per year when the processor is operating at a clock rate of 3 GHz. At this
clock frequency, it would take over 190 years for the counter to wrap around. The
RDTSC instruction loads the current count of the time-stamp counter into the
EDX:EAX registers.
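The figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes a year of 365.25 days and a constant increment rate equal to the clock frequency; both are assumptions made for the estimate, and the helper names are hypothetical.

```c
/* Seconds in a year of 365.25 days (an assumption made for this estimate). */
#define SECONDS_PER_YEAR (365.25 * 24.0 * 3600.0)

/* 2^64, the number of distinct values of a 64-bit counter. */
#define TSC_MODULUS 18446744073709551616.0

/* Increments per year for a counter ticking at clock_hz. */
double tsc_ticks_per_year(double clock_hz)
{
    return clock_hz * SECONDS_PER_YEAR;
}

/* Years until a 64-bit counter that started at zero wraps around. */
double tsc_years_to_wrap(double clock_hz)
{
    return TSC_MODULUS / tsc_ticks_per_year(clock_hz);
}
```

At 3.0e9 Hz these evaluate to roughly 9.47 x 10^16 increments per year and about 195 years to wrap, consistent with the values given above.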
See Sect ion 30. 1, Performance Monit oring Overview, and Sect ion 16. 11, Time-
St amp Count er, for more informat ion about t he performance monit oring and t ime-
st amp count ers.
The RDTSC inst ruct ion was int roduced int o t he I A- 32 archit ect ure wit h t he Pent ium
processor. The RDPMC inst ruct ion was int roduced int o t he I A- 32 archit ect ure wit h t he
Pent ium Pro processor and t he Pent ium processor wit h MMX t echnology. Earlier
Pent ium processors have t wo performance- monit oring count ers, but t hey can be
read only wit h t he RDMSR inst ruct ion, and only at privilege level 0.
2.7.6.1 Reading Counters in 64-Bit Mode
In 64-bit mode, RDTSC operates the same as in protected mode. The count in the
time-stamp counter is stored in EDX:EAX (or RDX[31:0]:RAX[31:0] with
RDX[63:32]:RAX[63:32] cleared).
RDPMC requires an index to specify the offset of the performance-monitoring
counter. In 64-bit mode for Pentium 4 or Intel Xeon processor families, the index is
specified in ECX[30:0]. The current count of the performance-monitoring counter is
stored in EDX:EAX (or RDX[31:0]:RAX[31:0] with RDX[63:32]:RAX[63:32]
cleared).
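The EDX:EAX convention described above can be modeled in software. The sketch below is illustrative only: it does not execute RDTSC or RDPMC, and the function names are hypothetical. It shows how a 64-bit count is split across the two registers in 64-bit mode and how software recombines it.

```python
MASK32 = 0xFFFFFFFF

def split_counter(count):
    """Return (rdx, rax) as a 64-bit-mode counter read would leave them."""
    rax = count & MASK32           # low half of the count; RAX[63:32] cleared
    rdx = (count >> 32) & MASK32   # high half of the count; RDX[63:32] cleared
    return rdx, rax

def combine_edx_eax(rdx, rax):
    """Rebuild the 64-bit count from the EDX:EAX pair."""
    return ((rdx & MASK32) << 32) | (rax & MASK32)

rdx, rax = split_counter(0x0123456789ABCDEF)
print(hex(rdx), hex(rax), hex(combine_edx_eax(rdx, rax)))
```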
2.7.7 Reading and Writing Model-Specific Registers
The RDMSR (read model-specific register) and WRMSR (write model-specific
register) instructions allow a processor's 64-bit model-specific registers (MSRs) to be
read and written, respectively. The MSR to be read or written is specified by the value
in the ECX register.
RDMSR reads the value from the specified MSR to the EDX:EAX registers; WRMSR
writes the value in the EDX:EAX registers to the specified MSR. RDMSR and WRMSR
were introduced into the IA-32 architecture with the Pentium processor.
See Section 9.4, "Model-Specific Registers (MSRs)," for more information.
2.7.7.1 Reading and Writing Model-Specific Registers in 64-Bit Mode
RDMSR and WRMSR require an index to specify the address of an MSR. In 64-bit
mode, the index is 32 bits; it is specified using ECX.
2.7.8 Enabling Processor Extended States
The XSETBV instruction is required to enable OS support of individual processor
extended states in the XFEATURE_ENABLED_MASK register (see Section 2.6).
CHAPTER 3
PROTECTED-MODE MEMORY MANAGEMENT
This chapter describes the Intel 64 and IA-32 architecture's protected-mode memory
management facilities, including the physical memory requirements, segmentation
mechanism, and paging mechanism.
See also: Chapter 5, "Protection" (for a description of the processor's protection
mechanism) and Chapter 17, "8086 Emulation" (for a description of memory
addressing protection in real-address and virtual-8086 modes).
3.1 MEMORY MANAGEMENT OVERVIEW
The memory management facilities of the IA-32 architecture are divided into two
parts: segmentation and paging. Segmentation provides a mechanism of isolating
individual code, data, and stack modules so that multiple programs (or tasks) can
run on the same processor without interfering with one another. Paging provides a
mechanism for implementing a conventional demand-paged, virtual-memory system
where sections of a program's execution environment are mapped into physical
memory as needed. Paging can also be used to provide isolation between multiple
tasks. When operating in protected mode, some form of segmentation must be used.
There is no mode bit to disable segmentation. The use of paging, however, is
optional.
These two mechanisms (segmentation and paging) can be configured to support
simple single-program (or single-task) systems, multitasking systems, or multiple-
processor systems that use shared memory.
As shown in Figure 3-1, segmentation provides a mechanism for dividing the
processor's addressable memory space (called the linear address space) into
smaller protected address spaces called segments. Segments can be used to hold
the code, data, and stack for a program or to hold system data structures (such as a
TSS or LDT). If more than one program (or task) is running on a processor, each
program can be assigned its own set of segments. The processor then enforces the
boundaries between these segments and ensures that one program does not interfere
with the execution of another program by writing into the other program's segments.
The segmentation mechanism also allows typing of segments so that the operations
that may be performed on a particular type of segment can be restricted.
All the segments in a system are contained in the processor's linear address space.
To locate a byte in a particular segment, a logical address (also called a far pointer)
must be provided. A logical address consists of a segment selector and an offset. The
segment selector is a unique identifier for a segment. Among other things, it provides
an offset into a descriptor table (such as the global descriptor table, GDT) to a data
structure called a segment descriptor. Each segment has a segment descriptor, which
specifies the size of the segment, the access rights and privilege level for the
segment, the segment type, and the location of the first byte of the segment in the
linear address space (called the base address of the segment). The offset part of the
logical address is added to the base address for the segment to locate a byte within
the segment. The base address plus the offset thus forms a linear address in the
processor's linear address space.
If paging is not used, the linear address space of the processor is mapped directly
into the physical address space of the processor. The physical address space is
defined as the range of addresses that the processor can generate on its address bus.
Because multitasking computing systems commonly define a linear address space
much larger than it is economically feasible to contain all at once in physical memory,
some method of "virtualizing" the linear address space is needed. This virtualization
of the linear address space is handled through the processor's paging mechanism.
Figure 3-1. Segmentation and Paging
Paging supports a "virtual memory" environment where a large linear address space
is simulated with a small amount of physical memory (RAM and ROM) and some disk
storage. When using paging, each segment is divided into pages (typically 4 KBytes
each in size), which are stored either in physical memory or on the disk. The
operating system or executive maintains a page directory and a set of page tables to
keep track of the pages. When a program (or task) attempts to access an address
location in the linear address space, the processor uses the page directory and page
tables to translate the linear address into a physical address and then performs the
requested operation (read or write) on the memory location.
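The page directory and page table lookup described above can be sketched as a small software model. This is an illustration under stated assumptions, not the processor's implementation: it assumes 32-bit paging with 4-KByte pages, where the linear address splits into a 10-bit directory index, a 10-bit table index, and a 12-bit offset (the formats are defined in Chapter 4, not in this section). The dictionaries standing in for paging structures and the `PageFault` class are hypothetical.

```python
class PageFault(Exception):
    """Stands in for the page-fault exception in this model."""

def split_linear_address(linear):
    """Split a 32-bit linear address into (directory, table, offset)."""
    linear &= 0xFFFFFFFF
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF

def translate(page_directory, linear):
    """Map a linear address to a physical address, or raise PageFault."""
    directory, table, offset = split_linear_address(linear)
    page_table = page_directory.get(directory)
    if page_table is None or table not in page_table:
        raise PageFault(hex(linear))      # page is not in physical memory
    return page_table[table] | offset     # page base is 4-KByte aligned

# Hypothetical mapping: directory entry 1, table entry 2 -> page at 0x00ABC000.
page_directory = {1: {2: 0x00ABC000}}
print(hex(translate(page_directory, 0x00402345)))
```

A real operating system would respond to the fault by loading the page and retrying the access, as the following paragraph describes.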
If the page being accessed is not currently in physical memory, the processor
interrupts execution of the program (by generating a page-fault exception). The
operating system or executive then reads the page into physical memory from the disk
and continues executing the program.
When paging is implemented properly in the operating system or executive, the
swapping of pages between physical memory and the disk is transparent to the
correct execution of a program. Even programs written for 16-bit IA-32 processors
can be paged (transparently) when they are run in virtual-8086 mode.
3.2 USING SEGMENTS
The segmentation mechanism supported by the IA-32 architecture can be used to
implement a wide variety of system designs. These designs range from flat models
that make only minimal use of segmentation to protect programs to multi-segmented
models that employ segmentation to create a robust operating environment in which
multiple programs and tasks can be executed reliably.
The following sections give several examples of how segmentation can be employed
in a system to improve memory management performance and reliability.
3.2.1 Basic Flat Model
The simplest memory model for a system is the basic "flat model," in which the
operating system and application programs have access to a continuous, unsegmented
address space. To the greatest extent possible, this basic flat model hides the
segmentation mechanism of the architecture from both the system designer and the
application programmer.
To implement a basic flat memory model with the IA-32 architecture, at least two
segment descriptors must be created, one for referencing a code segment and one
for referencing a data segment (see Figure 3-2). Both of these segments, however,
are mapped to the entire linear address space: that is, both segment descriptors
have the same base address value of 0 and the same segment limit of 4 GBytes. By
setting the segment limit to 4 GBytes, the segmentation mechanism is kept from
generating exceptions for out-of-limit memory references, even if no physical
memory resides at a particular address. ROM (EPROM) is generally located at the top
of the physical address space, because the processor begins execution at
FFFF_FFF0H. RAM (DRAM) is placed at the bottom of the address space because the
initial base address for the DS data segment after reset initialization is 0.
3.2.2 Protected Flat Model
The protected flat model is similar to the basic flat model, except the segment limits
are set to include only the range of addresses for which physical memory actually
exists (see Figure 3-3). A general-protection exception (#GP) is then generated on
any attempt to access nonexistent memory. This model provides a minimum level of
hardware protection against some kinds of program bugs.
Figure 3-2. Flat Model
Figure 3-3. Protected Flat Model
More complexity can be added to this protected flat model to provide more
protection. For example, for the paging mechanism to provide isolation between user
and supervisor code and data, four segments need to be defined: code and data
segments at privilege level 3 for the user, and code and data segments at privilege
level 0 for the supervisor. Usually these segments all overlay each other and start at
address 0 in the linear address space. This flat segmentation model along with a
simple paging structure can protect the operating system from applications, and by
adding a separate paging structure for each task or process, it can also protect
applications from each other. Similar designs are used by several popular
multitasking operating systems.
3.2.3 Multi-Segment Model
A multi-segment model (such as the one shown in Figure 3-4) uses the full
capabilities of the segmentation mechanism to provide hardware-enforced protection of
code, data structures, and programs and tasks. Here, each program (or task) is given
its own table of segment descriptors and its own segments. The segments can be
completely private to their assigned programs or shared among programs. Access to
all segments and to the execution environments of individual programs running on
the system is controlled by hardware.
Access checks can be used to protect not only against referencing an address outside
the limit of a segment, but also against performing disallowed operations in certain
segments. For example, since code segments are designated as read-only segments,
hardware can be used to prevent writes into code segments. The access rights
information created for segments can also be used to set up protection rings or levels.
Protection levels can be used to protect operating-system procedures from
unauthorized access by application programs.
3.2.4 Segmentation in IA-32e Mode
In IA-32e mode of Intel 64 architecture, the effects of segmentation depend on
whether the processor is running in compatibility mode or 64-bit mode. In
compatibility mode, segmentation functions just as it does using legacy 16-bit or 32-bit
protected mode semantics.
Figure 3-4. Multi-Segment Model
In 64-bit mode, segmentation is generally (but not completely) disabled, creating a
flat 64-bit linear-address space. The processor treats the segment base of CS, DS,
ES, SS as zero, creating a linear address that is equal to the effective address. The FS
and GS segments are exceptions. These segment registers (which hold the segment
base) can be used as additional base registers in linear address calculations. They
facilitate addressing local data and certain operating system data structures.
Note that the processor does not perform segment limit checks at runtime in 64-bit
mode.
3.2.5 Paging and Segmentation
Paging can be used with any of the segmentation models described in Figures 3-2,
3-3, and 3-4. The processor's paging mechanism divides the linear address space
(into which segments are mapped) into pages (as shown in Figure 3-1). These linear-
address-space pages are then mapped to pages in the physical address space. The
paging mechanism offers several page-level protection facilities that can be used
with or instead of the segment-protection facilities. For example, it lets read-write
protection be enforced on a page-by-page basis. The paging mechanism also
provides two-level user-supervisor protection that can also be specified on a
page-by-page basis.
3.3 PHYSICAL ADDRESS SPACE
In protected mode, the IA-32 architecture provides a normal physical address space
of 4 GBytes ($2^{32}$ bytes). This is the address space that the processor can address
on its address bus. This address space is flat (unsegmented), with addresses ranging
continuously from 0 to FFFFFFFFH. This physical address space can be mapped to
read-write memory, read-only memory, and memory mapped I/O. The memory
mapping facilities described in this chapter can be used to divide this physical
memory up into segments and/or pages.
Starting with the Pentium Pro processor, the IA-32 architecture also supports an
extension of the physical address space to $2^{36}$ bytes (64 GBytes), with a maximum
physical address of FFFFFFFFFH. This extension is invoked in either of two ways:
• Using the physical address extension (PAE) flag, located in bit 5 of control
register CR4.
• Using the 36-bit page size extension (PSE-36) feature (introduced in the Pentium
III processors).
Physical address support has since been extended beyond 36 bits. See Chapter 4,
"Paging," for more information about 36-bit physical addressing.
3.3.1 Intel 64 Processors and Physical Address Space
On processors that support Intel 64 architecture (CPUID.80000001H:EDX[29] = 1),
the size of the physical address range is implementation-specific and indicated by
CPUID.80000008H:EAX[bits 7-0].
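As an illustration of how software might interpret that field, the sketch below extracts bits 7-0 from a value assumed to have been returned in EAX by CPUID leaf 80000008H and derives the highest physical address. It does not execute CPUID; the function names and the sample EAX value are hypothetical.

```python
def physical_address_bits(cpuid_80000008_eax):
    """Physical-address width reported in EAX[7:0]; other bits are ignored."""
    return cpuid_80000008_eax & 0xFF

def max_physical_address(cpuid_80000008_eax):
    """Highest physical address for the reported width."""
    return (1 << physical_address_bits(cpuid_80000008_eax)) - 1

sample_eax = 0x00003024   # hypothetical value; EAX[7:0] = 24H (36 bits)
print(physical_address_bits(sample_eax), hex(max_physical_address(sample_eax)))
```

With a reported width of 36 bits, the result matches the maximum physical address of FFFFFFFFFH given earlier for the 36-bit extension.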
For the format of information returned in EAX, see "CPUID - CPU Identification" in
Chapter 3 of the Intel 64 and IA-32 Architectures Software Developer's Manual,
Volume 2A. See also: Chapter 4, "Paging."
3.4 LOGICAL AND LINEAR ADDRESSES
At the system-architecture level in protected mode, the processor uses two stages of
address translation to arrive at a physical address: logical-address translation and
linear address space paging.
Even with the minimum use of segments, every byte in the processor's address
space is accessed with a logical address. A logical address consists of a 16-bit
segment selector and a 32-bit offset (see Figure 3-5). The segment selector
identifies the segment the byte is located in and the offset specifies the location of the
byte in the segment relative to the base address of the segment.
The processor translates every logical address into a linear address. A linear address
is a 32-bit address in the processor's linear address space. Like the physical address
space, the linear address space is a flat (unsegmented), $2^{32}$-byte address space,
with addresses ranging from 0 to FFFFFFFFH. The linear address space contains all
the segments and system tables defined for a system.
To translate a logical address into a linear address, the processor does the following:
1. Uses the offset in the segment selector to locate the segment descriptor for the
segment in the GDT or LDT and reads it into the processor. (This step is needed
only when a new segment selector is loaded into a segment register.)
2. Examines the segment descriptor to check the access rights and range of the
segment to ensure that the segment is accessible and that the offset is within the
limits of the segment.
3. Adds the base address of the segment from the segment descriptor to the offset
to form a linear address.
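The three steps above can be sketched as a simplified software model. This is an illustration only, under several assumptions: the descriptor is reduced to a base and a byte-granular limit, only expand-up segments are modeled, access-rights checks and the GDT/LDT choice are omitted, and the names `Descriptor`, `GeneralProtectionFault`, and `logical_to_linear` are hypothetical.

```python
from collections import namedtuple

# Hypothetical, simplified descriptor: base address and byte-granular limit only.
Descriptor = namedtuple("Descriptor", ["base", "limit"])

class GeneralProtectionFault(Exception):
    """Stands in for the #GP exception in this model."""

def logical_to_linear(table, selector, offset):
    # Step 1: use the selector's index (bits 15:3) to find the descriptor.
    index = (selector & 0xFFFF) >> 3
    if index >= len(table) or table[index] is None:
        raise GeneralProtectionFault("no usable descriptor for selector")
    descriptor = table[index]
    # Step 2: check that the offset lies within the segment limit
    # (expand-up segment; access-rights checks are omitted here).
    if offset > descriptor.limit:
        raise GeneralProtectionFault("offset beyond segment limit")
    # Step 3: add the segment base to the offset; 32-bit linear address.
    return (descriptor.base + offset) & 0xFFFFFFFF

# Entry 0 is left unused; entry 1 describes a 64-KByte segment at 00100000H.
gdt = [None, Descriptor(base=0x00100000, limit=0xFFFF)]
print(hex(logical_to_linear(gdt, 0x0008, 0x1234)))
```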
If paging is not used, the processor maps the linear address directly to a physical
address (that is, the linear address goes out on the processor's address bus). If the
linear address space is paged, a second level of address translation is used to
translate the linear address into a physical address.
See also: Chapter 4, "Paging."
3.4.1 Logical Address Translation in IA-32e Mode
In IA-32e mode, an Intel 64 processor uses the steps described above to translate a
logical address to a linear address. In 64-bit mode, the offset and base address of the
segment are 64 bits instead of 32 bits. The linear address format is also 64 bits wide
and is subject to the canonical form requirement.
Each code segment descriptor provides an L bit. This bit selects, on a per-code-
segment basis, whether the segment executes 64-bit code or legacy 32-bit code.
3.4.2 Segment Selectors
A segment selector is a 16-bit identifier for a segment (see Figure 3-6). It does not
point directly to the segment, but instead points to the segment descriptor that
defines the segment. A segment selector contains the following items:
Index
(Bits 3 through 15) Selects one of 8192 descriptors in the GDT or
LDT. The processor multiplies the index value by 8 (the number of
bytes in a segment descriptor) and adds the result to the base address
of the GDT or LDT (from the GDTR or LDTR register, respectively).
TI (table indicator) flag
(Bit 2) Specifies the descriptor table to use: clearing this flag
selects the GDT; setting this flag selects the current LDT.
Requested Privilege Level (RPL)
(Bits 0 and 1) Specifies the privilege level of the selector. The
privilege level can range from 0 to 3, with 0 being the most privileged
level. See Section 5.5, "Privilege Levels," for a description of the
relationship of the RPL to the CPL of the executing program (or task) and
the descriptor privilege level (DPL) of the descriptor the segment
selector points to.
Figure 3-5. Logical Address to Linear Address Translation
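The selector fields listed above can be extracted with simple shifts and masks. The following is a minimal sketch for illustration; the function names and the table base addresses used in the example are hypothetical.

```python
def decode_selector(selector):
    """Split a 16-bit segment selector into its three fields."""
    selector &= 0xFFFF
    return {
        "index": selector >> 3,      # bits 15:3, one of 8192 descriptors
        "ti": (selector >> 2) & 1,   # bit 2: 0 = GDT, 1 = current LDT
        "rpl": selector & 3,         # bits 1:0, requested privilege level
    }

def descriptor_address(selector, gdt_base, ldt_base):
    """Linear address of the descriptor the selector refers to."""
    fields = decode_selector(selector)
    table_base = ldt_base if fields["ti"] else gdt_base
    return table_base + fields["index"] * 8   # 8 bytes per descriptor

print(decode_selector(0x002B))
print(hex(descriptor_address(0x002B, gdt_base=0x1000, ldt_base=0x2000)))
```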
The first entry of the GDT is not used by the processor. A segment selector that points
to this entry of the GDT (that is, a segment selector with an index of 0 and the TI flag
set to 0) is used as a "null segment selector." The processor does not generate an
exception when a segment register (other than the CS or SS registers) is loaded with
a null selector. It does, however, generate an exception when a segment register
holding a null selector is used to access memory. A null selector can be used to
initialize unused segment registers. Loading the CS or SS register with a null
segment selector causes a general-protection exception (#GP) to be generated.
Segment selectors are visible to application programs as part of a pointer variable,
but the values of selectors are usually assigned or modified by link editors or linking
loaders, not application programs.
3.4.3 Segment Registers
To reduce address translation time and coding complexity, the processor provides
registers for holding up to 6 segment selectors (see Figure 3-7). Each of these
segment registers supports a specific kind of memory reference (code, stack, or
data). For virtually any kind of program execution to take place, at least the code-
segment (CS), data-segment (DS), and stack-segment (SS) registers must be
loaded with valid segment selectors. The processor also provides three additional
data-segment registers (ES, FS, and GS), which can be used to make additional data
segments available to the currently executing program (or task).
Figure 3-6. Segment Selector (bits 15:3 Index; bit 2 Table Indicator, 0 = GDT,
1 = LDT; bits 1:0 Requested Privilege Level)
For a program to access a segment, the segment selector for the segment must have
been loaded in one of the segment registers. So, although a system can define
thousands of segments, only 6 can be available for immediate use. Other segments can
be made available by loading their segment selectors into these registers during
program execution.
Every segment register has a "visible" part and a "hidden" part. (The hidden part is
sometimes referred to as a "descriptor cache" or a "shadow register.") When a
segment selector is loaded into the visible part of a segment register, the processor
also loads the hidden part of the segment register with the base address, segment
limit, and access control information from the segment descriptor pointed to by the
segment selector. The information cached in the segment register (visible and
hidden) allows the processor to translate addresses without taking extra bus cycles
to read the base address and limit from the segment descriptor. In systems in which
multiple processors have access to the same descriptor tables, it is the responsibility
of software to reload the segment registers when the descriptor tables are modified.
If this is not done, an old segment descriptor cached in a segment register might be
used after its memory-resident version has been modified.
Two kinds of load instructions are provided for loading the segment registers:
1. Direct load instructions such as the MOV, POP, LDS, LES, LSS, LGS, and LFS
instructions. These instructions explicitly reference the segment registers.
2. Implied load instructions such as the far pointer versions of the CALL, JMP, and
RET instructions, the SYSENTER and SYSEXIT instructions, and the IRET, INT n,
INTO and INT3 instructions. These instructions change the contents of the CS
register (and sometimes other segment registers) as an incidental part of their
operation.
The MOV instruction can also be used to store the visible part of a segment register
in a general-purpose register.
Figure 3-7. Segment Registers (CS, SS, DS, ES, FS, GS; visible part: segment
selector; hidden part: base address, limit, access information)
3.4.4 Segment Loading Instructions in IA-32e Mode
Because ES, DS, and SS segment registers are not used in 64-bit mode, their fields
(base, limit, and attribute) in segment descriptor registers are ignored. Some forms
of segment load instructions are also invalid (for example, LDS, POP ES). Address
calculations that reference the ES, DS, or SS segments are treated as if the segment
base is zero.
The processor checks that all linear-address references are in canonical form instead
of performing limit checks. Mode switching does not change the contents of the
segment registers or the associated descriptor registers. These registers are also not
changed during 64-bit mode execution, unless explicit segment loads are performed.
In order to set up compatibility mode for an application, segment-load instructions
(MOV to Sreg, POP Sreg) work normally in 64-bit mode. An entry is read from the
system descriptor table (GDT or LDT) and is loaded in the hidden portion of the
segment descriptor register. The descriptor-register base, limit, and attribute fields
are all loaded. However, the contents of the data and stack segment selector and the
descriptor registers are ignored.
When FS and GS segment overrides are used in 64-bit mode, their respective base
addresses are used in the linear address calculation: (FS or GS).base + index +
displacement. FS.base and GS.base are then expanded to the full linear-address size
supported by the implementation. The resulting effective address calculation can
wrap across positive and negative addresses; the resulting linear address must be
canonical.
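The calculation above can be modeled as 64-bit modular arithmetic followed by a canonical-form check. The sketch below is illustrative only. It assumes an implementation with 48 linear-address bits, so that an address is canonical when bits 63 through 47 are all equal; the supported width is implementation-specific and is not stated in this section. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def is_canonical(address, implemented_bits=48):
    """True if bits 63..(implemented_bits - 1) are all 0 or all 1.

    The 48-bit default is an assumption made for this sketch.
    """
    upper = (address & MASK64) >> (implemented_bits - 1)
    all_ones = (1 << (64 - implemented_bits + 1)) - 1
    return upper == 0 or upper == all_ones

def fs_gs_linear_address(segment_base, index, displacement):
    """(FS or GS).base + index + displacement, wrapping at 64 bits."""
    return (segment_base + index + displacement) & MASK64

address = fs_gs_linear_address(0xFFFF800000001000, 0x20, -0x8)
print(hex(address), is_canonical(address))
```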
In 64-bit mode, memory accesses using FS-segment and GS-segment overrides are
not checked for a runtime limit nor subjected to attribute-checking. Normal segment
loads (MOV to Sreg and POP Sreg) into FS and GS load a standard 32-bit base value
in the hidden portion of the segment descriptor register. The base address bits above
the standard 32 bits are cleared to 0 to allow consistency for implementations that
use fewer than 64 bits.
The hidden descriptor register fields for FS.base and GS.base are physically mapped
to MSRs in order to load all address bits supported by a 64-bit implementation.
Software with CPL = 0 (privileged software) can load all supported linear-address bits
into FS.base or GS.base using WRMSR. Addresses written into the 64-bit FS.base and
GS.base registers must be in canonical form. A WRMSR instruction that attempts to
write a non-canonical address to those registers causes a #GP fault.
When in compatibility mode, FS and GS overrides operate as defined by 32-bit mode
behavior regardless of the value loaded into the upper 32 linear-address bits of the
hidden descriptor register base field. Compatibility mode ignores the upper 32 bits
when calculating an effective address.
A new 64-bit mode instruction, SWAPGS, can be used to load GS base. SWAPGS
exchanges the kernel data structure pointer from the IA32_KernelGSbase MSR with
the GS base register. The kernel can then use the GS prefix on normal memory
references to access the kernel data structures. An attempt to write a non-canonical
value (using WRMSR) to the IA32_KernelGSbase MSR causes a #GP fault.
3.4.5 Segment Descriptors
A segment descriptor is a data structure in a GDT or LDT that provides the processor
with the size and location of a segment, as well as access control and status
information. Segment descriptors are typically created by compilers, linkers, loaders,
or the operating system or executive, but not application programs. Figure 3-8
illustrates the general descriptor format for all types of segment descriptors.
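As an illustration of the general descriptor format, the following sketch unpacks the two 32-bit doublewords of a descriptor into its fields, using the bit positions of Figure 3-8, and computes the segment size implied by the limit and the G flag. It is a simplified model, not a definitive decoder: the function names are hypothetical, and the sample descriptor value (base 0, limit FFFFFH, G set) is an assumed encoding of a flat-model code segment.

```python
def decode_descriptor(low, high):
    """Unpack a segment descriptor given its low and high doublewords."""
    return {
        "base": ((low >> 16) & 0xFFFF) | ((high & 0xFF) << 16) | (high & 0xFF000000),
        "limit": (low & 0xFFFF) | (high & 0x000F0000),   # 20-bit limit
        "type": (high >> 8) & 0xF,
        "s": (high >> 12) & 1,     # 0 = system; 1 = code or data
        "dpl": (high >> 13) & 3,
        "p": (high >> 15) & 1,
        "avl": (high >> 20) & 1,
        "l": (high >> 21) & 1,
        "db": (high >> 22) & 1,
        "g": (high >> 23) & 1,
    }

def segment_size_bytes(limit, g):
    """Size implied by the 20-bit limit: byte units if G is clear, 4-KByte units if set."""
    return (limit + 1) * (4096 if g else 1)

fields = decode_descriptor(low=0x0000FFFF, high=0x00CF9A00)
print(fields)
print(segment_size_bytes(fields["limit"], fields["g"]) == 2 ** 32)
```

With the G flag set and a limit of FFFFFH, the computed size is 4 GBytes, which is the configuration the basic flat model relies on.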
The flags and fields in a segment descriptor are as follows:
Segment limit field
Specifies the size of the segment. The processor puts together the
two segment limit fields to form a 20-bit value. The processor
interprets the segment limit in one of two ways, depending on the setting
of the G (granularity) flag:
• If the granularity flag is clear, the segment size can range from
1 byte to 1 MByte, in byte increments.
• If the granularity flag is set, the segment size can range from
4 KBytes to 4 GBytes, in 4-KByte increments.
The processor uses the segment limit in two different ways,
depending on whether the segment is an expand-up or an expand-
down segment. See Section 3.4.5.1, "Code- and Data-Segment
Descriptor Types," for more information about segment types. For
expand-up segments, the offset in a logical address can range from 0
to the segment limit. Offsets greater than the segment limit generate
general-protection exceptions (#GP). For expand-down segments,
the segment limit has the reverse function; the offset can range from
the segment limit to FFFFFFFFH or FFFFH, depending on the setting of
the B flag. Offsets less than the segment limit generate general-
protection exceptions. Decreasing the value in the segment limit field
for an expand-down segment allocates new memory at the bottom of
the segment's address space, rather than at the top. IA-32
architecture stacks always grow downwards, making this mechanism
convenient for expandable stacks.
Figure 3-8. Segment Descriptor
Second doubleword (byte offset 4):
  Bits 31:24  Base 31:24
  Bit 23      G
  Bit 22      D/B
  Bit 21      L
  Bit 20      AVL
  Bits 19:16  Seg. Limit 19:16
  Bit 15      P
  Bits 14:13  DPL
  Bit 12      S
  Bits 11:8   Type
  Bits 7:0    Base 23:16
First doubleword (byte offset 0):
  Bits 31:16  Base Address 15:00
  Bits 15:0   Segment Limit 15:00
Legend:
  G      Granularity
  LIMIT  Segment Limit
  P      Segment present
  S      Descriptor type (0 = system; 1 = code or data)
  TYPE   Segment type
  DPL    Descriptor privilege level
  AVL    Available for use by system software
  BASE   Segment base address
  D/B    Default operation size (0 = 16-bit segment; 1 = 32-bit segment)
  L      64-bit code segment (IA-32e mode only)
Base address fields
Defines the location of byte 0 of the segment within the 4-GByte
linear address space. The processor puts together the three base
address fields to form a single 32-bit value. Segment base addresses
should be aligned to 16-byte boundaries. Although 16-byte alignment
is not required, this alignment allows programs to maximize
performance by aligning code and data on 16-byte boundaries.
Type field
Indicates the segment or gate type and specifies the kinds of access
that can be made to the segment and the direction of growth. The
interpretation of this field depends on whether the descriptor type flag
specifies an application (code or data) descriptor or a system
descriptor. The encoding of the type field is different for code, data,
and system descriptors (see Figure 5-1). See Section 3.4.5.1, "Code-
and Data-Segment Descriptor Types," for a description of how this
field is used to specify code and data-segment types.
S (descriptor type) flag
Specifies whether the segment descriptor is for a system segment
(S flag is clear) or a code or data segment (S flag is set).
DPL (descriptor privilege level) field
Specifies the privilege level of the segment. The privilege level can
range from 0 to 3, with 0 being the most privileged level. The DPL is
used to control access to the segment. See Section 5.5, "Privilege
Levels," for a description of the relationship of the DPL to the CPL of
the executing code segment and the RPL of a segment selector.
P ( segment - pr esent ) f l ag
I ndicat es whet her t he segment is present in memory ( set ) or not
present ( clear) . I f t his flag is clear, t he processor generat es a
segment - not - present except ion ( # NP) when a segment select or t hat
point s t o t he segment descript or is loaded int o a segment regist er.
Memory management soft ware can use t his flag t o cont rol which
segment s are act ually loaded int o physical memory at a given t ime. I t
offers a cont rol in addit ion t o paging for managing virt ual memory.
Figure 3- 9 shows t he format of a segment descript or when t he
segment - present flag is clear. When t his flag is clear, t he operat ing
syst em or execut ive is free t o use t he locat ions marked Available t o
Vol. 3 3-15
PROTECTED-MODE MEMORY MANAGEMENT
st ore it s own dat a, such as informat ion regarding t he whereabout s of
t he missing segment .
D/B (default operation size/default stack pointer size and/or upper bound) flag
Performs different functions depending on whether the segment
descriptor is an executable code segment, an expand-down data
segment, or a stack segment. (This flag should always be set to 1 for
32-bit code and data segments and to 0 for 16-bit code and data
segments.)
• Executable code segment. The flag is called the D flag and it
  indicates the default length for effective addresses and operands
  referenced by instructions in the segment. If the flag is set, 32-bit
  addresses and 32-bit or 8-bit operands are assumed; if it is clear,
  16-bit addresses and 16-bit or 8-bit operands are assumed.
  The instruction prefix 66H can be used to select an operand size
  other than the default, and the prefix 67H can be used to select an
  address size other than the default.
• Stack segment (data segment pointed to by the SS register). The
  flag is called the B (big) flag and it specifies the size of the stack
  pointer used for implicit stack operations (such as pushes, pops, and
  calls). If the flag is set, a 32-bit stack pointer is used, which is
  stored in the 32-bit ESP register; if the flag is clear, a 16-bit stack
  pointer is used, which is stored in the 16-bit SP register. If the
  stack segment is set up to be an expand-down data segment (described
  in the next paragraph), the B flag also specifies the upper bound of
  the stack segment.
• Expand-down data segment. The flag is called the B flag and it
  specifies the upper bound of the segment. If the flag is set, the
  upper bound is FFFFFFFFH (4 GBytes); if the flag is clear, the
  upper bound is FFFFH (64 KBytes).
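Figure 3-9. Segment Descriptor When Segment-Present Flag Is Clear
[Figure: two doublewords. The doubleword at offset 4 holds Available
(bits 31:16), P = 0 (bit 15), DPL (bits 14:13), S (bit 12), Type
(bits 11:8), and Available (bits 7:0). The doubleword at offset 0 is
entirely Available (bits 31:0).]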
G (granularity) flag
Determines the scaling of the segment limit field. When the
granularity flag is clear, the segment limit is interpreted in byte
units; when the flag is set, the segment limit is interpreted in 4-KByte
units. (This flag does not affect the granularity of the base address;
it is always byte granular.) When the granularity flag is set, the
twelve least significant bits of an offset are not tested when checking
the offset against the segment limit. For example, when the granularity
flag is set, a limit of 0 results in valid offsets from 0 to 4095.

L (64-bit code segment) flag
In IA-32e mode, bit 21 of the second doubleword of the segment
descriptor indicates whether a code segment contains native 64-bit
code. A value of 1 indicates instructions in this code segment are
executed in 64-bit mode. A value of 0 indicates the instructions in this
code segment are executed in compatibility mode. If the L bit is set,
then the D bit must be cleared. When not in IA-32e mode or for non-code
segments, bit 21 is reserved and should always be set to 0.

Available and reserved bits
Bit 20 of the second doubleword of the segment descriptor is available
for use by system software.
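
The field descriptions above can be made concrete with a small decoder. The following Python sketch is illustrative only: the function names are hypothetical, the bit positions follow the standard 8-byte descriptor layout (two 32-bit doublewords), and the expand-down check assumes that offsets at or below the limit are invalid.

```python
def decode_descriptor(low, high):
    """Split an 8-byte segment descriptor (two 32-bit doublewords) into fields."""
    # The three base address fields are put together into one 32-bit value.
    base = (low >> 16) | ((high & 0xFF) << 16) | (high & 0xFF000000)
    # The 20-bit limit is split between the two doublewords.
    limit = (low & 0xFFFF) | (high & 0x000F0000)
    return {
        "base": base,
        "limit": limit,
        "type": (high >> 8) & 0xF,
        "s": (high >> 12) & 1,
        "dpl": (high >> 13) & 3,
        "p": (high >> 15) & 1,
        "avl": (high >> 20) & 1,
        "l": (high >> 21) & 1,
        "db": (high >> 22) & 1,
        "g": (high >> 23) & 1,
    }


def effective_limit(limit, g):
    """Scale the limit: byte units if G is clear, 4-KByte units if G is set."""
    return (limit << 12) | 0xFFF if g else limit


def offset_in_bounds(offset, limit, g, expand_down=False, b=1):
    """Return True if the offset passes the limit check (sketch, see lead-in)."""
    eff = effective_limit(limit, g)
    if not expand_down:
        return 0 <= offset <= eff
    upper = 0xFFFFFFFF if b else 0xFFFF
    return eff < offset <= upper
```

For example, a flat 4-GByte code segment is commonly encoded as the doublewords 0000FFFFH (low) and 00CF9A00H (high); decoding it yields a base of 0, a limit of FFFFFH, and G = 1, which scales to an effective limit of FFFFFFFFH.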
3.4.5.1 Code- and Data-Segment Descriptor Types
When the S (descriptor type) flag in a segment descriptor is set, the descriptor is for
either a code or a data segment. The highest order bit of the type field (bit 11 of the
second doubleword of the segment descriptor) then determines whether the
descriptor is for a data segment (clear) or a code segment (set).
For data segments, the three low-order bits of the type field (bits 8, 9, and 10) are
interpreted as accessed (A), write-enable (W), and expansion-direction (E). See
Table 3-1 for a description of the encoding of the bits in the type field for code and
data segments. Data segments can be read-only or read/write segments, depending
on the setting of the write-enable bit.
Stack segments are data segments which must be read/write segments. Loading the
SS register with a segment selector for a nonwritable data segment generates a
general-protection exception (#GP). If the size of a stack segment needs to be
changed dynamically, the stack segment can be an expand-down data segment
(expansion-direction flag set). Here, dynamically changing the segment limit causes
stack space to be added to the bottom of the stack. If the size of a stack segment is
intended to remain static, the stack segment may be either an expand-up or
expand-down type.
The accessed bit indicates whether the segment has been accessed since the last
time the operating system or executive cleared the bit. The processor sets this bit
whenever it loads a segment selector for the segment into a segment register,
assuming that the type of memory that contains the segment descriptor supports
processor writes. The bit remains set until explicitly cleared. This bit can be used both
for virtual memory management and for debugging.
Table 3-1. Code- and Data-Segment Types

          Type Field            Descriptor
Decimal   11   10   9    8      Type        Description
               E    W    A
   0      0    0    0    0      Data        Read-Only
   1      0    0    0    1      Data        Read-Only, accessed
   2      0    0    1    0      Data        Read/Write
   3      0    0    1    1      Data        Read/Write, accessed
   4      0    1    0    0      Data        Read-Only, expand-down
   5      0    1    0    1      Data        Read-Only, expand-down, accessed
   6      0    1    1    0      Data        Read/Write, expand-down
   7      0    1    1    1      Data        Read/Write, expand-down, accessed
               C    R    A
   8      1    0    0    0      Code        Execute-Only
   9      1    0    0    1      Code        Execute-Only, accessed
  10      1    0    1    0      Code        Execute/Read
  11      1    0    1    1      Code        Execute/Read, accessed
  12      1    1    0    0      Code        Execute-Only, conforming
  13      1    1    0    1      Code        Execute-Only, conforming, accessed
  14      1    1    1    0      Code        Execute/Read, conforming
  15      1    1    1    1      Code        Execute/Read, conforming, accessed
For code segments, the three low-order bits of the type field are interpreted as
accessed (A), read enable (R), and conforming (C). Code segments can be
execute-only or execute/read, depending on the setting of the read-enable bit. An
execute/read segment might be used when constants or other static data have been
placed with instruction code in a ROM. Here, data can be read from the code segment
either by using an instruction with a CS override prefix or by loading a segment
selector for the code segment in a data-segment register (the DS, ES, FS, or GS
registers). In protected mode, code segments are not writable.
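
The encoding in Table 3-1 is regular enough to decode bit by bit. The Python sketch below is one possible illustration (the function name is hypothetical); it takes the 4-bit type field of a descriptor whose S flag is set and returns the matching description from the table.

```python
def describe_type(type_field):
    """Describe a 4-bit code/data type field (S flag assumed set)."""
    if type_field & 0x8:
        # Bit 11 set: code segment; bits 10..8 are C, R, A.
        kind = "Code"
        parts = ["Execute/Read" if type_field & 0x2 else "Execute-Only"]
        if type_field & 0x4:
            parts.append("conforming")
    else:
        # Bit 11 clear: data segment; bits 10..8 are E, W, A.
        kind = "Data"
        parts = ["Read/Write" if type_field & 0x2 else "Read-Only"]
        if type_field & 0x4:
            parts.append("expand-down")
    if type_field & 0x1:
        parts.append("accessed")
    return kind + " " + ", ".join(parts)
```

With this sketch, describe_type(7) returns "Data Read/Write, expand-down, accessed" and describe_type(12) returns "Code Execute-Only, conforming".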
Code segments can be either conforming or nonconforming. A transfer of execution
into a more-privileged conforming segment allows execution to continue at the
current privilege level. A transfer into a nonconforming segment at a different
privilege level results in a general-protection exception (#GP), unless a call gate or task
gate is used (see Section 5.8.1, "Direct Calls or Jumps to Code Segments," for more
information on conforming and nonconforming code segments). System utilities that
do not access protected facilities and handlers for some types of exceptions (such as
divide error or overflow) may be loaded in conforming code segments. Utilities that
need to be protected from less privileged programs and procedures should be placed
in nonconforming code segments.

NOTE
Execution cannot be transferred by a call or a jump to a less-privileged
(numerically higher privilege level) code segment, regardless of whether
the target segment is a conforming or nonconforming code segment.
Attempting such an execution transfer will result in a general-protection
exception.
All data segments are nonconforming, meaning that they cannot be accessed by less
privileged programs or procedures (code executing at numerically higher privilege
levels). Unlike code segments, however, data segments can be accessed by more
privileged programs or procedures (code executing at numerically lower privilege
levels) without using a special access gate.
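
The rules above for conforming code, nonconforming code, and data segments reduce to simple comparisons of privilege levels. The following Python sketch is a deliberately simplified model: the function names are hypothetical, it covers only direct calls and jumps (no gates), and it ignores the RPL of the segment selector, which the full checks also take into account.

```python
def direct_code_transfer_allowed(cpl, target_dpl, conforming):
    """Model a direct call/jump (no gate) to a code segment; RPL is ignored."""
    if conforming:
        # Same or more-privileged (numerically lower or equal DPL) targets
        # are allowed; execution continues at the current privilege level.
        return target_dpl <= cpl
    # Nonconforming targets must be at the same privilege level.
    return target_dpl == cpl


def data_access_allowed(cpl, segment_dpl):
    """Model access to a data segment; RPL is ignored."""
    # Code at the same or a more-privileged level may access the segment.
    return cpl <= segment_dpl
```

In this model, code at CPL 3 may jump directly to a conforming segment with DPL 0 but not to a nonconforming one, and no direct transfer to a numerically higher DPL is allowed in either case.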
If the segment descriptors in the GDT or an LDT are placed in ROM, the processor can
enter an indefinite loop if software or the processor attempts to update (write to) the
ROM-based segment descriptors. To prevent this problem, set the accessed bits for
all segment descriptors placed in a ROM. Also, remove operating-system or executive
code that attempts to modify segment descriptors located in ROM.
3.5 SYSTEM DESCRIPTOR TYPES
When the S (descriptor type) flag in a segment descriptor is clear, the descriptor type
is a system descriptor. The processor recognizes the following types of system
descriptors:
• Local descriptor-table (LDT) segment descriptor.
• Task-state segment (TSS) descriptor.
• Call-gate descriptor.
• Interrupt-gate descriptor.
• Trap-gate descriptor.
• Task-gate descriptor.
These descriptor types fall into two categories: system-segment descriptors and gate
descriptors. System-segment descriptors point to system segments (LDT and TSS
segments). Gate descriptors are in themselves "gates," which hold pointers to
procedure entry points in code segments (call, interrupt, and trap gates) or which hold
segment selectors for TSSs (task gates).
Table 3-2 shows the encoding of the type field for system-segment descriptors and
gate descriptors. Note that system descriptors in IA-32e mode are 16 bytes instead
of 8 bytes.
Table 3-2. System-Segment and Gate-Descriptor Types

          Type Field            Description
Decimal   11   10   9    8      32-Bit Mode               IA-32e Mode
   0      0    0    0    0      Reserved                  Upper 8 bytes of a 16-byte descriptor
   1      0    0    0    1      16-bit TSS (Available)    Reserved
   2      0    0    1    0      LDT                       LDT
   3      0    0    1    1      16-bit TSS (Busy)         Reserved
   4      0    1    0    0      16-bit Call Gate          Reserved
   5      0    1    0    1      Task Gate                 Reserved
   6      0    1    1    0      16-bit Interrupt Gate     Reserved
   7      0    1    1    1      16-bit Trap Gate          Reserved
   8      1    0    0    0      Reserved                  Reserved
   9      1    0    0    1      32-bit TSS (Available)    64-bit TSS (Available)
  10      1    0    1    0      Reserved                  Reserved
  11      1    0    1    1      32-bit TSS (Busy)         64-bit TSS (Busy)
  12      1    1    0    0      32-bit Call Gate          64-bit Call Gate
  13      1    1    0    1      Reserved                  Reserved
  14      1    1    1    0      32-bit Interrupt Gate     64-bit Interrupt Gate
  15      1    1    1    1      32-bit Trap Gate          64-bit Trap Gate
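
Table 3-2 can be transcribed directly into a lookup structure. The Python sketch below is one way to do so; the names SYSTEM_TYPES, system_type_name, and is_gate are hypothetical and chosen only for illustration.

```python
# Each entry: type value -> (32-bit mode meaning, IA-32e mode meaning).
SYSTEM_TYPES = {
    0: ("Reserved", "Upper 8 bytes of a 16-byte descriptor"),
    1: ("16-bit TSS (Available)", "Reserved"),
    2: ("LDT", "LDT"),
    3: ("16-bit TSS (Busy)", "Reserved"),
    4: ("16-bit Call Gate", "Reserved"),
    5: ("Task Gate", "Reserved"),
    6: ("16-bit Interrupt Gate", "Reserved"),
    7: ("16-bit Trap Gate", "Reserved"),
    8: ("Reserved", "Reserved"),
    9: ("32-bit TSS (Available)", "64-bit TSS (Available)"),
    10: ("Reserved", "Reserved"),
    11: ("32-bit TSS (Busy)", "64-bit TSS (Busy)"),
    12: ("32-bit Call Gate", "64-bit Call Gate"),
    13: ("Reserved", "Reserved"),
    14: ("32-bit Interrupt Gate", "64-bit Interrupt Gate"),
    15: ("32-bit Trap Gate", "64-bit Trap Gate"),
}


def system_type_name(type_field, ia32e=False):
    """Name the system descriptor type (S flag assumed clear)."""
    return SYSTEM_TYPES[type_field & 0xF][1 if ia32e else 0]


def is_gate(type_field, ia32e=False):
    """True for gate descriptors, False for system segments and reserved types."""
    return "Gate" in system_type_name(type_field, ia32e)
```

For instance, system_type_name(9) gives "32-bit TSS (Available)", while system_type_name(5, ia32e=True) gives "Reserved" because task gates are not available in IA-32e mode.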
See also: Section 3.5.1, "Segment Descriptor Tables," and Section 7.2.2, "TSS
Descriptor" (for more information on the system-segment descriptors); see Section
5.8.3, "Call Gates," Section 6.11, "IDT Descriptors," and Section 7.2.5, "Task-Gate
Descriptor" (for more information on the gate descriptors).
3.5.1 Segment Descriptor Tables
A segment descriptor table is an array of segment descriptors (see Figure 3-10). A
descriptor table is variable in length and can contain up to 8192 (2^13) 8-byte
descriptors. There are two kinds of descriptor tables:
• The global descriptor table (GDT)
• The local descriptor tables (LDT)
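Figure 3-10. Global and Local Descriptor Tables
[Figure: the TI flag of a segment selector selects the global descriptor
table (TI = 0) or a local descriptor table (TI = 1). Each table holds 8-byte
descriptors at byte offsets 0, 8, 16, 24, 32, 40, 48, 56, and so on; the
first descriptor in the GDT is not used. The GDTR register holds the base
address and limit of the GDT; the LDTR register holds the segment selector,
base address, and limit of the LDT.]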
Each system must have one GDT defined, which may be used for all programs and
tasks in the system. Optionally, one or more LDTs can be defined. For example, an
LDT can be defined for each separate task being run, or some or all tasks can share
the same LDT.
The GDT is not a segment itself; instead, it is a data structure in linear address space.
The base linear address and limit of the GDT must be loaded into the GDTR register
(see Section 2.4, "Memory-Management Registers"). The base address of the GDT
should be aligned on an eight-byte boundary to yield the best processor
performance. The limit value for the GDT is expressed in bytes. As with segments, the limit
value is added to the base address to get the address of the last valid byte. A limit
value of 0 results in exactly one valid byte. Because segment descriptors are always
8 bytes long, the GDT limit should always be one less than an integral multiple of
eight (that is, 8N - 1).
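
The 8N - 1 rule and the bounds check it implies can be sketched in a few lines of Python. The function names are hypothetical; the selector layout used here (index in bits 15:3, TI in bit 2, RPL in bits 1:0) is the standard segment-selector format.

```python
def gdt_limit(num_descriptors):
    """Limit value for a table of N 8-byte descriptors: 8N - 1."""
    if not 1 <= num_descriptors <= 8192:
        raise ValueError("a descriptor table holds 1 to 8192 descriptors")
    return 8 * num_descriptors - 1


def selector_fields(selector):
    """Split a 16-bit segment selector into index, TI, and RPL."""
    selector &= 0xFFFF
    return {"index": selector >> 3, "ti": (selector >> 2) & 1, "rpl": selector & 3}


def descriptor_within_limit(selector, table_limit):
    """True if all 8 bytes of the selected descriptor lie within the table limit."""
    index = selector_fields(selector)["index"]
    return index * 8 + 7 <= table_limit
```

A table of three descriptors therefore has a limit of 23; selector 0010H (index 2) falls within that limit, while selector 0018H (index 3) does not.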
The first descriptor in the GDT is not used by the processor. A segment selector to
this "null descriptor" does not generate an exception when loaded into a data-
segment register (DS, ES, FS, or GS), but it always generates a general-protection
exception (#GP) when an attempt is made to access memory using the descriptor. By
initializing the segment registers with this segment selector, accidental reference to
unused segment registers can be guaranteed to generate an exception.
The LDT is located in a system segment of the LDT type. The GDT must contain a
segment descriptor for the LDT segment. If the system supports multiple LDTs, each
must have a separate segment selector and segment descriptor in the GDT. The
segment descriptor for an LDT can be located anywhere in the GDT. See Section 3.5,
"System Descriptor Types," for information on the LDT segment-descriptor type.
An LDT is accessed with its segment selector. To eliminate address translations when
accessing the LDT, the segment selector, base linear address, limit, and access rights
of the LDT are stored in the LDTR register (see Section 2.4, "Memory-Management
Registers").
When the GDTR register is stored (using the SGDT instruction), a 48-bit "pseudo-
descriptor" is stored in memory (see top diagram in Figure 3-11). To avoid alignment
check faults in user mode (privilege level 3), the pseudo-descriptor should be located
at an odd word address (that is, address MOD 4 is equal to 2). This causes the
processor to store an aligned word, followed by an aligned doubleword. User-mode
programs normally do not store pseudo-descriptors, but the possibility of generating
an alignment check fault can be avoided by aligning pseudo-descriptors in this way.
The same alignment should be used when storing the IDTR register using the SIDT
instruction. When storing the LDTR or task register (using the SLDT or STR
instruction, respectively), the pseudo-descriptor should be located at a doubleword address
(that is, address MOD 4 is equal to 0).
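
The layout and alignment advice for the 48-bit pseudo-descriptor can be illustrated with Python's struct module. This is a sketch under stated assumptions: the function names are hypothetical, and the layout (a 16-bit limit followed by a 32-bit base address, little-endian, no padding) is the one shown in the top diagram of Figure 3-11.

```python
import struct


def pack_pseudo_descriptor(limit, base):
    """Pack a 48-bit pseudo-descriptor: 16-bit limit, then 32-bit base."""
    return struct.pack("<HI", limit, base)


def unpack_pseudo_descriptor(data):
    """Return (limit, base) from a 6-byte pseudo-descriptor."""
    return struct.unpack("<HI", data)


def avoids_alignment_check(address):
    """True if storing at this address yields an aligned word, then an aligned doubleword."""
    limit_aligned = address % 2 == 0
    base_aligned = (address + 2) % 4 == 0
    return limit_aligned and base_aligned
```

The alignment predicate is true exactly when address MOD 4 equals 2, which matches the "odd word address" recommendation in the text.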
3.5.2 Segment Descriptor Tables in IA-32e Mode
In IA-32e mode, a segment descriptor table can contain up to 8192 (2^13) 8-byte
descriptors. An entry in the segment descriptor table can be 8 bytes. System
descriptors are expanded to 16 bytes (occupying the space of two entries).
The GDTR and LDTR registers are expanded to hold a 64-bit base address. The
corresponding pseudo-descriptor is 80 bits (see the bottom diagram in Figure 3-11).
The following system descriptors expand to 16 bytes:
• Call gate descriptors (see Section 5.8.3.1, "IA-32e Mode Call Gates")
• IDT gate descriptors (see Section 6.14.1, "64-Bit Mode IDT")
• LDT and TSS descriptors (see Section 7.2.3, "TSS Descriptor in 64-bit
  mode").
Figure 3-11. Pseudo-Descriptor Formats
[Figure: two layouts. Top (48 bits): Limit in bits 15:0, 32-bit Base
Address in bits 47:16. Bottom (80 bits): Limit in bits 15:0, 64-bit Base
Address in bits 79:16.]
CHAPTER 4
PAGING
Chapter 3 explains how segmentation converts logical addresses to linear addresses.
Paging (or linear-address translation) is the process of translating linear addresses
so that they can be used to access memory or I/O devices. Paging translates each
linear address to a physical address and determines, for each translation, what
accesses to the linear address are allowed (the address's access rights) and the
type of caching used for such accesses (the address's memory type).
Intel 64 processors support three different paging modes. These modes are
identified and defined in Section 4.1. Section 4.2 gives an overview of the translation
mechanism that is used in all modes. Section 4.3, Section 4.4, and Section 4.5
discuss the three paging modes in detail.
Section 4.6 details how paging determines and uses access rights. Section 4.7
discusses exceptions that may be generated by paging (page-fault exceptions).
Section 4.8 considers data which the processor writes in response to linear-address
accesses (accessed and dirty flags).
Section 4.9 describes how paging determines the memory types used for accesses to
linear addresses. Section 4.10 provides details of how a processor may cache
information about linear-address translation. Section 4.11 outlines interactions between
paging and certain VMX features. Section 4.12 gives an overview of how paging can
be used to implement virtual memory.
4.1 PAGING MODES AND CONTROL BITS
Paging behavior is controlled by the following control bits:
• The WP and PG flags in control register CR0 (bit 16 and bit 31, respectively).
• The PSE, PAE, PGE, and PCIDE flags in control register CR4 (bit 4, bit 5, bit 7,
  and bit 17, respectively).
• The LME and NXE flags in the IA32_EFER MSR (bit 8 and bit 11, respectively).
Software enables paging by using the MOV to CR0 instruction to set CR0.PG. Before
doing so, software should ensure that control register CR3 contains the physical
address of the first paging structure that the processor will use for linear-address
translation (see Section 4.2) and that structure is initialized as desired. See
Table 4-3, Table 4-7, and Table 4-12 for the use of CR3 in the different paging
modes.
Section 4.1.1 describes how the values of CR0.PG, CR4.PAE, and IA32_EFER.LME
determine whether paging is in use and, if so, which of three paging modes is in use.
Section 4.1.2 explains how to manage these bits to establish or make changes in
paging modes. Section 4.1.3 discusses how CR0.WP, CR4.PSE, CR4.PGE, CR4.PCIDE,
and IA32_EFER.NXE modify the operation of the different paging modes.
4.1.1 Three Paging Modes
If CR0.PG = 0, paging is not used. The logical processor treats all linear addresses as
if they were physical addresses. CR4.PAE and IA32_EFER.LME are ignored by the
processor, as are CR0.WP, CR4.PSE, CR4.PGE, and IA32_EFER.NXE.
Paging is enabled if CR0.PG = 1. Paging can be enabled only if protection is enabled
(CR0.PE = 1). If paging is enabled, one of three paging modes is used. The values of
CR4.PAE and IA32_EFER.LME determine which paging mode is used:
• If CR0.PG = 1 and CR4.PAE = 0, 32-bit paging is used. 32-bit paging is detailed
  in Section 4.3. 32-bit paging uses CR0.WP, CR4.PSE, and CR4.PGE as described
  in Section 4.1.3.
• If CR0.PG = 1, CR4.PAE = 1, and IA32_EFER.LME = 0, PAE paging is used. PAE
  paging is detailed in Section 4.4. PAE paging uses CR0.WP, CR4.PGE, and
  IA32_EFER.NXE as described in Section 4.1.3.
• If CR0.PG = 1, CR4.PAE = 1, and IA32_EFER.LME = 1, IA-32e paging is used
  (see footnote 1). IA-32e paging is detailed in Section 4.5. IA-32e paging uses
  CR0.WP, CR4.PGE, CR4.PCIDE, and IA32_EFER.NXE as described in Section 4.1.3.
  IA-32e paging is available only on processors that support the Intel 64
  architecture.
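
The selection among the three paging modes is a small decision procedure over three bits. The Python sketch below restates it; the function name and the returned strings are hypothetical labels, and the flag values are passed in as plain integers rather than read from hardware.

```python
def paging_mode(cr0_pg, cr4_pae, efer_lme):
    """Return the paging mode selected by CR0.PG, CR4.PAE, and IA32_EFER.LME."""
    if not cr0_pg:
        # Linear addresses are treated as physical addresses;
        # CR4.PAE and IA32_EFER.LME are ignored.
        return "none"
    if not cr4_pae:
        if efer_lme:
            # The processor does not allow PG = 1, PAE = 0, LME = 1.
            raise ValueError("CR0.PG = 1 with CR4.PAE = 0 requires IA32_EFER.LME = 0")
        return "32-bit"
    return "IA-32e" if efer_lme else "PAE"
```

The one combination that raises an error here corresponds to a state the processor never enters, since attempts to reach it cause a general-protection exception.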
1. The LMA flag in the IA32_EFER MSR (bit 10) is a status bit that indicates whether the logical
processor is in IA-32e mode (and thus using IA-32e paging). The processor always sets
IA32_EFER.LMA to CR0.PG & IA32_EFER.LME. Software cannot directly modify IA32_EFER.LMA;
an execution of WRMSR to the IA32_EFER MSR ignores bit 10 of its source operand.

The three paging modes differ with regard to the following details:
• Linear-address width. The size of the linear addresses that can be translated.
• Physical-address width. The size of the physical addresses produced by paging.
• Page size. The granularity at which linear addresses are translated. Linear
  addresses on the same page are translated to corresponding physical addresses
  on the same page.
• Support for execute-disable access rights. In some paging modes, software can
  be prevented from fetching instructions from pages that are otherwise readable.
Table 4-1 illustrates the key differences between the three paging modes.
Because they are used only if IA32_EFER.LME = 0, 32-bit paging and PAE paging are
used only in legacy protected mode. Because legacy protected mode cannot produce
linear addresses larger than 32 bits, 32-bit paging and PAE paging translate 32-bit
linear addresses.
Because it is used only if IA32_EFER.LME = 1, IA-32e paging is used only in IA-32e
mode. (In fact, it is the use of IA-32e paging that defines IA-32e mode.) IA-32e
mode has two sub-modes:
• Compatibility mode. This mode uses only 32-bit linear addresses. IA-32e paging
  treats bits 47:32 of such an address as all 0.
• 64-bit mode. While this mode produces 64-bit linear addresses, the processor
  ensures that bits 63:47 of such an address are identical. Such an address is
  called canonical. Use of a non-canonical linear address in 64-bit mode produces
  a general-protection exception (#GP(0)); the processor does not attempt to
  translate non-canonical linear addresses using IA-32e paging. IA-32e paging
  does not use bits 63:48 of such addresses.

Table 4-1. Properties of Different Paging Modes

Paging                      LME in      Linear-Addr.  Physical-Addr.                                   Supports
Mode     CR0.PG   CR4.PAE   IA32_EFER   Width         Width (1)       Page Size(s)                     Execute-Disable?
None     0        N/A       N/A         32            32              N/A                              No
32-bit   1        0         0 (2)       32            Up to 40 (3)    4-KByte, 4-MByte (4)             No
PAE      1        1         0           32            Up to 52        4-KByte, 2-MByte                 Yes (5)
IA-32e   1        1         1           48            Up to 52        4-KByte, 2-MByte, 1-GByte (6)    Yes (5)

NOTES:
1. The physical-address width is always bounded by MAXPHYADDR; see Section 4.1.4.
2. The processor ensures that IA32_EFER.LME must be 0 if CR0.PG = 1 and CR4.PAE = 0.
3. 32-bit paging supports physical-address widths of more than 32 bits only for 4-MByte pages and
   only if the PSE-36 mechanism is supported; see Section 4.1.4 and Section 4.3.
4. 4-MByte pages are used with 32-bit paging only if CR4.PSE = 1; see Section 4.3.
5. Execute-disable access rights are applied only if IA32_EFER.NXE = 1; see Section 4.6.
6. Not all processors that support IA-32e paging support 1-GByte pages; see Section 4.1.4.
4.1.2 Paging-Mode Enabling
If CR0.PG = 1, a logical processor is in one of three paging modes, depending on the
values of CR4.PAE and IA32_EFER.LME. Figure 4-1 illustrates how software can
enable these modes and make transitions between them. The following items identify
certain limitations and other details:
• IA32_EFER.LME cannot be modified while paging is enabled (CR0.PG = 1).
  Attempts to do so using WRMSR cause a general-protection exception (#GP(0)).
• Paging cannot be enabled (by setting CR0.PG to 1) while CR4.PAE = 0 and
  IA32_EFER.LME = 1. Attempts to do so using MOV to CR0 cause a general-
  protection exception (#GP(0)).
Figure 4-1. Enabling and Changing Paging Modes
[Figure: state diagram over the values of PG, PAE, and LME. Four states
have no paging (PG = 0 with PAE = 0 or 1 and LME = 0 or 1); the other three
are 32-bit paging (PG = 1, PAE = 0, LME = 0), PAE paging (PG = 1, PAE = 1,
LME = 0), and IA-32e paging (PG = 1, PAE = 1, LME = 1). Arrows labeled
Set PG, Clear PG, Set PAE, Clear PAE, Set LME, and Clear LME connect the
states; disallowed changes are marked #GP.]
• CR4.PAE cannot be cleared while IA-32e paging is active (CR0.PG = 1 and
  IA32_EFER.LME = 1). Attempts to do so using MOV to CR4 cause a general-
  protection exception (#GP(0)).
• Regardless of the current paging mode, software can disable paging by clearing
  CR0.PG with MOV to CR0 (see footnote 1).
• Software can make transitions between 32-bit paging and PAE paging by
  changing the value of CR4.PAE with MOV to CR4.
• Software cannot make transitions directly between IA-32e paging and either of
  the other two paging modes. It must first disable paging (by clearing CR0.PG with
  MOV to CR0), then set CR4.PAE and IA32_EFER.LME to the desired values (with
  MOV to CR4 and WRMSR), and then re-enable paging (by setting CR0.PG with
  MOV to CR0). As noted earlier, an attempt to clear either CR4.PAE or
  IA32_EFER.LME while IA-32e paging is active causes a general-protection
  exception (#GP(0)).
• VMX transitions allow transitions between paging modes that are not possible
  using MOV to CR or WRMSR. This is because VMX transitions can load CR0, CR4,
  and IA32_EFER in one operation. See Section 4.11.1.
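
The restrictions listed above can be modeled as a small state machine. The Python sketch below is a toy model, not a description of processor internals: the names GeneralProtection, write_control_bit, and faults are hypothetical, the state is a dictionary holding only PG, PAE, and LME, and neither CR4.PCIDE nor VMX transitions are modeled.

```python
class GeneralProtection(Exception):
    """Stands in for #GP(0) in this toy model."""


def write_control_bit(state, bit, value):
    """Return the new state after writing one bit, or raise GeneralProtection."""
    pg, pae, lme = state["PG"], state["PAE"], state["LME"]
    if bit == "LME" and pg and value != lme:
        raise GeneralProtection("IA32_EFER.LME cannot be modified while CR0.PG = 1")
    if bit == "PG" and value and not pae and lme:
        raise GeneralProtection("cannot set CR0.PG while CR4.PAE = 0 and LME = 1")
    if bit == "PAE" and not value and pg and lme:
        raise GeneralProtection("cannot clear CR4.PAE while IA-32e paging is active")
    new_state = dict(state)
    new_state[bit] = value
    return new_state


def faults(state, bit, value):
    """True if the write would cause a general-protection exception."""
    try:
        write_control_bit(state, bit, value)
    except GeneralProtection:
        return True
    return False
```

Starting from 32-bit paging (PG = 1, PAE = 0, LME = 0), the sequence clear PG, set PAE, set LME, set PG reaches IA-32e paging without faulting in this model, whereas setting LME directly while PG = 1 faults.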
4.1.3 Paging-Mode Modifiers
Details of how each paging mode operates are determined by the following control
bits:
• The WP flag in CR0 (bit 16).
• The PSE, PGE, and PCIDE flags in CR4 (bit 4, bit 7, and bit 17, respectively).
• The NXE flag in the IA32_EFER MSR (bit 11).
CR0.WP allows pages to be protected from supervisor-mode writes. If CR0.WP = 0,
software operating with CPL < 3 (supervisor mode) can write to linear addresses with
read-only access rights; if CR0.WP = 1, it cannot. (Software operating with CPL = 3,
that is, in user mode, cannot write to linear addresses with read-only access rights,
regardless of the value of CR0.WP.) Section 4.6 explains how access rights are
determined.
CR4.PSE enables 4-MByte pages for 32-bit paging. If CR4.PSE = 0, 32-bit paging can
use only 4-KByte pages; if CR4.PSE = 1, 32-bit paging can use both 4-KByte pages
and 4-MByte pages. See Section 4.3 for more information. (PAE paging and IA-32e
paging can use multiple page sizes regardless of the value of CR4.PSE.)
CR4.PGE enables global pages. If CR4.PGE = 0, no translations are shared across
address spaces; if CR4.PGE = 1, specified translations may be shared across address
spaces. See Section 4.10.2.4 for more information.
CR4.PCIDE enables process-context identifiers (PCIDs) for IA-32e paging
(CR4.PCIDE can be 1 only when IA-32e paging is in use). PCIDs allow a logical
processor to cache information for multiple linear-address spaces. See Section
4.10.1 for more information.

1. If CR4.PCIDE = 1, an attempt to clear CR0.PG causes a general-protection exception (#GP);
software should clear CR4.PCIDE before attempting to disable paging.

IA32_EFER.NXE enables execute-disable access rights for PAE paging and IA-32e
paging. If IA32_EFER.NXE = 0, software may fetch instructions from any linear
address that paging allows the software to read; if IA32_EFER.NXE = 1, instruction
fetches can be prevented from specified linear addresses (even if data reads from the
addresses are allowed). Section 4.6 explains how access rights are determined.
(32-bit paging always allows software to fetch instructions from any linear address
that may be read; IA32_EFER.NXE has no effect with 32-bit paging. Software that
wants to limit instruction fetches from readable pages must use either PAE paging or
IA-32e paging.)
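
Two of these modifiers, CR0.WP and IA32_EFER.NXE, can be summarized as predicates. The Python sketch below is a simplification under stated assumptions: the function names are hypothetical, the parameter execute_disable stands for a page that the paging structures mark as not executable, and the full access-rights rules of Section 4.6 are not modeled.

```python
def write_to_read_only_allowed(cpl, cr0_wp):
    """May software at this CPL write to a linear address with read-only rights?"""
    # Only supervisor mode (CPL < 3) can do so, and only when CR0.WP = 0.
    return cpl < 3 and not cr0_wp


def fetch_allowed(readable, execute_disable, efer_nxe, mode):
    """May software fetch instructions from an address it is (or is not) allowed to read?"""
    if mode == "32-bit" or not efer_nxe:
        # Without execute-disable support, any readable address can be fetched from.
        return bool(readable)
    # With NXE = 1 under PAE or IA-32e paging, marked pages cannot be fetched from.
    return bool(readable) and not execute_disable
```

Under these assumptions, a page marked execute-disabled is still fetchable with 32-bit paging or with IA32_EFER.NXE = 0, and becomes non-fetchable only with PAE or IA-32e paging and NXE = 1.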
4.1.4 Enumeration of Paging Features by CPUID
Software can discover support for different paging features using the CPUID
instruction:
• PSE: page-size extensions for 32-bit paging.
  If CPUID.01H:EDX.PSE [bit 3] = 1, CR4.PSE may be set to 1, enabling support
  for 4-MByte pages with 32-bit paging (see Section 4.3).
• PAE: physical-address extension.
  If CPUID.01H:EDX.PAE [bit 6] = 1, CR4.PAE may be set to 1, enabling PAE
  paging (this setting is also required for IA-32e paging).
• PGE: global-page support.
  If CPUID.01H:EDX.PGE [bit 13] = 1, CR4.PGE may be set to 1, enabling the
  global-page feature (see Section 4.10.2.4).
• PAT: page-attribute table.
  If CPUID.01H:EDX.PAT [bit 16] = 1, the 8-entry page-attribute table (PAT) is
  supported. When the PAT is supported, three bits in certain paging-structure
  entries select a memory type (used to determine type of caching used) from the
  PAT (see Section 4.9.2).
• PSE-36: 36-bit page size extension.
  If CPUID.01H:EDX.PSE-36 [bit 17] = 1, the PSE-36 mechanism is supported,
  indicating that translations using 4-MByte pages with 32-bit paging may produce
  physical addresses with more than 32 bits (see Section 4.3).
• PCID: process-context identifiers.
  If CPUID.01H:ECX.PCID [bit 17] = 1, CR4.PCIDE may be set to 1, enabling
  process-context identifiers (see Section 4.10.1).
• NX: execute disable.
  If CPUID.80000001H:EDX.NX [bit 20] = 1, IA32_EFER.NXE may be set to 1,
  allowing PAE paging and IA-32e paging to disable execute access to selected
  pages (see Section 4.6). (Processors that do not support CPUID function
  80000001H do not allow IA32_EFER.NXE to be set to 1.)
• Page1GB: 1-GByte pages.
  If CPUID.80000001H:EDX.Page1GB [bit 26] = 1, 1-GByte pages are supported
  with IA-32e paging (see Section 4.5).
• LM: IA-32e mode support.
  If CPUID.80000001H:EDX.LM [bit 29] = 1, IA32_EFER.LME may be set to 1,
  enabling IA-32e paging. (Processors that do not support CPUID function
  80000001H do not allow IA32_EFER.LME to be set to 1.)
• CPUID.80000008H:EAX[7:0] reports the physical-address width supported by
  the processor. (For processors that do not support CPUID function 80000008H,
  the width is generally 36 if CPUID.01H:EDX.PAE [bit 6] = 1 and 32 otherwise.)
  This width is referred to as MAXPHYADDR. MAXPHYADDR is at most 52.
• CPUID.80000008H:EAX[15:8] reports the linear-address width supported by the
  processor. Generally, this value is 48 if CPUID.80000001H:EDX.LM [bit 29] = 1
  and 32 otherwise. (Processors that do not support CPUID function 80000008H
  support a linear-address width of 32.)
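
The bit positions listed above can be collected into tables and applied to register values. The Python sketch below does not execute CPUID; it assumes the caller has already obtained the EDX, ECX, and EAX values of the relevant CPUID functions by some other means, and the dictionary and function names are hypothetical.

```python
# Feature name -> bit position, as listed in this section.
CPUID_01H_EDX = {"PSE": 3, "PAE": 6, "PGE": 13, "PAT": 16, "PSE-36": 17}
CPUID_01H_ECX = {"PCID": 17}
CPUID_80000001H_EDX = {"NX": 20, "Page1GB": 26, "LM": 29}


def decode_features(register_value, bit_table):
    """Map each feature name to True or False for a given register value."""
    return {name: bool((register_value >> bit) & 1) for name, bit in bit_table.items()}


def address_widths(eax_80000008h):
    """Return (MAXPHYADDR, linear-address width) from CPUID.80000008H:EAX."""
    max_phy_addr = eax_80000008h & 0xFF
    linear_width = (eax_80000008h >> 8) & 0xFF
    return max_phy_addr, linear_width
```

For example, an EAX value of 00003024H from CPUID function 80000008H decodes to a MAXPHYADDR of 36 and a linear-address width of 48.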
4.2 HIERARCHICAL PAGING STRUCTURES: AN OVERVIEW
All three paging modes translate linear addresses using hierarchical paging
structures. This section provides an overview of their operation. Section 4.3, Section 4.4,
and Section 4.5 provide details for the three paging modes.
Every paging structure is 4096 bytes in size and comprises a number of individual
entries. With 32-bit paging, each entry is 32 bits (4 bytes); there are thus 1024
entries in each structure. With PAE paging and IA-32e paging, each entry is 64 bits
(8 bytes); there are thus 512 entries in each structure. (PAE paging includes one
exception, a paging structure that is 32 bytes in size, containing 4 64-bit entries.)
The processor uses the upper portion of a linear address to identify a series of
paging-structure entries. The last of these entries identifies the physical address of
the region to which the linear address translates (called the page frame). The lower
portion of the linear address (called the page offset) identifies the specific address
within that region to which the linear address translates.
Each paging-structure entry contains a physical address, which is either the address
of another paging structure or the address of a page frame. In the first case, the
entry is said to reference the other paging structure; in the latter, the entry is said
to map a page.
The first paging structure used for any translation is located at the physical address
in CR3. A linear address is translated using the following iterative procedure. A
portion of the linear address (initially the uppermost bits) selects an entry in a paging
structure (initially the one located using CR3). If that entry references another
paging structure, the process continues with that paging structure and with the
portion of the linear address immediately below that just used. If instead the entry
maps a page, the process completes: the physical address in the entry is that of the
page frame and the remaining lower portion of the linear address is the page offset.
The following items give an example for each of the three paging modes (each
example locates a 4-KByte page frame):
• With 32-bit paging, each paging structure comprises 1024 = 2^10 entries. For this
  reason, the translation process uses 10 bits at a time from a 32-bit linear
  address. Bits 31:22 identify the first paging-structure entry and bits 21:12
  identify a second. The latter identifies the page frame. Bits 11:0 of the linear
  address are the page offset within the 4-KByte page frame. (See Figure 4-2 for
  an illustration.)
• With PAE paging, the first paging structure comprises only 4 = 2^2 entries.
  Translation thus begins by using bits 31:30 from a 32-bit linear address to
  identify the first paging-structure entry. Other paging structures comprise
  512 = 2^9 entries, so the process continues by using 9 bits at a time. Bits 29:21
  identify a second paging-structure entry and bits 20:12 identify a third. This last
  identifies the page frame. (See Figure 4-5 for an illustration.)
• With IA-32e paging, each paging structure comprises 512 = 2^9 entries and
  translation uses 9 bits at a time from a 48-bit linear address. Bits 47:39 identify
  the first paging-structure entry, bits 38:30 identify a second, bits 29:21 a third,
  and bits 20:12 identify a fourth. Again, the last identifies the page frame. (See
  Figure 4-8 for an illustration.)
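The bit ranges in the three examples above amount to shifts and masks. The following C sketch shows one way to extract each index; all function names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit paging: 10 + 10 + 12 bits. */
unsigned pg32_pde_index(uint32_t la) { return (la >> 22) & 0x3FFu; } /* bits 31:22 */
unsigned pg32_pte_index(uint32_t la) { return (la >> 12) & 0x3FFu; } /* bits 21:12 */

/* PAE paging: 2 + 9 + 9 + 12 bits. */
unsigned pae_pdpte_index(uint32_t la) { return (la >> 30) & 0x3u;   } /* bits 31:30 */
unsigned pae_pde_index(uint32_t la)   { return (la >> 21) & 0x1FFu; } /* bits 29:21 */
unsigned pae_pte_index(uint32_t la)   { return (la >> 12) & 0x1FFu; } /* bits 20:12 */

/* IA-32e paging: four 9-bit indices from a 48-bit linear address.
 * level 0 selects bits 47:39, level 1 bits 38:30,
 * level 2 bits 29:21, level 3 bits 20:12. */
unsigned ia32e_index(uint64_t la, unsigned level)
{
    assert(level < 4);
    return (unsigned)((la >> (39 - 9 * level)) & 0x1FFu);
}

/* Page offset within a 4-KByte page frame: bits 11:0 in every mode. */
unsigned page_offset_4k(uint64_t la) { return (unsigned)(la & 0xFFFu); }
```

With the linear address C0801234H, for instance, 32-bit paging selects PDE 302H and PTE 001H, while PAE paging selects PDPTE 3, PDE 004H, and PTE 001H; the page offset is 234H in both cases.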
The translation process in each of the examples above completes by identifying a
page frame. However, the paging structures may be configured so that translation
terminates before doing so. This occurs if the process encounters a paging-structure
entry that is marked "not present" (because its P flag, bit 0, is clear) or in which
a reserved bit is set. In this case, there is no translation for the linear address; an
access to that address causes a page-fault exception (see Section 4.7).
In the examples above, a paging-structure entry maps a page with a 4-KByte page
frame when only 12 bits remain in the linear address; entries identified earlier always
reference other paging structures. That may not apply in other cases. The following
items identify when an entry maps a page and when it references another paging
structure:
• If more than 12 bits remain in the linear address, bit 7 (PS, page size) of the
  current paging-structure entry is consulted. If the bit is 0, the entry references
  another paging structure; if the bit is 1, the entry maps a page.
• If only 12 bits remain in the linear address, the current paging-structure entry
  always maps a page (bit 7 is used for other purposes).
If a paging-structure entry maps a page when more than 12 bits remain in the linear
address, the entry identifies a page frame larger than 4 KBytes. For example, 32-bit
paging uses the upper 10 bits of a linear address to locate the first paging-structure
entry; 22 bits remain. If that entry maps a page, the page frame is 2^22 bytes = 4
MBytes. 32-bit paging supports 4-MByte pages if CR4.PSE = 1. PAE paging and
IA-32e paging support 2-MByte pages (regardless of the value of CR4.PSE). IA-32e
paging may support 1-GByte pages (see Section 4.1.4).
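As a rough illustration of the rule above, the page-frame size follows directly from the number of linear-address bits that remain when an entry maps a page. The helper below is a hypothetical sketch; it deliberately ignores the mode-specific restrictions described elsewhere in this chapter (for example, that 32-bit paging honors PS only if CR4.PSE = 1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns the page-frame size in bytes if the entry maps a page, or 0 if
 * the entry references another paging structure.
 * remaining_bits: linear-address bits not yet consumed by the walk.
 * entry:          the current paging-structure entry (assumed present). */
uint64_t mapped_page_size(unsigned remaining_bits, uint64_t entry)
{
    assert(remaining_bits >= 12 && remaining_bits < 64);
    bool ps = (entry >> 7) & 1u;                 /* bit 7: PS (page size) */
    bool maps_page = (remaining_bits == 12) || ps;
    return maps_page ? (1ULL << remaining_bits) : 0;
}
```

With 22, 21, and 30 bits remaining and PS = 1, this yields 4 MBytes, 2 MBytes, and 1 GByte respectively, matching the page sizes named in the text.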
Paging structures are given different names based on their uses in the translation
process. Table 4-2 gives the names of the different paging structures. It also
provides, for each structure, the source of the physical address used to locate it (CR3
or a different paging-structure entry); the bits in the linear address used to select an
entry from the structure; and details about whether and how such an entry can
map a page.
4.3 32-BIT PAGING
A logical processor uses 32-bit paging if CR0.PG = 1 and CR4.PAE = 0. 32-bit paging
translates 32-bit linear addresses to 40-bit physical addresses.[1] Although 40 bits
corresponds to 1 TByte, linear addresses are limited to 32 bits; at most 4 GBytes of
linear-address space may be accessed at any given time.
1. Bits in the range 39:32 are 0 in any physical address used by 32-bit paging except those used to
map 4-MByte pages. If the processor does not support the PSE-36 mechanism, this is true also
for physical addresses used to map 4-MByte pages. If the processor does support the PSE-36
mechanism and MAXPHYADDR < 40, bits in the range 39:MAXPHYADDR are 0 in any physical
address used to map a 4-MByte page. (The corresponding bits are reserved in PDEs.) See Section
4.1.4 for how to determine MAXPHYADDR and whether the PSE-36 mechanism is supported.

Table 4-2. Paging Structures in the Different Paging Modes
Paging Structure              Entry Name  Paging Mode  Physical Address  Bits Selecting  Page Mapping
                                                       of Structure      Entry
PML4 table                    PML4E       32-bit, PAE  N/A
                                          IA-32e       CR3               47:39           N/A (PS must be 0)
Page-directory-pointer table  PDPTE       32-bit       N/A
                                          PAE          CR3               31:30           N/A (PS must be 0)
                                          IA-32e       PML4E             38:30           1-GByte page if PS=1 [1]
Page directory                PDE         32-bit       CR3               31:22           4-MByte page if PS=1 [2]
                                          PAE, IA-32e  PDPTE             29:21           2-MByte page if PS=1
Page table                    PTE         32-bit       PDE               21:12           4-KByte page
                                          PAE, IA-32e  PDE               20:12           4-KByte page
NOTES:
1. Not all processors allow the PS flag to be 1 in PDPTEs; see Section 4.1.4 for how to determine
whether 1-GByte pages are supported.
2. 32-bit paging ignores the PS flag in a PDE (and uses the entry to reference a page table) unless
CR4.PSE = 1. Not all processors allow CR4.PSE to be 1; see Section 4.1.4 for how to determine
whether 4-MByte pages are supported with 32-bit paging.
32-bit paging uses a hierarchy of paging structures to produce a translation for a
linear address. CR3 is used to locate the first paging structure, the page directory.
Table 4-3 illustrates how CR3 is used with 32-bit paging.
32-bit paging may map linear addresses to either 4-KByte pages or 4-MByte pages.
Figure 4-2 illustrates the translation process when it uses a 4-KByte page; Figure 4-3
covers the case of a 4-MByte page. The following items describe the 32-bit paging
process in more detail as well as how the page size is determined:
• A 4-KByte naturally aligned page directory is located at the physical address
  specified in bits 31:12 of CR3 (see Table 4-3). A page directory comprises 1024
  32-bit entries (PDEs). A PDE is selected using the physical address defined as
  follows:
    - Bits 39:32 are all 0.
    - Bits 31:12 are from CR3.
    - Bits 11:2 are bits 31:22 of the linear address.
    - Bits 1:0 are 0.
Because a PDE is identified using bits 31:22 of the linear address, it controls access
to a 4-MByte region of the linear-address space. Use of the PDE depends on CR4.PSE
and the PDE's PS flag (bit 7):
• If CR4.PSE = 1 and the PDE's PS flag is 1, the PDE maps a 4-MByte page (see
  Table 4-4). The final physical address is computed as follows:
    - Bits 39:32 are bits 20:13 of the PDE.
Table 4-3. Use of CR3 with 32-Bit Paging
Bit Position(s)   Contents
2:0 Ignored
3 (PWT) Page-level write-through; indirectly determines the memory type used to access
the page directory during linear-address translation (see Section 4.9)
4 (PCD) Page-level cache disable; indirectly determines the memory type used to access
the page directory during linear-address translation (see Section 4.9)
11:5 Ignored
31:12 Physical address of the 4-KByte aligned page directory used for linear-address
translation
63:32 Ignored (these bits exist only on processors supporting the Intel-64 architecture)
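    - Bits 31:22 are bits 31:22 of the PDE.[1]
    - Bits 21:0 are from the original linear address.
1. The upper bits in the final physical address do not all come from corresponding positions in the
PDE; the physical-address bits in the PDE are not all contiguous.
Figure 4-2. Linear-Address Translation to a 4-KByte Page using 32-Bit Paging
[Figure: the 32-bit linear address is divided into Directory (bits 31:22), Table (bits 21:12),
and Offset (bits 11:0). CR3 locates the page directory; the Directory field selects a PDE with
PS=0, which locates the page table; the Table field selects a PTE, which locates the 4-KByte
page; the Offset field selects the physical address within the page.]
Figure 4-3. Linear-Address Translation to a 4-MByte Page using 32-Bit Paging
[Figure: the 32-bit linear address is divided into Directory (bits 31:22) and Offset (bits 21:0).
CR3 locates the page directory; the Directory field selects a PDE with PS=1, which locates the
4-MByte page; the Offset field selects the physical address within the page.]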
• If CR4.PSE = 0 or the PDE's PS flag is 0, a 4-KByte naturally aligned page table is
  located at the physical address specified in bits 31:12 of the PDE (see Table 4-5).
Table 4-4. Format of a 32-Bit Page-Directory Entry that Maps a 4-MByte Page
Bit Position(s)   Contents
0 (P) Present; must be 1 to map a 4-MByte page
1 (R/W) Read/write; if 0, writes may not be allowed to the 4-MByte page referenced by
this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S) User/supervisor; if 0, accesses with CPL=3 are not allowed to the 4-MByte page
referenced by this entry (see Section 4.6)
3 (PWT) Page-level write-through; indirectly determines the memory type used to access
the 4-MByte page referenced by this entry (see Section 4.9)
4 (PCD) Page-level cache disable; indirectly determines the memory type used to access
the 4-MByte page referenced by this entry (see Section 4.9)
5 (A) Accessed; indicates whether software has accessed the 4-MByte page referenced
by this entry (see Section 4.8)
6 (D) Dirty; indicates whether software has written to the 4-MByte page referenced by
this entry (see Section 4.8)
7 (PS) Page size; must be 1 (otherwise, this entry references a page table; see Table 4-5)
8 (G) Global; if CR4.PGE = 1, determines whether the translation is global (see Section
4.10); ignored otherwise
11:9 Ignored
12 (PAT) If the PAT is supported, indirectly determines the memory type used to access the
4-MByte page referenced by this entry (see Section 4.9.2); otherwise, reserved
(must be 0)[1]
(M-20):13 Bits (M-1):32 of physical address of the 4-MByte page referenced by this entry[2]
21:(M-19) Reserved (must be 0)
31:22 Bits 31:22 of physical address of the 4-MByte page referenced by this entry
NOTES:
1. See Section 4.1.4 for how to determine whether the PAT is supported.
2. If the PSE-36 mechanism is not supported, M is 32, and this row does not apply. If the PSE-36
mechanism is supported, M is the minimum of 40 and MAXPHYADDR (this row does not apply if
MAXPHYADDR = 32). See Section 4.1.4 for how to determine MAXPHYADDR and whether the
PSE-36 mechanism is supported.
  A page table comprises 1024 32-bit entries (PTEs). A PTE is selected using the
  physical address defined as follows:
    - Bits 39:32 are all 0.
    - Bits 31:12 are from the PDE.
    - Bits 11:2 are bits 21:12 of the linear address.
    - Bits 1:0 are 0.
• Because a PTE is identified using bits 31:12 of the linear address, every PTE
  maps a 4-KByte page (see Table 4-6). The final physical address is computed as
  follows:
    - Bits 39:32 are all 0.
    - Bits 31:12 are from the PTE.
    - Bits 11:0 are from the original linear address.
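The address computations listed above can be put together as a small software model. The sketch below is only an illustration under stated assumptions: physical memory is modeled as a hypothetical array of 32-bit words, unbacked addresses read as 0 (not present), PDE bits 20:13 are treated as address bits 39:32 (that is, PSE-36 with MAXPHYADDR of at least 40 is assumed), and reserved-bit and access-rights checks are omitted. The type, function names, and demo table values are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NO_TRANSLATION UINT64_MAX   /* sentinel: no translation exists */

/* Hypothetical model of physical memory: word i lives at address 4*i. */
typedef struct {
    const uint32_t *words;
    size_t nwords;
} phys_mem32;

/* Read an aligned 32-bit entry; unbacked addresses read as 0. */
static uint32_t read_entry32(const phys_mem32 *m, uint64_t paddr)
{
    assert((paddr & 3u) == 0);
    size_t i = (size_t)(paddr / 4);
    return i < m->nwords ? m->words[i] : 0;
}

uint64_t translate_32bit(const phys_mem32 *m, uint32_t cr3, bool cr4_pse,
                         uint32_t la)
{
    /* PDE address: bits 31:12 from CR3, bits 11:2 = linear bits 31:22. */
    uint64_t pde_addr = (cr3 & 0xFFFFF000u) | (((la >> 22) & 0x3FFu) << 2);
    uint32_t pde = read_entry32(m, pde_addr);
    if (!(pde & 1u))
        return NO_TRANSLATION;                      /* P = 0 */

    if (cr4_pse && (pde & 0x80u)) {                 /* PS = 1: 4-MByte page */
        return ((uint64_t)((pde >> 13) & 0xFFu) << 32)  /* 39:32 = PDE 20:13 */
             | (pde & 0xFFC00000u)                      /* 31:22 = PDE 31:22 */
             | (la & 0x003FFFFFu);                      /* 21:0 from linear  */
    }

    /* PTE address: bits 31:12 from the PDE, bits 11:2 = linear bits 21:12. */
    uint64_t pte_addr = (pde & 0xFFFFF000u) | (((la >> 12) & 0x3FFu) << 2);
    uint32_t pte = read_entry32(m, pte_addr);
    if (!(pte & 1u))
        return NO_TRANSLATION;                      /* P = 0 */
    return (uint64_t)(pte & 0xFFFFF000u) | (la & 0xFFFu);
}

/* Demo tables (made-up values): page directory at 0000H, page table at 1000H. */
static const uint32_t demo_words32[2 * 1024] = {
    [0x000] = 0x00402081u,  /* PDE 0: P=1, PS=1, bits 20:13 = 1, bits 31:22 = 1 */
    [0x302] = 0x00001001u,  /* PDE 302H: P=1, references page table at 1000H    */
    [0x401] = 0x00002001u,  /* PTE 1 of that table: P=1, 4-KByte page at 2000H  */
};
const phys_mem32 demo_mem32 = { demo_words32, 2 * 1024 };
```

With these demo tables and CR3 = 0, linear address C0801234H translates through the page table to physical address 2234H, and (with CR4.PSE = 1) linear address 00012345H translates through the 4-MByte mapping to 1_00412345H.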
If a paging-structure entry's P flag (bit 0) is 0 or if the entry sets any reserved bit, the
entry is used neither to reference another paging-structure entry nor to map a page.
Table 4-5. Format of a 32-Bit Page-Directory Entry that References a Page Table
Bit Position(s)   Contents
0 (P) Present; must be 1 to reference a page table
1 (R/W) Read/write; if 0, writes may not be allowed to the 4-MByte region controlled by
this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S) User/supervisor; if 0, accesses with CPL=3 are not allowed to the 4-MByte region
controlled by this entry (see Section 4.6)
3 (PWT) Page-level write-through; indirectly determines the memory type used to access
the page table referenced by this entry (see Section 4.9)
4 (PCD) Page-level cache disable; indirectly determines the memory type used to access
the page table referenced by this entry (see Section 4.9)
5 (A) Accessed; indicates whether this entry has been used for linear-address
translation (see Section 4.8)
6 Ignored
7 (PS) If CR4.PSE = 1, must be 0 (otherwise, this entry maps a 4-MByte page; see
Table 4-4); otherwise, ignored
11:8 Ignored
31:12 Physical address of 4-KByte aligned page table referenced by this entry
A reference using a linear address whose translation would use such a paging-structure
entry causes a page-fault exception (see Section 4.7).
With 32-bit paging, there are reserved bits only if CR4.PSE = 1:
• If the P flag and the PS flag (bit 7) of a PDE are both 1, the bits reserved depend
  on MAXPHYADDR and whether the PSE-36 mechanism is supported:[1]
    - If the PSE-36 mechanism is not supported, bits 21:13 are reserved.
Table 4-6. Format of a 32-Bit Page-Table Entry that Maps a 4-KByte Page
Bit Position(s)   Contents
0 (P) Present; must be 1 to map a 4-KByte page
1 (R/W) Read/write; if 0, writes may not be allowed to the 4-KByte page referenced by this
entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S) User/supervisor; if 0, accesses with CPL=3 are not allowed to the 4-KByte page
referenced by this entry (see Section 4.6)
3 (PWT) Page-level write-through; indirectly determines the memory type used to access
the 4-KByte page referenced by this entry (see Section 4.9)
4 (PCD) Page-level cache disable; indirectly determines the memory type used to access
the 4-KByte page referenced by this entry (see Section 4.9)
5 (A) Accessed; indicates whether software has accessed the 4-KByte page referenced
by this entry (see Section 4.8)
6 (D) Dirty; indicates whether software has written to the 4-KByte page referenced by
this entry (see Section 4.8)
7 (PAT) If the PAT is supported, indirectly determines the memory type used to access the
4-KByte page referenced by this entry (see Section 4.9.2); otherwise, reserved
(must be 0)[1]
8 (G) Global; if CR4.PGE = 1, determines whether the translation is global (see Section
4.10); ignored otherwise
11:9 Ignored
31:12 Physical address of the 4-KByte page referenced by this entry
NOTES:
1. See Section 4.1.4 for how to determine whether the PAT is supported.
1. See Section 4.1.4 for how to determine MAXPHYADDR and whether the PSE-36 mechanism is
supported.
    - If the PSE-36 mechanism is supported, bits 21:(M-19) are reserved, where
      M is the minimum of 40 and MAXPHYADDR.
• If the PAT is not supported:[1]
    - If the P flag of a PTE is 1, bit 7 is reserved.
    - If the P flag and the PS flag of a PDE are both 1, bit 12 is reserved.
(If CR4.PSE = 0, no bits are reserved with 32-bit paging.)
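The reserved-bit rules for 32-bit paging can be expressed as mask computations. The following C sketch is a hypothetical helper, not a definitive implementation: the function names and parameter layout are assumptions for the example, and the caller is expected to supply CR4.PSE, PSE-36 support, MAXPHYADDR, and PAT support as determined per Section 4.1.4. An entry violates the rules if it has any bit of the returned mask set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with bits hi:lo set (0 <= lo <= hi <= 31). */
static uint32_t bit_range32(unsigned hi, unsigned lo)
{
    assert(lo <= hi && hi <= 31);
    return (uint32_t)(((1ULL << (hi - lo + 1)) - 1) << lo);
}

/* Reserved bits of a PDE under 32-bit paging. */
uint32_t pde32_reserved_mask(uint32_t pde, bool cr4_pse, bool pse36,
                             unsigned maxphyaddr, bool pat)
{
    /* Reserved bits exist only if CR4.PSE = 1 and both P and PS are 1. */
    if (!cr4_pse || (pde & 0x81u) != 0x81u)
        return 0;
    assert(maxphyaddr >= 32);
    unsigned m = maxphyaddr < 40 ? maxphyaddr : 40;   /* M = min(40, MAXPHYADDR) */
    uint32_t mask = pse36 ? bit_range32(21, m - 19)   /* bits 21:(M-19) */
                          : bit_range32(21, 13);      /* bits 21:13     */
    if (!pat)
        mask |= 1u << 12;                             /* PAT bit reserved */
    return mask;
}

/* Reserved bits of a PTE under 32-bit paging. */
uint32_t pte32_reserved_mask(uint32_t pte, bool cr4_pse, bool pat)
{
    if (!cr4_pse || !(pte & 1u) || pat)
        return 0;
    return 1u << 7;                                   /* bit 7 reserved */
}
```

For example, with PSE-36 supported and MAXPHYADDR = 36, a PDE mapping a 4-MByte page has bits 21:17 reserved (mask 003E0000H), plus bit 12 if the PAT is not supported.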
A reference using a linear address that is successfully translated to a physical
address is performed only if allowed by the access rights of the translation; see
Section 4.6.
Figure 4-4 gives a summary of the formats of CR3 and the paging-structure entries
with 32-bit paging. For the paging-structure entries, it identifies separately the
format of entries that map pages, those that reference other paging structures, and
those that do neither because they are "not present"; bit 0 (P) and bit 7 (PS) are
highlighted because they determine how such an entry is used.
1. See Section 4.1.4 for how to determine whether the PAT is supported.
Figure 4-4. Formats of CR3 and Paging-Structure Entries with 32-Bit Paging
[Figure: bit-field layouts over bits 31:0. The rows of the diagram are listed below with
their fields in order from bit 31 down to bit 0.]
CR3:               Address of page directory [1], Ignored, PCD, PWT, Ignored
PDE: 4MB page:     Bits 31:22 of address of 4MB page frame, Reserved (must be 0),
                   Bits 39:32 of address [2], PAT, Ignored, G, 1 (PS), D, A, PCD, PWT,
                   U/S, R/W, 1 (P)
PDE: page table:   Address of page table, Ignored, 0 (PS), Ign., A, PCD, PWT, U/S, R/W, 1 (P)
PDE: not present:  Ignored, 0 (P)
PTE: 4KB page:     Address of 4KB page frame, Ignored, G, PAT, D, A, PCD, PWT, U/S, R/W, 1 (P)
PTE: not present:  Ignored, 0 (P)
NOTES:
1. CR3 has 64 bits on processors supporting the Intel-64 architecture. These bits are ignored with
32-bit paging.
4.4 PAE PAGING
A logical processor uses PAE paging if CR0.PG = 1, CR4.PAE = 1, and
IA32_EFER.LME = 0. PAE paging translates 32-bit linear addresses to 52-bit physical
addresses.[1] Although 52 bits corresponds to 4 PBytes, linear addresses are limited to
32 bits; at most 4 GBytes of linear-address space may be accessed at any given time.
With PAE paging, a logical processor maintains a set of four (4) PDPTE registers,
which are loaded from an address in CR3. Linear addresses are translated using 4
hierarchies of in-memory paging structures, each located using one of the PDPTE
registers. (This is different from the other paging modes, in which there is one
hierarchy referenced by CR3.)
Section 4.4.1 discusses the PDPTE registers. Section 4.4.2 describes linear-address
translation with PAE paging.
4.4.1 PDPTE Registers
When PAE paging is used, CR3 references the base of a 32-byte page-directory-pointer
table. Table 4-7 illustrates how CR3 is used with PAE paging.
The page-directory-pointer table comprises four (4) 64-bit entries called PDPTEs.
Each PDPTE controls access to a 1-GByte region of the linear-address space.
Corresponding to the PDPTEs, the logical processor maintains a set of four (4) internal,
non-architectural PDPTE registers, called PDPTE0, PDPTE1, PDPTE2, and PDPTE3.
The logical processor loads these registers from the PDPTEs in memory as part of
certain executions of the MOV to CR instruction:
2. (Note to Figure 4-4.) This example illustrates a processor in which MAXPHYADDR is 36. If this
value is larger or smaller, the number of bits reserved in positions 20:13 of a PDE mapping a
4-MByte page will change.
1. If MAXPHYADDR < 52, bits in the range 51:MAXPHYADDR will be 0 in any physical address used
by PAE paging. (The corresponding bits are reserved in the paging-structure entries.) See Section
4.1.4 for how to determine MAXPHYADDR.
Table 4-7. Use of CR3 with PAE Paging
Bit Position(s)   Contents
4:0 Ignored
31:5 Physical address of the 32-Byte aligned page-directory-pointer table used for
linear-address translation
63:32 Ignored (these bits exist only on processors supporting the Intel-64 architecture)
• If PAE paging would be in use following an execution of MOV to CR0 or MOV to
  CR4 (see Section 4.1.1) and the instruction is modifying any of CR0.CD, CR0.NW,
  CR0.PG, CR4.PAE, CR4.PGE, or CR4.PSE, then the PDPTEs are loaded from the
  address in CR3.
• If MOV to CR3 is executed while the logical processor is using PAE paging, the
  PDPTEs are loaded from the address being loaded into CR3.
• If PAE paging is in use and a task switch changes the value of CR3, the PDPTEs
  are loaded from the address in the new CR3 value.
• Certain VMX transitions load the PDPTE registers. See Section 4.11.1.
Unless the caches are disabled, the processor uses the WB memory type to load the
PDPTEs from memory.[1]
Table 4-8 gives the format of a PDPTE. If any of the PDPTEs sets both the P flag
(bit 0) and any reserved bit, the MOV to CR instruction causes a general-protection
exception (#GP(0)) and the PDPTEs are not loaded.[2] As shown in Table 4-8, bits 2:1,
8:5, and 63:MAXPHYADDR are reserved in the PDPTEs.
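The PDPTE check described above can be sketched as follows. The function names are hypothetical, and the sketch models only the architectural rule stated in the text (a PDPTE is rejected when it sets both the P flag and a reserved bit); it does not model processors that also check reserved bits in PDPTEs whose P flag is 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reserved bits of a PAE PDPTE: 2:1, 8:5, and 63:MAXPHYADDR. */
uint64_t pdpte_reserved_mask(unsigned maxphyaddr)
{
    assert(maxphyaddr >= 32 && maxphyaddr <= 52);
    uint64_t high = ~((1ULL << maxphyaddr) - 1);   /* bits 63:M            */
    return high | 0x1E0ULL | 0x6ULL;               /* bits 8:5 and 2:1     */
}

/* True if this PDPTE sets both P (bit 0) and at least one reserved bit. */
bool pdpte_is_invalid(uint64_t pdpte, unsigned maxphyaddr)
{
    return (pdpte & 1u) && (pdpte & pdpte_reserved_mask(maxphyaddr)) != 0;
}

/* True if loading these four PDPTEs would cause #GP(0); in that case
 * none of the PDPTE registers is loaded. */
bool pdpte_load_causes_gp(const uint64_t pdpte[4], unsigned maxphyaddr)
{
    for (int i = 0; i < 4; i++)
        if (pdpte_is_invalid(pdpte[i], maxphyaddr))
            return true;
    return false;
}
```

With MAXPHYADDR = 36, the reserved mask is FFFFFFF0_000001E6H; a PDPTE value of 00001001H passes the check, while 00000007H (P = 1 with bits 2:1 set) does not.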
1. Older IA-32 processors used the UC memory type when loading the PDPTEs. This behavior is
model-specific and not architectural.
Table 4-8. Format of a PAE Page-Directory-Pointer-Table Entry (PDPTE)
Bit Position(s)   Contents
0 (P) Present; must be 1 to reference a page directory
2:1 Reserved (must be 0)
3 (PWT) Page-level write-through; indirectly determines the memory type used to access
the page directory referenced by this entry (see Section 4.9)
4 (PCD) Page-level cache disable; indirectly determines the memory type used to access
the page directory referenced by this entry (see Section 4.9)
8:5 Reserved (must be 0)
11:9 Ignored
(M-1):12 Physical address of 4-KByte aligned page directory referenced by this entry[1]
63:M Reserved (must be 0)
NOTES:
1. M is an abbreviation for MAXPHYADDR, which is at most 52; see Section 4.1.4.
2. On some processors, reserved bits are checked even in PDPTEs in which the P flag (bit 0) is 0.
4.4.2 Linear-Address Translation with PAE Paging
PAE paging may map linear addresses to either 4-KByte pages or 2-MByte pages.
Figure 4-5 illustrates the translation process when it produces a 4-KByte page;
Figure 4-6 covers the case of a 2-MByte page. The following items describe the PAE
paging process in more detail as well as how the page size is determined:
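Figure 4-5. Linear-Address Translation to a 4-KByte Page using PAE Paging
[Figure: the 32-bit linear address is divided into Directory Pointer (bits 31:30), Directory
(bits 29:21), Table (bits 20:12), and Offset (bits 11:0). The Directory Pointer field selects
one of the PDPTE registers, whose value locates the page directory; the Directory field
selects a PDE with PS=0, which locates the page table; the Table field selects a PTE, which
locates the 4-KByte page; the Offset field selects the physical address within the page.]
Figure 4-6. Linear-Address Translation to a 2-MByte Page using PAE Paging
[Figure: the 32-bit linear address is divided into Directory Pointer (bits 31:30), Directory
(bits 29:21), and Offset (bits 20:0). The Directory Pointer field selects one of the PDPTE
registers, whose value locates the page directory; the Directory field selects a PDE with
PS=1, which locates the 2-MByte page; the Offset field selects the physical address within
the page.]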
• Bits 31:30 of the linear address select a PDPTE register (see Section 4.4.1); this
  is PDPTEi, where i is the value of bits 31:30.[1] Because a PDPTE register is
  identified using bits 31:30 of the linear address, it controls access to a 1-GByte
  region of the linear-address space. If the P flag (bit 0) of PDPTEi is 0, the
  processor ignores bits 63:1, and there is no mapping for the 1-GByte region
  controlled by PDPTEi. A reference using a linear address in this region causes a
  page-fault exception (see Section 4.7).
• If the P flag of PDPTEi is 1, a 4-KByte naturally aligned page directory is located at
  the physical address specified in bits 51:12 of PDPTEi (see Table 4-8 in Section
  4.4.1). A page directory comprises 512 64-bit entries (PDEs). A PDE is selected
  using the physical address defined as follows:
    - Bits 51:12 are from PDPTEi.
    - Bits 11:3 are bits 29:21 of the linear address.
    - Bits 2:0 are 0.
Because a PDE is identified using bits 31:21 of the linear address, it controls access
to a 2-MByte region of the linear-address space. Use of the PDE depends on its PS
flag (bit 7):
• If the PDE's PS flag is 1, the PDE maps a 2-MByte page (see Table 4-9). The final
  physical address is computed as follows:
    - Bits 51:21 are from the PDE.
    - Bits 20:0 are from the original linear address.
• If the PDE's PS flag is 0, a 4-KByte naturally aligned page table is located at the
  physical address specified in bits 51:12 of the PDE (see Table 4-10). A page
  table comprises 512 64-bit entries (PTEs). A PTE is selected using the
  physical address defined as follows:
    - Bits 51:12 are from the PDE.
    - Bits 11:3 are bits 20:12 of the linear address.
    - Bits 2:0 are 0.
• Because a PTE is identified using bits 31:12 of the linear address, every PTE maps
  a 4-KByte page (see Table 4-11). The final physical address is computed as
  follows:
    - Bits 51:12 are from the PTE.
    - Bits 11:0 are from the original linear address.
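The PAE translation steps above can be modeled in software along the following lines. This is a sketch under stated assumptions, not a definitive implementation: physical memory is a hypothetical array of 64-bit entries, unbacked addresses read as 0 (not present), the PDPTE registers are passed in as an array of four values, and reserved-bit and access-rights checks are omitted. All names and demo values are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_TRANSLATION UINT64_MAX              /* sentinel: no translation */
#define ADDR_51_12 0x000FFFFFFFFFF000ULL       /* bits 51:12 of an entry   */
#define ADDR_51_21 0x000FFFFFFFE00000ULL       /* bits 51:21 of an entry   */

/* Hypothetical model of physical memory: entry i lives at address 8*i. */
typedef struct {
    const uint64_t *qwords;
    size_t nqwords;
} phys_mem64;

/* Read an aligned 64-bit entry; unbacked addresses read as 0. */
static uint64_t read_entry64(const phys_mem64 *m, uint64_t paddr)
{
    assert((paddr & 7u) == 0);
    size_t i = (size_t)(paddr / 8);
    return i < m->nqwords ? m->qwords[i] : 0;
}

uint64_t translate_pae(const phys_mem64 *m, const uint64_t pdpte_reg[4],
                       uint32_t la)
{
    /* Bits 31:30 select PDPTEi; memory is not accessed for this step. */
    uint64_t pdpte = pdpte_reg[(la >> 30) & 0x3u];
    if (!(pdpte & 1u))
        return NO_TRANSLATION;

    /* PDE address: bits 51:12 from PDPTEi, bits 11:3 = linear bits 29:21. */
    uint64_t pde = read_entry64(m, (pdpte & ADDR_51_12)
                                   | (((la >> 21) & 0x1FFu) << 3));
    if (!(pde & 1u))
        return NO_TRANSLATION;
    if (pde & 0x80u)                            /* PS = 1: 2-MByte page */
        return (pde & ADDR_51_21) | (la & 0x001FFFFFu);

    /* PTE address: bits 51:12 from the PDE, bits 11:3 = linear bits 20:12. */
    uint64_t pte = read_entry64(m, (pde & ADDR_51_12)
                                   | (((la >> 12) & 0x1FFu) << 3));
    if (!(pte & 1u))
        return NO_TRANSLATION;
    return (pte & ADDR_51_12) | (la & 0xFFFu);
}

/* Demo tables (made-up values): page directory at 0000H, page table at 1000H. */
static const uint64_t demo_qwords[2 * 512] = {
    [1]   = 0x0000000040000081ULL,  /* PDE 1: P=1, PS=1, 2-MByte page at 40000000H */
    [4]   = 0x0000000000001001ULL,  /* PDE 4: P=1, references page table at 1000H  */
    [513] = 0x0000000000005001ULL,  /* PTE 1 of that table: 4-KByte page at 5000H  */
};
const phys_mem64 demo_pae_mem = { demo_qwords, 2 * 512 };

/* PDPTE0 and PDPTE3 both reference the page directory at 0000H. */
const uint64_t demo_pdpte_reg[4] = { 0x1ULL, 0, 0, 0x1ULL };
```

With these demo values, linear address C0801234H translates to 5234H through the page table, 00212345H translates to 40012345H through the 2-MByte mapping, and 40000000H has no translation because PDPTE1 is not present.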
If the P flag (bit 0) of a PDE or a PTE is 0 or if a PDE or a PTE sets any reserved bit,
the entry is used neither to reference another paging-structure entry nor to map a
page. A reference using a linear address whose translation would use such a
paging-structure entry causes a page-fault exception (see Section 4.7).
1. With PAE paging, the processor does not use CR3 when translating a linear address (as it does
in the other paging modes). It does not access the PDPTEs in the page-directory-pointer table
during linear-address translation.
Table 4-9. Format of a PAE Page-Directory Entry that Maps a 2-MByte Page
Bit Position(s)   Contents
0 (P) Present; must be 1 to map a 2-MByte page
1 (R/W) Read/write; if 0, writes may not be allowed to the 2-MByte page referenced by
this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S) User/supervisor; if 0, accesses with CPL=3 are not allowed to the 2-MByte page
referenced by this entry (see Section 4.6)
3 (PWT) Page-level write-through; indirectly determines the memory type used to access
the 2-MByte page referenced by this entry (see Section 4.9)
4 (PCD) Page-level cache disable; indirectly determines the memory type used to access
the 2-MByte page referenced by this entry (see Section 4.9)
5 (A) Accessed; indicates whether software has accessed the 2-MByte page referenced
by this entry (see Section 4.8)
6 (D) Dirty; indicates whether software has written to the 2-MByte page referenced by
this entry (see Section 4.8)
7 (PS) Page size; must be 1 (otherwise, this entry references a page table; see
Table 4-10)
8 (G) Global; if CR4.PGE = 1, determines whether the translation is global (see Section
4.10); ignored otherwise
11:9 Ignored
12 (PAT) If the PAT is supported, indirectly determines the memory type used to access the
2-MByte page referenced by this entry (see Section 4.9.2); otherwise, reserved
(must be 0)[1]
20:13 Reserved (must be 0)
(M-1):21 Physical address of the 2-MByte page referenced by this entry
62:M Reserved (must be 0)
63 (XD) If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed
from the 2-MByte page controlled by this entry; see Section 4.6); otherwise,
reserved (must be 0)
NOTES:
1. See Section 4.1.4 for how to determine whether the PAT is supported.
The following bits are reserved with PAE paging:
• If the P flag (bit 0) of a PDE or a PTE is 1, bits 62:MAXPHYADDR are reserved.
• If the P flag and the PS flag (bit 7) of a PDE are both 1, bits 20:13 are reserved.
• If IA32_EFER.NXE = 0 and the P flag of a PDE or a PTE is 1, the XD flag (bit 63)
  is reserved.
• If the PAT is not supported:[1]
    - If the P flag of a PTE is 1, bit 7 is reserved.
Table 4-10. Format of a PAE Page-Directory Entry that References a Page Table
Bit Position(s)   Contents
0 (P) Present; must be 1 to reference a page table
1 (R/W) Read/write; if 0, writes may not be allowed to the 2-MByte region controlled by
this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S) User/supervisor; if 0, accesses with CPL=3 are not allowed to the 2-MByte region
controlled by this entry (see Section 4.6)
3 (PWT) Page-level write-through; indirectly determines the memory type used to access
the page table referenced by this entry (see Section 4.9)
4 (PCD) Page-level cache disable; indirectly determines the memory type used to access
the page table referenced by this entry (see Section 4.9)
5 (A) Accessed; indicates whether this entry has been used for linear-address
translation (see Section 4.8)
6 Ignored
7 (PS) Page size; must be 0 (otherwise, this entry maps a 2-MByte page; see Table 4-9)
11:8 Ignored
(M-1):12 Physical address of 4-KByte aligned page table referenced by this entry
62:M Reserved (must be 0)
63 (XD) If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed
from the 2-MByte region controlled by this entry; see Section 4.6); otherwise,
reserved (must be 0)
1. See Section 4.1.4 for how to determine whether the PAT is supported.
    - If the P flag and the PS flag of a PDE are both 1, bit 12 is reserved.
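The PAE reserved-bit rules listed above can likewise be written as a mask computation. The helper below is a hypothetical sketch: its name and parameters are assumptions made for this example, and the caller supplies MAXPHYADDR, IA32_EFER.NXE, and PAT support. An entry violates the rules if it has any bit of the returned mask set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reserved bits of a PAE PDE (is_pde = true) or PTE (is_pde = false). */
uint64_t pae_reserved_mask(uint64_t entry, bool is_pde, unsigned maxphyaddr,
                           bool nxe, bool pat)
{
    assert(maxphyaddr >= 32 && maxphyaddr <= 52);
    if (!(entry & 1u))
        return 0;                                  /* rules apply only if P = 1 */

    /* Bits 62:MAXPHYADDR. */
    uint64_t mask = ((1ULL << 63) - 1) & ~((1ULL << maxphyaddr) - 1);
    if (!nxe)
        mask |= 1ULL << 63;                        /* XD flag reserved */

    bool maps_2mb = is_pde && (entry & 0x80u);     /* PDE with PS = 1  */
    if (maps_2mb)
        mask |= 0xFFULL << 13;                     /* bits 20:13       */

    if (!pat) {
        if (!is_pde)
            mask |= 1ULL << 7;                     /* PTE bit 7        */
        else if (maps_2mb)
            mask |= 1ULL << 12;                    /* PDE bit 12       */
    }
    return mask;
}
```

For example, with MAXPHYADDR = 36, IA32_EFER.NXE = 1, and the PAT supported, a present PTE has only bits 62:36 reserved (mask 7FFFFFF0_00000000H).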
A reference using a linear address that is successfully translated to a physical
address is performed only if allowed by the access rights of the translation; see
Section 4.6.
Table 4-11. Format of a PAE Page-Table Entry that Maps a 4-KByte Page
Bit Position(s)   Contents
0 (P) Present; must be 1 to map a 4-KByte page
1 (R/W) Read/write; if 0, writes may not be allowed to the 4-KByte page referenced by this
entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S) User/supervisor; if 0, accesses with CPL=3 are not allowed to the 4-KByte page
referenced by this entry (see Section 4.6)
3 (PWT) Page-level write-through; indirectly determines the memory type used to access
the 4-KByte page referenced by this entry (see Section 4.9)
4 (PCD) Page-level cache disable; indirectly determines the memory type used to access
the 4-KByte page referenced by this entry (see Section 4.9)
5 (A) Accessed; indicates whether software has accessed the 4-KByte page referenced
by this entry (see Section 4.8)
6 (D) Dirty; indicates whether software has written to the 4-KByte page referenced by
this entry (see Section 4.8)
7 (PAT) If the PAT is supported, indirectly determines the memory type used to access the
4-KByte page referenced by this entry (see Section 4.9.2); otherwise, reserved
(must be 0)[1]
8 (G) Global; if CR4.PGE = 1, determines whether the translation is global (see Section
4.10); ignored otherwise
11:9 Ignored
(M-1):12 Physical address of the 4-KByte page referenced by this entry
62:M Reserved (must be 0)
63 (XD) If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed
from the 4-KByte page controlled by this entry; see Section 4.6); otherwise,
reserved (must be 0)
NOTES:
1. See Section 4.1.4 for how to determine whether the PAT is supported.
Figure 4-7 gives a summary of the formats of CR3 and the paging-structure entries
with PAE paging. For the paging-structure entries, it identifies separately the format
of entries that map pages, those that reference other paging structures, and those
that do neither because they are "not present"; bit 0 (P) and bit 7 (PS) are
highlighted because they determine how a paging-structure entry is used.
Figure 4-7. Formats of CR3 and Paging-Structure Entries with PAE Paging
[Figure: bit-field layouts over bits 63:0, with column markers at 63, 62:52, 51:M, (M-1):32,
and 31:0, where M is MAXPHYADDR [1]. The rows of the diagram are listed below with their
fields in order from bit 63 down to bit 0.]
CR3:                 Ignored [2], Address of page-directory-pointer table, Ignored
PDPTE: present:      Reserved [3], Address of page directory, Ign., Rsvd., PCD, PWT, Rsvd., 1 (P)
PDPTE: not present:  Ignored, 0 (P)
PDE: 2MB page:       XD [4], Ignored, Rsvd., Address of 2MB page frame, Reserved, PAT, Ign.,
                     G, 1 (PS), D, A, PCD, PWT, U/S, R/W, 1 (P)
PDE: page table:     XD, Ignored, Rsvd., Address of page table, Ign., 0 (PS), Ign., A, PCD,
                     PWT, U/S, R/W, 1 (P)
PDE: not present:    Ignored, 0 (P)
PTE: 4KB page:       XD, Ignored, Rsvd., Address of 4KB page frame, Ign., G, PAT, D, A, PCD,
                     PWT, U/S, R/W, 1 (P)
PTE: not present:    Ignored, 0 (P)
NOTES:
1. M is an abbreviation for MAXPHYADDR.
2. CR3 has 64 bits only on processors supporting the Intel-64 architecture. These bits are ignored with
PAE paging.
3. Reserved fields must be 0.
4. If IA32_EFER.NXE = 0 and the P flag of a PDE or a PTE is 1, the XD flag (bit 63) is reserved.
4.5 IA-32E PAGING
A logical processor uses IA-32e paging if CR0.PG = 1, CR4.PAE = 1, and IA32_EFER.LME = 1. With IA-32e paging, linear addresses are translated using a hierarchy of in-memory paging structures located using the contents of CR3. IA-32e paging translates 48-bit linear addresses to 52-bit physical addresses.¹ Although 52 bits correspond to 4 PBytes, linear addresses are limited to 48 bits; at most 256 TBytes of linear-address space may be accessed at any given time.
1. If MAXPHYADDR < 52, bits in the range 51:MAXPHYADDR will be 0 in any physical address used by IA-32e paging. (The corresponding bits are reserved in the paging-structure entries.) See Section 4.1.4 for how to determine MAXPHYADDR.
IA-32e paging uses a hierarchy of paging structures to produce a translation for a linear address. CR3 is used to locate the first paging structure, the PML4 table. Use of CR3 with IA-32e paging depends on whether process-context identifiers (PCIDs) have been enabled by setting CR4.PCIDE:
• Table 4-12 illustrates how CR3 is used with IA-32e paging if CR4.PCIDE = 0.
• Table 4-13 illustrates how CR3 is used with IA-32e paging if CR4.PCIDE = 1.
After software modifies the value of CR4.PCIDE, the logical processor immediately begins using CR3 as specified for the new value. For example, if software changes CR4.PCIDE from 1 to 0, the current PCID immediately changes from CR3[11:0] to 000H (see also Section 4.10.4.1). In addition, the logical processor subsequently determines the memory type used to access the PML4 table using CR3.PWT and CR3.PCD, which had been bits 4:3 of the PCID.
Table 4-12. Use of CR3 with IA-32e Paging and CR4.PCIDE = 0
Bit Position(s)    Contents
2:0 Ignored
3 (PWT) Page-level write-through; indirectly determines the memory type used to access
the PML4 table during linear-address translation (see Section 4.9.2)
4 (PCD) Page-level cache disable; indirectly determines the memory type used to access
the PML4 table during linear-address translation (see Section 4.9.2)
11:5 Ignored
M-1:12 Physical address of the 4-KByte aligned PML4 table used for linear-address
translation¹
63:M Reserved (must be 0)
NOTES:
1. M is an abbreviation for MAXPHYADDR, which is at most 52; see Section 4.1.4.
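The two CR3 layouts (Table 4-12 for CR4.PCIDE = 0, Table 4-13 for CR4.PCIDE = 1) can be illustrated with a short sketch. The names `Cr3Fields` and `decode_cr3` are hypothetical, chosen only for this example, and the special use of bit 63 of the MOV to CR3 source operand (Section 4.10.4.1) is not modelled.

```python
from typing import NamedTuple, Optional


class Cr3Fields(NamedTuple):
    pml4_base: int        # physical address of the PML4 table (bits M-1:12)
    pwt: Optional[int]    # bit 3; None when CR4.PCIDE = 1
    pcd: Optional[int]    # bit 4; None when CR4.PCIDE = 1
    pcid: int             # current PCID; 000H when CR4.PCIDE = 0


def decode_cr3(cr3: int, pcide: bool, maxphyaddr: int = 52) -> Cr3Fields:
    """Split a CR3 value into its fields for IA-32e paging (illustrative only)."""
    if not 12 < maxphyaddr <= 52:
        raise ValueError("MAXPHYADDR is assumed to lie in 13..52")
    if cr3 >> maxphyaddr:
        raise ValueError("bits 63:M of CR3 are reserved and must be 0")
    pml4_base = cr3 & ~0xFFF  # bits M-1:12 (bits 63:M were checked above)
    if pcide:
        # Bits 11:0 hold the PCID; there are no PWT/PCD flags in this layout.
        return Cr3Fields(pml4_base, None, None, cr3 & 0xFFF)
    # Bits 3 and 4 are PWT and PCD; the current PCID is 000H.
    return Cr3Fields(pml4_base, (cr3 >> 3) & 1, (cr3 >> 4) & 1, 0)


if __name__ == "__main__":
    value = 0x12345018
    print(decode_cr3(value, pcide=True))
    print(decode_cr3(value, pcide=False))
```

Decoding the same value both ways mirrors the example in the text: when CR4.PCIDE changes from 1 to 0, bits 4:3 stop being part of the PCID and are read as PCD and PWT.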
Table 4-13. Use of CR3 with IA-32e Paging and CR4.PCIDE = 1
Bit Position(s)    Contents
11:0 PCID (see Section 4.10.1)¹
M-1:12 Physical address of the 4-KByte aligned PML4 table used for linear-address
translation²
63:M Reserved (must be 0)³
NOTES:
1. Section 4.9.2 explains how the processor determines the memory type used to access the PML4
table during linear-address translation with CR4.PCIDE = 1.
2. M is an abbreviation for MAXPHYADDR, which is at most 52; see Section 4.1.4.
3. See Section 4.10.4.1 for use of bit 63 of the source operand of the MOV to CR3 instruction.
IA-32e paging may map linear addresses to 4-KByte pages, 2-MByte pages, or 1-GByte pages.¹ Figure 4-8 illustrates the translation process when it produces a 4-KByte page; Figure 4-9 covers the case of a 2-MByte page, and Figure 4-10 the case of a 1-GByte page. The following items describe the IA-32e paging process in more detail as well as how the page size is determined:
• A 4-KByte naturally aligned PML4 table is located at the physical address specified in bits 51:12 of CR3 (see Table 4-12). A PML4 table comprises 512 64-bit entries (PML4Es). A PML4E is selected using the physical address defined as follows:
  - Bits 51:12 are from CR3.
  - Bits 11:3 are bits 47:39 of the linear address.
  - Bits 2:0 are all 0.
Because a PML4E is identified using bits 47:39 of the linear address, it controls access to a 512-GByte region of the linear-address space.
• A 4-KByte naturally aligned page-directory-pointer table is located at the physical address specified in bits 51:12 of the PML4E (see Table 4-14). A page-directory-pointer table comprises 512 64-bit entries (PDPTEs). A PDPTE is selected using the physical address defined as follows:
  - Bits 51:12 are from the PML4E.
  - Bits 11:3 are bits 38:30 of the linear address.
  - Bits 2:0 are all 0.
Because a PDPTE is identified using bits 47:30 of the linear address, it controls access to a 1-GByte region of the linear-address space. Use of the PDPTE depends on its PS flag (bit 7):²
• If the PDPTE's PS flag is 1, the PDPTE maps a 1-GByte page (see Table 4-15). The final physical address is computed as follows:
  - Bits 51:30 are from the PDPTE.
  - Bits 29:0 are from the original linear address.
• If the PDPTE's PS flag is 0, a 4-KByte naturally aligned page directory is located at the physical address specified in bits 51:12 of the PDPTE (see Table 4-16). A page directory comprises 512 64-bit entries (PDEs). A PDE is selected using the physical address defined as follows:
  - Bits 51:12 are from the PDPTE.
  - Bits 11:3 are bits 29:21 of the linear address.
  - Bits 2:0 are all 0.
Because a PDE is identified using bits 47:21 of the linear address, it controls access to a 2-MByte region of the linear-address space. Use of the PDE depends on its PS flag:
• If the PDE's PS flag is 1, the PDE maps a 2-MByte page (see Table 4-17). The final physical address is computed as follows:
  - Bits 51:21 are from the PDE.
  - Bits 20:0 are from the original linear address.
• If the PDE's PS flag is 0, a 4-KByte naturally aligned page table is located at the physical address specified in bits 51:12 of the PDE (see Table 4-18). A page table comprises 512 64-bit entries (PTEs). A PTE is selected using the physical address defined as follows:
  - Bits 51:12 are from the PDE.
  - Bits 11:3 are bits 20:12 of the linear address.
  - Bits 2:0 are all 0.
• Because a PTE is identified using bits 47:12 of the linear address, every PTE maps a 4-KByte page (see Table 4-19). The final physical address is computed as follows:
  - Bits 51:12 are from the PTE.
  - Bits 11:0 are from the original linear address.
1. Not all processors support 1-GByte pages; see Section 4.1.4.
2. The PS flag of a PDPTE is reserved and must be 0 (if the P flag is 1) if 1-GByte pages are not supported. See Section 4.1.4 for how to determine whether 1-GByte pages are supported.
Figure 4-8. Linear-Address Translation to a 4-KByte Page using IA-32e Paging
(The figure shows the linear address split into PML4 (bits 47:39), Directory Ptr (bits 38:30), Directory (bits 29:21), Table (bits 20:12), and Offset (bits 11:0); CR3 locates the PML4E, which leads through the PDPTE, a PDE with PS=0, and the PTE to the physical address within the 4-KByte page.)
Figure 4-9. Linear-Address Translation to a 2-MByte Page using IA-32e Paging
(The figure shows the linear address split into PML4 (bits 47:39), Directory Ptr (bits 38:30), Directory (bits 29:21), and Offset (bits 20:0); CR3 locates the PML4E, which leads through the PDPTE and a PDE with PS=1 to the physical address within the 2-MByte page.)
Figure 4-10. Linear-Address Translation to a 1-GByte Page using IA-32e Paging
(The figure shows the linear address split into PML4 (bits 47:39), Directory Ptr (bits 38:30), and Offset (bits 29:0); CR3 locates the PML4E, which leads through a PDPTE with PS=1 to the physical address within the 1-GByte page.)
Table 4-14. Format of an IA-32e PML4 Entry (PML4E) that References a Page-Directory-Pointer Table
Bit Position(s)    Contents
0 (P) Present; must be 1 to reference a page-directory-pointer table
1 (R/W) Read/write; if 0, writes may not be allowed to the 512-GByte region controlled by
this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S) User/supervisor; if 0, accesses with CPL=3 are not allowed to the 512-GByte
region controlled by this entry (see Section 4.6)
3 (PWT) Page-level write-through; indirectly determines the memory type used to access
the page-directory-pointer table referenced by this entry (see Section 4.9.2)
4 (PCD) Page-level cache disable; indirectly determines the memory type used to access
the page-directory-pointer table referenced by this entry (see Section 4.9.2)
5 (A) Accessed; indicates whether this entry has been used for linear-address
translation (see Section 4.8)
6 Ignored
7 (PS) Reserved (must be 0)
11:8 Ignored
M-1:12 Physical address of 4-KByte aligned page-directory-pointer table referenced by
this entry
51:M Reserved (must be 0)
62:52 Ignored
63 (XD) If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed
from the 512-GByte region controlled by this entry; see Section 4.6); otherwise,
reserved (must be 0)
Table 4-15. Format of an IA-32e Page-Directory-Pointer-Table Entry (PDPTE) that Maps a 1-GByte Page
Bit Position(s)    Contents
0 (P) Present; must be 1 to map a 1-GByte page
1 (R/W) Read/write; if 0, writes may not be allowed to the 1-GByte page referenced by this
entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S) User/supervisor; if 0, accesses with CPL=3 are not allowed to the 1-GByte page
referenced by this entry (see Section 4.6)
3 (PWT) Page-level write-through; indirectly determines the memory type used to access
the 1-GByte page referenced by this entry (see Section 4.9.2)
4 (PCD) Page-level cache disable; indirectly determines the memory type used to access
the 1-GByte page referenced by this entry (see Section 4.9.2)
5 (A) Accessed; indicates whether software has accessed the 1-GByte page referenced
by this entry (see Section 4.8)
6 (D) Dirty; indicates whether software has written to the 1-GByte page referenced by
this entry (see Section 4.8)
7 (PS) Page size; must be 1 (otherwise, this entry references a page directory; see
Table 4-16)
8 (G) Global; if CR4.PGE = 1, determines whether the translation is global (see Section
4.10); ignored otherwise
11:9 Ignored
12 (PAT) Indirectly determines the memory type used to access the 1-GByte page
referenced by this entry (see Section 4.9.2)¹
29:13 Reserved (must be 0)
(M-1):30 Physical address of the 1-GByte page referenced by this entry
51:M Reserved (must be 0)
62:52 Ignored
63 (XD) If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed
from the 1-GByte page controlled by this entry; see Section 4.6); otherwise,
reserved (must be 0)
NOTES:
1. The PAT is supported on all processors that support IA-32e paging.
Table 4-16. Format of an IA-32e Page-Directory-Pointer-Table Entry (PDPTE) that References a Page Directory
Bit Position(s)    Contents
0 (P) Present; must be 1 to reference a page directory
1 (R/W) Read/write; if 0, writes may not be allowed to the 1-GByte region controlled by
this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S) User/supervisor; if 0, accesses with CPL=3 are not allowed to the 1-GByte region
controlled by this entry (see Section 4.6)
3 (PWT) Page-level write-through; indirectly determines the memory type used to access
the page directory referenced by this entry (see Section 4.9.2)
4 (PCD) Page-level cache disable; indirectly determines the memory type used to access
the page directory referenced by this entry (see Section 4.9.2)
5 (A) Accessed; indicates whether this entry has been used for linear-address
translation (see Section 4.8)
6 Ignored
7 (PS) Page size; must be 0 (otherwise, this entry maps a 1-GByte page; see Table 4-15)
11:8 Ignored
(M-1):12 Physical address of 4-KByte aligned page directory referenced by this entry
51:M Reserved (must be 0)
62:52 Ignored
63 (XD) If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed
from the 1-GByte region controlled by this entry; see Section 4.6); otherwise,
reserved (must be 0)
Table 4-17. Format of an IA-32e Page-Directory Entry that Maps a 2-MByte Page
Bit Position(s)    Contents
0 (P) Present; must be 1 to map a 2-MByte page
1 (R/W) Read/write; if 0, writes may not be allowed to the 2-MByte page referenced by
this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S) User/supervisor; if 0, accesses with CPL=3 are not allowed to the 2-MByte page
referenced by this entry (see Section 4.6)
3 (PWT) Page-level write-through; indirectly determines the memory type used to access
the 2-MByte page referenced by this entry (see Section 4.9.2)
4 (PCD) Page-level cache disable; indirectly determines the memory type used to access
the 2-MByte page referenced by this entry (see Section 4.9.2)
5 (A) Accessed; indicates whether software has accessed the 2-MByte page referenced
by this entry (see Section 4.8)
6 (D) Dirty; indicates whether software has written to the 2-MByte page referenced by
this entry (see Section 4.8)
7 (PS) Page size; must be 1 (otherwise, this entry references a page table; see
Table 4-18)
8 (G) Global; if CR4.PGE = 1, determines whether the translation is global (see Section
4.10); ignored otherwise
11:9 Ignored
12 (PAT) Indirectly determines the memory type used to access the 2-MByte page
referenced by this entry (see Section 4.9.2)
20:13 Reserved (must be 0)
(M-1):21 Physical address of the 2-MByte page referenced by this entry
51:M Reserved (must be 0)
62:52 Ignored
63 (XD) If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed
from the 2-MByte page controlled by this entry; see Section 4.6); otherwise,
reserved (must be 0)
If a paging-structure entry's P flag (bit 0) is 0 or if the entry sets any reserved bit, the entry is used neither to reference another paging-structure entry nor to map a page. A reference using a linear address whose translation would use such a paging-structure entry causes a page-fault exception (see Section 4.7).
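The translation steps described in this section can be sketched as a small software model. This is only an illustration under stated assumptions: `translate`, `entry_address`, `PageFault`, and the `read_entry` callback (which returns the 64-bit entry stored at a physical address) are hypothetical names, and reserved-bit and access-rights checks are deliberately left out.

```python
P_FLAG = 1 << 0            # bit 0: present
PS_FLAG = 1 << 7           # bit 7: page size
PHYS_MASK = (1 << 52) - 1  # physical-address bits 51:0


class PageFault(Exception):
    """Raised where the text says the access causes a page-fault exception."""


def entry_address(table_base: int, linear: int, shift: int) -> int:
    """Bits 51:12 from the table base, bits 11:3 from nine bits of the
    linear address starting at `shift`, bits 2:0 all zero."""
    base = table_base & PHYS_MASK & ~0xFFF
    index = (linear >> shift) & 0x1FF
    return base | (index << 3)


def translate(cr3: int, linear: int, read_entry) -> int:
    """Walk PML4 -> PDPT -> page directory -> page table for a 48-bit linear address."""
    table = cr3
    for name, shift in (("PML4E", 39), ("PDPTE", 30), ("PDE", 21), ("PTE", 12)):
        entry = read_entry(entry_address(table, linear, shift))
        if not entry & P_FLAG:
            raise PageFault(f"{name} not present")
        # A PTE always maps a page; a PDPTE or PDE maps one when PS = 1.
        if name == "PTE" or (name != "PML4E" and entry & PS_FLAG):
            offset_mask = (1 << shift) - 1
            return (entry & PHYS_MASK & ~offset_mask) | (linear & offset_mask)
        table = entry  # the entry references the next paging structure
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Toy physical memory: one entry per level, all other entries not present.
    memory = {0x1008: 0x2003, 0x2010: 0x3003, 0x3018: 0x4003, 0x4020: 0xABCDE003}
    linear = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x567
    print(hex(translate(0x1000, linear, lambda addr: memory.get(addr, 0))))
```

The `shift` value does double duty: it selects the nine index bits at each level and, when an entry maps a page, gives the width of the page offset (30, 21, or 12 bits for 1-GByte, 2-MByte, and 4-KByte pages).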
Table 4-18. Format of an IA-32e Page-Directory Entry that References a Page Table
Bit Position(s)    Contents
0 (P) Present; must be 1 to reference a page table
1 (R/W) Read/write; if 0, writes may not be allowed to the 2-MByte region controlled by
this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S) User/supervisor; if 0, accesses with CPL=3 are not allowed to the 2-MByte region
controlled by this entry (see Section 4.6)
3 (PWT) Page-level write-through; indirectly determines the memory type used to access
the page table referenced by this entry (see Section 4.9.2)
4 (PCD) Page-level cache disable; indirectly determines the memory type used to access
the page table referenced by this entry (see Section 4.9.2)
5 (A) Accessed; indicates whether this entry has been used for linear-address
translation (see Section 4.8)
6 Ignored
7 (PS) Page size; must be 0 (otherwise, this entry maps a 2-MByte page; see Table 4-17)
11:8 Ignored
(M-1):12 Physical address of 4-KByte aligned page table referenced by this entry
51:M Reserved (must be 0)
62:52 Ignored
63 (XD) If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed
from the 2-MByte region controlled by this entry; see Section 4.6); otherwise,
reserved (must be 0)
Table 4-19. Format of an IA-32e Page-Table Entry that Maps a 4-KByte Page
Bit Position(s)    Contents
0 (P) Present; must be 1 to map a 4-KByte page
1 (R/W) Read/write; if 0, writes may not be allowed to the 4-KByte page referenced by this
entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S) User/supervisor; if 0, accesses with CPL=3 are not allowed to the 4-KByte page
referenced by this entry (see Section 4.6)
3 (PWT) Page-level write-through; indirectly determines the memory type used to access
the 4-KByte page referenced by this entry (see Section 4.9.2)
4 (PCD) Page-level cache disable; indirectly determines the memory type used to access
the 4-KByte page referenced by this entry (see Section 4.9.2)
5 (A) Accessed; indicates whether software has accessed the 4-KByte page referenced
by this entry (see Section 4.8)
6 (D) Dirty; indicates whether software has written to the 4-KByte page referenced by
this entry (see Section 4.8)
7 (PAT) Indirectly determines the memory type used to access the 4-KByte page
referenced by this entry (see Section 4.9.2)
8 (G) Global; if CR4.PGE = 1, determines whether the translation is global (see Section
4.10); ignored otherwise
11:9 Ignored
(M-1):12 Physical address of the 4-KByte page referenced by this entry
51:M Reserved (must be 0)
62:52 Ignored
63 (XD) If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed
from the 4-KByte page controlled by this entry; see Section 4.6); otherwise,
reserved (must be 0)
The following bits are reserved with IA-32e paging:
• If the P flag of a paging-structure entry is 1, bits 51:MAXPHYADDR are reserved.
• If the P flag of a PML4E is 1, the PS flag is reserved.
• If 1-GByte pages are not supported and the P flag of a PDPTE is 1, the PS flag is reserved.¹
• If the P flag and the PS flag of a PDPTE are both 1, bits 29:13 are reserved.
• If the P flag and the PS flag of a PDE are both 1, bits 20:13 are reserved.
• If IA32_EFER.NXE = 0 and the P flag of a paging-structure entry is 1, the XD flag (bit 63) is reserved.
1. See Section 4.1.4 for how to determine whether 1-GByte pages are supported.
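The reserved-bit rules listed above can be collected into a single mask per entry. The following is a sketch only; `reserved_mask`, `has_reserved_bit_set`, and the `gbyte_pages` parameter are hypothetical names introduced for illustration, and the level is passed as a plain string.

```python
P_FLAG = 1 << 0
PS_FLAG = 1 << 7
XD_FLAG = 1 << 63


def bits(high: int, low: int) -> int:
    """Mask with bits high:low set (empty when high < low)."""
    return ((1 << max(high - low + 1, 0)) - 1) << low


def reserved_mask(level: str, entry: int, maxphyaddr: int, nxe: bool,
                  gbyte_pages: bool = True) -> int:
    """Bits that are reserved in `entry`; level is 'PML4E', 'PDPTE', 'PDE' or 'PTE'."""
    if not entry & P_FLAG:
        return 0  # every rule above applies only when the P flag is 1
    mask = bits(51, maxphyaddr)              # bits 51:MAXPHYADDR
    if level == "PML4E":
        mask |= PS_FLAG
    if level == "PDPTE" and not gbyte_pages:
        mask |= PS_FLAG
    if level == "PDPTE" and entry & PS_FLAG:
        mask |= bits(29, 13)
    if level == "PDE" and entry & PS_FLAG:
        mask |= bits(20, 13)
    if not nxe:
        mask |= XD_FLAG
    return mask


def has_reserved_bit_set(level: str, entry: int, maxphyaddr: int, nxe: bool,
                         gbyte_pages: bool = True) -> bool:
    return bool(entry & reserved_mask(level, entry, maxphyaddr, nxe, gbyte_pages))


if __name__ == "__main__":
    # A 2-MByte PDE with bit 13 set, MAXPHYADDR = 40, IA32_EFER.NXE = 1.
    print(has_reserved_bit_set("PDE", 0x83 | (1 << 13), 40, nxe=True))
```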
A reference using a linear address that is successfully translated to a physical address is performed only if allowed by the access rights of the translation; see Section 4.6.
Figure 4-11 gives a summary of the formats of CR3 and the IA-32e paging-structure entries. For the paging-structure entries, it identifies separately the format of entries that map pages, those that reference other paging structures, and those that do neither because they are "not present"; bit 0 (P) and bit 7 (PS) are highlighted because they determine how a paging-structure entry is used.
Figure 4-11. Formats of CR3 and Paging-Structure Entries with IA-32e Paging
(The figure lays out, bit by bit from 63 down to 0, the fields of CR3 and of each IA-32e paging-structure entry: PML4E, PDPTE mapping a 1-GByte page, PDPTE referencing a page directory, PDE mapping a 2-MByte page, PDE referencing a page table, and PTE mapping a 4-KByte page, each in its present and not-present form. Tables 4-12 through 4-19 give the same fields in tabular form.)
NOTES:
1. M is an abbreviation for MAXPHYADDR.
2. Reserved fields must be 0.
3. If IA32_EFER.NXE = 0 and the P flag of a paging-structure entry is 1, the XD flag (bit 63) is reserved.
4.6 ACCESS RIGHTS
There is a translation for a linear address if the process described in Section 4.3, Section 4.4.2, or Section 4.5 (depending upon the paging mode) completes and produces a physical address. The accesses permitted by a translation are determined by the access rights specified by the paging-structure entries controlling the translation.¹ The following items detail how paging determines access rights:
• For accesses in supervisor mode (CPL < 3):
  - Data reads.
    Data may be read from any linear address with a valid translation.
  - Data writes.
    If CR0.WP = 0, data may be written to any linear address with a valid translation.
    If CR0.WP = 1, data may be written to any linear address with a valid translation for which the R/W flag (bit 1) is 1 in every paging-structure entry controlling the translation.
  - Instruction fetches.
    For 32-bit paging or if IA32_EFER.NXE = 0, instructions may be fetched from any linear address with a valid translation.
    For PAE paging or IA-32e paging with IA32_EFER.NXE = 1, instructions may be fetched from any linear address with a valid translation for which the XD flag (bit 63) is 0 in every paging-structure entry controlling the translation.
• For accesses in user mode (CPL = 3):
  - Data reads.
    Data may be read from any linear address with a valid translation for which the U/S flag (bit 2) is 1 in every paging-structure entry controlling the translation.
  - Data writes.
    Data may be written to any linear address with a valid translation for which both the R/W flag and the U/S flag are 1 in every paging-structure entry controlling the translation.
  - Instruction fetches.
    For 32-bit paging or if IA32_EFER.NXE = 0, instructions may be fetched from any linear address with a valid translation for which the U/S flag is 1 in every paging-structure entry controlling the translation.
    For PAE paging or IA-32e paging with IA32_EFER.NXE = 1, instructions may be fetched from any linear address with a valid translation for which the U/S flag is 1 and the XD flag is 0 in every paging-structure entry controlling the translation.
1. With PAE paging, the PDPTEs do not determine access rights.
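The access-rights rules above reduce to a few checks over the entries controlling a translation. The sketch below assumes a valid translation already exists; `access_allowed` and its parameters are hypothetical names, and with PAE paging the caller is assumed to leave the PDPTE out of `entries`, since PDPTEs do not determine access rights in that mode.

```python
RW_FLAG = 1 << 1   # bit 1: read/write
US_FLAG = 1 << 2   # bit 2: user/supervisor
XD_FLAG = 1 << 63  # bit 63: execute-disable


def access_allowed(entries, *, cpl: int, kind: str, wp: bool, nxe: bool,
                   paging: str = "ia32e") -> bool:
    """Decide whether an access is permitted by a valid translation.

    entries: paging-structure entries controlling the translation
    kind:    'read', 'write' or 'fetch'
    wp, nxe: values of CR0.WP and IA32_EFER.NXE
    paging:  '32bit', 'pae' or 'ia32e'
    """
    every_rw = all(e & RW_FLAG for e in entries)
    every_us = all(e & US_FLAG for e in entries)
    # XD is consulted only for PAE or IA-32e paging with IA32_EFER.NXE = 1.
    xd_applies = paging != "32bit" and nxe
    fetch_ok = not xd_applies or not any(e & XD_FLAG for e in entries)
    user = cpl == 3
    if kind == "read":
        return every_us if user else True
    if kind == "write":
        if user:
            return every_rw and every_us
        return every_rw if wp else True
    if kind == "fetch":
        return (every_us and fetch_ok) if user else fetch_ok
    raise ValueError(f"unknown access kind: {kind}")


if __name__ == "__main__":
    # User-accessible translation whose final entry is read-only.
    entries = [0x7, 0x7, 0x7, 0x5]
    print(access_allowed(entries, cpl=3, kind="write", wp=True, nxe=True))
    print(access_allowed(entries, cpl=0, kind="write", wp=False, nxe=True))
```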
A processor may cache information from the paging-structure entries in TLBs and paging-structure caches (see Section 4.10). These structures may include information about access rights. The processor may enforce access rights based on the TLBs and paging-structure caches instead of on the paging structures in memory.
This fact implies that, if software modifies a paging-structure entry to change access rights, the processor might not use that change for a subsequent access to an affected linear address (see Section 4.10.4.3). See Section 4.10.4.2 for how software can ensure that the processor uses the modified access rights.
4.7 PAGE-FAULT EXCEPTIONS
Accesses using linear addresses may cause page-fault exceptions (#PF; exception 14). An access to a linear address may cause a page-fault exception for either of two reasons: (1) there is no valid translation for the linear address; or (2) there is a valid translation for the linear address, but its access rights do not permit the access.
As noted in Section 4.3, Section 4.4.2, and Section 4.5, there is no valid translation for a linear address if the translation process for that address would use a paging-structure entry in which the P flag (bit 0) is 0 or one that sets a reserved bit. If there is a valid translation for a linear address, its access rights are determined as specified in Section 4.6.
Figure 4-12 illustrates the error code that the processor provides on delivery of a page-fault exception. The following items explain how the bits in the error code describe the nature of the page-fault exception:
• P flag (bit 0).
  This flag is 0 if there is no valid translation for the linear address because the P flag was 0 in one of the paging-structure entries used to translate that address.
• W/R (bit 1).
  If the access causing the page-fault exception was a write, this flag is 1; otherwise, it is 0. This flag describes the access causing the page-fault exception, not the access rights specified by paging.
• U/S (bit 2).
  If a user-mode (CPL = 3) access caused the page-fault exception, this flag is 1; it is 0 if a supervisor-mode (CPL < 3) access did so. This flag describes the access causing the page-fault exception, not the access rights specified by paging.
• RSVD flag (bit 3).
  This flag is 1 if there is no valid translation for the linear address because a reserved bit was set in one of the paging-structure entries used to translate that address. (Because reserved bits are not checked in a paging-structure entry whose P flag is 0, bit 3 of the error code can be set only if bit 0 is also set.)
  Bits reserved in the paging-structure entries are reserved for future functionality. Software developers should be aware that such bits may be used in the future and that a paging-structure entry that causes a page-fault exception on one processor might not do so in the future.
• I/D flag (bit 4).
  Use of this flag depends on the settings of CR4.PAE and IA32_EFER.NXE:
  - CR4.PAE = 0 (32-bit paging is in use) or IA32_EFER.NXE = 0.
    This flag is 0.
  - CR4.PAE = 1 (either PAE paging or IA-32e paging is in use) and IA32_EFER.NXE = 1.
    If the access causing the page-fault exception was an instruction fetch, this flag is 1; otherwise, it is 0. This flag describes the access causing the page-fault exception, not the access rights specified by paging.
Figure 4-12. Page-Fault Error Code
Bit 0 (P)       0  The fault was caused by a non-present page.
                1  The fault was caused by a page-level protection violation.
Bit 1 (W/R)     0  The access causing the fault was a read.
                1  The access causing the fault was a write.
Bit 2 (U/S)     0  The access causing the fault originated when the processor was executing in supervisor mode (CPL < 3).
                1  The access causing the fault originated when the processor was executing in user mode (CPL = 3).
Bit 3 (RSVD)    0  The fault was not caused by reserved bit violation.
                1  The fault was caused by a reserved bit set to 1 in some paging-structure entry.
Bit 4 (I/D)     0  The fault was not caused by an instruction fetch.
                1  The fault was caused by an instruction fetch.
Bits 31:5       Reserved
Page-fault exceptions occur only due to an attempt to use a linear address. Failures to load the PDPTE registers with PAE paging (see Section 4.4.1) cause general-protection exceptions (#GP(0)) and not page-fault exceptions.
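A fault handler typically begins by splitting the error code into the five flags described above. The following sketch shows one way to do that; `PageFaultErrorCode`, `decode_page_fault_error_code`, and `describe` are hypothetical names, not part of any existing interface.

```python
from typing import NamedTuple


class PageFaultErrorCode(NamedTuple):
    protection_violation: bool  # bit 0 (P): False means a non-present page
    write: bool                 # bit 1 (W/R)
    user_mode: bool             # bit 2 (U/S)
    reserved_bit: bool          # bit 3 (RSVD)
    instruction_fetch: bool     # bit 4 (I/D)


def decode_page_fault_error_code(code: int) -> PageFaultErrorCode:
    """Extract bits 0 through 4; bits 31:5 are reserved and ignored here."""
    return PageFaultErrorCode(*(bool((code >> bit) & 1) for bit in range(5)))


def describe(code: int) -> str:
    """Render the error code as a short human-readable sentence."""
    f = decode_page_fault_error_code(code)
    cause = "page-level protection violation" if f.protection_violation else "non-present page"
    if f.reserved_bit:
        cause = "reserved bit set in a paging-structure entry"
    access = "instruction fetch" if f.instruction_fetch else ("write" if f.write else "read")
    mode = "user" if f.user_mode else "supervisor"
    return f"{mode}-mode {access}: {cause}"


if __name__ == "__main__":
    print(describe(0b00110))
```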
4.8 ACCESSED AND DIRTY FLAGS
For any paging-structure entry that is used during linear-address translation, bit 5 is the accessed flag.¹ For paging-structure entries that map a page (as opposed to referencing another paging structure), bit 6 is the dirty flag. These flags are provided for use by memory-management software to manage the transfer of pages and paging structures into and out of physical memory.
Whenever the processor uses a paging-structure entry as part of linear-address translation, it sets the accessed flag in that entry (if it is not already set).
Whenever there is a write to a linear address, the processor sets the dirty flag (if it is not already set) in the paging-structure entry that identifies the final physical address for the linear address (either a PTE or a paging-structure entry in which the PS flag is 1).
Memory-management software may clear these flags when a page or a paging structure is initially loaded into physical memory. These flags are "sticky," meaning that, once set, the processor does not clear them; only software can clear them.
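The flag updates described above can be modelled as follows. This is a sketch under simplifying assumptions: `mark_used` and `clear_flags` are hypothetical names, the entries are plain integers listed in walk order with the page-mapping entry last, and caching effects in TLBs are ignored.

```python
ACCESSED = 1 << 5  # bit 5
DIRTY = 1 << 6     # bit 6


def mark_used(entries, is_write: bool):
    """Return the entries as the processor would leave them after one access.

    Every entry used in the translation gets its accessed flag set; on a
    write, the entry that maps the page (the last one) also gets its dirty flag.
    """
    updated = [entry | ACCESSED for entry in entries]
    if is_write and updated:
        updated[-1] |= DIRTY
    return updated


def clear_flags(entry: int) -> int:
    """What memory-management software might do when (re)loading a page."""
    return entry & ~(ACCESSED | DIRTY)


if __name__ == "__main__":
    walk = [0x3, 0x3, 0x3, 0x3]
    print([hex(e) for e in mark_used(walk, is_write=True)])
```

Because the model only ever ORs bits in, applying `mark_used` repeatedly never clears a flag, which matches the sticky behavior described in the text.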
A processor may cache information from the paging-structure entries in TLBs and paging-structure caches (see Section 4.10). This fact implies that, if software changes an accessed flag or a dirty flag from 1 to 0, the processor might not set the corresponding bit in memory on a subsequent access using an affected linear address (see Section 4.10.4.3). See Section 4.10.4.2 for how software can ensure that these bits are updated as desired.
NOTE
The accesses used by the processor to set these flags may or may not be exposed to the processor's self-modifying code detection logic. If the processor is executing code from the same memory area that is being used for the paging structures, the setting of these flags may or may not result in an immediate change to the executing code stream.
1. With PAE paging, the PDPTEs are not used during linear-address translation but only to load the PDPTE registers for some executions of the MOV CR instruction (see Section 4.4.1). For this reason, the PDPTEs do not contain accessed flags with PAE paging.
4.9 PAGING AND MEMORY TYPING
The memory type of a memory access refers to the type of caching used for that access. Chapter 11, "Memory Cache Control," provides many details regarding memory typing in the Intel-64 and IA-32 architectures. This section describes how paging contributes to the determination of memory typing.
The way in which paging contributes to memory typing depends on whether the processor supports the Page Attribute Table (PAT; see Section 11.12).¹ Section 4.9.1 and Section 4.9.2 explain how paging contributes to memory typing depending on whether the PAT is supported.
4.9.1 Paging and Memory Typing When the PAT is Not Supported
(Pentium Pro and Pentium II Processors)
NOTE
The PAT is supported on all processors that support IA-32e paging. Thus, this section applies only to 32-bit paging and PAE paging.
If the PAT is not supported, paging contributes to memory typing in conjunction with the memory-type range registers (MTRRs) as specified in Table 11-6 in Section 11.5.2.1.
For any access to a physical address, the table combines the memory type specified for that physical address by the MTRRs with a PCD value and a PWT value. The latter two values are determined as follows:
• For an access to a PDE with 32-bit paging, the PCD and PWT values come from CR3.
• For an access to a PDE with PAE paging, the PCD and PWT values come from the relevant PDPTE register.
• For an access to a PTE, the PCD and PWT values come from the relevant PDE.
• For an access to the physical address that is the translation of a linear address, the PCD and PWT values come from the relevant PTE (if the translation uses a 4-KByte page) or the relevant PDE (otherwise).
4.9.2 Paging and Memory Typing When the PAT is Supported
(Pentium III and More Recent Processor Families)
If the PAT is supported, paging contributes to memory typing in conjunction with the PAT and the memory-type range registers (MTRRs) as specified in Table 11-7 in Section 11.5.2.2.
1. The PAT is supported on Pentium III and more recent processor families. See Section 4.1.4 for
how to determine whether the PAT is supported.
The PAT is a 64- bit MSR ( I A32_PAT; MSR index 277H) comprising eight ( 8) 8- bit
ent ries ( ent ry i comprises bit s 8i+ 7: 8i of t he MSR) .
For any access t o a physical address, t he t able combines t he memory t ype specified
for t hat physical address by t he MTRRs wit h a memory t ype select ed from t he PAT.
Table 11- 11 in Sect ion 11. 12.3 specifies how a memory t ype is select ed from t he PAT.
Specifically, it comes from ent ry i of t he PAT, where i is defined as follows:
• For an access to an entry in a paging structure whose address is in CR3 (e.g., the
  PML4 table with IA-32e paging):
  - For IA-32e paging with CR4.PCIDE = 1, i = 0.
  - Otherwise, i = 2*PCD+PWT, where the PCD and PWT values come from CR3.
• For an access to a PDE with PAE paging, i = 2*PCD+PWT, where the PCD and PWT
  values come from the relevant PDPTE register.
• For an access to a paging-structure entry X whose address is in another paging-
  structure entry Y, i = 2*PCD+PWT, where the PCD and PWT values come from Y.
• For an access to the physical address that is the translation of a linear address,
  i = 4*PAT+2*PCD+PWT, where the PAT, PCD, and PWT values come from the
  relevant PTE (if the translation uses a 4-KByte page), the relevant PDE (if the
  translation uses a 2-MByte page or a 4-MByte page), or the relevant PDPTE (if
  the translation uses a 1-GByte page).
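The index rules above can be sketched in a few lines of Python. This is an illustrative model only, not processor behavior: the function names are hypothetical, and the sketch assumes that CR3.PWT and CR3.PCD occupy bits 3 and 4 of CR3. The sample IA32_PAT value in the usage note is likewise only an illustration.

```python
def pat_entry(ia32_pat: int, i: int) -> int:
    """Return PAT entry i, i.e. bits 8i+7:8i of the 64-bit IA32_PAT value."""
    if not 0 <= i <= 7:
        raise ValueError("PAT index must be in 0..7")
    return (ia32_pat >> (8 * i)) & 0xFF


def pat_index(pcd: int, pwt: int, pat: int = 0) -> int:
    """Compute i = 4*PAT + 2*PCD + PWT (pat=0 covers the 2*PCD+PWT cases)."""
    return 4 * pat + 2 * pcd + pwt


def pat_index_for_cr3_structure(cr3: int, ia32e_paging: bool, pcide: bool) -> int:
    """Index for an access to the paging structure whose address is in CR3."""
    if ia32e_paging and pcide:
        return 0  # IA-32e paging with CR4.PCIDE = 1: i = 0
    pwt = (cr3 >> 3) & 1  # assumed position of CR3.PWT
    pcd = (cr3 >> 4) & 1  # assumed position of CR3.PCD
    return pat_index(pcd, pwt)
```

For example, with an illustrative IA32_PAT value of 0007040600070406H, `pat_entry(0x0007040600070406, pat_index(pcd=1, pwt=0))` selects entry 2, whose encoding is 07H.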
4.9.3 Caching Paging-Related Information about Memory Typing
A processor may cache information from the paging-structure entries in TLBs and
paging-structure caches (see Section 4.10). These structures may include information
about memory typing. The processor may use memory-typing information from the
TLBs and paging-structure caches instead of from the paging structures in memory.
This fact implies that, if software modifies a paging-structure entry to change the
memory-typing bits, the processor might not use that change for a subsequent
translation using that entry or for access to an affected linear address. See Section
4.10.4.2 for how software can ensure that the processor uses the modified memory
typing.
4.10 CACHING TRANSLATION INFORMATION
The Intel-64 and IA-32 architectures may accelerate the address-translation process
by caching data from the paging structures on the processor. Because the processor
does not ensure that the data that it caches are always consistent with the structures
in memory, it is important for software developers to understand how and when the
processor may cache such data. They should also understand what actions software
can take to remove cached data that may be inconsistent and when it should do so.
This section provides software developers information about the relevant processor
operation.
Section 4.10.1 introduces process-context identifiers (PCIDs), which a logical
processor may use to distinguish information cached for different linear-address
spaces. Section 4.10.2 and Section 4.10.3 describe how the processor may cache
information in translation lookaside buffers (TLBs) and paging-structure caches,
respectively. Section 4.10.4 explains how software can remove inconsistent cached
information by invalidating portions of the TLBs and paging-structure caches. Section
4.10.5 describes special considerations for multiprocessor systems.
4.10.1 Process-Context Identifiers (PCIDs)
Process-context identifiers (PCIDs) are a facility by which a logical processor may
cache information for multiple linear-address spaces. The processor may retain
cached information when software switches to a different linear-address space with a
different PCID (e.g., by loading CR3; see Section 4.10.4.1 for details).
A PCID is a 12-bit identifier. Non-zero PCIDs are enabled by setting the PCIDE flag
(bit 17) of CR4. If CR4.PCIDE = 0, the current PCID is always 000H; otherwise, the
current PCID is the value of bits 11:0 of CR3. Not all processors allow CR4.PCIDE to
be set to 1; see Section 4.1.4 for how to determine whether this is allowed.
The processor ensures that CR4.PCIDE can be 1 only in IA-32e mode (thus, 32-bit
paging and PAE paging use only PCID 000H). In addition, software can change
CR4.PCIDE from 0 to 1 only if CR3[11:0] = 000H. These requirements are enforced
by the following limitations on the MOV CR instruction:
• MOV to CR4 causes a general-protection exception (#GP) if it would change
  CR4.PCIDE from 0 to 1 and either IA32_EFER.LMA = 0 or CR3[11:0] ≠ 000H.
• MOV to CR0 causes a general-protection exception if it would clear CR0.PG to 0
  while CR4.PCIDE = 1.
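The PCID rules above can be modeled with a short Python sketch. It covers only the PCID-related checks named in this section (MOV to CR0 and MOV to CR4 have other fault conditions that are not modeled), the function names are hypothetical, and CR0.PG is assumed to be bit 31 of CR0.

```python
CR0_PG = 1 << 31      # assumed position of the CR0.PG flag
CR4_PCIDE = 1 << 17   # CR4.PCIDE is bit 17 of CR4


def current_pcid(cr3: int, cr4: int) -> int:
    """Current PCID: CR3[11:0] if CR4.PCIDE = 1, otherwise always 000H."""
    return cr3 & 0xFFF if cr4 & CR4_PCIDE else 0x000


def mov_to_cr4_raises_gp(old_cr4: int, new_cr4: int, cr3: int, efer_lma: bool) -> bool:
    """True if the PCID-related check on MOV to CR4 would cause #GP."""
    setting_pcide = not (old_cr4 & CR4_PCIDE) and bool(new_cr4 & CR4_PCIDE)
    return setting_pcide and (not efer_lma or (cr3 & 0xFFF) != 0)


def mov_to_cr0_raises_gp(old_cr0: int, new_cr0: int, cr4: int) -> bool:
    """True if MOV to CR0 would clear CR0.PG while CR4.PCIDE = 1."""
    clearing_pg = bool(old_cr0 & CR0_PG) and not (new_cr0 & CR0_PG)
    return clearing_pg and bool(cr4 & CR4_PCIDE)
```

With these definitions, attempting to set CR4.PCIDE while CR3[11:0] holds a non-zero value is reported as faulting, which is why software would clear those bits of CR3 before enabling PCIDs.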
When a logical processor creates entries in the TLBs (Section 4.10.2) and paging-
structure caches (Section 4.10.3), it associates those entries with the current PCID.
When using entries in the TLBs and paging-structure caches to translate a linear
address, a logical processor uses only those entries associated with the current PCID
(see Section 4.10.2.4 for an exception).
If CR4.PCIDE = 0, a logical processor does not cache information for any PCID other
than 000H. This is because (1) if CR4.PCIDE = 0, the logical processor will associate
any newly cached information with the current PCID, 000H; and (2) if MOV to CR4
clears CR4.PCIDE, all cached information is invalidated (see Section 4.10.4.1).
NOTE
In revisions of this manual that were produced when no processors
allowed CR4.PCIDE to be set to 1, Section 4.10 discussed the caching
of translation information without any reference to PCIDs. While the
section now refers to PCIDs in its specification of this caching, this
documentation change is not intended to imply any change to the
behavior of processors that do not allow CR4.PCIDE to be set to 1.
4.10.2 Translation Lookaside Buffers (TLBs)
A processor may cache information about the translation of linear addresses in
translation lookaside buffers (TLBs). In general, TLBs contain entries that map page
numbers to page frames; these terms are defined in Section 4.10.2.1. Section
4.10.2.2 describes how information may be cached in TLBs, and Section 4.10.2.3
gives details of TLB usage. Section 4.10.2.4 explains the global-page feature, which
allows software to indicate that certain translations should receive special treatment
when cached in the TLBs.
4.10.2.1 Page Numbers, Page Frames, and Page Offsets
Section 4.3, Section 4.4.2, and Section 4.5 give details of how the different paging
modes translate linear addresses to physical addresses. Specifically, the upper bits of
a linear address (called the page number) determine the upper bits of the physical
address (called the page frame); the lower bits of the linear address (called the
page offset) determine the lower bits of the physical address. The boundary
between the page number and the page offset is determined by the page size.
Specifically:
• 32-bit paging:
  - If the translation does not use a PTE (because CR4.PSE = 1 and the PS flag is
    1 in the PDE used), the page size is 4 MBytes and the page number comprises
    bits 31:22 of the linear address.
  - If the translation does use a PTE, the page size is 4 KBytes and the page
    number comprises bits 31:12 of the linear address.
• PAE paging:
  - If the translation does not use a PTE (because the PS flag is 1 in the PDE
    used), the page size is 2 MBytes and the page number comprises bits 31:21
    of the linear address.
  - If the translation does use a PTE, the page size is 4 KBytes and the page
    number comprises bits 31:12 of the linear address.
• IA-32e paging:
  - If the translation does not use a PDE (because the PS flag is 1 in the PDPTE
    used), the page size is 1 GByte and the page number comprises bits 47:30
    of the linear address.
  - If the translation does use a PDE but does not use a PTE (because the PS flag
    is 1 in the PDE used), the page size is 2 MBytes and the page number
    comprises bits 47:21 of the linear address.
  - If the translation does use a PTE, the page size is 4 KBytes and the page
    number comprises bits 47:12 of the linear address.
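As a worked illustration of the boundaries listed above, the following Python sketch splits a linear address into its page number and page offset. The table names and the function name are hypothetical; the bit positions are taken directly from the list.

```python
# Number of page-offset bits for each page size.
PAGE_SHIFT = {"4K": 12, "2M": 21, "4M": 22, "1G": 30}
# Width of the translated linear address in each paging mode.
LINEAR_BITS = {"32-bit": 32, "PAE": 32, "IA-32e": 48}
# Page sizes that each paging mode can produce.
VALID_SIZES = {
    "32-bit": {"4K", "4M"},
    "PAE": {"4K", "2M"},
    "IA-32e": {"4K", "2M", "1G"},
}


def split_linear_address(addr: int, mode: str, page_size: str):
    """Return (page_number, page_offset) for a linear address."""
    if page_size not in VALID_SIZES[mode]:
        raise ValueError(f"{mode} paging does not use {page_size} pages")
    shift = PAGE_SHIFT[page_size]
    addr &= (1 << LINEAR_BITS[mode]) - 1      # keep only the translated bits
    return addr >> shift, addr & ((1 << shift) - 1)
```

For the linear address 12345678H under 32-bit paging, a 4-KByte page gives page number 12345H and offset 678H, while a 4-MByte page gives page number 48H and offset 345678H.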
4.10.2.2 Caching Translations in TLBs
The processor may accelerate the paging process by caching individual translations
in translation lookaside buffers (TLBs). Each entry in a TLB is an individual
translation. Each translation is referenced by a page number. It contains the following
information from the paging-structure entries used to translate linear addresses with
the page number:
• The physical address corresponding to the page number (the page frame).
• The access rights from the paging-structure entries used to translate linear
  addresses with the page number (see Section 4.6):
  - The logical-AND of the R/W flags.
  - The logical-AND of the U/S flags.
  - The logical-OR of the XD flags (necessary only if IA32_EFER.NXE = 1).
• Attributes from a paging-structure entry that identifies the final page frame for
  the page number (either a PTE or a paging-structure entry in which the PS flag is
  1):
  - The dirty flag (see Section 4.8).
  - The memory type (see Section 4.9).
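The way access rights are combined across the paging-structure entries of one translation can be sketched as follows. The `EntryFlags` type and the function name are hypothetical, and the sketch models only the three combined flags listed above, not a full TLB entry.

```python
from typing import Iterable, NamedTuple, Tuple


class EntryFlags(NamedTuple):
    """Access-right flags of one paging-structure entry used in a translation."""
    rw: int  # R/W flag
    us: int  # U/S flag
    xd: int  # XD flag


def combined_access_rights(entries: Iterable[EntryFlags], nxe: bool = True) -> Tuple[int, int, int]:
    """Return (R/W, U/S, XD) as a TLB entry could record them."""
    entries = list(entries)
    rw = int(all(e.rw for e in entries))          # logical-AND of the R/W flags
    us = int(all(e.us for e in entries))          # logical-AND of the U/S flags
    xd = int(nxe and any(e.xd for e in entries))  # logical-OR of XD, only if NXE = 1
    return rw, us, xd
```

A single entry with R/W = 0 anywhere in the walk is therefore enough to make the cached translation read-only, and a single entry with XD = 1 is enough to make it non-executable when IA32_EFER.NXE = 1.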
(TLB entries may contain other information as well. A processor may implement
multiple TLBs, and some of these may be for special purposes, e.g., only for
instruction fetches. Such special-purpose TLBs may not contain some of this information
if it is not necessary. For example, a TLB used only for instruction fetches need not
contain information about the R/W and dirty flags.)
As noted in Section 4.10.1, any TLB entries created by a logical processor are
associated with the current PCID.
Processors need not implement any TLBs. Processors that do implement TLBs may
invalidate any TLB entry at any time. Software should not rely on the existence of
TLBs or on the retention of TLB entries.
4.10.2.3 Details of TLB Use
Because the TLBs cache only valid translations, there can be a TLB entry for a page
number only if the P flag is 1 and the reserved bits are 0 in each of the
paging-structure entries used to translate that page number. In addition, the processor
does not cache a translation for a page number unless the accessed flag is 1 in each of
the paging-structure entries used during translation; before caching a translation, the
processor sets any of these accessed flags that is not already 1.
The processor may cache translations required for prefetches and for accesses that
are a result of speculative execution that would never actually occur in the executed
code path.
If the page number of a linear address corresponds to a TLB entry associated with the
current PCID, the processor may use that TLB entry to determine the page frame,
access rights, and other attributes for accesses to that linear address. In this case,
the processor may not actually consult the paging structures in memory. The
processor may retain a TLB entry unmodified even if software subsequently modifies
the relevant paging-structure entries in memory. See Section 4.10.4.2 for how
software can ensure that the processor uses the modified paging-structure entries.
If the paging structures specify a translation using a page larger than 4 KBytes, some
processors may choose to cache multiple smaller-page TLB entries for that
translation. Each such TLB entry would be associated with a page number corresponding to
the smaller page size (e.g., bits 47:12 of a linear address with IA-32e paging), even
though part of that page number (e.g., bits 20:12) is part of the offset with respect
to the page specified by the paging structures. The upper bits of the physical address
in such a TLB entry are derived from the physical address in the PDE used to create
the translation, while the lower bits come from the linear address of the access for
which the translation is created. There is no way for software to be aware that
multiple translations for smaller pages have been used for a large page.
If software modifies the paging structures so that the page size used for a 4-KByte
range of linear addresses changes, the TLBs may subsequently contain multiple
translations for the address range (one for each page size). A reference to a linear
address in the address range may use any of these translations. Which translation is
used may vary from one execution to another, and the choice may be
implementation-specific.
4.10.2.4 Global Pages
The Intel-64 and IA-32 architectures also allow for global pages when the PGE flag
(bit 7) is 1 in CR4. If the G flag (bit 8) is 1 in a paging-structure entry that maps a
page (either a PTE or a paging-structure entry in which the PS flag is 1), any TLB
entry cached for a linear address using that paging-structure entry is considered to
be global. Because the G flag is used only in paging-structure entries that map a
page, and because information from such entries is not cached in the
paging-structure caches, the global-page feature does not affect the behavior of the
paging-structure caches.
A logical processor may use a global TLB entry to translate a linear address, even if
the TLB entry is associated with a PCID different from the current PCID.
4.10.3 Paging-Structure Caches
In addition to the TLBs, a processor may cache other information about the paging
structures in memory.
4.10.3.1 Caches for Paging Structures
A processor may support any or all of the following paging-structure caches:
• PML4 cache (IA-32e paging only). Each PML4-cache entry is referenced by a 9-
  bit value and is used for linear addresses for which bits 47:39 have that value.
  The entry contains information from the PML4E used to translate such linear
  addresses:
  - The physical address from the PML4E (the address of the page-directory-
    pointer table).
  - The value of the R/W flag of the PML4E.
  - The value of the U/S flag of the PML4E.
  - The value of the XD flag of the PML4E.
  - The values of the PCD and PWT flags of the PML4E.
  The following items detail how a processor may use the PML4 cache:
  - If the processor has a PML4-cache entry for a linear address, it may use that
    entry when translating the linear address (instead of the PML4E in memory).
  - The processor does not create a PML4-cache entry unless the P flag is 1 and
    all reserved bits are 0 in the PML4E in memory.
  - The processor does not create a PML4-cache entry unless the accessed flag is
    1 in the PML4E in memory; before caching a translation, the processor sets
    the accessed flag if it is not already 1.
  - The processor may create a PML4-cache entry even if there are no
    translations for any linear address that might use that entry (e.g., because the P
    flags are 0 in all entries in the referenced page-directory-pointer table).
  - If the processor creates a PML4-cache entry, the processor may retain it
    unmodified even if software subsequently modifies the corresponding PML4E
    in memory.
• PDPTE cache (IA-32e paging only).¹ Each PDPTE-cache entry is referenced by
  an 18-bit value and is used for linear addresses for which bits 47:30 have that
  value. The entry contains information from the PML4E and PDPTE used to
  translate such linear addresses:
  - The physical address from the PDPTE (the address of the page directory). (No
    PDPTE-cache entry is created for a PDPTE that maps a 1-GByte page.)
  - The logical-AND of the R/W flags in the PML4E and the PDPTE.
  - The logical-AND of the U/S flags in the PML4E and the PDPTE.
  - The logical-OR of the XD flags in the PML4E and the PDPTE.
  - The values of the PCD and PWT flags of the PDPTE.
  The following items detail how a processor may use the PDPTE cache:
  - If the processor has a PDPTE-cache entry for a linear address, it may use that
    entry when translating the linear address (instead of the PML4E and the
    PDPTE in memory).
  - The processor does not create a PDPTE-cache entry unless the P flag is 1, the
    PS flag is 0, and the reserved bits are 0 in the PML4E and the PDPTE in
    memory.
  - The processor does not create a PDPTE-cache entry unless the accessed flags
    are 1 in the PML4E and the PDPTE in memory; before caching a translation,
    the processor sets any accessed flags that are not already 1.
  - The processor may create a PDPTE-cache entry even if there are no
    translations for any linear address that might use that entry.
  - If the processor creates a PDPTE-cache entry, the processor may retain it
    unmodified even if software subsequently modifies the corresponding PML4E
    or PDPTE in memory.
1. With PAE paging, the PDPTEs are stored in internal, non-architectural registers. The operation of
these registers is described in Section 4.4.1 and differs from that described here.
• PDE cache. The use of the PDE cache depends on the paging mode:
  - For 32-bit paging, each PDE-cache entry is referenced by a 10-bit value and
    is used for linear addresses for which bits 31:22 have that value.
  - For PAE paging, each PDE-cache entry is referenced by an 11-bit value and is
    used for linear addresses for which bits 31:21 have that value.
  - For IA-32e paging, each PDE-cache entry is referenced by a 27-bit value and
    is used for linear addresses for which bits 47:21 have that value.
  A PDE-cache entry contains information from the PML4E, PDPTE, and PDE used to
  translate the relevant linear addresses (for 32-bit paging and PAE paging, only
  the PDE applies):
  - The physical address from the PDE (the address of the page table). (No PDE-
    cache entry is created for a PDE that maps a page.)
  - The logical-AND of the R/W flags in the PML4E, PDPTE, and PDE.
  - The logical-AND of the U/S flags in the PML4E, PDPTE, and PDE.
  - The logical-OR of the XD flags in the PML4E, PDPTE, and PDE.
  - The values of the PCD and PWT flags of the PDE.
  The following items detail how a processor may use the PDE cache (references
  below to PML4Es and PDPTEs apply only to IA-32e paging):
  - If the processor has a PDE-cache entry for a linear address, it may use that
    entry when translating the linear address (instead of the PML4E, the PDPTE,
    and the PDE in memory).
  - The processor does not create a PDE-cache entry unless the P flag is 1, the PS
    flag is 0, and the reserved bits are 0 in the PML4E, the PDPTE, and the PDE in
    memory.
  - The processor does not create a PDE-cache entry unless the accessed flag is
    1 in the PML4E, the PDPTE, and the PDE in memory; before caching a
    translation, the processor sets any accessed flags that are not already 1.
  - The processor may create a PDE-cache entry even if there are no translations
    for any linear address that might use that entry.
  - If the processor creates a PDE-cache entry, the processor may retain it
    unmodified even if software subsequently modifies the corresponding PML4E,
    the PDPTE, or the PDE in memory.
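The values that reference entries in each paging-structure cache follow directly from the bit ranges given above. The Python sketch below computes them for a linear address; the function name and the dictionary keys are hypothetical labels, not architectural names.

```python
def paging_structure_cache_tags(addr: int, mode: str) -> dict:
    """Return the value referencing each paging-structure cache for addr."""
    if mode == "32-bit":
        return {"PDE": (addr >> 22) & 0x3FF}          # bits 31:22, 10 bits
    if mode == "PAE":
        return {"PDE": (addr >> 21) & 0x7FF}          # bits 31:21, 11 bits
    if mode == "IA-32e":
        return {
            "PML4": (addr >> 39) & 0x1FF,             # bits 47:39, 9 bits
            "PDPTE": (addr >> 30) & 0x3FFFF,          # bits 47:30, 18 bits
            "PDE": (addr >> 21) & 0x7FFFFFF,          # bits 47:21, 27 bits
        }
    raise ValueError(f"unknown paging mode: {mode}")
```

Note that with IA-32e paging each wider value contains the narrower ones as its upper bits, so two linear addresses that share a PDE-cache value necessarily share the PDPTE-cache and PML4-cache values as well.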
Information from a paging-structure entry can be included in entries in the paging-
structure caches for other paging-structure entries referenced by the original entry.
For example, if the R/W flag is 0 in a PML4E, then the R/W flag will be 0 in any PDPTE-
cache entry for a PDPTE from the page-directory-pointer table referenced by that
PML4E. This is because the R/W flag of each such PDPTE-cache entry is the logical-
AND of the R/W flags in the appropriate PML4E and PDPTE.
The paging-structure caches contain information only from paging-structure entries
that reference other paging structures (and not those that map pages). Because the
G flag is not used in such paging-structure entries, the global-page feature does not
affect the behavior of the paging-structure caches.
The processor may create entries in paging-structure caches for translations
required for prefetches and for accesses that are a result of speculative execution
that would never actually occur in the executed code path.
As noted in Section 4.10.1, any entries created in paging-structure caches by a
logical processor are associated with the current PCID.
A processor may or may not implement any of the paging-structure caches. Software
should rely on neither their presence nor their absence. The processor may invalidate
entries in these caches at any time. Because the processor may create the cache
entries at the time of translation and not update them following subsequent
modifications to the paging structures in memory, software should take care to invalidate
the cache entries appropriately when causing such modifications. The invalidation of
TLBs and the paging-structure caches is described in Section 4.10.4.
4.10.3.2 Using the Paging-Structure Caches to Translate Linear Addresses
When a linear address is accessed, the processor uses a procedure such as the
following to determine the physical address to which it translates and whether the
access should be allowed:
• If the processor finds a TLB entry that is for the page number of the linear
  address and that is associated with the current PCID (or which is global), it may
  use the physical address, access rights, and other attributes from that entry.
• If the processor does not find a relevant TLB entry, it may use the upper bits of
  the linear address to select an entry from the PDE cache that is associated with
  the current PCID (Section 4.10.3.1 indicates which bits are used in each paging
  mode). It can then use that entry to complete the translation process (locating a
  PTE, etc.) as if it had traversed the PDE (and, for IA-32e paging, the PDPTE and
  PML4) corresponding to the PDE-cache entry.
• The following items apply when IA-32e paging is used:
  - If the processor does not find a relevant TLB entry or a relevant PDE-cache
    entry, it may use bits 47:30 of the linear address to select an entry from the
    PDPTE cache that is associated with the current PCID. It can then use that
    entry to complete the translation process (locating a PDE, etc.) as if it had
    traversed the PDPTE and the PML4 corresponding to the PDPTE-cache entry.
  - If the processor does not find a relevant TLB entry, a relevant PDE-cache
    entry, or a relevant PDPTE-cache entry, it may use bits 47:39 of the linear
    address to select an entry from the PML4 cache that is associated with the
    current PCID. It can then use that entry to complete the translation process
    (locating a PDPTE, etc.) as if it had traversed the corresponding PML4.
(Any of the above steps would be skipped if the processor does not support the cache
in question.)
If the processor does not find a TLB or paging-structure-cache entry for the linear
address, it uses the linear address to traverse the entire paging-structure hierarchy,
as described in Section 4.3, Section 4.4.2, and Section 4.5.
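The lookup order described in this section can be illustrated with a small Python model. It is a sketch under several assumptions: IA-32e paging with 4-KByte pages, caches represented as plain sets of (PCID, value) pairs, and a processor that always uses a cached entry when one exists (the text says only that it may). The function name and return strings are hypothetical.

```python
def first_hit(addr, pcid, tlb, global_tlb, pde_cache, pdpte_cache, pml4_cache):
    """Report where translation of addr could start, most specific cache first.

    tlb, pde_cache, pdpte_cache, pml4_cache: sets of (pcid, value) pairs.
    global_tlb: set of page numbers cached as global (usable with any PCID).
    """
    page_number = (addr >> 12) & 0xFFFFFFFFF          # bits 47:12
    if (pcid, page_number) in tlb or page_number in global_tlb:
        return "TLB"
    if (pcid, (addr >> 21) & 0x7FFFFFF) in pde_cache:  # bits 47:21
        return "PDE cache"
    if (pcid, (addr >> 30) & 0x3FFFF) in pdpte_cache:  # bits 47:30
        return "PDPTE cache"
    if (pcid, (addr >> 39) & 0x1FF) in pml4_cache:     # bits 47:39
        return "PML4 cache"
    return "full walk"                                 # traverse the whole hierarchy
```

The model makes the role of the PCID visible: an entry cached under PCID 7 is ignored while PCID 8 is current, unless it is a global TLB entry.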
4.10.3.3 Multiple Cached Entries for a Single Paging-Structure Entry
The TLBs and paging-structure caches may contain multiple entries associated with a
single PCID and with information derived from a single paging-structure entry. The
following items give some examples for IA-32e paging:
• Suppose that two PML4Es contain the same physical address and thus reference
  the same page-directory-pointer table. Any PDPTE in that table may result in two
  PDPTE-cache entries, each associated with a different set of linear addresses.
  Specifically, suppose that the $n_1$th and $n_2$th entries in the PML4 table contain
  the same physical address. This implies that the physical address in the $m$th
  PDPTE in the page-directory-pointer table would appear in the PDPTE-cache entries
  associated with both $p_1$ and $p_2$, where $(p_1 \gg 9) = n_1$, $(p_2 \gg 9) = n_2$,
  and $(p_1 \mathbin{\&} \text{1FFH}) = (p_2 \mathbin{\&} \text{1FFH}) = m$. This is
  because both PDPTE-cache entries use the same PDPTE, one resulting from a
  reference from the $n_1$th PML4E and one from the $n_2$th PML4E.
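The relation between the 18-bit PDPTE-cache value p, the PML4 index n, and the PDPTE index m can be checked with a short Python sketch. The function names are hypothetical; the arithmetic follows from the PDPTE cache being referenced by bits 47:30 of the linear address.

```python
def pdpte_cache_value(linear_address: int) -> int:
    """18-bit value (bits 47:30) that references a PDPTE-cache entry."""
    return (linear_address >> 30) & 0x3FFFF


def split_pdpte_cache_value(p: int):
    """Return (n, m): the PML4 index (p >> 9) and the PDPTE index (p & 1FFH)."""
    return p >> 9, p & 0x1FF


def aliased_values(n1: int, n2: int, m: int):
    """Values p1 and p2 that both cache PDPTE m when PML4Es n1 and n2 alias."""
    return (n1 << 9) | m, (n2 << 9) | m
```

For instance, if PML4 entries 1 and 3 reference the same page-directory-pointer table, PDPTE 5 of that table may be cached under both 205H and 605H.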
• Suppose that the first PML4E (i.e., the one in position 0) contains the physical
  address X in CR3 (the physical address of the PML4 table). This implies the
  following:
  - Any PML4-cache entry associated with linear addresses with 0 in bits 47:39
    contains address X.
  - Any PDPTE-cache entry associated with linear addresses with 0 in bits 47:30
    contains address X. This is because the translation for a linear address for
    which the value of bits 47:30 is 0 uses the value of bits 47:39 (0) to locate a
    page-directory-pointer table at address X (the address of the PML4 table). It
    then uses the value of bits 38:30 (also 0) to find address X again and to store
    that address in the PDPTE-cache entry.
  - Any PDE-cache entry associated with linear addresses with 0 in bits 47:21
    contains address X for similar reasons.
  - Any TLB entry for page number 0 (associated with linear addresses with 0 in
    bits 47:12) translates to page frame $X \gg 12$ for similar reasons.
  The same PML4E contributes its address X to all these cache entries because the
  self-referencing nature of the entry causes it to be used as a PML4E, a PDPTE, a
  PDE, and a PTE.
4.10.4 Invalidation of TLBs and Paging-Structure Caches
As noted in Section 4.10.2 and Section 4.10.3, the processor may create entries in
the TLBs and the paging-structure caches when linear addresses are translated, and
it may retain these entries even after the paging structures used to create them have
been modified. To ensure that linear-address translation uses the modified paging
structures, software should take action to invalidate any cached entries that may
contain information that has since been modified.
4.10.4.1 Operations that Invalidate TLBs and Paging-Structure Caches
The following instructions invalidate entries in the TLBs and the paging-structure
caches:
• INVLPG. This instruction takes a single operand, which is a linear address. The
  instruction invalidates any TLB entries that are for a page number corresponding
  to the linear address and that are associated with the current PCID. It also
  invalidates any global TLB entries with that page number, regardless of PCID (see
  Section 4.10.2.4).¹ INVLPG also invalidates all entries in all paging-structure
  caches associated with the current PCID, regardless of the linear addresses to
  which they correspond.
• MOV to CR3. The behavior of the instruction depends on the value of CR4.PCIDE:
  - If CR4.PCIDE = 0, the instruction invalidates all TLB entries associated with
    PCID 000H except those for global pages. It also invalidates all entries in all
    paging-structure caches associated with PCID 000H.
  - If CR4.PCIDE = 1 and bit 63 of the instruction's source operand is 0, the
    instruction invalidates all TLB entries associated with the PCID specified in
    bits 11:0 of the instruction's source operand except those for global pages. It
    also invalidates all entries in all paging-structure caches associated with that
    PCID. It is not required to invalidate entries in the TLBs and paging-structure
    caches that are associated with other PCIDs.
  - If CR4.PCIDE = 1 and bit 63 of the instruction's source operand is 1, the
    instruction is not required to invalidate any TLB entries or entries in paging-
    structure caches.
• MOV to CR4. The instruction invalidates all TLB entries (including global entries)
  and all entries in all paging-structure caches (for all PCIDs) if either (1) it
  changes the value of the CR4.PGE flag;² or (2) it changes the value of
  CR4.PCIDE from 1 to 0.
• Task switch. If a task switch changes the value of CR3, it invalidates all TLB
  entries associated with PCID 000H except those for global pages. It also
  invalidates all entries in all paging-structure caches associated with PCID
  000H.³
• VMX transitions. See Section 4.11.1.
The processor is always free to invalidate additional entries in the TLBs and paging-
structure caches. The following are some examples:
• INVLPG may invalidate TLB entries for pages other than the one corresponding to
  its linear-address operand. It may invalidate TLB entries and paging-structure-
  cache entries associated with PCIDs other than the current PCID.
• MOV to CR3 may invalidate TLB entries for global pages. If CR4.PCIDE = 1 and
  bit 63 of the instruction's source operand is 0, it may invalidate TLB entries and
  entries in the paging-structure caches associated with PCIDs other than the
  current PCID. It may invalidate entries if CR4.PCIDE = 1 and bit 63 of the
  instruction's source operand is 1.
• On a processor supporting Hyper-Threading Technology, invalidations performed
  on one logical processor may invalidate entries in the TLBs and paging-structure
  caches used by other logical processors.
(Other instructions and operations may invalidate entries in the TLBs and the paging-
structure caches, but the instructions identified above are recommended.)
In addition to the instructions identified above, page faults invalidate entries in the
TLBs and paging-structure caches. In particular, a page-fault exception resulting
from an attempt to use a linear address will invalidate any TLB entries that are for a
page number corresponding to that linear address and that are associated with the
current PCID. It also invalidates all entries in the paging-structure caches that would
be used for that linear address and that are associated with the current PCID.⁴ These
invalidations ensure that the page-fault exception will not recur (if the faulting
instruction is re-executed) if it would not be caused by the contents of the paging
structures in memory (and if, therefore, it resulted from cached entries that were not
invalidated after the paging structures were modified in memory).
As noted in Section 4.10.2, some processors may choose to cache multiple smaller-
page TLB entries for a translation specified by the paging structures to use a page
larger than 4 KBytes. There is no way for software to be aware that multiple
translations for smaller pages have been used for a large page. The INVLPG instruction and
page faults provide the same assurances that they provide when a single TLB entry is
used: they invalidate all TLB entries corresponding to the translation specified by the
paging structures.
1. If the paging structures map the linear address using a page larger than 4 KBytes and there are
multiple TLB entries for that page (see Section 4.10.2.3), the instruction invalidates all of them.
2. If CR4.PGE is changing from 0 to 1, there were no global TLB entries before the execution; if
CR4.PGE is changing from 1 to 0, there will be no global TLB entries after the execution.
3. Task switches do not occur in IA-32e mode and thus cannot occur with IA-32e paging. Since
CR4.PCIDE can be set only with IA-32e paging, task switches occur only with CR4.PCIDE = 0.
4. Unlike INVLPG, page faults need not invalidate all entries in the paging-structure caches, only
those that would be used to translate the faulting linear address.
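The MOV to CR3 rules in this section can be summarized with a short Python model. It captures only the minimum invalidation that the text requires; as noted above, a processor is free to invalidate more. The function names and the tuple layout of the modeled TLB entries are hypothetical.

```python
def mov_to_cr3_target_pcid(cr4_pcide: bool, source: int):
    """PCID whose cached entries MOV to CR3 must invalidate, or None if none."""
    if not cr4_pcide:
        return 0x000                 # CR4.PCIDE = 0: only PCID 000H exists
    if (source >> 63) & 1:
        return None                  # bit 63 = 1: no invalidation required
    return source & 0xFFF            # bit 63 = 0: PCID in bits 11:0 of the operand


def tlb_after_mov_to_cr3(tlb, cr4_pcide: bool, source: int):
    """Return the modeled TLB entries that are allowed to survive MOV to CR3.

    Each entry is a (pcid, page_number, is_global) tuple.
    """
    target = mov_to_cr3_target_pcid(cr4_pcide, source)
    if target is None:
        return list(tlb)
    # Global entries and entries for other PCIDs need not be invalidated.
    return [e for e in tlb if e[2] or e[0] != target]
```

In this model, loading CR3 with bit 63 set leaves every modeled entry in place, which is the case software can exploit when switching back to an address space whose cached translations are still valid.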
4.10.4.2 Recommended Invalidation
The following items provide some recommendations regarding when software should
perform invalidations:
• If software modifies a paging-structure entry that identifies the final page frame
  for a page number (either a PTE or a paging-structure entry in which the PS flag
  is 1), it should execute INVLPG for any linear address with a page number whose
  translation uses that PTE. (One execution of INVLPG is sufficient even for a page
  with size greater than 4 KBytes.) (If the paging-structure entry may be used in the
  translation of different page numbers, as discussed in Section 4.10.3.3, software
  should execute INVLPG for linear addresses with each of those page numbers;
  alternatively, it could use MOV to CR3 or MOV to CR4.)
• If software modifies a paging-structure entry that references another paging
  structure, it may use one of the following approaches depending upon the types
  and number of translations controlled by the modified entry:
  - Execute INVLPG for linear addresses with each of the page numbers with
    translations that would use the entry. However, if no page numbers that
    would use the entry have translations (e.g., because the P flags are 0 in all
    entries in the paging structure referenced by the modified entry), it remains
    necessary to execute INVLPG at least once.
  - Execute MOV to CR3 if the modified entry controls no global pages.
  - Execute MOV to CR4 to modify CR4.PGE.
• If CR4.PCIDE = 1 and software modifies a paging-structure entry that does not
  map a page or in which the G flag (bit 8) is 0, additional steps are required if the
  entry may be used for PCIDs other than the current one. Any one of the following
  suffices:
  - Execute MOV to CR4 to modify CR4.PGE, either immediately or before again
    using any of the affected PCIDs. For example, software could use different
    (previously unused) PCIDs for the processes that used the affected PCIDs.
  - For each affected PCID, execute MOV to CR3 to make that PCID current (and
    to load the address of the appropriate PML4 table). If the modified entry
    controls no global pages and bit 63 of the source operand to MOV to CR3 was
    0, no further steps are required. Otherwise, execute INVLPG for linear
addresses wit h each of t he page numbers wit h t ranslat ions t hat would use
t he ent ry; if no page numbers t hat would use t he ent ry have t ranslat ions,
execut e I NVLPG at least once.
I f soft ware using PAE paging modifies a PDPTE, it should reload CR3 wit h t he
regist er s current value t o ensure t hat t he modified PDPTE is loaded int o t he
corresponding PDPTE regist er ( see Sect ion 4.4. 1) .
I f t he nat ure of t he paging st ruct ures is such t hat a single ent ry may be used for
mult iple purposes ( see Sect ion 4. 10. 3. 3) , soft ware should perform invalidat ions
for all of t hese purposes. For example, if a single ent ry might serve as bot h a PDE
and PTE, it may be necessary t o execut e I NVLPG wit h t wo ( or more) linear
addresses, one t hat uses t he ent ry as a PDE and one t hat uses it as a PTE. ( Alt er-
nat ively, soft ware could use MOV t o CR3 or MOV t o CR4. )
As not ed in Sect ion 4.10. 2, t he TLBs may subsequent ly cont ain mult iple t ransla-
t ions for t he address range if soft ware modifies t he paging st ruct ures so t hat t he
page size used for a 4- KByt e range of linear addresses changes. A reference t o a
linear address in t he address range may use any of t hese t ranslat ions.
Soft ware wishing t o prevent t his uncert aint y should not writ e t o a paging-
st ruct ure ent ry in a way t hat would change, for any linear address, bot h t he page
size and eit her t he page frame, access right s, or ot her at t ribut es. I t can inst ead
use t he following algorit hm: first clear t he P flag in t he relevant paging- st ruct ure
ent ry ( e. g., PDE) ; t hen invalidat e any t ranslat ions for t he affect ed linear
addresses ( see Sect ion 4. 10. 4. 2) ; and t hen modify t he relevant paging- st ruct ure
ent ry t o set t he P flag and est ablish modified t ranslat ion( s) for t he new page size.
Soft ware should clear bit 63 of t he source operand t o a MOV t o CR3 inst ruct ion
t hat est ablishes a PCI D t hat had been used earlier for a different linear- address
space ( e. g., wit h a different value in bit s 51: 12 of CR3) . This ensures invalidat ion
of any informat ion t hat may have been cached for t he previous linear- address
space.
This assumes t hat bot h linear- address spaces use t he same global pages and
t hat it is t hus not necessary t o invalidat e any global TLB ent ries. I f t hat is not t he
case, soft ware should invalidat e t hose ent ries by execut ing MOV t o CR4 t o modify
CR4. PGE.
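The recommendation about bit 63 of the MOV to CR3 source operand can be illustrated with a small sketch. It only composes the 64-bit operand value; the privileged MOV to CR3 itself is not shown. The helper name make_cr3_operand and its parameters are hypothetical, and the sketch assumes that, with CR4.PCIDE = 1, the PCID occupies bits 11:0 of the operand and the PML4 table address occupies bits 51:12.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the MOV-to-CR3 source operand when CR4.PCIDE = 1. */
#define CR3_PCID_MASK      0xFFFULL               /* bits 11:0: PCID          */
#define CR3_PML4_MASK      0x000FFFFFFFFFF000ULL  /* bits 51:12: PML4 address */
#define CR3_NO_INVALIDATE  (1ULL << 63)           /* bit 63                   */

/*
 * Hypothetical helper: build the operand for switching to (pml4_phys, pcid).
 * If the PCID was used earlier for a different linear-address space, bit 63
 * is left clear so that information cached for that PCID is invalidated.
 */
uint64_t make_cr3_operand(uint64_t pml4_phys, uint16_t pcid, bool pcid_reused)
{
    assert((pml4_phys & ~CR3_PML4_MASK) == 0);  /* 4-KByte aligned, fits in 51:12 */
    assert((pcid & ~CR3_PCID_MASK) == 0);       /* PCID fits in 12 bits           */

    uint64_t operand = pml4_phys | pcid;
    if (!pcid_reused)
        operand |= CR3_NO_INVALIDATE;           /* cached entries may be kept */
    return operand;
}
```

As the text notes, clearing bit 63 does not address global TLB entries; if the two linear-address spaces do not share the same global pages, MOV to CR4 (modifying CR4.PGE) is still needed.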
4.10.4.3 Optional Invalidation
The following items describe cases in which software may choose not to invalidate
and the potential consequences of that choice:
• If a paging-structure entry is modified to change the P flag from 0 to 1, no
  invalidation is necessary. This is because no TLB entry or paging-structure cache
  entry is created with information from a paging-structure entry in which the P flag
  is 0.¹
• If a paging-structure entry is modified to change the accessed flag from 0 to 1,
  no invalidation is necessary (assuming that an invalidation was performed the
  last time the accessed flag was changed from 1 to 0). This is because no TLB
  entry or paging-structure cache entry is created with information from a
  paging-structure entry in which the accessed flag is 0.
• If a paging-structure entry is modified to change the R/W flag from 0 to 1, failure
  to perform an invalidation may result in a "spurious" page-fault exception (e.g.,
  in response to an attempted write access) but no other adverse behavior. Such
  an exception will occur at most once for each affected linear address (see Section
  4.10.4.1).
• If a paging-structure entry is modified to change the U/S flag from 0 to 1, failure
  to perform an invalidation may result in a "spurious" page-fault exception (e.g.,
  in response to an attempted user-mode access) but no other adverse behavior.
  Such an exception will occur at most once for each affected linear address (see
  Section 4.10.4.1).
• If a paging-structure entry is modified to change the XD flag from 1 to 0, failure
  to perform an invalidation may result in a "spurious" page-fault exception (e.g.,
  in response to an attempted instruction fetch) but no other adverse behavior.
  Such an exception will occur at most once for each affected linear address (see
  Section 4.10.4.1).
• If a paging-structure entry is modified to change the accessed flag from 1 to 0,
  failure to perform an invalidation may result in the processor not setting that bit
  in response to a subsequent access to a linear address whose translation uses the
  entry. Software cannot interpret the bit being clear as an indication that such an
  access has not occurred.
• If software modifies a paging-structure entry that identifies the final physical
  address for a linear address (either a PTE or a paging-structure entry in which the
  PS flag is 1) to change the dirty flag from 1 to 0, failure to perform an invalidation
  may result in the processor not setting that bit in response to a subsequent write
  to a linear address whose translation uses the entry. Software cannot interpret
  the bit being clear as an indication that such a write has not occurred.
• The read of a paging-structure entry in translating an address being used to fetch
  an instruction may appear to execute before an earlier write to that
  paging-structure entry if there is no serializing instruction between the write and
  the instruction fetch. Note that the invalidating instructions identified in Section
  4.10.4.1 are all serializing instructions.
• Section 4.10.3.3 describes situations in which a single paging-structure entry
  may contain information cached in multiple entries in the paging-structure
  caches. Because all entries in these caches are invalidated by any execution of
  INVLPG, it is not necessary to follow the modification of such a paging-structure
  entry by executing INVLPG multiple times solely for the purpose of invalidating
  these multiple cached entries. (It may be necessary to do so to invalidate
  multiple TLB entries.)
1. If it is also the case that no invalidation was performed the last time the P flag was changed
from 1 to 0, the processor may use a TLB entry or paging-structure cache entry that was
created when the P flag had earlier been 1.
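The flag-change cases listed above can be summarized as a small decision function. The following is only a sketch: the function and enumerator names are hypothetical, the entry is assumed to map a page, and the bit positions for the P (bit 0), accessed (bit 5), and dirty (bit 6) flags are taken from the paging-structure entry formats rather than from this section. Any change that the list does not cover is reported as needing the recommended invalidation.

```c
#include <assert.h>
#include <stdint.h>

/* Flag positions in a paging-structure entry that maps a page. */
#define PTE_P   (1ULL << 0)    /* present         */
#define PTE_RW  (1ULL << 1)    /* read/write      */
#define PTE_US  (1ULL << 2)    /* user/supervisor */
#define PTE_A   (1ULL << 5)    /* accessed        */
#define PTE_D   (1ULL << 6)    /* dirty           */
#define PTE_XD  (1ULL << 63)   /* execute-disable */

enum skip_effect {
    SKIP_SAFE,            /* no invalidation necessary                    */
    SKIP_SPURIOUS_FAULT,  /* at most one spurious page fault per address  */
    SKIP_STALE_FLAG,      /* accessed/dirty flag might not be set again   */
    SKIP_NOT_COVERED      /* not in the list: invalidate as recommended   */
};

/* Hypothetical helper: consequence of skipping invalidation for old_e -> new_e.
 * When several listed changes are combined, the most severe one is returned. */
enum skip_effect classify_skipped_invalidation(uint64_t old_e, uint64_t new_e)
{
    uint64_t changed = old_e ^ new_e;
    uint64_t set     = changed & new_e;   /* flags going 0 -> 1 */
    uint64_t cleared = changed & old_e;   /* flags going 1 -> 0 */

    /* Nothing is cached from an entry whose P flag is 0 (assuming an
     * invalidation was performed when P was last cleared). */
    if (changed == 0 || !(old_e & PTE_P))
        return SKIP_SAFE;

    uint64_t spurious = (set & (PTE_RW | PTE_US)) | (cleared & PTE_XD);
    uint64_t benign   = set & PTE_A;
    uint64_t stale    = cleared & (PTE_A | PTE_D);

    if (changed & ~(spurious | benign | stale))
        return SKIP_NOT_COVERED;
    if (stale)
        return SKIP_STALE_FLAG;
    if (spurious)
        return SKIP_SPURIOUS_FAULT;
    return SKIP_SAFE;
}
```

Treating every unlisted change (for example, a new page-frame address or R/W going from 1 to 0) as SKIP_NOT_COVERED is a deliberately conservative choice of this sketch.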
4.10.4.4 Delayed Invalidation
Required invalidations may be delayed under some circumstances. Software
developers should understand that, between the modification of a paging-structure entry
and execution of the invalidation instruction recommended in Section 4.10.4.2, the
processor may use translations based on either the old value or the new value of the
paging-structure entry. The following items describe some of the potential
consequences of delayed invalidation:
• If a paging-structure entry is modified to change the P flag from 1 to 0, an
  access to a linear address whose translation is controlled by this entry may
  or may not cause a page-fault exception.
• If a paging-structure entry is modified to change the R/W flag from 0 to 1, write
  accesses to linear addresses whose translation is controlled by this entry may or
  may not cause a page-fault exception.
• If a paging-structure entry is modified to change the U/S flag from 0 to 1,
  user-mode accesses to linear addresses whose translation is controlled by this entry
  may or may not cause a page-fault exception.
• If a paging-structure entry is modified to change the XD flag from 1 to 0,
  instruction fetches from linear addresses whose translation is controlled by this
  entry may or may not cause a page-fault exception.
As noted in Section 8.1.1, an x87 instruction or an SSE instruction that accesses data
larger than a quadword may be implemented using multiple memory accesses. If
such an instruction stores to memory and invalidation has been delayed, some of the
accesses may complete (writing to memory) while another causes a page-fault
exception.¹ In this case, the effects of the completed accesses may be visible to
software even though the overall instruction caused a fault.
In some cases, the consequences of delayed invalidation may not affect software
adversely. For example, when freeing a portion of the linear-address space (by
marking paging-structure entries "not present"), invalidation using INVLPG may be
delayed if software does not re-allocate that portion of the linear-address space or
the memory that had been associated with it. However, because of speculative
execution (or errant software), there may be accesses to the freed portion of the
linear-address space before the invalidations occur. In this case, the following can
happen:
• Reads can occur to the freed portion of the linear-address space. Therefore,
  invalidation should not be delayed for an address range that has read side
  effects.
• The processor may retain entries in the TLBs and paging-structure caches for an
  extended period of time. Software should not assume that the processor will not
  use entries associated with a linear address simply because time has passed.
• As noted in Section 4.10.3.1, the processor may create an entry in a
  paging-structure cache even if there are no translations for any linear address that
  might use that entry. Thus, if software has marked "not present" all entries in a page
  table, the processor may subsequently create a PDE-cache entry for the PDE that
  references that page table (assuming that the PDE itself is marked "present").
• If software attempts to write to the freed portion of the linear-address space, the
  processor might not generate a page fault. (Such an attempt would likely be the
  result of a software error.) For that reason, the page frames previously
  associated with the freed portion of the linear-address space should not be
  reallocated for another purpose until the appropriate invalidations have been
  performed.
1. If the accesses are to different pages, this may occur even if invalidation has not been delayed.
4.10.5 Propagation of Paging-Structure Changes to Multiple Processors
As noted in Section 4.10.4, software that modifies a paging-structure entry may
need to invalidate entries in the TLBs and paging-structure caches that were derived
from the modified entry before it was modified. In a system containing more than
one logical processor, software must account for the fact that there may be entries in
the TLBs and paging-structure caches of logical processors other than the one used
to modify the paging-structure entry. The process of propagating the changes to a
paging-structure entry is commonly referred to as "TLB shootdown."
TLB shootdown can be done using memory-based semaphores and/or interprocessor
interrupts (IPI). The following items describe a simple but inefficient example of a
TLB shootdown algorithm for processors supporting the Intel 64 and IA-32
architectures:
1. Begin barrier: Stop all but one logical processor; that is, cause all but one to
execute the HLT instruction or to enter a spin loop.
2. Allow the active logical processor to change the necessary paging-structure
entries.
3. Allow all logical processors to perform invalidations appropriate to the
modifications to the paging-structure entries.
4. Allow all logical processors to resume normal operation.
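The four steps above can be traced with a toy, single-threaded model. This is a sketch under strong simplifying assumptions, not a real shootdown routine: each "logical processor" is a struct holding one cached translation, a single shared variable stands in for the paging-structure entry, and the barrier is a flag rather than an IPI, HLT, or spin loop. All names (NUM_CPUS, struct cpu, translate, shootdown) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CPUS 4

/* Toy model: one shared "paging-structure entry" and, per logical
 * processor, one cached copy of it standing in for a TLB entry. */
struct cpu {
    bool     halted;     /* stopped at the barrier (step 1) */
    bool     tlb_valid;  /* cached translation present      */
    uint64_t tlb_frame;  /* cached page frame               */
};

uint64_t   shared_pte_frame = 0x1000;
struct cpu cpus[NUM_CPUS];

/* Translate on one processor: use the cached copy if valid, else "walk". */
uint64_t translate(size_t id)
{
    assert(id < NUM_CPUS);
    struct cpu *c = &cpus[id];
    if (!c->tlb_valid) {
        c->tlb_frame = shared_pte_frame;
        c->tlb_valid = true;
    }
    return c->tlb_frame;
}

/* The simple (inefficient) algorithm, executed sequentially. */
void shootdown(size_t initiator, uint64_t new_frame)
{
    assert(initiator < NUM_CPUS);

    /* 1. Begin barrier: stop all but the initiating logical processor. */
    for (size_t i = 0; i < NUM_CPUS; i++)
        cpus[i].halted = (i != initiator);

    /* 2. The active logical processor changes the entry. */
    shared_pte_frame = new_frame;

    /* 3. Every logical processor performs its invalidation. */
    for (size_t i = 0; i < NUM_CPUS; i++)
        cpus[i].tlb_valid = false;

    /* 4. All logical processors resume normal operation. */
    for (size_t i = 0; i < NUM_CPUS; i++)
        cpus[i].halted = false;
}
```

In the model, a processor that cached the old frame keeps returning it until step 3 runs on that processor, which is the staleness that the real algorithm exists to remove.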
Alternative, performance-optimized, TLB shootdown algorithms may be developed;
however, software developers must take care to ensure that the following conditions
are met:
• All logical processors that are using the paging structures that are being modified
  must participate and perform appropriate invalidations after the modifications
  are made.
• If the modifications to the paging-structure entries are made before the barrier
  or if there is no barrier, the operating system must ensure one of the following:
  (1) that the affected linear-address range is not used between the time of
  modification and the time of invalidation; or (2) that it is prepared to deal with the
  consequences of the affected linear-address range being used during that period.
  For example, if the operating system does not allow pages being freed to be
  reallocated for another purpose until after the required invalidations, writes to
  those pages by errant software will not unexpectedly modify memory that is in
  use.
• Software must be prepared to deal with reads, instruction fetches, and prefetch
  requests to the affected linear-address range that are a result of speculative
  execution that would never actually occur in the executed code path.
When multiple logical processors are using the same linear-address space at the
same time, they must coordinate before any request to modify the paging-structure
entries that control that linear-address space. In these cases, the barrier in the TLB
shootdown routine may not be required. For example, when freeing a range of linear
addresses, some other mechanism can assure no logical processor is using that
range before the request to free it is made. In this case, a logical processor freeing
the range can clear the P flags in the PTEs associated with the range, free the
physical page frames associated with the range, and then signal the other logical
processors using that linear-address space to perform the necessary invalidations. All the
affected logical processors must complete their invalidations before the
linear-address range and the physical page frames previously associated with that range
can be reallocated.
4.11 INTERACTIONS WITH VIRTUAL-MACHINE EXTENSIONS (VMX)
The architecture for virtual-machine extensions (VMX) includes features that interact
with paging. Section 4.11.1 discusses ways in which VMX-specific control transfers,
called VMX transitions, specially affect paging. Section 4.11.2 gives an overview of
VMX features specifically designed to support address translation.
4.11.1 VMX Transitions
The VMX architecture defines two control transfers called VM entries and VM exits;
collectively, these are called VMX transitions. VM entries and VM exits are
described in detail in Chapter 23 and Chapter 24, respectively, in the Intel 64 and
IA-32 Architectures Software Developer's Manual, Volume 3B. The following items
identify paging-related details:
• VMX transitions modify the CR0 and CR4 registers and the IA32_EFER MSR
  concurrently. For this reason, they allow transitions between paging modes that
  would not otherwise be possible:
  - VM entries allow transitions from IA-32e paging directly to either 32-bit
    paging or PAE paging.
  - VM exits allow transitions from either 32-bit paging or PAE paging directly to
    IA-32e paging.
• VMX transitions that result in PAE paging load the PDPTE registers (see Section
  4.4.1) as follows:
  - VM entries load the PDPTE registers either from the physical address being
    loaded into CR3 or from the virtual-machine control structure (VMCS); see
    Section 23.3.2.4.
  - VM exits load the PDPTE registers from the physical address being loaded into
    CR3; see Section 24.5.4.
• VMX transitions invalidate the TLBs and paging-structure caches based on certain
  control settings. See Section 23.3.2.5 and Section 24.5.5 in the Intel 64 and
  IA-32 Architectures Software Developer's Manual, Volume 3B.
4.11.2 VMX Support for Address Translation
Chapter 25, "VMX Support for Address Translation," in the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 3B describes two features of the
virtual-machine extensions (VMX) that interact directly with paging. These are
virtual-processor identifiers (VPIDs) and the extended page table mechanism
(EPT).
VPIDs provide a way for software to identify to the processor the address spaces for
different virtual processors. The processor may use this identification to maintain
concurrently information for multiple address spaces in its TLBs and paging-structure
caches, even when non-zero PCIDs are not being used. See Section 25.1 for details.
When EPT is in use, the addresses in the paging structures are not used as physical
addresses to access memory and memory-mapped I/O. Instead, they are treated as
guest-physical addresses and are translated through a set of EPT paging structures
to produce physical addresses. EPT can also specify its own access rights and
memory typing; these are used in conjunction with those specified in this chapter.
See Section 25.2 for more information.
Both VPIDs and EPT may change the way that a processor maintains information in
TLBs and paging-structure caches and the ways in which software can manage that
information. Some of the behaviors documented in Section 4.10 may change. See
Section 25.3 for details.
4.12 USING PAGING FOR VIRTUAL MEMORY
With paging, portions of the linear-address space need not be mapped to the
physical-address space; data for the unmapped addresses can be stored externally (e.g.,
on disk). This method of mapping the linear-address space is referred to as virtual
memory or demand-paged virtual memory.
Paging divides the linear-address space into fixed-size pages that can be mapped into
the physical-address space and/or external storage. When a program (or task)
references a linear address, the processor uses paging to translate the linear address into
a corresponding physical address if such an address is defined.
If the page containing the linear address is not currently mapped into the
physical-address space, the processor generates a page-fault exception as described in
Section 4.7. The handler for page-fault exceptions typically directs the operating
system or executive to load data for the unmapped page from external storage into
physical memory (perhaps writing a different page from physical memory out to
external storage in the process) and to map it using paging (by updating the paging
structures). When the page has been loaded into physical memory, a return from the
exception handler causes the instruction that generated the exception to be
restarted.
Paging differs from segmentation through its use of fixed-size pages. Unlike
segments, which usually are the same size as the code or data structures they hold,
pages have a fixed size. If segmentation is the only form of address translation used,
a data structure present in physical memory will have all of its parts in memory. If
paging is used, a data structure can be partly in memory and partly in disk storage.
4.13 MAPPING SEGMENTS TO PAGES
The segmentation and paging mechanisms provide support for a wide variety of
approaches to memory management. When segmentation and paging are combined,
segments can be mapped to pages in several ways. To implement a flat
(unsegmented) addressing environment, for example, all the code, data, and stack modules
can be mapped to one or more large segments (up to 4 GBytes) that share the same
range of linear addresses (see Figure 3-2 in Section 3.2.2). Here, segments are
essentially invisible to applications and the operating system or executive. If paging
is used, the paging mechanism can map a single linear-address space (contained in
a single segment) into virtual memory. Alternatively, each program (or task) can
have its own large linear-address space (contained in its own segment), which is
mapped into virtual memory through its own paging structures.
Segments can be smaller than the size of a page. If one of these segments is placed
in a page which is not shared with another segment, the extra memory is wasted. For
example, a small data structure, such as a 1-Byte semaphore, occupies 4 KBytes if it
is placed in a page by itself. If many semaphores are used, it is more efficient to pack
them into a single page.
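The cost of giving each small object a page of its own is simple arithmetic, sketched below. The function names are hypothetical and the 4-KByte page size is the only value taken from the text; objects are assumed not to straddle page boundaries.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u   /* 4-KByte page */

/* Bytes of memory committed when each object gets a page to itself. */
size_t bytes_one_per_page(size_t count)
{
    return count * PAGE_SIZE;
}

/* Bytes committed when objects of obj_size bytes are packed into pages. */
size_t bytes_packed(size_t count, size_t obj_size)
{
    assert(obj_size > 0 && obj_size <= PAGE_SIZE);
    size_t per_page = PAGE_SIZE / obj_size;             /* objects per page */
    size_t pages    = (count + per_page - 1) / per_page; /* round up        */
    return pages * PAGE_SIZE;
}
```

For 100 one-byte semaphores, the first function yields 409,600 bytes and the second yields 4,096 bytes.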
The Intel 64 and IA-32 architectures do not enforce correspondence between the
boundaries of pages and segments. A page can contain the end of one segment and
the beginning of another. Similarly, a segment can contain the end of one page and
the beginning of another.
Memory-management software may be simpler and more efficient if it enforces some
alignment between page and segment boundaries. For example, if a segment which
can fit in one page is placed in two pages, there may be twice as much paging
overhead to support access to that segment.
One approach to combining paging and segmentation that simplifies
memory-management software is to give each segment its own page table, as shown in
Figure 4-13. This convention gives the segment a single entry in the page directory,
and this entry provides the access control information for paging the entire segment.
Figure 4-13. Memory Management Convention That Assigns a Page Table to Each Segment
[Figure not reproduced. Labels: LDT (Seg. Descript.), Page Directory (PDE), Page Tables (PTE), Page Frames.]
CHAPTER 5
PROTECTION
In protected mode, the Intel 64 and IA-32 architectures provide a protection
mechanism that operates at both the segment level and the page level. This protection
mechanism provides the ability to limit access to certain segments or pages based on
privilege levels (four privilege levels for segments and two privilege levels for pages).
For example, critical operating-system code and data can be protected by placing
them in more privileged segments than those that contain applications code. The
processor's protection mechanism will then prevent application code from accessing
the operating-system code and data in any but a controlled, defined manner.
Segment and page protection can be used at all stages of software development to
assist in localizing and detecting design problems and bugs. It can also be
incorporated into end-products to offer added robustness to operating systems, utilities
software, and applications software.
When the protection mechanism is used, each memory reference is checked to verify
that it satisfies various protection checks. All checks are made before the memory
cycle is started; any violation results in an exception. Because checks are performed
in parallel with address translation, there is no performance penalty. The protection
checks that are performed fall into the following categories:
• Limit checks.
• Type checks.
• Privilege level checks.
• Restriction of addressable domain.
• Restriction of procedure entry-points.
• Restriction of instruction set.
Any protection violation results in an exception being generated. See Chapter 6,
"Interrupt and Exception Handling," for an explanation of the exception mechanism.
This chapter describes the protection mechanism and the violations which lead to
exceptions.
The following sections describe the protection mechanism available in protected
mode. See Chapter 17, "8086 Emulation," for information on protection in
real-address and virtual-8086 mode.
5.1 ENABLING AND DISABLING SEGMENT AND PAGE PROTECTION
Setting the PE flag in register CR0 causes the processor to switch to protected mode,
which in turn enables the segment-protection mechanism. Once in protected mode,
there is no control bit for turning the protection mechanism on or off. The part of the
segment-protection mechanism that is based on privilege levels can essentially be
disabled while still in protected mode by assigning a privilege level of 0 (most
privileged) to all segment selectors and segment descriptors. This action disables the
privilege level protection barriers between segments, but other protection checks
such as limit checking and type checking are still carried out.
Page-level protection is automatically enabled when paging is enabled (by setting the
PG flag in register CR0). Here again there is no mode bit for turning off page-level
protection once paging is enabled. However, page-level protection can be disabled by
performing the following operations:
• Clear the WP flag in control register CR0.
• Set the read/write (R/W) and user/supervisor (U/S) flags for each page-directory
  and page-table entry.
This action makes each page a writable, user page, which in effect disables
page-level protection.
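The two operations can be sketched on plain integer values. This is an illustration only: it manipulates in-memory copies, whereas real code would run at privilege level 0, write CR0 with MOV to CR0, and invalidate cached translations afterward. The function names are hypothetical, 32-bit paging entries are assumed, and the position of the WP flag (bit 16 of CR0) is taken from the CR0 definition rather than from this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_RW  (1u << 1)    /* read/write flag, bit 1      */
#define PTE_US  (1u << 2)    /* user/supervisor flag, bit 2 */
#define CR0_WP  (1u << 16)   /* write-protect flag in CR0   */

/* Return the CR0 value with the WP flag cleared. */
uint32_t cr0_without_wp(uint32_t cr0)
{
    return cr0 & ~CR0_WP;
}

/* Set R/W and U/S in every entry of one page directory or page table. */
void open_all_entries(uint32_t *entries, size_t count)
{
    assert(entries != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        entries[i] |= PTE_RW | PTE_US;
}
```

The second function would be applied to the page directory and to each page table it references.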
5.2 FIELDS AND FLAGS USED FOR SEGMENT-LEVEL AND PAGE-LEVEL PROTECTION
The processor's protection mechanism uses the following fields and flags in the
system data structures to control access to segments and pages:
• Descriptor type (S) flag (bit 12 in the second doubleword of a segment
  descriptor): Determines if the segment descriptor is for a system segment or a
  code or data segment.
• Type field (bits 8 through 11 in the second doubleword of a segment
  descriptor): Determines the type of code, data, or system segment.
• Limit field (bits 0 through 15 of the first doubleword and bits 16 through 19
  of the second doubleword of a segment descriptor): Determines the size of the
  segment, along with the G flag and E flag (for data segments).
• G flag (bit 23 in the second doubleword of a segment descriptor): Determines
  the size of the segment, along with the limit field and E flag (for data segments).
• E flag (bit 10 in the second doubleword of a data-segment descriptor):
  Determines the size of the segment, along with the limit field and G flag.
• Descriptor privilege level (DPL) field (bits 13 and 14 in the second
  doubleword of a segment descriptor): Determines the privilege level of the
  segment.
• Requested privilege level (RPL) field (bits 0 and 1 of any segment
  selector): Specifies the requested privilege level of a segment selector.
• Current privilege level (CPL) field (bits 0 and 1 of the CS segment
  register): Indicates the privilege level of the currently executing program or
  procedure. The term current privilege level (CPL) refers to the setting of this
  field.
• User/supervisor (U/S) flag (bit 2 of paging-structure entries): Determines
  the type of page: user or supervisor.
• Read/write (R/W) flag (bit 1 of paging-structure entries): Determines the
  type of access allowed to a page: read-only or read/write.
• Execute-disable (XD) flag (bit 63 of certain paging-structure entries):
  Determines the type of access allowed to a page: executable or not-executable.
Figure 5-1 shows the location of the various fields and flags in the data, code, and
system-segment descriptors; Figure 3-6 shows the location of the RPL (or CPL) field
in a segment selector (or the CS register); and Chapter 4 identifies the locations of
the U/S, R/W, and XD flags in the paging-structure entries.
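The bit positions listed above translate directly into shifts and masks. The sketch below extracts the segment-level fields from a descriptor held as its two doublewords; the struct and function names are hypothetical, and "first" and "second" follow the doubleword naming used in the list.

```c
#include <assert.h>
#include <stdint.h>

/* A segment descriptor as its two doublewords. */
struct seg_desc {
    uint32_t first;    /* first (low) doubleword   */
    uint32_t second;   /* second (high) doubleword */
};

unsigned desc_s(struct seg_desc d)    { return (d.second >> 12) & 0x1; } /* bit 12     */
unsigned desc_type(struct seg_desc d) { return (d.second >> 8)  & 0xF; } /* bits 11:8  */
unsigned desc_dpl(struct seg_desc d)  { return (d.second >> 13) & 0x3; } /* bits 14:13 */
unsigned desc_g(struct seg_desc d)    { return (d.second >> 23) & 0x1; } /* bit 23     */

/* 20-bit limit: bits 15:0 of the first doubleword, bits 19:16 of the second. */
uint32_t desc_limit(struct seg_desc d)
{
    return (d.first & 0x0000FFFFu) | (d.second & 0x000F0000u);
}

/* E flag (bit 10); meaningful only for data-segment descriptors. */
unsigned desc_e(struct seg_desc d)
{
    /* S = 1 and the top bit of the type field clear identify a data segment. */
    assert(desc_s(d) == 1 && ((desc_type(d) >> 3) & 0x1) == 0);
    return (d.second >> 10) & 0x1;
}

/* RPL: bits 1:0 of a segment selector (for CS, the same bits hold the CPL). */
unsigned selector_rpl(uint16_t selector)
{
    return selector & 0x3;
}
```

As an illustrative input (not taken from this chapter), a descriptor with first doubleword 0000FFFFH and second doubleword 00CF9A00H decodes as S = 1, type = AH, DPL = 0, G = 1, and limit = FFFFFH.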
Many different styles of protection schemes can be implemented with these fields
and flags. When the operating system creates a descriptor, it places values in these
fields and flags in keeping with the particular protection style chosen for an operating
system or executive. Application programs do not generally access or modify these
fields and flags.
Figure 5-1. Descriptor Fields Used for Protection
[Figure not reproduced. Panels: Data-Segment Descriptor, Code-Segment Descriptor, System-Segment Descriptor.
Legend: A = Accessed; AVL = Available to Sys. Programmers; B = Big; C = Conforming; D = Default;
DPL = Descriptor Privilege Level; E = Expansion Direction; G = Granularity; LIMIT = Segment Limit;
P = Present; R = Readable; W = Writable; 0 = Reserved.]
The following sections describe how the processor uses these fields and flags to
perform the various categories of checks described in the introduction to this chapter.
5.2.1 Code Segment Descriptor in 64-bit Mode
Code segments continue to exist in 64-bit mode even though, for address calculations,
the segment base is treated as zero. Some code-segment (CS) descriptor
content (the base address and limit fields) is ignored; the remaining fields function
normally (except for the readable bit in the type field).
Code segment descriptors and selectors are needed in IA-32e mode to establish the
processor's operating mode and execution privilege level. The usage is as follows:
• IA-32e mode uses a previously unused bit in the CS descriptor. Bit 53 is defined
  as the 64-bit (L) flag and is used to select between 64-bit mode and compatibility
  mode when IA-32e mode is active (IA32_EFER.LMA = 1). See Figure 5-2.
    • If CS.L = 0 and IA-32e mode is active, the processor is running in compatibility
      mode. In this case, CS.D selects the default size for data and addresses.
      If CS.D = 0, the default data and address size is 16 bits. If CS.D = 1, the
      default data and address size is 32 bits.
    • If CS.L = 1 and IA-32e mode is active, the only valid setting is CS.D = 0. This
      setting indicates a default operand size of 32 bits and a default address size
      of 64 bits. The CS.L = 1 and CS.D = 1 bit combination is reserved for future
      use and a #GP fault will be generated on an attempt to use a code segment
      with these bits set in IA-32e mode.
• In IA-32e mode, the CS descriptor's DPL is used for execution privilege checks
  (as in legacy 32-bit mode).
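The L and D combinations above can be sketched in C as a small decoder. This is an illustrative sketch only: it assumes IA-32e mode is already active (IA32_EFER.LMA = 1), it takes the 8-byte CS descriptor as one 64-bit value, and the names cs_mode, CS_DESC_L, CS_DESC_D, and cs_mode_ia32e are hypothetical. The L flag is bit 53 as stated above; the D flag is taken to be the adjacent bit 54 (bit 22 of the high doubleword).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the bit positions and meanings come from the text. */
#define CS_DESC_L (UINT64_C(1) << 53)   /* 64-bit (L) flag */
#define CS_DESC_D (UINT64_C(1) << 54)   /* default-size (D) flag */

enum cs_mode {
    CS_MODE_COMPAT_16,   /* L = 0, D = 0: compatibility mode, 16-bit defaults */
    CS_MODE_COMPAT_32,   /* L = 0, D = 1: compatibility mode, 32-bit defaults */
    CS_MODE_64BIT,       /* L = 1, D = 0: 64-bit mode */
    CS_MODE_RESERVED     /* L = 1, D = 1: reserved, use causes #GP */
};

/* Classify a CS descriptor, assuming IA-32e mode is active. */
enum cs_mode cs_mode_ia32e(uint64_t desc)
{
    bool l = (desc & CS_DESC_L) != 0;
    bool d = (desc & CS_DESC_D) != 0;

    if (!l)
        return d ? CS_MODE_COMPAT_32 : CS_MODE_COMPAT_16;
    return d ? CS_MODE_RESERVED : CS_MODE_64BIT;
}
```

A caller would treat CS_MODE_RESERVED as the case in which the processor raises #GP rather than as a usable mode.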
5.3 LIMIT CHECKING
The limit field of a segment descriptor prevents programs or procedures from
addressing memory locations outside the segment. The effective value of the limit
depends on the setting of the G (granularity) flag (see Figure 5-1). For data
segments, the limit also depends on the E (expansion direction) flag and the B
(default stack pointer size and/or upper bound) flag. The E flag is one of the bits in
the type field when the segment descriptor is for a data-segment type.
When the G flag is clear (byte granularity), the effective limit is the value of the
20-bit limit field in the segment descriptor. Here, the limit ranges from 0 to FFFFFH
(1 MByte). When the G flag is set (4-KByte page granularity), the processor scales
the value in the limit field by a factor of 2^12 (4 KBytes). In this case, the effective
limit ranges from FFFH (4 KBytes) to FFFFFFFFH (4 GBytes). Note that when scaling
is used (G flag is set), the lower 12 bits of a segment offset (address) are not checked
against the limit; for example, if the segment limit is 0, offsets 0 through
FFFH are still valid.
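The scaling rule can be written out as a short C sketch. The function name effective_limit and its parameters are hypothetical; the arithmetic follows the paragraph above, with the low 12 bits filled with ones when G is set because those offset bits are not checked.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper. raw_limit is the 20-bit limit field of the
 * descriptor; g is the G (granularity) flag.
 */
uint32_t effective_limit(uint32_t raw_limit, bool g)
{
    raw_limit &= 0xFFFFFu;              /* the field is only 20 bits wide */

    if (!g)
        return raw_limit;               /* byte granularity: 0 to FFFFFH */

    /* 4-KByte granularity: scale by 2^12; the low 12 offset bits are
     * not checked, so the effective limit ends in FFFH. */
    return (raw_limit << 12) | 0xFFFu;  /* FFFH to FFFFFFFFH */
}
```

With G set and a limit field of 0, the sketch returns FFFH, matching the statement that offsets 0 through FFFH remain valid.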
For all types of segments except expand-down data segments, the effective limit is
the last address that is allowed to be accessed in the segment, which is one less than
the size, in bytes, of the segment. The processor causes a general-protection exception
(or, if the segment is SS, a stack-fault exception) any time an attempt is made
to access the following addresses in a segment:
• A byte at an offset greater than the effective limit
• A word at an offset greater than the (effective-limit - 1)
• A doubleword at an offset greater than the (effective-limit - 3)
• A quadword at an offset greater than the (effective-limit - 7)
• A double quadword at an offset greater than the (effective-limit - 15)
Figure 5-2. Descriptor Fields with Flags used in IA-32e Mode
[Figure: high doubleword (offset 4) of the code-segment descriptor as used in IA-32e mode,
showing the G, D, L, AVL, P, DPL, and Type (1 C R A) fields.]
Legend: A Accessed; AVL Available to Sys. Programmers; C Conforming; D Default;
DPL Descriptor Privilege Level; G Granularity; L 64-Bit Flag; P Present; R Readable.
When the effective limit is FFFFFFFFH (4 GBytes), these accesses may or may not
cause the indicated exceptions. Behavior is implementation-specific and may vary
from one execution to another.
For expand-down data segments, the segment limit has the same function but is
interpreted differently. Here, the effective limit specifies the last address that is not
allowed to be accessed within the segment; the range of valid offsets is from
(effective-limit + 1) to FFFFFFFFH if the B flag is set and from (effective-limit + 1) to FFFFH
if the B flag is clear. An expand-down segment has maximum size when the segment
limit is 0.
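As a rough illustration, the expand-up and expand-down rules can be combined into one C predicate. The name limit_check_ok and its parameters are hypothetical. The sketch treats an access that would run past FFFFFFFFH as a violation, although the text notes that behavior at an effective limit of FFFFFFFFH is implementation-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when an access of `size` bytes
 * starting at `offset` passes the segment limit check.
 * eff_limit is the effective limit after any G-flag scaling.
 */
bool limit_check_ok(uint32_t offset, uint32_t size, uint32_t eff_limit,
                    bool expand_down, bool b_flag)
{
    assert(size > 0);

    /* Compute the last byte touched in 64 bits so it cannot wrap. */
    uint64_t last = (uint64_t)offset + size - 1;

    if (!expand_down)
        return last <= eff_limit;       /* valid offsets: 0 to limit */

    /* Expand-down: valid offsets run from (limit + 1) up to the
     * bound selected by the B flag. */
    uint64_t upper = b_flag ? UINT64_C(0xFFFFFFFF) : UINT64_C(0xFFFF);
    return offset > eff_limit && last <= upper;
}
```

For an expand-up segment, a doubleword access at offset (effective-limit - 3) passes and one at (effective-limit - 2) fails, which is the rule given in the list of faulting accesses.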
Limit checking catches programming errors such as runaway code, runaway
subscripts, and invalid pointer calculations. These errors are detected when they
occur, so identification of the cause is easier. Without limit checking, these errors
could overwrite code or data in another segment.
In addition to checking segment limits, the processor also checks descriptor table
limits. The GDTR and IDTR registers contain 16-bit limit values that the processor
uses to prevent programs from selecting segment descriptors outside the respective
descriptor tables. The LDTR and task registers contain 32-bit segment limit values
(read from the segment descriptors for the current LDT and TSS, respectively). The
processor uses these segment limits to prevent accesses beyond the bounds of the
current LDT and TSS. See Section 3.5.1, "Segment Descriptor Tables," for more
information on the GDT and LDT limit fields; see Section 6.10, "Interrupt Descriptor Table
(IDT)," for more information on the IDT limit field; and see Section 7.2.4, "Task
Register," for more information on the TSS segment limit field.
5.3.1 Limit Checking in 64-bit Mode
In 64-bit mode, the processor does not perform runtime limit checking on code or
data segments. However, the processor does check descriptor-table limits.
5.4 TYPE CHECKING
Segment descriptors contain type information in two places:
• The S (descriptor type) flag.
• The type field.
The processor uses this information to detect programming errors that result in an
attempt to use a segment or gate in an incorrect or unintended manner.
The S flag indicates whether a descriptor is a system type or a code or data type. The
type field provides 4 additional bits for use in defining various types of code, data,
and system descriptors. Table 3-1 shows the encoding of the type field for code and
data descriptors; Table 3-2 shows the encoding of the field for system descriptors.
The processor examines type information at various times while operating on
segment selectors and segment descriptors. The following list gives examples of
typical operations where type checking is performed (this list is not exhaustive):
• When a segment selector is loaded into a segment register. Certain
  segment registers can contain only certain descriptor types, for example:
    • The CS register can be loaded only with a selector for a code segment.
    • Segment selectors for code segments that are not readable or for system
      segments cannot be loaded into data-segment registers (DS, ES, FS, and
      GS).
    • Only segment selectors of writable data segments can be loaded into the SS
      register.
• When a segment selector is loaded into the LDTR or task register. For example:
    • The LDTR can only be loaded with a selector for an LDT.
    • The task register can only be loaded with a segment selector for a TSS.
• When instructions access segments whose descriptors are already
  loaded into segment registers. Certain segments can be used by instructions
  only in certain predefined ways, for example:
    • No instruction may write into an executable segment.
    • No instruction may write into a data segment if it is not writable.
    • No instruction may read an executable segment unless the readable flag is
      set.
• When an instruction operand contains a segment selector. Certain
  instructions can access segments or gates of only a particular type, for example:
    • A far CALL or far JMP instruction can only access a segment descriptor for a
      conforming code segment, nonconforming code segment, call gate, task
      gate, or TSS.
    • The LLDT instruction must reference a segment descriptor for an LDT.
    • The LTR instruction must reference a segment descriptor for a TSS.
    • The LAR instruction must reference a segment or gate descriptor for an LDT,
      TSS, call gate, task gate, code segment, or data segment.
    • The LSL instruction must reference a segment descriptor for an LDT, TSS, code
      segment, or data segment.
• IDT entries must be interrupt, trap, or task gates.
• During certain internal operations. For example:
    • On a far call or far jump (executed with a far CALL or far JMP instruction), the
      processor determines the type of control transfer to be carried out (call or
      jump to another code segment, a call or jump through a gate, or a task
      switch) by checking the type field in the segment (or gate) descriptor pointed
      to by the segment (or gate) selector given as an operand in the CALL or JMP
      instruction. If the descriptor type is for a code segment or call gate, a call or
      jump to another code segment is indicated; if the descriptor type is for a TSS
      or task gate, a task switch is indicated.
    • On a call or jump through a call gate (or on an interrupt- or exception-handler
      call through a trap or interrupt gate), the processor automatically checks that
      the segment descriptor being pointed to by the gate is for a code segment.
    • On a call or jump to a new task through a task gate (or on an interrupt- or
      exception-handler call to a new task through a task gate), the processor
      automatically checks that the segment descriptor being pointed to by the
      task gate is for a TSS.
    • On a call or jump to a new task by a direct reference to a TSS, the processor
      automatically checks that the segment descriptor being pointed to by the
      CALL or JMP instruction is for a TSS.
    • On return from a nested task (initiated by an IRET instruction), the processor
      checks that the previous task link field in the current TSS points to a TSS.
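The first group of checks in the list above (loading a selector into a segment register) can be sketched in C as follows. This is a simplified model, not the processor's full algorithm: it covers type checks only, ignores privilege and present checks, and uses hypothetical names (seg_reg, type_ok_for_load). It assumes the code- and data-segment type encoding of Table 3-1, in which bit 3 of the type field distinguishes code (1) from data (0) and bit 1 is the R flag for code segments or the W flag for data segments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register identifiers for this sketch. */
enum seg_reg { SEG_CS, SEG_SS, SEG_DS, SEG_ES, SEG_FS, SEG_GS };

/*
 * Type check applied when a selector is loaded into a segment register.
 * s_flag is the S (descriptor type) flag; type is the 4-bit type field.
 */
bool type_ok_for_load(enum seg_reg reg, bool s_flag, unsigned type)
{
    assert(type <= 0xF);

    if (!s_flag)
        return false;                   /* system descriptors are rejected */

    bool is_code = (type & 0x8) != 0;   /* bit 3: 1 = code, 0 = data */
    bool rw      = (type & 0x2) != 0;   /* bit 1: R (code) or W (data) */

    switch (reg) {
    case SEG_CS:
        return is_code;                 /* CS: code segments only */
    case SEG_SS:
        return !is_code && rw;          /* SS: writable data only */
    default:
        return !is_code || rw;          /* DS, ES, FS, GS: data or readable code */
    }
}
```

In this model an execute-only code segment (type 8H) is accepted for CS and rejected for DS, while a read-only data segment (type 0H) is accepted for DS and rejected for SS.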
5.4.1 Null Segment Selector Checking
Attempting to load a null segment selector (see Section 3.4.2, "Segment Selectors")
into the CS or SS segment register generates a general-protection exception (#GP).
A null segment selector can be loaded into the DS, ES, FS, or GS register, but any
attempt to access a segment through one of these registers when it is loaded with a
null segment selector results in a #GP exception being generated. Loading unused
data-segment registers with a null segment selector is a useful method of detecting
accesses to unused segment registers and/or preventing unwanted accesses to data
segments.
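A brief C sketch of the null-selector rule follows. It assumes the selector layout of Section 3.4.2 (index in bits 15:3, table indicator in bit 2, RPL in bits 1:0), under which a selector is null when its index and table indicator are both 0, whatever its RPL. The function names are hypothetical, and the second function models only the behavior described above for modes other than 64-bit mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register identifiers for this sketch. */
enum seg_reg { SEG_CS, SEG_SS, SEG_DS, SEG_ES, SEG_FS, SEG_GS };

/* A selector is null when index = 0 and TI = 0; the RPL bits are ignored. */
bool selector_is_null(uint16_t sel)
{
    return (sel & 0xFFFCu) == 0;
}

/* Returns true when loading `sel` into `reg` raises #GP because it is null. */
bool null_load_faults(enum seg_reg reg, uint16_t sel)
{
    if (!selector_is_null(sel))
        return false;                   /* other checks apply, not this one */
    return reg == SEG_CS || reg == SEG_SS;
}
```

Selectors 0000H through 0003H are all null under this test; loading any of them into DS, ES, FS, or GS succeeds, and the fault is deferred until the register is used for a memory access.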
5.4.1.1 NULL Segment Checking in 64-bit Mode
In 64-bit mode, the processor does not perform runtime checking on NULL segment
selectors. The processor does not cause a #GP fault when an attempt is made to
access memory where the referenced segment register has a NULL segment selector.
5.5 PRIVILEGE LEVELS
The processor's segment-protection mechanism recognizes 4 privilege levels,
numbered from 0 to 3. The greater numbers mean lesser privileges. Figure 5-3
shows how these levels of privilege can be interpreted as rings of protection.
The center (reserved for the most privileged code, data, and stacks) is used for the
segments containing the critical software, usually the kernel of an operating system.
Outer rings are used for less critical software. (Systems that use only 2 of the 4
possible privilege levels should use levels 0 and 3.)
The processor uses privilege levels to prevent a program or task operating at a lesser
privilege level from accessing a segment with a greater privilege, except under
controlled situations. When the processor detects a privilege level violation, it
generates a general-protection exception (#GP).
To carry out privilege-level checks between code segments and data segments, the
processor recognizes the following three types of privilege levels:
• Current privilege level (CPL). The CPL is the privilege level of the currently
  executing program or task. It is stored in bits 0 and 1 of the CS and SS segment
  registers. Normally, the CPL is equal to the privilege level of the code segment
  from which instructions are being fetched. The processor changes the CPL when
  program control is transferred to a code segment with a different privilege level.
  The CPL is treated slightly differently when accessing conforming code segments.
  Conforming code segments can be accessed from any privilege level that is equal
  to or numerically greater (less privileged) than the DPL of the conforming code
  segment. Also, the CPL is not changed when the processor accesses a conforming
  code segment that has a different privilege level than the CPL.
• Descriptor privilege level (DPL). The DPL is the privilege level of a segment
  or gate. It is stored in the DPL field of the segment or gate descriptor for the
  segment or gate. When the currently executing code segment attempts to access
  a segment or gate, the DPL of the segment or gate is compared to the CPL and
  RPL of the segment or gate selector (as described later in this section). The DPL
  is interpreted differently, depending on the type of segment or gate being
  accessed:
    • Data segment. The DPL indicates the numerically highest privilege level
      that a program or task can have to be allowed to access the segment. For
      example, if the DPL of a data segment is 1, only programs running at a CPL of
      0 or 1 can access the segment.
    • Nonconforming code segment (without using a call gate). The DPL
      indicates the privilege level that a program or task must be at to access the
      segment. For example, if the DPL of a nonconforming code segment is 0, only
      programs running at a CPL of 0 can access the segment.
    • Call gate. The DPL indicates the numerically highest privilege level that the
      currently executing program or task can be at and still be able to access the
      call gate. (This is the same access rule as for a data segment.)
    • Conforming code segment and nonconforming code segment
      accessed through a call gate. The DPL indicates the numerically lowest
      privilege level that a program or task can have to be allowed to access the
      segment. For example, if the DPL of a conforming code segment is 2,
      programs running at a CPL of 0 or 1 cannot access the segment.
    • TSS. The DPL indicates the numerically highest privilege level that the
      currently executing program or task can be at and still be able to access the
      TSS. (This is the same access rule as for a data segment.)
• Requested privilege level (RPL). The RPL is an override privilege level that
  is assigned to segment selectors. It is stored in bits 0 and 1 of the segment
  selector. The processor checks the RPL along with the CPL to determine if access
  to a segment is allowed. Even if the program or task requesting access to a
  segment has sufficient privilege to access the segment, access is denied if the
  RPL is not of sufficient privilege level. That is, if the RPL of a segment selector is
  numerically greater than the CPL, the RPL overrides the CPL, and vice versa. The
  RPL can be used to ensure that privileged code does not access a segment on
  behalf of an application program unless the program itself has access privileges
  for that segment. See Section 5.10.4, "Checking Caller Access Privileges (ARPL
  Instruction)," for a detailed description of the purpose and typical use of the RPL.
Figure 5-3. Protection Rings
[Figure: four concentric protection rings. Level 0 (center): operating system kernel;
levels 1 and 2: operating system services; level 3 (outermost): applications.]
Privilege levels are checked when the segment selector of a segment descriptor is
loaded into a segment register. The checks used for data access differ from those
used for transfers of program control among code segments; therefore, the two
kinds of accesses are considered separately in the following sections.
5.6 PRIVILEGE LEVEL CHECKING WHEN ACCESSING DATA SEGMENTS
To access operands in a data segment, the segment selector for the data segment
must be loaded into the data-segment registers (DS, ES, FS, or GS) or into the
stack-segment register (SS). (Segment registers can be loaded with the MOV, POP, LDS,
LES, LFS, LGS, and LSS instructions.) Before the processor loads a segment selector
into a segment register, it performs a privilege check (see Figure 5-4) by comparing
the privilege levels of the currently running program or task (the CPL), the RPL of the
segment selector, and the DPL of the segment's segment descriptor. The processor
loads the segment selector into the segment register if the DPL is numerically greater
than or equal to both the CPL and the RPL. Otherwise, a general-protection fault is
generated and the segment register is not loaded.
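The rule in the preceding paragraph reduces to a single comparison, sketched here in C. The function name data_access_allowed is hypothetical, and the sketch covers only the privilege comparison for DS, ES, FS, and GS loads, not the type, limit, or present checks.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: privilege check for loading a data-segment
 * selector. The load is allowed when the DPL is numerically greater
 * than or equal to both the CPL and the RPL.
 */
bool data_access_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    assert(cpl <= 3 && rpl <= 3 && dpl <= 3);

    /* The weaker (numerically larger) of CPL and RPL governs the check. */
    unsigned effective = cpl > rpl ? cpl : rpl;
    return dpl >= effective;
}
```

For a data segment with a DPL of 2, a call such as data_access_allowed(0, 3, 2) is rejected even though the CPL is 0, because the selector's RPL of 3 is less privileged than the DPL.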
Figure 5-5 shows four procedures (located in code segments A, B, C, and D), each
running at different privilege levels and each attempting to access the same data
segment.
1. The procedure in code segment A is able to access data segment E using segment
   selector E1, because the CPL of code segment A and the RPL of segment selector
   E1 are equal to the DPL of data segment E.
2. The procedure in code segment B is able to access data segment E using segment
   selector E2, because the CPL of code segment B and the RPL of segment selector
   E2 are both numerically lower than (more privileged than) the DPL of data
   segment E. A code segment B procedure can also access data segment E using
   segment selector E1.
3. The procedure in code segment C is not able to access data segment E using
   segment selector E3 (dotted line), because the CPL of code segment C and the
   RPL of segment selector E3 are both numerically greater than (less privileged
   than) the DPL of data segment E. Even if a code segment C procedure were to use
   segment selector E1 or E2, such that the RPL would be acceptable, it still could
   not access data segment E because its CPL is not privileged enough.
4. The procedure in code segment D should be able to access data segment E
   because code segment D's CPL is numerically less than the DPL of data segment
   E. However, the RPL of segment selector E3 (which the code segment D
   procedure is using to access data segment E) is numerically greater than the DPL
   of data segment E, so access is not allowed. If the code segment D procedure
   were to use segment selector E1 or E2 to access the data segment, access would
   be allowed.
Figure 5-4. Privilege Check for Data Access
[Figure: the CPL (from the CS register), the RPL (from the segment selector for the data
segment), and the DPL (from the data-segment descriptor) feed the privilege check.]
As demonstrated in the previous examples, the addressable domain of a program or
task varies as its CPL changes. When the CPL is 0, data segments at all privilege
levels are accessible; when the CPL is 1, only data segments at privilege levels 1
through 3 are accessible; when the CPL is 3, only data segments at privilege level 3
are accessible.
The RPL of a segment selector can always override the addressable domain of a
program or task. When properly used, RPLs can prevent problems caused by
accidental (or intentional) use of segment selectors for privileged data segments by less
privileged programs or procedures.
It is important to note that the RPL of a segment selector for a data segment is under
software control. For example, an application program running at a CPL of 3 can set
the RPL for a data-segment selector to 0. With the RPL set to 0, only the CPL checks,
not the RPL checks, will provide protection against deliberate, direct attempts to
violate privilege-level security for the data segment. To prevent these types of
privilege-level-check violations, a program or procedure can check access privileges
whenever it receives a data-segment selector from another procedure (see Section
5.10.4, "Checking Caller Access Privileges (ARPL Instruction)").
Figure 5-5. Examples of Accessing Data Segments From Various Privilege Levels
[Figure: privilege rings 0 (highest privilege) through 3 (lowest privilege). Data segment E
has DPL=2. Code segment C (CPL=3) uses segment selector E3 (RPL=3); code segment A (CPL=2)
uses segment selector E1 (RPL=2); code segment B (CPL=1) uses segment selector E2 (RPL=1);
code segment D (CPL=0) uses segment selector E3.]
5.6.1 Accessing Data in Code Segments
In some instances it may be desirable to access data structures that are contained in
a code segment. The following methods of accessing data in code segments are
possible:
1. Load a data-segment register with a segment selector for a nonconforming,
   readable, code segment.
2. Load a data-segment register with a segment selector for a conforming,
   readable, code segment.
3. Use a code-segment override prefix (CS) to read a readable, code segment
   whose selector is already loaded in the CS register.
The same rules for accessing data segments apply to method 1. Method 2 is always
valid because the privilege level of a conforming code segment is effectively the
same as the CPL, regardless of its DPL. Method 3 is always valid because the DPL of
the code segment selected by the CS register is the same as the CPL.
5.7 PRIVILEGE LEVEL CHECKING WHEN LOADING THE SS REGISTER
Privilege level checking also occurs when the SS register is loaded with the segment
selector for a stack segment. Here all privilege levels related to the stack segment
must match the CPL; that is, the CPL, the RPL of the stack-segment selector, and the
DPL of the stack-segment descriptor must be the same. If the RPL and DPL are not
equal to the CPL, a general-protection exception (#GP) is generated.
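For comparison with the data-segment rule, the stricter stack-segment rule can be sketched in C. The function name ss_load_allowed is hypothetical, and only the privilege comparison is modeled.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: privilege check for loading SS. The RPL of the
 * stack-segment selector and the DPL of the stack-segment descriptor
 * must both equal the CPL; otherwise #GP is generated.
 */
bool ss_load_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    assert(cpl <= 3 && rpl <= 3 && dpl <= 3);
    return rpl == cpl && dpl == cpl;
}
```

Unlike a data-segment load, a more privileged program cannot use a less privileged stack segment: ss_load_allowed(0, 0, 3) is false.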
5.8 PRIVILEGE LEVEL CHECKING WHEN TRANSFERRING PROGRAM CONTROL BETWEEN CODE SEGMENTS
To transfer program control from one code segment to another, the segment selector
for the destination code segment must be loaded into the code-segment register
(CS). As part of this loading process, the processor examines the segment descriptor
for the destination code segment and performs various limit, type, and privilege
checks. If these checks are successful, the CS register is loaded, program control is
transferred to the new code segment, and program execution begins at the
instruction pointed to by the EIP register.
Program control transfers are carried out with the JMP, CALL, RET, SYSENTER,
SYSEXIT, INT n, and IRET instructions, as well as by the exception and interrupt
mechanisms. Exceptions, interrupts, and the IRET instruction are special cases
discussed in Chapter 6, "Interrupt and Exception Handling." This chapter discusses
only the JMP, CALL, RET, SYSENTER, and SYSEXIT instructions.
A JMP or CALL instruction can reference another code segment in any of four ways:
• The target operand contains the segment selector for the target code segment.
• The target operand points to a call-gate descriptor, which contains the segment
  selector for the target code segment.
• The target operand points to a TSS, which contains the segment selector for the
  target code segment.
• The target operand points to a task gate, which points to a TSS, which in turn
  contains the segment selector for the target code segment.
The following sections describe the first two types of references. See Section 7.3, "Task
Switching," for information on transferring program control through a task gate
and/or TSS.
The SYSENTER and SYSEXIT instructions are special instructions for making fast calls
to and returns from operating system or executive procedures. These instructions
are discussed briefly in Section 5.8.7, "Performing Fast Calls to System Procedures
with the SYSENTER and SYSEXIT Instructions."
5.8.1 Direct Calls or Jumps to Code Segments
The near forms of the JMP, CALL, and RET instructions transfer program control
within the current code segment, so privilege-level checks are not performed. The far
forms of the JMP, CALL, and RET instructions transfer control to other code segments,
so the processor does perform privilege-level checks.
When transferring program control to another code segment without going through a
call gate, the processor examines four kinds of privilege level and type information
(see Figure 5-6):
• The CPL. (Here, the CPL is the privilege level of the calling code segment; that is,
  the code segment that contains the procedure that is making the call or jump.)
• The DPL of the segment descriptor for the destination code segment that
  contains the called procedure.
• The RPL of the segment selector of the destination code segment.
• The conforming (C) flag in the segment descriptor for the destination code
  segment, which determines whether the segment is a conforming (C flag is set)
  or nonconforming (C flag is clear) code segment. See Section 3.4.5.1, "Code-
  and Data-Segment Descriptor Types," for more information about this flag.
Figure 5-6. Privilege Check for Control Transfer Without Using a Gate
[Figure: the CPL (from the CS register), the RPL (from the segment selector for the code
segment), and the DPL and C flag (from the destination code segment descriptor) feed the
privilege check.]
The rules that the processor uses to check the CPL, RPL, and DPL depend on the
setting of the C flag, as described in the following sections.
5.8.1.1 Accessing Nonconforming Code Segments
When accessing nonconforming code segments, the CPL of the calling procedure
must be equal to the DPL of the destination code segment; otherwise, the processor
generates a general-protection exception (#GP). For example in Figure 5-7:
• Code segment C is a nonconforming code segment. A procedure in code segment
  A can call a procedure in code segment C (using segment selector C1) because
  they are at the same privilege level (CPL of code segment A is equal to the DPL of
  code segment C).
• A procedure in code segment B cannot call a procedure in code segment C (using
  segment selector C2 or C1) because the two code segments are at different
  privilege levels.
The RPL of the segment selector that points to a nonconforming code segment has a
limited effect on the privilege check. The RPL must be numerically less than or equal
to the CPL of the calling procedure for a successful control transfer to occur. So, in the
example in Figure 5-7, the RPLs of segment selectors C1 and C2 could legally be set
to 0, 1, or 2, but not to 3.
When the segment selector of a nonconforming code segment is loaded into the CS
register, the privilege level field is not changed; that is, it remains at the CPL (which
is the privilege level of the calling procedure). This is true, even if the RPL of the
segment selector is different from the CPL.
5.8.1.2 Accessing Conforming Code Segments
When accessing conforming code segments, the CPL of the calling procedure may be
numerically equal to or greater than (less privileged than) the DPL of the destination code
segment; the processor generates a general-protection exception (#GP) only if the
CPL is less than the DPL. (The segment selector RPL for the destination code segment
is not checked if the segment is a conforming code segment.)
Figure 5-7. Examples of Accessing Conforming and Nonconforming Code Segments
From Various Privilege Levels
[Figure: privilege rings 0 (highest privilege) through 3 (lowest privilege). Code segment A
(CPL=2) uses segment selectors C1 (RPL=2) and D1 (RPL=2); code segment B (CPL=3) uses
segment selectors C2 (RPL=3) and D2 (RPL=3). Code segment C is a nonconforming code segment
with DPL=2; code segment D is a conforming code segment with DPL=1.]
In the example in Figure 5-7, code segment D is a conforming code segment. Therefore,
calling procedures in both code segment A and B can access code segment D
(using either segment selector D1 or D2, respectively), because they both have CPLs
that are greater than or equal to the DPL of the conforming code segment. For
conforming code segments, the DPL represents the numerically lowest
privilege level that a calling procedure may be at to successfully make a call to
the code segment.
(Note that segment selectors D1 and D2 are identical except for their respective
RPLs. But since RPLs are not checked when accessing conforming code segments,
the two segment selectors are essentially interchangeable.)
When program control is transferred to a conforming code segment, the CPL does not
change, even if the DPL of the destination code segment is less than the CPL. This
situation is the only one where the CPL may be different from the DPL of the current
code segment. Also, since the CPL does not change, no stack switch occurs.
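The nonconforming and conforming rules for a direct far CALL or JMP (no gate) can be put side by side in a C sketch. The names transfer_result and direct_transfer are hypothetical, and only the privilege comparison is modeled; type, limit, and present checks are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type: whether the transfer passes, and the CPL afterward. */
struct transfer_result {
    bool     allowed;
    unsigned new_cpl;
};

/*
 * Privilege check for a direct transfer to a code segment.
 * conforming is the C flag of the destination code segment.
 */
struct transfer_result direct_transfer(unsigned cpl, unsigned rpl,
                                       unsigned dpl, bool conforming)
{
    assert(cpl <= 3 && rpl <= 3 && dpl <= 3);

    /* The CPL never changes on a direct transfer. */
    struct transfer_result r = { false, cpl };

    if (conforming)
        r.allowed = dpl <= cpl;                 /* RPL is not checked */
    else
        r.allowed = cpl == dpl && rpl <= cpl;   /* same level, RPL <= CPL */

    return r;
}
```

With the values shown in Figure 5-7, direct_transfer(3, 3, 1, true) is allowed and leaves the CPL at 3, while direct_transfer(3, 3, 2, false) is rejected.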
Conforming segments are used for code modules such as math libraries and
exception handlers, which support applications but do not require access to protected
system facilities. These modules are part of the operating system or executive
software, but they can be executed at numerically higher privilege levels (less privileged
levels). Keeping the CPL at the level of a calling code segment when switching to a
conforming code segment prevents an application program from accessing
nonconforming code segments while at the privilege level (DPL) of a conforming code
segment and thus prevents it from accessing more privileged data.
Most code segments are nonconforming. For these segments, program control can
be transferred only to code segments at the same level of privilege, unless the
transfer is carried out through a call gate, as described in the following sections.
5.8.2 Gate Descriptors
To provide controlled access to code segments with different privilege levels, the
processor provides a special set of descriptors called gate descriptors. There are four
kinds of gate descriptors:
• Call gates
• Trap gates
• Interrupt gates
• Task gates
Task gates are used for task switching and are discussed in Chapter 7, "Task
Management." Trap and interrupt gates are special kinds of call gates used for calling
exception and interrupt handlers. They are described in Chapter 6, "Interrupt and Exception
Handling." This chapter is concerned only with call gates.
5.8.3 Call Gates
Call gates facilitate controlled transfers of program control between different
privilege levels. They are typically used only in operating systems or executives that use
the privilege-level protection mechanism. Call gates are also useful for transferring
program control between 16-bit and 32-bit code segments, as described in Section
18.4, "Transferring Control Among Mixed-Size Code Segments."
Figure 5-8 shows the format of a call-gate descriptor. A call-gate descriptor may
reside in the GDT or in an LDT, but not in the interrupt descriptor table (IDT). It
performs six functions:
• It specifies the code segment to be accessed.
• It defines an entry point for a procedure in the specified code segment.
• It specifies the privilege level required for a caller trying to access the procedure.
• If a stack switch occurs, it specifies the number of optional parameters to be
  copied between stacks.
• It defines the size of values to be pushed onto the target stack: 16-bit gates force
  16-bit pushes and 32-bit gates force 32-bit pushes.
• It specifies whether the call-gate descriptor is valid.
The segment select or field in a call gat e specifies t he code segment t o be accessed.
The offset field specifies t he ent ry point in t he code segment . This ent ry point is
generally t o t he first inst ruct ion of a specific procedure. The DPL field indicat es t he
privilege level of t he call gat e, which in t urn is t he privilege level required t o access
t he select ed procedure t hrough t he gat e. The P flag indicat es whet her t he call- gat e
descript or is valid. ( The presence of t he code segment t o which t he gat e point s is
indicat ed by t he P flag in t he code segment s descript or.) The paramet er count field
indicat es t he number of paramet ers t o copy from t he calling procedures st ack t o t he
new st ack if a st ack swit ch occurs ( see Sect ion 5. 8.5, St ack Swit ching ) . The param-
et er count specifies t he number of words for 16- bit call gat es and doublewords for
32- bit call gat es.
Figure 5-8. Call-Gate Descriptor
Doubleword at byte offset 4:
  Bits 31:16   Offset in Segment 31:16
  Bit 15       P (Gate Valid)
  Bits 14:13   DPL (Descriptor Privilege Level)
  Bit 12       0
  Bits 11:8    Type (1100B)
  Bits 7:5     0 0 0
  Bits 4:0     Param. Count
Doubleword at byte offset 0:
  Bits 31:16   Segment Selector
  Bits 15:0    Offset in Segment 15:00
Note that the P flag in a gate descriptor is normally set to 1. If it is set to 0, a
not-present (#NP) exception is generated when a program attempts to access the
descriptor. The operating system can use the P flag for special purposes. For
example, it could be used to track the number of times the gate is used. Here, the P
flag is initially set to 0, causing a trap to the not-present exception handler. The
exception handler then increments a counter and sets the P flag to 1, so that on
returning from the handler, the gate descriptor will be valid.
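The field layout described in this section can be illustrated with a short C sketch that packs a 32-bit call-gate descriptor into its two doublewords. The function name make_call_gate32 and the constant CALL_GATE32_TYPE are hypothetical names chosen for this example; the sketch assumes the bit positions of Figure 5-8 and the 32-bit call-gate type value 0CH, and it performs no validation of its inputs.

```c
#include <stdint.h>

/* Hypothetical constant: type field of a 32-bit call gate (0CH). */
enum { CALL_GATE32_TYPE = 0xC };

/*
 * Hypothetical helper: packs a 32-bit call-gate descriptor.
 * Low doubleword:  selector in bits 31:16, offset 15:0 in bits 15:0.
 * High doubleword: offset 31:16 in bits 31:16, P in bit 15, DPL in
 * bits 14:13, type in bits 11:8, parameter count in bits 4:0.
 */
uint64_t make_call_gate32(uint16_t selector, uint32_t offset,
                          unsigned dpl, unsigned param_count, int present)
{
    uint32_t low  = ((uint32_t)selector << 16) | (offset & 0xFFFFu);
    uint32_t high = (offset & 0xFFFF0000u)
                  | ((present ? 1u : 0u) << 15)
                  | ((dpl & 3u) << 13)
                  | ((uint32_t)CALL_GATE32_TYPE << 8)
                  | (param_count & 0x1Fu);
    return ((uint64_t)high << 32) | low;
}
```

For example, make_call_gate32(0x0008, 0x00123456, 3, 2, 1) describes a present gate with DPL 3 that copies two doublewords and enters the segment selected by 0008H at offset 00123456H.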
5.8.3.1 IA-32e Mode Call Gates
Call-gate descriptors in 32-bit mode provide a 32-bit offset for the instruction pointer
(EIP); 64-bit extensions double the size of 32-bit mode call gates in order to store
64-bit instruction pointers (RIP). See Figure 5-9:
The first eight bytes (bytes 7:0) of a 64-bit mode call gate are similar but not identical to legacy 32-bit mode call gates. The parameter-copy-count field has been removed.
Bytes 11:8 hold the upper 32 bits of the target-segment offset in canonical form. A general-protection exception (#GP) is generated if software attempts to use a call gate with a target offset that is not in canonical form.
16-byte descriptors may reside in the same descriptor table with 16-bit and 32-bit descriptors. A type field, used for consistency checking, is defined in bits 12:8 of the 64-bit descriptor's highest dword (cleared to zero). A general-protection exception (#GP) results if an attempt is made to access the upper half of a 64-bit mode descriptor as a 32-bit mode descriptor.
Target code segments referenced by a 64-bit call gate must be 64-bit code segments (CS.L = 1, CS.D = 0). If not, the reference generates a general-protection exception, #GP (CS selector).
Only 64-bit mode call gates can be referenced in IA-32e mode (64-bit mode and compatibility mode). The legacy 32-bit mode call gate type (0CH) is redefined in IA-32e mode as a 64-bit call-gate type; no 32-bit call-gate type exists in IA-32e mode.
If a far call references a 16-bit call gate type (04H) in IA-32e mode, a general-protection exception (#GP) is generated.
When a call references a 64-bit mode call gate, actions taken are identical to those
taken in 32-bit mode, with the following exceptions:
Stack pushes are made in eight-byte increments.
A 64-bit RIP is pushed onto the stack.
Parameter copying is not performed.
Use a matching far-return instruction size for correct operation (returns from 64-bit
calls must be performed with a 64-bit operand-size return to process the stack
correctly).
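The IA-32e call-gate rules above can be sketched in C as follows. The names is_canonical48, call_gate64, and make_call_gate64 are hypothetical. The canonical check assumes an implementation with 48-bit linear addresses (bits 63:47 all equal), which is an assumption not stated in this section; the sketch mirrors the #GP condition by refusing to build a gate whose target offset is not canonical.

```c
#include <stdint.h>

/* Assumption: 48-bit linear addresses, so bits 63:47 must all be equal. */
int is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;              /* the 17 bits 63:47 */
    return top == 0 || top == 0x1FFFFu;
}

/* Hypothetical container for a 16-byte descriptor. */
typedef struct {
    uint64_t low;    /* bytes 7:0  */
    uint64_t high;   /* bytes 15:8 */
} call_gate64;

/* Returns 0 on success, -1 if the target offset is not canonical. */
int make_call_gate64(call_gate64 *g, uint16_t selector, uint64_t offset,
                     unsigned dpl, int present)
{
    if (!is_canonical48(offset))
        return -1;

    uint32_t dw0 = ((uint32_t)selector << 16) | (uint32_t)(offset & 0xFFFFu);
    uint32_t dw1 = (uint32_t)(offset & 0xFFFF0000u)
                 | ((present ? 1u : 0u) << 15)
                 | ((dpl & 3u) << 13)
                 | (0xCu << 8);             /* type 0CH; no parameter count */

    g->low  = ((uint64_t)dw1 << 32) | dw0;
    /* Bytes 11:8 hold offset 63:32; bytes 15:12 stay zero, which also
       keeps the consistency-check type field (bits 12:8) cleared. */
    g->high = offset >> 32;
    return 0;
}
```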
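Figure 5-9. Call-Gate Descriptor in IA-32e Mode
Doubleword at byte offset 12:
  Bits 31:13   Reserved
  Bits 12:8    Type (0 0 0 0 0)
  Bits 7:0     Reserved
Doubleword at byte offset 8:
  Bits 31:0    Offset in Segment 63:32
Doubleword at byte offset 4:
  Bits 31:16   Offset in Segment 31:16
  Bit 15       P (Gate Valid)
  Bits 14:13   DPL (Descriptor Privilege Level)
  Bit 12       0
  Bits 11:8    Type (1100B)
  Bits 7:0     Reserved
Doubleword at byte offset 0:
  Bits 31:16   Segment Selector
  Bits 15:0    Offset in Segment 15:00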
5.8.4 Accessing a Code Segment Through a Call Gate
To access a call gate, a far pointer to the gate is provided as a target operand in a
CALL or JMP instruction. The segment selector from this pointer identifies the call
gate (see Figure 5-10); the offset from the pointer is required, but not used or
checked by the processor. (The offset can be set to any value.)
When the processor has accessed the call gate, it uses the segment selector from the
call gate to locate the segment descriptor for the destination code segment. (This
segment descriptor can be in the GDT or the LDT.) It then combines the base address
from the code-segment descriptor with the offset from the call gate to form the linear
address of the procedure entry point in the code segment.
As shown in Figure 5-11, four different privilege levels are used to check the validity
of a program control transfer through a call gate:
The CPL (current privilege level).
The RPL (requestor's privilege level) of the call gate's selector.
The DPL (descriptor privilege level) of the call gate descriptor.
The DPL of the segment descriptor of the destination code segment.
The C flag (conforming) in the segment descriptor for the destination code segment
is also checked.
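Figure 5-10. Call-Gate Mechanism
(Diagram: the segment selector of the far pointer selects the call-gate descriptor in the
descriptor table; the offset of the far pointer is required but not used by the processor.
The segment selector in the call-gate descriptor selects the code-segment descriptor. The
base from the code-segment descriptor plus the offset from the call-gate descriptor gives
the procedure entry point.)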
Figure 5-11. Privilege Check for Control Transfer with Call Gate
(Diagram: the privilege check takes four inputs: the CPL from the CS register, the RPL
from the call-gate selector, the DPL from the call-gate descriptor, and the DPL from the
destination code-segment descriptor.)
The privilege checking rules are different depending on whether the control transfer
was initiated with a CALL or a JMP instruction, as shown in Table 5-1.
Table 5-1. Privilege Check Rules for Call Gates
Instruction   Privilege Check Rules
CALL          CPL ≤ call gate DPL; RPL ≤ call gate DPL
              Destination conforming code segment DPL ≤ CPL
              Destination nonconforming code segment DPL ≤ CPL
JMP           CPL ≤ call gate DPL; RPL ≤ call gate DPL
              Destination conforming code segment DPL ≤ CPL
              Destination nonconforming code segment DPL = CPL
The DPL field of the call-gate descriptor specifies the numerically highest privilege
level from which a calling procedure can access the call gate; that is, to access a call
gate, the CPL of a calling procedure must be equal to or less than the DPL of the call
gate. For example, in Figure 5-12, call gate A has a DPL of 3. So calling procedures at
all CPLs (0 through 3) can access this call gate, which includes calling procedures in
code segments A, B, and C. Call gate B has a DPL of 2, so only calling procedures at
a CPL of 0, 1, or 2 can access call gate B, which includes calling procedures in code
segments B and C. The dotted line shows that a calling procedure in code segment A
cannot access call gate B.
The RPL of the segment selector to a call gate must satisfy the same test as the CPL
of the calling procedure; that is, the RPL must be less than or equal to the DPL of the
call gate. In the example in Figure 5-12, a calling procedure in code segment C can
access call gate B using gate selector B2 or B1, but it could not use gate selector B3
to access call gate B.
If the privilege checks between the calling procedure and call gate are successful, the
processor then checks the DPL of the code-segment descriptor against the CPL of the
calling procedure. Here, the privilege check rules vary between CALL and JMP
instructions. Only CALL instructions can use call gates to transfer program control to
more privileged (numerically lower privilege level) nonconforming code segments;
that is, to nonconforming code segments with a DPL less than the CPL. A JMP instruction
can use a call gate only to transfer program control to a nonconforming code
segment with a DPL equal to the CPL. CALL and JMP instructions can both transfer
program control to a more privileged conforming code segment; that is, to a
conforming code segment with a DPL less than or equal to the CPL.
If a call is made to a more privileged (numerically lower privilege level) nonconforming
destination code segment, the CPL is lowered to the DPL of the destination
code segment and a stack switch occurs (see Section 5.8.5, "Stack Switching"). If a
call or jump is made to a more privileged conforming destination code segment, the
CPL is not changed and no stack switch occurs.
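The privilege rules of Table 5-1 can be written as a small predicate. This is a minimal sketch in C, not the processor's actual microarchitectural sequence; the names xfer_kind and call_gate_transfer_allowed are hypothetical, and type, presence, and limit checks are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical names: which instruction initiated the transfer. */
typedef enum { XFER_CALL, XFER_JMP } xfer_kind;

/*
 * Returns true if the privilege checks for a transfer through a call
 * gate pass. Lower numbers mean more privilege.
 */
bool call_gate_transfer_allowed(xfer_kind kind, unsigned cpl, unsigned rpl,
                                unsigned gate_dpl, unsigned dest_dpl,
                                bool dest_conforming)
{
    /* Both the caller and the selector it used must be allowed
       to reach the gate itself. */
    if (cpl > gate_dpl || rpl > gate_dpl)
        return false;

    /* Conforming targets, and nonconforming targets reached by CALL,
       may be equally or more privileged than the caller. */
    if (dest_conforming || kind == XFER_CALL)
        return dest_dpl <= cpl;

    /* JMP to a nonconforming target cannot change the privilege level. */
    return dest_dpl == cpl;
}
```

Using the gate selectors of the example figure, a caller at CPL 1 that uses a selector with RPL 3 to reach a gate with DPL 2 is rejected, because the RPL exceeds the gate DPL even though the CPL does not.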
Call gates allow a single code segment to have procedures that can be accessed at
different privilege levels. For example, an operating system located in a code
segment may have some services which are intended to be used by both the operating
system and application software (such as procedures for handling character
I/O). Call gates for these procedures can be set up that allow access at all privilege
levels (0 through 3). More privileged call gates (with DPLs of 0 or 1) can then be set
up for other operating system services that are intended to be used only by the operating
system (such as procedures that initialize device drivers).
5.8.5 Stack Switching
Whenever a call gate is used to transfer program control to a more privileged
nonconforming code segment (that is, when the DPL of the nonconforming destination
code segment is less than the CPL), the processor automatically switches to the
stack for the destination code segment's privilege level. This stack switching is
carried out to prevent more privileged procedures from crashing due to insufficient
stack space. It also prevents less privileged procedures from interfering (by accident
or intent) with more privileged procedures through a shared stack.
Figure 5-12. Example of Accessing Call Gates At Various Privilege Levels
(Diagram, privilege levels 3 (lowest privilege) through 0 (highest privilege):
Code Segment A runs at CPL=3, Code Segment B at CPL=2, and Code Segment C at CPL=1.
Call Gate A has DPL=3 and is reached through Gate Selector A (RPL=3).
Call Gate B has DPL=2 and is reached through Gate Selector B1 (RPL=2) or Gate Selector B2
(RPL=1); Gate Selector B3 (RPL=3) cannot be used to access it.
The destinations are Code Segment D, a nonconforming code segment with DPL=0 (a stack
switch occurs), and Code Segment E, a conforming code segment with DPL=0 (no stack switch
occurs).)
Each task must define up to 4 stacks: one for application code (running at privilege
level 3) and one for each of the privilege levels 2, 1, and 0 that are used. (If only two
privilege levels are used [3 and 0], then only two stacks must be defined.) Each of
these stacks is located in a separate segment and is identified with a segment
selector and an offset into the stack segment (a stack pointer).
The segment selector and stack pointer for the privilege level 3 stack are located in the
SS and ESP registers, respectively, when privilege-level-3 code is being executed and
are automatically stored on the called procedure's stack when a stack switch occurs.
Pointers to the privilege level 0, 1, and 2 stacks are stored in the TSS for the currently
running task (see Figure 7-2). Each of these pointers consists of a segment selector
and a stack pointer (loaded into the ESP register). These initial pointers are strictly
read-only values. The processor does not change them while the task is running.
They are used only to create new stacks when calls are made to more privileged
levels (numerically lower privilege levels). These stacks are disposed of when a
return is made from the called procedure. The next time the procedure is called, a
new stack is created using the initial stack pointer. (The TSS does not specify a stack
for privilege level 3 because the processor does not allow a transfer of program
control from a procedure running at a CPL of 0, 1, or 2 to a procedure running at a
CPL of 3, except on a return.)
The operating system is responsible for creating stacks and stack-segment descriptors
for all the privilege levels to be used and for loading initial pointers for these
stacks into the TSS. Each stack must be read/write accessible (as specified in the
type field of its segment descriptor) and must contain enough space (as specified in
the limit field) to hold the following items:
The contents of the SS, ESP, CS, and EIP registers for the calling procedure.
The parameters and temporary variables required by the called procedure.
The EFLAGS register and error code, when implicit calls are made to an exception or interrupt handler.
Each stack needs enough space to contain many frames of these items,
because procedures often call other procedures, and an operating system may
support nesting of multiple interrupts. Each stack should be large enough to allow for
the worst-case nesting scenario at its privilege level.
(If the operating system does not use the processor's multitasking mechanism, it still
must create at least one TSS for this stack-related purpose.)
When a procedure call through a call gate results in a change in privilege level, the
processor performs the following steps to switch stacks and begin execution of the
called procedure at a new privilege level:
1. Uses the DPL of the destination code segment (the new CPL) to select a pointer
to the new stack (segment selector and stack pointer) from the TSS.
2. Reads the segment selector and stack pointer for the stack to be switched to from
the current TSS. Any limit violations detected while reading the stack-segment
selector, stack pointer, or stack-segment descriptor cause an invalid TSS (#TS)
exception to be generated.
3. Checks the stack-segment descriptor for the proper privileges and type and
generates an invalid TSS (#TS) exception if violations are detected.
4. Temporarily saves the current values of the SS and ESP registers.
5. Loads the segment selector and stack pointer for the new stack in the SS and ESP
registers.
6. Pushes the temporarily saved values for the SS and ESP registers (for the calling
procedure) onto the new stack (see Figure 5-13).
7. Copies the number of parameters specified in the parameter count field of the call
gate from the calling procedure's stack to the new stack. If the count is 0, no
parameters are copied.
8. Pushes the return instruction pointer (the current contents of the CS and EIP
registers) onto the new stack.
9. Loads the segment selector for the new code segment and the new instruction
pointer from the call gate into the CS and EIP registers, respectively, and begins
execution of the called procedure.
See the description of the CALL instruction in Chapter 3, Instruction Set Reference, in
the IA-32 Intel Architecture Software Developer's Manual, Volume 2, for a detailed
description of the privilege level checks and other protection checks that the
processor performs on a far call through a call gate.
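Steps 6 through 8 above determine the layout of the new stack. The following C sketch models them on an array that stands in for the new stack; it illustrates the push order only and is not how the processor is implemented. The names model_stack, push32, and interprivilege_call_frame are hypothetical, every stack slot is assumed to be a doubleword (a 32-bit call gate), and the caller is assumed to supply a stack with enough free slots.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a downward-growing stack of doublewords. */
typedef struct {
    uint32_t *mem;   /* backing storage for the new stack            */
    size_t    sp;    /* index of the current top; pushes decrement it */
} model_stack;

static void push32(model_stack *s, uint32_t value)
{
    s->mem[--s->sp] = value;
}

/*
 * caller_top[0] is the doubleword at the caller's ESP (the last
 * parameter pushed); caller_top[param_count - 1] is the first one.
 */
void interprivilege_call_frame(model_stack *s,
                               uint32_t old_ss, uint32_t old_esp,
                               const uint32_t *caller_top,
                               unsigned param_count,
                               uint32_t old_cs, uint32_t return_eip)
{
    push32(s, old_ss);                        /* step 6 */
    push32(s, old_esp);
    for (unsigned i = param_count; i > 0; i--)
        push32(s, caller_top[i - 1]);         /* step 7: order preserved */
    push32(s, old_cs);                        /* step 8 */
    push32(s, return_eip);
}
```

After the call, the top of the modeled stack holds the return EIP, followed at increasing addresses by the CS value, the copied parameters in their original order, the old ESP, and the old SS.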
The parameter count field in a call gate specifies the number of data items (up to 31)
that the processor should copy from the calling procedure's stack to the stack of the
called procedure. If more than 31 data items need to be passed to the called procedure,
one of the parameters can be a pointer to a data structure, or the saved
contents of the SS and ESP registers may be used to access parameters in the old
stack space. The size of the data items passed to the called procedure depends on
the call gate size, as described in Section 5.8.3, "Call Gates."
Figure 5-13. Stack Switching During an Interprivilege-Level Call
(Diagram: the calling procedure's stack holds Parameter 1, Parameter 2, and Parameter 3,
with ESP pointing at Parameter 3. The called procedure's stack holds, from the highest
address down, Calling SS, Calling ESP, Parameter 1, Parameter 2, Parameter 3, Calling CS,
and Calling EIP, with ESP pointing at Calling EIP.)
5.8.5.1 Stack Switching in 64-bit Mode
Although protection-check rules for call gates are unchanged from 32-bit mode,
stack switching in 64-bit mode works differently.
When stacks are switched as part of a 64-bit mode privilege-level change through a
call gate, a new SS (stack segment) descriptor is not loaded; 64-bit mode only loads
an inner-level RSP from the TSS. The new SS is forced to NULL and the SS selector's
RPL field is forced to the new CPL. The new SS is set to NULL in order to handle
nested far transfers (CALLF, INTn, interrupts and exceptions). The old SS and RSP
are saved on the new stack.
On a subsequent RETF, the old SS is popped from the stack and loaded into the SS
register. See Table 5-2.
In 64-bit mode, stack operations resulting from a privilege-level-changing far call or
far return are eight bytes wide and change the RSP by eight. The mode does not
support the automatic parameter-copy feature found in 32-bit mode. The call-gate
count field is ignored. Software can access the old stack, if necessary, by referencing
the old stack-segment selector and stack pointer saved on the new process stack.
In 64-bit mode, RETF is allowed to load a NULL SS under certain conditions. If the
target mode is 64-bit mode and the target CPL is not 3, IRET allows SS to be loaded with
a NULL selector. If the called procedure itself is interrupted, the NULL SS is pushed on
the stack frame. On the subsequent RETF, the NULL SS on the stack acts as a flag to
tell the processor not to load a new SS descriptor.
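The stack layouts listed in Table 5-2 can be expressed as C structures whose member offsets match the table. This is an illustrative sketch: the type names far_call_frame32 and far_call_frame64 are hypothetical, the first member is the slot at the lowest address (where ESP or RSP points after the call), and each selector is assumed to occupy a full stack slot.

```c
#include <stddef.h>
#include <stdint.h>

/* Frame left by a 32-bit far call with a CPL change (4-byte slots). */
typedef struct {
    uint32_t eip;   /* +0,  ESP points here */
    uint32_t cs;    /* +4  */
    uint32_t esp;   /* +8,  old ESP */
    uint32_t ss;    /* +12, old SS selector */
} far_call_frame32;

/* Frame left by a 64-bit mode far call with a CPL change (8-byte slots). */
typedef struct {
    uint64_t rip;   /* +0,  RSP points here */
    uint64_t cs;    /* +8,  old CS selector */
    uint64_t rsp;   /* +16, old RSP */
    uint64_t ss;    /* +24, old SS selector */
} far_call_frame64;

/* Compile-time checks that the offsets agree with Table 5-2. */
_Static_assert(offsetof(far_call_frame32, ss) == 12, "32-bit SS slot");
_Static_assert(offsetof(far_call_frame64, cs) == 8,  "64-bit CS slot");
_Static_assert(offsetof(far_call_frame64, rsp) == 16, "64-bit RSP slot");
_Static_assert(offsetof(far_call_frame64, ss) == 24, "64-bit SS slot");
```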
Table 5-2. 64-Bit-Mode Stack Layout After CALLF with CPL Change
32-bit Mode                      IA-32e Mode
Old SS Selector   +12            +24       Old SS Selector
Old ESP           +8             +16       Old RSP
CS Selector       +4             +8        Old CS Selector
EIP               0 (ESP)        0 (RSP)   RIP
< 4 Bytes >                      < 8 Bytes >
5.8.6 Returning from a Called Procedure
The RET instruction can be used to perform a near return, a far return at the same
privilege level, and a far return to a different privilege level. This instruction is
intended to execute returns from procedures that were called with a CALL instruction.
It does not support returns from a JMP instruction, because the JMP instruction
does not save a return instruction pointer on the stack.
A near return only transfers program control within the current code segment; therefore,
the processor performs only a limit check. When the processor pops the return
instruction pointer from the stack into the EIP register, it checks that the pointer does
not exceed the limit of the current code segment.
On a far return at the same privilege level, the processor pops both a segment
selector for the code segment being returned to and a return instruction pointer from
the stack. Under normal conditions, these pointers should be valid, because they
were pushed on the stack by the CALL instruction. However, the processor performs
privilege checks to detect situations where the current procedure might have altered
the pointer or failed to maintain the stack properly.
A far return that requires a privilege-level change is only allowed when returning to a
less privileged level (that is, the DPL of the return code segment is numerically
greater than the CPL). The processor uses the RPL field from the CS register value
saved for the calling procedure (see Figure 5-13) to determine if a return to a numerically
higher privilege level is required. If the RPL is numerically greater (less privileged)
than the CPL, a return across privilege levels occurs.
The processor performs the following steps when performing a far return to a calling
procedure (see Figures 6-2 and 6-4 in the Intel 64 and IA-32 Architectures Software
Developer's Manual, Volume 1, for an illustration of the stack contents prior to
and after a return):
1. Checks the RPL field of the saved CS register value to determine if a privilege
level change is required on the return.
2. Loads the CS and EIP registers with the values on the called procedure's stack.
(Type and privilege level checks are performed on the code-segment descriptor
and RPL of the code-segment selector.)
3. (If the RET instruction includes a parameter count operand and the return
requires a privilege level change.) Adds the parameter count (in bytes obtained
from the RET instruction) to the current ESP register value (after popping the CS
and EIP values), to step past the parameters on the called procedure's stack. The
resulting value in the ESP register points to the saved SS and ESP values for the
calling procedure's stack. (Note that the byte count in the RET instruction must
be chosen to match the parameter count in the call gate that the calling
procedure referenced when it made the original call multiplied by the size of the
parameters.)
4. (If the return requires a privilege level change.) Loads the SS and ESP registers
with the saved SS and ESP values and switches back to the calling procedure's
stack. The SS and ESP values for the called procedure's stack are discarded. Any
limit violations detected while loading the stack-segment selector or stack
pointer cause a general-protection exception (#GP) to be generated. The new
stack-segment descriptor is also checked for type and privilege violations.
5. (If the RET instruction includes a parameter count operand.) Adds the parameter
count (in bytes obtained from the RET instruction) to the current ESP register
value, to step past the parameters on the calling procedure's stack. The resulting
ESP value is not checked against the limit of the stack segment. If the ESP value
is beyond the limit, that fact is not recognized until the next stack operation.
6. (If the return requires a privilege level change.) Checks the contents of the DS,
ES, FS, and GS segment registers. If any of these registers refer to segments
whose DPL is less than the new CPL (excluding conforming code segments), the
segment register is loaded with a null segment selector.
See the description of the RET instruction in Chapter 4 of the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 2B, for a detailed description of
the privilege level checks and other protection checks that the processor performs on
a far return.
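Two of the far-return steps lend themselves to a short C sketch: the test in step 1 for a privilege-level change, and the clearing of data-segment registers in step 6. The structure seg_reg_model and both function names are hypothetical; the model keeps only the selector, the DPL of the referenced segment, and whether that segment is a conforming code segment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one of DS, ES, FS, or GS. */
typedef struct {
    uint16_t selector;         /* 0 models the null selector         */
    unsigned dpl;              /* DPL of the segment it refers to    */
    bool     conforming_code;  /* true for a conforming code segment */
} seg_reg_model;

/* Step 1: the RPL of the saved CS decides whether privilege changes. */
bool return_changes_privilege(uint16_t saved_cs, unsigned cpl)
{
    return (saved_cs & 3u) > cpl;
}

/* Step 6: null every register the less privileged code may not use. */
void null_inaccessible_data_segments(seg_reg_model regs[4], unsigned new_cpl)
{
    for (int i = 0; i < 4; i++) {
        if (regs[i].dpl < new_cpl && !regs[i].conforming_code)
            regs[i].selector = 0;
    }
}
```

The second function prevents a procedure that returns to privilege level 3 from leaving a privilege-level-0 data segment loaded in a register that the less privileged caller could then use.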
5.8.7 Performing Fast Calls to System Procedures with the
SYSENTER and SYSEXIT Instructions
The SYSENTER and SYSEXIT instructions were introduced into the IA-32 architecture
in the Pentium II processors for the purpose of providing a fast (low overhead) mechanism
for calling operating system or executive procedures. SYSENTER is intended
for use by user code running at privilege level 3 to access operating system or executive
procedures running at privilege level 0. SYSEXIT is intended for use by privilege
level 0 operating system or executive procedures for fast returns to privilege level 3
user code. SYSENTER can be executed from privilege levels 3, 2, 1, or 0; SYSEXIT
can only be executed from privilege level 0.
The SYSENTER and SYSEXIT instructions are companion instructions, but they do not
constitute a call/return pair. This is because SYSENTER does not save any state information
for use by SYSEXIT on a return.
The target instruction and stack pointer for these instructions are not specified
through instruction operands. Instead, they are specified through parameters
entered in MSRs and general-purpose registers.
For SYSENTER, target fields are generated using the following sources:
Target code segment: Reads this from IA32_SYSENTER_CS.
Target instruction: Reads this from IA32_SYSENTER_EIP.
Stack segment: Computed by adding 8 to the value in IA32_SYSENTER_CS.
Stack pointer: Reads this from IA32_SYSENTER_ESP.
For SYSEXIT, target fields are generated using the following sources:
Target code segment: Computed by adding 16 to the value in IA32_SYSENTER_CS.
Target instruction: Reads this from EDX.
Stack segment: Computed by adding 24 to the value in IA32_SYSENTER_CS.
Stack pointer: Reads this from ECX.
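The selector arithmetic in the two lists above can be sketched in C. The type fast_call_selectors and the two function names are hypothetical, and the sketch applies only the additions stated in this section. Because each descriptor occupies 8 bytes, the arithmetic implies that the four descriptors sit in consecutive table entries starting at the one selected by IA32_SYSENTER_CS.

```c
#include <stdint.h>

/* Hypothetical pair of selectors derived from IA32_SYSENTER_CS. */
typedef struct {
    uint16_t cs;
    uint16_t ss;
} fast_call_selectors;

/* SYSENTER: CS comes from the MSR, SS is the next descriptor (+8). */
fast_call_selectors sysenter_selectors(uint16_t sysenter_cs)
{
    fast_call_selectors s;
    s.cs = sysenter_cs;
    s.ss = (uint16_t)(sysenter_cs + 8);
    return s;
}

/* SYSEXIT (legacy 32-bit case): CS is MSR + 16, SS is MSR + 24. */
fast_call_selectors sysexit_selectors(uint16_t sysenter_cs)
{
    fast_call_selectors s;
    s.cs = (uint16_t)(sysenter_cs + 16);
    s.ss = (uint16_t)(sysenter_cs + 24);
    return s;
}
```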
The SYSENTER and SYSEXIT instructions perform fast calls and returns because
they force the processor into a predefined privilege level 0 state when SYSENTER is
executed and into a predefined privilege level 3 state when SYSEXIT is executed. By
forcing predefined and consistent processor states, the number of privilege checks
ordinarily required to perform a far call to another privilege level is greatly
reduced. Also, predefining the target context state in MSRs and general-purpose
registers eliminates all memory accesses except when fetching the target code.
Any additional state that needs to be saved to allow a return to the calling procedure
must be saved explicitly by the calling procedure or be predefined through programming
conventions.
5.8.7.1 SYSENTER and SYSEXIT Instructions in IA-32e Mode
For Intel 64 processors, the SYSENTER and SYSEXIT instructions are enhanced to
allow fast system calls from user code running at privilege level 3 (in compatibility
mode or 64-bit mode) to 64-bit executive procedures running at privilege level 0.
IA32_SYSENTER_EIP MSR and IA32_SYSENTER_ESP MSR are expanded to hold
64-bit addresses. If IA-32e mode is inactive, only the lower 32-bit addresses stored
in these MSRs are used. If 64-bit mode is active, addresses stored in
IA32_SYSENTER_EIP and IA32_SYSENTER_ESP must be canonical. Note that, in
64-bit mode, IA32_SYSENTER_CS must not contain a NULL selector.
When SYSENTER transfers control, the following fields are generated and bits set:
Target code segment: Reads non-NULL selector from IA32_SYSENTER_CS.
New CS attributes: CS base = 0, CS limit = FFFFFFFFH.
Target instruction: Reads 64-bit canonical address from IA32_SYSENTER_EIP.
Stack segment: Computed by adding 8 to the value from IA32_SYSENTER_CS.
Stack pointer: Reads 64-bit canonical address from IA32_SYSENTER_ESP.
New SS attributes: SS base = 0, SS limit = FFFFFFFFH.
When the SYSEXIT instruction transfers control to 64-bit mode user code using
REX.W, the following fields are generated and bits set:
Target code segment: Computed by adding 32 to the value in IA32_SYSENTER_CS.
New CS attributes: L-bit = 1 (go to 64-bit mode).
Target instruction: Reads 64-bit canonical address in RDX.
Stack segment: Computed by adding 40 to the value of IA32_SYSENTER_CS.
Stack pointer: Update RSP using 64-bit canonical address in RCX.
When SYSEXIT transfers control to compatibility mode user code when the operand
size attribute is 32 bits, the following fields are generated and bits set:
Target code segment: Computed by adding 16 to the value in IA32_SYSENTER_CS.
New CS attributes: L-bit = 0 (go to compatibility mode).
Target instruction: Fetch the target instruction from 32-bit address in EDX.
Stack segment: Computed by adding 24 to the value in IA32_SYSENTER_CS.
Stack pointer: Update ESP from 32-bit address in ECX.
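In IA-32e mode, the SYSEXIT target depends on whether REX.W is used. The following C sketch captures only that choice; the type sysexit_target and the function sysexit_target_ia32e are hypothetical names, and the sketch leaves out the canonical-address requirements on RDX and RCX.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of where SYSEXIT lands in IA-32e mode. */
typedef struct {
    uint16_t cs;
    uint16_t ss;
    bool     cs_l_bit;   /* true: 64-bit mode, false: compatibility mode */
} sysexit_target;

sysexit_target sysexit_target_ia32e(uint16_t sysenter_cs, bool rex_w)
{
    sysexit_target t;
    if (rex_w) {
        /* Return to 64-bit mode user code. */
        t.cs = (uint16_t)(sysenter_cs + 32);
        t.ss = (uint16_t)(sysenter_cs + 40);
        t.cs_l_bit = true;
    } else {
        /* Return to compatibility mode user code. */
        t.cs = (uint16_t)(sysenter_cs + 16);
        t.ss = (uint16_t)(sysenter_cs + 24);
        t.cs_l_bit = false;
    }
    return t;
}
```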
5.8.8 Fast System Calls in 64-bit Mode
The SYSCALL and SYSRET instructions are designed for operating systems that use a
flat memory model (segmentation is not used). The instructions, along with
SYSENTER and SYSEXIT, are suited for IA-32e mode operation. SYSCALL and
SYSRET, however, are not supported in compatibility mode. Use CPUID to check if
SYSCALL and SYSRET are available (CPUID.80000001H.EDX[bit 11] = 1).
SYSCALL is intended for use by user code running at privilege level 3 to access operating
system or executive procedures running at privilege level 0. SYSRET is
intended for use by privilege level 0 operating system or executive procedures for
fast returns to privilege level 3 user code.
Stack pointers for SYSCALL/SYSRET are not specified through model specific registers.
The clearing of bits in RFLAGS is programmable rather than fixed.
SYSCALL/SYSRET save and restore the RFLAGS register.
For SYSCALL, the processor saves RFLAGS into R11 and the RIP of the next instruction
into RCX; it then gets the privilege-level 0 target instruction and stack pointer
from:
Target code segment: Reads a non-NULL selector from IA32_STAR[47:32].
Target instruction: Reads a 64-bit canonical address from IA32_LSTAR.
Stack segment: Computed by adding 8 to the value in IA32_STAR[47:32].
System flags: The processor sets RFLAGS to the logical-AND of its current value with the complement of the value in the IA32_FMASK MSR.
When SYSRET transfers control to 64-bit mode user code using REX.W, the processor
gets the privilege level 3 target instruction and stack pointer from:
Target code segment: Reads a non-NULL selector from IA32_STAR[63:48] + 16.
Target instruction: Copies the value in RCX into RIP.
Stack segment: IA32_STAR[63:48] + 8.
EFLAGS: Loaded from R11.
When SYSRET transfers control to 32-bit mode user code using a 32-bit operand size,
the processor gets the privilege level 3 target instruction and stack pointer from:
Target code segment: Reads a non-NULL selector from IA32_STAR[63:48].
Target instruction: Copies the value in ECX into EIP.
Stack segment: IA32_STAR[63:48] + 8.
EFLAGS: Loaded from R11.
It is the responsibility of the OS to ensure the descriptors in the GDT/LDT correspond
to the selectors loaded by SYSCALL/SYSRET (consistent with the base, limit, and
attribute values forced by the instructions).
Any address written to IA32_LSTAR is first checked by WRMSR to ensure canonical
form. If an address is not canonical, an exception is generated (#GP).
See Figure 5-14 for the layout of IA32_STAR, IA32_LSTAR and IA32_FMASK.
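The field extraction described for SYSCALL and SYSRET can be sketched as plain bit arithmetic on MSR values that have already been read. All function names below are hypothetical, and the sketch applies only the selector additions and the flag masking stated in this section. Reading the MSRs and executing CPUID are outside its scope, so the values are passed in as arguments.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.80000001H:EDX bit 11 reports SYSCALL/SYSRET availability. */
bool syscall_available(uint32_t cpuid_80000001_edx)
{
    return ((cpuid_80000001_edx >> 11) & 1u) != 0;
}

/* SYSCALL: CS from IA32_STAR[47:32], SS is that value plus 8. */
uint16_t syscall_cs(uint64_t star) { return (uint16_t)(star >> 32); }
uint16_t syscall_ss(uint64_t star) { return (uint16_t)(syscall_cs(star) + 8); }

/* SYSRET: CS from IA32_STAR[63:48], plus 16 when returning to 64-bit code. */
uint16_t sysret_cs(uint64_t star, bool rex_w)
{
    uint16_t base = (uint16_t)(star >> 48);
    return rex_w ? (uint16_t)(base + 16) : base;
}

/* SYSRET: SS is IA32_STAR[63:48] plus 8 in both cases. */
uint16_t sysret_ss(uint64_t star) { return (uint16_t)((star >> 48) + 8); }

/* SYSCALL clears every RFLAGS bit that is set in IA32_FMASK. */
uint64_t syscall_rflags(uint64_t rflags, uint64_t fmask)
{
    return rflags & ~fmask;
}
```

For instance, setting bit 9 in the mask value clears the interrupt flag on entry, so the privilege-level-0 entry code starts with maskable interrupts disabled.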
Figure 5-14. MSRs Used by SYSCALL and SYSRET
IA32_STAR:
  Bits 63:48   SYSRET CS and SS
  Bits 47:32   SYSCALL CS and SS
  Bits 31:0    Reserved
IA32_LSTAR:
  Bits 63:0    Target RIP for 64-bit Mode Calling Program
IA32_FMASK:
  Bits 63:32   Reserved
  Bits 31:0    SYSCALL EFLAGS Mask
5.9 PRIVILEGED INSTRUCTIONS
Some of the system instructions (called "privileged instructions") are protected from
use by application programs. The privileged instructions control system functions
(such as the loading of system registers). They can be executed only when the CPL is
0 (most privileged). If one of these instructions is executed when the CPL is not 0, a
general-protection exception (#GP) is generated. The following system instructions
are privileged instructions:
• LGDT: Load GDT register.
• LLDT: Load LDT register.
• LTR: Load task register.
• LIDT: Load IDT register.
• MOV (control registers): Load and store control registers.
• LMSW: Load machine status word.
• CLTS: Clear task-switched flag in register CR0.
• MOV (debug registers): Load and store debug registers.
• INVD: Invalidate cache, without writeback.
• WBINVD: Invalidate cache, with writeback.
• INVLPG: Invalidate TLB entry.
• HLT: Halt processor.
• RDMSR: Read Model-Specific Registers.
• WRMSR: Write Model-Specific Registers.
• RDPMC: Read Performance-Monitoring Counter.
• RDTSC: Read Time-Stamp Counter.
Some of the privileged instructions are available only in the more recent families of
Intel 64 and IA-32 processors (see Section 19.13, "New Instructions In the Pentium
and Later IA-32 Processors").
The PCE and TSD flags in register CR4 (bits 8 and 2, respectively) control whether the
RDPMC and RDTSC instructions, respectively, can be executed at any CPL: RDPMC is
available at any CPL when PCE is set, and RDTSC is available at any CPL when TSD is
clear.
5.10 POINTER VALIDATION
When operating in protected mode, the processor validates all pointers to enforce
protection between segments and maintain isolation between privilege levels.
Pointer validation consists of the following checks:
1. Checking access rights to determine if the segment type is compatible with its
use.
2. Checking read/write rights.
3. Checking if the pointer offset exceeds the segment limit.
4. Checking if the supplier of the pointer is allowed to access the segment.
5. Checking the offset alignment.
The processor automatically performs the first, second, and third checks during
instruction execution. Software must explicitly request the fourth check by issuing an
ARPL instruction. The fifth check (offset alignment) is performed automatically at
privilege level 3 if alignment checking is turned on. Offset alignment does not affect
isolation of privilege levels.
5.10.1 Checking Access Rights (LAR Instruction)
When the processor accesses a segment using a far pointer, it performs an access
rights check on the segment descriptor pointed to by the far pointer. This check is
performed to determine if the type and privilege level (DPL) of the segment descriptor
are compatible with the operation to be performed. For example, when making a far
call in protected mode, the segment-descriptor type must be for a conforming or
nonconforming code segment, a call gate, a task gate, or a TSS. Then, if the call is to
a nonconforming code segment, the DPL of the code segment must be equal to the
CPL, and the RPL of the code segment's segment selector must be less than or equal
to the DPL. If the type or privilege level is found to be incompatible, the appropriate
exception is generated.
To prevent type incompatibility exceptions from being generated, software can check
the access rights of a segment descriptor using the LAR (load access rights)
instruction. The LAR instruction specifies the segment selector for the segment
descriptor whose access rights are to be checked and a destination register. The
instruction then performs the following operations:
1. Checks that the segment selector is not null.
2. Checks that the segment selector points to a segment descriptor that is within
the descriptor table limit (GDT or LDT).
3. Checks that the segment descriptor is a code, data, LDT, call gate, task gate, or
TSS segment-descriptor type.
4. If the segment is not a conforming code segment, checks if the segment
descriptor is visible at the CPL (that is, if the CPL and the RPL of the segment
selector are less than or equal to the DPL).
5. If the privilege level and type checks pass, loads the second doubleword of the
segment descriptor into the destination register (masked by the value
00FXFF00H, where X indicates that the corresponding 4 bits are undefined) and
sets the ZF flag in the EFLAGS register. If the segment selector is not visible at
the current privilege level or is an invalid type for the LAR instruction, the
instruction does not modify the destination register and clears the ZF flag.
Once the access rights are loaded in the destination register, software can perform
additional checks on the access rights information.
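The five LAR steps above can be modeled in a few lines. This is a simplified sketch under stated assumptions: the descriptor table is a plain list holding only the second doubleword of each descriptor (one table, TI bit ignored), the system-type encodings are the 32-bit protected-mode ones, and the undefined "X" nibble of the mask is modeled as zero. The name `lar` is used only for illustration.

```python
LAR_MASK = 0x00F0FF00                    # 00FXFF00H with the undefined nibble as 0 (assumption)
# Assumed 32-bit protected-mode system types: TSS (1, 3, 9, B), LDT (2),
# call gate (4, C), task gate (5).
LAR_SYSTEM_TYPES = {0x1, 0x2, 0x3, 0x4, 0x5, 0x9, 0xB, 0xC}


def lar(selector, table, cpl):
    """Return (zf, value); value is None when the destination is left unmodified."""
    if (selector & 0xFFFC) == 0:                     # step 1: null selector
        return False, None
    index, rpl = selector >> 3, selector & 3
    if index >= len(table):                          # step 2: descriptor table limit
        return False, None
    high = table[index]                              # second doubleword of the descriptor
    seg_type = (high >> 8) & 0xF
    is_code_or_data = bool(high & (1 << 12))         # S flag
    dpl = (high >> 13) & 3
    if not is_code_or_data and seg_type not in LAR_SYSTEM_TYPES:
        return False, None                           # step 3: invalid type for LAR
    conforming_code = is_code_or_data and (seg_type & 0xC) == 0xC
    if not conforming_code and max(cpl, rpl) > dpl:
        return False, None                           # step 4: not visible at the CPL
    return True, high & LAR_MASK                     # step 5: masked access rights, ZF set
```

With a table holding a DPL 0 code descriptor at index 1 and a DPL 3 data descriptor at index 2, a CPL 3 caller sees ZF set only for the data descriptor.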
5.10.2 Checking Read/Write Rights (VERR and VERW Instructions)
When the processor accesses any code or data segment, it checks the read/write
privileges assigned to the segment to verify that the intended read or write operation
is allowed. Software can check read/write rights using the VERR (verify for reading)
and VERW (verify for writing) instructions. Both of these instructions specify the
segment selector for the segment being checked. The instructions then perform the
following operations:
1. Checks that the segment selector is not null.
2. Checks that the segment selector points to a segment descriptor that is within
the descriptor table limit (GDT or LDT).
3. Checks that the segment descriptor is a code or data-segment descriptor type.
4. If the segment is not a conforming code segment, checks if the segment
descriptor is visible at the CPL (that is, if the CPL and the RPL of the segment
selector are less than or equal to the DPL).
5. Checks that the segment is readable (for the VERR instruction) or writable (for
the VERW instruction).
The VERR instruction sets the ZF flag in the EFLAGS register if the segment is visible
at the CPL and readable; the VERW instruction sets the ZF flag if the segment is visible
and writable. (Code segments are never writable.) The ZF flag is cleared if any of these
checks fail.
5.10.3 Checking That the Pointer Offset Is Within Limits (LSL Instruction)
When the processor accesses any segment, it performs a limit check to ensure that the
offset is within the limit of the segment. Software can perform this limit check using
the LSL (load segment limit) instruction. Like the LAR instruction, the LSL instruction
specifies the segment selector for the segment descriptor whose limit is to be
checked and a destination register. The instruction then performs the following
operations:
1. Checks that the segment selector is not null.
2. Checks that the segment selector points to a segment descriptor that is within
the descriptor table limit (GDT or LDT).
3. Checks that the segment descriptor is a code, data, LDT, or TSS segment-
descriptor type.
4. If the segment is not a conforming code segment, checks if the segment
descriptor is visible at the CPL (that is, if the CPL and the RPL of the segment
selector are less than or equal to the DPL).
5. If the privilege level and type checks pass, loads the unscrambled limit (the limit
scaled according to the setting of the G flag in the segment descriptor) into the
destination register and sets the ZF flag in the EFLAGS register. If the segment
selector is not visible at the current privilege level or is an invalid type for the LSL
instruction, the instruction does not modify the destination register and clears
the ZF flag.
Once the limit is loaded in the destination register, software can compare the segment
limit with the offset of a pointer.
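The "unscrambled" limit and the follow-up comparison can be sketched as below. The function names are hypothetical, the descriptor is passed as its two raw doublewords, and the offset comparison assumes an expand-up segment (expand-down data segments interpret the limit differently and are not modeled).

```python
def unscrambled_limit(low: int, high: int) -> int:
    """Decode the 20-bit limit field and scale it according to the G flag."""
    raw = (low & 0xFFFF) | (high & 0x000F0000)   # limit 15:0 and limit 19:16
    if high & (1 << 23):                         # G = 1: limit is in 4-KByte units
        return (raw << 12) | 0xFFF
    return raw                                   # G = 0: limit is in bytes


def offset_within_limit(offset: int, size: int, limit: int) -> bool:
    """True if the last byte of the access lies at or below the limit.

    Assumption: expand-up segment.
    """
    return offset + size - 1 <= limit
```

A flat 4-GByte descriptor (limit field FFFFFH with G set) decodes to FFFFFFFFH, while a limit field of 00FFFH with G clear stays at 00FFFH.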
5.10.4 Checking Caller Access Privileges (ARPL Instruction)
The requestor's privilege level (RPL) field of a segment selector is intended to carry
the privilege level of a calling procedure (the calling procedure's CPL) to a called
procedure. The called procedure then uses the RPL to determine if access to a
segment is allowed. The RPL is said to "weaken" the privilege level of the called
procedure to that of the RPL.
Operating-system procedures typically use the RPL to prevent less privileged
application programs from accessing data located in more privileged segments. When
an operating-system procedure (the called procedure) receives a segment selector from
an application program (the calling procedure), it sets the segment selector's RPL to
the privilege level of the calling procedure. Then, when the operating system uses
the segment selector to access its associated segment, the processor performs
privilege checks using the calling procedure's privilege level (stored in the RPL) rather
than the numerically lower privilege level (the CPL) of the operating-system
procedure. The RPL thus ensures that the operating system does not access a segment
on behalf of an application program unless that program itself has access to the
segment.
Figure 5-15 shows an example of how the processor uses the RPL field. In this
example, an application program (located in code segment A) possesses a segment
selector (segment selector D1) that points to a privileged data structure (that is, a
data structure located in a data segment D at privilege level 0).
The application program cannot access data segment D, because it does not have
sufficient privilege, but the operating system (located in code segment C) can. So, in
an attempt to access data segment D, the application program executes a call to the
operating system and passes segment selector D1 to the operating system as a
parameter on the stack. Before passing the segment selector, the (well behaved)
application program sets the RPL of the segment selector to its current privilege level
(which in this example is 3). If the operating system attempts to access data
segment D using segment selector D1, the processor compares the CPL (which is
now 0 following the call), the RPL of segment selector D1, and the DPL of data
segment D (which is 0). Since the RPL is greater than the DPL, access to data
segment D is denied. The processor's protection mechanism thus protects data
segment D from access by the operating system, because the application program's
privilege level (represented by the RPL of segment selector B) is greater than the DPL
of data segment D.
Now assume that instead of setting the RPL of the segment selector to 3, the
application program sets the RPL to 0 (segment selector D2). The operating system can
now access data segment D, because its CPL and the RPL of segment selector D2 are
both equal to the DPL of data segment D.
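The comparison in this example reduces to a single rule for data-segment access: both the CPL and the RPL must be numerically less than or equal to the DPL. A minimal sketch follows; the function name is hypothetical, and the model covers only the data-segment case discussed here.

```python
def data_segment_access_allowed(cpl: int, rpl: int, dpl: int) -> bool:
    """Access is allowed only if neither the CPL nor the RPL exceeds the DPL."""
    return max(cpl, rpl) <= dpl


# The two cases from the example (operating system running at CPL 0,
# data segment D at DPL 0):
selector_d1_allowed = data_segment_access_allowed(cpl=0, rpl=3, dpl=0)  # selector D1
selector_d2_allowed = data_segment_access_allowed(cpl=0, rpl=0, dpl=0)  # selector D2
```

In this model `selector_d1_allowed` is False and `selector_d2_allowed` is True, matching the outcome described for segment selectors D1 and D2.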
Because the application program is able to change the RPL of a segment selector to
any value, it can potentially use a procedure operating at a numerically lower
privilege level to access a protected data structure. This ability to lower the RPL of a
segment selector breaches the processor's protection mechanism.
Because a called procedure cannot rely on the calling procedure to set the RPL
correctly, operating-system procedures (executing at numerically lower privilege
levels) that receive segment selectors from numerically higher privilege-level
procedures need to test the RPL of the segment selector to determine if it is at the
appropriate level. The ARPL (adjust requested privilege level) instruction is provided
for this purpose. This instruction adjusts the RPL of one segment selector to match that
of another segment selector.
Figure 5-15. Use of RPL to Weaken Privilege Level of Called Procedure
[Figure: privilege levels 3 (lowest privilege) through 0 (highest privilege). The
application program (code segment A, CPL = 3) calls the operating system (code
segment C, DPL = 0) through call gate B (DPL = 3) using gate selector B (RPL = 3).
A segment selector is passed as a parameter on the stack. With segment selector D1
(RPL = 3), access to data segment D (DPL = 0) is not allowed; with segment selector
D2 (RPL = 0), access is allowed.]
The example in Figure 5-15 demonstrates how the ARPL instruction is intended to be
used. When the operating system receives segment selector D2 from the application
program, it uses the ARPL instruction to compare the RPL of the segment selector
with the privilege level of the application program (represented by the code-segment
selector pushed onto the stack). If the RPL is less than the application program's
privilege level, the ARPL instruction changes the RPL of the segment selector to match
the privilege level of the application program (segment selector D1). Using this
instruction thus prevents a procedure running at a numerically higher privilege level
from accessing numerically lower privilege-level (more privileged) segments by
lowering the RPL of a segment selector.
Note that the privilege level of the application program can be determined by reading
the RPL field of the segment selector for the application program's code segment.
This segment selector is stored on the stack as part of the call to the operating
system. The operating system can copy the segment selector from the stack into a
register for use as an operand for the ARPL instruction.
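The adjustment performed by ARPL can be sketched as follows. This is a behavioral model only: the function name `arpl` mirrors the instruction for readability, selectors are treated as 16-bit integers, and the second return value models the ZF flag, which the instruction sets when it raises the destination RPL and clears otherwise.

```python
def arpl(dest_selector: int, src_selector: int):
    """Return (adjusted destination selector, zf).

    dest_selector: selector received from the caller (for example D2).
    src_selector:  the caller's code-segment selector taken from the stack.
    """
    dest_rpl = dest_selector & 3
    src_rpl = src_selector & 3
    if dest_rpl < src_rpl:
        # The caller claimed more privilege than it has: raise the RPL.
        adjusted = (dest_selector & 0xFFFC) | src_rpl
        return adjusted, True
    return dest_selector & 0xFFFF, False
```

For instance, a passed-in selector 0020H (RPL 0) combined with a caller code selector 001BH (RPL 3) becomes 0023H, which corresponds to turning segment selector D2 back into D1 in the example.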
5.10.5 Checking Alignment
When the CPL is 3, alignment of memory references can be checked by setting the
AM flag in the CR0 register and the AC flag in the EFLAGS register. Unaligned memory
references generate alignment exceptions (#AC). The processor does not generate
alignment exceptions when operating at privilege level 0, 1, or 2. See Table 6-7 for a
description of the alignment requirements when alignment checking is enabled.
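The enabling conditions can be summarized in a short predicate. This sketch assumes natural alignment (an access of N bytes must start on a multiple of N), which is a simplification; Table 6-7 gives the actual per-data-type requirements. The function name is hypothetical; the flag positions (CR0.AM at bit 18, EFLAGS.AC at bit 18) are the architectural ones.

```python
CR0_AM = 1 << 18      # alignment mask flag in CR0
EFLAGS_AC = 1 << 18   # alignment check flag in EFLAGS


def raises_alignment_check(cr0: int, eflags: int, cpl: int,
                           address: int, size: int) -> bool:
    """True if the access would generate #AC under this simplified model."""
    checking_enabled = bool(cr0 & CR0_AM) and bool(eflags & EFLAGS_AC) and cpl == 3
    misaligned = address % size != 0          # natural-alignment assumption
    return checking_enabled and misaligned
```

The same misaligned access that faults at CPL 3 passes unchecked at CPL 0, 1, or 2, or when either flag is clear.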
5.11 PAGE-LEVEL PROTECTION
Page-level protection can be used alone or applied to segments. When page-level
protection is used with the flat memory model, it allows supervisor code and data
(the operating system or executive) to be protected from user code and data
(application programs). It also allows pages containing code to be write protected.
When segment-level and page-level protection are combined, page-level read/write
protection allows more protection granularity within segments.
With page-level protection (as with segment-level protection), each memory
reference is checked to verify that protection checks are satisfied. All checks are made
before the memory cycle is started, and any violation prevents the cycle from
starting and results in a page-fault exception being generated. Because checks are
performed in parallel with address translation, there is no performance penalty.
The processor performs two page-level protection checks:
• Restriction of addressable domain (supervisor and user modes).
• Page type (read only or read/write).
A violation of either of these checks results in a page-fault exception being generated.
See Chapter 6, "Interrupt 14 - Page-Fault Exception (#PF)," for an explanation of the
page-fault exception mechanism. This chapter describes the protection violations
which lead to page-fault exceptions.
5.11.1 Page-Protection Flags
Protection information for pages is contained in two flags in a paging-structure entry
(see Chapter 4): the read/write flag (bit 1) and the user/supervisor flag (bit 2). The
protection checks use the flags in all paging structures.
5.11.2 Restricting Addressable Domain
The page-level protection mechanism allows restricting access to pages based on
two privilege levels:
• Supervisor mode (U/S flag is 0): (Most privileged) For the operating system or
executive, other system software (such as device drivers), and protected system
data (such as page tables).
• User mode (U/S flag is 1): (Least privileged) For application code and data.
The segment privilege levels map to the page privilege levels as follows. If the
processor is currently operating at a CPL of 0, 1, or 2, it is in supervisor mode; if it is
operating at a CPL of 3, it is in user mode. When the processor is in supervisor mode,
it can access all pages; when in user mode, it can access only user-level pages. (Note
that the WP flag in control register CR0 modifies the supervisor permissions, as
described in Section 5.11.3, "Page Type.")
Note that to use the page-level protection mechanism, code and data segments must
be set up for at least two segment-based privilege levels: level 0 for supervisor code
and data segments and level 3 for user code and data segments. (In this model, the
stacks are placed in the data segments.) To minimize the use of segments, a flat
memory model can be used (see Section 3.2.1, "Basic Flat Model").
Here, the user and supervisor code and data segments all begin at address zero in
the linear address space and overlay each other. With this arrangement, operating-
system code (running at the supervisor level) and application code (running at the
user level) can execute as if there are no segments. Protection between operating-
system and application code and data is provided by the processor's page-level
protection mechanism.
5.11.3 Page Type
The page-level protection mechanism recognizes two page types:
• Read-only access (R/W flag is 0).
• Read/write access (R/W flag is 1).
When the processor is in supervisor mode and the WP flag in register CR0 is clear (its
state following reset initialization), all pages are both readable and writable (write-
protection is ignored). When the processor is in user mode, it can write only to user-
mode pages that are read/write accessible. User-mode pages which are read/write or
read-only are readable; supervisor-mode pages are neither readable nor writable
from user mode. A page-fault exception is generated on any attempt to violate the
protection rules.
Starting with the P6 family, Intel processors allow user-mode pages to be write-
protected against supervisor-mode access. Setting CR0.WP = 1 enables supervisor-
mode sensitivity to write protected pages. If CR0.WP = 1, read-only pages are not
writable from any privilege level. This supervisor write-protect feature is useful for
implementing a "copy-on-write" strategy used by some operating systems, such as
UNIX*, for task creation (also called forking or spawning). When a new task is
created, it is possible to copy the entire address space of the parent task. This gives
the child task a complete, duplicate set of the parent's segments and pages. An
alternative copy-on-write strategy saves memory space and time by mapping the child's
segments and pages to the same segments and pages used by the parent task. A
private copy of a page gets created only when one of the tasks writes to the page. By
using the WP flag and marking the shared pages as read-only, the supervisor can
detect an attempt to write to a page, and can copy the page at that time.
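The rules from Sections 5.11.2 and 5.11.3 can be collected into one predicate. This is a minimal sketch: the function and parameter names are hypothetical, `user_page` and `writable` stand for the effective U/S and R/W flags of the page, and later features outside this section are not modeled.

```python
def page_access_allowed(cpl: int, is_write: bool, user_page: bool,
                        writable: bool, cr0_wp: bool) -> bool:
    """Return True if the access passes the U/S and R/W checks described above."""
    supervisor_mode = cpl < 3                 # CPL 0, 1, or 2
    if not supervisor_mode:
        if not user_page:
            return False                      # supervisor pages are invisible to user mode
        return writable or not is_write       # user mode writes need a read/write page
    if is_write and cr0_wp and not writable:
        return False                          # CR0.WP = 1: read-only applies to supervisor too
    return True                               # otherwise supervisor mode can access the page
```

Under this model, the copy-on-write scheme described above works because a supervisor-mode write to a shared read-only page is rejected when CR0.WP = 1, giving the operating system the page fault it needs to make a private copy.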
5.11.4 Combining Protection of Both Levels of Page Tables
For any one page, the protection attributes of its page-directory entry (first-level
page table) may differ from those of its page-table entry (second-level page table).
The processor checks the protection for a page in both its page-directory and its
page-table entries. Table 5-3 shows the protection provided by the possible
combinations of protection attributes when the WP flag is clear.
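One way to read the combinations in Table 5-3 is that the more restrictive setting wins at each level: a page is a user page only if both entries mark it as user, and its R/W flags allow writes only if both entries do. The sketch below encodes that reading; the function names are hypothetical and the returned strings simply reuse the table's wording.

```python
def combine_protection(pde_user: bool, pde_writable: bool,
                       pte_user: bool, pte_writable: bool):
    """Return (user, writable) for the page as the AND of the two entries' flags."""
    return bool(pde_user and pte_user), bool(pde_writable and pte_writable)


def combined_access_type(user: bool, writable: bool, cr0_wp: bool) -> str:
    """Access type in the style of the table's "Combined Effect" column."""
    if not user and not cr0_wp:
        return "Read/Write"       # supervisor privilege with CR0.WP = 0
    return "Read/Write" if writable else "Read-Only"
```

For example, a user, read-only page-directory entry combined with a user, read-write page-table entry yields a user, read-only page.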
5.11.5 Overrides to Page Protection
The following types of memory accesses are checked as if they are privilege-level 0
accesses, regardless of the CPL at which the processor is currently operating:
• Access to segment descriptors in the GDT, LDT, or IDT.
• Access to an inner-privilege-level stack during an inter-privilege-level call or a
call to an exception or interrupt handler, when a change of privilege level occurs.
5.12 COMBINING PAGE AND SEGMENT PROTECTION
When paging is enabled, the processor evaluates segment protection first, then
evaluates page protection. If the processor detects a protection violation at either
the segment level or the page level, the memory access is not carried out and an
exception is generated. If an exception is generated by segmentation, no paging
exception is generated.
Page-level protections cannot be used to override segment-level protection. For
example, a code segment is by definition not writable. If a code segment is paged,
setting the R/W flag for the pages to read-write does not make the pages writable.
Attempts to write into the pages will be blocked by segment-level protection checks.
Page-level protection can be used to enhance segment-level protection. For
example, if a large read-write data segment is paged, the page-protection
mechanism can be used to write-protect individual pages.
Table 5-3. Combined Page-Directory and Page-Table Protection
Page-Directory Entry        Page-Table Entry            Combined Effect
Privilege    Access Type    Privilege    Access Type    Privilege    Access Type
User         Read-Only      User         Read-Only      User         Read-Only
User         Read-Only      User         Read-Write     User         Read-Only
User         Read-Write     User         Read-Only      User         Read-Only
User         Read-Write     User         Read-Write     User         Read/Write
User         Read-Only      Supervisor   Read-Only      Supervisor   Read/Write*
User         Read-Only      Supervisor   Read-Write     Supervisor   Read/Write*
User         Read-Write     Supervisor   Read-Only      Supervisor   Read/Write*
User         Read-Write     Supervisor   Read-Write     Supervisor   Read/Write
Supervisor   Read-Only      User         Read-Only      Supervisor   Read/Write*
Supervisor   Read-Only      User         Read-Write     Supervisor   Read/Write*
Supervisor   Read-Write     User         Read-Only      Supervisor   Read/Write*
Supervisor   Read-Write     User         Read-Write     Supervisor   Read/Write
Supervisor   Read-Only      Supervisor   Read-Only      Supervisor   Read/Write*
Supervisor   Read-Only      Supervisor   Read-Write     Supervisor   Read/Write*
Supervisor   Read-Write     Supervisor   Read-Only      Supervisor   Read/Write*
Supervisor   Read-Write     Supervisor   Read-Write     Supervisor   Read/Write
NOTE:
* If CR0.WP = 1, access type is determined by the R/W flags of the page-directory and page-table
entries. If CR0.WP = 0, supervisor privilege permits read-write access.
5.13 PAGE-LEVEL PROTECTION AND EXECUTE-DISABLE BIT
In addition to page-level protection offered by the U/S and R/W flags, paging
structures used with PAE paging and IA-32e paging (see Chapter 4) provide the execute-
disable bit. This bit offers additional protection for data pages.
An Intel 64 or IA-32 processor with the execute-disable bit capability can prevent
data pages from being used by malicious software to execute code. This capability is
provided in:
• 32-bit protected mode with PAE enabled.
• IA-32e mode.
While the execute-disable bit capability does not introduce new instructions, it does
require operating systems to use a PAE-enabled environment and establish a page-
granular protection policy for memory pages.
If the execute-disable bit of a memory page is set, that page can be used only as
data. An attempt to execute code from a memory page with the execute-disable bit
set causes a page-fault exception.
The execute-disable capability is supported only with PAE paging and IA-32e paging.
It is not supported with 32-bit paging. Existing page-level protection mechanisms
(see Section 5.11, "Page-Level Protection") continue to apply to memory pages
independent of the execute-disable setting.
5.13.1 Detecting and Enabling the Execute-Disable Capability
Software can detect the presence of the execute-disable capability using the CPUID
instruction. CPUID.80000001H:EDX.NX [bit 20] = 1 indicates the capability is
available.
If the capability is available, software can enable it by setting IA32_EFER.NXE[bit 11]
to 1. IA32_EFER is available if CPUID.80000001H.EDX[bit 20 or 29] = 1.
If the execute-disable capability is not available, a write to set IA32_EFER.NXE
produces a #GP exception. See Table 5-4.
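The detect-then-enable sequence can be modeled on plain register values. This is an illustrative sketch, not system code: it operates on integers standing in for the EDX output of CPUID leaf 80000001H and for the IA32_EFER contents, the function names are hypothetical, and the #GP outcome is modeled as a Python exception.

```python
CPUID_EXT_EDX_NX = 1 << 20    # CPUID.80000001H:EDX.NX [bit 20]
IA32_EFER_NXE = 1 << 11       # IA32_EFER.NXE [bit 11]


def nx_available(cpuid_80000001_edx: int) -> bool:
    """True if the execute-disable capability is reported by CPUID."""
    return bool(cpuid_80000001_edx & CPUID_EXT_EDX_NX)


def efer_with_nxe(efer: int, cpuid_80000001_edx: int) -> int:
    """Return the IA32_EFER value with NXE set.

    Raises ValueError where the processor would generate #GP, that is, when
    the capability is not available.
    """
    if not nx_available(cpuid_80000001_edx):
        raise ValueError("#GP: execute-disable capability not available")
    return efer | IA32_EFER_NXE
```

Real system software would obtain these values with the CPUID and RDMSR instructions and write the result back with WRMSR at CPL 0.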
Table 5-4. Extended Feature Enable MSR (IA32_EFER)
Bits     Field
63:12    Reserved
11       Execute-disable bit enable (NXE)
10       IA-32e mode active (LMA)
9        Reserved
8        IA-32e mode enable (LME)
7:1      Reserved
0        SysCall enable (SCE)
5.13.2 Execute-Disable Page Protection
The execute-disable bit in the paging structures enhances page protection for data
pages. Instructions cannot be fetched from a memory page if IA32_EFER.NXE = 1
and the execute-disable bit is set in any of the paging-structure entries used to map
the page. Table 5-5 lists the valid usage of a page in relation to the value of the
execute-disable bit (bit 63) of the corresponding entry in each level of the paging
structures. Execute-disable protection can be activated using the execute-disable bit
at any level of the paging structure, irrespective of the corresponding entry in other
levels. When execute-disable protection is not activated, the page can be used as code
or data.
In legacy PAE-enabled mode, Table 5-6 and Table 5-7 show the effect of setting the
execute-disable bit for code and data pages.
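The rule stated above (protection is active if bit 63 is set at any level) can be sketched as a single predicate. The function name is hypothetical, and the entries are passed as raw 64-bit integers for whichever levels map the page (for example PML4E, PDPTE, PDE, PTE in IA-32e mode). The case NXE = 0, where bit 63 is treated as a reserved bit, is outside this model.

```python
EXECUTE_DISABLE = 1 << 63     # bit 63 of a paging-structure entry


def execute_disable_blocks_fetch(nxe: bool, entries) -> bool:
    """True if an instruction fetch from the page is blocked by execute-disable.

    Models only the condition in the text: IA32_EFER.NXE = 1 and bit 63 set
    in any of the paging-structure entries used to map the page.
    """
    return bool(nxe) and any(entry & EXECUTE_DISABLE for entry in entries)
```

A page is usable as code only when the predicate is False, which with NXE = 1 requires bit 63 to be clear in every entry used to map it.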
Table 5-5. IA-32e Mode Page Level Protection Matrix with Execute-Disable Bit Capability
Execute Disable Bit Value (Bit 63)                        Valid Usage
PML4          PDP           PDE           PTE
Bit 63 = 1    *             *             *               Data
*             Bit 63 = 1    *             *               Data
*             *             Bit 63 = 1    *               Data
*             *             *             Bit 63 = 1      Data
Bit 63 = 0    Bit 63 = 0    Bit 63 = 0    Bit 63 = 0      Data/Code
NOTES:
* Value not checked.
5.13.3 Reserved Bit Checking
The processor enforces reserved bit checking in paging data structure entries. The
bits being checked vary with paging mode and may vary with the size of the physical
address space.
Table 5-8 shows the reserved bits that are checked when the execute disable bit
capability is enabled (CR4.PAE = 1 and IA32_EFER.NXE = 1). Table 5-8 and Table 5-9
show the following paging modes:
• Non-PAE 4-KByte paging: 4-KByte-page only paging (CR4.PAE = 0,
CR4.PSE = 0).
• PSE36: 4-KByte and 4-MByte pages (CR4.PAE = 0, CR4.PSE = 1).
• PAE: 4-KByte and 2-MByte pages (CR4.PAE = 1, CR4.PSE = X).
The reserved bit checking depends on the physical address size supported by the
implementation, which is reported in CPUID.80000008H. See the table note.
Table 5-6. Legacy PAE-Enabled 4-KByte Page Level Protection Matrix with Execute-Disable Bit Capability
Execute Disable Bit Value (Bit 63)    Valid Usage
PDE           PTE
Bit 63 = 1    *                       Data
*             Bit 63 = 1              Data
Bit 63 = 0    Bit 63 = 0              Data/Code
NOTE:
* Value not checked.
Table 5-7. Legacy PAE-Enabled 2-MByte Page Level Protection with Execute-Disable Bit Capability
Execute Disable Bit Value (Bit 63)    Valid Usage
PDE
Bit 63 = 1                            Data
Bit 63 = 0                            Data/Code
If the execute disable bit capability is not enabled or not available, reserved bit
checking in 64-bit mode includes bit 63 and additional bits. This and reserved bit
checking for legacy 32-bit paging modes are shown in Table 5-9.
Table 5-8. IA-32e Mode Page Level Protection Matrix with Execute-Disable Bit Capability Enabled
Mode      Paging Mode                   Check Bits
32-bit    4-KByte paging (non-PAE)      No reserved bits checked
          PSE36 - PDE, 4-MByte page     Bit [21]
          PSE36 - PDE, 4-KByte page     No reserved bits checked
          PSE36 - PTE                   No reserved bits checked
          PAE - PDP table entry         Bits [63:MAXPHYADDR] & [8:5] & [2:1] *
          PAE - PDE, 2-MByte page       Bits [62:MAXPHYADDR] & [20:13] *
          PAE - PDE, 4-KByte page       Bits [62:MAXPHYADDR] *
          PAE - PTE                     Bits [62:MAXPHYADDR] *
64-bit    PML4E                         Bits [51:MAXPHYADDR] *
          PDPTE                         Bits [51:MAXPHYADDR] *
          PDE, 2-MByte page             Bits [51:MAXPHYADDR] & [20:13] *
          PDE, 4-KByte page             Bits [51:MAXPHYADDR] *
          PTE                           Bits [51:MAXPHYADDR] *
NOTES:
* MAXPHYADDR is the maximum physical address size and is indicated by
CPUID.80000008H:EAX[bits 7-0].
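As a worked example of the 64-bit rows, the sketch below builds the reserved-bit mask for an entry whose check bits are [51:MAXPHYADDR] (PML4E, PDPTE, 4-KByte PDE, PTE), adding bit 63 when the execute-disable capability is not enabled, as this section describes. The function names are hypothetical, and the 2-MByte PDE row (which also checks bits [20:13]) is deliberately left out.

```python
def ia32e_reserved_mask(maxphyaddr: int, nxe_enabled: bool) -> int:
    """Mask of reserved bits for a 64-bit mode entry with check bits [51:MAXPHYADDR]."""
    mask = 0
    for bit in range(maxphyaddr, 52):     # bits 51 down to MAXPHYADDR (empty if 52)
        mask |= 1 << bit
    if not nxe_enabled:
        mask |= 1 << 63                   # bit 63 is reserved when NXE = 0
    return mask


def has_reserved_bits_set(entry: int, maxphyaddr: int, nxe_enabled: bool) -> bool:
    """True if the entry has any bit set that falls inside the reserved mask."""
    return bool(entry & ia32e_reserved_mask(maxphyaddr, nxe_enabled))
```

With MAXPHYADDR = 36 and NXE enabled, the mask is 000FFFF000000000H; with MAXPHYADDR = 40 and NXE not enabled, it is 800FFF0000000000H.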
5.13.4 Exception Handling
When the execute disable bit capability is enabled (IA32_EFER.NXE = 1), conditions for
a page fault to occur include the same conditions that apply to an Intel 64 or IA-32
processor without the execute disable bit capability plus the following new condition:
an instruction fetch to a linear address that translates to a physical address in a
memory page that has the execute-disable bit set.
An Execute Disable Bit page fault can occur at all privilege levels. It can occur on any
instruction fetch, including (but not limited to): near branches, far branches,
CALL/RET/INT/IRET execution, sequential instruction fetches, and task switches. The
execute-disable bit in the page translation mechanism is checked only when:
• IA32_EFER.NXE = 1.
• The instruction translation look-aside buffer (ITLB) is loaded with a page that is
not already present in the ITLB.
Table 5-9. Reserved Bit Checking With Execute-Disable Bit Capability Not Enabled
Mode      Paging Mode                   Check Bits
32-bit    4-KByte paging (non-PAE)      No reserved bits checked
          PSE36 - PDE, 4-MByte page     Bit [21]
          PSE36 - PDE, 4-KByte page     No reserved bits checked
          PSE36 - PTE                   No reserved bits checked
          PAE - PDP table entry         Bits [63:MAXPHYADDR] & [8:5] & [2:1] *
          PAE - PDE, 2-MByte page       Bits [63:MAXPHYADDR] & [20:13] *
          PAE - PDE, 4-KByte page       Bits [63:MAXPHYADDR] *
          PAE - PTE                     Bits [63:MAXPHYADDR] *
64-bit    PML4E                         Bit [63], bits [51:MAXPHYADDR] *
          PDPTE                         Bit [63], bits [51:MAXPHYADDR] *
          PDE, 2-MByte page             Bit [63], bits [51:MAXPHYADDR] & [20:13] *
          PDE, 4-KByte page             Bit [63], bits [51:MAXPHYADDR] *
          PTE                           Bit [63], bits [51:MAXPHYADDR] *
NOTES:
* MAXPHYADDR is the maximum physical address size and is indicated by
CPUID.80000008H:EAX[bits 7-0].
CHAPTER 6
INTERRUPT AND EXCEPTION HANDLING
This chapter describes the interrupt and exception-handling mechanism when
operating in protected mode on an Intel 64 or IA-32 processor. Most of the information
provided here also applies to interrupt and exception mechanisms used in real-
address, virtual-8086 mode, and 64-bit mode.
Chapter 17, "8086 Emulation," describes information specific to interrupt and
exception mechanisms in real-address and virtual-8086 mode. Section 6.14, "Exception
and Interrupt Handling in 64-bit Mode," describes information specific to interrupt
and exception mechanisms in IA-32e mode and 64-bit sub-mode.
6.1 INTERRUPT AND EXCEPTION OVERVIEW
Interrupts and exceptions are events that indicate that a condition exists somewhere
in the system, the processor, or within the currently executing program or task that
requires the attention of a processor. They typically result in a forced transfer of
execution from the currently running program or task to a special software routine or
task called an interrupt handler or an exception handler. The action taken by a
processor in response to an interrupt or exception is referred to as servicing or
handling the interrupt or exception.
Interrupts occur at random times during the execution of a program, in response to
signals from hardware. System hardware uses interrupts to handle events external
to the processor, such as requests to service peripheral devices. Software can also
generate interrupts by executing the INT n instruction.
Exceptions occur when the processor detects an error condition while executing an
instruction, such as division by zero. The processor detects a variety of error
conditions including protection violations, page faults, and internal machine faults. The
machine-check architecture of the Pentium 4, Intel Xeon, P6 family, and Pentium
processors also permits a machine-check exception to be generated when internal
hardware errors and bus errors are detected.
When an interrupt is received or an exception is detected, the currently running
procedure or task is suspended while the processor executes an interrupt or
exception handler. When execution of the handler is complete, the processor resumes
execution of the interrupted procedure or task. The resumption of the interrupted
procedure or task happens without loss of program continuity, unless recovery from
an exception was not possible or an interrupt caused the currently running program
to be terminated.
This chapter describes the processor's interrupt and exception-handling mechanism,
when operating in protected mode. A description of the exceptions and the conditions
that cause them to be generated is given at the end of this chapter.
6.2 EXCEPTION AND INTERRUPT VECTORS
To aid in handling exceptions and interrupts, each architecturally defined exception
and each interrupt condition requiring special handling by the processor is assigned
a unique identification number, called a vector number. The processor uses the vector
number assigned to an exception or interrupt as an index into the interrupt
descriptor table (IDT). The table provides the entry point to an exception or interrupt
handler (see Section 6.10, "Interrupt Descriptor Table (IDT)").
The allowable range for vector numbers is 0 to 255. Vector numbers in the range 0
through 31 are reserved by the Intel 64 and IA-32 architectures for
architecture-defined exceptions and interrupts. Not all of the vector numbers in this range have a
currently defined function. The unassigned vector numbers in this range are
reserved. Do not use the reserved vector numbers.
Vector numbers in the range 32 to 255 are designated as user-defined interrupts and
are not reserved by the Intel 64 and IA-32 architectures. These interrupts are
generally assigned to external I/O devices to enable those devices to send interrupts to the
processor through one of the external hardware interrupt mechanisms (see Section
6.3, "Sources of Interrupts").
Table 6-1 shows vector number assignments for architecturally defined exceptions
and for the NMI interrupt. This table gives the exception type (see Section 6.5,
"Exception Classifications") and indicates whether an error code is saved on the stack
for the exception. The source of each predefined exception and the NMI interrupt is
also given.
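The vector ranges described above can be summarized in a few lines of C. This is only an illustrative sketch: the enum and function names are hypothetical and do not come from the manual; the numeric ranges are the ones stated in this section and in Table 6-1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper names; the ranges come from Section 6.2 and Table 6-1. */
enum vector_class {
    VECTOR_ARCH_RESERVED, /* 0..31: reserved for architecture-defined use */
    VECTOR_USER_DEFINED   /* 32..255: user-defined interrupts */
};

/* uint8_t already restricts the argument to the allowable range 0..255. */
static enum vector_class classify_vector(uint8_t vector)
{
    return (vector <= 31) ? VECTOR_ARCH_RESERVED : VECTOR_USER_DEFINED;
}

/* True for vectors that Table 6-1 marks "Intel reserved. Do not use." */
static bool vector_is_intel_reserved(uint8_t vector)
{
    return vector == 15 || (vector >= 20 && vector <= 31);
}
```

An operating system that hands out vectors to device drivers would, under these assumptions, only ever allocate vectors for which `classify_vector` returns `VECTOR_USER_DEFINED`.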
6.3 SOURCES OF INTERRUPTS
The processor receives interrupts from two sources:
- External (hardware generated) interrupts.
- Software-generated interrupts.
6.3.1 External Interrupts
External interrupts are received through pins on the processor or through the local
APIC. The primary interrupt pins on Pentium 4, Intel Xeon, P6 family, and Pentium
processors are the LINT[1:0] pins, which are connected to the local APIC (see
Chapter 10, "Advanced Programmable Interrupt Controller (APIC)"). When the local
APIC is enabled, the LINT[1:0] pins can be programmed through the APIC's local
vector table (LVT) to be associated with any of the processor's exception or interrupt
vectors.
When the local APIC is global/hardware disabled, these pins are configured as INTR
and NMI pins, respectively. Asserting the INTR pin signals the processor that an
external interrupt has occurred. The processor reads from the system bus the
interrupt vector number provided by an external interrupt controller, such as an 8259A
(see Section 6.2, "Exception and Interrupt Vectors"). Asserting the NMI pin signals a
non-maskable interrupt (NMI), which is assigned to interrupt vector 2.
Table 6-1. Protected-Mode Exceptions and Interrupts
Vector  Mnemonic  Description                                 Type        Error Code   Source
0       #DE       Divide Error                                Fault       No           DIV and IDIV instructions.
1       #DB       RESERVED                                    Fault/Trap  No           For Intel use only.
2       -         NMI Interrupt                               Interrupt   No           Nonmaskable external interrupt.
3       #BP       Breakpoint                                  Trap        No           INT 3 instruction.
4       #OF       Overflow                                    Trap        No           INTO instruction.
5       #BR       BOUND Range Exceeded                        Fault       No           BOUND instruction.
6       #UD       Invalid Opcode (Undefined Opcode)           Fault       No           UD2 instruction or reserved opcode. (Note 1)
7       #NM       Device Not Available (No Math Coprocessor)  Fault       No           Floating-point or WAIT/FWAIT instruction.
8       #DF       Double Fault                                Abort       Yes (zero)   Any instruction that can generate an exception, an NMI, or an INTR.
9       -         Coprocessor Segment Overrun (reserved)      Fault       No           Floating-point instruction. (Note 2)
10      #TS       Invalid TSS                                 Fault       Yes          Task switch or TSS access.
11      #NP       Segment Not Present                         Fault       Yes          Loading segment registers or accessing system segments.
12      #SS       Stack-Segment Fault                         Fault       Yes          Stack operations and SS register loads.
13      #GP       General Protection                          Fault       Yes          Any memory reference and other protection checks.
14      #PF       Page Fault                                  Fault       Yes          Any memory reference.
15      -         (Intel reserved. Do not use.)               -           No           -
16      #MF       x87 FPU Floating-Point Error (Math Fault)   Fault       No           x87 FPU floating-point or WAIT/FWAIT instruction.
17      #AC       Alignment Check                             Fault       Yes (zero)   Any data reference in memory. (Note 3)
18      #MC       Machine Check                               Abort       No           Error codes (if any) and source are model dependent. (Note 4)
19      #XM       SIMD Floating-Point Exception               Fault       No           SSE/SSE2/SSE3 floating-point instructions. (Note 5)
20-31   -         Intel reserved. Do not use.                 -           -            -
32-255  -         User Defined (Non-reserved) Interrupts      Interrupt   -            External interrupt or INT n instruction.
NOTES:
1. The UD2 instruction was introduced in the Pentium Pro processor.
2. Processors after the Intel386 processor do not generate this exception.
3. This exception was introduced in the Intel486 processor.
4. This exception was introduced in the Pentium processor and enhanced in the P6 family processors.
5. This exception was introduced in the Pentium III processor.
The processor's local APIC is normally connected to a system-based I/O APIC. Here,
external interrupts received at the I/O APIC's pins can be directed to the local APIC
through the system bus (Pentium 4, Intel Core Duo, Intel Core 2, Intel Atom, and
Intel Xeon processors) or the APIC serial bus (P6 family and Pentium processors).
The I/O APIC determines the vector number of the interrupt and sends this number
to the local APIC. When a system contains multiple processors, processors can also
send interrupts to one another by means of the system bus (Pentium 4, Intel Core
Duo, Intel Core 2, Intel Atom, and Intel Xeon processors) or the APIC serial bus (P6
family and Pentium processors).
The LINT[1:0] pins are not available on the Intel486 processor and earlier Pentium
processors that do not contain an on-chip local APIC. These processors have
dedicated NMI and INTR pins. With these processors, external interrupts are typically
generated by a system-based interrupt controller (8259A), with the interrupts being
signaled through the INTR pin.
Note that several other pins on the processor can cause a processor interrupt to
occur. However, these interrupts are not handled by the interrupt and exception
mechanism described in this chapter. These pins include the RESET#, FLUSH#,
STPCLK#, SMI#, R/S#, and INIT# pins. Whether they are included on a particular
processor is implementation dependent. Pin functions are described in the data
books for the individual processors. The SMI# pin is described in Chapter 26,
"System Management."
6.3.2 Maskable Hardware Interrupts
Any external interrupt that is delivered to the processor by means of the INTR pin or
through the local APIC is called a maskable hardware interrupt. Maskable hardware
interrupts that can be delivered through the INTR pin include all IA-32 architecture
defined interrupt vectors from 0 through 255; those that can be delivered through
the local APIC include interrupt vectors 16 through 255.
The IF flag in the EFLAGS register permits all maskable hardware interrupts to be
masked as a group (see Section 6.8.1, "Masking Maskable Hardware Interrupts").
Note that when interrupts 0 through 15 are delivered through the local APIC, the
APIC indicates the receipt of an illegal vector.
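The difference between the two delivery paths can be expressed as a pair of predicates. The sketch below is purely illustrative; the function names are hypothetical, and it models only the vector ranges stated in this section, not any APIC register behavior.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: vectors 0..15 arriving through the local APIC are
 * reported as illegal vectors; 16..255 are accepted. */
static bool lapic_vector_is_legal(uint8_t vector)
{
    return vector >= 16;
}

/* Hypothetical helper: any vector 0..255 can be supplied on the INTR path. */
static bool intr_vector_is_deliverable(uint8_t vector)
{
    (void)vector; /* every 8-bit value is a deliverable vector */
    return true;
}
```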
6.3.3 Software-Generated Interrupts
The INT n instruction permits interrupts to be generated from within software by
supplying an interrupt vector number as an operand. For example, the INT 35
instruction forces an implicit call to the interrupt handler for interrupt 35.
Any of the interrupt vectors from 0 to 255 can be used as a parameter in this
instruction. If the processor's predefined NMI vector is used, however, the response of the
processor will not be the same as it would be from an NMI interrupt generated in the
normal manner. If vector number 2 (the NMI vector) is used in this instruction, the
NMI interrupt handler is called, but the processor's NMI-handling hardware is not
activated.
Interrupts generated in software with the INT n instruction cannot be masked by the
IF flag in the EFLAGS register.
6.4 SOURCES OF EXCEPTIONS
The processor receives exceptions from three sources:
- Processor-detected program-error exceptions.
- Software-generated exceptions.
- Machine-check exceptions.
6.4.1 Program-Error Exceptions
The processor generates one or more exceptions when it detects program errors
during the execution of an application program or the operating system or executive.
Intel 64 and IA-32 architectures define a vector number for each
processor-detectable exception. Exceptions are classified as faults, traps, and aborts (see Section
6.5, "Exception Classifications").
6.4.2 Software-Generated Exceptions
The INTO, INT 3, and BOUND instructions permit exceptions to be generated in
software. These instructions allow checks for exception conditions to be performed at
points in the instruction stream. For example, INT 3 causes a breakpoint exception to
be generated.
The INT n instruction can be used to emulate exceptions in software, but there is a
limitation. If INT n provides a vector for one of the architecturally defined
exceptions, the processor generates an interrupt to the correct vector (to access the
exception handler) but does not push an error code on the stack. This is true even if
the associated hardware-generated exception normally produces an error code. The
exception handler will still attempt to pop an error code from the stack while handling
the exception. Because no error code was pushed, the handler will pop off and
discard the EIP instead (in place of the missing error code). The subsequent return
then goes to the wrong location.
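The effect of the missing error code can be illustrated with a small stack simulation. This is a simplified sketch under stated assumptions: it models only a same-privilege entry (EFLAGS, CS, and EIP pushed, followed by an optional error code), and the structure and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated stack; 'top' counts the slots in use (illustration only). */
struct sim_stack {
    uint32_t slot[16];
    int top;
};

static void push32(struct sim_stack *s, uint32_t v) { s->slot[s->top++] = v; }
static uint32_t pop32(struct sim_stack *s) { return s->slot[--s->top]; }

/* Handler entry without a privilege change: EFLAGS, CS, EIP are pushed,
 * then an error code only if the event is a true hardware exception. */
static void enter_handler(struct sim_stack *s, uint32_t eflags, uint32_t cs,
                          uint32_t eip, bool push_error_code,
                          uint32_t error_code)
{
    push32(s, eflags);
    push32(s, cs);
    push32(s, eip);
    if (push_error_code)
        push32(s, error_code);
}

/* A handler written for an exception that carries an error code: it
 * discards one stack slot, then IRET loads EIP from the next slot.
 * Returns the EIP value that IRET would load. */
static uint32_t handler_return_eip(struct sim_stack *s)
{
    (void)pop32(s);  /* what the handler believes is the error code */
    return pop32(s); /* first value consumed by IRET */
}
```

With a hardware-generated exception the function returns the saved EIP. When the same handler is reached through INT n, the saved EIP is discarded as if it were the error code and the CS selector value is loaded as the return EIP.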
6.4.3 Machine-Check Exceptions
The P6 family and Pentium processors provide both internal and external
machine-check mechanisms for checking the operation of the internal chip hardware and bus
transactions. These mechanisms are implementation dependent. When a
machine-check error is detected, the processor signals a machine-check exception (vector 18)
and returns an error code.
See Chapter 6, "Interrupt 18: Machine-Check Exception (#MC)" and Chapter 15,
"Machine-Check Architecture," for more information about the machine-check
mechanism.
6.5 EXCEPTION CLASSIFICATIONS
Exceptions are classified as faults, traps, or aborts depending on the way they are
reported and whether the instruction that caused the exception can be restarted
without loss of program or task continuity.
- Faults: A fault is an exception that can generally be corrected and that, once
corrected, allows the program to be restarted with no loss of continuity. When a
fault is reported, the processor restores the machine state to the state prior to
the beginning of execution of the faulting instruction. The return address (saved
contents of the CS and EIP registers) for the fault handler points to the faulting
instruction, rather than to the instruction following the faulting instruction.
- Traps: A trap is an exception that is reported immediately following the
execution of the trapping instruction. Traps allow execution of a program or task
to be continued without loss of program continuity. The return address for the
trap handler points to the instruction to be executed after the trapping
instruction.
- Aborts: An abort is an exception that does not always report the precise
location of the instruction causing the exception and does not allow a restart of
the program or task that caused the exception. Aborts are used to report severe
errors, such as hardware errors and inconsistent or illegal values in system
tables.
NOTE
One exception subset normally reported as a fault is not restartable.
Such exceptions result in loss of some processor state. For example,
executing a POPAD instruction where the stack frame crosses over
the end of the stack segment causes a fault to be reported. In this
situation, the exception handler sees that the instruction pointer
(CS:EIP) has been restored as if the POPAD instruction had not been
executed. However, internal processor state (the general-purpose
registers) will have been modified. Such cases are considered
programming errors. An application causing this class of exceptions
should be terminated by the operating system.
6.6 PROGRAM OR TASK RESTART
To allow the restarting of a program or task following the handling of an exception or
an interrupt, all exceptions (except aborts) are guaranteed to be reported on
an instruction boundary. All interrupts are guaranteed to be taken on an instruction
boundary.
For fault-class exceptions, the return instruction pointer (saved when the processor
generates an exception) points to the faulting instruction. So, when a program or task
is restarted following the handling of a fault, the faulting instruction is restarted
(re-executed). Restarting the faulting instruction is commonly used to handle exceptions
that are generated when access to an operand is blocked. The most common example
of this type of fault is a page-fault exception (#PF) that occurs when a program or
task references an operand located on a page that is not in memory. When a
page-fault exception occurs, the exception handler can load the page into memory and
resume execution of the program or task by restarting the faulting instruction. To
ensure that the restart is handled transparently to the currently executing program or
task, the processor saves the necessary registers and stack pointers to allow a restart
to the state prior to the execution of the faulting instruction.
For trap-class exceptions, the return instruction pointer points to the instruction
following the trapping instruction. If a trap is detected during an instruction which
transfers execution, the return instruction pointer reflects the transfer. For example,
if a trap is detected while executing a JMP instruction, the return instruction pointer
points to the destination of the JMP instruction, not to the next address past the JMP
instruction. All trap exceptions allow program or task restart with no loss of
continuity. For example, the overflow exception is a trap exception. Here, the return
instruction pointer points to the instruction following the INTO instruction that tested
the EFLAGS.OF (overflow) flag. The trap handler for this exception resolves the overflow
condition. Upon return from the trap handler, program or task execution continues at
the instruction following the INTO instruction.
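The rule for the saved return instruction pointer can be condensed into one function. This is a sketch only, with hypothetical names; `next_ip` stands for the address of the instruction that would execute next, which for a control-transfer instruction such as JMP is the transfer destination.

```c
#include <stdint.h>

enum exception_class { EXC_FAULT, EXC_TRAP };

/* Return instruction pointer saved for the handler:
 *   fault -> address of the faulting instruction (it is re-executed);
 *   trap  -> address of the next instruction to execute, which reflects
 *            any control transfer made by the trapping instruction. */
static uint32_t saved_return_ip(enum exception_class cls,
                                uint32_t insn_addr, uint32_t next_ip)
{
    return (cls == EXC_FAULT) ? insn_addr : next_ip;
}
```

For example, a fault on a two-byte instruction at 0x400 saves 0x400, while a trap on the same instruction saves 0x402; a trap detected on a JMP from 0x400 to 0x800 saves 0x800.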
The abort-class exceptions do not support reliable restarting of the program or task.
Abort handlers are designed to collect diagnostic information about the state of the
processor when the abort exception occurred and then shut down the application and
system as gracefully as possible.
Interrupts rigorously support restarting of interrupted programs and tasks without
loss of continuity. The return instruction pointer saved for an interrupt points to the
next instruction to be executed at the instruction boundary where the processor took
the interrupt. If the instruction just executed has a repeat prefix, the interrupt is
taken at the end of the current iteration with the registers set to execute the next
iteration.
The ability of a P6 family processor to speculatively execute instructions does not
affect the taking of interrupts by the processor. Interrupts are taken at instruction
boundaries located during the retirement phase of instruction execution; so they are
always taken in the "in-order" instruction stream. See Chapter 2, "Intel 64 and IA-32
Architectures," in the Intel 64 and IA-32 Architectures Software Developer's
Manual, Volume 1, for more information about the P6 family processors'
microarchitecture and its support for out-of-order instruction execution.
Note that the Pentium processor and earlier IA-32 processors also perform varying
amounts of prefetching and preliminary decoding. With these processors as well,
exceptions and interrupts are not signaled until actual "in-order" execution of the
instructions. For a given code sample, the signaling of exceptions occurs uniformly
when the code is executed on any family of IA-32 processors (except where new
exceptions or new opcodes have been defined).
6.7 NONMASKABLE INTERRUPT (NMI)
The nonmaskable interrupt (NMI) can be generated in either of two ways:
- External hardware asserts the NMI pin.
- The processor receives a message on the system bus (Pentium 4, Intel Core Duo,
Intel Core 2, Intel Atom, and Intel Xeon processors) or the APIC serial bus (P6
family and Pentium processors) with a delivery mode NMI.
When the processor receives an NMI from either of these sources, the processor
handles it immediately by calling the NMI handler pointed to by interrupt vector
number 2. The processor also invokes certain hardware conditions to ensure that no
other interrupts, including NMI interrupts, are received until the NMI handler has
completed executing (see Section 6.7.1, "Handling Multiple NMIs").
Also, when an NMI is received from either of the above sources, it cannot be masked
by the IF flag in the EFLAGS register.
It is possible to issue a maskable hardware interrupt (through the INTR pin) to vector
2 to invoke the NMI interrupt handler; however, this interrupt will not truly be an NMI
interrupt. A true NMI interrupt that activates the processor's NMI-handling hardware
can only be delivered through one of the mechanisms listed above.
6.7.1 Handling Multiple NMIs
While an NMI interrupt handler is executing, the processor disables additional calls to
the NMI handler until the next IRET instruction is executed. This blocking of
subsequent NMIs prevents stacking up calls to the NMI handler. It is recommended that the
NMI interrupt handler be accessed through an interrupt gate to disable maskable
hardware interrupts (see Section 6.8.1, "Masking Maskable Hardware Interrupts"). If
the NMI handler is a virtual-8086 task with an IOPL of less than 3, an IRET instruction
issued from the handler generates a general-protection exception (see Section
17.2.7, "Sensitive Instructions"). In this case, the NMI is unmasked before the
general-protection exception handler is invoked.
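The blocking behavior, and the difference between a true NMI and a software or INTR-delivered interrupt to vector 2, can be modeled as a small state machine. This is a sketch under stated assumptions: the names are hypothetical, and the model deliberately says nothing about what happens to an NMI that arrives while blocked, because this section does not describe it.

```c
#include <stdbool.h>

/* Minimal model of NMI blocking (illustration only). */
struct nmi_model {
    bool nmi_blocked;       /* set while further NMI handler calls are disabled */
    unsigned handler_calls; /* number of times the vector 2 handler was entered */
};

/* True NMI (NMI pin or delivery-mode-NMI message). Returns true if the
 * handler is called now; the fate of a blocked NMI is not modeled. */
static bool deliver_nmi(struct nmi_model *m)
{
    if (m->nmi_blocked)
        return false;
    m->nmi_blocked = true; /* NMI-handling hardware activated */
    m->handler_calls++;
    return true;
}

/* INT 2, or an INTR-delivered interrupt with vector 2: the handler runs,
 * but the NMI-handling hardware is not activated. */
static void vector2_without_nmi_hardware(struct nmi_model *m)
{
    m->handler_calls++;
}

/* The next IRET re-enables calls to the NMI handler. */
static void execute_iret(struct nmi_model *m)
{
    m->nmi_blocked = false;
}
```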
6.8 ENABLING AND DISABLING INTERRUPTS
The processor inhibits the generation of some interrupts, depending on the state of
the processor and of the IF and RF flags in the EFLAGS register, as described in the
following sections.
6.8.1 Masking Maskable Hardware Interrupts
The IF flag can disable the servicing of maskable hardware interrupts received on the
processor's INTR pin or through the local APIC (see Section 6.3.2, "Maskable
Hardware Interrupts"). When the IF flag is clear, the processor inhibits interrupts
delivered to the INTR pin or through the local APIC from generating an internal interrupt
request; when the IF flag is set, interrupts delivered to the INTR pin or through the local
APIC are processed as normal external interrupts.
The IF flag does not affect non-maskable interrupts (NMIs) delivered to the NMI pin
or delivery mode NMI messages delivered through the local APIC, nor does it affect
processor generated exceptions. As with the other flags in the EFLAGS register, the
processor clears the IF flag in response to a hardware reset.
The fact that the group of maskable hardware interrupts includes the reserved
interrupt and exception vectors 0 through 31 can potentially cause confusion.
Architecturally, when the IF flag is set, an interrupt for any of the vectors from 0 through 31 can
be delivered to the processor through the INTR pin and any of the vectors from 16
through 31 can be delivered through the local APIC. The processor will then generate
an interrupt and call the interrupt or exception handler pointed to by the vector
number. So for example, it is possible to invoke the page-fault handler through the
INTR pin (by means of vector 14); however, this is not a true page-fault exception. It
is an interrupt. As with the INT n instruction (see Section 6.4.2, "Software-Generated
Exceptions"), when an interrupt is generated through the INTR pin to an exception
vector, the processor does not push an error code on the stack, so the exception
handler may not operate correctly.
The IF flag can be set or cleared with the STI (set interrupt-enable flag) and CLI
(clear interrupt-enable flag) instructions, respectively. These instructions may be
executed only if the CPL is equal to or less than the IOPL. A general-protection
exception (#GP) is generated if they are executed when the CPL is greater than the IOPL.
(The effect of the IOPL on these instructions is modified slightly when the virtual
mode extension is enabled by setting the VME flag in control register CR4: see
Section 17.3, "Interrupt and Exception Handling in Virtual-8086 Mode." Behavior is
also impacted by the PVI flag: see Section 17.4, "Protected-Mode Virtual Interrupts.")
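The masking and privilege rules of this section reduce to two short predicates. The sketch below uses hypothetical names and assumes plain protected mode with the VME and PVI extensions disabled; it also ignores the separate NMI blocking described in Section 6.7.1.

```c
#include <stdbool.h>

/* Event sources distinguished in Sections 6.3 and 6.8.1 (names hypothetical). */
enum event_source {
    SRC_MASKABLE_HW, /* INTR pin or local APIC */
    SRC_NMI,         /* NMI pin or delivery-mode-NMI message */
    SRC_INT_N,       /* software INT n instruction */
    SRC_EXCEPTION    /* processor-generated exception */
};

/* Only maskable hardware interrupts are gated by EFLAGS.IF. */
static bool if_flag_permits(enum event_source src, bool if_flag)
{
    return (src == SRC_MASKABLE_HW) ? if_flag : true;
}

/* CLI and STI execute only when CPL <= IOPL; otherwise #GP is generated.
 * Assumes the VME and PVI flags in CR4 are clear. */
static bool cli_sti_permitted(unsigned cpl, unsigned iopl)
{
    return cpl <= iopl;
}
```

For instance, ring 3 code running with IOPL 0 cannot execute CLI, while ring 0 code can always do so.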
The IF flag is also affected by the following operations:
- The PUSHF instruction stores all flags on the stack, where they can be examined
and modified. The POPF instruction can be used to load the modified flags back
into the EFLAGS register.
- Task switches and the POPF and IRET instructions load the EFLAGS register;
therefore, they can be used to modify the setting of the IF flag.
- When an interrupt is handled through an interrupt gate, the IF flag is
automatically cleared, which disables maskable hardware interrupts. (If an interrupt is
handled through a trap gate, the IF flag is not cleared.)
See the descriptions of the CLI, STI, PUSHF, POPF, and IRET instructions in Chapter
3, "Instruction Set Reference, A-M," in the Intel 64 and IA-32 Architectures
Software Developer's Manual, Volume 2A, for a detailed description of the operations
these instructions are allowed to perform on the IF flag.
6.8.2 Masking Instruction Breakpoints
The RF (resume) flag in the EFLAGS register controls the response of the processor
to instruction-breakpoint conditions (see the description of the RF flag in Section 2.3,
"System Flags and Fields in the EFLAGS Register").
When set, it prevents an instruction breakpoint from generating a debug exception
(#DB); when clear, instruction breakpoints will generate debug exceptions. The
primary function of the RF flag is to prevent the processor from going into a debug
exception loop on an instruction-breakpoint. See Section 16.3.1.1,
"Instruction-Breakpoint Exception Condition," for more information on the use of this flag.
6.8.3 Masking Exceptions and Interrupts When Switching Stacks
To switch to a different stack segment, software often uses a pair of instructions, for
example:
MOV SS, AX
MOV ESP, StackTop
If an interrupt or exception occurs after the segment selector has been loaded into
the SS register but before the ESP register has been loaded, these two parts of the
logical address into the stack space are inconsistent for the duration of the interrupt
or exception handler.
To prevent this situation, the processor inhibits interrupts, debug exceptions, and
single-step trap exceptions after either a MOV to SS instruction or a POP to SS
instruction, until the instruction boundary following the next instruction is reached.
All other faults may still be generated. If the LSS instruction is used to modify the
contents of the SS register (which is the recommended method of modifying this
register), this problem does not occur.
6.9 PRIORITY AMONG SIMULTANEOUS EXCEPTIONS AND INTERRUPTS
If more than one exception or interrupt is pending at an instruction boundary, the
processor services them in a predictable order. Table 6-2 shows the priority among
classes of exception and interrupt sources.
Table 6-2. Priority Among Simultaneous Exceptions and Interrupts
Priority      Description
1 (Highest)   Hardware Reset and Machine Checks
              - RESET
              - Machine Check
2             Trap on Task Switch
              - T flag in TSS is set
3             External Hardware Interventions
              - FLUSH
              - STOPCLK
              - SMI
              - INIT
4             Traps on the Previous Instruction
              - Breakpoints
              - Debug Trap Exceptions (TF flag set or data/I-O breakpoint)
5             Nonmaskable Interrupts (NMI) (Note 1)
6             Maskable Hardware Interrupts (Note 1)
7             Code Breakpoint Fault
8             Faults from Fetching Next Instruction
              - Code-Segment Limit Violation
              - Code Page Fault
9             Faults from Decoding the Next Instruction
              - Instruction length > 15 bytes
              - Invalid Opcode
              - Coprocessor Not Available
10 (Lowest)   Faults on Executing an Instruction
              - Overflow
              - Bound error
              - Invalid TSS
              - Segment Not Present
              - Stack fault
              - General Protection
              - Data Page Fault
              - Alignment Check
              - x87 FPU Floating-point exception
              - SIMD floating-point exception
NOTE:
1. The Intel486 processor and earlier processors group nonmaskable and maskable interrupts in
the same priority class.
While priority among the classes listed in Table 6-2 is consistent throughout the
architecture, the priority of exceptions within each class is implementation-dependent and may
vary from processor to processor. The processor first services a pending exception or
interrupt from the class which has the highest priority, transferring execution to the
first instruction of the handler. Lower priority exceptions are discarded; lower priority
interrupts are held pending. Discarded exceptions are re-generated when the
interrupt handler returns execution to the point in the program or task where the
exceptions and/or interrupts occurred.
6.10 INTERRUPT DESCRIPTOR TABLE (IDT)
The interrupt descriptor table (IDT) associates each exception or interrupt vector
with a gate descriptor for the procedure or task used to service the associated
exception or interrupt. Like the GDT and LDTs, the IDT is an array of 8-byte descriptors (in
protected mode). Unlike the GDT, the first entry of the IDT may contain a descriptor.
To form an index into the IDT, the processor scales the exception or interrupt vector
by eight (the number of bytes in a gate descriptor). Because there are only 256
interrupt or exception vectors, the IDT need not contain more than 256 descriptors. It can
contain fewer than 256 descriptors, because descriptors are required only for the
interrupt and exception vectors that may occur. All empty descriptor slots in the IDT
should have the present flag for the descriptor set to 0.
The base address of the IDT should be aligned on an 8-byte boundary to maximize
performance of cache line fills. The limit value is expressed in bytes and is added to
the base address to get the address of the last valid byte. A limit value of 0 results in
exactly 1 valid byte. Because IDT entries are always eight bytes long, the limit should
always be one less than an integral multiple of eight (that is, 8N - 1).
The IDT may reside anywhere in the linear address space. As shown in Figure 6-1,
the processor locates the IDT using the IDTR register. This register holds both a
32-bit base address and 16-bit limit for the IDT.
The LIDT (load IDT register) and SIDT (store IDT register) instructions load and store
the contents of the IDTR register, respectively. The LIDT instruction loads the IDTR
register with the base address and limit held in a memory operand. This instruction
can be executed only when the CPL is 0. It normally is used by the initialization code
of an operating system when creating an IDT. An operating system also may use it to
change from one IDT to another. The SIDT instruction copies the base and limit value
stored in IDTR to memory. This instruction can be executed at any privilege level.
If a vector references a descriptor beyond the limit of the IDT, a general-protection
exception (#GP) is generated.
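The index scaling, the 8N - 1 limit rule, and the limit check can be worked through in code. The following is a minimal sketch for protected mode with hypothetical function names; it treats a descriptor as within the limit only if all eight of its bytes lie at or below the limit.

```c
#include <stdbool.h>
#include <stdint.h>

#define IDT_GATE_SIZE 8u /* bytes per gate descriptor in protected mode */

/* Limit for an IDT holding n descriptors (1 <= n <= 256): 8N - 1. */
static uint16_t idt_limit_for(unsigned n_descriptors)
{
    return (uint16_t)(IDT_GATE_SIZE * n_descriptors - 1u);
}

/* Byte offset of a vector's descriptor: the vector scaled by eight. */
static uint32_t idt_gate_offset(uint8_t vector)
{
    return (uint32_t)vector * IDT_GATE_SIZE;
}

/* False means the vector references a descriptor beyond the IDT limit,
 * the case in which a general-protection exception (#GP) is generated. */
static bool idt_vector_within_limit(uint8_t vector, uint16_t limit)
{
    return idt_gate_offset(vector) + (IDT_GATE_SIZE - 1u) <= limit;
}

/* Linear address of the descriptor, given the base held in IDTR. */
static uint32_t idt_gate_address(uint32_t idt_base, uint8_t vector)
{
    return idt_base + idt_gate_offset(vector);
}
```

A full 256-entry IDT therefore has a limit of 2047 (0x7FF), and an IDT sized for only the 32 reserved vectors has a limit of 255, so vector 32 falls outside it.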
NOTE
Because interrupts are delivered to the processor core only once, an
incorrectly configured IDT could result in incomplete interrupt
handling and/or the blocking of interrupt delivery.
IA-32 architecture rules need to be followed for setting up IDTR
base/limit/access fields and each field in the gate descriptors. The
same applies for the Intel 64 architecture. This includes implicit
referencing of the destination code segment through the GDT or LDT
and accessing the stack.
6.11 IDT DESCRIPTORS
The IDT may contain any of three kinds of gate descriptors:
• Task-gate descriptor
• Interrupt-gate descriptor
• Trap-gate descriptor
Figure 6-2 shows the formats for the task-gate, interrupt-gate, and trap-gate
descriptors. The format of a task gate used in an IDT is the same as that of a task
gate used in the GDT or an LDT (see Section 7.2.5, "Task-Gate Descriptor"). The task
gate contains the segment selector for a TSS for an exception and/or interrupt
handler task.
Interrupt and trap gates are very similar to call gates (see Section 5.8.3, "Call
Gates"). They contain a far pointer (segment selector and offset) that the processor
uses to transfer program execution to a handler procedure in an exception- or
interrupt-handler code segment. These gates differ in the way the processor handles the
IF flag in the EFLAGS register (see Section 6.12.1.2, "Flag Usage By Exception- or
Interrupt-Handler Procedure").
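As an illustration of the far pointer that interrupt and trap gates carry, the following C sketch packs a 32-bit gate into its two doublewords. The bit positions follow the layout of Figure 6-2 (selector in bits 31:16 of the low doubleword; offset 31..16, P, DPL, and type in the high doubleword). The names make_gate32, struct gate32, and the enum constants are hypothetical, and the P flag is assumed to be set.

```c
#include <stdint.h>

/* Type values for bits 12:8 of the upper doubleword (D = gate size bit). */
enum gate_type {
    GATE_TASK        = 0x05, /* 0 0101 */
    GATE_INTERRUPT16 = 0x06, /* 0 0110, D = 0 */
    GATE_TRAP16      = 0x07, /* 0 0111, D = 0 */
    GATE_INTERRUPT32 = 0x0E, /* 0 1110, D = 1 */
    GATE_TRAP32      = 0x0F  /* 0 1111, D = 1 */
};

/* One 8-byte gate descriptor as two doublewords (bytes 3:0 and 7:4). */
struct gate32 {
    uint32_t low;
    uint32_t high;
};

/* Pack a gate. For a task gate the offset is unused; pass 0. */
static struct gate32 make_gate32(uint32_t offset, uint16_t selector,
                                 enum gate_type type, unsigned dpl)
{
    struct gate32 g;
    g.low  = ((uint32_t)selector << 16) | (offset & 0xFFFFu);
    g.high = (offset & 0xFFFF0000u)   /* offset 31..16 */
           | (1u << 15)               /* P: segment present (assumed set) */
           | ((dpl & 3u) << 13)       /* DPL */
           | ((uint32_t)type << 8);   /* type, including the D bit */
    return g;
}
```

For example, a 32-bit interrupt gate for a handler at offset 0x12345678 in the segment selected by 0x0008 packs to low = 0x00085678 and high = 0x12348E00.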
Figure 6-1. Relationship of the IDTR and IDT
[Figure not reproduced. The IDTR register holds the IDT base address (bits 47:16) and
the IDT limit (bits 15:0). The base address locates the interrupt descriptor table
(IDT), in which the gates for interrupts #1, #2, #3, ... #n sit at offsets 0, 8, 16, ...
(n-1)*8.]
6.12 EXCEPTION AND INTERRUPT HANDLING
The processor handles calls to exception- and interrupt-handlers similar to the way it
handles calls with a CALL instruction to a procedure or a task. When responding to an
exception or interrupt, the processor uses the exception or interrupt vector as an
index to a descriptor in the IDT. If the index points to an interrupt gate or trap gate,
the processor calls the exception or interrupt handler in a manner similar to a CALL
to a call gate (see Section 5.8.2, "Gate Descriptors," through Section 5.8.6,
"Returning from a Called Procedure"). If the index points to a task gate, the processor
executes a task switch to the exception- or interrupt-handler task in a manner similar
to a CALL to a task gate (see Section 7.3, "Task Switching").
Figure 6-2. IDT Gate Descriptors
[Figure not reproduced. It shows the 8-byte task-gate, interrupt-gate, and trap-gate
formats. Bytes 3:0 hold the segment selector (bits 31:16; the TSS segment selector for
a task gate) and, for interrupt and trap gates, offset 15..0 (bits 15:0). Bytes 7:4 hold
offset 31..16 (bits 31:16, interrupt and trap gates only), the P flag (bit 15), the DPL
(bits 14:13), and the type in bits 12:8: 0 0 1 0 1 for a task gate, 0 D 1 1 0 for an
interrupt gate, and 0 D 1 1 1 for a trap gate. All other bits are reserved.
DPL: descriptor privilege level. Offset: offset to procedure entry point. P: segment
present flag. Selector: segment selector for destination code segment. D: size of gate
(1 = 32 bits; 0 = 16 bits).]
6.12.1 Exception- or Interrupt-Handler Procedures
An interrupt gate or trap gate references an exception- or interrupt-handler
procedure that runs in the context of the currently executing task (see Figure 6-3). The
segment selector for the gate points to a segment descriptor for an executable code
segment in either the GDT or the current LDT. The offset field of the gate descriptor
points to the beginning of the exception- or interrupt-handling procedure.
Figure 6-3. Interrupt Procedure Call
[Figure not reproduced. The interrupt vector selects an interrupt or trap gate in the
IDT. The gate's segment selector selects a segment descriptor in the GDT or LDT; that
descriptor's base address plus the gate's offset locates the interrupt procedure in
the destination code segment.]
When the processor performs a call to the exception- or interrupt-handler procedure:
• If the handler procedure is going to be executed at a numerically lower privilege
  level, a stack switch occurs. When the stack switch occurs:
  a. The segment selector and stack pointer for the stack to be used by the
     handler are obtained from the TSS for the currently executing task. On this
     new stack, the processor pushes the stack segment selector and stack
     pointer of the interrupted procedure.
  b. The processor then saves the current state of the EFLAGS, CS, and EIP
     registers on the new stack (see Figure 6-4).
  c. If an exception causes an error code to be saved, it is pushed on the new
     stack after the EIP value.
• If the handler procedure is going to be executed at the same privilege level as the
  interrupted procedure:
  a. The processor saves the current state of the EFLAGS, CS, and EIP registers
     on the current stack (see Figure 6-4).
  b. If an exception causes an error code to be saved, it is pushed on the current
     stack after the EIP value.
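The push order in the two cases above can be sketched as a small simulation. This is only an illustrative model under stated assumptions: the stack is an array of doublewords indexed from the top down, struct cpu_state and push_handler_frame are hypothetical names, and for the privilege-change case the array stands for the new stack obtained from the TSS.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the interrupted procedure's registers. */
struct cpu_state {
    uint32_t ss, esp, eflags, cs, eip;
};

/* Push one doubleword on a descending stack modelled as an array index. */
static void push32(uint32_t *stack, size_t *top, uint32_t value)
{
    *top -= 1;
    stack[*top] = value;
}

/* Build the handler's stack frame; returns the number of doublewords pushed.
   SS and ESP are pushed only when the privilege level changes. */
static size_t push_handler_frame(uint32_t *stack, size_t *top,
                                 const struct cpu_state *interrupted,
                                 int privilege_change,
                                 int has_error_code, uint32_t error_code)
{
    size_t start = *top;
    if (privilege_change) {
        push32(stack, top, interrupted->ss);
        push32(stack, top, interrupted->esp);
    }
    push32(stack, top, interrupted->eflags);
    push32(stack, top, interrupted->cs);
    push32(stack, top, interrupted->eip);
    if (has_error_code) {
        push32(stack, top, error_code);  /* pushed after the EIP value */
    }
    return start - *top;
}
```

A privilege-level change with an error code therefore yields six doublewords, and a same-level transfer without an error code yields three.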
To return from an exception- or interrupt-handler procedure, the handler must use
the IRET (or IRETD) instruction. The IRET instruction is similar to the RET instruction
except that it restores the saved flags into the EFLAGS register. The IOPL field of the
EFLAGS register is restored only if the CPL is 0. The IF flag is changed only if the CPL
is less than or equal to the IOPL. See Chapter 3, "Instruction Set Reference, A-M," of
the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2A, for
a description of the complete operation performed by the IRET instruction.
If a stack switch occurred when calling the handler procedure, the IRET instruction
switches back to the interrupted procedure's stack on the return.
Figure 6-4. Stack Usage on Transfers to Interrupt and Exception-Handling Routines
[Figure not reproduced. Two panels. "Stack Usage with No Privilege-Level Change":
EFLAGS, CS, EIP, and the error code are pushed on the interrupted procedure's and
handler's stack, between ESP before and ESP after the transfer to the handler. "Stack
Usage with Privilege-Level Change": SS, ESP, EFLAGS, CS, EIP, and the error code are
pushed on the handler's stack, leaving the interrupted procedure's stack unchanged.]
6.12.1.1 Protection of Exception- and Interrupt-Handler Procedures
The privilege-level protection for exception- and interrupt-handler procedures is
similar to that used for ordinary procedure calls when called through a call gate (see
Section 5.8.4, "Accessing a Code Segment Through a Call Gate"). The processor does
not permit transfer of execution to an exception- or interrupt-handler procedure in a
less privileged code segment (numerically greater privilege level) than the CPL.
An attempt to violate this rule results in a general-protection exception (#GP). The
protection mechanism for exception- and interrupt-handler procedures is different in
the following ways:
• Because interrupt and exception vectors have no RPL, the RPL is not checked on
  implicit calls to exception and interrupt handlers.
• The processor checks the DPL of the interrupt or trap gate only if an exception or
  interrupt is generated with an INT n, INT 3, or INTO instruction. Here, the CPL
  must be less than or equal to the DPL of the gate. This restriction prevents
  application programs or procedures running at privilege level 3 from using a
  software interrupt to access critical exception handlers, such as the page-fault
  handler, providing that those handlers are placed in more privileged code
  segments (numerically lower privilege level). For hardware-generated interrupts
  and processor-detected exceptions, the processor ignores the DPL of interrupt
  and trap gates.
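The two privilege rules above can be written as a short predicate. This is a simplified sketch, not the processor's full check sequence: it ignores conforming code segments, and the names delivery_permitted and enum event_source are hypothetical.

```c
#include <stdbool.h>

/* How the event was raised: by hardware or a processor-detected exception,
   or by software through INT n, INT 3, or INTO. */
enum event_source {
    EVENT_HARDWARE_OR_EXCEPTION,
    EVENT_SOFTWARE_INT
};

/* Returns false where the rules in the text would lead to #GP. Assumes the
   handler is in a nonconforming code segment with the given DPL. */
static bool delivery_permitted(unsigned cpl, unsigned gate_dpl,
                               unsigned handler_cs_dpl,
                               enum event_source source)
{
    /* The handler may not be in a less privileged segment than the CPL. */
    if (handler_cs_dpl > cpl) {
        return false;
    }
    /* The gate DPL is checked only for software-generated events. */
    if (source == EVENT_SOFTWARE_INT && cpl > gate_dpl) {
        return false;
    }
    return true;
}
```

Under this model, code at privilege level 3 cannot reach a gate with DPL 0 through INT n, while a hardware interrupt arriving at privilege level 3 is still delivered through the same gate.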
Because exceptions and interrupts generally do not occur at predictable times, these
privilege rules effectively impose restrictions on the privilege levels at which
exception and interrupt-handling procedures can run. Either of the following techniques
can be used to avoid privilege-level violations.
• The exception or interrupt handler can be placed in a conforming code segment.
  This technique can be used for handlers that only need to access data available
  on the stack (for example, divide error exceptions). If the handler needs data
  from a data segment, the data segment needs to be accessible from privilege
  level 3, which would make it unprotected.
• The handler can be placed in a nonconforming code segment with privilege level
  0. This handler would always run, regardless of the CPL that the interrupted
  program or task is running at.
6.12.1.2 Flag Usage By Exception- or Interrupt-Handler Procedure
When accessing an exception or interrupt handler through either an interrupt gate or
a trap gate, the processor clears the TF flag in the EFLAGS register after it saves the
contents of the EFLAGS register on the stack. (On calls to exception and interrupt
handlers, the processor also clears the VM, RF, and NT flags in the EFLAGS register,
after they are saved on the stack.) Clearing the TF flag prevents instruction tracing
from affecting interrupt response. A subsequent IRET instruction restores the TF
(and VM, RF, and NT) flags to the values in the saved contents of the EFLAGS register
on the stack.
The only difference between an interrupt gate and a trap gate is the way the
processor handles the IF flag in the EFLAGS register. When accessing an exception-
or interrupt-handling procedure through an interrupt gate, the processor clears the
IF flag to prevent other interrupts from interfering with the current interrupt handler.
A subsequent IRET instruction restores the IF flag to its value in the saved contents
of the EFLAGS register on the stack. Accessing a handler procedure through a trap
gate does not affect the IF flag.
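The flag handling described in this section can be sketched as a function from the saved EFLAGS value to the value the handler starts with. The function name is hypothetical; the bit positions are the architectural EFLAGS positions (TF = bit 8, IF = bit 9, NT = bit 14, RF = bit 16, VM = bit 17).

```c
#include <stdint.h>

#define EFLAGS_TF (1u << 8)   /* trap flag */
#define EFLAGS_IF (1u << 9)   /* interrupt enable flag */
#define EFLAGS_NT (1u << 14)  /* nested task */
#define EFLAGS_RF (1u << 16)  /* resume flag */
#define EFLAGS_VM (1u << 17)  /* virtual-8086 mode */

/* EFLAGS as seen by the handler after the original value has been saved on
   the stack. Only an interrupt gate additionally clears IF. */
static uint32_t eflags_on_handler_entry(uint32_t saved_eflags,
                                        int through_interrupt_gate)
{
    uint32_t cleared = EFLAGS_TF | EFLAGS_NT | EFLAGS_RF | EFLAGS_VM;
    if (through_interrupt_gate) {
        cleared |= EFLAGS_IF;
    }
    return saved_eflags & ~cleared;
}
```

Starting from 0x00000302 (IF and TF set), an interrupt gate leaves 0x00000002 and a trap gate leaves 0x00000202; the saved copy on the stack is what IRET later restores.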
6.12.2 Interrupt Tasks
When an exception or interrupt handler is accessed through a task gate in the IDT, a
task switch results. Handling an exception or interrupt with a separate task offers
several advantages:
• The entire context of the interrupted program or task is saved automatically.
• A new TSS permits the handler to use a new privilege level 0 stack when handling
  the exception or interrupt. If an exception or interrupt occurs when the current
  privilege level 0 stack is corrupted, accessing the handler through a task gate can
  prevent a system crash by providing the handler with a new privilege level 0
  stack.
• The handler can be further isolated from other tasks by giving it a separate
  address space. This is done by giving it a separate LDT.
The disadvantage of handling an interrupt with a separate task is that the amount of
machine state that must be saved on a task switch makes it slower than using an
interrupt gate, resulting in increased interrupt latency.
A task gate in the IDT references a TSS descriptor in the GDT (see Figure 6-5). A
switch to the handler task is handled in the same manner as an ordinary task switch
(see Section 7.3, "Task Switching"). The link back to the interrupted task is stored in
the previous task link field of the handler task's TSS. If an exception caused an error
code to be generated, this error code is copied to the stack of the new task.
When exception- or interrupt-handler tasks are used in an operating system, there
are actually two mechanisms that can be used to dispatch tasks: the software
scheduler (part of the operating system) and the hardware scheduler (part of the
processor's interrupt mechanism). The software scheduler needs to accommodate
interrupt tasks that may be dispatched when interrupts are enabled.
NOTE
Because IA-32 architecture tasks are not re-entrant, an interrupt-
handler task must disable interrupts between the time it completes
handling the interrupt and the time it executes the IRET instruction.
This action prevents another interrupt from occurring while the
interrupt task's TSS is still marked busy, which would cause a
general-protection (#GP) exception.
Figure 6-5. Interrupt Task Switch
[Figure not reproduced. The interrupt vector selects a task gate in the IDT. The gate's
TSS selector selects a TSS descriptor in the GDT, whose TSS base address locates the
TSS for the interrupt-handling task.]
6.13 ERROR CODE
When an exception condition is related to a specific segment, the processor pushes
an error code onto the stack of the exception handler (whether it is a procedure or
task). The error code has the format shown in Figure 6-6. The error code resembles
a segment selector; however, instead of a TI flag and RPL field, the error code
contains 3 flags:
EXT   External event (bit 0): When set, indicates that an event external
      to the program, such as a hardware interrupt, caused the exception.
IDT   Descriptor location (bit 1): When set, indicates that the index
      portion of the error code refers to a gate descriptor in the IDT; when
      clear, indicates that the index refers to a descriptor in the GDT or the
      current LDT.
TI    GDT/LDT (bit 2): Only used when the IDT flag is clear. When set,
      the TI flag indicates that the index portion of the error code refers to
      a segment or gate descriptor in the LDT; when clear, it indicates that
      the index refers to a descriptor in the current GDT.
The segment selector index field provides an index into the IDT, GDT, or current LDT
to the segment or gate selector being referenced by the error code. In some cases
the error code is null (that is, all bits in the lower word are clear). A null error code
indicates that the error was not caused by a reference to a specific segment or that a
null segment descriptor was referenced in an operation.
The format of the error code is different for page-fault exceptions (#PF). See the
"Interrupt 14 - Page-Fault Exception (#PF)" section in this chapter.
The error code is pushed on the stack as a doubleword or word (depending on the
default interrupt, trap, or task gate size). To keep the stack aligned for doubleword
pushes, the upper half of the error code is reserved. Note that the error code is not
popped when the IRET instruction is executed to return from an exception handler, so
the handler must remove the error code before executing a return.
Error codes are not pushed on the stack for exceptions that are generated externally
(with the INTR or LINT[1:0] pins) or with the INT n instruction, even if an error code is
normally produced for those exceptions.
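A handler might decode the selector-format error code along the following lines. This is a sketch only: the struct and function names are hypothetical, the index is taken from bits 15:3 as in a segment selector, and the page-fault error code (which has a different format) is not covered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of a selector-format error code. */
struct selector_error_code {
    bool     ext;    /* bit 0: event external to the program */
    bool     idt;    /* bit 1: index refers to a gate in the IDT */
    bool     ti;     /* bit 2: LDT (1) or GDT (0); used only when idt is clear */
    uint16_t index;  /* bits 15:3: segment selector index */
};

static struct selector_error_code decode_error_code(uint32_t code)
{
    struct selector_error_code d;
    d.ext   = (code & 0x1u) != 0;
    d.idt   = (code & 0x2u) != 0;
    d.ti    = (code & 0x4u) != 0;
    d.index = (uint16_t)((code >> 3) & 0x1FFFu);
    return d;
}

/* A null error code has all bits of the lower word clear. */
static bool error_code_is_null(uint32_t code)
{
    return (code & 0xFFFFu) == 0;
}
```

For example, the value 0x6A decodes to IDT = 1 with index 13, and 0x2C decodes to TI = 1 (LDT) with index 5.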
Figure 6-6. Error Code
[Figure not reproduced. Error code layout: bit 0 = EXT, bit 1 = IDT, bit 2 = TI,
bits 15:3 = segment selector index, bits 31:16 = reserved.]
6.14 EXCEPTION AND INTERRUPT HANDLING IN 64-BIT MODE
In 64-bit mode, interrupt and exception handling is similar to what has been
described for non-64-bit modes, with the following differences:
• All interrupt handlers pointed to by the IDT are in 64-bit code (this does not apply to
  the SMI handler).
• The size of interrupt-stack pushes is fixed at 64 bits, and the processor uses
  8-byte, zero-extended stores.
• The stack pointer (SS:RSP) is pushed unconditionally on interrupts. In legacy
  modes, this push is conditional and based on a change in current privilege level
  (CPL).
• The new SS is set to NULL if there is a change in CPL.
• IRET behavior changes.
• There is a new interrupt stack-switch mechanism.
• The alignment of the interrupt stack frame is different.
6.14.1 64-Bit Mode IDT
Interrupt and trap gates are 16 bytes in length to provide a 64-bit offset for the
instruction pointer (RIP). The 64-bit RIP referenced by interrupt-gate descriptors
allows an interrupt service routine to be located anywhere in the linear-address
space. See Figure 6-7.
In 64-bit mode, the IDT index is formed by scaling the interrupt vector by 16. The
first eight bytes (bytes 7:0) of a 64-bit mode interrupt gate are similar but not
identical to legacy 32-bit interrupt gates. The type field (bits 11:8 in bytes 7:4) is
described in Table 3-2. The Interrupt Stack Table (IST) field (bits 2:0 in bytes 7:4) is
used by the stack switching mechanisms described in Section 6.14.5, "Interrupt
Stack Table." Bytes 11:8 hold the upper 32 bits of the target RIP (interrupt segment
offset) in canonical form. A general-protection exception (#GP) is generated if
software attempts to reference an interrupt gate with a target RIP that is not in
canonical form.
Figure 6-7. 64-Bit IDT Gate Descriptors
[Figure not reproduced. It shows the 16-byte interrupt/trap gate. Bytes 3:0 hold the
segment selector (bits 31:16) and offset 15..0 (bits 15:0). Bytes 7:4 hold offset 31..16
(bits 31:16), the P flag (bit 15), the DPL (bits 14:13), the TYPE field (bits 11:8), and
the IST field (bits 2:0); the remaining bits are 0. Bytes 11:8 hold offset 63..32.
Bytes 15:12 are reserved.
DPL: descriptor privilege level. Offset: offset to procedure entry point. P: segment
present flag. Selector: segment selector for destination code segment. IST: interrupt
stack table.]
The target code segment referenced by the interrupt gate must be a 64-bit code
segment (CS.L = 1, CS.D = 0). If the target is not a 64-bit code segment, a
general-protection exception (#GP) is generated with the IDT vector number reported as
the error code.
Only 64-bit interrupt and trap gates can be referenced in IA-32e mode (64-bit mode
and compatibility mode). Legacy 32-bit interrupt or trap gate types (0EH or 0FH) are
redefined in IA-32e mode as 64-bit interrupt and trap gate types. No 32-bit interrupt
or trap gate type exists in IA-32e mode. If a reference is made to a 16-bit interrupt
or trap gate (06H or 07H), a general-protection exception (#GP(0)) is generated.
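A 64-bit mode gate can be sketched as a 16-byte C structure. This is a minimal illustration under stated assumptions: the type and function names are hypothetical, the IST index is placed in the low three bits of byte 4, the P flag is assumed set, and the canonical check assumes a 48-bit linear-address implementation (bits 63:47 all equal), which the text itself does not specify.

```c
#include <stdbool.h>
#include <stdint.h>

/* 16-byte interrupt or trap gate for 64-bit mode, lowest byte first. */
struct gate64 {
    uint16_t offset_low;   /* bytes 1:0   offset 15..0 */
    uint16_t selector;     /* bytes 3:2   64-bit code segment selector */
    uint8_t  ist;          /* byte 4      IST index in the low three bits */
    uint8_t  type_attr;    /* byte 5      P, DPL, 0, type (0EH or 0FH) */
    uint16_t offset_mid;   /* bytes 7:6   offset 31..16 */
    uint32_t offset_high;  /* bytes 11:8  offset 63..32 */
    uint32_t reserved;     /* bytes 15:12 */
};

/* Byte offset of a vector's gate within the IDT: the vector scaled by 16. */
static uint32_t gate64_index(uint8_t vector)
{
    return (uint32_t)vector * 16u;
}

/* Assumption: 48-bit linear addresses, so bits 63:47 must all match. */
static bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;
    return top == 0 || top == 0x1FFFFu;
}

static struct gate64 make_gate64(uint64_t handler, uint16_t selector,
                                 unsigned type, unsigned dpl, unsigned ist)
{
    struct gate64 g;
    g.offset_low  = (uint16_t)(handler & 0xFFFFu);
    g.selector    = selector;
    g.ist         = (uint8_t)(ist & 0x7u);
    g.type_attr   = (uint8_t)(0x80u | ((dpl & 0x3u) << 5) | (type & 0xFu));
    g.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFFu);
    g.offset_high = (uint32_t)(handler >> 32);
    g.reserved    = 0;
    return g;
}
```

An operating system would typically call is_canonical on the handler address before installing the gate, since a non-canonical target RIP leads to #GP when the gate is referenced.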
6.14.2 64-Bit Mode Stack Frame
In legacy mode, the size of an IDT entry (16 bits or 32 bits) determines the size of
interrupt-stack-frame pushes. SS:ESP is pushed only on a CPL change. In 64-bit
mode, the size of interrupt stack-frame pushes is fixed at eight bytes. This is because
only 64-bit mode gates can be referenced. 64-bit mode also pushes SS:RSP
unconditionally, rather than only on a CPL change.
Aside from error codes, pushing SS:RSP unconditionally presents operating systems
with a consistent interrupt-stack-frame size across all interrupts. Interrupt
service-routine entry points that handle interrupts generated by the INT n instruction or
external INTR# signal can push an additional error code place-holder to maintain
consistency.
In legacy mode, the stack pointer may be at any alignment when an interrupt or
exception causes a stack frame to be pushed. This causes the stack frame and
succeeding pushes done by an interrupt handler to be at arbitrary alignments. In
IA-32e mode, the RSP is aligned to a 16-byte boundary before pushing the stack
frame. The stack frame itself is aligned on a 16-byte boundary when the interrupt
handler is called. The processor can arbitrarily realign the new RSP on interrupts
because the previous (possibly unaligned) RSP is unconditionally saved on the newly
aligned stack. The previous RSP will be automatically restored by a subsequent IRET.
Aligning the stack permits exception and interrupt frames to be aligned on a 16-byte
boundary before interrupts are re-enabled. This allows the stack to be formatted for
optimal storage of 16-byte XMM registers, which enables the interrupt handler to use
faster 16-byte aligned loads and stores (MOVAPS rather than MOVUPS) to save and
restore XMM registers.
Although the RSP alignment is always performed when LMA = 1, it is only of
consequence for the kernel-mode case where there is no stack switch or IST used. For a
stack switch or IST, the OS would have presumably put suitably aligned RSP values in
the TSS.
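The alignment and fixed-size pushes described above can be sketched as address arithmetic. The names below are hypothetical; the structure lists the frame from the lowest address upward for the case where an error code is present, which is an assumption about presentation rather than a definition from the text.

```c
#include <stddef.h>
#include <stdint.h>

/* Frame seen by a 64-bit handler when an error code is pushed, listed from
   the lowest address (where RSP points on entry) upward. */
struct frame64_with_error {
    uint64_t error_code;
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
};

/* RSP on handler entry when no stack switch occurs: align the old RSP down
   to a 16-byte boundary, then account for the 8-byte pushes. */
static uint64_t rsp_after_frame_push(uint64_t rsp_before, int has_error_code)
{
    uint64_t rsp = rsp_before & ~(uint64_t)0xF;  /* 16-byte alignment */
    rsp -= 5 * 8;                                /* SS, RSP, RFLAGS, CS, RIP */
    if (has_error_code) {
        rsp -= 8;                                /* error code */
    }
    return rsp;
}
```

Because the old RSP is saved inside the frame, the alignment step loses no information: IRET reloads the original, possibly unaligned, value.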
6.14.3 IRET in IA-32e Mode
In IA-32e mode, IRET executes with an 8-byte operand size. There is nothing that
forces this requirement. The stack is formatted in such a way that for actions where
IRET is required, the 8-byte IRET operand size works correctly.
Because interrupt stack-frame pushes are always eight bytes in IA-32e mode, an
IRET must pop eight-byte items off the stack. This is accomplished by preceding the
IRET with a 64-bit operand-size prefix. The size of the pop is determined by the
address size of the instruction. The SS/ESP/RSP size adjustment is determined by
the stack size.
IRET pops SS:RSP unconditionally off the interrupt stack frame only when it is
executed in 64-bit mode. In compatibility mode, IRET pops SS:RSP off the stack only
if there is a CPL change. This allows legacy applications to execute properly in
compatibility mode when using the IRET instruction. 64-bit interrupt service routines
that exit with an IRET unconditionally pop SS:RSP off of the interrupt stack frame,
even if the target code segment is running in 64-bit mode or at CPL = 0. This is
because the original interrupt always pushes SS:RSP.
In IA-32e mode, IRET is allowed to load a NULL SS under certain conditions. If the
target mode is 64-bit mode and the target CPL is not 3, IRET allows SS to be loaded
with a NULL selector. As part of the stack switch mechanism, an interrupt or
exception sets the new SS to NULL, instead of fetching a new SS selector from the TSS and
loading the corresponding descriptor from the GDT or LDT. The new SS selector is set
to NULL in order to properly handle returns from subsequent nested far transfers. If
the called procedure itself is interrupted, the NULL SS is pushed on the stack frame.
On the subsequent IRET, the NULL SS on the stack acts as a flag to tell the processor
not to load a new SS descriptor.
6.14.4 Stack Switching in IA-32e Mode
The IA-32 architecture provides a mechanism to automatically switch stack frames in
response to an interrupt. The 64-bit extensions of Intel 64 architecture implement a
modified version of the legacy stack-switching mechanism and an alternative
stack-switching mechanism called the interrupt stack table (IST).
In IA-32 modes, the legacy IA-32 stack-switch mechanism is unchanged. In IA-32e
mode, the legacy stack-switch mechanism is modified. When stacks are switched as
part of a 64-bit mode privilege-level change (resulting from an interrupt), a new SS
descriptor is not loaded. IA-32e mode loads only an inner-level RSP from the TSS.
The new SS selector is forced to NULL and the SS selector's RPL field is set to the new
CPL. The new SS is set to NULL in order to handle nested far transfers (CALLF, INT,
interrupts and exceptions). The old SS and RSP are saved on the new stack
(Figure 6-8). On the subsequent IRET, the old SS is popped from the stack and
loaded into the SS register.
In summary, a stack switch in IA-32e mode works like the legacy stack switch,
except that a new SS selector is not loaded from the TSS. Instead, the new SS is
forced to NULL.
6.14.5 Interrupt Stack Table
In IA-32e mode, a new interrupt stack table (IST) mechanism is available as an
alternative to the modified legacy stack-switching mechanism described above. This
mechanism unconditionally switches stacks when it is enabled. It can be enabled on
an individual interrupt-vector basis using a field in the IDT entry. This means that
some interrupt vectors can use the modified legacy mechanism and others can use
the IST mechanism.
The IST mechanism is only available in IA-32e mode. It is part of the 64-bit mode
TSS. The motivation for the IST mechanism is to provide a method for specific
interrupts (such as NMI, double-fault, and machine-check) to always execute on a known
good stack. In legacy mode, interrupts can use the task-switch mechanism to set up
a known-good stack by accessing the interrupt service routine through a task gate
located in the IDT. However, the legacy task-switch mechanism is not supported in
IA-32e mode.
The IST mechanism provides up to seven IST pointers in the TSS. The pointers are
referenced by an interrupt-gate descriptor in the interrupt-descriptor table (IDT);
see Figure 6-7. The gate descriptor contains a 3-bit IST index field that provides an
offset into the IST section of the TSS. Using the IST mechanism, the processor loads
the value pointed to by an IST pointer into the RSP.
When an interrupt occurs, the new SS selector is forced to NULL and the SS selector's
RPL field is set to the new CPL. The old SS, RSP, RFLAGS, CS, and RIP are pushed
onto the new stack. Interrupt processing then proceeds as normal. If the IST index is
zero, the modified legacy stack-switching mechanism described above is used.
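The choice between the IST mechanism and the modified legacy mechanism can be sketched as follows. This is an illustrative model, not a definition: struct tss64_stacks is a hypothetical, reduced view of the 64-bit TSS that keeps only three inner-level stack pointers (an assumption beyond this section) and the seven IST pointers.

```c
#include <stdint.h>

/* Hypothetical reduced view of the 64-bit TSS stack pointers. */
struct tss64_stacks {
    uint64_t rsp[3];  /* inner-level stack pointers for CPL 0..2 (assumed) */
    uint64_t ist[7];  /* IST pointers selected by gate IST index 1..7 */
};

/* Stack pointer used to push the interrupt frame, before the pushes. */
static uint64_t select_handler_rsp(const struct tss64_stacks *tss,
                                   unsigned ist_index,
                                   unsigned old_cpl, unsigned new_cpl,
                                   uint64_t current_rsp)
{
    uint64_t rsp;
    ist_index &= 0x7u;                   /* 3-bit field in the gate */
    if (ist_index != 0) {
        rsp = tss->ist[ist_index - 1];   /* IST: unconditional switch */
    } else if (new_cpl < old_cpl) {
        rsp = tss->rsp[new_cpl];         /* modified legacy switch */
    } else {
        rsp = current_rsp;               /* no stack switch */
    }
    return rsp & ~(uint64_t)0xF;         /* 16-byte alignment before pushes */
}
```

A gate with a nonzero IST index selects its IST pointer even when the interrupted code already runs at CPL 0, which is what makes the mechanism suitable for NMI, double-fault, and machine-check handlers.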
Figure 6-8. IA-32e Mode Stack Usage After Privilege Level Change
Stack usage with privilege-level change (handler's stack; offsets are relative to the
stack pointer after transfer to the handler):
Legacy Mode               IA-32e Mode
Offset  Contents          Offset  Contents
+20     SS                +40     SS
+16     ESP               +32     RSP
+12     EFLAGS            +24     RFLAGS
+8      CS                +16     CS
+4      EIP               +8      RIP
0       Error Code        0       Error Code
6.15 EXCEPTION AND INTERRUPT REFERENCE
The following sections describe conditions which generate exceptions and interrupts.
They are arranged in the order of vector numbers. The information contained in
these sections is as follows:
• Exception Class: Indicates whether the exception class is a fault, trap, or
  abort type. Some exceptions can be either a fault or trap type, depending on
  when the error condition is detected. (This section is not applicable to interrupts.)
• Description: Gives a general description of the purpose of the exception or
  interrupt type. It also describes how the processor handles the exception or
  interrupt.
• Exception Error Code: Indicates whether an error code is saved for the
  exception. If one is saved, the contents of the error code are described. (This
  section is not applicable to interrupts.)
• Saved Instruction Pointer: Describes which instruction the saved (or return)
  instruction pointer points to. It also indicates whether the pointer can be used to
  restart a faulting instruction.
• Program State Change: Describes the effects of the exception or interrupt on
  the state of the currently running program or task and the possibilities of
  restarting the program or task without loss of continuity.
Interrupt 0 - Divide Error Exception (#DE)
Exception Class Fault.
Description
Indicates the divisor operand for a DIV or IDIV instruction is 0 or that the result
cannot be represented in the number of bits specified for the destination operand.
Exception Error Code
None.
Saved Instruction Pointer
Saved contents of CS and EIP registers point to the instruction that generated the
exception.
Program State Change
A program-state change does not accompany the divide error, because the exception
occurs before the faulting instruction is executed.
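The two #DE conditions can be illustrated for one concrete case, a signed division of a 64-bit dividend by a 32-bit divisor with a 32-bit quotient. The function name is hypothetical, and the sketch models only the condition check, not the instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a signed divide with a 64-bit dividend and 32-bit divisor would
   meet one of the #DE conditions: divisor of 0, or a quotient that cannot
   be represented in 32 bits. */
static bool idiv32_raises_de(int64_t dividend, int32_t divisor)
{
    if (divisor == 0) {
        return true;
    }
    /* Guard the one case where the C division itself would overflow. */
    if (dividend == INT64_MIN && divisor == -1) {
        return true;
    }
    int64_t quotient = dividend / divisor;  /* truncates toward zero */
    return quotient < INT32_MIN || quotient > INT32_MAX;
}
```

Software that cannot tolerate the fault can run an equivalent check before dividing.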
Interrupt 1 - Debug Exception (#DB)
Exception Class Trap or Fault. The exception handler can distinguish
between traps or faults by examining the contents of DR6
and the other debug registers.
Description
Indicates that one or more of several debug-exception conditions has been detected.
Whether the exception is a fault or a trap depends on the condition (see Table 6-3).
See Chapter 16, "Debugging, Profiling Branches and Time-Stamp Counter," for
detailed information about the debug exceptions.
Exception Error Code
None. An exception handler can examine the debug registers to determine which
condition caused the exception.
Saved Instruction Pointer
Fault: Saved contents of CS and EIP registers point to the instruction that
generated the exception.
Trap: Saved contents of CS and EIP registers point to the instruction following the
instruction that generated the exception.
Program State Change
Fault: A program-state change does not accompany the debug exception, because
the exception occurs before the faulting instruction is executed. The program can
resume normal execution upon returning from the debug exception handler.
Trap: A program-state change does accompany the debug exception, because the
instruction or task switch being executed is allowed to complete before the exception
is generated. However, the new state of the program is not corrupted and execution
of the program can continue reliably.
Table 6-3. Debug Exception Conditions and Corresponding Exception Classes
Exception Condition                                                   Exception Class
Instruction fetch breakpoint                                          Fault
Data read or write breakpoint                                         Trap
I/O read or write breakpoint                                          Trap
General detect condition (in conjunction with in-circuit emulation)   Fault
Single-step                                                           Trap
Task-switch                                                           Trap
Interrupt 2 - NMI Interrupt
Exception Class Not applicable.
Description
The nonmaskable interrupt (NMI) is generated externally by asserting the
processor's NMI pin or through an NMI request set by the I/O APIC to the local APIC.
This interrupt causes the NMI interrupt handler to be called.
Exception Error Code
Not applicable.
Saved Instruction Pointer
The processor always takes an NMI interrupt on an instruction boundary. The saved
contents of CS and EIP registers point to the next instruction to be executed at the
point the interrupt is taken. See Section 6.5, "Exception Classifications," for more
information about when the processor takes NMI interrupts.
Program State Change
The instruction executing when an NMI interrupt is received is completed before the
NMI is generated. A program or task can thus be restarted upon returning from an
interrupt handler without loss of continuity, provided the interrupt handler saves the
state of the processor before handling the interrupt and restores the processor's
state prior to a return.
Interrupt 3Breakpoint Exception (#BP)
Exception Class Trap.
Description
I ndicat es t hat a breakpoint inst ruct ion ( I NT 3) was execut ed, causing a breakpoint
t rap t o be generat ed. Typically, a debugger set s a breakpoint by replacing t he first
opcode byt e of an inst ruct ion wit h t he opcode for t he I NT 3 inst ruct ion. ( The I NT 3
inst ruct ion is one byt e long, which makes it easy t o replace an opcode in a code
segment in RAM wit h t he breakpoint opcode. ) The operat ing syst em or a debugging
t ool can use a dat a segment mapped t o t he same physical address space as t he code
segment t o place an I NT 3 inst ruct ion in places where it is desired t o call t he
debugger.
Wit h t he P6 family, Pent ium, I nt el486, and I nt el386 processors, it is more convenient
t o set breakpoint s wit h t he debug regist ers. ( See Sect ion 16. 3. 2, Breakpoint Excep-
t ion ( # BP) I nt errupt Vect or 3, for informat ion about t he breakpoint except ion. ) I f
more breakpoint s are needed beyond what t he debug regist ers allow, t he I NT 3
inst ruct ion can be used.
The breakpoint ( # BP) except ion can also be generat ed by execut ing t he I NT n
inst ruct ion wit h an operand of 3. The act ion of t his inst ruct ion ( I NT 3) is slight ly
different t han t hat of t he I NT 3 inst ruct ion ( see I NTn/ I NTO/ I NT3Call t o I nt errupt
Procedure in Chapt er 3 of t he I nt el 64 and I A- 32 Archit ect ures Soft ware Devel-
opers Manual, Volume 2A) .
Exception Error Code
None.
Saved Instruction Pointer
The saved contents of CS and EIP registers point to the instruction following the
INT 3 instruction.
Program State Change
Even though the EIP points to the instruction following the breakpoint instruction, the
state of the program is essentially unchanged because the INT 3 instruction does not
affect any register or memory locations. The debugger can thus resume the
suspended program by replacing the INT 3 instruction that caused the breakpoint
with the original opcode and decrementing the saved contents of the EIP register.
Upon returning from the debugger, program execution resumes with the replaced
instruction.
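The patch-and-restore sequence described above can be sketched in a few lines. This is a toy model only, not a real debugger: the class ToyDebugger, its method names, and the bytearray standing in for a writable alias of the code segment are all hypothetical names introduced for illustration.

```python
INT3_OPCODE = 0xCC  # one-byte INT 3 opcode


class ToyDebugger:
    """Hypothetical model of INT 3 breakpoint patching (illustration only)."""

    def __init__(self, memory):
        # 'memory' stands in for a data-segment alias of the code segment.
        self.memory = memory
        self.saved = {}  # breakpoint address -> original first opcode byte

    def set_breakpoint(self, addr):
        # Remember the original opcode byte, then overwrite it with INT 3.
        self.saved[addr] = self.memory[addr]
        self.memory[addr] = INT3_OPCODE

    def on_breakpoint_trap(self, saved_eip):
        # #BP is a trap: the saved EIP points past the one-byte INT 3.
        addr = saved_eip - 1
        # Restore the original opcode and resume at the decremented EIP.
        self.memory[addr] = self.saved.pop(addr)
        return addr


code = bytearray([0x90, 0x40, 0xC3])  # NOP; INC EAX (32-bit code); RET
dbg = ToyDebugger(code)
dbg.set_breakpoint(1)
resume_eip = dbg.on_breakpoint_trap(saved_eip=2)
```

After the handler runs, the byte at offset 1 is the original opcode again and resume_eip equals the breakpoint address, which matches the "replace and decrement" recovery described in the text.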
Interrupt 4 - Overflow Exception (#OF)
Exception Class Trap.
Description
Indicates that an overflow trap occurred when an INTO instruction was executed. The
INTO instruction checks the state of the OF flag in the EFLAGS register. If the OF flag
is set, an overflow trap is generated.
Some arithmetic instructions (such as ADD and SUB) perform both signed and
unsigned arithmetic. These instructions set the OF and CF flags in the EFLAGS
register to indicate signed overflow and unsigned overflow, respectively. When
performing arithmetic on signed operands, the OF flag can be tested directly or the
INTO instruction can be used. The benefit of using the INTO instruction is that if an
overflow is detected, an exception handler is called automatically to
handle the overflow condition.
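The difference between the two flags can be made concrete with a small model of a 32-bit ADD. This is a sketch for illustration, not processor pseudocode; the function name add32_flags is hypothetical.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def add32_flags(a, b):
    """Return (result, CF, OF) for a modeled 32-bit ADD of a and b."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    # CF: unsigned overflow, a carry out of bit 31.
    cf = full > MASK32
    # OF: signed overflow, both operands share a sign that the result lacks.
    of = bool(~(a ^ b) & (a ^ result) & SIGN32)
    return result, cf, of


# 0x7FFFFFFF + 1 overflows as a signed value but not as an unsigned one,
# so an INTO placed after this ADD would generate the overflow trap.
result, cf, of = add32_flags(0x7FFFFFFF, 1)
```

In this model INTO would trap exactly when the returned OF value is true; CF plays no part in the #OF exception.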
Exception Error Code
None.
Saved Instruction Pointer
The saved contents of CS and EIP registers point to the instruction following the INTO
instruction.
Program State Change
Even though the EIP points to the instruction following the INTO instruction, the state
of the program is essentially unchanged because the INTO instruction does not affect
any register or memory locations. The program can thus resume normal execution
upon returning from the overflow exception handler.
Interrupt 5 - BOUND Range Exceeded Exception (#BR)
Exception Class Fault.
Description
Indicates that a BOUND-range-exceeded fault occurred when a BOUND instruction
was executed. The BOUND instruction checks that a signed array index is within the
upper and lower bounds of an array located in memory. If the array index is not
within the bounds of the array, a BOUND-range-exceeded fault is generated.
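The check performed by BOUND can be modeled as a signed, inclusive range comparison. The sketch below is illustrative only; the helper names to_signed32 and bound_faults are hypothetical, and a 32-bit operand size is assumed.

```python
def to_signed32(value):
    """Interpret a 32-bit pattern as a signed (two's complement) integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def bound_faults(index, lower, upper):
    """True if the modeled BOUND check fails: index outside [lower, upper]."""
    return index < lower or index > upper


# The bit pattern 0xFFFFFFFF is the signed index -1, which lies below a
# lower bound of 0, so the modeled check reports a #BR condition.
faults = bound_faults(to_signed32(0xFFFFFFFF), 0, 9)
```

Because the comparison is signed, a large unsigned-looking index such as 0xFFFFFFFF is rejected as a negative value rather than as an index past the upper bound.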
Exception Error Code
None.
Saved Instruction Pointer
The saved contents of CS and EIP registers point to the BOUND instruction that
generated the exception.
Program State Change
A program-state change does not accompany the bounds-check fault, because the
operands for the BOUND instruction are not modified. Returning from the BOUND-
range-exceeded exception handler causes the BOUND instruction to be restarted.
Interrupt 6 - Invalid Opcode Exception (#UD)
Exception Class Fault.
Description
Indicates that the processor did one of the following things:
• Attempted to execute an invalid or reserved opcode.
• Attempted to execute an instruction with an operand type that is invalid for its
  accompanying opcode; for example, the source operand for a LES instruction is
  not a memory location.
• Attempted to execute an MMX or SSE/SSE2/SSE3/SSSE3 instruction on an Intel 64 or
  IA-32 processor that does not support the MMX technology or
  SSE/SSE2/SSE3/SSSE3 extensions, respectively. CPUID feature flags MMX (EDX, bit
  23), SSE (EDX, bit 25), SSE2 (EDX, bit 26), SSE3 (ECX, bit 0), and SSSE3 (ECX,
  bit 9) indicate support for these extensions.
• Attempted to execute an MMX instruction or SSE/SSE2/SSE3/SSSE3 SIMD
  instruction (with the exception of the MOVNTI, PAUSE, PREFETCHh, SFENCE,
  LFENCE, MFENCE, CLFLUSH, MONITOR, and MWAIT instructions) when the EM
  flag in control register CR0 is set (1).
• Attempted to execute an SSE/SSE2/SSE3/SSSE3 instruction when the OSFXSR bit
  in control register CR4 is clear (0). Note this does not include the following
  SSE/SSE2/SSE3/SSSE3 instructions: MASKMOVQ, MOVNTQ, MOVNTI, PREFETCHh,
  SFENCE, LFENCE, MFENCE, and CLFLUSH; or the 64-bit versions of the PAVGB,
  PAVGW, PEXTRW, PINSRW, PMAXSW, PMAXUB, PMINSW, PMINUB, PMOVMSKB,
  PMULHUW, PSADBW, PSHUFW, PADDQ, PSUBQ, PALIGNR, PABSB, PABSD,
  PABSW, PHADDD, PHADDSW, PHADDW, PHSUBD, PHSUBSW, PHSUBW,
  PMADDUBSW, PMULHRSW, PSHUFB, PSIGNB, PSIGND, and PSIGNW.
• Attempted to execute an SSE/SSE2/SSE3/SSSE3 instruction on an Intel 64 or
  IA-32 processor that caused a SIMD floating-point exception when the
  OSXMMEXCPT bit in control register CR4 is clear (0).
• Executed a UD2 instruction. Note that even though it is the execution of the UD2
  instruction that causes the invalid opcode exception, the saved instruction
  pointer still points at the UD2 instruction.
• Detected a LOCK prefix that precedes an instruction that may not be locked or
  one that may be locked but the destination operand is not a memory location.
• Attempted to execute an LLDT, SLDT, LTR, STR, LSL, LAR, VERR, VERW, or ARPL
  instruction while in real-address or virtual-8086 mode.
• Attempted to execute the RSM instruction when not in SMM.
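The CPUID feature flags named in the list above can be tested with simple bit masks. The sketch below assumes the caller has already obtained the EDX and ECX values returned by CPUID leaf 01H by some other means (Python cannot execute CPUID directly); the function and table names are hypothetical.

```python
# Bit positions as listed above (CPUID leaf 01H feature flags).
EDX_BITS = {"MMX": 23, "SSE": 25, "SSE2": 26}
ECX_BITS = {"SSE3": 0, "SSSE3": 9}


def supported_extensions(edx, ecx):
    """Return the set of extension names whose feature flag is set."""
    found = {name for name, bit in EDX_BITS.items() if (edx >> bit) & 1}
    found |= {name for name, bit in ECX_BITS.items() if (ecx >> bit) & 1}
    return found


# Example register values chosen for illustration, not read from hardware.
example = supported_extensions(edx=(1 << 23) | (1 << 25), ecx=1 << 9)
```

Software that checks these flags before executing an MMX or SSE-family instruction avoids the #UD condition described in the third item of the list.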
In Intel 64 and IA-32 processors that implement out-of-order execution
microarchitectures, this exception is not generated until an attempt is made to retire
the result of executing an invalid instruction; that is, decoding and speculatively
attempting to execute an invalid opcode does not generate this exception. Likewise,
in the Pentium processor and earlier IA-32 processors, this exception is not generated
as the result of prefetching and preliminary decoding of an invalid instruction. (See
Section 6.5, "Exception Classifications," for general rules for taking of interrupts
and exceptions.)
The opcodes D6 and F1 are undefined opcodes reserved by the Intel 64 and IA-32
architectures. These opcodes, even though undefined, do not generate an invalid
opcode exception.
The UD2 instruction is guaranteed to generate an invalid opcode exception.
Exception Error Code
None.
Saved Instruction Pointer
The saved contents of CS and EIP registers point to the instruction that generated the
exception.
Program State Change
A program-state change does not accompany an invalid-opcode fault, because the
invalid instruction is not executed.
Interrupt 7 - Device Not Available Exception (#NM)
Exception Class Fault.
Description
Indicates that the device-not-available exception was generated by one of the
following three conditions:
• The processor executed an x87 FPU floating-point instruction while the EM flag in
  control register CR0 was set (1). See the paragraph below for the special case of
  the WAIT/FWAIT instruction.
• The processor executed a WAIT/FWAIT instruction while the MP and TS flags of
  register CR0 were set, regardless of the setting of the EM flag.
• The processor executed an x87 FPU, MMX, or SSE/SSE2/SSE3 instruction (with
  the exception of MOVNTI, PAUSE, PREFETCHh, SFENCE, LFENCE, MFENCE, and
  CLFLUSH) while the TS flag in control register CR0 was set and the EM flag was
  clear.
The EM flag is set when the processor does not have an internal x87 FPU. A
device-not-available exception is then generated each time an x87 FPU
floating-point instruction is encountered, allowing an exception handler to call
floating-point instruction emulation routines.
The TS flag indicates that a context switch (task switch) has occurred since the last
time an x87 floating-point, MMX, or SSE/SSE2/SSE3 instruction was executed, but
that the context of the x87 FPU, XMM, and MXCSR registers was not saved. When
the TS flag is set and the EM flag is clear, the processor generates a
device-not-available exception each time an x87 floating-point, MMX, or
SSE/SSE2/SSE3 instruction is encountered (with the exception of the instructions
listed above). The exception handler can then save the context of the x87 FPU, XMM,
and MXCSR registers before it executes the instruction. See Section 2.5, "Control
Registers," for more information about the TS flag.
The MP flag in control register CR0 is used along with the TS flag to determine if WAIT
or FWAIT instructions should generate a device-not-available exception. It extends
the function of the TS flag to the WAIT and FWAIT instructions, giving the exception
handler an opportunity to save the context of the x87 FPU before the WAIT or FWAIT
instruction is executed. The MP flag is provided primarily for use with the Intel 286
and Intel386 DX processors. For programs running on the Pentium 4, Intel Xeon, P6
family, Pentium, or Intel486 DX processors, or the Intel 487 SX coprocessors, the MP
flag should always be set; for programs running on the Intel486 SX processor, the MP
flag should be clear.
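The three conditions above reduce to a small decision function over the CR0 flags. The sketch below is a simplified model: the function name raises_nm and the instruction-kind strings are hypothetical, the bit positions used are those of the MP, EM, and TS flags in CR0 (bits 1, 2, and 3), and the excepted instructions such as MOVNTI and PAUSE are assumed to have been filtered out by the caller.

```python
CR0_MP = 1 << 1  # monitor coprocessor
CR0_EM = 1 << 2  # emulation
CR0_TS = 1 << 3  # task switched


def raises_nm(cr0, kind):
    """Model whether an instruction of the given kind generates #NM.

    kind is "x87" (x87 FPU instruction other than WAIT/FWAIT), "wait"
    (WAIT/FWAIT), or "simd" (MMX or SSE/SSE2/SSE3 instruction).
    """
    mp = bool(cr0 & CR0_MP)
    em = bool(cr0 & CR0_EM)
    ts = bool(cr0 & CR0_TS)
    if kind == "x87":
        # EM set, or TS set with EM clear: either way #NM is generated.
        return em or ts
    if kind == "wait":
        # WAIT/FWAIT depends only on MP and TS; EM is ignored.
        return mp and ts
    if kind == "simd":
        # With EM set, MMX/SSE instructions generate #UD rather than #NM.
        return ts and not em
    raise ValueError("unknown instruction kind: " + kind)


after_task_switch = raises_nm(CR0_MP | CR0_TS, "simd")
```

With MP and TS set and EM clear, the usual configuration after a task switch, all three kinds of instruction fault, which gives the handler a chance to save the x87 FPU, XMM, and MXCSR context before it is overwritten.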
Exception Error Code
None.
Saved Instruction Pointer
The saved contents of CS and EIP registers point to the floating-point instruction or
the WAIT/FWAIT instruction that generated the exception.
Program State Change
A program-state change does not accompany a device-not-available fault, because
the instruction that generated the exception is not executed.
If the EM flag is set, the exception handler can then read the floating-point
instruction pointed to by the EIP and call the appropriate emulation routine.
If the MP and TS flags are set or the TS flag alone is set, the exception handler can
save the context of the x87 FPU, clear the TS flag, and continue execution at the
interrupted floating-point or WAIT/FWAIT instruction.
Interrupt 8 - Double Fault Exception (#DF)
Exception Class Abort.
Description
Indicates that the processor detected a second exception while calling an exception
handler for a prior exception. Normally, when the processor detects another
exception while trying to call an exception handler, the two exceptions can be handled
serially. If, however, the processor cannot handle them serially, it signals the
double-fault exception. To determine when two faults need to be signalled as a double
fault, the processor divides the exceptions into three classes: benign exceptions,
contributory exceptions, and page faults (see Table 6-4).
Table 6-5 shows the various combinations of exception classes that cause a double
fault to be generated. A double-fault exception falls in the abort class of exceptions.
The program or task cannot be restarted or resumed. The double-fault handler can
be used to collect diagnostic information about the state of the machine and/or, when
possible, to shut the application and/or system down gracefully or restart the
system.
Table 6-4. Interrupt and Exception Classes
Class                               Vector Number   Description
Benign Exceptions and Interrupts    1               Debug
                                    2               NMI Interrupt
                                    3               Breakpoint
                                    4               Overflow
                                    5               BOUND Range Exceeded
                                    6               Invalid Opcode
                                    7               Device Not Available
                                    9               Coprocessor Segment Overrun
                                    16              Floating-Point Error
                                    17              Alignment Check
                                    18              Machine Check
                                    19              SIMD floating-point
                                    All             INT n
                                    All             INTR
Contributory Exceptions             0               Divide Error
                                    10              Invalid TSS
                                    11              Segment Not Present
                                    12              Stack Fault
                                    13              General Protection
Page Faults                         14              Page Fault
A segment or page fault may be encountered while prefetching instructions;
however, this behavior is outside the domain of Table 6-5. Any further faults
generated while the processor is attempting to transfer control to the appropriate
fault handler could still lead to a double-fault sequence.
If another exception occurs while attempting to call the double-fault handler, the
processor enters shutdown mode. This mode is similar to the state following
execution of an HLT instruction. In this mode, the processor stops executing
instructions until an NMI interrupt, SMI interrupt, hardware reset, or INIT# is
received. The processor generates a special bus cycle to indicate that it has entered
shutdown mode. Software designers may need to be aware of the response of
hardware when it goes into shutdown mode. For example, hardware may turn on an
indicator light on the front panel, generate an NMI interrupt to record diagnostic
information, invoke reset initialization, generate an INIT initialization, or generate an
SMI. If any events are pending during shutdown, they will be handled after a wake
event from shutdown is processed (for example, A20M# interrupts).
If a shutdown occurs while the processor is executing an NMI interrupt handler, then
only a hardware reset can restart the processor. Likewise, if the shutdown occurs
while executing in SMM, a hardware reset must be used to restart the processor.
Exception Error Code
Zero. The processor always pushes an error code of 0 onto the stack of the
double-fault handler.
Saved Instruction Pointer
The saved contents of CS and EIP registers are undefined.
Program State Change
The program state following a double-fault exception is undefined. The program or
task cannot be resumed or restarted. The only available action of the double-fault
exception handler is to collect all possible context information for use in diagnostics
and then close the application and/or shut down or reset the processor.
If the double fault occurs when any portion of the exception handling machine state
is corrupted, the handler cannot be invoked and the processor must be reset.
Table 6-5. Conditions for Generating a Double Fault
                   Second Exception
First Exception    Benign               Contributory              Page Fault
Benign             Handle Exceptions    Handle Exceptions         Handle Exceptions
                   Serially             Serially                  Serially
Contributory       Handle Exceptions    Generate a Double Fault   Handle Exceptions
                   Serially                                       Serially
Page Fault         Handle Exceptions    Generate a Double Fault   Generate a Double Fault
                   Serially
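Tables 6-4 and 6-5 together amount to a lookup on the classes of the first and second exception. The sketch below encodes that lookup; the function names are hypothetical, and as a simplifying assumption every vector not listed as contributory or as the page fault is treated as benign (which is how INT n and INTR are classified, but the tables say nothing about vectors such as 8 itself).

```python
BENIGN = "benign"
CONTRIBUTORY = "contributory"
PAGE_FAULT = "page fault"

CONTRIBUTORY_VECTORS = {0, 10, 11, 12, 13}
PAGE_FAULT_VECTOR = 14


def exception_class(vector):
    """Classify a vector per Table 6-4 (unlisted vectors assumed benign)."""
    if vector == PAGE_FAULT_VECTOR:
        return PAGE_FAULT
    if vector in CONTRIBUTORY_VECTORS:
        return CONTRIBUTORY
    return BENIGN


def is_double_fault(first_vector, second_vector):
    """Apply Table 6-5 to the first and second exception."""
    first = exception_class(first_vector)
    second = exception_class(second_vector)
    if first == CONTRIBUTORY:
        return second == CONTRIBUTORY
    if first == PAGE_FAULT:
        return second in (CONTRIBUTORY, PAGE_FAULT)
    return False  # a benign first exception is always handled serially


# A #GP (13) followed by a #NP (11) while calling the #GP handler.
gp_then_np = is_double_fault(13, 11)
```

Note the asymmetry in the table: a page fault that occurs while delivering a contributory exception is handled serially, whereas a contributory exception that occurs while delivering a page fault generates a double fault.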
Interrupt 9 - Coprocessor Segment Overrun
Exception Class Abort. (Intel reserved; do not use. Recent IA-32 processors
do not generate this exception.)
Description
Indicates that an Intel386 CPU-based system with an Intel 387 math coprocessor
detected a page or segment violation while transferring the middle portion of an
Intel 387 math coprocessor operand. The P6 family, Pentium, and Intel486
processors do not generate this exception; instead, this condition is detected with a
general protection exception (#GP), interrupt 13.
Exception Error Code
None.
Saved Instruction Pointer
The saved contents of CS and EIP registers point to the instruction that generated the
exception.
Program State Change
The program state following a coprocessor segment-overrun exception is undefined.
The program or task cannot be resumed or restarted. The only available action
of the exception handler is to save the instruction pointer and reinitialize the x87 FPU
using the FNINIT instruction.
Interrupt 10 - Invalid TSS Exception (#TS)
Exception Class Fault.
Description
Indicates that there was an error related to a TSS. Such an error might be detected
during a task switch or during the execution of instructions that use information from
a TSS. Table 6-6 shows the conditions that cause an invalid TSS exception to be
generated.
Table 6-6. Invalid TSS Conditions
Error Code Index               Invalid Condition
TSS segment selector index     The TSS segment limit is less than 67H for 32-bit TSS or less
                               than 2CH for 16-bit TSS.
TSS segment selector index     During an IRET task switch, the TI flag in the TSS segment
                               selector indicates the LDT.
TSS segment selector index     During an IRET task switch, the TSS segment selector exceeds
                               descriptor table limit.
TSS segment selector index     During an IRET task switch, the busy flag in the TSS descriptor
                               indicates an inactive task.
TSS segment selector index     During an IRET task switch, an attempt to load the backlink
                               limit faults.
TSS segment selector index     During an IRET task switch, the backlink is a NULL selector.
TSS segment selector index     During an IRET task switch, the backlink points to a descriptor
                               which is not a busy TSS.
TSS segment selector index     The new TSS descriptor is beyond the GDT limit.
TSS segment selector index     The new TSS descriptor is not writable.
TSS segment selector index     Stores to the old TSS encounter a fault condition.
TSS segment selector index     The old TSS descriptor is not writable for a jump or IRET task
                               switch.
TSS segment selector index     The new TSS backlink is not writable for a call or exception
                               task switch.
TSS segment selector index     The new TSS selector is null on an attempt to lock the new TSS.
TSS segment selector index     The new TSS selector has the TI bit set on an attempt to lock
                               the new TSS.
TSS segment selector index     The new TSS descriptor is not an available TSS descriptor on an
                               attempt to lock the new TSS.
LDT segment selector index     Invalid LDT or LDT not present.
Stack segment selector index   The stack segment selector exceeds descriptor table limit.
Stack segment selector index   The stack segment selector is NULL.
Stack segment selector index   The stack segment descriptor is a non-data segment.
Stack segment selector index   The stack segment is not writable.
Stack segment selector index   The stack segment DPL != CPL.
Stack segment selector index   The stack segment selector RPL != CPL.
Code segment selector index    The code segment selector exceeds descriptor table limit.
Code segment selector index    The code segment selector is NULL.
Code segment selector index    The code segment descriptor is not a code segment type.
Code segment selector index    The nonconforming code segment DPL != CPL.
Code segment selector index    The conforming code segment DPL is greater than CPL.
Data segment selector index    The data segment selector exceeds the descriptor table limit.
Data segment selector index    The data segment descriptor is not a readable code or data type.
Data segment selector index    The data segment descriptor is a nonconforming code type and
                               RPL > DPL.
Data segment selector index    The data segment descriptor is a nonconforming code type and
                               CPL > DPL.
TSS segment selector index     The TSS segment selector is NULL for LTR.
TSS segment selector index     The TSS segment selector has the TI bit set for LTR.
TSS segment selector index     The TSS segment descriptor/upper descriptor is beyond the GDT
                               segment limit.
TSS segment selector index     The TSS segment descriptor is not an available TSS type.
TSS segment selector index     The TSS segment descriptor is an available 286 TSS type in
                               IA-32e mode.
TSS segment selector index     The TSS segment upper descriptor is not the correct type.
TSS segment selector index     The TSS segment descriptor contains a non-canonical base.
TSS segment selector index     There is a limit violation in attempting to load SS selector or
                               ESP from a TSS on a call or exception which changes privilege
                               levels in legacy mode.
TSS segment selector index     There is a limit violation or canonical fault in attempting to
                               load RSP or IST from a TSS on a call or exception which changes
                               privilege levels in IA-32e mode.
This exception can be generated either in the context of the original task or in the
context of the new task (see Section 7.3, "Task Switching"). Until the processor has
completely verified the presence of the new TSS, the exception is generated in the
context of the original task. Once the existence of the new TSS is verified, the task
switch is considered complete. Any invalid-TSS conditions detected after this point
are handled in the context of the new task. (A task switch is considered complete
when the task register is loaded with the segment selector for the new TSS and, if the
switch is due to a procedure call or interrupt, the previous task link field of the new
TSS references the old TSS.)
The invalid-TSS handler must be a task called using a task gate. Handling this
exception inside the faulting TSS context is not recommended because the processor
state may not be consistent.
Exception Error Code
An error code containing the segment selector index for the segment descriptor that
caused the violation is pushed onto the stack of the exception handler. If the EXT flag
is set, it indicates that the exception was caused by an event external to the currently
running program (for example, if an external interrupt handler using a task gate
attempted a task switch to an invalid TSS).
Saved Instruction Pointer
If the exception condition was detected before the task switch was carried out, the
saved contents of CS and EIP registers point to the instruction that invoked the task
switch. If the exception condition was detected after the task switch was carried out,
the saved contents of CS and EIP registers point to the first instruction of the new
task.
Program State Change
The ability of the invalid-TSS handler to recover from the fault depends on the error
condition that causes the fault. See Section 7.3, "Task Switching," for more
information on the task switch process and the possible recovery actions that can be
taken.
If an invalid TSS exception occurs during a task switch, it can occur before or after
the commit-to-new-task point. If it occurs before the commit point, no program state
change occurs. If it occurs after the commit point (when the segment descriptor
information for the new segment selectors has been loaded in the segment
registers), the processor will load all the state information from the new TSS before
it generates the exception. During a task switch, the processor first loads all the
segment registers with segment selectors from the TSS, then checks their contents
for validity. If an invalid TSS exception is discovered, the remaining segment
registers are loaded but not checked for validity and therefore may not be usable for
referencing memory. The invalid TSS handler should not rely on being able to use the
segment selectors found in the CS, SS, DS, ES, FS, and GS registers without causing
another exception. The exception handler should load all segment registers before
trying to resume the new task; otherwise, general-protection exceptions (#GP) may
result later under conditions that make diagnosis more difficult. The
Intel-recommended way of dealing with this situation is to use a task for the invalid
TSS exception handler. The task switch back to the interrupted task from the
invalid-TSS exception-handler task will then cause the processor to check the
registers as it loads them from the TSS.
Interrupt 11 - Segment Not Present (#NP)
Exception Class Fault.
Description
Indicates that the present flag of a segment or gate descriptor is clear. The processor
can generate this exception during any of the following operations:
• While attempting to load CS, DS, ES, FS, or GS registers. [Detection of a
  not-present segment while loading the SS register causes a stack fault exception
  (#SS) to be generated.] This situation can occur while performing a task switch.
• While attempting to load the LDTR using an LLDT instruction. Detection of a
  not-present LDT while loading the LDTR during a task switch operation causes an
  invalid-TSS exception (#TS) to be generated.
• When executing the LTR instruction and the TSS is marked not present.
• While attempting to use a gate descriptor or TSS that is marked
  segment-not-present, but is otherwise valid.
An operating system typically uses the segment-not-present exception to implement
virtual memory at the segment level. If the exception handler loads the segment and
returns, the interrupted program or task resumes execution.
A not-present indication in a gate descriptor, however, does not indicate that a
segment is not present (because gates do not correspond to segments). The
operating system may use the present flag for gate descriptors to trigger exceptions
of special significance to the operating system.
A contributory exception or page fault that subsequently referenced a not-present
segment would cause a double fault (#DF) to be generated instead of #NP.
Exception Error Code
An error code containing the segment selector index for the segment descriptor that
caused the violation is pushed onto the stack of the exception handler. If the EXT flag
is set, it indicates that the exception resulted from either:
• an external event (NMI or INTR) that caused an interrupt, which subsequently
  referenced a not-present segment
• a benign exception that subsequently referenced a not-present segment
The IDT flag is set if the error code refers to an IDT entry. This occurs when the IDT
entry for an interrupt being serviced references a not-present gate. Such an event
could be generated by an INT instruction or a hardware interrupt.
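A handler typically splits the pushed error code into its flags and index before deciding what to do. The sketch below assumes the selector-style error code layout used by the #TS, #NP, #SS, and #GP exceptions (EXT in bit 0, IDT in bit 1, TI in bit 2, and the selector or vector index in bits 15:3); the function name and the dictionary keys are hypothetical.

```python
def decode_selector_error_code(code):
    """Split a selector-style exception error code into its fields."""
    return {
        "ext": bool(code & 0x1),        # caused by an external event
        "idt": bool(code & 0x2),        # index refers to an IDT entry
        "ti": bool(code & 0x4),         # 1 = LDT, 0 = GDT (when idt is clear)
        "index": (code >> 3) & 0x1FFF,  # descriptor or gate index
    }


# 0x0103: EXT and IDT set, index 32. In this model that reads as a
# not-present gate at IDT vector 32 reached through an external interrupt.
fields = decode_selector_error_code(0x0103)
```

When the IDT flag is set, the index field is an interrupt vector number and the TI flag carries no meaning; otherwise the index selects a descriptor in the GDT or the current LDT according to TI.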
Saved Instruction Pointer
The saved contents of CS and EIP registers normally point to the instruction that
generated the exception. If the exception occurred while loading segment
descriptors for the segment selectors in a new TSS, the CS and EIP registers point to
the first instruction in the new task. If the exception occurred while accessing a gate
descriptor, the CS and EIP registers point to the instruction that invoked the access
(for example a CALL instruction that references a call gate).
Program State Change
If the segment-not-present exception occurs as the result of loading a register (CS,
DS, SS, ES, FS, GS, or LDTR), a program-state change does accompany the
exception because the register is not loaded. Recovery from this exception is possible
by simply loading the missing segment into memory and setting the present flag in
the segment descriptor.
If the segment-not-present exception occurs while accessing a gate descriptor, a
program-state change does not accompany the exception. Recovery from this
exception is possible merely by setting the present flag in the gate descriptor.
If a segment-not-present exception occurs during a task switch, it can occur before
or after the commit-to-new-task point (see Section 7.3, "Task Switching"). If it
occurs before the commit point, no program state change occurs. If it occurs after
the commit point, the processor will load all the state information from the new TSS
(without performing any additional limit, present, or type checks) before it generates
the exception. The segment-not-present exception handler should not rely on being
able to use the segment selectors found in the CS, SS, DS, ES, FS, and GS registers
without causing another exception. (See the Program State Change description for
"Interrupt 10 - Invalid TSS Exception (#TS)" in this chapter for additional information
on how to handle this situation.)
Interrupt 12 - Stack Fault Exception (#SS)
Exception Class Fault.
Description
Indicates that one of the following stack-related conditions was detected:
• A limit violation is detected during an operation that refers to the SS register.
  Operations that can cause a limit violation include stack-oriented instructions
  such as POP, PUSH, CALL, RET, IRET, ENTER, and LEAVE, as well as other memory
  references which implicitly or explicitly use the SS register (for example, MOV
  AX, [BP+6] or MOV AX, SS:[EAX+6]). The ENTER instruction generates this
  exception when there is not enough stack space for allocating local variables.
• A not-present stack segment is detected when attempting to load the SS register.
  This violation can occur during the execution of a task switch, a CALL instruction
  to a different privilege level, a return to a different privilege level, an LSS
  instruction, or a MOV or POP instruction to the SS register.
• A canonical violation is detected in 64-bit mode during an operation that
  references memory using the stack pointer register containing a non-canonical
  memory address.
Recovery from this fault is possible by either extending the limit of the stack segment
(in the case of a limit violation) or loading the missing stack segment into memory (in
the case of a not-present violation).
In the case of a canonical violation that was caused intentionally by software,
recovery is possible by loading the correct canonical value into RSP. Otherwise, a
canonical violation of the address in RSP likely reflects some register corruption in
the software.
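The canonical-address condition can be checked with a short bit test. The sketch below assumes an implementation with 48-bit linear addresses, in which an address is canonical when bits 63 through 47 are all zeros or all ones; the function name is_canonical and the linear_bits parameter are hypothetical.

```python
def is_canonical(addr, linear_bits=48):
    """True if a 64-bit address is canonical for the given linear width."""
    addr &= 0xFFFFFFFFFFFFFFFF
    # Bits 63 down to (linear_bits - 1) must all be equal.
    upper = addr >> (linear_bits - 1)
    all_ones = (1 << (64 - linear_bits + 1)) - 1
    return upper == 0 or upper == all_ones


# Under this assumption an RSP value of 0x0000800000000000 is not canonical,
# so a stack reference through it would raise #SS in 64-bit mode.
rsp_ok = is_canonical(0x0000800000000000)
```

A handler that finds a non-canonical RSP it did not set up deliberately should treat it as evidence of register corruption, as noted above, rather than attempt to resume.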
Exception Error Code
If the exception is caused by a not-present stack segment or by overflow of the new
stack during an inter-privilege-level call, the error code contains a segment selector
for the segment that caused the exception. Here, the exception handler can test the
present flag in the segment descriptor pointed to by the segment selector to
determine the cause of the exception. For a normal limit violation (on a stack segment
already in use) the error code is set to 0.
Saved Instruction Pointer
The saved contents of CS and EIP registers generally point to the instruction that
generated the exception. However, when the exception results from attempting to
load a not-present stack segment during a task switch, the CS and EIP registers point
to the first instruction of the new task.
Program State Change
A program-state change does not generally accompany a stack-fault exception,
because the instruction that generated the fault is not executed. Here, the instruction
can be restarted after the exception handler has corrected the stack fault condition.
If a stack fault occurs during a task switch, it occurs after the commit-to-new-task
point (see Section 7.3, "Task Switching"). Here, the processor loads all the state
information from the new TSS (without performing any additional limit, present, or
type checks) before it generates the exception. The stack fault handler should thus
not rely on being able to use the segment selectors found in the CS, SS, DS, ES, FS,
and GS registers without causing another exception. The exception handler should
check all segment registers before trying to resume the new task; otherwise, general
protection faults may result later under conditions that are more difficult to diagnose.
(See the Program State Change description for "Interrupt 10 - Invalid TSS Exception
(#TS)" in this chapter for additional information on how to handle this situation.)
Interrupt 13 - General Protection Exception (#GP)
Exception Class Fault.
Description
Indicates that the processor detected one of a class of protection violations called
general-protection violations. The conditions that cause this exception to be
generated comprise all the protection violations that do not cause other exceptions to be
generated (such as invalid-TSS, segment-not-present, stack-fault, or page-fault
exceptions). The following conditions cause general-protection exceptions to be
generated:
• Exceeding the segment limit when accessing the CS, DS, ES, FS, or GS
  segments.
• Exceeding the segment limit when referencing a descriptor table (except during a
  task switch or a stack switch).
• Transferring execution to a segment that is not executable.
• Writing to a code segment or a read-only data segment.
• Reading from an execute-only code segment.
• Loading the SS register with a segment selector for a read-only segment (unless
  the selector comes from a TSS during a task switch, in which case an invalid-TSS
  exception occurs).
• Loading the SS, DS, ES, FS, or GS register with a segment selector for a system
  segment.
• Loading the DS, ES, FS, or GS register with a segment selector for an
  execute-only code segment.
• Loading the SS register with the segment selector of an executable segment or a
  null segment selector.
• Loading the CS register with a segment selector for a data segment or a null
  segment selector.
• Accessing memory using the DS, ES, FS, or GS register when it contains a null
  segment selector.
• Switching to a busy task during a call or jump to a TSS.
• Using a segment selector on a non-IRET task switch that points to a TSS
  descriptor in the current LDT. TSS descriptors can only reside in the GDT. This
  condition causes a #TS exception during an IRET task switch.
• Violating any of the privilege rules described in Chapter 5, "Protection."
• Exceeding the instruction length limit of 15 bytes (this can occur only when
  redundant prefixes are placed before an instruction).
• Loading the CR0 register with a set PG flag (paging enabled) and a clear PE flag
  (protection disabled).
• Loading the CR0 register with a set NW flag and a clear CD flag.
• Referencing an entry in the IDT (following an interrupt or exception) that is not
  an interrupt, trap, or task gate.
• Attempting to access an interrupt or exception handler through an interrupt or
  trap gate from virtual-8086 mode when the handler's code segment DPL is
  greater than 0.
• Attempting to write a 1 into a reserved bit of CR4.
• Attempting to execute a privileged instruction when the CPL is not equal to 0 (see
  Section 5.9, "Privileged Instructions," for a list of privileged instructions).
• Writing to a reserved bit in an MSR.
• Accessing a gate that contains a null segment selector.
• Executing the INT n instruction when the CPL is greater than the DPL of the
  referenced interrupt, trap, or task gate.
• The segment selector in a call, interrupt, or trap gate does not point to a code
  segment.
• The segment selector operand in the LLDT instruction is a local type (TI flag is
  set) or does not point to a segment descriptor of the LDT type.
• The segment selector operand in the LTR instruction is local or points to a TSS
  that is not available.
• The target code-segment selector for a call, jump, or return is null.
• If the PAE and/or PSE flag in control register CR4 is set and the processor detects
  any reserved bits in a page-directory-pointer-table entry set to 1. These bits are
  checked during a write to control registers CR0, CR3, or CR4 that causes a
  reloading of the page-directory-pointer-table entry.
• Attempting to write a non-zero value into the reserved bits of the MXCSR register.
• Executing an SSE/SSE2/SSE3 instruction that attempts to access a 128-bit
  memory location that is not aligned on a 16-byte boundary when the instruction
  requires 16-byte alignment. This condition also applies to the stack segment.
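Two of the conditions in the list above are inconsistent flag combinations written to CR0. As a minimal sketch (an illustration, not a model of the processor), the following Python helper tests a candidate CR0 value for those two combinations only. The bit positions are the architectural CR0 layout (PE = bit 0, NW = bit 29, CD = bit 30, PG = bit 31); the function name is hypothetical, and the sketch deliberately ignores every other #GP cause.

```python
# Architectural bit positions in control register CR0.
CR0_PE = 1 << 0    # protection enable
CR0_NW = 1 << 29   # not write-through
CR0_CD = 1 << 30   # cache disable
CR0_PG = 1 << 31   # paging


def cr0_load_raises_gp(value):
    """Return True if `value` contains one of the two invalid CR0 flag
    combinations listed above (hypothetical helper, illustration only)."""
    # PG set (paging enabled) while PE is clear (protection disabled).
    paging_without_protection = bool(value & CR0_PG) and not (value & CR0_PE)
    # NW set while CD is clear.
    nw_without_cd = bool(value & CR0_NW) and not (value & CR0_CD)
    return paging_without_protection or nw_without_cd
```

For instance, a value with only PG set is rejected, while PG together with PE passes this particular check.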
A program or task can be restarted following any general-protection exception. If the
exception occurs while attempting to call an interrupt handler, the interrupted
program can be restarted, but the interrupt may be lost.
Exception Error Code
The processor pushes an error code onto the exception handler's stack. If the fault
condition was detected while loading a segment descriptor, the error code contains the
segment selector or the IDT vector number for that descriptor; otherwise, the error
code is 0. The source of the selector in an error code may be any of the following:
• An operand of the instruction.
• A selector from a gate which is the operand of the instruction.
• A selector from a TSS involved in a task switch.
• IDT vector number.
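When the error code is non-zero, a handler typically splits it into its fields before deciding what to do. The sketch below assumes the standard selector-format error code layout (EXT in bit 0, IDT in bit 1, TI in bit 2, and the selector index or vector number in bits 15:3); the function name and the dictionary keys are illustrative choices, not part of the architecture.

```python
def decode_selector_error_code(code):
    """Split a selector-format exception error code into its fields
    (hypothetical helper; field layout assumed as described above)."""
    return {
        "ext": bool(code & 0x1),        # event external to the program
        "idt": bool(code & 0x2),        # index refers to a gate in the IDT
        "ti": bool(code & 0x4),         # 1 = LDT, 0 = GDT (meaningful when idt is 0)
        "index": (code >> 3) & 0x1FFF,  # descriptor index or IDT vector number
    }
```

For example, an error code of 0x6A decodes to an IDT reference with vector number 13, and 0x2C decodes to index 5 in the LDT.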
Saved Instruction Pointer
The saved contents of the CS and EIP registers point to the instruction that generated
the exception.
Program State Change
In general, a program-state change does not accompany a general-protection
exception, because the invalid instruction or operation is not executed. An exception
handler can be designed to correct all of the conditions that cause general-protection
exceptions and restart the program or task without any loss of program continuity.
If a general-protection exception occurs during a task switch, it can occur before or
after the commit-to-new-task point (see Section 7.3, "Task Switching"). If it occurs
before the commit point, no program state change occurs. If it occurs after the
commit point, the processor will load all the state information from the new TSS
(without performing any additional limit, present, or type checks) before it generates
the exception. The general-protection exception handler should thus not rely on
being able to use the segment selectors found in the CS, SS, DS, ES, FS, and GS
registers without causing another exception. (See the "Program State Change"
description for "Interrupt 10: Invalid TSS Exception (#TS)" in this chapter for
additional information on how to handle this situation.)
General Protection Exception in 64-bit Mode
The following conditions cause general-protection exceptions in 64-bit mode:
• If the memory address is in a non-canonical form.
• If a segment descriptor memory address is in non-canonical form.
• If the target offset in a destination operand of a call or jmp is in a non-canonical
  form.
• If a code segment or 64-bit call gate overlaps non-canonical space.
• If the code segment descriptor pointed to by the selector in the 64-bit gate
  does not have the L-bit set and the D-bit clear.
• If the EFLAGS.NT bit is set in IRET.
• If the stack segment selector of IRET is null when going back to compatibility
  mode.
• If the stack segment selector of IRET is null when going back to CPL3 and
  64-bit mode.
• If the RPL of a null stack segment selector of IRET is not equal to the CPL when
  going back to non-CPL3 and 64-bit mode.
• If the proposed new code segment descriptor of IRET has both the D-bit and the
  L-bit set.
• If the segment descriptor pointed to by the segment selector in the destination
  operand is a code segment and it has both the D-bit and the L-bit set.
• If the segment descriptor from a 64-bit call gate is in non-canonical space.
• If the DPL from a 64-bit call gate is less than the CPL or the RPL of the 64-bit
  call gate.
• If the upper type field of a 64-bit call gate is not 0x0.
• If an attempt is made to load a null selector in the SS register in compatibility
  mode.
• If an attempt is made to load a null selector in the SS register in CPL3 and 64-bit
  mode.
• If an attempt is made to load a null selector in the SS register in non-CPL3 and
  64-bit mode where RPL is not equal to CPL.
• If an attempt is made to clear CR0.PG while IA-32e mode is enabled.
• If an attempt is made to set a reserved bit in CR3, CR4 or CR8.
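Several of the 64-bit conditions above depend on whether an address is canonical. As a rough sketch, the check below assumes an implementation with 48 linear-address bits, so that bits 63 through 47 must all be 0 or all be 1; the width is a parameter because it is an assumption rather than something stated in this section, and the function name is hypothetical.

```python
def is_canonical(addr, linear_bits=48):
    """Return True if a 64-bit address is canonical for the given width.

    Assumes `linear_bits` implemented linear-address bits (48 by default):
    bits 63 down to linear_bits - 1 must be all 0s or all 1s.
    """
    addr &= (1 << 64) - 1                         # treat the input as a 64-bit value
    upper = addr >> (linear_bits - 1)             # bits 63..47 for a 48-bit width
    all_ones = (1 << (64 - linear_bits + 1)) - 1  # 17 one-bits for a 48-bit width
    return upper == 0 or upper == all_ones
```

Under that assumption, 0x00007FFFFFFFFFFF and 0xFFFF800000000000 are canonical, while 0x0000800000000000 is not.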
Interrupt 14: Page-Fault Exception (#PF)
Exception Class Fault.
Description
Indicates that, with paging enabled (the PG flag in the CR0 register is set), the
processor detected one of the following conditions while using the page-translation
mechanism to translate a linear address to a physical address:
• The P (present) flag in a page-directory or page-table entry needed for the
  address translation is clear, indicating that a page table or the page containing
  the operand is not present in physical memory.
• The procedure does not have sufficient privilege to access the indicated page
  (that is, a procedure running in user mode attempts to access a supervisor-mode
  page).
• Code running in user mode attempts to write to a read-only page. In the Intel486
  and later processors, if the WP flag is set in CR0, the page fault will also be
  triggered by code running in supervisor mode that tries to write to a read-only
  page.
• An instruction fetch to a linear address that translates to a physical address in a
  memory page with the execute-disable bit set (for information about the
  execute-disable bit, see Chapter 4, "Paging").
• One or more reserved bits in a page-directory entry are set to 1. See the
  description below of the RSVD error code flag.
The exception handler can recover from page-not-present conditions and restart the
program or task without any loss of program continuity. It can also restart the
program or task after a privilege violation, but the problem that caused the privilege
violation may be uncorrectable.
See also: Section 4.7, "Page-Fault Exceptions."
Exception Error Code
Yes (special format). The processor provides the page-fault handler with two items of
information to aid in diagnosing the exception and recovering from it:
• An error code on the stack. The error code for a page fault has a format different
  from that for other exceptions (see Figure 6-9). The error code tells the
  exception handler five things:
  - The P flag indicates whether the exception was due to a not-present page (0)
    or to either an access rights violation or the use of a reserved bit (1).
  - The W/R flag indicates whether the memory access that caused the exception
    was a read (0) or write (1).
  - The U/S flag indicates whether the processor was executing at user mode (1)
    or supervisor mode (0) at the time of the exception.
  - The RSVD flag indicates that the processor detected 1s in reserved bits of the
    page directory, when the PSE or PAE flags in control register CR4 are set to 1.
    Note:
    - The PSE flag is only available in recent Intel 64 and IA-32 processors
      including the Pentium 4, Intel Xeon, P6 family, and Pentium processors.
    - The PAE flag is only available on recent Intel 64 and IA-32 processors
      including the Pentium 4, Intel Xeon, and P6 family processors.
    - In earlier IA-32 processors, the bit position of the RSVD flag is reserved
      and is cleared to 0.
  - The I/D flag indicates whether the exception was caused by an instruction
    fetch. This flag is reserved and cleared to 0 if CR4.PAE = 0 (32-bit paging is
    in use) or IA32_EFER.NXE = 0 (the execute-disable feature is either
    unsupported or not enabled). See Section 4.7, "Page-Fault Exceptions," for
    details.
• The contents of the CR2 register. The processor loads the CR2 register with the
  32-bit linear address that generated the exception. The page-fault handler can
  use this address to locate the corresponding page directory and page-table
  entries. Another page fault can potentially occur during execution of the
  page-fault handler; the handler should save the contents of the CR2 register
  before a second page fault can occur (see footnote 1). If a page fault is caused
  by a page-level protection violation, the access flag in the page-directory entry
  is set when the fault occurs. The behavior of IA-32 processors regarding the
  access flag in the corresponding page-table entry is model specific and not
  architecturally defined.
Figure 6-9. Page-Fault Error Code
Bit(s)  Flag      Value  Meaning
0       P         0      The fault was caused by a non-present page.
                  1      The fault was caused by a page-level protection violation.
1       W/R       0      The access causing the fault was a read.
                  1      The access causing the fault was a write.
2       U/S       0      The access causing the fault originated when the processor
                         was executing in supervisor mode.
                  1      The access causing the fault originated when the processor
                         was executing in user mode.
3       RSVD      0      The fault was not caused by reserved bit violation.
                  1      The fault was caused by reserved bits set to 1 in a page
                         directory.
4       I/D       0      The fault was not caused by an instruction fetch.
                  1      The fault was caused by an instruction fetch.
31:5    Reserved
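The error-code flags described above lend themselves to a small decoder. The following Python sketch uses the bit positions from Figure 6-9 (P = bit 0, W/R = bit 1, U/S = bit 2, RSVD = bit 3, I/D = bit 4); the function names and the wording of the summary string are illustrative assumptions, and a real handler would also read CR2, which this sketch does not model.

```python
# (flag name, bit position) pairs taken from Figure 6-9.
PF_FLAGS = (("P", 0), ("W/R", 1), ("U/S", 2), ("RSVD", 3), ("I/D", 4))


def decode_page_fault_error_code(code):
    """Return a dict mapping each flag name to True/False."""
    return {name: bool((code >> bit) & 1) for name, bit in PF_FLAGS}


def describe_page_fault(code):
    """Build a one-line summary of a page-fault error code (hypothetical format)."""
    flags = decode_page_fault_error_code(code)
    mode = "user" if flags["U/S"] else "supervisor"
    if flags["I/D"]:
        access = "instruction fetch"
    else:
        access = "write" if flags["W/R"] else "read"
    if not flags["P"]:
        cause = "non-present page"
    elif flags["RSVD"]:
        cause = "reserved bit set to 1"
    else:
        cause = "page-level protection violation"
    return f"{mode}-mode {access}: {cause}"
```

For example, an error code of 0x6 describes a user-mode write to a non-present page, the usual signature of a demand-paged or copy-on-write access.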
Saved Instruction Pointer
The saved contents of the CS and EIP registers generally point to the instruction that
generated the exception. If the page-fault exception occurred during a task switch,
the CS and EIP registers may point to the first instruction of the new task (as
described in the following "Program State Change" section).
Program State Change
A program-state change does not normally accompany a page-fault exception,
because the instruction that causes the exception to be generated is not executed.
After the page-fault exception handler has corrected the violation (for example,
loaded the missing page into memory), execution of the program or task can be
resumed.
When a page-fault exception is generated during a task switch, the program-state
may change, as follows. During a task switch, a page-fault exception can occur
during any of the following operations:
• While writing the state of the original task into the TSS of that task.
• While reading the GDT to locate the TSS descriptor of the new task.
• While reading the TSS of the new task.
• While reading segment descriptors associated with segment selectors from the
  new task.
• While reading the LDT of the new task to verify the segment registers stored in
  the new TSS.
In the last two cases the exception occurs in the context of the new task. The
instruction pointer refers to the first instruction of the new task, not to the instruction
which caused the task switch (or the last instruction to be executed, in the case of an
interrupt). If the design of the operating system permits page faults to occur during
task switches, the page-fault handler should be called through a task gate.
If a page fault occurs during a task switch, the processor will load all the state
information from the new TSS (without performing any additional limit, present, or type
checks) before it generates the exception. The page-fault handler should thus not
rely on being able to use the segment selectors found in the CS, SS, DS, ES, FS, and
GS registers without causing another exception. (See the "Program State Change"
description for "Interrupt 10: Invalid TSS Exception (#TS)" in this chapter for
additional information on how to handle this situation.)
Footnote 1. Processors update CR2 whenever a page fault is detected. If a second page
fault occurs while an earlier page fault is being delivered, the faulting linear address
of the second fault will overwrite the contents of CR2 (replacing the previous address).
These updates to CR2 occur even if the page fault results in a double fault or occurs
during the delivery of a double fault.
Additional Exception-Handling Information
Special care should be taken to ensure that an exception that occurs during an
explicit stack switch does not cause the processor to use an invalid stack pointer
(SS:ESP). Software written for 16-bit IA-32 processors often uses a pair of
instructions to change to a new stack, for example:
MOV SS, AX
MOV SP, StackTop
When executing this code on one of the 32-bit IA-32 processors, it is possible to get
a page fault, general-protection fault (#GP), or alignment check fault (#AC) after the
segment selector has been loaded into the SS register but before the ESP register
has been loaded. At this point, the two parts of the stack pointer (SS and ESP) are
inconsistent. The new stack segment is being used with the old stack pointer.
The processor does not use the inconsistent stack pointer if the exception handler
switches to a well defined stack (that is, the handler is a task or a more privileged
procedure). However, if the exception handler is called at the same privilege level
and from the same task, the processor will attempt to use the inconsistent stack
pointer.
In systems that handle page-fault, general-protection, or alignment check
exceptions within the faulting task (with trap or interrupt gates), software executing at the
same privilege level as the exception handler should initialize a new stack by using
the LSS instruction rather than a pair of MOV instructions, as described earlier in this
note. When the exception handler is running at privilege level 0 (the normal case),
the problem is limited to procedures or tasks that run at privilege level 0, typically
the kernel of the operating system.
Interrupt 16: x87 FPU Floating-Point Error (#MF)
Exception Class Fault.
Description
Indicates that the x87 FPU has detected a floating-point error. The NE flag in the
register CR0 must be set for an interrupt 16 (floating-point error exception) to be
generated. (See Section 2.5, "Control Registers," for a detailed description of the NE
flag.)
NOTE
SIMD floating-point exceptions (#XM) are signaled through interrupt 19.
While executing x87 FPU instructions, the x87 FPU detects and reports six types of
floating-point error conditions:
• Invalid operation (#I)
  - Stack overflow or underflow (#IS)
  - Invalid arithmetic operation (#IA)
• Divide-by-zero (#Z)
• Denormalized operand (#D)
• Numeric overflow (#O)
• Numeric underflow (#U)
• Inexact result (precision) (#P)
Each of these error conditions represents an x87 FPU exception type, and for each
exception type, the x87 FPU provides a flag in the x87 FPU status register and a mask
bit in the x87 FPU control register. If the x87 FPU detects a floating-point error and
the mask bit for the exception type is set, the x87 FPU handles the exception
automatically by generating a predefined (default) response and continuing program
execution. The default responses have been designed to provide a reasonable result
for most floating-point applications.
If the mask for the exception is clear and the NE flag in register CR0 is set, the x87
FPU does the following:
1. Sets the necessary flag in the FPU status register.
2. Waits until the next waiting x87 FPU instruction or WAIT/FWAIT instruction is
   encountered in the program's instruction stream.
3. Generates an internal error signal that causes the processor to generate a
   floating-point exception (#MF).
Prior to executing a waiting x87 FPU instruction or the WAIT/FWAIT instruction, the
x87 FPU checks for pending x87 FPU floating-point exceptions (as described in step 2
above). Pending x87 FPU floating-point exceptions are ignored for non-waiting x87
FPU instructions, which include the FNINIT, FNCLEX, FNSTSW, FNSTSW AX, FNSTCW,
FNSTENV, and FNSAVE instructions. Pending x87 FPU exceptions are also ignored
when executing the state management instructions FXSAVE and FXRSTOR.
All of the x87 FPU floating-point error conditions can be recovered from. The x87 FPU
floating-point-error exception handler can determine the error condition that caused
the exception from the settings of the flags in the x87 FPU status word. See
"Software Exception Handling" in Chapter 8 of the Intel 64 and IA-32 Architectures
Software Developer's Manual, Volume 1, for more information on handling x87 FPU
floating-point exceptions.
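To illustrate how a handler might read the status word, the sketch below lists the exception flags that are set and not masked. It assumes the x87 layout documented in Volume 1: exception flags in status-word bits 0 through 5 (in the order #I, #D, #Z, #O, #U, #P), the stack-fault (SF) flag in bit 6, and the matching mask bits in control-word bits 0 through 5. The function name is hypothetical.

```python
# Exception flags occupy bits 0..5 of the x87 status word, in this order;
# the corresponding mask bits occupy bits 0..5 of the x87 control word.
X87_EXCEPTIONS = ("#I", "#D", "#Z", "#O", "#U", "#P")
X87_SF = 1 << 6  # stack fault: separates #IS from #IA when #I is flagged


def unmasked_x87_exceptions(status_word, control_word):
    """Return the names of exception flags that are set and unmasked
    (hypothetical helper, assuming the bit layout described above)."""
    pending = status_word & ~control_word & 0x3F
    names = []
    for bit, name in enumerate(X87_EXCEPTIONS):
        if pending & (1 << bit):
            if name == "#I":
                # An invalid operation is either a stack fault or an
                # invalid arithmetic operation, depending on SF.
                name = "#IS" if status_word & X87_SF else "#IA"
            names.append(name)
    return names
```

With the commonly used control word 0x037F every exception is masked, so the function returns an empty list regardless of the status flags.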
Exception Error Code
None. The x87 FPU provides its own error information.
Saved Instruction Pointer
The saved contents of the CS and EIP registers point to the floating-point or WAIT/FWAIT
instruction that was about to be executed when the floating-point-error exception
was generated. This is not the faulting instruction in which the error condition was
detected. The address of the faulting instruction is contained in the x87 FPU
instruction pointer register. See "x87 FPU Instruction and Operand (Data) Pointers" in
Chapter 8 of the Intel 64 and IA-32 Architectures Software Developer's Manual,
Volume 1, for more information about the information the FPU saves for use in handling
floating-point-error exceptions.
Program State Change
A program-state change generally accompanies an x87 FPU floating-point exception
because the handling of the exception is delayed until the next waiting x87 FPU
floating-point or WAIT/FWAIT instruction following the faulting instruction. The x87
FPU, however, saves sufficient information about the error condition to allow
recovery from the error and re-execution of the faulting instruction if needed.
In situations where non-x87 FPU floating-point instructions depend on the results of
an x87 FPU floating-point instruction, a WAIT or FWAIT instruction can be inserted in
front of a dependent instruction to force a pending x87 FPU floating-point exception
to be handled before the dependent instruction is executed. See "x87 FPU Exception
Synchronization" in Chapter 8 of the Intel 64 and IA-32 Architectures Software
Developer's Manual, Volume 1, for more information about synchronization of x87
floating-point-error exceptions.
Interrupt 17: Alignment Check Exception (#AC)
Exception Class Fault.
Description
Indicates that the processor detected an unaligned memory operand when alignment
checking was enabled. Alignment checks are only carried out in data (or stack)
accesses (not in code fetches or system segment accesses). An example of an
alignment-check violation is a word stored at an odd byte address, or a doubleword stored
at an address that is not an integer multiple of 4. Table 6-7 lists the alignment
requirements for the various data types recognized by the processor.
Note that the alignment check exception (#AC) is generated only for data types that
must be aligned on word, doubleword, and quadword boundaries. A general-protection
exception (#GP) is generated for 128-bit data types that are not aligned on a
16-byte boundary.
Table 6-7. Alignment Requirements by Data Type
Data Type                                          Address Must Be Divisible By
Word                                               2
Doubleword                                         4
Single-precision floating-point (32-bits)          4
Double-precision floating-point (64-bits)          8
Double extended-precision floating-point (80-bits) 8
Quadword                                           8
Double quadword                                    16
Segment Selector                                   2
32-bit Far Pointer                                 2
48-bit Far Pointer                                 4
32-bit Pointer                                     4
GDTR, IDTR, LDTR, or Task Register Contents        4
FSTENV/FLDENV Save Area                            4 or 2, depending on operand size
FSAVE/FRSTOR Save Area                             4 or 2, depending on operand size
Bit String                                         2 or 4, depending on the operand-size attribute
To enable alignment checking, the following conditions must be true:
• AM flag in CR0 register is set.
• AC flag in the EFLAGS register is set.
• The CPL is 3 (protected mode or virtual-8086 mode).
Alignment-check exceptions (#AC) are generated only when operating at privilege
level 3 (user mode). Memory references that default to privilege level 0, such as
segment descriptor loads, do not generate alignment-check exceptions, even when
caused by a memory reference made from privilege level 3.
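The three enabling conditions and the divisibility rule of Table 6-7 can be combined into one predicate. The sketch below is an illustration only: it assumes the AM flag is bit 18 of CR0 and the AC flag is bit 18 of EFLAGS, takes the required alignment (2, 4, or 8) as an argument rather than deriving it from a data type, and does not model 128-bit types or accesses that default to privilege level 0. The function names are hypothetical.

```python
CR0_AM = 1 << 18     # alignment mask flag in CR0
EFLAGS_AC = 1 << 18  # alignment check flag in EFLAGS


def alignment_checking_enabled(cr0, eflags, cpl):
    """All three conditions must hold: CR0.AM set, EFLAGS.AC set, CPL is 3."""
    return bool(cr0 & CR0_AM) and bool(eflags & EFLAGS_AC) and cpl == 3


def raises_alignment_check(address, required_alignment, cr0, eflags, cpl):
    """Return True if a data access at `address` would be reported as #AC.

    `required_alignment` is the divisor from Table 6-7 (2, 4, or 8).
    """
    if not alignment_checking_enabled(cr0, eflags, cpl):
        return False
    return address % required_alignment != 0
```

A word access at 0x1001 from CPL 3 with both flags set is reported, whereas the same access from CPL 0 is not.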
Storing the contents of the GDTR, IDTR, LDTR, or task register in memory while at
privilege level 3 can generate an alignment-check exception. Although application
programs do not normally store these registers, the fault can be avoided by aligning
the information stored on an even word-address.
The FXSAVE and FXRSTOR instructions save and restore a 512-byte data structure,
the first byte of which must be aligned on a 16-byte boundary. If the alignment-check
exception (#AC) is enabled when executing these instructions (and CPL is 3), a
misaligned memory operand can cause either an alignment-check exception or a
general-protection exception (#GP) depending on the processor implementation
(see "FXSAVE-Save x87 FPU, MMX, SSE, and SSE2 State" and "FXRSTOR-Restore
x87 FPU, MMX, SSE, and SSE2 State" in Chapter 3 of the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 2A).
The MOVUPS and MOVUPD instructions perform 128-bit unaligned loads or stores.
The LDDQU instruction loads 128-bit unaligned data. They do not generate
general-protection exceptions (#GP) when operands are not aligned on a 16-byte boundary.
If alignment checking is enabled, alignment-check exceptions (#AC) may or may not
be generated depending on processor implementation when data addresses are not
aligned on an 8-byte boundary.
FSAVE and FRSTOR instructions can generate unaligned references, which can cause
alignment-check faults. These instructions are rarely needed by application
programs.
Exception Error Code
Yes (always zero).
Saved Instruction Pointer
The saved contents of the CS and EIP registers point to the instruction that generated
the exception.
Program State Change
A program-state change does not accompany an alignment-check fault, because the
instruction is not executed.
Interrupt 18: Machine-Check Exception (#MC)
Exception Class Abort.
Description
Indicates that the processor detected an internal machine error or a bus error, or that
an external agent detected a bus error. The machine-check exception is
model-specific, available on the Pentium and later generations of processors. The
implementation of the machine-check exception differs between processor
families, and these implementations may not be compatible with future Intel 64 or
IA-32 processors. (Use the CPUID instruction to determine whether this feature is
present.)
Bus errors detected by external agents are signaled to the processor on dedicated
pins: the BINIT# and MCERR# pins on the Pentium 4, Intel Xeon, and P6 family
processors and the BUSCHK# pin on the Pentium processor. When one of these pins
is enabled, asserting the pin causes error information to be loaded into
machine-check registers and a machine-check exception is generated.
The machine-check exception and machine-check architecture are discussed in detail
in Chapter 15, "Machine-Check Architecture." Also, see the data books for the
individual processors for processor-specific hardware information.
Exception Error Code
None. Error information is provided by machine-check MSRs.
Saved Instruction Pointer
For the Pentium 4 and Intel Xeon processors, the saved contents of extended
machine-check state registers are directly associated with the error that caused the
machine-check exception to be generated (see Section 15.3.1.2,
"IA32_MCG_STATUS MSR," and Section 15.3.2.6, "IA32_MCG Extended Machine
Check State MSRs").
For the P6 family processors, if the EIPV flag in the MCG_STATUS MSR is set, the
saved contents of CS and EIP registers are directly associated with the error that
caused the machine-check exception to be generated; if the flag is clear, the saved
instruction pointer may not be associated with the error (see Section 15.3.1.2,
"IA32_MCG_STATUS MSR").
For the Pentium processor, contents of the CS and EIP registers may not be
associated with the error.
Program State Change
The machine-check mechanism is enabled by setting the MCE flag in control register
CR4.
For the Pentium 4, Intel Xeon, P6 family, and Pentium processors, a program-state
change always accompanies a machine-check exception, and an abort class
exception is generated. For abort exceptions, information about the exception can be
collected from the machine-check MSRs, but the program cannot generally be
restarted.
If the machine-check mechanism is not enabled (the MCE flag in control register CR4
is clear), a machine-check exception causes the processor to enter the shutdown
state.
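The two flag tests described in this section can be summarized in a few lines. The sketch below assumes that the MCE flag is bit 6 of CR4 and that the EIPV flag is bit 1 of the MCG_STATUS MSR; the function names and the returned strings are hypothetical labels chosen for illustration, not architectural terms.

```python
CR4_MCE = 1 << 6          # machine-check enable flag in CR4
MCG_STATUS_EIPV = 1 << 1  # error-IP-valid flag in the MCG_STATUS MSR


def machine_check_outcome(cr4):
    """Return what a machine-check event leads to for a given CR4 value:
    delivery of #MC when MCE is set, otherwise the shutdown state."""
    return "#MC" if cr4 & CR4_MCE else "shutdown"


def saved_ip_is_associated(mcg_status):
    """Return True if the saved instruction pointer is directly associated
    with the error (EIPV set), as described for P6 family processors."""
    return bool(mcg_status & MCG_STATUS_EIPV)
```

A handler sketch along these lines would consult `saved_ip_is_associated` before reporting the saved CS:EIP as the location of the error.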
Interrupt 19: SIMD Floating-Point Exception (#XM)
Exception Class Fault.
Description
Indicates the processor has detected an SSE/SSE2/SSE3 SIMD floating-point
exception. The appropriate status flag in the MXCSR register must be set and the particular
exception unmasked for this interrupt to be generated.
There are six classes of numeric exception conditions that can occur while executing
an SSE/SSE2/SSE3 SIMD floating-point instruction:
• Invalid operation (#I)
• Divide-by-zero (#Z)
• Denormal operand (#D)
• Numeric overflow (#O)
• Numeric underflow (#U)
• Inexact result (Precision) (#P)
The invalid operation, divide-by-zero, and denormal-operand exceptions are
pre-computation exceptions; that is, they are detected before any arithmetic operation
occurs. The numeric underflow, numeric overflow, and inexact result exceptions are
post-computational exceptions.
See "SIMD Floating-Point Exceptions" in Chapter 11 of the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 1, for additional information
about the SIMD floating-point exception classes.
When a SI MD float ing- point except ion occurs, t he processor does eit her of t he
following t hings:
I t handles t he except ion aut omat ically by producing t he most reasonable result
and allowing program execut ion t o cont inue undist urbed. This is t he response t o
masked except ions.
I t generat es a SI MD float ing- point except ion, which in t urn invokes a soft ware
except ion handler. This is t he response t o unmasked except ions.
Each of the six SIMD floating-point exception conditions has a corresponding flag bit
and mask bit in the MXCSR register. If an exception is masked (the corresponding
mask bit in the MXCSR register is set), the processor takes an appropriate automatic
default action and continues with the computation. If the exception is unmasked (the
corresponding mask bit is clear) and the operating system supports SIMD floating-
point exceptions (the OSXMMEXCPT flag in control register CR4 is set), a software
exception handler is invoked through a SIMD floating-point exception. If the
exception is unmasked and the OSXMMEXCPT bit is clear (indicating that the operating
system does not support unmasked SIMD floating-point exceptions), an invalid
opcode exception (#UD) is signaled instead of a SIMD floating-point exception.
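The decision described in the paragraph above can be sketched in C as follows. This is a minimal illustration, not processor or operating-system code: the enum and function names are hypothetical, and the bit positions (flag bits 0 through 5 and mask bits 7 through 12 in MXCSR, OSXMMEXCPT as bit 10 of CR4) are taken from the architecture definition rather than from this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SIMD floating-point exception classes, numbered by their MXCSR flag bit
 * (assumption: IE=0, DE=1, ZE=2, OE=3, UE=4, PE=5). */
enum simd_fp_exception {
    SIMD_IE = 0, /* invalid operation (#I) */
    SIMD_DE = 1, /* denormal operand (#D)  */
    SIMD_ZE = 2, /* divide-by-zero (#Z)    */
    SIMD_OE = 3, /* numeric overflow (#O)  */
    SIMD_UE = 4, /* numeric underflow (#U) */
    SIMD_PE = 5  /* inexact result (#P)    */
};

/* Hypothetical names for the three possible processor responses. */
enum simd_fp_response {
    RESP_MASKED_DEFAULT,     /* automatic default action, execution continues */
    RESP_SIMD_FP_EXCEPTION,  /* SIMD floating-point exception is delivered    */
    RESP_INVALID_OPCODE      /* #UD is signaled instead                       */
};

#define MXCSR_MASK_SHIFT 7u          /* mask bits occupy MXCSR bits 7..12 */
#define CR4_OSXMMEXCPT   (1u << 10)  /* OS supports unmasked SIMD FP exceptions */

static enum simd_fp_response
classify_simd_fp_response(uint32_t mxcsr, uint32_t cr4, enum simd_fp_exception e)
{
    assert((unsigned)e <= SIMD_PE);

    /* A set mask bit means the exception is masked. */
    bool masked = (mxcsr >> (MXCSR_MASK_SHIFT + (unsigned)e)) & 1u;
    if (masked)
        return RESP_MASKED_DEFAULT;
    if (cr4 & CR4_OSXMMEXCPT)
        return RESP_SIMD_FP_EXCEPTION;
    return RESP_INVALID_OPCODE;
}
```

With all six mask bits set (MXCSR value 1F80H), every class takes the masked default response regardless of CR4.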
Note that because SIMD floating-point exceptions are precise and occur immediately,
the situation does not arise where an x87 FPU instruction, a WAIT/FWAIT instruction,
or another SSE/SSE2/SSE3 instruction will catch a pending unmasked SIMD floating-
point exception.
If a SIMD floating-point exception occurs while it is masked (causing the
corresponding exception flag to be set) and the exception is subsequently unmasked,
no exception is generated when the exception is unmasked.
When SSE/SSE2/SSE3 SIMD floating-point instructions operate on packed operands
(made up of two or four sub-operands), multiple SIMD floating-point exception
conditions may be detected. If no more than one exception condition is detected for
one or more sets of sub-operands, the exception flags are set for each exception
condition detected. For example, an invalid exception detected for one sub-operand
will not prevent the reporting of a divide-by-zero exception for another sub-operand.
However, when two or more exception conditions are generated for one sub-
operand, only one exception condition is reported, according to the precedences
shown in Table 6-8. This exception precedence sometimes results in the higher
priority exception condition being reported and the lower priority exception
conditions being ignored.
Exception Error Code
None.
Table 6-8. SIMD Floating-Point Exceptions Priority
Priority       Description
1 (Highest)    Invalid operation exception due to SNaN operand (or any NaN operand for
               maximum, minimum, or certain compare and convert operations).
2              QNaN operand (see note 1).
3              Any other invalid operation exception not mentioned above or a
               divide-by-zero exception (see note 2).
4              Denormal operand exception (see note 2).
5              Numeric overflow and underflow exceptions possibly in conjunction with
               the inexact result exception (see note 2).
6 (Lowest)     Inexact result exception.
NOTES:
1. Though a QNaN is not an exception, the handling of a QNaN operand has precedence over
   lower priority exceptions. For example, a QNaN divided by zero results in a QNaN, not a
   divide-by-zero exception.
2. If masked, then instruction execution continues, and a lower priority exception can occur as
   well.
Saved Instruction Pointer
The saved contents of CS and EIP registers point to the SSE/SSE2/SSE3 instruction
that was executed when the SIMD floating-point exception was generated. This is the
faulting instruction in which the error condition was detected.
Program State Change
A program-state change does not accompany a SIMD floating-point exception
because the handling of the exception is immediate unless the particular exception is
masked. The available state information is often sufficient to allow recovery from the
error and re-execution of the faulting instruction if needed.
Interrupts 32 to 255 - User Defined Interrupts
Exception Class Not applicable.
Description
Indicates that the processor did one of the following things:
• Executed an INT n instruction where the instruction operand is one of the vector
  numbers from 32 through 255.
• Responded to an interrupt request at the INTR pin or from the local APIC when
  the interrupt vector number associated with the request is from 32 through 255.
Exception Error Code
Not applicable.
Saved Instruction Pointer
The saved contents of CS and EIP registers point to the instruction that follows the
INT n instruction or to the instruction following the one on which the INTR signal
occurred.
Program State Change
A program-state change does not accompany interrupts generated by the INT n
instruction or the INTR signal. The INT n instruction generates the interrupt within
the instruction stream. When the processor receives an INTR signal, it commits all
state changes for all previous instructions before it responds to the interrupt; so,
program execution can resume upon returning from the interrupt handler.
CHAPTER 7
TASK MANAGEMENT
This chapter describes the IA-32 architecture's task management facilities. These
facilities are only available when the processor is running in protected mode.
This chapter focuses on 32-bit tasks and the 32-bit TSS structure. For information on
16-bit tasks and the 16-bit TSS structure, see Section 7.6, "16-Bit Task-State
Segment (TSS)." For information specific to task management in 64-bit mode, see
Section 7.7, "Task Management in 64-bit Mode."
7.1 TASK MANAGEMENT OVERVIEW
A task is a unit of work that a processor can dispatch, execute, and suspend. It can
be used to execute a program, a task or process, an operating-system service utility,
an interrupt or exception handler, or a kernel or executive utility.
The IA-32 architecture provides a mechanism for saving the state of a task, for
dispatching tasks for execution, and for switching from one task to another. When
operating in protected mode, all processor execution takes place from within a task.
Even simple systems must define at least one task. More complex systems can use
the processor's task management facilities to support multitasking applications.
7.1.1 Task Structure
A task is made up of two parts: a task execution space and a task-state segment
(TSS). The task execution space consists of a code segment, a stack segment, and
one or more data segments (see Figure 7-1). If an operating system or executive
uses the processor's privilege-level protection mechanism, the task execution space
also provides a separate stack for each privilege level.
The TSS specifies the segments that make up the task execution space and provides
a storage place for task state information. In multitasking systems, the TSS also
provides a mechanism for linking tasks.
A task is identified by the segment selector for its TSS. When a task is loaded into the
processor for execution, the segment selector, base address, limit, and segment
descriptor attributes for the TSS are loaded into the task register (see Section 2.4.4,
"Task Register (TR)").
If paging is implemented for the task, the base address of the page directory used by
the task is loaded into control register CR3.
7.1.2 Task State
The following items define the state of the currently executing task:
• The task's current execution space, defined by the segment selectors in the
  segment registers (CS, DS, SS, ES, FS, and GS).
• The state of the general-purpose registers.
• The state of the EFLAGS register.
• The state of the EIP register.
• The state of control register CR3.
• The state of the task register.
• The state of the LDTR register.
• The I/O map base address and I/O map (contained in the TSS).
• Stack pointers to the privilege 0, 1, and 2 stacks (contained in the TSS).
• Link to previously executed task (contained in the TSS).
Prior to dispatching a task, all of these items are contained in the task's TSS, except
the state of the task register. Also, the complete contents of the LDTR register are not
contained in the TSS, only the segment selector for the LDT.
Figure 7-1. Structure of a Task
[Figure: the task register points to the task-state segment (TSS). The TSS references
the code segment, the data segment, the stack segment for the current privilege level,
and the stack segments for privilege levels 0, 1, and 2. CR3 is shown alongside the
task register.]
7.1.3 Executing a Task
Software or the processor can dispatch a task for execution in one of the following
ways:
• An explicit call to a task with the CALL instruction.
• An explicit jump to a task with the JMP instruction.
• An implicit call (by the processor) to an interrupt-handler task.
• An implicit call to an exception-handler task.
• A return (initiated with an IRET instruction) when the NT flag in the EFLAGS
  register is set.
All of these methods for dispatching a task identify the task to be dispatched with a
segment selector that points to a task gate or the TSS for the task. When dispatching
a task with a CALL or JMP instruction, the selector in the instruction may select the
TSS directly or a task gate that holds the selector for the TSS. When dispatching a
task to handle an interrupt or exception, the IDT entry for the interrupt or exception
must contain a task gate that holds the selector for the interrupt- or exception-
handler TSS.
When a task is dispatched for execution, a task switch occurs between the currently
running task and the dispatched task. During a task switch, the execution
environment of the currently executing task (called the task's state or context) is saved in
its TSS and execution of the task is suspended. The context for the dispatched task is
then loaded into the processor and execution of that task begins with the instruction
pointed to by the newly loaded EIP register. If the task has not been run since the
system was last initialized, the EIP will point to the first instruction of the task's code;
otherwise, it will point to the next instruction after the last instruction that the task
executed when it was last active.
If the currently executing task (the calling task) called the task being dispatched (the
called task), the TSS segment selector for the calling task is stored in the TSS of the
called task to provide a link back to the calling task.
For all IA-32 processors, tasks are not recursive. A task cannot call or jump to itself.
Interrupts and exceptions can be handled with a task switch to a handler task. Here,
the processor performs a task switch to handle the interrupt or exception and
automatically switches back to the interrupted task upon returning from the interrupt-
handler task or exception-handler task. This mechanism can also handle interrupts
that occur during interrupt tasks.
As part of a task switch, the processor can also switch to another LDT, allowing each
task to have a different logical-to-physical address mapping for LDT-based segments.
The page-directory base register (CR3) also is reloaded on a task switch, allowing
each task to have its own set of page tables. These protection facilities help isolate
tasks and prevent them from interfering with one another.
If protection mechanisms are not used, the processor provides no protection
between tasks. This is true even with operating systems that use multiple privilege
levels for protection. A task running at privilege level 3 that uses the same LDT and
page tables as other privilege-level-3 tasks can access code and corrupt data and the
stack of other tasks.
Use of task management facilities for handling multitasking applications is optional.
Multitasking can be handled in software, with each software-defined task executed in
the context of a single IA-32 architecture task.
7.2 TASK MANAGEMENT DATA STRUCTURES
The processor defines five data structures for handling task-related activities:
• Task-state segment (TSS).
• Task-gate descriptor.
• TSS descriptor.
• Task register.
• NT flag in the EFLAGS register.
When operating in protected mode, a TSS and TSS descriptor must be created for at
least one task, and the segment selector for the TSS must be loaded into the task
register (using the LTR instruction).
7.2.1 Task-State Segment (TSS)
The processor state information needed to restore a task is saved in a system
segment called the task-state segment (TSS). Figure 7-2 shows the format of a TSS
for tasks designed for 32-bit CPUs. The fields of a TSS are divided into two main
categories: dynamic fields and static fields.
For information about 16-bit Intel 286 processor task structures, see Section 7.6,
"16-Bit Task-State Segment (TSS)." For information about 64-bit mode task
structures, see Section 7.7, "Task Management in 64-bit Mode."
The processor updates dynamic fields when a task is suspended during a task switch.
The following are dynamic fields:
• General-purpose register fields: State of the EAX, ECX, EDX, EBX, ESP, EBP,
  ESI, and EDI registers prior to the task switch.
• Segment selector fields: Segment selectors stored in the ES, CS, SS, DS, FS,
  and GS registers prior to the task switch.
• EFLAGS register field: State of the EFLAGS register prior to the task switch.
• EIP (instruction pointer) field: State of the EIP register prior to the task
  switch.
• Previous task link field: Contains the segment selector for the TSS of the
  previous task (updated on a task switch that was initiated by a call, interrupt, or
  exception). This field (which is sometimes called the back link field) permits a
  task switch back to the previous task by using the IRET instruction.
Figure 7-2. 32-Bit Task-State Segment (TSS)
[Figure: layout of the 104-byte 32-bit TSS in 4-byte rows, by byte offset: Previous
Task Link (0); ESP0 (4); SS0 (8); ESP1 (12); SS1 (16); ESP2 (20); SS2 (24); CR3 (PDBR)
(28); EIP (32); EFLAGS (36); EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI (40 through 68);
ES, CS, SS, DS, FS, GS (72 through 92); LDT Segment Selector (96); T flag and I/O Map
Base Address (100). The upper 16 bits of each row that holds a 16-bit selector are
reserved. Reserved bits are set to 0.]
The processor reads the static fields, but does not normally change them. These
fields are set up when a task is created. The following are static fields:
• LDT segment selector field: Contains the segment selector for the task's
  LDT.
• CR3 control register field: Contains the base physical address of the page
  directory to be used by the task. Control register CR3 is also known as the page-
  directory base register (PDBR).
• Privilege level-0, -1, and -2 stack pointer fields: These stack pointers
  consist of a logical address made up of the segment selector for the stack
  segment (SS0, SS1, and SS2) and an offset into the stack (ESP0, ESP1, and
  ESP2). Note that the values in these fields are static for a particular task;
  whereas, the SS and ESP values will change if stack switching occurs within the
  task.
• T (debug trap) flag (byte 100, bit 0): When set, the T flag causes the
  processor to raise a debug exception when a task switch to this task occurs (see
  Section 16.3.1.5, "Task-Switch Exception Condition").
• I/O map base address field: Contains a 16-bit offset from the base of the
  TSS to the I/O permission bit map and interrupt redirection bitmap. When
  present, these maps are stored in the TSS at higher addresses. The I/O map base
  address points to the beginning of the I/O permission bit map and the end of the
  interrupt redirection bit map. See Chapter 13, "Input/Output," in the Intel 64
  and IA-32 Architectures Software Developer's Manual, Volume 1, for more
  information about the I/O permission bit map. See Section 17.3, "Interrupt and
  Exception Handling in Virtual-8086 Mode," for a detailed description of the
  interrupt redirection bit map.
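The dynamic and static fields described above can be expressed as a C structure. The following is a sketch only: the structure and member names are hypothetical, while the field order and byte offsets follow the 32-bit TSS layout of Figure 7-2. All members are naturally aligned, so the sketch assumes the compiler inserts no padding; the static assertions check that assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit task-state segment; comments give the byte offset of each row. */
struct tss32 {
    uint16_t prev_task_link, reserved0;  /*   0: dynamic                    */
    uint32_t esp0;                       /*   4: static                     */
    uint16_t ss0, reserved1;             /*   8: static                     */
    uint32_t esp1;                       /*  12: static                     */
    uint16_t ss1, reserved2;             /*  16: static                     */
    uint32_t esp2;                       /*  20: static                     */
    uint16_t ss2, reserved3;             /*  24: static                     */
    uint32_t cr3;                        /*  28: static (PDBR)              */
    uint32_t eip;                        /*  32: dynamic                    */
    uint32_t eflags;                     /*  36: dynamic                    */
    uint32_t eax, ecx, edx, ebx;         /*  40, 44, 48, 52: dynamic        */
    uint32_t esp, ebp, esi, edi;         /*  56, 60, 64, 68: dynamic        */
    uint16_t es, reserved4;              /*  72: dynamic                    */
    uint16_t cs, reserved5;              /*  76: dynamic                    */
    uint16_t ss, reserved6;              /*  80: dynamic                    */
    uint16_t ds, reserved7;              /*  84: dynamic                    */
    uint16_t fs, reserved8;              /*  88: dynamic                    */
    uint16_t gs, reserved9;              /*  92: dynamic                    */
    uint16_t ldt_selector, reserved10;   /*  96: static                     */
    uint16_t trap;                       /* 100: bit 0 is the T flag        */
    uint16_t io_map_base;                /* 102: offset of I/O bit map      */
};

/* The processor-defined part of the TSS is 104 bytes (limit 67H). */
static_assert(sizeof(struct tss32) == 104, "unexpected padding in struct tss32");
static_assert(offsetof(struct tss32, trap) == 100, "T flag must be at byte 100");
```

An operating system that appends an I/O permission bit map or its own per-task data would place it after these 104 bytes and set io_map_base accordingly.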
If paging is used:
• Avoid placing a page boundary in the part of the TSS that the processor reads
  during a task switch (the first 104 bytes). The processor may not correctly
  perform address translations if a boundary occurs in this area. During a task
  switch, the processor reads and writes into the first 104 bytes of each TSS (using
  contiguous physical addresses beginning with the physical address of the first
  byte of the TSS). So, after TSS access begins, if part of the 104 bytes is not
  physically contiguous, the processor will access incorrect information without
  generating a page-fault exception.
• Pages corresponding to the previous task's TSS, the current task's TSS, and the
  descriptor table entries for each all should be marked as read/write.
• Task switches are carried out faster if the pages containing these structures are
  present in memory before the task switch is initiated.
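The first recommendation above can be checked when a TSS is allocated. The sketch below assumes 4-KByte pages and a 32-bit linear base address; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES       4096u  /* assumption: 4-KByte pages          */
#define TSS32_PROCESSOR_BYTES 104u   /* part read/written on a task switch */

static_assert(TSS32_PROCESSOR_BYTES <= PAGE_SIZE_BYTES,
              "the processor-accessed part must fit in one page");

/* Returns true if the first 104 bytes of a TSS starting at tss_base
 * straddle a page boundary. */
static bool tss_crosses_page_boundary(uint32_t tss_base)
{
    /* Widen before adding so a base near 4 GBytes cannot wrap around. */
    uint64_t last_byte = (uint64_t)tss_base + TSS32_PROCESSOR_BYTES - 1u;
    return (tss_base / PAGE_SIZE_BYTES) != (last_byte / PAGE_SIZE_BYTES);
}
```

An allocator could call this on each candidate address and move the TSS to the next page when it returns true.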
7.2.2 TSS Descriptor
The TSS, like all other segments, is defined by a segment descriptor. Figure 7-3
shows the format of a TSS descriptor. TSS descriptors may only be placed in the GDT;
they cannot be placed in an LDT or the IDT.
An attempt to access a TSS using a segment selector with its TI flag set (which
indicates the current LDT) causes a general-protection exception (#GP) to be generated
during CALLs and JMPs; it causes an invalid TSS exception (#TS) during IRETs. A
general-protection exception is also generated if an attempt is made to load a
segment selector for a TSS into a segment register.
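To make the TI-flag rule concrete, the sketch below splits a 16-bit segment selector into its fields. The names are hypothetical; the field positions (RPL in bits 1:0, TI in bit 2, index in bits 15:3) come from the segment selector format defined elsewhere in the manual, not from this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct selector_fields {
    unsigned index; /* bits 15:3, descriptor index in the GDT or LDT */
    unsigned ti;    /* bit 2, table indicator: 0 = GDT, 1 = LDT      */
    unsigned rpl;   /* bits 1:0, requested privilege level           */
};

static uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl)
{
    assert(index < 8192u && ti <= 1u && rpl <= 3u);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}

static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 1u;
    f.rpl   = sel & 3u;
    return f;
}

/* A TSS descriptor may only live in the GDT, so a selector with TI set
 * can never legitimately name a TSS. */
static bool selector_can_name_tss(uint16_t sel)
{
    return decode_selector(sel).ti == 0u;
}
```

For example, selector 28H names GDT entry 5 with RPL 0, while 2FH names LDT entry 5 with RPL 3 and would be rejected as a TSS selector.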
The busy flag (B) in the type field indicates whether the task is busy. A busy task is
currently running or suspended. A type field with a value of 1001B indicates an
inactive task; a value of 1011B indicates a busy task. Tasks are not recursive. The
processor uses the busy flag to detect an attempt to call a task whose execution has
been interrupted. To ensure that only one busy flag is associated with a task,
each TSS should have only one TSS descriptor that points to it.
Figure 7-3. TSS Descriptor
[Figure: 8-byte TSS descriptor. Bytes 0 through 3: Base Address 15:00 in bits 31:16,
Segment Limit 15:00 in bits 15:0. Bytes 4 through 7: Base 31:24 in bits 31:24, G in
bit 23, 0 in bits 22:21, AVL in bit 20, Limit 19:16 in bits 19:16, P in bit 15, DPL in
bits 14:13, 0 in bit 12, Type (10B1) in bits 11:8, Base 23:16 in bits 7:0.]
AVL      Available for use by system software
B        Busy flag
BASE     Segment Base Address
DPL      Descriptor Privilege Level
G        Granularity
LIMIT    Segment Limit
P        Segment Present
TYPE     Segment Type
The base, limit, and DPL fields and the granularity and present flags have functions
similar to their use in data-segment descriptors (see Section 3.4.5, "Segment
Descriptors"). When the G flag is 0 in a TSS descriptor for a 32-bit TSS, the limit field
must have a value equal to or greater than 67H, one byte less than the minimum size
of a TSS. Attempting to switch to a task whose TSS descriptor has a limit less than
67H generates an invalid-TSS exception (#TS). A larger limit is required if an I/O
permission bit map is included or if the operating system stores additional data. The
processor does not check for a limit greater than 67H on a task switch; however, it
does check when accessing the I/O permission bit map or interrupt redirection bit
map.
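As an illustration of the descriptor format and the 67H minimum limit, the following sketch assembles the two doublewords of a 32-bit TSS descriptor with the G and AVL flags clear and the P flag set. The structure and function names are hypothetical, and the code is a sketch of the bit packing, not a definitive GDT setup routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TSS32_MIN_LIMIT 0x67u  /* one byte less than the minimum TSS size */

struct tss_descriptor {
    uint32_t low;   /* bytes 0..3 of the descriptor */
    uint32_t high;  /* bytes 4..7 of the descriptor */
};

static struct tss_descriptor
make_tss_descriptor(uint32_t base, uint32_t limit, unsigned dpl, bool busy)
{
    assert(limit >= TSS32_MIN_LIMIT && limit <= 0xFFFFFu);
    assert(dpl <= 3u);

    uint32_t type = busy ? 0xBu : 0x9u;  /* 1011B busy, 1001B inactive */
    struct tss_descriptor d;

    d.low  = ((base & 0xFFFFu) << 16)    /* Base Address 15:00  */
           | (limit & 0xFFFFu);          /* Segment Limit 15:00 */
    d.high = (base & 0xFF000000u)        /* Base 31:24          */
           | (limit & 0x000F0000u)       /* Limit 19:16         */
           | (1u << 15)                  /* P = 1               */
           | ((uint32_t)dpl << 13)       /* DPL                 */
           | (type << 8)                 /* Type                */
           | ((base >> 16) & 0xFFu);     /* Base 23:16          */
    return d;
}

/* The busy flag is bit 1 of the type field, that is, bit 9 of the
 * high doubleword. */
static bool tss_descriptor_is_busy(struct tss_descriptor d)
{
    return (d.high >> 9) & 1u;
}
```

For a TSS at base AB123456H with limit 67H and DPL 0, the inactive descriptor is low = 34560067H, high = AB008912H; marking it busy changes the high doubleword to AB008B12H.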
Any program or procedure with access to a TSS descriptor (that is, whose CPL is
numerically equal to or less than the DPL of the TSS descriptor) can dispatch the task
with a call or a jump.
In most systems, the DPLs of TSS descriptors are set to values less than 3, so that
only privileged software can perform task switching. However, in multitasking
applications, DPLs for some TSS descriptors may be set to 3 to allow task switching at the
application (or user) privilege level.
7.2.3 TSS Descriptor in 64-bit mode
In 64-bit mode, task switching is not supported, but TSS descriptors still exist. The
format of a 64-bit TSS is described in Section 7.7.
In 64-bit mode, the TSS descriptor is expanded to 16 bytes (see Figure 7-4). This
expansion also applies to an LDT descriptor in 64-bit mode. Table 3-2 provides the
encoding information for the segment type field.
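The 16-byte expansion can be sketched as follows. The names are hypothetical. The sketch assumes that the low 8 bytes keep the legacy layout, that bytes 8 through 11 hold base address bits 63:32, that bytes 12 through 15 are reserved and written as 0, and that type values 1001B and 1011B denote an available and a busy 64-bit TSS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct tss_descriptor64 {
    uint32_t dword[4];  /* bytes 0..3, 4..7, 8..11, 12..15 */
};

static struct tss_descriptor64
make_tss_descriptor64(uint64_t base, uint32_t limit, unsigned dpl, bool busy)
{
    assert(limit <= 0xFFFFFu);
    assert(dpl <= 3u);

    uint32_t base_lo = (uint32_t)base;   /* base bits 31:0 */
    uint32_t type = busy ? 0xBu : 0x9u;  /* assumed 64-bit TSS type codes */
    struct tss_descriptor64 d;

    d.dword[0] = ((base_lo & 0xFFFFu) << 16) | (limit & 0xFFFFu);
    d.dword[1] = (base_lo & 0xFF000000u)     /* Base 31:24       */
               | (limit & 0x000F0000u)       /* Limit 19:16      */
               | (1u << 15)                  /* P = 1            */
               | ((uint32_t)dpl << 13)       /* DPL              */
               | (type << 8)                 /* Type             */
               | ((base_lo >> 16) & 0xFFu);  /* Base 23:16       */
    d.dword[2] = (uint32_t)(base >> 32);     /* Base 63:32       */
    d.dword[3] = 0u;                         /* reserved         */
    return d;
}
```

Because the descriptor occupies two consecutive 8-byte GDT slots, the entry that follows it starts 16 bytes later.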
7.2.4 Task Register
The task register holds the 16-bit segment selector and the entire segment
descriptor (32-bit base address, 16-bit segment limit, and descriptor attributes) for
the TSS of the current task (see Figure 2-5). This information is copied from the TSS
descriptor in the GDT for the current task. Figure 7-5 shows the path the processor
uses to access the TSS (using the information in the task register).
The task register has a visible part (that can be read and changed by software) and
an invisible part (maintained by the processor and is inaccessible by software). The
segment selector in the visible portion points to a TSS descriptor in the GDT. The
processor uses the invisible portion of the task register to cache the segment
descriptor for the TSS. Caching these values in a register makes execution of the task
more efficient. The LTR (load task register) and STR (store task register) instructions
load and read the visible portion of the task register:
Figure 7-4. Format of TSS and LDT Descriptors in 64-bit Mode
[Figure: 16-byte TSS (or LDT) descriptor. Bytes 0 through 3: Base Address 15:00 in
bits 31:16, Segment Limit 15:00 in bits 15:0. Bytes 4 through 7: Base 31:24, G, 0, 0,
AVL, Limit 19:16, P, DPL, 0, Type, Base 23:16. Bytes 8 through 11: Base Address 63:32.
Bytes 12 through 15: reserved, with bits 12:8 set to 0.]
AVL      Available for use by system software
B        Busy flag
BASE     Segment Base Address
DPL      Descriptor Privilege Level
G        Granularity
LIMIT    Segment Limit
P        Segment Present
TYPE     Segment Type
The LTR instruction loads a segment selector (source operand) into the task register
that points to a TSS descriptor in the GDT. It then loads the invisible portion of the
task register with information from the TSS descriptor. LTR is a privileged instruction
that may be executed only when the CPL is 0. It is used during system initialization to
put an initial value in the task register. Afterwards, the contents of the task register
are changed implicitly when a task switch occurs.
The STR (store task register) instruction stores the visible portion of the task register
in a general-purpose register or memory. This instruction can be executed by code
running at any privilege level in order to identify the currently running task. However,
it is normally used only by operating system software.
On power up or reset of the processor, the segment selector and base address are set
to the default value of 0; the limit is set to FFFFH.
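Figure 7-5. Task Register
[Figure: the visible part of the task register holds the selector, which indexes the
TSS descriptor in the GDT. The invisible part holds the base address and segment
limit cached from that descriptor; the base address locates the TSS.]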
7.2.5 Task-Gate Descriptor
A task-gate descriptor provides an indirect, protected reference to a task (see
Figure 7-6). It can be placed in the GDT, an LDT, or the IDT. The TSS segment
selector field in a task-gate descriptor points to a TSS descriptor in the GDT. The RPL
in this segment selector is not used.
The DPL of a task-gate descriptor controls access to the TSS descriptor during a task
switch. When a program or procedure makes a call or jump to a task through a task
gate, the CPL and the RPL field of the gate selector pointing to the task gate must be
less than or equal to the DPL of the task-gate descriptor. Note that when a task gate
is used, the DPL of the destination TSS descriptor is not used.
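The privilege rule for a call or jump to a task can be written as a single comparison. The sketch below uses a hypothetical function name; the dpl argument is the DPL of whichever descriptor the selector in the instruction references (the task gate when one is used, otherwise the TSS descriptor).

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if a JMP or CALL may dispatch the task. Lower numbers mean
 * more privilege, so both CPL and RPL must be numerically <= the DPL of the
 * referenced task gate or TSS descriptor. */
static bool task_dispatch_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    assert(cpl <= 3u && rpl <= 3u && dpl <= 3u);
    return cpl <= dpl && rpl <= dpl;
}
```

With a TSS descriptor of DPL 0, code at CPL 3 fails this check, but the same code can reach the task through a task gate whose DPL is 3, because only the gate's DPL is consulted in that case.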
A task can be accessed either through a task-gate descriptor or a TSS descriptor.
Both of these structures satisfy the following needs:
• Need for a task to have only one busy flag: Because the busy flag for a task
  is stored in the TSS descriptor, each task should have only one TSS descriptor.
  There may, however, be several task gates that reference the same TSS
  descriptor.
• Need to provide selective access to tasks: Task gates fill this need, because
  they can reside in an LDT and can have a DPL that is different from the TSS
  descriptor's DPL. A program or procedure that does not have sufficient privilege
  to access the TSS descriptor for a task in the GDT (which usually has a DPL of 0)
  may be allowed access to the task through a task gate with a higher DPL. Task
  gates give the operating system greater latitude for limiting access to specific
  tasks.
• Need for an interrupt or exception to be handled by an independent task:
  Task gates may also reside in the IDT, which allows interrupts and exceptions
  to be handled by handler tasks. When an interrupt or exception vector points to
  a task gate, the processor switches to the specified task.
Figure 7-6. Task-Gate Descriptor
[Figure: 8-byte task-gate descriptor. Bytes 0 through 3: TSS Segment Selector in bits
31:16, bits 15:0 reserved. Bytes 4 through 7: bits 31:16 reserved, P in bit 15, DPL in
bits 14:13, 0 in bit 12, Type (0101) in bits 11:8, bits 7:0 reserved.]
DPL      Descriptor Privilege Level
P        Segment Present
TYPE     Segment Type
Figure 7-7 illustrates how a task gate in an LDT, a task gate in the GDT, and a task
gate in the IDT can all point to the same task.
7.3 TASK SWITCHING
The processor transfers execution to another task in one of four cases:
• The current program, task, or procedure executes a JMP or CALL instruction to a
  TSS descriptor in the GDT.
• The current program, task, or procedure executes a JMP or CALL instruction to a
  task-gate descriptor in the GDT or the current LDT.
• An interrupt or exception vector points to a task-gate descriptor in the IDT.
• The current task executes an IRET when the NT flag in the EFLAGS register is set.
Figure 7-7. Task Gates Referencing the Same Task
[Figure: a task gate in an LDT, a task gate in the GDT, and a task gate in the IDT all
reference the same TSS descriptor in the GDT, which points to the TSS.]
JMP, CALL, and IRET instructions, as well as interrupts and exceptions, are all
mechanisms for redirecting a program. The referencing of a TSS descriptor or a task gate
(when calling or jumping to a task) or the state of the NT flag (when executing an
IRET instruction) determines whether a task switch occurs.
The processor performs the following operations when switching to a new task:
1. Obtains the TSS segment selector for the new task as the operand of the JMP or
   CALL instruction, from a task gate, or from the previous task link field (for a task
   switch initiated with an IRET instruction).
2. Checks that the current (old) task is allowed to switch to the new task. Data-
   access privilege rules apply to JMP and CALL instructions. The CPL of the current
   (old) task and the RPL of the segment selector for the new task must be less than
   or equal to the DPL of the TSS descriptor or task gate being referenced.
   Exceptions, interrupts (except for interrupts generated by the INT n instruction),
   and the IRET instruction are permitted to switch tasks regardless of the DPL of
   the destination task-gate or TSS descriptor. For interrupts generated by the INT n
   instruction, the DPL is checked.
3. Checks that the TSS descriptor of the new task is marked present and has a valid
   limit (greater than or equal to 67H).
4. Checks that the new task is available (call, jump, exception, or interrupt) or busy
   (IRET return).
5. Checks that the current (old) TSS, new TSS, and all segment descriptors used in
   the task switch are paged into system memory.
6. If the task switch was initiated with a JMP or IRET instruction, the processor
   clears the busy (B) flag in the current (old) task's TSS descriptor; if initiated with
   a CALL instruction, an exception, or an interrupt, the busy (B) flag is left set.
   (See Table 7-2.)
7. If the task switch was initiated with an IRET instruction, the processor clears the
   NT flag in a temporarily saved image of the EFLAGS register; if initiated with a
   CALL or JMP instruction, an exception, or an interrupt, the NT flag is left
   unchanged in the saved EFLAGS image.
8. Saves the state of the current (old) task in the current task's TSS. The processor
   finds the base address of the current TSS in the task register and then copies the
   states of the following registers into the current TSS: all the general-purpose
   registers, segment selectors from the segment registers, the temporarily saved
   image of the EFLAGS register, and the instruction pointer register (EIP).
9. If the task switch was initiated with a CALL instruction, an exception, or an
   interrupt, the processor will set the NT flag in the EFLAGS loaded from the new
   task. If initiated with an IRET instruction or JMP instruction, the NT flag will reflect
   the state of NT in the EFLAGS loaded from the new task (see Table 7-2).
10. If the task switch was initiated with a CALL instruction, JMP instruction, an
    exception, or an interrupt, the processor sets the busy (B) flag in the new task's
    TSS descriptor; if initiated with an IRET instruction, the busy (B) flag is left set.
11. Loads the task register with the segment selector and descriptor for the new
    task's TSS.
12. The TSS state is loaded into the processor. This includes the LDTR register, the
    PDBR (control register CR3), the EFLAGS register, the EIP register, the general-
    purpose registers, and the segment selectors. Note that a fault during the load of
    this state may corrupt architectural state.
13. The descriptors associated with the segment selectors are loaded and qualified.
    Any errors associated with this loading and qualification occur in the context of
    the new task.
NOTES
    If all checks and saves have been carried out successfully, the
    processor commits to the task switch. If an unrecoverable error
    occurs in steps 1 through 11, the processor does not complete the
    task switch and ensures that the processor is returned to its state
    prior to the execution of the instruction that initiated the task switch.
    If an unrecoverable error occurs in step 12, architectural state may
    be corrupted, but an attempt will be made to handle the error in the
    prior execution environment. If an unrecoverable error occurs after
    the commit point (in step 13), the processor completes the task
    switch (without performing additional access and segment
    availability checks) and generates the appropriate exception prior to
    beginning execution of the new task.
    If exceptions occur after the commit point, the exception handler
    must finish the task switch itself before allowing the processor to
    begin executing the new task. See Chapter 6, "Interrupt 10 - Invalid
    TSS Exception (#TS)," for more information about the effect of
    exceptions on a task when they occur after the commit point of a task
    switch.
14. Begins executing the new task. (To an exception handler, the first instruction of
    the new task appears not to have been executed.)
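The flag handling in steps 4, 6, 7, 9, and 10 depends only on how the task switch was initiated. The sketch below tabulates those effects; it is a summary of the steps as stated above, with hypothetical type and member names, and it does not model the rest of the task switch.

```c
#include <assert.h>
#include <stdbool.h>

/* How the task switch was initiated (names are illustrative). */
enum switch_initiator {
    SWITCH_JMP,
    SWITCH_CALL_INT_OR_EXCEPTION,
    SWITCH_IRET
};

struct switch_flag_effects {
    bool new_task_must_be_busy;    /* step 4: busy for IRET, else available */
    bool clear_old_busy;           /* step 6: B flag of the old task        */
    bool clear_nt_in_saved_image;  /* step 7: NT in saved EFLAGS image      */
    bool set_nt_in_new_eflags;     /* step 9: NT in EFLAGS of the new task  */
    bool set_new_busy;             /* step 10: B flag of the new task       */
};

static struct switch_flag_effects flag_effects(enum switch_initiator how)
{
    struct switch_flag_effects fx = { false, false, false, false, false };

    switch (how) {
    case SWITCH_JMP:
        fx.clear_old_busy = true;
        fx.set_new_busy = true;
        break;
    case SWITCH_CALL_INT_OR_EXCEPTION:
        /* The old task stays busy because it is linked to the new task. */
        fx.set_nt_in_new_eflags = true;
        fx.set_new_busy = true;
        break;
    case SWITCH_IRET:
        /* The new task is already busy, so its B flag is left set. */
        fx.new_task_must_be_busy = true;
        fx.clear_old_busy = true;
        fx.clear_nt_in_saved_image = true;
        break;
    default:
        assert(!"unknown task-switch initiator");
        break;
    }
    return fx;
}
```

Reading the table by column shows why tasks are not recursive: a CALL, interrupt, or exception leaves the old task busy, and step 4 rejects any later call or jump to a task whose busy flag is set.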
The state of the currently executing task is always saved when a successful task
switch occurs. If the task is resumed, execution starts with the instruction pointed to
by the saved EIP value, and the registers are restored to the values they held when
the task was suspended.
When switching tasks, the new task does not inherit its privilege level from the
suspended task. The new task begins executing at the privilege level
specified in the CPL field of the CS register, which is loaded from the TSS. Because
tasks are isolated by their separate address spaces and TSSs and because privilege
rules control access to a TSS, software does not need to perform explicit privilege
checks on a task switch.
Table 7- 1 shows t he except ion condit ions t hat t he processor checks for when
swit ching t asks. I t also shows t he except ion t hat is generat ed for each check if an
error is det ect ed and t he segment t hat t he error code references. ( The order of t he
checks in t he t able is t he order used in t he P6 family processors. The exact order is
model specific and may be different for ot her I A- 32 processors. ) Except ion handlers
designed t o handle t hese except ions may be subj ect t o recursive calls if t hey at t empt
t o reload t he segment select or t hat generat ed t he except ion. The cause of t he excep-
t ion ( or t he first of mult iple causes) should be fixed before reloading t he select or.
Table 7-1. Exception Conditions Checked During a Task Switch

Condition Checked | Exception (1) | Error Code Reference (2)
------------------------------------------------------------------------------------------
Segment selector for a TSS descriptor references the GDT and is within the limits of the table. | #GP; #TS (for IRET) | New Task's TSS
TSS descriptor is present in memory. | #NP | New Task's TSS
TSS descriptor is not busy (for task switch initiated by a call, interrupt, or exception). | #GP (for JMP, CALL, INT) | Task's back-link TSS
TSS descriptor is not busy (for task switch initiated by an IRET instruction). | #TS (for IRET) | New Task's TSS
TSS segment limit greater than or equal to 108 (for 32-bit TSS) or 44 (for 16-bit TSS). | #TS | New Task's TSS
Registers are loaded from the values in the TSS. | |
LDT segment selector of new task is valid (3). | #TS | New Task's LDT
Code segment DPL matches segment selector RPL. | #TS | New Code Segment
SS segment selector is valid (2). | #TS | New Stack Segment
Stack segment is present in memory. | #SS | New Stack Segment
Stack segment DPL matches CPL. | #TS | New Stack Segment
LDT of new task is present in memory. | #TS | New Task's LDT
CS segment selector is valid (3). | #TS | New Code Segment
Code segment is present in memory. | #NP | New Code Segment
Stack segment DPL matches selector RPL. | #TS | New Stack Segment
DS, ES, FS, and GS segment selectors are valid (3). | #TS | New Data Segment
DS, ES, FS, and GS segments are readable. | #TS | New Data Segment
DS, ES, FS, and GS segments are present in memory. | #NP | New Data Segment
DS, ES, FS, and GS segment DPL greater than or equal to CPL (unless these are conforming segments). | #TS | New Data Segment

NOTES:
1. #NP is segment-not-present exception, #GP is general-protection exception, #TS is invalid-TSS
exception, and #SS is stack-fault exception.
2. The error code contains an index to the segment descriptor referenced in this column.
3. A segment selector is valid if it is in a compatible type of table (GDT or LDT), occupies an address
within the table's segment limit, and refers to a compatible type of descriptor (for example, a
segment selector in the CS register only is valid when it points to a code-segment descriptor).

The TS (task switched) flag in the control register CR0 is set every time a task switch
occurs. System software uses the TS flag to coordinate the actions of the floating-point
unit when generating floating-point exceptions with the rest of the processor. The TS
flag indicates that the context of the floating-point unit may be different from that of
the current task. See Section 2.5, "Control Registers," for a detailed description of
the function and use of the TS flag.
7.4 TASK LINKING
The previous task link field of the TSS (sometimes called the "backlink") and the NT
flag in the EFLAGS register are used to return execution to the previous task.
EFLAGS.NT = 1 indicates that the currently executing task is nested within the
execution of another task.
When a CALL instruction, an interrupt, or an exception causes a task switch, the
processor copies the segment selector for the current TSS to the previous task link
field of the TSS for the new task; it then sets EFLAGS.NT = 1. If software uses an
IRET instruction to suspend the new task, the processor checks for EFLAGS.NT = 1;
it then uses the value in the previous task link field to return to the previous task. See
Figure 7-8.
When a JMP instruction causes a task switch, the new task is not nested. The
previous task link field is not used and EFLAGS.NT = 0. Use a JMP instruction to
dispatch a new task when nesting is not desired.
Table 7-2 shows the busy flag (in the TSS segment descriptor), the NT flag, the
previous task link field, and TS flag (in control register CR0) during a task switch.
The NT flag may be modified by software executing at any privilege level. It is
possible for a program to set the NT flag and execute an IRET instruction. This might
randomly invoke the task specified in the previous link field of the current task's TSS.
To keep such spurious task switches from succeeding, the operating system should
initialize the previous task link field in every TSS that it creates to 0.
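As an illustration of the initialization advice above, the following C sketch zeroes a 32-bit TSS image so that its previous task link field starts at 0. The structure name, the field names, and the helper tss32_init are hypothetical. The 104-byte layout is an assumption that should be checked against the TSS format figure earlier in this chapter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed image of a 32-bit TSS; names are illustrative only. */
struct tss32 {
    uint16_t prev_task_link, reserved0;      /* offset 0: previous task link */
    uint32_t esp0; uint16_t ss0, reserved1;
    uint32_t esp1; uint16_t ss1, reserved2;
    uint32_t esp2; uint16_t ss2, reserved3;
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint16_t es, reserved4, cs, reserved5, ss, reserved6;
    uint16_t ds, reserved7, fs, reserved8, gs, reserved9;
    uint16_t ldt_selector, reserved10;
    uint16_t trap;                           /* bit 0 is the debug trap flag */
    uint16_t iomap_base;
};

static_assert(sizeof(struct tss32) == 104, "assumed TSS image size");
static_assert(offsetof(struct tss32, prev_task_link) == 0, "link at offset 0");

/* Clear a freshly allocated TSS so that a stray IRET with NT = 1
 * finds selector 0 in the previous task link field. */
void tss32_init(struct tss32 *tss)
{
    memset(tss, 0, sizeof *tss);
    tss->prev_task_link = 0;   /* redundant after memset; kept for emphasis */
}
```

A real operating system would go on to fill in the stack pointers, segment selectors, and entry point before the task is dispatched; the sketch stops at the link field because that is the only point the paragraph makes.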
Figure 7-8. Nested Tasks
[Figure: a chain of TSSs for a top-level task (NT=0), a nested task (NT=1), a more
deeply nested task (NT=1), and the currently executing task (NT=1 in EFLAGS). The
task register selects the current TSS; the previous task link field of each nested TSS
refers to the TSS of the task it suspended.]

Table 7-2. Effect of a Task Switch on Busy Flag, NT Flag,
Previous Task Link Field, and TS Flag

Flag or Field | Effect of JMP Instruction | Effect of CALL Instruction or Interrupt | Effect of IRET Instruction
------------------------------------------------------------------------------------------
Busy (B) flag of new task. | Flag is set. Must have been clear before. | Flag is set. Must have been clear before. | No change. Must have been set.
Busy flag of old task. | Flag is cleared. | No change. Flag is currently set. | Flag is cleared.
NT flag of new task. | Set to value from TSS of new task. | Flag is set. | Set to value from TSS of new task.
NT flag of old task. | No change. | No change. | Flag is cleared.
Previous task link field of new task. | No change. | Loaded with selector for old task's TSS. | No change.
Previous task link field of old task. | No change. | No change. | No change.
TS flag in control register CR0. | Flag is set. | Flag is set. | Flag is set.
7.4.1 Use of Busy Flag To Prevent Recursive Task Switching
A TSS allows only one context to be saved for a task; therefore, once a task is called
(dispatched), a recursive (or re-entrant) call to the task would cause the current
state of the task to be lost. The busy flag in the TSS segment descriptor is provided
to prevent re-entrant task switching and a subsequent loss of task state information.
The processor manages the busy flag as follows:
1. When dispatching a task, the processor sets the busy flag of the new task.
2. If during a task switch, the current task is placed in a nested chain (the task
switch is being generated by a CALL instruction, an interrupt, or an exception),
the busy flag for the current task remains set.
3. When switching to the new task (initiated by a CALL instruction, interrupt, or
exception), the processor generates a general-protection exception (#GP) if the
busy flag of the new task is already set. If the task switch is initiated with an IRET
instruction, the exception is not raised because the processor expects the busy
flag to be set.
4. When a task is terminated by a jump to a new task (initiated with a JMP
instruction in the task code) or by an IRET instruction in the task code, the
processor clears the busy flag, returning the task to the "not busy" state.
The processor prevents recursive task switching by preventing a task from switching
to itself or to any task in a nested chain of tasks. The chain of nested suspended tasks
may grow to any length, due to multiple calls, interrupts, or exceptions. The busy
flag prevents a task from being invoked if it is in this chain.
The busy flag may be used in multiprocessor configurations, because the processor
follows a LOCK protocol (on the bus or in the cache) when it sets or clears the busy
flag. This lock keeps two processors from invoking the same task at the same time.
See Section 8.1.2.1, "Automatic Locking," for more information about setting the
busy flag in multiprocessor applications.
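The busy-flag rules above, together with the NT flag and previous task link behavior of Section 7.4, can be modeled in software. The C sketch below is only a model under stated assumptions: task_model, task_switch, and the enum names are hypothetical, a NULL pointer stands in for selector 0, and faults are reported as return values rather than raised as exceptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum switch_kind   { SWITCH_JMP, SWITCH_CALL, SWITCH_IRET };
enum switch_result { SWITCH_OK, SWITCH_NOT_NESTED,
                     SWITCH_FAULT_GP, SWITCH_FAULT_TS };

/* Hypothetical model of the per-task state touched by a task switch. */
struct task_model {
    bool busy;                     /* B flag in the TSS descriptor     */
    bool nt;                       /* NT flag in the task's EFLAGS     */
    struct task_model *prev_link;  /* previous task link; NULL means 0 */
};

/* Attempt a task switch from *current. For SWITCH_IRET the target
 * argument is ignored and the previous task link is followed. */
enum switch_result task_switch(struct task_model **current,
                               struct task_model *target,
                               enum switch_kind kind)
{
    struct task_model *old = *current;

    if (kind == SWITCH_IRET) {
        if (!old->nt)
            return SWITCH_NOT_NESTED;      /* ordinary IRET, no task switch */
        target = old->prev_link;
        if (target == NULL || !target->busy)
            return SWITCH_FAULT_TS;        /* IRET expects a busy target    */
        old->busy = false;                 /* rule 4: leaving task is freed */
        old->nt = false;
    } else {
        assert(target != NULL);
        if (target->busy)
            return SWITCH_FAULT_GP;        /* rule 3: re-entrant dispatch   */
        target->busy = true;               /* rule 1                        */
        if (kind == SWITCH_CALL) {
            target->nt = true;             /* nested: old task stays busy   */
            target->prev_link = old;       /* rule 2                        */
        } else {
            old->busy = false;             /* rule 4: JMP ends the old task */
        }
    }
    *current = target;
    return SWITCH_OK;
}
```

In this model a CALL from task A to task B followed by a CALL back to A returns SWITCH_FAULT_GP, because A is still marked busy while it sits in the nested chain.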
7.4.2 Modifying Task Linkages
In a uniprocessor system, in situations where it is necessary to remove a task from a
chain of linked tasks, use the following procedure to remove the task:
1. Disable interrupts.
2. Change the previous task link field in the TSS of the pre-empting task (the task
that suspended the task to be removed). It is assumed that the pre-empting task
is the next task (newer task) in the chain from the task to be removed. Change
the previous task link field to point to the TSS of the next oldest task in the chain
or to an even older task in the chain.
3. Clear the busy (B) flag in the TSS segment descriptor for the task being removed
from the chain. If more than one task is being removed from the chain, the busy
flag for each task being removed must be cleared.
4. Enable interrupts.
In a multiprocessing system, additional synchronization and serialization operations
must be added to this procedure to ensure that the TSS and its segment descriptor
are both locked when the previous task link field is changed and the busy flag is
cleared.
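The uniprocessor procedure above can be sketched as follows. This is a model, not kernel code: linked_task and unlink_task are hypothetical names, pointers stand in for TSS selectors, and interrupts_disable and interrupts_enable are stand-ins that only track a flag where real system software would execute CLI and STI. The additional locking needed on a multiprocessor system is deliberately not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one task in a nested chain. */
struct linked_task {
    bool busy;                      /* B flag in the TSS descriptor */
    struct linked_task *prev_link;  /* previous task link field     */
};

/* Stand-ins for CLI and STI; a real kernel would mask interrupts here. */
static bool interrupts_enabled = true;
static void interrupts_disable(void) { interrupts_enabled = false; }
static void interrupts_enable(void)  { interrupts_enabled = true; }

/* Remove 'removed' from the chain. 'preempting' must be the next newer
 * task, that is, the task that suspended 'removed'. */
void unlink_task(struct linked_task *preempting, struct linked_task *removed)
{
    assert(preempting->prev_link == removed);

    interrupts_disable();                        /* step 1 */
    preempting->prev_link = removed->prev_link;  /* step 2: skip over it */
    removed->busy = false;                       /* step 3 */
    interrupts_enable();                         /* step 4 */
}
```

Removing several adjacent tasks would repeat steps 2 and 3 for each one inside a single interrupts-disabled region, so that every removed task ends up with its busy flag clear.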
7.5 TASK ADDRESS SPACE
The address space for a task consists of the segments that the task can access.
These segments include the code, data, stack, and system segments referenced in
the TSS and any other segments accessed by the task code. The segments are
mapped into the processor's linear address space, which is in turn mapped into the
processor's physical address space (either directly or through paging).
The LDT segment field in the TSS can be used to give each task its own LDT. Giving a
task its own LDT allows the task address space to be isolated from other tasks by
placing the segment descriptors for all the segments associated with the task in the
task's LDT.
It also is possible for several tasks to use the same LDT. This is a memory-efficient
way to allow specific tasks to communicate with or control each other, without
dropping the protection barriers for the entire system.
Because all tasks have access to the GDT, it also is possible to create shared
segments accessed through segment descriptors in this table.
If paging is enabled, the CR3 register (PDBR) field in the TSS allows each task to
have its own set of page tables for mapping linear addresses to physical addresses.
Or, several tasks can share the same set of page tables.
7.5.1 Mapping Tasks to the Linear and Physical Address Spaces
Tasks can be mapped to the linear address space and physical address space in one
of two ways:
• One linear-to-physical address space mapping is shared among all tasks.
When paging is not enabled, this is the only choice. Without paging, all linear
addresses map to the same physical addresses. When paging is enabled, this
form of linear-to-physical address space mapping is obtained by using one page
directory for all tasks. The linear address space may exceed the available
physical space if demand-paged virtual memory is supported.
• Each task has its own linear address space that is mapped to the physical
address space. This form of mapping is accomplished by using a different
page directory for each task. Because the PDBR (control register CR3) is loaded
on task switches, each task may have a different page directory.
The linear address spaces of different tasks may map to completely distinct physical
addresses. If the entries of different page directories point to different page tables
and the page tables point to different pages of physical memory, then the tasks do
not share physical addresses.
With either method of mapping task linear address spaces, the TSSs for all tasks
must lie in a shared area of the physical space, which is accessible to all tasks. This
mapping is required so that the mapping of TSS addresses does not change while the
processor is reading and updating the TSSs during a task switch. The linear address
space mapped by the GDT also should be mapped to a shared area of the physical
space; otherwise, the purpose of the GDT is defeated. Figure 7-9 shows how the
linear address spaces of two tasks can overlap in the physical space by sharing page
tables.
Figure 7-9. Overlapping Linear-to-Physical Mappings
[Figure: the PDBR fields of the Task A TSS and the Task B TSS select separate page
directories. The PDEs of each directory refer to page tables private to that task and
to one shared page table (Shared PT); the PTEs map page frames labeled Task A,
Task B, and Shared.]
7.5.2 Task Logical Address Space
To allow the sharing of data among tasks, use the following techniques to create
shared logical-to-physical address-space mappings for data segments:
• Through the segment descriptors in the GDT: All tasks must have access
to the segment descriptors in the GDT. If some segment descriptors in the GDT
point to segments in the linear-address space that are mapped into an area of the
physical-address space common to all tasks, then all tasks can share the data
and code in those segments.
• Through a shared LDT: Two or more tasks can use the same LDT if the LDT
fields in their TSSs point to the same LDT. If some segment descriptors in a
shared LDT point to segments that are mapped to a common area of the physical
address space, the data and code in those segments can be shared among the
tasks that share the LDT. This method of sharing is more selective than sharing
through the GDT, because the sharing can be limited to specific tasks. Other
tasks in the system may have different LDTs that do not give them access to the
shared segments.
• Through segment descriptors in distinct LDTs that are mapped to
common addresses in linear address space: If this common area of the
linear address space is mapped to the same area of the physical address space
for each task, these segment descriptors permit the tasks to share segments.
Such segment descriptors are commonly called aliases. This method of sharing is
even more selective than those listed above, because other segment descriptors
in the LDTs may point to independent linear addresses which are not shared.
7.6 16-BIT TASK-STATE SEGMENT (TSS)
The 32-bit IA-32 processors also recognize a 16-bit TSS format like the one used in
Intel 286 processors (see Figure 7-10). This format is supported for compatibility
with software written to run on earlier IA-32 processors.
The following information is important to know about the 16-bit TSS.
• Do not use a 16-bit TSS to implement a virtual-8086 task.
• The valid segment limit for a 16-bit TSS is 2CH.
• The 16-bit TSS does not contain a field for the base address of the page directory,
which is loaded into control register CR3. A separate set of page tables for each
task is not supported for 16-bit tasks. If a 16-bit task is dispatched, the
page-table structure for the previous task is used.
• The I/O base address is not included in the 16-bit TSS. None of the functions of
the I/O map are supported.
• When task state is saved in a 16-bit TSS, the upper 16 bits of the EFLAGS register
and the EIP register are lost.
• When the general-purpose registers are loaded or saved from a 16-bit TSS, the
upper 16 bits of the registers are modified and not maintained.
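The 16-bit TSS layout of Figure 7-10 can be written as a C structure, which also makes the loss of the upper 16 bits easy to see. The structure and field names and the helper tss16_save_ip_flags are hypothetical; the byte offsets in the comments are the ones shown in the figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative image of the 16-bit TSS: 22 words, offsets 0 through 42. */
struct tss16 {
    uint16_t prev_task_link;   /*  0 */
    uint16_t sp0, ss0;         /*  2,  4 */
    uint16_t sp1, ss1;         /*  6,  8 */
    uint16_t sp2, ss2;         /* 10, 12 */
    uint16_t ip;               /* 14: entry point */
    uint16_t flags;            /* 16 */
    uint16_t ax, cx, dx, bx;   /* 18, 20, 22, 24 */
    uint16_t sp, bp, si, di;   /* 26, 28, 30, 32 */
    uint16_t es, cs, ss, ds;   /* 34, 36, 38, 40 */
    uint16_t ldt_selector;     /* 42 */
};

static_assert(sizeof(struct tss16) == 44, "22 words");
static_assert(offsetof(struct tss16, ldt_selector) == 42, "last word");

/* Model of saving 32-bit state into a 16-bit TSS: only the low
 * 16 bits of EIP and EFLAGS fit, so the upper halves are dropped. */
void tss16_save_ip_flags(struct tss16 *tss, uint32_t eip, uint32_t eflags)
{
    tss->ip    = (uint16_t)(eip & 0xFFFFu);
    tss->flags = (uint16_t)(eflags & 0xFFFFu);
}
```

There is no member for CR3 or for an I/O map base, which matches the restrictions listed above.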
7.7 TASK MANAGEMENT IN 64-BIT MODE
In 64-bit mode, task structure and task state are similar to those in protected mode.
However, the task switching mechanism available in protected mode is not supported
in 64-bit mode. Task management and switching must be performed by software.
The processor issues a general-protection exception (#GP) if the following is
attempted in 64-bit mode:
• Control transfer to a TSS or a task gate using JMP, CALL, INT n, or interrupt.
• An IRET with EFLAGS.NT (nested task) set to 1.
Figure 7-10. 16-Bit TSS Format

Offset  Field (bits 15:0)
------  --------------------
42      Task LDT Selector
40      DS Selector
38      SS Selector
36      CS Selector
34      ES Selector
32      DI
30      SI
28      BP
26      SP
24      BX
22      DX
20      CX
18      AX
16      FLAG Word
14      IP (Entry Point)
12      SS2
10      SP2
8       SS1
6       SP1
4       SS0
2       SP0
0       Previous Task Link
Although hardware task-switching is not supported in 64-bit mode, a 64-bit task
state segment (TSS) must exist. Figure 7-11 shows the format of a 64-bit TSS. The
TSS holds information important to 64-bit mode and that is not directly related to the
task-switch mechanism. This information includes:
• RSPn: The full 64-bit canonical forms of the stack pointers (RSP) for privilege
levels 0-2.
• ISTn: The full 64-bit canonical forms of the interrupt stack table (IST) pointers.
• I/O map base address: The 16-bit offset to the I/O permission bit map from
the 64-bit TSS base.
The operating system must create at least one 64-bit TSS after activating IA-32e
mode. It must execute the LTR instruction (in 64-bit mode) to load the TR register
with a pointer to the 64-bit TSS responsible for both 64-bit-mode programs and
compatibility-mode programs.
Figure 7-11. 64-Bit TSS Format

Offset  Contents (bits 31:0)
------  ----------------------------------------------------
100     I/O Map Base Address (bits 31:16), Reserved (bits 15:0)
96      Reserved
92      Reserved
88      IST7 (upper 32 bits)
84      IST7 (lower 32 bits)
80      IST6 (upper 32 bits)
76      IST6 (lower 32 bits)
72      IST5 (upper 32 bits)
68      IST5 (lower 32 bits)
64      IST4 (upper 32 bits)
60      IST4 (lower 32 bits)
56      IST3 (upper 32 bits)
52      IST3 (lower 32 bits)
48      IST2 (upper 32 bits)
44      IST2 (lower 32 bits)
40      IST1 (upper 32 bits)
36      IST1 (lower 32 bits)
32      Reserved
28      Reserved
24      RSP2 (upper 32 bits)
20      RSP2 (lower 32 bits)
16      RSP1 (upper 32 bits)
12      RSP1 (lower 32 bits)
8       RSP0 (upper 32 bits)
4       RSP0 (lower 32 bits)
0       Reserved

Reserved bits. Set to 0.
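The 64-bit TSS of Figure 7-11 could be declared in C along the following lines. This is a sketch under stated assumptions: the structure and function names are hypothetical, the packed attribute is a GCC and Clang extension (needed because the 64-bit fields start at offset 4), and the canonical-address check assumes 48-bit linear addresses, which this section does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative image of the 64-bit TSS; rsp[n] holds RSPn and
 * ist[n] holds IST(n+1). */
struct tss64 {
    uint32_t reserved0;      /* offset   0 */
    uint64_t rsp[3];         /* offset   4: RSP0..RSP2 */
    uint64_t reserved1;      /* offset  28 */
    uint64_t ist[7];         /* offset  36: IST1..IST7 */
    uint64_t reserved2;      /* offset  92 */
    uint16_t reserved3;      /* offset 100 */
    uint16_t iomap_base;     /* offset 102 */
} __attribute__((packed));

static_assert(sizeof(struct tss64) == 104, "fixed part of the 64-bit TSS");
static_assert(offsetof(struct tss64, rsp) == 4, "RSP0 at offset 4");
static_assert(offsetof(struct tss64, ist) == 36, "IST1 at offset 36");
static_assert(offsetof(struct tss64, iomap_base) == 102, "I/O map base");

/* Assumption: 48-bit linear addresses, so bits 63:47 must all match. */
bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* the 17 uppermost bits */
    return top == 0 || top == 0x1FFFFu;
}

/* Store a privilege-level-0 stack pointer, rejecting values that
 * are not in canonical form. */
bool tss64_set_rsp0(struct tss64 *tss, uint64_t rsp0)
{
    if (!is_canonical48(rsp0))
        return false;
    tss->rsp[0] = rsp0;
    return true;
}
```

After such a structure is filled in, system software would still need to describe it with a TSS descriptor and load TR with LTR, as the text above requires; that part is outside the scope of this sketch.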
CHAPTER 8
MULTIPLE-PROCESSOR MANAGEMENT
The Intel 64 and IA-32 architectures provide mechanisms for managing and
improving the performance of multiple processors connected to the same system
bus. These include:
• Bus locking and/or cache coherency management for performing atomic
operations on system memory.
• Serializing instructions. These instructions apply only to the Pentium 4, Intel
Xeon, P6 family, and Pentium processors.
• An advanced programmable interrupt controller (APIC) located on the processor
chip (see Chapter 10, "Advanced Programmable Interrupt Controller (APIC)").
This feature was introduced by the Pentium processor.
• A second-level cache (level 2, L2). For the Pentium 4, Intel Xeon, and P6 family
processors, the L2 cache is included in the processor package and is tightly
coupled to the processor. For the Pentium and Intel486 processors, pins are
provided to support an external L2 cache.
• A third-level cache (level 3, L3). For Intel Xeon processors, the L3 cache is
included in the processor package and is tightly coupled to the processor.
• Intel Hyper-Threading Technology. This extension to the Intel 64 and IA-32
architectures enables a single processor core to execute two or more threads
concurrently (see Section 8.5, "Intel Multi-Core Technology").
These mechanisms are particularly useful in symmetric-multiprocessing (SMP)
systems. However, they can also be used when an Intel 64 or IA-32 processor and a
special-purpose processor (such as a communications, graphics, or video processor)
share the system bus.
These multiprocessing mechanisms have the following characteristics:
• To maintain system memory coherency: When two or more processors are
attempting simultaneously to access the same address in system memory, some
communication mechanism or memory access protocol must be available to
promote data coherency and, in some instances, to allow one processor to
temporarily lock a memory location.
• To maintain cache consistency: When one processor accesses data cached on
another processor, it must not receive incorrect data. If it modifies data, all other
processors that access that data must receive the modified data.
• To allow predictable ordering of writes to memory: In some circumstances, it is
important that memory writes be observed externally in precisely the same order
as programmed.
• To distribute interrupt handling among a group of processors: When several
processors are operating in a system in parallel, it is useful to have a centralized
mechanism for receiving interrupts and distributing them to available processors
for servicing.
• To increase system performance by exploiting the multi-threaded and
multi-process nature of contemporary operating systems and applications.
The caching mechanism and cache consistency of Intel 64 and IA-32 processors are
discussed in Chapter 11. The APIC architecture is described in Chapter 10. Bus and
memory locking, serializing instructions, memory ordering, and Intel
Hyper-Threading Technology are discussed in the following sections.
8.1 LOCKED ATOMIC OPERATIONS
The 32-bit IA-32 processors support locked atomic operations on locations in system
memory. These operations are typically used to manage shared data structures (such
as semaphores, segment descriptors, system segments, or page tables) in which two
or more processors may try simultaneously to modify the same field or flag. The
processor uses three interdependent mechanisms for carrying out locked atomic
operations:
• Guaranteed atomic operations
• Bus locking, using the LOCK# signal and the LOCK instruction prefix
• Cache coherency protocols that ensure that atomic operations can be carried out
on cached data structures (cache lock); this mechanism is present in the
Pentium 4, Intel Xeon, and P6 family processors
These mechanisms are interdependent in the following ways. Certain basic memory
transactions (such as reading or writing a byte in system memory) are always
guaranteed to be handled atomically. That is, once started, the processor guarantees that
the operation will be completed before another processor or bus agent is allowed
access to the memory location. The processor also supports bus locking for
performing selected memory operations (such as a read-modify-write operation in a
shared area of memory) that typically need to be handled atomically, but are not
automatically handled this way. Because frequently used memory locations are often
cached in a processor's L1 or L2 caches, atomic operations can often be carried out
inside a processor's caches without asserting the bus lock. Here the processor's
cache coherency protocols ensure that other processors that are caching the same
memory locations are managed properly while atomic operations are performed on
cached memory locations.
NOTE
Where there are contested lock accesses, software may need to
implement algorithms that ensure fair access to resources in order to
prevent lock starvation. The hardware provides no resource that
guarantees fairness to participating agents. It is the responsibility of
software to manage the fairness of semaphores and exclusive locking
functions.
The mechanisms for handling locked atomic operations have evolved with the
complexity of IA-32 processors. More recent IA-32 processors (such as the
Pentium 4, Intel Xeon, and P6 family processors) and Intel 64 provide a more refined
locking mechanism than earlier processors. These mechanisms are described in the
following sections.
8.1.1 Guaranteed Atomic Operations
The Intel486 processor (and newer processors since) guarantees that the following
basic memory operations will always be carried out atomically:
• Reading or writing a byte
• Reading or writing a word aligned on a 16-bit boundary
• Reading or writing a doubleword aligned on a 32-bit boundary
The Pentium processor (and newer processors since) guarantees that the following
additional memory operations will always be carried out atomically:
• Reading or writing a quadword aligned on a 64-bit boundary
• 16-bit accesses to uncached memory locations that fit within a 32-bit data bus
The P6 family processors (and newer processors since) guarantee that the following
additional memory operation will always be carried out atomically:
• Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache
line
Accesses to cacheable memory that are split across bus widths, cache lines, and
page boundaries are not guaranteed to be atomic by the Intel Core 2 Duo, Intel
Atom, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family, Pentium, and
Intel486 processors. The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium M,
Pentium 4, Intel Xeon, and P6 family processors provide bus control signals that
permit external memory subsystems to make split accesses atomic; however,
nonaligned data accesses will seriously impact the performance of the processor and
should be avoided.
An x87 instruction or an SSE instruction that accesses data larger than a quadword
may be implemented using multiple memory accesses. If such an instruction stores
to memory, some of the accesses may complete (writing to memory) while another
causes the operation to fault for architectural reasons (e.g., due to a page-table entry
that is marked "not present"). In this case, the effects of the completed accesses
may be visible to software even though the overall instruction caused a fault. If TLB
invalidation has been delayed (see Section 4.10.4.4), such page faults may occur
even if all accesses are to the same page.
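The alignment rules in this section can be summarized as a small classification helper. The C sketch below is illustrative only: classify_access and the enum names are hypothetical, and the 64-byte cache line size is an assumption that varies by processor. It says nothing about whether the memory is actually cached, which the within-a-cache-line guarantee also requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64u   /* assumption; query the processor in real code */

enum atomicity {
    ATOMIC_ALIGNED,        /* naturally aligned: byte, word, and doubleword
                              since the Intel486, quadword since the Pentium */
    ATOMIC_WITHIN_LINE,    /* unaligned but inside one cache line: P6 family
                              and later, cached memory only                  */
    ATOMIC_NOT_GUARANTEED  /* split across cache lines                       */
};

/* Classify a 1-, 2-, 4-, or 8-byte access at the given address. */
enum atomicity classify_access(uintptr_t addr, size_t size)
{
    assert(size == 1 || size == 2 || size == 4 || size == 8);

    if (addr % size == 0)
        return ATOMIC_ALIGNED;

    uintptr_t first_line = addr / CACHE_LINE_BYTES;
    uintptr_t last_line  = (addr + size - 1) / CACHE_LINE_BYTES;
    return first_line == last_line ? ATOMIC_WITHIN_LINE
                                   : ATOMIC_NOT_GUARANTEED;
}
```

For example, a doubleword at address 0x1001 is unaligned but stays inside one assumed 64-byte line, while a doubleword at 0x103E straddles two lines and falls outside the guarantees listed above.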
8.1.2 Bus Locking
Intel 64 and IA-32 processors provide a LOCK# signal that is asserted automatically
during certain critical memory operations to lock the system bus or equivalent link.
While this output signal is asserted, requests from other processors or bus agents for
control of the bus are blocked. Software can specify other occasions when the LOCK
semantics are to be followed by prepending the LOCK prefix to an instruction.
In the case of the Intel386, Intel486, and Pentium processors, explicitly locked
instructions will result in the assertion of the LOCK# signal. It is the responsibility of
the hardware designer to make the LOCK# signal available in system hardware to
control memory accesses among processors.
For the P6 and more recent processor families, if the memory area being accessed is
cached internally in the processor, the LOCK# signal is generally not asserted;
instead, locking is only applied to the processor's caches (see Section 8.1.4, "Effects
of a LOCK Operation on Internal Processor Caches").
8.1.2.1 Automatic Locking
The operations on which the processor automatically follows the LOCK semantics are
as follows:
• When executing an XCHG instruction that references memory.
• When setting the B (busy) flag of a TSS descriptor: The processor tests
and sets the busy flag in the type field of the TSS descriptor when switching to a
task. To ensure that two processors do not switch to the same task
simultaneously, the processor follows the LOCK semantics while testing and setting this
flag.
• When updating segment descriptors: When loading a segment descriptor,
the processor will set the accessed flag in the segment descriptor if the flag is
clear. During this operation, the processor follows the LOCK semantics so that the
descriptor will not be modified by another processor while it is being updated. For
this action to be effective, operating-system procedures that update descriptors
should use the following steps:
  - Use a locked operation to modify the access-rights byte to indicate that the
    segment descriptor is not-present, and specify a value for the type field that
    indicates that the descriptor is being updated.
  - Update the fields of the segment descriptor. (This operation may require
    several memory accesses; therefore, locked operations cannot be used.)
  - Use a locked operation to modify the access-rights byte to indicate that the
    segment descriptor is valid and present.
  The Intel386 processor always updates the accessed flag in the segment
  descriptor, whether it is clear or not. The Pentium 4, Intel Xeon, P6 family,
  Pentium, and Intel486 processors only update this flag if it is not already set.
• When updating page-directory and page-table entries: When updating
page-directory and page-table entries, the processor uses locked cycles to set
the accessed and dirty flag in the page-directory and page-table entries.
• Acknowledging interrupts: After an interrupt request, an interrupt controller
may use the data bus to send the interrupt vector for the interrupt to the
processor. The processor follows the LOCK semantics during this time to ensure
that no other data appears on the data bus when the interrupt vector is being
transmitted.
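The three-step descriptor update recommended in the list above might look like the following C sketch. Several things here are assumptions rather than anything the manual specifies: the structure and function names, the use of C11 atomic_exchange for the "locked operation" (compilers for x86 targets commonly emit XCHG for it, but that is a property of the compiler), and the DESC_ACCESS_UPDATING value, which is a hypothetical marker an operating system would define for itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DESC_PRESENT         0x80u  /* P flag, bit 7 of the access-rights byte */
#define DESC_ACCESS_UPDATING 0x00u  /* hypothetical "being updated" marker, P = 0 */

/* Illustrative image of an 8-byte segment descriptor. */
struct seg_descriptor {
    uint16_t limit_low;          /* limit 15:0                    */
    uint16_t base_low;           /* base 15:0                     */
    uint8_t  base_mid;           /* base 23:16                    */
    _Atomic uint8_t access;      /* type, S, DPL, P               */
    uint8_t  limit_high_flags;   /* limit 19:16 in the low nibble */
    uint8_t  base_high;          /* base 31:24                    */
};

void descriptor_update(struct seg_descriptor *d, uint32_t base,
                       uint32_t limit, uint8_t new_access)
{
    assert(new_access & DESC_PRESENT);
    assert(limit <= 0xFFFFFu);

    /* Step 1: one atomic write marks the descriptor not-present. */
    atomic_exchange(&d->access, (uint8_t)DESC_ACCESS_UPDATING);

    /* Step 2: ordinary stores; several accesses, so not atomic as a group. */
    d->limit_low = (uint16_t)(limit & 0xFFFFu);
    d->base_low  = (uint16_t)(base & 0xFFFFu);
    d->base_mid  = (uint8_t)((base >> 16) & 0xFFu);
    d->limit_high_flags =
        (uint8_t)((d->limit_high_flags & 0xF0u) | ((limit >> 16) & 0x0Fu));
    d->base_high = (uint8_t)(base >> 24);

    /* Step 3: one atomic write makes the descriptor valid and present. */
    atomic_exchange(&d->access, new_access);
}
```

While the access-rights byte holds the not-present marker, another processor that tries to load the descriptor takes a fault instead of seeing a half-written base or limit, which is the point of the ordering.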
8.1.2.2 Software Controlled Bus Locking
To explicitly force the LOCK semantics, software can use the LOCK prefix with the
following instructions when they are used to modify a memory location. An
invalid-opcode exception (#UD) is generated when the LOCK prefix is used with any other
instruction or when no write operation is made to memory (that is, when the
destination operand is in a register).
• The bit test and modify instructions (BTS, BTR, and BTC).
• The exchange instructions (XADD, CMPXCHG, and CMPXCHG8B).
• The LOCK prefix is automatically assumed for the XCHG instruction.
• The following single-operand arithmetic and logical instructions: INC, DEC, NOT,
and NEG.
• The following two-operand arithmetic and logical instructions: ADD, ADC, SUB,
SBB, AND, OR, and XOR.
A locked instruction is guaranteed to lock only the area of memory defined by the
destination operand, but may be interpreted by the system as a lock for a larger
memory area.
Software should access semaphores (shared memory used for signalling between
multiple processors) using identical addresses and operand lengths. For example, if
one processor accesses a semaphore using a word access, other processors should
not access the semaphore using a byte access.
NOTE
Do not implement semaphores using the WC memory type. Do not
perform non-temporal stores to a cache line containing a location
used to implement a semaphore.
The int egrit y of a bus lock is not affect ed by t he alignment of t he memory field. The
LOCK semant ics are followed for as many bus cycles as necessary t o updat e t he
ent ire operand. However, it is recommend t hat locked accesses be aligned on t heir
nat ural boundaries for bet t er syst em performance:
Any boundary for an 8- bit access ( locked or ot herwise) .
16- bit boundary for locked word accesses.
8-6 Vol. 3
MULTIPLE-PROCESSOR MANAGEMENT
32- bit boundary for locked doubleword accesses.
64- bit boundary for locked quadword accesses.
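The natural-boundary recommendation above amounts to requiring that the address be a multiple of the operand size. A minimal Python sketch, assuming byte addresses expressed as integers (the function name is hypothetical):

```python
def is_naturally_aligned(address, size_bytes):
    """Return True if an operand of size_bytes at address sits on its natural boundary.

    Sizes follow the list above: 1 (byte), 2 (word), 4 (doubleword), 8 (quadword).
    """
    if size_bytes not in (1, 2, 4, 8):
        raise ValueError("expected an operand size of 1, 2, 4, or 8 bytes")
    return address % size_bytes == 0
```

A doubleword at address 0x1004 passes this check, while a quadword at the same address does not, so a locked quadword access there would still be atomic but may cost extra bus cycles.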
Locked operations are atomic with respect to all other memory operations and all
externally visible events. Only instruction fetch and page table accesses can pass
locked instructions. Locked instructions can be used to synchronize data written by
one processor and read by another processor.
For the P6 family processors, locked operations serialize all outstanding load and
store operations (that is, wait for them to complete). This rule is also true for the
Pentium 4 and Intel Xeon processors, with one exception. Load operations that
reference weakly ordered memory types (such as the WC memory type) may not be
serialized.
Locked instructions should not be used to ensure that data written can be fetched as
instructions.
NOTE
The locked instructions for the current versions of the Pentium 4,
Intel Xeon, P6 family, Pentium, and Intel486 processors allow data
written to be fetched as instructions. However, Intel recommends
that developers who require the use of self-modifying code use a
different synchronizing mechanism, described in the following
sections.
8.1.3 Handling Self- and Cross-Modifying Code
The act of a processor writing data into a currently executing code segment with
the intent of executing that data as code is called self-modifying code. IA-32
processors exhibit model-specific behavior when executing self-modified code,
depending upon how far ahead of the current execution pointer the code has been
modified.
As processor microarchitectures become more complex and start to speculatively
execute code ahead of the retirement point (as in P6 and more recent processor
families), the rules regarding which code should execute, pre- or post-modification,
become blurred. To write self-modifying code and ensure that it is compliant with
current and future versions of the IA-32 architectures, use one of the following
coding options:
(* OPTION 1 *)
Store modified code (as data) into code segment;
Jump to new code or an intermediate location;
Execute new code;
(* OPTION 2 *)
Store modified code (as data) into code segment;
Execute a serializing instruction; (* For example, CPUID instruction *)
Execute new code;
The use of one of these options is not required for programs intended to run on the
Pentium or Intel486 processors, but is recommended to ensure compatibility with
the P6 and more recent processor families.
Self-modifying code will execute at a lower level of performance than
non-self-modifying or normal code. The degree of the performance deterioration will
depend upon the frequency of modification and specific characteristics of the code.
The act of one processor writing data into the currently executing code segment of a
second processor with the intent of having the second processor execute that data as
code is called cross-modifying code. As with self-modifying code, IA-32 processors
exhibit model-specific behavior when executing cross-modifying code, depending
upon how far ahead of the executing processor's current execution pointer the code
has been modified.
To write cross-modifying code and ensure that it is compliant with current and future
versions of the IA-32 architecture, the following processor synchronization algorithm
must be implemented:
(* Action of Modifying Processor *)
Memory_Flag ← 0; (* Set Memory_Flag to value other than 1 *)
Store modified code (as data) into code segment;
Memory_Flag ← 1;
(* Action of Executing Processor *)
WHILE (Memory_Flag ≠ 1)
    Wait for code to update;
ELIHW;
Execute serializing instruction; (* For example, CPUID instruction *)
Begin executing modified code;
(The use of this option is not required for programs intended to run on the Intel486
processor, but is recommended to ensure compatibility with the Pentium 4, Intel
Xeon, P6 family, and Pentium processors.)
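The flag handshake in the algorithm above can be mimicked with two threads. The Python sketch below is only an analogy under stated assumptions: threading.Event stands in for Memory_Flag, a dictionary slot holding a function stands in for the bytes in the code segment, and the serializing instruction has no Python counterpart, so it appears only as a comment. All names are hypothetical.

```python
import threading

memory_flag = threading.Event()           # stands in for Memory_Flag
code_slot = {"routine": lambda: "old"}    # stands in for the bytes in the code segment

def modifying_processor():
    memory_flag.clear()                       # Memory_Flag <- 0
    code_slot["routine"] = lambda: "new"      # store modified code (as data)
    memory_flag.set()                         # Memory_Flag <- 1

def executing_processor(results):
    memory_flag.wait()                        # WHILE (Memory_Flag != 1) wait
    # A real executing processor would run a serializing instruction here
    # (for example, CPUID); Python offers no equivalent, so nothing is done.
    results.append(code_slot["routine"]())    # begin executing modified code

def run_once():
    """Start the executing thread first, then the modifier, and return what ran."""
    results = []
    executor = threading.Thread(target=executing_processor, args=(results,))
    modifier = threading.Thread(target=modifying_processor)
    executor.start()
    modifier.start()
    executor.join()
    modifier.join()
    return results[0]
```

Because the executing thread does not touch the code slot until the flag is set, it runs the new routine even though it was started first. The sketch shows only the ordering of the handshake; it says nothing about instruction prefetch or speculation, which is what the serializing instruction addresses on real hardware.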
Like self-modifying code, cross-modifying code will execute at a lower level of
performance than non-cross-modifying (normal) code, depending upon the frequency of
modification and specific characteristics of the code.
The restrictions on self-modifying code and cross-modifying code also apply to the
Intel 64 architecture.
8.1.4 Effects of a LOCK Operation on Internal Processor Caches
For the Intel486 and Pentium processors, the LOCK# signal is always asserted on the
bus during a LOCK operation, even if the area of memory being locked is cached in
the processor.
For the P6 and more recent processor families, if the area of memory being locked
during a LOCK operation is cached in the processor that is performing the LOCK
operation as write-back memory and is completely contained in a cache line, the
processor may not assert the LOCK# signal on the bus. Instead, it will modify the
memory location internally and allow its cache coherency mechanism to ensure that
the operation is carried out atomically. This operation is called "cache locking." The
cache coherency mechanism automatically prevents two or more processors that
have cached the same area of memory from simultaneously modifying data in that
area.
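One precondition for cache locking named above is that the locked operand be completely contained in a cache line. A minimal Python sketch of that containment test follows; the function name is hypothetical, and the default line size of 64 bytes is an assumption that matches the Pentium 4 and later values quoted in Section 8.2.4 (P6 family processors use 32 bytes).

```python
def within_one_cache_line(address, size_bytes, line_bytes=64):
    """Return True if [address, address + size_bytes) does not cross a line boundary."""
    first_line = address // line_bytes
    last_line = (address + size_bytes - 1) // line_bytes
    return first_line == last_line
```

With 64-byte lines, a doubleword at 0x103C fits in one line, but a quadword at the same address spans two lines and so would not qualify on this criterion.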
8.2 MEMORY ORDERING
The term memory ordering refers to the order in which the processor issues reads
(loads) and writes (stores) through the system bus to system memory. The Intel 64
and IA-32 architectures support several memory-ordering models depending on the
implementation of the architecture. For example, the Intel386 processor enforces
program ordering (generally referred to as strong ordering), where reads and
writes are issued on the system bus in the order they occur in the instruction stream
under all circumstances.
To allow performance optimization of instruction execution, the IA-32 architecture
allows departures from the strong-ordering model, called processor ordering, in
Pentium 4, Intel Xeon, and P6 family processors. These processor-ordering
variations (called here the memory-ordering model) allow performance-enhancing
operations such as allowing reads to go ahead of buffered writes. The goal of any of
these variations is to increase instruction execution speeds, while maintaining
memory coherency, even in multiple-processor systems.
Section 8.2.1 and Section 8.2.2 describe the memory ordering implemented by
Intel486, Pentium, Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium 4, Intel
Xeon, and P6 family processors. Section 8.2.3 gives examples illustrating the
behavior of the memory-ordering model on IA-32 and Intel-64 processors. Section
8.2.4 considers the special treatment of stores for string operations, and Section
8.2.5 discusses how memory-ordering behavior may be modified through the use of
specific instructions.
8.2.1 Memory Ordering in the Intel Pentium and Intel486 Processors
The Pentium and Intel486 processors follow the processor-ordered memory model;
however, they operate as strongly-ordered processors under most circumstances.
Reads and writes always appear in programmed order at the system bus, except for
the following situation where processor ordering is exhibited. Read misses are
permitted to go ahead of buffered writes on the system bus when all the buffered
writes are cache hits and, therefore, are not directed to the same address being
accessed by the read miss.
In the case of I/O operations, both reads and writes always appear in programmed
order.
Software intended to operate correctly in processor-ordered processors (such as the
Pentium 4, Intel Xeon, and P6 family processors) should not depend on the relatively
strong ordering of the Pentium or Intel486 processors. Instead, it should ensure
that accesses to shared variables that are intended to control concurrent execution
among processors are explicitly required to obey program ordering through the use
of appropriate locking or serializing operations (see Section 8.2.5, "Strengthening or
Weakening the Memory-Ordering Model").
8.2.2 Memory Ordering in P6 and More Recent Processor Families
The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium 4, and P6 family
processors also use a processor-ordered memory-ordering model that can be further
defined as "write ordered with store-buffer forwarding." This model can be
characterized as follows.
In a single-processor system for memory regions defined as write-back cacheable,
the memory-ordering model respects the following principles. (Note that the
memory-ordering principles for single-processor and multiple-processor systems are
written from the perspective of software executing on the processor, where the term
"processor" refers to a logical processor. For example, a physical processor
supporting multiple cores and/or Hyper-Threading Technology is treated as a
multi-processor system.)
• Reads are not reordered with other reads.
• Writes are not reordered with older reads.
• Writes to memory are not reordered with other writes, with the following
  exceptions:
  - writes executed with the CLFLUSH instruction;
  - streaming stores (writes) executed with the non-temporal move instructions
    (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD); and
  - string operations (see Section 8.2.4.1).
• Reads may be reordered with older writes to different locations but not with older
  writes to the same location.
• Reads or writes cannot be reordered with I/O instructions, locked instructions, or
  serializing instructions.
• Reads cannot pass earlier LFENCE and MFENCE instructions.
• Writes cannot pass earlier LFENCE, SFENCE, and MFENCE instructions.
• LFENCE instructions cannot pass earlier reads.
• SFENCE instructions cannot pass earlier writes.
• MFENCE instructions cannot pass earlier reads or writes.
In a multiple-processor system, the following ordering principles apply:
• Individual processors use the same ordering principles as in a single-processor
  system.
• Writes by a single processor are observed in the same order by all processors.
• Writes from an individual processor are NOT ordered with respect to the writes
  from other processors.
• Memory ordering obeys causality (memory ordering respects transitive
  visibility).
• Any two stores are seen in a consistent order by processors other than those
  performing the stores.
• Locked instructions have a total order.
See the example in Figure 8-1. Consider three processors in a system, each of which
performs three writes, one to each of three defined locations (A, B, and C).
Individually, the processors perform the writes in the same program order, but
because of bus arbitration and other memory access mechanisms, the order in which
the three processors write the individual memory locations can differ each time the
respective code sequences are executed on the processors. The final values in
locations A, B, and C may therefore vary on each execution of the write sequence.
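The variability described here can be made concrete by enumerating every way the nine writes could be merged while keeping each processor's own writes in program order. The Python sketch below is an illustration under simplifying assumptions: each write becomes visible as a single step, each processor writes its own number, and the helper name interleavings is hypothetical.

```python
def interleavings(programs):
    """Yield every merge of the per-processor write lists that keeps each list in order."""
    def extend(positions, prefix):
        if all(pos == len(prog) for pos, prog in zip(positions, programs)):
            yield tuple(prefix)
            return
        for cpu, prog in enumerate(programs):
            pos = positions[cpu]
            if pos < len(prog):
                advanced = positions[:cpu] + (pos + 1,) + positions[cpu + 1:]
                yield from extend(advanced, prefix + [prog[pos]])
    yield from extend((0,) * len(programs), [])

# Each processor writes its own number to A, then B, then C (program order).
programs = [[(location, cpu) for location in "ABC"] for cpu in (1, 2, 3)]

orders = list(interleavings(programs))
final_values = set()
for order in orders:
    memory = {}
    for location, cpu in order:
        memory[location] = cpu            # the last write to a location wins
    final_values.add((memory["A"], memory["B"], memory["C"]))

print(len(orders), "interleavings,", len(final_values), "distinct final (A, B, C) values")
```

Under these assumptions there are 1680 legal interleavings, and every one of the 27 combinations of "which processor wrote A, B, and C last" is reachable, which is why software cannot rely on any particular final value without additional synchronization.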
The processor-ordering model described in this section is virtually identical to that
used by the Pentium and Intel486 processors. The only enhancements in the Pentium
4, Intel Xeon, and P6 family processors are:
• Added support for speculative reads, while still adhering to the ordering
  principles above.
• Store-buffer forwarding, when a read passes a write to the same memory
  location.
• Out-of-order stores from long string store and string move operations (see Section
  8.2.4, "Out-of-Order Stores For String Operations," below).
NOTE
In the P6 processor family, store-buffer forwarding to reads of WC memory from
streaming stores to the same address does not occur due to errata.
8.2.3 Examples Illustrating the Memory-Ordering Principles
This section provides a set of examples that illustrate the behavior of the
memory-ordering principles introduced in Section 8.2.2. They are designed to give
software writers an understanding of how memory ordering may affect the results of
different sequences of instructions.
These examples are limited to accesses to memory regions defined as write-back
cacheable (WB). (Section 8.2.3.1 describes other limitations on the generality of the
examples.) The reader should understand that they describe only software-visible
behavior. A logical processor may reorder two accesses even if one of the examples
indicates that they may not be reordered. Such an example states only that software
cannot detect that such a reordering occurred. Similarly, a logical processor may
execute a memory access more than once as long as the behavior visible to software
is consistent with a single execution of the memory access.
Figure 8-1. Example of Write Ordering in Multiple-Processor Systems
[Figure not reproduced. It shows processors #1, #2, and #3 each issuing Write A.n,
Write B.n, and Write C.n in program order, together with one example of the order of
actual writes from all processors to memory. Each processor is guaranteed to perform
writes in program order; writes from all processors are not guaranteed to occur in a
particular order.]
8.2.3.1 Assumptions, Terminology, and Notation
As noted above, the examples in this section are limited to accesses to memory
regions defined as write-back cacheable (WB). They apply only to ordinary loads and
stores and to locked read-modify-write instructions. They do not necessarily apply to
any of the following: out-of-order stores for string instructions (see Section 8.2.4);
accesses with a non-temporal hint; reads from memory by the processor as part of
address translation (e.g., page walks); and updates to segmentation and paging
structures by the processor (e.g., to update "accessed" bits).
The principles underlying the examples in this section apply to individual memory
accesses and to locked read-modify-write instructions. The Intel-64
memory-ordering model guarantees that, for each of the following memory-access
instructions, the constituent memory operation appears to execute as a single memory
access:
• Instructions that read or write a single byte.
• Instructions that read or write a word (2 bytes) whose address is aligned on a
  2-byte boundary.
• Instructions that read or write a doubleword (4 bytes) whose address is aligned
  on a 4-byte boundary.
• Instructions that read or write a quadword (8 bytes) whose address is aligned on
  an 8-byte boundary.
Any locked instruction (either the XCHG instruction or another read-modify-write
instruction with a LOCK prefix) appears to execute as an indivisible and
uninterruptible sequence of load(s) followed by store(s) regardless of alignment.
Other instructions may be implemented with multiple memory accesses. From a
memory-ordering point of view, there are no guarantees regarding the relative order
in which the constituent memory accesses are made. There is also no guarantee that
the constituent operations of a store are executed in the same order as the
constituent operations of a load.
Section 8.2.3.2 through Section 8.2.3.7 give examples using the MOV instruction.
The principles that underlie these examples apply to load and store accesses in
general and to other instructions that load from or store to memory. Section 8.2.3.8
and Section 8.2.3.9 give examples using the XCHG instruction. The principles that
underlie these examples apply to other locked read-modify-write instructions.
This section uses the term "processor" to refer to a logical processor. The examples
are written using Intel-64 assembly-language syntax and use the following
notational conventions:
• Arguments beginning with an "r", such as r1 or r2, refer to registers (e.g., EAX)
  visible only to the processor being considered.
• Memory locations are denoted with x, y, z.
• Stores are written as mov [ _x], val, which implies that val is being stored into
  the memory location x.
• Loads are written as mov r, [ _x], which implies that the contents of the memory
  location x are being loaded into the register r.
As noted earlier, the examples refer only to software-visible behavior. When the
succeeding sections make statements such as "the two stores are reordered," the
implication is only that "the two stores appear to be reordered from the point of view
of software."
8.2.3.2 Neither Loads Nor Stores Are Reordered with Like Operations
The Intel-64 memory-ordering model allows neither loads nor stores to be reordered
with the same kind of operation. That is, it ensures that loads are seen in program
order and that stores are seen in program order. This is illustrated by the following
example:
Example 8-1. Stores Are Not Reordered with Other Stores
Processor 0          Processor 1
mov [ _x], 1         mov r1, [ _y]
mov [ _y], 1         mov r2, [ _x]
Initially x == y == 0
r1 == 1 and r2 == 0 is not allowed
The disallowed return values could be exhibited only if processor 0's two stores are
reordered (with the two loads occurring between them) or if processor 1's two loads
are reordered (with the two stores occurring between them).
If r1 == 1, the store to y occurs before the load from y. Because the Intel-64
memory-ordering model does not allow stores to be reordered, the earlier store to x
occurs before the load from y. Because the Intel-64 memory-ordering model does
not allow loads to be reordered, the store to x also occurs before the later load from
x. Thus r2 == 1.
8.2.3.3 Stores Are Not Reordered With Earlier Loads
The Intel-64 memory-ordering model ensures that a store by a processor may not
occur before a previous load by the same processor. This is illustrated by the
following example:
Example 8-2. Stores Are Not Reordered with Older Loads
Processor 0          Processor 1
mov r1, [ _x]        mov r2, [ _y]
mov [ _y], 1         mov [ _x], 1
Initially x == y == 0
r1 == 1 and r2 == 1 is not allowed
Assume r1 == 1.
• Because r1 == 1, processor 1's store to x occurs before processor 0's load from
  x.
• Because the Intel-64 memory-ordering model prevents each store from being
  reordered with the earlier load by the same processor, processor 1's load from y
  occurs before its store to x.
• Similarly, processor 0's load from x occurs before its store to y.
• Thus, processor 1's load from y occurs before processor 0's store to y, implying
  r2 == 0.
8.2.3.4 Loads May Be Reordered with Earlier Stores to Different Locations
The Intel-64 memory-ordering model allows a load to be reordered with an earlier
store to a different location. However, loads are not reordered with stores to the
same location.
The fact that a load may be reordered with an earlier store to a different location is
illustrated by the following example:
Example 8-3. Loads May be Reordered with Older Stores
Processor 0          Processor 1
mov [ _x], 1         mov [ _y], 1
mov r1, [ _y]        mov r2, [ _x]
Initially x == y == 0
r1 == 0 and r2 == 0 is allowed
At each processor, the load and the store are to different locations and hence may be
reordered. Any interleaving of the operations is thus allowed. One such interleaving
has the two loads occurring before the two stores. This would result in each load
returning value 0.
The fact that a load may not be reordered with an earlier store to the same location
is illustrated by the following example:
Example 8-4. Loads Are not Reordered with Older Stores to the Same Location
Processor 0
mov [ _x], 1
mov r1, [ _x]
Initially x == 0
r1 == 0 is not allowed
The Intel-64 memory-ordering model does not allow the load to be reordered with
the earlier store because the accesses are to the same location. Therefore, r1 == 1
must hold.
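The outcomes of Example 8-1 through Example 8-4 can be explored with a small operational model. The Python sketch below is not Intel's definition of the memory-ordering model; it assumes a simplified machine in which each processor executes in program order, buffers its stores in a private FIFO store buffer that drains to shared memory at arbitrary times, and satisfies loads from the newest matching entry in its own buffer before falling back to memory. The function name tso_outcomes and the ("st", ...) and ("ld", ...) tuple encoding are hypothetical, and locked instructions and fences are not modeled.

```python
def tso_outcomes(programs, locations):
    """Return every reachable final register assignment as a tuple of (reg, value) pairs."""
    n = len(programs)
    start = ((0,) * n,                                      # next instruction per processor
             ((),) * n,                                     # FIFO store buffer per processor
             tuple((loc, 0) for loc in sorted(locations)),  # shared memory, initially 0
             ())                                            # registers loaded so far
    outcomes, seen, stack = set(), set(), [start]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, regs = state
        if all(pcs[i] == len(programs[i]) for i in range(n)) and not any(bufs):
            outcomes.add(regs)
            continue
        for i in range(n):
            if bufs[i]:                                     # drain the oldest buffered store
                loc, val = bufs[i][0]
                new_mem = tuple((l, val if l == loc else v) for l, v in mem)
                stack.append((pcs, bufs[:i] + (bufs[i][1:],) + bufs[i + 1:], new_mem, regs))
            if pcs[i] < len(programs[i]):                   # execute the next instruction
                op = programs[i][pcs[i]]
                new_pcs = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
                if op[0] == "st":
                    _, loc, val = op
                    new_buf = bufs[i] + ((loc, val),)
                    stack.append((new_pcs, bufs[:i] + (new_buf,) + bufs[i + 1:], mem, regs))
                else:
                    _, reg, loc = op
                    pending = [v for l, v in bufs[i] if l == loc]   # own buffered stores
                    val = pending[-1] if pending else dict(mem)[loc]
                    new_regs = tuple(sorted(dict(regs, **{reg: val}).items()))
                    stack.append((new_pcs, bufs, mem, new_regs))
    return outcomes

example_8_1 = [[("st", "x", 1), ("st", "y", 1)], [("ld", "r1", "y"), ("ld", "r2", "x")]]
example_8_2 = [[("ld", "r1", "x"), ("st", "y", 1)], [("ld", "r2", "y"), ("st", "x", 1)]]
example_8_3 = [[("st", "x", 1), ("ld", "r1", "y")], [("st", "y", 1), ("ld", "r2", "x")]]
example_8_4 = [[("st", "x", 1), ("ld", "r1", "x")]]

print((("r1", 0), ("r2", 0)) in tso_outcomes(example_8_3, "xy"))
```

In this model the outcome r1 == 0 and r2 == 0 of Example 8-3 is reachable (both stores are still buffered when the loads execute), while the disallowed outcomes of Examples 8-1, 8-2, and 8-4 never appear, which matches the reasoning given in the text.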
8.2.3.5 Intra-Processor Forwarding Is Allowed
The memory-ordering model allows concurrent stores by two processors to be seen
in different orders by those two processors; specifically, each processor may perceive
its own store occurring before that of the other. This is illustrated by the following
example:
Example 8-5. Intra-Processor Forwarding is Allowed
Processor 0          Processor 1
mov [ _x], 1         mov [ _y], 1
mov r1, [ _x]        mov r3, [ _y]
mov r2, [ _y]        mov r4, [ _x]
Initially x == y == 0
r2 == 0 and r4 == 0 is allowed
The memory-ordering model imposes no constraints on the order in which the two
stores appear to execute by the two processors. This fact allows processor 0 to see
its store before seeing processor 1's, while processor 1 sees its store before seeing
processor 0's. (Each processor is self consistent.) This allows r2 == 0 and r4 == 0.
In practice, the reordering in this example can arise as a result of store-buffer
forwarding. While a store is temporarily held in a processor's store buffer, it can
satisfy the processor's own loads but is not visible to (and cannot satisfy) loads by
other processors.
8.2.3.6 Stores Are Transitively Visible
The memory-ordering model ensures transitive visibility of stores; stores that are
causally related appear to all processors to occur in an order consistent with the
causality relation. This is illustrated by the following example:
Example 8-6. Stores Are Transitively Visible
Processor 0          Processor 1          Processor 2
mov [ _x], 1         mov r1, [ _x]        mov r2, [ _y]
                     mov [ _y], 1         mov r3, [ _x]
Initially x == y == 0
r1 == 1, r2 == 1, r3 == 0 is not allowed
Assume that r1 == 1 and r2 == 1.
• Because r1 == 1, processor 0's store occurs before processor 1's load.
• Because the memory-ordering model prevents a store from being reordered with
  an earlier load (see Section 8.2.3.3), processor 1's load occurs before its store.
  Thus, processor 0's store causally precedes processor 1's store.
• Because processor 0's store causally precedes processor 1's store, the
  memory-ordering model ensures that processor 0's store appears to occur before
  processor 1's store from the point of view of all processors.
• Because r2 == 1, processor 1's store occurs before processor 2's load.
• Because the Intel-64 memory-ordering model prevents loads from being
  reordered (see Section 8.2.3.2), processor 2's loads occur in order.
• The above items imply that processor 0's store to x occurs before processor 2's
  load from x. This implies that r3 == 1.
8.2.3.7 Stores Are Seen in a Consistent Order by Other Processors
As noted in Section 8.2.3.5, the memory-ordering model allows stores by two
processors to be seen in different orders by those two processors. However, any two
stores must appear to execute in the same order to all processors other than those
performing the stores. This is illustrated by the following example:
Example 8-7. Stores Are Seen in a Consistent Order by Other Processors
Processor 0      Processor 1      Processor 2        Processor 3
mov [ _x], 1     mov [ _y], 1     mov r1, [ _x]      mov r3, [ _y]
                                  mov r2, [ _y]      mov r4, [ _x]
Initially x == y == 0
r1 == 1, r2 == 0, r3 == 1, r4 == 0 is not allowed
By the principles discussed in Section 8.2.3.2,
• processor 2's first and second load cannot be reordered,
• processor 3's first and second load cannot be reordered.
• If r1 == 1 and r2 == 0, processor 0's store appears to precede processor 1's
  store with respect to processor 2.
• Similarly, r3 == 1 and r4 == 0 imply that processor 1's store appears to precede
  processor 0's store with respect to processor 3.
Because the memory-ordering model ensures that any two stores appear to execute
in the same order to all processors (other than those performing the stores), this set
of return values is not allowed.
8.2.3.8 Locked Instructions Have a Total Order
The memory-ordering model ensures that all processors agree on a single execution
order of all locked instructions, including those that are larger than 8 bytes or are not
naturally aligned. This is illustrated by the following example:
Example 8-8. Locked Instructions Have a Total Order
Processor 0      Processor 1      Processor 2        Processor 3
xchg [ _x], r1   xchg [ _y], r2   mov r3, [ _x]      mov r5, [ _y]
                                  mov r4, [ _y]      mov r6, [ _x]
Initially r1 == r2 == 1, x == y == 0
r3 == 1, r4 == 0, r5 == 1, r6 == 0 is not allowed
Processor 2 and processor 3 must agree on the order of the two executions of XCHG.
Without loss of generality, suppose that processor 0's XCHG occurs first.
• If r5 == 1, processor 1's XCHG into y occurs before processor 3's load from y.
• Because the Intel-64 memory-ordering model prevents loads from being
  reordered (see Section 8.2.3.2), processor 3's loads occur in order and,
  therefore, processor 1's XCHG occurs before processor 3's load from x.
• Since processor 0's XCHG into x occurs before processor 1's XCHG (by
  assumption), it occurs before processor 3's load from x. Thus, r6 == 1.
A similar argument (referring instead to processor 2's loads) applies if processor 1's
XCHG occurs before processor 0's XCHG.
8.2.3.9 Loads and Stores Are Not Reordered with Locked Instructions
The memory-ordering model prevents loads and stores from being reordered with
locked instructions that execute earlier or later. The examples in this section illustrate
only cases in which a locked instruction is executed before a load or a store. The
reader should note that reordering is prevented also if the locked instruction is
executed after a load or a store.
The first example illustrates that loads may not be reordered with earlier locked
instructions:
Example 8-9. Loads Are not Reordered with Locks
Processor 0          Processor 1
xchg [ _x], r1       xchg [ _y], r3
mov r2, [ _y]        mov r4, [ _x]
Initially x == y == 0, r1 == r3 == 1
r2 == 0 and r4 == 0 is not allowed
As explained in Section 8.2.3.8, there is a total order of the executions of locked
instructions. Without loss of generality, suppose that processor 0's XCHG occurs first.
Because the Intel-64 memory-ordering model prevents processor 1's load from
being reordered with its earlier XCHG, processor 0's XCHG occurs before
processor 1's load. This implies r4 == 1.
A similar argument (referring instead to processor 0's accesses) applies if
processor 1's XCHG occurs before processor 0's XCHG.
The second example illustrates that a store may not be reordered with an earlier
locked instruction:
Example 8-10. Stores Are not Reordered with Locks
Processor 0          Processor 1
xchg [ _x], r1       mov r2, [ _y]
mov [ _y], 1         mov r3, [ _x]
Initially x == y == 0, r1 == 1
r2 == 1 and r3 == 0 is not allowed
Assume r2 == 1.
• Because r2 == 1, processor 0's store to y occurs before processor 1's load from
  y.
• Because the memory-ordering model prevents a store from being reordered with
  an earlier locked instruction, processor 0's XCHG into x occurs before its store to
  y. Thus, processor 0's XCHG into x occurs before processor 1's load from y.
• Because the memory-ordering model prevents loads from being reordered (see
  Section 8.2.3.2), processor 1's loads occur in order and, therefore, processor 0's
  XCHG into x occurs before processor 1's load from x. Thus, r3 == 1.
8.2.4 Out-of-Order Stores For String Operations
The Intel Core 2 Duo, Intel Core, Pentium 4, and P6 family processors modify the
processor's operation during the string store operations (initiated with the MOVS and
STOS instructions) to maximize performance. Once the fast string operation's initial
conditions are met (as described below), the processor, from an external
perspective, essentially operates on the string one cache line at a time. This
results in the processor looping on issuing a cache-line read for the source address
and an invalidation on the external bus for the destination address, knowing that all
bytes in the destination cache line will be modified, for the length of the string. In this
mode interrupts will only be accepted by the processor on cache line boundaries. It is
possible in this mode that the destination line invalidations, and therefore stores, will
be issued on the external bus out of order.
Code dependent upon sequential store ordering should not use the string operations
for the entire data structure to be stored. Data and semaphores should be separated.
Order-dependent code should write to a discrete semaphore variable after any
string operations so that all processors see correctly ordered data.
Fast string operation can be disabled by clearing the fast-string-enable bit (bit 0) of
the IA32_MISC_ENABLE MSR.
Initial conditions for fast string operations are implementation specific. Example
conditions include:
• EDI and ESI must be 8-byte aligned for the Pentium III processor. EDI must be
  8-byte aligned for the Pentium 4 processor.
• String operation must be performed in ascending address order.
• The initial operation counter (ECX) must be equal to or greater than 64.
• Source and destination must not overlap by less than a cache line (64 bytes for
  Intel Core 2 Duo, Intel Core, Pentium M, and Pentium 4 processors; 32 bytes for
  P6 family and Pentium processors).
• The memory type for both source and destination addresses must be either WB
  or WC.
NOTE
Initial conditions for fast string operation in future Intel 64 or IA-32 processor
families may differ from above.
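The example conditions above can be read as a single predicate. The Python sketch below encodes one possible reading for the Pentium 4 case only and is not a statement of what any processor actually checks: the StringOp record and may_use_fast_string function are hypothetical, and the overlap rule is interpreted, as an assumption, to mean that the source and destination start addresses must be at least one cache line apart.

```python
from dataclasses import dataclass

CACHE_LINE_BYTES = 64   # Pentium 4 value from the list above

@dataclass
class StringOp:
    src: int          # source address (ESI)
    dst: int          # destination address (EDI)
    count: int        # initial operation counter (ECX)
    ascending: bool   # True when addresses are processed in ascending order
    src_type: str     # memory type of the source, e.g. "WB", "WC", "UC"
    dst_type: str     # memory type of the destination

def may_use_fast_string(op):
    """Check the example initial conditions as read for a Pentium 4 processor."""
    return (op.dst % 8 == 0                                  # EDI 8-byte aligned
            and op.ascending                                 # ascending address order
            and op.count >= 64                               # ECX >= 64
            and abs(op.dst - op.src) >= CACHE_LINE_BYTES     # assumed overlap reading
            and op.src_type in ("WB", "WC")
            and op.dst_type in ("WB", "WC"))
```

For instance, a 128-iteration ascending copy between WB buffers at 0x2000 and 0x1000 satisfies this predicate, whereas the same copy with ECX equal to 32, or with an uncacheable destination, does not.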
8.2.4.1 Memory-Ordering Model for String Operations on Write-back (WB) Memory
This section deals with the memory-ordering model for string operations on
write-back (WB) memory for the Intel 64 architecture.
The memory-ordering model respects the following principles:
1. Stores within a single string operation may be executed out of order.
2. Stores from separate string operations (for example, stores from consecutive
   string operations) do not execute out of order. All the stores from an earlier string
   operation will complete before any store from a later string operation.
3. String operations are not reordered with other store operations.
Fast st ring operat ions ( e. g. st ring operat ions init iat ed wit h t he MOVS/ STOS inst ruc-
t ions and t he REP prefix) may be int errupt ed by except ions or int errupt s. The int er-
rupt s are precise but may be delayed - for example, t he int errupt ions may be t aken
at cache line boundaries, aft er every few it erat ions of t he loop, or aft er operat ing on
every few byt es. Different implement at ions may choose different opt ions, or may
even choose not t o delay int errupt handling, so soft ware should not rely on t he delay.
When t he int errupt / t rap handler is reached, t he source/ dest inat ion regist ers point t o
t he next st ring element t o be operat ed on, while t he EI P st ored in t he st ack point s t o
t he st ring inst ruct ion, and t he ECX regist er has t he value it held following t he last
successful it erat ion. The ret urn from t hat t rap/ int errupt handler should cause t he
st ring inst ruct ion t o be resumed from t he point where it was int errupt ed.
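The register state seen by an interrupt handler can be illustrated with a small software model. The C sketch below is a simplification under stated assumptions: the struct and function names are hypothetical, the direction flag is assumed clear (ascending order), and the "interrupt" is modeled simply as stopping after a chosen number of iterations. It says nothing about when real hardware takes the interrupt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the registers used by REP STOSD. */
struct stos_state {
    uint32_t eax;   /* value stored on each iteration */
    uint32_t ecx;   /* remaining iteration count */
    size_t   edi;   /* byte offset of the next doubleword to store */
};

/* Runs at most max_iters iterations, then stops as if an interrupt were taken.
 * Returns the number of iterations performed. Calling it again with the same
 * state models resuming the instruction after the handler returns. */
static uint32_t rep_stosd_until_interrupt(struct stos_state *s, uint32_t *mem,
                                          size_t mem_bytes, uint32_t max_iters)
{
    uint32_t done = 0;
    while (s->ecx != 0 && done < max_iters) {
        assert(s->edi + 4 <= mem_bytes);   /* stay inside the modeled block */
        mem[s->edi / 4] = s->eax;          /* one doubleword store */
        s->edi += 4;                       /* points to the next element */
        s->ecx -= 1;                       /* value after the last iteration */
        done += 1;
    }
    return done;
}
```

With ECX initially 128 and an interruption after 40 iterations, the model leaves ECX at 88 and EDI 160 bytes past the start of the block; a second call completes the remaining 88 stores.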
The string operation memory-ordering principles (items 2 and 3 above) should be
interpreted by taking the interruptibility of fast string operations into account. For
example, if a fast string operation gets interrupted after k iterations, then stores
performed by the interrupt handler will become visible after the fast string stores
from iteration 0 to k, and before the fast string stores from the (k+1)th iteration
onward.
Stores within a single string operation may execute out of order (item 1 above) only
if fast string operation is enabled. Fast string operations are enabled/disabled
through the IA32_MISC_ENABLE model specific register.
8.2.4.2 Examples Illustrating Memory-Ordering Principles for String Operations
The following examples use the same notation and conventions as described in
Section 8.2.3.1.
In Example 8-11, processor 0 does one round of (128 iterations) doubleword string
store operation via rep:stosd, writing the value 1 (value in EAX) into a block of 512
bytes from location _x (kept in ES:EDI) in ascending order. Since each operation
stores a doubleword (4 bytes), the operation is repeated 128 times (value in ECX).
The block of memory initially contained 0. Processor 1 is reading two memory locations
that are part of the memory block being updated by processor 0, i.e., reading
locations in the range _x to (_x+511).
It is possible for processor 1 to perceive that the repeated string stores in processor
0 are happening out of order. Assume that fast string operations are enabled on
processor 0.
Example 8-11. Stores Within a String Operation May be Reordered
Processor 0                  Processor 1
rep:stosd [_x]               mov r1, [_z]
                             mov r2, [_y]
Initially on processor 0: EAX == 1, ECX == 128, ES:EDI == _x
Initially [_x] to 511[_x] == 0, _x <= _y < _z < _x+512
r1 == 1 and r2 == 0 is allowed
In Example 8-12, processor 0 does two separate rounds of rep:stosd operation of 128
doubleword stores, writing the value 1 (value in EAX) into the first block of 512 bytes
from location _x (kept in ES:EDI) in ascending order. It then writes 1 into a second
block of memory from (_x+512) to (_x+1023). All of the memory locations initially
contain 0. Processor 1 performs two load operations from the two blocks of memory.
Example 8-12. Stores Across String Operations Are not Reordered
Processor 0                  Processor 1
rep:stosd [_x]
                             mov r1, [_z]
mov ecx, $128
                             mov r2, [_y]
rep:stosd 512[_x]
Initially on processor 0: EAX == 1, ECX == 128, ES:EDI == _x
Initially [_x] to 1023[_x] == 0, _x <= _y < _x+512 < _z < _x+1024
r1 == 1 and r2 == 0 is not allowed
It is not possible in Example 8-12 for processor 1 to perceive any of the stores
from the later string operation (to the second 512-byte block) in processor 0 before
seeing the stores from the earlier string operation to the first 512-byte block.
Example 8-12 assumes that writes to the second block (_x+512 to _x+1023)
do not get executed while processor 0's string operation to the first block has been
interrupted. If the string operation to the first block by processor 0 is interrupted,
and a write to the second memory block is executed by the interrupt handler, then
that change in the second memory block will be visible before the string operation to
the first memory block resumes.
In Example 8-13, processor 0 does one round of (128 iterations) doubleword string
store operation via rep:stosd, writing the value 1 (value in EAX) into a block of 512
bytes from location _x (kept in ES:EDI) in ascending order. It then writes to a second
memory location outside the memory block of the previous string operation.
Processor 1 performs two read operations: the first read is from an address outside
the 512-byte block but to be updated by processor 0, and the second read is from
inside the block of memory of the string operation.
Example 8-13. String Operations Are not Reordered with later Stores
Processor 0                  Processor 1
rep:stosd [_x]               mov r1, [_z]
mov [_z], $1                 mov r2, [_y]
Initially on processor 0: EAX == 1, ECX == 128, ES:EDI == _x
Initially [_y] == [_z] == 0, [_x] to 511[_x] == 0, _x <= _y < _x+512, _z is a separate
memory location
r1 == 1 and r2 == 0 is not allowed
Processor 1 cannot perceive the later store by processor 0 until it sees all the stores
from the string operation. Example 8-13 assumes that processor 0's store to [_z] is
not executed while the string operation has been interrupted. If the string operation
is interrupted and the store to [_z] by processor 0 is executed by the interrupt
handler, then changes to [_z] will become visible before the string operation
resumes.
Example 8-14 illustrates the visibility principle when a string operation is interrupted.
In Example 8-14, processor 0 started a string operation to write to a memory block
of 512 bytes starting at address _x. Processor 0 got interrupted after k iterations of
store operations. The address _y has not yet been updated by processor 0 when
processor 0 got interrupted. The interrupt handler that took control on processor 0
writes to the address _z. Processor 1 may see the store to _z from the interrupt
handler, before seeing the remaining stores to the 512-byte memory block that are
executed when the string operation resumes.
Example 8-14. Interrupted String Operation
Processor 0                                             Processor 1
rep:stosd [_x] // interrupted before es:edi reach _y
                                                        mov r1, [_z]
mov [_z], $1 // interrupt handler                       mov r2, [_y]
Initially on processor 0: EAX == 1, ECX == 128, ES:EDI == _x
Initially [_y] == [_z] == 0, [_x] to 511[_x] == 0, _x <= _y < _x+512, _z is a separate
memory location
r1 == 1 and r2 == 0 is allowed
Example 8-15 illustrates the ordering of string operations with earlier stores. No
store from a string operation can be visible before all prior stores are visible.
Example 8-15. String Operations Are not Reordered with Earlier Stores
Processor 0                  Processor 1
mov [_z], $1                 mov r1, [_y]
rep:stosd [_x]               mov r2, [_z]
Initially on processor 0: EAX == 1, ECX == 128, ES:EDI == _x
Initially [_y] == [_z] == 0, [_x] to 511[_x] == 0, _x <= _y < _x+512, _z is a separate
memory location
r1 == 1 and r2 == 0 is not allowed
8.2.5 Strengthening or Weakening the Memory-Ordering Model
The Intel 64 and IA-32 architectures provide several mechanisms for strengthening
or weakening the memory-ordering model to handle special programming situations.
These mechanisms include:
• The I/O instructions, locking instructions, the LOCK prefix, and serializing
  instructions force stronger ordering on the processor.
• The SFENCE instruction (introduced to the IA-32 architecture in the Pentium III
  processor) and the LFENCE and MFENCE instructions (introduced in the Pentium
  4 processor) provide memory-ordering and serialization capabilities for specific
  types of memory operations.
• The memory type range registers (MTRRs) can be used to strengthen or weaken
  memory ordering for specific areas of physical memory (see Section 11.11,
  "Memory Type Range Registers (MTRRs)"). MTRRs are available only in the
  Pentium 4, Intel Xeon, and P6 family processors.
• The page attribute table (PAT) can be used to strengthen memory ordering for a
  specific page or group of pages (see Section 11.12, "Page Attribute Table (PAT)").
  The PAT is available only in the Pentium 4, Intel Xeon, and Pentium III processors.
These mechanisms can be used as follows:
Memory mapped devices and other I/O devices on the bus are often sensitive to the
order of writes to their I/O buffers. I/O instructions (the IN and OUT instructions)
can be used to impose strong write ordering on such accesses as follows. Prior to
executing an I/O instruction, the processor waits for all previous instructions in the
program to complete and for all buffered writes to drain to memory. Only instruction
fetch and page table walks can pass I/O instructions. Execution of subsequent
instructions does not begin until the processor determines that the I/O instruction has
been completed.
Synchronization mechanisms in multiple-processor systems may depend upon a
strong memory-ordering model. Here, a program can use a locking instruction such
as the XCHG instruction or the LOCK prefix to ensure that a read-modify-write operation
on memory is carried out atomically. Locking operations typically operate like
I/O operations in that they wait for all previous instructions to complete and for all
buffered writes to drain to memory (see Section 8.1.2, "Bus Locking").
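As an illustration of a lock built on an atomic exchange, the following C11 sketch uses atomic_exchange from <stdatomic.h>. The type and function names are hypothetical. On x86 targets compilers commonly implement this exchange with an XCHG instruction on memory, but that is a property of the compiler and not something this text guarantees.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: 0 means free, 1 means held. */
typedef struct { atomic_int held; } xchg_lock;

/* One atomic read-modify-write attempt: write 1 and look at the old value.
 * The caller acquired the lock only if the old value was 0. */
static bool xchg_lock_try_acquire(xchg_lock *l)
{
    return atomic_exchange(&l->held, 1) == 0;
}

/* Releases the lock with a sequentially consistent store. */
static void xchg_lock_release(xchg_lock *l)
{
    assert(atomic_load(&l->held) == 1);   /* releasing a free lock is a bug */
    atomic_store(&l->held, 0);
}
```

A caller would typically spin on xchg_lock_try_acquire until it returns true, enter the critical region, and then call xchg_lock_release.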
Program synchronization can also be carried out with serializing instructions (see
Section 8.3). These instructions are typically used at critical procedure or task
boundaries to force completion of all previous instructions before a jump to a new
section of code or a context switch occurs. Like the I/O and locking instructions, the
processor waits until all previous instructions have been completed and all buffered
writes have been drained to memory before executing the serializing instruction.
The SFENCE, LFENCE, and MFENCE instructions provide a performance-efficient way
of ensuring load and store memory ordering between routines that produce weakly-
ordered results and routines that consume that data. The functions of these instructions
are as follows:
• SFENCE: Serializes all store (write) operations that occurred prior to the
  SFENCE instruction in the program instruction stream, but does not affect load
  operations.
• LFENCE: Serializes all load (read) operations that occurred prior to the LFENCE
  instruction in the program instruction stream, but does not affect store
  operations.¹
• MFENCE: Serializes all store and load operations that occurred prior to the
  MFENCE instruction in the program instruction stream.
Note that the SFENCE, LFENCE, and MFENCE instructions provide a more efficient
method of controlling memory ordering than the CPUID instruction.
The MTRRs were introduced in the P6 family processors to define the cache characteristics
for specified areas of physical memory. The following are two examples of
how memory types set up with MTRRs can be used to strengthen or weaken memory
ordering for the Pentium 4, Intel Xeon, and P6 family processors:
• The strong uncached (UC) memory type forces a strong-ordering model on
  memory accesses. Here, all reads and writes to the UC memory region appear on
  the bus and out-of-order or speculative accesses are not performed. This
  memory type can be applied to an address range dedicated to memory mapped
  I/O devices to force strong memory ordering.
• For areas of memory where weak ordering is acceptable, the write back (WB)
  memory type can be chosen. Here, reads can be performed speculatively and
  writes can be buffered and combined. For this type of memory, cache locking is
  performed on atomic (locked) operations that do not split across cache lines,
  which helps to reduce the performance penalty associated with the use of the
  typical synchronization instructions, such as XCHG, that lock the bus during the
  entire read-modify-write operation. With the WB memory type, the XCHG
  instruction locks the cache instead of the bus if the memory access is contained
  within a cache line.
The PAT was introduced in the Pentium III processor to enhance the caching characteristics
that can be assigned to pages or groups of pages. The PAT mechanism is typically
used to strengthen caching characteristics at the page level with respect to the
caching characteristics established by the MTRRs. Table 11-7 shows the interaction of
the PAT with the MTRRs.
1. Specifically, LFENCE does not execute until all prior instructions have completed locally, and no
later instruction begins execution until LFENCE completes. As a result, an instruction that loads
from memory and that precedes an LFENCE receives data from memory prior to completion of
the LFENCE. An LFENCE that follows an instruction that stores to memory might complete before
the data being stored have become globally visible. Instructions following an LFENCE may be
fetched from memory before the LFENCE, but they will not execute until the LFENCE completes.
Intel recommends that software written to run on Intel Core 2 Duo, Intel Atom, Intel
Core Duo, Pentium 4, Intel Xeon, and P6 family processors assume the processor-
ordering model or a weaker memory-ordering model. The Intel Core 2 Duo, Intel
Atom, Intel Core Duo, Pentium 4, Intel Xeon, and P6 family processors do not implement
a strong memory-ordering model, except when using the UC memory type.
Despite the fact that Pentium 4, Intel Xeon, and P6 family processors support
processor ordering, Intel does not guarantee that future processors will support this
model. To make software portable to future processors, it is recommended that operating
systems provide critical region and resource control constructs and APIs (application
program interfaces), based on I/O, locking, and/or serializing instructions, to
synchronize access to shared areas of memory in multiple-processor
systems. Also, software should not depend on processor ordering in situations where
the system hardware does not support this memory-ordering model.
8.3 SERIALIZING INSTRUCTIONS
The Intel 64 and IA-32 architectures define several serializing instructions. These
instructions force the processor to complete all modifications to flags, registers, and
memory by previous instructions and to drain all buffered writes to memory before
the next instruction is fetched and executed. For example, when a MOV to control
register instruction is used to load a new value into control register CR0 to enable
protected mode, the processor must perform a serializing operation before it enters
protected mode. This serializing operation ensures that all operations that were
started while the processor was in real-address mode are completed before the
switch to protected mode is made.
The concept of serializing instructions was introduced into the IA-32 architecture
with the Pentium processor to support parallel instruction execution. Serializing
instructions have no meaning for the Intel486 and earlier processors that do not
implement parallel instruction execution.
It is important to note that executing serializing instructions on P6 and more
recent processor families constrains speculative execution because the results of
speculatively executed instructions are discarded. The following instructions are
serializing instructions:
• Privileged serializing instructions: INVD, INVEPT, INVLPG, INVVPID, LGDT,
  LIDT, LLDT, LTR, MOV (to control register, with the exception of MOV CR8²), MOV
  (to debug register), WBINVD, and WRMSR.
• Non-privileged serializing instructions: CPUID, IRET, and RSM.
When the processor serializes instruction execution, it ensures that all pending
memory transactions are completed (including writes stored in its store buffer)
before it executes the next instruction. Nothing can pass a serializing instruction and
a serializing instruction cannot pass any other instruction (read, write, instruction
fetch, or I/O). For example, CPUID can be executed at any privilege level to serialize
instruction execution with no effect on program flow, except that the EAX, EBX, ECX,
and EDX registers are modified.
2. MOV CR8 is not defined architecturally as a serializing instruction.
The following instructions are memory-ordering instructions, not serializing instructions.
These drain the data memory subsystem. They do not serialize the instruction
execution stream:³
• Non-privileged memory-ordering instructions: SFENCE, LFENCE, and
  MFENCE.
The SFENCE, LFENCE, and MFENCE instructions provide more granularity in controlling
the serialization of memory loads and stores (see Section 8.2.5, "Strengthening
or Weakening the Memory-Ordering Model").
The following additional information is worth noting regarding serializing instructions:
• The processor does not write back the contents of modified data in its data cache
  to external memory when it serializes instruction execution. Software can force
  modified data to be written back by executing the WBINVD instruction, which is a
  serializing instruction. The amount of time or cycles for WBINVD to complete will
  vary due to the size of different cache hierarchies and other factors. As a consequence,
  the use of the WBINVD instruction can have an impact on
  interrupt/event response time.
• When an instruction is executed that enables or disables paging (that is, changes
  the PG flag in control register CR0), the instruction should be followed by a jump
  instruction. The target instruction of the jump instruction is fetched with the new
  setting of the PG flag (that is, paging is enabled or disabled), but the jump
  instruction itself is fetched with the previous setting. The Pentium 4, Intel Xeon,
  and P6 family processors do not require the jump operation following the move to
  register CR0 (because any use of the MOV instruction in a Pentium 4, Intel Xeon,
  or P6 family processor to write to CR0 is completely serializing). However, to
  maintain backwards and forward compatibility with code written to run on other
  IA-32 processors, it is recommended that the jump operation be performed.
• Whenever an instruction is executed to change the contents of CR3 while paging
  is enabled, the next instruction is fetched using the translation tables that
  correspond to the new value of CR3. Therefore the next instruction and the
  sequentially following instructions should have a mapping based upon the new
  value of CR3. (Global entries in the TLBs are not invalidated; see Section 4.10.4,
  "Invalidation of TLBs and Paging-Structure Caches.")
• The Pentium processor and more recent processor families use branch-prediction
  techniques to improve performance by prefetching the destination of a branch
  instruction before the branch instruction is executed. Consequently, instruction
  execution is not deterministically serialized when a branch instruction is
  executed.
3. LFENCE does provide some guarantees on instruction ordering. It does not execute until all prior
instructions have completed locally, and no later instruction begins execution until LFENCE
completes.
8.4 MULTIPLE-PROCESSOR (MP) INITIALIZATION
The IA-32 architecture (beginning with the P6 family processors) defines a multiple-
processor (MP) initialization protocol called the Multiprocessor Specification Version
1.4. This specification defines the boot protocol to be used by IA-32 processors in
multiple-processor systems. (Here, multiple processors is defined as two or more
processors.) The MP initialization protocol has the following important features:
• It supports controlled booting of multiple processors without requiring dedicated
  system hardware.
• It allows hardware to initiate the booting of a system without the need for a
  dedicated signal or a predefined boot processor.
• It allows all IA-32 processors to be booted in the same manner, including those
  supporting Intel Hyper-Threading Technology.
• The MP initialization protocol also applies to MP systems using Intel 64
  processors.
The mechanism for carrying out the MP initialization protocol differs depending on
the IA-32 processor family, as follows:
• For P6 family processors: The selection of the BSP and APs (see Section
  8.4.1, "BSP and AP Processors") is handled through arbitration on the APIC bus,
  using BIPI and FIPI messages. See Appendix C, "MP Initialization For P6 Family
  Processors," for a complete discussion of MP initialization for P6 family
  processors.
• Intel Xeon processors with family, model, and stepping IDs up to F09H:
  The selection of the BSP and APs (see Section 8.4.1, "BSP and AP Processors") is
  handled through arbitration on the system bus, using BIPI and FIPI messages
  (see Section 8.4.3, "MP Initialization Protocol Algorithm for
  Intel Xeon Processors").
• Intel Xeon processors with family, model, and stepping IDs of F0AH and
  beyond, 6E0H and beyond, 6F0H and beyond: The selection of the BSP and
  APs is handled through a special system bus cycle, without using BIPI and FIPI
  message arbitration (see Section 8.4.3, "MP Initialization Protocol Algorithm for
  Intel Xeon Processors").
The family, model, and stepping ID for a processor is given in the EAX register when
the CPUID instruction is executed with a value of 1 in the EAX register.
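The three-digit values used above (for example F09H or F0AH) are the low 12 bits of that EAX value. The C sketch below decodes them, assuming the usual CPUID.01H:EAX layout of stepping in bits 3:0, model in bits 7:4, and family in bits 11:8; the struct and function names are hypothetical, and the extended family and model fields in the upper bits are deliberately ignored.

```c
#include <assert.h>   /* only needed for checks such as those in the note below */
#include <stdint.h>

/* Hypothetical holder for the basic fields of CPUID.01H:EAX. */
struct cpu_signature {
    uint32_t stepping;  /* bits 3:0  */
    uint32_t model;     /* bits 7:4  */
    uint32_t family;    /* bits 11:8 */
};

/* Splits the low 12 bits of EAX into family, model, and stepping. */
static struct cpu_signature decode_signature(uint32_t eax)
{
    struct cpu_signature s;
    s.stepping = eax & 0xFu;
    s.model    = (eax >> 4) & 0xFu;
    s.family   = (eax >> 8) & 0xFu;
    return s;
}

/* Returns the three-hex-digit form used in the text, such as F0AH. */
static uint32_t signature_fms(uint32_t eax)
{
    return eax & 0xFFFu;
}
```

For instance, an EAX value whose low 12 bits are F0AH decodes to family 0FH, model 0, and stepping 0AH. Obtaining EAX itself requires executing CPUID, which this sketch leaves to the caller.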
8.4.1 BSP and AP Processors
The MP initialization protocol defines two classes of processors: the bootstrap
processor (BSP) and the application processors (APs). Following a power-up or
RESET of an MP system, system hardware dynamically selects one of the processors
on the system bus as the BSP. The remaining processors are designated as APs.
As part of the BSP selection mechanism, the BSP flag is set in the IA32_APIC_BASE
MSR (see Figure 10-5) of the BSP, indicating that it is the BSP. This flag is cleared for
all other processors.
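Software that has already read the IA32_APIC_BASE MSR can test the BSP flag with a simple mask. The C sketch below assumes the flag is bit 8 of the MSR, as in the IA32_APIC_BASE layout that Figure 10-5 describes (the bit position is not stated in this section). The macro and function names are hypothetical, and reading the MSR itself (RDMSR at privilege level 0) is left to the caller.

```c
#include <assert.h>   /* only needed for checks such as those in the note below */
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the BSP flag in IA32_APIC_BASE (bit 8). */
#define APIC_BASE_BSP_FLAG (UINT64_C(1) << 8)

/* Returns true if the given IA32_APIC_BASE value has the BSP flag set. */
static bool is_bsp(uint64_t ia32_apic_base)
{
    return (ia32_apic_base & APIC_BASE_BSP_FLAG) != 0;
}
```

Under that assumption, a value such as FEE00900H identifies the BSP, while FEE00800H (same base address, flag clear) identifies an AP.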
The BSP executes the BIOS's bootstrap code to configure the APIC environment,
sets up system-wide data structures, and starts and initializes the APs. When the BSP
and APs are initialized, the BSP then begins executing the operating-system
initialization code.
Following a power-up or reset, the APs complete a minimal self-configuration, then
wait for a startup signal (a SIPI message) from the BSP processor. Upon receiving a
SIPI message, an AP executes the BIOS AP configuration code, which ends with the
AP being placed in halt state.
For Intel 64 and IA-32 processors supporting Intel Hyper-Threading Technology, the
MP initialization protocol treats each of the logical processors on the system bus or
coherent link domain as a separate processor (with a unique APIC ID). During boot-up,
one of the logical processors is selected as the BSP and the remainder of the
logical processors are designated as APs.
8.4.2 MP Initialization Protocol Requirements and Restrictions
The MP initialization protocol imposes the following requirements and restrictions on
the system:
• The MP protocol is executed only after a power-up or RESET. If the MP protocol
  has completed and a BSP is chosen, subsequent INITs (either to a specific
  processor or system wide) do not cause the MP protocol to be repeated. Instead,
  each logical processor examines its BSP flag (in the IA32_APIC_BASE MSR) to
  determine whether it should execute the BIOS bootstrap code (if it is the BSP) or
  enter a wait-for-SIPI state (if it is an AP).
• All devices in the system that are capable of delivering interrupts to the
  processors must be inhibited from doing so for the duration of the MP
  initialization protocol. The time during which interrupts must be inhibited includes the
  window between when the BSP issues an INIT-SIPI-SIPI sequence to an AP and
  when the AP responds to the last SIPI in the sequence.
8.4.3 MP Initialization Protocol Algorithm for Intel Xeon Processors
Following a power-up or RESET of an MP system, the processors in the system
execute the MP initialization protocol algorithm to initialize each of the logical
processors on the system bus or coherent link domain. In the course of executing this
algorithm, the following boot-up and initialization operations are carried out:
1. Each logical processor is assigned a unique APIC ID, based on system topology.
   The unique ID is a 32-bit value if the processor supports CPUID leaf 0BH,
   otherwise the unique ID is an 8-bit value (see Section 8.4.5, "Identifying Logical
   Processors in an MP System"). This ID is written into the local APIC ID register for
   each processor.
2. Each logical processor is assigned a unique arbitration priority based on its
   APIC ID.
3. Each logical processor executes its internal BIST simultaneously with the other
   logical processors on the system bus.
4. Upon completion of the BIST, the logical processors use a hardware-defined
   selection mechanism to select the BSP and the APs from the available logical
   processors on the system bus. The BSP selection mechanism differs depending
   on the family, model, and stepping IDs of the processors, as follows:
   • Family, model, and stepping IDs of F0AH and onwards:
     - The logical processors begin monitoring the BNR# signal, which is
       toggling. When the BNR# pin stops toggling, each processor attempts to
       issue a NOP special cycle on the system bus.
     - The logical processor with the highest arbitration priority succeeds in
       issuing a NOP special cycle and is nominated the BSP. This processor sets
       the BSP flag in its IA32_APIC_BASE MSR, then fetches and begins
       executing BIOS bootstrap code, beginning at the reset vector (physical
       address FFFF FFF0H).
     - The remaining logical processors (that failed in issuing a NOP special
       cycle) are designated as APs. They leave their BSP flags in the clear state
       and enter a wait-for-SIPI state.
   • Family, model, and stepping IDs up to F09H:
     - Each processor broadcasts a BIPI to all processors, including itself. The first
       processor that broadcasts a BIPI (and thus receives its own BIPI vector) selects
       itself as the BSP and sets the BSP flag in its IA32_APIC_BASE MSR. (See
       Appendix C.1, "Overview of the MP Initialization Process For P6 Family
       Processors," for a description of the BIPI, FIPI, and SIPI messages.)
     - The remainder of the processors (which were not selected as the BSP) are
       designated as APs. They leave their BSP flags in the clear state and enter
       a wait-for-SIPI state.
     - The newly established BSP broadcasts an FIPI message to all processors,
       including itself, which the BSP and APs treat as an end of MP initialization
       signal. Only the processor with its BSP flag set responds to the FIPI message.
       It responds by fetching and executing the BIOS bootstrap code, beginning
       at the reset vector (physical address FFFF FFF0H).
5. As part of the bootstrap code, the BSP creates an ACPI table and an MP table and
   adds its initial APIC ID to these tables as appropriate.
6. At the end of the bootstrap procedure, the BSP sets a processor counter to 1,
   then broadcasts a SIPI message to all the APs in the system. Here, the SIPI
   message contains a vector to the BIOS AP initialization code (at 000VV000H,
   where VV is the vector contained in the SIPI message).
7. The first action of the AP initialization code is to set up a race (among the APs) to
   a BIOS initialization semaphore. The first AP to the semaphore begins executing
   the initialization code. (See Section 8.4.4, "MP Initialization Example," for
   semaphore implementation details.) As part of the AP initialization procedure,
   the AP adds its APIC ID number to the ACPI and MP tables as appropriate and
   increments the processor counter by 1. At the completion of the initialization
   procedure, the AP executes a CLI instruction and halts itself.
8. When each of the APs has gained access to the semaphore and executed the AP
   initialization code, the BSP establishes a count for the number of processors
   connected to the system bus, completes executing the BIOS bootstrap code,
   and then begins executing operating-system bootstrap and start-up code.
9. While the BSP is executing operating-system bootstrap and start-up code, the
   APs remain in the halted state. In this state they will respond only to INITs, NMIs,
   and SMIs. They will also respond to snoops and to assertions of the STPCLK# pin.
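The 000VV000H address form in step 6 amounts to a shift of the 8-bit vector by 12 bits. The C sketch below shows the conversion in both directions; the function names are hypothetical, and the reverse direction asserts the constraints implied by the format (a 4-KByte-aligned page in the lower 1 MByte).

```c
#include <assert.h>
#include <stdint.h>

/* Start-up address selected by a SIPI vector: 000VV000H. */
static uint32_t sipi_vector_to_address(uint8_t vector)
{
    return (uint32_t)vector << 12;
}

/* SIPI vector for a 4-KByte page base in the lower 1 MByte of memory. */
static uint8_t address_to_sipi_vector(uint32_t page_base)
{
    assert((page_base & 0xFFFu) == 0);   /* must be 4-KByte aligned */
    assert(page_base < 0x100000u);       /* must lie below 1 MByte */
    return (uint8_t)(page_base >> 12);
}
```

For example, a vector of 0BDH corresponds to a start-up address of 000BD000H, and the page base 000BD000H maps back to the vector 0BDH.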
The following section gives an example (with code) of the MP initialization protocol
for multiple Intel Xeon processors operating in an MP configuration.
Appendix B, "Model-Specific Registers (MSRs)," describes how to program the
LINT[0:1] pins of the processor's local APICs after an MP configuration has been
completed.
8.4.4 MP Initialization Example
The following example illustrates the use of the MP initialization protocol to
initialize processors in an MP system after the BSP and APs have been established.
The code runs on Intel 64 or IA-32 processors that use the MP initialization protocol.
This includes P6 Family processors, Pentium 4 processors, Intel Core Duo, Intel Core 2 Duo
and Intel Xeon processors.
The following constants and data definitions are used in the accompanying
code examples. They are based on the addresses of the APIC registers defined in
Table 10-1.
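ICR_LOW       EQU 0FEE00300H   ; Interrupt command register, low doubleword
SVR           EQU 0FEE000F0H   ; Spurious interrupt vector register
APIC_ID       EQU 0FEE00020H   ; Local APIC ID register
LVT3          EQU 0FEE00370H   ; LVT error register
APIC_ENABLED  EQU 0100H        ; Bit 8 of the SVR
BOOT_ID       DD  ?
COUNT         EQU 00H
VACANT        EQU 00H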
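For readers working in C, the same register addresses and the bit manipulation used later in this example can be expressed as follows. This is a hedged sketch: the macro names mirror the assembly constants, but the helper function names are hypothetical, and the helpers operate on register values that the caller has already read. Accessing the registers themselves requires the memory-mapped xAPIC interface at privilege level 0, which is not shown.

```c
#include <assert.h>   /* only needed for checks such as those in the note below */
#include <stdint.h>

/* Same values as the EQU definitions above (default xAPIC addresses). */
#define ICR_LOW       0xFEE00300u
#define SVR           0xFEE000F0u
#define APIC_ID       0xFEE00020u
#define LVT3          0xFEE00370u
#define APIC_ENABLED  0x0100u

/* Keeps only the APIC ID field (bits 31:24), like AND EAX, 0FF000000H. */
static uint32_t apic_id_field(uint32_t apic_id_reg)
{
    return apic_id_reg & 0xFF000000u;
}

/* The APIC ID as a small integer, shifted down from bits 31:24. */
static uint32_t apic_id_number(uint32_t apic_id_reg)
{
    return apic_id_reg >> 24;
}

/* SVR value with bit 8 set, like OR EAX, APIC_ENABLED. */
static uint32_t svr_with_apic_enabled(uint32_t svr_value)
{
    return svr_value | APIC_ENABLED;
}
```

As a usage example, a register value of 03000000H yields APIC ID 3, and an SVR value of 000000FFH becomes 000001FFH once the enable bit is set.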
8.4.4.1 Typical BSP Initialization Sequence
After the BSP and APs have been selected (by means of a hardware protocol, see
Section 8.4.3, "MP Initialization Protocol Algorithm for Intel Xeon Processors"), the
BSP begins executing BIOS bootstrap code (POST) at the normal IA-32 architecture
starting address (FFFF FFF0H). The bootstrap code typically performs the following
operations:
1. Initializes memory.
2. Loads the microcode update into the processor.
3. Initializes the MTRRs.
4. Enables the caches.
5. Executes the CPUID instruction with a value of 0H in the EAX register, then reads
   the EBX, ECX, and EDX registers to determine if the BSP is "GenuineIntel."
6. Executes the CPUID instruction with a value of 1H in the EAX register, then saves
   the values in the EAX, ECX, and EDX registers in a system configuration space in
   RAM for use later.
7. Loads start-up code for the AP to execute into a 4-KByte page in the lower 1
   MByte of memory.
8. Switches to protected mode and ensures that the APIC address space is mapped
   to the strong uncacheable (UC) memory type.
9. Determines the BSP's APIC ID from the local APIC ID register (default is 0). The
   code snippet below is an example that applies to logical processors in a system
   whose local APIC units operate in xAPIC mode, in which APIC registers are accessed
   using the memory-mapped interface:
MOV ESI, APIC_ID        ; Address of local APIC ID register
MOV EAX, [ESI]          ; Read the local APIC ID register
AND EAX, 0FF000000H     ; Zero out all other bits except APIC ID
MOV BOOT_ID, EAX        ; Save in memory
Saves the APIC ID in the ACPI and MP tables and optionally in the system
configuration space in RAM.
10. Converts the base address of the 4-KByte page for the AP's boot-up code into an
8-bit vector. The 8-bit vector defines the address of a 4-KByte page in the
real-address mode address space (1-MByte space). For example, a vector of 0BDH
specifies a start-up memory address of 000BD000H.
11. Enables the local APIC by setting bit 8 of the APIC spurious vector register (SVR).
MOV ESI, SVR; Address of SVR
MOV EAX, [ESI];
OR EAX, APIC_ENABLED; Set bit 8 to enable (0 on reset)
MOV [ESI], EAX;
12. Sets up the LVT error handling entry by establishing an 8-bit vector for the APIC
error handler.
MOV ESI, LVT3;
MOV EAX, [ESI];
AND EAX, 0FFFFFF00H; Clear out previous vector.
OR EAX, 000000xxH; xx is the 8-bit vector of the APIC error handler.
MOV [ESI], EAX;
13. Initializes the Lock Semaphore variable VACANT to 00H. The APs use this
semaphore to determine the order in which they execute BIOS AP initialization
code.
14. Performs the following operations to set up the BSP to detect the presence of APs
in the system and the number of processors:
- Sets the value of the COUNT variable to 1.
- Starts a timer (set for an approximate interval of 100 milliseconds). In the AP
BIOS initialization code, the AP will increment the COUNT variable to indicate
its presence. When the timer expires, the BSP checks the value of the COUNT
variable. If the timer expires and the COUNT variable has not been
incremented, no APs are present or some error has occurred.
15. Broadcasts an INIT-SIPI-SIPI IPI sequence to the APs to wake them up and
initialize them:
MOV ESI, ICR_LOW; Load address of ICR low dword into ESI.
MOV EAX, 000C4500H; Load ICR encoding for broadcast INIT IPI
; to all APs into EAX.
MOV [ESI], EAX; Broadcast INIT IPI to all APs
; 10-millisecond delay loop.
MOV EAX, 000C46XXH; Load ICR encoding for broadcast SIPI IPI
; to all APs into EAX, where XX is the vector computed in step 10.
MOV [ESI], EAX; Broadcast SIPI IPI to all APs
; 200-microsecond delay loop
MOV [ESI], EAX; Broadcast second SIPI IPI to all APs
; 200-microsecond delay loop
16. Waits for the timer interrupt.
17. Reads and evaluates the COUNT variable and establishes a processor count.
18. If necessary, reconfigures the APIC and continues with the remaining system
diagnostics as appropriate.
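The arithmetic behind steps 9, 10 and 15 can be sketched in C. This is only an illustration under stated assumptions, not BIOS code: the function names are hypothetical, and only the constants 0FF000000H, 000C4500H and 000C46xxH are taken from the listings above. Note that the assembly in step 9 stores the masked value unshifted in BOOT_ID, whereas the helper below also shifts it down to a small integer.

```c
#include <stdint.h>

/* Hypothetical helper names; only the constants come from the listings above. */

/* Step 9: in xAPIC mode the APIC ID occupies bits 31:24 of the ID register. */
static inline uint32_t apic_id_from_register(uint32_t reg_value)
{
    return (reg_value & 0xFF000000u) >> 24;
}

/* Step 10: a 4-KByte-aligned page below 1 MByte maps to an 8-bit vector.
   Returns 0 on success, -1 if the address cannot be a start-up page. */
static inline int sipi_vector_from_address(uint32_t page_base, uint8_t *vector)
{
    if (page_base >= 0x100000u || (page_base & 0xFFFu) != 0)
        return -1;
    *vector = (uint8_t)(page_base >> 12);
    return 0;
}

/* Inverse of step 10: vector 0BDH gives start-up address 000BD000H. */
static inline uint32_t startup_address_from_vector(uint8_t vector)
{
    return (uint32_t)vector << 12;
}

/* Step 15: ICR low-dword encodings for the broadcast INIT and SIPI IPIs. */
static inline uint32_t icr_broadcast_init(void)
{
    return 0x000C4500u;
}

static inline uint32_t icr_broadcast_sipi(uint8_t vector)
{
    return 0x000C4600u | vector;
}
```

For the vector 0BDH used in step 10, `icr_broadcast_sipi(0xBD)` yields 000C46BDH, which is the value step 15 writes to ICR_LOW.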
8.4.4.2 Typical AP Initialization Sequence
When an AP receives the SIPI, it begins executing BIOS AP initialization code at the
vector encoded in the SIPI. The AP initialization code typically performs the following
operations:
1. Waits on the BIOS initialization Lock Semaphore. When control of the semaphore
is attained, initialization continues.
2. Loads the microcode update into the processor.
3. Initializes the MTRRs (using the same mapping that was used for the BSP).
4. Enables the cache.
5. Executes the CPUID instruction with a value of 0H in the EAX register, then reads
the EBX, ECX, and EDX registers to determine if the AP is "GenuineIntel."
6. Executes the CPUID instruction with a value of 1H in the EAX register, then saves
the values in the EAX, ECX, and EDX registers in a system configuration space in
RAM for use later.
7. Switches to protected mode and ensures that the APIC address space is mapped
to the strong uncacheable (UC) memory type.
8. Determines the AP's APIC ID from the local APIC ID register, and adds it to the MP
and ACPI tables and optionally to the system configuration space in RAM.
9. Initializes and configures the local APIC by setting bit 8 in the SVR register and
setting up the LVT3 (error LVT) for error handling (as described in steps 11 and 12
in Section 8.4.4.1, "Typical BSP Initialization Sequence").
10. Configures the AP's SMI execution environment. (Each AP and the BSP must have
a different SMBASE address.)
11. Increments the COUNT variable by 1.
12. Releases the semaphore.
13. Executes the CLI and HLT instructions.
14. Waits for an INIT IPI.
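Steps 1, 11 and 12 of the AP sequence amount to a lock-protected increment of COUNT, which the BSP reads after its timer expires. The C11 sketch below models that hand-off in ordinary memory. It is an assumption-laden illustration rather than firmware: VACANT and COUNT mirror names from the data definitions in Section 8.4.4, while the OCCUPIED value, the function names and the use of <stdatomic.h> are hypothetical.

```c
#include <stdatomic.h>

enum { VACANT = 0x00, OCCUPIED = 0x01 };   /* OCCUPIED is a hypothetical value */

static atomic_int lock_semaphore = VACANT; /* Lock Semaphore (BSP step 13) */
static int count;                          /* COUNT: processors seen so far */

/* BSP steps 13 and 14: clear the semaphore and count the BSP itself. */
static inline void bsp_prepare_count(void)
{
    atomic_store(&lock_semaphore, VACANT);
    count = 1;
}

/* AP steps 1, 11 and 12: acquire the semaphore, announce presence, release. */
static inline void ap_check_in(void)
{
    int expected = VACANT;
    while (!atomic_compare_exchange_weak(&lock_semaphore, &expected, OCCUPIED))
        expected = VACANT;                 /* spin until the semaphore is vacant */
    count += 1;                            /* protected by the semaphore */
    atomic_store(&lock_semaphore, VACANT);
}

/* BSP step 17: COUNT is the number of processors that responded, BSP included. */
static inline int bsp_processor_count(void)
{
    return count;
}
```

With one BSP and three APs checking in, `bsp_processor_count()` returns 4; a value still equal to 1 after the timer expires corresponds to the "no APs are present or some error has occurred" case in step 14.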
8.4.5 Identifying Logical Processors in an MP System
After the BIOS has completed the MP initialization protocol, each logical processor
can be uniquely identified by its local APIC ID. Software can access these APIC IDs in
any of the following ways:
- Read APIC ID for a local APIC: Code running on a logical processor can read its
APIC ID in one of two ways, depending on whether the local APIC unit is operating in
x2APIC mode (see Intel 64 Architecture x2APIC Specification) or in xAPIC
mode:
  - If the local APIC unit supports x2APIC and is operating in x2APIC mode, the
  32-bit APIC ID can be read by executing a RDMSR instruction to read the
  processor's x2APIC ID register. This method is equivalent to executing CPUID
  leaf 0BH described below.
  - If the local APIC unit is operating in xAPIC mode, the 8-bit APIC ID can be read by
  executing a MOV instruction to read the processor's local APIC ID register
  (see Section 10.4.6, "Local APIC ID"). This is the ID to use for directing
  physical destination mode interrupts to the processor.
- Read ACPI or MP table: As part of the MP initialization protocol, the BIOS
creates an ACPI table and an MP table. These tables are defined in the
Multiprocessor Specification Version 1.4 and provide software with a list of the processors
in the system and their local APIC IDs. The format of the ACPI table is derived
from the ACPI specification, which is an industry standard power management
and platform configuration specification for MP systems.
- Read Initial APIC ID (if the processor does not support CPUID leaf 0BH): An
APIC ID is assigned to a logical processor during power up. This is the initial APIC
ID reported by CPUID.1:EBX[31:24] and may be different from the current value
read from the local APIC. The initial APIC ID can be used to determine the
topological relationship between logical processors for multi-processor systems
that do not support CPUID leaf 0BH.
Bits in the 8-bit initial APIC ID can be interpreted using several bit masks. Each
bit mask can be used to extract an identifier to represent a hierarchical level of
the multi-threading resource topology in an MP system (see Section 8.9.1,
"Hierarchical Mapping of Shared Resources"). The initial APIC ID may consist of
up to four bit-fields. In a non-clustered MP system, the field consists of up to
three bit-fields.
- Read 32-bit APIC ID from CPUID leaf 0BH (if the processor supports CPUID
leaf 0BH): A unique APIC ID is assigned to a logical processor during power up.
This APIC ID is reported by CPUID.0BH:EDX[31:0] as a 32-bit value. Use the
32-bit APIC ID and CPUID leaf 0BH to determine the topological relationship between
logical processors if the processor supports CPUID leaf 0BH.
Bits in the 32-bit x2APIC ID can be extracted into sub-fields using CPUID leaf 0BH
parameters (see Section 8.9.1, "Hierarchical Mapping of Shared Resources").
Figure 8-2 shows two examples of APIC ID bit fields in earlier single-core processors.
In single-core Intel Xeon processors, the APIC ID assigned to a logical processor
during power-up and initialization is 8 bits. Bits 2:1 form a 2-bit physical package
identifier (which can also be thought of as a socket identifier). In systems that
configure physical processors in clusters, bits 4:3 form a 2-bit cluster ID. Bit 0 is used
in the Intel Xeon processor MP to identify the two logical processors within the
package (see Section 8.9.3, "Hierarchical ID of Logical Processors in an MP System").
For Intel Xeon processors that do not support Intel Hyper-Threading Technology, bit
0 is always set to 0; for Intel Xeon processors supporting Intel Hyper-Threading
Technology, bit 0 performs the same function as it does for Intel Xeon processor MP.
For more recent multi-core processors, see Section 8.9.1, "Hierarchical Mapping of
Shared Resources," for a complete description of the topological relationships
between logical processors and bit field locations within an initial APIC ID across Intel
64 and IA-32 processor families.
Note that the number of bit fields and the width of bit fields are dependent on processor
and platform hardware capabilities. Software should determine these at runtime.
When initial APIC IDs are assigned to logical processors, the value of the APIC ID
assigned to a logical processor will respect the bit-field boundaries corresponding to
core, physical package, etc. Additional examples of the bit fields in the initial APIC ID
of multi-threading capable systems are shown in Section 8.9.
For P6 family processors, the APIC ID that is assigned to a processor during
power-up and initialization is 4 bits (see Figure 8-2). Here, bits 0 and 1 form a 2-bit
processor (or socket) identifier and bits 2 and 3 form a 2-bit cluster ID.
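The two fixed layouts described above for early MP systems can be decoded with plain shifts and masks. The following C sketch is illustrative only; the struct and function names are hypothetical, and the bit positions are exactly those given in the text for P6 family processors and for single-core Intel Xeon processors. Newer processors should not use fixed positions, since field widths must be determined at runtime.

```c
#include <stdint.h>

struct early_apic_id {
    unsigned cluster;   /* 2-bit cluster ID */
    unsigned package;   /* 2-bit processor (socket) ID */
    unsigned logical;   /* logical processor within the package (Xeon only) */
};

/* P6 family: bits 1:0 = processor ID, bits 3:2 = cluster ID. */
static inline struct early_apic_id decode_p6_apic_id(uint8_t id)
{
    struct early_apic_id f = { (id >> 2) & 0x3u, id & 0x3u, 0 };
    return f;
}

/* Single-core Intel Xeon: bit 0 = logical processor, bits 2:1 = package ID,
   bits 4:3 = cluster ID. */
static inline struct early_apic_id decode_xeon_apic_id(uint8_t id)
{
    struct early_apic_id f = { (id >> 3) & 0x3u, (id >> 1) & 0x3u, id & 0x1u };
    return f;
}
```

For example, the Xeon-format value 0DH (binary 01101) decodes to cluster 1, package 2, logical processor 1.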
Figure 8-2. Interpretation of APIC ID in Early MP Systems
[Figure: two APIC ID layouts. APIC ID format for P6 family processors: bits 7:4
reserved, bits 3:2 cluster, bits 1:0 processor ID. APIC ID format for Intel Xeon
processors that do not support Intel Hyper-Threading Technology: bits 7:5 reserved,
bits 4:3 cluster, bits 2:1 processor ID, bit 0 = 0.]
8.5 INTEL HYPER-THREADING TECHNOLOGY AND INTEL MULTI-CORE TECHNOLOGY
Intel Hyper-Threading Technology and Intel multi-core technology are extensions to
Intel 64 and IA-32 architectures that enable a single physical processor to execute
two or more separate code streams (called threads) concurrently. In Intel
Hyper-Threading Technology, a single processor core provides two logical processors that
share execution resources (see Section 8.7, "Intel Hyper-Threading Technology
Architecture"). In Intel multi-core technology, a physical processor package provides
two or more processor cores. Both configurations require chipsets and a BIOS that
support the technologies.
Software should not rely on processor names to determine whether a processor
supports Intel Hyper-Threading Technology or Intel multi-core technology. Use the
CPUID instruction to determine processor capability (see Section 8.6.2, "Initializing
Multi-Core Processors").
8.6 DETECTING HARDWARE MULTI-THREADING
SUPPORT AND TOPOLOGY
Use the CPUID instruction to detect the presence of hardware multi-threading
support in a physical processor. Hardware multi-threading can support several
varieties of multi-core and/or Intel Hyper-Threading Technology. The CPUID instruction
provides several sets of parameter information to aid software in enumerating topology
information. The relevant topology enumeration parameters provided by CPUID
include:
- Hardware Multi-Threading feature flag (CPUID.1:EDX[28] = 1): Indicates when set
that the physical package is capable of supporting Intel Hyper-Threading Technology
and/or multiple cores.
- Processor topology enumeration parameters for 8-bit APIC ID:
  - Addressable IDs for Logical processors in the same Package
  (CPUID.1:EBX[23:16]): Indicates the maximum number of addressable
  IDs for logical processors in a physical package. Within a physical package,
  there may be addressable IDs that are not occupied by any logical
  processors. This parameter does not represent the hardware capability of
  the physical processor. [4]
  - Addressable IDs for processor cores in the same Package [5]
  (CPUID.(EAX=4, ECX=0):EAX[31:26] + 1 = Y) [6]: Indicates the maximum
  number of addressable IDs attributable to processor cores (Y) in the physical
  package.
- Extended Processor Topology Enumeration parameters for 32-bit APIC ID: Intel 64
processors supporting CPUID leaf 0BH will assign unique APIC IDs to each logical
processor in the system. CPUID leaf 0BH reports the 32-bit APIC ID and provides
topology enumeration parameters. See the CPUID instruction reference pages in
Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2A.
Footnotes:
4. Operating system and BIOS may implement features that reduce the number of logical
processors available in a platform to applications at runtime to less than the number of
physical packages times the number of hardware-capable logical processors per package.
5. Software must check CPUID for its support of leaf 4 when implementing support for multi-core. If
CPUID leaf 4 is not available at runtime, software should handle the situation as if there is only
one core per package.
6. Maximum number of cores in the physical package must be queried by executing CPUID with
EAX=4 and a valid ECX input value. Valid ECX input values start from 0.
The CPUID feature flag may indicate support for hardware multi-threading when only
one logical processor is available in the package. In this case, the decimal value
represented by bits 16 through 23 in the EBX register will have a value of 1.
Software should note that the number of logical processors enabled by system
software may be less than the value of "Addressable IDs for Logical processors."
Similarly, the number of cores enabled by system software may be less than the value of
"Addressable IDs for processor cores."
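The bit-field extractions named in the parameter list above can be written as small pure functions over register values that software has already obtained from CPUID. This C sketch is a hedged illustration: the function names are hypothetical, and the caller is assumed to supply the raw EDX/EBX output of CPUID leaf 1 and the EAX output of CPUID leaf 4 (sub-leaf 0). The fallback to one core when leaf 4 is unavailable follows footnote 5.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.1:EDX[28]: hardware multi-threading feature flag. */
static inline bool has_hw_multithreading(uint32_t leaf1_edx)
{
    return (leaf1_edx >> 28) & 1u;
}

/* CPUID.1:EBX[23:16]: addressable IDs for logical processors in a package. */
static inline unsigned addressable_logical_ids(uint32_t leaf1_ebx)
{
    return (leaf1_ebx >> 16) & 0xFFu;
}

/* CPUID.(EAX=4, ECX=0):EAX[31:26] + 1: addressable IDs for processor cores.
   If leaf 4 is not available, treat the package as having one core. */
static inline unsigned addressable_core_ids(bool leaf4_available,
                                            uint32_t leaf4_eax)
{
    if (!leaf4_available)
        return 1;
    return ((leaf4_eax >> 26) & 0x3Fu) + 1u;
}
```

These values are upper bounds on addressable IDs, not counts of enabled processors; as noted above, system software may enable fewer logical processors or cores.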
Software can detect the availability of the CPUID extended topology enumeration leaf
(0BH) by performing two steps:
- Check the maximum input value for basic CPUID information by executing CPUID
with EAX = 0. If CPUID.0H:EAX is greater than or equal to 11 (0BH), proceed
to the next step.
- Check that CPUID.(EAX=0BH, ECX=0H):EBX is non-zero.
If both of the above conditions are true, the extended topology enumeration leaf is
available. Note that the presence of CPUID leaf 0BH in a processor does not guarantee
that the local APIC supports x2APIC. If CPUID.(EAX=0BH, ECX=0H):EBX returns
zero and the maximum input value for basic CPUID information is greater than 0BH, then
the CPUID.0BH leaf is not supported on that processor.
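The two-step check above can be sketched in C as follows. To keep the example portable and testable, CPUID is reached through a caller-supplied function pointer; that callback type, the struct and all function names are assumptions made for illustration, and the two "fake" functions merely stand in for real hardware.

```c
#include <stdbool.h>
#include <stdint.h>

struct cpuid_regs { uint32_t eax, ebx, ecx, edx; };

/* Hypothetical callback: executes CPUID with EAX = leaf and ECX = subleaf. */
typedef struct cpuid_regs (*cpuid_fn)(uint32_t leaf, uint32_t subleaf);

static inline bool has_extended_topology_leaf(cpuid_fn cpuid)
{
    /* Step 1: CPUID.0H:EAX holds the maximum basic leaf. */
    if (cpuid(0, 0).eax < 0x0Bu)
        return false;
    /* Step 2: CPUID.(EAX=0BH, ECX=0H):EBX must be non-zero. */
    return cpuid(0x0Bu, 0).ebx != 0;
}

/* Stand-in: maximum basic leaf is 0BH and leaf 0BH reports a non-zero EBX. */
static struct cpuid_regs fake_cpu_with_leaf_0b(uint32_t leaf, uint32_t subleaf)
{
    struct cpuid_regs r = {0, 0, 0, 0};
    (void)subleaf;
    if (leaf == 0)     r.eax = 0x0Bu;
    if (leaf == 0x0Bu) r.ebx = 2;
    return r;
}

/* Stand-in: maximum basic leaf exceeds 0BH but leaf 0BH returns EBX = 0. */
static struct cpuid_regs fake_cpu_without_leaf_0b(uint32_t leaf, uint32_t subleaf)
{
    struct cpuid_regs r = {0, 0, 0, 0};
    (void)subleaf;
    if (leaf == 0) r.eax = 0x0Du;
    return r;
}
```

The second stand-in models the case called out in the text: a maximum basic leaf above 0BH does not by itself mean that leaf 0BH is supported.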
8.6.1 Initializing Processors
Supporting Hyper-Threading Technology
The initialization process for an MP system that contains processors supporting Intel
Hyper-Threading Technology is the same as for conventional MP systems (see
Section 8.4, "Multiple-Processor (MP) Initialization"). One logical processor in the
system is selected as the BSP and other processors (or logical processors) are
designated as APs. The initialization process is identical to that described in Section 8.4.3,
"MP Initialization Protocol Algorithm for Intel Xeon Processors," and Section 8.4.4,
"MP Initialization Example."
During initialization, each logical processor is assigned an APIC ID that is stored in
the local APIC ID register for each logical processor. If two or more processors
supporting Intel Hyper-Threading Technology are present, each logical processor on
the system bus is assigned a unique ID (see Section 8.9.3, "Hierarchical ID of Logical
Processors in an MP System"). Once logical processors have APIC IDs, software
communicates with them by sending APIC IPI messages.
8.6.2 Initializing Multi-Core Processors
The initialization process for an MP system that contains multi-core Intel 64 or IA-32
processors is the same as for conventional MP systems (see Section 8.4,
"Multiple-Processor (MP) Initialization"). A logical processor in one core is selected as the BSP;
other logical processors are designated as APs.
During initialization, each logical processor is assigned an APIC ID. Once logical
processors have APIC IDs, software may communicate with them by sending APIC
IPI messages.
8.6.3 Executing Multiple Threads on an Intel 64 or IA-32 Processor Supporting Hardware Multi-Threading
Upon completing the operating system boot-up procedure, the bootstrap processor
(BSP) executes operating system code. Other logical processors are placed in the
halt state. To execute a code stream (thread) on a halted logical processor, the
operating system issues an interprocessor interrupt (IPI) addressed to the halted logical
processor. In response to the IPI, the processor wakes up and begins executing the
thread identified by the interrupt vector received as part of the IPI.
To manage execution of multiple threads on logical processors, an operating system
can use conventional symmetric multiprocessing (SMP) techniques. For example, the
operating system can use a time-slice or load balancing mechanism to periodically
interrupt each of the active logical processors. Upon interrupting a logical processor,
the operating system checks its run queue for a thread waiting to be executed and
dispatches the thread to the interrupted logical processor.
8.6.4 Handling Interrupts on an IA-32 Processor Supporting
Hardware Multi-Threading
Interrupts are handled on processors supporting Intel Hyper-Threading Technology
as they are on conventional MP systems. External interrupts are received by the I/O
APIC, which distributes them as interrupt messages to specific logical processors
(see Figure 8-3).
Logical processors can also send IPIs to other logical processors by writing to the ICR
register of their local APIC (see Section 10.6, "Issuing Interprocessor Interrupts"). This
also applies to dual-core processors.
8.7 INTEL HYPER-THREADING TECHNOLOGY ARCHITECTURE
Figure 8-4 shows a generalized view of an Intel processor supporting Intel
Hyper-Threading Technology, using the original Intel Xeon processor MP as an example.
This implementation of the Intel Hyper-Threading Technology consists of two logical
processors (each represented by a separate architectural state) which share the
processor's execution engine and the bus interface. Each logical processor also has
its own advanced programmable interrupt controller (APIC).
Figure 8-3. Local APICs and I/O APIC in MP System Supporting Intel HT Technology
[Figure: block diagram. Labels: I/O APIC, External Interrupts, System Chip Set,
Bridge, PCI, Interrupt Messages; two instances of "Intel Processor with Intel
Hyper-Threading Technology," each showing a Bus Interface and a Processor Core with
Logical Processor 0 and Logical Processor 1 (each with its own Local APIC), IPIs,
and Interrupt Messages.]
8.7.1 State of the Logical Processors
The following features are part of the architectural state of logical processors within
Intel 64 or IA-32 processors supporting Intel Hyper-Threading Technology. The
features can be subdivided into three groups:
- Duplicated for each logical processor
- Shared by logical processors in a physical processor
- Shared or duplicated, depending on the implementation
The following features are duplicated for each logical processor:
- General purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, and EBP)
- Segment registers (CS, DS, SS, ES, FS, and GS)
- EFLAGS and EIP registers. Note that the CS and EIP/RIP registers for each logical
processor point to the instruction stream for the thread being executed by the
logical processor.
- x87 FPU registers (ST0 through ST7, status word, control word, tag word, data
operand pointer, and instruction pointer)
- MMX registers (MM0 through MM7)
- XMM registers (XMM0 through XMM7) and the MXCSR register
- Control registers and system table pointer registers (GDTR, LDTR, IDTR, task
register)
- Debug registers (DR0, DR1, DR2, DR3, DR6, DR7) and the debug control MSRs
- Machine check global status (IA32_MCG_STATUS) and machine check capability
(IA32_MCG_CAP) MSRs
- Thermal clock modulation and ACPI Power management control MSRs
- Time stamp counter MSRs
- Most of the other MSR registers, including the page attribute table (PAT). See the
exceptions below.
- Local APIC registers.
- Additional general purpose registers (R8-R15), XMM registers (XMM8-XMM15),
control register, IA32_EFER on Intel 64 processors.
The following features are shared by logical processors:
- Memory type range registers (MTRRs)
Whether the following features are shared or duplicated is implementation-specific:
- IA32_MISC_ENABLE MSR (MSR address 1A0H)
- Machine check architecture (MCA) MSRs (except for the IA32_MCG_STATUS and
IA32_MCG_CAP MSRs)
- Performance monitoring control and counter MSRs
Figure 8-4. IA-32 Processor with Two Logical Processors Supporting Intel HT Technology
[Figure: block diagram. Labels: Logical Processor 0 Architectural State and Logical
Processor 1 Architectural State, each with a Local APIC; Execution Engine; Bus
Interface; System Bus.]
8.7.2 APIC Functionality
When a processor supporting Intel Hyper-Threading Technology is initialized,
each logical processor is assigned a local APIC ID (see Table 10-1). The local APIC ID
serves as an ID for the logical processor and is stored in the logical processor's APIC
ID register. If two or more processors supporting Intel Hyper-Threading Technology
are present in a dual processor (DP) or MP system, each logical processor on the
system bus is assigned a unique local APIC ID (see Section 8.9.3, "Hierarchical ID of
Logical Processors in an MP System").
Software communicates with local processors using the APIC's interprocessor
interrupt (IPI) messaging facility. Setup and programming for APICs is identical in
processors that support and do not support Intel Hyper-Threading Technology. See Chapter
10, "Advanced Programmable Interrupt Controller (APIC)," for a detailed discussion.
8.7.3 Memory Type Range Registers (MTRR)
MTRRs in a processor supporting Intel Hyper-Threading Technology are shared by
logical processors. When one logical processor updates the setting of the MTRRs,
settings are automatically shared with the other logical processors in the same
physical package.
The architectures require that all MP systems based on Intel 64 and IA-32 processors
(this includes logical processors) must use an identical MTRR memory map. This
gives software a consistent view of memory, independent of the processor on which
it is running. See Section 11.11, "Memory Type Range Registers (MTRRs)," for
information on setting up MTRRs.
8.7.4 Page Attribute Table (PAT)
Each logical processor has its own PAT MSR (IA32_PAT). However, as described in
Section 11.12, "Page Attribute Table (PAT)," the PAT MSR settings must be the same
for all processors in a system, including the logical processors.
8.7.5 Machine Check Architecture
In the Intel HT Technology context as implemented by processors based on Intel
NetBurst microarchitecture, all of the machine check architecture (MCA) MSRs
(except for the IA32_MCG_STATUS and IA32_MCG_CAP MSRs) are duplicated for
each logical processor. This permits logical processors to initialize, configure, query,
and handle machine-check exceptions simultaneously within the same physical
processor. The design is compatible with machine check exception handlers that
follow the guidelines given in Chapter 15, "Machine-Check Architecture."
The IA32_MCG_STATUS MSR is duplicated for each logical processor so that its
machine check in progress bit field (MCIP) can be used to detect recursion on the
part of MCA handlers. In addition, the MSR allows each logical processor to
determine that a machine-check exception is in progress independent of the actions of
another logical processor in the same physical package.
Because the logical processors within a physical package are tightly coupled with
respect to shared hardware resources, both logical processors are notified of
machine check errors that occur within a given physical processor. If machine-check
exceptions are enabled when a fatal error is reported, all the logical processors within
a physical package are dispatched to the machine-check exception handler. If
machine-check exceptions are disabled, the logical processors enter the shutdown
state and assert the IERR# signal.
When enabling machine-check exceptions, the MCE flag in control register CR4
should be set for each logical processor.
On Intel Atom family processors that support Intel Hyper-Threading Technology, the
MCA facilities are shared between all logical processors on the same processor core.
8.7.6 Debug Registers and Extensions
Each logical processor has its own set of debug registers (DR0, DR1, DR2, DR3, DR6,
DR7) and its own debug control MSR. These can be set to control and record debug
information for each logical processor independently. Each logical processor also has
its own last branch records (LBR) stack.
8.7.7 Performance Monitoring Counters
Performance counters and their companion control MSRs are shared between the
logical processors within a processor core for processors based on Intel NetBurst
microarchitecture. As a result, software must manage the use of these resources.
The performance counter interrupts, events, and precise event monitoring support
can be set up and allocated on a per thread (per logical processor) basis.
See Section 30.9, Performance Monitoring and Intel Hyper-Threading Technology in
Processors Based on Intel NetBurst [...]
[...] Hyper-Threading Technology Architecture, and Section 8.8, Multi-Core Architecture).
From a software programming perspective, control transfer of processor operation is
managed at the granularity of the logical processor (operating systems dispatch a
runnable task by allocating an available logical processor on the platform). To
manage the topology of shared resources in a multi-threading environment, it may
be useful for software to understand and manage resources that are shared by more
than one logical processor.
8.9.1 Hierarchical Mapping of Shared Resources
The APIC_ID value associated with each logical processor in a multi-processor
system is unique (see Section 8.6, "Detecting Hardware Multi-Threading Support and
Topology"). This 8-bit or 32-bit value can be decomposed into sub-fields, where each
sub-field corresponds to a hierarchical level of the topological mapping of hardware
resources.
The decomposition of an APIC_ID may consist of several sub-fields representing the
topology within a physical processor package. The higher-order bits of an APIC ID
may also be used by cluster vendors to represent the topology of cluster nodes of
each coherent multiprocessor system. If the processor does not support CPUID leaf
0BH, the 8-bit initial APIC ID can represent 4 levels of hierarchy:
- Cluster: Some multi-threading environments consist of multiple clusters of
multi-processor systems. The CLUSTER_ID sub-field is usually supported by
vendor firmware to distinguish different clusters. For non-clustered systems,
CLUSTER_ID is usually 0 and system topology is reduced to three levels of
hierarchy.
- Package: A multi-processor system consists of two or more sockets, each of which
mates with a physical processor package. The PACKAGE_ID sub-field
distinguishes different physical packages within a cluster.
- Core: A physical processor package consists of one or more processor cores.
The CORE_ID sub-field distinguishes processor cores in a package. For a
single-core processor, the width of this bit field is 0.
- SMT: A processor core provides one or more logical processors sharing
execution resources. The SMT_ID sub-field distinguishes logical processors in a
core. The width of this bit field is non-zero if a processor core provides more than
one logical processor.
SMT and CORE sub-fields are bit-wise contiguous in the APIC_ID field (see
Figure 8-5).
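Because the SMT and CORE sub-fields are contiguous and start at bit 0, splitting an APIC ID reduces to masks and shifts once the two widths are known. The C sketch below illustrates this for a non-clustered system, where all remaining upper bits are treated as the package ID. The struct, the function names and the idea of passing the widths as arguments are assumptions for illustration; in practice the widths must be derived at runtime from CPUID.

```c
#include <stdint.h>

struct apic_id_fields {
    uint32_t smt_id;      /* logical processor within a core */
    uint32_t core_id;     /* core within a package */
    uint32_t package_id;  /* remaining upper bits (non-clustered system) */
};

/* Mask with the low `width` bits set; assumes width < 32. */
static inline uint32_t low_bits(unsigned width)
{
    return (1u << width) - 1u;
}

/* Split an APIC ID given the SMT and core field widths in bits.
   A width of 0 yields an ID of 0 for that level. */
static inline struct apic_id_fields
split_apic_id(uint32_t apic_id, unsigned smt_width, unsigned core_width)
{
    struct apic_id_fields f;
    f.smt_id     = apic_id & low_bits(smt_width);
    f.core_id    = (apic_id >> smt_width) & low_bits(core_width);
    f.package_id = apic_id >> (smt_width + core_width);
    return f;
}
```

With a 1-bit SMT field and a 2-bit core field, APIC ID 0BH (binary 1011) splits into SMT_ID 1, CORE_ID 1 and PACKAGE_ID 1. For a single-core processor without SMT, both widths are 0 and the whole value is the package ID.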
If the processor supports CPUID leaf 0BH, the 32-bit APIC ID can represent cluster
plus several levels of topology within the physical processor package. The exact
number of hierarchical levels within a physical processor package must be
enumerated through CPUID leaf 0BH. Common processor families may employ a topology
similar to that represented by the 8-bit initial APIC ID. In general, CPUID leaf 0BH can
support topology enumeration algorithms that decompose a 32-bit APIC ID into more
than four sub-fields (see Figure 8-6).
The width of each sub-field depends on hardware and software configurations. Field
widths can be determined at runtime using the algorithm discussed below (Example
8-16 through Example 8-20).
Figure 7-6 depicts the relationships of three of the hierarchical sub-fields in a
hypothetical MP system. The value of valid APIC_IDs need not be contiguous across
package or core boundaries.
Figure 8-5. Generalized Four level Interpretation of the APIC ID
[Figure: APIC ID bit layout from bit X down to bit 0 with fields Reserved, Cluster ID,
Package ID, Core ID, and SMT ID. X = 31 if x2APIC is supported; otherwise X = 7.]
Figure 8-6. Conceptual Five-level Topology and 32-bit APIC ID Composition
[Figure: two panels. "Physical Processor Topology" shows a Package containing levels
labeled Q, R, and SMT. "32-bit APIC ID Composition" shows bits 31:0 with fields
Reserved, Cluster ID, Package ID, Q ID, R ID, and SMT ID.]
8.9.2 Hierarchical Mapping of CPUID Extended Topology Leaf
CPUID leaf 0BH provides enumeration parameters for software to identify each
hierarchy level of the processor topology in a deterministic manner. Each hierarchical
level of the topology, starting from the SMT level, is represented numerically by a
sub-leaf index within the CPUID 0BH leaf. Each level of the topology is mapped to a sub-field
in the APIC ID, following the general relationship depicted in Figure 8-6. This
mechanism allows software to query the exact number of levels within a physical
processor package and the bit-width of each sub-field of the x2APIC ID directly. For
example,
- Start from sub-leaf index 0 and increment ECX until CPUID.(EAX=0BH,
ECX=N):ECX[15:8] returns an invalid level type encoding. The number of
levels within the physical processor package is N (excluding PACKAGE). Using
Figure 8-6 as an example, CPUID.(EAX=0BH, ECX=3):ECX[15:8] will report
00H, indicating sub-leaf 03H is invalid. This is also depicted by a pseudo code
example:
Example 8-16. Number of Levels Below the Physical Processor Package
Byte type = 1;
s = 0;
While ( type ) {
EAX = 0BH; // query each sub leaf of CPUID leaf 0BH
ECX = s;
CPUID;
type = ECX[15:8]; // examine level type encoding
s ++;
}
N = ECX[7:0];
Sub-leaf index 0 (ECX=0 as input) provides enumeration parameters to extract
the SMT sub-field of the x2APIC ID. If EAX = 0BH and ECX = 0 are specified as input
when executing CPUID, CPUID.(EAX=0BH, ECX=0):EAX[4:0] reports a value (a
right-shift count) that allows software to extract the part of the x2APIC ID that
distinguishes the next higher topological entities above the SMT level. This value also
corresponds to the bit-width of the sub-field of the x2APIC ID corresponding to the
hierarchical level with sub-leaf index 0.
For each subsequent higher sub-leaf index m, CPUID.(EAX=0BH,
ECX=m):EAX[4:0] reports the right-shift count that allows software to extract
the part of the x2APIC ID that distinguishes higher-level topological entities. This means
the right-shift value of sub-leaf m corresponds to the combined width of the least
significant (m+1) sub-fields of the 32-bit x2APIC ID.
Example 8-17. BitWidth Determination of x2APIC ID Subfields
For m = 0, m < N, m ++;
{ cumulative_width[m] = CPUID.(EAX=0BH, ECX= m): EAX[4:0]; }
BitWidth[0] = cumulative_width[0];
For m = 1, m < N, m ++;
BitWidth[m] = cumulative_width[m] - cumulative_width[m-1];
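Examples 8-16 and 8-17 can be combined into one loop. The following C sketch is only an illustration: instead of executing CPUID it replays a caller-supplied table of sub-leaf results, and the names `leaf0b_result`, `query_leaf0b`, and `enumerate_levels` are hypothetical helpers, not part of any Intel-provided interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One CPUID leaf 0BH sub-leaf result, reduced to the fields used here. */
struct leaf0b_result {
    uint32_t eax;  /* EAX[4:0]  = cumulative right-shift count         */
    uint32_t ecx;  /* ECX[15:8] = level type, ECX[7:0] = sub-leaf index */
};

/* Hypothetical stand-in for CPUID with EAX=0BH, ECX=subleaf.
 * Sub-leaves past the end of the table report level type 0 (invalid). */
static struct leaf0b_result query_leaf0b(const struct leaf0b_result *table,
                                         size_t table_len, uint32_t subleaf)
{
    struct leaf0b_result invalid = { 0, subleaf & 0xFF };
    return subleaf < table_len ? table[subleaf] : invalid;
}

/* Stores the width of each sub-field in bit_width[] and returns N, the
 * number of levels below the physical package. */
static unsigned enumerate_levels(const struct leaf0b_result *table,
                                 size_t table_len,
                                 unsigned bit_width[], unsigned max_levels)
{
    unsigned n = 0;
    unsigned prev_shift = 0;

    assert(bit_width != NULL);
    while (n < max_levels) {
        struct leaf0b_result r = query_leaf0b(table, table_len, n);
        unsigned level_type = (r.ecx >> 8) & 0xFF;
        unsigned shift = r.eax & 0x1F;

        if (level_type == 0)   /* invalid level type: no more levels */
            break;
        bit_width[n] = shift - prev_shift;  /* cumulative minus previous */
        prev_shift = shift;
        n++;
    }
    return n;
}
```

With a table in which sub-leaf 0 reports level type 1 (SMT) with shift 1 and sub-leaf 1 reports level type 2 (core) with shift 4, the sketch yields N = 2 with widths 1 and 3, which matches the layout of the x2APIC IDs in Table 8-3.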
Currently, only the following encodings of hierarchical level type are defined: 0
(invalid), 1 (SMT), and 2 (core). Software must not assume any level type encoding
value to be related to any sub-leaf index, except sub-leaf 0.
Example 8-16 and Example 8-17 represent the general technique for using CPUID
leaf 0BH to enumerate processor topology of more than two levels of hierarchy inside
a physical package. Most processor families to date require only SMT and CORE
levels within a physical package. The examples in later sections focus on these
three-level topologies only.
8.9.3 Hierarchical ID of Logical Processors in an MP System
For Intel 64 and IA-32 processors, system hardware establishes an 8-bit initial APIC
ID (or 32-bit APIC ID if the processor supports CPUID leaf 0BH) that is unique for
each logical processor following power-up or RESET (see Section 8.6.1). Each logical
processor on the system is allocated an initial APIC ID. BIOS may implement features
that tell the OS to support fewer than the total number of logical processors on the
system bus. Those logical processors that are not available to applications at runtime
are halted during the OS boot process. As a result, the number of valid local APIC_IDs
that can be queried by affinitizing the current thread context (see Example 8-22) is
limited to the number of logical processors enabled at runtime by the OS boot
process.
Table 8-1 shows an example of the 8-bit APIC IDs that are initially reported for logical
processors in a system with four Intel Xeon MP processors that support Intel
Hyper-Threading Technology (a total of 8 logical processors; each physical package has
one processor core that supports Intel Hyper-Threading Technology). Of the two logical
processors within an Intel Xeon processor MP, logical processor 0 is designated the
primary logical processor and logical processor 1 the secondary logical processor.
Table 8-2 shows the initial APIC IDs for a hypothetical situation with a dual processor
system. Each physical package provides two processor cores, and each processor
core also supports Intel Hyper-Threading Technology.
Figure 8-7. Topological Relationships between Hierarchical IDs in a Hypothetical MP
Platform
Table 8-1. Initial APIC IDs for the Logical Processors in a System that has Four Intel
Xeon MP Processors Supporting Intel Hyper-Threading Technology (see NOTE 1)
Initial APIC ID Package ID Core ID SMT ID
0H 0H 0H 0H
1H 0H 0H 1H
2H 1H 0H 0H
3H 1H 0H 1H
4H 2H 0H 0H
5H 2H 0H 1H
6H 3H 0H 0H
7H 3H 0H 1H
NOTE:
1. Because information on the number of processor cores in a physical package was not available
in early single-core processors supporting Intel Hyper-Threading Technology, the core ID can be
treated as 0.
[Figure 8-7 content: Package 0 and Package 1 each contain Core 0 and Core 1, and each
core contains logical processors T0 and T1. The three levels correspond to Package ID,
Core ID, and SMT_ID.]
8.9.3.1 Hierarchical ID of Logical Processors with x2APIC ID
Table 8-3 shows an example of possible x2APIC ID assignments for a dual processor
system that supports x2APIC. Each physical package provides four processor cores,
and each processor core also supports Intel Hyper-Threading Technology. Note that
the x2APIC IDs need not be contiguous in the system.
Table 8-2. Initial APIC IDs for the Logical Processors in a System that has Two
Physical Processors Supporting Dual-Core and Intel Hyper-Threading Technology
Initial APIC ID Package ID Core ID SMT ID
0H 0H 0H 0H
1H 0H 0H 1H
2H 0H 1H 0H
3H 0H 1H 1H
4H 1H 0H 0H
5H 1H 0H 1H
6H 1H 1H 0H
7H 1H 1H 1H
Table 8-3. Example of Possible x2APIC ID Assignment in a System that has Two
Physical Processors Supporting x2APIC and Intel Hyper-Threading Technology
x2APIC ID Package ID Core ID SMT ID
0H 0H 0H 0H
1H 0H 0H 1H
2H 0H 1H 0H
3H 0H 1H 1H
4H 0H 2H 0H
5H 0H 2H 1H
6H 0H 3H 0H
7H 0H 3H 1H
10H 1H 0H 0H
11H 1H 0H 1H
12H 1H 1H 0H
13H 1H 1H 1H
14H 1H 2H 0H
8.9.4 Algorithm for Three-Level Mappings of APIC_ID
Software can gather the initial APIC_IDs for each logical processor supported by the
operating system at runtime (see footnote 7) and extract identifiers corresponding to
the three levels of sharing topology (package, core, and SMT). The three-level algorithms
below focus on a non-clustered MP system for simplicity. They do not assume APIC
IDs are contiguous or that all logical processors on the platform are enabled.
Intel supports multi-threading systems where all physical processors report identical
values in CPUID leaf 0BH, CPUID.1:EBX[23:16], CPUID.4:EAX[31:26] (see footnote 8), and
CPUID.4:EAX[25:14] (see footnote 9). The algorithms below assume the target system has
symmetry across physical package boundaries with respect to the number of logical
processors per package, number of cores per package, and cache topology within a
package.
The extraction algorithm (for three-level mappings from an APIC ID) uses the
general procedure depicted in Example 8-18, and is supplemented by more detailed
descriptions of the derivation of topology enumeration parameters for extraction bit
masks:
1. Detect hardware multi-threading support in the processor.
2. Derive a set of bit masks that can extract the sub ID of each hierarchical level of
the topology. The algorithm to derive extraction bit masks for
SMT_ID/CORE_ID/PACKAGE_ID differs depending on whether the APIC ID is 32-bit
(see step 3 below) or 8-bit (see step 4 below).
3. If the processor supports CPUID leaf 0BH, each APIC ID contains a 32-bit value.
The topology enumeration parameters needed to derive three-level extraction bit
masks are:
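Table 8-3. Example of Possible x2APIC ID Assignment in a System that has Two
Physical Processors Supporting x2APIC and Intel Hyper-Threading Technology (Continued)
x2APIC ID Package ID Core ID SMT ID
15H 1H 2H 1H
16H 1H 3H 0H
17H 1H 3H 1H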
7. As noted in Section 8.6 and Section 8.9.3, the number of logical processors supported by the OS
at runtime may be less than the total number of logical processors available in the platform
hardware.
8. The maximum number of addressable IDs for processor cores in a physical processor is obtained
by executing CPUID with EAX=4 and a valid ECX index. The ECX index starts at 0.
9. The maximum number of addressable IDs for processor cores sharing the target cache level is
obtained by executing CPUID with EAX = 4 and the ECX index corresponding to the target cache level.
a. Query the right-shift value for the SMT level of the topology using CPUID leaf
0BH with ECX = 0H as input. The number of bits to shift right on the x2APIC ID
(EAX[4:0]) can distinguish different higher-level entities above SMT (e.g.
processor cores) in the same physical package. This is also the width of the
bit mask to extract the SMT_ID.
b. Query CPUID leaf 0BH for the amount of bit shift to distinguish the next
higher-level entities (e.g. physical processor packages) in the system. This describes
an explicit three-level-topology situation for commonly available processors.
Consult Example 8-17 to adapt to situations beyond the three-level topology of a
physical processor. The width of the extraction bit mask can be used to derive
the cumulative extraction bit mask to extract the sub IDs of logical processors
(including different processor cores) in the same physical package. The
extraction bit mask that distinguishes only different processor cores can be
derived by XORing the SMT extraction bit mask with the cumulative
extraction bit mask.
c. Query the 32-bit x2APIC ID for the logical processor where the current thread
is executing.
d. Derive the extraction bit masks corresponding to SMT_ID, CORE_ID, and
PACKAGE_ID, starting from SMT_ID.
e. Apply each extraction bit mask to the 32-bit x2APIC ID to extract sub-field
IDs.
4. If the processor does not support CPUID leaf 0BH, each initial APIC ID contains
an 8-bit value. The topology enumeration parameters needed to derive extraction
bit masks are:
a. Query the size of the address space for sub IDs that can accommodate logical
processors in a physical processor package. This size parameter
(CPUID.1:EBX[23:16]) can be used to derive the width of an extraction
bit mask to enumerate the sub IDs of different logical processors in the same
physical package.
b. Query the size of the address space for sub IDs that can accommodate processor
cores in a physical processor package. This size parameter can be used to
derive the width of an extraction bit mask to enumerate the sub IDs of
processor cores in the same physical package.
c. Query the 8-bit initial APIC ID for the logical processor where the current
thread is executing.
d. Derive the extraction bit masks using the respective address sizes corresponding
to SMT_ID, CORE_ID, and PACKAGE_ID, starting from SMT_ID.
e. Apply each extraction bit mask to the 8-bit initial APIC ID to extract sub-field
IDs.
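As a worked illustration of steps 3a through 3e, the C sketch below derives the three extraction masks from two shift counts and decodes an x2APIC ID. It assumes the shift counts have already been read from CPUID leaf 0BH; the type `topo_masks` and the helper names are hypothetical. Unlike the manual's pseudo code, the helpers shift each extracted field right so that the result is a zero-based ID.

```c
#include <assert.h>
#include <stdint.h>

/* Extraction masks for a three-level (package/core/SMT) x2APIC ID. */
struct topo_masks {
    uint32_t smt_mask;
    uint32_t core_mask;
    uint32_t package_mask;
    unsigned smt_shift;    /* number of bits below the CORE sub-field    */
    unsigned core_shift;   /* number of bits below the PACKAGE sub-field */
};

/* smt_shift:  EAX[4:0] of the SMT-type sub-leaf (sub-leaf 0).
 * core_shift: EAX[4:0] of the CORE-type sub-leaf. */
static struct topo_masks derive_masks(unsigned smt_shift, unsigned core_shift)
{
    struct topo_masks m;
    uint32_t core_plus_smt;

    /* EAX[4:0] is a 5-bit field, so both shift counts are below 32. */
    assert(smt_shift <= core_shift && core_shift < 32);

    core_plus_smt  = (UINT32_C(1) << core_shift) - 1;  /* cumulative mask */
    m.smt_mask     = (UINT32_C(1) << smt_shift) - 1;
    m.core_mask    = core_plus_smt ^ m.smt_mask;       /* XOR out SMT bits */
    m.package_mask = ~core_plus_smt;
    m.smt_shift    = smt_shift;
    m.core_shift   = core_shift;
    return m;
}

static uint32_t smt_id_of(const struct topo_masks *m, uint32_t x2apic_id)
{
    return x2apic_id & m->smt_mask;
}

static uint32_t core_id_of(const struct topo_masks *m, uint32_t x2apic_id)
{
    return (x2apic_id & m->core_mask) >> m->smt_shift;
}

static uint32_t package_id_of(const struct topo_masks *m, uint32_t x2apic_id)
{
    return (x2apic_id & m->package_mask) >> m->core_shift;
}
```

For the system in Table 8-3 the shift counts would be 1 (SMT) and 4 (core). With those inputs, x2APIC ID 15H decodes to package 1, core 2, SMT 1, in agreement with the table.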
Example 8-18. Support Routines for Detecting Hardware Multi-Threading and Identifying the
Relationships Between Package, Core and Logical Processors
1. Detect support for Hardware Multi-Threading Support in a processor.
// Returns a non-zero value if CPUID reports the presence of hardware multi-threading
// support in the physical package where the current logical processor is located.
// This does not guarantee BIOS or OS will enable all logical processors in the physical
// package and make them available to applications.
// Returns zero if hardware multi-threading is not present.
#define HWMT_BIT 0x10000000
unsigned int HWMTSupported(void)
{
// ensure cpuid instruction is supported
execute cpuid with eax = 0 to get vendor string
execute cpuid with eax = 1 to get feature flag and signature
// Check to see if this a Genuine Intel Processor
if (vendor string EQ GenuineIntel) {
return (feature_flag_edx & HWMT_BIT); // bit 28
}
return 0;
}
Example 8-19. Support Routines for Identifying Package, Core and Logical Processors from
32-bit x2APIC ID
a. Derive the extraction bitmask for logical processors in a processor core and
associated mask offset for different cores.
int DeriveSMT_Mask_Offsets (void)
{
if (!HWMTSupported()) return -1;
execute cpuid with eax = 11, ECX = 0;
If (returned level type encoding in ECX[15:8] does not match SMT) return -1;
Mask_SMT_shift = EAX[4:0]; // # bits shift right of APIC ID to distinguish different cores
SMT_MASK = ~( (-1) << Mask_SMT_shift); // shift left to derive extraction bitmask for SMT_ID
return 0;
}
b. Derive the extraction bitmask for processor cores in a physical processor package
and associated mask offset for different packages.
int DeriveCore_Mask_Offsets (void)
{
if (!HWMTSupported()) return -1;
execute cpuid with eax = 11, ECX = 0;
while( ECX[15:8] ) { // level type encoding is valid
If (returned level type encoding in ECX[15:8] matches CORE) {
Mask_Core_shift = EAX[4:0]; // needed to distinguish different physical packages
COREPlusSMT_MASK = ~( (-1) << Mask_Core_shift);
CORE_MASK = COREPlusSMT_MASK ^ SMT_MASK;
PACKAGE_MASK = (-1) << Mask_Core_shift;
return 0;
}
ECX ++;
execute cpuid with eax = 11;
}
return -1;
}
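c. Query the x2APIC ID of a logical processor.
// Returns the 32-bit x2APIC ID of the logical processor running the code.
// Software can use OS services to affinitize the current thread to each logical processor
// available under the OS to gather the x2APIC IDs for each logical processor.
unsigned Getx2APIC_ID (void)
{
unsigned reg_edx = 0;
execute cpuid with eax = 11, ECX = 0
store returned value of edx
return (unsigned) (reg_edx) ;
}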
Example 8-20. Support Routines for Identifying Package, Core and Logical Processors from 8-
bit Initial APIC ID
a. Find the size of address space for logical processors in a physical processor
package.
#define NUM_LOGICAL_BITS 0x00FF0000
// Use the mask above and CPUID.1.EBX[23:16] to obtain the max number of addressable IDs
// for logical processors in a physical package,
//Returns the size of address space of logical processors in a physical processor package;
// Software should not assume the value to be a power of 2.
unsigned char MaxLPIDsPerPackage(void)
{
if (!HWMTSupported()) return 1;
execute cpuid with eax = 1
store returned value of ebx
return (unsigned char) ((reg_ebx & NUM_LOGICAL_BITS) >> 16);
}
b. Find the size of address space for processor cores in a physical processor package.
// Returns the max number of addressable IDs for processor cores in a physical processor package;
// Software should not assume cpuid reports this value to be a power of 2.
unsigned MaxCoreIDsPerPackage(void)
{
if (!HWMTSupported()) return (unsigned char) 1;
if cpuid supports leaf number 4
{ // we can retrieve multi-core topology info using leaf 4
execute cpuid with eax = 4, ecx = 0
store returned value of eax
return (unsigned) ((reg_eax >> 26) +1);
}
else // must be a single-core processor
return 1;
}
c. Query the initial APIC ID of a logical processor.
#define INITIAL_APIC_ID_BITS 0xFF000000 // CPUID.1.EBX[31:24] initial APIC ID
// Returns the 8-bit unique initial APIC ID for the processor running the code.
// Software can use OS services to affinitize the current thread to each logical processor
// available under the OS to gather the initial APIC_IDs for each logical processor.
unsigned GetInitAPIC_ID (void)
{
unsigned int reg_ebx = 0;
execute cpuid with eax = 1
store returned value of ebx
return (unsigned) ((reg_ebx & INITIAL_APIC_ID_BITS) >> 24);
}
d. Find the width of an extraction bitmask from the maximum count of the bit-field
(address size).
// Returns the mask bit width of a bit field from the maximum count that bit field can represent.
// This algorithm does not assume address size to have a value equal to power of 2.
// Address size for SMT_ID can be calculated from MaxLPIDsPerPackage()/MaxCoreIDsPerPackage()
// Then use the routine below to derive the corresponding width of SMT extraction bitmask
// Address size for CORE_ID is MaxCoreIDsPerPackage(),
// Derive the bitwidth for CORE extraction mask similarly
unsigned FindMaskWidth(unsigned Max_Count)
{
unsigned int mask_width, cnt = Max_Count;
__asm {
mov eax, cnt
mov ecx, 0
mov mask_width, ecx
dec eax
bsr cx, ax
jz next
inc cx
mov mask_width, ecx
next:
mov eax, mask_width
}
return mask_width;
}
e. Extract a sub ID from an 8-bit full ID, using address size of the sub ID and shift
count.
// The routine below can extract SMT_ID, CORE_ID, and PACKAGE_ID respectively from the
// initial APIC_ID.
// To extract SMT_ID, MaxSubIDValue is set to the address size of SMT_ID, Shift_Count = 0
// To extract CORE_ID, MaxSubIDValue is the address size of CORE_ID, Shift_Count is the width
// of the SMT extraction bitmask.
// Returns the value of the sub ID in place (not shifted right); this is not a zero-based value
unsigned char GetSubID(unsigned char Full_ID, unsigned char MaxSubIDValue, unsigned char
Shift_Count)
{
unsigned MaskWidth = FindMaskWidth(MaxSubIDValue);
unsigned char MaskBits = ((unsigned char) (0xff << Shift_Count)) ^
((unsigned char) (0xff << (Shift_Count + MaskWidth)));
unsigned char SubID = Full_ID & MaskBits;
return SubID;
}
Software must not assume local APIC_ID values in an MP system are consecutive.
Non-consecutive local APIC_IDs may be the result of hardware configurations or
debug features implemented in the BIOS or OS.
An identifier for each hierarchical level can be extracted from an 8-bit APIC_ID using
the support routines illustrated in Example 8-20. The shift value and width needed
to construct the appropriate bit mask for each level must be determined
dynamically at runtime.
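For compilers without inline assembly, the mask-width and sub-ID routines of Example 8-20 can be approximated in portable C. The sketch below is an assumption-laden illustration, not the reference implementation: `find_mask_width` and `get_sub_id` are hypothetical names, and the input counts are assumed to fit in the 8-bit ranges that CPUID leaf 1 and leaf 4 report.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest width w such that 2^w >= max_count, i.e. the number of bits
 * needed to address max_count IDs. max_count need not be a power of 2. */
static unsigned find_mask_width(unsigned max_count)
{
    unsigned width = 0;

    assert(max_count >= 1 && max_count <= 256);  /* 8-bit ID space */
    while ((1u << width) < max_count)
        width++;
    return width;
}

/* Returns the sub ID in place (not shifted right), like GetSubID. */
static uint8_t get_sub_id(uint8_t full_id, unsigned max_sub_id_count,
                          unsigned shift_count)
{
    unsigned width = find_mask_width(max_sub_id_count);
    unsigned mask = (0xFFu << shift_count) ^ (0xFFu << (shift_count + width));

    return (uint8_t)(full_id & mask & 0xFFu);
}
```

For the system in Table 8-2, the address size for logical processors per package would be 4 and for cores per package 2, so the SMT address size is 4/2 = 2 and the SMT mask is one bit wide. Initial APIC ID 6H then yields an SMT sub ID of 0 and an in-place CORE sub ID of 2, which is core 1 after shifting right by the SMT width.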
8.9.5 Identifying Topological Relationships in a MP System
To detect the number of physical packages, processor cores, or other topological
relationships in an MP system, the following procedures are recommended:
Extract the three-level identifiers from the APIC ID of each logical processor
enabled by system software. The sequence is as follows (see the pseudo code
shown in Example 8-21 and the support routines shown in Example 8-18):
The extraction starts from the right-most bit field, corresponding to
SMT_ID, the innermost hierarchy in a three-level topology (see Figure
8-7). For the right-most bit field, the shift value of the working mask is
zero. The width of the bit field is determined dynamically using the
maximum number of logical processors per core, which can be derived
from information provided by CPUID.
To extract the next bit field, the shift value of the working mask is
determined from the width of the bit mask of the previous step. The width
of the bit field is determined dynamically using the maximum number of
cores per package.
To extract the remaining bit field, the shift value of the working mask is
determined from the maximum number of logical processors per package.
The remaining bits in the APIC ID (excluding those bits already
extracted in the two previous steps) are extracted as the third identifier.
This applies to a non-clustered MP system, or if there is no need to
distinguish between PACKAGE_ID and CLUSTER_ID.
If there is a need to distinguish between PACKAGE_ID and CLUSTER_ID,
PACKAGE_ID can be extracted using an algorithm similar to the
extraction of CORE_ID, assuming the number of physical packages in
each node of a clustered system is symmetric.
Assemble the three-level identifiers of SMT_ID, CORE_ID, and PACKAGE_ID into
arrays for each enabled logical processor. This is shown in Example 8-22a.
To detect the number of physical packages: use PACKAGE_ID to identify those
logical processors that reside in the same physical package. This is shown in
Example 8-22b. This example also depicts a technique to construct a mask to
represent the logical processors that reside in the same package.
To detect the number of processor cores: use CORE_ID to identify those logical
processors that reside in the same core. This is shown in Example 8-22c. This
example also depicts a technique to construct a mask to represent the logical
processors that reside in the same core.
In Example 8-21, the numerical ID value can be obtained from the value extracted
with the mask by shifting it right by the shift count. The algorithms below do not shift
the value. The assumption is that the SubID values can be compared for equivalence
without the need to shift.
Example 8-21. Pseudo Code Depicting Three-level Extraction Algorithm
For Each local_APIC_ID{
// Calculate SMT_MASK, the bit mask pattern to extract SMT_ID.
// If CPUID leaf 0BH is supported, SMT_MASK is determined using topology enumeration
// parameters from CPUID leaf 0BH (Example 8-19);
// otherwise, SMT_MASK is determined using CPUID leaf 01H and leaf 04H (Example 8-20).
// This algorithm assumes there is symmetry across core boundary, i.e. each core within a
// package has the same number of logical processors
// SMT_ID always starts from bit 0, corresponding to the right-most bit-field
SMT_ID = APIC_ID & SMT_MASK;
// Extract CORE_ID:
// CORE_MASK is determined in Example 8-19 or Example 8-20
CORE_ID = (APIC_ID & CORE_MASK) ;
// Extract PACKAGE_ID:
// Assume single cluster.
// Shift out the mask width for maximum logical processors per package
// PACKAGE_MASK is determined in Example 8-19 or Example 8-20
PACKAGE_ID = (APIC_ID & PACKAGE_MASK) ;
}
Example 8-22. Compute the Number of Packages, Cores, and Processor Relationships in a MP
System
a) Assemble lists of PACKAGE_ID, CORE_ID, and SMT_ID of each enabled logical processor
// The BIOS and/or OS may limit the number of logical processors available to applications
// after system boot. The algorithm below will compute topology for the processors visible
// to the thread that is computing it.
// Extract the 3-levels of IDs on every processor
// SystemAffinity is a bitmask of all the processors started by the OS. Use OS specific APIs to
// obtain it.
// ThreadAffinityMask is used to affinitize the topology enumeration thread to each processor
// using OS specific APIs.
// Allocate per processor arrays to store the Package_ID, Core_ID and SMT_ID for every started
// processor.
ThreadAffinityMask = 1;
ProcessorNum = 0;
while (ThreadAffinityMask != 0 && ThreadAffinityMask <= SystemAffinity) {
// Check to make sure we can utilize this processor first.
if (ThreadAffinityMask & SystemAffinity){
Set thread to run on the processor specified in ThreadAffinityMask
Wait if necessary and ensure thread is running on specified processor
APIC_ID = GetAPIC_ID(); // 32-bit ID in Example 8-19 or 8-bit ID in Example 8-20
// Extract the Package_ID, Core_ID and SMT_ID as explained in the three-level extraction
// algorithm of Example 8-21
PackageID[ProcessorNum] = PACKAGE_ID;
CoreID[ProcessorNum] = CORE_ID;
SmtID[ProcessorNum] = SMT_ID;
ProcessorNum++;
}
ThreadAffinityMask <<= 1;
}
NumStartedLPs = ProcessorNum;
b) Use the list of PACKAGE_ID to count the number of physical packages in an MP system and
construct, for each package, a multi-bit mask corresponding to those logical processors residing in
the same package.
// Compute the number of packages by counting the number of processors
// with unique PACKAGE_IDs in the PackageID array.
// Compute the mask of processors in each package.
// PackageIDBucket is an array of unique PACKAGE_ID values. Allocate
// NumStartedLPs entries for this array.
// PackageProcessorMask is a corresponding array of the bit mask of processors belonging to
// the same package; these are processors with the same PACKAGE_ID.
// The algorithm below assumes there is symmetry across package boundary if more than
// one socket is populated in an MP system.
// Bucket Package IDs and compute processor mask for every package.
PackageNum = 1;
PackageIDBucket[0] = PackageID[0];
ProcessorMask = 1;
PackageProcessorMask[0] = ProcessorMask;
For (ProcessorNum = 1; ProcessorNum < NumStartedLPs; ProcessorNum++) {
ProcessorMask <<= 1;
For (i=0; i < PackageNum; i++) {
// we may be comparing bit-fields of logical processors residing in different
// packages, the code below assume package symmetry
If (PackageID[ProcessorNum] == PackageIDBucket[i]) {
PackageProcessorMask[i] |= ProcessorMask;
Break; // found in existing bucket, skip to next iteration
}
}
if (i ==PackageNum) {
//PACKAGE_ID did not match any bucket, start new bucket
PackageIDBucket[i] = PackageID[ProcessorNum];
PackageProcessorMask[i] = ProcessorMask;
PackageNum++;
}
}
// PackageNum has the number of Packages started in OS
// PackageProcessorMask[] array has the processor set of each package
c) Use the list of CORE_ID to count the number of cores in an MP system and construct, for each
core, a multi-bit mask corresponding to those logical processors residing in the same core.
// Processors in the same core can be determined by bucketing the processors with the same
// PACKAGE_ID and CORE_ID. Note that the code below can bitwise OR the values of PACKAGE_ID
// and CORE_ID because they have not been shifted right.
// The algorithm below assumes there is symmetry across package boundary if more than one socket
// is populated in an MP system.
// Bucket PACKAGE and CORE IDs and compute processor mask for every core
CoreNum = 1;
CoreIDBucket[0] = PackageID[0] | CoreID[0];
ProcessorMask = 1;
CoreProcessorMask[0] = ProcessorMask;
For (ProcessorNum = 1; ProcessorNum < NumStartedLPs; ProcessorNum++) {
ProcessorMask <<= 1;
For (i=0; i < CoreNum; i++) {
// we may be comparing bit-fields of logical processors residing in different
// packages, the code below assume package symmetry
If ((PackageID[ProcessorNum] | CoreID[ProcessorNum]) == CoreIDBucket[i]) {
CoreProcessorMask[i] |= ProcessorMask;
Break; // found in existing bucket, skip to next iteration
}
}
if (i == CoreNum) {
//Did not match any bucket, start new bucket
CoreIDBucket[i] = PackageID[ProcessorNum] | CoreID[ProcessorNum];
CoreProcessorMask[i] = ProcessorMask;
CoreNum++;
}
}
// CoreNum has the number of cores started in the OS
// CoreProcessorMask[] array has the processor set of each core
Other processor relationships, such as the processor mask of sibling cores, can be
computed from set operations on PackageProcessorMask[] and CoreProcessorMask[].
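The bucketing technique and the set operation mentioned above can be sketched in C as follows. This is a minimal sketch that assumes at most 32 started logical processors and takes the unshifted PACKAGE_ID and CORE_ID arrays as inputs; `topo_summary`, `bucket`, `summarize`, and `sibling_core_mask` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LPS 32  /* assumption: one bit per logical processor in a 32-bit mask */

struct topo_summary {
    unsigned num_packages;
    unsigned num_cores;
    uint32_t package_key[MAX_LPS];      /* unique PACKAGE_ID values          */
    uint32_t package_lp_mask[MAX_LPS];  /* logical processors per package    */
    uint32_t core_key[MAX_LPS];         /* unique PACKAGE_ID | CORE_ID values */
    uint32_t core_lp_mask[MAX_LPS];     /* logical processors per core       */
};

/* Adds lp_bit to the bucket matching key, or opens a new bucket.
 * Returns the updated bucket count. */
static unsigned bucket(uint32_t key, uint32_t lp_bit,
                       uint32_t keys[], uint32_t masks[], unsigned count)
{
    for (unsigned i = 0; i < count; i++) {
        if (keys[i] == key) {
            masks[i] |= lp_bit;
            return count;
        }
    }
    keys[count] = key;
    masks[count] = lp_bit;
    return count + 1;
}

/* package_id[] and core_id[] hold unshifted sub IDs, so they can be ORed. */
static void summarize(const uint32_t package_id[], const uint32_t core_id[],
                      unsigned num_lps, struct topo_summary *out)
{
    assert(num_lps <= MAX_LPS);
    out->num_packages = 0;
    out->num_cores = 0;
    for (unsigned lp = 0; lp < num_lps; lp++) {
        uint32_t bit = UINT32_C(1) << lp;

        out->num_packages = bucket(package_id[lp], bit, out->package_key,
                                   out->package_lp_mask, out->num_packages);
        out->num_cores = bucket(package_id[lp] | core_id[lp], bit,
                                out->core_key, out->core_lp_mask,
                                out->num_cores);
    }
}

/* Logical processors that share a package with the given core but
 * belong to other cores: package mask AND NOT core mask. */
static uint32_t sibling_core_mask(const struct topo_summary *s,
                                  unsigned core_index)
{
    for (unsigned p = 0; p < s->num_packages; p++) {
        if (s->package_lp_mask[p] & s->core_lp_mask[core_index])
            return s->package_lp_mask[p] & ~s->core_lp_mask[core_index];
    }
    return 0;
}
```

Applied to the eight initial APIC IDs of Table 8-2 (unshifted package IDs 0 or 4, unshifted core IDs 0 or 2), the sketch counts two packages and four cores, and the sibling mask of the first core is 0CH.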
The algorithm shown above can be adapted to work with earlier generations of
single-core IA-32 processors that support Intel Hyper-Threading Technology and in
situations where the deterministic cache parameter leaf is not supported (provided
CPUID supports initial APIC ID). A reference code example is available (see Intel 64
Architecture Processor Topology Enumeration).
8.10 MANAGEMENT OF IDLE AND BLOCKED CONDITIONS
When a logical processor in an MP system (including multi-core processors or
processors supporting Intel Hyper-Threading Technology) is idle (no work to do) or blocked
(on a lock or semaphore), additional management of the core execution engine
resource can be accomplished by using the HLT (halt), PAUSE, or the
MONITOR/MWAIT instructions.
8.10.1 HLT Instruction
The HLT instruction stops the execution of the logical processor on which it is
executed and places it in a halted state until further notice (see the description of the
HLT instruction in Chapter 3 of the Intel 64 and IA-32 Architectures Software
Developer's Manual, Volume 2A). When a logical processor is halted, active logical
processors continue to have full access to the shared resources within the physical
package. Shared resources that were being used by the halted logical processor
become available to active logical processors, allowing them to execute at greater
efficiency. When the halted logical processor resumes execution, shared resources
are again shared among all active logical processors. (See Section 8.10.6.3, "Halt
Idle Logical Processors," for more information about using the HLT instruction with
processors supporting Intel Hyper-Threading Technology.)
8.10.2 PAUSE Instruction
The PAUSE instruction can improve the performance of processors supporting Intel
Hyper-Threading Technology when executing spin-wait loops and other routines
where one thread is accessing a shared lock or semaphore in a tight polling loop.
When executing a spin-wait loop, the processor can suffer a severe performance
penalty when exiting the loop because it detects a possible memory order violation
and flushes the core processor's pipeline. The PAUSE instruction provides a hint to
the processor that the code sequence is a spin-wait loop. The processor uses this hint
to avoid the memory order violation and prevent the pipeline flush. In addition, the
PAUSE instruction de-pipelines the spin-wait loop to prevent it from consuming
execution resources excessively and consuming power needlessly. (See Section
8.10.6.1, "Use the PAUSE Instruction in Spin-Wait Loops," for more information
about using the PAUSE instruction with IA-32 processors supporting Intel
Hyper-Threading Technology.)
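A spin-wait loop that issues PAUSE on each polling iteration might look like the following C11 sketch. It is an illustration under stated assumptions: `pause_spinlock` and its functions are hypothetical names, and `__builtin_ia32_pause()` is the GCC/Clang builtin that emits PAUSE on x86 targets (on other targets the macro expands to nothing).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define cpu_relax() __builtin_ia32_pause()  /* emits the PAUSE instruction */
#else
#define cpu_relax() ((void)0)               /* no PAUSE on this target */
#endif

typedef struct {
    atomic_bool locked;
} pause_spinlock;

/* Returns true if the lock was free and is now held by the caller. */
static bool pause_spin_trylock(pause_spinlock *l)
{
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

static void pause_spin_lock(pause_spinlock *l)
{
    assert(l != NULL);
    while (!pause_spin_trylock(l)) {
        /* Poll with plain loads and PAUSE until the lock looks free,
         * then retry the atomic exchange. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            cpu_relax();
    }
}

static void pause_spin_unlock(pause_spinlock *l)
{
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

Polling with a read-only load inside the inner loop, rather than repeating the atomic exchange, is a common design choice that keeps the waiting logical processor from generating write traffic while it spins.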
8.10.3 Detecting Support MONITOR/MWAIT Instruction
Streaming SIMD Extensions 3 introduced two instructions (MONITOR and MWAIT) to
help multithreaded software improve thread synchronization. In the initial
implementation, MONITOR and MWAIT are available to software at ring 0. The instructions
are conditionally available at levels greater than 0. Use the following steps to detect
the availability of MONITOR and MWAIT:
Use CPUID to query the MONITOR bit (CPUID.1:ECX[3] = 1).
If CPUID indicates support, execute MONITOR inside a TRY/EXCEPT exception
handler and trap for an exception. If an exception occurs, MONITOR and MWAIT
are not supported at a privilege level greater than 0. See Example 8-23.
Example 8-23. Verifying MONITOR/MWAIT Support
boolean MONITOR_MWAIT_works = TRUE;
try {
_asm {
xor ecx, ecx
xor edx, edx
mov eax, MemArea
monitor
}
// Use monitor
} except (UNWIND) {
// if we get here, MONITOR/MWAIT is not supported
MONITOR_MWAIT_works = FALSE;
}
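The two detection steps can be folded into a single decision, as in the C sketch below. It assumes the caller has already obtained the ECX value returned by CPUID leaf 1 and has run a probe such as Example 8-23; the enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_MONITOR (UINT32_C(1) << 3)  /* CPUID.1:ECX[3] */

enum monitor_support {
    MONITOR_ABSENT,             /* CPUID does not report MONITOR/MWAIT      */
    MONITOR_NOT_AT_THIS_LEVEL,  /* reported, but the probe raised an exception */
    MONITOR_AVAILABLE           /* reported, and the probe executed cleanly  */
};

/* Step 1: check the MONITOR feature bit. */
static bool cpuid_reports_monitor(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & CPUID1_ECX_MONITOR) != 0;
}

/* Step 2: combine the feature bit with the outcome of executing MONITOR
 * inside an exception handler at the current privilege level. */
static enum monitor_support classify_monitor_support(uint32_t cpuid1_ecx,
                                                     bool probe_raised_exception)
{
    if (!cpuid_reports_monitor(cpuid1_ecx))
        return MONITOR_ABSENT;
    return probe_raised_exception ? MONITOR_NOT_AT_THIS_LEVEL
                                  : MONITOR_AVAILABLE;
}
```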
8.10.4 MONITOR/MWAIT Instruction
Operating systems usually implement idle loops to handle thread synchronization. In
a typical idle-loop scenario, there could be several busy loops and they would use a
set of memory locations. An impacted processor waits in a loop and polls a memory
location to determine if there is available work to execute. The posting of work is
typically a write to memory (the work-queue of the waiting processor). The time for
initiating a work request and getting it scheduled is on the order of a few bus cycles.
From a resource sharing perspective (logical processors sharing execution
resources), use of the HLT instruction in an OS idle loop is desirable but has
implications. Executing the HLT instruction on an idle logical processor puts the targeted
processor in a non-execution state. This requires another processor (when posting
work for the halted logical processor) to wake up the halted processor using an
inter-processor interrupt. The posting and servicing of such an interrupt introduces a delay
in the servicing of new work requests.
In a shared memory configuration, exits from busy loops usually occur because of a
state change applicable to a specific memory location; such a change tends to be
triggered by writes to the memory location by another agent (typically a processor).
MONITOR/MWAIT complement the use of HLT and PAUSE to allow for efficient
partitioning and un-partitioning of shared resources among logical processors sharing
physical resources. MONITOR sets up an effective address range that is monitored for
write-to-memory activities; MWAIT places the processor in an optimized state (this
may vary between different implementations) until a write to the monitored address
range occurs.
In the initial implementation of MONITOR and MWAIT, they are available at CPL = 0
only.
Both instructions rely on the state of the processor's monitor hardware. The monitor
hardware can be either armed (by executing the MONITOR instruction) or triggered
(due to a variety of events, including a store to the monitored memory region). If
the monitor hardware is in a triggered state when MWAIT executes, MWAIT behaves
as a NOP and execution continues at the next instruction in the execution stream.
The state of monitor hardware is not architecturally visible except through the
behavior of MWAIT.
Multiple events other than a write to the triggering address range can cause a
processor that executed MWAIT to wake up. These include events that would lead to
voluntary or involuntary context switches, such as:
• External interrupts, including NMI, SMI, INIT, BINIT, MCERR, A20M#
• Faults, Aborts (including Machine Check)
• Architectural TLB invalidations including writes to CR0, CR3, CR4 and certain MSR
  writes; execution of LMSW (occurring prior to issuing MWAIT but after setting the
  monitor)
• Voluntary transitions due to fast system call and far calls (occurring prior to
  issuing MWAIT but after setting the monitor)
Power management related events (such as Thermal Monitor 2 or chipset driven
STPCLK# assertion) will not cause the monitor event pending flag to be cleared.
Faults will not cause the monitor event pending flag to be cleared.
Software should not allow for voluntary context switches in between
MONITOR/MWAIT in the instruction flow. Note that execution of MWAIT does not
re-arm the monitor hardware. This means that MONITOR/MWAIT need to be executed in
a loop. Also note that exits from the MWAIT state could be due to a condition other
than a write to the triggering address; software should explicitly check the triggering
data location to determine if the write occurred. Software should also check the value
of the triggering address following the execution of the MONITOR instruction (and prior
to the execution of the MWAIT instruction). This check is to identify any writes to the
triggering address that occurred during the course of MONITOR execution.
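The loop structure described above (arm, re-check, wait, re-check) can be sketched in C.
Because MONITOR and MWAIT need CPL 0 in the initial implementation, the sketch takes the
two instructions as function pointers. All names here are hypothetical, and the demo_*
functions only simulate the hardware (one spurious wakeup, then a store) so that the
control flow can be exercised in an ordinary program.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hooks standing in for the MONITOR and MWAIT instructions.
   A kernel would wrap the real instructions here. */
typedef void (*arm_fn)(const volatile void *addr);
typedef void (*wait_fn)(void);

/* Waits until *flag becomes non-zero; returns the number of MWAIT calls made. */
static unsigned wait_for_write(volatile uint32_t *flag, arm_fn arm, wait_fn wait)
{
    unsigned waits = 0;

    assert(flag != 0 && arm != 0 && wait != 0);
    while (*flag == 0) {      /* explicit check: a wakeup may have another cause */
        arm(flag);            /* MWAIT does not re-arm, so arm on every pass */
        if (*flag != 0)       /* catch a store that raced with MONITOR */
            break;
        wait();
        waits++;
    }
    return waits;
}

/* Simulation only: the first wakeup is spurious, the second follows a store. */
static volatile uint32_t demo_flag;
static unsigned demo_wakes;

static void demo_arm(const volatile void *addr) { (void)addr; }
static void demo_wait(void) { if (++demo_wakes == 2) demo_flag = 1; }

static unsigned demo_run(void)
{
    demo_flag = 0;
    demo_wakes = 0;
    return wait_for_write(&demo_flag, demo_arm, demo_wait);
}
```

In the simulated run the loop waits twice: the first wakeup finds the flag still clear
and re-arms, the second finds the store and exits.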
The address range provided to the MONITOR instruction must be of write-back
caching type. Only write-back memory type stores to the monitored address range
will trigger the monitor hardware. If the address range is not in memory of
write-back type, the address monitor hardware may not be set up properly or the monitor
hardware may not be armed. Software is also responsible for ensuring that:
• Writes that are not intended to cause the exit of a busy loop do not write to a
  location within the address region being monitored by the monitor hardware.
• Writes intended to cause the exit of a busy loop are written to locations within the
  monitored address region.
Not doing so will lead to more false wakeups (an exit from the MWAIT state not due
to a write to the intended data location). These have negative performance
implications. It might be necessary for software to use padding to prevent false wakeups.
CPUID provides a mechanism for determining the size of data locations for monitoring
as well as a mechanism for determining the size of the pad.
8.10.5 Monitor/Mwait Address Range Determination
To use the MONITOR/MWAIT instructions, software should know the length of the
region monitored by the MONITOR/MWAIT instructions and the coherence line size
for cache-snoop traffic in a multiprocessor system. This information can be
queried using the CPUID monitor leaf function (EAX = 05H). You will need the
smallest and largest monitor line size:
• To avoid missed wake-ups: make sure that the data structure used to monitor
  writes fits within the smallest monitor line size. Otherwise, the processor may
  not wake up after a write intended to trigger an exit from MWAIT.
• To avoid false wake-ups: use the largest monitor line size to pad the data
  structure used to monitor writes. Software must make sure that beyond the data
  structure, no unrelated data variable exists in the triggering area for MWAIT. A
  pad may be needed to avoid this situation.
These two values bear no relationship to cache line size in the system and
software should not make any assumptions to that effect. Within a single-cluster system,
the two parameters should default to be the same (the size of the monitor triggering
area is the same as the system coherence line size).
Based on the monitor line sizes returned by CPUID, the OS should dynamically
allocate structures with appropriate padding. If static data structures must be used
by an OS, attempt to adapt the data structure and use a dynamically allocated data
buffer for thread synchronization. When the latter technique is not possible, consider
not using MONITOR/MWAIT when using static data structures.
To set up the data structure correctly for MONITOR/MWAIT on multi-clustered
systems, interaction between processors, chipsets, and the BIOS is required (system
coherence line size may depend on the chipset used in the system; the size could be
different from the processor's monitor triggering area). The BIOS is responsible for
setting the correct value for system coherence line size using the
IA32_MONITOR_FILTER_LINE_SIZE MSR. Depending on the relative magnitude of
the size of the monitor triggering area versus the value written into the
IA32_MONITOR_FILTER_LINE_SIZE MSR, the smaller of the parameters will be
reported as the Smallest Monitor Line Size. The larger of the parameters will be
reported as the Largest Monitor Line Size.
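The sizing rules in this section can be sketched as a few C helpers. The sketch assumes
the CPUID leaf 05H encoding in which EAX[15:0] holds the smallest and EBX[15:0] the
largest monitor line size in bytes. The helper names are hypothetical and the register
values are passed in, so no privileged access is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Line sizes in bytes, taken from CPUID leaf 05H register values
   (assumed encoding: EAX[15:0] = smallest, EBX[15:0] = largest). */
static uint32_t smallest_monitor_line(uint32_t leaf5_eax) { return leaf5_eax & 0xFFFFu; }
static uint32_t largest_monitor_line(uint32_t leaf5_ebx)  { return leaf5_ebx & 0xFFFFu; }

/* Missed wake-ups: the monitored data must fit within the smallest line size. */
static int fits_in_trigger_area(size_t data_size, uint32_t smallest)
{
    return smallest != 0 && data_size <= smallest;
}

/* False wake-ups: pad the structure up to a multiple of the largest line size. */
static size_t padded_size(size_t data_size, uint32_t largest)
{
    assert(largest != 0);
    return (data_size + largest - 1) / largest * largest;
}
```

An OS would typically also align the allocation to the largest line size so that the pad
covers the whole triggering area; how that allocation is made is outside this sketch.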
8.10.6 Required Operating System Support
This section describes changes that must be made to an operating system to run on
processors supporting Intel Hyper-Threading Technology. It also describes
optimizations that can help an operating system make more efficient use of the logical
processors sharing execution resources. The required changes and suggested
optimizations are representative of the types of modifications that appear in Windows*
XP and Linux* kernel 2.4.0 operating systems for Intel processors supporting Intel
Hyper-Threading Technology. Additional optimizations for processors supporting
Intel Hyper-Threading Technology are described in the Intel 64 and IA-32
Architectures Optimization Reference Manual.
8.10.6.1 Use the PAUSE Instruction in Spin-Wait Loops
Intel recommends that a PAUSE instruction be placed in all spin-wait loops that run
on Intel processors supporting Intel Hyper-Threading Technology and multi-core
processors.
Software routines that use spin-wait loops include multiprocessor synchronization
primitives (spin-locks, semaphores, and mutex variables) and idle loops. Such
routines keep the processor core busy executing a load-compare-branch loop while a
thread waits for a resource to become available. Including a PAUSE instruction in such
a loop greatly improves efficiency (see Section 8.10.2, "PAUSE Instruction"). The
following routine gives an example of a spin-wait loop that uses a PAUSE instruction:
Spin_Lock:
    CMP lockvar, 0      ;Check if lock is free
    JE Get_Lock
    PAUSE               ;Short delay
    JMP Spin_Lock
Get_Lock:
    MOV EAX, 1
    XCHG EAX, lockvar   ;Try to get lock
    CMP EAX, 0          ;Test if successful
    JNE Spin_Lock
Critical_Section:
    <critical section code>
    MOV lockvar, 0
    ...
Continue:
The spin-wait loop above uses a "test, test-and-set" technique for determining the
availability of the synchronization variable. This technique is recommended when
writing spin-wait loops.
In IA-32 processor generations earlier than the Pentium 4 processor, the PAUSE
instruction is treated as a NOP instruction.
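The same "test, test-and-set" pattern can be sketched in C11. This is an illustration
under stated assumptions rather than the manual's routine: the tts_lock type and its
functions are hypothetical names, and the PAUSE instruction is reached through the
_mm_pause() intrinsic on x86 compilers that provide <immintrin.h> (it becomes a no-op
elsewhere).

```c
#include <assert.h>
#include <stdatomic.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()      /* emits PAUSE */
#else
#define cpu_relax() ((void)0)        /* no PAUSE equivalent assumed here */
#endif

typedef struct { atomic_int locked; } tts_lock;   /* 0 = free, 1 = held */

static void tts_lock_acquire(tts_lock *l)
{
    assert(l != 0);
    for (;;) {
        /* Test: spin on a plain load, pausing between polls. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0)
            cpu_relax();
        /* Test-and-set: attempt the exchange only once the lock looks free. */
        if (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0)
            return;
    }
}

static void tts_lock_release(tts_lock *l)
{
    assert(l != 0);
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

Polling with a load before the exchange keeps waiting processors from issuing locked
operations on every iteration, which is the point of the technique.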
8.10.6.2 Potential Usage of MONITOR/MWAIT in C0 Idle Loops
An operating system may implement different handlers for different idle states. A
typical OS idle loop on an ACPI-compatible OS is shown in Example 8-24:
Example 8-24. A Typical OS Idle Loop
// WorkQueue is a memory location indicating there is a thread
// ready to run. A non-zero value for WorkQueue is assumed to
// indicate the presence of work to be scheduled on the processor.
// The idle loop is entered with interrupts disabled.
WHILE (1) {
    IF (WorkQueue) THEN {
        // Schedule work at WorkQueue.
    } ELSE {
        // No work to do - wait in appropriate C-state handler depending
        // on Idle time accumulated
        IF (IdleTime >= IdleTimeThreshhold) THEN {
            // Call appropriate C1, C2, C3 state handler, C1 handler
            // shown below
        }
    }
}
// C1 handler uses a Halt instruction
VOID C1Handler()
{   STI
    HLT
}
The MONITOR and MWAIT instructions may be considered for use in the C0 idle state loops, if
MONITOR and MWAIT are supported.
Example 8-25. An OS Idle Loop with MONITOR/MWAIT in the C0 Idle Loop
// WorkQueue is a memory location indicating there is a thread
// ready to run. A non-zero value for WorkQueue is assumed to
// indicate the presence of work to be scheduled on the processor.
// The following example assumes that the necessary padding has been
// added surrounding WorkQueue to eliminate false wakeups
// The idle loop is entered with interrupts disabled.
WHILE (1) {
    IF (WorkQueue) THEN {
        // Schedule work at WorkQueue.
    } ELSE {
        // No work to do - wait in appropriate C-state handler depending
        // on Idle time accumulated.
        IF (IdleTime >= IdleTimeThreshhold) THEN {
            // Call appropriate C1, C2, C3 state handler, C1
            // handler shown below
            MONITOR WorkQueue   // Setup of eax with WorkQueue
                                // LinearAddress,
                                // ECX, EDX = 0
            // Wait only if no work arrived while the monitor was being armed
            IF (WorkQueue == 0) THEN {
                MWAIT
            }
        }
    }
}
// C1 handler uses a Halt instruction.
VOID C1Handler()
{   STI
    HLT
}
8.10.6.3 Halt Idle Logical Processors
If one of two logical processors is idle or in a spin-wait loop of long duration, explicitly
halt that processor by means of a HLT instruction.
In an MP system, operating systems can place idle processors into a loop that
continuously checks the run queue for runnable software tasks. Logical processors that
execute idle loops consume a significant amount of the core's execution resources that
might otherwise be used by the other logical processors in the physical package. For
this reason, halting idle logical processors optimizes the performance (see footnote 10).
If all logical processors within a physical package are halted, the processor will enter a
power-saving state.
8.10.6.4 Potential Usage of MONITOR/MWAIT in C1 Idle Loops
An operating system may also consider replacing HLT with MONITOR/MWAIT in its C1
idle loop. An example is shown in Example 8-26:
Example 8-26. An OS Idle Loop with MONITOR/MWAIT in the C1 Idle Loop
// WorkQueue is a memory location indicating there is a thread
// ready to run. A non-zero value for WorkQueue is assumed to
// indicate the presence of work to be scheduled on the processor.
// The following example assumes that the necessary padding has been
// added surrounding WorkQueue to eliminate false wakeups
// The idle loop is entered with interrupts disabled.
WHILE (1) {
    IF (WorkQueue) THEN {
        // Schedule work at WorkQueue
    } ELSE {
        // No work to do - wait in appropriate C-state handler depending
        // on Idle time accumulated
        IF (IdleTime >= IdleTimeThreshhold) THEN {
            // Call appropriate C1, C2, C3 state handler, C1
            // handler shown below
        }
    }
}
// C1 handler uses MONITOR/MWAIT in place of a Halt instruction
VOID C1Handler()
{
    MONITOR WorkQueue   // Setup of eax with WorkQueue LinearAddress,
                        // ECX, EDX = 0
    // Wait only if no work arrived while the monitor was being armed
    IF (WorkQueue == 0) THEN {
        STI
        MWAIT           // EAX, ECX = 0
    }
}
10. Excessive transitions into and out of the HALT state could also incur performance penalties.
Operating systems should evaluate the performance trade-offs for their operating system.
8.10.6.5 Guidelines for Scheduling Threads on Logical Processors Sharing
Execution Resources
Because logical processors share execution resources, the order in which threads are
dispatched to logical processors for execution can affect the overall efficiency of a
system. The following guidelines are recommended for scheduling threads for execution.
• Dispatch threads to one logical processor per processor core before dispatching
  threads to the other logical processor sharing execution resources in the same
  processor core.
• In an MP system with two or more physical packages, distribute threads over
  all the physical processors, rather than concentrate them in one or two physical
  processors.
• Use processor affinity to assign a thread to a specific processor core or package,
  depending on the cache-sharing topology. The practice increases the chance that
  the processor's caches will contain some of the thread's code and data when it is
  dispatched for execution after being suspended.
8.10.6.6 Eliminate Execution-Based Timing Loops
Intel discourages the use of timing loops that depend on a processor's execution
speed to measure time. There are several reasons:
• Timing loops cause problems when they are calibrated on an IA-32 processor
  running at one clock speed and then executed on a processor running at another
  clock speed.
• Routines for calibrating execution-based timing loops produce unpredictable
  results when run on an IA-32 processor supporting Intel Hyper-Threading
  Technology. This is due to the sharing of execution resources between the logical
  processors within a physical package.
To avoid the problems described, timing loop routines must use a timing mechanism
for the loop that does not depend on the execution speed of the logical processors in
the system. The following sources are generally available:
• A high resolution system timer (for example, an Intel 8254).
• A high resolution timer within the processor (such as the local APIC timer or the
  time-stamp counter).
For additional information, see the Intel 64 and IA-32 Architectures Optimization
Reference Manual.
8.10.6.7 Place Locks and Semaphores in Aligned, 128-Byte Blocks of
Memory
When software uses locks or semaphores to synchronize processes, threads, or other
code sections, Intel recommends that only one lock or semaphore be present within
a cache line (or 128-byte sector, if 128-byte sector is supported). In processors based
on Intel NetBurst microarchitecture (which support 128-byte sector consisting of two
cache lines), following this recommendation means that each lock or semaphore
should be contained in a 128-byte block of memory that begins on a 128-byte
boundary. The practice minimizes the bus traffic required to service locks.
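One way to follow this recommendation in C11 is to let the compiler align and pad each
lock. The sketch below is illustrative: padded_lock, lock_table and LOCK_BLOCK_SIZE are
hypothetical names, and the 128-byte figure is the sector size quoted above for Intel
NetBurst microarchitecture.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_BLOCK_SIZE 128   /* bytes; one lock per aligned block */

/* The alignment of the member raises the alignment of the whole structure,
   and the compiler pads the structure size up to a multiple of it. */
typedef struct {
    alignas(LOCK_BLOCK_SIZE) atomic_int value;
} padded_lock;

static_assert(sizeof(padded_lock) == LOCK_BLOCK_SIZE,
              "each lock must occupy exactly one 128-byte block");

/* Adjacent array elements therefore never share a 128-byte block. */
static padded_lock lock_table[2];
```

Each element of lock_table starts on a 128-byte boundary, so a write to one lock does not
touch the block that holds its neighbor.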
CHAPTER 9
PROCESSOR MANAGEMENT AND INITIALIZATION
This chapter describes the facilities provided for managing processor wide functions
and for initializing the processor. The subjects covered include: processor
initialization, x87 FPU initialization, processor configuration, feature determination, mode
switching, the MSRs (in the Pentium, P6 family, Pentium 4, and Intel Xeon
processors), and the MTRRs (in the P6 family, Pentium 4, and Intel Xeon processors).
9.1 INITIALIZATION OVERVIEW
Following power-up or an assertion of the RESET# pin, each processor on the system
bus performs a hardware initialization of the processor (known as a hardware reset)
and an optional built-in self-test (BIST). A hardware reset sets each processor's
registers to a known state and places the processor in real-address mode. It also
invalidates the internal caches, translation lookaside buffers (TLBs) and the branch
target buffer (BTB). At this point, the action taken depends on the processor family:
• Pentium 4 and Intel Xeon processors: All the processors on the system bus
  (including a single processor in a uniprocessor system) execute the multiple
  processor (MP) initialization protocol. The processor that is selected through this
  protocol as the bootstrap processor (BSP) then immediately starts executing
  software-initialization code in the current code segment beginning at the offset in
  the EIP register. The application (non-BSP) processors (APs) go into a Wait For
  Startup IPI (SIPI) state while the BSP is executing initialization code. See Section
  8.4, "Multiple-Processor (MP) Initialization," for more details. Note that in a
  uniprocessor system, the single Pentium 4 or Intel Xeon processor automatically
  becomes the BSP.
• P6 family processors: The action taken is the same as for the Pentium 4 and
  Intel Xeon processors (as described in the previous paragraph).
• Pentium processors: In either a single- or dual-processor system, a single
  Pentium processor is always pre-designated as the primary processor. Following
  a reset, the primary processor behaves as follows in both single- and dual-
  processor systems. Using the dual-processor (DP) ready initialization protocol,
  the primary processor immediately starts executing software-initialization code
  in the current code segment beginning at the offset in the EIP register. The
  secondary processor (if there is one) goes into a halt state.
• Intel486 processor: The primary processor (or single processor in a
  uniprocessor system) immediately starts executing software-initialization code in the
  current code segment beginning at the offset in the EIP register. (The Intel486
  does not automatically execute a DP or MP initialization protocol to determine
  which processor is the primary processor.)
The software-initialization code performs all system-specific initialization of the BSP
or primary processor and the system logic.
At this point, for MP (or DP) systems, the BSP (or primary) processor wakes up each
AP (or secondary) processor to enable those processors to execute self-configuration
code.
When all processors are initialized, configured, and synchronized, the BSP or primary
processor begins executing an initial operating-system or executive task.
The x87 FPU is also initialized to a known state during hardware reset. x87 FPU
software initialization code can then be executed to perform operations such as setting
the precision of the x87 FPU and the exception masks. No special initialization of the
x87 FPU is required to switch operating modes.
Asserting the INIT# pin on the processor invokes a similar response to a hardware
reset. The major difference is that during an INIT, the internal caches, MSRs, MTRRs,
and x87 FPU state are left unchanged (although the TLBs and BTB are invalidated as
with a hardware reset). An INIT provides a method for switching from protected to
real-address mode while maintaining the contents of the internal caches.
9.1.1 Processor State After Reset
Table 9-1 shows the state of the flags and other registers following power-up for the
Pentium 4, Intel Xeon, P6 family, and Pentium processors. The state of control
register CR0 is 60000010H (see Figure 9-1). This places the processor in real-
address mode with paging disabled.
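The reset value of CR0 can be decoded with a few bit masks to confirm the statement
above. This is an illustrative C sketch: the macro and function names are hypothetical,
while the bit positions are the architectural CR0 flag positions (PE bit 0, ET bit 4,
NW bit 29, CD bit 30, PG bit 31).

```c
#include <assert.h>
#include <stdint.h>

#define CR0_PE (UINT32_C(1) << 0)    /* protection enable */
#define CR0_ET (UINT32_C(1) << 4)    /* extension type; reads as 1 */
#define CR0_NW (UINT32_C(1) << 29)   /* not write-through */
#define CR0_CD (UINT32_C(1) << 30)   /* cache disable */
#define CR0_PG (UINT32_C(1) << 31)   /* paging */

#define CR0_RESET_VALUE UINT32_C(0x60000010)

/* PE = 0 means the processor runs in real-address mode. */
static int in_real_address_mode(uint32_t cr0) { return (cr0 & CR0_PE) == 0; }

/* PG = 1 means paging is enabled. */
static int paging_enabled(uint32_t cr0) { return (cr0 & CR0_PG) != 0; }

/* CD = 1 and NW = 1 together are the cache setting shown for reset. */
static int cd_and_nw_set(uint32_t cr0)
{
    return (cr0 & (CR0_CD | CR0_NW)) == (CR0_CD | CR0_NW);
}
```

Applied to 60000010H, the helpers report real-address mode, paging off, and CD and NW
both set; the only other set bit is bit 4.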
9.1.2 Processor Built-In Self-Test (BIST)
Hardware may request that the BIST be performed at power-up. The EAX register is
cleared (0H) if the processor passes the BIST. A nonzero value in the EAX register
after the BIST indicates that a processor fault was detected. If the BIST is not
requested, the contents of the EAX register after a hardware reset is 0H.
The overhead for performing a BIST varies between processor families. For example,
the BIST takes approximately 30 million processor clock periods to execute on the
Pentium 4 processor. This clock count is model-specific; Intel reserves the right to
change the number of periods for any Intel 64 or IA-32 processor, without notification.
Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT
Register | Pentium 4 and Intel Xeon Processor | P6 Family Processor | Pentium Processor
EFLAGS¹ | 00000002H | 00000002H | 00000002H
EIP | 0000FFF0H | 0000FFF0H | 0000FFF0H
CR0 | 60000010H² | 60000010H² | 60000010H²
CR2, CR3, CR4 | 00000000H | 00000000H | 00000000H
CS | Selector = F000H; Base = FFFF0000H; Limit = FFFFH; AR = Present, R/W, Accessed | Selector = F000H; Base = FFFF0000H; Limit = FFFFH; AR = Present, R/W, Accessed | Selector = F000H; Base = FFFF0000H; Limit = FFFFH; AR = Present, R/W, Accessed
SS, DS, ES, FS, GS | Selector = 0000H; Base = 00000000H; Limit = FFFFH; AR = Present, R/W, Accessed | Selector = 0000H; Base = 00000000H; Limit = FFFFH; AR = Present, R/W, Accessed | Selector = 0000H; Base = 00000000H; Limit = FFFFH; AR = Present, R/W, Accessed
EDX | 00000FxxH | 000n06xxH³ | 000005xxH
EAX | 0⁴ | 0⁴ | 0⁴
EBX, ECX, ESI, EDI, EBP, ESP | 00000000H | 00000000H | 00000000H
ST0 through ST7⁵ | Pwr up or Reset: +0.0; FINIT/FNINIT: Unchanged | Pwr up or Reset: +0.0; FINIT/FNINIT: Unchanged | Pwr up or Reset: +0.0; FINIT/FNINIT: Unchanged
x87 FPU Control Word⁵ | Pwr up or Reset: 0040H; FINIT/FNINIT: 037FH | Pwr up or Reset: 0040H; FINIT/FNINIT: 037FH | Pwr up or Reset: 0040H; FINIT/FNINIT: 037FH
x87 FPU Status Word⁵ | Pwr up or Reset: 0000H; FINIT/FNINIT: 0000H | Pwr up or Reset: 0000H; FINIT/FNINIT: 0000H | Pwr up or Reset: 0000H; FINIT/FNINIT: 0000H
x87 FPU Tag Word⁵ | Pwr up or Reset: 5555H; FINIT/FNINIT: FFFFH | Pwr up or Reset: 5555H; FINIT/FNINIT: FFFFH | Pwr up or Reset: 5555H; FINIT/FNINIT: FFFFH
x87 FPU Data Operand and CS Seg. Selectors⁵ | Pwr up or Reset: 0000H; FINIT/FNINIT: 0000H | Pwr up or Reset: 0000H; FINIT/FNINIT: 0000H | Pwr up or Reset: 0000H; FINIT/FNINIT: 0000H
x87 FPU Data Operand and Inst. Pointers⁵ | Pwr up or Reset: 00000000H; FINIT/FNINIT: 00000000H | Pwr up or Reset: 00000000H; FINIT/FNINIT: 00000000H | Pwr up or Reset: 00000000H; FINIT/FNINIT: 00000000H
MM0 through MM7⁵ | Pwr up or Reset: 0000000000000000H; INIT or FINIT/FNINIT: Unchanged | Pentium II and Pentium III Processors Only. Pwr up or Reset: 0000000000000000H; INIT or FINIT/FNINIT: Unchanged | Pentium with MMX Technology Only. Pwr up or Reset: 0000000000000000H; INIT or FINIT/FNINIT: Unchanged
XMM0 through XMM7 | Pwr up or Reset: 0000000000000000H; INIT: Unchanged | Pentium III processor Only. Pwr up or Reset: 0000000000000000H; INIT: Unchanged | NA
MXCSR | Pwr up or Reset: 1F80H; INIT: Unchanged | Pentium III processor only. Pwr up or Reset: 1F80H; INIT: Unchanged | NA
GDTR, IDTR | Base = 00000000H; Limit = FFFFH; AR = Present, R/W | Base = 00000000H; Limit = FFFFH; AR = Present, R/W | Base = 00000000H; Limit = FFFFH; AR = Present, R/W
LDTR, Task Register | Selector = 0000H; Base = 00000000H; Limit = FFFFH; AR = Present, R/W | Selector = 0000H; Base = 00000000H; Limit = FFFFH; AR = Present, R/W | Selector = 0000H; Base = 00000000H; Limit = FFFFH; AR = Present, R/W
DR0, DR1, DR2, DR3 | 00000000H | 00000000H | 00000000H
DR6 | FFFF0FF0H | FFFF0FF0H | FFFF0FF0H
DR7 | 00000400H | 00000400H | 00000400H
Time-Stamp Counter | Power up or Reset: 0H; INIT: Unchanged | Power up or Reset: 0H; INIT: Unchanged | Power up or Reset: 0H; INIT: Unchanged
Perf. Counters and Event Select | Power up or Reset: 0H; INIT: Unchanged | Power up or Reset: 0H; INIT: Unchanged | Power up or Reset: 0H; INIT: Unchanged
All Other MSRs | Pwr up or Reset: Undefined; INIT: Unchanged | Pwr up or Reset: Undefined; INIT: Unchanged | Pwr up or Reset: Undefined; INIT: Unchanged
Data and Code Cache, TLBs | Invalid | Invalid | Invalid
Fixed MTRRs | Pwr up or Reset: Disabled; INIT: Unchanged | Pwr up or Reset: Disabled; INIT: Unchanged | Not Implemented
Variable MTRRs | Pwr up or Reset: Disabled; INIT: Unchanged | Pwr up or Reset: Disabled; INIT: Unchanged | Not Implemented
Machine-Check Architecture | Pwr up or Reset: Undefined; INIT: Unchanged | Pwr up or Reset: Undefined; INIT: Unchanged | Not Implemented
APIC | Pwr up or Reset: Enabled; INIT: Unchanged | Pwr up or Reset: Enabled; INIT: Unchanged | Pwr up or Reset: Enabled; INIT: Unchanged
NOTES:
1. The 10 most-significant bits of the EFLAGS register are undefined following a reset. Software
should not depend on the states of any of these bits.
2. The CD and NW flags are unchanged, bit 4 is set to 1, all other bits are cleared.
3. Where n is the Extended Model Value for the respective processor.
4. If Built-In Self-Test (BIST) is invoked on power up or reset, EAX is 0 only if all tests passed. (BIST
cannot be invoked during an INIT.)
5. The state of the x87 FPU and MMX registers is not changed by the execution of an INIT.
9.1.3 Model and Stepping Information
Following a hardware reset, the EDX register contains component identification and
revision information (see Figure 9-2). For example, the model, family, and processor
type returned for the first processor in the Intel Pentium 4 family is as follows: model
(0000B), family (1111B), and processor type (00B).
The stepping ID field contains a unique identifier for the processor's stepping ID or
revision level. The extended family and extended model fields were added to the
IA-32 architecture in the Pentium 4 processors.
Figure 9-1. Contents of CR0 Register after Reset
[Figure: CR0 bit layout with the value of each flag after reset; all other bits are reserved.]
    PG (bit 31) = 0: Paging disabled
    CD (bit 30) = 1: Caching disabled
    NW (bit 29) = 1: Not write-through disabled
    AM (bit 18) = 0: Alignment check disabled
    WP (bit 16) = 0: Write-protect disabled
    NE (bit 5) = 0: External x87 FPU error reporting
    Bit 4 = 1: (Not used)
    TS (bit 3) = 0: No task switch
    EM (bit 2) = 0: x87 FPU instructions not trapped
    MP (bit 1) = 0: WAIT/FWAIT instructions not trapped
    PE (bit 0) = 0: Real-address mode
Figure 9-2. Version Information in the EDX Register after Reset
[Figure: EDX field layout after reset.]
    Bits 31-24: Reserved
    Bits 23-20: Extended Family
    Bits 19-16: Extended Model
    Bits 15-14: Reserved
    Bits 13-12: Processor Type
    Bits 11-8: Family (1111B for the Pentium 4 Processor Family)
    Bits 7-4: Model (Beginning with 0000B)
    Bits 3-0: Stepping ID
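The basic EDX fields described in this section can be extracted with shifts and masks.
The C sketch below is illustrative only: cpu_signature and decode_signature are
hypothetical names, the field positions are stepping ID in bits 3-0, model in bits 7-4,
family in bits 11-8 and processor type in bits 13-12, and the extended family and
extended model fields are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Basic identification fields held in EDX after a hardware reset. */
typedef struct {
    unsigned stepping;   /* bits 3-0   */
    unsigned model;      /* bits 7-4   */
    unsigned family;     /* bits 11-8  */
    unsigned type;       /* bits 13-12 */
} cpu_signature;

static cpu_signature decode_signature(uint32_t edx)
{
    cpu_signature s;
    s.stepping = edx & 0xFu;
    s.model    = (edx >> 4) & 0xFu;
    s.family   = (edx >> 8) & 0xFu;
    s.type     = (edx >> 12) & 0x3u;
    return s;
}
```

For a value of the form 00000FxxH with a model of 0000B, the sketch returns family 1111B,
model 0000B and processor type 00B, matching the Pentium 4 example in the text.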
9.1.4 First Instruction Executed
The first instruction that is fetched and executed following a hardware reset is
located at physical address FFFFFFF0H. This address is 16 bytes below the
processor's uppermost physical address. The EPROM containing the software-
initialization code must be located at this address.
The address FFFFFFF0H is beyond the 1-MByte addressable range of the processor
while in real-address mode. The processor is initialized to this starting address as
follows. The CS register has two parts: the visible segment selector part and the
hidden base address part. In real-address mode, the base address is normally
formed by shifting the 16-bit segment selector value 4 bits to the left to produce a
20-bit base address. However, during a hardware reset, the segment selector in the
CS register is loaded with F000H and the base address is loaded with FFFF0000H. The
starting address is thus formed by adding the base address to the value in the EIP
register (that is, FFFF0000H + FFF0H = FFFFFFF0H).
The first time the CS register is loaded with a new value after a hardware reset, the
processor will follow the normal rule for address translation in real-address mode
(that is, [CS base address = CS segment selector * 16]). To ensure that the base
address in the CS register remains unchanged until the EPROM-based software-
initialization code is completed, the code must not contain a far jump or far call or
allow an interrupt to occur (which would cause the CS selector value to be changed).
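The address arithmetic in this section can be checked with a short C sketch. The
function and macro names are hypothetical; the constants are the reset values quoted
above.

```c
#include <assert.h>
#include <stdint.h>

#define RESET_CS_SELECTOR 0xF000u                 /* visible part of CS after reset */
#define RESET_CS_BASE     UINT32_C(0xFFFF0000)    /* hidden base loaded at reset */
#define RESET_EIP         UINT32_C(0x0000FFF0)

/* Normal real-address mode rule: base = selector * 16. */
static uint32_t real_mode_base(uint16_t selector)
{
    return (uint32_t)selector << 4;
}

/* Address of the next instruction fetch: CS base plus EIP. */
static uint32_t fetch_address(uint32_t cs_base, uint32_t eip)
{
    return cs_base + eip;
}
```

With the reset base the first fetch is at FFFFFFF0H. Once CS is reloaded with F000H under
the normal rule, the same offset resolves to 000FFFF0H, which is why a far jump, far call
or interrupt during the EPROM-based code changes where instructions are fetched from.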
9.2 X87 FPU INITIALIZATION
Software-initialization code can determine whether the processor contains an
x87 FPU by using the CPUID instruction. The code must then initialize the x87 FPU
and set flags in control register CR0 to reflect the state of the x87 FPU environment.
A hardware reset places the x87 FPU in the state shown in Table 9-1. This state is
different from the state the x87 FPU is placed in following the execution of an FINIT
or FNINIT instruction (also shown in Table 9-1). If the x87 FPU is to be used, the
software-initialization code should execute an FINIT/FNINIT instruction following a
hardware reset. These instructions tag all data registers as empty, set all the exception
mask bits (the control word becomes 037FH, so all x87 floating-point exceptions are
masked), set the TOP-of-stack value to 0, and select the default rounding and precision
controls setting (round to nearest and 64-bit precision).
If the processor is reset by asserting the INIT# pin, the x87 FPU state is not changed.
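The two control word values listed in Table 9-1 (0040H after reset, 037FH after
FINIT/FNINIT) can be decoded to see the difference. The C sketch below uses the x87
control word layout (exception masks in bits 5-0, precision control in bits 9-8, rounding
control in bits 11-10); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FCW_AFTER_RESET 0x0040u
#define FCW_AFTER_FINIT 0x037Fu

/* Bits 5-0: one mask bit per x87 exception; a set bit masks the exception. */
static unsigned fcw_exception_masks(uint16_t fcw) { return fcw & 0x3Fu; }

/* Bits 9-8: precision control; 11B selects the 64-bit precision setting. */
static unsigned fcw_precision_control(uint16_t fcw) { return (fcw >> 8) & 0x3u; }

/* Bits 11-10: rounding control; 00B selects round to nearest. */
static unsigned fcw_rounding_control(uint16_t fcw) { return (fcw >> 10) & 0x3u; }
```

After FINIT/FNINIT all six mask bits are set, precision control is 11B and rounding
control is 00B. In the reset value none of the mask bits are set, which is one reason the
initialization code is advised to execute FINIT/FNINIT before using the x87 FPU.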
9.2.1 Configuring the x87 FPU Environment
Initialization code must load the appropriate values into the MP, EM, and NE flags of
control register CR0. These bits are cleared on hardware reset of the processor.
Table 9-2 shows the suggested settings for these flags, depending on the IA-32
processor being initialized. Initialization code can test for the type of processor
present before setting or clearing these flags.
The EM flag determines whether floating-point instructions are executed by the x87
FPU (EM is cleared) or a device-not-available exception (#NM) is generated for all
floating-point instructions so that an exception handler can emulate the floating-
point operation (EM = 1). Ordinarily, the EM flag is cleared when an x87 FPU or math
coprocessor is present and set if they are not present. If the EM flag is set and no x87
FPU, math coprocessor, or floating-point emulator is present, the processor will hang
when a floating-point instruction is executed.
The MP flag determines whether WAIT/FWAIT instructions react to the setting of the
TS flag. If the MP flag is clear, WAIT/FWAIT instructions ignore the setting of the TS
flag; if the MP flag is set, they will generate a device-not-available exception (#NM)
if the TS flag is set. Generally, the MP flag should be set for processors with an
integrated x87 FPU and clear for processors without an integrated x87 FPU and without a
math coprocessor present. However, an operating system can choose to save the
floating-point context at every context switch, in which case there would be no need
to set the MP bit.
Table 2-1 shows the actions taken for floating-point and WAIT/FWAIT instructions
based on the settings of the EM, MP, and TS flags.
The NE flag determines whether unmasked floating-point exceptions are handled by
generating a floating-point error exception internally (NE is set, native mode) or
through an external interrupt (NE is cleared). In systems where an external interrupt
controller is used to invoke numeric exception handlers (such as MS-DOS-based
systems), the NE bit should be cleared.
9.2.2 Setting the Processor for x87 FPU Software Emulation
Setting the EM flag causes the processor to generate a device-not-available exception
(#NM) and trap to a software exception handler whenever it encounters a
floating-point instruction. (Table 9-2 shows when it is appropriate to use this flag.)
Table 9-2. Recommended Settings of EM and MP Flags on IA-32 Processors
EM   MP   NE        IA-32 processor
1    0    1         Intel486 SX, Intel386 DX, and Intel386 SX processors only, without
                    the presence of a math coprocessor.
0    1    1 or 0*   Pentium 4, Intel Xeon, P6 family, Pentium, Intel486 DX, and
                    Intel 487 SX processors, and Intel386 DX and Intel386 SX
                    processors when a companion math coprocessor is present.
0    1    1 or 0*   More recent Intel 64 or IA-32 processors
NOTE:
* The setting of the NE flag depends on the operating system being used.
Setting this flag has two functions:
• It allows x87 FPU code to run on an IA-32 processor that neither has an
  integrated x87 FPU nor is connected to an external math coprocessor, by using a
  floating-point emulator.
• It allows floating-point code to be executed using a special or nonstandard
  floating-point emulator, selected for a particular application, regardless of
  whether an x87 FPU or math coprocessor is present.
To emulate floating-point instructions, the EM, MP, and NE flags in control register
CR0 should be set as shown in Table 9-3.
Regardless of the value of the EM bit, the Intel486 SX processor generates a
device-not-available exception (#NM) upon encountering any floating-point instruction.
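The flag settings in Tables 9-2 and 9-3 can be expressed as CR0 bit masks. The following C fragment is a minimal sketch, not part of the original text; the helper names are hypothetical. It only computes the CR0 image: actually loading it requires a privileged MOV CR0 instruction, which is not shown.

```c
#include <stdint.h>

/* Architectural CR0 bit positions. */
#define CR0_MP (1u << 1)   /* Monitor coprocessor */
#define CR0_EM (1u << 2)   /* Emulation */
#define CR0_NE (1u << 5)   /* Numeric error */

/* Hypothetical helper: return cr0 with EM, MP, and NE forced to the given
 * values; all other CR0 bits are left unchanged. */
uint32_t cr0_set_x87_flags(uint32_t cr0, int em, int mp, int ne)
{
    cr0 &= ~(CR0_EM | CR0_MP | CR0_NE);
    if (em) cr0 |= CR0_EM;
    if (mp) cr0 |= CR0_MP;
    if (ne) cr0 |= CR0_NE;
    return cr0;
}

/* Table 9-2, integrated x87 FPU: EM = 0, MP = 1. NE = 1 (native error
 * reporting) is assumed here; the table allows 0 or 1 depending on the OS. */
uint32_t cr0_for_integrated_fpu(uint32_t cr0)
{
    return cr0_set_x87_flags(cr0, 0, 1, 1);
}

/* Table 9-3, software emulation: EM = 1, MP = 0, NE = 1. */
uint32_t cr0_for_emulation(uint32_t cr0)
{
    return cr0_set_x87_flags(cr0, 1, 0, 1);
}
```

Initialization code would typically read CR0, pass the value through one of these helpers, and write the result back.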
9.3 CACHE ENABLING
IA-32 processors (beginning with the Intel486 processor) and Intel 64 processors
contain internal instruction and data caches. These caches are enabled by clearing
the CD and NW flags in control register CR0. (They are set during a hardware reset.)
Because all internal cache lines are invalid following reset initialization, it is not
necessary to invalidate the cache before enabling caching. Any external caches may
require initialization and invalidation using a system-specific initialization and
invalidation code sequence.
Depending on the hardware and operating system or executive requirements,
additional configuration of the processor's caching facilities will probably be required.
Beginning with the Intel486 processor, page-level caching can be controlled with the
PCD and PWT flags in page-directory and page-table entries. Beginning with the P6
family processors, the memory type range registers (MTRRs) control the caching
characteristics of the regions of physical memory. (For the Intel486 and Pentium
processors, external hardware can be used to control the caching characteristics of
regions of physical memory.) See Chapter 11, "Memory Cache Control," for detailed
information on configuration of the caching facilities in the Pentium 4, Intel Xeon, and
P6 family processors and system memory.
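As an illustration of the cache-enable step, the sketch below computes a CR0 image with the CD and NW flags cleared. The bit positions (CD = bit 30, NW = bit 29) are architectural; the function names are hypothetical, and the privileged MOV CR0 needed to apply the value is omitted.

```c
#include <stdint.h>

#define CR0_NW (1u << 29)  /* Not write-through */
#define CR0_CD (1u << 30)  /* Cache disable */

/* Return a CR0 image with the internal caches enabled (CD = 0, NW = 0). */
uint32_t cr0_enable_caches(uint32_t cr0)
{
    return cr0 & ~(CR0_CD | CR0_NW);
}

/* Nonzero if both CD and NW are clear in the given CR0 image. */
int cr0_caches_enabled(uint32_t cr0)
{
    return (cr0 & (CR0_CD | CR0_NW)) == 0;
}
```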
Table 9-3. Software Emulation Settings of EM, MP, and NE Flags
CR0 Bit Value
EM 1
MP 0
NE 1
9.4 MODEL-SPECIFIC REGISTERS (MSRS)
Most IA-32 processors (starting from Pentium processors) and Intel 64 processors
contain model-specific registers (MSRs). A given MSR may not be supported across
all families and models of Intel 64 and IA-32 processors. Some MSRs are designated
as architectural to simplify software programming; a feature introduced by an
architectural MSR is expected to be supported in future processors. Non-architectural
MSRs are not guaranteed to be supported or to have the same functions on future
processors.
MSRs provide control for a number of hardware and software-related features,
including:
• Performance-monitoring counters (see Chapter 20, "Introduction to
  Virtual-Machine Extensions").
• Debug extensions (see Chapter 20, "Introduction to Virtual-Machine
  Extensions").
• Machine-check exception capability and its accompanying machine-check
  architecture (see Chapter 15, "Machine-Check Architecture").
• MTRRs (see Section 11.11, "Memory Type Range Registers (MTRRs)").
• Thermal and power management.
• Instruction-specific support (for example: SYSENTER, SYSEXIT, SWAPGS, etc.).
• Processor feature/mode support (for example: IA32_EFER,
  IA32_FEATURE_CONTROL).
The MSRs can be read and written using the RDMSR and WRMSR instructions,
respectively.
When performing software initialization of an IA-32 or Intel 64 processor, many of
the MSRs will need to be initialized to set up things like performance-monitoring
events, run-time machine checks, and memory types for physical memory.
Lists of available performance-monitoring events are given in Appendix A,
"Performance Monitoring Events," and lists of available MSRs are given in Appendix B,
"Model-Specific Registers (MSRs)." The references earlier in this section show where
the functions of the various groups of MSRs are described in this manual.
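RDMSR and WRMSR transfer a 64-bit MSR value through the EDX:EAX register pair, with ECX selecting the MSR. The sketch below shows only the packing and unpacking of that pair in C; the type and function names are hypothetical, and the privileged instructions themselves are not issued.

```c
#include <stdint.h>

/* The two 32-bit halves as they appear in EAX (low) and EDX (high). */
typedef struct {
    uint32_t eax;
    uint32_t edx;
} msr_halves;

/* Combine the EDX:EAX pair returned by RDMSR into one 64-bit value. */
uint64_t msr_join(uint32_t eax, uint32_t edx)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Split a 64-bit value into the EDX:EAX pair expected by WRMSR. */
msr_halves msr_split(uint64_t value)
{
    msr_halves h;
    h.eax = (uint32_t)(value & 0xFFFFFFFFu);
    h.edx = (uint32_t)(value >> 32);
    return h;
}
```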
9.5 MEMORY TYPE RANGE REGISTERS (MTRRS)
Memory type range registers (MTRRs) were introduced into the IA-32 architecture
with the Pentium Pro processor. They allow the type of caching (or no caching) to be
specified in system memory for selected physical address ranges. They allow
memory accesses to be optimized for various types of memory such as RAM, ROM,
frame buffer memory, and memory-mapped I/O devices.
Initializing the MTRRs is normally handled by the software initialization code or
BIOS and is not an operating system or executive function. At the very least,
all the MTRRs must be cleared to 0, which selects the uncached (UC) memory type.
See Section 11.11, "Memory Type Range Registers (MTRRs)," for detailed
information on the MTRRs.
9.6 INITIALIZING SSE/SSE2/SSE3/SSSE3 EXTENSIONS
For processors that contain SSE/SSE2/SSE3/SSSE3 extensions, the following steps
must be taken when initializing the processor to allow execution of these instructions.
1. Check the CPUID feature flags for the presence of the SSE/SSE2/SSE3/SSSE3
   extensions (respectively: EDX bits 25 and 26, ECX bits 0 and 9) and support for
   the FXSAVE and FXRSTOR instructions (EDX bit 24). Also check for support for
   the CLFLUSH instruction (EDX bit 19). The CPUID feature flags are loaded in the
   EDX and ECX registers when the CPUID instruction is executed with a 1 in the
   EAX register.
2. Set the OSFXSR flag (bit 9 in control register CR4) to indicate that the operating
   system supports saving and restoring the SSE/SSE2/SSE3/SSSE3 execution
   environment (XMM and MXCSR registers) with the FXSAVE and FXRSTOR
   instructions, respectively. See Section 2.5, "Control Registers," for a description
   of the OSFXSR flag.
3. Set the OSXMMEXCPT flag (bit 10 in control register CR4) to indicate that the
   operating system supports the handling of SSE/SSE2/SSE3 SIMD floating-point
   exceptions (#XF). See Section 2.5, "Control Registers," for a description of the
   OSXMMEXCPT flag.
4. Set the mask bits and flags in the MXCSR register according to the mode of
   operation desired for SSE/SSE2/SSE3 SIMD floating-point instructions. See
   "MXCSR Control and Status Register" in Chapter 10, "Programming with
   Streaming SIMD Extensions (SSE)," of the Intel 64 and IA-32 Architectures
   Software Developer's Manual, Volume 1, for a detailed description of the bits and
   flags in the MXCSR register.
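Steps 1 through 3 above reduce to a few bit tests and bit sets. The following is a minimal sketch that assumes the caller has already executed CPUID with EAX = 1 and passes in the resulting EDX and ECX values; the macro and function names are hypothetical. Writing CR4 and MXCSR requires privileged or dedicated instructions that are not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H feature bits named in step 1. */
#define CPUID1_EDX_CLFLUSH (1u << 19)
#define CPUID1_EDX_FXSR    (1u << 24)
#define CPUID1_EDX_SSE     (1u << 25)
#define CPUID1_EDX_SSE2    (1u << 26)
#define CPUID1_ECX_SSE3    (1u << 0)
#define CPUID1_ECX_SSSE3   (1u << 9)

/* CR4 bits named in steps 2 and 3. */
#define CR4_OSFXSR     (1u << 9)
#define CR4_OSXMMEXCPT (1u << 10)

bool cpu_has_sse(uint32_t edx)   { return (edx & CPUID1_EDX_SSE) != 0; }
bool cpu_has_sse2(uint32_t edx)  { return (edx & CPUID1_EDX_SSE2) != 0; }
bool cpu_has_sse3(uint32_t ecx)  { return (ecx & CPUID1_ECX_SSE3) != 0; }
bool cpu_has_ssse3(uint32_t ecx) { return (ecx & CPUID1_ECX_SSSE3) != 0; }

/* SSE state can be managed only if both FXSAVE/FXRSTOR and SSE are present. */
bool sse_init_possible(uint32_t edx)
{
    return (edx & CPUID1_EDX_FXSR) != 0 && (edx & CPUID1_EDX_SSE) != 0;
}

/* Return a CR4 image with OSFXSR and OSXMMEXCPT set (steps 2 and 3). */
uint32_t cr4_enable_sse(uint32_t cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}
```

Setting OSXMMEXCPT is appropriate only when the operating system actually installs a #XF handler; a system without one would leave that bit clear.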
9.7 SOFTWARE INITIALIZATION FOR REAL-ADDRESS MODE OPERATION
Following a hardware reset (either through a power-up or the assertion of the
RESET# pin) the processor is placed in real-address mode and begins executing
software initialization code from physical address FFFFFFF0H. Software initialization
code must first set up the necessary data structures for handling basic system
functions, such as a real-mode IDT for handling interrupts and exceptions. If the
processor is to remain in real-address mode, software must then load additional
operating-system or executive code modules and data structures to allow reliable
execution of application programs in real-address mode.
If the processor is going to operate in protected mode, software must load the
necessary data structures to operate in protected mode and then switch to protected
mode. The protected-mode data structures that must be loaded are described in
Section 9.8, "Software Initialization for Protected-Mode Operation."
9.7.1 Real-Address Mode IDT
In real-address mode, the only system data structure that must be loaded into
memory is the IDT (also called the "interrupt vector table"). By default, the address
of the base of the IDT is physical address 0H. This address can be changed by using
the LIDT instruction to change the base address value in the IDTR. Software
initialization code needs to load interrupt- and exception-handler pointers into the IDT
before interrupts can be enabled.
The actual interrupt- and exception-handler code can be contained either in EPROM
or RAM; however, the code must be located within the 1-MByte addressable range of
the processor in real-address mode. If the handler code is to be stored in RAM, it
must be loaded along with the IDT.
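The layout of a real-address mode IDT entry is not restated in this section: each entry is a 4-byte far pointer consisting of a 16-bit offset followed by a 16-bit segment, so the entry for vector n lies at base + 4 * n. The sketch below, with hypothetical helper names, fills such an entry in an ordinary byte buffer standing in for the table at physical address 0H.

```c
#include <stdint.h>

/* Address of the 4-byte entry for a given vector. */
uint32_t ivt_entry_address(uint32_t idt_base, uint8_t vector)
{
    return idt_base + 4u * (uint32_t)vector;
}

/* Physical address reached by a real-address mode segment:offset pair. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Store a handler pointer for `vector` into the table image `idt`.
 * Each field is written in little-endian order: offset first, then segment. */
void ivt_set_handler(uint8_t *idt, uint8_t vector,
                     uint16_t segment, uint16_t offset)
{
    uint8_t *entry = idt + 4u * (uint32_t)vector;
    entry[0] = (uint8_t)(offset & 0xFFu);
    entry[1] = (uint8_t)(offset >> 8);
    entry[2] = (uint8_t)(segment & 0xFFu);
    entry[3] = (uint8_t)(segment >> 8);
}
```

A full table for all 256 vectors occupies 1 KByte. The handler address computed by real_mode_address() must fall within the 1-MByte range noted above.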
9.7.2 NMI Interrupt Handling
The NMI interrupt is always enabled (except when multiple NMIs are nested). If the
IDT and the NMI interrupt handler need to be loaded into RAM, there will be a period
of time following hardware reset when an NMI interrupt cannot be handled. During
this time, hardware must provide a mechanism to prevent an NMI interrupt from
halting code execution until the IDT and the necessary NMI handler software are
loaded. Here are two examples of how NMIs can be handled during the initial stages
of processor initialization:
• A simple IDT and NMI interrupt handler can be provided in EPROM. This allows an
  NMI interrupt to be handled immediately after reset initialization.
• The system hardware can provide a mechanism to enable and disable NMIs by
  passing the NMI# signal through an AND gate controlled by a flag in an I/O port.
  Hardware can clear the flag when the processor is reset, and software can set the
  flag when it is ready to handle NMI interrupts.
9.8 SOFTWARE INITIALIZATION FOR PROTECTED-MODE OPERATION
The processor is placed in real-address mode following a hardware reset. At this
point in the initialization process, some basic data structures and code modules must
be loaded into physical memory to support further initialization of the processor, as
described in Section 9.7, "Software Initialization for Real-Address Mode Operation."
Before the processor can be switched to protected mode, the software initialization
code must load a minimum number of protected mode data structures and code
modules into memory to support reliable operation of the processor in protected
mode. These data structures include the following:
• An IDT.
• A GDT.
• A TSS.
• (Optional) An LDT.
• If paging is to be used, at least one page directory and one page table.
• A code segment that contains the code to be executed when the processor
  switches to protected mode.
• One or more code modules that contain the necessary interrupt and exception
  handlers.
Software initialization code must also initialize the following system registers before
the processor can be switched to protected mode:
• The GDTR.
• (Optional) The IDTR. This register can also be initialized immediately after
  switching to protected mode, prior to enabling interrupts.
• Control registers CR1 through CR4.
• (Pentium 4, Intel Xeon, and P6 family processors only.) The memory type range
  registers (MTRRs).
With these data structures, code modules, and system registers initialized, the
processor can be switched to protected mode by loading control register CR0 with a
value that sets the PE flag (bit 0).
9.8.1 Protected-Mode System Data Structures
The contents of the protected-mode system data structures loaded into memory
during software initialization depend largely on the type of memory management
the protected-mode operating system or executive is going to support: flat, flat with
paging, segmented, or segmented with paging.
To implement a flat memory model without paging, software initialization code must
at a minimum load a GDT with one code-segment and one data-segment descriptor. A
null descriptor in the first GDT entry is also required. The stack can be placed in a
normal read/write data segment, so no dedicated descriptor for the stack is required.
A flat memory model with paging also requires a page directory and at least one page
table (unless all pages are 4 MBytes, in which case only a page directory is required).
See Section 9.8.3, "Initializing Paging."
Before the GDT can be used, the base address and limit for the GDT must be loaded
into the GDTR register using an LGDT instruction.
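The minimal flat-model GDT described above (a null descriptor, one code segment, one data segment) can be built as three 64-bit values. The sketch below packs the fields according to the standard segment-descriptor layout. The function names are hypothetical, and the choice of 4-GByte, 32-bit, ring-0 segments is an assumption made for illustration.

```c
#include <stdint.h>

/* Pack a segment descriptor.
 * base:   32-bit segment base address
 * limit:  20-bit segment limit
 * access: descriptor byte 5 (P, DPL, S, type)
 * flags:  upper nibble of byte 6 (G, D/B, L, AVL) */
uint64_t gdt_descriptor(uint32_t base, uint32_t limit,
                        uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);                /* limit 15:0  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;         /* base 23:0   */
    d |= (uint64_t)access << 40;                     /* access byte */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;     /* limit 19:16 */
    d |= (uint64_t)(flags & 0xFu) << 52;             /* G, D/B, L, AVL */
    d |= (uint64_t)((base >> 24) & 0xFFu) << 56;     /* base 31:24  */
    return d;
}

/* Fill a three-entry flat GDT: null, code, data. */
void build_flat_gdt(uint64_t gdt[3])
{
    gdt[0] = 0;                                        /* required null entry */
    gdt[1] = gdt_descriptor(0, 0xFFFFFu, 0x9Au, 0xCu); /* execute/read code   */
    gdt[2] = gdt_descriptor(0, 0xFFFFFu, 0x92u, 0xCu); /* read/write data     */
}

/* Limit value for the GDTR: table size in bytes minus one. */
uint16_t gdt_limit(unsigned entries)
{
    return (uint16_t)(entries * 8u - 1u);
}
```

With this table, the GDTR would be loaded with the table's base address and a limit of 23 (three 8-byte entries minus one).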
A multi-segmented model may require additional segments for the operating system,
as well as segments and LDTs for each application program. LDTs require segment
descriptors in the GDT. Some operating systems allocate new segments and LDTs as
they are needed. This provides maximum flexibility for handling a dynamic
programming environment. However, many operating systems use a single LDT for all
tasks, allocating GDT entries in advance. An embedded system, such as a process
controller, might pre-allocate a fixed number of segments and LDTs for a fixed
number of application programs. This would be a simple and efficient way to
structure the software environment of a real-time system.
9.8.2 Initializing Protected-Mode Exceptions and Interrupts
Software initialization code must at a minimum load a protected-mode IDT with a gate
descriptor for each exception vector that the processor can generate. If interrupt or
trap gates are used, the gate descriptors can all point to the same code segment,
which contains the necessary exception handlers. If task gates are used, one TSS
and accompanying code, data, and task segments are required for each exception
handler called with a task gate.
If hardware allows interrupts to be generated, gate descriptors must be provided in
the IDT for one or more interrupt handlers.
Before the IDT can be used, the base address and limit for the IDT must be loaded
into the IDTR register using an LIDT instruction. This operation is typically carried out
immediately after switching to protected mode.
9.8.3 Initializing Paging
Paging is controlled by the PG flag in control register CR0. When this flag is clear (its
state following a hardware reset), the paging mechanism is turned off; when it is set,
paging is enabled. Before setting the PG flag, the following data structures and
registers must be initialized:
• Software must load at least one page directory and one page table into physical
  memory. The page table can be eliminated if the page directory contains a
  directory entry pointing to itself (here, the page directory and page table reside
  in the same page), or if only 4-MByte pages are used.
• Control register CR3 (also called the PDBR register) is loaded with the physical
  base address of the page directory.
• (Optional) Software may provide one set of code and data descriptors in the GDT
  or in an LDT for supervisor mode and another set for user mode.
With this paging initialization complete, paging is enabled and the processor is
switched to protected mode at the same time by loading control register CR0 with an
image in which the PG and PE flags are set. (Paging cannot be enabled before the
processor is switched to protected mode.)
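As an illustration of the "only 4-MByte pages" case, the sketch below builds an identity-mapped page directory and a CR0 image with PE and PG set. It assumes 32-bit non-PAE paging with CR4.PSE already set, which 4-MByte pages require. The names are hypothetical, and loading CR3 and CR0 is left to privileged code that is not shown.

```c
#include <stdint.h>

#define CR0_PE (1u << 0)    /* Protection enable */
#define CR0_PG (1u << 31)   /* Paging enable */

#define PDE_P  (1u << 0)    /* Present */
#define PDE_RW (1u << 1)    /* Read/write */
#define PDE_PS (1u << 7)    /* Page size: entry maps a 4-MByte page */

/* Fill a 1024-entry page directory so that every linear address maps to the
 * same physical address, using 4-MByte pages (no page tables needed). */
void build_identity_page_directory(uint32_t pd[1024])
{
    for (uint32_t i = 0; i < 1024u; i++) {
        pd[i] = (i << 22) | PDE_PS | PDE_RW | PDE_P;
    }
}

/* CR0 image that enters protected mode and enables paging in one write. */
uint32_t cr0_enable_pe_pg(uint32_t cr0)
{
    return cr0 | CR0_PE | CR0_PG;
}
```

The directory itself must be 4-KByte aligned in physical memory so that its address can be loaded into CR3.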
9.8.4 Initializing Multitasking
If the multitasking mechanism is not going to be used and changes between privilege
levels are not allowed, it is not necessary to load a TSS into memory or to initialize
the task register.
If the multitasking mechanism is going to be used and/or changes between privilege
levels are allowed, software initialization code must load at least one TSS and an
accompanying TSS descriptor. (A TSS is required to change privilege levels because
pointers to the privilege-level 0, 1, and 2 stack segments and the stack pointers for
these stacks are obtained from the TSS.) TSS descriptors must not be marked as
busy when they are created; they should be marked busy by the processor only as a
side-effect of performing a task switch. As with descriptors for LDTs, TSS descriptors
reside in the GDT.
After the processor has switched to protected mode, the LTR instruction can be used
to load a segment selector for a TSS descriptor into the task register. This instruction
marks the TSS descriptor as busy, but does not perform a task switch. The processor
can, however, use the TSS to locate pointers to privilege-level 0, 1, and 2 stacks. The
segment selector for the TSS must be loaded before software performs its first task
switch in protected mode, because a task switch copies the current task state into
the TSS.
After the LTR instruction has been executed, further operations on the task register
are performed by task switching. As with other segments and LDTs, TSSs and TSS
descriptors can be either pre-allocated or allocated as needed.
9.8.5 Initializing IA-32e Mode
On Intel 64 processors, the IA32_EFER MSR is cleared on system reset. The
operating system must be in protected mode with paging enabled before attempting to
initialize IA-32e mode. IA-32e mode operation also requires physical-address
extensions with four levels of enhanced paging structures (see Section 4.5, "IA-32e
Paging").
Operating systems should follow this sequence to initialize IA-32e mode:
1. Starting from protected mode, disable paging by setting CR0.PG = 0. Use the
   MOV CR0 instruction to disable paging (the instruction must be located in an
   identity-mapped page).
2. Enable physical-address extensions (PAE) by setting CR4.PAE = 1. Failure to
   enable PAE will result in a #GP fault when an attempt is made to initialize IA-32e
   mode.
3. Load CR3 with the physical base address of the Level 4 page map table (PML4).
4. Enable IA-32e mode by setting IA32_EFER.LME = 1.
5. Enable paging by setting CR0.PG = 1. This causes the processor to set the
   IA32_EFER.LMA bit to 1. The MOV CR0 instruction that enables paging and the
   following instructions must be located in an identity-mapped page (until such
   time that a branch to non-identity mapped pages can be effected).
64-bit mode paging tables must be located in the first 4 GBytes of physical-address
space prior to activating IA-32e mode. This is necessary because the MOV CR3
instruction used to initialize the page-directory base must be executed in legacy
mode prior to activating IA-32e mode (setting CR0.PG = 1 to enable paging).
Because MOV CR3 is executed in protected mode, only the lower 32 bits of the
register are written, limiting the table location to the low 4 GBytes of memory.
Software can relocate the page tables anywhere in physical memory after IA-32e mode
is activated.
The processor performs 64-bit mode consistency checks whenever software
attempts to modify any of the enable bits directly involved in activating IA-32e mode
(IA32_EFER.LME, CR0.PG, and CR4.PAE). It will generate a general protection fault
(#GP) if consistency checks fail. 64-bit mode consistency checks ensure that the
processor does not enter an undefined mode or state with unpredictable behavior.
64-bit mode consistency checks fail in the following circumstances:
• An attempt is made to enable or disable IA-32e mode while paging is enabled.
• IA-32e mode is enabled and an attempt is made to enable paging prior to
  enabling physical-address extensions (PAE).
• IA-32e mode is active and an attempt is made to disable physical-address
  extensions (PAE).
• The current CS has the L-bit set when an attempt is made to activate IA-32e mode.
• The TR contains a 16-bit TSS when an attempt is made to activate IA-32e mode.
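The first three consistency checks, and the rule that the processor sets IA32_EFER.LMA when paging is enabled with LME = 1, can be modeled as pure functions over register images. The following is an illustrative software model only, with hypothetical function names; it does not touch real control registers, and it leaves out the checks on CS.L and on the TSS type.

```c
#include <stdint.h>

#define CR0_PG   (1u << 31)    /* CR0.PG: paging enable */
#define CR4_PAE  (1u << 5)     /* CR4.PAE: physical-address extensions */
#define EFER_LME (1ull << 8)   /* IA32_EFER.LME: IA-32e mode enable */
#define EFER_LMA (1ull << 10)  /* IA32_EFER.LMA: IA-32e mode active */

/* Check 1: LME may be changed only while paging is disabled. */
int lme_write_allowed(uint32_t cr0)
{
    return (cr0 & CR0_PG) == 0;
}

/* Check 2: with LME = 1, paging may be enabled only if PAE is already set. */
int paging_enable_allowed(uint32_t cr4, uint64_t efer)
{
    return !((efer & EFER_LME) != 0 && (cr4 & CR4_PAE) == 0);
}

/* Check 3: PAE may not be cleared while IA-32e mode is active. */
int pae_clear_allowed(uint64_t efer)
{
    return (efer & EFER_LMA) == 0;
}

/* IA32_EFER image after a successful CR0 write: LMA follows LME AND PG. */
uint64_t efer_after_cr0_write(uint64_t efer, uint32_t new_cr0)
{
    if ((efer & EFER_LME) != 0 && (new_cr0 & CR0_PG) != 0) {
        return efer | EFER_LMA;
    }
    return efer & ~EFER_LMA;
}
```

In this model a return value of 0 from one of the check functions corresponds to the #GP the processor would raise.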
9.8.5.1 IA-32e Mode System Data Structures
After activating IA-32e mode, the system-descriptor-table registers (GDTR, LDTR,
IDTR, TR) continue to reference legacy protected-mode descriptor tables. Tables
referenced by the descriptors all reside in the lower 4 GBytes of linear-address space.
After activating IA-32e mode, 64-bit operating systems should use the LGDT, LLDT,
LIDT, and LTR instructions to load the system-descriptor-table registers with
references to 64-bit descriptor tables.
9.8.5.2 IA-32e Mode Interrupts and Exceptions
Software must not allow exceptions or interrupts to occur between the time IA-32e
mode is activated and the update of the interrupt-descriptor-table register (IDTR)
that establishes references to a 64-bit interrupt-descriptor table (IDT). This is
because the IDT remains in legacy form immediately after IA-32e mode is activated.
If an interrupt or exception occurs prior to updating the IDTR, a legacy 32-bit
interrupt gate will be referenced and interpreted as a 64-bit interrupt gate with
unpredictable results. External interrupts can be disabled by using the CLI instruction.
Non-maskable interrupts (NMI) must be disabled using external hardware.
9.8.5.3 64-bit Mode and Compatibility Mode Operation
IA-32e mode uses two code segment-descriptor bits (CS.L and CS.D, see Figure 3-8)
to control the operating modes after IA-32e mode is initialized. If CS.L = 1 and
CS.D = 0, the processor is running in 64-bit mode. With this encoding, the default
operand size is 32 bits and default address size is 64 bits. Using instruction prefixes,
operand size can be changed to 64 bits or 16 bits; address size can be changed to
32 bits.
When IA-32e mode is active and CS.L = 0, the processor operates in compatibility
mode. In this mode, CS.D controls default operand and address sizes exactly as it
does in the IA-32 architecture. Setting CS.D = 1 specifies default operand and
address size as 32 bits. Clearing CS.D to 0 specifies default operand and address size
as 16 bits (the CS.L = 1, CS.D = 1 bit combination is reserved).
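The four CS.L/CS.D combinations described above form a small decision table. The sketch below encodes it in C for reference; the enumeration and function names are hypothetical, and the function applies only while IA-32e mode is active.

```c
/* Operating sub-mode selected by the code-segment descriptor bits while
 * IA-32e mode is active. */
typedef enum {
    MODE_64BIT,       /* CS.L = 1, CS.D = 0: 32-bit operands, 64-bit addresses */
    MODE_COMPAT_32,   /* CS.L = 0, CS.D = 1: 32-bit operands and addresses     */
    MODE_COMPAT_16,   /* CS.L = 0, CS.D = 0: 16-bit operands and addresses     */
    MODE_RESERVED     /* CS.L = 1, CS.D = 1: reserved combination              */
} ia32e_submode;

/* Decode the sub-mode from the L and D bits of the current code segment. */
ia32e_submode decode_cs_mode(int cs_l, int cs_d)
{
    if (cs_l) {
        return cs_d ? MODE_RESERVED : MODE_64BIT;
    }
    return cs_d ? MODE_COMPAT_32 : MODE_COMPAT_16;
}
```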
Compatibility mode execution is selected on a code-segment basis. This mode allows
legacy applications to coexist with 64-bit applications running in 64-bit mode. An
operating system running in IA-32e mode can execute existing 16-bit and 32-bit
applications by clearing their code-segment descriptor's CS.L bit to 0.
In compatibility mode, the following system-level mechanisms continue to operate
using the IA-32e-mode architectural semantics:
• Linear-to-physical address translation uses the 64-bit mode extended
  page-translation mechanism.
• Interrupts and exceptions are handled using the 64-bit mode mechanisms.
• System calls (calls through call gates and SYSENTER/SYSEXIT) are handled using
  the IA-32e mode mechanisms.
9.8.5.4 Switching Out of IA-32e Mode Operation
To return from IA-32e mode to paged-protected mode operation, operating systems
must use the following sequence:
1. Switch to compatibility mode.
2. Deactivate IA-32e mode by clearing CR0.PG to 0. This causes the processor to set
   IA32_EFER.LMA = 0. The MOV CR0 instruction used to disable paging and
   subsequent instructions must be located in an identity-mapped page.
3. Load CR3 with the physical base address of the legacy page-table directory.
4. Disable IA-32e mode by setting IA32_EFER.LME = 0.
5. Enable legacy paged-protected mode by setting CR0.PG = 1.
6. A branch instruction must follow the MOV CR0 that enables paging. Both the MOV
   CR0 and the branch instruction must be located in an identity-mapped page.
Registers only available in 64-bit mode (R8-R15 and XMM8-XMM15) are preserved
across transitions from 64-bit mode into compatibility mode then back into 64-bit
mode. However, values of R8-R15 and XMM8-XMM15 are undefined after transitions
from 64-bit mode through compatibility mode to legacy or real mode and then back
through compatibility mode to 64-bit mode.
9.9 MODE SWITCHING
To use the processor in protected mode after hardware or software reset, a mode
switch must be performed from real-address mode. Once in protected mode, software
generally does not need to return to real-address mode. To run software written
to run in real-address mode (8086 mode), it is generally more convenient to run the
software in virtual-8086 mode than to switch back to real-address mode.
9.9.1 Switching to Protected Mode
Before switching to protected mode from real mode, a minimum set of system data
structures and code modules must be loaded into memory, as described in Section
9.8, "Software Initialization for Protected-Mode Operation." Once these tables are
created, software initialization code can switch into protected mode.
Protected mode is entered by executing a MOV CR0 instruction that sets the PE flag
in the CR0 register. (In the same instruction, the PG flag in register CR0 can be set to
enable paging.) Execution in protected mode begins with a CPL of 0.
Intel 64 and IA-32 processors have slightly different requirements for switching to
protected mode. To ensure upwards and downwards code compatibility with Intel 64
and IA-32 processors, we recommend that you follow these steps:
1. Disable interrupts. A CLI instruction disables maskable hardware interrupts. NMI
   interrupts can be disabled with external circuitry. (Software must guarantee that
   no exceptions or interrupts are generated during the mode switching operation.)
2. Execute the LGDT instruction to load the GDTR register with the base address of
   the GDT.
3. Execute a MOV CR0 instruction that sets the PE flag (and optionally the PG flag)
   in control register CR0.
4. Immediately following the MOV CR0 instruction, execute a far JMP or far CALL
   instruction. (This operation is typically a far jump or call to the next instruction in
   the instruction stream.)
5. The JMP or CALL instruction immediately after the MOV CR0 instruction changes
   the flow of execution and serializes the processor.
6. If paging is enabled, the code for the MOV CR0 instruction and the JMP or CALL
   instruction must come from a page that is identity mapped (that is, the linear
   address before the jump is the same as the physical address after paging and
   protected mode are enabled). The target instruction for the JMP or CALL
   instruction does not need to be identity mapped.
7. If a local descriptor table is going to be used, execute the LLDT instruction to load
   the segment selector for the LDT in the LDTR register.
8. Execute the LTR instruction to load the task register with a segment selector to
   the initial protected-mode task or to a writable area of memory that can be used
   to store TSS information on a task switch.
9. After entering protected mode, the segment registers continue to hold the
   contents they had in real-address mode. The JMP or CALL instruction in step 4
   resets the CS register. Perform one of the following operations to update the
   contents of the remaining segment registers.
   • Reload segment registers DS, SS, ES, FS, and GS. If the ES, FS, and/or GS
     registers are not going to be used, load them with a null selector.
   • Perform a JMP or CALL instruction to a new task, which automatically resets
     the values of the segment registers and branches to a new code segment.
10. Execute the LIDT instruction to load the IDTR register with the address and limit
    of the protected-mode IDT.
11. Execute the STI instruction to enable maskable hardware interrupts and perform
    the necessary hardware operation to enable NMI interrupts.
Random failures can occur if other instructions exist between steps 3 and 4 above.
Failures will be readily seen in some situations, such as when instructions that
reference memory are inserted between steps 3 and 4 while in system management
mode.
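The far JMP in step 4 and the segment-register reloads in step 9 both take segment selectors as operands. A selector is a 16-bit value made up of a descriptor-table index (bits 15:3), a table indicator (bit 2; 0 selects the GDT, 1 the LDT), and a requested privilege level (bits 1:0). The sketch below computes selectors in C. The function name is hypothetical, and the GDT layout in the comments (code at index 1, data at index 2) is an assumption, not something the steps above prescribe.

```c
#include <stdint.h>

/* Build a segment selector from its three fields.
 * index: descriptor index within the GDT or LDT
 * ti:    table indicator, 0 = GDT, nonzero = LDT
 * rpl:   requested privilege level, 0 through 3 */
uint16_t make_selector(uint16_t index, int ti, unsigned rpl)
{
    return (uint16_t)(((unsigned)index << 3) | ((ti ? 1u : 0u) << 2) | (rpl & 3u));
}

/* With an assumed GDT layout of { null, code, data }:
 *   make_selector(1, 0, 0) is the CS selector used by the far JMP in step 4;
 *   make_selector(2, 0, 0) is loaded into DS and SS (and ES, FS, GS if used)
 *   in step 9;
 *   make_selector(0, 0, 0) is the null selector for unused ES, FS, or GS. */
```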
9.9.2 Switching Back to Real-Address Mode
The processor swit ches from prot ect ed mode back t o real- address mode if soft ware
clears t he PE bit in t he CR0 regist er wit h a MOV CR0 inst ruct ion. A procedure t hat re-
ent ers real- address mode should perform t he following st eps:
1. Disable int errupt s. A CLI inst ruct ion disables maskable hardware int errupt s. NMI
int errupt s can be disabled wit h ext ernal circuit ry.
2. I f paging is enabled, perform t he following operat ions:
Transfer program cont rol t o linear addresses t hat are ident it y mapped t o
physical addresses ( t hat is, linear addresses equal physical addresses) .
I nsure t hat t he GDT and I DT are in ident it y mapped pages.
Clear t he PG bit in t he CR0 regist er.
Move 0H int o t he CR3 regist er t o flush t he TLB.
3. Transfer program cont rol t o a readable segment t hat has a limit of 64 KByt es
( FFFFH) . This operat ion loads t he CS regist er wit h t he segment limit required in
real- address mode.
Vol. 3 9-19
PROCESSOR MANAGEMENT AND INITIALIZATION
4. Load segment registers SS, DS, ES, FS, and GS with a selector for a descriptor
containing the following values, which are appropriate for real-address mode:
   • Limit = 64 KBytes (0FFFFH)
   • Byte granular (G = 0)
   • Expand up (E = 0)
   • Writable (W = 1)
   • Present (P = 1)
   • Base = any value
The segment registers must be loaded with non-null segment selectors or the
segment registers will be unusable in real-address mode. Note that if the
segment registers are not reloaded, execution continues using the descriptor
attributes loaded during protected mode.
5. Execute an LIDT instruction to point to a real-address mode interrupt table that is
within the 1-MByte real-address mode address range.
6. Clear the PE flag in the CR0 register to switch to real-address mode.
7. Execute a far JMP instruction to jump to a real-address mode program. This
operation flushes the instruction queue and loads the appropriate base and
access rights values in the CS register.
8. Load the SS, DS, ES, FS, and GS registers as needed by the real-address mode
code. If any of the registers are not going to be used in real-address mode, write
0s to them.
9. Execute the STI instruction to enable maskable hardware interrupts and perform
the necessary hardware operation to enable NMI interrupts.
NOTE
All the code that is executed in steps 1 through 9 must be in a single
page and the linear addresses in that page must be identity mapped
to physical addresses.
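The descriptor values required before returning to real-address mode (limit 0FFFFH, G = 0, E = 0, W = 1, P = 1) can be checked by packing them into the two DWORDs of a segment descriptor. The following Python sketch is only an illustrative model of the descriptor bit layout; the function name `encode_descriptor` and its parameters are hypothetical and are not part of any tool described in this manual.

```python
def encode_descriptor(base, limit, *, writable=True, expand_down=False,
                      present=True, granular=False, big=False, dpl=0):
    """Pack a data-segment descriptor into its (low, high) DWORDs.

    Illustrative model only; parameter names are assumptions.
    """
    if not 0 <= limit <= 0xFFFFF:
        raise ValueError("limit is a 20-bit field")
    if not 0 <= base <= 0xFFFFFFFF:
        raise ValueError("base is a 32-bit field")
    # Access byte: P (bit 7), DPL (bits 6:5), S = 1 for code/data (bit 4),
    # then the data-segment type bits E (bit 2) and W (bit 1).
    access = ((int(present) << 7) | (dpl << 5) | (1 << 4)
              | (int(expand_down) << 2) | (int(writable) << 1))
    # Flags nibble: G (bit 3) and D/B (bit 2); the other two bits stay 0.
    flags = (int(granular) << 3) | (int(big) << 2)
    low = ((base & 0xFFFF) << 16) | (limit & 0xFFFF)
    high = ((base & 0xFF000000)        # base bits 31:24
            | (flags << 20)            # G, D/B
            | (limit & 0x000F0000)     # limit bits 19:16
            | (access << 8)            # access byte
            | ((base >> 16) & 0xFF))   # base bits 23:16
    return low, high


# Descriptor suitable for the return to real-address mode:
# 64-KByte limit, byte granular, expand up, writable, present.
real_mode_low, real_mode_high = encode_descriptor(0, 0xFFFF)
print(f"{real_mode_low:08X}H {real_mode_high:08X}H")
```

The base is passed as 0 here only for convenience; the steps above allow any base value.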
9.10 INITIALIZATION AND MODE SWITCHING EXAMPLE
This section provides an initialization and mode switching example that can be
incorporated into an application. This code was originally written to initialize the
Intel386 processor, but it will execute successfully on the Pentium 4, Intel Xeon,
P6 family, Pentium, and Intel486 processors. The code in this example is intended to
reside in EPROM and to run following a hardware reset of the processor. The function
of the code is to do the following:
• Establish a basic real-address mode operating environment.
• Load the necessary protected-mode system data structures into RAM.
• Load the system registers with the necessary pointers to the data structures and
  the appropriate flag settings for protected-mode operation.
• Switch the processor to protected mode.
Figure 9-3 shows the physical memory layout for the processor following a hardware
reset and the starting point of this example. The EPROM that contains the
initialization code resides at the upper end of the processor's physical memory
address range, starting at address FFFFFFFFH and going down from there. The address
of the first instruction to be executed is at FFFFFFF0H, the default starting address
for the processor following a hardware reset.
The main steps carried out in this example are summarized in Table 9-4. The source
listing for the example (with the filename STARTUP.ASM) is given in Example 9-1.
The line numbers given in Table 9-4 refer to the source listing.
The following are some additional notes concerning this example:
• When the processor is switched into protected mode, the original code segment
  base-address value of FFFF0000H (located in the hidden part of the CS register)
  is retained and execution continues from the current offset in the EIP register.
  The processor will thus continue to execute code in the EPROM until a far jump or
  call is made to a new code segment, at which time the base address in the CS
  register will be changed.
• Maskable hardware interrupts are disabled after a hardware reset and should
  remain disabled until the necessary interrupt handlers have been installed. The
  NMI interrupt is not disabled following a reset. The NMI# pin must thus be
  inhibited from being asserted until an NMI handler has been loaded and made
  available to the processor.
• The use of a temporary GDT allows simple transfer of tables from the EPROM to
  anywhere in the RAM area. A GDT entry is constructed with its base pointing to
  address 0 and a limit of 4 GBytes. When the DS and ES registers are loaded with
  this descriptor, the temporary GDT is no longer needed and can be replaced by
  the application GDT.
• This code loads one TSS and no LDTs. If more TSSs exist in the application, they
  must be loaded into RAM. If there are LDTs they may be loaded as well.
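To see why the temporary GDT entry covers the whole address space, it helps to decode the two DWORDs that the STARTUP.ASM listing stores for it (0000FFFFH and 00CF9200H, the LINEAR_PROTO_LO and LINEAR_PROTO_HI equates). The Python sketch below is an illustrative decoder only; the function name `decode_descriptor` and the dictionary keys are hypothetical.

```python
def decode_descriptor(low, high):
    """Decode the (low, high) DWORDs of a segment descriptor.

    Illustrative model only; names are assumptions, not manual terminology.
    """
    base = (low >> 16) | ((high & 0xFF) << 16) | (high & 0xFF000000)
    limit_field = (low & 0xFFFF) | (high & 0x000F0000)
    granular = bool(high & (1 << 23))      # G flag
    access = (high >> 8) & 0xFF            # P, DPL, S, type
    # With G = 1 the 20-bit limit counts 4-KByte units, so the last valid
    # offset is (limit << 12) + 0FFFH; with G = 0 it counts bytes.
    last_offset = ((limit_field << 12) | 0xFFF) if granular else limit_field
    return {
        "base": base,
        "limit_field": limit_field,
        "granular": granular,
        "present": bool(access & 0x80),
        "writable": bool(access & 0x02),
        "size_bytes": last_offset + 1,
    }


# The two DWORDs used for the temporary GDT entry in the listing.
temp = decode_descriptor(0x0000FFFF, 0x00CF9200)
print(temp["base"], hex(temp["size_bytes"]))
```

Decoding shows a base of 0 and a 20-bit limit of FFFFFH with page granularity, which corresponds to the 4-GByte segment described above.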
Figure 9-3. Processor State After Reset
[Figure: physical memory map from 0 to FFFF FFFFH after reset. The 64K EPROM
occupies FFFF 0000H through FFFF FFFFH. CS.BASE = FFFF 0000H and
EIP = 0000 FFF0H, so the first instruction is fetched from
[CS.BASE+EIP] = FFFF FFF0H. DS.BASE, ES.BASE, and SS.BASE are 0H and ESP = 0H,
so SP, DS, SS, and ES address the bottom of memory.]

Table 9-4. Main Initialization Steps in STARTUP.ASM Source Listing

STARTUP.ASM Line Numbers
From   To     Description
157    157    Jump (short) to the entry code in the EPROM
162    169    Construct a temporary GDT in RAM with one entry:
              0 - null
              1 - R/W data segment, base = 0, limit = 4 GBytes
171    172    Load the GDTR to point to the temporary GDT
174    177    Load CR0 with PE flag set to switch to protected mode
179    181    Jump near to clear real mode instruction queue
184    186    Load DS, ES registers with GDT[1] descriptor, so both point to the
              entire physical memory space
9.10.1 Assembler Usage
In this example, the Intel assembler ASM386 and build tools BLD386 are used to
assemble and build the initialization code module. The following assumptions are
used when using the Intel ASM386 and BLD386 tools.
• The ASM386 will generate the right operand size opcodes according to the
  code-segment attribute. The attribute is assigned either by the ASM386 invocation
  controls or in the code-segment definition.
• If a code segment that is going to run in real-address mode is defined, it must be
  set to a USE16 attribute. If a 32-bit operand is used in an instruction in this code
  segment (for example, MOV EAX, EBX), the assembler automatically generates
  an operand prefix for the instruction that forces the processor to execute a 32-bit
  operation, even though its default code-segment attribute is 16-bit.
• Intel's ASM386 assembler allows specific use of the 16- or 32-bit instructions, for
  example, LGDTW, LGDTD, IRETD. If the generic instruction LGDT is used, the
  default-segment attribute will be used to generate the right opcode.
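The effect of the operand prefix can be illustrated by encoding MOV EAX, EBX for a USE16 and a USE32 segment. The Python sketch below is a hypothetical helper, not ASM386 behavior captured from the tool: it uses the 89 /r form of MOV and the 66H operand-size prefix (the same byte the listing emits with DB 66H), and a real assembler may legitimately choose the equivalent 8B /r form instead.

```python
# Register numbers used in the ModR/M byte for the 32-bit general registers.
REG32 = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
         "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}

OPERAND_SIZE_PREFIX = 0x66


def encode_mov_r32_r32(dst, src, segment_default_bits):
    """Encode MOV dst, src between 32-bit registers.

    segment_default_bits is 16 for a USE16 segment or 32 for a USE32
    segment. Illustrative sketch; the function name is an assumption.
    """
    if segment_default_bits not in (16, 32):
        raise ValueError("segment default operand size must be 16 or 32")
    # ModR/M: mod = 11 (register direct), reg = source, r/m = destination.
    modrm = 0xC0 | (REG32[src] << 3) | REG32[dst]
    body = bytes([0x89, modrm])          # 89 /r: MOV r/m, r
    if segment_default_bits == 16:
        # In a 16-bit segment the prefix selects the 32-bit operand size.
        return bytes([OPERAND_SIZE_PREFIX]) + body
    return body


print(encode_mov_r32_r32("EAX", "EBX", 16).hex(" "))
print(encode_mov_r32_r32("EAX", "EBX", 32).hex(" "))
```

Without the prefix, the same two bytes executed in a 16-bit segment would operate on AX and BX rather than EAX and EBX.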
Table 9-4. Main Initialization Steps in STARTUP.ASM Source Listing (Contd.)

STARTUP.ASM Line Numbers
From   To     Description
188    195    Perform specific board initialization that is imposed by the new
              protected mode
196    218    Copy the application's GDT from ROM into RAM
220    238    Copy the application's IDT from ROM into RAM
241    243    Load application's GDTR
244    245    Load application's IDTR
247    261    Copy the application's TSS from ROM into RAM
263    267    Update TSS descriptor and other aliases in GDT (GDT alias or IDT
              alias)
277    277    Load the task register (without task switch) using LTR instruction
282    286    Load SS, ESP with the value found in the application's TSS
287    287    Push EFLAGS value found in the application's TSS
288    288    Push CS value found in the application's TSS
289    289    Push EIP value found in the application's TSS
290    293    Load DS, ES with the value found in the application's TSS
296    296    Perform IRET; pop the above values and enter the application code
9.10.2 STARTUP.ASM Listing
Example 9-1 provides high-level sample code designed to move the processor into
protected mode. This listing does not include any opcode and offset information.
Example 9-1. STARTUP.ASM
MS-DOS* 5.0(045-N) 386(TM) MACRO ASSEMBLER STARTUP 09:44:51 08/19/92
PAGE 1
MS-DOS 5.0(045-N) 386(TM) MACRO ASSEMBLER V4.0, ASSEMBLY OF MODULE
STARTUP
OBJECT MODULE PLACED IN startup.obj
ASSEMBLER INVOKED BY: f:\386tools\ASM386.EXE startup.a58 pw (132 )
LINE SOURCE
1 NAME STARTUP
2
3 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
4 ;
5 ; ASSUMPTIONS:
6 ;
7 ; 1. The bottom 64K of memory is ram, and can be used for
8 ; scratch space by this module.
9 ;
10 ; 2. The system has sufficient free usable ram to copy the
11 ; initial GDT, IDT, and TSS
12 ;
13 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
14
15 ; configuration data - must match with build definition
16
17 CS_BASE EQU 0FFFF0000H
18
19 ; CS_BASE is the linear address of the segment STARTUP_CODE
20 ; - this is specified in the build language file
21
22 RAM_START EQU 400H
23
24 ; RAM_START is the start of free, usable ram in the linear
25 ; memory space. The GDT, IDT, and initial TSS will be
26 ; copied above this space, and a small data segment will be
27 ; discarded at this linear address. The 32-bit word at
28 ; RAM_START will contain the linear address of the first
29 ; free byte above the copied tables - this may be useful if
30 ; a memory manager is used.
31
32 TSS_INDEX EQU 10
33
34 ; TSS_INDEX is the index of the TSS of the first task to
35 ; run after startup
36
37
38 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
39
40 ; ------------------------- STRUCTURES and EQU ---------------
41 ; structures for system data
42
43 ; TSS structure
44 TASK_STATE STRUC
45 link DW ?
46 link_h DW ?
47 ESP0 DD ?
48 SS0 DW ?
49 SS0_h DW ?
50 ESP1 DD ?
51 SS1 DW ?
52 SS1_h DW ?
53 ESP2 DD ?
54 SS2 DW ?
55 SS2_h DW ?
56 CR3_reg DD ?
57 EIP_reg DD ?
58 EFLAGS_reg DD ?
59 EAX_reg DD ?
60 ECX_reg DD ?
61 EDX_reg DD ?
62 EBX_reg DD ?
63 ESP_reg DD ?
64 EBP_reg DD ?
65 ESI_reg DD ?
66 EDI_reg DD ?
67 ES_reg DW ?
68 ES_h DW ?
69 CS_reg DW ?
70 CS_h DW ?
71 SS_reg DW ?
72 SS_h DW ?
73 DS_reg DW ?
74 DS_h DW ?
75 FS_reg DW ?
76 FS_h DW ?
77 GS_reg DW ?
78 GS_h DW ?
79 LDT_reg DW ?
80 LDT_h DW ?
81 TRAP_reg DW ?
82 IO_map_base DW ?
83 TASK_STATE ENDS
84
85 ; basic structure of a descriptor
86 DESC STRUC
87 lim_0_15 DW ?
88 bas_0_15 DW ?
89 bas_16_23 DB ?
90 access DB ?
91 gran DB ?
92 bas_24_31 DB ?
93 DESC ENDS
94
95 ; structure for use with LGDT and LIDT instructions
96 TABLE_REG STRUC
97 table_lim DW ?
98 table_linear DD ?
99 TABLE_REG ENDS
100
101 ; offset of GDT and IDT descriptors in builder generated GDT
102 GDT_DESC_OFF EQU 1*SIZE(DESC)
103 IDT_DESC_OFF EQU 2*SIZE(DESC)
104
105 ; equates for building temporary GDT in RAM
106 LINEAR_SEL EQU 1*SIZE (DESC)
107 LINEAR_PROTO_LO EQU 00000FFFFH ; LINEAR_ALIAS
108 LINEAR_PROTO_HI EQU 000CF9200H
109
110 ; Protection Enable Bit in CR0
111 PE_BIT EQU 1B
112
113 ; ------------------------------------------------------------
114
115 ; ------------------------- DATA SEGMENT----------------------
116
117 ; Initially, this data segment starts at linear 0, according
118 ; to the processor's power-up state.
119
120 STARTUP_DATA SEGMENT RW
121
122 free_mem_linear_base LABEL DWORD
123 TEMP_GDT LABEL BYTE ; must be first in segment
124 TEMP_GDT_NULL_DESC DESC <>
125 TEMP_GDT_LINEAR_DESC DESC <>
126
127 ; scratch areas for LGDT and LIDT instructions
128 TEMP_GDT_SCRATCH TABLE_REG <>
129 APP_GDT_RAM TABLE_REG <>
130 APP_IDT_RAM TABLE_REG <>
131 ; align end_data
132 fill DW ?
133
134 ; last thing in this segment - should be on a dword boundary
135 end_data LABEL BYTE
136
137 STARTUP_DATA ENDS
138 ; ------------------------------------------------------------
139
140
141 ; ------------------------- CODE SEGMENT----------------------
142 STARTUP_CODE SEGMENT ER PUBLIC USE16
143
144 ; filled in by builder
145 PUBLIC GDT_EPROM
146 GDT_EPROM TABLE_REG <>
147
148 ; filled in by builder
149 PUBLIC IDT_EPROM
150 IDT_EPROM TABLE_REG <>
151
152 ; entry point into startup code - the bootstrap will vector
153 ; here with a near JMP generated by the builder. This
154 ; label must be in the top 64K of linear memory.
155
156 PUBLIC STARTUP
157 STARTUP:
158
159 ; DS,ES address the bottom 64K of flat linear memory
160 ASSUME DS:STARTUP_DATA, ES:STARTUP_DATA
161 ; See Figure 9-4
162 ; load GDTR with temporary GDT
163 LEA EBX,TEMP_GDT ; build the TEMP_GDT in low ram,
164 MOV DWORD PTR [EBX],0 ; where we can address
165 MOV DWORD PTR [EBX]+4,0
166 MOV DWORD PTR [EBX]+8, LINEAR_PROTO_LO
167 MOV DWORD PTR [EBX]+12, LINEAR_PROTO_HI
168 MOV TEMP_GDT_scratch.table_linear,EBX
169 MOV TEMP_GDT_scratch.table_lim,15
170
171 DB 66H; execute a 32 bit LGDT
172 LGDT TEMP_GDT_scratch
173
174 ; enter protected mode
175 MOV EBX,CR0
176 OR EBX,PE_BIT
177 MOV CR0,EBX
178
179 ; clear prefetch queue
180 JMP CLEAR_LABEL
181 CLEAR_LABEL:
182
183 ; make DS and ES address 4G of linear memory
184 MOV CX,LINEAR_SEL
185 MOV DS,CX
186 MOV ES,CX
187
188 ; do board specific initialization
189 ;
190 ;
191 ; ......
192 ;
193
194
195 ; See Figure 9-5
196 ; copy EPROM GDT to ram at:
197 ; RAM_START + size (STARTUP_DATA)
198 MOV EAX,RAM_START
199 ADD EAX,OFFSET (end_data)
200 MOV EBX,RAM_START
201 MOV ECX, CS_BASE
202 ADD ECX, OFFSET (GDT_EPROM)
203 MOV ESI, [ECX].table_linear
204 MOV EDI,EAX
205 MOVZX ECX, [ECX].table_lim
206 MOV APP_GDT_ram[EBX].table_lim,CX
207 INC ECX
208 MOV EDX,EAX
209 MOV APP_GDT_ram[EBX].table_linear,EAX
210 ADD EAX,ECX
211 REP MOVS BYTE PTR ES:[EDI],BYTE PTR DS:[ESI]
212
213 ; fixup GDT base in descriptor
214 MOV ECX,EDX
215 MOV [EDX].bas_0_15+GDT_DESC_OFF,CX
216 ROR ECX,16
217 MOV [EDX].bas_16_23+GDT_DESC_OFF,CL
218 MOV [EDX].bas_24_31+GDT_DESC_OFF,CH
219
220 ; copy EPROM IDT to ram at:
221 ; RAM_START+size(STARTUP_DATA)+SIZE (EPROM GDT)
222 MOV ECX, CS_BASE
223 ADD ECX, OFFSET (IDT_EPROM)
224 MOV ESI, [ECX].table_linear
225 MOV EDI,EAX
226 MOVZX ECX, [ECX].table_lim
227 MOV APP_IDT_ram[EBX].table_lim,CX
228 INC ECX
229 MOV APP_IDT_ram[EBX].table_linear,EAX
230 MOV EBX,EAX
231 ADD EAX,ECX
232 REP MOVS BYTE PTR ES:[EDI],BYTE PTR DS:[ESI]
233
234 ; fixup IDT pointer in GDT
235 MOV [EDX].bas_0_15+IDT_DESC_OFF,BX
236 ROR EBX,16
237 MOV [EDX].bas_16_23+IDT_DESC_OFF,BL
238 MOV [EDX].bas_24_31+IDT_DESC_OFF,BH
239
240 ; load GDTR and IDTR
241 MOV EBX,RAM_START
242 DB 66H ; execute a 32 bit LGDT
243 LGDT APP_GDT_ram[EBX]
244 DB 66H ; execute a 32 bit LIDT
245 LIDT APP_IDT_ram[EBX]
246
247 ; move the TSS
248 MOV EDI,EAX
249 MOV EBX,TSS_INDEX*SIZE(DESC)
250 MOV ECX,GDT_DESC_OFF ;build linear address for TSS
251 MOV GS,CX
252 MOV DH,GS:[EBX].bas_24_31
253 MOV DL,GS:[EBX].bas_16_23
254 ROL EDX,16
255 MOV DX,GS:[EBX].bas_0_15
256 MOV ESI,EDX
257 LSL ECX,EBX
258 INC ECX
259 MOV EDX,EAX
260 ADD EAX,ECX
261 REP MOVS BYTE PTR ES:[EDI],BYTE PTR DS:[ESI]
262
263 ; fixup TSS pointer
264 MOV GS:[EBX].bas_0_15,DX
265 ROL EDX,16
266 MOV GS:[EBX].bas_24_31,DH
267 MOV GS:[EBX].bas_16_23,DL
268 ROL EDX,16
269 ;save start of free ram at linear location RAMSTART
270 MOV free_mem_linear_base+RAM_START,EAX
271
272 ;assume no LDT used in the initial task - if necessary,
273 ;code to move the LDT could be added, and should resemble
274 ;that used to move the TSS
275
276 ; load task register
277 LTR BX ; No task switch, only descriptor loading
278 ; See Figure 9-6
279 ; load minimal set of registers necessary to simulate task
280 ; switch
281
282
283 MOV AX,[EDX].SS_reg ; start loading registers
284 MOV EDI,[EDX].ESP_reg
285 MOV SS,AX
286 MOV ESP,EDI ; stack now valid
287 PUSH DWORD PTR [EDX].EFLAGS_reg
288 PUSH DWORD PTR [EDX].CS_reg
289 PUSH DWORD PTR [EDX].EIP_reg
290 MOV AX,[EDX].DS_reg
291 MOV BX,[EDX].ES_reg
292 MOV DS,AX ; DS and ES no longer linear memory
293 MOV ES,BX
294
295 ; simulate far jump to initial task
296 IRETD
297
298 STARTUP_CODE ENDS
*** WARNING #377 IN 298, (PASS 2) SEGMENT CONTAINS PRIVILEGED
INSTRUCTION(S)
299
300 END STARTUP, DS:STARTUP_DATA, SS:STARTUP_DATA
301
302
ASSEMBLY COMPLETE, 1 WARNING, NO ERRORS.
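Figure 9-4. Constructing Temporary GDT and Switching to Protected Mode (Lines
162-172 of List File)
[Figure: memory map from 0 to FFFF FFFFH. The code at START: [CS.BASE+EIP], in the
EPROM above FFFF 0000H, performs these steps: jump near start, construct
TEMP_GDT, LGDT, move to protected mode. TEMP_GDT (GDT[0] and GDT[1], the latter
with Base=0, Limit=4G) and GDT_SCRATCH (base and limit) reside in low RAM.
DS and ES = GDT[1] address 4 GB.]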
Figure 9-5. Moving the GDT, IDT, and TSS from ROM to RAM (Lines 196-261 of List
File)
[Figure: memory map from 0 to FFFF FFFFH. The GDT, IDT, and TSS are moved from
ROM to RAM above RAM_START (GDT RAM, IDT RAM, TSS RAM). Steps shown: move the
GDT, IDT, TSS from ROM to RAM; fix aliases; LTR.]
9.10.3 MAIN.ASM Source Code
The file MAIN.ASM shown in Example 9-2 defines the data and stack segments for
this application and can be substituted with the main module task written in a
high-level language that is invoked by the IRET instruction executed by STARTUP.ASM.
Example 9-2. MAIN.ASM
NAME main_module
data SEGMENT RW
    dw 1000 dup(?)
DATA ENDS
stack stackseg 800
CODE SEGMENT ER use32 PUBLIC
main_start:
    nop
    nop
    nop
CODE ENDS
END main_start, ds:data, ss:stack

Figure 9-6. Task Switching (Lines 282-296 of List File)
[Figure: memory map starting at 0 with GDT RAM, IDT RAM, and TSS RAM above
RAM_START; the GDT holds the GDT alias and IDT alias. The TSS supplies SS, ESP,
EFLAGS, CS, EIP, DS, and ES. Sequence shown: SS = TSS.SS; ESP = TSS.ESP;
PUSH TSS.EFLAG; PUSH TSS.CS; PUSH TSS.EIP; ES = TSS.ES; DS = TSS.DS; IRET.]
9.10.4 Supporting Files
The batch file shown in Example 9-3 can be used to assemble the source code files
STARTUP.ASM and MAIN.ASM and build the final application.
Example 9-3. Batch File to Assemble and Build the Application
ASM386 STARTUP.ASM
ASM386 MAIN.ASM
BLD386 STARTUP.OBJ, MAIN.OBJ buildfile(EPROM.BLD) bootstrap(STARTUP) Bootload
BLD386 performs several operations in this example:
• It allocates physical memory location to segments and tables.
• It generates tables using the build file and the input files.
• It links object files and resolves references.
• It generates a boot-loadable file to be programmed into the EPROM.
Example 9-4 shows the build file used as an input to BLD386 to perform the above
functions.
Example 9-4. Build File
INIT_BLD_EXAMPLE;
SEGMENT
*SEGMENTS(DPL = 0)
, startup.startup_code(BASE = 0FFFF0000H)
;
TASK
BOOT_TASK(OBJECT = startup, INITIAL,DPL = 0,
NOT INTENABLED)
, PROTECTED_MODE_TASK(OBJECT = main_module,DPL = 0,
NOT INTENABLED)
;
TABLE
GDT (
LOCATION = GDT_EPROM
, ENTRY = (
10: PROTECTED_MODE_TASK
, startup.startup_code
, startup.startup_data
, main_module.data
, main_module.code
, main_module.stack
)
),
IDT (
LOCATION = IDT_EPROM
);
MEMORY
(
RESERVE = (0..3FFFH
-- Area for the GDT, IDT, TSS copied from ROM
, 60000H..0FFFEFFFFH)
, RANGE = (ROM_AREA = ROM (0FFFF0000H..0FFFFFFFFH))
-- Eprom size 64K
, RANGE = (RAM_AREA = RAM (4000H..05FFFFH))
);
END
Table 9-5 shows the relationship of each build item with an ASM source file.

Table 9-5. Relationship Between BLD Item and ASM Source File

Item              ASM386 and                BLD386 Controls                Effect
                  Startup.A58               and BLD file
Bootstrap         public startup            bootstrap                      Near jump at 0FFFFFFF0H
                  startup:                  start(startup)                 to start.
GDT location      public GDT_EPROM          TABLE                          The location of the GDT
                  GDT_EPROM TABLE_REG <>    GDT(location = GDT_EPROM)      will be programmed into
                                                                           the GDT_EPROM location.
IDT location      public IDT_EPROM          TABLE                          The location of the IDT
                  IDT_EPROM TABLE_REG <>    IDT(location = IDT_EPROM)      will be programmed into
                                                                           the IDT_EPROM location.
RAM start         RAM_START equ 400H        memory (reserve = (0..3FFFH))  RAM_START is used as
                                                                           the ram destination for
                                                                           moving the tables. It must
                                                                           be excluded from the
                                                                           application's segment
                                                                           area.
Location of the   TSS_INDEX EQU 10          TABLE GDT(                     Put the descriptor of the
application TSS                             ENTRY = (10:                   application TSS in GDT
in the GDT                                  PROTECTED_MODE_                entry 10.
                                            TASK))
EPROM size        size and location of the  SEGMENT startup.code (base =   Initialization code size
and location      initialization code       0FFFF0000H) ...memory          must be less than 64K
                                            (RANGE(                        and resides at upper most
                                            ROM_AREA = ROM(x..y))          64K of the 4-GByte
                                                                           memory space.

9.11 MICROCODE UPDATE FACILITIES
The Pentium 4, Intel Xeon, and P6 family processors have the capability to correct
errata by loading an Intel-supplied data block into the processor. The data block is
called a microcode update. This section describes the mechanisms the BIOS needs to
provide in order to use this feature during system initialization. It also describes a
specification that permits the incorporation of future updates into a system BIOS.
Intel considers the release of a microcode update for a silicon revision to be the
equivalent of a processor stepping and completes a full-stepping level validation for
releases of microcode updates.
A microcode update is used to correct errata in the processor. The BIOS, which has
an update loader, is responsible for loading the update on processors during system
initialization (Figure 9-7). There are two steps to this process: the first is to
incorporate the necessary update data blocks into the BIOS; the second is to load
update data blocks into the processor.
9.11.1 Microcode Update
A microcode update consists of an Intel-supplied binary that contains a descriptive
header and data. No executable code resides within the update. Each microcode
update is tailored for a specific list of processor signatures. A mismatch of the
processor's signature with the signature contained in the update will result in a
failure to load. A processor signature includes the extended family, extended model,
type, family, model, and stepping of the processor (starting with processor family
0FH, model 03H, a given microcode update may be associated with one of multiple
processor signatures; see Section 9.11.2 for detail).
Microcode updates are composed of a multi-byte header, followed by encrypted data
and then by an optional extended signature table. Table 9-6 provides a definition of
the fields; Table 9-7 shows the format of an update.
The header is 48 bytes. The first 4 bytes of the header contain the header version.
The update header and its reserved fields are interpreted by software based upon the
header version. An encoding scheme guards against tampering and provides a
means for determining the authenticity of any given update. For microcode updates
with a data size field equal to 00000000H, the size of the microcode update is 2048
bytes. The first 48 bytes contain the microcode update header. The remaining 2000
bytes contain encrypted data.
For microcode updates with a data size not equal to 00000000H, the total size field
specifies the size of the microcode update. The first 48 bytes contain the microcode
update header. The second part of the microcode update is the encrypted data. The
data size field of the microcode update header specifies the encrypted data size; its
value must be a multiple of the DWORD size (4 bytes). The total size field of the
microcode update header specifies the encrypted data size plus the header size, plus
the size of the optional extended signature table when one is present; its value must
be a multiple of 1024 bytes (1 KByte). The optional extended signature table, if
implemented, follows the encrypted data, and its size is calculated by
(Total Size - (Data Size + 48)).
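The size rules above can be collected into one small calculation. The Python sketch below is illustrative only; the function name `update_layout` and the returned dictionary keys are hypothetical, and the checks simply restate the rules given in the text (48-byte header, 2000-byte default data size, DWORD-multiple data size, 1024-byte-multiple total size).

```python
HEADER_SIZE = 48            # bytes, fixed by the update format
DEFAULT_DATA_SIZE = 2000    # used when the data size field is 00000000H
DEFAULT_TOTAL_SIZE = 2048


def update_layout(data_size_field, total_size_field):
    """Return the effective sizes of a microcode update, in bytes.

    Illustrative sketch; function and key names are assumptions.
    """
    if data_size_field == 0:
        # Legacy layout: 48-byte header followed by 2000 bytes of data.
        return {"data_size": DEFAULT_DATA_SIZE,
                "total_size": DEFAULT_TOTAL_SIZE,
                "extended_table_size": 0}
    if data_size_field % 4 != 0:
        raise ValueError("data size must be a multiple of the DWORD size")
    if total_size_field % 1024 != 0:
        raise ValueError("total size must be a multiple of 1024 bytes")
    extended = total_size_field - (data_size_field + HEADER_SIZE)
    if extended < 0:
        raise ValueError("total size is smaller than header plus data")
    return {"data_size": data_size_field,
            "total_size": total_size_field,
            "extended_table_size": extended}


print(update_layout(0, 0))
print(update_layout(4016, 4096))
```

In the second call, 4096 - (4016 + 48) leaves 32 bytes, enough for a 20-byte extended signature header and one 12-byte signature structure.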
Figure 9-7. Applying Microcode Updates
[Figure: new update blocks are incorporated into the update blocks stored in the
BIOS; the BIOS update loader loads an update into the CPU.]
NOTE
The optional extended signature table is supported starting with
processor family 0FH, model 03H.
Table 9-6. Microcode Update Field Definitions

Field Name            Offset            Length   Description
                      (bytes)           (bytes)
Header Version        0                 4        Version number of the update header.
Update Revision       4                 4        Unique version number for the update, the basis for the
                                                 update signature provided by the processor to indicate
                                                 the current update functioning within the processor.
                                                 Used by the BIOS to authenticate the update and verify
                                                 that the processor loads successfully. The value in this
                                                 field cannot be used for processor stepping identification
                                                 alone. This is a signed 32-bit number.
Date                  8                 4        Date of the update creation in binary format: mmddyyyy
                                                 (e.g. 07/18/98 is 07181998H).
Processor             12                4        Extended family, extended model, type, family, model,
Signature                                        and stepping of processor that requires this particular
                                                 update revision (e.g., 00000650H). Each microcode
                                                 update is designed specifically for a given extended
                                                 family, extended model, type, family, model, and stepping
                                                 of the processor.
                                                 The BIOS uses the processor signature field in
                                                 conjunction with the CPUID instruction to determine
                                                 whether or not an update is appropriate to load on a
                                                 processor. The information encoded within this field
                                                 exactly corresponds to the bit representations returned
                                                 by the CPUID instruction.
Checksum              16                4        Checksum of Update Data and Header. Used to verify the
                                                 integrity of the update header and data. Checksum is
                                                 correct when the summation of all the DWORDs (including
                                                 the extended Processor Signature Table) that comprise
                                                 the microcode update result in 00000000H.
Loader Revision       20                4        Version number of the loader program needed to
                                                 correctly load this update. The initial version is
                                                 00000001H.
Processor Flags       24                4        Platform type information is encoded in the lower 8 bits
                                                 of this 4-byte field. Each bit represents a particular
                                                 platform type for a given CPUID. The BIOS uses the
                                                 processor flags field in conjunction with the platform Id
                                                 bits in MSR (17H) to determine whether or not an update
                                                 is appropriate to load on a processor. Multiple bits may be
                                                 set representing support for multiple platform IDs.
Data Size             28                4        Specifies the size of the encrypted data in bytes, and
                                                 must be a multiple of DWORDs. If this value is
                                                 00000000H, then the microcode update encrypted data
                                                 is 2000 bytes (or 500 DWORDs).
Total Size            32                4        Specifies the total size of the microcode update in bytes.
                                                 It is the summation of the header size, the encrypted
                                                 data size and the size of the optional extended signature
                                                 table. This value is always a multiple of 1024.
Reserved              36                12       Reserved fields for future expansion
Update Data           48                Data     Update data
                                        Size or
                                        2000
Extended Signature    Data Size +       4        Specifies the number of extended signature structures
Count                 48                         (Processor Signature[n], processor flags[n] and
                                                 checksum[n]) that exist in this microcode update.
Extended              Data Size +       4        Checksum of update extended processor signature table.
Checksum              52                         Used to verify the integrity of the extended processor
                                                 signature table. Checksum is correct when the
                                                 summation of the DWORDs that comprise the extended
                                                 processor signature table results in 00000000H.
Reserved              Data Size +       12       Reserved fields
                      56
Processor             Data Size +       4        Extended family, extended model, type, family, model,
Signature[n]          68 + (n * 12)              and stepping of processor that requires this particular
                                                 update revision (e.g., 00000650H). Each microcode
                                                 update is designed specifically for a given extended
                                                 family, extended model, type, family, model, and stepping
                                                 of the processor.
                                                 The BIOS uses the processor signature field in
                                                 conjunction with the CPUID instruction to determine
                                                 whether or not an update is appropriate to load on a
                                                 processor. The information encoded within this field
                                                 exactly corresponds to the bit representations returned
                                                 by the CPUID instruction.
Processor Flags[n]    Data Size +       4        Platform type information is encoded in the lower 8 bits
                      72 + (n * 12)              of this 4-byte field. Each bit represents a particular
                                                 platform type for a given CPUID. The BIOS uses the
                                                 processor flags field in conjunction with the platform Id
                                                 bits in MSR (17H) to determine whether or not an update
                                                 is appropriate to load on a processor. Multiple bits may be
                                                 set representing support for multiple platform IDs.
Checksum[n]           Data Size +       4        Used by utility software to decompose a microcode
                      76 + (n * 12)              update into multiple microcode updates where each of
                                                 the new updates is constructed without the optional
                                                 Extended Processor Signature Table.
                                                 To calculate the Checksum, substitute the Primary
                                                 Processor Signature entry and the Processor Flags entry
                                                 with the corresponding Extended Patch entry. Delete the
                                                 Extended Processor Signature Table entries. The
                                                 Checksum is correct when the summation of all DWORDs
                                                 that comprise the created Extended Processor Patch
                                                 results in 00000000H.
Table 9-7. Microcode Update Format

Field (bit positions 31 24 16 8 0)                                        Bytes
Header Version                                                            0
Update Revision                                                           4
Month: 8 | Day: 8 | Year: 16                                              8
Processor Signature (CPUID):                                              12
  Res: 4 | Extended Family: 8 | Extended Model: 4 | Reserved: 2 |
  Type: 2 | Family: 4 | Model: 4 | Stepping: 4
Checksum                                                                  16
Loader Revision                                                           20
Processor Flags:                                                          24
  Reserved (24 bits) | P7 | P6 | P5 | P4 | P3 | P2 | P1 | P0
Data Size                                                                 28
Total Size                                                                32
Reserved (12 Bytes)                                                       36
Update Data (Data Size bytes, or 2000 Bytes if Data Size = 00000000H)     48
Extended Signature Count n                                                Data Size + 48
Extended Processor Signature Table Checksum                               Data Size + 52
Reserved (12 Bytes)                                                       Data Size + 56
Processor Signature[n]                                                    Data Size + 68 + (n * 12)
Processor Flags[n]                                                        Data Size + 72 + (n * 12)
Checksum[n]                                                               Data Size + 76 + (n * 12)
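The fixed 48-byte header described in Tables 9-6 and 9-7 maps directly onto a packed little-endian structure. The following Python sketch shows one way a host-side utility might read it; it is not a BIOS loader, and the names `UpdateHeader`, `parse_header`, and `decode_date` are hypothetical. It assumes the update image is available as a byte string.

```python
import struct
from collections import namedtuple

# Nine 32-bit fields followed by 12 reserved bytes, little-endian.
# Update Revision is unpacked as signed, per Table 9-6.
HEADER = struct.Struct("<IiIIIIIII12s")

UpdateHeader = namedtuple("UpdateHeader", [
    "header_version", "update_revision", "date", "processor_signature",
    "checksum", "loader_revision", "processor_flags",
    "data_size", "total_size", "reserved",
])


def parse_header(blob):
    """Unpack the 48-byte microcode update header from a byte string."""
    if len(blob) < HEADER.size:
        raise ValueError("update image is shorter than the 48-byte header")
    return UpdateHeader._make(HEADER.unpack_from(blob, 0))


def decode_date(date_field):
    """Split the mmddyyyy date field into (year, month, day).

    Each hexadecimal digit of the field is read as a decimal digit,
    so 07181998H becomes (1998, 7, 18).
    """
    text = f"{date_field:08X}"
    return int(text[4:]), int(text[:2]), int(text[2:4])


# Build a header with made-up field values purely to exercise the parser.
sample = HEADER.pack(1, 0x12, 0x07181998, 0x00000650, 0, 1, 0x01, 0, 0, bytes(12))
header = parse_header(sample)
print(hex(header.processor_signature), decode_date(header.date))
```

The sample values are placeholders, not data from a real update; only the field order and widths come from the tables.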
9.11.2 Optional Extended Signature Table
The extended signature table is a structure that may be appended to the end of the
encrypted data when the encrypted data only supports a single processor signature
(optional case). The extended signature table will always be present when the
encrypted data supports multiple processor steppings and/or models (required
case).
The extended signature table consists of a 20-byte extended signature header
structure, which contains the extended signature count, the extended processor
signature table checksum, and 12 reserved bytes (Table 9-8). Following the extended
signature header structure, the extended signature table contains 0-to-n extended
processor signature structures.
Each processor signature structure consists of the processor signature, processor
flags, and a checksum (Table 9-9).
The extended signature count in the extended signature header structure indicates
the number of processor signature structures that exist in the extended signature
table.
The extended processor signature table checksum is a checksum of all DWORDs that
comprise the extended signature table. That includes the extended signature count,
extended processor signature table checksum, 12 reserved bytes and the n
processor signature structures. A valid extended signature table exists when the
result of a DWORD checksum is 00000000H.
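The DWORD checksum rule and the layout of the extended signature table (a 20-byte header followed by 12-byte signature structures) can be sketched as follows. This Python example is an illustrative host-side model; the names `dword_sum` and `parse_extended_table` are hypothetical, and it assumes the table bytes have already been located at offset Data Size + 48 within the update.

```python
import struct

EXT_HEADER_SIZE = 20   # count (4) + checksum (4) + reserved (12)
EXT_ENTRY_SIZE = 12    # signature (4) + flags (4) + checksum (4)


def dword_sum(blob):
    """Sum a byte string as little-endian DWORDs, modulo 2**32."""
    if len(blob) % 4 != 0:
        raise ValueError("length must be a multiple of 4 bytes")
    words = struct.unpack(f"<{len(blob) // 4}I", blob)
    return sum(words) & 0xFFFFFFFF


def parse_extended_table(table):
    """Validate an extended signature table and return its entries.

    Each entry is a (processor_signature, processor_flags, checksum) tuple.
    """
    if len(table) < EXT_HEADER_SIZE:
        raise ValueError("table is shorter than its 20-byte header")
    count, _checksum = struct.unpack_from("<II", table, 0)
    needed = EXT_HEADER_SIZE + EXT_ENTRY_SIZE * count
    if len(table) < needed:
        raise ValueError("table is too short for its signature count")
    # The checksum covers the header and the n signature structures.
    if dword_sum(table[:needed]) != 0:
        raise ValueError("extended signature table checksum mismatch")
    return [struct.unpack_from("<III", table, EXT_HEADER_SIZE + EXT_ENTRY_SIZE * n)
            for n in range(count)]


# Build a two-entry table with made-up values and a matching checksum.
entries = struct.pack("<III", 0x650, 0x01, 0) + struct.pack("<III", 0x651, 0x02, 0)
unsigned = struct.pack("<II12s", 2, 0, bytes(12)) + entries
fixup = (-dword_sum(unsigned)) & 0xFFFFFFFF
table = struct.pack("<II12s", 2, fixup, bytes(12)) + entries
print(parse_extended_table(table))
```

The same `dword_sum` helper applies to the primary checksum in the header, which covers every DWORD of the update including the extended signature table.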
9.11.3 Processor Identification
Each microcode update is designed for a specific processor or set of processors. To
determine the correct microcode update to load, software must ensure that one of
the processor signatures embedded in the microcode update matches the 32-bit
processor signature returned by the CPUID instruction when executed by the target
processor with EAX = 1. If none of the processor signatures embedded in a microcode
update matches the processor signature returned by CPUID, the BIOS rejects the
update.

Table 9-8. Extended Processor Signature Table Header Structure
Field                                          Offset (bytes)
Extended Signature Count n                     Data Size + 48
Extended Processor Signature Table Checksum    Data Size + 52
Reserved (12 Bytes)                            Data Size + 56

Table 9-9. Processor Signature Structure
Field                                          Offset (bytes)
Processor Signature[n]                         Data Size + 68 + (n * 12)
Processor Flags[n]                             Data Size + 72 + (n * 12)
Checksum[n]                                    Data Size + 76 + (n * 12)

Example 9-5 shows how to check for a valid processor signature match between the
processor and microcode update.
Example 9-5. Pseudo Code to Validate the Processor Signature
ProcessorSignature ← CPUID(1):EAX
If (Update.HeaderVersion == 00000001h)
{
    // first check the ProcessorSignature field
    If (ProcessorSignature == Update.ProcessorSignature)
        Success
    // if extended signature is present
    Else If (Update.TotalSize > (Update.DataSize + 48))
    {
        //
        // Assume the Data Size has been used to calculate the
        // location of Update.ProcessorSignature[0].
        //
        For (N ← 0; ((N < Update.ExtendedSignatureCount) AND
            (ProcessorSignature != Update.ProcessorSignature[N])); N++);
        // if the loop ended when the iteration count is
        // less than the number of processor signatures in
        // the table, we have a match
        If (N < Update.ExtendedSignatureCount)
            Success
        Else
            Fail
    }
    Else
        Fail
}
Else
    Fail
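The comparison in Example 9-5 can also be expressed as a small function. The following Python sketch is illustrative only; it assumes the caller has already extracted the header version, the header's processor signature, and any extended signatures from the update, and the function name is hypothetical.

```python
def signature_matches(cpu_signature, header_version, header_signature,
                      extended_signatures=()):
    """Return True if the CPUID(1):EAX value matches the header signature
    or any signature from the extended signature table."""
    if header_version != 0x00000001:
        return False              # unknown header layout: do not guess
    if cpu_signature == header_signature:
        return True               # first check the header field
    # Otherwise fall back to the extended signatures, if any were supplied.
    return any(cpu_signature == s for s in extended_signatures)
```

Passing an empty sequence for extended_signatures models an update whose total size does not exceed the data size plus 48, that is, an update without an extended signature table.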
9.11.4 Platform Identification
In addition to verifying the processor signature, the intended processor platform type
must be determined to properly target the microcode update. The intended
processor platform type is determined by reading the IA32_PLATFORM_ID register,
(MSR 17H). This 64-bit register must be read using the RDMSR instruction.
The three platform ID bits, when read as a binary coded decimal (BCD) number,
indicate the bit position in the microcode update header's processor flags field
associated with the installed processor. The processor flags in the 48-byte header and
the processor flags field associated with the extended processor signature structures
may have multiple bits set. Each set bit represents a different platform ID that the
update supports.
Register Name: IA32_PLATFORM_ID
MSR Address: 017H
Access: Read Only
IA32_PLATFORM_ID is a 64-bit register accessed only when referenced as a Qword through a
RDMSR instruction.
Table 9-10. Processor Flags
Bit      Description
63:53    Reserved
52:50    Platform Id Bits (RO). The field gives information concerning the intended
         platform for the processor. See also Table 9-7.
         52  51  50
          0   0   0    Processor Flag 0
          0   0   1    Processor Flag 1
          0   1   0    Processor Flag 2
          0   1   1    Processor Flag 3
          1   0   0    Processor Flag 4
          1   0   1    Processor Flag 5
          1   1   0    Processor Flag 6
          1   1   1    Processor Flag 7
49:0     Reserved
To validate the platform information, software may implement an algorithm similar to
the algorithms in Example 9-6.
Example 9-6. Pseudo Code Example of Processor Flags Test
Flag ← 1 << IA32_PLATFORM_ID[52:50]
If (Update.HeaderVersion == 00000001h)
{
    If (Update.ProcessorFlags & Flag)
    {
        Load Update
    }
    Else
    {
        //
        // Assume the Data Size has been used to calculate the
        // location of Update.ProcessorSignature[N] and a match
        // on Update.ProcessorSignature[N] has already succeeded
        //
        If (Update.ProcessorFlags[N] & Flag)
        {
            Load Update
        }
    }
}
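The translation from the three-bit platform ID field to a processor flags bit can be sketched as follows. This Python fragment is an illustration, not a required implementation; it assumes the 64-bit value of IA32_PLATFORM_ID has already been obtained with RDMSR and is passed in as an integer, and both function names are hypothetical.

```python
def platform_flag(ia32_platform_id):
    """Convert IA32_PLATFORM_ID[52:50] into the single-bit mask that is
    tested against a processor flags field."""
    platform_id = (ia32_platform_id >> 50) & 0x7   # bits 52:50
    return 1 << platform_id


def flags_allow(processor_flags, ia32_platform_id):
    """Return True if the processor flags field has the bit for this
    platform set."""
    return (processor_flags & platform_flag(ia32_platform_id)) != 0
```

For example, a platform ID field of 101B selects Processor Flag 5, so the update applies only if bit 5 of the relevant processor flags field is set.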
9.11.5 Microcode Update Checksum
Each microcode update contains a DWORD checksum located in the update header. It
is software's responsibility to ensure that a microcode update is not corrupt. To check
for a corrupt microcode update, software must perform an unsigned DWORD (32-bit)
checksum of the microcode update. Even though some fields are signed, the
checksum procedure treats all DWORDs as unsigned. Microcode updates with a
header version equal to 00000001H must sum all DWORDs that comprise the
microcode update. A valid checksum check will yield a value of 00000000H. Any other
value indicates the microcode update is corrupt and should not be loaded.
The checksum algorithm shown by the pseudo code in Example 9-7 treats the
microcode update as an array of unsigned DWORDs. If the data size DWORD field at byte
offset 32 equals 00000000H, the size of the encrypted data is 2000 bytes, resulting
in 500 DWORDs. Otherwise the microcode update size in DWORDs = (Total Size / 4),
where the total size is a multiple of 1024 bytes (1 KByte).
Example 9-7. Pseudo Code Example of Checksum Test
N ← 512
If (Update.DataSize != 00000000H)
    N ← Update.TotalSize / 4
ChkSum ← 0
For (I ← 0; I < N; I++)
{
    ChkSum ← ChkSum + MicrocodeUpdate[I]
}
If (ChkSum == 00000000H)
    Success
Else
    Fail
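A runnable rendering of Example 9-7 may help when testing update images offline. The Python sketch below is a hedged illustration: it assumes the update is held in a byte string of little-endian DWORDs and that the data size and total size fields have already been read from the header and are supplied by the caller. The function name is hypothetical. The default of 512 DWORDs corresponds to the 500 DWORDs of data plus the 12 DWORDs of the 48-byte header.

```python
import struct


def microcode_checksum_ok(update, data_size, total_size):
    """Return True if the unsigned DWORD sum of the update is zero."""
    # A data size of zero selects the fixed 2048-byte format (512 DWORDs).
    n = 512 if data_size == 0 else total_size // 4
    if len(update) < n * 4:
        return False              # image is shorter than its declared size
    dwords = struct.unpack_from("<%dI" % n, update)
    # The sum is truncated to 32 bits, as a DWORD accumulator would be.
    return (sum(dwords) & 0xFFFFFFFF) == 0
```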
9.11.6 Microcode Update Loader
This section describes an update loader used to load an update into a Pentium 4, Intel
Xeon, or P6 family processor. It also discusses the requirements placed on the BIOS
to ensure proper loading. The update loader described contains the minimal
instructions needed to load an update. The specific instruction sequence that is required to
load an update is dependent upon the loader revision field contained within the
update header. This revision is expected to change infrequently (potentially, only
when new processor models are introduced).
Example 9-8 below represents the update loader with a loader revision of
00000001H. Note that the microcode update must be aligned on a 16-byte boundary
and the size of the microcode update must be 1-KByte granular.
Example 9-8. Assembly Code Example of Simple Microcode Update Loader
mov ecx,79h            ; MSR to write in ECX
xor eax,eax            ; clear EAX
xor ebx,ebx            ; clear EBX
mov ax,cs              ; Segment of microcode update
shl eax,4
mov bx,offset Update   ; Offset of microcode update
add eax,ebx            ; Linear Address of Update in EAX
add eax,48d            ; Offset of the Update Data within the Update
xor edx,edx            ; Zero in EDX
WRMSR                  ; microcode update trigger
The loader shown in Example 9-8 assumes that Update is the address of a microcode
update (header and data) embedded within the code segment of the BIOS. It also
assumes that the processor is operating in real mode. The data may reside anywhere
in memory, aligned on a 16-byte boundary, that is accessible by the processor within
its current operating mode.
Before the BIOS executes the microcode update trigger (WRMSR) instruction, the
following must be true:
• In 64-bit mode, EAX contains the lower 32 bits of the microcode update linear
  address. In protected mode, EAX contains the full 32-bit linear address of the
  microcode update.
• In 64-bit mode, EDX contains the upper 32 bits of the microcode update linear
  address. In protected mode, EDX equals zero.
• ECX contains 79H (address of IA32_BIOS_UPDT_TRIG).
Other requirements are:
• If the update is loaded while the processor is in real mode, then the update data
  may not cross a segment boundary.
• If the update is loaded while the processor is in real mode, then the update data
  may not exceed a segment limit.
• If paging is enabled, pages that are currently present must map the update data.
• The microcode update data requires a 16-byte boundary alignment.
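The register conditions listed above can be checked in a small model before any code is committed to firmware. The Python sketch below is illustrative and makes several assumptions: the caller supplies the linear address of the update header, the update data begins 48 bytes after it (as in Example 9-8), and a mode that is not 64-bit is limited to 32-bit linear addresses. The function name and the dictionary keys are hypothetical.

```python
IA32_BIOS_UPDT_TRIG = 0x79   # MSR address loaded into ECX
HEADER_SIZE = 48             # update data follows the 48-byte header


def wrmsr_operands(update_linear_address, long_mode):
    """Compute the ECX, EAX and EDX values for the update trigger WRMSR."""
    data_address = update_linear_address + HEADER_SIZE
    if data_address % 16 != 0:
        raise ValueError("update data must be aligned on a 16-byte boundary")
    if not long_mode and data_address > 0xFFFFFFFF:
        raise ValueError("address does not fit in 32 bits outside 64-bit mode")
    return {
        "ecx": IA32_BIOS_UPDT_TRIG,
        "eax": data_address & 0xFFFFFFFF,             # lower 32 bits
        "edx": (data_address >> 32) if long_mode else 0,
    }
```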
9.11.6.1 Hard Resets in Update Loading
The effects of a loaded update are cleared from the processor upon a hard reset.
Therefore, each time a hard reset is asserted during the BIOS POST, the update must
be reloaded on all processors that observed the reset. The effects of a loaded update
are, however, maintained across a processor INIT. There are no side effects caused
by loading an update into a processor multiple times.
9.11.6.2 Update in a Multiprocessor System
A multiprocessor (MP) system requires loading each processor with update data
appropriate for its CPUID and platform ID bits. The BIOS is responsible for ensuring
that this requirement is met and that the loader is located in a module executed by
all processors in the system. If a system design permits multiple steppings of
Pentium 4, Intel Xeon, and P6 family processors to exist concurrently, then the BIOS
must verify individual processors against the update header information to ensure
appropriate loading. Given these considerations, it is most practical to load the
update during MP initialization.
9.11.6.3 Update in a System Supporting Intel Hyper-Threading Technology
Intel Hyper-Threading Technology has implications on the loading of the microcode
update. The update must be loaded for each core in a physical processor. Thus, for a
processor supporting Intel Hyper-Threading Technology, only one logical processor
per core is required to load the microcode update. Each individual logical processor
can independently load the update. However, MP initialization must provide some
mechanism (e.g. a software semaphore) to force serialization of microcode update
loads and to prevent simultaneous load attempts to the same core.
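The serialization requirement can be modeled in software before it is implemented in MP initialization code. The Python sketch below is only a model under stated assumptions: each core is represented by a hypothetical Core object, a lock stands in for the software semaphore, and "loading" is reduced to recording a revision number. Real firmware would use its own synchronization primitive and the WRMSR trigger.

```python
import threading


class Core:
    """Hypothetical stand-in for one processor core and its update state."""

    def __init__(self):
        self.lock = threading.Lock()   # plays the role of the semaphore
        self.revision = 0              # 0 means no update loaded
        self.loads = 0                 # number of loads actually performed


def load_from_logical_processor(core, new_revision):
    """Called by every logical processor; only the first one to take the
    lock performs the load, later callers see it is already in place."""
    with core.lock:
        if core.revision < new_revision:
            core.revision = new_revision
            core.loads += 1
```

Because the revision check happens inside the lock, two logical processors on the same core can never attempt the load at the same time, and the second one skips a load that has already succeeded.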
9.11.6.4 Update in a System Supporting Dual-Core Technology
Dual-core technology has implications on the loading of the microcode update. The
microcode update facility is not shared between processor cores in the same physical
package. The update must be loaded for each core in a physical processor.
If a processor core supports Intel Hyper-Threading Technology, the guideline described
in Section 9.11.6.3 also applies.
9.11.6.5 Update Loader Enhancements
The update loader presented in Section 9.11.6, "Microcode Update Loader," is a
minimal implementation that can be enhanced to provide additional functionality.
Potential enhancements are described below:
• BIOS can incorporate multiple updates to support multiple steppings of the
  Pentium 4, Intel Xeon, and P6 family processors. This feature provides for
  operating in a mixed stepping environment on an MP system and enables a user
  to upgrade to a later version of the processor. In this case, modify the loader to
  check the CPUID and platform ID bits of the processor that it is running on
  against the available headers before loading a particular update. The number of
  updates is only limited by available BIOS space.
• A loader can load the update and test the processor to determine if the update
  was loaded correctly. See Section 9.11.7, "Update Signature and Verification."
• A loader can verify the integrity of the update data by performing a checksum on
  the double words of the update summing to zero. See Section 9.11.5, "Microcode
  Update Checksum."
• A loader can provide power-on messages indicating successful loading of an
  update.
9.11.7 Update Signature and Verification
The Pentium 4, Intel Xeon, and P6 family processors provide capabilities to verify the
authenticity of a particular update and to identify the current update revision. This
section describes the model-specific extensions of processors that support this
feature. The update verification method below assumes that the BIOS will only verify
an update that is more recent than the revision currently loaded in the processor.
CPUID returns a value in a model specific register in addition to its usual register
return values. The semantics of CPUID cause it to deposit an update ID value in the
64-bit model-specific register at address 08BH (IA32_BIOS_SIGN_ID). If no update
is present in the processor, the value in the MSR remains unmodified. The BIOS must
pre-load a zero into the MSR before executing CPUID. If a read of the MSR at 8BH still
returns zero after executing CPUID, this indicates that no update is present.
The update ID value returned in the EDX register after RDMSR executes indicates the
revision of the update loaded in the processor. This value, in combination with the
CPUID value returned in the EAX register, uniquely identifies a particular update. The
signature ID can be directly compared with the update revision field in a microcode
update header for verification of a correct load. No consecutive updates released for
a given stepping of a processor may share the same signature. The processor
signature returned by CPUID differentiates updates for different steppings.
9.11.7.1 Determining the Signature
An update that is successfully loaded into the processor provides a signature that
matches the update revision of the currently functioning revision. This signature is
available any time after the actual update has been loaded. Requesting the signature
does not have a negative impact upon a loaded update.
The procedure for determining this signature is shown in Example 9-9.
Example 9-9. Assembly Code to Retrieve the Update Revision
MOV ECX, 08BH    ;IA32_BIOS_SIGN_ID
XOR EAX, EAX     ;clear EAX
XOR EDX, EDX     ;clear EDX
WRMSR            ;Load 0 to MSR at 8BH
MOV EAX, 1
CPUID
MOV ECX, 08BH    ;IA32_BIOS_SIGN_ID
RDMSR            ;Read Model Specific Register
If there is an update active in the processor, its revision is returned in the EDX
register after the RDMSR instruction executes.
IA32_BIOS_SIGN_ID Microcode Update Signature Register
MSR Address: 08BH Accessed as a Qword
Default Value: XXXX XXXX XXXX XXXXh
Access: Read/Write
The IA32_BIOS_SIGN_ID register is used to report the microcode update signature
when CPUID executes. The signature is returned in the upper DWORD (Table 9-11).
Table 9-11. Microcode Update Signature
Bit      Description
63:32    Microcode update signature. This field contains the signature of the currently
         loaded microcode update when read following the execution of the CPUID
         instruction, function 1. It is required that this register field be pre-loaded
         with zero prior to executing the CPUID, function 1. If the field remains equal
         to zero, then there is no microcode update loaded. Any other (non-zero) value
         is the signature.
31:0     Reserved.
9.11.7.2 Authenticating the Update
An update may be authenticated by the BIOS using the signature primitive,
described above, and the algorithm in Example 9-10.
Example 9-10. Pseudo Code to Authenticate the Update
Z ← Obtain Update Revision from the Update Header to be authenticated;
X ← Obtain Current Update Signature from MSR 8BH;
If (Z > X)
{
    Load Update that is to be authenticated;
    Y ← Obtain New Signature from MSR 8BH;
    If (Z == Y)
        Success
    Else
        Fail
}
Else
    Fail
Example 9-10 requires that the BIOS only authenticate updates that contain a
numerically larger revision than the currently loaded revision, where Current
Signature (X) < New Update Revision (Z). A processor with no loaded update is considered
to have a revision equal to zero.
This authentication procedure relies upon the decoding provided by the processor to
verify an update from a potentially hostile source. As an example, this mechanism in
conjunction with other safeguards provides security for dynamically incorporating
field updates into the BIOS.
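The flow of Example 9-10 can be exercised against a simulated processor. The Python sketch below is a model only: FakeProcessor is a hypothetical stand-in whose read_signature method represents the sequence of Example 9-9 (write zero to MSR 8BH, execute CPUID, read EDX via RDMSR), and whose load_update method accepts or silently rejects an update. None of these names come from the specification.

```python
class FakeProcessor:
    """Hypothetical model of a processor's update signature behavior."""

    def __init__(self, revision=0):
        self._revision = revision      # 0 means no update loaded

    def read_signature(self):
        # Stands in for: WRMSR 0 to 8BH, CPUID(1), RDMSR 8BH -> EDX.
        return self._revision

    def load_update(self, revision, accepted=True):
        # A rejected update leaves the current revision unchanged.
        if accepted:
            self._revision = revision


def authenticate(processor, update_revision, accepted=True):
    """Return True if the update is newer and the processor reports its
    revision after the load, following Example 9-10."""
    z = update_revision
    x = processor.read_signature()
    if not z > x:
        return False                   # same or older revision: fail
    processor.load_update(z, accepted)
    y = processor.read_signature()
    return z == y
```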
9.11.8 Pentium 4, Intel Xeon, and P6 Family Processor Microcode Update Specifications
This section describes the interface that an application can use to dynamically
integrate processor-specific updates into the system BIOS. In this discussion, the
application is referred to as the calling program or caller.
The real mode INT15 call specification described here is an Intel extension to an OEM
BIOS. This extension allows an application to read and modify the contents of the
microcode update data in NVRAM. The update loader, which is part of the system
BIOS, cannot be updated by the interface. All of the functions defined in the
specification must be implemented for a system to be considered compliant with the
specification. The INT15 functions are accessible only from real mode.
9.11.8.1 Responsibilities of the BIOS
If a BIOS passes the presence test (INT 15H, AX = 0D042H, BL = 0H), it must
implement all of the sub-functions defined in the INT 15H, AX = 0D042H specification.
There are no optional functions. BIOS must load the appropriate update for each
processor during system initialization.
A Header Version of an update block containing the value 0FFFFFFFFH indicates that
the update block is unused and available for storing a new update.
The BIOS is responsible for providing a region of non-volatile storage (NVRAM) for
each potential processor stepping within a system. This storage unit consists of one
or more update blocks. An update block is a contiguous 2048-byte block of memory.
The BIOS for a single processor system need only provide update blocks to store one
microcode update. If the BIOS for a multiple processor system is intended to support
mixed processor steppings, then the BIOS needs to provide enough update blocks to
store each unique microcode update or for each processor socket on the OEM's
system board.
The BIOS is responsible for managing the NVRAM update blocks. This includes
garbage collection, such as removing microcode updates that exist in NVRAM for
which a corresponding processor does not exist in the system. This specification only
provides the mechanism for ensuring security, the uniqueness of an entry, and that
stale entries are not loaded. The actual update block management is implementation
specific on a per-BIOS basis.
As an example, the BIOS may use update blocks sequentially in ascending order with
CPU signatures sorted versus the first available block. In addition, garbage collection
may be implemented as a setup option to clear all NVRAM slots or as BIOS code that
searches and eliminates unused entries during boot.
NOTES
For IA-32 processors starting with family 0FH and model 03H and
Intel 64 processors, the microcode update may be as large as 16
KBytes. Thus, BIOS must allocate 8 update blocks for each microcode
update. In an MP system, a common microcode update may be
sufficient for each socket in the system.
For IA-32 processors earlier than family 0FH and model 03H, the
microcode update is 2 KBytes. An MP-capable BIOS that supports
multiple steppings must allocate a block for each socket in the system.
A single-processor BIOS that supports variable-sized microcode
update and fixed-sized microcode update must allocate one 16-KByte
region and a second region of at least 2 KBytes.
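The block counts in the notes above follow directly from the 2048-byte update block size. A minimal Python sketch of the arithmetic is shown below; the function name is hypothetical, and the sketch rounds up so that an update whose size is not an exact multiple of 2048 bytes still receives a whole block for its remainder.

```python
UPDATE_BLOCK_SIZE = 2048   # bytes per NVRAM update block


def blocks_needed(update_size_bytes):
    """Number of 2048-byte update blocks required to hold one update."""
    # Ceiling division: a partial block still occupies a whole block.
    return -(-update_size_bytes // UPDATE_BLOCK_SIZE)
```

A 16-KByte update therefore needs 8 blocks, a 2-KByte update needs 1, and a 3-KByte update (sizes are 1-KByte granular) needs 2.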
The following algorithm (Example 9-11) describes the steps performed during BIOS
initialization used to load the updates into the processor(s). The algorithm assumes:
• The BIOS ensures that no update contained within NVRAM has a header version
  or loader version that does not match one currently supported by the BIOS.
• The update contains a correct checksum.
• The BIOS ensures that (at most) one update exists for each processor stepping.
• Older update revisions are not allowed to overwrite more recent ones.
These requirements are checked by the BIOS during the execution of the write
update function of this interface. The BIOS sequentially scans through all of the
update blocks in NVRAM starting with index 0. The BIOS scans until it finds an update
where the processor fields in the header match the processor signature (extended
family, extended model, type, family, model, and stepping) as well as the platform
bits of the current processor.
Example 9-11. Pseudo Code, Checks Required Prior to Loading an Update
For each processor in the system
{
    Determine the Processor Signature via CPUID function 1;
    Determine the Platform Bits ← 1 << IA32_PLATFORM_ID[52:50];
    For (I ← UpdateBlock 0; I < NumOfBlocks; I++)
    {
        If (Update.Header_Version == 0x00000001)
        {
            If ((Update.ProcessorSignature == Processor Signature) &&
                (Update.ProcessorFlags & Platform Bits))
            {
                Load Update.UpdateData into the Processor;
                Verify update was correctly loaded into the processor
                Go on to next processor
                Break;
            }
            Else If (Update.TotalSize > (Update.DataSize + 48))
            {
                N ← 0
                While (N < Update.ExtendedSignatureCount)
                {
                    If ((Update.ProcessorSignature[N] ==
                        Processor Signature) &&
                        (Update.ProcessorFlags[N] & Platform Bits))
                    {
                        Load Update.UpdateData into the Processor;
                        Verify update correctly loaded into the processor
                        Go on to next processor
                        Break;
                    }
                    N ← N + 1
                }
                I ← I + (Update.TotalSize / 2048)
                If ((Update.TotalSize MOD 2048) == 0)
                    I ← I + 1
            }
        }
    }
}
NOTES
The platform Id bits in IA32_PLATFORM_ID are encoded as a three-bit
binary coded decimal field. The platform bits in the microcode
update header are individually bit encoded. The algorithm must do a
translation from one format to the other prior to doing a check.
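The scan and the format translation described above can be sketched together. The Python fragment below is a simplified illustration, not the BIOS algorithm itself: it assumes the NVRAM contents have already been decoded into a list of records, one per stored update, and it ignores block indexing. The Update record and the find_update function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Update:
    """Hypothetical decoded view of one update stored in NVRAM."""
    header_version: int
    signature: int
    flags: int
    revision: int
    extended: list = field(default_factory=list)   # (signature, flags) pairs


def find_update(updates, cpu_signature, platform_id):
    """Return the first update that matches both the processor signature
    and the platform ID, or None if no stored update applies."""
    platform_bit = 1 << platform_id    # translate 3-bit ID to a flag bit
    for update in updates:
        if update.header_version != 0x00000001:
            continue                   # unused (0FFFFFFFFH) or unknown block
        candidates = [(update.signature, update.flags)] + list(update.extended)
        for signature, flags in candidates:
            if signature == cpu_signature and flags & platform_bit:
                return update
    return None
```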
When performing the INT 15H, 0D042H functions, the BIOS must assume that the
caller has no knowledge of platform specific requirements. It is the responsibility of
BIOS calls to manage all chipset and platform specific prerequisites for managing the
NVRAM device. When writing the update data using the Write Update sub-function,
the BIOS must maintain implementation specific data requirements (such as the
update of NVRAM checksum). The BIOS should also attempt to verify the success of
write operations on the storage device used to record the update.
9.11.8.2 Responsibilities of the Calling Program
This section of the document lists the responsibilities of a calling program using the
interface specifications to load microcode update(s) into BIOS NVRAM.
• The calling program should call the INT 15H, 0D042H functions from a pure real
  mode program and should be executing on a system that is running in pure real
  mode.
• The caller should issue the presence test function (sub function 0) and verify the
  signature and return codes of that function.
• It is important that the calling program provides the required scratch RAM buffers
  for the BIOS and the proper stack size as specified in the interface definition.
• The calling program should read any update data that already exists in the BIOS
  in order to make decisions about the appropriateness of loading the update. The
  BIOS must refuse to overwrite a newer update with an older version. The update
  header contains information about version and processor specifics for the calling
  program to make an intelligent decision about loading.
• There can be no ambiguous updates. The BIOS must refuse to allow multiple
  updates for the same CPU to exist at the same time; it also must refuse to load
  updates for processors that don't exist on the system.
• The calling application should implement a verify function that is run after the
  update write function successfully completes. This function reads back the
  update and verifies that the BIOS returned an image identical to the one that was
  written.
Example 9-12 represents a calling program.
Example 9-12. INT 15 D042 Calling Program Pseudo-code
//
// We must be in real mode
//
If the system is not in Real mode exit
//
// Detect presence of Genuine Intel processor(s) that can be updated
// using (CPUID)
//
If no Intel processors exist that can be updated exit
//
// Detect the presence of the Intel microcode update extensions
//
If the BIOS fails the PresenceTest exit
//
// If the APIC is enabled, see if any other processors are out there
//
Read IA32_APICBASE
If APIC enabled
{
    Send Broadcast Message to all processors except self via APIC
    Have all processors execute CPUID, record the Processor Signature
    (i.e., Extended Family, Extended Model, Type, Family, Model,
    Stepping)
    Have all processors read IA32_PLATFORM_ID[52:50], record Platform
    Id Bits
    If current processor cannot be updated
        exit
}
//
// Determine the number of unique update blocks needed for this system
//
NumBlocks ← 0
For each processor
{
    If ((this is a unique processor stepping) AND
        (we have a unique update in the database for this processor))
    {
        Checksum the update from the database;
        If Checksum fails
            exit
        NumBlocks ← NumBlocks + size of microcode update / 2048
    }
}
//
// Do we have enough update slots for all CPUs?
//
If there are more blocks required to support the unique processor
steppings than update blocks provided by the BIOS exit
//
// Do we need any update blocks at all? If not, we are done
//
If (NumBlocks == 0)
    exit
//
// Record updates for processors in NVRAM.
//
For (I ← 0; I < NumBlocks; I++)
{
    //
    // Load each Update
    //
    Issue the WriteUpdate function
    If (STORAGE_FULL) returned
    {
        Display Error -- BIOS is not managing NVRAM appropriately
        exit
    }
    If (INVALID_REVISION) returned
    {
        Display Message: More recent update already loaded in NVRAM for
        this stepping
        continue
    }
    If any other error returned
    {
        Display Diagnostic
        exit
    }
    //
    // Verify the update was loaded correctly
    //
    Issue the ReadUpdate function
    If an error occurred
    {
        Display Diagnostic
        exit
    }
    //
    // Compare the Update read to that written
    //
    If (Update read != Update written)
    {
        Display Diagnostic
        exit
    }
    I ← I + (size of microcode update / 2048)
}
//
// Enable Update Loading, and inform user
//
Issue the Update Control function with Task = Enable.
9.11.8.3 Microcode Update Functions
Table 9-12 defines current Pentium 4, Intel Xeon, and P6 family processor microcode
update functions.
Table 9-12. Microcode Update Functions
Microcode Update Function   Function Number   Description                                          Required/Optional
Presence test               00H               Returns information about the supported functions.   Required
Write update data           01H               Writes one of the update data areas (slots).         Required
Update control              02H               Globally controls the loading of updates.            Required
Read update data            03H               Reads one of the update data areas (slots).          Required
9.11.8.4 INT 15H-based Interface
Intel recommends that a BIOS interface be provided that allows additional microcode
updates to be added to system flash. The INT15H interface is the Intel-defined
method for doing this.
The program that calls this interface is responsible for providing three 64-kilobyte
RAM areas for BIOS use during calls to the read and write functions. These RAM
scratch pads can be used by the BIOS for any purpose, but only for the duration of
the function call. The calling routine places real mode segments pointing to the RAM
blocks in the CX, DX and SI registers. Calls to functions in this interface must be
made with a minimum of 32 kilobytes of stack available to the BIOS.
In general, each function returns with CF cleared and AH contains the returned
status. The general return codes and other constant definitions are listed in Section
9.11.8.9, "Return Codes."
The OEM error field (AL) is provided for the OEM to return additional error
information specific to the platform. If the BIOS provides no additional information about the
error, OEM error must be set to SUCCESS. The OEM error field is undefined if AH
contains either SUCCESS (00H) or NOT_IMPLEMENTED (86H). In all other cases, it
must be set with either SUCCESS or a value meaningful to the OEM.
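A caller might decode the returned registers along the following lines. The Python sketch below is illustrative only; it assumes the carry flag and the 16-bit AX value have been captured after the INT 15H call, uses only the two status values named in this section, and treats the OEM error field as absent whenever the text declares it undefined. The function name and result keys are hypothetical.

```python
SUCCESS = 0x00
NOT_IMPLEMENTED = 0x86


def decode_status(carry_flag, ax):
    """Split AX into the AH status and AL OEM error fields."""
    ah = (ax >> 8) & 0xFF
    al = ax & 0xFF
    # AL is undefined for SUCCESS and NOT_IMPLEMENTED, so it is not reported.
    oem_error = None if ah in (SUCCESS, NOT_IMPLEMENTED) else al
    return {
        "ok": (not carry_flag) and ah == SUCCESS,
        "status": ah,
        "oem_error": oem_error,
    }
```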
The following sections describe functions provided by the INT15H-based interface.
9.11.8.5 Function 00H - Presence Test
This function verifies that the BIOS has implemented required microcode update
functions. Table 9-13 lists the parameters and return codes for the function.
In order to assure that the BIOS function is present, the caller must verify the carry
flag, the return code, and the 64-bit signature. The update count reflects the number
of 2048-byte blocks available for storage within one non-volatile RAM.
The loader version number refers to the revision of the update loader program that is
included in the system BIOS image.
Table 9-13. Parameters for the Presence Test
Input
AX     Function Code       0D042H
BL     Sub-function        00H - Presence test
Output
CF     Carry Flag          Carry Set - Failure - AH contains status
                           Carry Clear - All return values valid
AH     Return Code
AL     OEM Error           Additional OEM information.
EBX    Signature Part 1    'INTE' - Part one of the signature
ECX    Signature Part 2    'LPEP' - Part two of the signature
EDX    Loader Version      Version number of the microcode update loader
SI     Update Count        Number of 2048-byte update blocks in NVRAM the BIOS
                           allocated to storing microcode updates
Return Codes (see Table 9-18 for code definitions)
SUCCESS                    The function completed successfully.
NOT_IMPLEMENTED            The function is not implemented.
9.11.8.6 Function 01H - Write Microcode Update Data
This function integrates a new microcode update into the BIOS storage device. Table
9-14 lists the parameters and return codes for the function.
Table 9-14. Parameters for the Write Update Data Function
Input
AX       Function Code     0D042H
BL       Sub-function      01H - Write update
ES:DI    Update Address    Real Mode pointer to the Intel Update structure. This
                           buffer is 2048 bytes in length if the processor supports
                           only fixed-size microcode update or...
                           Real Mode pointer to the Intel Update structure. This
                           buffer is 64 KBytes in length if the processor supports a
                           variable-size microcode update.
CX       Scratch Pad1      Real mode segment address of 64 KBytes of RAM block
DX       Scratch Pad2      Real mode segment address of 64 KBytes of RAM block
SI       Scratch Pad3      Real mode segment address of 64 KBytes of RAM block
SS:SP    Stack pointer     32 KBytes of stack minimum
Output
CF       Carry Flag        Carry Set - Failure - AH Contains status
                           Carry Clear - All return values valid
AH       Return Code       Status of the call
AL       OEM Error         Additional OEM information
Return Codes (see Table 9-18 for code definitions)
SUCCESS                    The function completed successfully.
NOT_IMPLEMENTED            The function is not implemented.
WRITE_FAILURE              A failure occurred because of the inability to write the
                           storage device.
ERASE_FAILURE              A failure occurred because of the inability to erase the
                           storage device.
READ_FAILURE               A failure occurred because of the inability to read the
                           storage device.
STORAGE_FULL               The BIOS non-volatile storage area is unable to
                           accommodate the update because all available update
                           blocks are filled with updates that are needed for
                           processors in the system.
Description
The BIOS is responsible for selecting an appropriate update block in the non-volatile
storage for storing the new update. This BIOS is also responsible for ensuring the
integrity of the information provided by the caller, including authenticating the
proposed update before incorporating it into storage.
Before writing the update block into NVRAM, the BIOS should ensure that the update
structure meets the following criteria in the following order:
1. The update header version should be equal to an update header version
   recognized by the BIOS.
2. The update loader version in the update header should be equal to the update
   loader version contained within the BIOS image.
3. The update block must checksum. This checksum is computed as a 32-bit
   summation of all double words in the structure, including the header, data, and
   processor signature table.
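The ordered checks above can be sketched as a single validation routine. The Python fragment below is an illustration under stated assumptions: the header version and loader revision have already been extracted, the update is supplied as a sequence of unsigned DWORD values, and the status names are returned as strings (INVALID_HEADER and INVALID_HEADER_CS are the return codes this function defines for these failures). The function name and its default arguments are hypothetical.

```python
def validate_for_write(header_version, loader_revision, dwords,
                       recognized_header_versions=(0x00000001,),
                       bios_loader_revision=0x00000001):
    """Apply the three pre-write criteria in order and return a status name."""
    # 1. The header version must be one the BIOS recognizes.
    if header_version not in recognized_header_versions:
        return "INVALID_HEADER"
    # 2. The loader version must equal the one in the BIOS image.
    if loader_revision != bios_loader_revision:
        return "INVALID_HEADER"
    # 3. The 32-bit sum of all double words must be zero.
    if (sum(dwords) & 0xFFFFFFFF) != 0:
        return "INVALID_HEADER_CS"
    return "SUCCESS"
```

Checking the header fields before the checksum means a block with an unrecognized layout is rejected without summing data whose extent cannot be trusted.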
The BIOS selects update block(s) in non-volatile storage for storing the candidate
update. The BIOS can select any available update block as long as it guarantees that
only a single update exists for any given processor stepping in non-volatile storage.
If the update block selected already contains an update, the following additional
criteria apply to overwrite it:
• The processor signature in the proposed update must be equal to the processor
  signature in the header of the current update in NVRAM (Processor Signature +
  platform ID bits).
• The update revision in the proposed update should be greater than the update
  revision in the header of the current update in NVRAM.
If no unused update blocks are available and the above criteria are not met, the BIOS
can overwrite update block(s) for a processor stepping that is no longer present in
the system. This can be done by scanning the update blocks and comparing the
processor steppings, identified in the MP Specification table, to the processor
steppings that currently exist in the system.
Table 9-14. Parameters for the Write Update Data Function (Contd.)
Return Codes (continued)
CPU_NOT_PRESENT The processor stepping does not currently exist in the
system.
INVALID_HEADER The update header contains a header or loader version
that is not recognized by the BIOS.
INVALID_HEADER_CS The update does not checksum correctly.
SECURITY_FAILURE The processor rejected the update.
INVALID_REVISION The same or more recent revision of the update exists in
the storage device.
Finally, before storing the proposed update in NVRAM, the BIOS must verify the
authenticity of the update via the mechanism described in Section 9.11.6,
"Microcode Update Loader." This includes loading the update into the current processor,
executing the CPUID instruction, reading MSR 08Bh, and comparing a calculated
value with the update revision in the proposed update header for equality.
When performing the write update function, the BIOS must record the entire update,
including the header, the update data, and the extended processor signature table (if
applicable). When writing an update, the original contents may be overwritten,
assuming the above criteria have been met. It is the responsibility of the BIOS to
ensure that more recent updates are not overwritten through the use of this BIOS
call, and that only a single update exists within the NVRAM for any processor
stepping and platform ID.
Figure 9-8 and Figure 9-9 show the process the BIOS follows to choose an update
block and ensure the integrity of the data when it stores the new microcode update.
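
The decision flow can be modeled in a few lines of Python. This is a sketch under stated assumptions, not BIOS code: NVRAM is modeled as a dictionary holding one update per processor signature, the order of the CPU-presence check relative to the header checks is assumed, and the Update class, the write_update function, and their parameters are hypothetical names. The return values are those defined in Table 9-18.

```python
from dataclasses import dataclass

# Return code values from Table 9-18.
SUCCESS, STORAGE_FULL, CPU_NOT_PRESENT = 0x00, 0x93, 0x94
INVALID_HEADER, INVALID_HEADER_CS = 0x95, 0x96
SECURITY_FAILURE, INVALID_REVISION = 0x97, 0x98


@dataclass
class Update:
    header_version: int
    loader_revision: int
    signature: int     # processor signature + platform ID bits
    revision: int
    checksum_ok: bool  # outcome of the 32-bit summation check
    authentic: bool    # outcome of the trial load into the processor


def write_update(update, nvram, capacity, cpus_present,
                 known_header_versions=(1,), bios_loader_revision=1,
                 has_replacement_policy=True):
    """Model of the write flow; nvram maps signature -> stored Update."""
    if update.signature not in cpus_present:
        return CPU_NOT_PRESENT
    if update.header_version not in known_header_versions:
        return INVALID_HEADER
    if update.loader_revision != bios_loader_revision:
        return INVALID_HEADER
    if not update.checksum_ok:
        return INVALID_HEADER_CS

    stored = nvram.get(update.signature)
    victim = None
    if stored is not None:
        # Overwriting requires a strictly newer revision.
        if update.revision <= stored.revision:
            return INVALID_REVISION
    elif len(nvram) >= capacity:
        # No free block: only a stepping no longer present may be evicted.
        stale = [sig for sig in nvram if sig not in cpus_present]
        if not has_replacement_policy or not stale:
            return STORAGE_FULL
        victim = stale[0]

    if not update.authentic:
        return SECURITY_FAILURE
    if victim is not None:
        del nvram[victim]
    nvram[update.signature] = update
    return SUCCESS
```

In this model a rejected update never modifies nvram, which mirrors the requirement that more recent updates are not overwritten through this call.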
Figure 9-8. Microcode Update Write Operation Flow [1]
[Flowchart. Decision boxes: Does Update Match A CPU in The System?; Valid Update Header Version?; Loader Revision Match BIOS's Loader?; Does Update Checksum Correctly? Failure exits: CPU_NOT_PRESENT, INVALID_HEADER, INVALID_HEADER_CS.]
Figure 9-9. Microcode Update Write Operation Flow [2]
[Flowchart. Decision boxes: Update Matching CPU Already In NVRAM?; Update Revision Newer Than NVRAM Update?; Space Available in NVRAM?; Replacement policy implemented?; Update Pass Authenticity Test? Exits: INVALID_REVISION, STORAGE_FULL, SECURITY_FAILURE, or Update NVRAM Record and return SUCCESS.]
9.11.8.7 Function 02H - Microcode Update Control
This function enables loading of binary updates into the processor. Table 9-15 lists
the parameters and return codes for the function.
This control is provided on a global basis for all updates and processors. The caller
can determine the current status of update loading (enabled or disabled) without
changing the state. The function does not allow the caller to disable loading of binary
updates, as this poses a security risk.
The caller specifies the requested operation by placing one of the values from Table
9-16 in the BH register. After successfully completing this function, the BL register
contains either the enable or the disable designator. Note that if the function fails, the
update status return value is undefined.
Table 9-15. Parameters for the Control Update Sub-function
Input
AX Function Code 0D042H
BL Sub-function 02H - Control update
BH Task See the description below.
CX Scratch Pad1 Real mode segment of 64 KBytes of RAM block
DX Scratch Pad2 Real mode segment of 64 KBytes of RAM block
SI Scratch Pad3 Real mode segment of 64 KBytes of RAM block
SS:SP Stack pointer 32 kilobytes of stack minimum
Output
CF Carry Flag Carry Set - Failure - AH contains status
Carry Clear - All return values valid.
AH Return Code Status of the call
AL OEM Error Additional OEM Information.
BL Update Status Either enable or disable indicator
Return Codes (see Table 9-18 for code definitions)
SUCCESS Function completed successfully.
READ_FAILURE A failure occurred because of the inability to read the
storage device.
The READ_FAILURE error code returned by this function has meaning only if the
control function is implemented in the BIOS NVRAM. The state of this feature
(enabled/disabled) can also be implemented using CMOS RAM bits where READ
failure errors cannot occur.
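
The register usage in Tables 9-15 and 9-16 can be summarized with a short Python model. It only assembles and interprets register values; it does not invoke the BIOS. The helper names build_control_request and parse_control_result are hypothetical.

```python
FUNCTION_CODE = 0xD042    # AX, from Table 9-15
SUBFUNC_CONTROL = 0x02    # BL, control update sub-function
TASK_ENABLE = 1           # BH values from Table 9-16
TASK_QUERY = 2


def build_control_request(task, scratch1, scratch2, scratch3):
    """Return the input register image for the control update call."""
    if task not in (TASK_ENABLE, TASK_QUERY):
        raise ValueError("task must be Enable (1) or Query (2)")
    return {
        "AX": FUNCTION_CODE,
        "BX": (task << 8) | SUBFUNC_CONTROL,  # BH = task, BL = sub-function
        "CX": scratch1,  # real-mode segments of 64-KByte RAM blocks
        "DX": scratch2,
        "SI": scratch3,
    }


def parse_control_result(carry_flag, ax, bx):
    """Interpret CF, AX, and BX after the call returns."""
    ah, al = (ax >> 8) & 0xFF, ax & 0xFF
    if carry_flag:
        # On failure AH holds the status; the update status in BL is undefined.
        return {"ok": False, "status": ah, "oem_error": al}
    return {"ok": True, "status": ah, "update_status": bx & 0xFF}
```

The parser deliberately omits BL on failure, because the update status return value is undefined when the function fails.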
9.11.8.8 Function 03H - Read Microcode Update Data
This function reads a currently installed microcode update from the BIOS storage into
a caller-provided RAM buffer. Table 9-17 lists the parameters and return codes.
Table 9-16. Mnemonic Values
Mnemonic Value Meaning
Enable 1 Enable the Update loading at initialization time.
Query 2 Determine the current state of the update control without
changing its status.
Table 9-17. Parameters for the Read Microcode Update Data Function
Input
AX Function Code 0D042H
BL Sub-function 03H - Read Update
ES:DI Buffer Address Real Mode pointer to the Intel Update
structure that will be written with the
binary data
ECX Scratch Pad1 Real Mode Segment address of 64
KBytes of RAM Block (lower 16 bits)
ECX Scratch Pad2 Real Mode Segment address of 64
KBytes of RAM Block (upper 16 bits)
DX Scratch Pad3 Real Mode Segment address of 64
KBytes of RAM Block
SS:SP Stack pointer 32 KBytes of Stack Minimum
SI Update Number This is the index number of the update
block to be read. This value is zero based
and must be less than the update count
returned from the presence test
function.
Output
CF Carry Flag Carry Set - Failure - AH contains Status
Carry Clear - All return
values are valid.
AH Return Code Status of the Call
The read function enables the caller to read any microcode update data that already
exists in a BIOS and make decisions about the addition of new updates. As a result
of a successful call, the BIOS copies the microcode update into the location pointed
to by ES:DI, with the contents of all Update block(s) that are used to store the
specified microcode update.
If the specified block is not a header block, but does contain valid data from a
microcode update that spans multiple update blocks, then the BIOS must return Failure
with the NOT_EMPTY error code in AH.
An update block is considered unused and available for storing a new update if its
Header Version contains the value 0FFFFFFFFH after return from this function call.
The actual implementation of NVRAM storage management is not specified here and
is BIOS dependent. As an example, the actual data value used to represent an
empty block by the BIOS may be zero, rather than 0FFFFFFFFH. The BIOS is
responsible for translating this information into the header provided by this function.
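
A caller can use these rules to survey the update blocks. The Python sketch below assumes a hypothetical callback, read_update(index), that stands in for the read call and returns the AH status together with the Header Version field of the returned structure. The labels it produces are illustrative only.

```python
SUCCESS = 0x00
NOT_EMPTY = 0x9A
UNUSED_HEADER_VERSION = 0xFFFFFFFF


def classify_blocks(read_update, update_count):
    """Label each update block as 'update', 'continuation', 'unused', or 'error'."""
    labels = []
    for index in range(update_count):  # zero based, below the update count
        status, header_version = read_update(index)
        if status == NOT_EMPTY:
            labels.append("continuation")  # later block of a multi-block update
        elif status != SUCCESS:
            labels.append("error")
        elif header_version == UNUSED_HEADER_VERSION:
            labels.append("unused")        # available for a new update
        else:
            labels.append("update")
    return labels
```

The update count passed in is the value returned by the presence test function, as required for the update number in Table 9-17.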
9.11.8.9 Return Codes
After the call has been made, the return codes listed in Table 9-18 are available in the
AH register.
Table 9-17. Parameters for the Read Microcode Update Data Function (Contd.)
Output (continued)
AL OEM Error Additional OEM Information
Return Codes (see Table 9-18 for code definitions)
SUCCESS The function completed successfully.
READ_FAILURE There was a failure because of the
inability to read the storage device.
UPDATE_NUM_INVALID Update number exceeds the maximum
number of update blocks implemented
by the BIOS.
NOT_EMPTY The specified update block is a
subsequent block in use to store a valid
microcode update that spans multiple
blocks.
The specified block is not a header block
and is not empty.
Table 9-18. Return Code Definitions
Return Code Value Description
SUCCESS 00H The function completed successfully.
NOT_IMPLEMENTED 86H The function is not implemented.
ERASE_FAILURE 90H A failure because of the inability to erase the storage
device.
WRITE_FAILURE 91H A failure because of the inability to write the storage
device.
READ_FAILURE 92H A failure because of the inability to read the storage
device.
STORAGE_FULL 93H The BIOS non-volatile storage area is unable to
accommodate the update because all available update
blocks are filled with updates that are needed for
processors in the system.
CPU_NOT_PRESENT 94H The processor stepping does not currently exist in the
system.
INVALID_HEADER 95H The update header contains a header or loader version
that is not recognized by the BIOS.
INVALID_HEADER_CS 96H The update does not checksum correctly.
SECURITY_FAILURE 97H The update was rejected by the processor.
INVALID_REVISION 98H The same or more recent revision of the update exists
in the storage device.
UPDATE_NUM_INVALID 99H The update number exceeds the maximum number of
update blocks implemented by the BIOS.
NOT_EMPTY 9AH The specified update block is a subsequent block in use
to store a valid microcode update that spans multiple
blocks.
The specified block is not a header block and is not
empty.
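
For reference, the values in Table 9-18 can be captured in a small Python enumeration. The helper decode_status is a hypothetical convenience that combines the carry flag with the AH byte of AX; it is a sketch and not part of the BIOS interface.

```python
from enum import IntEnum


class ReturnCode(IntEnum):
    """Return code values from Table 9-18."""
    SUCCESS = 0x00
    NOT_IMPLEMENTED = 0x86
    ERASE_FAILURE = 0x90
    WRITE_FAILURE = 0x91
    READ_FAILURE = 0x92
    STORAGE_FULL = 0x93
    CPU_NOT_PRESENT = 0x94
    INVALID_HEADER = 0x95
    INVALID_HEADER_CS = 0x96
    SECURITY_FAILURE = 0x97
    INVALID_REVISION = 0x98
    UPDATE_NUM_INVALID = 0x99
    NOT_EMPTY = 0x9A


def decode_status(carry_flag, ax):
    """Return (succeeded, code name) from CF and the AH byte of AX."""
    ah = (ax >> 8) & 0xFF
    try:
        name = ReturnCode(ah).name
    except ValueError:
        name = f"UNKNOWN_{ah:02X}H"  # value not listed in Table 9-18
    return (not carry_flag, name)
```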
CHAPTER 10
ADVANCED PROGRAMMABLE
INTERRUPT CONTROLLER (APIC)
The Advanced Programmable Interrupt Controller (APIC), referred to in the following
sections as the local APIC, was introduced into the IA-32 processors with the Pentium
processor (see Section 19.27, "Advanced Programmable Interrupt Controller
(APIC)") and is included in the P6 family, Pentium 4, Intel Xeon processors, and other
more recent Intel 64 and IA-32 processor families (see Section 10.4.2, "Presence of
the Local APIC"). The local APIC performs two primary functions for the processor:
• It receives interrupts from the processor's interrupt pins, from internal sources
and from an external I/O APIC (or other external interrupt controller). It sends
these to the processor core for handling.
• In multiple processor (MP) systems, it sends and receives interprocessor
interrupt (IPI) messages to and from other logical processors on the system bus.
IPI messages can be used to distribute interrupts among the processors in the
system or to execute system wide functions (such as booting up processors or
distributing work among a group of processors).
The external I/O APIC is part of Intel's system chip set. Its primary function is to
receive external interrupt events from the system and its associated I/O devices and
relay them to the local APIC as interrupt messages. In MP systems, the I/O APIC also
provides a mechanism for distributing external interrupts to the local APICs of
selected processors or groups of processors on the system bus.
This chapter provides a description of the local APIC and its programming interface.
It also provides an overview of the interface between the local APIC and the I/O
APIC. Contact Intel for detailed information about the I/O APIC.
When a local APIC has sent an interrupt to its processor core for handling, the
processor uses the interrupt and exception handling mechanism described in Chapter
6, "Interrupt and Exception Handling." See Section 6.1, "Interrupt and Exception
Overview," for an introduction to interrupt and exception handling.
10.1 LOCAL AND I/O APIC OVERVIEW
Each local APIC consists of a set of APIC registers (see Table 10-1) and associated
hardware that control the delivery of interrupts to the processor core and the
generation of IPI messages. The APIC registers are memory mapped and can be read and
written to using the MOV instruction.
Local APICs can receive interrupts from the following sources:
• Locally connected I/O devices: These interrupts originate as an edge or
level asserted by an I/O device that is connected directly to the processor's local
interrupt pins (LINT0 and LINT1). The I/O devices may also be connected to an
8259-type interrupt controller that is in turn connected to the processor through
one of the local interrupt pins.
• Externally connected I/O devices: These interrupts originate as an edge or
level asserted by an I/O device that is connected to the interrupt input pins of an
I/O APIC. Interrupts are sent as I/O interrupt messages from the I/O APIC to one
or more of the processors in the system.
• Inter-processor interrupts (IPIs): An Intel 64 or IA-32 processor can use
the IPI mechanism to interrupt another processor or group of processors on the
system bus. IPIs are used for software self-interrupts, interrupt forwarding, or
preemptive scheduling.
• APIC timer generated interrupts: The local APIC timer can be programmed
to send a local interrupt to its associated processor when a programmed count is
reached (see Section 10.5.4, "APIC Timer").
• Performance monitoring counter interrupts: P6 family, Pentium 4, and
Intel Xeon processors provide the ability to send an interrupt to their associated
processor when a performance-monitoring counter overflows (see Section
30.8.5.8, "Generating an Interrupt on Overflow").
• Thermal Sensor interrupts: Pentium 4 and Intel Xeon processors provide the
ability to send an interrupt to themselves when the internal thermal sensor has
been tripped (see Section 14.5.2, "Thermal Monitor").
• APIC internal error interrupts: When an error condition is recognized within
the local APIC (such as an attempt to access an unimplemented register), the
APIC can be programmed to send an interrupt to its associated processor (see
Section 10.5.3, "Error Handling").
Of these interrupt sources, the processor's LINT0 and LINT1 pins, the APIC timer, the
performance-monitoring counters, the thermal sensor, and the internal APIC error
detector are referred to as local interrupt sources. Upon receiving a signal from a
local interrupt source, the local APIC delivers the interrupt to the processor core
using an interrupt delivery protocol that has been set up through a group of APIC
registers called the local vector table or LVT (see Section 10.5.1, "Local Vector
Table"). A separate entry is provided in the local vector table for each local interrupt
source, which allows a specific interrupt delivery protocol to be set up for each
source. For example, if the LINT1 pin is going to be used as an NMI pin, the LINT1
entry in the local vector table can be set up to deliver an interrupt with vector number
2 (NMI interrupt) to the processor core.
The local APIC handles interrupts from the other two interrupt sources (externally
connected I/O devices and IPIs) through its IPI message handling facilities.
A processor can generate IPIs by programming the interrupt command register (ICR)
in its local APIC (see Section 10.6.1, "Interrupt Command Register (ICR)"). The act
of writing to the ICR causes an IPI message to be generated and issued on the
system bus (for Pentium 4 and Intel Xeon processors) or on the APIC bus (for
Pentium and P6 family processors). See Section 10.2, "System Bus Vs. APIC Bus."
IPIs can be sent to other processors in the system or to the originating processor
(self-interrupts). When the target processor receives an IPI message, its local APIC
handles the message automatically (using information included in the message such
as vector number and trigger mode). See Section 10.6, "Issuing Interprocessor
Interrupts," for a detailed explanation of the local APIC's IPI message delivery and
acceptance mechanism.
The local APIC can also receive interrupts from externally connected devices through
the I/O APIC (see Figure 10-1). The I/O APIC is responsible for receiving interrupts
generated by system hardware and I/O devices and forwarding them to the local
APIC as interrupt messages.
Individual pins on the I/O APIC can be programmed to generate a specific interrupt
vector when asserted. The I/O APIC also has a "virtual wire mode" that allows it to
communicate with a standard 8259A-style external interrupt controller. Note that the
local APIC can be disabled (see Section 10.4.3, "Enabling or Disabling the Local
APIC"). This allows an associated processor core to receive interrupts directly from
an 8259A interrupt controller.
Figure 10-1. Relationship of Local APIC and I/O APIC In Single-Processor Systems
[Two block diagrams. Pentium 4 and Intel Xeon Processors: the local APIC and processor core receive local interrupts directly and interrupt messages over the system bus from the I/O APIC in the system chip set (through a bridge and PCI), which receives external interrupts. Pentium and P6 Family Processors: the local APIC receives local interrupts directly and interrupt messages over the 3-wire APIC bus from the I/O APIC in the system chip set.]
Both the local APIC and the I/O APIC are designed to operate in MP systems (see
Figures 10-2 and 10-3). Each local APIC handles interrupts from the I/O APIC, IPIs
from processors on the system bus, and self-generated interrupts. Interrupts can
also be delivered to the individual processors through the local interrupt pins;
however, this mechanism is commonly not used in MP systems.
Figure 10-2. Local APICs and I/O APIC When Intel Xeon Processors Are Used in
Multiple-Processor Systems
[Block diagram: several processors, each with a CPU and a local APIC, exchange IPIs and interrupt messages over the processor system bus; the I/O APIC in the system chip set receives external interrupts and is attached through a bridge and PCI.]
Figure 10-3. Local APICs and I/O APIC When P6 Family Processors Are Used in
Multiple-Processor Systems
[Block diagram: four processors, each with a CPU and a local APIC, exchange IPIs and interrupt messages over the 3-wire APIC bus with the I/O APIC in the system chip set, which receives external interrupts.]
The IPI mechanism is typically used in MP systems to send fixed interrupts
(interrupts for a specific vector number) and special-purpose interrupts to processors on
the system bus. For example, a local APIC can use an IPI to forward a fixed interrupt
to another processor for servicing. Special-purpose IPIs (including NMI, INIT, SMI
and SIPI IPIs) allow one or more processors on the system bus to perform
system-wide boot-up and control functions.
The following sections focus on the local APIC and its implementation in the
Pentium 4, Intel Xeon, and P6 family processors. In these sections, the terms "local
APIC" and "I/O APIC" refer to local and I/O APICs used with the P6 family processors
and to local and I/O xAPICs used with the Pentium 4 and Intel Xeon processors (see
Section 10.3, The Intel
82489DX external APIC. See Section 19.27.1, "Software Visible Differences
Between the Local APIC and the 82489DX."
The APIC architecture used in the Pentium 4 and Intel Xeon processors (called the
xAPIC architecture) is an extension of the APIC architecture found in the P6 family
processors. The primary difference between the APIC and xAPIC architectures is that
with the xAPIC architecture, the local APICs and the I/O APIC communicate through
the system bus. With the APIC architecture, they communicate through the APIC
bus (see Section 10.2, "System Bus Vs. APIC Bus"). Also, some APIC architectural
features have been extended and/or modified in the xAPIC architecture. These
extensions and modifications are described in Section 10.4 through Section 10.10.
The x2APIC architecture is an extension of the xAPIC architecture, primarily to
increase processor addressability. The x2APIC architecture provides backward
compatibility to the xAPIC architecture and forward extendability for future Intel
platform innovations. These extensions and modifications are supported by a new
mode of execution (x2APIC mode) and are detailed in Section 10.12.
10.4 LOCAL APIC
The following sections describe the architecture of the local APIC and how to detect
it, identify it, and determine its status. Descriptions of how to program the local APIC
are given in Section 10.5.1, "Local Vector Table," and Section 10.6.1, "Interrupt
Command Register (ICR)."
10.4.1 The Local APIC Block Diagram
Figure 10-4 gives a functional block diagram for the local APIC. Software interacts
with the local APIC by reading and writing its registers. APIC registers are
memory-mapped to a 4-KByte region of the processor's physical address space with an initial
starting address of FEE00000H. For correct APIC operation, this address space must
be mapped to an area of memory that has been designated as strong uncacheable
(UC). See Section 11.3, "Methods of Caching Available."
In MP system configurations, the APIC registers for Intel 64 or IA-32 processors on
the system bus are initially mapped to the same 4-KByte region of the physical
address space. Software has the option of changing the initial mapping to a different
4-KByte region for all the local APICs or of mapping the APIC registers for each local
APIC to its own 4-KByte region. Section 10.4.5, "Relocating the Local APIC
Registers," describes how to relocate the base address for APIC registers.
On processors supporting x2APIC architecture (indicated by CPUID.01H:ECX[21] =
1), the local APIC supports operation in the xAPIC mode (as described in Section
10.4). Additionally, software can enable the local APIC to operate in x2APIC mode for
extended processor addressability (see Section 10.12).
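
The CPUID checks can be expressed as a short Python sketch. It decodes register values that are assumed to have been obtained elsewhere by executing CPUID with EAX = 1. The EDX bit 9 test for an on-chip local APIC is the one described in Section 10.4.2, and the function name apic_features is hypothetical.

```python
APIC_BIT_EDX = 9      # CPUID.01H:EDX[9], on-chip local APIC present
X2APIC_BIT_ECX = 21   # CPUID.01H:ECX[21], x2APIC architecture supported


def apic_features(cpuid1_ecx, cpuid1_edx):
    """Decode APIC-related feature flags from CPUID leaf 01H register values."""
    return {
        "local_apic": bool((cpuid1_edx >> APIC_BIT_EDX) & 1),
        "x2apic": bool((cpuid1_ecx >> X2APIC_BIT_ECX) & 1),
    }
```

Note that a cleared EDX bit 9 may also mean that the local APIC has been globally disabled, because the feature flag is set to 0 in that case.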
NOTE
For P6 family, Pentium 4, and Intel Xeon processors, the APIC
handles all memory accesses to addresses within the 4-KByte APIC
register space internally and no external bus cycles are produced. For
the Pentium processors with an on-chip APIC, bus cycles are
produced for accesses to the APIC register space. Thus, for software
intended to run on Pentium processors, system software should
explicitly not map the APIC register space to regular system memory.
Doing so can result in an invalid opcode exception (#UD) being
generated or unpredictable execution.
Figure 10-4. Local APIC Structure
[Block diagram. Blocks shown: Version Register; Timer with Current Count, Initial Count, and Divide Configuration Registers; Local Vector Table with entries for Timer, Local Interrupts 0,1 (LINT0/1), Performance Monitoring Counters (note 1), Thermal Sensor (note 2), and Error; Error Status Register; Interrupt Command Register (ICR); APIC ID, Logical Destination, Destination Format, and Arb. ID (note 4) Registers; Acceptance Logic; Protocol Translation Logic; Prioritizer; Task Priority, Processor Priority, EOI, and Spurious Vector Registers; In-Service Register (ISR), Interrupt Request Register (IRR), and Trigger Mode Register (TMR). Signals to and from the CPU core include INTR, EXTINT, INTA, INIT, NMI, and SMI. The APIC connects to the Processor System Bus (note 3).]
NOTES:
1. Introduced in P6 family processors.
2. Introduced in the Pentium 4 and Intel Xeon processors.
3. Three-wire APIC bus in P6 family and Pentium processors.
4. Not implemented in Pentium 4 and Intel Xeon processors.
Table 10-1 shows how the APIC registers are mapped into the 4-KByte APIC register
space. Registers are 32 bits, 64 bits, or 256 bits in width; all are aligned on 128-bit
boundaries. All 32-bit registers should be accessed using 128-bit aligned 32-bit loads
or stores. Some processors may support loads and stores of less than 32 bits to some
of the APIC registers. This is model specific behavior and is not guaranteed to work
on all processors. Any FP/MMX/SSE access to an APIC register, or any access that
touches bytes 4 through 15 of an APIC register may cause undefined behavior and
must not be executed. This undefined behavior could include hangs, incorrect results
or unexpected exceptions, including machine checks, and may vary between
implementations. Wider registers (64-bit or 256-bit) must be accessed using multiple
32-bit loads or stores, with all accesses being 128-bit aligned.
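
The layout rules can be illustrated with a short Python sketch that computes which 32-bit register and bit position hold a given vector in one of the 256-bit registers (ISR, TMR, or IRR). The offsets are taken from Table 10-1 relative to the default base of FEE00000H. The function and constant names are hypothetical, and the code only performs address arithmetic; it does not touch hardware.

```python
APIC_DEFAULT_BASE = 0xFEE00000
REGISTER_STRIDE = 0x10   # registers are aligned on 128-bit (16-byte) boundaries
APIC_SPACE_SIZE = 0x1000  # 4-KByte register space

# Offsets of the first 32-bit slice of each 256-bit register (Table 10-1).
ISR_OFFSET, TMR_OFFSET, IRR_OFFSET = 0x100, 0x180, 0x200


def vector_bit_location(bank_offset, vector, base=APIC_DEFAULT_BASE):
    """Return (address, bit) of the 32-bit slice that holds the given vector."""
    if not 0 <= vector <= 255:
        raise ValueError("vector must be in the range 0 to 255")
    address = base + bank_offset + REGISTER_STRIDE * (vector // 32)
    return address, vector % 32


def is_valid_register_access(offset, size_bytes):
    """True for a 32-bit access at a 128-bit aligned offset inside the space."""
    return (size_bytes == 4
            and 0 <= offset < APIC_SPACE_SIZE
            and offset % REGISTER_STRIDE == 0)
```

For example, vector 31H (49) falls in the second ISR slice, so the sketch reports address FEE00110H, bit 17.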
The local APIC registers listed in Table 10-1 are not MSRs. The only MSR associated
with the programming of the local APIC is the IA32_APIC_BASE MSR (see Section
10.4.3, "Enabling or Disabling the Local APIC").
NOTE
In processors based on Intel Microarchitecture (Nehalem) the Local
APIC ID Register is no longer Read/Write; it is Read Only.
Table 10-1. Local APIC Register Address Map
Address Register Name Software Read/Write
FEE0 0000H Reserved
FEE0 0010H Reserved
FEE0 0020H Local APIC ID Register Read/Write.
FEE0 0030H Local APIC Version Register Read Only.
FEE0 0040H Reserved
FEE0 0050H Reserved
FEE0 0060H Reserved
FEE0 0070H Reserved
FEE0 0080H Task Priority Register (TPR) Read/Write.
FEE0 0090H Arbitration Priority Register (APR) (note 1) Read Only.
FEE0 00A0H Processor Priority Register (PPR) Read Only.
FEE0 00B0H EOI Register Write Only.
FEE0 00C0H Remote Read Register (RRD) (note 1) Read Only.
FEE0 00D0H Logical Destination Register Read/Write.
FEE0 00E0H Destination Format Register Read/Write (see Section 10.6.2.2).
FEE0 00F0H Spurious Interrupt Vector Register Read/Write (see Section 10.9).
FEE0 0100H In-Service Register (ISR); bits 31:0 Read Only.
FEE0 0110H In-Service Register (ISR); bits 63:32 Read Only.
FEE0 0120H In-Service Register (ISR); bits 95:64 Read Only.
FEE0 0130H In-Service Register (ISR); bits 127:96 Read Only.
FEE0 0140H In-Service Register (ISR); bits 159:128 Read Only.
FEE0 0150H In-Service Register (ISR); bits 191:160 Read Only.
FEE0 0160H In-Service Register (ISR); bits 223:192 Read Only.
FEE0 0170H In-Service Register (ISR); bits 255:224 Read Only.
FEE0 0180H Trigger Mode Register (TMR); bits 31:0 Read Only.
FEE0 0190H Trigger Mode Register (TMR); bits 63:32 Read Only.
FEE0 01A0H Trigger Mode Register (TMR); bits 95:64 Read Only.
FEE0 01B0H Trigger Mode Register (TMR); bits 127:96 Read Only.
FEE0 01C0H Trigger Mode Register (TMR); bits 159:128 Read Only.
FEE0 01D0H Trigger Mode Register (TMR); bits 191:160 Read Only.
FEE0 01E0H Trigger Mode Register (TMR); bits 223:192 Read Only.
FEE0 01F0H Trigger Mode Register (TMR); bits 255:224 Read Only.
FEE0 0200H Interrupt Request Register (IRR); bits 31:0 Read Only.
FEE0 0210H Interrupt Request Register (IRR); bits 63:32 Read Only.
FEE0 0220H Interrupt Request Register (IRR); bits 95:64 Read Only.
FEE0 0230H Interrupt Request Register (IRR); bits 127:96 Read Only.
FEE0 0240H Interrupt Request Register (IRR); bits 159:128 Read Only.
FEE0 0250H Interrupt Request Register (IRR); bits 191:160 Read Only.
FEE0 0260H Interrupt Request Register (IRR); bits 223:192 Read Only.
FEE0 0270H Interrupt Request Register (IRR); bits 255:224 Read Only.
FEE0 0280H Error Status Register Read Only.
FEE0 0290H through FEE0 02E0H Reserved
FEE0 02F0H LVT CMCI Registers Read/Write.
FEE0 0300H Interrupt Command Register (ICR); bits 0-31 Read/Write.
FEE0 0310H Interrupt Command Register (ICR); bits 32-63 Read/Write.
FEE0 0320H LVT Timer Register Read/Write.
FEE0 0330H LVT Thermal Sensor Register (note 2) Read/Write.
FEE0 0340H LVT Performance Monitoring Counters Register (note 3) Read/Write.
FEE0 0350H LVT LINT0 Register Read/Write.
FEE0 0360H LVT LINT1 Register Read/Write.
FEE0 0370H LVT Error Register Read/Write.
FEE0 0380H Initial Count Register (for Timer) Read/Write.
FEE0 0390H Current Count Register (for Timer) Read Only.
FEE0 03A0H through FEE0 03D0H Reserved
FEE0 03E0H Divide Configuration Register (for Timer) Read/Write.
FEE0 03F0H Reserved
NOTES:
1. Not supported in the Pentium 4 and Intel Xeon processors. The Illegal Register Access bit (7) of
the ESR will not be set when writing to these registers.
2. Introduced in the Pentium 4 and Intel Xeon processors. This APIC register and its associated
function are implementation dependent and may not be present in future IA-32 or Intel 64
processors.
3. Introduced in the Pentium Pro processor. This APIC register and its associated function are
implementation dependent and may not be present in future IA-32 or Intel 64 processors.
10.4.2 Presence of the Local APIC
Beginning with the P6 family processors, the presence or absence of an on-chip local
APIC can be detected using the CPUID instruction. When the CPUID instruction is
executed with a source operand of 1 in the EAX register, bit 9 of the CPUID feature
flags returned in the EDX register indicates the presence (set) or absence (clear) of a
local APIC.
10.4.3 Enabling or Disabling the Local APIC
The local APIC can be enabled or disabled in either of two ways:
1. Using the APIC global enable/disable flag in the IA32_APIC_BASE MSR (MSR address 1BH; see Figure 10-5):
   - When IA32_APIC_BASE[11] is 0, the processor is functionally equivalent to an IA-32 processor without an on-chip APIC. The CPUID feature flag for the APIC (see Section 10.4.2, "Presence of the Local APIC") is also set to 0.
   - When IA32_APIC_BASE[11] is set to 0, processor APICs based on the 3-wire APIC bus cannot be generally re-enabled until a system hardware reset. The 3-wire bus loses track of arbitration that would be necessary for complete re-enabling. Certain APIC functionality can be enabled (for example: performance and thermal monitoring interrupt generation).
   - For processors that use Front Side Bus (FSB) delivery of interrupts, software may disable or enable the APIC by setting and resetting IA32_APIC_BASE[11]. A hardware reset is not required to restart APIC functionality, if software guarantees no interrupt will be sent to the APIC as IA32_APIC_BASE[11] is cleared.
   - When IA32_APIC_BASE[11] is set to 0, prior initialization to the APIC may be lost and the APIC may return to the state described in Section 10.4.7.1, "Local APIC State After Power-Up or Reset."
2. Using the APIC software enable/disable flag in the spurious-interrupt vector register (see Figure 10-23):
   - If IA32_APIC_BASE[11] is 1, software can temporarily disable a local APIC at any time by clearing the APIC software enable/disable flag in the spurious-interrupt vector register (see Figure 10-23). The state of the local APIC when in this software-disabled state is described in Section 10.4.7.2, "Local APIC State After It Has Been Software Disabled."
   - When the local APIC is in the software-disabled state, it can be re-enabled at any time by setting the APIC software enable/disable flag to 1.
For the Pentium processor, the APICEN pin (which is shared with the PICD1 pin) is used during power-up or RESET to disable the local APIC.
Note that each entry in the LVT has a mask bit that can be used to inhibit interrupts from being delivered to the processor from selected local interrupt sources (the LINT0 and LINT1 pins, the APIC timer, the performance-monitoring counters, the thermal sensor, and/or the internal APIC error detector).
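The two enable flags described above can be sketched as plain bit operations on register values. This is a minimal illustration, not an Intel-provided interface: the function names are hypothetical, and the code only manipulates copies of the values. Reading or writing the real IA32_APIC_BASE MSR (RDMSR/WRMSR at privilege level 0) and the memory-mapped spurious-interrupt vector register is assumed to happen elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APIC_BASE_GLOBAL_ENABLE (UINT64_C(1) << 11) /* IA32_APIC_BASE[11] */
#define SVR_SOFTWARE_ENABLE     (UINT32_C(1) << 8)  /* spurious-interrupt vector register, bit 8 */

/* Set or clear the global enable flag in a copy of an IA32_APIC_BASE value. */
uint64_t apic_base_with_global_enable(uint64_t msr, bool enable)
{
    return enable ? (msr | APIC_BASE_GLOBAL_ENABLE)
                  : (msr & ~APIC_BASE_GLOBAL_ENABLE);
}

/* Set or clear the software enable flag in a copy of an SVR value. */
uint32_t svr_with_software_enable(uint32_t svr, bool enable)
{
    return enable ? (svr | SVR_SOFTWARE_ENABLE)
                  : (svr & ~SVR_SOFTWARE_ENABLE);
}

/* The local APIC operates normally only when both flags are 1. */
bool apic_fully_enabled(uint64_t msr, uint32_t svr)
{
    return (msr & APIC_BASE_GLOBAL_ENABLE) != 0 &&
           (svr & SVR_SOFTWARE_ENABLE) != 0;
}

/* Usage: start from the reset SVR value 000000FFH and enable the APIC in software. */
void demo_enable_flags(void)
{
    uint64_t msr = UINT64_C(0xFEE00900); /* base FEE00000H, BSP flag, global enable */
    uint32_t svr = UINT32_C(0x000000FF); /* reset value: software disabled */

    assert(!apic_fully_enabled(msr, svr));
    svr = svr_with_software_enable(svr, true);
    assert(svr == UINT32_C(0x000001FF));
    assert(apic_fully_enabled(msr, svr));
}
```

Keeping the two flags in separate helpers mirrors the text: the global flag may not be recoverable without a hardware reset on 3-wire APIC bus systems, while the software flag can be toggled at any time.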
10.4.4 Local APIC Status and Location
The status and location of the local APIC are contained in the IA32_APIC_BASE MSR (see Figure 10-5). MSR bit functions are described below:
- BSP flag, bit 8: Indicates if the processor is the bootstrap processor (BSP). See Section 8.4, "Multiple-Processor (MP) Initialization." Following a power-up or RESET, this flag is set to 1 for the processor selected as the BSP and set to 0 for the remaining processors (APs).
- APIC Global Enable flag, bit 11: Enables or disables the local APIC (see Section 10.4.3, "Enabling or Disabling the Local APIC"). This flag is available in the Pentium 4, Intel Xeon, and P6 family processors. It is not guaranteed to be available or available at the same location in future Intel 64 or IA-32 processors.
- APIC Base field, bits 12 through 35: Specifies the base address of the APIC registers. This 24-bit value is extended by 12 bits at the low end to form the base address. This automatically aligns the address on a 4-KByte boundary. Following a power-up or RESET, the field is set to FEE0 0000H.
Bits 0 through 7, bits 9 and 10, and bits MAXPHYADDR (see footnote 1) through 63 in the IA32_APIC_BASE MSR are reserved.
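As an illustration of the field layout just described, the following sketch decodes a 64-bit IA32_APIC_BASE value into its three defined fields. The struct and function names are hypothetical, and the caller is assumed to supply MAXPHYADDR (36 when CPUID leaf 80000008H is not supported).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of an IA32_APIC_BASE value (names are illustrative only). */
struct apic_base_info {
    bool     is_bsp;        /* bit 8 */
    bool     global_enable; /* bit 11 */
    uint64_t base;          /* bits 12 through MAXPHYADDR-1, 4-KByte aligned */
};

struct apic_base_info decode_apic_base(uint64_t msr, unsigned maxphyaddr)
{
    assert(maxphyaddr > 12 && maxphyaddr < 64);

    /* Mask covering the APIC Base field: bits 12 .. maxphyaddr-1. */
    uint64_t field_mask = ((UINT64_C(1) << maxphyaddr) - 1) & ~UINT64_C(0xFFF);

    struct apic_base_info info;
    info.is_bsp        = ((msr >> 8) & 1) != 0;
    info.global_enable = ((msr >> 11) & 1) != 0;
    info.base          = msr & field_mask;
    return info;
}

/* Usage: a BSP with the APIC globally enabled at the default base address. */
void demo_decode_apic_base(void)
{
    struct apic_base_info info = decode_apic_base(UINT64_C(0xFEE00900), 36);

    assert(info.is_bsp);
    assert(info.global_enable);
    assert(info.base == UINT64_C(0xFEE00000));
}
```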
10.4.5 Relocating the Local APIC Registers
The Pentium 4, Intel Xeon, and P6 family processors permit the starting address of the APIC registers to be relocated from FEE00000H to another physical address by modifying the value in the 24-bit base address field of the IA32_APIC_BASE MSR. This extension of the APIC architecture is provided to help resolve conflicts with memory maps of existing systems and to allow individual processors in an MP system to map their APIC registers to different locations in physical memory.
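A relocation amounts to replacing only the base address field while preserving the flag bits. The sketch below shows one way to compute the new MSR value; the function name and the target address D000 0000H are hypothetical, and actually writing the result with WRMSR is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Replace the APIC Base field of *msr with new_base.
 * Returns false (leaving *msr untouched) if new_base is not 4-KByte aligned
 * or does not fit below MAXPHYADDR.
 */
bool apic_base_relocate(uint64_t *msr, uint64_t new_base, unsigned maxphyaddr)
{
    if (maxphyaddr <= 12 || maxphyaddr >= 64)
        return false;

    uint64_t field_mask = ((UINT64_C(1) << maxphyaddr) - 1) & ~UINT64_C(0xFFF);

    if ((new_base & ~field_mask) != 0)
        return false; /* misaligned, or bits at or above MAXPHYADDR are set */

    *msr = (*msr & ~field_mask) | new_base;
    return true;
}

/* Usage: move the registers from FEE00000H to a hypothetical D0000000H. */
void demo_relocate(void)
{
    uint64_t msr = UINT64_C(0xFEE00900);

    assert(apic_base_relocate(&msr, UINT64_C(0xD0000000), 36));
    assert(msr == UINT64_C(0xD0000900)); /* BSP and enable flags preserved */

    assert(!apic_base_relocate(&msr, UINT64_C(0xD0000800), 36)); /* not aligned */
    assert(msr == UINT64_C(0xD0000900));
}
```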
10.4.6 Local APIC ID
At power up, system hardware assigns a unique APIC ID to each local APIC on the system bus (for Pentium 4 and Intel Xeon processors) or on the APIC bus (for P6 family and Pentium processors). The hardware assigned APIC ID is based on system topology and includes encoding for socket position and cluster information (see Figure 8-2).
In MP systems, the local APIC ID is also used as a processor ID by the BIOS and the operating system. Some processors permit software to modify the APIC ID. However, the ability of software to modify the APIC ID is processor model specific. Because of
1. The MAXPHYADDR is 36 bits for processors that do not support CPUID leaf 80000008H, or indicated by CPUID.80000008H:EAX[bits 7:0] for processors that support CPUID leaf 80000008H.
Figure 10-5. IA32_APIC_BASE MSR (APIC_BASE_MSR in P6 Family)
Bits 63 through MAXPHYADDR: Reserved. Bits MAXPHYADDR-1 through 12: APIC Base (base physical address). Bit 11: APIC global enable/disable. Bits 10 and 9: Reserved. Bit 8: BSP (processor is BSP). Bits 7 through 0: Reserved.
this, operating system software should avoid writing to the local APIC ID register. The value returned by bits 31-24 of the EBX register (when the CPUID instruction is executed with a source operand value of 1 in the EAX register) is always the Initial APIC ID (determined by the platform initialization). This is true even if software has changed the value in the Local APIC ID register.
The processor receives the hardware assigned APIC ID (or Initial APIC ID) by sampling pins A11# and A12# and pins BR0# through BR3# (for the Pentium 4, Intel Xeon, and P6 family processors) and pins BE0# through BE3# (for the Pentium processor). The APIC ID latched from these pins is stored in the APIC ID field of the local APIC ID register (see Figure 10-6), and is used as the Initial APIC ID for the processor.
For the P6 family and Pentium processors, the local APIC ID field in the local APIC ID register is 4 bits. Encodings 0H through EH can be used to uniquely identify 15 different processors connected to the APIC bus. For the Pentium 4 and Intel Xeon processors, the xAPIC specification extends the local APIC ID field to 8 bits. These can be used to identify up to 255 processors in the system.
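The two ID widths and the CPUID-reported Initial APIC ID can be extracted as follows. This is a sketch with hypothetical function names; it assumes the caller has already read the local APIC ID register (FEE0 0020H) and executed CPUID with EAX = 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Extract the APIC ID from a local APIC ID register value.
 * xapic == true:  8-bit field in bits 31:24 (Pentium 4, Intel Xeon).
 * xapic == false: 4-bit field in bits 27:24 (P6 family, Pentium).
 */
uint8_t apic_id_from_register(uint32_t id_reg, bool xapic)
{
    uint32_t field = id_reg >> 24;
    return (uint8_t)(xapic ? field : (field & 0xF));
}

/* Initial APIC ID as reported in CPUID.1:EBX[31:24]. */
uint8_t initial_apic_id_from_cpuid(uint32_t cpuid1_ebx)
{
    return (uint8_t)(cpuid1_ebx >> 24);
}

/* Usage: the same register value read with both field widths. */
void demo_apic_id(void)
{
    uint32_t id_reg = UINT32_C(0x1B000000);

    assert(apic_id_from_register(id_reg, true)  == 0x1B);
    assert(apic_id_from_register(id_reg, false) == 0x0B);
    assert(initial_apic_id_from_cpuid(UINT32_C(0x05100800)) == 0x05);
}
```

Because software may have rewritten the local APIC ID register on some models, comparing the two values is one way to notice that the register no longer holds the Initial APIC ID.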
10.4.7 Local APIC State
The following sections describe the state of the local APIC and its registers following a power-up or RESET, after the local APIC has been software disabled, following an INIT reset, and following an INIT-deassert message.
Figure 10-6. Local APIC ID Register
Address: 0FEE0 0020H. Value after reset: 0000 0000H.
P6 family and Pentium processors: APIC ID in bits 27:24; remaining bits reserved.
Pentium 4 processors, Xeon processors, and later processors: APIC ID in bits 31:24; bits 23:0 reserved.
x2APIC mode: MSR address 802H; x2APIC ID in bits 31:0.
x2APIC will introduce a 32-bit ID; see Section 10.12.
10.4.7.1 Local APIC State After Power-Up or Reset
Following a power-up or RESET of the processor, the state of the local APIC and its registers is as follows:
- The following registers are reset to all 0s:
  - IRR, ISR, TMR, ICR, LDR, and TPR
  - Timer initial count and timer current count registers
  - Divide configuration register
- The DFR register is reset to all 1s.
- The LVT register is reset to 0s except for the mask bits; these are set to 1s.
- The local APIC version register is not affected.
- The local APIC ID register is set to a unique APIC ID. (Pentium and P6 family processors only). The Arb ID register is set to the value in the APIC ID register.
- The spurious-interrupt vector register is initialized to 000000FFH. By setting bit 8 to 0, software disables the local APIC.
- If the processor is the only processor in the system or it is the BSP in an MP system (see Section 8.4.1, "BSP and AP Processors"), the local APIC will respond normally to INIT and NMI messages, to INIT# signals and to STPCLK# signals. If the processor is in an MP system and has been designated as an AP, the local APIC will respond the same as for the BSP. In addition, it will respond to SIPI messages. For P6 family processors only, an AP will not respond to a STPCLK# signal.
10.4.7.2 Local APIC State After It Has Been Software Disabled
When the APIC software enable/disable flag in the spurious-interrupt vector register has been explicitly cleared (as opposed to being cleared during a power up or RESET), the local APIC is temporarily disabled (see Section 10.4.3, "Enabling or Disabling the Local APIC"). The operation and response of a local APIC while in this software-disabled state is as follows:
- The local APIC will respond normally to INIT, NMI, SMI, and SIPI messages.
- Pending interrupts in the IRR and ISR registers are held and require masking or handling by the CPU.
- The local APIC can still issue IPIs. It is software's responsibility to avoid issuing IPIs through the IPI mechanism and the ICR register if sending interrupts through this mechanism is not desired.
- The reception or transmission of any IPIs that are in progress when the local APIC is disabled is completed before the local APIC enters the software-disabled state.
- The mask bits for all the LVT entries are set. Attempts to reset these bits will be ignored.
- (For Pentium and P6 family processors) The local APIC continues to listen to all bus messages in order to keep its arbitration ID synchronized with the rest of the system.
10.4.7.3 Local APIC State After an INIT Reset (Wait-for-SIPI State)
An INIT reset of the processor can be initiated in either of two ways:
- By asserting the processor's INIT# pin.
- By sending the processor an INIT IPI (an IPI with the delivery mode set to INIT).
Upon receiving an INIT through either of these mechanisms, the processor responds by beginning the initialization process of the processor core and the local APIC. The state of the local APIC following an INIT reset is the same as it is after a power-up or hardware RESET, except that the APIC ID and arbitration ID registers are not affected. This state is also referred to as the wait-for-SIPI state (see also: Section 8.4.2, "MP Initialization Protocol Requirements and Restrictions").
10.4.7.4 Local APIC State After It Receives an INIT-Deassert IPI
Only the Pentium and P6 family processors support the INIT-deassert IPI. An INIT-deassert IPI has no effect on the state of the APIC, other than to reload the arbitration ID register with the value in the APIC ID register.
10.4.8 Local APIC Version Register
The local APIC contains a hardwired version register. Software can use this register to identify the APIC version (see Figure 10-7). In addition, the register specifies the number of entries in the local vector table (LVT) for a specific implementation.
The fields in the local APIC version register are as follows:
Version: The version numbers of the local APIC:
    1XH: Local APIC. For Pentium 4 and Intel Xeon processors, 14H is returned.
    0XH: 82489DX external APIC.
    20H - FFH: Reserved.
Max LVT Entry: Shows the number of LVT entries minus 1. For the Pentium 4 and Intel Xeon processors (which have 6 LVT entries), the value returned in the Max LVT field is 5; for the P6 family processors (which have 5 LVT entries), the value returned is 4; for the Pentium processor (which has 4 LVT entries), the value returned is 3.
Suppress EOI-broadcasts: Indicates whether software can inhibit the broadcast of EOI messages by setting bit 12 of the Spurious Interrupt Vector Register; see Section 10.8.5 and Section 10.9.
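The version register fields can be decoded with a few shifts and masks. The sketch below follows the bit positions given in Figure 10-7 (version in bits 7:0, Max LVT Entry in bits 23:16, EOI-broadcast suppression support in bit 24). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded local APIC version register (names are illustrative only). */
struct apic_version_info {
    uint8_t  version;          /* bits 7:0 */
    unsigned lvt_entries;      /* Max LVT Entry (bits 23:16) plus 1 */
    bool     eoi_suppress;     /* bit 24 */
    bool     integrated_apic;  /* version 1XH: local APIC; 0XH: 82489DX */
};

struct apic_version_info decode_apic_version(uint32_t reg)
{
    struct apic_version_info info;
    info.version         = (uint8_t)(reg & 0xFF);
    info.lvt_entries     = ((reg >> 16) & 0xFF) + 1;
    info.eoi_suppress    = ((reg >> 24) & 1) != 0;
    info.integrated_apic = (info.version & 0xF0) == 0x10;
    return info;
}

/* Usage: the value a Pentium 4 or Intel Xeon processor would report
 * (version 14H, six LVT entries, no EOI-broadcast suppression). */
void demo_apic_version(void)
{
    struct apic_version_info info = decode_apic_version(UINT32_C(0x00050014));

    assert(info.version == 0x14);
    assert(info.lvt_entries == 6);
    assert(!info.eoi_suppress);
    assert(info.integrated_apic);
}
```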
10.5 HANDLING LOCAL INTERRUPTS
The following sections describe facilities that are provided in the local APIC for handling local interrupts. These include: the processor's LINT0 and LINT1 pins, the APIC timer, the performance-monitoring counters, the thermal sensor, and the internal APIC error detector. Local interrupt handling facilities include: the LVT, the error status register (ESR), the divide configuration register (DCR), and the initial count and current count registers.
10.5.1 Local Vector Table
The local vector table (LVT) allows software to specify the manner in which the local interrupts are delivered to the processor core. It consists of the following 32-bit APIC registers (see Figure 10-8), one for each local interrupt:
- LVT Timer Register (FEE0 0320H): Specifies interrupt delivery when the APIC timer signals an interrupt (see Section 10.5.4, "APIC Timer").
- LVT Thermal Monitor Register (FEE0 0330H): Specifies interrupt delivery when the thermal sensor generates an interrupt (see Section 14.5.2, "Thermal Monitor"). This LVT entry is implementation specific, not architectural. If implemented, it will always be at base address FEE0 0330H.
- LVT Performance Counter Register (FEE0 0340H): Specifies interrupt delivery when a performance counter generates an interrupt on overflow (see Section 30.8.5.8, "Generating an Interrupt on Overflow"). This LVT entry is implementation specific, not architectural. If implemented, it is not guaranteed to be at base address FEE0 0340H.
Figure 10-7. Local APIC Version Register
Address: FEE0 0030H. Value after reset: 00BN 00VVH (V = Version, N = # of LVT entries minus 1, B = 1 if EOI-broadcast suppression supported).
Bits 31:25: Reserved. Bit 24: Support for EOI-broadcast suppression. Bits 23:16: Max LVT Entry. Bits 15:8: Reserved. Bits 7:0: Version.
- LVT LINT0 Register (FEE0 0350H): Specifies interrupt delivery when an interrupt is signaled at the LINT0 pin.
- LVT LINT1 Register (FEE0 0360H): Specifies interrupt delivery when an interrupt is signaled at the LINT1 pin.
- LVT Error Register (FEE0 0370H): Specifies interrupt delivery when the APIC detects an internal error (see Section 10.5.3, "Error Handling").
- CMCI LVT Register (FEE0 02F0H): Specifies interrupt delivery when the count of corrected machine check errors reaches a threshold value in a machine check bank supporting CMCI (see Section 15.5.1, "CMCI Local APIC Interface").
The LVT performance counter register and its associated interrupt were introduced in the P6 processors and are also present in the Pentium 4 and Intel Xeon processors. The LVT thermal monitor register and its associated interrupt were introduced in the Pentium 4 and Intel Xeon processors.
As shown in Figure 10-8, some of these fields and flags are not available (and reserved) for some entries.
The setup information that can be specified in the registers of the LVT table is as follows:
Vector: Interrupt vector number.
Delivery Mode: Specifies the type of interrupt to be sent to the processor. Some delivery modes will only operate as intended when used in conjunction with a specific trigger mode. The allowable delivery modes are as follows:
    000 (Fixed): Delivers the interrupt specified in the vector field.
    010 (SMI): Delivers an SMI interrupt to the processor core through the processor's local SMI signal path. When using this delivery mode, the vector field should be set to 00H for future compatibility.
    100 (NMI): Delivers an NMI interrupt to the processor. The vector information is ignored.
    101 (INIT): Delivers an INIT request to the processor core, which causes the processor to perform an INIT. When using this delivery mode, the vector field should be set to 00H for future compatibility.
Figure 10-8. Local Vector Table (LVT)
Registers shown: Timer (Address: FEE0 0320H), Thermal Sensor (Address: FEE0 0330H), Performance Mon. Counters (Address: FEE0 0340H), LINT0 (Address: FEE0 0350H), LINT1 (Address: FEE0 0360H), Error (Address: FEE0 0370H). Value after reset: 0001 0000H.
Fields: Vector (bits 7:0). Delivery Mode (bits 10:8): 000 Fixed, 010 SMI, 100 NMI, 101 INIT, 111 ExtINT; all other combinations are reserved. Delivery Status (bit 12): 0 Idle, 1 Send Pending. Interrupt Input Pin Polarity (bit 13). Remote IRR (bit 14). Trigger Mode (bit 15): 0 Edge, 1 Level. Mask (bit 16): 0 Not Masked, 1 Masked. Timer Mode (bit 17): 0 One-shot, 1 Periodic.
Note (Pentium 4 and Intel Xeon processors): When a performance monitoring counters interrupt is generated, the mask bit for its associated LVT entry is set.
    111 (ExtINT): Causes the processor to respond to the interrupt as if the interrupt originated in an externally connected (8259A-compatible) interrupt controller. A special INTA bus cycle corresponding to ExtINT is routed to the external controller. The external controller is expected to supply the vector information. The APIC architecture supports only one ExtINT source in a system, usually contained in the compatibility bridge.
Delivery Status (Read Only): Indicates the interrupt delivery status, as follows:
    0 (Idle): There is currently no activity for this interrupt source, or the previous interrupt from this source was delivered to the processor core and accepted.
    1 (Send Pending): Indicates that an interrupt from this source has been delivered to the processor core, but has not yet been accepted (see Section 10.5.5, "Local Interrupt Acceptance").
Interrupt Input Pin Polarity: Specifies the polarity of the corresponding interrupt pin: (0) active high or (1) active low.
Remote IRR Flag (Read Only): For fixed mode, level-triggered interrupts, this flag is set when the local APIC accepts the interrupt for servicing and is reset when an EOI command is received from the processor. The meaning of this flag is undefined for edge-triggered interrupts and other delivery modes.
Trigger Mode: Selects the trigger mode for the local LINT0 and LINT1 pins: (0) edge sensitive and (1) level sensitive. This flag is only used when the delivery mode is Fixed. When the delivery mode is NMI, SMI, or INIT, the trigger mode is always edge sensitive. When the delivery mode is ExtINT, the trigger mode is always level sensitive. The timer and error interrupts are always treated as edge sensitive.
    If the local APIC is not used in conjunction with an I/O APIC and fixed delivery mode is selected, the Pentium 4, Intel Xeon, and P6 family processors will always use level-sensitive triggering, regardless of whether edge-sensitive triggering is selected.
Mask: Interrupt mask: (0) enables reception of the interrupt and (1) inhibits reception of the interrupt. When the local APIC handles a performance-monitoring counters interrupt, it automatically
    sets the mask flag in the corresponding LVT entry. This flag will remain set until software clears it.
Timer Mode: Selects the timer mode: (0) one-shot and (1) periodic (see Section 10.5.4, "APIC Timer").
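The LVT fields listed above can be combined into a 32-bit register value as sketched below. The bit positions follow Figure 10-8; the macro, enum, and function names are hypothetical, and storing the value to the memory-mapped LVT register is assumed to happen elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Delivery mode encodings (LVT bits 10:8). */
enum lvt_delivery_mode {
    LVT_DM_FIXED  = 0x0,
    LVT_DM_SMI    = 0x2,
    LVT_DM_NMI    = 0x4,
    LVT_DM_INIT   = 0x5,
    LVT_DM_EXTINT = 0x7
};

#define LVT_POLARITY_LOW   (UINT32_C(1) << 13) /* interrupt input pin polarity */
#define LVT_TRIGGER_LEVEL  (UINT32_C(1) << 15) /* trigger mode */
#define LVT_MASKED         (UINT32_C(1) << 16) /* mask */
#define LVT_TIMER_PERIODIC (UINT32_C(1) << 17) /* timer mode (timer entry only) */

/* Build a LINT0/LINT1-style LVT entry from its writable fields. */
uint32_t lvt_make(uint8_t vector, enum lvt_delivery_mode mode, uint32_t flags)
{
    /* Only writable, non-timer flags are accepted here. */
    assert((flags & ~(LVT_POLARITY_LOW | LVT_TRIGGER_LEVEL | LVT_MASKED)) == 0);
    return (uint32_t)vector | ((uint32_t)mode << 8) | flags;
}

/* Build an LVT timer entry; the timer entry has no delivery mode field. */
uint32_t lvt_timer_make(uint8_t vector, bool periodic, bool masked)
{
    return (uint32_t)vector |
           (periodic ? LVT_TIMER_PERIODIC : 0) |
           (masked ? LVT_MASKED : 0);
}

/* Usage: a few representative entries. */
void demo_lvt(void)
{
    /* The reset value 0001 0000H is a masked Fixed entry with vector 0. */
    assert(lvt_make(0x00, LVT_DM_FIXED, LVT_MASKED) == UINT32_C(0x00010000));
    /* NMI on a LINT pin; the vector is ignored for NMI. */
    assert(lvt_make(0x00, LVT_DM_NMI, 0) == UINT32_C(0x00000400));
    /* Periodic timer interrupt on hypothetical vector 40H, unmasked. */
    assert(lvt_timer_make(0x40, true, false) == UINT32_C(0x00020040));
}
```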
10.5.2 Valid Interrupt Vectors
The Intel 64 and IA-32 architectures define 256 vector numbers, ranging from 0 through 255 (see Section 6.2, "Exception and Interrupt Vectors"). Local and I/O APICs support 240 of these vectors (in the range of 16 to 255) as valid interrupts.
When an interrupt vector in the range of 0 to 15 is sent or received through the local APIC, the APIC indicates an illegal vector in its Error Status Register (see Section 10.5.3, "Error Handling"). The Intel 64 and IA-32 architectures reserve vectors 16 through 31 for predefined interrupts, exceptions, and Intel-reserved encodings (see Table 6-1). However, the local APIC does not treat vectors in this range as illegal.
When an illegal vector value (0 to 15) is written to an LVT entry and the delivery mode is Fixed (bits 8-11 equal 0), the APIC may signal an illegal vector error, without regard to whether the mask bit is set or whether an interrupt is actually seen on the input.
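The three vector ranges described above can be captured in a small classifier. This is a sketch with hypothetical names; the second function follows the text's wording that a Fixed entry (bits 8-11 equal to 0) with a vector below 16 may raise an illegal vector error when written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum vector_class {
    VECTOR_ILLEGAL,            /* 0-15: flagged in the ESR by the local APIC */
    VECTOR_ARCH_RESERVED,      /* 16-31: reserved by the architecture, not illegal to the APIC */
    VECTOR_AVAILABLE           /* 32-255 */
};

enum vector_class classify_apic_vector(uint8_t vector)
{
    if (vector < 16)
        return VECTOR_ILLEGAL;
    if (vector < 32)
        return VECTOR_ARCH_RESERVED;
    return VECTOR_AVAILABLE;
}

/* True if writing this value to an LVT entry may signal an illegal vector error. */
bool lvt_write_may_signal_illegal_vector(uint32_t lvt_value)
{
    bool fixed_mode = (lvt_value & UINT32_C(0x00000F00)) == 0; /* bits 8-11 equal 0 */
    return fixed_mode && classify_apic_vector((uint8_t)(lvt_value & 0xFF)) == VECTOR_ILLEGAL;
}

/* Usage: note that the mask bit does not suppress the check. */
void demo_vectors(void)
{
    assert(classify_apic_vector(15) == VECTOR_ILLEGAL);
    assert(classify_apic_vector(16) == VECTOR_ARCH_RESERVED);
    assert(classify_apic_vector(32) == VECTOR_AVAILABLE);
    assert(lvt_write_may_signal_illegal_vector(UINT32_C(0x00010008)));  /* masked, Fixed, vector 8 */
    assert(!lvt_write_may_signal_illegal_vector(UINT32_C(0x00000400))); /* NMI, vector ignored */
}
```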
10.5.3 Error Handling
The local APIC provides an error status register (ESR) that it uses to record errors that it detects when handling interrupts (see Figure 10-9). An APIC error interrupt is generated when the local APIC sets one of the error bits in the ESR. The LVT error register allows selection of the interrupt vector to be delivered to the processor core when an APIC error is detected. The LVT error register also provides a means of masking an APIC error interrupt.
The ESR is a write/read register. A write (of any value) to the ESR must be done to update the register before attempting to read it. This write clears any previously logged errors and updates the ESR with any errors detected since the last write to the ESR. Errors are collected regardless of the LVT Error mask bit, but the APIC will only issue an interrupt due to the error if the LVT Error mask bit is cleared.
The functions of the ESR are listed in Table 10-2.
Error handling in x2APIC mode is discussed in Section 10.12.8.
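The write-then-read protocol can be illustrated with a small software model of the ESR. This is only a sketch of the behavior described above, not a description of the hardware implementation: the struct, its two fields, and the function names are hypothetical. The bit positions are those shown in Figure 10-9.

```c
#include <assert.h>
#include <stdint.h>

#define ESR_SEND_ILLEGAL_VECTOR  (UINT32_C(1) << 5)
#define ESR_RECV_ILLEGAL_VECTOR  (UINT32_C(1) << 6)
#define ESR_ILLEGAL_REG_ADDRESS  (UINT32_C(1) << 7)

/* Toy model: errors accumulate internally and become visible on a write. */
struct esr_model {
    uint32_t collected; /* errors detected since the last write */
    uint32_t visible;   /* value returned by a read */
};

/* The APIC detects an error (collected regardless of the LVT Error mask). */
void esr_model_detect(struct esr_model *esr, uint32_t error_bits)
{
    esr->collected |= error_bits;
}

/* A write of any value clears old errors and exposes newly collected ones. */
void esr_model_write(struct esr_model *esr)
{
    esr->visible = esr->collected;
    esr->collected = 0;
}

uint32_t esr_model_read(const struct esr_model *esr)
{
    return esr->visible;
}

/* Usage: an error is not readable until software writes the ESR. */
void demo_esr(void)
{
    struct esr_model esr = { 0, 0 };

    esr_model_detect(&esr, ESR_SEND_ILLEGAL_VECTOR);
    assert(esr_model_read(&esr) == 0);          /* stale until written */

    esr_model_write(&esr);
    assert(esr_model_read(&esr) == ESR_SEND_ILLEGAL_VECTOR);

    esr_model_write(&esr);                      /* second write clears the log */
    assert(esr_model_read(&esr) == 0);
}
```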
Figure 10-9. Error Status Register (ESR)
Address: FEE0 0280H. Value after reset: 0H.
Bits 31:8: Reserved. Bit 7: Illegal Register Address (note 1). Bit 6: Received Illegal Vector. Bit 5: Send Illegal Vector. Bit 4: Reserved. Bit 3: Receive Accept Error (note 2). Bit 2: Send Accept Error (note 2). Bit 1: Receive Checksum Error (note 2). Bit 0: Send Checksum Error (note 2).
NOTES:
1. Used in Intel Core, Pentium 4, Intel Xeon, and P6 family processors; reserved in the Pentium processor.
2. Only used in the P6 family and Pentium processors; reserved in Intel Core, Pentium 4 and Intel Xeon processors.
Table 10-2. ESR Flags
Flag | Function
Send Checksum Error | (P6 family and Pentium processors only) Set when the local APIC detects a checksum error for a message that it sent on the APIC bus.
Receive Checksum Error | (P6 family and Pentium processors only) Set when the local APIC detects a checksum error for a message that it received on the APIC bus.
Send Accept Error | (P6 family and Pentium processors only) Set when the local APIC detects that a message it sent was not accepted by any APIC on the APIC bus.
Receive Accept Error | (P6 family and Pentium processors only) Set when the local APIC detects that the message it received was not accepted by any APIC on the APIC bus, including itself.
Send Illegal Vector | Set when the local APIC detects an illegal vector in the message that it is sending.
Receive Illegal Vector | Set when the local APIC detects an illegal vector in the message it received, including an illegal vector code in the local vector table interrupts or in a self-interrupt.
10.5.4 APIC Timer
The local APIC unit contains a 32-bit programmable timer that is available to software to time events or operations. This timer is set up by programming four registers: the divide configuration register (see Figure 10-10), the initial-count and current-count registers (see Figure 10-11), and the LVT timer register (see Figure 10-8).
If CPUID.06H:EAX.ARAT[bit 2] = 1, the processor's APIC timer runs at a constant rate regardless of P-state transitions and it continues to run at the same rate in deep C-states.
If CPUID.06H:EAX.ARAT[bit 2] = 0 or if CPUID 06H is not supported, the APIC timer may temporarily stop while the processor is in deep C-states or during transitions caused by Enhanced Intel SpeedStep Technology.
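The ARAT check reduces to testing one bit, with the extra condition that CPUID leaf 06H must exist. In this sketch the function name is hypothetical, and the caller is assumed to pass in the maximum basic leaf (from CPUID.0:EAX) and the EAX value returned by CPUID leaf 06H.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if the APIC timer runs at a constant rate across P-state
 * transitions and deep C-states (CPUID.06H:EAX.ARAT[bit 2] = 1).
 * If leaf 06H is not supported, the timer must be assumed to stop at times.
 */
bool apic_timer_always_running(uint32_t max_basic_leaf, uint32_t cpuid6_eax)
{
    if (max_basic_leaf < 6)
        return false;
    return ((cpuid6_eax >> 2) & 1) != 0;
}

/* Usage with example register values. */
void demo_arat(void)
{
    assert(apic_timer_always_running(0x0B, UINT32_C(0x00000004)));
    assert(!apic_timer_always_running(0x0B, UINT32_C(0x00000003)));
    assert(!apic_timer_always_running(0x05, UINT32_C(0x00000004)));
}
```

When the function returns false, software that relies on the APIC timer for timekeeping would need another time source while the processor is in deep C-states.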
Illegal Reg. Address (Intel Core, Intel
Virtualization Technology for Directed I/O, Rev 1.2 specification.
Modifications to ACPI interfaces to support x2APIC are described in Appendix A, "ACPI Extensions for x2APIC Support," of the Intel
microarchitecture codename Westmere: The L1 cache is divided into two sections: one section is dedicated to caching instructions (pre-decoded instructions) and the other caches data. The L2 cache is a unified data and instruction cache. Each processor core has its own L1 and L2. The L3 cache is an inclusive, unified data and instruction cache, shared by all processor cores inside a physical package. No trace cache is implemented.
I nt el
Xeon
pr ocessor f ami l y
based on I nt el
4 and I nt el
Xeon
microarchitecture: The trace cache caches decoded instructions (μops) from the instruction decoder and the L1 cache contains data. The L2 and L3 caches are unified data and instruction caches located on the processor chip. Dual-core processors have two L2, one in each processor core. Note that the L3 cache is only implemented on some Intel Xeon processors.
P6 family processors: The L1 cache is divided into two sections: one dedicated to caching instructions (pre-decoded instructions) and the other to caching data. The L2 cache is a unified data and instruction cache located on the processor chip. P6 family processors do not implement a trace cache.
64 Processors
For Intel 64 architecture processors, debug registers DR0-DR7 are 64 bits. In 16-bit or 32-bit modes (protected mode and compatibility mode), writes to a debug register fill the upper 32 bits with zeros. Reads from a debug register return the lower 32 bits. In 64-bit mode, MOV DRn instructions read or write all 64 bits. Operand-size prefixes are ignored.
In 64-bit mode, the upper 32 bits of DR6 and DR7 are reserved and must be written with zeros. Writing 1 to any of the upper 32 bits results in a #GP(0) exception (see Figure 16-2). All 64 bits of DR0-DR3 are writable by software. However, MOV DRn instructions do not check that addresses written to DR0-DR3 are in the linear-address limits of the processor implementation (address matching is supported only on valid addresses generated by the processor implementation). Breakpoint conditions for 8-byte memory read/writes are supported in all modes.
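The width rules above can be expressed as three small helpers. This is a sketch with hypothetical names that models only the value transformations; the MOV DRn instructions themselves are privileged and are not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* In 64-bit mode, a write to DR6 or DR7 with any upper bit set raises #GP(0). */
bool dr6_dr7_write_raises_gp(uint64_t value)
{
    return (value >> 32) != 0;
}

/* In 16-bit or 32-bit modes, a write fills the upper 32 bits with zeros. */
uint64_t debug_reg_after_legacy_write(uint32_t value)
{
    return (uint64_t)value;
}

/* In 16-bit or 32-bit modes, a read returns the lower 32 bits. */
uint32_t debug_reg_legacy_read(uint64_t reg)
{
    return (uint32_t)reg;
}

/* Usage with example values. */
void demo_debug_reg_width(void)
{
    assert(!dr6_dr7_write_raises_gp(UINT64_C(0x0000000000000401)));
    assert(dr6_dr7_write_raises_gp(UINT64_C(0x0000000100000401)));
    assert(debug_reg_after_legacy_write(UINT32_C(0xFFFFFFFF)) == UINT64_C(0x00000000FFFFFFFF));
    assert(debug_reg_legacy_read(UINT64_C(0x12345678000A0000)) == UINT32_C(0x000A0000));
}
```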
Table 16-1. Breakpoint Examples (Contd.)
Data operations that do not trap
Operation | Address | Access Length (in bytes)
Read or write | A0000H | 1
Read | A0002H | 1
Read or write | A0003H | 4
Read or write | B0000H | 2
Read | C0000H | 2
Read or write | C0004H | 4
DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER
16.3 DEBUG EXCEPTIONS
The Intel 64 and IA-32 architectures dedicate two interrupt vectors to handling debug exceptions: vector 1 (debug exception, #DB) and vector 3 (breakpoint exception, #BP). The following sections describe how these exceptions are generated and typical exception handler operations.
16.3.1 Debug Exception (#DB) - Interrupt Vector 1
The debug-exception handler is usually a debugger program or part of a larger software system. The processor generates a debug exception for any of several conditions. The debugger checks flags in the DR6 and DR7 registers to determine which condition caused the exception and which other conditions might apply. Table 16-2 shows the states of these flags following the generation of each kind of breakpoint condition.
Instruction-breakpoint and general-detect conditions (see Section 16.3.1.3, "General-Detect Exception Condition") result in faults; other debug-exception conditions result in traps. The debug exception may report one or both at one time. The following sections describe each class of debug exception.
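A debug-exception handler typically starts by decoding DR6 and DR7 along the lines of Table 16-2. The sketch below shows one possible decoding; the function names are hypothetical, and the bit positions are those of the architectural DR6/DR7 layout (Ln/Gn in DR7 bits 0 through 7, R/Wn and LENn in DR7 bits 16 through 31, Bn in DR6 bits 0 through 3, BD/BS/BT in DR6 bits 13 through 15).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DR6_BD (UINT64_C(1) << 13) /* general detect */
#define DR6_BS (UINT64_C(1) << 14) /* single step */
#define DR6_BT (UINT64_C(1) << 15) /* task switch */

/* R/Wn field of breakpoint n: 0 = instruction, 1 = data write, 2 = I/O, 3 = data read/write. */
unsigned dr7_rw(uint64_t dr7, unsigned n)
{
    assert(n < 4);
    return (unsigned)((dr7 >> (16 + 4 * n)) & 3);
}

/* LENn field of breakpoint n. */
unsigned dr7_len(uint64_t dr7, unsigned n)
{
    assert(n < 4);
    return (unsigned)((dr7 >> (18 + 4 * n)) & 3);
}

/* True if breakpoint n is enabled locally (Ln) or globally (Gn). */
bool dr7_breakpoint_enabled(uint64_t dr7, unsigned n)
{
    assert(n < 4);
    return ((dr7 >> (2 * n)) & 3) != 0;
}

/* True if any condition reported in DR6 is fault-class per Table 16-2. */
bool debug_exception_has_fault(uint64_t dr6, uint64_t dr7)
{
    if (dr6 & DR6_BD)
        return true; /* general detect */
    for (unsigned n = 0; n < 4; n++) {
        bool hit = ((dr6 >> n) & 1) != 0;
        if (hit && dr7_breakpoint_enabled(dr7, n) && dr7_rw(dr7, n) == 0)
            return true; /* instruction breakpoint */
    }
    return false;
}

/* Usage: breakpoint 0 is an instruction breakpoint, breakpoint 1 a data read/write breakpoint. */
void demo_debug_conditions(void)
{
    uint64_t dr7 = UINT64_C(0x1) | UINT64_C(0x4) | (UINT64_C(3) << 20); /* L0, L1, R/W1 = 3 */

    assert(dr7_rw(dr7, 0) == 0 && dr7_rw(dr7, 1) == 3);
    assert(debug_exception_has_fault(UINT64_C(0x1), dr7));  /* B0: fault */
    assert(!debug_exception_has_fault(UINT64_C(0x2), dr7)); /* B1: trap */
    assert(!debug_exception_has_fault(DR6_BS, dr7));        /* single step: trap */
}
```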
Figure 16-2. DR6/DR7 Layout on Processors Supporting Intel 64 Technology
DR7: bits 63:32 reserved; bits 31:16 hold LEN3, R/W3, LEN2, R/W2, LEN1, R/W1, LEN0, R/W0 (two bits each, from high to low); bits 15:14 are 0; bit 13 is GD; bits 12:11 are 0; bit 10 is 1; bits 9 and 8 are GE and LE; bits 7:0 hold G3, L3, G2, L2, G1, L1, G0, L0.
DR6: bits 63:32 reserved; bits 31:16 reserved (set to 1); bits 15, 14, and 13 are BT, BS, and BD; bit 12 is 0; bits 11:4 are 1; bits 3:0 hold B3, B2, B1, B0.
See also: Chapter 6, "Interrupt 1 - Debug Exception (#DB)," in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A.
16.3.1.1 Instruction-Breakpoint Exception Condition
The processor reports an instruction breakpoint when it attempts to execute an instruction at an address specified in a breakpoint-address register (DR0 through DR3) that has been set up to detect instruction execution (R/W flag is set to 0). Upon reporting the instruction breakpoint, the processor generates a fault-class, debug exception (#DB) before it executes the target instruction for the breakpoint.
Instruction breakpoints are the highest priority debug exceptions. They are serviced before any other exceptions detected during the decoding or execution of an instruction. However, if a code instruction breakpoint is placed on an instruction located immediately after a POP SS/MOV SS instruction, the breakpoint may not be triggered. In most situations, POP SS/MOV SS will inhibit such interrupts (see "MOV - Move" and "POP - Pop a Value from the Stack" in Chapters 3 and 4 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volumes 2A & 2B).
Because the debug exception for an instruction breakpoint is generated before the instruction is executed, if the instruction breakpoint is not removed by the exception handler, the processor will detect the instruction breakpoint again when the instruction is restarted and generate another debug exception. To prevent looping on an instruction breakpoint, the Intel 64 and IA-32 architectures provide the RF flag
Table 16-2. Debug Exception Conditions
Debug or Breakpoint Condition | DR6 Flags Tested | DR7 Flags Tested | Exception Class
Single-step trap | BS = 1 | | Trap
Instruction breakpoint, at addresses defined by DRn and LENn | Bn = 1 and (Gn or Ln = 1) | R/Wn = 0 | Fault
Data write breakpoint, at addresses defined by DRn and LENn | Bn = 1 and (Gn or Ln = 1) | R/Wn = 1 | Trap
I/O read or write breakpoint, at addresses defined by DRn and LENn | Bn = 1 and (Gn or Ln = 1) | R/Wn = 2 | Trap
Data read or write (but not instruction fetches), at addresses defined by DRn and LENn | Bn = 1 and (Gn or Ln = 1) | R/Wn = 3 | Trap
General detect fault, resulting from an attempt to modify debug registers (usually in conjunction with in-circuit emulation) | BD = 1 | | Fault
Task switch | BT = 1 | | Trap
( resume flag) in t he EFLAGS regist er ( see Sect ion 2. 3, Syst em Flags and Fields in
t he EFLAGS Regist er, in t he I nt el 64 and I A- 32 Archit ect ures Soft ware Developers
Manual, Volume 3A) . When t he RF flag is set , t he processor ignores inst ruct ion
breakpoint s.
All Intel 64 and IA-32 processors manage the RF flag as follows. The RF flag is
cleared at the start of the instruction after the check for code breakpoint, CS limit
violation, and FP exceptions. Task switches and IRETD/IRETQ instructions transfer
the RF image from the TSS/stack to the EFLAGS register.
When calling an event handler, Intel 64 and IA-32 processors establish the value of
the RF flag in the EFLAGS image pushed on the stack:
• For any fault-class exception except a debug exception generated in response to
  an instruction breakpoint, the value pushed for RF is 1.
• For any interrupt arriving after any iteration of a repeated string instruction
  except the last iteration, the value pushed for RF is 1.
• For any trap-class exception generated by any iteration of a repeated string
  instruction except the last iteration, the value pushed for RF is 1.
• For other cases, the value pushed for RF is the value that was in EFLAGS.RF at the
  time the event handler was called. This includes:
  - Debug exceptions generated in response to instruction breakpoints
  - Hardware-generated interrupts arriving between instructions (including
    those arriving after the last iteration of a repeated string instruction)
  - Trap-class exceptions generated after an instruction completes (including
    those generated after the last iteration of a repeated string instruction)
  - Software-generated interrupts (RF is pushed as 0, since it was cleared at the
    start of the software interrupt)
As noted above, the processor does not set the RF flag prior to calling the debug
exception handler for debug exceptions resulting from instruction breakpoints. The
debug exception handler can prevent recurrence of the instruction breakpoint by
setting the RF flag in the EFLAGS image on the stack. If the RF flag in the EFLAGS
image is set when the processor returns from the exception handler, it is copied into
the RF flag in the EFLAGS register by IRETD/IRETQ or a task switch that causes the
return. The processor then ignores instruction breakpoints for the duration of the
next instruction. (Note that the POPF, POPFD, and IRET instructions do not transfer
the RF image into the EFLAGS register.) Setting the RF flag does not prevent other
types of debug-exception conditions (such as I/O or data breakpoints) from being
detected, nor does it prevent non-debug exceptions from being generated.
For the Pentium processor, when an instruction breakpoint coincides with another
fault-type exception (such as a page fault), the processor may generate one spurious
debug exception after the second exception has been handled, even though the
debug exception handler set the RF flag in the EFLAGS image. To prevent a spurious
exception with Pentium processors, all fault-class exception handlers should set the
RF flag in the EFLAGS image.
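The handler-side step described above (setting RF in the stacked EFLAGS image) can be sketched in C as follows. This is a minimal illustration, not handler code from this manual: the helper name eflags_image_with_rf is hypothetical, and the only architectural fact it relies on is that RF occupies bit 16 of EFLAGS.

```c
#include <stdint.h>

/* RF (resume flag) is bit 16 of EFLAGS. */
#define EFLAGS_RF (1u << 16)

/* Hypothetical helper: given the EFLAGS image saved on the handler's stack,
 * return the image to write back so that IRETD/IRETQ loads RF = 1 and the
 * interrupted instruction can restart without re-raising its instruction
 * breakpoint. All other flag bits are left unchanged. */
static inline uint32_t eflags_image_with_rf(uint32_t saved_eflags)
{
    return saved_eflags | EFLAGS_RF;
}
```

A real handler would apply this to the EFLAGS slot of its exception frame just before returning; how that frame is located depends on the operating system and is outside the scope of this sketch.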
16.3.1.2 Data Memory and I/O Breakpoint Exception Conditions
Data memory and I/O breakpoints are reported when the processor attempts to
access a memory or I/O address specified in a breakpoint-address register (DR0
through DR3) that has been set up to detect data or I/O accesses (R/W flag is set to
1, 2, or 3). The processor generates the exception after it executes the instruction
that made the access, so these breakpoint conditions cause a trap-class exception to
be generated.
Because data breakpoints are traps, the original data is overwritten before the trap
exception is generated. If a debugger needs to save the contents of a write breakpoint
location, it should save the original contents before setting the breakpoint. The
handler can report the saved value after the breakpoint is triggered. The address in
the debug registers can be used to locate the new value stored by the instruction that
triggered the breakpoint.
Intel486 and later processors ignore the GE and LE flags in DR7. In Intel386
processors, exact data breakpoint matching does not occur unless it is enabled by setting
the LE and/or the GE flags.
P6 family processors are unable to report data breakpoints exactly for the REP MOVS
and REP STOS instructions until the completion of the iteration after the iteration in
which the breakpoint occurred.
For repeated INS and OUTS instructions that generate an I/O-breakpoint debug
exception, the processor generates the exception after the completion of the first
iteration. Repeated INS and OUTS instructions generate a memory-breakpoint debug
exception after the iteration in which the memory address breakpoint location is
accessed.
16.3.1.3 General-Detect Exception Condition
When the GD flag in DR7 is set, the general-detect debug exception occurs when a
program attempts to access any of the debug registers (DR0 through DR7) at the
same time they are being used by another application, such as an emulator or
debugger. This protection feature guarantees full control over the debug registers
when required. The debug exception handler can detect this condition by checking
the state of the BD flag in the DR6 register. The processor generates the exception
before it executes the MOV instruction that accesses a debug register, which causes
a fault-class exception to be generated.
16.3.1.4 Single-Step Exception Condition
The processor generates a single-step debug exception if (while an instruction is
being executed) it detects that the TF flag in the EFLAGS register is set. The
exception is a trap-class exception, because the exception is generated after the
instruction is executed. The processor will not generate this exception after the instruction
that sets the TF flag. For example, if the POPF instruction is used to set the TF flag, a
single-step trap does not occur until after the instruction that follows the POPF
instruction.
The processor clears the TF flag before calling the exception handler. If the TF flag
was set in a TSS at the time of a task switch, the exception occurs after the first
instruction is executed in the new task.
The TF flag normally is not cleared by privilege changes inside a task. The INT n and
INTO instructions, however, do clear this flag. Therefore, software debuggers that
single-step code must recognize and emulate INT n or INTO instructions rather than
executing them directly. To maintain protection, the operating system should check
the CPL after any single-step trap to see if single stepping should continue at the
current privilege level.
The interrupt priorities guarantee that, if an external interrupt occurs, single
stepping stops. When both an external interrupt and a single-step interrupt occur
together, the single-step interrupt is processed first. This operation clears the TF flag.
After saving the return address or switching tasks, the external interrupt input is
examined before the first instruction of the single-step handler executes. If the
external interrupt is still pending, then it is serviced. The external interrupt handler
does not run in single-step mode. To single step an interrupt handler, single step an
INT n instruction that calls the interrupt handler.
16.3.1.5 Task-Switch Exception Condition
The processor generates a debug exception after a task switch if the T flag of the new
task's TSS is set. This exception is generated after program control has passed to the
new task, and prior to the execution of the first instruction of that task. The exception
handler can detect this condition by examining the BT flag of the DR6 register.
If entry 1 (#DB) in the IDT is a task gate, the T bit of the corresponding TSS should
not be set. Failure to observe this rule will put the processor in a loop.
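The conditions summarized in Table 16-2 can be sketched as a small decoder that a debug exception handler might run over the DR6 and DR7 values it has read. This is an illustrative sketch only: the function and enum names are hypothetical, and the bit positions used (B0 through B3 in DR6 bits 0 through 3, BD/BS/BT in DR6 bits 13/14/15, Ln/Gn in DR7 bits 2n and 2n+1, R/Wn in DR7 bits 16+4n and 17+4n) are assumed from the DR6 and DR7 register layouts given earlier in this chapter.

```c
#include <stdint.h>

enum dbg_class { DBG_NONE, DBG_FAULT, DBG_TRAP };

#define DR6_BD (1u << 13)  /* debug register access detected */
#define DR6_BS (1u << 14)  /* single step */
#define DR6_BT (1u << 15)  /* task switch */

/* Classify breakpoint n (0..3): Bn must be set in DR6 and Ln or Gn must be
 * set in DR7. R/Wn = 0 is an instruction breakpoint (fault); R/Wn = 1, 2,
 * or 3 is a data or I/O breakpoint (trap). */
static inline enum dbg_class classify_breakpoint(uint32_t dr6, uint32_t dr7,
                                                 unsigned n)
{
    if (n > 3)
        return DBG_NONE;
    uint32_t hit     = dr6 & (1u << n);
    uint32_t enabled = dr7 & (3u << (2 * n));
    uint32_t rw      = (dr7 >> (16 + 4 * n)) & 3u;
    if (!hit || !enabled)
        return DBG_NONE;
    return rw == 0 ? DBG_FAULT : DBG_TRAP;
}

/* Classify the conditions that do not involve DR0-DR3. */
static inline enum dbg_class classify_other(uint32_t dr6)
{
    if (dr6 & DR6_BD)
        return DBG_FAULT;            /* general detect */
    if (dr6 & (DR6_BS | DR6_BT))
        return DBG_TRAP;             /* single step or task switch */
    return DBG_NONE;
}
```

Several conditions can be reported in DR6 at once, so a handler would normally call both functions rather than stop at the first match.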
16.3.2 Breakpoint Exception (#BP) - Interrupt Vector 3
The breakpoint exception (interrupt 3) is caused by execution of an INT 3 instruction.
See Chapter 6, "Interrupt 3 - Breakpoint Exception (#BP)." Debuggers use breakpoint
exceptions in the same way that they use the breakpoint registers; that is, as a
mechanism for suspending program execution to examine registers and memory
locations. With earlier IA-32 processors, breakpoint exceptions are used extensively
for setting instruction breakpoints.
With the Intel386 and later IA-32 processors, it is more convenient to set
breakpoints with the breakpoint-address registers (DR0 through DR3). However, the
breakpoint exception still is useful for breakpointing debuggers, because a
breakpoint exception can call a separate exception handler. The breakpoint exception is
also useful when it is necessary to set more breakpoints than there are debug
registers or when breakpoints are being placed in the source code of a program under
development.
16.4 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING OVERVIEW
P6 family processors introduced the ability to set breakpoints on taken branches,
interrupts, and exceptions, and to single-step from one branch to the next. This
capability has been modified and extended in the Pentium 4, Intel Xeon, Pentium M,
Intel Core Solo, Intel Core Duo, Intel Core 2 Duo, Intel Core i7, and Intel Atom
processors. See the following sections for the processor-specific implementations of
last branch, interrupt, and exception recording:
• Section 16.5, "Last Branch, Interrupt, and Exception Recording (Intel Core 2 Duo
  and Intel Atom Processor Family)"
• Section 16.6, "Last Branch, Interrupt, and Exception Recording (Intel Core i7
  Processor Family)"
• Section 16.7, "Last Branch, Interrupt, and Exception Recording (Processors
  based on Intel NetBurst Microarchitecture)"
• Section 16.8, "Last Branch, Interrupt, and Exception Recording (Intel Core Solo
  and Intel Core Duo Processors)"
• Section 16.9, "Last Branch, Interrupt, and Exception Recording (Pentium M
  Processors)"
• Section 16.10, "Last Branch, Interrupt, and Exception Recording (P6 Family
  Processors)"
The following subsections of Section 16.4 describe common features of profiling
branches. These features are generally enabled using the IA32_DEBUGCTL MSR
(older processors may implement a subset or model-specific features; see the
definitions of MSR_DEBUGCTLA, MSR_DEBUGCTLB, and MSR_DEBUGCTL).
16.4.1 IA32_DEBUGCTL MSR
The IA32_DEBUGCTL MSR provides bit-field controls that enable debug trace
interrupts, debug trace stores, trace messages, single-stepping on branches, and last
branch record recording, and that control freezing of the LBR stack or performance
counters on a PMI request. The IA32_DEBUGCTL MSR is located at register address
01D9H.
See Figure 16-3 for the MSR layout and the bullets below for a description of the
flags:
• LBR (last branch/interrupt/exception) flag (bit 0): When set, the
  processor records a running trace of the most recent branches, interrupts, and/or
  exceptions taken by the processor (prior to a debug exception being generated)
  in the last branch record (LBR) stack. For more information, see Section
  16.5.1, "LBR Stack."
• BTF (single-step on branches) flag (bit 1): When set, the processor treats
  the TF flag in the EFLAGS register as a "single-step on branches" flag rather than
  a "single-step on instructions" flag. This mechanism allows single-stepping the
  processor on taken branches, interrupts, and exceptions. See Section 16.4.3,
  "Single-Stepping on Branches, Exceptions, and Interrupts," for more information
  about the BTF flag.
• TR (trace message enable) flag (bit 6): When set, branch trace messages
  are enabled. When the processor detects a taken branch, interrupt, or exception,
  it sends the branch record out on the system bus as a branch trace message
  (BTM). See Section 16.4.4, "Branch Trace Messages," for more information about
  the TR flag.
• BTS (branch trace store) flag (bit 7): When set, the flag enables BTS
  facilities to log BTMs to a memory-resident BTS buffer that is part of the DS save
  area. See Section 16.4.9, "BTS and DS Save Area."
• BTINT (branch trace interrupt) flag (bit 8): When set, the BTS facilities
  generate an interrupt when the BTS buffer is full. When clear, BTMs are logged to
  the BTS buffer in a circular fashion. See Section 16.4.5, "Branch Trace Store (BTS),"
  for a description of this mechanism.
• BTS_OFF_OS (branch trace off in privileged code) flag (bit 9): When set,
  BTS or BTM is skipped if CPL is 0. See Section 16.7.2.
• BTS_OFF_USR (branch trace off in user code) flag (bit 10): When set,
  BTS or BTM is skipped if CPL is greater than 0. See Section 16.7.2.
Figure 16-3. IA32_DEBUGCTL MSR for Processors based on Intel Core microarchitecture
Bits 31:15   Reserved
Bit 14       FREEZE_WHILE_SMM_EN
Bit 13       Reserved
Bit 12       FREEZE_PERFMON_ON_PMI
Bit 11       FREEZE_LBRS_ON_PMI
Bit 10       BTS_OFF_USR: BTS off in user code
Bit 9        BTS_OFF_OS: BTS off in OS
Bit 8        BTINT: Branch trace interrupt
Bit 7        BTS: Branch trace store
Bit 6        TR: Trace messages enable
Bits 5:2     Reserved
Bit 1        BTF: Single-step on branches
Bit 0        LBR: Last branch/interrupt/exception
• FREEZE_LBRS_ON_PMI flag (bit 11): When set, the LBR stack is frozen on a
  hardware PMI request (e.g. when a counter overflows and is configured to trigger
  PMI).
• FREEZE_PERFMON_ON_PMI flag (bit 12): When set, a PMI request clears
  each of the ENABLE fields of the MSR_PERF_GLOBAL_CTRL MSR (see Figure 30-3) to
  disable all the counters.
• FREEZE_WHILE_SMM_EN (bit 14): If this bit is set, upon the delivery of an
  SMI the processor clears all the enable bits of IA32_PERF_GLOBAL_CTRL,
  saves a copy of the content of IA32_DEBUGCTL, and disables the LBR, BTF, TR, and
  BTS fields of IA32_DEBUGCTL before transferring control to the SMI handler.
  After the SMI handler issues RSM to complete its service, the enable bits of
  IA32_PERF_GLOBAL_CTRL are set to 1 and the saved copy of IA32_DEBUGCTL
  prior to SMI delivery is restored. Note that system software must determine
  whether the processor supports the FREEZE_WHILE_SMM_EN control bit before
  setting it in IA32_DEBUGCTL. FREEZE_WHILE_SMM_EN is supported if
  IA32_PERF_CAPABILITIES.FREEZE_WHILE_SMM[Bit 12] reports 1. See
  Section 30.11 for details of detecting the presence of the IA32_PERF_CAPABILITIES
  MSR.
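As an illustration of the flag descriptions above, the IA32_DEBUGCTL bit assignments can be written as C masks, together with the capability check required before FREEZE_WHILE_SMM_EN is used. This is a sketch under stated assumptions: the macro and function names are hypothetical, and the functions only compute values. Reading or writing the MSR itself requires RDMSR/WRMSR at privilege level 0, which is deliberately left out.

```c
#include <stdint.h>

#define MSR_IA32_DEBUGCTL 0x1D9u   /* register address 01D9H */

#define DEBUGCTL_LBR                   (1ull << 0)
#define DEBUGCTL_BTF                   (1ull << 1)
#define DEBUGCTL_TR                    (1ull << 6)
#define DEBUGCTL_BTS                   (1ull << 7)
#define DEBUGCTL_BTINT                 (1ull << 8)
#define DEBUGCTL_BTS_OFF_OS            (1ull << 9)
#define DEBUGCTL_BTS_OFF_USR           (1ull << 10)
#define DEBUGCTL_FREEZE_LBRS_ON_PMI    (1ull << 11)
#define DEBUGCTL_FREEZE_PERFMON_ON_PMI (1ull << 12)
#define DEBUGCTL_FREEZE_WHILE_SMM_EN   (1ull << 14)

/* IA32_PERF_CAPABILITIES.FREEZE_WHILE_SMM is bit 12. */
static inline int freeze_while_smm_supported(uint64_t perf_capabilities)
{
    return (int)((perf_capabilities >> 12) & 1u);
}

/* Return the IA32_DEBUGCTL value with FREEZE_WHILE_SMM_EN set, but only if
 * the capability bit reports support; otherwise return the value unchanged. */
static inline uint64_t debugctl_with_smm_freeze(uint64_t debugctl,
                                                uint64_t perf_capabilities)
{
    if (freeze_while_smm_supported(perf_capabilities))
        debugctl |= DEBUGCTL_FREEZE_WHILE_SMM_EN;
    return debugctl;
}
```

The caller is assumed to have already confirmed that IA32_PERF_CAPABILITIES exists before reading it.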
16.4.2 Monitoring Branches, Exceptions, and Interrupts
When the LBR flag (bit 0) in the IA32_DEBUGCTL MSR is set, the processor
automatically begins recording branch records for taken branches, interrupts, and exceptions
(except for debug exceptions) in the LBR stack MSRs.
When the processor generates a debug exception (#DB), it automatically clears the
LBR flag before executing the exception handler. This action does not clear previously
stored LBR stack MSRs. The branch records for the last four taken branches, interrupts,
and/or exceptions are retained for analysis.
A debugger can use the linear addresses in the LBR stack to re-set breakpoints in the
breakpoint address registers (DR0 through DR3). This allows a backward trace from
the manifestation of a particular bug toward its source.
If the LBR flag is cleared and the TR flag in the IA32_DEBUGCTL MSR remains set, the
processor will continue to update LBR stack MSRs. This is because BTM information
must be generated from entries in the LBR stack. A #DB does not automatically clear
the TR flag.
16.4.3 Single-Stepping on Branches, Exceptions, and Interrupts
When software sets both the BTF flag (bit 1) in the IA32_DEBUGCTL MSR and the TF
flag in the EFLAGS register, the processor generates a single-step debug exception
the next time it takes a branch, services an interrupt, or generates an exception. This
mechanism allows the debugger to single-step on control transfers caused by
branches, interrupts, and exceptions. This "control-flow single stepping" helps isolate
a bug to a particular block of code before instruction single-stepping further narrows
the search. If the BTF flag is set when the processor generates a debug exception,
the processor clears the BTF flag along with the TF flag. The debugger must set the
BTF and TF flags again before resuming program execution to continue control-flow
single stepping.
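Because the processor clears both BTF and TF when it delivers the debug exception, a debugger that wants to keep stepping branch to branch has to set both again. A minimal sketch of that re-arming step follows. The struct and function names are hypothetical, and the sketch only computes the new values; writing IA32_DEBUGCTL and the stacked EFLAGS image is left to the surrounding handler.

```c
#include <stdint.h>

#define DEBUGCTL_BTF (1ull << 1)  /* single-step on branches */
#define EFLAGS_TF    (1u << 8)    /* trap flag */

/* Hypothetical snapshot of the two pieces of state involved. */
struct step_state {
    uint64_t debugctl;      /* value to write to IA32_DEBUGCTL */
    uint32_t eflags_image;  /* EFLAGS image on the handler's stack */
};

/* Set BTF and TF again so the next taken branch, interrupt, or exception
 * raises another single-step debug exception. */
static inline struct step_state rearm_branch_step(struct step_state s)
{
    s.debugctl     |= DEBUGCTL_BTF;
    s.eflags_image |= EFLAGS_TF;
    return s;
}
```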
16.4.4 Branch Trace Messages
Setting the TR flag (bit 6) in the IA32_DEBUGCTL MSR enables branch trace
messages (BTMs). Thereafter, when the processor detects a branch, exception, or
interrupt, it sends a branch record out on the system bus as a BTM. A debugging
device that is monitoring the system bus can read these messages and synchronize
operations with taken branch, interrupt, and exception events.
When interrupts or exceptions occur in conjunction with a taken branch, additional
BTMs are sent out on the bus, as described in Section 16.4.2, "Monitoring Branches,
Exceptions, and Interrupts."
Unlike the P6 family and Core family processors, the Pentium 4, Atom, and Intel Xeon
processors can collect branch records in the LBR stack MSRs while at the same time
sending/storing BTMs when both the TR and LBR flags are set in the IA32_DEBUGCTL
MSR (MSR_DEBUGCTLA in the case of the Pentium 4 processor).
16.4.5 Branch Trace Store (BTS)
A trace of taken branches, interrupts, and exceptions is useful for debugging code by
providing a method of determining the decision path taken to reach a particular code
location. The LBR flag (bit 0) of IA32_DEBUGCTL provides a mechanism for capturing
records of taken branches, interrupts, and exceptions and saving them in the last
branch record (LBR) stack MSRs; setting the TR flag sends them out onto the
system bus as BTMs. The branch trace store (BTS) mechanism provides the
additional capability of saving the branch records in a memory-resident BTS buffer, which
is part of the DS save area. The BTS buffer can be configured to be circular so that
the most recent branch records are always available, or it can be configured to
generate an interrupt when the buffer is nearly full so that all the branch records can
be saved. The BTINT flag (bit 8) can be used to enable the generation of an interrupt
when the BTS buffer is full. See Section 16.4.9.2, "Setting Up the DS Save Area," for
additional details.
Setting the BTS flag alone can greatly reduce the performance of the processor.
The CPL-qualified branch trace storing mechanism can help mitigate the performance
impact of sending/logging branch trace messages.
16.4.6 CPL-Qualified Branch Trace Mechanism
The CPL-qualified branch trace mechanism is available to a subset of Intel 64 and IA-32
processors that support the branch trace storing mechanism. The processor supports
the CPL-qualified branch trace mechanism if CPUID.01H:ECX[bit 4] = 1.
The CPL-qualified branch trace mechanism is described in Section 16.4.9.4. System
software can selectively specify CPL qualification to not send/store Branch Trace
Messages associated with a specified privilege level. Two bit fields, BTS_OFF_USR
(bit 10) and BTS_OFF_OS (bit 9), are provided in the debug control register to
specify the CPL of BTMs that will not be logged in the BTS buffer or sent on the bus.
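The CPL qualification described above can be sketched as two predicates: one for the CPUID feature check and one that models when a BTM is skipped. The function names are hypothetical and the code only models the documented rule; it does not execute CPUID or touch any MSR.

```c
#include <stdint.h>

#define DEBUGCTL_BTS_OFF_OS  (1ull << 9)   /* skip BTS/BTM when CPL = 0 */
#define DEBUGCTL_BTS_OFF_USR (1ull << 10)  /* skip BTS/BTM when CPL > 0 */

/* CPUID.01H:ECX[bit 4] = 1 indicates the CPL-qualified mechanism. The caller
 * is assumed to pass the ECX value returned by CPUID leaf 01H. */
static inline int cpl_qualified_bts_supported(uint32_t cpuid_01h_ecx)
{
    return (int)((cpuid_01h_ecx >> 4) & 1u);
}

/* Return 1 if a branch trace message generated at the given CPL would be
 * suppressed by the current IA32_DEBUGCTL setting, 0 otherwise. */
static inline int btm_suppressed(uint64_t debugctl, unsigned cpl)
{
    if (cpl == 0)
        return (debugctl & DEBUGCTL_BTS_OFF_OS) != 0;
    return (debugctl & DEBUGCTL_BTS_OFF_USR) != 0;
}
```

For example, a profiler interested only in user-mode control flow would set BTS_OFF_OS so that kernel branches are never logged.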
16.4.7 Freezing LBR and Performance Counters on PMI
Many issues may generate a performance monitoring interrupt (PMI); a PMI service
handler needs to determine the cause in order to handle the situation. Two capabilities
that allow a PMI service routine to improve branch tracing and performance monitoring
are:
• Freezing LBRs on PMI (bit 11): The processor freezes LBRs on a PMI request
  by clearing the LBR bit (bit 0) in IA32_DEBUGCTL. Software must then re-enable
  IA32_DEBUGCTL[0] to continue monitoring branches. When using this feature,
  software should be careful about writes to IA32_DEBUGCTL to avoid re-enabling
  LBRs by accident if they were just disabled.
• Freezing PMCs on PMI (bit 12): The processor freezes the performance
  counters on a PMI request by clearing the MSR_PERF_GLOBAL_CTRL MSR (see
  Figure 30-3). The PMCs affected include both general-purpose counters and
  fixed-function counters (see Section 30.4.1, "Fixed-function Performance
  Counters"). Software must re-enable counts by writing 1s to the corresponding
  enable bits in MSR_PERF_GLOBAL_CTRL before leaving a PMI service routine to
  continue counter operation.
Freezing of LBRs and PMCs on a PMI occurs when:
• A performance counter had an overflow and was programmed to signal a PMI in
  case of an overflow.
  - For the general-purpose counters, this is done by setting bit 20 of the
    IA32_PERFEVTSELx register.
  - For the fixed-function counters, this is done by setting the 3rd bit in the
    corresponding 4-bit control field of the MSR_PERF_FIXED_CTR_CTRL register
    (see Figure 30-1) or IA32_FIXED_CTR_CTRL MSR (see Figure 30-2).
• The PEBS buffer is almost full and reaches the interrupt threshold.
• The BTS buffer is almost full and reaches the interrupt threshold.
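The two "signal a PMI on overflow" settings listed above can be sketched as value-computing helpers. The names are hypothetical. The sketch reads "the 3rd bit in the corresponding 4-bit control field" as bit 3 (counting from 0) of the field belonging to fixed counter i, that is, bit 4*i + 3 of the control register; that reading is an assumption and should be checked against Figures 30-1 and 30-2.

```c
#include <stdint.h>

#define PERFEVTSEL_INT (1ull << 20)  /* bit 20 of IA32_PERFEVTSELx */

/* General-purpose counter: request a PMI on overflow. */
static inline uint64_t perfevtsel_with_pmi(uint64_t evtsel)
{
    return evtsel | PERFEVTSEL_INT;
}

/* Fixed-function counter i: set the PMI bit of its 4-bit control field
 * (assumed to be bit 3 of the field, i.e. register bit 4*i + 3). */
static inline uint64_t fixed_ctr_ctrl_with_pmi(uint64_t ctrl, unsigned counter)
{
    return ctrl | (1ull << (4 * counter + 3));
}
```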
16.4.8 LBR Stack
The last branch record stack and top-of-stack (TOS) pointer MSRs are supported
across Intel 64 and IA-32 processor families. However, the number of MSRs in the
LBR stack and the valid range of the TOS pointer value can vary between different
processor families. Table 16-3 lists the LBR stack size and TOS pointer range for
several processor families according to the CPUID signatures of
DisplayFamily/DisplayModel encoding (see CPUID instruction in Chapter 3 of Intel 64 and
IA-32 Architectures Software Developer's Manual, Volume 2A).
The last branch recording mechanism tracks not only branch instructions (like JMP,
Jcc, LOOP and CALL instructions), but also other operations that cause a change in
the instruction pointer (like external interrupts, traps and faults). The branch
recording mechanism generally employs a set of MSRs, referred to as the last branch
record (LBR) stack. The size and exact locations of the LBR stack are generally
model-specific.
• Last Branch Record (LBR) Stack: The LBR stack consists of N pairs of MSRs (N is
  listed in the LBR stack size column of Table 16-3) that store the source and
  destination addresses of recent branches (see Figure 16-4):
  - MSR_LASTBRANCH_0_FROM_IP (address is model specific) through the next
    consecutive (N-1) MSR addresses store source addresses.
  - MSR_LASTBRANCH_0_TO_IP (address is model specific) through the next
    consecutive (N-1) MSR addresses store destination addresses.
• Last Branch Record Top-of-Stack (TOS) Pointer: The least significant M
  bits of the TOS Pointer MSR (MSR_LASTBRANCH_TOS, address is model specific)
  contain an M-bit pointer to the MSR in the LBR stack that contains the most
  recent branch, interrupt, or exception recorded. The valid range of the M-bit TOS
  pointer is given in Table 16-3.
Table 16-3. LBR Stack Size and TOS Pointer Range
DisplayFamily_DisplayModel        Size of LBR Stack   Range of TOS Pointer
--------------------------------  ------------------  --------------------
06_1AH, 06_1EH, 06_1FH, 06_2EH    16                  0 to 15
06_17H, 06_1DH                    4                   0 to 3
06_0FH                            4                   0 to 3
06_1CH                            8                   0 to 7
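Table 16-3 and the TOS pointer description can be sketched as a lookup plus an index calculation for walking the stack from newest to oldest record. The function names are hypothetical. The walk assumes the LBR stack behaves as a circular buffer in which the record before the one at TOS lives at the next lower index, wrapping from 0 to N-1; this section states only that TOS points to the most recent record, so treat the wrap-around direction as an assumption.

```c
#include <stdint.h>

/* LBR stack size for a DisplayFamily_DisplayModel signature, per Table 16-3.
 * Returns 0 for signatures the table does not list. */
static inline unsigned lbr_stack_size(unsigned family, unsigned model)
{
    if (family != 0x06)
        return 0;
    switch (model) {
    case 0x1A: case 0x1E: case 0x1F: case 0x2E:
        return 16;
    case 0x17: case 0x1D: case 0x0F:
        return 4;
    case 0x1C:
        return 8;
    default:
        return 0;
    }
}

/* Index of the i-th most recent record (i = 0 is the record at TOS) in a
 * stack of n entries, assuming circular wrap-around. */
static inline unsigned lbr_index(unsigned tos, unsigned i, unsigned n)
{
    if (n == 0)
        return 0;
    return (tos % n + n - (i % n)) % n;
}
```

With a 4-entry stack and TOS = 0, the records from newest to oldest are at indices 0, 3, 2, 1.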
16.4.8.1 LBR Stack and Intel 64 Processors
LBR MSRs are 64 bits wide. If IA-32e mode is disabled, only the lower 32 bits of the
address are recorded. If IA-32e mode is enabled, the processor writes 64-bit values
into the MSR.
In 64-bit mode, last branch records store 64-bit addresses; in compatibility mode,
the upper 32 bits of last branch records are cleared.
Software should query the architectural MSR field IA32_PERF_CAPABILITIES[5:0]
for the format of the address that is stored in the LBR stack. Four formats are
defined by the following encoding:
• 000000B (32-bit record format): Stores the 32-bit offset in the current CS of the
  respective source/destination.
• 000001B (64-bit LIP record format): Stores the 64-bit linear address of the
  respective source/destination.
• 000010B (64-bit EIP record format): Stores the 64-bit offset (effective
  address) of the respective source/destination.
• 000011B (64-bit EIP record format and Flags): Stores the 64-bit offset
  (effective address) of the respective source/destination. LBR flags are supported
  in the upper bits of the "FROM" register in the LBR stack. See LBR stack details
  below for flag support and definition.
Processor support for the architectural MSR IA32_PERF_CAPABILITIES is
indicated by CPUID.01H:ECX[PERF_CAPAB_MSR] (bit 15).
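The format query above can be sketched as follows. The enum and function names are hypothetical; the encodings and bit positions are the ones listed in this section. The caller is assumed to supply the ECX value from CPUID leaf 01H and, if the MSR is present, the value read from IA32_PERF_CAPABILITIES.

```c
#include <stdint.h>

/* LBR address formats reported in IA32_PERF_CAPABILITIES[5:0]. */
enum lbr_format {
    LBR_FMT_32        = 0x00,  /* 32-bit offset in current CS */
    LBR_FMT_LIP       = 0x01,  /* 64-bit linear address */
    LBR_FMT_EIP       = 0x02,  /* 64-bit offset (effective address) */
    LBR_FMT_EIP_FLAGS = 0x03   /* 64-bit offset plus flags in FROM upper bits */
};

/* CPUID.01H:ECX bit 15 indicates that IA32_PERF_CAPABILITIES exists. */
static inline int perf_capabilities_msr_present(uint32_t cpuid_01h_ecx)
{
    return (int)((cpuid_01h_ecx >> 15) & 1u);
}

/* Extract the 6-bit LBR format field. Values above 3 are not defined in
 * this section and are returned unchanged for the caller to reject. */
static inline unsigned lbr_format(uint64_t perf_capabilities)
{
    return (unsigned)(perf_capabilities & 0x3Fu);
}
```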
16.4.8.2 LBR Stack and IA-32 Processors
The LBR MSRs in IA-32 processors introduced prior to Intel 64 architecture store the
32-bit "To Linear Address" and "From Linear Address" using the high and low half of
each 64-bit MSR.
Figure 16-4. 64-bit Address Layout of LBR MSR
MSR_LASTBRANCH_0_FROM_IP through MSR_LASTBRANCH_(N-1)_FROM_IP: bits 63:0 hold the Source Address
MSR_LASTBRANCH_0_TO_IP through MSR_LASTBRANCH_(N-1)_TO_IP: bits 63:0 hold the Destination Address
16.4.8.3 Last Exception Records and Intel 64 Architecture
Intel 64 and IA-32 processors also provide MSRs that store the branch record for the
last branch taken prior to an exception or an interrupt. The locations of the last
exception record (LER) MSRs are model specific. The MSRs that store last exception
records are 64 bits wide. If IA-32e mode is disabled, only the lower 32 bits of the
address are recorded. If IA-32e mode is enabled, the processor writes 64-bit values
into the MSR. In 64-bit mode, last exception records store 64-bit addresses; in
compatibility mode, the upper 32 bits of last exception records are cleared.
16.4.9 BTS and DS Save Area
The debug store (DS) feature flag (bit 21), returned by CPUID.1:EDX[21], indicates
that the processor provides the debug store (DS) mechanism. This mechanism
allows BTMs to be stored in a memory-resident BTS buffer. See Section 16.4.5,
"Branch Trace Store (BTS)." Precise event-based sampling (PEBS; see Section
30.4.4, "Precise Event Based Sampling (PEBS)") also uses the DS save area
provided by the debug store mechanism. When CPUID.1:EDX[21] is set, the following
BTS facilities are available:
• The BTS_UNAVAILABLE flag in the IA32_MISC_ENABLE MSR indicates (when
  clear) the availability of the BTS facilities, including the ability to set the BTS and
  BTINT bits in the MSR_DEBUGCTLA MSR.
• The IA32_DS_AREA MSR can be programmed to point to the DS save area.
The debug store (DS) save area is a software-designated area of memory that is
used to collect the following two types of information:
• Branch records: When the BTS flag in the IA32_DEBUGCTL MSR is set, a
  branch record is stored in the BTS buffer in the DS save area whenever a taken
  branch, interrupt, or exception is detected.
• PEBS records: When a performance counter is configured for PEBS, a PEBS
  record is stored in the PEBS buffer in the DS save area after the counter overflow
  occurs. This record contains the architectural state of the processor (state of the
  8 general purpose registers, EIP register, and EFLAGS register) at the next
  occurrence of the PEBS event that caused the counter to overflow. When the
  state information has been logged, the counter is automatically reset to a
  preselected value, and event counting begins again. This feature is available only
  for a subset of the performance events on processors that support PEBS.
NOTES
The DS save area and recording mechanism are not available in SMM.
The feature is disabled on transition to SMM. Similarly, DS
recording is disabled on the generation of a machine check exception
and is cleared on processor RESET and INIT. DS recording is available
in real-address mode.
The BTS and PEBS facilities may not be available on all processors.
The availability of these facilities is indicated by the
BTS_UNAVAILABLE and PEBS_UNAVAILABLE flags, respectively, in
the IA32_MISC_ENABLE MSR (see Appendix B).
The DS save area is divided into three parts (see Figure 16-5): buffer management
area, branch trace store (BTS) buffer, and PEBS buffer. The buffer management area
is used to define the location and size of the BTS and PEBS buffers. The processor
then uses the buffer management area to keep track of the branch and/or PEBS
records in their respective buffers and to record the performance counter reset value.
The linear address of the first byte of the DS buffer management area is specified
with the IA32_DS_AREA MSR.
The fields in the buffer management area are as follows:
• BTS buffer base: Linear address of the first byte of the BTS buffer. This
  address should point to a natural doubleword boundary.
• BTS index: Linear address of the first byte of the next BTS record to be written
  to. Initially, this address should be the same as the address in the BTS buffer
  base field.
• BTS absolute maximum: Linear address of the next byte past the end of the
  BTS buffer. This address should be a multiple of the BTS record size (12 bytes)
  plus 1.
• BTS interrupt threshold: Linear address of the BTS record on which an
  interrupt is to be generated. This address must point to an offset from the BTS
  buffer base that is a multiple of the BTS record size. Also, it must be several
  records short of the BTS absolute maximum address to allow a pending interrupt
  to be handled prior to the processor writing the BTS absolute maximum record.
• PEBS buffer base: Linear address of the first byte of the PEBS buffer. This
  address should point to a natural doubleword boundary.
• PEBS index: Linear address of the first byte of the next PEBS record to be
  written to. Initially, this address should be the same as the address in the PEBS
  buffer base field.
• PEBS absolute maximum: Linear address of the next byte past the end of the
  PEBS buffer. This address should be a multiple of the PEBS record size (40 bytes)
  plus 1.
• PEBS interrupt threshold: Linear address of the PEBS record on which an
  interrupt is to be generated. This address must point to an offset from the PEBS
  buffer base that is a multiple of the PEBS record size. Also, it must be several
  records short of the PEBS absolute maximum address to allow a pending
  interrupt to be handled prior to the processor writing the PEBS absolute maximum
  record.
Figure 16-5. DS Save Area
The IA32_DS_AREA MSR points to the DS buffer management area. The fields of that
area locate the BTS buffer (Branch Record 0 through Branch Record n) and the PEBS
buffer (PEBS Record 0 through PEBS Record n).
DS Buffer Management Area
Offset     Field
0H         BTS Buffer Base
4H         BTS Index
8H         BTS Absolute Maximum
CH         BTS Interrupt Threshold
10H        PEBS Buffer Base
14H        PEBS Index
18H        PEBS Absolute Maximum
1CH        PEBS Interrupt Threshold
20H, 24H   PEBS Counter Reset
to 30H     Reserved
• PEBS counter reset value: A 40-bit value that the counter is to be reset to
  after state information has been collected following counter overflow. This value
  allows state information to be collected after a preset number of events have been
  counted.
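The buffer management fields listed above can be sketched as a C structure for the 32-bit layout, together with a helper that fills in the BTS fields. This is an illustration under stated assumptions: the struct, macro, and function names are hypothetical; the field offsets follow Figure 16-5; and the absolute maximum is computed literally as "a multiple of the record size plus 1" past the base, which is this sketch's reading of the rule above. The number of slack records to leave before the absolute maximum is a software choice, not a value given by this section.

```c
#include <stddef.h>
#include <stdint.h>

#define BTS_RECORD_SIZE_32 12u  /* bytes per BTS record outside IA-32e mode */

/* DS buffer management area, 32-bit layout (offsets in comments). */
struct ds_area32 {
    uint32_t bts_buffer_base;           /* 0H  */
    uint32_t bts_index;                 /* 4H  */
    uint32_t bts_absolute_maximum;      /* 8H  */
    uint32_t bts_interrupt_threshold;   /* CH  */
    uint32_t pebs_buffer_base;          /* 10H */
    uint32_t pebs_index;                /* 14H */
    uint32_t pebs_absolute_maximum;     /* 18H */
    uint32_t pebs_interrupt_threshold;  /* 1CH */
    uint64_t pebs_counter_reset;        /* 20H, only 40 bits are used */
    uint64_t reserved;                  /* 28H */
};

/* Fill in the BTS fields for a buffer of `records` entries starting at
 * linear address `base`. `slack` is how many records before the end the
 * interrupt threshold is placed. Returns 0 on success, -1 if the base is
 * not doubleword aligned or the counts are inconsistent. */
static inline int ds_init_bts32(struct ds_area32 *ds, uint32_t base,
                                uint32_t records, uint32_t slack)
{
    if ((base & 3u) != 0 || records == 0 || slack >= records)
        return -1;
    ds->bts_buffer_base         = base;
    ds->bts_index               = base;  /* next record to write = first */
    ds->bts_absolute_maximum    = base + records * BTS_RECORD_SIZE_32 + 1;
    ds->bts_interrupt_threshold = base + (records - slack) * BTS_RECORD_SIZE_32;
    return 0;
}
```

The PEBS fields would be filled in the same way using the 40-byte PEBS record size.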
Figure 16-6 shows the structure of a 12-byte branch record in the BTS buffer. The
fields in each record are as follows:
• Last branch from: Linear address of the instruction from which the branch,
  interrupt, or exception was taken.
• Last branch to: Linear address of the branch target or the first instruction in
  the interrupt or exception service routine.
• Branch predicted: Bit 4 of the field indicates whether the branch that was taken
  was predicted (set) or not predicted (clear).
Figures 16- 7 shows t he st ruct ure of t he 40- byt e PEBS records. Nominally t he regist er
values are t hose at t he beginning of t he inst ruct ion t hat caused t he event . However,
t here are cases where t he regist ers may be logged in a part ially modified st at e. The
linear I P field shows t he value in t he EI P regist er t ranslat ed from an offset int o t he
current code segment t o a linear address.
Figure 16-6. 32-bit Branch Trace Record Format
Offset   Field (bits 31:0)
0H       Last Branch From
4H       Last Branch To
8H       Branch Predicted (bit 4)
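As an illustration of the 12-byte record described above, the following Python sketch unpacks one 32-bit branch trace record from raw bytes. The function name and the dictionary keys are hypothetical; the layout (three little-endian doublewords, with bit 4 of the third as the predicted flag) follows the field descriptions in this section.

```python
import struct


def parse_bts_record_32(raw):
    """Decode one 12-byte branch record: from, to, and the predicted bit."""
    if len(raw) != 12:
        raise ValueError("a 32-bit BTS record is 12 bytes")
    last_from, last_to, flags = struct.unpack("<III", raw)
    return {
        "from": last_from,                  # linear address of the branch source
        "to": last_to,                      # linear address of the branch target
        "predicted": bool(flags & (1 << 4)),  # bit 4 of the third field
    }


sample = struct.pack("<III", 0x00401000, 0x00402000, 0x10)
print(parse_bts_record_32(sample))
```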
16.4.9.1 DS Save Area and IA-32e Mode Operation
When IA-32e mode is active (IA32_EFER.LMA = 1), the structure of the DS save area
is shown in Figure 16-8. The organization of each field in IA-32e mode operation is
similar to that of non-IA-32e mode operation. However, each field now stores a
64-bit address. The IA32_DS_AREA MSR holds the 64-bit linear address of the first
byte of the DS buffer management area.
Figure 16-7. PEBS Record Format
Offset   Field (bits 31:0)
0H       EFLAGS
4H       Linear IP
8H       EAX
CH       EBX
10H      ECX
14H      EDX
18H      ESI
1CH      EDI
20H      EBP
24H      ESP
When IA-32e mode is active, the structure of a branch trace record is similar to that
shown in Figure 16-6, but each field is 8 bytes in length. This makes each BTS record
24 bytes (see Figure 16-9). The structure of a PEBS record is similar to that shown in
Figure 16-7, but each field is 8 bytes in length and the architectural state includes
registers R8 through R15. This makes the size of a PEBS record in 64-bit mode 144
bytes (see Figure 16-10).
Figure 16-8. IA-32e Mode DS Save Area
[Figure: the IA32_DS_AREA MSR points to the DS buffer management area, whose fields
locate the BTS buffer (Branch Record 0 through Branch Record n) and the PEBS buffer
(PEBS Record 0 through PEBS Record n). Fields shown in the management area: BTS
Buffer Base, BTS Index, BTS Absolute Maximum, BTS Interrupt Threshold, PEBS Buffer
Base, PEBS Index, PEBS Absolute Maximum, PEBS Interrupt Threshold, PEBS Counter
Reset, Reserved. Offsets marked: 0H, 8H, 10H, 18H, 20H, 28H, 30H, 38H, 40H, 48H, 50H.]
Fields in the buffer management area of a DS save area are described in Section
16.4.9.
The formats of a branch trace record and a PEBS record are the same as the 64-bit
record formats shown in Figure 16-9 and Figure 16-10, with the exception that the
branch predicted bit is not supported by Intel Core microarchitecture or Intel Atom
microarchitecture. The 64-bit record formats for BTS and PEBS apply to the DS save
area for all operating modes.
Figure 16-9. 64-bit Branch Trace Record Format
Offset   Field (bits 63:0)
0H       Last Branch From
8H       Last Branch To
10H      Branch Predicted (bit 4)

Figure 16-10. 64-bit PEBS Record Format
Offset   Field (bits 63:0)
0H       RFLAGS
8H       RIP
10H      RAX
18H      RBX
20H      RCX
28H      RDX
30H      RSI
38H      RDI
40H      RBP
48H      RSP
50H      R8
...      ...
88H      R15
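The 144-byte figure quoted above follows from 18 fields of 8 bytes each (RFLAGS, RIP, eight legacy general-purpose registers, and R8 through R15). A minimal Python sketch of decoding such a record, assuming the field order listed for Figure 16-10; the function and constant names are hypothetical:

```python
import struct

# Field order assumed from the 64-bit PEBS record figure: 18 quadwords.
PEBS64_FIELDS = (
    ["RFLAGS", "RIP", "RAX", "RBX", "RCX", "RDX", "RSI", "RDI", "RBP", "RSP"]
    + [f"R{i}" for i in range(8, 16)]
)
PEBS64_RECORD_SIZE = 8 * len(PEBS64_FIELDS)  # 18 fields * 8 bytes


def parse_pebs_record_64(raw):
    """Map one raw 64-bit PEBS record onto field names."""
    if len(raw) != PEBS64_RECORD_SIZE:
        raise ValueError("unexpected record size")
    values = struct.unpack("<18Q", raw)
    return dict(zip(PEBS64_FIELDS, values))


record = parse_pebs_record_64(struct.pack("<18Q", *range(18)))
print(record["RIP"], record["R8"], record["R15"])
```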
The procedures used to program the IA32_DEBUGCTL MSR to set up a BTS buffer or a
CPL-qualified BTS are described in Section 16.4.9.3 and Section 16.4.9.4.
Required elements for writing a DS interrupt service routine are largely the same on
processors that support using the DS save area for BTS or PEBS records. However, on
processors based on Intel NetBurst
16.5 LAST BRANCH, INTERRUPT, AND EXCEPTION
RECORDING (INTEL CORE 2 DUO AND INTEL ATOM
PROCESSOR FAMILY)
The Intel Core 2 Duo processor family and Intel Xeon processors based on Intel Core
microarchitecture or enhanced Intel Core microarchitecture provide last branch,
interrupt, and exception recording. The facilities described in this section also apply
to the Intel Atom processor family. These capabilities are similar to those found in
Pentium 4 processors, including support for the following facilities:
Debug Trace and Branch Recording Control: The IA32_DEBUGCTL MSR
provides bit fields for software to configure mechanisms related to debug trace,
branch recording, branch trace store, and performance counter operations. See
Section 16.4.1 for a description of the flags. See Figure 16-3 for the MSR layout.
Last branch record (LBR) stack: A collection of MSR pairs stores the source
and destination addresses of recently executed branches. See Section 16.5.1.
Monitoring and single-stepping of branches, exceptions, and interrupts:
See Section 16.4.2 and Section 16.4.3. In addition, the ability to freeze the
LBR stack on a PMI request is available.
The Intel Atom processor family clears the TR flag when the
FREEZE_LBRS_ON_PMI flag is set.
Branch trace messages: See Section 16.4.4.
Last exception records: See Section 16.7.3.
Branch trace store and CPL-qualified BTS: See Section 16.4.5.
FREEZE_LBRS_ON_PMI flag (bit 11): See Section 16.4.7.
FREEZE_PERFMON_ON_PMI flag (bit 12): See Section 16.4.7.
FREEZE_WHILE_SMM_EN (bit 14): FREEZE_WHILE_SMM_EN is supported
if IA32_PERF_CAPABILITIES.FREEZE_WHILE_SMM[Bit 12] reports 1. See
Section 16.4.1.
16.5.1 LBR Stack
The last branch record stack and top-of-stack (TOS) pointer MSRs are supported
across the Intel Core 2, Intel Xeon, and Intel Atom processor families. Four pairs of
MSRs are supported in the LBR stack.
Last Branch Record (LBR) Stack:
MSR_LASTBRANCH_0_FROM_IP (address 40H) through
MSR_LASTBRANCH_3_FROM_IP (address 43H) store source addresses.
MSR_LASTBRANCH_0_TO_IP (address 60H) through
MSR_LASTBRANCH_3_TO_IP (address 63H) store destination addresses.
Last Branch Record Top-of-Stack (TOS) Pointer: The two least significant
bits of the TOS Pointer MSR (MSR_LASTBRANCH_TOS, address 1C9H) contain a
pointer to the MSR in the LBR stack that contains the most recent branch,
interrupt, or exception recorded.
For compatibility, the MSR_LER_TO_LIP and the MSR_LER_FROM_LIP MSRs duplicate
functions of the LastExceptionToIP and LastExceptionFromIP MSRs found in P6
family processors.
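To make the relationship between the TOS pointer and the four MSR pairs concrete, here is a short Python sketch that lists the MSR addresses from newest to oldest record. The function name is hypothetical, and the newest-to-oldest ordering assumes the stack fills circularly with the TOS index increasing, which this section does not state explicitly. Reading the MSRs themselves requires RDMSR at privilege level 0 and is outside the sketch.

```python
LBR_FROM_BASE = 0x40        # MSR_LASTBRANCH_0_FROM_IP
LBR_TO_BASE = 0x60          # MSR_LASTBRANCH_0_TO_IP
MSR_LASTBRANCH_TOS = 0x1C9  # TOS pointer MSR address
LBR_DEPTH = 4               # four MSR pairs on these processors


def lbr_pairs_newest_first(tos_msr_value):
    """Return (from_msr, to_msr) address pairs, most recent record first."""
    tos = tos_msr_value & 0x3  # only the two least significant bits are used
    # Assumption: older records sit at decreasing indices, wrapping around.
    order = [(tos - i) % LBR_DEPTH for i in range(LBR_DEPTH)]
    return [(LBR_FROM_BASE + i, LBR_TO_BASE + i) for i in order]


for from_msr, to_msr in lbr_pairs_newest_first(0x1):
    print(hex(from_msr), hex(to_msr))
```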
16.6 LAST BRANCH, INTERRUPT, AND EXCEPTION
RECORDING (INTEL CORE I7 PROCESSOR FAMILY)
The Intel Core i7 processor family and Intel Xeon processors based on Intel
microarchitecture codename Nehalem support last branch, interrupt, and exception
recording. These capabilities are similar to those found in Intel Core 2 processors and
add additional capabilities:
Debug Trace and Branch Recording Control: The IA32_DEBUGCTL MSR
provides bit fields for software to configure mechanisms related to debug trace,
branch recording, branch trace store, and performance counter operations. See
Section 16.4.1 for a description of the flags. See Figure 16-11 for the MSR layout.
Last branch record (LBR) stack: There are 16 MSR pairs that store the
source and destination addresses of recently executed branches. See
Section 16.6.1.
Monitoring and single-stepping of branches, exceptions, and interrupts:
See Section 16.4.2 and Section 16.4.3. In addition, the ability to freeze the
LBR stack on a PMI request is available.
Branch trace messages: The IA32_DEBUGCTL MSR provides bit fields for
software to enable each logical processor to generate branch trace messages.
See Section 16.4.4. However, not all BTM messages are observable using the
Intel QPI link.
Last exception records: See Section 16.7.3.
Branch trace store and CPL-qualified BTS: See Section 16.4.6 and Section
16.4.5.
FREEZE_LBRS_ON_PMI flag (bit 11): See Section 16.4.7.
FREEZE_PERFMON_ON_PMI flag (bit 12): See Section 16.4.7.
FREEZE_WHILE_SMM_EN (bit 14): FREEZE_WHILE_SMM_EN is supported
if IA32_PERF_CAPABILITIES.FREEZE_WHILE_SMM[Bit 12] reports 1. See
Section 16.4.1.
Processors based on Intel microarchitecture codename Nehalem provide additional
capabilities:
Independent control of uncore PMI: The IA32_DEBUGCTL MSR provides a
bit field (see Figure 16-11) for software to enable each logical processor to
receive an uncore counter overflow interrupt.
LBR filtering: Processors based on Intel microarchitecture codename Nehalem
support filtering of LBRs based on a combination of CPL and branch type conditions.
When LBR filtering is enabled, the LBR stack captures only the subset of branches
specified by MSR_LBR_SELECT.
16.6.1 LBR Stack
Processors based on Intel microarchitecture codename Nehalem provide 16 pairs of
MSRs to record last branch record information. The layout of each MSR pair is shown
in Table 16-6 and Table 16-7.
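The bit fields of the "from" MSR (address data in bits 47:0, sign extension of bit 47 above it, and a flag in bit 63) can be separated as in the sketch below. It is written in Python for illustration only; the function name is hypothetical, and bit 63 is returned raw rather than interpreted.

```python
ADDR_MASK_48 = (1 << 48) - 1


def decode_lbr_from_ip(value):
    """Split a 64-bit 'from' MSR value into (linear_address, bit63)."""
    address = value & ADDR_MASK_48
    if address & (1 << 47):
        # Rebuild the full 64-bit address by sign-extending bit 47.
        address |= 0xFFFF << 48
    bit63 = (value >> 63) & 1  # MISPRED field, returned without interpretation
    return address, bit63


print([hex(x) for x in decode_lbr_from_ip(0xFFFF800000001000)])
```

Sign-extending in software is needed because bit 63 of the register carries the flag rather than address data.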
Figure 16-11. IA32_DEBUGCTL MSR for Processors based
on Intel microarchitecture codename Nehalem
Bit(s)   Flag                     Description
0        LBR                      Last branch/interrupt/exception
1        BTF                      Single-step on branches
5:2      Reserved
6        TR                       Trace messages enable
7        BTS                      Branch trace store
8        BTINT                    Branch trace interrupt
9        BTS_OFF_OS               BTS off in OS
10       BTS_OFF_USR              BTS off in user code
11       FREEZE_LBRS_ON_PMI
12       FREEZE_PERFMON_ON_PMI
13       UNCORE_PMI_EN
14       FREEZE_WHILE_SMM_EN
31:15    Reserved
Table 16-6. IA32_LASTBRANCH_x_FROM_IP
Bit Field   Bit Offset   Access   Description
Data        47:0         R/O      The linear address of the branch instruction itself.
                                  This is the "branch from" address.
SIGN_EXT    62:48        R/O      Sign extension of bit 47 of this register.
MISPRED     63           R/O      When set, indicates the branch was mispredicted;
                                  otherwise, the branch was predicted.

Table 16-7. IA32_LASTBRANCH_x_TO_IP
Bit Field   Bit Offset   Access   Description
Data        47:0         R/O      The linear address of the target of the branch
                                  instruction. This is the "branch to" address.
SIGN_EXT    63:48        R/O      Sign extension of bit 47 of this register.

Processors based on Intel microarchitecture codename Nehalem have an LBR MSR
stack as shown in Table 16-8.

Table 16-8. LBR Stack Size and TOS Pointer Range
DisplayFamily_DisplayModel   Size of LBR Stack   Range of TOS Pointer
06_1AH                       16                  0 to 15

16.6.2 Filtering of Last Branch Records
MSR_LBR_SELECT is cleared to zero at RESET, and LBR filtering is disabled, i.e., all
branches will be captured. MSR_LBR_SELECT provides bit fields to specify the
conditions of subsets of branches that will not be captured in the LBR. The layout of
MSR_LBR_SELECT is shown in Table 16-9.

Table 16-9. MSR_LBR_SELECT
Bit Field    Bit Offset   Access   Description
CPL_EQ_0     0            R/W      When set, do not capture branches occurring in ring 0
CPL_NEQ_0    1            R/W      When set, do not capture branches occurring in ring > 0
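Because every MSR_LBR_SELECT bit suppresses a class of branches, a filter is built by setting the bits for everything that should not be captured. The Python sketch below composes such a value from the bit positions in Table 16-9; the dictionary and function names are hypothetical, and writing the value to the MSR (WRMSR at privilege level 0) is outside its scope.

```python
# Bit positions taken from Table 16-9 (setting a bit suppresses that class).
LBR_SELECT_BITS = {
    "CPL_EQ_0": 0,
    "CPL_NEQ_0": 1,
    "JCC": 2,
    "NEAR_REL_CALL": 3,
    "NEAR_IND_CALL": 4,
    "NEAR_RET": 5,
    "NEAR_IND_JMP": 6,
    "NEAR_REL_JMP": 7,
    "FAR_BRANCH": 8,
}


def lbr_select_value(suppress):
    """Return the MSR_LBR_SELECT value that suppresses the named classes."""
    value = 0
    for name in suppress:
        value |= 1 << LBR_SELECT_BITS[name]
    return value  # bits 63:9 stay zero, as the table requires


# Example: keep only near calls and returns executed outside ring 0.
calls_and_returns = lbr_select_value(
    ["CPL_EQ_0", "JCC", "NEAR_IND_JMP", "NEAR_REL_JMP", "FAR_BRANCH"]
)
print(hex(calls_and_returns))
```

A value of zero, the RESET state, suppresses nothing, which matches the statement that all branches are captured when filtering is disabled.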
16.7 LAST BRANCH, INTERRUPT, AND EXCEPTION
RECORDING (PROCESSORS BASED ON INTEL NETBURST
MICROARCHITECTURE)
Pentium 4 and Intel Xeon processors based on Intel NetBurst microarchitecture
provide the following methods for recording taken branches, interrupts, and
exceptions:
Store branch records in the last branch record (LBR) stack MSRs for the most
recent taken branches, interrupts, and/or exceptions. A branch record
consists of a branch-from and a branch-to instruction address.
Send the branch records out on the system bus as branch trace messages
(BTMs).
Log BTMs in a memory-resident branch trace store (BTS) buffer.
To support these functions, the processor provides the following MSRs and related
facilities:
MSR_DEBUGCTLA MSR: Enables last branch, interrupt, and exception
recording; single-stepping on taken branches; branch trace messages (BTMs);
and branch trace store (BTS). This register is named DebugCtlMSR in the P6
family processors.
Debug store (DS) feature flag (CPUID.1:EDX.DS[bit 21]): Indicates that
the processor provides the debug store (DS) mechanism, which allows BTMs to
be stored in a memory-resident BTS buffer.
CPL-qualified debug store (DS) feature flag (CPUID.1:ECX.DS-CPL[bit
4]): Indicates that the processor provides a CPL-qualified debug store (DS)
mechanism, which allows software to selectively skip sending and storing BTMs,
according to specified current privilege level settings, into a memory-resident
BTS buffer.
Table 16-9. MSR_LBR_SELECT (Contd.)
Bit Field       Bit Offset   Access   Description
JCC             2            R/W      When set, do not capture conditional branches
NEAR_REL_CALL   3            R/W      When set, do not capture near relative calls
NEAR_IND_CALL   4            R/W      When set, do not capture near indirect calls
NEAR_RET        5            R/W      When set, do not capture near returns
NEAR_IND_JMP    6            R/W      When set, do not capture near indirect jumps
NEAR_REL_JMP    7            R/W      When set, do not capture near relative jumps
FAR_BRANCH      8            R/W      When set, do not capture far branches
Reserved        63:9                  Must be zero
IA32_MISC_ENABLE MSR: Indicates that the processor provides the BTS
facilities.
Last branch record (LBR) stack: The LBR stack is a circular stack that
consists of four MSRs (MSR_LASTBRANCH_0 through MSR_LASTBRANCH_3) for
the Pentium 4 and Intel Xeon processor family [CPUID family 0FH, models 0H-
02H]. The LBR stack consists of 16 MSR pairs (MSR_LASTBRANCH_0_FROM_LIP
through MSR_LASTBRANCH_15_FROM_LIP and MSR_LASTBRANCH_0_TO_LIP
through MSR_LASTBRANCH_15_TO_LIP) for the Pentium 4 and Intel Xeon
processor family [CPUID family 0FH, model 03H].
Last branch record top-of-stack (TOS) pointer: The TOS Pointer MSR
contains a 2-bit pointer (0-3) to the MSR in the LBR stack that contains the most
recent branch, interrupt, or exception recorded for the Pentium 4 and Intel Xeon
processor family [CPUID family 0FH, models 0H-02H]. This pointer becomes a
4-bit pointer (0-15) for the Pentium 4 and Intel Xeon processor family [CPUID
family 0FH, model 03H]. See also: Table 16-10, Figure 16-12, and Section
16.7.2, "LBR Stack for Processors Based on Intel NetBurst Microarchitecture."
Figure 16-12. MSR_DEBUGCTLA MSR for Pentium 4 and Intel Xeon Processors
[Figure: register layout with bit positions 0 through 7 and 31 marked. Flags shown:
LBR (Last branch/interrupt/exception), BTF (Single-step on branches), TR (Trace
messages enable), BTS (Branch trace store), BTINT (Branch trace interrupt),
BTS_OFF_OS (Disable storing CPL_0 BTS), BTS_OFF_USR (Disable storing non-CPL_0
BTS), Reserved.]
The LBR stack is made up of LBR MSRs that are treated by the processor as a circular
stack. The TOS pointer (MSR_LASTBRANCH_TOS MSR) points to the LBR MSR (or
LBR MSR pair) that contains the most recent (last) branch record placed on the stack.
Prior to placing a new branch record on the stack, the TOS is incremented by 1. When
the TOS pointer reaches its maximum value, it wraps around to 0. See Table 16-10
and Figure 16-12.
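The circular behavior described above (increment the TOS, wrap at the maximum, then store) can be modeled in a few lines. This Python class is a software model for illustration, not processor code; the class and method names are hypothetical, and the initial TOS value of 0 is an assumption.

```python
class LbrStackModel:
    """Software model of a circular LBR stack with a top-of-stack pointer."""

    def __init__(self, depth):
        self.depth = depth             # 4 or 16, depending on the processor
        self.records = [None] * depth  # each slot holds a (from, to) tuple
        self.tos = 0                   # assumed starting value

    def push(self, from_addr, to_addr):
        # The TOS is incremented before the new record is placed,
        # wrapping to 0 after it reaches its maximum value.
        self.tos = (self.tos + 1) % self.depth
        self.records[self.tos] = (from_addr, to_addr)

    def newest_first(self):
        order = [(self.tos - i) % self.depth for i in range(self.depth)]
        return [self.records[i] for i in order if self.records[i] is not None]


stack = LbrStackModel(depth=4)
for n in range(5):
    stack.push(n, n + 100)
print(stack.tos, stack.newest_first())
```

After five pushes into a four-entry stack the oldest record has been overwritten, which is why a debugger reads the stack starting at the TOS and walks backward.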
Table 16-10. LBR MSR Stack Size and TOS Pointer Range for the Pentium 4 and the
Intel Xeon Processor Family
DisplayFamily_DisplayModel                                Size of LBR Stack   Range of TOS Pointer
Family 0FH, Models 0H-02H; MSRs at locations 1DBH-1DEH.   4                   0 to 3
Family 0FH, Models; MSRs at locations 680H-68FH.          16                  0 to 15
Family 0FH, Model 03H; MSRs at locations 6C0H-6CFH.       16                  0 to 15

The registers in the LBR MSR stack and the MSR_LASTBRANCH_TOS MSR are read-
only and can be read using the RDMSR instruction.
Figure 16-13 shows the layout of a branch record in an LBR MSR (or MSR pair). Each
branch record consists of two linear addresses, which represent the "from" and "to"
instruction pointers for a branch, interrupt, or exception. The contents of the from
and to addresses differ, depending on the source of the branch:
Taken branch: If the record is for a taken branch, the "from" address is the
address of the branch instruction and the "to" address is the target instruction of
the branch.
Interrupt: If the record is for an interrupt, the "from" address is the return
instruction pointer (RIP) saved for the interrupt and the "to" address is the
address of the first instruction in the interrupt handler routine. The RIP is the
linear address of the next instruction to be executed upon returning from the
interrupt handler.
Exception: If the record is for an exception, the "from" address is the linear
address of the instruction that caused the exception to be generated and the "to"
address is the address of the first instruction in the exception handler routine.
Additional information is saved if an exception or interrupt occurs in conjunction with
a branch instruction. If a branch instruction generates a trap type exception, two
branch records are stored in the LBR stack: a branch record for the branch instruction
followed by a branch record for the exception.
If a branch instruction is immediately followed by an interrupt, a branch record is
stored in the LBR stack for the branch instruction followed by a record for the
interrupt.
16.7.3 Last Exception Records
The Pentium 4, Intel Xeon, Pentium M, Intel Core Solo, Intel Core Duo, Intel Core 2
Duo, Intel Core i7 and Intel
16.8 LAST BRANCH, INTERRUPT, AND EXCEPTION
RECORDING (INTEL CORE SOLO AND INTEL CORE
DUO PROCESSORS)
Intel Core Solo and Intel Core Duo processors provide last branch, interrupt, and
exception recording. This capability is almost identical to that found in Pentium 4 and
Intel Xeon processors. There are differences in the stack and in some MSR names
and locations.
Note the following:
IA32_DEBUGCTL MSR: Enables debug trace interrupt, debug trace store,
trace messages enable, performance monitoring breakpoint flags, single
stepping on branches, and last branch. The IA32_DEBUGCTL MSR is located at
register address 01D9H.
See Figure 16-14 for the layout and the entries below for a description of the
flags:
  LBR (last branch/interrupt/exception) flag (bit 0): When set, the
  processor records a running trace of the most recent branches, interrupts,
  and/or exceptions taken by the processor (prior to a debug exception being
  generated) in the last branch record (LBR) stack. For more information, see
  the "Last Branch Record (LBR) Stack" entry below.
  BTF (single-step on branches) flag (bit 1): When set, the processor
  treats the TF flag in the EFLAGS register as a "single-step on branches" flag
  rather than a "single-step on instructions" flag. This mechanism allows
  single-stepping the processor on taken branches, interrupts, and exceptions.
  See Section 16.4.3, "Single-Stepping on Branches, Exceptions, and
  Interrupts," for more information about the BTF flag.
  TR (trace message enable) flag (bit 6): When set, branch trace
  messages are enabled. When the processor detects a taken branch,
  interrupt, or exception, it sends the branch record out on the system bus as
  a branch trace message (BTM). See Section 16.4.4, "Branch Trace Messages,"
  for more information about the TR flag.
  BTS (branch trace store) flag (bit 7): When set, the flag enables BTS
  facilities to log BTMs to a memory-resident BTS buffer that is part of the DS
  save area. See Section 16.4.9, "BTS and DS Save Area."
  BTINT (branch trace interrupt) flag (bit 8): When set, the BTS
  facilities generate an interrupt when the BTS buffer is full. When clear, BTMs
  are logged to the BTS buffer in a circular fashion. See Section 16.4.5, "Branch
  Trace Store (BTS)," for a description of this mechanism.
Debug store (DS) feature flag (bit 21), returned by the CPUID
instruction: Indicates that the processor provides the debug store (DS)
mechanism, which allows BTMs to be stored in a memory-resident BTS buffer.
See Section 16.4.5, "Branch Trace Store (BTS)."
Last Branch Record (LBR) Stack: The LBR stack consists of 8 MSRs
(MSR_LASTBRANCH_0 through MSR_LASTBRANCH_7); bits 31-0 hold the "from"
address, bits 63-32 hold the "to" address (MSR addresses start at 40H). See
Figure 16-15.
Last Branch Record Top-of-Stack (TOS) Pointer: The TOS Pointer MSR
contains a 3-bit pointer (bits 2-0) to the MSR in the LBR stack that contains the
most recent branch, interrupt, or exception recorded. For Intel Core Solo and
Intel Core Duo processors, this MSR is located at register address 01C9H.
For compatibility, the Intel Core Solo and Intel Core Duo processors provide two
32-bit MSRs (the MSR_LER_TO_LIP and the MSR_LER_FROM_LIP MSRs) that duplicate
functions of the LastExceptionToIP and LastExceptionFromIP MSRs found in P6 family
processors.
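The flag positions and the packed LBR record format described in this section can be expressed as a few constants and helpers. The Python below is a sketch for illustration; the constant and function names are hypothetical, and the values passed in would come from RDMSR in real code.

```python
# IA32_DEBUGCTL flag positions listed in this section.
DEBUGCTL_LBR = 1 << 0
DEBUGCTL_BTF = 1 << 1
DEBUGCTL_TR = 1 << 6
DEBUGCTL_BTS = 1 << 7
DEBUGCTL_BTINT = 1 << 8


def split_lbr_record(msr_value):
    """Split one 64-bit LBR MSR into (from_address, to_address)."""
    from_addr = msr_value & 0xFFFFFFFF       # bits 31-0
    to_addr = (msr_value >> 32) & 0xFFFFFFFF  # bits 63-32
    return from_addr, to_addr


def newest_lbr_index(tos_msr_value):
    """Index (0-7) of the MSR holding the most recent record."""
    return tos_msr_value & 0x7  # 3-bit pointer in bits 2-0


# Enable last branch recording plus BTS with an interrupt on buffer full.
print(hex(DEBUGCTL_LBR | DEBUGCTL_BTS | DEBUGCTL_BTINT))
print([hex(a) for a in split_lbr_record(0x0040200000401000)])
```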
For details, see Section 16.7, "Last Branch, Interrupt, and Exception Recording
(Processors based on Intel NetBurst Microarchitecture)," ... Intel Core Solo and
Intel Core Duo Processors.
Figure 16-14. IA32_DEBUGCTL MSR for Intel Core Solo
and Intel Core Duo Processors
Bit(s)   Flag       Description
0        LBR        Last branch/interrupt/exception
1        BTF        Single-step on branches
5:2      Reserved
6        TR         Trace messages enable
7        BTS        Branch trace store
8        BTINT      Branch trace interrupt
31:9     Reserved
16.9 LAST BRANCH, INTERRUPT, AND EXCEPTION
RECORDING (PENTIUM M PROCESSORS)
Like the Pentium 4 and Intel Xeon processor family, Pentium M processors provide
last branch, interrupt, and exception recording. The capability operates almost
identically to that found in Pentium 4 and Intel Xeon processors. There are differences
in the shape of the stack and in some MSR names and locations. Note the following:
MSR_DEBUGCTLB MSR: Enables debug trace interrupt, debug trace store,
trace messages enable, performance monitoring breakpoint flags, single
stepping on branches, and last branch. For Pentium M processors, this MSR is
located at register address 01D9H. See Figure 16-16 and the entries below for a
description of the flags.
  LBR (last branch/interrupt/exception) flag (bit 0): When set, the
  processor records a running trace of the most recent branches, interrupts,
  and/or exceptions taken by the processor (prior to a debug exception being
  generated) in the last branch record (LBR) stack. For more information, see
  the "Last Branch Record (LBR) Stack" bullet below.
  BTF (single-step on branches) flag (bit 1): When set, the processor
  treats the TF flag in the EFLAGS register as a "single-step on branches" flag
  rather than a "single-step on instructions" flag. This mechanism allows
  single-stepping the processor on taken branches, interrupts, and exceptions.
  See Section 16.4.3, "Single-Stepping on Branches, Exceptions, and
  Interrupts," for more information about the BTF flag.
  PBi (performance monitoring/breakpoint pins) flags (bits 5-2):
  When these flags are set, the performance monitoring/breakpoint pins on the
  processor (BP0#, BP1#, BP2#, and BP3#) report breakpoint matches in the
  corresponding breakpoint-address registers (DR0 through DR3). The
  processor asserts then deasserts the corresponding BPi# pin when a
  breakpoint match occurs. When a PBi flag is clear, the performance
  monitoring/breakpoint pins report performance events. Processor execution
  is not affected by reporting performance events.
Figure 16-15. LBR Branch Record Layout for the Intel Core Solo
and Intel Core Duo Processor
MSR_LASTBRANCH_0 through MSR_LASTBRANCH_7
Bits     Field
63:32    To Linear Address
31:0     From Linear Address
  TR (trace message enable) flag (bit 6): When set, branch trace
  messages are enabled. When the processor detects a taken branch,
  interrupt, or exception, it sends the branch record out on the system bus as a
  branch trace message (BTM). See Section 16.4.4, "Branch Trace Messages,"
  for more information about the TR flag.
  BTS (branch trace store) flag (bit 7): When set, enables the BTS
  facilities to log BTMs to a memory-resident BTS buffer that is part of the DS
  save area. See Section 16.4.9, "BTS and DS Save Area."
  BTINT (branch trace interrupt) flag (bit 8): When set, the BTS
  facilities generate an interrupt when the BTS buffer is full. When clear, BTMs
  are logged to the BTS buffer in a circular fashion. See Section 16.4.5, "Branch
  Trace Store (BTS)," for a description of this mechanism.
Debug store (DS) feature flag (bit 21), returned by the CPUID
instruction: Indicates that the processor provides the debug store (DS)
mechanism, which allows BTMs to be stored in a memory-resident BTS buffer.
See Section 16.4.5, "Branch Trace Store (BTS)."
Last Branch Record (LBR) Stack: The LBR stack consists of 8 MSRs
(MSR_LASTBRANCH_0 through MSR_LASTBRANCH_7); bits 31-0 hold the "from"
address, bits 63-32 hold the "to" address. For Pentium M processors, these MSRs
are located at register addresses 040H-047H. See Figure 16-17.
Last Branch Record Top-of-Stack (TOS) Pointer: The TOS Pointer MSR
contains a 3-bit pointer (bits 2-0) to the MSR in the LBR stack that contains the
most recent branch, interrupt, or exception recorded. For Pentium M processors,
this MSR is located at register address 01C9H.
Figure 16-16. MSR_DEBUGCTLB MSR for Pentium M Processors
Bit(s)   Flag        Description
0        LBR         Last branch/interrupt/exception
1        BTF         Single-step on branches
5:2      PB3/2/1/0   Performance monitoring breakpoint flags
6        TR          Trace messages enable
7        BTS         Branch trace store
8        BTINT       Branch trace interrupt
31:9     Reserved
For more detail on these capabilities, see Section 16.7.3, "Last Exception Records,"
and Appendix B.8, "MSRs In the Pentium M Processor."
16.10 LAST BRANCH, INTERRUPT, AND EXCEPTION
RECORDING (P6 FAMILY PROCESSORS)
The P6 family processors provide five MSRs for recording the last branch, interrupt,
or exception taken by the processor: DEBUGCTLMSR, LastBranchToIP,
LastBranchFromIP, LastExceptionToIP, and LastExceptionFromIP. These registers can
be used to collect last branch records, to set breakpoints on branches, interrupts, and
exceptions, and to single-step from one branch to the next.
See Appendix B, "Model-Specific Registers (MSRs)," for a detailed description of each
of the last branch recording MSRs.
16.10.1 DEBUGCTLMSR Register
The version of the DEBUGCTLMSR register found in the P6 family processors enables
last branch, interrupt, and exception recording; taken branch breakpoints; the
breakpoint reporting pins; and trace messages. This register can be written to using
the WRMSR instruction, when operating at privilege level 0 or when in real-address
mode. A protected-mode operating system procedure is required to provide user
access to this register. Figure 16-18 shows the flags in the DEBUGCTLMSR register
for the P6 family processors. The functions of these flags are as follows:
LBR (last branch/interrupt/exception) flag (bit 0): When set, the
processor records the source and target addresses (in the LastBranchToIP,
LastBranchFromIP, LastExceptionToIP, and LastExceptionFromIP MSRs) for the
last branch and the last exception or interrupt taken by the processor prior to a
debug exception being generated. The processor clears this flag whenever a
debug exception, such as an instruction or data breakpoint or single-step trap,
occurs.
Figure 16-17. LBR Branch Record Layout for the Pentium M Processor
MSR_LASTBRANCH_0 through MSR_LASTBRANCH_7
Bits     Field
63:32    To Linear Address
31:0     From Linear Address
BTF (single-step on branches) flag (bit 1): When set, the processor treats
the TF flag in the EFLAGS register as a "single-step on branches" flag. See
Section 16.4.3, "Single-Stepping on Branches, Exceptions, and Interrupts."
PBi (performance monitoring/breakpoint pins) flags (bits 2 through 5):
When these flags are set, the performance monitoring/breakpoint pins on the
processor (BP0#, BP1#, BP2#, and BP3#) report breakpoint matches in the
corresponding breakpoint-address registers (DR0 through DR3). The processor
asserts then deasserts the corresponding BPi# pin when a breakpoint match
occurs. When a PBi flag is clear, the performance monitoring/breakpoint pins
report performance events. Processor execution is not affected by reporting
performance events.
TR (trace message enable) flag (bit 6): When set, trace messages are
enabled as described in Section 16.4.4, "Branch Trace Messages." Setting this
flag greatly reduces the performance of the processor. When trace messages are
enabled, the values stored in the LastBranchToIP, LastBranchFromIP,
LastExceptionToIP, and LastExceptionFromIP MSRs are undefined.

Figure 16-18. DEBUGCTLMSR Register (P6 Family Processors)
Bit(s)   Flag       Description
0        LBR        Last branch/interrupt/exception
1        BTF        Single-step on branches
2        PB0        Performance monitoring/breakpoint pins
3        PB1        Performance monitoring/breakpoint pins
4        PB2        Performance monitoring/breakpoint pins
5        PB3        Performance monitoring/breakpoint pins
6        TR         Trace messages enable
31:7     Reserved

16.10.2 Last Branch and Last Exception MSRs
The LastBranchToIP and LastBranchFromIP MSRs are 32-bit registers for recording
the instruction pointers for the last branch, interrupt, or exception that the processor
took prior to a debug exception being generated. When a branch occurs, the
processor loads the address of the branch instruction into the LastBranchFromIP MSR
and loads the target address for the branch into the LastBranchToIP MSR.
When an interrupt or exception occurs (other than a debug exception), the address
of the instruction that was interrupted by the exception or interrupt is loaded into the
LastBranchFromIP MSR and the address of the exception or interrupt handler that is
called is loaded into the LastBranchToIP MSR.
The LastExceptionToIP and LastExceptionFromIP MSRs (also 32-bit registers) record
the instruction pointers for the last branch that the processor took prior to an
exception or interrupt being generated. When an exception or interrupt occurs, the
contents of the LastBranchToIP and LastBranchFromIP MSRs are copied into these
registers before the "to" and "from" addresses of the exception or interrupt are
recorded in the LastBranchToIP and LastBranchFromIP MSRs.
These registers can be read using the RDMSR instruction.
Note that the values stored in the LastBranchToIP, LastBranchFromIP,
LastExceptionToIP, and LastExceptionFromIP MSRs are offsets into the current code
segment, as opposed to linear addresses, which are saved in last branch records for
the Pentium 4 and Intel Xeon processors.
16.10.3 Monitoring Branches, Exceptions, and Interrupts
When the LBR flag in the DEBUGCTLMSR register is set, the processor automatically
begins recording branches that it takes, exceptions that are generated (except for
debug exceptions), and interrupts that are serviced. Each time a branch, exception,
or interrupt occurs, the processor records the "to" and "from" instruction pointers in
the LastBranchToIP and LastBranchFromIP MSRs. In addition, for interrupts and
exceptions, the processor copies the contents of the LastBranchToIP and
LastBranchFromIP MSRs into the LastExceptionToIP and LastExceptionFromIP MSRs
prior to recording the "to" and "from" addresses of the interrupt or exception.
When the processor generates a debug exception (#DB), it automatically clears the
LBR flag before executing the exception handler, but does not touch the last branch
and last exception MSRs. The addresses for the last branch, interrupt, or exception
taken are thus retained in the LastBranchToIP and LastBranchFromIP MSRs and the
addresses of the last branch prior to an interrupt or exception are retained in the
LastExceptionToIP and LastExceptionFromIP MSRs.
The debugger can use the last branch, interrupt, and/or exception addresses in
combination with code-segment selectors retrieved from the stack to reset
breakpoints in the breakpoint-address registers (DR0 through DR3), allowing a backward
trace from the manifestation of a particular bug toward its source. Because the
instruction pointers recorded in the LastBranchToIP, LastBranchFromIP,
LastExceptionToIP, and LastExceptionFromIP MSRs are offsets into a code segment,
software must determine the segment base address of the code segment associated with
the control transfer to calculate the linear address to be placed in the
breakpoint-address registers. The segment base address can be determined by reading
the segment selector for the code segment from the stack and using it to locate the
segment descriptor for the segment in the GDT or LDT. The segment base address can
then be read from the segment descriptor.
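
The offset-to-linear-address calculation described above can be sketched as follows. This is an illustrative model only, not debugger code: the function names are hypothetical, and the descriptor is assumed to be supplied as the 8 raw bytes of a legacy code-segment descriptor already read from the GDT or LDT.

```python
def descriptor_location(selector: int) -> tuple[str, int]:
    """Split a segment selector into (table name, descriptor index)."""
    table = "LDT" if selector & 0x4 else "GDT"   # TI flag is bit 2
    index = selector >> 3                        # index is bits 15:3
    return table, index


def segment_base(descriptor: bytes) -> int:
    """Return the 32-bit base address held in an 8-byte segment descriptor."""
    if len(descriptor) != 8:
        raise ValueError("a segment descriptor is 8 bytes long")
    base_low = int.from_bytes(descriptor[2:4], "little")  # base bits 15:0
    base_mid = descriptor[4]                               # base bits 23:16
    base_high = descriptor[7]                              # base bits 31:24
    return base_low | (base_mid << 16) | (base_high << 24)


def breakpoint_linear_address(descriptor: bytes, recorded_offset: int) -> int:
    """Segment base plus the offset read from a last-branch MSR."""
    return (segment_base(descriptor) + recorded_offset) & 0xFFFFFFFF
```

A debugger would pass the value read from, for example, LastBranchFromIP as `recorded_offset` and load the result into one of DR0 through DR3.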
Before resuming program execution from a debug-exception handler, the handler
must set the LBR flag again to re-enable last branch and last exception/interrupt
recording.
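
The recording rules in this section can be summarized with a small software model. It is a sketch for reasoning about the behavior, not an emulation of the hardware: the class and method names are hypothetical, and each MSR pair is represented as a (from, to) tuple.

```python
class LastBranchModel:
    """Toy model of the P6 last-branch and last-exception MSR pairs."""

    def __init__(self) -> None:
        self.lbr_enabled = False
        self.last_branch = (0, 0)      # (LastBranchFromIP, LastBranchToIP)
        self.last_exception = (0, 0)   # (LastExceptionFromIP, LastExceptionToIP)

    def branch(self, from_ip: int, to_ip: int) -> None:
        """Record a taken branch while the LBR flag is set."""
        if self.lbr_enabled:
            self.last_branch = (from_ip, to_ip)

    def exception_or_interrupt(self, from_ip: int, handler_ip: int,
                               is_debug: bool = False) -> None:
        """Record an exception or interrupt; #DB only clears the LBR flag."""
        if is_debug:
            # A debug exception clears LBR and leaves all four MSRs untouched.
            self.lbr_enabled = False
            return
        if self.lbr_enabled:
            # Preserve the previous branch, then record the new transfer.
            self.last_exception = self.last_branch
            self.last_branch = (from_ip, handler_ip)
```

In this model a debug-exception handler would set `lbr_enabled` back to True before resuming, mirroring the requirement to set the LBR flag again.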
16.11 TIME-STAMP COUNTER
The Intel 64 and IA-32 architectures (beginning with the Pentium processor) define a
time-stamp counter mechanism that can be used to monitor and identify the relative
time occurrence of processor events. The counter's architecture includes the
following components:
• TSC flag: A feature bit that indicates the availability of the time-stamp counter.
  The counter is available in an implementation if CPUID.1:EDX.TSC[bit 4] = 1.
• IA32_TIME_STAMP_COUNTER MSR (called TSC MSR in P6 family and
  Pentium processors): The MSR used as the counter.
• RDTSC instruction: An instruction used to read the time-stamp counter.
• TSD flag: A control register flag used to enable or disable user-level access to
  the time-stamp counter (RDTSC is restricted to privilege level 0 if
  CR4.TSD[bit 2] = 1).
The time-stamp counter (as implemented in the P6 family, Pentium, Pentium M,
Pentium 4, Intel Xeon, Intel Core Solo and Intel Core Duo processors and later
processors) is a 64-bit counter that is set to 0 following a RESET of the processor.
Following a RESET, the counter increments even when the processor is halted by the
HLT instruction or the external STPCLK# pin. Note that the assertion of the external
DPSLP# pin may cause the time-stamp counter to stop.
Processor families increment the time-stamp counter differently:
• For Pentium M processors (family [06H], models [09H, 0DH]); for Pentium 4
  processors, Intel Xeon processors (family [0FH], models [00H, 01H, or 02H]);
  and for P6 family processors: the time-stamp counter increments with every
  internal processor clock cycle.
  The internal processor clock cycle is determined by the current core-clock to
  bus-clock ratio. Intel SpeedStep technology transitions may also impact the
  processor clock.
• For Pentium 4 processors, Intel Xeon processors (family [0FH], models [03H and
  higher]); for Intel Core Solo and Intel Core Duo processors (family [06H], model
  [0EH]); for the Intel Xeon processor 5100 series and Intel Core 2 Duo processors
  (family [06H], model [0FH]); for Intel Core 2 and Intel Xeon processors (family
  [06H], display_model [17H]); for Intel Atom processors (family [06H],
  display_model [1CH]): the time-stamp counter increments at a constant rate.
  That rate may be set by the maximum core-clock to bus-clock ratio of the
  processor or may be set by the maximum resolved frequency at which the
  processor is booted. The maximum resolved frequency may differ from the
  maximum qualified frequency of the processor; see Section 30.10.5 for more
  detail.
  The specific processor configuration determines the behavior. Constant TSC
  behavior ensures that the duration of each clock tick is uniform and supports the
  use of the TSC as a wall clock timer even if the processor core changes frequency.
  This is the architectural behavior moving forward.
NOTE
To determine average processor clock frequency, Intel recommends
the use of EMON logic to count processor core clocks over the period
of time for which the average is required. See Section 30.10,
"Counting Clocks," and Appendix A, "Performance-Monitoring Events,"
for more information.
The RDTSC instruction reads the time-stamp counter and is guaranteed to return a
monotonically increasing unique value whenever executed, except for a 64-bit
counter wraparound. Intel guarantees that the time-stamp counter will not wrap
around within 10 years after being reset. The period for counter wrap is longer for
Pentium 4, Intel Xeon, P6 family, and Pentium processors.
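
As a rough check on the wraparound guarantee, the time for a 64-bit counter to wrap can be estimated from its tick rate. The sketch below is plain arithmetic; the 3 GHz figure used in the usage note is an arbitrary example rate, not a value taken from this manual.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_until_wrap(ticks_per_second: float) -> float:
    """Years for a 64-bit counter starting at 0 to wrap at the given rate."""
    if ticks_per_second <= 0:
        raise ValueError("tick rate must be positive")
    return (2 ** 64) / ticks_per_second / SECONDS_PER_YEAR
```

Calling `years_until_wrap(3e9)` gives a period well beyond the 10-year guarantee, on the order of two centuries.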
Normally, the RDTSC instruction can be executed by programs and procedures
running at any privilege level and in virtual-8086 mode. The TSD flag allows use of
this instruction to be restricted to programs and procedures running at privilege level
0. A secure operating system would set the TSD flag during system initialization to
disable user access to the time-stamp counter. An operating system that disables
user access to the time-stamp counter should emulate the instruction through a
user-accessible programming interface.
The RDTSC instruction is not serializing or ordered with other instructions. It does not
necessarily wait until all previous instructions have been executed before reading the
counter. Similarly, subsequent instructions may begin execution before the RDTSC
instruction operation is performed.
The RDMSR and WRMSR instructions read and write the time-stamp counter, treating
the time-stamp counter as an ordinary MSR (address 10H). In the Pentium 4, Intel
Xeon, and P6 family processors, all 64 bits of the time-stamp counter are read using
RDMSR (just as with RDTSC). When WRMSR is used to write the time-stamp counter
on processors before family [0FH], models [03H, 04H]: only the low-order 32 bits of
the time-stamp counter can be written (the high-order 32 bits are cleared to 0). For
family [0FH], models [03H, 04H, 06H]; for family [06H], model [0EH, 0FH]; for
family [06H], display_model [17H, 1AH, 1CH, 1DH]: all 64 bits are writable.
16.11.1 Invariant TSC
The time-stamp counter in newer processors may support an enhancement, referred
to as invariant TSC. A processor's support for invariant TSC is indicated by
CPUID.80000007H:EDX[8].
The invariant TSC will run at a constant rate in all ACPI P-, C-, and T-states. This is
the architectural behavior moving forward. On processors with invariant TSC
support, the OS may use the TSC for wall clock timer services (instead of ACPI or
HPET timers). TSC reads are much more efficient and do not incur the overhead
associated with a ring transition or access to a platform resource.
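
The two feature bits named so far can be tested as shown below. This sketch assumes the caller has already obtained the EDX values returned by CPUID leaves 1 and 80000007H by some platform-specific means; the function names are illustrative.

```python
TSC_BIT = 4             # CPUID.1:EDX.TSC[bit 4]
INVARIANT_TSC_BIT = 8   # CPUID.80000007H:EDX[8]


def has_tsc(cpuid_1_edx: int) -> bool:
    """True if the time-stamp counter is available."""
    return bool((cpuid_1_edx >> TSC_BIT) & 1)


def has_invariant_tsc(cpuid_80000007_edx: int) -> bool:
    """True if the TSC runs at a constant rate in all P-, C-, and T-states."""
    return bool((cpuid_80000007_edx >> INVARIANT_TSC_BIT) & 1)
```

An OS could use a check like `has_invariant_tsc` when deciding whether the TSC is suitable as a wall clock source.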
16.11.2 IA32_TSC_AUX Register and RDTSCP Support
Processors based on Intel microarchitecture codename Nehalem provide an auxiliary
TSC register, IA32_TSC_AUX, that is designed to be used in conjunction with
IA32_TSC. IA32_TSC_AUX provides a 32-bit field that is initialized by privileged
software with a signature value (for example, a logical processor ID).
The primary usage of IA32_TSC_AUX in conjunction with IA32_TSC is to allow
software to read the 64-bit time stamp in IA32_TSC and signature value in
IA32_TSC_AUX with the instruction RDTSCP in an atomic operation. RDTSCP returns
the 64-bit time stamp in EDX:EAX and the 32-bit TSC_AUX signature value in ECX.
The atomicity of RDTSCP ensures that no context switch can occur between the reads
of the TSC and TSC_AUX values.
Support for RDTSCP is indicated by CPUID.80000001H:EDX[27]. As with the RDTSC
instruction, non-ring 0 access is controlled by CR4.TSD (Time Stamp Disable flag).
User mode software can use RDTSCP to detect if CPU migration has occurred
between successive reads of the TSC. It can also be used to adjust for per-CPU
differences in TSC values in a NUMA system.
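
The migration check described above can be sketched in software terms. The sketch assumes each RDTSCP result is handed over as raw EDX, EAX, and ECX values and that the OS has loaded a distinct signature into IA32_TSC_AUX on every logical processor; the type and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class TscSample(NamedTuple):
    tsc: int   # 64-bit time stamp from EDX:EAX
    aux: int   # 32-bit signature from ECX (IA32_TSC_AUX)


def sample_from_registers(edx: int, eax: int, ecx: int) -> TscSample:
    """Combine the three registers written by RDTSCP into one sample."""
    return TscSample(tsc=((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF),
                     aux=ecx & 0xFFFFFFFF)


def elapsed_ticks(first: TscSample, second: TscSample) -> Optional[int]:
    """Tick delta between samples, or None if the signatures differ."""
    if first.aux != second.aux:
        return None   # likely migrated; the two TSC values are not comparable
    return (second.tsc - first.tsc) & 0xFFFFFFFFFFFFFFFF
```

A caller that receives None would typically discard the measurement and sample again.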
CHAPTER 17
8086 EMULATION
IA-32 processors (beginning with the Intel386 processor) provide two ways to
execute new or legacy programs that are assembled and/or compiled to run on an
Intel 8086 processor:
• Real-address mode.
• Virtual-8086 mode.
Figure 2-3 shows the relationship of these operating modes to protected mode and
system management mode (SMM).
When the processor is powered up or reset, it is placed in the real-address mode.
This operating mode almost exactly duplicates the execution environment of the
Intel 8086 processor, with some extensions. Virtually any program assembled and/or
compiled to run on an Intel 8086 processor will run on an IA-32 processor in this
mode.
When running in protected mode, the processor can be switched to virtual-8086
mode to run 8086 programs. This mode also duplicates the execution environment of
the Intel 8086 processor, with extensions. In virtual-8086 mode, an 8086 program
runs as a separate protected-mode task. Legacy 8086 programs are thus able to run
under an operating system (such as Microsoft Windows*) that takes advantage of
protected mode and to use protected-mode facilities, such as the protected-mode
interrupt- and exception-handling facilities. Protected-mode multitasking permits
multiple virtual-8086 mode tasks (with each task running a separate 8086 program)
to be run on the processor along with other non-virtual-8086 mode tasks.
This section describes both the basic real-address mode execution environment and
the virtual-8086-mode execution environment, available on the IA-32 processors
beginning with the Intel386 processor.
17.1 REAL-ADDRESS MODE
The IA-32 architecture's real-address mode runs programs written for the Intel 8086,
Intel 8088, Intel 80186, and Intel 80188 processors, or for the real-address mode of
the Intel 286, Intel386, Intel486, Pentium, P6 family, Pentium 4, and Intel Xeon
processors.
The execution environment of the processor in real-address mode is designed to
duplicate the execution environment of the Intel 8086 processor. To an 8086
program, a processor operating in real-address mode behaves like a high-speed
8086 processor. The principal features of this architecture are defined in Chapter 3,
"Basic Execution Environment," of the Intel 64 and IA-32 Architectures Software
Developer's Manual, Volume 1.
The following is a summary of the core features of the real-address mode execution
environment as would be seen by a program written for the 8086:
• The processor supports a nominal 1-MByte physical address space (see Section
  17.1.1, "Address Translation in Real-Address Mode," for specific details). This
  address space is divided into segments, each of which can be up to 64 KBytes in
  length. The base of a segment is specified with a 16-bit segment selector, which
  is shifted left by 4 bits (the low-order 4 bits are filled with zeros) to form a
  20-bit offset from address 0 in the address space. An operand within a segment
  is addressed with a 16-bit offset from the base of the segment. A physical
  address is thus formed by adding the offset to the 20-bit segment base (see
  Section 17.1.1, "Address Translation in Real-Address Mode").
• All operands in "native 8086 code" are 8-bit or 16-bit values. (Operand size
  override prefixes can be used to access 32-bit operands.)
• Eight 16-bit general-purpose registers are provided: AX, BX, CX, DX, SP, BP, SI,
  and DI. The extended 32-bit registers (EAX, EBX, ECX, EDX, ESP, EBP, ESI, and
  EDI) are accessible to programs that explicitly perform a size override operation.
• Four segment registers are provided: CS, DS, SS, and ES. (The FS and GS
  registers are accessible to programs that explicitly access them.) The CS register
  contains the segment selector for the code segment; the DS and ES registers
  contain segment selectors for data segments; and the SS register contains the
  segment selector for the stack segment.
• The 8086 16-bit instruction pointer (IP) is mapped to the lower 16 bits of the EIP
  register. Note that this register is a 32-bit register and unintentional address
  wrapping may occur.
• The 16-bit FLAGS register contains status and control flags. (This register is
  mapped to the 16 least significant bits of the 32-bit EFLAGS register.)
• All of the Intel 8086 instructions are supported (see Section 17.1.3, "Instructions
  Supported in Real-Address Mode").
• A single, 16-bit-wide stack is provided for handling procedure calls and
  invocations of interrupt and exception handlers. This stack is contained in the
  stack segment identified with the SS register. The SP (stack pointer) register
  contains an offset into the stack segment. The stack grows down (toward lower
  segment offsets) from the stack pointer. The BP (base pointer) register also
  contains an offset into the stack segment that can be used as a pointer to a
  parameter list. When a CALL instruction is executed, the processor pushes the
  current instruction pointer (the 16 least-significant bits of the EIP register and,
  on far calls, the current value of the CS register) onto the stack. On a return,
  initiated with a RET instruction, the processor pops the saved instruction pointer
  from the stack into the EIP register (and CS register on far returns). When an
  implicit call to an interrupt or exception handler is executed, the processor
  pushes the EIP, CS, and EFLAGS (low-order 16 bits only) registers onto the
  stack. On a return from an interrupt or exception handler, initiated with an IRET
  instruction, the processor pops the saved instruction pointer and EFLAGS image
  from the stack into the EIP, CS, and EFLAGS registers.
• A single interrupt table, called the "interrupt vector table" or "interrupt table," is
  provided for handling interrupts and exceptions (see Figure 17-2). The interrupt
  table (which has 4-byte entries) takes the place of the interrupt descriptor table
  (IDT, with 8-byte entries) used when handling protected-mode interrupts and
  exceptions. Interrupt and exception vector numbers provide an index to entries
  in the interrupt table. Each entry provides a pointer (called a "vector") to an
  interrupt- or exception-handling procedure. See Section 17.1.4, "Interrupt and
  Exception Handling," for more details. It is possible for software to relocate the
  IDT by means of the LIDT instruction on IA-32 processors beginning with the
  Intel386 processor.
• The x87 FPU is active and available to execute x87 FPU instructions in
  real-address mode. Programs written to run on the Intel 8087 and Intel 287 math
  coprocessors can be run in real-address mode without modification.
The following extensions to the Intel 8086 execution environment are available in the
IA-32 architecture's real-address mode. If backwards compatibility to Intel 286 and
Intel 8086 processors is required, these features should not be used in new programs
written to run in real-address mode.
• Two additional segment registers (FS and GS) are available.
• Many of the integer and system instructions that have been added to later IA-32
  processors can be executed in real-address mode (see Section 17.1.3,
  "Instructions Supported in Real-Address Mode").
• The 32-bit operand prefix can be used in real-address mode programs to execute
  the 32-bit forms of instructions. This prefix also allows real-address mode
  programs to use the processor's 32-bit general-purpose registers.
• The 32-bit address prefix can be used in real-address mode programs, allowing
  32-bit offsets.
The following sections describe address formation, registers, available instructions,
and interrupt and exception handling in real-address mode. For information on I/O in
real-address mode, see Chapter 13, "Input/Output," of the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 1.
17.1.1 Address Translation in Real-Address Mode
In real-address mode, the processor does not interpret segment selectors as indexes
into a descriptor table; instead, it uses them directly to form linear addresses as the
8086 processor does. It shifts the segment selector left by 4 bits to form a 20-bit
base address (see Figure 17-1). The offset into a segment is added to the base
address to create a linear address that maps directly to the physical address space.
When using 8086-style address translation, it is possible to specify addresses larger
than 1 MByte. For example, with a segment selector value of FFFFH and an offset of
FFFFH, the linear (and physical) address would be 10FFEFH (just under 1 MByte plus
64 KBytes). The 8086 processor, which can form addresses only up to 20 bits long,
truncates the high-order bit, thereby wrapping this address to FFEFH. When operating
in real-address mode, however, the processor does not truncate such an address and
uses it as a physical address. (Note, however, that for IA-32 processors beginning
with the Intel486 processor, the A20M# signal can be used in real-address mode to
mask address line A20, thereby mimicking the 20-bit wrap-around behavior of the
8086 processor.) Care should be taken to ensure that A20M#-based address wrapping
is handled correctly in multiprocessor-based systems.
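
The address formation and wrap-around behavior described above can be reproduced with a few lines of arithmetic. This is an illustrative sketch; the function name and the `a20_masked` parameter are hypothetical and simply model whether address line A20 is forced to zero.

```python
def real_mode_address(segment: int, offset: int, a20_masked: bool = False) -> int:
    """Form a real-address-mode linear address from a 16-bit segment and offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    linear = (segment << 4) + offset    # 20-bit base plus 16-bit offset
    if a20_masked:
        linear &= ~(1 << 20)            # clear address bit 20 (8086-style wrap)
    return linear
```

With segment FFFFH and offset FFFFH the function returns 10FFEFH, and with `a20_masked=True` it returns FFEFH, matching the example in the text.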
The IA-32 processors beginning with the Intel386 processor can generate 32-bit
offsets using an address override prefix; however, in real-address mode, the value of
a 32-bit offset may not exceed FFFFH without causing an exception.
For full compatibility with Intel 286 real-address mode, pseudo-protection faults
(interrupt 12 or 13) occur if a 32-bit offset is generated outside the range 0 through
FFFFH.
17.1.2 Registers Supported in Real-Address Mode
The register set available in real-address mode includes all the registers defined for
the 8086 processor plus the new registers introduced in later IA-32 processors, such
as the FS and GS segment registers, the debug registers, the control registers, and
the floating-point unit registers. The 32-bit operand prefix allows a real-address
mode program to use the 32-bit general-purpose registers (EAX, EBX, ECX, EDX,
ESP, EBP, ESI, and EDI).
17.1.3 Instructions Supported in Real-Address Mode
The following instructions make up the core instruction set for the 8086 processor. If
backwards compatibility to the Intel 286 and Intel 8086 processors is required, only
these instructions should be used in a new program written to run in real-address
mode.
Figure 17-1. Real-Address Mode Address Translation
  Base:   16-bit segment selector in bits 19:4, bits 3:0 = 0000
  Offset: 16-bit effective address in bits 15:0, bits 19:16 = 0000
  Base + Offset = 20-bit linear address (bits 19:0)
• Move (MOV) instructions that move operands between general-purpose
  registers, segment registers, and between memory and general-purpose
  registers.
• The exchange (XCHG) instruction.
• Load segment register instructions LDS and LES.
• Arithmetic instructions ADD, ADC, SUB, SBB, MUL, IMUL, DIV, IDIV, INC, DEC,
  CMP, and NEG.
• Logical instructions AND, OR, XOR, and NOT.
• Decimal instructions DAA, DAS, AAA, AAS, AAM, and AAD.
• Stack instructions PUSH and POP (to general-purpose registers and segment
  registers).
• Type conversion instructions CWD, CDQ, CBW, and CWDE.
• Shift and rotate instructions SAL, SHL, SHR, SAR, ROL, ROR, RCL, and RCR.
• TEST instruction.
• Control instructions JMP, Jcc, CALL, RET, LOOP, LOOPE, and LOOPNE.
• Interrupt instructions INT n, INTO, and IRET.
• EFLAGS control instructions STC, CLC, CMC, CLD, STD, LAHF, SAHF, PUSHF, and
  POPF.
• I/O instructions IN, INS, OUT, and OUTS.
• Load effective address (LEA) instruction, and translate (XLATB) instruction.
• LOCK prefix.
• Repeat prefixes REP, REPE, REPZ, REPNE, and REPNZ.
• Processor halt (HLT) instruction.
• No operation (NOP) instruction.
The following instructions, added to later IA-32 processors (some in the Intel 286
processor and the remainder in the Intel386 processor), can be executed in
real-address mode, if backwards compatibility to the Intel 8086 processor is not
required.
• Move (MOV) instructions that operate on the control and debug registers.
• Load segment register instructions LSS, LFS, and LGS.
• Generalized multiply instructions and multiply immediate data.
• Shift and rotate by immediate counts.
• Stack instructions PUSHA, PUSHAD, POPA and POPAD, and PUSH immediate
  data.
• Move with sign extension and zero extension instructions MOVSX and MOVZX.
• Long-displacement Jcc instructions.
• Exchange instructions CMPXCHG, CMPXCHG8B, and XADD.
• String instructions MOVS, CMPS, SCAS, LODS, and STOS.
• Bit test and bit scan instructions BT, BTS, BTR, BTC, BSF, and BSR; the
  byte-set-on-condition instruction SETcc; and the byte swap (BSWAP) instruction.
• Double shift instructions SHLD and SHRD.
• EFLAGS control instructions PUSHF and POPF.
• ENTER and LEAVE control instructions.
• BOUND instruction.
• CPU identification (CPUID) instruction.
• System instructions CLTS, INVD, WBINVD, INVLPG, LGDT, SGDT, LIDT, SIDT,
  LMSW, SMSW, RDMSR, WRMSR, RDTSC, and RDPMC.
Execution of any of the other IA-32 architecture instructions (not given in the
previous two lists) in real-address mode results in an invalid-opcode exception (#UD)
being generated.
17.1.4 Interrupt and Exception Handling
When operating in real-address mode, software must provide interrupt and
exception-handling facilities that are separate from those provided in protected mode.
Even during the early stages of processor initialization when the processor is still in
real-address mode, elementary real-address mode interrupt and exception-handling
facilities must be provided to ensure reliable operation of the processor, or the
initialization code must ensure that no interrupts or exceptions will occur.
The IA-32 processors handle interrupts and exceptions in real-address mode similar
to the way they handle them in protected mode. When a processor receives an
interrupt or generates an exception, it uses the vector number of the interrupt or
exception as an index into the interrupt table. (In protected mode, the interrupt table is
called the interrupt descriptor table (IDT), but in real-address mode, the table is
usually called the interrupt vector table, or simply the interrupt table.) The entry
in the interrupt vector table provides a pointer to an interrupt- or exception-handler
procedure. (The pointer consists of a segment selector for a code segment and a
16-bit offset into the segment.) The processor performs the following actions to make an
implicit call to the selected handler:
1. Pushes the current values of the CS and EIP registers onto the stack. (Only the 16
   least-significant bits of the EIP register are pushed.)
2. Pushes the low-order 16 bits of the EFLAGS register onto the stack.
3. Clears the IF flag in the EFLAGS register to disable interrupts.
4. Clears the TF, RF, and AC flags in the EFLAGS register.
5. Transfers program control to the location specified in the interrupt vector table.
An IRET instruction at the end of the handler procedure reverses these steps to
return program control to the interrupted program. Exceptions do not return error
codes in real-address mode.
The interrupt vector table is an array of 4-byte entries (see Figure 17-2). Each entry
consists of a far pointer to a handler procedure, made up of a segment selector and
an offset. The processor scales the interrupt or exception vector by 4 to obtain an
offset into the interrupt table. Following reset, the base of the interrupt vector table
is located at physical address 0 and its limit is set to 3FFH. In the Intel 8086
processor, the base address and limit of the interrupt vector table cannot be
changed. In the later IA-32 processors, the base address and limit of the interrupt
vector table are contained in the IDTR register and can be changed using the LIDT
instruction.
(For backward compatibility to Intel 8086 processors, the default base address and
limit of the interrupt vector table should not be changed.)
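
The table lookup described above can be sketched as follows. The sketch models physical memory as a Python bytes-like object, uses the reset defaults (base 0, limit 3FFH) as parameter defaults, and reports an out-of-range entry with a ValueError purely as a modeling convenience; the function names are hypothetical.

```python
def ivt_entry_address(vector: int, base: int = 0, limit: int = 0x3FF) -> int:
    """Address of the 4-byte interrupt-table entry for a vector number."""
    if not 0 <= vector <= 255:
        raise ValueError("vector numbers range from 0 through 255")
    entry_offset = vector * 4               # the vector is scaled by 4
    if entry_offset + 3 > limit:
        raise ValueError("entry lies beyond the interrupt table limit")
    return base + entry_offset


def handler_pointer(memory: bytes, vector: int, base: int = 0,
                    limit: int = 0x3FF) -> tuple[int, int]:
    """Return (segment selector, offset) stored in the entry for a vector."""
    address = ivt_entry_address(vector, base, limit)
    offset = int.from_bytes(memory[address:address + 2], "little")       # low word
    segment = int.from_bytes(memory[address + 2:address + 4], "little")  # high word
    return segment, offset
```

With the default limit of 3FFH the table holds exactly 256 entries, so every vector from 0 through 255 maps to a valid entry.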
Table 17-1 shows the interrupt and exception vectors that can be generated in
real-address mode and virtual-8086 mode, and in the Intel 8086 processor. See Chapter
6, "Interrupt and Exception Handling," for a description of the exception conditions.
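Figure 17-2. Interrupt Vector Table in Real-Address Mode
  The IDTR points to the base of the table. Entries are 4 bytes each: Interrupt
  Vector 0* at byte offset 0, Entry 1 at offset 4, Entry 2 at offset 8, Entry 3 at
  offset 12, and so on, up to Entry 255. Within each entry, the Offset occupies the
  low word (bytes 0-1, bits 15:0) and the Segment Selector occupies the high word
  (bytes 2-3).
  * Interrupt vector number 0 selects entry 0 (called "interrupt vector 0") in the
  interrupt vector table. Interrupt vector 0 in turn points to the start of the
  interrupt handler for interrupt 0.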
17.2 VIRTUAL-8086 MODE
Virtual-8086 mode is actually a special type of a task that runs in protected mode.
When the operating system or executive switches to a virtual-8086-mode task, the
processor emulates an Intel 8086 processor. The execution environment of the
processor while in the 8086-emulation state is the same as is described in Section
17.1, "Real-Address Mode," for real-address mode, including the extensions. The
major difference between the two modes is that in virtual-8086 mode the 8086
emulator uses some protected-mode services (such as the protected-mode interrupt
and exception-handling and paging facilities).
As in real-address mode, any new or legacy program that has been assembled
and/or compiled to run on an Intel 8086 processor will run in a virtual-8086-mode
task. And several 8086 programs can be run as virtual-8086-mode tasks
concurrently with normal protected-mode tasks, using the processor's multitasking
facilities.
Table 17-1. Real-Address Mode Exceptions and Interrupts
Vector  Description                     Real-Address  Virtual-8086  Intel 8086
No.                                     Mode          Mode          Processor
0       Divide Error (#DE)              Yes           Yes           Yes
1       Debug Exception (#DB)           Yes           Yes           No
2       NMI Interrupt                   Yes           Yes           Yes
3       Breakpoint (#BP)                Yes           Yes           Yes
4       Overflow (#OF)                  Yes           Yes           Yes
5       BOUND Range Exceeded (#BR)      Yes           Yes           Reserved
6       Invalid Opcode (#UD)            Yes           Yes           Reserved
7       Device Not Available (#NM)      Yes           Yes           Reserved
8       Double Fault (#DF)              Yes           Yes           Reserved
9       (Intel reserved. Do not use.)   Reserved      Reserved      Reserved
10      Invalid TSS (#TS)               Reserved      Yes           Reserved
11      Segment Not Present (#NP)       Reserved      Yes           Reserved
12      Stack Fault (#SS)               Yes           Yes           Reserved
13      General Protection (#GP)*       Yes           Yes           Reserved
14      Page Fault (#PF)                Reserved      Yes           Reserved
15      (Intel reserved. Do not use.)   Reserved      Reserved      Reserved
16      Floating-Point Error (#MF)      Yes           Yes           Reserved
17      Alignment Check (#AC)           Reserved      Yes           Reserved
18      Machine Check (#MC)             Yes           Yes           Reserved
17.2.1 Enabling Virtual-8086 Mode
The processor runs in virtual-8086 mode when the VM (virtual machine) flag in the
EFLAGS register is set. This flag can only be set when the processor switches to a
new protected-mode task or resumes virtual-8086 mode via an IRET instruction.
System software cannot change the state of the VM flag directly in the EFLAGS
register (for example, by using the POPFD instruction). Instead it changes the flag in
the image of the EFLAGS register stored in the TSS or on the stack following a call to
an interrupt- or exception-handler procedure. For example, software sets the VM flag
in the EFLAGS image in the TSS when first creating a virtual-8086 task.
The processor tests the VM flag under three general conditions:
• When loading segment registers, to determine whether to use 8086-style
  address translation.
• When decoding instructions, to determine which instructions are not supported in
  virtual-8086 mode and which instructions are sensitive to IOPL.
• When checking privileged instructions, on page accesses, or when performing
  other permission checks. (Virtual-8086 mode always executes at CPL 3.)
17.2.2 Structure of a Virtual-8086 Task
A virtual-8086-mode task consists of the following items:
• A 32-bit TSS for the task.
• The 8086 program.
• A virtual-8086 monitor.
• 8086 operating-system services.
The TSS of the new task must be a 32-bit TSS, not a 16-bit TSS, because the 16-bit
TSS does not load the most-significant word of the EFLAGS register, which contains
the VM flag. All TSSs, stacks, data, and code used to handle exceptions when in
virtual-8086 mode must also be 32-bit segments.
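
The reason a 16-bit TSS cannot start a virtual-8086 task can be shown with a short bit-level sketch. It relies on the architectural position of the VM flag (bit 17 of EFLAGS), which lies in the most-significant word; the function names are illustrative.

```python
VM_FLAG = 1 << 17   # VM (virtual-8086 mode) flag, EFLAGS bit 17


def is_virtual_8086(eflags_image: int) -> bool:
    """True if the VM flag is set in a 32-bit EFLAGS image."""
    return bool(eflags_image & VM_FLAG)


def flags_loaded_from_16_bit_tss(eflags_image: int) -> int:
    """Only the low-order word of the flags image survives in a 16-bit TSS."""
    return eflags_image & 0xFFFF
```

Setting `VM_FLAG` in a 32-bit image makes `is_virtual_8086` return True, while the same image passed through `flags_loaded_from_16_bit_tss` loses the flag.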
Table 17-1. Real-Address Mode Exceptions and Interrupts (Contd.)
Vector  Description                     Real-Address  Virtual-8086  Intel 8086
No.                                     Mode          Mode          Processor
19-31   (Intel reserved. Do not use.)   Reserved      Reserved      Reserved
32-255  User Defined Interrupts         Yes           Yes           Yes
NOTE:
* In the real-address mode, vector 13 is the segment overrun exception. In protected and
virtual-8086 modes, this exception covers all general-protection error conditions,
including traps to the virtual-8086 monitor from virtual-8086 mode.
The processor enters virtual-8086 mode to run the 8086 program and returns to
protected mode to run the virtual-8086 monitor.
The virtual-8086 monitor is a 32-bit protected-mode code module that runs at a CPL
of 0. The monitor consists of initialization, interrupt- and exception-handling, and I/O
emulation procedures that emulate a personal computer or other 8086-based
platform. Typically, the monitor is either part of or closely associated with the
protected-mode general-protection (#GP) exception handler, which also runs at a CPL
of 0. As with any protected-mode code module, code-segment descriptors for the
virtual-8086 monitor must exist in the GDT or in the task's LDT. The virtual-8086
monitor also may need data-segment descriptors so it can examine the IDT or other
parts of the 8086 program in the first 1 MByte of the address space. The linear
addresses above 10FFEFH are available for the monitor, the operating system, and
other system software.
The 8086 operating-system services consist of a kernel and/or operating-system
procedures that the 8086 program makes calls to. These services can be
implemented in either of the following two ways:
• They can be included in the 8086 program. This approach is desirable for
  either of the following reasons:
  - The 8086 program code modifies the 8086 operating-system services.
  - There is not sufficient development time to merge the 8086 operating-system
    services into the main operating system or executive.
• They can be implemented or emulated in the virtual-8086 monitor. This approach
  is desirable for any of the following reasons:
  - The 8086 operating-system procedures can be more easily coordinated among
    several virtual-8086 tasks.
  - Memory can be saved by not duplicating 8086 operating-system procedure code
    for several virtual-8086 tasks.
  - The 8086 operating-system procedures can be easily emulated by calls to the
    main operating system or executive.
The approach chosen for implementing the 8086 operating-system services may
result in different virtual-8086-mode tasks using different 8086
operating-system services.
17.2.3 Paging of Virtual-8086 Tasks
Even though a program running in virtual-8086 mode can use only 20-bit linear
addresses, the processor converts these addresses into 32-bit linear addresses
before mapping them to the physical address space. If paging is being used, the
8086 address space for a program running in virtual-8086 mode can be paged and
located in a set of pages in physical address space. If paging is used, it is
transparent to the program running in virtual-8086 mode just as it is for any
task running on the processor.
Paging is not necessary for a single virtual-8086-mode task, but paging is
useful or necessary in the following situations:
• When running multiple virtual-8086-mode tasks. Here, paging allows the lower
  1 MByte of the linear address space for each virtual-8086-mode task to be
  mapped to a different physical address location.
• When emulating the 8086 address wraparound that occurs at 1 MByte. When using
  8086-style address translation, it is possible to specify addresses larger
  than 1 MByte. These addresses automatically wrap around in the Intel 8086
  processor (see Section 17.1.1, "Address Translation in Real-Address Mode"). If
  any 8086 programs depend on address wraparound, the same effect can be
  achieved in a virtual-8086-mode task by mapping the linear addresses between
  100000H and 110000H and linear addresses between 0 and 10000H to the same
  physical addresses.
• When sharing the 8086 operating-system services or ROM code that is common to
  several 8086 programs running as different virtual-8086-mode tasks.
• When redirecting or trapping references to memory-mapped I/O devices.
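The address-wraparound mapping described in the list above can be modeled in a
few lines. The following Python sketch is illustrative only: the page-map
dictionary, the function names, and the choice of 4-KByte pages are assumptions
made for the example, not processor structures.

```python
PAGE_SIZE = 0x1000  # assumed 4-KByte pages


def linear_8086(segment, offset):
    """8086-style translation: (segment * 16) + offset; may exceed 1 MByte."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def build_wraparound_map(base_frame):
    """Hypothetical page map: linear page number -> physical frame number.

    The pages covering 100000H..10FFFFH alias the frames that back 0..FFFFH,
    which reproduces the 8086 wraparound at 1 MByte.
    """
    low_pages = 0x100000 // PAGE_SIZE   # 256 pages in the first MByte
    wrap_pages = 0x10000 // PAGE_SIZE   # 16 pages in the 64-KByte region above it
    page_map = {page: base_frame + page for page in range(low_pages)}
    for page in range(wrap_pages):
        page_map[low_pages + page] = page_map[page]
    return page_map


def physical(page_map, linear):
    """Translate a linear address through the hypothetical page map."""
    frame = page_map[linear // PAGE_SIZE]
    return frame * PAGE_SIZE + linear % PAGE_SIZE
```

With such a map, the linear address produced by FFFFH:0010H (100000H) resolves
to the same physical byte as linear address 0.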
17.2.4 Protection within a Virtual-8086 Task
Protection is not enforced between the segments of an 8086 program. Either of
the following techniques can be used to protect the system software running in a
virtual-8086-mode task from the 8086 program:
• Reserve the first 1 MByte plus 64 KBytes of each task's linear address space
  for the 8086 program. An 8086 processor task cannot generate addresses outside
  this range.
• Use the U/S flag of page-table entries to protect the virtual-8086 monitor and
  other system software in the virtual-8086 mode task space. When the processor
  is in virtual-8086 mode, the CPL is 3. Therefore, an 8086 processor program
  has only user privileges. If the pages of the virtual-8086 monitor have
  supervisor privilege, they cannot be accessed by the 8086 program.
17.2.5 Entering Virtual-8086 Mode
Figure 17-3 summarizes the methods of entering and leaving virtual-8086 mode.
The processor switches to virtual-8086 mode in either of the following
situations:
• Task switch when the VM flag is set to 1 in the EFLAGS register image stored
  in the TSS for the task. Here the task switch can be initiated in either of
  two ways:
  - A CALL or JMP instruction.
  - An IRET instruction, where the NT flag in the EFLAGS image is set to 1.
• Return from a protected-mode interrupt or exception handler when the VM flag
  is set to 1 in the EFLAGS register image on the stack.
When a task switch is used to enter virtual-8086 mode, the TSS for the
virtual-8086-mode task must be a 32-bit TSS. (If the new TSS is a 16-bit TSS,
the upper word of the EFLAGS register is not in the TSS, causing the processor
to clear the VM flag when it loads the EFLAGS register.) The processor updates
the VM flag prior to loading the segment registers from their images in the new
TSS. The new setting of the VM flag determines whether the processor interprets
the contents of the segment registers as 8086-style segment selectors or
protected-mode segment selectors. When the VM flag is set, the segment registers
are loaded from the TSS, using 8086-style address translation to form base
addresses.
See Section 17.3, "Interrupt and Exception Handling in Virtual-8086 Mode", for
information on entering virtual-8086 mode on a return from an interrupt or
exception handler.
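The effect of the TSS size on the VM flag can be illustrated with a small model.
This Python sketch is a simplification under stated assumptions: the function
names are hypothetical, and only the EFLAGS load is modeled, not the rest of the
task switch. VM is bit 17 of EFLAGS, so it lies in the upper word that a 16-bit
TSS does not hold.

```python
EFLAGS_NT = 1 << 14   # nested task flag
EFLAGS_VM = 1 << 17   # virtual-8086 mode flag (upper word of EFLAGS)


def eflags_loaded_from_tss(eflags_image, tss_is_32bit):
    """Value loaded into EFLAGS on a task switch (simplified model)."""
    if tss_is_32bit:
        return eflags_image & 0xFFFFFFFF
    # A 16-bit TSS holds only the low word, so the VM flag ends up clear.
    return eflags_image & 0xFFFF


def enters_virtual_8086(eflags_image, tss_is_32bit):
    """True if the new task starts executing in virtual-8086 mode."""
    return bool(eflags_loaded_from_tss(eflags_image, tss_is_32bit) & EFLAGS_VM)
```

In this model, the same EFLAGS image with VM set enters virtual-8086 mode only
when it is loaded from a 32-bit TSS.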
Figure 17-3. Entering and Leaving Virtual-8086 Mode
[Figure: state diagram of the transitions among real-address mode (real mode
code), protected mode (protected-mode tasks, protected-mode interrupt and
exception handlers, and the virtual-8086 monitor), and virtual-8086 mode
(virtual-8086 mode tasks running 8086 programs). Labeled transitions: RESET;
PE=1; PE=0 or RESET; task switch with VM=1 or VM=0; CALL and RET between the
handlers and the monitor; and the numbered transitions described in the notes.]
NOTES:
1. Task switch carried out in either of two ways:
   - CALL or JMP where the VM flag in the EFLAGS image is 1.
   - IRET where VM is 1 and NT is 1.
2. Hardware interrupt or exception; software interrupt (INT n) when IOPL is 3.
3. General-protection exception caused by software interrupt (INT n), IRET,
   POPF, PUSHF, IN, or OUT when IOPL is less than 3.
4. Normal return from protected-mode interrupt or exception handler.
5. A return from the 8086 monitor to redirect an interrupt or exception back
   to an interrupt or exception handler in the 8086 program running in
   virtual-8086 mode.
6. Internal redirection of a software interrupt (INT n) when VME is 1,
   IOPL is <3, and the redirection bit is 1.
17.2.6 Leaving Virtual-8086 Mode
The processor can leave the virtual-8086 mode only through an interrupt or
exception. The following are situations where an interrupt or exception will
lead to the processor leaving virtual-8086 mode (see Figure 17-3):
• The processor services a hardware interrupt generated to signal the suspension
  of execution of the virtual-8086 application. This hardware interrupt may be
  generated by a timer or other external mechanism. Upon receiving the hardware
  interrupt, the processor enters protected mode and switches to a
  protected-mode (or another virtual-8086 mode) task either through a task gate
  in the protected-mode IDT or through a trap or interrupt gate that points to a
  handler that initiates a task switch. A task switch from a virtual-8086 task
  to another task loads the EFLAGS register from the TSS of the new task. The
  value of the VM flag in the new EFLAGS determines if the new task executes in
  virtual-8086 mode or not.
• The processor services an exception caused by code executing in the
  virtual-8086 task or services a hardware interrupt that belongs to the
  virtual-8086 task. Here, the processor enters protected mode and services the
  exception or hardware interrupt through the protected-mode IDT (normally
  through an interrupt or trap gate) and the protected-mode exception and
  interrupt handlers. The processor may handle the exception or interrupt within
  the context of the virtual-8086 task and return to virtual-8086 mode on a
  return from the handler procedure. The processor may also execute a task
  switch and handle the exception or interrupt in the context of another task.
• The processor services a software interrupt generated by code executing in the
  virtual-8086 task (such as a software interrupt to call a MS-DOS* operating
  system routine). The processor provides several methods of handling these
  software interrupts, which are discussed in detail in Section 17.3.3, "Class
  3: Software Interrupt Handling in Virtual-8086 Mode". Most of them involve the
  processor entering protected mode, often by means of a general-protection
  (#GP) exception. In protected mode, the processor can send the interrupt to
  the virtual-8086 monitor for handling and/or redirect the interrupt back to
  the application program running in the virtual-8086-mode task for handling.
  IA-32 processors that incorporate the virtual mode extension (enabled with the
  VME flag in control register CR4) are capable of redirecting
  software-generated interrupts back to the program's interrupt handlers without
  leaving virtual-8086 mode. See Section 17.3.3.4, "Method 5: Software Interrupt
  Handling", for more information on this mechanism.
• A hardware reset initiated by asserting the RESET or INIT pin is a special
  kind of interrupt. When a RESET or INIT is signaled while the processor is in
  virtual-8086 mode, the processor leaves virtual-8086 mode and enters
  real-address mode.
• Execution of the HLT instruction in virtual-8086 mode will cause a
  general-protection (#GP) fault, which the protected-mode handler generally
  sends to the virtual-8086 monitor. The virtual-8086 monitor then determines
  the correct execution sequence after verifying that it was entered as a result
  of a HLT execution.
See Section 17.3, "Interrupt and Exception Handling in Virtual-8086 Mode", for
information on leaving virtual-8086 mode to handle an interrupt or exception
generated in virtual-8086 mode.
17.2.7 Sensitive Instructions
When an IA-32 processor is running in virtual-8086 mode, the CLI, STI, PUSHF,
POPF, INT n, and IRET instructions are sensitive to IOPL. The IN, INS, OUT, and
OUTS instructions, which are sensitive to IOPL in protected mode, are not
sensitive in virtual-8086 mode.
The CPL is always 3 while running in virtual-8086 mode; if the IOPL is less than
3, an attempt to use the IOPL-sensitive instructions listed above triggers a
general-protection exception (#GP). These instructions are sensitive to IOPL to
give the virtual-8086 monitor a chance to emulate the facilities they affect.
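The IOPL rule above reduces to a simple predicate. The following Python sketch
is a simplified model, not a description of the processor's decode logic: the
function name and the use of mnemonic strings are assumptions, and the virtual
mode extensions (VME) and the I/O permission bit map, which governs IN, INS,
OUT, and OUTS in virtual-8086 mode, are deliberately left out.

```python
# Instructions that the text lists as IOPL-sensitive in virtual-8086 mode.
IOPL_SENSITIVE_V86 = frozenset({"CLI", "STI", "PUSHF", "POPF", "INT n", "IRET"})


def raises_gp_in_v86(mnemonic, iopl):
    """True if the instruction triggers #GP in virtual-8086 mode.

    Simplified model: VME is assumed to be disabled, and I/O instructions
    are not covered because they are not IOPL-sensitive in this mode.
    """
    if not 0 <= iopl <= 3:
        raise ValueError("IOPL is a 2-bit field (0 through 3)")
    return mnemonic in IOPL_SENSITIVE_V86 and iopl < 3
```

A monitor that wants to emulate interrupt-flag handling would therefore run the
task with an IOPL below 3, so that each of these instructions traps to it.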
17.2.8 Virtual-8086 Mode I/O
Many 8086 programs written for non-multitasking systems directly access I/O
ports. This practice may cause problems in a multitasking environment. If more
than one program accesses the same port, they may interfere with each other.
Most multitasking systems require application programs to access I/O ports
through the operating system. This results in simplified, centralized control.
The processor provides I/O protection for creating I/O that is compatible with
the environment and transparent to 8086 programs. Designers may take any of
several possible approaches to protecting I/O ports:
• Protect the I/O address space and generate exceptions for all attempts to
  perform I/O directly.
• Let the 8086 program perform I/O directly.
• Generate exceptions on attempts to access specific I/O ports.
• Generate exceptions on attempts to access specific memory-mapped I/O ports.
The method of controlling access to I/O ports depends upon whether they are
I/O-port mapped or memory mapped.
17.2.8.1 I/O-Port-Mapped I/O
The I/O permission bit map in the TSS can be used to generate exceptions on
attempts to access specific I/O port addresses. The I/O permission bit map of
each virtual-8086-mode task determines which I/O addresses generate exceptions
for that task. Because each task may have a different I/O permission bit map,
the addresses that generate exceptions for one task may be different from the
addresses for another task. This differs from protected mode in which, if the
CPL is less than or equal to the IOPL, I/O access is allowed without checking
the I/O permission bit map.
See Chapter 13, "Input/Output", in the Intel 64 and IA-32 Architectures Software
Developer's Manual, Volume 1, for more information about the I/O permission bit
map.
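A per-task bit map lookup of this kind can be sketched as follows. This Python
model is illustrative: the function name is hypothetical, the bit map is passed
as a plain byte sequence rather than read from a TSS, and the sketch assumes the
usual convention that bit n corresponds to I/O port n, that a set bit denies
access, and that ports beyond the end of the map are treated as denied.

```python
def io_access_faults(bitmap, port, width=1):
    """True if an access of `width` bytes at `port` would raise an exception.

    `bitmap` is a bytes-like I/O permission bit map; every port touched by
    the access must have a clear bit for the access to be allowed.
    """
    for p in range(port, port + width):
        byte_index, bit = divmod(p, 8)
        if byte_index >= len(bitmap):
            return True   # assumed: ports not covered by the map are denied
        if (bitmap[byte_index] >> bit) & 1:
            return True
    return False
```

Because each virtual-8086-mode task has its own TSS, a monitor could give two
tasks different maps and trap the same port in one task but not in the other.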
17.2.8.2 Memory-Mapped I/O
In systems which use memory-mapped I/O, the paging facilities of the processor
can be used to generate exceptions for attempts to access I/O ports. The
virtual-8086 monitor may use paging to control memory-mapped I/O in these ways:
• Map part of the linear address space of each task that needs to perform I/O to
  the physical address space where I/O ports are placed. By putting the I/O
  ports at different addresses (in different pages), the paging mechanism can
  enforce isolation between tasks.
• Map part of the linear address space to pages that are not-present. This
  generates an exception whenever a task attempts to perform I/O to those pages.
  System software then can interpret the I/O operation being attempted.
Software emulation of the I/O space may require too much operating system
intervention under some conditions. In these cases, it may be possible to
generate an exception for only the first attempt to access I/O. The system
software then may determine whether a program can be given exclusive control of
I/O temporarily; if so, the protection of the I/O space may be lifted and the
program allowed to run at full speed.
17.2.8.3 Special I/O Buffers
Buffers of intelligent controllers (for example, a bit-mapped frame buffer) also
can be emulated using page mapping. The linear space for the buffer can be
mapped to a different physical space for each virtual-8086-mode task. The
virtual-8086 monitor then can control which virtual buffer to copy onto the real
buffer in the physical address space.
17.3 INTERRUPT AND EXCEPTION HANDLING
IN VIRTUAL-8086 MODE
When the processor receives an interrupt or detects an exception condition while
in virtual-8086 mode, it invokes an interrupt or exception handler, just as it
does in protected or real-address mode. The interrupt or exception handler that
is invoked and the mechanism used to invoke it depend on the class of interrupt
or exception that has been detected or generated and the state of various system
flags and fields.
In virtual-8086 mode, the interrupts and exceptions are divided into three
classes for the purposes of handling:
• Class 1: All processor-generated exceptions and all hardware interrupts,
  including the NMI interrupt and the hardware interrupts sent to the
  processor's external interrupt delivery pins. All class 1 exceptions and
  interrupts are handled by the protected-mode exception and interrupt handlers.
• Class 2: Special case for maskable hardware interrupts (see Section 6.3.2,
  "Maskable Hardware Interrupts") when the virtual mode extensions are enabled.
• Class 3: All software-generated interrupts, that is, interrupts generated with
  the INT n instruction (see footnote 1).
The method the processor uses to handle class 2 and 3 interrupts depends on the
setting of the following flags and fields:
• IOPL field (bits 12 and 13 in the EFLAGS register): Controls how class 3
  software interrupts are handled when the processor is in virtual-8086 mode
  (see Section 2.3, "System Flags and Fields in the EFLAGS Register"). This
  field also controls the enabling of the VIF and VIP flags in the EFLAGS
  register when the VME flag is set. The VIF and VIP flags are provided to
  assist in the handling of class 2 maskable hardware interrupts.
• VME flag (bit 0 in control register CR4): Enables the virtual mode extension
  for the processor when set (see Section 2.5, "Control Registers").
• Software interrupt redirection bit map (32 bytes in the TSS, see Figure 17-5):
  Contains 256 flags that indicate how class 3 software interrupts should be
  handled when they occur in virtual-8086 mode. A software interrupt can be
  directed either to the interrupt and exception handlers in the currently
  running 8086 program or to the protected-mode interrupt and exception
  handlers.
• The virtual interrupt flag (VIF) and virtual interrupt pending flag (VIP) in
  the EFLAGS register: Provide virtual interrupt support for the handling of
  class 2 maskable hardware interrupts (see Section 17.3.2, "Class 2: Maskable
  Hardware Interrupt Handling in Virtual-8086 Mode Using the Virtual Interrupt
  Mechanism").
NOTE
The VME flag, software interrupt redirection bit map, and VIF and VIP flags are
only available in IA-32 processors that support the virtual mode extensions.
These extensions were introduced in the IA-32 architecture with the Pentium
processor.
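The flags and fields listed above can be read with ordinary bit operations. The
following Python sketch is illustrative only. The helper names are hypothetical,
and the mapping of vector n to bit (n mod 8) of byte (n div 8) in the
redirection bit map is an assumption made for the example. The sketch returns
the raw redirection bit and does not interpret what a set or clear bit selects.

```python
CR4_VME = 1 << 0       # virtual mode extensions enable, bit 0 of CR4
EFLAGS_VIF = 1 << 19   # virtual interrupt flag
EFLAGS_VIP = 1 << 20   # virtual interrupt pending flag


def iopl(eflags):
    """IOPL field, bits 12 and 13 of EFLAGS."""
    return (eflags >> 12) & 0x3


def redirection_bit(bitmap, vector):
    """Raw bit for `vector` in the 32-byte software interrupt redirection map."""
    if len(bitmap) != 32 or not 0 <= vector <= 255:
        raise ValueError("expected a 32-byte map and a vector in 0..255")
    return (bitmap[vector // 8] >> (vector % 8)) & 1


def vif_vip_active(cr4, eflags):
    """VIF and VIP are active when CR4.VME is set and IOPL is less than 3."""
    return bool(cr4 & CR4_VME) and iopl(eflags) < 3
```

The 32-byte size follows from the 256 possible vectors, one flag per vector.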
The following sections describe the actions that the processor takes and the
possible actions of interrupt and exception handlers for the three classes of
interrupts described in the previous paragraphs. These sections describe three
possible types of interrupt and exception handlers:
1. The INT 3 instruction is a special case (see the description of the INT n
   instruction in Chapter 3, "Instruction Set Reference, A-M", of the Intel 64
   and IA-32 Architectures Software Developer's Manual, Volume 2A).
• Protected-mode interrupt and exception handlers: These are the standard
  handlers that the processor calls through the protected-mode IDT.
• Virtual-8086 monitor interrupt and exception handlers: These handlers are
  resident in the virtual-8086 monitor, and they are commonly accessed through a
  general-protection exception (#GP, interrupt 13) that is directed to the
  protected-mode general-protection exception handler.
• 8086 program interrupt and exception handlers: These handlers are part of the
  8086 program that is running in virtual-8086 mode.
The following sections describe how these handlers are used, depending on the
selected class and method of interrupt and exception handling.
17.3.1 Class 1: Hardware Interrupt and Exception Handling in
Virtual-8086 Mode
In virtual-8086 mode, the Pentium, P6 family, Pentium 4, and Intel Xeon
processors handle hardware interrupts and exceptions in the same manner as they
are handled by the Intel486 and Intel386 processors. They invoke the
protected-mode interrupt or exception handler that the interrupt or exception
vector points to in the IDT. Here, the IDT entry must contain either a 32-bit
trap or interrupt gate or a task gate. The following sections describe various
ways that a virtual-8086 mode interrupt or exception can be handled after the
protected-mode handler has been invoked.
See Section 17.3.2, "Class 2: Maskable Hardware Interrupt Handling in
Virtual-8086 Mode Using the Virtual Interrupt Mechanism", for a description of
the virtual interrupt mechanism that is available for handling maskable hardware
interrupts while in virtual-8086 mode. When this mechanism is either not
available or not enabled, maskable hardware interrupts are handled in the same
manner as exceptions, as described in the following sections.
17.3.1.1 Handling an Interrupt or Exception Through a Protected-Mode
Trap or Interrupt Gate
When an interrupt or exception vector points to a 32-bit trap or interrupt gate
in the IDT, the gate must in turn point to a nonconforming, privilege-level 0,
code segment. When accessing this code segment, the processor performs the
following steps.
1. Switches to 32-bit protected mode and privilege level 0.
2. Saves the state of the processor on the privilege-level 0 stack. The states
   of the EIP, CS, EFLAGS, ESP, SS, ES, DS, FS, and GS registers are saved (see
   Figure 17-4).
3. Clears the segment registers. Saving the DS, ES, FS, and GS registers on the
   stack and then clearing the registers lets the interrupt or exception handler
   safely save and restore these registers regardless of the type of segment
   selectors they contain (protected-mode or 8086-style). The interrupt and
   exception handlers, which may be called in the context of either a
   protected-mode task or a virtual-8086-mode task, can use the same code
   sequences for saving and restoring the registers for any task. Clearing these
   registers before execution of the IRET instruction does not cause a trap in
   the interrupt handler. Interrupt procedures that expect values in the segment
   registers or that return values in the segment registers must use the
   register images saved on the stack for privilege level 0.
4. Clears VM, NT, RF and TF flags (in the EFLAGS register). If the gate is an
   interrupt gate, clears the IF flag.
5. Begins executing the selected interrupt or exception handler.
If the trap or interrupt gate references a procedure in a conforming segment or
in a segment at a privilege level other than 0, the processor generates a
general-protection exception (#GP). Here, the error code is the segment selector
of the code segment to which a call was attempted.
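Steps 2 through 4 above can be modeled on a dictionary of register values. This
Python sketch is a simplified illustration: the function name and the register
dictionary are assumptions, and the new CS, EIP, SS, and ESP values, which come
from the gate and the TSS, are not modeled. The push order follows the stack
layout of Figure 17-4.

```python
EFLAGS_TF = 1 << 8
EFLAGS_IF = 1 << 9
EFLAGS_NT = 1 << 14
EFLAGS_RF = 1 << 16
EFLAGS_VM = 1 << 17


def enter_handler_from_v86(regs, is_interrupt_gate, error_code=None):
    """Return (frame, new_regs) for entry through a 32-bit trap/interrupt gate.

    `frame` lists the values in the order they are pushed on the
    privilege-level 0 stack; the error code, if any, is pushed last.
    """
    frame = [regs[name] for name in
             ("GS", "FS", "DS", "ES", "SS", "ESP", "EFLAGS", "CS", "EIP")]
    if error_code is not None:
        frame.append(error_code)

    new_regs = dict(regs)
    for seg in ("DS", "ES", "FS", "GS"):     # step 3: clear segment registers
        new_regs[seg] = 0

    cleared = EFLAGS_VM | EFLAGS_NT | EFLAGS_RF | EFLAGS_TF   # step 4
    if is_interrupt_gate:
        cleared |= EFLAGS_IF
    new_regs["EFLAGS"] = regs["EFLAGS"] & ~cleared
    return frame, new_regs
```

The saved EFLAGS image in the frame keeps VM set, which is what lets the handler
tell that the interrupted code was running in virtual-8086 mode.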
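Figure 17-4. Privilege Level 0 Stack After Interrupt or
Exception in Virtual-8086 Mode
[Figure: two stack diagrams, "With Error Code" and "Without Error Code". In
both, the entries run from the ESP value taken from the TSS down to the new ESP.
The upper 16 bits of each entry that holds a segment register are unused.]
  Unused | Old GS      <- first entry below the ESP from the TSS
  Unused | Old FS
  Unused | Old DS
  Unused | Old ES
  Unused | Old SS
  Old ESP
  Old EFLAGS
  Unused | Old CS
  Old EIP              <- New ESP (without error code)
  Error Code           <- New ESP (with error code only)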
Interrupt and exception handlers can examine the VM flag on the stack to
determine if the interrupted procedure was running in virtual-8086 mode. If so,
the interrupt or exception can be handled in one of three ways:
• The protected-mode interrupt or exception handler that was called can handle
  the interrupt or exception.
• The protected-mode interrupt or exception handler can call the virtual-8086
  monitor to handle the interrupt or exception.
• The virtual-8086 monitor (if called) can in turn pass control back to the 8086
  program's interrupt and exception handler.
If the interrupt or exception is handled with a protected-mode handler, the
handler can return to the interrupted program in virtual-8086 mode by executing
an IRET instruction. This instruction loads the EFLAGS and segment registers
from the images saved in the privilege level 0 stack (see Figure 17-4). A set VM
flag in the EFLAGS image causes the processor to switch back to virtual-8086
mode. The CPL at the time the IRET instruction is executed must be 0; otherwise,
the processor does not change the state of the VM flag.
The virtual-8086 monitor runs at privilege level 0, like the protected-mode
interrupt and exception handlers. It is commonly closely tied to the
protected-mode general-protection exception (#GP, vector 13) handler. If the
protected-mode interrupt or exception handler calls the virtual-8086 monitor to
handle the interrupt or exception, the return from the virtual-8086 monitor to
the interrupted virtual-8086 mode program requires two return instructions: a
RET instruction to return to the protected-mode handler and an IRET instruction
to return to the interrupted program.
The virtual-8086 monitor has the option of directing the interrupt or exception
back to an interrupt or exception handler that is part of the interrupted 8086
program, as described in Section 17.3.1.2, "Handling an Interrupt or Exception
With an 8086 Program Interrupt or Exception Handler".
17.3.1.2 Handling an Interrupt or Exception With an 8086 Program
Interrupt or Exception Handler
Because it was designed to run on an 8086 processor, an 8086 program running in
a virtual-8086-mode task contains an 8086-style interrupt vector table, which
starts at linear address 0. If the virtual-8086 monitor correctly directs an
interrupt or exception vector back to the virtual-8086-mode task it came from,
the handlers in the 8086 program can handle the interrupt or exception. The
virtual-8086 monitor must carry out the following steps to send an interrupt or
exception back to the 8086 program:
1. Use the 8086 interrupt vector to locate the appropriate handler procedure in
   the 8086 program interrupt table.
2. Store the EFLAGS (low-order 16 bits only), CS and EIP values of the 8086
   program on the privilege-level 3 stack. This is the stack that the
   virtual-8086-mode task is using. (The 8086 handler may use or modify this
   information.)
3. Change the return link on the privilege-level 0 stack to point to the
   privilege-level 3 handler procedure.
4. Execute an IRET instruction to pass control to the 8086 program handler.
5. When the IRET instruction from the privilege-level 3 handler triggers a
   general-protection exception (#GP) and thus effectively again calls the
   virtual-8086 monitor, restore the return link on the privilege-level 0 stack
   to point to the original, interrupted, privilege-level 3 procedure.
6. Copy the low order 16 bits of the EFLAGS image from the privilege-level 3
   stack to the privilege-level 0 stack (because some 8086 handlers modify these
   flags to return information to the code that caused the interrupt).
7. Execute an IRET instruction to pass control back to the interrupted 8086
   program.
Note that if an operating system intends to support all 8086 MS-DOS-based
programs, it is necessary to use the actual 8086 interrupt and exception
handlers supplied with the program. The reason for this is that some programs
modify their own interrupt vector table to substitute (or hook in series) their
own specialized interrupt and exception handlers.
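Steps 1 through 3 of the procedure above can be sketched as a monitor-side
helper. This Python model is a sketch under stated assumptions: the function
name, the bytearray standing in for the task's first MByte plus 64 KBytes, and
the dictionary standing in for the privilege-level 0 stack images are all
hypothetical. It relies on the standard 8086 vector table layout (a 4-byte entry
per vector, offset word followed by segment word) and omits steps 4 through 7
and any adjustment of flags for the handler.

```python
def reflect_to_8086_handler(mem, frame, vector):
    """Redirect an interrupt to the 8086 program's own handler.

    `mem` is a bytearray of the task's 8086 address space; `frame` holds the
    EIP, CS, EFLAGS, ESP, and SS images from the privilege-level 0 stack.
    Both are modified in place, and `frame` is also returned.
    """
    # Step 1: read the handler's offset and segment from the vector table.
    entry = vector * 4
    handler_ip = int.from_bytes(mem[entry:entry + 2], "little")
    handler_cs = int.from_bytes(mem[entry + 2:entry + 4], "little")

    # Step 2: push the low 16 bits of EFLAGS, then CS, then IP on the
    # privilege-level 3 stack used by the 8086 program.
    sp = frame["ESP"] & 0xFFFF
    for value in (frame["EFLAGS"], frame["CS"], frame["EIP"]):
        sp = (sp - 2) & 0xFFFF
        addr = ((frame["SS"] & 0xFFFF) << 4) + sp
        mem[addr:addr + 2] = (value & 0xFFFF).to_bytes(2, "little")
    frame["ESP"] = sp

    # Step 3: retarget the return link so that IRET enters the 8086 handler.
    frame["CS"], frame["EIP"] = handler_cs, handler_ip
    return frame
```

Reading the vector table at the time of the interrupt, rather than caching it,
matters because some programs replace or chain their own vectors at run time.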
17.3.1.3 Handling an Interrupt or Exception Through a Task Gate
When an interrupt or exception vector points to a task gate in the IDT, the
processor performs a task switch to the selected interrupt- or
exception-handling task. The following actions are carried out as part of this
task switch:
1. The EFLAGS register with the VM flag set is saved in the current TSS.
2. The link field in the TSS of the called task is loaded with the segment
   selector of the TSS for the interrupted virtual-8086-mode task.
3. The EFLAGS register is loaded from the image in the new TSS, which clears the
   VM flag and causes the processor to switch to protected mode.
4. The NT flag in the EFLAGS register is set.
5. The processor begins executing the selected interrupt- or exception-handler
   task.
When an IRET instruction is executed in the handler task and the NT flag in the
EFLAGS register is set, the processor switches from a protected-mode interrupt-
or exception-handler task back to a virtual-8086-mode task. Here, the EFLAGS and
segment registers are loaded from images saved in the TSS for the
virtual-8086-mode task. If the VM flag is set in the EFLAGS image, the processor
switches back to virtual-8086 mode on the task switch. The CPL at the time the
IRET instruction is executed must be 0; otherwise, the processor does not change
the state of the VM flag.
17.3.2 Class 2: Maskable Hardware Interrupt Handling in
Virtual-8086 Mode Using the Virtual Interrupt Mechanism
Maskable hardware interrupts are those interrupts that are delivered through the
INTR# pin or through an interrupt request to the local APIC (see Section 6.3.2,
"Maskable Hardware Interrupts"). These interrupts can be inhibited (masked) from
interrupting an executing program or task by clearing the IF flag in the EFLAGS
register.
When the VME flag in control register CR4 is set and the IOPL field in the
EFLAGS register is less than 3, two additional flags are activated in the EFLAGS
register:
• VIF (virtual interrupt) flag, bit 19 of the EFLAGS register.
• VIP (virtual interrupt pending) flag, bit 20 of the EFLAGS register.
These flags provide the virtual-8086 monitor with more efficient control over
handling maskable hardware interrupts that occur during virtual-8086 mode tasks.
They also reduce interrupt-handling overhead, by eliminating the need for all
IF-related operations (such as PUSHF, POPF, CLI, and STI instructions) to trap
to the virtual-8086 monitor. The purpose and use of these flags are as follows.
NOTE
The VIF and VIP flags are only available in IA-32 processors that support the
virtual mode extensions. These extensions were introduced in the IA-32
architecture with the Pentium processor. When this mechanism is either not
available or not enabled, maskable hardware interrupts are handled as class 1
interrupts. Here, if VIF and VIP flags are needed, the virtual-8086 monitor can
implement them in software.
Exist ing 8086 programs commonly set and clear t he I F flag in t he EFLAGS regist er t o
enable and disable maskable hardware int errupt s, respect ively; for example, t o
disable int errupt s while handling anot her int errupt or an except ion. This pract ice
works well in single t ask environment s, but can cause problems in mult it asking and
mult iple- processor environment s, where it is oft en desirable t o prevent an applica-
t ion program from having direct cont rol over t he handling of hardware int errupt s.
When using earlier I A- 32 processors, t his problem was oft en solved by creat ing a
virt ual I F flag in soft ware. The I A- 32 processors ( beginning wit h t he Pent ium
processor) provide hardware support for t his virt ual I F flag t hrough t he VI F and VI P
flags.
The VIF flag is a virtualized version of the IF flag, which an application program
running from within a virtual-8086 task can use to control the handling of maskable
hardware interrupts. When the VIF flag is enabled, the CLI and STI instructions
operate on the VIF flag instead of the IF flag. When an 8086 program executes the
CLI instruction, the processor clears the VIF flag to request that the virtual-8086
monitor inhibit maskable hardware interrupts from interrupting program execution;
when it executes the STI instruction, the processor sets the VIF flag, requesting that
the virtual-8086 monitor enable maskable hardware interrupts for the 8086
program. The actual IF flag, which is managed by the operating system, always controls
whether maskable hardware interrupts are enabled. Also, if under these
circumstances an 8086 program tries to read or change the IF flag using the PUSHF or POPF
instructions, the processor will change the VIF flag instead, leaving IF unchanged.
The VIP flag provides software a means of recording the existence of a deferred (or
pending) maskable hardware interrupt. This flag is read by the processor but never
explicitly written by the processor; it can only be written by software.
If the IF flag is set and the VIF and VIP flags are enabled, and the processor receives
a maskable hardware interrupt (interrupt vector 0 through 255), the processor
performs, and the interrupt handler software should perform, the following
operations:
1. The processor invokes the protected-mode interrupt handler for the interrupt
   received, as described in the following steps. These steps are almost identical to
   those described for method 1 interrupt and exception handling in Section
   17.3.1.1, "Handling an Interrupt or Exception Through a Protected-Mode Trap or
   Interrupt Gate":
   a. Switches to 32-bit protected mode and privilege level 0.
   b. Saves the state of the processor on the privilege-level 0 stack. The states of
      the EIP, CS, EFLAGS, ESP, SS, ES, DS, FS, and GS registers are saved (see
      Figure 17-4).
   c. Clears the segment registers.
   d. Clears the VM flag in the EFLAGS register.
   e. Begins executing the selected protected-mode interrupt handler.
2. The recommended action of the protected-mode interrupt handler is to read the
   VM flag from the EFLAGS image on the stack. If this flag is set, the handler makes
   a call to the virtual-8086 monitor.
3. The virtual-8086 monitor should read the VIF flag in the EFLAGS register.
   • If the VIF flag is clear, the virtual-8086 monitor sets the VIP flag in the
     EFLAGS image on the stack to indicate that there is a deferred interrupt
     pending and returns to the protected-mode handler.
   • If the VIF flag is set, the virtual-8086 monitor can handle the interrupt if it
     belongs to the 8086 program running in the interrupted virtual-8086 task;
     otherwise, it can call the protected-mode interrupt handler to handle the
     interrupt.
4. The protected-mode handler executes a return to the program executing in
   virtual-8086 mode.
5. Upon returning to virtual-8086 mode, the processor continues execution of the
   8086 program.
When the 8086 program is ready to receive maskable hardware interrupts, it
executes the STI instruction to set the VIF flag (enabling maskable hardware
interrupts). Prior to setting the VIF flag, the processor automatically checks the VIP
flag and does one of the following, depending on the state of the flag:
• If the VIP flag is clear (indicating no pending interrupts), the processor sets the
  VIF flag.
• If the VIP flag is set (indicating a pending interrupt), the processor generates a
  general-protection exception (#GP).
The recommended action of the protected-mode general-protection exception
handler is to then call the virtual-8086 monitor and let it handle the pending
interrupt. After handling the pending interrupt, the typical action of the virtual-8086
monitor is to clear the VIP flag and set the VIF flag in the EFLAGS image on the stack,
and then execute a return to virtual-8086 mode. The next time the processor
receives a maskable hardware interrupt, it will then handle it as described in steps 1
through 5 earlier in this section.
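The interplay between the VIF and VIP flags described above can be sketched in C. This is a minimal model, not monitor code: the function names are hypothetical, the monitor is assumed to read VIF from the saved EFLAGS image, and only the EFLAGS bit positions (VM = bit 17, VIF = bit 19, VIP = bit 20) are architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural EFLAGS bit positions. */
#define EFLAGS_VM  (1u << 17)
#define EFLAGS_VIF (1u << 19)
#define EFLAGS_VIP (1u << 20)

/* Hypothetical monitor policy for step 3. Returns true if the interrupt
 * may be delivered to the 8086 program now; otherwise records it as
 * pending by setting VIP in the saved EFLAGS image. Reading VIF from
 * the saved image is an assumption of this sketch. */
static bool monitor_on_hw_interrupt(uint32_t *eflags_image)
{
    assert(*eflags_image & EFLAGS_VM);   /* virtual-8086 tasks only */
    if (*eflags_image & EFLAGS_VIF)
        return true;
    *eflags_image |= EFLAGS_VIP;
    return false;
}

/* Models the processor's STI check: returns true if #GP is generated
 * (VIP set), otherwise sets VIF and returns false. */
static bool sti_raises_gp(uint32_t *eflags)
{
    if (*eflags & EFLAGS_VIP)
        return true;
    *eflags |= EFLAGS_VIF;
    return false;
}

/* Typical monitor action after servicing the pending interrupt:
 * clear VIP and set VIF in the EFLAGS image. */
static void monitor_after_pending(uint32_t *eflags_image)
{
    *eflags_image &= ~EFLAGS_VIP;
    *eflags_image |= EFLAGS_VIF;
}
```

In this model, an interrupt that arrives while VIF is clear leaves VIP set, so the next STI reports a #GP and gives the monitor a chance to deliver the deferred interrupt.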
If the processor finds that both the VIF and VIP flags are set at the beginning of an
instruction, it generates a general-protection exception. This action allows the
virtual-8086 monitor to handle the pending interrupt for the virtual-8086 mode task
for which the VIF flag is enabled. Note that this situation can only occur immediately
following execution of a POPF or IRET instruction or upon entering a virtual-8086
mode task through a task switch.
Note that the states of the VIF and VIP flags are not modified in real-address mode or
during transitions between real-address and protected modes.
NOTE
The virtual interrupt mechanism described in this section is also
available for use in protected mode; see Section 17.4, "Protected-
Mode Virtual Interrupts".
17.3.3 Class 3: Software Interrupt Handling in Virtual-8086 Mode
When the processor receives a software interrupt (an interrupt generated with the
INT n instruction) while in virtual-8086 mode, it can use any of six different methods
to handle the interrupt. The method selected depends on the settings of the VME flag
in control register CR4, the IOPL field in the EFLAGS register, and the software
interrupt redirection bit map in the TSS. Table 17-2 lists the six methods of handling
software interrupts in virtual-8086 mode and the respective settings of the VME flag,
IOPL field, and the bits in the interrupt redirection bit map for each method. The table
also summarizes the various actions the processor takes for each method.
The VME flag enables the virtual mode extensions for the Pentium and later IA-32
processors. When this flag is clear, the processor responds to interrupts and
exceptions in virtual-8086 mode in the same manner as an Intel386 or Intel486 processor
does. When this flag is set, the virtual mode extension provides the following
enhancements to virtual-8086 mode:
• Speeds up the handling of software-generated interrupts in virtual-8086 mode by
  allowing the processor to bypass the virtual-8086 monitor and redirect software
  interrupts back to the interrupt handlers that are part of the currently running
  8086 program.
• Supports virtual interrupts for software written to run on the 8086 processor.
The IOPL value interacts with the VME flag and the bits in the interrupt redirection bit
map to determine how specific software interrupts should be handled.
The software interrupt redirection bit map (see Figure 17-5) is a 32-byte field in the
TSS. This map is located directly below the I/O permission bit map in the TSS. Each
bit in the interrupt redirection bit map is mapped to an interrupt vector. Bit 0 in the
interrupt redirection bit map (which maps to vector zero in the interrupt table) is
located at the I/O base map address in the TSS minus 32 bytes. When a bit in this bit
map is set, it indicates that the associated software interrupt (interrupt generated
with an INT n instruction) should be handled through the protected-mode IDT and
interrupt and exception handlers. When a bit in this bit map is clear, the processor
redirects the associated software interrupt back to the interrupt table in the 8086
program (located at linear address 0 in the program's address space).
NOTE
The software interrupt redirection bit map does not affect
hardware-generated interrupts and exceptions. Hardware-generated interrupts
and exceptions are always handled by the protected-mode interrupt
and exception handlers.
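A lookup in the software interrupt redirection bit map might be sketched as follows. The function name is hypothetical, and the sketch assumes that the 16-bit I/O map base field sits at byte offset 66H of the 32-bit TSS and that vector n occupies bit (n mod 8) of byte (n / 8) of the map, mirroring the layout of the I/O permission bit map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte offset of the 16-bit I/O map base field in a 32-bit TSS. */
#define TSS_IO_MAP_BASE_OFFSET 0x66u

/* Returns true if the INT n vector is redirected back to the 8086
 * program (bit clear), false if it goes to the protected-mode
 * handlers (bit set). tss points to the start of the TSS. */
static bool int_is_redirected_to_8086(const uint8_t *tss, unsigned vector)
{
    assert(vector <= 255);

    /* Read the little-endian I/O map base field. */
    uint16_t io_map_base = (uint16_t)(tss[TSS_IO_MAP_BASE_OFFSET] |
                                      (tss[TSS_IO_MAP_BASE_OFFSET + 1] << 8));
    assert(io_map_base >= 32);

    /* Bit 0 of the map is located at the I/O map base minus 32 bytes. */
    const uint8_t *redir = tss + io_map_base - 32;

    /* Assumed bit order: vector n is bit (n % 8) of byte (n / 8). */
    bool bit_set = ((redir[vector / 8] >> (vector % 8)) & 1u) != 0;
    return !bit_set;
}
```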
Table 17-2. Software Interrupt Handling Methods While in Virtual-8086 Mode

Method  VME  IOPL  Bit in Redir.  Processor Action
                   Bitmap*
1       0    3     X              Interrupt directed to a protected-mode interrupt handler:
                                  • Switches to privilege-level 0 stack
                                  • Pushes GS, FS, DS and ES onto privilege-level 0 stack
                                  • Pushes SS, ESP, EFLAGS, CS and EIP of interrupted task onto
                                    privilege-level 0 stack
                                  • Clears VM, RF, NT, and TF flags
                                  • If serviced through interrupt gate, clears IF flag
                                  • Clears GS, FS, DS and ES to 0
                                  • Sets CS and EIP from interrupt gate
2       0    < 3   X              Interrupt directed to protected-mode general-protection
                                  exception (#GP) handler.
3       1    < 3   1              Interrupt directed to a protected-mode general-protection
                                  exception (#GP) handler; VIF and VIP flag support for handling
                                  class 2 maskable hardware interrupts.
4       1    3     1              Interrupt directed to protected-mode interrupt handler (see
                                  method 1 processor action).
5       1    3     0              Interrupt redirected to 8086 program interrupt handler:
                                  • Pushes EFLAGS
                                  • Pushes CS and EIP (lower 16 bits only)
                                  • Clears IF flag
                                  • Clears TF flag
                                  • Loads CS and EIP (lower 16 bits only) from selected entry in
                                    the interrupt vector table of the current virtual-8086 task
6       1    < 3   0              Interrupt redirected to 8086 program interrupt handler; VIF and
                                  VIP flag support for handling class 2 maskable hardware
                                  interrupts:
                                  • Pushes EFLAGS with IOPL set to 3 and VIF copied to IF
                                  • Pushes CS and EIP (lower 16 bits only)
                                  • Clears the VIF flag
                                  • Clears TF flag
                                  • Loads CS and EIP (lower 16 bits only) from selected entry in
                                    the interrupt vector table of the current virtual-8086 task

NOTE:
* When set to 0, software interrupt is redirected back to the 8086 program interrupt handler;
  when set to 1, interrupt is directed to protected-mode handler.
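The selection logic summarized in Table 17-2 can be expressed as a small decision function. This is a sketch only; the function name and parameter types are hypothetical, and the return value is the method number from the table.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns the Table 17-2 method number (1 through 6) for a software
 * interrupt in virtual-8086 mode. redir_bit is the vector's bit in the
 * software interrupt redirection bit map; it is ignored when VME = 0. */
static int v86_soft_int_method(bool vme, unsigned iopl, bool redir_bit)
{
    assert(iopl <= 3);

    if (!vme)
        return (iopl == 3) ? 1 : 2;   /* Intel386/Intel486-style handling */
    if (redir_bit)
        return (iopl == 3) ? 4 : 3;   /* protected-mode handler or #GP */
    return (iopl == 3) ? 5 : 6;       /* redirected to the 8086 program */
}
```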
Redirecting software interrupts back to the 8086 program potentially speeds up
interrupt handling because a switch back and forth between virtual-8086 mode and
protected mode is not required. This latter interrupt-handling technique is
particularly useful for 8086 operating systems (such as MS-DOS) that use the INT n
instruction to call operating system procedures.
The CPUID instruction can be used to verify that the virtual mode extension is
implemented on the processor. Bit 1 of the feature flags register (EDX) indicates the
availability of the virtual mode extension (see "CPUID: CPU Identification" in Chapter 3,
"Instruction Set Reference, A-M", of the Intel 64 and IA-32 Architectures Software
Developer's Manual, Volume 2A).
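As an illustration of this check, the following sketch tests bit 1 of the EDX feature flags. The function name and the four-element register array are hypothetical conventions of this example; the array is assumed to hold the EAX, EBX, ECX, and EDX values returned by CPUID with EAX = 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 1 of the EDX feature flags: virtual mode extension (VME). */
#define CPUID_EDX_VME (1u << 1)

/* regs holds EAX, EBX, ECX, EDX (in that order) as returned by
 * CPUID with EAX = 1. */
static bool cpu_has_vme(const uint32_t regs[4])
{
    assert(regs != NULL);
    return (regs[3] & CPUID_EDX_VME) != 0;
}
```

With GCC or Clang on an x86 target, the register values could be obtained through the `__get_cpuid` helper in `<cpuid.h>`; other toolchains provide their own intrinsics.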
The following sections describe the six methods (or mechanisms) for handling
software interrupts in virtual-8086 mode. See Section 17.3.2, "Class 2: Maskable
Hardware Interrupt Handling in Virtual-8086 Mode Using the Virtual Interrupt
Mechanism", for a description of the use of the VIF and VIP flags in the EFLAGS
register for handling maskable hardware interrupts.
17.3.3.1 Method 1: Software Interrupt Handling
When the VME flag in control register CR4 is clear and the IOPL field is 3, a Pentium
or later IA-32 processor handles software interrupts in the same manner as they are
handled by an Intel386 or Intel486 processor. It executes an implicit call to the
interrupt handler in the protected-mode IDT pointed to by the interrupt vector. See
Section 17.3.1, "Class 1: Hardware Interrupt and Exception Handling in Virtual-8086
Mode", for a complete description of this mechanism and its possible uses.

Figure 17-5. Software Interrupt Redirection Bit Map in TSS
[Figure: Task-State Segment (TSS) layout with the I/O map base field at offset 64H,
the 32-byte software interrupt redirection bit map, and the I/O permission bit map
above it. Figure notes: the I/O map base must not exceed DFFFH; the last byte of the
bit map is shown as 1 1 1 1 1 1 1 1.]
17.3.3.2 Methods 2 and 3: Software Interrupt Handling
When a software interrupt occurs in virtual-8086 mode and the method 2 or 3
conditions are present, the processor generates a general-protection exception (#GP).
Method 2 is enabled when the VME flag is set to 0 and the IOPL value is less than 3.
Here the IOPL value is used to bypass the protected-mode interrupt handlers and
cause any software interrupt that occurs in virtual-8086 mode to be treated as a
protected-mode general-protection exception (#GP). The general-protection
exception handler calls the virtual-8086 monitor, which can then emulate an 8086-
program interrupt handler or pass control back to the 8086 program's handler, as
described in Section 17.3.1.2, "Handling an Interrupt or Exception With an 8086
Program Interrupt or Exception Handler".
Method 3 is enabled when the VME flag is set to 1, the IOPL value is less than 3, and
the corresponding bit for the software interrupt in the software interrupt redirection
bit map is set to 1. Here, the processor performs the same operation as it does for
method 2 software interrupt handling. If the corresponding bit for the software
interrupt in the software interrupt redirection bit map is set to 0, the interrupt is handled
using method 6 (see Section 17.3.3.5, "Method 6: Software Interrupt Handling").
17.3.3.3 Method 4: Software Interrupt Handling
Method 4 handling is enabled when the VME flag is set to 1, the IOPL value is 3, and
the bit for the interrupt vector in the redirection bit map is set to 1. Method 4
software interrupt handling allows method 1 style handling when the virtual mode
extension is enabled; that is, the interrupt is directed to a protected-mode handler (see
Section 17.3.3.1, "Method 1: Software Interrupt Handling").
17.3.3.4 Method 5: Software Interrupt Handling
Method 5 software interrupt handling provides a streamlined method of redirecting
software interrupts (invoked with the INT n instruction) that occur in virtual-8086
mode back to the 8086 program's interrupt vector table and its interrupt handlers.
Method 5 handling is enabled when the VME flag is set to 1, the IOPL value is 3, and
the bit for the interrupt vector in the redirection bit map is set to 0. The processor
performs the following actions to make an implicit call to the selected 8086 program
interrupt handler:
1. Pushes the low-order 16 bits of the EFLAGS register onto the stack.
2. Pushes the current values of the CS and EIP registers onto the current stack.
   (Only the 16 least-significant bits of the EIP register are pushed and no stack
   switch occurs.)
3. Clears the IF flag in the EFLAGS register to disable interrupts.
4. Clears the TF flag in the EFLAGS register.
5. Locates the 8086 program interrupt vector table at linear address 0 for the 8086-
   mode task.
6. Loads the CS and EIP registers with values from the interrupt vector table entry
   pointed to by the interrupt vector number. Only the 16 low-order bits of the EIP
   are loaded and the 16 high-order bits are set to 0. The interrupt vector table is
   assumed to be at linear address 0 of the current virtual-8086 task.
7. Begins executing the selected interrupt handler.
An IRET instruction at the end of the handler procedure reverses these steps to
return program control to the interrupted 8086 program.
Note that with method 5 handling, a mode switch from virtual-8086 mode to
protected mode does not occur. The processor remains in virtual-8086 mode
throughout the interrupt-handling operation.
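The method 5 sequence can be illustrated with a small software model. This is a sketch under stated assumptions, not a description of processor internals: the `v86_task` structure and function names are hypothetical, `mem` is assumed to map at least 110000H bytes of the task's linear address space, and each interrupt vector table entry is assumed to use the real-address-mode layout (16-bit offset followed by 16-bit segment).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLAGS_TF (1u << 8)   /* trap flag */
#define FLAGS_IF (1u << 9)   /* interrupt enable flag */

/* Hypothetical model of the task state that method 5 touches. */
struct v86_task {
    uint8_t *mem;            /* linear memory, >= 0x110000 bytes assumed */
    uint32_t eflags;
    uint32_t eip;
    uint16_t cs, ss, sp;
};

/* Pushes a 16-bit value on the current stack (SS:SP); no stack switch. */
static void push16(struct v86_task *t, uint16_t value)
{
    t->sp = (uint16_t)(t->sp - 2);
    uint32_t lin = ((uint32_t)t->ss << 4) + t->sp;
    t->mem[lin] = (uint8_t)(value & 0xFFu);
    t->mem[lin + 1] = (uint8_t)(value >> 8);
}

static void method5_software_int(struct v86_task *t, uint8_t vector)
{
    assert(t != NULL && t->mem != NULL);

    push16(t, (uint16_t)(t->eflags & 0xFFFFu));     /* step 1 */
    push16(t, t->cs);                               /* step 2 */
    push16(t, (uint16_t)(t->eip & 0xFFFFu));
    t->eflags &= ~(uint32_t)(FLAGS_IF | FLAGS_TF);  /* steps 3 and 4 */

    /* Steps 5 and 6: vector table at linear address 0, 4 bytes per entry. */
    uint32_t entry = (uint32_t)vector * 4u;
    t->eip = (uint32_t)t->mem[entry] | ((uint32_t)t->mem[entry + 1] << 8);
    t->cs = (uint16_t)(t->mem[entry + 2] | (t->mem[entry + 3] << 8));
    /* Step 7: execution would continue at the new CS:EIP. */
}
```

The model stays entirely within the task's own stack and vector table, which reflects the point that no switch to protected mode takes place.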
The method 5 handling actions are virtually identical to the actions the processor
takes when handling software interrupts in real-address mode. The benefit of using
method 5 handling to access the 8086 program handlers is that it avoids the
overhead of methods 2 and 3 handling, which requires first going to the virtual-8086
monitor, then to the 8086 program handler, then back again to the virtual-8086
monitor, before returning to the interrupted 8086 program (see Section 17.3.1.2,
"Handling an Interrupt or Exception With an 8086 Program Interrupt or Exception
Handler").
NOTE
Methods 1 and 4 handling can handle a software interrupt in a virtual-
8086 task with a regular protected-mode handler, but this approach
requires all virtual-8086 tasks to use the same software interrupt
handlers, which generally does not give sufficient latitude to the
programs running in the virtual-8086 tasks, particularly MS-DOS
programs.
17.3.3.5 Method 6: Software Interrupt Handling
Method 6 handling is enabled when the VME flag is set to 1, the IOPL value is less
than 3, and the bit for the interrupt or exception vector in the redirection bit map is
set to 0. With method 6 interrupt handling, software interrupts are handled in the
same manner as was described for method 5 handling (see Section 17.3.3.4,
"Method 5: Software Interrupt Handling").
Method 6 differs from method 5 in that with the IOPL value set to less than 3, the VIF
and VIP flags in the EFLAGS register are enabled, providing virtual interrupt support
for handling class 2 maskable hardware interrupts (see Section 17.3.2, "Class 2:
Maskable Hardware Interrupt Handling in Virtual-8086 Mode Using the Virtual
Interrupt Mechanism"). These flags provide the virtual-8086 monitor with an
efficient means of handling maskable hardware interrupts that occur during a virtual-
8086 mode task. Also, because the IOPL value is less than 3 and the VIF flag is
enabled, the information pushed on the stack by the processor when invoking the
interrupt handler is slightly different between methods 5 and 6 (see Table 17-2).
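The difference in the pushed flags image can be sketched as follows, based on the method 6 entry in Table 17-2 (IOPL set to 3 and VIF copied to IF). The function name is hypothetical, and the assumption that only the low-order 16 bits are pushed is carried over from the method 5 description.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_IF   (1u << 9)
#define EFLAGS_IOPL (3u << 12)   /* two-bit IOPL field, bits 12 and 13 */
#define EFLAGS_VIF  (1u << 19)

/* Builds the flags image pushed on the 8086 program's stack under
 * method 6: IOPL appears as 3 and IF mirrors the current VIF value. */
static uint16_t method6_pushed_flags(uint32_t eflags)
{
    /* Method 6 applies only when the real IOPL is less than 3. */
    assert(((eflags & EFLAGS_IOPL) >> 12) < 3);

    uint32_t image = eflags | EFLAGS_IOPL;
    if (eflags & EFLAGS_VIF)
        image |= EFLAGS_IF;
    else
        image &= ~EFLAGS_IF;
    return (uint16_t)(image & 0xFFFFu);
}
```

After pushing this image, the processor clears VIF rather than IF (see Table 17-2), so the real IF flag stays under the control of the operating system.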
17.4 PROTECTED-MODE VIRTUAL INTERRUPTS
The IA-32 processors (beginning with the Pentium processor) also support the VIF
and VIP flags in the EFLAGS register in protected mode; this support is enabled by
setting the PVI (protected-mode virtual interrupt) flag in the CR4 register. Setting the
PVI flag allows applications running at privilege level 3 to execute the CLI and STI
instructions without causing a general-protection exception (#GP) or affecting
hardware interrupts.
When the PVI flag is set to 1, the CPL is 3, and the IOPL is less than 3, the STI and
CLI instructions set and clear the VIF flag in the EFLAGS register, leaving IF
unaffected. In this mode of operation, an application running in protected mode and at a
CPL of 3 can inhibit interrupts in the same manner as is described in Section 17.3.2,
"Class 2: Maskable Hardware Interrupt Handling in Virtual-8086 Mode Using the
Virtual Interrupt Mechanism", for a virtual-8086 mode task. When the application
executes the CLI instruction, the processor clears the VIF flag. If the processor
receives a maskable hardware interrupt, the processor invokes the protected-mode
interrupt handler. This handler checks the state of the VIF flag in the EFLAGS register.
If the VIF flag is clear (indicating that the active task does not want to have interrupts
handled now), the handler sets the VIP flag in the EFLAGS image on the stack and
returns to the privilege-level 3 application, which continues program execution.
When the application executes an STI instruction to set the VIF flag, the processor
automatically invokes the general-protection exception handler, which can then
handle the pending interrupt. After handling the pending interrupt, the handler
typically sets the VIF flag and clears the VIP flag in the EFLAGS image on the stack and
executes a return to the application program. The next time the processor receives a
maskable hardware interrupt, the processor will handle it in the normal manner for
interrupts received while the processor is operating at a CPL of 3.
As with the virtual mode extension (enabled with the VME flag in the CR4 register),
the protected-mode virtual interrupt extension only affects maskable hardware
interrupts (interrupt vectors 32 through 255). NMI interrupts and exceptions are
handled in the normal manner.
When protected-mode virtual interrupts are disabled (that is, when the PVI flag in
control register CR4 is set to 0, the CPL is less than 3, or the IOPL value is 3), then
the CLI and STI instructions execute in a manner compatible with the Intel486
processor. That is, if the CPL is greater (less privileged) than the I/O privilege level
(IOPL), a general-protection exception occurs. If the IOPL value is 3, CLI and STI
clear or set the IF flag, respectively.
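The rules above for CLI and STI in protected mode can be condensed into one decision function. This is a sketch with hypothetical names; it models only the outcome of the instruction (which flag is affected, or whether #GP is generated), not the full instruction semantics.

```c
#include <assert.h>
#include <stdbool.h>

enum cli_sti_effect { AFFECTS_IF, AFFECTS_VIF, RAISES_GP };

/* Outcome of CLI (is_sti = false) or STI (is_sti = true) in protected
 * mode, given the PVI flag in CR4, the CPL, the IOPL, and the VIP flag. */
static enum cli_sti_effect pm_cli_sti_effect(bool pvi, unsigned cpl,
                                             unsigned iopl, bool is_sti,
                                             bool vip)
{
    assert(cpl <= 3 && iopl <= 3);

    if (cpl <= iopl)
        return AFFECTS_IF;            /* privileged enough: real IF flag */
    if (pvi && cpl == 3)              /* here IOPL < 3 is implied */
        return (is_sti && vip) ? RAISES_GP : AFFECTS_VIF;
    return RAISES_GP;                 /* Intel486-compatible behavior */
}
```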
PUSHF, POPF, IRET and INT are executed as in the Intel486 processor, regardless of
whether protected-mode virtual interrupts are enabled.
It is only possible to enter virtual-8086 mode through a task switch or the execution
of an IRET instruction, and it is only possible to leave virtual-8086 mode by faulting
to a protected-mode interrupt handler (typically the general-protection exception
handler, which in turn calls the virtual-8086 mode monitor). In both cases, the
EFLAGS register is saved and restored. This is not true, however, in protected mode
when the PVI flag is set and the processor is not in virtual-8086 mode. Here, it is
possible to call a procedure at a different privilege level, in which case the EFLAGS
register is not saved or modified. However, the states of VIF and VIP flags are never
examined by the processor when the CPL is not 3.
CHAPTER 18
MIXING 16-BIT AND 32-BIT CODE
Program modules written to run on IA-32 processors can be either 16-bit modules or
32-bit modules. Table 18-1 shows the characteristics of 16-bit and 32-bit modules.
The IA-32 processors function most efficiently when executing 32-bit program
modules. They can, however, also execute 16-bit program modules, in any of the
following ways:
• In real-address mode.
• In virtual-8086 mode.
• In system management mode (SMM).
• As a protected-mode task, when the code, data, and stack segments for the task
  are all configured as 16-bit segments.
• By integrating 16-bit and 32-bit segments into a single protected-mode task.
• By integrating 16-bit operations into 32-bit code segments.
Real-address mode, virtual-8086 mode, and SMM are native 16-bit modes. A legacy
program assembled and/or compiled to run on an Intel 8086 or Intel 286 processor
should run in real-address mode or virtual-8086 mode without modification. Sixteen-
bit program modules can also be written to run in real-address mode for handling
system initialization or to run in SMM for handling system management functions.
See Chapter 17, "8086 Emulation," for detailed information on real-address mode
and virtual-8086 mode; see Chapter 26, "System Management Mode," for
information on SMM.
This chapter describes how to integrate 16-bit program modules with 32-bit program
modules when operating in protected mode and how to mix 16-bit and 32-bit code
within 32-bit code segments.
Table 18-1. Characteristics of 16-Bit and 32-Bit Program Modules

Characteristic                                           16-Bit Program Modules  32-Bit Program Modules
Segment Size                                             0 to 64 KBytes          0 to 4 GBytes
Operand Sizes                                            8 bits and 16 bits      8 bits and 32 bits
Pointer Offset Size (Address Size)                       16 bits                 32 bits
Stack Pointer Size                                       16 Bits                 32 Bits
Control Transfers Allowed to Code Segments of This Size  16 Bits                 32 Bits
18.1 DEFINING 16-BIT AND 32-BIT PROGRAM MODULES
The following IA-32 architecture mechanisms are used to distinguish between and
support 16-bit and 32-bit segments and operations:
• The D (default operand and address size) flag in code-segment descriptors.
• The B (default stack size) flag in stack-segment descriptors.
• 16-bit and 32-bit call gates, interrupt gates, and trap gates.
• Operand-size and address-size instruction prefixes.
• 16-bit and 32-bit general-purpose registers.
The D flag in a code-segment descriptor determines the default operand-size and
address-size for the instructions of a code segment. (In real-address mode and
virtual-8086 mode, which do not use segment descriptors, the default is 16 bits.) A
code segment with its D flag set is a 32-bit segment; a code segment with its D flag
clear is a 16-bit segment.
The B flag in the stack-segment descriptor specifies the size of the stack pointer (the
32-bit ESP register or the 16-bit SP register) used by the processor for implicit stack
references. The B flag for all data descriptors also controls the upper address range for
expand-down segments.
When transferring program control to another code segment through a call gate,
interrupt gate, or trap gate, the operand size used during the transfer is determined
by the type of gate used (16-bit or 32-bit), not by the D flag or prefix of the transfer
instruction. The gate type determines how return information is saved on the stack
(or stacks).
For most efficient and trouble-free operation of the processor, 32-bit programs or
tasks should have the D flag in the code-segment descriptor and the B flag in the
stack-segment descriptor set, and 16-bit programs or tasks should have these flags
clear. Program control transfers from 16-bit segments to 32-bit segments (and vice
versa) are handled most efficiently through call, interrupt, or trap gates.
Instruction prefixes can be used to override the default operand size and address size
of a code segment. These prefixes can be used in real-address mode as well as in
protected mode and virtual-8086 mode. An operand-size or address-size prefix only
changes the size for the duration of the instruction.
18.2 MIXING 16-BIT AND 32-BIT OPERATIONS WITHIN A
CODE SEGMENT
The following two instruction prefixes allow mixing of 32-bit and 16-bit operations
within one segment:
• The operand-size prefix (66H)
• The address-size prefix (67H)
These prefixes reverse the default size selected by the D flag in the code-segment
descriptor. For example, the processor can interpret the (MOV mem, reg) instruction
in any of four ways:
• In a 32-bit code segment:
  - Moves 32 bits from a 32-bit register to memory using a 32-bit effective
    address.
  - If preceded by an operand-size prefix, moves 16 bits from a 16-bit register to
    memory using a 32-bit effective address.
  - If preceded by an address-size prefix, moves 32 bits from a 32-bit register to
    memory using a 16-bit effective address.
  - If preceded by both an address-size prefix and an operand-size prefix, moves
    16 bits from a 16-bit register to memory using a 16-bit effective address.
• In a 16-bit code segment:
  - Moves 16 bits from a 16-bit register to memory using a 16-bit effective
    address.
  - If preceded by an operand-size prefix, moves 32 bits from a 32-bit register to
    memory using a 16-bit effective address.
  - If preceded by an address-size prefix, moves 16 bits from a 16-bit register to
    memory using a 32-bit effective address.
  - If preceded by both an address-size prefix and an operand-size prefix, moves
    32 bits from a 32-bit register to memory using a 32-bit effective address.
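The eight cases listed above follow one rule: each prefix toggles the corresponding default. A minimal sketch of that rule is shown below; the structure and function names are hypothetical, and the operand size computed here applies to non-byte operands only.

```c
#include <assert.h>
#include <stdbool.h>

struct eff_sizes {
    int operand_bits;
    int address_bits;
};

/* default_bits is 16 or 32, as selected by the D flag of the code
 * segment. prefix_66 and prefix_67 indicate whether the operand-size
 * and address-size prefixes precede the instruction. */
static struct eff_sizes effective_sizes(int default_bits, bool prefix_66,
                                        bool prefix_67)
{
    assert(default_bits == 16 || default_bits == 32);

    int other_bits = (default_bits == 32) ? 16 : 32;
    struct eff_sizes s;
    s.operand_bits = prefix_66 ? other_bits : default_bits;
    s.address_bits = prefix_67 ? other_bits : default_bits;
    return s;
}
```

For example, in a 16-bit code segment with only the operand-size prefix present, this yields a 32-bit operand and a 16-bit effective address, matching the second case listed for 16-bit segments.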
The previous examples show that any instruction can generate any combination of
operand size and address size regardless of whether the instruction is in a 16- or
32-bit segment. The choice of the 16- or 32-bit default for a code segment is
normally based on the following criteria:
• Performance: Always use 32-bit code segments when possible. They run
  much faster than 16-bit code segments on P6 family processors, and somewhat
  faster on earlier IA-32 processors.
• The operating system the code segment will be running on: If the
  operating system is a 16-bit operating system, it may not support 32-bit program
  modules.
• Mode of operation: If the code segment is being designed to run in real-
  address mode, virtual-8086 mode, or SMM, it must be a 16-bit code segment.
• Backward compatibility to earlier IA-32 processors: If a code segment
  must be able to run on an Intel 8086 or Intel 286 processor, it must be a 16-bit
  code segment.
18.3 SHARING DATA AMONG MIXED-SIZE CODE
SEGMENTS
Data segments can be accessed from both 16-bit and 32-bit code segments. When a
data segment that is larger than 64 KBytes is to be shared among 16- and 32-bit
code segments, the data that is to be accessed from the 16-bit code segments must
be located within the first 64 KBytes of the data segment. The reason for this is that
16-bit pointers by definition can only point to the first 64 KBytes of a segment.
A stack that spans less than 64 KBytes can be shared by both 16- and 32-bit code
segments. This class of stacks includes:
• Stacks in expand-up segments with the G (granularity) and B (big) flags in the
  stack-segment descriptor clear.
• Stacks in expand-down segments with the G and B flags clear.
• Stacks in expand-up segments with the G flag set and the B flag clear and where
  the stack is contained completely within the lower 64 KBytes. (Offsets greater
  than FFFFH can be used for data, other than the stack, which is not shared.)
See Section 3.4.5, "Segment Descriptors," for a description of the G and B flags and
the expand-down stack type.
The B flag cannot, in general, be used to change the size of stack used by a 16-bit
code segment. This flag controls the size of the stack pointer only for implicit stack
references such as those caused by interrupts, exceptions, and the PUSH, POP, CALL,
and RET instructions. It does not control explicit stack references, such as accesses
to parameters or local variables. A 16-bit code segment can use a 32-bit stack only if
the code is modified so that all explicit references to the stack are preceded by the
32-bit address-size prefix, causing those references to use 32-bit addressing, and
explicit writes to the stack pointer are preceded by a 32-bit operand-size prefix.
In 32-bit, expand-down segments, all offsets may be greater than 64 KBytes;
therefore, 16-bit code cannot use this kind of stack segment unless the code segment is
modified to use 32-bit addressing.
18.4 TRANSFERRING CONTROL AMONG MIXED-SIZE CODE
SEGMENTS
There are three ways for a procedure in a 16-bit code segment to safely make a call
to a 32-bit code segment:
• Make the call through a 32-bit call gate.
• Make a 16-bit call to a 32-bit interface procedure. The interface procedure then
  makes a 32-bit call to the intended destination.
• Modify the 16-bit procedure, inserting an operand-size prefix before the call, to
  change it to a 32-bit call.
Likewise, there are three ways for a procedure in a 32-bit code segment to safely make
a call to a 16-bit code segment:
• Make the call through a 16-bit call gate. Here, the EIP value at the CALL
  instruction cannot exceed FFFFH.
• Make a 32-bit call to a 16-bit interface procedure. The interface procedure then
  makes a 16-bit call to the intended destination.
• Modify the 32-bit procedure, inserting an operand-size prefix before the call,
  changing it to a 16-bit call. Be certain that the return offset does not exceed
  FFFFH.
These methods of transferring program control overcome the following architectural
limitations imposed on calls between 16-bit and 32-bit code segments:
• Pointers from 16-bit code segments (which by default can only be 16 bits) cannot
  be used to address data or code located beyond FFFFH in a 32-bit segment.
• The operand-size attributes for a CALL and its companion RETURN instruction
  must be the same to maintain stack coherency. This is also true for implicit calls
  to interrupt and exception handlers and their companion IRET instructions.
• A 32-bit parameter (particularly a pointer parameter) greater than FFFFH
  cannot be squeezed into a 16-bit parameter location on a stack.
• The size of the stack pointer (SP or ESP) changes when switching between 16-bit
  and 32-bit code segments.
These limitations are discussed in greater detail in the following sections.
18.4.1 Code-Segment Pointer Size
For control-transfer instructions that use a pointer to identify the next instruction
(that is, those that do not use gates), the operand-size attribute determines the size
of the offset portion of the pointer. The implications of this rule are as follows:
• A JMP, CALL, or RET instruction from a 32-bit segment to a 16-bit segment is
  always possible using a 32-bit operand size, providing the 32-bit pointer does not
  exceed FFFFH.
• A JMP, CALL, or RET instruction from a 16-bit segment to a 32-bit segment
  cannot address a destination greater than FFFFH, unless the instruction is given
  an operand-size prefix.
See Section 18.4.5, "Writing Interface Procedures," for an interface procedure that
can transfer program control from 16-bit segments to destinations in 32-bit
segments beyond FFFFH.
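The pointer-size rule above can be sketched as a small model. This is an illustration only, not processor pseudocode: the function names effective_operand_size and can_reach are hypothetical, and the model assumes that the operand-size prefix (byte 66H) selects the non-default size for the segment.

```python
OPERAND_SIZE_PREFIX = 0x66  # operand-size override prefix byte


def effective_operand_size(segment_default: int, has_prefix: bool) -> int:
    """Return the operand size (16 or 32) after applying an optional prefix.

    The prefix selects the non-default size for the segment.
    """
    if segment_default not in (16, 32):
        raise ValueError("segment default must be 16 or 32")
    if has_prefix:
        return 32 if segment_default == 16 else 16
    return segment_default


def can_reach(offset: int, caller_default: int, has_prefix: bool,
              target_default: int) -> bool:
    """Hypothetical check: can a pointer-based JMP/CALL/RET reach `offset`?

    The offset must fit in the instruction's operand size and, following the
    text, an offset into a 16-bit segment must not exceed FFFFH.
    """
    size = effective_operand_size(caller_default, has_prefix)
    if not 0 <= offset < (1 << size):
        return False
    if target_default == 16 and offset > 0xFFFF:
        return False
    return True
```

For example, can_reach(0x12345, 16, False, 32) is False because a 16-bit pointer cannot hold the offset, while the same call with has_prefix=True succeeds.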
18.4.2 Stack Management for Control Transfer
Because the stack is managed differently for 16-bit procedure calls than for 32-bit
calls, the operand-size attribute of the RET instruction must match that of the CALL
instruction (see Figure 18-1). On a 16-bit call, the processor pushes the contents of
the 16-bit IP register and (for calls between privilege levels) the 16-bit SP register.
The matching RET instruction must also use a 16-bit operand size to pop these 16-bit
values from the stack into the 16-bit registers.
A 32-bit CALL instruction pushes the contents of the 32-bit EIP register and (for
inter-privilege-level calls) the 32-bit ESP register. Here, the matching RET instruction
must use a 32-bit operand size to pop these 32-bit values from the stack into the
32-bit registers. If the two parts of a CALL/RET instruction pair do not have matching
operand sizes, the stack will not be managed correctly and the values of the
instruction pointer and stack pointer will not be restored to correct values.
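The effect of a mismatched CALL/RET pair can be illustrated with a toy stack model. This is a sketch under stated assumptions, not a description of processor internals: the Stack class and the far_call and far_ret helpers are hypothetical, only a far call without a privilege transition is modeled, and the upper half of the 32-bit CS slot is zero-filled here for simplicity.

```python
class Stack:
    """Toy downward-growing stack backed by a byte array."""

    def __init__(self, size: int = 64) -> None:
        self.mem = bytearray(size)
        self.sp = size  # stack pointer starts just past the top of memory

    def push(self, value: int, bits: int) -> None:
        n = bits // 8
        self.sp -= n
        masked = value & ((1 << bits) - 1)
        self.mem[self.sp:self.sp + n] = masked.to_bytes(n, "little")

    def pop(self, bits: int) -> int:
        n = bits // 8
        value = int.from_bytes(self.mem[self.sp:self.sp + n], "little")
        self.sp += n
        return value


def far_call(stack: Stack, cs: int, ip: int, bits: int) -> None:
    """Push CS, then the return offset, each in an operand-size slot."""
    stack.push(cs, bits)
    stack.push(ip, bits)


def far_ret(stack: Stack, bits: int):
    """Pop the return offset, then CS, using the RET operand size."""
    ip = stack.pop(bits)
    cs = stack.pop(bits) & 0xFFFF
    return cs, ip
```

With a 32-bit call of CS=001BH and EIP=00012345H, a 32-bit return yields (0x1B, 0x12345) and restores the stack pointer. A 16-bit return from the same frame yields (0x0001, 0x2345) and leaves four bytes on the stack, which is the incoherency described above.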
Figure 18-1. Stack after Far 16- and 32-Bit Calls
[Figure not reproduced. Panels: "Without Privilege Transition" and "With Privilege Transition," each showing the stack after a 16-bit call and after a 32-bit call on a 0-31 bit scale, with the direction of stack growth marked. Labeled entries: PARM 1, PARM 2, CS, IP or EIP, the SP or ESP position and, with a privilege transition, the saved SS and SP or ESP; one region is labeled Undefined.]
While executing 32-bit code, if a call is made to a 16-bit code segment which is at the
same or a more privileged level (that is, the DPL of the called code segment is less
than or equal to the CPL of the calling code segment) through a 16-bit call gate, then
the upper 16 bits of the ESP register may be unreliable upon returning to the 32-bit
code segment (that is, after executing a RET in the 16-bit code segment).
When the CALL instruction and its matching RET instruction are in code segments
that have D flags with the same values (that is, both are 32-bit code segments or
both are 16-bit code segments), the default settings may be used. When the CALL
instruction and its matching RET instruction are in segments which have different
D-flag settings, an operand-size prefix must be used.
18.4.2.1 Controlling the Operand-Size Attribute For a Call
Three things can determine the operand size of a call:
• The D flag in the segment descriptor for the calling code segment.
• An operand-size instruction prefix.
• The type of call gate (16-bit or 32-bit), if a call is made through a call gate.
When a call is made with a pointer (rather than a call gate), the D flag for the calling
code segment determines the operand size for the CALL instruction. This operand-size
attribute can be overridden by prepending an operand-size prefix to the CALL
instruction. So, for example, if the D flag for a code segment is set for 16 bits and the
operand-size prefix is used with a CALL instruction, the processor will cause the
information stored on the stack to be stored in 32-bit format. If the call is to a 32-bit code
segment, the instructions in that code segment will be able to read the stack
coherently. Also, a RET instruction from the 32-bit code segment without an operand-size
prefix will maintain stack coherency with the 16-bit code segment being returned to.
When a CALL instruction references a call-gate descriptor, the type of call is
determined by the type of call gate (16-bit or 32-bit). The offset to the destination in the
code segment being called is taken from the gate descriptor; therefore, if a 32-bit call
gate is used, a procedure in a 16-bit code segment can call a procedure located more
than 64 KBytes from the base of a 32-bit code segment, because a 32-bit call gate
uses a 32-bit offset.
Note that regardless of the operand size of the call and how it is determined, the size
of the stack pointer used (SP or ESP) is always controlled by the B flag in the
stack-segment descriptor currently in use (that is, when B is clear, SP is used, and when B
is set, ESP is used).
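The three inputs that determine the operand size of a call, and the separate role of the B flag, can be summarized in a short sketch. The function names call_operand_size and stack_pointer_register are hypothetical and exist only for illustration.

```python
from typing import Optional


def call_operand_size(d_flag: bool, prefix: bool = False,
                      gate_bits: Optional[int] = None) -> int:
    """Return the operand size (16 or 32) used by a CALL.

    d_flag    -- D flag of the calling code segment (True means 32-bit)
    prefix    -- True if an operand-size prefix precedes the CALL
    gate_bits -- 16 or 32 when the CALL goes through a call gate, else None
    """
    if gate_bits is not None:
        if gate_bits not in (16, 32):
            raise ValueError("call gate must be 16-bit or 32-bit")
        return gate_bits  # the gate type decides
    default = 32 if d_flag else 16
    if prefix:
        return 16 if default == 32 else 32  # prefix overrides the D flag
    return default


def stack_pointer_register(b_flag: bool) -> str:
    """The B flag of the stack segment selects SP or ESP, independently
    of the operand size of the call."""
    return "ESP" if b_flag else "SP"
```

For instance, call_operand_size(d_flag=False, prefix=True) returns 32, matching the example in the text of a prefixed CALL from a 16-bit code segment.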
An unmodified 16-bit code segment that has run successfully on an 8086 processor
or in real mode on a later IA-32 architecture processor will have its D flag clear and
will not use operand-size override prefixes. As a result, all CALL instructions in this
code segment will use the 16-bit operand-size attribute. Procedures in these code
segments can be modified to safely call procedures in 32-bit code segments in either
of two ways:
• Relink the CALL instruction to point to 32-bit call gates (see Section 18.4.2.2,
  "Passing Parameters With a Gate").
• Add a 32-bit operand-size prefix to each CALL instruction.
18.4.2.2 Passing Parameters With a Gate
When referencing 32-bit gates with 16-bit procedures, it is important to consider the
number of parameters passed in each procedure call. The count field of the gate
descriptor specifies the size of the parameter string to copy from the current stack to
the stack of a more privileged (numerically lower privilege level) procedure. The
count field of a 16-bit gate specifies the number of 16-bit words to be copied,
whereas the count field of a 32-bit gate specifies the number of 32-bit doublewords
to be copied. The count field for a 32-bit gate must thus be half the number of
words being placed on the stack by a 16-bit procedure. Also, the 16-bit
procedure must use an even number of words as parameters.
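The relationship between the 16-bit word count and the 32-bit gate count field can be sketched as follows. The helper name gate32_count_for_words is hypothetical; the upper bound of 31 reflects the 5-bit width of the call-gate parameter count field, which is not stated in this section.

```python
def gate32_count_for_words(word_count: int) -> int:
    """Return the count-field value for a 32-bit call gate, given the
    number of 16-bit words a 16-bit caller pushes as parameters."""
    if word_count < 0:
        raise ValueError("word count cannot be negative")
    if word_count % 2 != 0:
        # An odd number of words cannot be expressed in doublewords.
        raise ValueError("16-bit caller must push an even number of words")
    count = word_count // 2
    if count > 31:  # the count field is 5 bits wide
        raise ValueError("parameter count does not fit in the gate")
    return count
```

A caller that pushes six words therefore needs a 32-bit gate with a count of 3; a caller that pushes five words needs interface code instead.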
18.4.3 Interrupt Control Transfers
A program-control transfer caused by an exception or interrupt is always carried out
through an interrupt or trap gate (located in the IDT). Here, the type of the gate
(16-bit or 32-bit) determines the operand-size attribute used in the implicit call to
the exception or interrupt handler procedure in another code segment.
A 32-bit interrupt or trap gate provides a safe interface to a 32-bit exception or
interrupt handler when the exception or interrupt occurs in either a 32-bit or a 16-bit code
segment. It is sometimes impractical, however, to place exception or interrupt
handlers in 16-bit code segments, because only 16-bit return addresses are saved on
the stack. If an exception or interrupt occurs in a 32-bit code segment when the EIP
was greater than FFFFH, the 16-bit handler procedure cannot provide the correct
return address.
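The loss of the return address through a 16-bit gate can be shown with simple arithmetic. The two helper names below are hypothetical, and the sketch models only the width of the saved return offset.

```python
def saved_return_offset(eip: int, gate_bits: int) -> int:
    """Return the offset that is saved on the stack for a given gate size."""
    if gate_bits not in (16, 32):
        raise ValueError("gate must be 16-bit or 32-bit")
    mask = 0xFFFF if gate_bits == 16 else 0xFFFFFFFF
    return eip & mask


def return_address_preserved(eip: int, gate_bits: int) -> bool:
    """True if the handler can return to the interrupted instruction."""
    return saved_return_offset(eip, gate_bits) == eip
```

An interrupt at EIP=00012345H through a 16-bit gate saves only 2345H, so return_address_preserved(0x12345, 16) is False, whereas the same interrupt through a 32-bit gate preserves the full offset.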
18.4.4 Parameter Translation
When segment offsets or pointers (which contain segment offsets) are passed as
parameters between 16-bit and 32-bit procedures, some translation is required. If a
32-bit procedure passes a pointer to data located beyond 64 KBytes to a 16-bit
procedure, the 16-bit procedure cannot use it. Except for this limitation, interface
code can perform any format conversion between 32-bit and 16-bit pointers that
may be needed.
Parameters passed by value between 32-bit and 16-bit code also may require
translation between 32-bit and 16-bit formats. The form of the translation is
application-dependent.
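As one possible illustration of pointer translation, the sketch below narrows a far pointer with a 32-bit offset to the 16:16 form and widens it again. The function names are hypothetical, and the in-memory layout assumed is the little-endian offset followed by the selector; actual interface code would depend on the application's calling convention.

```python
import struct
from typing import Tuple


def narrow_far_pointer(selector: int, offset32: int) -> bytes:
    """Convert a selector:32-bit-offset pointer to a 4-byte 16:16 pointer.

    Raises ValueError when the offset lies beyond FFFFH, the one case that
    interface code cannot translate.
    """
    if not 0 <= offset32 <= 0xFFFF:
        raise ValueError("offset beyond 64 KBytes cannot be passed to 16-bit code")
    return struct.pack("<HH", offset32, selector)


def widen_far_pointer(ptr16: bytes) -> Tuple[int, int]:
    """Convert a 4-byte 16:16 pointer to (selector, 32-bit offset)."""
    offset, selector = struct.unpack("<HH", ptr16)
    return selector, offset  # the offset is zero-extended to 32 bits
```

Widening always succeeds; narrowing fails only for offsets above FFFFH, which matches the limitation stated above.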
18.4.5 Writing Interface Procedures
Placing interface code between 32-bit and 16-bit procedures can be the solution to
the following interface problems:
• Allowing procedures in 16-bit code segments to call procedures with offsets
  greater than FFFFH in 32-bit code segments.
• Matching operand-size attributes between companion CALL and RET instructions.
• Translating parameters (data), including managing parameter strings with a
  variable count or an odd number of 16-bit words.
• The possible invalidation of the upper bits of the ESP register.
The interface procedure is simplified where these rules are followed.
1. The interface procedure must reside in a 32-bit code segment (the D flag for the
   code-segment descriptor is set).
2. All procedures that may be called by 16-bit procedures must have offsets not
   greater than FFFFH.
3. All return addresses saved by 16-bit procedures must have offsets not greater
   than FFFFH.
The interface procedure becomes more complex if any of these rules are violated. For
example, if a 16-bit procedure calls a 32-bit procedure with an entry point beyond
FFFFH, the interface procedure will need to provide the offset to the entry point. The
mapping between 16- and 32-bit addresses is only performed automatically when a
call gate is used, because the gate descriptor for a call gate contains a 32-bit
address. When a call gate is not used, the interface code must provide the 32-bit
address.
The structure of the interface procedure depends on the types of calls it is going to
support, as follows:
• Calls from 16-bit procedures to 32-bit procedures: Calls to the interface
  procedure from a 16-bit code segment are made with 16-bit CALL instructions
  (by default, because the D flag for the calling code-segment descriptor is clear),
  and 16-bit operand-size prefixes are used with RET instructions to return from
  the interface procedure to the calling procedure. Calls from the interface
  procedure to 32-bit procedures are performed with 32-bit CALL instructions (by
  default, because the D flag for the interface procedure's code segment is set),
  and returns from the called procedures to the interface procedure are performed
  with 32-bit RET instructions (also by default).
• Calls from 32-bit procedures to 16-bit procedures: Calls to the interface
  procedure from a 32-bit code segment are made with 32-bit CALL instructions
  (by default), and returns to the calling procedure from the interface procedure
  are made with 32-bit RET instructions (also by default). Calls from the interface
  procedure to 16-bit procedures require the CALL instructions to have the
  operand-size prefixes, and returns from the called procedures to the interface
  procedure are performed with 16-bit RET instructions (by default).
CHAPTER 19
ARCHITECTURE COMPATIBILITY
Intel 64 and IA-32 processors are binary compatible. Compatibility means that,
within limited constraints, programs that execute on previous generations of
processors will produce identical results when executed on later processors. The
compatibility constraints and any implementation differences between the Intel 64 and IA-32
processors are described in this chapter.
Each new processor has enhanced the software-visible architecture from that found
in earlier Intel 64 and IA-32 processors. Those enhancements have been defined
with consideration for compatibility with previous and future processors. This chapter
also summarizes the compatibility considerations for those extensions.
19.1 PROCESSOR FAMILIES AND CATEGORIES
IA-32 processors are referred to in several different ways in this chapter, depending
on the type of compatibility information being related, as described in the following:
• IA-32 Processors: All the Intel processors based on the Intel IA-32
  Architecture, which include the 8086/88, Intel 286, Intel386, Intel486, Pentium,
  Pentium Pro, Pentium II, Pentium III, Pentium 4, and Intel Xeon processors.
• 32-bit Processors: All the IA-32 processors that use a 32-bit architecture,
  which include the Intel386, Intel486, Pentium, Pentium Pro, Pentium II,
  Pentium III, Pentium 4, and Intel Xeon processors.
• 16-bit Processors: All the IA-32 processors that use a 16-bit architecture,
  which include the 8086/88 and Intel 286 processors.
• P6 Family Processors: All the IA-32 processors that are based on the P6
  microarchitecture, which include the Pentium Pro, Pentium II, and Pentium III
  processors.
Pent i um
Pent i um
Xeon
Hyper-Threading Technology and I nt el
287 or I nt el
Pent ium
and I nt el486
Processors, and in the following paragraph describing the Intel486 processor.
Specifically, the store buffers are flushed before the IN instruction is executed. No
reads (as a result of cache miss) are reordered around previously generated writes
sitting in the store buffers. The implication of this is that the store buffers will be
flushed or emptied before a subsequent bus cycle is run on the external bus.
On both the Intel486 and Pentium processors, under certain conditions, a memory
read will go onto the external bus before the pending memory writes in the buffer
even though the writes occurred earlier in the program execution. A memory read
will only be reordered in front of all writes pending in the buffers if all writes pending
in the buffers are cache hits and the read is a cache miss. Under these conditions, the
Intel486 and Pentium processors will not read from an external memory location that
needs to be updated by one of the pending writes.
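The reordering condition stated above reduces to a simple predicate. The function name read_may_pass_writes is hypothetical, and the sketch captures only the condition given in the text, not the processors' buffer implementation.

```python
from typing import List


def read_may_pass_writes(pending_writes_hit_cache: List[bool],
                         read_hits_cache: bool) -> bool:
    """Return True if a memory read may go onto the external bus ahead of
    the buffered writes.

    pending_writes_hit_cache -- one flag per buffered write; True means the
                                write is a cache hit
    read_hits_cache          -- True if the read is a cache hit
    """
    if not pending_writes_hit_cache:
        return False  # nothing is buffered, so there is nothing to pass
    return all(pending_writes_hit_cache) and not read_hits_cache
```

A single buffered write that misses the cache is enough to keep the read behind the writes.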
During a locked bus cycle, the Intel486 processor will always access external
memory; it will never look for the location in the on-chip cache. All data pending in
the Intel486 processor's store buffers will be written to memory before a locked cycle
is allowed to proceed to the external bus. Thus, the locked bus cycle can be used for
eliminating the possibility of reordering read cycles on the Intel486 processor. The
Pentium processor does check its cache on a read-modify-write access and, if the
cache line has been modified, writes the contents back to memory before locking the
bus. The P6 family processors write to their cache on a read-modify-write operation
(if the access does not split across a cache line) and do not write back to system
memory. If the access does split across a cache line, the processor locks the bus and
accesses system memory.
I/O reads are never reordered in front of buffered memory writes on an IA-32
processor. This ensures an update of all memory locations before reading the status
from an I/O device.
19.35 BUS LOCKING
The Intel 286 processor performs bus locking differently than the Intel P6 family,
Pentium, Intel486, and Intel386 processors. Programs that use forms of memory
locking specific to the Intel 286 processor may not run properly when run on later
processors.
A locked instruction is guaranteed to lock only the area of memory defined by the
destination operand, but may lock a larger memory area. For example, typical 8086
and Intel 286 configurations lock the entire physical memory space. Programmers
should not depend on this.
On the Intel 286 processor, the LOCK prefix is sensitive to IOPL. If the CPL is greater
than the IOPL, a general-protection exception (#GP) is generated. On the Intel386
DX, Intel486, Pentium, and P6 family processors, no check against IOPL is
performed.
The Pentium processor automatically asserts the LOCK# signal when acknowledging
external interrupts. After signaling an interrupt request, an external interrupt
controller may use the data bus to send the interrupt vector to the processor. After
receiving the interrupt request signal, the processor asserts LOCK# to ensure that no
other data appears on the data bus until the interrupt vector is received. This bus
locking does not occur on the P6 family processors.
19.36 BUS HOLD
Unlike the 8086 and Intel 286 processors, but like the Intel386 and Intel486
processors, the P6 family and Pentium processors respond to requests for control of the bus
from other potential bus masters, such as DMA controllers, between transfers of
parts of an unaligned operand, such as two words which form a doubleword. Unlike
the Intel386 processor, the P6 family, Pentium, and Intel486 processors respond to
bus hold during reset initialization.
19.37 MODEL-SPECIFIC EXTENSIONS TO THE IA-32
Certain extensions to the IA-32 are specific to a processor or family of IA-32
processors and may not be implemented, or may not be implemented in the same way, in
future processors. The following sections describe these model-specific extensions. The CPUID
instruction indicates the availability of some of the model-specific features.
19.37.1 Model-Specific Registers
The Pentium processor introduced a set of model-specific registers (MSRs) for use in
controlling hardware functions and performance monitoring. To access these MSRs,
two new instructions were added to the IA-32 architecture: read MSR (RDMSR) and
write MSR (WRMSR). The MSRs in the Pentium processor are not guaranteed to be
duplicated or provided in the next generation IA-32 processors.
The P6 family processors greatly increased the number of MSRs available to
software. See Appendix B, "Model-Specific Registers (MSRs)," for a complete list of the
available MSRs. The new registers control the debug extensions, the performance
counters, the machine-check exception capability, the machine-check architecture,
and the MTRRs. These registers are accessible using the RDMSR and WRMSR
instructions. Specific information on some of these new MSRs is provided in the following
sections. As with the Pentium processor MSRs, the P6 family processor MSRs are not
guaranteed to be duplicated or provided in the next generation IA-32 processors.
19.37.2 RDMSR and WRMSR Instructions
The RDMSR (read model-specific register) and WRMSR (write model-specific
register) instructions recognize a much larger number of model-specific registers in
the P6 family processors. (See "RDMSR - Read from Model Specific Register" and
"WRMSR - Write to Model Specific Register" in the Intel 64 and IA-32 Architectures
Software Developer's Manual, Volumes 2A & 2B, for more information.)
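RDMSR returns the 64-bit contents of the selected MSR in the EDX:EAX register pair, and WRMSR takes its value from the same pair. The following sketch shows only that split and join; the helper names split_msr and join_msr are hypothetical, and no MSR is actually accessed.

```python
def split_msr(value: int):
    """Split a 64-bit MSR value into the (EDX, EAX) halves that RDMSR
    would return: EDX holds bits 63:32 and EAX holds bits 31:0."""
    if not 0 <= value < (1 << 64):
        raise ValueError("MSR value must fit in 64 bits")
    return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF


def join_msr(edx: int, eax: int) -> int:
    """Combine EDX:EAX into the 64-bit value that WRMSR would write."""
    return ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)
```

Round-tripping a value through split_msr and join_msr returns the original value.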
19.37.3 Memory Type Range Registers
Memory type range registers (MTRRs) are a new feature introduced into the IA-32 in
the Pentium Pro processor. MTRRs allow the processor to optimize memory
operations for different types of memory, such as RAM, ROM, frame buffer memory, and
memory-mapped I/O.
MTRRs are MSRs that contain an internal map of how physical address ranges are
mapped to various types of memory. The processor uses this internal memory map
to determine the cacheability of various physical memory locations and the optimal
method of accessing memory locations. For example, if a memory location is
specified in an MTRR as write-through memory, the processor handles accesses to this
location as follows. It reads data from that location in lines and caches the read data;
it maps all writes to that location to the bus and updates the cache to maintain cache
coherency. In mapping the physical address space with MTRRs, the processor
recognizes five types of memory: uncacheable (UC); uncacheable, speculatable,
write-combining (WC); write-through (WT); write-protected (WP); and writeback (WB).
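The five memory types can be represented as an enumeration. The numeric values below are the MTRR type-field encodings documented for the P6 family and later processors; they are not listed in this section, and the names MemoryType and decode_memory_type are hypothetical.

```python
from enum import IntEnum


class MemoryType(IntEnum):
    """Memory types recognized when mapping physical memory with MTRRs."""
    UC = 0x00  # uncacheable
    WC = 0x01  # uncacheable, speculatable, write-combining
    WT = 0x04  # write-through
    WP = 0x05  # write-protected
    WB = 0x06  # writeback


def decode_memory_type(field: int) -> MemoryType:
    """Decode the low 8 bits of an MTRR type field.

    Raises ValueError for encodings outside the five defined types.
    """
    return MemoryType(field & 0xFF)
```

Encodings such as 02H or 03H do not correspond to any of the five types, so decode_memory_type raises ValueError for them.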
Earlier IA-32 processors (such as the Intel486 and Pentium processors) used the
KEN# (cache enable) pin and external logic to maintain an external memory map and
signal cacheable accesses to the processor. The MTRR mechanism simplifies
hardware designs by eliminating the KEN# pin and the external logic required to drive it.
See Chapter 9, "Processor Management and Initialization," and Appendix B,
"Model-Specific Registers (MSRs)," for more information on the MTRRs.
19.37.4 Machine-Check Exception and Architecture
The Pentium processor introduced a new exception called the machine-check
exception (#MC, interrupt 18). This exception is used to detect hardware-related errors,
such as a parity error on a read cycle.
The P6 family processors extend the types of errors that can be detected and that
generate a machine-check exception. They also provide a new machine-check
architecture for recording information about a machine-check error and provide extended
recovery capability.
The machine-check architecture provides several banks of reporting registers for
recording machine-check errors. Each bank of registers is associated with a specific
hardware unit in the processor. The primary focus of the machine checks is on bus
and interconnect operations; however, checks are also made of translation lookaside
buffer (TLB) and cache operations.
The machine-check architecture can correct some errors automatically and allow for
reliable restart of instruction execution. It also collects sufficient information for
software to use in correcting other machine errors not corrected by hardware.
See Chapter 15, "Machine-Check Architecture," for more information on the
machine-check exception and the machine-check architecture.
19.37.5 Performance-Monitoring Counters
The P6 family and Pentium processors provide two performance-monitoring counters
for use in monitoring internal hardware operations. The number of
performance-monitoring counters and associated programming interfaces may be implementation
specific for Pentium 4 processors and Pentium M processors. Later processors may have
implemented these as part of an architectural performance monitoring feature. The
architectural and non-architectural performance monitoring interfaces for different
processor families are described in Chapter 30, "Performance Monitoring." Appendix
A, "Performance-Monitoring Events," lists all the events that can be counted for
architectural performance monitoring events and non-architectural events. The
counters are set up, started, and stopped using two MSRs and the RDMSR and
WRMSR instructions. For the P6 family processors, the current count for a particular
counter can be read using the new RDPMC instruction.
The performance-monitoring counters are useful for debugging programs, optimizing
code, diagnosing system failures, or refining hardware designs. See Chapter 30,
"Performance Monitoring," for more information on these counters.
19.38 TWO WAYS TO RUN INTEL 286 PROCESSOR TASKS
When porting 16-bit programs to run on 32-bit IA-32 processors, there are two
approaches to consider:
• Porting an entire 16-bit software system to a 32-bit processor, complete with the
  old operating system, loader, and system builder. Here, all tasks will have 16-bit
  TSSs. The 32-bit processor is being used as if it were a faster version of the 16-bit
  processor.
• Porting selected 16-bit applications to run in a 32-bit processor environment with
  a 32-bit operating system, loader, and system builder. Here, the TSSs used to
  represent 286 tasks should be changed to 32-bit TSSs. It is possible to mix 16-bit
  and 32-bit TSSs, but the benefits are small and the problems are great. All tasks
  in a 32-bit software system should have 32-bit TSSs. It is not necessary to
  change the 16-bit object modules themselves; TSSs are usually constructed by
  the operating system, by the loader, or by the system builder. See Chapter 18,
  "Mixing 16-Bit and 32-Bit Code," for more detailed information about mixing
  16-bit and 32-bit code.
Because the 32-bit processors use the contents of the reserved word of 16-bit
segment descriptors, 16-bit programs that place values in this word may not run
correctly on the 32-bit processors.
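The effect of a nonzero reserved word can be seen by decoding a descriptor the way a 32-bit processor interprets it. This sketch assumes the standard IA-32 segment-descriptor layout, in which the last two bytes (the word reserved in Intel 286 descriptors) hold limit bits 19:16, the G and D/B flags, and base bits 31:24; the function name decode_descriptor is hypothetical.

```python
def decode_descriptor(raw: bytes) -> dict:
    """Decode an 8-byte segment descriptor as a 32-bit processor reads it."""
    if len(raw) != 8:
        raise ValueError("a segment descriptor is 8 bytes long")
    # Bytes 0-1: limit 15:0; low nibble of byte 6: limit 19:16.
    limit = raw[0] | (raw[1] << 8) | ((raw[6] & 0x0F) << 16)
    # Bytes 2-4: base 23:0; byte 7: base 31:24.
    base = raw[2] | (raw[3] << 8) | (raw[4] << 16) | (raw[7] << 24)
    return {
        "base": base,
        "limit": limit,
        "d_b": bool(raw[6] & 0x40),  # default operation size / big flag
        "g": bool(raw[6] & 0x80),    # granularity flag
    }
```

A 286-style descriptor with the reserved word cleared decodes to a 24-bit base and 16-bit limit. The same descriptor with stray values in bytes 6 and 7 decodes to a different base, limit, and flag settings, which is why such programs may fail.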