
Intel 64 and IA-32 Architectures

Software Developer's Manual


Volume 3A:
System Programming Guide, Part 1
NOTE: The Intel 64 and IA-32 Architectures Software Developer's Manual
consists of five volumes: Basic Architecture, Order Number 253665;
Instruction Set Reference A-M, Order Number 253666; Instruction Set
Reference N-Z, Order Number 253667; System Programming Guide,
Part 1, Order Number 253668; System Programming Guide, Part 2, Order
Number 253669. Refer to all five volumes when evaluating your design
needs.
Order Number: 253668-034US
March 2010
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,
EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS
GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR
SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR
IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR
WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT
OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR
INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A
SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers
must not rely on the absence or characteristics of any features or instructions marked "reserved" or
"undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts
or incompatibilities arising from future changes to them. The information here is subject to change without
notice. Do not finalize a design with this information.
The Intel 64 architecture processors may contain design defects or errors known as errata. Current
characterized errata are available on request.
Intel Hyper-Threading Technology requires a computer system with an Intel processor supporting
Hyper-Threading Technology and an Intel HT Technology enabled chipset, BIOS and operating system.
Performance will vary depending on the specific hardware and software you use. For more information, see
http://www.intel.com/technology/hyperthread/index.htm; including details on which processors support Intel HT
Technology.
Intel Virtualization Technology requires a computer system with an enabled Intel processor, BIOS, virtual
machine monitor (VMM) and for some uses, certain platform software enabled for it. Functionality,
performance or other benefits will vary depending on hardware and software configurations. Intel Virtualization
Technology-enabled BIOS and VMM applications are currently in development.
64-bit computing on Intel architecture requires a computer system with a processor, chipset, BIOS,
operating system, device drivers and applications enabled for Intel 64 architecture. Processors will not operate
(including 32-bit operation) without an Intel 64 architecture-enabled BIOS. Performance will vary
depending on your hardware and software configurations. Consult with your system vendor for more information.
Enabling Execute Disable Bit functionality requires a PC with a processor with Execute Disable Bit capability
and a supporting operating system. Check with your PC manufacturer on whether your system delivers
Execute Disable Bit functionality.
Intel, Pentium, Intel Xeon, Intel NetBurst, Intel Core, Intel Core Solo, Intel Core Duo, Intel Core 2 Duo,
Intel Core 2 Extreme, Intel Pentium D, Itanium, Intel SpeedStep, MMX, Intel Atom, and VTune are
trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other
countries.
* Other names and brands may be claimed as the property of others.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing
your product order.
Copies of documents which have an ordering number and are referenced in this document, or other Intel
literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's website at http://www.intel.com
Copyright 1997-2010 Intel Corporation
CONTENTS
PAGE
CHAPTER 1
ABOUT THIS MANUAL
1.1 PROCESSORS COVERED IN THIS MANUAL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
1.2 OVERVIEW OF THE SYSTEM PROGRAMMING GUIDE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3
1.3 NOTATIONAL CONVENTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-6
1.3.1 Bit and Byte Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-6
1.3.2 Reserved Bits and Software Compatibility. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-7
1.3.3 Instruction Operands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-8
1.3.4 Hexadecimal and Binary Numbers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-8
1.3.5 Segmented Addressing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-8
1.3.6 Syntax for CPUID, CR, and MSR Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-9
1.3.7 Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-10
1.4 RELATED LITERATURE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-11
CHAPTER 2
SYSTEM ARCHITECTURE OVERVIEW
2.1 OVERVIEW OF THE SYSTEM-LEVEL ARCHITECTURE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
2.1.1 Global and Local Descriptor Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
2.1.1.1 Global and Local Descriptor Tables in IA-32e Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
2.1.2 System Segments, Segment Descriptors, and Gates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
2.1.2.1 Gates in IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-6
2.1.3 Task-State Segments and Task Gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-6
2.1.3.1 Task-State Segments in IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7
2.1.4 Interrupt and Exception Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7
2.1.4.1 Interrupt and Exception Handling in IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7
2.1.5 Memory Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8
2.1.5.1 Memory Management in IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8
2.1.6 System Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9
2.1.6.1 System Registers in IA-32e Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9
2.1.7 Other System Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10
2.2 MODES OF OPERATION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10
2.3 SYSTEM FLAGS AND FIELDS IN THE EFLAGS REGISTER. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-12
2.3.1 System Flags and Fields in IA-32e Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-15
2.4 MEMORY-MANAGEMENT REGISTERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-15
2.4.1 Global Descriptor Table Register (GDTR). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-16
2.4.2 Local Descriptor Table Register (LDTR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-16
2.4.3 IDTR Interrupt Descriptor Table Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17
2.4.4 Task Register (TR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17
2.5 CONTROL REGISTERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17
2.5.1 CPUID Qualification of Control Register Flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-26
2.6 EXTENDED CONTROL REGISTERS (INCLUDING THE XFEATURE_ENABLED_MASK REGISTER) . . . . . . . . . 2-26
2.7 SYSTEM INSTRUCTION SUMMARY. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-27
2.7.1 Loading and Storing System Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-29
2.7.2 Verifying Access Privileges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-30
2.7.3 Loading and Storing Debug Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-31
2.7.4 Invalidating Caches and TLBs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-31
2.7.5 Controlling the Processor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-31
2.7.6 Reading Performance-Monitoring and Time-Stamp Counters . . . . . . . . . . . . . . . . . . . . . 2-32
2.7.6.1 Reading Counters in 64-Bit Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-33
2.7.7 Reading and Writing Model-Specific Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-33
2.7.7.1 Reading and Writing Model-Specific Registers in 64-Bit Mode. . . . . . . . . . . . . . . . . . 2-34
2.7.8 Enabling Processor Extended States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-34
CHAPTER 3
PROTECTED-MODE MEMORY MANAGEMENT
3.1 MEMORY MANAGEMENT OVERVIEW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1
3.2 USING SEGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3
3.2.1 Basic Flat Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3
3.2.2 Protected Flat Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-4
3.2.3 Multi-Segment Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-5
3.2.4 Segmentation in IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6
3.2.5 Paging and Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-7
3.3 PHYSICAL ADDRESS SPACE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-7
3.3.1 Intel 64 Processors and Physical Address Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-8
3.4 LOGICAL AND LINEAR ADDRESSES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-8
3.4.1 Logical Address Translation in IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-9
3.4.2 Segment Selectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-9
3.4.3 Segment Registers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-10
3.4.4 Segment Loading Instructions in IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-12
3.4.5 Segment Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-13
3.4.5.1 Code- and Data-Segment Descriptor Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-16
3.5 SYSTEM DESCRIPTOR TYPES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-18
3.5.1 Segment Descriptor Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-20
3.5.2 Segment Descriptor Tables in IA-32e Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-22
CHAPTER 4
PAGING
4.1 PAGING MODES AND CONTROL BITS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1
4.1.1 Three Paging Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
4.1.2 Paging-Mode Enabling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4
4.1.3 Paging-Mode Modifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5
4.1.4 Enumeration of Paging Features by CPUID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-6
4.2 HIERARCHICAL PAGING STRUCTURES: AN OVERVIEW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-7
4.3 32-BIT PAGING. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-9
4.4 PAE PAGING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-16
4.4.1 PDPTE Registers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-16
4.4.2 Linear-Address Translation with PAE Paging. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-18
4.5 IA-32E PAGING. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-24
4.6 ACCESS RIGHTS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-35
4.7 PAGE-FAULT EXCEPTIONS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-37
4.8 ACCESSED AND DIRTY FLAGS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-39
4.9 PAGING AND MEMORY TYPING. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-40
4.9.1 Paging and Memory Typing When the PAT is Not Supported (Pentium Pro and Pentium II Processors) . . . . . . . 4-40
4.9.2 Paging and Memory Typing When the PAT is Supported (Pentium III and More Recent Processor Families) . . . . 4-40
4.9.3 Caching Paging-Related Information about Memory Typing . . . . . . . . . . . . . . . . . . . . . . . 4-41
4.10 CACHING TRANSLATION INFORMATION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-41
4.10.1 Process-Context Identifiers (PCIDs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-42
4.10.2 Translation Lookaside Buffers (TLBs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-43
4.10.2.1 Page Numbers, Page Frames, and Page Offsets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-43
4.10.2.2 Caching Translations in TLBs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-44
4.10.2.3 Details of TLB Use. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-44
4.10.2.4 Global Pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-45
4.10.3 Paging-Structure Caches. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-45
4.10.3.1 Caches for Paging Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-45
4.10.3.2 Using the Paging-Structure Caches to Translate Linear Addresses . . . . . . . . . . . . . 4-48
4.10.3.3 Multiple Cached Entries for a Single Paging-Structure Entry . . . . . . . . . . . . . . . . . . . . 4-49
4.10.4 Invalidation of TLBs and Paging-Structure Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-50
4.10.4.1 Operations that Invalidate TLBs and Paging-Structure Caches . . . . . . . . . . . . . . . . . 4-50
4.10.4.2 Recommended Invalidation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-52
4.10.4.3 Optional Invalidation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-53
4.10.4.4 Delayed Invalidation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-55
4.10.5 Propagation of Paging-Structure Changes to Multiple Processors . . . . . . . . . . . . . . . . . 4-56
4.11 INTERACTIONS WITH VIRTUAL-MACHINE EXTENSIONS (VMX) . . . . . . . . . . . . . . . . . . . . . . . 4-57
4.11.1 VMX Transitions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-57
4.11.2 VMX Support for Address Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-58
4.12 USING PAGING FOR VIRTUAL MEMORY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-58
4.13 MAPPING SEGMENTS TO PAGES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-59
CHAPTER 5
PROTECTION
5.1 ENABLING AND DISABLING SEGMENT AND PAGE PROTECTION. . . . . . . . . . . . . . . . . . . . . . . 5-1
5.2 FIELDS AND FLAGS USED FOR SEGMENT-LEVEL AND PAGE-LEVEL PROTECTION . . . . . . . . . . . . . . . . . . 5-2
5.2.1 Code Segment Descriptor in 64-bit Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-5
5.3 LIMIT CHECKING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6
5.3.1 Limit Checking in 64-bit Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-7
5.4 TYPE CHECKING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-7
5.4.1 Null Segment Selector Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-9
5.4.1.1 NULL Segment Checking in 64-bit Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-9
5.5 PRIVILEGE LEVELS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-9
5.6 PRIVILEGE LEVEL CHECKING WHEN ACCESSING DATA SEGMENTS . . . . . . . . . . . . . . . . . . . 5-12
5.6.1 Accessing Data in Code Segments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-14
5.7 PRIVILEGE LEVEL CHECKING WHEN LOADING THE SS REGISTER. . . . . . . . . . . . . . . . . . . . . 5-14
5.8 PRIVILEGE LEVEL CHECKING WHEN TRANSFERRING PROGRAM CONTROL BETWEEN CODE SEGMENTS . . . . . . . . 5-14
5.8.1 Direct Calls or Jumps to Code Segments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-15
5.8.1.1 Accessing Nonconforming Code Segments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-16
5.8.1.2 Accessing Conforming Code Segments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-17
5.8.2 Gate Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-18
5.8.3 Call Gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-19
5.8.3.1 IA-32e Mode Call Gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-20
5.8.4 Accessing a Code Segment Through a Call Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-22
5.8.5 Stack Switching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-25
5.8.5.1 Stack Switching in 64-bit Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-28
5.8.6 Returning from a Called Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-28
5.8.7 Performing Fast Calls to System Procedures with the SYSENTER and SYSEXIT Instructions . . . . . . . . . . 5-30
5.8.7.1 SYSENTER and SYSEXIT Instructions in IA-32e Mode. . . . . . . . . . . . . . . . . . . . . . . . . . 5-31
5.8.8 Fast System Calls in 64-bit Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-32
5.9 PRIVILEGED INSTRUCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-33
5.10 POINTER VALIDATION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-34
5.10.1 Checking Access Rights (LAR Instruction). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-35
5.10.2 Checking Read/Write Rights (VERR and VERW Instructions) . . . . . . . . . . . . . . . . . . . . . . 5-36
5.10.3 Checking That the Pointer Offset Is Within Limits (LSL Instruction). . . . . . . . . . . . . . . . 5-36
5.10.4 Checking Caller Access Privileges (ARPL Instruction) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-37
5.10.5 Checking Alignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-39
5.11 PAGE-LEVEL PROTECTION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-39
5.11.1 Page-Protection Flags. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-40
5.11.2 Restricting Addressable Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-40
5.11.3 Page Type. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-40
5.11.4 Combining Protection of Both Levels of Page Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-41
5.11.5 Overrides to Page Protection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-41
5.12 COMBINING PAGE AND SEGMENT PROTECTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-41
5.13 PAGE-LEVEL PROTECTION AND EXECUTE-DISABLE BIT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-43
5.13.1 Detecting and Enabling the Execute-Disable Capability . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-43
5.13.2 Execute-Disable Page Protection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-44
5.13.3 Reserved Bit Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-45
5.13.4 Exception Handling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-47
CHAPTER 6
INTERRUPT AND EXCEPTION HANDLING
6.1 INTERRUPT AND EXCEPTION OVERVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1
6.2 EXCEPTION AND INTERRUPT VECTORS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2
6.3 SOURCES OF INTERRUPTS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2
6.3.1 External Interrupts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2
6.3.2 Maskable Hardware Interrupts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-4
6.3.3 Software-Generated Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5
6.4 SOURCES OF EXCEPTIONS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5
6.4.1 Program-Error Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5
6.4.2 Software-Generated Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6
6.4.3 Machine-Check Exceptions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6
6.5 EXCEPTION CLASSIFICATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6
6.6 PROGRAM OR TASK RESTART . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-7
6.7 NONMASKABLE INTERRUPT (NMI) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-8
6.7.1 Handling Multiple NMIs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-9
6.8 ENABLING AND DISABLING INTERRUPTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-9
6.8.1 Masking Maskable Hardware Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-9
6.8.2 Masking Instruction Breakpoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-10
6.8.3 Masking Exceptions and Interrupts When Switching Stacks. . . . . . . . . . . . . . . . . . . . . . . 6-11
6.9 PRIORITY AMONG SIMULTANEOUS EXCEPTIONS AND INTERRUPTS . . . . . . . . . . . . . . . . . . 6-11
6.10 INTERRUPT DESCRIPTOR TABLE (IDT). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-12
6.11 IDT DESCRIPTORS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-14
6.12 EXCEPTION AND INTERRUPT HANDLING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-15
6.12.1 Exception- or Interrupt-Handler Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-16
6.12.1.1 Protection of Exception- and Interrupt-Handler Procedures . . . . . . . . . . . . . . . . . . . 6-18
6.12.1.2 Flag Usage By Exception- or Interrupt-Handler Procedure. . . . . . . . . . . . . . . . . . . . . 6-19
6.12.2 Interrupt Tasks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-20
6.13 ERROR CODE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-21
6.14 EXCEPTION AND INTERRUPT HANDLING IN 64-BIT MODE. . . . . . . . . . . . . . . . . . . . . . . . . . . 6-22
6.14.1 64-Bit Mode IDT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-23
6.14.2 64-Bit Mode Stack Frame . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-24
6.14.3 IRET in IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-25
6.14.4 Stack Switching in IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-25
6.14.5 Interrupt Stack Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-26
6.15 EXCEPTION AND INTERRUPT REFERENCE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-27
Interrupt 0 - Divide Error Exception (#DE). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-28
Interrupt 1 - Debug Exception (#DB). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-29
Interrupt 2 - NMI Interrupt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-30
Interrupt 3 - Breakpoint Exception (#BP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-31
Interrupt 4 - Overflow Exception (#OF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-32
Interrupt 5 - BOUND Range Exceeded Exception (#BR) . . . . . . . . . . . . . . . . . . . . . . . . . . 6-33
Interrupt 6 - Invalid Opcode Exception (#UD) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-34
Interrupt 7 - Device Not Available Exception (#NM). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-36
Interrupt 8 - Double Fault Exception (#DF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-38
Interrupt 9 - Coprocessor Segment Overrun. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-41
Interrupt 10 - Invalid TSS Exception (#TS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-42
Interrupt 11 - Segment Not Present (#NP). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-46
Interrupt 12 - Stack Fault Exception (#SS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-48
Interrupt 13 - General Protection Exception (#GP). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-50
Interrupt 14 - Page-Fault Exception (#PF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-54
Interrupt 16 - x87 FPU Floating-Point Error (#MF). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-58
Interrupt 17 - Alignment Check Exception (#AC). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-60
Interrupt 18 - Machine-Check Exception (#MC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-62
Interrupt 19 - SIMD Floating-Point Exception (#XM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-64
Interrupts 32 to 255 - User Defined Interrupts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-67
CHAPTER 7
TASK MANAGEMENT
7.1 TASK MANAGEMENT OVERVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-1
7.1.1 Task Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-1
7.1.2 Task State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-2
7.1.3 Executing a Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-3
7.2 TASK MANAGEMENT DATA STRUCTURES. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-4
7.2.1 Task-State Segment (TSS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-4
7.2.2 TSS Descriptor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-7
7.2.3 TSS Descriptor in 64-bit mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-8
7.2.4 Task Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-9
7.2.5 Task-Gate Descriptor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-11
7.3 TASK SWITCHING. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-12
7.4 TASK LINKING. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-16
7.4.1 Use of Busy Flag To Prevent Recursive Task Switching. . . . . . . . . . . . . . . . . . . . . . . . . . .7-18
7.4.2 Modifying Task Linkages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-18
7.5 TASK ADDRESS SPACE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-19
7.5.1 Mapping Tasks to the Linear and Physical Address Spaces . . . . . . . . . . . . . . . . . . . . . . . .7-19
7.5.2 Task Logical Address Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-20
7.6 16-BIT TASK-STATE SEGMENT (TSS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-21
7.7 TASK MANAGEMENT IN 64-BIT MODE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-22
CHAPTER 8
MULTIPLE-PROCESSOR MANAGEMENT
8.1 LOCKED ATOMIC OPERATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-2
8.1.1 Guaranteed Atomic Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-3
8.1.2 Bus Locking. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-4
8.1.2.1 Automatic Locking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-4
8.1.2.2 Software Controlled Bus Locking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-5
8.1.3 Handling Self- and Cross-Modifying Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-6
8.1.4 Effects of a LOCK Operation on Internal Processor Caches . . . . . . . . . . . . . . . . . . . . . . . . 8-7
8.2 MEMORY ORDERING. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-8
8.2.1 Memory Ordering in the Intel Pentium and Intel486 Processors . . . . . . . . . . . . . 8-8
8.2.2 Memory Ordering in P6 and More Recent Processor Families . . . . . . . . . . . . . . . . . . . . . . 8-9
8.2.3 Examples Illustrating the Memory-Ordering Principles. . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-11
8.2.3.1 Assumptions, Terminology, and Notation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-12
8.2.3.2 Neither Loads Nor Stores Are Reordered with Like Operations . . . . . . . . . . . . . . . . 8-13
8.2.3.3 Stores Are Not Reordered With Earlier Loads. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-13
8.2.3.4 Loads May Be Reordered with Earlier Stores to Different Locations . . . . . . . . . . . 8-14
8.2.3.5 Intra-Processor Forwarding Is Allowed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-15
8.2.3.6 Stores Are Transitively Visible. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-15
8.2.3.7 Stores Are Seen in a Consistent Order by Other Processors . . . . . . . . . . . . . . . . . . . 8-16
8.2.3.8 Locked Instructions Have a Total Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-17
8.2.3.9 Loads and Stores Are Not Reordered with Locked Instructions . . . . . . . . . . . . . . . . 8-17
8.2.4 Out-of-Order Stores For String Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-18
8.2.4.1 Memory-Ordering Model for String Operations on Write-back (WB) Memory . . . . 8-19
8.2.4.2 Examples Illustrating Memory-Ordering Principles for String Operations. . . . . . . . 8-20
8.2.5 Strengthening or Weakening the Memory-Ordering Model . . . . . . . . . . . . . . . . . . . . . . . . 8-23
8.3 SERIALIZING INSTRUCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-25
8.4 MULTIPLE-PROCESSOR (MP) INITIALIZATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-27
8.4.1 BSP and AP Processors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-27
8.4.2 MP Initialization Protocol Requirements and Restrictions . . . . . . . . . . . . . . . . . . . . . . . . . 8-28
8.4.3 MP Initialization Protocol Algorithm for Intel Xeon Processors . . . . . . . . . . . . . . . . . . . . 8-28
8.4.4 MP Initialization Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-30
8.4.4.1 Typical BSP Initialization Sequence. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-31
8.4.4.2 Typical AP Initialization Sequence. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-33
8.4.5 Identifying Logical Processors in an MP System. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-33
8.5 INTEL HYPER-THREADING TECHNOLOGY AND INTEL MULTI-CORE TECHNOLOGY. 8-35
8.6 DETECTING HARDWARE MULTI-THREADING SUPPORT AND TOPOLOGY. . . . . . . . . . . . . . 8-36
8.6.1 Initializing Processors Supporting Hyper-Threading Technology . . . . . . . . . . . . . . . . . . 8-37
8.6.2 Initializing Multi-Core Processors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-38
8.6.3 Executing Multiple Threads on an Intel 64 or IA-32 Processor Supporting Hardware
Multi-Threading. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-38
8.6.4 Handling Interrupts on an IA-32 Processor Supporting Hardware Multi-Threading . 8-38
8.7 INTEL HYPER-THREADING TECHNOLOGY ARCHITECTURE . . . . . . . . . . . . . . . . . . . . . . . . . 8-39
8.7.1 State of the Logical Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-40
8.7.2 APIC Functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-41
8.7.3 Memory Type Range Registers (MTRR). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-41
8.7.4 Page Attribute Table (PAT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-42
8.7.5 Machine Check Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-42
8.7.6 Debug Registers and Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-42
8.7.7 Performance Monitoring Counters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-43
8.7.8 IA32_MISC_ENABLE MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-43
8.7.9 Memory Ordering. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-43
8.7.10 Serializing Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-43
8.7.11 MICROCODE UPDATE Resources. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-44
8.7.12 Self Modifying Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-44
8.7.13 Implementation-Specific Intel HT Technology Facilities . . . . . . . . . . . . . . . . . . . . . . . . . . .8-44
8.7.13.1 Processor Caches. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-44
8.7.13.2 Processor Translation Lookaside Buffers (TLBs). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-45
8.7.13.3 Thermal Monitor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-45
8.7.13.4 External Signal Compatibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-46
8.8 MULTI-CORE ARCHITECTURE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-47
8.8.1 Logical Processor Support. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-47
8.8.2 Memory Type Range Registers (MTRR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-47
8.8.3 Performance Monitoring Counters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-48
8.8.4 IA32_MISC_ENABLE MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-48
8.8.5 MICROCODE UPDATE Resources. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-48
8.9 PROGRAMMING CONSIDERATIONS FOR HARDWARE MULTI-THREADING CAPABLE
PROCESSORS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-48
8.9.1 Hierarchical Mapping of Shared Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-49
8.9.2 Hierarchical Mapping of CPUID Extended Topology Leaf . . . . . . . . . . . . . . . . . . . . . . . . . .8-51
8.9.3 Hierarchical ID of Logical Processors in an MP System . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-52
8.9.3.1 Hierarchical ID of Logical Processors with x2APIC ID. . . . . . . . . . . . . . . . . . . . . . . . . . .8-54
8.9.4 Algorithm for Three-Level Mappings of APIC_ID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-55
8.9.5 Identifying Topological Relationships in a MP System. . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-61
8.10 MANAGEMENT OF IDLE AND BLOCKED CONDITIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-65
8.10.1 HLT Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-65
8.10.2 PAUSE Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-66
8.10.3 Detecting Support MONITOR/MWAIT Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-66
8.10.4 MONITOR/MWAIT Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-67
8.10.5 Monitor/Mwait Address Range Determination. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-68
8.10.6 Required Operating System Support. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-69
8.10.6.1 Use the PAUSE Instruction in Spin-Wait Loops. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-69
8.10.6.2 Potential Usage of MONITOR/MWAIT in C0 Idle Loops . . . . . . . . . . . . . . . . . . . . . . . . .8-70
8.10.6.3 Halt Idle Logical Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-72
8.10.6.4 Potential Usage of MONITOR/MWAIT in C1 Idle Loops . . . . . . . . . . . . . . . . . . . . . . . . .8-72
8.10.6.5 Guidelines for Scheduling Threads on Logical Processors Sharing Execution
Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-73
8.10.6.6 Eliminate Execution-Based Timing Loops. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-73
8.10.6.7 Place Locks and Semaphores in Aligned, 128-Byte Blocks of Memory. . . . . . . . . . .8-74
CHAPTER 9
PROCESSOR MANAGEMENT AND INITIALIZATION
9.1 INITIALIZATION OVERVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-1
9.1.1 Processor State After Reset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2
9.1.2 Processor Built-In Self-Test (BIST) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2
9.1.3 Model and Stepping Information. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-5
9.1.4 First Instruction Executed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-6
9.2 X87 FPU INITIALIZATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-6
9.2.1 Configuring the x87 FPU Environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-6
9.2.2 Setting the Processor for x87 FPU Software Emulation. . . . . . . . . . . . . . . . . . . . . . . . . . . 9-7
9.3 CACHE ENABLING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-8
9.4 MODEL-SPECIFIC REGISTERS (MSRS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-9
9.5 MEMORY TYPE RANGE REGISTERS (MTRRS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-9
9.6 INITIALIZING SSE/SSE2/SSE3/SSSE3 EXTENSIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-10
9.7 SOFTWARE INITIALIZATION FOR REAL-ADDRESS MODE OPERATION. . . . . . . . . . . . . . . . . 9-10
9.7.1 Real-Address Mode IDT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-11
9.7.2 NMI Interrupt Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-11
9.8 SOFTWARE INITIALIZATION FOR PROTECTED-MODE OPERATION. . . . . . . . . . . . . . . . . . . . 9-11
9.8.1 Protected-Mode System Data Structures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-12
9.8.2 Initializing Protected-Mode Exceptions and Interrupts. . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-13
9.8.3 Initializing Paging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-13
9.8.4 Initializing Multitasking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-14
9.8.5 Initializing IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-14
9.8.5.1 IA-32e Mode System Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-15
9.8.5.2 IA-32e Mode Interrupts and Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-15
9.8.5.3 64-bit Mode and Compatibility Mode Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-16
9.8.5.4 Switching Out of IA-32e Mode Operation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-16
9.9 MODE SWITCHING. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-17
9.9.1 Switching to Protected Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-17
9.9.2 Switching Back to Real-Address Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-18
9.10 INITIALIZATION AND MODE SWITCHING EXAMPLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-19
9.10.1 Assembler Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-22
9.10.2 STARTUP.ASM Listing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-23
9.10.3 MAIN.ASM Source Code. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-33
9.10.4 Supporting Files. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-34
9.11 MICROCODE UPDATE FACILITIES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-36
9.11.1 Microcode Update. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-37
9.11.2 Optional Extended Signature Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-41
9.11.3 Processor Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-41
9.11.4 Platform Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-42
9.11.5 Microcode Update Checksum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-44
9.11.6 Microcode Update Loader . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-45
9.11.6.1 Hard Resets in Update Loading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-46
9.11.6.2 Update in a Multiprocessor System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-46
9.11.6.3 Update in a System Supporting Intel Hyper-Threading Technology . . . . . . . . . . . . 9-46
9.11.6.4 Update in a System Supporting Dual-Core Technology . . . . . . . . . . . . . . . . . . . . . . . . 9-46
9.11.6.5 Update Loader Enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-47
9.11.7 Update Signature and Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-47
9.11.7.1 Determining the Signature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-48
9.11.7.2 Authenticating the Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-48
9.11.8 Pentium 4, Intel Xeon, and P6 Family Processor
Microcode Update Specifications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-49
9.11.8.1 Responsibilities of the BIOS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-49
9.11.8.2 Responsibilities of the Calling Program. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-52
9.11.8.3 Microcode Update Functions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-55
9.11.8.4 INT 15H-based Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-55
9.11.8.5 Function 00H - Presence Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-56
9.11.8.6 Function 01H - Write Microcode Update Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-57
9.11.8.7 Function 02H - Microcode Update Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-62
9.11.8.8 Function 03H - Read Microcode Update Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-63
9.11.8.9 Return Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-64
CHAPTER 10
ADVANCED PROGRAMMABLE
INTERRUPT CONTROLLER (APIC)
10.1 LOCAL AND I/O APIC OVERVIEW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-1
10.2 SYSTEM BUS VS. APIC BUS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-5
10.3 THE INTEL 82489DX EXTERNAL APIC, THE APIC, THE XAPIC, AND THE X2APIC. . . . 10-5
10.4 LOCAL APIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-6
10.4.1 The Local APIC Block Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-6
10.4.2 Presence of the Local APIC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-10
10.4.3 Enabling or Disabling the Local APIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-10
10.4.4 Local APIC Status and Location. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-11
10.4.5 Relocating the Local APIC Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-12
10.4.6 Local APIC ID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-12
10.4.7 Local APIC State. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-13
10.4.7.1 Local APIC State After Power-Up or Reset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-14
10.4.7.2 Local APIC State After It Has Been Software Disabled . . . . . . . . . . . . . . . . . . . . . . . 10-14
10.4.7.3 Local APIC State After an INIT Reset (Wait-for-SIPI State) . . . . . . . . . . . . . . . . . . 10-15
10.4.7.4 Local APIC State After It Receives an INIT-Deassert IPI . . . . . . . . . . . . . . . . . . . . . . 10-15
10.4.8 Local APIC Version Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-15
10.5 HANDLING LOCAL INTERRUPTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-16
10.5.1 Local Vector Table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-16
10.5.2 Valid Interrupt Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-20
10.5.3 Error Handling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-20
10.5.4 APIC Timer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-22
10.5.5 Local Interrupt Acceptance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-23
10.6 ISSUING INTERPROCESSOR INTERRUPTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-23
10.6.1 Interrupt Command Register (ICR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-24
10.6.2 Determining IPI Destination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-29
10.6.2.1 Physical Destination Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-30
10.6.2.2 Logical Destination Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-31
10.6.2.3 Broadcast/Self Delivery Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-33
10.6.2.4 Lowest Priority Delivery Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-33
10.6.3 IPI Delivery and Acceptance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-34
10.7 SYSTEM AND APIC BUS ARBITRATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-35
10.8 HANDLING INTERRUPTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-35
10.8.1 Interrupt Handling with the Pentium 4 and Intel Xeon Processors . . . . . . . . . . . . . . . 10-36
10.8.2 Interrupt Handling with the P6 Family and Pentium Processors . . . . . . . . . . . . . . . . . 10-37
10.8.3 Interrupt, Task, and Processor Priority. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-38
10.8.3.1 Task and Processor Priorities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-39
10.8.4 Interrupt Acceptance for Fixed Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-40
10.8.5 Signaling Interrupt Servicing Completion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-42
10.8.6 Task Priority in IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-43
10.8.6.1 Interaction of Task Priorities between CR8 and APIC . . . . . . . . . . . . . . . . . . . . . . . . 10-43
10.9 SPURIOUS INTERRUPT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-44
10.10 APIC BUS MESSAGE PASSING MECHANISM AND
PROTOCOL (P6 FAMILY, PENTIUM PROCESSORS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-45
10.10.1 Bus Message Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-46
10.11 MESSAGE SIGNALLED INTERRUPTS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-46
10.11.1 Message Address Register Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-47
10.11.2 Message Data Register Format. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-48
10.12 EXTENDED XAPIC (X2APIC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-50
10.12.1 Detecting and Enabling x2APIC Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-50
10.12.1.1 Instructions to Access APIC Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-51
10.12.1.2 x2APIC Register Address Space. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-52
10.12.1.3 Reserved Bit Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-55
10.12.2 x2APIC Register Availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-55
10.12.3 MSR Access in x2APIC Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-56
10.12.4 VM-Exit Controls for MSRs and x2APIC Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-56
10.12.5 x2APIC State Transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-57
10.12.5.1 x2APIC States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-57
x2APIC After RESET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-58
x2APIC Transitions From x2APIC Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-59
x2APIC Transitions From Disabled Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-59
State Changes From xAPIC Mode to x2APIC Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-59
10.12.6 System Software Transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-59
10.12.7 CPUID Extensions And Topology Enumeration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-60
10.12.7.1 Consistency of APIC IDs and CPUID. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-61
10.12.8 Error Handling in x2APIC Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-61
10.12.9 ICR Operation in x2APIC Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-62
10.12.10 Determining IPI Destination in x2APIC Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-63
10.12.10.1 Logical Destination Mode in x2APIC Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-63
10.12.10.2 Deriving Logical x2APIC ID from the Local x2APIC ID . . . . . . . . . . . . . . . . . . . . . . . . . 10-65
10.12.11 SELF IPI Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-65
CHAPTER 11
MEMORY CACHE CONTROL
11.1 INTERNAL CACHES, TLBS, AND BUFFERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-1
11.2 CACHING TERMINOLOGY. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-7
11.3 METHODS OF CACHING AVAILABLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-8
11.3.1 Buffering of Write Combining Memory Locations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-11
11.3.2 Choosing a Memory Type. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-12
11.3.3 Code Fetches in Uncacheable Memory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-13
11.4 CACHE CONTROL PROTOCOL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-13
11.5 CACHE CONTROL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-14
11.5.1 Cache Control Registers and Bits. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-15
11.5.2 Precedence of Cache Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-20
11.5.2.1 Selecting Memory Types for Pentium Pro and Pentium II Processors. . . . . . . . . . 11-20
11.5.2.2 Selecting Memory Types for Pentium III and More Recent Processor Families. . 11-22
11.5.2.3 Writing Values Across Pages with Different Memory Types . . . . . . . . . . . . . . . . . . 11-23
11.5.3 Preventing Caching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-24
11.5.4 Disabling and Enabling the L3 Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-25
11.5.5 Cache Management Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-25
11.5.6 L1 Data Cache Context Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-26
11.5.6.1 Adaptive Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-26
11.5.6.2 Shared Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-26
11.6 SELF-MODIFYING CODE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-27
11.7 IMPLICIT CACHING (PENTIUM 4, INTEL XEON, AND P6 FAMILY PROCESSORS) . . . . . . . . . . . . . . . . . . . . 11-27
11.8 EXPLICIT CACHING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-28
11.9 INVALIDATING THE TRANSLATION LOOKASIDE BUFFERS (TLBS). . . . . . . . . . . . . . . . . . . 11-29
11.10 STORE BUFFER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-29
11.11 MEMORY TYPE RANGE REGISTERS (MTRRS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-30
11.11.1 MTRR Feature Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-32
11.11.2 Setting Memory Ranges with MTRRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-33
11.11.2.1 IA32_MTRR_DEF_TYPE MSR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-33
11.11.2.2 Fixed Range MTRRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-34
11.11.2.3 Variable Range MTRRs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-34
11.11.2.4 System-Management Range Register Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-37
11.11.3 Example Base and Mask Calculations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-38
11.11.3.1 Base and Mask Calculations for Greater-Than 36-bit Physical Address Support . . . . 11-40
11.11.4 Range Size and Alignment Requirement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-41
11.11.4.1 MTRR Precedences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-41
11.11.5 MTRR Initialization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-41
11.11.6 Remapping Memory Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-42
11.11.7 MTRR Maintenance Programming Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-42
11.11.7.1 MemTypeGet() Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-42
11.11.7.2 MemTypeSet() Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-44
11.11.8 MTRR Considerations in MP Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-46
11.11.9 Large Page Size Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-47
11.12 PAGE ATTRIBUTE TABLE (PAT). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-48
11.12.1 Detecting Support for the PAT Feature. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-48
11.12.2 IA32_PAT MSR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-49
11.12.3 Selecting a Memory Type from the PAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-50
11.12.4 Programming the PAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-50
11.12.5 PAT Compatibility with Earlier IA-32 Processors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-52
CHAPTER 12
INTEL MMX TECHNOLOGY SYSTEM PROGRAMMING
12.1 EMULATION OF THE MMX INSTRUCTION SET. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-1
12.2 THE MMX STATE AND MMX REGISTER ALIASING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-1
12.2.1 Effect of MMX, x87 FPU, FXSAVE, and FXRSTOR Instructions on the x87 FPU Tag Word . . . . . . . . . . . . 12-3
12.3 SAVING AND RESTORING THE MMX STATE AND REGISTERS . . . . . . . . . . . . . . . . . . . . . . . . 12-4
12.4 SAVING MMX STATE ON TASK OR CONTEXT SWITCHES . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-5
12.5 EXCEPTIONS THAT CAN OCCUR WHEN EXECUTING MMX INSTRUCTIONS . . . . . . . . . . . . 12-5
12.5.1 Effect of MMX Instructions on Pending x87 Floating-Point Exceptions . . . . . . . . . . . . 12-6
12.6 DEBUGGING MMX CODE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-6
CHAPTER 13
SYSTEM PROGRAMMING FOR INSTRUCTION SET EXTENSIONS AND
PROCESSOR EXTENDED STATES
13.1 PROVIDING OPERATING SYSTEM SUPPORT FOR SSE/SSE2/SSE3/SSSE3/SSE4 EXTENSIONS . . . . . . . . . . . . 13-1
13.1.1 Adding Support to an Operating System for SSE/SSE2/SSE3/SSSE3/SSE4 Extensions . . . . . . . . 13-2
13.1.2 Checking for SSE/SSE2/SSE3/SSSE3/SSE4 Extension Support . . . . . . . . . . . . . . . . . . . . 13-2
13.1.3 Checking for Support for the FXSAVE and FXRSTOR Instructions . . . . . . . . . . . . . . . . . 13-3
13.1.4 Initialization of the SSE/SSE2/SSE3/SSSE3/SSE4 Extensions . . . . . . . . . . . . . . . . . . . . 13-3
13.1.5 Providing Non-Numeric Exception Handlers for Exceptions Generated by the SSE/SSE2/SSE3/SSSE3/SSE4 Instructions . . . . . . . . 13-5
13.1.6 Providing a Handler for the SIMD Floating-Point Exception (#XM) . . . . . . . . . . . . . . . . 13-7
13.1.6.1 Numeric Error flag and IGNNE# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-8
13.2 EMULATION OF SSE/SSE2/SSE3/SSSE3/SSE4 EXTENSIONS. . . . . . . . . . . . . . . . . . . . . . . . . . 13-8
13.3 SAVING AND RESTORING THE SSE/SSE2/SSE3/SSSE3/SSE4 STATE. . . . . . . . . . . . . . . . . . 13-8
13.4 SAVING THE SSE/SSE2/SSE3/SSSE3/SSE4 STATE ON TASK OR CONTEXT SWITCHES . 13-9
13.5 DESIGNING OS FACILITIES FOR AUTOMATICALLY SAVING X87 FPU, MMX, AND SSE/SSE2/SSE3/SSSE3/SSE4 STATE ON TASK OR CONTEXT SWITCHES . . . . . . . . 13-9
13.5.1 Using the TS Flag to Control the Saving of the x87 FPU, MMX, SSE, SSE2, SSE3, SSSE3, and SSE4 State . . . . . . . . . . . . 13-10
13.6 XSAVE/XRSTOR AND PROCESSOR EXTENDED STATE MANAGEMENT . . . . . . . . . . . . . . 13-12
13.6.1 XSAVE Header. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-13
13.7 INTEROPERABILITY OF XSAVE/XRSTOR AND FXSAVE/FXRSTOR . . . . . . . . . . . . . . . . . . 13-15
13.8 DETECTION, ENUMERATION, ENABLING PROCESSOR EXTENDED STATE SUPPORT. . 13-17
13.8.1 Application Programming Model and Processor Extended States. . . . . . . . . . . . . . . . . 13-18
CHAPTER 14
POWER AND THERMAL MANAGEMENT
14.1 ENHANCED INTEL SPEEDSTEP TECHNOLOGY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-1
14.1.1 Software Interface For Initiating Performance State Transitions . . . . . . . . . . . . . . . . . 14-1
14.2 P-STATE HARDWARE COORDINATION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-2
14.3 SYSTEM SOFTWARE CONSIDERATIONS AND OPPORTUNISTIC PROCESSOR PERFORMANCE OPERATION . . . . . . . . . . . . 14-4
14.3.1 Intel Dynamic Acceleration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-4
14.3.2 System Software Interfaces for Opportunistic Processor Performance Operation . 14-4
14.3.2.1 Discover Hardware Support and Enabling of Opportunistic Processor Operation 14-5
14.3.2.2 OS Control of Opportunistic Processor Performance Operation . . . . . . . . . . . . . . . . 14-5
14.3.2.3 Required Changes to OS Power Management P-state Policy. . . . . . . . . . . . . . . . . . . 14-6
14.3.2.4 Application Awareness of Opportunistic Processor Operation (Optional). . . . . . . . 14-7
14.3.3 Intel Turbo Boost Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-8
14.3.4 Performance and Energy Bias Hint support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-8
14.4 MWAIT EXTENSIONS FOR ADVANCED POWER MANAGEMENT . . . . . . . . . . . . . . . . . . . . . . . 14-9
14.5 THERMAL MONITORING AND PROTECTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-10
14.5.1 Catastrophic Shutdown Detector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-12
14.5.2 Thermal Monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-12
14.5.2.1 Thermal Monitor 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-12
14.5.2.2 Thermal Monitor 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-12
14.5.2.3 Two Methods for Enabling TM2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-13
14.5.2.4 Performance State Transitions and Thermal Monitoring. . . . . . . . . . . . . . . . . . . . . . 14-14
14.5.2.5 Thermal Status Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-14
14.5.2.6 Adaptive Thermal Monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-16
14.5.3 Software Controlled Clock Modulation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-16
14.5.4 Detection of Thermal Monitor and Software Controlled Clock Modulation Facilities . . . . . . . . . . . . 14-18
14.5.5 On Die Digital Thermal Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-18
14.5.5.1 Digital Thermal Sensor Enumeration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-18
14.5.5.2 Reading the Digital Sensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-19
CHAPTER 15
MACHINE-CHECK ARCHITECTURE
15.1 MACHINE-CHECK ARCHITECTURE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-1
15.2 COMPATIBILITY WITH PENTIUM PROCESSOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-2
15.3 MACHINE-CHECK MSRS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-2
15.3.1 Machine-Check Global Control MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-3
15.3.1.1 IA32_MCG_CAP MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-3
15.3.1.2 IA32_MCG_STATUS MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-5
15.3.1.3 IA32_MCG_CTL MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-6
15.3.2 Error-Reporting Register Banks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-6
15.3.2.1 IA32_MCi_CTL MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-6
15.3.2.2 IA32_MCi_STATUS MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-7
15.3.2.3 IA32_MCi_ADDR MSRs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-11
15.3.2.4 IA32_MCi_MISC MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-12
15.3.2.5 IA32_MCi_CTL2 MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-13
15.3.2.6 IA32_MCG Extended Machine Check State MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-15
15.3.3 Mapping of the Pentium Processor Machine-Check Errors to the Machine-Check Architecture . . . . . . . . 15-17
15.4 ENHANCED CACHE ERROR REPORTING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-18
15.5 CORRECTED MACHINE CHECK ERROR INTERRUPT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-18
15.5.1 CMCI Local APIC Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-19
15.5.2 System Software Recommendation for Managing CMCI and Machine Check Resources . . . . . . . . 15-21
15.5.2.1 CMCI Initialization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-21
15.5.2.2 CMCI Threshold Management. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-22
15.5.2.3 CMCI Interrupt Handler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-23
15.6 RECOVERY OF UNCORRECTED RECOVERABLE (UCR) ERRORS . . . . . . . . . . . . . . . . . . . . . . 15-23
15.6.1 Detection of Software Error Recovery Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-24
15.6.2 UCR Error Reporting and Logging. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-24
15.6.3 UCR Error Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-25
15.6.4 UCR Error Overwrite Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-27
15.7 MACHINE-CHECK AVAILABILITY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-28
15.8 MACHINE-CHECK INITIALIZATION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-28
15.9 INTERPRETING THE MCA ERROR CODES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-29
15.9.1 Simple Error Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-30
15.9.2 Compound Error Codes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-30
15.9.2.1 Correction Report Filtering (F) Bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-31
15.9.2.2 Transaction Type (TT) Sub-Field. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-31
15.9.2.3 Level (LL) Sub-Field. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-32
15.9.2.4 Request (RRRR) Sub-Field. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-32
15.9.2.5 Bus and Interconnect Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-33
15.9.2.6 Memory Controller Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-34
15.9.3 Architecturally Defined UCR Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-34
15.9.3.1 Architecturally Defined SRAO Errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-34
15.9.3.2 Architecturally Defined SRAR Errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-36
15.9.4 Multiple MCA Errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-38
15.9.5 Machine-Check Error Codes Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-39
15.10 GUIDELINES FOR WRITING MACHINE-CHECK SOFTWARE . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-39
15.10.1 Machine-Check Exception Handler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-40
15.10.2 Pentium Processor Machine-Check Exception Handling . . . . . . . . . . . . . . . . . . . . . . . . . 15-41
15.10.3 Logging Correctable Machine-Check Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-42
15.10.4 Machine-Check Software Handler Guidelines for Error Recovery . . . . . . . . . . . . . . . . 15-44
15.10.4.1 Machine-Check Exception Handler for Error Recovery . . . . . . . . . . . . . . . . . . . . . . . 15-44
15.10.4.2 Corrected Machine-Check Handler for Error Recovery . . . . . . . . . . . . . . . . . . . . . . . 15-50
CHAPTER 16
DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER
16.1 OVERVIEW OF DEBUG SUPPORT FACILITIES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-1
16.2 DEBUG REGISTERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-2
16.2.1 Debug Address Registers (DR0-DR3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-4
16.2.2 Debug Registers DR4 and DR5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-4
16.2.3 Debug Status Register (DR6) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-4
16.2.4 Debug Control Register (DR7). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-5
16.2.5 Breakpoint Field Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-6
16.2.6 Debug Registers and Intel 64 Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-8
16.3 DEBUG EXCEPTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-9
16.3.1 Debug Exception (#DB) - Interrupt Vector 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-9
16.3.1.1 Instruction-Breakpoint Exception Condition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-10
16.3.1.2 Data Memory and I/O Breakpoint Exception Conditions. . . . . . . . . . . . . . . . . . . . . . . 16-12
16.3.1.3 General-Detect Exception Condition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-12
16.3.1.4 Single-Step Exception Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-12
16.3.1.5 Task-Switch Exception Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-13
16.3.2 Breakpoint Exception (#BP) - Interrupt Vector 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-13
16.4 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING OVERVIEW. . . . . . . . . . . . . . 16-14
16.4.1 IA32_DEBUGCTL MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-14
16.4.2 Monitoring Branches, Exceptions, and Interrupts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-16
16.4.3 Single-Stepping on Branches, Exceptions, and Interrupts . . . . . . . . . . . . . . . . . . . . . . . . 16-16
16.4.4 Branch Trace Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-17
16.4.5 Branch Trace Store (BTS). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-17
16.4.6 CPL-Qualified Branch Trace Mechanism. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-18
16.4.7 Freezing LBR and Performance Counters on PMI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-18
16.4.8 LBR Stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-19
16.4.8.1 LBR Stack and Intel 64 Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-20
16.4.8.2 LBR Stack and IA-32 Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-20
16.4.8.3 Last Exception Records and Intel 64 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-21
16.4.9 BTS and DS Save Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-21
16.4.9.1 DS Save Area and IA-32e Mode Operation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-25
16.4.9.2 Setting Up the DS Save Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-28
16.4.9.3 Setting Up the BTS Buffer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-29
16.4.9.4 Setting Up CPL-Qualified BTS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-30
16.4.9.5 Writing the DS Interrupt Service Routine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-31
16.5 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING (INTEL CORE 2 DUO AND INTEL ATOM PROCESSOR FAMILY) . . . . . . . . 16-32
16.5.1 LBR Stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-33
16.6 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING (INTEL CORE I7 PROCESSOR FAMILY) . . . . . . . . . . . . 16-33
16.6.1 LBR Stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-34
16.6.2 Filtering of Last Branch Records . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-35
16.7 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING (PROCESSORS BASED ON INTEL NETBURST MICROARCHITECTURE) . . . . . . . . 16-36
16.7.1 MSR_DEBUGCTLA MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-37
16.7.2 LBR Stack for Processors Based on Intel NetBurst Microarchitecture . . . . . . . . . . . . 16-38
16.7.3 Last Exception Records . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-40
16.8 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING (INTEL CORE SOLO AND INTEL CORE DUO PROCESSORS) . . . . . . . . 16-41
16.9 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING (PENTIUM M PROCESSORS) . . . . . . . . . . . . . . . . 16-43
16.10 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING (P6 FAMILY PROCESSORS) . . . . . . . . . . . . . . . 16-45
16.10.1 DEBUGCTLMSR Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-45
16.10.2 Last Branch and Last Exception MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-46
16.10.3 Monitoring Branches, Exceptions, and Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-47
16.11 TIME-STAMP COUNTER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-48
16.11.1 Invariant TSC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-49
16.11.2 IA32_TSC_AUX Register and RDTSCP Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-50
CHAPTER 17
8086 EMULATION
17.1 REAL-ADDRESS MODE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-1
17.1.1 Address Translation in Real-Address Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-3
17.1.2 Registers Supported in Real-Address Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-4
17.1.3 Instructions Supported in Real-Address Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-4
17.1.4 Interrupt and Exception Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-6
17.2 VIRTUAL-8086 MODE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-8
17.2.1 Enabling Virtual-8086 Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-9
17.2.2 Structure of a Virtual-8086 Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-9
17.2.3 Paging of Virtual-8086 Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-10
17.2.4 Protection within a Virtual-8086 Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-11
17.2.5 Entering Virtual-8086 Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-11
17.2.6 Leaving Virtual-8086 Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-14
17.2.7 Sensitive Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-15
17.2.8 Virtual-8086 Mode I/O. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-15
17.2.8.1 I/O-Port-Mapped I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-15
17.2.8.2 Memory-Mapped I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-16
17.2.8.3 Special I/O Buffers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-16
17.3 INTERRUPT AND EXCEPTION HANDLING IN VIRTUAL-8086 MODE . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-16
17.3.1 Class 1 - Hardware Interrupt and Exception Handling in Virtual-8086 Mode . . . . . . . . 17-18
17.3.1.1 Handling an Interrupt or Exception Through a Protected-Mode Trap or Interrupt Gate . . . . 17-18
17.3.1.2 Handling an Interrupt or Exception With an 8086 Program Interrupt or Exception Handler . . . . 17-20
17.3.1.3 Handling an Interrupt or Exception Through a Task Gate . . . . . . . . . . . . . . . . . . . . 17-21
17.3.2 Class 2 - Maskable Hardware Interrupt Handling in Virtual-8086 Mode Using the Virtual Interrupt Mechanism . . . . 17-22
17.3.3 Class 3 - Software Interrupt Handling in Virtual-8086 Mode . . . . . . . . . . . . . . . . . . . . 17-24
17.3.3.1 Method 1: Software Interrupt Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-27
17.3.3.2 Methods 2 and 3: Software Interrupt Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-28
17.3.3.3 Method 4: Software Interrupt Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-28
17.3.3.4 Method 5: Software Interrupt Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-28
17.3.3.5 Method 6: Software Interrupt Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-29
17.4 PROTECTED-MODE VIRTUAL INTERRUPTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-30
CHAPTER 18
MIXING 16-BIT AND 32-BIT CODE
18.1 DEFINING 16-BIT AND 32-BIT PROGRAM MODULES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-2
18.2 MIXING 16-BIT AND 32-BIT OPERATIONS WITHIN A CODE SEGMENT . . . . . . . . . . . . . . . . . 18-2
18.3 SHARING DATA AMONG MIXED-SIZE CODE SEGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-4
18.4 TRANSFERRING CONTROL AMONG MIXED-SIZE CODE SEGMENTS . . . . . . . . . . . . . . . . . . . . 18-4
18.4.1 Code-Segment Pointer Size. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-5
18.4.2 Stack Management for Control Transfer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-5
18.4.2.1 Controlling the Operand-Size Attribute For a Call . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-7
18.4.2.2 Passing Parameters With a Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-8
18.4.3 Interrupt Control Transfers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-8
18.4.4 Parameter Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-8
18.4.5 Writing Interface Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-9
CHAPTER 19
ARCHITECTURE COMPATIBILITY
19.1 PROCESSOR FAMILIES AND CATEGORIES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-1
19.2 RESERVED BITS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-2
19.3 ENABLING NEW FUNCTIONS AND MODES. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-2
19.4 DETECTING THE PRESENCE OF NEW FEATURES THROUGH SOFTWARE . . . . . . . . . . . . . . 19-3
19.5 INTEL MMX TECHNOLOGY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-3
19.6 STREAMING SIMD EXTENSIONS (SSE). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-3
19.7 STREAMING SIMD EXTENSIONS 2 (SSE2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-4
19.8 STREAMING SIMD EXTENSIONS 3 (SSE3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-4
19.9 ADDITIONAL STREAMING SIMD EXTENSIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-4
19.10 INTEL HYPER-THREADING TECHNOLOGY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-5
19.11 MULTI-CORE TECHNOLOGY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-5
19.12 SPECIFIC FEATURES OF DUAL-CORE PROCESSOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-5
19.13 NEW INSTRUCTIONS IN THE PENTIUM AND LATER IA-32 PROCESSORS . . . . . . . . . . . . . . 19-5
19.13.1 Instructions Added Prior to the Pentium Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-6
19.14 OBSOLETE INSTRUCTIONS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-7
19.15 UNDEFINED OPCODES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-7
19.16 NEW FLAGS IN THE EFLAGS REGISTER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-7
19.16.1 Using EFLAGS Flags to Distinguish Between 32-Bit IA-32 Processors . . . . . . . . . . . . . 19-8
19.17 STACK OPERATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-8
19.17.1 PUSH SP. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-8
19.17.2 EFLAGS Pushed on the Stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-9
19.18 X87 FPU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-9
19.18.1 Control Register CR0 Flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-9
19.18.2 x87 FPU Status Word . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-10
19.18.2.1 Condition Code Flags (C0 through C3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-10
19.18.2.2 Stack Fault Flag. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-11
19.18.3 x87 FPU Control Word . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-11
19.18.4 x87 FPU Tag Word. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-11
19.18.5 Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-12
19.18.5.1 NaNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-12
19.18.5.2 Pseudo-zero, Pseudo-NaN, Pseudo-infinity, and Unnormal Formats . . . . . . . . . . . 19-12
19.18.6 Floating-Point Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-13
19.18.6.1 Denormal Operand Exception (#D) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-13
19.18.6.2 Numeric Overflow Exception (#O). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-13
19.18.6.3 Numeric Underflow Exception (#U) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-14
19.18.6.4 Exception Precedence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-14
19.18.6.5 CS and EIP For FPU Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-14
19.18.6.6 FPU Error Signals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-14
19.18.6.7 Assertion of the FERR# Pin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-15
19.18.6.8 Invalid Operation Exception On Denormals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-15
19.18.6.9 Alignment Check Exceptions (#AC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-16
19.18.6.10 Segment Not Present Exception During FLDENV . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-16
19.18.6.11 Device Not Available Exception (#NM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-16
19.18.6.12 Coprocessor Segment Overrun Exception. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-16
19.18.6.13 General Protection Exception (#GP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-16
19.18.6.14 Floating-Point Error Exception (#MF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-16
19.18.7 Changes to Floating-Point Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-17
19.18.7.1 FDIV, FPREM, and FSQRT Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-17
19.18.7.2 FSCALE Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-17
19.18.7.3 FPREM1 Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-17
19.18.7.4 FPREM Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-17
19.18.7.5 FUCOM, FUCOMP, and FUCOMPP Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-17
19.18.7.6 FPTAN Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-18
19.18.7.7 Stack Overflow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-18
19.18.7.8 FSIN, FCOS, and FSINCOS Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-18
19.18.7.9 FPATAN Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-18
19.18.7.10 F2XM1 Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-18
19.18.7.11 FLD Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-18
19.18.7.12 FXTRACT Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-19
19.18.7.13 Load Constant Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-19
19.18.7.14 FSETPM Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-19
19.18.7.15 FXAM Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-20
19.18.7.16 FSAVE and FSTENV Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-20
19.18.8 Transcendental Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-20
19.18.9 Obsolete Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-20
19.18.10 WAIT/FWAIT Prefix Differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-21
19.18.11 Operands Split Across Segments and/or Pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-21
19.18.12 FPU Instruction Synchronization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-21
19.19 SERIALIZING INSTRUCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-21
19.20 FPU AND MATH COPROCESSOR INITIALIZATION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-22
19.20.1 Intel 387 and Intel 287 Math Coprocessor Initialization. . . . . . . . . . . . . . . . . . . . . 19-22
19.20.2 Intel486 SX Processor and Intel 487 SX Math Coprocessor Initialization . . . . . . . . . 19-22
19.21 CONTROL REGISTERS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-24
19.22 MEMORY MANAGEMENT FACILITIES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-25
19.22.1 New Memory Management Control Flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-25
19.22.1.1 Physical Memory Addressing Extension. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-25
19.22.1.2 Global Pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-26
19.22.1.3 Larger Page Sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-26
19.22.2 CD and NW Cache Control Flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-26
19.22.3 Descriptor Types and Contents. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-26
19.22.4 Changes in Segment Descriptor Loads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-27
19.23 DEBUG FACILITIES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-27
19.23.1 Differences in Debug Register DR6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-27
19.23.2 Differences in Debug Register DR7. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-27
19.23.3 Debug Registers DR4 and DR5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-27
19.24 RECOGNITION OF BREAKPOINTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-28
19.25 EXCEPTIONS AND/OR EXCEPTION CONDITIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-28
19.25.1 Machine-Check Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-30
19.25.2 Priority of Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-30
19.26 INTERRUPTS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-30
19.26.1 Interrupt Propagation Delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-30
19.26.2 NMI Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-30
19.26.3 IDT Limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-31
19.27 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) . . . . . . . . . . . . . . . . . . . . 19-31
19.27.1 Software Visible Differences Between the Local APIC and the 82489DX. . . . . . . . . 19-31
19.27.2 New Features Incorporated in the Local APIC for the P6 Family and Pentium Processors . . . . . . 19-32
19.27.3 New Features Incorporated in the Local APIC of the Pentium 4 and Intel Xeon Processors . . . . . . 19-32
19.28 TASK SWITCHING AND TSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-32
19.28.1 P6 Family and Pentium Processor TSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-33
19.28.2 TSS Selector Writes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-33
19.28.3 Order of Reads/Writes to the TSS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-33
19.28.4 Using A 16-Bit TSS with 32-Bit Constructs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-33
19.28.5 Differences in I/O Map Base Addresses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-33
19.29 CACHE MANAGEMENT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-34
19.29.1 Self-Modifying Code with Cache Enabled . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-35
19.29.2 Disabling the L3 Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-36
19.30 PAGING. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-36
19.30.1 Large Pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-36
19.30.2 PCD and PWT Flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-36
19.30.3 Enabling and Disabling Paging. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-37
19.31 STACK OPERATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-37
19.31.1 Selector Pushes and Pops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-37
19.31.2 Error Code Pushes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-38
19.31.3 Fault Handling Effects on the Stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-38
19.31.4 Interlevel RET/IRET From a 16-Bit Interrupt or Call Gate . . . . . . . . . . . . . . . . . . . . . . . . 19-38
19.32 MIXING 16- AND 32-BIT SEGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-39
19.33 SEGMENT AND ADDRESS WRAPAROUND . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-39
19.33.1 Segment Wraparound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-40
19.34 STORE BUFFERS AND MEMORY ORDERING. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-40
19.35 BUS LOCKING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-42
19.36 BUS HOLD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-42
19.37 MODEL-SPECIFIC EXTENSIONS TO THE IA-32. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-42
19.37.1 Model-Specific Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-43
19.37.2 RDMSR and WRMSR Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-43
19.37.3 Memory Type Range Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-43
19.37.4 Machine-Check Exception and Architecture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-44
19.37.5 Performance-Monitoring Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-44
19.38 TWO WAYS TO RUN INTEL 286 PROCESSOR TASKS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-45
CHAPTER 20
INTRODUCTION TO VIRTUAL-MACHINE EXTENSIONS
20.1 OVERVIEW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-1
20.2 VIRTUAL MACHINE ARCHITECTURE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-1
20.3 INTRODUCTION TO VMX OPERATION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-1
20.4 LIFE CYCLE OF VMM SOFTWARE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-2
20.5 VIRTUAL-MACHINE CONTROL STRUCTURE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-3
20.6 DISCOVERING SUPPORT FOR VMX. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-3
20.7 ENABLING AND ENTERING VMX OPERATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-4
20.8 RESTRICTIONS ON VMX OPERATION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-5
CHAPTER 21
VIRTUAL-MACHINE CONTROL STRUCTURES
21.1 OVERVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-1
21.2 FORMAT OF THE VMCS REGION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-3
21.3 ORGANIZATION OF VMCS DATA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-4
21.4 GUEST-STATE AREA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-5
21.4.1 Guest Register State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-5
21.4.2 Guest Non-Register State. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-7
21.5 HOST-STATE AREA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-10
21.6 VM-EXECUTION CONTROL FIELDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-11
21.6.1 Pin-Based VM-Execution Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-11
21.6.2 Processor-Based VM-Execution Controls. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-12
21.6.3 Exception Bitmap. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-16
21.6.4 I/O-Bitmap Addresses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-16
21.6.5 Time-Stamp Counter Offset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-17
21.6.6 Guest/Host Masks and Read Shadows for CR0 and CR4. . . . . . . . . . . . . . . . . . . . . . . . . 21-17
21.6.7 CR3-Target Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-17
21.6.8 Controls for APIC Accesses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-18
21.6.9 MSR-Bitmap Address . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-19
21.6.10 Executive-VMCS Pointer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-20
21.6.11 Extended-Page-Table Pointer (EPTP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-20
21.6.12 Virtual-Processor Identifier (VPID) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-20
21.6.13 Controls for PAUSE-Loop Exiting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-21
21.7 VM-EXIT CONTROL FIELDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-21
21.7.1 VM-Exit Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-21
21.7.2 VM-Exit Controls for MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-23
21.8 VM-ENTRY CONTROL FIELDS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-24
21.8.1 VM-Entry Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-24
21.8.2 VM-Entry Controls for MSRs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-25
21.8.3 VM-Entry Controls for Event Injection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-25
21.9 VM-EXIT INFORMATION FIELDS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-27
21.9.1 Basic VM-Exit Information. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-27
21.9.2 Information for VM Exits Due to Vectored Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-28
21.9.3 Information for VM Exits That Occur During Event Delivery . . . . . . . . . . . . . . . . . . . . . 21-29
21.9.4 Information for VM Exits Due to Instruction Execution. . . . . . . . . . . . . . . . . . . . . . . . . . 21-30
21.9.5 VM-Instruction Error Field. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-30
21.10 SOFTWARE USE OF THE VMCS AND RELATED STRUCTURES . . . . . . . . . . . . . . . . . . . . . . . 21-31
21.10.1 Software Use of Virtual-Machine Control Structures. . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-31
21.10.2 VMREAD, VMWRITE, and Encodings of VMCS Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-32
21.10.3 Initializing a VMCS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-34
21.10.4 Software Access to Related Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-35
21.10.5 VMXON Region. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-35
CHAPTER 22
VMX NON-ROOT OPERATION
22.1 INSTRUCTIONS THAT CAUSE VM EXITS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-1
22.1.1 Relative Priority of Faults and VM Exits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-1
22.1.2 Instructions That Cause VM Exits Unconditionally. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-2
22.1.3 Instructions That Cause VM Exits Conditionally . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-3
22.2 APIC-ACCESS VM EXITS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-7
22.2.1 Linear Accesses to the APIC-Access Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-7
22.2.1.1 Linear Accesses That Cause APIC-Access VM Exits. . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-7
22.2.1.2 Priority of APIC-Access VM Exits Caused by Linear Accesses . . . . . . . . . . . . . . . . . . 22-9
22.2.1.3 Instructions That May Cause Page Faults or EPT Violations Without Accessing Memory . . . . . . 22-10
22.2.2 Guest-Physical Accesses to the APIC-Access Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-10
22.2.2.1 Guest-Physical Accesses That Might Not Cause APIC-Access VM Exits . . . . . . . . 22-11
22.2.2.2 Priority of APIC-Access VM Exits Caused by Guest-Physical Accesses . . . . . . . . . 22-12
22.2.3 Physical Accesses to the APIC-Access Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-12
22.2.4 VTPR Accesses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-13
22.3 OTHER CAUSES OF VM EXITS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-14
22.4 CHANGES TO INSTRUCTION BEHAVIOR IN VMX NON-ROOT OPERATION . . . . . . . . . . . 22-16
22.5 APIC ACCESSES THAT DO NOT CAUSE VM EXITS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-21
22.5.1 Linear Accesses to the APIC-Access Page Using Large-Page Translations . . . . . . . . 22-22
22.5.2 Physical Accesses to the APIC-Access Page. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-22
22.5.3 VTPR Accesses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-22
22.5.3.1 Treatment of Individual VTPR Accesses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-23
22.5.3.2 Operations with Multiple Accesses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-23
22.5.3.3 TPR-Shadow Updates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-25
22.6 OTHER CHANGES IN VMX NON-ROOT OPERATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-25
22.6.1 Event Blocking. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-25
22.6.2 Treatment of Task Switches. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-26
22.7 FEATURES SPECIFIC TO VMX NON-ROOT OPERATION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-27
22.7.1 VMX-Preemption Timer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-27
22.7.2 Monitor Trap Flag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-28
22.7.3 Translation of Guest-Physical Addresses Using EPT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-29
22.8 UNRESTRICTED GUESTS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-29
CHAPTER 23
VM ENTRIES
23.1 BASIC VM-ENTRY CHECKS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-2
23.2 CHECKS ON VMX CONTROLS AND HOST-STATE AREA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-3
23.2.1 Checks on VMX Controls. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-3
23.2.1.1 VM-Execution Control Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-3
23.2.1.2 VM-Exit Control Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-6
23.2.1.3 VM-Entry Control Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-7
23.2.2 Checks on Host Control Registers and MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-8
23.2.3 Checks on Host Segment and Descriptor-Table Registers. . . . . . . . . . . . . . . . . . . . . . . . . 23-9
23.2.4 Checks Related to Address-Space Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-9
23.3 CHECKING AND LOADING GUEST STATE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-10
23.3.1 Checks on the Guest State Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-10
23.3.1.1 Checks on Guest Control Registers, Debug Registers, and MSRs . . . . . . . . . . . . . . 23-10
23.3.1.2 Checks on Guest Segment Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-12
23.3.1.3 Checks on Guest Descriptor-Table Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-15
23.3.1.4 Checks on Guest RIP and RFLAGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-15
23.3.1.5 Checks on Guest Non-Register State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-16
23.3.1.6 Checks on Guest Page-Directory-Pointer-Table Entries . . . . . . . . . . . . . . . . . . . . . . 23-18
23.3.2 Loading Guest State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-19
23.3.2.1 Loading Guest Control Registers, Debug Registers, and MSRs . . . . . . . . . . . . . . . . 23-20
23.3.2.2 Loading Guest Segment Registers and Descriptor-Table Registers . . . . . . . . . . . 23-21
23.3.2.3 Loading Guest RIP, RSP, and RFLAGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-22
23.3.2.4 Loading Page-Directory-Pointer-Table Entries. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-22
23.3.2.5 Updating Non-Register State. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-23
23.3.3 Clearing Address-Range Monitoring. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-23
23.4 LOADING MSRS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-23
23.5 EVENT INJECTION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-24
23.5.1 Vectored-Event Injection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-24
23.5.1.1 Details of Vectored-Event Injection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-25
23.5.1.2 VM Exits During Event Injection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-27
23.5.1.3 Event Injection for VM Entries to Real-Address Mode. . . . . . . . . . . . . . . . . . . . . . . . 23-28
23.5.2 Injection of Pending MTF VM Exits. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-28
23.6 SPECIAL FEATURES OF VM ENTRY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-29
23.6.1 Interruptibility State. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-29
23.6.2 Activity State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-30
23.6.3 Delivery of Pending Debug Exceptions after VM Entry. . . . . . . . . . . . . . . . . . . . . . . . . . 23-31
23.6.4 VMX-Preemption Timer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-32
23.6.5 Interrupt-Window Exiting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-32
23.6.6 NMI-Window Exiting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-33
23.6.7 VM Exits Induced by the TPR Shadow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-33
23.6.8 Pending MTF VM Exits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-34
23.6.9 VM Entries and Advanced Debugging Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-34
23.7 VM-ENTRY FAILURES DURING OR AFTER LOADING GUEST STATE. . . . . . . . . . . . . . . . . . 23-34
23.8 MACHINE CHECKS DURING VM ENTRY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-35
CHAPTER 24
VM EXITS
24.1 ARCHITECTURAL STATE BEFORE A VM EXIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-1
24.2 RECORDING VM-EXIT INFORMATION AND UPDATING VM-ENTRY CONTROL FIELDS. . . 24-5
24.2.1 Basic VM-Exit Information. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .24-5
24.2.2 Information for VM Exits Due to Vectored Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-14
24.2.3 Information for VM Exits During Event Delivery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-15
24.2.4 Information for VM Exits Due to Instruction Execution. . . . . . . . . . . . . . . . . . . . . . . . . . 24-17
24.3 SAVING GUEST STATE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-26
24.3.1 Saving Control Registers, Debug Registers, and MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . 24-27
24.3.2 Saving Segment Registers and Descriptor-Table Registers. . . . . . . . . . . . . . . . . . . . . . 24-27
24.3.3 Saving RIP, RSP, and RFLAGS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-28
24.3.4 Saving Non-Register State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-30
24.4 SAVING MSRS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-32
24.5 LOADING HOST STATE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-33
24.5.1 Loading Host Control Registers, Debug Registers, MSRs . . . . . . . . . . . . . . . . . . . . . . . . 24-33
24.5.2 Loading Host Segment and Descriptor-Table Registers . . . . . . . . . . . . . . . . . . . . . . . . . 24-35
24.5.3 Loading Host RIP, RSP, and RFLAGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-36
24.5.4 Checking and Loading Host Page-Directory-Pointer-Table Entries . . . . . . . . . . . . . . . 24-36
24.5.5 Updating Non-Register State. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-37
24.5.6 Clearing Address-Range Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-38
24.6 LOADING MSRS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-38
24.7 VMX ABORTS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-39
24.8 MACHINE CHECK DURING VM EXIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-40
CHAPTER 25
VMX SUPPORT FOR ADDRESS TRANSLATION
25.1 VIRTUAL PROCESSOR IDENTIFIERS (VPIDS). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-1
25.2 THE EXTENDED PAGE TABLE MECHANISM (EPT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-2
25.2.1 EPT Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-2
25.2.2 EPT Translation Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-4
25.2.3 EPT-Induced VM Exits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-10
25.2.3.1 EPT Misconfigurations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-12
25.2.3.2 EPT Violations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-12
25.2.3.3 Prioritization of EPT-Induced VM Exits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-13
25.2.4 EPT and Memory Typing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-14
25.2.4.1 Memory Type Used for Accessing EPT Paging Structures . . . . . . . . . . . . . . . . . . . . 25-15
25.2.4.2 Memory Type Used for Translated Guest-Physical Addresses . . . . . . . . . . . . . . . . 25-15
25.3 CACHING TRANSLATION INFORMATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-16
25.3.1 Information That May Be Cached . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-16
25.3.2 Creating and Using Cached Translation Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-17
25.3.3 Invalidating Cached Translation Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-19
25.3.3.1 Operations that Invalidate Cached Mappings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-19
25.3.3.2 Operations that Need Not Invalidate Cached Mappings . . . . . . . . . . . . . . . . . . . . . . . 25-21
25.3.3.3 Guidelines for Use of the INVVPID Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-21
25.3.3.4 Guidelines for Use of the INVEPT Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-23
CHAPTER 26
SYSTEM MANAGEMENT MODE
26.1 SYSTEM MANAGEMENT MODE OVERVIEW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-1
26.1.1 System Management Mode and VMX Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-2
26.2 SYSTEM MANAGEMENT INTERRUPT (SMI) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-3
26.3 SWITCHING BETWEEN SMM AND THE OTHER PROCESSOR OPERATING MODES. . . . . . . . . . . . . . . . . 26-3
26.3.1 Entering SMM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-3
26.3.2 Exiting From SMM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-4
26.4 SMRAM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-5
26.4.1 SMRAM State Save Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-6
26.4.1.1 SMRAM State Save Map and Intel 64 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-8
26.4.2 SMRAM Caching. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-11
26.4.2.1 System Management Range Registers (SMRR). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-12
26.5 SMI HANDLER EXECUTION ENVIRONMENT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-12
26.6 EXCEPTIONS AND INTERRUPTS WITHIN SMM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-14
26.7 MANAGING SYNCHRONOUS AND ASYNCHRONOUS SYSTEM MANAGEMENT INTERRUPTS . . . . . . . . . . . 26-15
26.7.1 I/O State Implementation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-16
26.8 NMI HANDLING WHILE IN SMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-17
26.9 SMM REVISION IDENTIFIER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-18
26.10 AUTO HALT RESTART. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-18
26.10.1 Executing the HLT Instruction in SMM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-19
26.11 SMBASE RELOCATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-19
26.11.1 Relocating SMRAM to an Address Above 1 MByte . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-20
26.12 I/O INSTRUCTION RESTART . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-20
26.12.1 Back-to-Back SMI Interrupts When I/O Instruction Restart Is Being Used. . . . . . . . . 26-22
26.13 SMM MULTIPLE-PROCESSOR CONSIDERATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-22
26.14 DEFAULT TREATMENT OF SMIS AND SMM WITH VMX OPERATION AND SMX OPERATION . . . . . . 26-23
26.14.1 Default Treatment of SMI Delivery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-23
26.14.2 Default Treatment of RSM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-24
26.14.3 Protection of CR4.VMXE in SMM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-26
26.15 DUAL-MONITOR TREATMENT OF SMIs AND SMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-26
26.15.1 Dual-Monitor Treatment Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-26
26.15.2 SMM VM Exits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-27
26.15.2.1 Architectural State Before a VM Exit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-27
26.15.2.2 Updating the Current-VMCS and Executive-VMCS Pointers. . . . . . . . . . . . . . . . . . . 26-27
26.15.2.3 Recording VM-Exit Information. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-28
26.15.2.4 Saving Guest State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-29
26.15.2.5 Updating Non-Register State. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-30
26.15.3 Operation of an SMM Monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-30
26.15.4 VM Entries that Return from SMM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-30
26.15.4.1 Checks on the Executive-VMCS Pointer Field. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-30
26.15.4.2 Checks on VM-Execution Control Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-31
26.15.4.3 Checks on VM-Entry Control Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-31
26.15.4.4 Checks on the Guest State Area. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-32
26.15.4.5 Loading Guest State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-32
26.15.4.6 VMX-Preemption Timer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-32
26.15.4.7 Updating the Current-VMCS and SMM-Transfer VMCS Pointers. . . . . . . . . . . . . . . 26-33
26.15.4.8 VM Exits Induced by VM Entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-33
26.15.4.9 SMI Blocking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-33
26.15.4.10 Failures of VM Entries That Return from SMM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-34
26.15.5 Enabling the Dual-Monitor Treatment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-34
26.15.6 Activating the Dual-Monitor Treatment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-36
26.15.6.1 Initial Checks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-36
26.15.6.2 MSEG Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-38
26.15.6.3 Updating the Current-VMCS and Executive-VMCS Pointers. . . . . . . . . . . . . . . . . . . 26-38
26.15.6.4 Loading Host State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-38
26.15.6.5 Loading MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-40
26.15.7 Deactivating the Dual-Monitor Treatment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-40
26.16 SMI AND PROCESSOR EXTENDED STATE MANAGEMENT. . . . . . . . . . . . . . . . . . . . . . . . . . . 26-41
CHAPTER 27
VIRTUAL-MACHINE MONITOR PROGRAMMING CONSIDERATIONS
27.1 VMX SYSTEM PROGRAMMING OVERVIEW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-1
27.2 SUPPORTING PROCESSOR OPERATING MODES IN GUEST ENVIRONMENTS. . . . . . . . . . . 27-1
27.2.1 Emulating Guest Execution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .27-2
27.3 MANAGING VMCS REGIONS AND POINTERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-2
27.4 USING VMX INSTRUCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-4
27.5 VMM SETUP & TEAR DOWN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-6
27.5.1 Algorithms for Determining VMX Capabilities. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .27-7
27.6 PREPARATION AND LAUNCHING A VIRTUAL MACHINE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-10
27.7 HANDLING OF VM EXITS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-11
27.7.1 Handling VM Exits Due to Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-12
27.7.1.1 Reflecting Exceptions to Guest Software. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-12
27.7.1.2 Resuming Guest Software after Handling an Exception . . . . . . . . . . . . . . . . . . . . . . 27-14
27.8 MULTI-PROCESSOR CONSIDERATIONS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-15
27.8.1 Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-16
27.8.2 Moving a VMCS Between Processors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-16
27.8.3 Paired Index-Data Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-17
27.8.4 External Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-17
27.8.5 CPUID Emulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-18
27.9 32-BIT AND 64-BIT GUEST ENVIRONMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-18
27.9.1 Operating Modes of Guest Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-18
27.9.2 Handling Widths of VMCS Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-19
27.9.2.1 Natural-Width VMCS Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-19
27.9.2.2 64-Bit VMCS Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-19
27.9.3 IA-32e Mode Hosts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-19
27.9.4 IA-32e Mode Guests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-20
27.9.5 32-Bit Guests. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-21
27.10 HANDLING MODEL SPECIFIC REGISTERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-22
27.10.1 Using VM-Execution Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-22
27.10.2 Using VM-Exit Controls for MSRs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-23
27.10.3 Using VM-Entry Controls for MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-23
27.10.4 Handling Special-Case MSRs and Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-23
27.10.4.1 Handling IA32_EFER MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-23
27.10.4.2 Handling the SYSENTER and SYSEXIT Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-24
27.10.4.3 Handling the SYSCALL and SYSRET Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-24
27.10.4.4 Handling the SWAPGS Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-24
27.10.4.5 Implementation Specific Behavior on Writing to Certain MSRs . . . . . . . . . . . . . . . . 27-25
27.10.5 Handling Accesses to Reserved MSR Addresses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-25
27.11 HANDLING ACCESSES TO CONTROL REGISTERS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-25
27.12 PERFORMANCE CONSIDERATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-25
CHAPTER 28
VIRTUALIZATION OF SYSTEM RESOURCES
28.1 OVERVIEW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-1
28.2 VIRTUALIZATION SUPPORT FOR DEBUGGING FACILITIES . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-1
28.2.1 Debug Exceptions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-2
28.3 MEMORY VIRTUALIZATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-3
28.3.1 Processor Operating Modes & Memory Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-3
28.3.2 Guest & Host Physical Address Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-3
28.3.3 Virtualizing Virtual Memory by Brute Force. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-4
28.3.4 Alternate Approach to Memory Virtualization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-4
28.3.5 Details of Virtual TLB Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-6
28.3.5.1 Initialization of Virtual TLB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-7
28.3.5.2 Response to Page Faults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-8
28.3.5.3 Response to Uses of INVLPG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-11
28.3.5.4 Response to CR3 Writes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-11
28.4 MICROCODE UPDATE FACILITY. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-11
28.4.1 Early Load of Microcode Updates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-12
28.4.2 Late Load of Microcode Updates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-12
CHAPTER 29
HANDLING BOUNDARY CONDITIONS IN A VIRTUAL MACHINE MONITOR
29.1 OVERVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-1
29.2 INTERRUPT HANDLING IN VMX OPERATION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-1
29.3 EXTERNAL INTERRUPT VIRTUALIZATION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-3
29.3.1 Virtualization of Interrupt Vector Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .29-4
29.3.2 Control of Platform Interrupts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .29-5
29.3.2.1 PIC Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .29-6
29.3.2.2 xAPIC Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .29-6
29.3.2.3 Local APIC Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .29-6
29.3.2.4 I/O APIC Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .29-7
29.3.2.5 Virtualization of Message Signaled Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .29-8
29.3.3 Examples of Handling of External Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .29-8
29.3.3.1 Guest Setup. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .29-8
29.3.3.2 Processor Treatment of External Interrupt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .29-9
29.3.3.3 Processing of External Interrupts by VMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .29-9
29.3.3.4 Generation of Virtual Interrupt Events by VMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-10
29.4 ERROR HANDLING BY VMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-11
29.4.1 VM-Exit Failures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-11
29.4.2 Machine Check Considerations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-12
29.4.3 MCA Error Handling Guidelines for VMM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-13
29.4.3.1 VMM Error Handling Strategies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-13
29.4.3.2 Basic VMM MCA error recovery handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-14
29.4.3.3 Implementation Considerations for the Basic Model . . . . . . . . . . . . . . . . . . . . . . . . . 29-14
29.4.3.4 MCA Virtualization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-15
29.4.3.5 Implementation Considerations for the MCA Virtualization Model. . . . . . . . . . . . . 29-15
29.5 HANDLING ACTIVITY STATES BY VMM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-15
CHAPTER 30
PERFORMANCE MONITORING
30.1 PERFORMANCE MONITORING OVERVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-1
30.2 ARCHITECTURAL PERFORMANCE MONITORING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-2
30.2.1 Architectural Performance Monitoring Version 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .30-3
30.2.1.1 Architectural Performance Monitoring Version 1 Facilities . . . . . . . . . . . . . . . . . . . . .30-4
30.2.2 Additional Architectural Performance Monitoring Extensions. . . . . . . . . . . . . . . . . . . . . .30-6
30.2.2.1 Architectural Performance Monitoring Version 2 Facilities . . . . . . . . . . . . . . . . . . . . .30-6
30.2.2.2 Architectural Performance Monitoring Version 3 Facilities . . . . . . . . . . . . . . . . . . . 30-10
30.2.3 Pre-defined Architectural Performance Events. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-12
30.3 PERFORMANCE MONITORING (INTEL CORE SOLO AND INTEL CORE DUO PROCESSORS) . . . . . . . . . . 30-14
30.4 PERFORMANCE MONITORING (PROCESSORS BASED ON INTEL CORE MICROARCHITECTURE). . . . . . . 30-16
30.4.1 Fixed-function Performance Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-18
30.4.2 Global Counter Control Facilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-19
30.4.3 At-Retirement Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-21
30.4.4 Precise Event Based Sampling (PEBS). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-22
30.4.4.1 Setting up the PEBS Buffer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-23
30.4.4.2 PEBS Record Format. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-23
30.4.4.3 Writing a PEBS Interrupt Service Routine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-23
30.5 PERFORMANCE MONITORING (PROCESSORS BASED ON INTEL ATOM MICROARCHITECTURE) . . . . . . 30-25
30.6 PERFORMANCE MONITORING FOR PROCESSORS BASED ON INTEL MICROARCHITECTURE CODENAME NEHALEM. . . . . . 30-26
30.6.1 Enhancements of Performance Monitoring in the Processor Core . . . . . . . . . . . . . . . . 30-27
30.6.1.1 Precise Event Based Sampling (PEBS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-27
30.6.1.2 Load Latency Performance Monitoring Facility. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-32
30.6.1.3 Off-core Response Performance Monitoring in the Processor Core. . . . . . . . . . . . 30-34
30.6.2 Performance Monitoring Facility in the Uncore. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-37
30.6.2.1 Uncore Performance Monitoring Management Facility. . . . . . . . . . . . . . . . . . . . . . . . 30-37
30.6.2.2 Uncore Performance Event Configuration Facility. . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-40
30.6.2.3 Uncore Address/Opcode Match MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-42
30.6.3 Intel Xeon Processor 7500 Series Performance Monitoring Facility . . . . . . . . . . . . . . 30-43
30.7 PERFORMANCE MONITORING FOR PROCESSORS BASED ON NEXT GENERATION INTEL PROCESSOR (CODENAMED WESTMERE) . . . . . . 30-46
30.8 PERFORMANCE MONITORING (PROCESSORS BASED ON INTEL NETBURST MICROARCHITECTURE) . . . . . . . . . . . . . . . . . . 30-46
30.8.1 ESCR MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-50
30.8.2 Performance Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-52
30.8.3 CCCR MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-53
30.8.4 Debug Store (DS) Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-55
30.8.5 Programming the Performance Counters for Non-Retirement Events. . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-56
30.8.5.1 Selecting Events to Count . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-56
30.8.5.2 Filtering Events. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-58
30.8.5.3 Starting Event Counting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-60
30.8.5.4 Reading a Performance Counter's Count . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-60
30.8.5.5 Halting Event Counting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-61
30.8.5.6 Cascading Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-61
30.8.5.7 EXTENDED CASCADING. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-62
30.8.5.8 Generating an Interrupt on Overflow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-64
30.8.5.9 Counter Usage Guideline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-64
30.8.6 At-Retirement Counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-65
30.8.6.1 Using At-Retirement Counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-66
30.8.6.2 Tagging Mechanism for Front_end_event . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-67
30.8.6.3 Tagging Mechanism For Execution_event . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-67
30.8.6.4 Tagging Mechanism for Replay_event. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-68
30.8.7 Precise Event-Based Sampling (PEBS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-68
30.8.7.1 Detection of the Availability of the PEBS Facilities . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-69
30.8.7.2 Setting Up the DS Save Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-69
30.8.7.3 Setting Up the PEBS Buffer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-69
30.8.7.4 Writing a PEBS Interrupt Service Routine. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-69
30.8.7.5 Other DS Mechanism Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-70
30.8.8 Operating System Implications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-70
30.9 PERFORMANCE MONITORING AND INTEL HYPER-THREADING TECHNOLOGY IN PROCESSORS BASED ON INTEL NETBURST MICROARCHITECTURE . . . . . . . . . 30-70
30.9.1 ESCR MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-71
30.9.2 CCCR MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-72
30.9.3 IA32_PEBS_ENABLE MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-74
30.9.4 Performance Monitoring Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-74
30.10 COUNTING CLOCKS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-76
30.10.1 Non-Halted Clockticks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-77
30.10.2 Non-Sleep Clockticks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-78
30.10.3 Incrementing the Time-Stamp Counter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-79
30.10.4 Non-Halted Reference Clockticks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-79
30.10.5 Cycle Counting and Opportunistic Processor Operation . . . . . . . . . . . . . . . . . . . . . . . . . 30-79
30.11 PERFORMANCE MONITORING, BRANCH PROFILING AND SYSTEM EVENTS . . . . . . . . . . 30-80
30.12 PERFORMANCE MONITORING AND DUAL-CORE TECHNOLOGY. . . . . . . . . . . . . . . . . . . . . . 30-81
30.13 PERFORMANCE MONITORING ON 64-BIT INTEL XEON PROCESSOR MP WITH UP TO 8-MBYTE L3 CACHE. . . . . . . . . . . . . . . . 30-81
30.14 PERFORMANCE MONITORING ON L3 AND CACHING BUS CONTROLLER SUB-SYSTEMS . . . . . . . . . . 30-86
30.14.1 Overview of Performance Monitoring with L3/Caching Bus Controller . . . . . . . . . . . 30-88
30.14.2 GBSQ Event Interface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-89
30.14.3 GSNPQ Event Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-91
30.14.4 FSB Event Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-93
30.14.4.1 FSB Sub-Event Mask Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-94
30.14.5 Common Event Control Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-95
30.15 PERFORMANCE MONITORING (P6 FAMILY PROCESSOR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-95
30.15.1 PerfEvtSel0 and PerfEvtSel1 MSRs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-96
30.15.2 PerfCtr0 and PerfCtr1 MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-98
30.15.3 Starting and Stopping the Performance-Monitoring Counters . . . . . . . . . . . . . . . . . . . 30-98
30.15.4 Event and Time-Stamp Monitoring Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-99
30.15.5 Monitoring Counter Overflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-99
30.16 PERFORMANCE MONITORING (PENTIUM PROCESSORS) . . . . . . . . . . . . . . . . . . . . . . . . . . 30-100
30.16.1 Control and Event Select Register (CESR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-100
30.16.2 Use of the Performance-Monitoring Pins. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-102
30.16.3 Events Counted . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-102
APPENDIX A
PERFORMANCE-MONITORING EVENTS
A.1 ARCHITECTURAL PERFORMANCE-MONITORING EVENTS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-1
A.2 PERFORMANCE MONITORING EVENTS FOR INTEL CORE I7 PROCESSOR FAMILY AND XEON PROCESSOR FAMILY. . . . . . . . . . . . . . . . A-2
A.3 PERFORMANCE MONITORING EVENTS FOR NEXT GENERATION INTEL PROCESSOR (CODENAMED WESTMERE) . . . . . . . . . . . . . . . . A-53
A.4 PERFORMANCE MONITORING EVENTS FOR INTEL XEON PROCESSOR 5200, 5400 SERIES AND INTEL CORE 2 EXTREME PROCESSORS QX 9000 SERIES . . . . . . . . . . . A-108
A.5 PERFORMANCE MONITORING EVENTS FOR INTEL XEON PROCESSOR 3000, 3200, 5100, 5300 SERIES AND INTEL CORE 2 DUO PROCESSORS. . . . . . . . . . . A-108
A.6 PERFORMANCE MONITORING EVENTS FOR INTEL ATOM PROCESSORS. . . . . . . . . . . . . . . . . . . . . . . . . A-153
A.7 PERFORMANCE MONITORING EVENTS FOR INTEL CORE SOLO AND INTEL CORE DUO PROCESSORS. . . . . . . . . . . . . . . . . . A-176
A.8 PENTIUM 4 AND INTEL XEON PROCESSOR PERFORMANCE-MONITORING EVENTS. . . A-185
A.9 PERFORMANCE MONITORING EVENTS FOR INTEL PENTIUM M PROCESSORS . . . . . . . . . . . . . . . . . . . . . . . A-234
A.10 P6 FAMILY PROCESSOR PERFORMANCE-MONITORING EVENTS . . . . . . . . . . . . . . . . . . . . A-237
A.11 PENTIUM PROCESSOR PERFORMANCE-MONITORING EVENTS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-255
APPENDIX B
MODEL-SPECIFIC REGISTERS (MSRS)
B.1 ARCHITECTURAL MSRS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-2
B.2 MSRS IN THE INTEL CORE 2 PROCESSOR FAMILY. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-40
B.3 MSRS IN THE INTEL ATOM PROCESSOR FAMILY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-61
B.4 MSRS IN THE INTEL MICROARCHITECTURE CODENAME NEHALEM . . . . . . . . . . . . . . . . . B-76
B.4.1 Additional MSRs In the Intel Xeon Processor 5500 and 3400 Series . . . . . . . . . . . . B-95
B.4.2 Additional MSRs In the Intel Xeon Processor 7500 Series . . . . . . . . . . . . . . . . . . . . . . B-98
B.5 MSRS IN THE INTEL XEON PROCESSOR 5600 SERIES (CODENAMED WESTMERE). . . . B-119
B.6 MSRS IN THE PENTIUM 4 AND INTEL XEON PROCESSORS . . . . . . . . . . . . . . . . . . B-121
B.6.1 MSRs Unique to Intel Xeon Processor MP with L3 Cache . . . . . . . . . . . . . . . . . . . . . . . . B-161
B.7 MSRS IN INTEL CORE SOLO AND INTEL CORE DUO PROCESSORS . . . . . . . . . . B-164
B.8 MSRS IN THE PENTIUM M PROCESSOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-177
B.9 MSRS IN THE P6 FAMILY PROCESSORS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-187
B.10 MSRS IN PENTIUM PROCESSORS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-199
APPENDIX C
MP INITIALIZATION FOR P6 FAMILY PROCESSORS
C.1 OVERVIEW OF THE MP INITIALIZATION PROCESS FOR P6 FAMILY PROCESSORS . . . . . . . C-1
C.2 MP INITIALIZATION PROTOCOL ALGORITHM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-2
C.2.1 Error Detection and Handling During the MP Initialization Protocol . . . . . . . . . . . . . . . . . .C-4
APPENDIX D
PROGRAMMING THE LINT0 AND LINT1 INPUTS
D.1 CONSTANTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-1
D.2 LINT[0:1] PINS PROGRAMMING PROCEDURE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-1
APPENDIX E
INTERPRETING MACHINE-CHECK ERROR CODES
E.1 INCREMENTAL DECODING INFORMATION: PROCESSOR FAMILY 06H MACHINE ERROR CODES FOR MACHINE CHECK . . . . . . . . . . . . . E-1
E.2 INCREMENTAL DECODING INFORMATION: INTEL CORE 2 PROCESSOR FAMILY MACHINE ERROR CODES FOR MACHINE CHECK . . . . . . . . . . E-5
E.2.1 Model-Specific Machine Check Error Codes for Intel Xeon Processor 7400 Series . . . .E-9
E.2.1.1 Processor Machine Check Status Register Incremental MCA Error Code Definition . . . . . . . . . . . . . . . . . . . E-9
E.2.2 Intel Xeon Processor 7400 Model Specific Error Code Field. . . . . . . . . . . . . . . . . . . . . . . E-10
E.2.2.1 Processor Model Specific Error Code Field Type B: Bus and Interconnect Error. . . . . . . . . . . . . . . . . . . . E-10
E.2.2.2 Processor Model Specific Error Code Field Type C: Cache Bus Controller Error . . . . . . . . . . . . . . . . . . . . E-10
E.3 INCREMENTAL DECODING INFORMATION: PROCESSOR FAMILY WITH CPUID DISPLAYFAMILY_DISPLAYMODEL SIGNATURE 06_1AH, MACHINE ERROR CODES FOR MACHINE CHECK . . . . . . . . . . E-11
E.3.1 QPI Machine Check Errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-12
E.3.2 Internal Machine Check Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-13
E.3.3 Memory Controller Errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-14
E.4 INCREMENTAL DECODING INFORMATION: PROCESSOR FAMILY 0FH MACHINE ERROR CODES FOR MACHINE CHECK . . . . . . . . . . . . . E-15
E.4.1 Model-Specific Machine Check Error Codes for Intel Xeon Processor MP 7100 Series . . . . E-16
E.4.1.1 Processor Machine Check Status Register MCA Error Code Definition . . . . . . . . . . . . . . . . . . . . . . . . . E-18
E.4.2 Other_Info Field (all MCA Error Types) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-19
E.4.3 Processor Model Specific Error Code Field. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-21
E.4.3.1 MCA Error Type A: L3 Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-21
E.4.3.2 Processor Model Specific Error Code Field Type B: Bus and Interconnect Error . . . . . . . . . . . . . . . . . . . . E-21
E.4.3.3 Processor Model Specific Error Code Field Type C: Cache Bus Controller Error . . . . . . . . . . . . . . . . . . . . E-23
APPENDIX F
APIC BUS MESSAGE FORMATS
F.1 BUS MESSAGE FORMATS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . F-1
F.2 EOI MESSAGE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . F-1
F.2.1 Short Message . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . F-2
F.2.2 Non-focused Lowest Priority Message. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . F-3
F.2.3 APIC Bus Status Cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . F-5
APPENDIX G
VMX CAPABILITY REPORTING FACILITY
G.1 BASIC VMX INFORMATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-1
G.2 RESERVED CONTROLS AND DEFAULT SETTINGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-2
G.3 VM-EXECUTION CONTROLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-3
G.3.1 Pin-Based VM-Execution Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-3
G.3.2 Primary Processor-Based VM-Execution Controls. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-4
G.3.3 Secondary Processor-Based VM-Execution Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-5
G.4 VM-EXIT CONTROLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-6
G.5 VM-ENTRY CONTROLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-7
G.6 MISCELLANEOUS DATA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-7
G.7 VMX-FIXED BITS IN CR0. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-8
G.8 VMX-FIXED BITS IN CR4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-9
G.9 VMCS ENUMERATION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-9
G.10 VPID AND EPT CAPABILITIES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-9
APPENDIX H
FIELD ENCODING IN VMCS
H.1 16-BIT FIELDS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-1
H.1.1 16-Bit Control Field. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-1
H.1.2 16-Bit Guest-State Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-1
H.1.3 16-Bit Host-State Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-2
H.2 64-BIT FIELDS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-2
H.2.1 64-Bit Control Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-3
H.2.2 64-Bit Read-Only Data Field. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-4
H.2.3 64-Bit Guest-State Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-4
H.2.4 64-Bit Host-State Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-5
H.3 32-BIT FIELDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-6
H.3.1 32-Bit Control Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-6
H.3.2 32-Bit Read-Only Data Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-7
H.3.3 32-Bit Guest-State Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-7
H.3.4 32-Bit Host-State Field. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-9
H.4 NATURAL-WIDTH FIELDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-9
H.4.1 Natural-Width Control Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-9
H.4.2 Natural-Width Read-Only Data Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-10
H.4.3 Natural-Width Guest-State Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-10
H.4.4 Natural-Width Host-State Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-11
APPENDIX I
VMX BASIC EXIT REASONS
FIGURES
Figure 1-1. Bit and Byte Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-7
Figure 1-2. Syntax for CPUID, CR, and MSR Data Presentation. . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-10
Figure 2-1. IA-32 System-Level Registers and Data Structures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3
Figure 2-2. System-Level Registers and Data Structures in IA-32e Mode . . . . . . . . . . . . . . . . . . . 2-4
Figure 2-3. Transitions Among the Processor's Operating Modes . . . . . . . . . . . . . . . . . . . . . . . . . .2-11
Figure 2-4. System Flags in the EFLAGS Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-13
Figure 2-5. Memory Management Registers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-16
Figure 2-6. Control Registers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-19
Figure 2-7. XFEATURE_ENABLED_MASK Register (XCR0) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-26
Figure 3-1. Segmentation and Paging. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2
Figure 3-2. Flat Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-4
Figure 3-3. Protected Flat Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-4
Figure 3-4. Multi-Segment Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6
Figure 3-5. Logical Address to Linear Address Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-9
Figure 3-6. Segment Selector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-10
Figure 3-7. Segment Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-11
Figure 3-8. Segment Descriptor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-13
Figure 3-9. Segment Descriptor When Segment-Present Flag Is Clear. . . . . . . . . . . . . . . . . . . . . .3-15
Figure 3-10. Global and Local Descriptor Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-20
Figure 3-11. Pseudo-Descriptor Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-22
Figure 4-1. Enabling and Changing Paging Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4
Figure 4-2. Linear-Address Translation to a 4-KByte Page using 32-Bit Paging. . . . . . . . . . . . .4-11
Figure 4-3. Linear-Address Translation to a 4-MByte Page using 32-Bit Paging . . . . . . . . . . . .4-11
Figure 4-4. Formats of CR3 and Paging-Structure Entries with 32-Bit Paging . . . . . . . . . . . . . .4-15
Figure 4-5. Linear-Address Translation to a 4-KByte Page using PAE Paging . . . . . . . . . . . . . . .4-18
Figure 4-6. Linear-Address Translation to a 2-MByte Page using PAE Paging. . . . . . . . . . . . . . .4-18
Figure 4-7. Formats of CR3 and Paging-Structure Entries with PAE Paging . . . . . . . . . . . . . . . .4-23
Figure 4-8. Linear-Address Translation to a 4-KByte Page using IA-32e Paging . . . . . . . . . . . .4-26
Figure 4-9. Linear-Address Translation to a 2-MByte Page using IA-32e Paging . . . . . . . . . . . .4-27
Figure 4-10. Linear-Address Translation to a 1-GByte Page using IA-32e Paging . . . . . . . . . . . .4-28
Figure 4-11. Formats of CR3 and Paging-Structure Entries with IA-32e Paging. . . . . . . . . . . . . .4-36
Figure 4-12. Page-Fault Error Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-38
Figure 4-13. Memory Management Convention That Assigns a Page Table to Each Segment . . . . . . . . . . . . . . . .4-60
Figure 5-1. Descriptor Fields Used for Protection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4
Figure 5-2. Descriptor Fields with Flags used in IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6
Figure 5-3. Protection Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-10
Figure 5-4. Privilege Check for Data Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-12
Figure 5-5. Examples of Accessing Data Segments From Various Privilege Levels . . . . . . . . . .5-13
Figure 5-6. Privilege Check for Control Transfer Without Using a Gate . . . . . . . . . . . . . . . . . . . . .5-16
Figure 5-7. Examples of Accessing Conforming and Nonconforming Code Segments From Various Privilege Levels . . . . . . . . . .5-17
Figure 5-8. Call-Gate Descriptor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-19
Figure 5-9. Call-Gate Descriptor in IA-32e Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-21
Figure 5-10. Call-Gate Mechanism. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-22
Figure 5-11. Privilege Check for Control Transfer with Call Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-23
Figure 5-12. Example of Accessing Call Gates At Various Privilege Levels . . . . . . . . . . . . . . . . . . .5-25
Figure 5-13. Stack Switching During an Interprivilege-Level Call . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-27
Figure 5-14. MSRs Used by SYSCALL and SYSRET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-33
Figure 5-15. Use of RPL to Weaken Privilege Level of Called Procedure. . . . . . . . . . . . . . . . . . . . .5-38
Figure 6-1. Relationship of the IDTR and IDT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-14
Figure 6-2. IDT Gate Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-15
Figure 6-3. Interrupt Procedure Call . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-16
Figure 6-4. Stack Usage on Transfers to Interrupt and Exception-Handling Routines. . . . . . . 6-18
Figure 6-5. Interrupt Task Switch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-21
Figure 6-6. Error Code. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-22
Figure 6-7. 64-Bit IDT Gate Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-23
Figure 6-8. IA-32e Mode Stack Usage After Privilege Level Change . . . . . . . . . . . . . . . . . . . . . . . 6-26
Figure 6-9. Page-Fault Error Code. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-55
Figure 7-1. Structure of a Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-2
Figure 7-2. 32-Bit Task-State Segment (TSS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-5
Figure 7-3. TSS Descriptor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-7
Figure 7-4. Format of TSS and LDT Descriptors in 64-bit Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-9
Figure 7-5. Task Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-10
Figure 7-6. Task-Gate Descriptor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-11
Figure 7-7. Task Gates Referencing the Same Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-12
Figure 7-8. Nested Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-17
Figure 7-9. Overlapping Linear-to-Physical Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-20
Figure 7-10. 16-Bit TSS Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-22
Figure 7-11. 64-Bit TSS Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-24
Figure 8-1. Example of Write Ordering in Multiple-Processor Systems . . . . . . . . . . . . . . . . . . . . . 8-11
Figure 8-2. Interpretation of APIC ID in Early MP Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-35
Figure 8-3. Local APICs and I/O APIC in MP System Supporting Intel HT Technology. . . . . . . . 8-39
Figure 8-4. IA-32 Processor with Two Logical Processors Supporting Intel HT Technology . 8-40
Figure 8-5. Generalized Four level Interpretation of the APIC ID . . . . . . . . . . . . . . . . . . . . . . . . . . 8-50
Figure 8-6. Conceptual Five-level Topology and 32-bit APIC ID Composition . . . . . . . . . . . . . . . 8-50
Figure 8-7. Topological Relationships between Hierarchical IDs in a Hypothetical MP Platform. . . . 8-53
Figure 9-1. Contents of CR0 Register after Reset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-5
Figure 9-2. Version Information in the EDX Register after Reset. . . . . . . . . . . . . . . . . . . . . . . . . . . 9-5
Figure 9-3. Processor State After Reset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-21
Figure 9-4. Constructing Temporary GDT and Switching to Protected Mode (Lines 162-172 of List File) . . . . . . . . . . 9-31
Figure 9-5. Moving the GDT, IDT, and TSS from ROM to RAM (Lines 196-261 of List File). . . 9-32
Figure 9-6. Task Switching (Lines 282-296 of List File). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-33
Figure 9-7. Applying Microcode Updates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-37
Figure 9-8. Microcode Update Write Operation Flow [1] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-60
Figure 9-9. Microcode Update Write Operation Flow [2] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-61
Figure 10-1. Relationship of Local APIC and I/O APIC In Single-Processor Systems. . . . . . . . . . . 10-3
Figure 10-2. Local APICs and I/O APIC When Intel Xeon Processors Are Used in Multiple-
Processor Systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-4
Figure 10-3. Local APICs and I/O APIC When P6 Family Processors Are Used in Multiple-Processor
Systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-4
Figure 10-4. Local APIC Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-7
Figure 10-5. IA32_APIC_BASE MSR (APIC_BASE_MSR in P6 Family) . . . . . . . . . . . . . . . . . . . . . . . 10-12
Figure 10-6. Local APIC ID Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-13
Figure 10-7. Local APIC Version Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-16
Figure 10-8. Local Vector Table (LVT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-18
Figure 10-9. Error Status Register (ESR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-21
Figure 10-10. Divide Configuration Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-22
Figure 10-11. Initial Count and Current Count Registers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-23
Figure 10-12. Interrupt Command Register (ICR). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-25
Figure 10-13. Logical Destination Register (LDR). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-31
Figure 10-14. Destination Format Register (DFR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-31
Figure 10-15. Arbitration Priority Register (APR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-34
Figure 10-16. Interrupt Acceptance Flow Chart for the Local APIC (Pentium 4 and Intel Xeon
Processors) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-36
Figure 10-17. Interrupt Acceptance Flow Chart for the Local APIC (P6 Family and Pentium
Processors) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-37
Figure 10-18. Task Priority Register (TPR). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-39
Figure 10-19. Processor Priority Register (PPR). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-40
Figure 10-20. IRR, ISR and TMR Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-41
Figure 10-21. EOI Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-42
Figure 10-22. CR8 Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-43
Figure 10-23. Spurious-Interrupt Vector Register (SVR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-45
Figure 10-24. Layout of the MSI Message Address Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-47
Figure 10-25. Layout of the MSI Message Data Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-49
Figure 10-26. IA32_APIC_BASE MSR Supporting x2APIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-51
Figure 10-27. Local x2APIC State Transitions with IA32_APIC_BASE, INIT, and RESET . . . . . . 10-58
Figure 10-28. Error Status Register (ESR) in x2APIC Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-62
Figure 10-29. Interrupt Command Register (ICR) in x2APIC Mode . . . . . . . . . . . . . . . . . . . . . . . . . . 10-63
Figure 10-30. Logical Destination Register in x2APIC Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-64
Figure 10-31. SELF IPI register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-65
Figure 11-1. Cache Structure of the Pentium 4 and Intel Xeon Processors . . . . . . . . . . . . . . . . . . 11-1
Figure 11-2. Cache Structure of the Intel Core i7 Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2
Figure 11-3. Cache-Control Registers and Bits Available in Intel 64 and IA-32 Processors . . 11-16
Figure 11-4. Mapping Physical Memory With MTRRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-31
Figure 11-5. IA32_MTRRCAP Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-32
Figure 11-6. IA32_MTRR_DEF_TYPE MSR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-33
Figure 11-7. IA32_MTRR_PHYSBASEn and IA32_MTRR_PHYSMASKn Variable-Range Register
Pair. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-36
Figure 11-8. IA32_SMRR_PHYSBASE and IA32_SMRR_PHYSMASK SMRR Pair. . . . . . . . . . . . . 11-38
Figure 11-9. IA32_PAT MSR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-49
Figure 12-1. Mapping of MMX Registers to Floating-Point Registers . . . . . . . . . . . . . . . . . . . . . . . . 12-2
Figure 12-2. Mapping of MMX Registers to x87 FPU Data Register Stack . . . . . . . . . . . . . . . . . . . 12-7
Figure 13-1. Example of Saving the x87 FPU, MMX, SSE, SSE2, SSE3, and SSSE3 State During an
Operating-System Controlled Task Switch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-11
Figure 13-2. Future Layout of XSAVE/XRSTOR Area and XSTATE_BV with Five Sets of Processor
State Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-14
Figure 13-3. OS Enabling of Processor Extended State Support . . . . . . . . . . . . . . . . . . . . . . . . . . 13-17
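Figure 13-4. Application Detection of New Instruction Extensions and Processor Extended
State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-19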
Figure 14-1. IA32_MPERF MSR and IA32_APERF MSR for P-state Coordination . . . . . . . . . . . . . 14-2
Figure 14-2. IA32_PERF_CTL Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-6
Figure 14-3. Periodic Query of Activity Ratio of Opportunistic Processor Operation . . . . . . . . . 14-7
Figure 14-4. IA32_ENERGY_PERF_BIAS Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-9
Figure 14-5. Processor Modulation Through Stop-Clock Mechanism. . . . . . . . . . . . . . . . . . . . . . . 14-11
Figure 14-6. MSR_THERM2_CTL Register On Processors with CPUID Family/Model/Stepping
Signature Encoded as 0x69n or 0x6Dn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-13
Figure 14-7. MSR_THERM2_CTL Register for Supporting TM2. . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-14
Figure 14-8. IA32_THERM_STATUS MSR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-15
Figure 14-9. IA32_THERM_INTERRUPT MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-15
Figure 14-10. IA32_CLOCK_MODULATION MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-17
Figure 14-11. IA32_THERM_STATUS Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-19
Figure 14-12. IA32_THERM_INTERRUPT Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-21
Figure 15-1. Machine-Check MSRs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-3
Figure 15-2. IA32_MCG_CAP Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-4
Figure 15-3. IA32_MCG_STATUS Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-5
Figure 15-4. IA32_MCi_CTL Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-6
Figure 15-5. IA32_MCi_STATUS Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-8
Figure 15-6. IA32_MCi_ADDR MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-12
Figure 15-7. UCR Support in IA32_MCi_MISC Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-13
Figure 15-8. IA32_MCi_CTL2 Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-14
Figure 15-9. CMCI Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-19
Figure 15-10. Local APIC CMCI LVT Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-20
Figure 16-1. Debug Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-3
Figure 16-2. DR6/DR7 Layout on Processors Supporting Intel 64 Technology . . . . . . . . . . . . . . 16-9
Figure 16-3. IA32_DEBUGCTL MSR for Processors based
on Intel Core microarchitecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-15
Figure 16-4. 64-bit Address Layout of LBR MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-20
Figure 16-5. DS Save Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-23
Figure 16-6. 32-bit Branch Trace Record Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-24
Figure 16-7. PEBS Record Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-25
Figure 16-8. IA-32e Mode DS Save Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-26
Figure 16-9. 64-bit Branch Trace Record Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-27
Figure 16-10. 64-bit PEBS Record Format. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-27
Figure 16-11. IA32_DEBUGCTL MSR for Processors based
on Intel microarchitecture (Nehalem). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-34
Figure 16-12. MSR_DEBUGCTLA MSR for Pentium 4 and Intel Xeon Processors . . . . . . . . . . . . . 16-38
Figure 16-13. LBR MSR Branch Record Layout for the Pentium 4
and Intel Xeon Processor Family . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-40
Figure 16-14. IA32_DEBUGCTL MSR for Intel Core Solo
and Intel Core Duo Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-42
Figure 16-15. LBR Branch Record Layout for the Intel Core Solo
and Intel Core Duo Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-43
Figure 16-16. MSR_DEBUGCTLB MSR for Pentium M Processors. . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-44
Figure 16-17. LBR Branch Record Layout for the Pentium M Processor . . . . . . . . . . . . . . . . . . . . . 16-45
Figure 16-18. DEBUGCTLMSR Register (P6 Family Processors) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-46
Figure 17-1. Real-Address Mode Address Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-4
Figure 17-2. Interrupt Vector Table in Real-Address Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-7
Figure 17-3. Entering and Leaving Virtual-8086 Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-13
Figure 17-4. Privilege Level 0 Stack After Interrupt or
Exception in Virtual-8086 Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-19
Figure 17-5. Software Interrupt Redirection Bit Map in TSS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-27
Figure 18-1. Stack after Far 16- and 32-Bit Calls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-6
Figure 19-1. I/O Map Base Address Differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-34
Figure 20-1. Interaction of a Virtual-Machine Monitor and Guests . . . . . . . . . . . . . . . . . . . . . . . . . . 20-3
Figure 21-1. States of VMCS X . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-3
Figure 25-1. Formats of EPTP and EPT Paging-Structure Entries. . . . . . . . . . . . . . . . . . . . . . . . . . 25-11
Figure 26-1. SMRAM Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-6
Figure 26-2. SMM Revision Identifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-18
Figure 26-3. Auto HALT Restart Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-19
Figure 26-4. SMBASE Relocation Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-20
Figure 26-5. I/O Instruction Restart Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-21
Figure 27-1. VMX Transitions and States of VMCS in a Logical Processor . . . . . . . . . . . . . . . . . . . 27-4
Figure 28-1. Virtual TLB Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-7
Figure 29-1. Host External Interrupts and Guest Virtual Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . 29-5
Figure 30-1. Layout of IA32_PERFEVTSELx MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-4
Figure 30-2. Layout of IA32_FIXED_CTR_CTRL MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-7
Figure 30-3. Layout of IA32_PERF_GLOBAL_CTRL MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-8
Figure 30-4. Layout of IA32_PERF_GLOBAL_STATUS MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-9
Figure 30-5. Layout of IA32_PERF_GLOBAL_OVF_CTRL MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-9
Figure 30-6. Layout of IA32_PERFEVTSELx MSRs Supporting Architectural Performance
Monitoring Version 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-10
Figure 30-7. Layout of IA32_FIXED_CTR_CTRL MSR Supporting Architectural Performance
Monitoring Version 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-11
Figure 30-8. Layout of Global Performance Monitoring Control MSR . . . . . . . . . . . . . . . . . . . . . . 30-12
Figure 30-9. Layout of MSR_PERF_FIXED_CTR_CTRL MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-19
Figure 30-10. Layout of MSR_PERF_GLOBAL_CTRL MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-20
Figure 30-11. Layout of MSR_PERF_GLOBAL_STATUS MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-20
Figure 30-12. Layout of MSR_PERF_GLOBAL_OVF_CTRL MSR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-21
Figure 30-13. Layout of IA32_PEBS_ENABLE MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-28
Figure 30-14. PEBS Programming Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-30
Figure 30-15. Layout of MSR_PEBS_LD_LAT MSR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-34
Figure 30-16. Layout of MSR_OFFCORE_RSP_0 and MSR_OFFCORE_RSP_1 to Configure Off-core
Response Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-35
Figure 30-17. Layout of MSR_UNCORE_PERF_GLOBAL_CTRL MSR. . . . . . . . . . . . . . . . . . . . . . . . . 30-38
Figure 30-18. Layout of MSR_UNCORE_PERF_GLOBAL_STATUS MSR. . . . . . . . . . . . . . . . . . . . . . 30-39
Figure 30-19. Layout of MSR_UNCORE_PERF_GLOBAL_OVF_CTRL MSR . . . . . . . . . . . . . . . . . . . 30-39
Figure 30-20. Layout of MSR_UNCORE_PERFEVTSELx MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-40
Figure 30-21. Layout of MSR_UNCORE_FIXED_CTR_CTRL MSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-41
Figure 30-22. Layout of MSR_UNCORE_ADDR_OPCODE_MATCH MSR . . . . . . . . . . . . . . . . . . . . . . 30-42
Figure 30-23. Distributed Units of the Uncore of Intel Xeon Processor 7500 Series. . . . . . . . . 30-44
Figure 30-24. Event Selection Control Register (ESCR) for Pentium 4
and Intel Xeon Processors without Intel HT Technology Support . . . . . . . . . . . . . 30-51
Figure 30-25. Performance Counter (Pentium 4 and Intel Xeon Processors) . . . . . . . . . . . . . . . . 30-53
Figure 30-26. Counter Configuration Control Register (CCCR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-54
Figure 30-27. Effects of Edge Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-60
Figure 30-28. Event Selection Control Register (ESCR) for the Pentium 4 Processor, Intel Xeon
Processor and Intel Xeon Processor MP Supporting Hyper-Threading Technology . . 30-71
Figure 30-29. Counter Configuration Control Register (CCCR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-73
Figure 30-30. Layout of IA32_PERF_CAPABILITIES MSR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-81
Figure 30-31. Block Diagram of 64-bit Intel Xeon Processor MP with 8-MByte L3. . . . . . . . . . . 30-82
Figure 30-32. MSR_IFSB_IBUSQx, Addresses: 107CCH and 107CDH. . . . . . . . . . . . . . . . . . . . . . . . 30-83
Figure 30-33. MSR_IFSB_ISNPQx, Addresses: 107CEH and 107CFH. . . . . . . . . . . . . . . . . . . . . . . . 30-84
Figure 30-34. MSR_EFSB_DRDYx, Addresses: 107D0H and 107D1H. . . . . . . . . . . . . . . . . . . . . . . 30-85
Figure 30-35. MSR_IFSB_CTL6, Address: 107D2H;
MSR_IFSB_CNTR7, Address: 107D3H. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-86
Figure 30-36. Block Diagram of Intel Xeon Processor 7400 Series . . . . . . . . . . . . . . . . . . . . . . . . . 30-87
Figure 30-37. Block Diagram of Intel Xeon Processor 7100 Series . . . . . . . . . . . . . . . . . . . . . . . . . 30-88
Figure 30-38. MSR_EMON_L3_CTR_CTL0/1, Addresses: 107CCH/107CDH . . . . . . . . . . . . . . . . . 30-90
Figure 30-39. MSR_EMON_L3_CTR_CTL2/3, Addresses: 107CEH/107CFH. . . . . . . . . . . . . . . . . . 30-93
Figure 30-40. MSR_EMON_L3_CTR_CTL4/5/6/7, Addresses: 107D0H-107D3H. . . . . . . . . . . . . 30-94
Figure 30-41. PerfEvtSel0 and PerfEvtSel1 MSRs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-97
Figure 30-42. CESR MSR (Pentium Processor Only). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-101
Figure C-1. MP System With Multiple Pentium III Processors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-3
TABLES
Table 2-1. Action Taken By x87 FPU Instructions for Different
Combinations of EM, MP, and TS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-21
Table 2-2. Summary of System Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-28
Table 3-1. Code- and Data-Segment Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-17
Table 3-2. System-Segment and Gate-Descriptor Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-19
Table 4-1. Properties of Different Paging Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3
Table 4-2. Paging Structures in the Different Paging Modes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-9
Table 4-3. Use of CR3 with 32-Bit Paging. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10
Table 4-4. Format of a 32-Bit Page-Directory Entry that Maps a 4-MByte Page. . . . . . . . . . . 4-12
Table 4-5. Format of a 32-Bit Page-Directory Entry that References a Page Table. . . . . . . . 4-13
Table 4-6. Format of a 32-Bit Page-Table Entry that Maps a 4-KByte Page. . . . . . . . . . . . . . . 4-14
Table 4-7. Use of CR3 with PAE Paging. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-16
Table 4-8. Format of a PAE Page-Directory-Pointer-Table Entry (PDPTE) . . . . . . . . . . . . . . . . . 4-17
Table 4-9. Format of a PAE Page-Directory Entry that Maps a 2-MByte Page . . . . . . . . . . . . . 4-20
Table 4-10. Format of a PAE Page-Directory Entry that References a Page Table . . . . . . . . . . 4-21
Table 4-11. Format of a PAE Page-Table Entry that Maps a 4-KByte Page . . . . . . . . . . . . . . . . . 4-22
Table 4-12. Use of CR3 with IA-32e Paging and CR3.PCIDE = 0. . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-24
Table 4-13. Use of CR3 with IA-32e Paging and CR3.PCIDE = 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-25
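Table 4-14. Format of an IA-32e PML4 Entry (PML4E) that References a Page-Directory-Pointer
Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-29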
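Table 4-15. Format of an IA-32e Page-Directory-Pointer-Table Entry (PDPTE) that Maps a
1-GByte Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-29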
Table 4-17. Format of an IA-32e Page-Directory Entry that Maps a 2-MByte Page . . . . . . . . . 4-31
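Table 4-16. Format of an IA-32e Page-Directory-Pointer-Table Entry (PDPTE) that References a
Page Directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-31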
Table 4-18. Format of an IA-32e Page-Directory Entry that References a Page Table . . . . . . 4-33
Table 4-19. Format of an IA-32e Page-Table Entry that Maps a 4-KByte Page . . . . . . . . . . . . . 4-34
Table 5-1. Privilege Check Rules for Call Gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-23
Table 5-2. 64-Bit-Mode Stack Layout After CALLF with CPL Change. . . . . . . . . . . . . . . . . . . . . . 5-28
Table 5-3. Combined Page-Directory and Page-Table Protection . . . . . . . . . . . . . . . . . . . . . . . . . 5-42
Table 5-4. Extended Feature Enable MSR (IA32_EFER) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-43
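Table 5-5. IA-32e Mode Page Level Protection Matrix
with Execute-Disable Bit Capability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-44
Table 5-6. Legacy PAE-Enabled 4-KByte Page Level Protection Matrix
with Execute-Disable Bit Capability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-45
Table 5-7. Legacy PAE-Enabled 2-MByte Page Level Protection
with Execute-Disable Bit Capability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-45
Table 5-8. IA-32e Mode Page Level Protection Matrix with Execute-Disable Bit Capability
Enabled . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-46
Table 5-9. Reserved Bit Checking With Execute-Disable Bit Capability Not Enabled . . . . . . . . 5-47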
Table 6-1. Protected-Mode Exceptions and Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-3
Table 6-2. Priority Among Simultaneous Exceptions and Interrupts . . . . . . . . . . . . . . . . . . . . . . 6-11
Table 6-3. Debug Exception Conditions and Corresponding Exception Classes. . . . . . . . . . . . . 6-29
Table 6-4. Interrupt and Exception Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-38
Table 6-5. Conditions for Generating a Double Fault . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-39
Table 6-6. Invalid TSS Conditions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-42
Table 6-7. Alignment Requirements by Data Type. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-60
Table 6-8. SIMD Floating-Point Exceptions Priority . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-65
Table 7-1. Exception Conditions Checked During a Task Switch . . . . . . . . . . . . . . . . . . . . . . . . . . 7-15
Table 7-2. Effect of a Task Switch on Busy Flag, NT Flag,
Previous Task Link Field, and TS Flag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-17
Table 8-1. Initial APIC IDs for the Logical Processors in a System that has Four Intel Xeon MP
Processors Supporting Intel Hyper-Threading Technology . . . . . . . . . . . . . . . . . . . . . 8-53
Table 8-2. Initial APIC IDs for the Logical Processors in a System that has Two Physical
Processors Supporting Dual-Core and Intel Hyper-Threading Technology . . . . . . . 8-54
Table 8-3. Example of Possible x2APIC ID Assignment in a System that has Two Physical
Processors Supporting x2APIC and Intel Hyper-Threading Technology . . . . . . . . . 8-54
Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT . . . . . . . . . . . . . . . . . . . . . 9-2
Table 9-2. Recommended Settings of EM and MP Flags on IA-32 Processors . . . . . . . . . . . . . . . 9-7
Table 9-3. Software Emulation Settings of EM, MP, and NE Flags . . . . . . . . . . . . . . . . . . . . . . . . . . 9-8
Table 9-4. Main Initialization Steps in STARTUP.ASM Source Listing . . . . . . . . . . . . . . . . . . . . . . 9-21
Table 9-5. Relationship Between BLD Item and ASM Source File . . . . . . . . . . . . . . . . . . . . . . . . . 9-35
Table 9-6. Microcode Update Field Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-38
Table 9-7. Microcode Update Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-40
Table 9-8. Extended Processor Signature Table Header Structure . . . . . . . . . . . . . . . . . . . . . . . . 9-41
Table 9-9. Processor Signature Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-41
Table 9-10. Processor Flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-43
Table 9-11. Microcode Update Signature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-48
Table 9-12. Microcode Update Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-55
Table 9-13. Parameters for the Presence Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-56
Table 9-14. Parameters for the Write Update Data Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-57
Table 9-15. Parameters for the Control Update Sub-function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-62
Table 9-17. Parameters for the Read Microcode Update Data Function . . . . . . . . . . . . . . . . . . . . 9-63
Table 9-16. Mnemonic Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-63
Table 9-18. Return Code Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-65
Table 10-1. Local APIC Register Address Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-8
Table 10-2. ESR Flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-21
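Table 10-3. Valid Combinations for the Pentium 4 and Intel Xeon Processors
Local xAPIC Interrupt Command Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-28
Table 10-4. Valid Combinations for the P6 Family Processors
Local APIC Interrupt Command Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-29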
Table 10-5. x2APIC Operating Mode Configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-51
Table 10-6. Local APIC Register Address Map Supported by x2APIC. . . . . . . . . . . . . . . . . . . . . . 10-52
Table 10-7. MSR/MMIO Interface of a Local x2APIC in Different Modes of Operation . . . . . . 10-56
Table 11-1. Characteristics of the Caches, TLBs, Store Buffer, and
Write Combining Buffer in Intel 64 and IA-32 Processors . . . . . . . . . . . . . . . . . . . . . 11-2
Table 11-2. Memory Types and Their Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-9
Table 11-3. Methods of Caching Available in Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium
M, Pentium 4, Intel Xeon, P6 Family, and Pentium Processors . . . . . . . . . . . . . . . . 11-10
Table 11-4. MESI Cache Line States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-14
Table 11-5. Cache Operating Modes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-17
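Table 11-6. Effective Page-Level Memory Type for Pentium Pro and
Pentium II Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-21
Table 11-7. Effective Page-Level Memory Types for Pentium III and More Recent Processor
Families . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-22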
Table 11-8. Memory Types That Can Be Encoded in MTRRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-30
Table 11-9. Address Mapping for Fixed-Range MTRRs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-35
Table 11-10. Memory Types That Can Be Encoded With PAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-49
Table 11-11. Selection of PAT Entries with PAT, PCD, and PWT Flags . . . . . . . . . . . . . . . . . . . . . 11-50
Table 11-12. Memory Type Setting of PAT Entries Following a Power-up or Reset. . . . . . . . . 11-50
Table 12-1. Action Taken By MMX Instructions
for Different Combinations of EM, MP and TS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-1
Table 12-2. Effects of MMX Instructions on x87 FPU State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-3
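Table 12-3. Effect of the MMX, x87 FPU, and FXSAVE/FXRSTOR Instructions on the
x87 FPU Tag Word . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-4
Table 13-1. Action Taken for Combinations of OSFXSR, OSXMMEXCPT, SSE, SSE2, SSE3, EM, MP,
and TS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-4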
Table 13-2. Action Taken for Combinations of OSFXSR, SSSE3, SSE4, EM, and TS . . . . . . . . . . 13-5
Table 13-3. XSAVE Header Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-14
Table 13-4. XRSTOR Action on MXCSR, x87 FPU, XMM Register. . . . . . . . . . . . . . . . . . . . . . . . . . 13-16
Table 13-5. XSAVE Action on MXCSR, x87 FPU, XMM Register . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-16
Table 14-1. On-Demand Clock Modulation Duty Cycle Field Encoding. . . . . . . . . . . . . . . . . . . . . . 14-17
Table 15-1. Bits 54:53 in IA32_MCi_STATUS MSRs
when IA32_MCG_CAP[11] = 1 and UC = 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-9
Table 15-2. Overwrite Rules for Enabled Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-11
Table 15-3. Address Mode in IA32_MCi_MISC[8:6] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-13
Table 15-4. Extended Machine Check State MSRs in Processors Without Support for Intel 64 Architecture . . . . . 15-15
Table 15-5. Extended Machine Check State MSRs In Processors With Support For Intel 64 Architecture . . . . . 15-16
Table 15-6. MC Error Classifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-26
Table 15-7. Overwrite Rules for UC, CE, and UCR Errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-27
Table 15-8. IA32_MCi_Status [15:0] Simple Error Code Encoding . . . . . . . . . . . . . . . . . . . . . . . . . 15-30
Table 15-9. IA32_MCi_Status [15:0] Compound Error Code Encoding . . . . . . . . . . . . . . . . . . . . . 15-31
Table 15-10. Encoding for TT (Transaction Type) Sub-Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-32
Table 15-11. Level Encoding for LL (Memory Hierarchy Level) Sub-Field . . . . . . . . . . . . . . . . . . . 15-32
Table 15-12. Encoding of Request (RRRR) Sub-Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-32
Table 15-13. Encodings of PP, T, and II Sub-Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-33
Table 15-14. Encodings of MMM and CCCC Sub-Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-34
Table 15-15. MCA Compound Error Code Encoding for SRAO Errors . . . . . . . . . . . . . . . . . . . . . . . . 15-35
Table 15-16. IA32_MCi_STATUS Values for SRAO Errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-35
Table 15-17. IA32_MCG_STATUS Flag Indication for SRAO Errors . . . . . . . . . . . . . . . . . . . . . . . . . 15-36
Table 15-18. MCA Compound Error Code Encoding for SRAR Errors . . . . . . . . . . . . . . . . . . . . . . . . 15-36
Table 15-19. IA32_MCi_STATUS Values for SRAR Errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-37
Table 15-20. IA32_MCG_STATUS Flag Indication for SRAR Errors. . . . . . . . . . . . . . . . . . . . . . . . . . 15-37
Table 16-1. Breakpoint Examples. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-7
Table 16-2. Debug Exception Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-10
Table 16-3. LBR Stack Size and TOS Pointer Range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-19
Table 16-4. IA32_DEBUGCTL Flag Encodings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-29
Table 16-5. CPL-Qualified Branch Trace Store Encodings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-30
Table 16-6. IA32_LASTBRANCH_x_FROM_IP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-35
Table 16-7. IA32_LASTBRANCH_x_TO_IP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-35
Table 16-8. LBR Stack Size and TOS Pointer Range. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-35
Table 16-9. MSR_LBR_SELECT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-35
Table 16-10. LBR MSR Stack Size and TOS Pointer Range for the Pentium 4 and the Intel Xeon Processor Family . . . . . 16-39
Table 17-1. Real-Address Mode Exceptions and Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-8
Table 17-2. Software Interrupt Handling Methods While in Virtual-8086 Mode. . . . . . . . . . . . 17-26
Table 18-1. Characteristics of 16-Bit and 32-Bit Program Modules . . . . . . . . . . . . . . . . . . . . . . . . 18-1
Table 19-1. New Instruction in the Pentium Processor and Later IA-32 Processors . . . . . 19-6
Table 19-2. Recommended Values of the EM, MP, and NE Flags for Intel486 SX Microprocessor/Intel 487 SX Math Coprocessor System . . . . . 19-22
Table 19-3. EM and MP Flag Interpretation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-23
Table 21-1. Format of the VMCS Region. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .21-3
Table 21-2. Format of Access Rights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .21-6
Table 21-3. Format of Interruptibility State. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .21-8
Table 21-4. Format of Pending-Debug-Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .21-9
Table 21-5. Definitions of Pin-Based VM-Execution Controls. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-12
Table 21-6. Definitions of Primary Processor-Based VM-Execution Controls . . . . . . . . . . . . . . 21-13
Table 21-7. Definitions of Secondary Processor-Based VM-Execution Controls . . . . . . . . . . . 21-15
Table 21-8. Format of Extended-Page-Table Pointer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-20
Table 21-9. Definitions of VM-Exit Controls. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-21
Table 21-10. Format of an MSR Entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-23
Table 21-11. Definitions of VM-Entry Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-24
Table 21-12. Format of the VM-Entry Interruption-Information Field . . . . . . . . . . . . . . . . . . . . . . 21-25
Table 21-13. Format of Exit Reason. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-27
Table 21-14. Format of the VM-Exit Interruption-Information Field. . . . . . . . . . . . . . . . . . . . . . . . 21-28
Table 21-15. Format of the IDT-Vectoring Information Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-29
Table 21-16. Structure of VMCS Component Encoding. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-32
Table 24-1. Exit Qualification for Debug Exceptions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .24-6
Table 24-2. Exit Qualification for Task Switch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .24-6
Table 24-3. Exit Qualification for Control-Register Accesses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .24-8
Table 24-4. Exit Qualification for MOV DR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .24-9
Table 24-5. Exit Qualification for I/O Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .24-9
Table 24-6. Exit Qualification for APIC-Access VM Exits from Linear Accesses and Guest-Physical Accesses . . . . . 24-10
Table 24-7. Exit Qualification for EPT Violations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-11
Table 24-8. Format of the VM-Exit Instruction-Information Field as Used for INS and OUTS . . . . . 24-18
Table 24-9. Format of the VM-Exit Instruction-Information Field as Used for LIDT, LGDT, SIDT, or SGDT . . . . . 24-19
Table 24-10. Format of the VM-Exit Instruction-Information Field as Used for LLDT, LTR, SLDT, and STR . . . . . 24-21
Table 24-11. Format of the VM-Exit Instruction-Information Field as Used for VMCLEAR, VMPTRLD, VMPTRST, and VMXON . . . . . 24-22
Table 24-12. Format of the VM-Exit Instruction-Information Field as Used for VMREAD and VMWRITE . . . . . 24-23
Table 24-13. Format of the VM-Exit Instruction-Information Field as Used for INVEPT and INVVPID . . . . . 24-25
Table 25-1. Format of an EPT PML4 Entry (PML4E) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .25-5
Table 25-2. Format of an EPT Page-Directory-Pointer-Table Entry (PDPTE) that Maps a 1-GByte Page . . . . . 25-6
Table 25-3. Format of an EPT Page-Directory-Pointer-Table Entry (PDPTE) that References an EPT Page Directory . . . . . 25-7
Table 25-4. Format of an EPT Page-Directory Entry (PDE) that Maps a 2-MByte Page . . . . . 25-8
Table 25-5. Format of an EPT Page-Directory Entry (PDE) that References an EPT Page Table . . . . . 25-9
Table 25-6. Format of an EPT Page-Table Entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-10
Table 26-1. SMRAM State Save Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .26-6
Table 26-2. Processor Signatures and 64-bit SMRAM State Save Map Format. . . . . . . . . . . . . .26-9
Table 26-3. SMRAM State Save Map for Intel 64 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . .26-9
Table 26-4. Processor Register Initialization in SMM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-13
Table 26-5. I/O Instruction Information in the SMM State Save Map . . . . . . . . . . . . . . . . . . . . . . 26-16
Table 26-6. I/O Instruction Type Encodings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-17
Table 26-7. Auto HALT Restart Flag Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-19
Table 26-8. I/O Instruction Restart Field Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-21
Table 26-9. Exit Qualification for SMIs That Arrive Immediately After the Retirement of an I/O Instruction . . . . . 26-28
Table 26-10. Format of MSEG Header . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-35
Table 27-1. Operating Modes for Host and Guest Environments. . . . . . . . . . . . . . . . . . . . . . . . . . 27-18
Table 30-1. UMask and Event Select Encodings for Pre-Defined Architectural Performance Events . . . . . 30-13
Table 30-2. Core Specificity Encoding within a Non-Architectural Umask. . . . . . . . . . . . . . . . . . 30-15
Table 30-3. Agent Specificity Encoding within a Non-Architectural Umask . . . . . . . . . . . . . . . . 30-15
Table 30-4. HW Prefetch Qualification Encoding within a Non-Architectural Umask. . . . . . . . 30-16
Table 30-5. MESI Qualification Definitions within a Non-Architectural Umask. . . . . . . . . . . . . . 30-16
Table 30-6. Bus Snoop Qualification Definitions within a Non-Architectural Umask . . . . . . . . 30-17
Table 30-7. Snoop Type Qualification Definitions within a Non-Architectural Umask. . . . . . . 30-17
Table 30-8. Association of Fixed-Function Performance Counters with Architectural Performance Events . . . . . 30-18
Table 30-10. PEBS Performance Events for Intel Core Microarchitecture. . . . . . . . . . . . . . . . . . . 30-22
Table 30-9. At-Retirement Performance Events for Intel Core Microarchitecture. . . . . . . . . . 30-22
Table 30-11. Requirements to Program PEBS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-24
Table 30-12. PEBS Record Format for Intel Core i7 Processor Family . . . . . . . . . . . . . . . . . . . . . . 30-29
Table 30-13. Data Source Encoding for Load Latency Record. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-33
Table 30-14. Off-Core Response Event Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-35
Table 30-15. MSR_OFFCORE_RSP_Z Bit Field Definition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-35
Table 30-16. Opcode Field Encoding for MSR_UNCORE_ADDR_OPCODE_MATCH. . . . . . . . . . . . 30-42
Table 30-17. Uncore PMU MSR Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-44
Table 30-18. Performance Counter MSRs and Associated CCCR and ESCR MSRs (Pentium 4 and Intel Xeon Processors) . . . . . 30-47
Table 30-19. Event Example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-56
Table 30-20. CCR Names and Bit Positions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-62
Table 30-21. Effect of Logical Processor and CPL Qualification for Logical-Processor-Specific (TS) Events . . . . . 30-75
Table 30-22. Effect of Logical Processor and CPL Qualification for Non-logical-Processor-specific (TI) Events . . . . . 30-76
Table A-1. Architectural Performance Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-2
Table A-2. Non-Architectural Performance Events In the Processor Core for Intel Core i7 Processor and Intel Xeon Processor 5500 Series . . . . . A-3
Table A-3. Non-Architectural Performance Events In the Processor Uncore for Intel Core i7 Processor and Intel Xeon Processor 5500 Series . . . . . A-32
Table A-4. Non-Architectural Performance Events In Next Generation Processor Core (Codenamed Westmere) . . . . . A-54
Table A-5. Non-Architectural Performance Events In the Processor Uncore for Next Generation Intel Processor (Codenamed Westmere) . . . . . A-82
Table A-6. Non-Architectural Performance Events for Processors based on Enhanced Intel Core Microarchitecture . . . . . A-108
Table A-7. Fixed-Function Performance Counter and Pre-defined Performance Events . . . . . A-109
Table A-8. Non-Architectural Performance Events in Processors Based on Intel Core Microarchitecture . . . . . A-110
Table A-9. Non-Architectural Performance Events for Intel Atom Processors . . . . . . . . . . . . A-153
Table A-10. Non-Architectural Performance Events in Intel Core Solo and Intel Core Duo Processors . . . . . A-176
Table A-11. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting . . . . . A-185
Table A-12. Performance Monitoring Events For Intel NetBurst Microarchitecture for At-Retirement Counting . . . . . A-217
Table A-13. Intel NetBurst Microarchitecture Model-Specific Performance Monitoring Events (For Model Encoding 3, 4 or 6) . . . . . A-224
Table A-15. List of Metrics Available for Execution Tagging (For Execution Event Only) . . . . . A-225
Table A-14. List of Metrics Available for Front_end Tagging (For Front_end Event Only) . . . . . A-225
Table A-16. List of Metrics Available for Replay Tagging (For Replay Event Only) . . . . . A-226
Table A-17. Event Mask Qualification for Logical Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-228
Table A-18. Performance Monitoring Events on Intel Pentium M Processors . . . . . A-234
Table A-19. Performance Monitoring Events Modified on Intel Pentium M Processors . . . . . A-236
Table A-20. Events That Can Be Counted with the P6 Family Performance-Monitoring Counters . . . . . A-238
Table A-21. Events That Can Be Counted with Pentium Processor Performance-Monitoring Counters . . . . . A-255
Table B-1. CPUID Signature Values of DisplayFamily_DisplayModel . . . . . . . . . . . . . . . . . . . . . . . . B-1
Table B-2. IA-32 Architectural MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3
Table B-3. MSRs in Processors Based on Intel Core Microarchitecture . . . . . . . . . . . . . . . . . . . . .B-41
Table B-4. MSRs in Intel Atom Processor Family . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .B-61
Table B-5. MSRs in Processors Based on Intel Microarchitecture codename Nehalem . . . . . .B-76
Table B-6. Additional MSRs in Intel Xeon Processor 5500 and 3400 Series. . . . . . . . . . . . . . . .B-95
Table B-7. Additional MSRs in Intel Xeon Processor 7500 Series. . . . . . . . . . . . . . . . . . . . . . . . . .B-98
Table B-8. Additional MSRs supported by Next Generation Intel Processors (Codenamed Westmere) . . . . . B-120
Table B-9. MSRs in the Pentium 4 and Intel Xeon Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . B-121
Table B-10. MSRs Unique to 64-bit Intel Xeon Processor MP with Up to an 8 MB L3 Cache . . . . . B-161
Table B-11. MSRs Unique to Intel Xeon Processor 7100 Series . . . . . . . . . . . . . . . . . . . . . . . . . . B-163
Table B-12. MSRs in Intel Core Solo, Intel Core Duo Processors, and Dual-Core Intel Xeon Processor LV . . . . . B-164
Table B-13. MSRs in Pentium M Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-178
Table B-14. MSRs in the P6 Family Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-187
Table B-15. MSRs in the Pentium Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-199
Table C-1. Boot Phase IPI Message Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-2
Table E-1. CPUID DisplayFamily_DisplayModel Signatures for Family 6 . . . . . . . . . . . . . . . . . . . . E-1
Table E-2. Incremental Decoding Information: Processor Family 06H Machine Error Codes For Machine Check . . . . . E-2
Table E-3. CPUID DisplayFamily_DisplayModel Signatures for Processors Based on Intel Core Microarchitecture . . . . . E-5
Table E-4. Incremental Bus Error Codes of Machine Check for Processors Based on Intel Core Microarchitecture . . . . . E-6
Table E-5. Incremental MCA Error Code Types for Intel Xeon Processor 7400. . . . . . . . . . . . . . E-9
Table E-6. Type B Bus and Interconnect Error Codes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-10
Table E-7. Type C Cache Bus Controller Error Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-10
Table E-8. QPI Machine Check Error codes for IA32_MC0_STATUS and IA32_MC1_STATUS . . . . . E-12
Table E-9. QPI Machine Check Error codes for IA32_MC0_MISC and IA32_MC1_MISC. . . . . . . E-13
Table E-10. Machine Check Error codes for IA32_MC7_STATUS. . . . . . . . . . . . . . . . . . . . . . . . . . . . E-13
Table E-11. Incremental Memory Controller Error Codes of Machine Check for IA32_MC8_STATUS . . . . . E-14
Table E-12. Incremental Memory Controller Error Codes of Machine Check for IA32_MC8_MISC . . . . . E-15
Table E-13. Incremental Decoding Information: Processor Family 0FH Machine Error Codes For Machine Check . . . . . E-15
Table E-14. MCi_STATUS Register Bit Definition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-17
Table E-15. Incremental MCA Error Code for Intel Xeon Processor MP 7100 . . . . . . . . . . . . . . . E-18
Table E-16. Other Information Field Bit Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-20
Table E-17. Type A: L3 Error Codes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-21
Table E-18. Type B Bus and Interconnect Error Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-22
Table E-19. Type C Cache Bus Controller Error Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-23
Table E-20. Decoding Family 0FH Machine Check Codes for Cache Hierarchy Errors . . . . . . . . E-24
Table F-1. EOI Message (14 Cycles). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .F-1
Table F-2. Short Message (21 Cycles) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .F-2
Table F-3. Non-Focused Lowest Priority Message (34 Cycles). . . . . . . . . . . . . . . . . . . . . . . . . . . . . .F-3
Table F-4. APIC Bus Status Cycles Interpretation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .F-5
Table G-1. Memory Types Used For VMCS Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-2
Table H-1. Encoding for 16-Bit Control Fields (0000_00xx_xxxx_xxx0B) . . . . . . . . . . . . . . . . . H-1
Table H-2. Encodings for 16-Bit Guest-State Fields (0000_10xx_xxxx_xxx0B) . . . . . . . . . . . . H-1
Table H-3. Encodings for 16-Bit Host-State Fields (0000_11xx_xxxx_xxx0B) . . . . . . . . . . . . . H-2
Table H-4. Encodings for 64-Bit Control Fields (0010_00xx_xxxx_xxxAb). . . . . . . . . . . . . . . . . H-3
Table H-5. Encodings for 64-Bit Read-Only Data Field (0010_01xx_xxxx_xxxAb) . . . . . . . . . H-4
Table H-6. Encodings for 64-Bit Guest-State Fields (0010_10xx_xxxx_xxxAb) . . . . . . . . . . . . H-4
Table H-7. Encodings for 64-Bit Host-State Fields (0010_11xx_xxxx_xxxAb) . . . . . . . . . . . . . H-5
Table H-8. Encodings for 32-Bit Control Fields (0100_00xx_xxxx_xxx0B) . . . . . . . . . . . . . . . . H-6
Table H-9. Encodings for 32-Bit Read-Only Data Fields (0100_01xx_xxxx_xxx0B) . . . . . . . . H-7
Table H-10. Encodings for 32-Bit Guest-State Fields (0100_10xx_xxxx_xxx0B) . . . . . H-7
Table H-11. Encoding for 32-Bit Host-State Field (0100_11xx_xxxx_xxx0B) . . . . . . . . . . . . . . . H-9
Table H-12. Encodings for Natural-Width Control Fields (0110_00xx_xxxx_xxx0B) . . . . . . . . . H-9
Table H-13. Encodings for Natural-Width Read-Only Data Fields (0110_01xx_xxxx_xxx0B) H-10
Table H-14. Encodings for Natural-Width Guest-State Fields (0110_10xx_xxxx_xxx0B) . . . H-10
Table H-15. Encodings for Natural-Width Host-State Fields (0110_11xx_xxxx_xxx0B) . . . . H-11
Table I-1. Basic Exit Reasons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . I-1
CHAPTER 1
ABOUT THIS MANUAL
The Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A:
System Programming Guide, Part 1 (order number 253668) and the Intel 64 and
IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming
Guide, Part 2 (order number 253669) are part of a set that describes the
architecture and programming environment of Intel 64 and IA-32 Architecture
processors. The other volumes in this set are:
Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture (order number 253665).
Intel 64 and IA-32 Architectures Software Developer's Manual, Volumes 2A & 2B: Instruction Set Reference (order numbers 253666 and 253667).

The Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1,
describes the basic architecture and programming environment of Intel 64 and
IA-32 processors. The Intel 64 and IA-32 Architectures Software Developer's
Manual, Volumes 2A & 2B, describe the instruction set of the processor and the
opcode structure. These volumes apply to application programmers and to
programmers who write operating systems or executives. The Intel 64 and IA-32
Architectures Software Developer's Manual, Volumes 3A & 3B, describe the
operating-system support environment of Intel 64 and IA-32 processors. These
volumes target operating-system and BIOS designers. In addition, the Intel 64
and IA-32 Architectures Software Developer's Manual, Volume 3B, addresses the
programming environment for classes of software that host operating systems.
1.1 PROCESSORS COVERED IN THIS MANUAL
This manual set includes information pertaining primarily to the most recent
Intel 64 and IA-32 processors, which include:
Pentium processors
P6 family processors
Pentium 4 processors
Pentium M processors
Intel Xeon processors
Pentium D processors
Pentium processor Extreme Editions
64-bit Intel Xeon processors
Intel Core Duo processor
Intel Core Solo processor
Dual-Core Intel Xeon processor LV
Intel Core2 Duo processor
Intel Core2 Quad processor Q6000 series
Intel Xeon processor 3000, 3200 series
Intel Xeon processor 5000 series
Intel Xeon processor 5100, 5300 series
Intel Core2 Extreme processor X7000 and X6800 series
Intel Core2 Extreme QX6000 series
Intel Xeon processor 7100 series
Intel Pentium Dual-Core processor
Intel Xeon processor 7200, 7300 series
Intel Core2 Extreme QX9000 series
Intel Xeon processor 5200, 5400, 7400 series
Intel Core2 Extreme processor QX9000 and X9000 series
Intel Core2 Quad processor Q9000 series
Intel Core2 Duo processor E8000, T9000 series
Intel Atom processor family
Intel Core i7 processor
Intel Core i5 processor
P6 family processors are IA-32 processors based on the P6 family
microarchitecture. This includes the Pentium Pro, Pentium II, Pentium III, and
Pentium III Xeon processors.

The Pentium 4, Pentium D, and Pentium processor Extreme Editions are based on
the Intel NetBurst microarchitecture. Most early Intel Xeon processors are
based on the Intel NetBurst microarchitecture. The Intel Xeon processor 5000
and 7100 series are also based on the Intel NetBurst microarchitecture.

The Intel Core Duo, Intel Core Solo, and dual-core Intel Xeon processor LV are
based on an improved Pentium M processor microarchitecture.

The Intel Xeon processor 3000, 3200, 5100, 5300, 7200, and 7300 series, Intel
Pentium dual-core, Intel Core2 Duo, Intel Core2 Quad, and Intel Core2 Extreme
processors are based on Intel Core microarchitecture.

The Intel Xeon processor 5200, 5400, and 7400 series, Intel Core2 Quad
processor Q9000 series, Intel Core2 Extreme processor QX9000 and X9000 series,
and Intel Core2 processor E8000 series are based on Enhanced Intel Core
microarchitecture.

The Intel Atom processor family is based on the Intel Atom microarchitecture
and supports Intel 64 architecture.
The Intel Core i7 processor and the Intel Core i5 processor are based on the
Intel microarchitecture codename Nehalem and support Intel 64 architecture.

Processors based on the Next Generation Intel Processor, codenamed Westmere,
support Intel 64 architecture.

P6 family, Pentium M, Intel Core Solo, and Intel Core Duo processors, the
dual-core Intel Xeon processor LV, and early generations of Pentium 4 and
Intel Xeon processors support IA-32 architecture. The Intel Atom processor
Z5xx series supports IA-32 architecture.

The Intel Xeon processor 3000, 3200, 5000, 5100, 5200, 5300, 5400, 7100, 7200,
7300, and 7400 series, Intel Core2 Duo, Intel Core2 Extreme, and Intel Core2
Quad processors, Pentium D processors, the Pentium Dual-Core processor, and
newer generations of the Pentium 4 and Intel Xeon processor families support
Intel 64 architecture.

IA-32 architecture is the instruction set architecture and programming
environment for Intel's 32-bit microprocessors. Intel 64 architecture is the
instruction set architecture and programming environment that is a superset of,
and compatible with, IA-32 architecture.
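
As an illustration of the distinction between IA-32 and Intel 64 architecture,
software can test whether a processor implements Intel 64 architecture by
examining bit 29 of the EDX value returned by CPUID leaf 80000001H. The C
sketch below assumes the caller has already executed CPUID with EAX =
80000001H and passes in the resulting EDX value. The helper name
supports_intel64 and the macro name CPUID_EXT_EDX_INTEL64 are hypothetical and
are not defined by this manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 29 of EDX from CPUID leaf 80000001H: Intel 64 architecture available.
 * The macro name is illustrative only. */
#define CPUID_EXT_EDX_INTEL64 (UINT32_C(1) << 29)

/* Hypothetical helper: returns true when the supplied EDX value (as returned
 * by CPUID with EAX = 80000001H) has the Intel 64 feature bit set. */
static bool supports_intel64(uint32_t ext_edx)
{
    return (ext_edx & CPUID_EXT_EDX_INTEL64) != 0;
}
```

Keeping the check separate from the CPUID instruction itself lets the same
helper be used with a compiler intrinsic, inline assembly, or a value captured
from another system.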
1.2 OVERVIEW OF THE SYSTEM PROGRAMMING GUIDE
A description of this manual's content follows:

Chapter 1: About This Manual. Gives an overview of all five volumes of the
Intel 64 and IA-32 Architectures Software Developer's Manual. It also describes
the notational conventions in these manuals and lists related Intel manuals and
documentation of interest to programmers and hardware designers.

Chapter 2: System Architecture Overview. Describes the modes of operation used
by Intel 64 and IA-32 processors and the mechanisms provided by the
architectures to support operating systems and executives, including the
system-oriented registers and data structures and the system-oriented
instructions. The steps necessary for switching between real-address and
protected modes are also identified.

Chapter 3: Protected-Mode Memory Management. Describes the data structures,
registers, and instructions that support segmentation and paging. The chapter
explains how they can be used to implement a flat (unsegmented) memory model or
a segmented memory model.

Chapter 4: Paging. Describes the paging modes supported by Intel 64 and IA-32
processors.

Chapter 5: Protection. Describes the support for page and segment protection
provided in the Intel 64 and IA-32 architectures. This chapter also explains
the implementation of privilege rules, stack switching, pointer validation, and
user and supervisor modes.
Chapter 6: Interrupt and Exception Handling. Describes the basic interrupt
mechanisms defined in the Intel 64 and IA-32 architectures, shows how
interrupts and exceptions relate to protection, and describes how the
architecture handles each exception type. Reference information for each
exception is given at the end of this chapter.

Chapter 7: Task Management. Describes mechanisms the Intel 64 and IA-32
architectures provide to support multitasking and inter-task protection.

Chapter 8: Multiple-Processor Management. Describes the instructions and flags
that support multiple processors with shared memory, memory ordering, and Intel
Hyper-Threading Technology.

Chapter 9: Processor Management and Initialization. Defines the state of an
Intel 64 or IA-32 processor after reset initialization. This chapter also
explains how to set up an Intel 64 or IA-32 processor for real-address mode
operation and protected-mode operation, and how to switch between modes.

Chapter 10: Advanced Programmable Interrupt Controller (APIC). Describes the
programming interface to the local APIC and gives an overview of the interface
between the local APIC and the I/O APIC.

Chapter 11: Memory Cache Control. Describes the general concept of caching and
the caching mechanisms supported by the Intel 64 or IA-32 architectures. This
chapter also describes the memory type range registers (MTRRs) and how they can
be used to map memory types of physical memory. Information on using the new
cache control and memory streaming instructions introduced with the Pentium
III, Pentium 4, and Intel Xeon processors is also given.

Chapter 12: Intel MMX Technology System Programming. Describes those aspects of
the Intel MMX technology that must be handled and considered at the system
programming level, including: task switching, exception handling, and
compatibility with existing system environments.

Chapter 13: System Programming For Instruction Set Extensions And Processor
Extended States. Describes the operating system requirements to support
SSE/SSE2/SSE3/SSSE3/SSE4 extensions, including task switching, exception
handling, and compatibility with existing system environments. The latter part
of this chapter describes the extensible framework of operating system
requirements to support processor extended states. Processor extended state may
be required by instruction set extensions beyond those of
SSE/SSE2/SSE3/SSSE3/SSE4 extensions.

Chapter 14: Power and Thermal Management. Describes facilities of Intel 64 and
IA-32 architecture used for power management and thermal monitoring.

Chapter 15: Machine-Check Architecture. Describes the machine-check
architecture and machine-check exception mechanism found in the Pentium 4,
Intel Xeon, and P6 family processors. Additionally, a signaling mechanism for
software to respond to hardware-corrected machine check errors is covered.
Chapter 16 Debugging, Branch Profiles and Time-Stamp Counter.
Describes the debugging registers and other debug mechanisms provided in Intel 64
or IA-32 processors. This chapter also describes the time-stamp counter.
Chapter 17 8086 Emulation. Describes the real-address and virtual-8086
modes of the IA-32 architecture.
Chapter 18 Mixing 16-Bit and 32-Bit Code. Describes how to mix 16-bit and
32-bit code modules within the same program or task.
Chapter 19 IA-32 Architecture Compatibility. Describes architectural
compatibility among IA-32 processors.
Chapter 20 Introduction to Virtual-Machine Extensions. Describes the basic
elements of virtual machine architecture and the virtual-machine extensions for
Intel 64 and IA-32 Architectures.
Chapter 21 Virtual-Machine Control Structures. Describes components that
manage VMX operation. These include the working-VMCS pointer and the
controlling-VMCS pointer.
Chapter 22 VMX Non-Root Operation. Describes VMX non-root operation.
Processor operation in VMX non-root mode can be restricted
programmatically such that certain operations, events or conditions can cause the
processor to transfer control from the guest (running in VMX non-root mode) to the
monitor software (running in VMX root mode).
Chapter 23 VM Entries. Describes VM entries. VM entry transitions the processor
from the VMM running in VMX root-mode to a VM running in VMX non-root mode.
VM-Entry is performed by the execution of VMLAUNCH or VMRESUME instructions.
Chapter 24 VM Exits. Describes VM exits. Certain events, operations or
situations while the processor is in VMX non-root operation may cause VM-exit
transitions. In addition, VM exits can also occur on failed VM entries.
Chapter 25 VMX Support for Address Translation. Describes virtual-machine
extensions that support address translation and the virtualization of physical
memory.
Chapter 26 System Management Mode. Describes Intel 64 and IA-32
architectures' system management mode (SMM) facilities.
Chapter 27 Virtual-Machine Monitoring Programming Considerations.
Describes programming considerations for VMMs. VMMs manage virtual machines
(VMs).
Chapter 28 Virtualization of System Resources. Describes the virtualization
of the system resources. These include: debugging facilities, address translation,
physical memory, and microcode update facilities.
Chapter 29 Handling Boundary Conditions in a Virtual Machine Monitor.
Describes what a VMM must consider when handling exceptions, interrupts, error
conditions, and transitions between activity states.
Chapter 30 Performance Monitoring. Describes the Intel 64 and IA-32
architectures' facilities for monitoring performance.
Appendix A Performance-Monitoring Events. Lists architectural performance
events. Non-architectural performance events (i.e., model-specific events) are listed
for each generation of microarchitecture.
Appendix B Model-Specific Registers (MSRs). Lists the MSRs available in the
Pentium processors, the P6 family processors, the Pentium 4, Intel Xeon, Intel Core
Solo, Intel Core Duo processors, and Intel Core 2 processor family and describes
their functions.
Appendix C MP Initialization For P6 Family Processors. Gives an example of
how to use the MP protocol to boot P6 family processors in an MP system.
Appendix D Programming the LINT0 and LINT1 Inputs. Gives an example of
how to program the LINT0 and LINT1 pins for specific interrupt vectors.
Appendix E Interpreting Machine-Check Error Codes. Gives an example of
how to interpret the error codes for a machine-check error that occurred on a P6
family processor.
Appendix F APIC Bus Message Formats. Describes the message formats for
messages transmitted on the APIC bus for P6 family and Pentium processors.
Appendix G VMX Capability Reporting Facility. Describes the VMX capability
MSRs. Support for specific VMX features is determined by reading capability MSRs.
Appendix H Field Encoding in VMCS. Enumerates all fields in the VMCS and
their encodings. Fields are grouped by width (16-bit, 32-bit, etc.) and type
(guest-state, host-state, etc.).
Appendix I VM Basic Exit Reasons. Describes the 32-bit fields that encode
reasons for a VM exit. Examples of exit reasons include, but are not limited to:
software interrupts, processor exceptions, software traps, NMIs, external interrupts, and
triple faults.
1.3 NOTATIONAL CONVENTIONS
This manual uses specific notation for data-structure formats, for symbolic
representation of instructions, and for hexadecimal and binary numbers. A review of this
notation makes the manual easier to read.
1.3.1 Bit and Byte Order
In illustrations of data structures in memory, smaller addresses appear toward the
bottom of the figure; addresses increase toward the top. Bit positions are numbered
from right to left. The numerical value of a set bit is equal to two raised to the power
of the bit position. Intel 64 and IA-32 processors are little endian machines; this
means the bytes of a word are numbered starting from the least significant byte.
Figure 1-1 illustrates these conventions.
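The two conventions above can be checked with a short sketch. It is an illustration only: the helper names and the sample value 12345678H are assumptions chosen for this example, and Python's standard struct module stands in for the processor's memory layout.

```python
import struct


def bit_value(position: int) -> int:
    """Numerical value of a set bit: two raised to the power of its position."""
    return 1 << position


def to_little_endian(value: int) -> bytes:
    """Lay out a 32-bit doubleword as a little-endian machine stores it."""
    return struct.pack("<I", value)


dword = 0x12345678  # arbitrary sample value
stored = to_little_endian(dword)

# Byte offset 0 (the lowest address) holds the least significant byte.
for offset, byte in enumerate(stored):
    print(f"byte offset {offset}: {byte:02X}H")
```

The byte at offset 0 is 78H, the least significant byte of the sample value, which matches the numbering described above.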
1.3.2 Reserved Bits and Software Compatibility
In many register and memory layout descriptions, certain bits are marked as
reserved. When bits are marked as reserved, it is essential for compatibility with
future processors that software treat these bits as having a future, though unknown,
effect. The behavior of reserved bits should be regarded as not only undefined, but
unpredictable. Software should follow these guidelines in dealing with reserved bits:
• Do not depend on the states of any reserved bits when testing the values of
  registers which contain such bits. Mask out the reserved bits before testing.
• Do not depend on the states of any reserved bits when storing to memory or to a
  register.
• Do not depend on the ability to retain information written into any reserved bits.
• When loading a register, always load the reserved bits with the values indicated
  in the documentation, if any, or reload them with values previously read from the
  same register.
NOTE
Avoid any software dependence upon the state of reserved bits in
Intel 64 and IA-32 registers. Depending upon the values of reserved
register bits will make software dependent upon the unspecified
manner in which the processor handles these bits. Programs that
depend upon reserved values risk incompatibility with future
processors.
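A minimal sketch of two of these guidelines follows: masking reserved bits before a test, and reloading reserved bits with the values previously read. The register layout is hypothetical (bits 0 through 3 defined, all other bits reserved) and does not describe any real Intel 64 or IA-32 register.

```python
# Hypothetical 32-bit register layout, for illustration only:
# bits 0-3 are defined; every other bit is reserved.
DEFINED_MASK = 0x0000000F
RESERVED_MASK = ~DEFINED_MASK & 0xFFFFFFFF


def defined_bits(raw: int) -> int:
    """Mask out the reserved bits before testing a register value."""
    return raw & DEFINED_MASK


def value_to_load(previous: int, new_defined: int) -> int:
    """Combine new defined bits with the reserved bits previously read."""
    return (previous & RESERVED_MASK) | (new_defined & DEFINED_MASK)


raw = 0xABCD0005                          # value read from the register
field = defined_bits(raw)                 # test only the defined bits
next_value = value_to_load(raw, 0xA)      # reserved bits are written back unchanged
```

Software written this way never branches on a reserved bit and never changes one, whatever value the processor happened to return.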
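Figure 1-1. Bit and Byte Order
(Figure not reproduced. It shows a 32-bit-wide data structure drawn with the lowest
address at the bottom and the highest at the top, with byte offsets 0 through 28 in steps
of 4. Within each row, Byte 3 holds bits 31-24, Byte 2 bits 23-16, Byte 1 bits 15-8, and
Byte 0 bits 7-0.)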
1.3.3 Instruction Operands
When instructions are represented symbolically, a subset of assembly language is
used. In this subset, an instruction has the following format:
label: mnemonic argument1, argument2, argument3
where:
• A label is an identifier which is followed by a colon.
• A mnemonic is a reserved name for a class of instruction opcodes which have
  the same function.
• The operands argument1, argument2, and argument3 are optional. There
  may be from zero to three operands, depending on the opcode. When present,
  they take the form of either literals or identifiers for data items. Operand
  identifiers are either reserved names of registers or are assumed to be assigned
  to data items declared in another part of the program (which may not be shown
  in the example).
When two operands are present in an arithmetic or logical instruction, the right
operand is the source and the left operand is the destination.
For example:
LOADREG: MOV EAX, SUBTOTAL
In this example LOADREG is a label, MOV is the mnemonic identifier of an opcode,
EAX is the destination operand, and SUBTOTAL is the source operand. Some
assembly languages put the source and destination in reverse order.
1.3.4 Hexadecimal and Binary Numbers
Base 16 (hexadecimal) numbers are represented by a string of hexadecimal digits
followed by the character H (for example, F82EH). A hexadecimal digit is a character
from the following set: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F.
Base 2 (binary) numbers are represented by a string of 1s and 0s, sometimes
followed by the character B (for example, 1010B). The B designation is only used in
situations where confusion as to the type of number might arise.
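For readers who copy constants out of this manual into their own tools, the suffix notation can be converted mechanically. The helper below is a sketch under one stated assumption: a string with no H or B suffix is treated as decimal, which is a choice made for this example because the manual allows binary strings to appear without the B.

```python
def parse_manual_number(text: str) -> int:
    """Convert a literal written in the manual's notation to an integer.

    A trailing H marks hexadecimal and a trailing B marks binary.
    Unsuffixed text is read as decimal (an assumption of this sketch).
    """
    token = text.strip().upper()
    if token.endswith("H"):
        return int(token[:-1], 16)
    if token.endswith("B") and set(token[:-1]) <= {"0", "1"}:
        return int(token[:-1], 2)
    return int(token, 10)


print(parse_manual_number("F82EH"))
print(parse_manual_number("1010B"))
```

Checking for H first matters: a hexadecimal literal such as 0BH contains the digit B but is not a binary number.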
1.3.5 Segmented Addressing
The processor uses byte addressing. This means memory is organized and accessed
as a sequence of bytes. Whether one or more bytes are being accessed, a byte
address is used to locate the byte or bytes in memory. The range of memory that can
be addressed is called an address space.
The processor also supports segmented addressing. This is a form of addressing
where a program may have many independent address spaces, called segments.
For example, a program can keep its code (instructions) and stack in separate
segments. Code addresses would always refer to the code space, and stack
addresses would always refer to the stack space. The following notation is used to
specify a byte address within a segment:
Segment-register:Byte-address
For example, the following segment address identifies the byte at address FF79H in
the segment pointed to by the DS register:
DS:FF79H
The following segment address identifies an instruction address in the code segment.
The CS register points to the code segment and the EIP register contains the address
of the instruction.
CS:EIP
1.3.6 Syntax for CPUID, CR, and MSR Values
Obtain feature flags, status, and system information by using the CPUID instruction,
by checking control register bits, and by reading model-specific registers. We are
moving toward a single syntax to represent this type of information. See Figure 1-2.
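The single-bit form of this syntax, for example CR4.OSFXSR[bit 9] = 1, can be read mechanically. The sketch below parses such an expression and tests it against a register value supplied by the caller. The function names and the regular expression are assumptions made for this example, and the register values are sample data: obtaining real values requires the CPUID instruction or privileged register reads, which are not shown.

```python
import re

# Matches single-bit expressions such as "CR4.OSFXSR[bit 9] = 1".
# The text before the last dot names the source (register or CPUID leaf).
_EXPRESSION = re.compile(
    r"^(?P<source>.+)\.(?P<field>\w+)\s*\[bit\s+(?P<bit>\d+)\]\s*=\s*(?P<value>[01])$"
)


def parse_expression(text: str) -> dict:
    """Split an expression into source, field name, bit position, and value."""
    match = _EXPRESSION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized expression: {text!r}")
    return {
        "source": match["source"],
        "field": match["field"],
        "bit": int(match["bit"]),
        "value": int(match["value"]),
    }


def holds(text: str, register_value: int) -> bool:
    """Report whether the expression is true for the given register value."""
    expression = parse_expression(text)
    return ((register_value >> expression["bit"]) & 1) == expression["value"]


sample_cr4 = 0x00000200  # sample value with only bit 9 set
print(holds("CR4.OSFXSR[bit 9] = 1", sample_cr4))
```

Only the single-bit form is handled; expressions that give a range of bits or a range of output values would need a richer parser.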
1.3.7 Exceptions
An exception is an event that typically occurs when an instruction causes an error.
For example, an attempt to divide by zero generates an exception. However, some
exceptions, such as breakpoints, occur under other conditions. Some types of
exceptions may provide error codes. An error code reports additional information about the
error. An example of the notation used to show an exception and error code is shown
below:
#PF(fault code)
Figure 1-2. Syntax for CPUID, CR, and MSR Data Presentation
Syntax representation for CPUID input and output:
    CPUID.01H:ECX.SSE[bit 25] = 1
    The input value for EAX defines the output (NOTE: Some leaves require input values
    for EAX and ECX. If only one value is present, EAX is implied.). It is followed by the
    output register and feature flag or field name with bit position(s), then by the
    value (or range) of output.
For control register values:
    CR4.OSFXSR[bit 9] = 1
    Example CR name, followed by the feature flag or field name with bit position(s),
    then by the value (or range) of output.
For model-specific register values:
    IA32_MISC_ENABLES.ENABLEFOPCODE[bit 2] = 1
    Example MSR name, followed by the feature flag or field name with bit position(s),
    then by the value (or range) of output.
This example refers to a page-fault exception under conditions where an error code
naming a type of fault is reported. Under some conditions, exceptions which produce
error codes may not be able to report an accurate code. In this case, the error code
is zero, as shown below for a general-protection exception:
#GP(0)
1.4 RELATED LITERATURE
Literature related to Intel 64 and IA-32 processors is listed on-line at:
http://developer.intel.com/products/processor/index.htm
Some of the documents listed at this web site can be viewed on-line; others can be
ordered. The literature available is listed by Intel processor and then by the following
literature types: applications notes, data sheets, manuals, papers, and specification
updates.
See also:
• The data sheet for a particular Intel 64 or IA-32 processor
• The specification update for a particular Intel 64 or IA-32 processor
• Intel C++ Compiler documentation and online help
  http://www.intel.com/cd/software/products/asmo-na/eng/index.htm
• Intel Fortran Compiler documentation and online help
  http://www.intel.com/cd/software/products/asmo-na/eng/index.htm
• Intel VTune Performance Analyzer documentation and online help
  http://www.intel.com/cd/software/products/asmo-na/eng/index.htm
• Intel 64 and IA-32 Architectures Software Developer's Manual (in five volumes)
  http://developer.intel.com/products/processor/manuals/index.htm
• Intel 64 and IA-32 Architectures Optimization Reference Manual
  http://developer.intel.com/products/processor/manuals/index.htm
• Intel Processor Identification with the CPUID Instruction, AP-485
  http://www.intel.com/design/processor/applnots/241618.htm
• Intel 64 Architecture Memory Ordering White Paper
  http://developer.intel.com/products/processor/manuals/index.htm
• Intel 64 Architecture x2APIC Specification
  http://developer.intel.com/products/processor/manuals/index.htm
• Intel Virtualization Technology for Directed I/O, Rev 1.2 specification
  http://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf
• Intel 64 Architecture Processor Topology Enumeration
  http://softwarecommunity.intel.com/articles/eng/3887.htm
• Intel Trusted Execution Technology Measured Launched Environment
  Programming Guide
  http://www.intel.com/technology/security/index.htm
• Developing Multi-threaded Applications: A Platform Consistent Approach
  http://cache-www.intel.com/cd/00/00/05/15/51534_developing_multithreaded_applications.pdf
• Using Spin-Loops on Intel Pentium 4 Processor and Intel Xeon Processor MP
  http://www3.intel.com/cd/ids/developer/asmo-na/eng/dc/threading/knowledgebase/19083.htm
More relevant links are:
• Software network link:
  http://softwarecommunity.intel.com/isn/home/
• Developer centers:
  http://www.intel.com/cd/ids/developer/asmo-na/eng/dc/index.htm
• Processor support general link:
  http://www.intel.com/support/processors/
• Software products and packages:
  http://www.intel.com/cd/software/products/asmo-na/eng/index.htm
• Intel 64 and IA-32 processor manuals (printed or PDF downloads):
  http://developer.intel.com/products/processor/manuals/index.htm
• Intel multi-core technology:
  http://developer.intel.com/multi-core/index.htm
• Intel Hyper-Threading Technology (Intel HT Technology):
  http://developer.intel.com/technology/hyperthread/
CHAPTER 2
SYSTEM ARCHITECTURE OVERVIEW
IA-32 architecture (beginning with the Intel386 processor family) provides extensive
support for operating-system and system-development software. This support offers
multiple modes of operation, which include:
• Real mode, protected mode, virtual 8086 mode, and system management mode.
  These are sometimes referred to as legacy modes.
Intel 64 architecture supports almost all the system programming facilities available
in IA-32 architecture and extends them to a new operating mode (IA-32e mode) that
supports a 64-bit programming environment. IA-32e mode allows software to
operate in one of two sub-modes:
• 64-bit mode supports 64-bit OS and 64-bit applications
• Compatibility mode allows most legacy software to run; it co-exists with 64-bit
  applications under a 64-bit OS.
The IA-32 system-level architecture includes features to assist in the following
operations:
• Memory management
• Protection of software modules
• Multitasking
• Exception and interrupt handling
• Multiprocessing
• Cache management
• Hardware resource and power management
• Debugging and performance monitoring
This chapter provides a description of each part of this architecture. It also describes
the system registers that are used to set up and control the processor at the system
level and gives a brief overview of the processor's system-level (operating system)
instructions.
Many features of the system-level architecture are used only by system programmers.
However, application programmers may need to read this chapter and the
following chapters in order to create a reliable and secure environment for
application programs.
This overview and most subsequent chapters of this book focus on protected-mode
operation of the IA-32 architecture. IA-32e mode operation of the Intel 64
architecture, as it differs from protected mode operation, is also described.
All Intel 64 and IA-32 processors enter real-address mode following a power-up or
reset (see Chapter 9, "Processor Management and Initialization"). Software then
initiates the switch from real-address mode to protected mode. If IA-32e mode
operation is desired, software also initiates a switch from protected mode to IA-32e
mode.
2.1 OVERVIEW OF THE SYSTEM-LEVEL ARCHITECTURE
System-level architecture consists of a set of registers, data structures, and
instructions designed to support basic system-level operations such as memory
management, interrupt and exception handling, task management, and control of multiple
processors.
Figure 2-1 provides a summary of system registers and data structures that apply
to 32-bit modes. System registers and data structures that apply to IA-32e mode are
shown in Figure 2-2.
Figure 2-1. IA-32 System-Level Registers and Data Structures
(Figure not reproduced. It relates the system registers (EFLAGS, control registers CR0
through CR4, XCR0 (XFEM), the task register, GDTR, LDTR, and IDTR) to the GDT, LDT, and
IDT; to the segment, TSS, and LDT descriptors and the interrupt, trap, task, and call
gates held in those tables; to the code, data, and stack segments and task-state segments
(TSSs) they reference; and to the page directory and page table that map a linear address
(Dir, Table, Offset) to a physical address. CR3 holds a physical address. The page
mapping example is for 4-KByte pages and the normal 32-bit physical address size.)
Figure 2-2. System-Level Registers and Data Structures in IA-32e Mode
(Figure not reproduced. It shows RFLAGS, control registers CR0 through CR4 and CR8,
XCR0 (XFEM), the task register (TR), GDTR, LDTR, and IDTR; the GDT, LDT, and IDT with
their segment descriptors and interrupt, trap, and call gates; the current TSS with its
interrupt stack table (IST); code, data, or stack segments with base = 0; and the paging
hierarchy (PML4, page-directory-pointer table, page directory, and page table) that maps
a linear address (PML4, Dir. Pointer, Directory, Table, Offset) to a physical address.
CR3 holds a physical address. The page mapping example is for 4-KByte pages and 40-bit
physical address size.)
2.1.1 Global and Local Descriptor Tables
When operating in protected mode, all memory accesses pass through either the
global descriptor table (GDT) or an optional local descriptor table (LDT) as shown in
Figure 2-1. These tables contain entries called segment descriptors. Segment
descriptors provide the base address of segments as well as access rights, type, and
usage information.
Each segment descriptor has an associated segment selector. A segment selector
provides the software that uses it with an index into the GDT or LDT (the offset of its
associated segment descriptor), a global/local flag (determines whether the selector
points to the GDT or the LDT), and access rights information.
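The three parts of a segment selector can be separated with shifts and masks. The sketch below assumes the selector layout defined in the protected-mode memory management chapter of this manual (index in bits 15 through 3, table indicator in bit 2, requested privilege level in bits 1 and 0) and 8-byte descriptor table entries. The type and function names are assumptions made for this example.

```python
from typing import NamedTuple


class SegmentSelector(NamedTuple):
    index: int   # entry number within the descriptor table
    table: str   # "GDT" or "LDT", taken from the global/local flag
    rpl: int     # requested privilege level, 0 (most privileged) to 3


def decode_selector(selector: int) -> SegmentSelector:
    """Split a 16-bit segment selector into its three fields."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("a segment selector is 16 bits wide")
    return SegmentSelector(
        index=selector >> 3,
        table="LDT" if selector & 0x4 else "GDT",
        rpl=selector & 0x3,
    )


def descriptor_offset(selector: int) -> int:
    """Byte offset of the descriptor from the table base (8-byte entries)."""
    return decode_selector(selector).index * 8


print(decode_selector(0x001B))
print(descriptor_offset(0x001B))
```

Because the low three bits carry the flag and privilege level, clearing them leaves the byte offset of an 8-byte descriptor directly.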
To access a byte in a segment, a segment selector and an offset must be supplied.
The segment selector provides access to the segment descriptor for the segment (in
the GDT or LDT). From the segment descriptor, the processor obtains the base
address of the segment in the linear address space. The offset then provides the
location of the byte relative to the base address. This mechanism can be used to
access any valid code, data, or stack segment, provided the segment is accessible
from the current privilege level (CPL) at which the processor is operating. The CPL is
defined as the protection level of the currently executing code segment.
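The lookup described above can be sketched as a small model. It is a simplification under stated assumptions: the descriptor table is a Python dictionary keyed by descriptor index, each descriptor carries only a base, a limit, and a privilege level, and the privilege test compares only the CPL with the descriptor's level. The real checks also involve the selector's requested privilege level and the segment type, and all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    base: int    # linear address of the first byte of the segment
    limit: int   # highest valid offset within the segment
    dpl: int     # privilege level of the segment (0 = most privileged)


def linear_address(table: dict, index: int, offset: int, cpl: int) -> int:
    """Translate (descriptor index, offset) into a 32-bit linear address."""
    descriptor = table[index]
    if cpl > descriptor.dpl:
        raise PermissionError("segment is not accessible from this privilege level")
    if offset > descriptor.limit:
        raise ValueError("offset lies beyond the segment limit")
    return (descriptor.base + offset) & 0xFFFFFFFF


# Sample table with one data segment based at 00100000H.
gdt = {2: Descriptor(base=0x00100000, limit=0xFFFF, dpl=3)}
print(hex(linear_address(gdt, 2, 0xFF79, cpl=3)))
```

The offset FF79H matches the DS:FF79H notation used in Section 1.3.5; the base value is sample data.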
See Figure 2-1. The solid arrows in the figure indicate a linear address, dashed lines
indicate a segment selector, and the dotted arrows indicate a physical address. For
simplicity, many of the segment selectors are shown as direct pointers to a segment.
However, the actual path from a segment selector to its associated segment is always
through a GDT or LDT.
The linear address of the base of the GDT is contained in the GDT register (GDTR);
the linear address of the LDT is contained in the LDT register (LDTR).
2.1.1.1 Global and Local Descriptor Tables in IA-32e Mode
GDTR and LDTR registers are expanded to 64 bits wide in both IA-32e sub-modes
(64-bit mode and compatibility mode). For more information, see Section 3.5.2,
"Segment Descriptor Tables in IA-32e Mode."
Global and local descriptor tables are expanded in 64-bit mode to support 64-bit base
addresses (16-byte LDT descriptors hold a 64-bit base address and various
attributes). In compatibility mode, descriptors are not expanded.
2.1.2 System Segments, Segment Descriptors, and Gates
Besides code, data, and stack segments that make up the execution environment of
a program or procedure, the architecture defines two system segments: the
task-state segment (TSS) and the LDT. The GDT is not considered a segment because it is
not accessed by means of a segment selector and segment descriptor. TSSs and LDTs
have segment descriptors defined for them.
The architecture also defines a set of special descriptors called gates (call gates,
interrupt gates, trap gates, and task gates). These provide protected gateways to
system procedures and handlers that may operate at a different privilege level than
application programs and most procedures. For example, a CALL to a call gate can
provide access to a procedure in a code segment that is at the same or a numerically
lower privilege level (more privileged) than the current code segment. To access a
procedure through a call gate, the calling procedure (see footnote 1) supplies the
selector for the call gate. The processor then performs an access rights check on the
call gate, comparing the CPL with the privilege level of the call gate and the
destination code segment pointed to by the call gate.
If access to the destination code segment is allowed, the processor gets the segment
selector for the destination code segment and an offset into that code segment from
the call gate. If the call requires a change in privilege level, the processor also
switches to the stack for the targeted privilege level. The segment selector for the
new stack is obtained from the TSS for the currently running task. Gates also
facilitate transitions between 16-bit and 32-bit code segments, and vice versa.
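The access rights check on a call gate can be sketched as two comparisons. The rules used here are those given for the CALL instruction in the protection chapter of this manual and are stated as assumptions of the example: the CPL and the selector's requested privilege level (RPL) must both be numerically less than or equal to the gate's privilege level, and the destination code segment must be at the same or a numerically lower level than the CPL. The function names are hypothetical, and checks on segment type and presence are omitted.

```python
def call_gate_allows(cpl: int, rpl: int, gate_dpl: int, target_dpl: int) -> bool:
    """Simplified privilege check for a CALL through a call gate."""
    caller_may_use_gate = max(cpl, rpl) <= gate_dpl
    target_is_same_or_more_privileged = target_dpl <= cpl
    return caller_may_use_gate and target_is_same_or_more_privileged


def needs_stack_switch(cpl: int, target_dpl: int, conforming: bool = False) -> bool:
    """A call into a more privileged nonconforming segment switches stacks."""
    return (not conforming) and target_dpl < cpl


# An application at privilege level 3 calling a level-0 procedure through a
# gate that level-3 code is permitted to use.
print(call_gate_allows(cpl=3, rpl=3, gate_dpl=3, target_dpl=0))
print(needs_stack_switch(cpl=3, target_dpl=0))
```

Setting the gate's privilege level to 0 in the same call makes the check fail, which is how an operating system keeps a gate private to privileged code.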
2.1.2.1 Gates in IA-32e Mode
In IA-32e mode, the following descriptors are 16-byte descriptors (expanded to allow
a 64-bit base): LDT descriptors, 64-bit TSSs, call gates, interrupt gates, and trap
gates.
Call gates facilitate transitions between 64-bit mode and compatibility mode. Task
gates are not supported in IA-32e mode. On privilege level changes, stack segment
selectors are not read from the TSS. Instead, they are set to NULL.
2.1.3 Task-State Segments and Task Gates
The TSS (see Figure 2-1) defines the state of the execution environment for a task.
It includes the state of general-purpose registers, segment registers, the EFLAGS
register, the EIP register, and segment selectors with stack pointers for three stack
segments (one stack for each privilege level). The TSS also includes the segment
selector for the LDT associated with the task and the base address of the
paging-structure hierarchy.
All program execution in protected mode happens within the context of a task (called
the current task). The segment selector for the TSS for the current task is stored in
the task register. The simplest method for switching to a task is to make a call or
jump to the new task. Here, the segment selector for the TSS of the new task is given
in the CALL or JMP instruction. In switching tasks, the processor performs the
following actions:
1. Stores the state of the current task in the current TSS.
2. Loads the task register with the segment selector for the new task.
3. Accesses the new TSS through a segment descriptor in the GDT.
4. Loads the state of the new task from the new TSS into the general-purpose
   registers, the segment registers, the LDTR, control register CR3 (base address of
   the paging-structure hierarchy), the EFLAGS register, and the EIP register.
5. Begins execution of the new task.
A task can also be accessed through a task gate. A task gate is similar to a call gate,
except that it provides access (through a segment selector) to a TSS rather than a
code segment.
Footnote 1. The word "procedure" is commonly used in this document as a general term
for a logical unit or block of code (such as a program, procedure, function, or routine).
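The five task-switch actions listed above can be modeled in a few lines. This is a toy simulation, not a description of the hardware: the GDT is reduced to a dictionary that maps a TSS selector directly to a TSS object, registers are held in a dictionary, and every class and field name is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TSS:
    registers: dict = field(default_factory=dict)  # general-purpose, segment, EFLAGS, EIP
    ldt_selector: int = 0
    cr3: int = 0


@dataclass
class Processor:
    gdt: dict            # TSS selector -> TSS (stands in for TSS descriptors)
    task_register: int   # selector of the current task's TSS
    registers: dict
    ldtr: int = 0
    cr3: int = 0

    def switch_task(self, new_selector: int) -> None:
        # 1. Store the state of the current task in the current TSS.
        self.gdt[self.task_register].registers = dict(self.registers)
        # 2. Load the task register with the selector for the new task.
        self.task_register = new_selector
        # 3. Access the new TSS through its descriptor in the GDT.
        new_tss = self.gdt[new_selector]
        # 4. Load the new task's state from the new TSS.
        self.registers = dict(new_tss.registers)
        self.ldtr = new_tss.ldt_selector
        self.cr3 = new_tss.cr3
        # 5. Execution continues at the EIP now held in self.registers.


gdt = {
    0x28: TSS(),
    0x30: TSS(registers={"EIP": 0x1000, "EAX": 7}, ldt_selector=0x38, cr3=0x2000),
}
cpu = Processor(gdt=gdt, task_register=0x28, registers={"EIP": 0x400, "EAX": 1})
cpu.switch_task(0x30)
```

After the switch, the outgoing task's registers sit in its TSS, so a later switch back to selector 28H resumes it where it stopped.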
2.1.3.1 Task-State Segments in IA-32e Mode
Hardware task switches are not supported in IA-32e mode. However, TSSs continue
to exist. The base address of a TSS is specified by its descriptor.
A 64-bit TSS holds the following information that is important to 64-bit operation:
• Stack pointer addresses for each privilege level
• Pointer addresses for the interrupt stack table
• Offset address of the IO-permission bitmap (from the TSS base)
The task register is expanded to hold 64-bit base addresses in IA-32e mode. See
also: Section 7.7, "Task Management in 64-bit Mode."
2.1.4 Interrupt and Exception Handling
External interrupts, software interrupts and exceptions are handled through the
interrupt descriptor table (IDT). The IDT stores a collection of gate descriptors that
provide access to interrupt and exception handlers. Like the GDT, the IDT is not a
segment. The linear address for the base of the IDT is contained in the IDT register
(IDTR).
Gate descriptors in the IDT can be interrupt, trap, or task gate descriptors. To access
an interrupt or exception handler, the processor first receives an interrupt vector
(interrupt number) from internal hardware, an external interrupt controller, or from
software by means of an INT, INTO, INT 3, or BOUND instruction. The interrupt
vector provides an index into the IDT. If the selected gate descriptor is an interrupt
gate or a trap gate, the associated handler procedure is accessed in a manner similar
to calling a procedure through a call gate. If the descriptor is a task gate, the handler
is accessed through a task switch.
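Indexing the IDT with a vector amounts to a multiply and an add. The sketch below assumes 8-byte gate descriptors in protected mode and 16-byte descriptors in IA-32e mode, and treats the IDTR limit as the offset of the last valid byte of the table. The function and constant names are assumptions made for the example, and the base and limit values are sample data.

```python
GATE_SIZE_PROTECTED = 8   # bytes per gate descriptor in protected mode
GATE_SIZE_IA32E = 16      # bytes per gate descriptor in IA-32e mode


def gate_descriptor_address(idt_base: int, idt_limit: int,
                            vector: int, ia32e: bool = False) -> int:
    """Linear address of the gate descriptor selected by an interrupt vector."""
    if not 0 <= vector <= 255:
        raise ValueError("interrupt vectors range from 0 to 255")
    size = GATE_SIZE_IA32E if ia32e else GATE_SIZE_PROTECTED
    offset = vector * size
    # The whole descriptor must lie within the table limit.
    if offset + size - 1 > idt_limit:
        raise LookupError("vector lies beyond the IDT limit")
    return idt_base + offset


# Vector 14 is the page-fault exception (#PF).
print(hex(gate_descriptor_address(0x1000, 0x7FF, 14)))
print(hex(gate_descriptor_address(0x1000, 0xFFF, 14, ia32e=True)))
```

A full 256-entry table therefore needs a limit of 7FFH in protected mode and FFFH in IA-32e mode.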
2.1.4.1 Interrupt and Exception Handling IA-32e Mode
In IA-32e mode, interrupt descriptors are expanded to 16 bytes to support 64-bit
base addresses. This is true for 64-bit mode and compatibility mode.
The IDTR register is expanded to hold a 64-bit base address. Task gates are not
supported.
2.1.5 Memory Management
System architecture supports either direct physical addressing of memory or virtual
memory (through paging). When physical addressing is used, a linear address is
treated as a physical address. When paging is used, all code, data, stack, and system
segments (including the GDT and IDT) can be paged with only the most recently
accessed pages being held in physical memory.
The location of pages (sometimes called page frames) in physical memory is
contained in the paging structures. These structures reside in physical memory (see
Figure 2-1 for the case of 32-bit paging).
The base physical address of the paging-structure hierarchy is contained in control
register CR3. The entries in the paging structures determine the physical address of
the base of a page frame, access rights and memory management information.
To use this paging mechanism, a linear address is broken into parts. The parts
provide separate offsets into the paging structures and the page frame. A system can
have a single hierarchy of paging structures or several. For example, each task can
have its own hierarchy.
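For the 32-bit paging case with 4-KByte pages shown in Figure 2-1, the linear address divides into a 10-bit directory index, a 10-bit table index, and a 12-bit page offset. The sketch below splits an address and walks a toy hierarchy. Nested dictionaries stand in for the page directory and page tables, entries hold only a frame base address (no access rights or present bits), and all names and values are assumptions made for the example.

```python
def split_linear_address_32(linear: int) -> tuple:
    """Return (directory index, table index, page offset) for 4-KByte pages."""
    directory = (linear >> 22) & 0x3FF   # bits 31-22
    table = (linear >> 12) & 0x3FF       # bits 21-12
    offset = linear & 0xFFF              # bits 11-0
    return directory, table, offset


def translate_32(page_directory: dict, linear: int) -> int:
    """Walk a toy two-level hierarchy and return the physical address."""
    directory, table, offset = split_linear_address_32(linear)
    page_table = page_directory[directory]   # stands in for a page-directory entry
    frame_base = page_table[table]           # stands in for a page-table entry
    return frame_base + offset


# One mapping: directory entry 1, table entry 3 -> page frame at 0009A000H.
page_directory = {1: {3: 0x0009A000}}
print(split_linear_address_32(0x00403025))
print(hex(translate_32(page_directory, 0x00403025)))
```

Giving each task its own top-level dictionary corresponds to each task having its own hierarchy, selected through CR3.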
2.1.5.1 Memory Management in IA-32e Mode
In IA-32e mode, physical memory pages are managed by a set of system data
structures. In compatibility mode and 64-bit mode, four levels of system data structures
are used. These include:
• The page map level 4 (PML4): An entry in a PML4 table contains the physical
  address of the base of a page directory pointer table, access rights, and memory
  management information. The base physical address of the PML4 is stored in
  CR3.
• A set of page directory pointer tables: An entry in a page directory pointer
  table contains the physical address of the base of a page directory table, access
  rights, and memory management information.
• Sets of page directories: An entry in a page directory table contains the
  physical address of the base of a page table, access rights, and memory
  management information.
• Sets of page tables: An entry in a page table contains the physical address of
  a page frame, access rights, and memory management information.
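With four levels and 4-KByte pages, a linear address supplies one 9-bit index per level and a 12-bit page offset. The sketch below assumes the field positions used for IA-32e paging (bits 47-39, 38-30, 29-21, 20-12, and 11-0) and, as in any toy model, replaces the tables with nested dictionaries whose entries hold only the next base address. The names are assumptions made for the example.

```python
def split_linear_address_ia32e(linear: int) -> dict:
    """Return the four table indexes and the page offset for 4-KByte pages."""
    return {
        "pml4": (linear >> 39) & 0x1FF,               # bits 47-39
        "directory_pointer": (linear >> 30) & 0x1FF,  # bits 38-30
        "directory": (linear >> 21) & 0x1FF,          # bits 29-21
        "table": (linear >> 12) & 0x1FF,              # bits 20-12
        "offset": linear & 0xFFF,                     # bits 11-0
    }


def translate_ia32e(pml4: dict, linear: int) -> int:
    """Walk a toy four-level hierarchy and return the physical address."""
    parts = split_linear_address_ia32e(linear)
    pointer_table = pml4[parts["pml4"]]
    page_directory = pointer_table[parts["directory_pointer"]]
    page_table = page_directory[parts["directory"]]
    frame_base = page_table[parts["table"]]
    return frame_base + parts["offset"]


# Sample address built from indexes 1, 2, 3, 4 and offset 5.
sample = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5
hierarchy = {1: {2: {3: {4: 0x7000}}}}
print(split_linear_address_ia32e(sample))
print(hex(translate_ia32e(hierarchy, sample)))
```

Each level follows the same pattern as the list above: an entry names the base of the next structure, and the last entry names the page frame.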
2.1.6 System Registers
To assist in initializing the processor and controlling system operations, the system
architecture provides system flags in the EFLAGS register and several system
registers:
• The system flags and IOPL field in the EFLAGS register control task and mode
  switching, interrupt handling, instruction tracing, and access rights. See also:
  Section 2.3, "System Flags and Fields in the EFLAGS Register."
• The control registers (CR0, CR2, CR3, and CR4) contain a variety of flags and
  data fields for controlling system-level operations. Other flags in these registers
  are used to indicate support for specific processor capabilities within the
  operating system or executive. See also: Section 2.5, "Control Registers."
• The debug registers (not shown in Figure 2-1) allow the setting of breakpoints for
  use in debugging programs and systems software. See also: Chapter 16,
  "Debugging, Profiling Branches and Time-Stamp Counter."
• The GDTR, LDTR, and IDTR registers contain the linear addresses and sizes
  (limits) of their respective tables. See also: Section 2.4, "Memory-Management
  Registers."
• The task register contains the linear address and size of the TSS for the current
  task. See also: Section 2.4, "Memory-Management Registers."
• Model-specific registers (not shown in Figure 2-1).
The model-specific registers (MSRs) are a group of registers available primarily to
operating-system or executive procedures (that is, code running at privilege level 0).
These registers control items such as the debug extensions, the
performance-monitoring counters, the machine-check architecture, and the memory type ranges
(MTRRs).
The number and function of these registers vary among different members of the
Intel 64 and IA-32 processor families. See also: Section 9.4, "Model-Specific
Registers (MSRs)," and Appendix B, "Model-Specific Registers (MSRs)."
Most systems restrict access to system registers (other than the EFLAGS register) by
application programs. Systems can be designed, however, where all programs and
procedures run at the most privileged level (privilege level 0). In such a case,
application programs would be allowed to modify the system registers.
2.1.6.1 System Registers in IA-32e Mode
In IA-32e mode, the four system-descriptor-table registers (GDTR, IDTR, LDTR, and
TR) are expanded in hardware to hold 64-bit base addresses. EFLAGS becomes the
64-bit RFLAGS register. CR0-CR4 are expanded to 64 bits. CR8 becomes available.
CR8 provides read-write access to the task priority register (TPR) so that the
operating system can control the priority classes of external interrupts.

In 64-bit mode, debug registers DR0-DR7 are 64 bits. In compatibility mode,
address-matching in DR0-DR3 is also done at 64-bit granularity.

On systems that support IA-32e mode, the extended feature enable register
(IA32_EFER) is available. This model-specific register controls activation of IA-32e
mode and other IA-32e mode operations. In addition, there are several
model-specific registers that govern IA-32e mode instructions:
• IA32_KernelGSbase: Used by the SWAPGS instruction.
• IA32_LSTAR: Used by the SYSCALL instruction.
• IA32_SYSCALL_FLAG_MASK: Used by the SYSCALL instruction.
• IA32_STAR_CS: Used by the SYSCALL and SYSRET instructions.
2.1.7 Other System Resources
Besides the system registers and data structures described in the previous sections,
system architecture provides the following additional resources:
• Operating system instructions (see also: Section 2.7, "System Instruction
  Summary").
• Performance-monitoring counters (not shown in Figure 2-1).
• Internal caches and buffers (not shown in Figure 2-1).

Performance-monitoring counters are event counters that can be programmed to
count processor events such as the number of instructions decoded, the number of
interrupts received, or the number of cache loads. See also: Section 20,
"Introduction to Virtual-Machine Extensions."

The processor provides several internal caches and buffers. The caches are used to
store both data and instructions. The buffers are used to store things like decoded
addresses to system and application segments and write operations waiting to be
performed. See also: Chapter 11, "Memory Cache Control."
2.2 MODES OF OPERATION
The IA-32 architecture supports three operating modes and one quasi-operating mode:
• Protected mode: This is the native operating mode of the processor. It
  provides a rich set of architectural features, flexibility, high performance, and
  backward compatibility with the existing software base.
• Real-address mode: This operating mode provides the programming
  environment of the Intel 8086 processor, with a few extensions (such as the
  ability to switch to protected or system management mode).
• System management mode (SMM): SMM is a standard architectural feature
  in all IA-32 processors, beginning with the Intel386 SL processor. This mode
  provides an operating system or executive with a transparent mechanism for
  implementing power management and OEM differentiation features. SMM is
  entered through activation of an external system interrupt pin (SMI#), which
  generates a system management interrupt (SMI). In SMM, the processor
  switches to a separate address space while saving the context of the currently
  running program or task. SMM-specific code may then be executed transparently.
  Upon returning from SMM, the processor is placed back into its state prior to the
  SMI.
• Virtual-8086 mode: In protected mode, the processor supports a quasi-operating
  mode known as virtual-8086 mode. This mode allows the processor to
  execute 8086 software in a protected, multitasking environment.

Intel 64 architecture supports all operating modes of IA-32 architecture and IA-32e
modes:
• IA-32e mode: In IA-32e mode, the processor supports two sub-modes:
  compatibility mode and 64-bit mode. 64-bit mode provides 64-bit linear
  addressing and support for physical address space larger than 64 GBytes.
  Compatibility mode allows most legacy protected-mode applications to run
  unchanged.

Figure 2-3 shows how the processor moves between operating modes.

The processor is placed in real-address mode following power-up or a reset. The PE
flag in control register CR0 then controls whether the processor is operating in
real-address or protected mode. See also: Section 9.9, "Mode Switching," and
Section 4.1.2, "Paging-Mode Enabling."
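Figure 2-3. Transitions Among the Processor's Operating Modes
[Figure: state diagram of real-address mode, protected mode, virtual-8086 mode,
IA-32e mode, and system management mode. Transition labels: reset enters
real-address mode; PE=1 enters protected mode, and reset or PE=0 returns to
real-address mode; VM=1 enters virtual-8086 mode and VM=0 returns to protected
mode; LME=1 with CR0.PG=1 enters IA-32e mode (see Section 9.8.5), and the return
path is described in Section 9.8.5.4; SMI# enters system management mode from
each of the other modes, and RSM or reset leaves it.]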
The VM flag in the EFLAGS register determines whether the processor is operating in
protected mode or virtual-8086 mode. Transitions between protected mode and
virtual-8086 mode are generally carried out as part of a task switch or a return from
an interrupt or exception handler. See also: Section 17.2.5, "Entering Virtual-8086
Mode."

The LMA bit (IA32_EFER.LMA[bit 10]) determines whether the processor is
operating in IA-32e mode. When running in IA-32e mode, 64-bit or compatibility
sub-mode operation is determined by the CS.L bit of the code segment. The processor
enters IA-32e mode from protected mode by enabling paging and setting the
LME bit (IA32_EFER.LME[bit 8]). See also: Chapter 9, "Processor Management and
Initialization."
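
The mode-selection rules above (CR0.PE, EFLAGS.VM, IA32_EFER.LMA, and CS.L) can
be summarized as a small decision function. The sketch below only classifies a
snapshot of register values that the caller is assumed to have obtained by some
privileged means; it does not read any register itself, it does not model SMM, and
the enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE     (1u << 0)    /* protection enable                   */
#define EFLAGS_VM  (1u << 17)   /* virtual-8086 mode                   */
#define EFER_LMA   (1ull << 10) /* IA32_EFER.LMA: IA-32e mode active   */

typedef enum {
    MODE_REAL_ADDRESS,
    MODE_PROTECTED,
    MODE_VIRTUAL_8086,
    MODE_COMPATIBILITY, /* IA-32e sub-mode, CS.L = 0 */
    MODE_64BIT          /* IA-32e sub-mode, CS.L = 1 */
} cpu_mode;

/* Classify the operating mode from saved register values.
 * cs_l is the L bit of the current code-segment descriptor. */
static cpu_mode classify_mode(uint32_t cr0, uint32_t eflags,
                              uint64_t efer, bool cs_l)
{
    if (!(cr0 & CR0_PE))
        return MODE_REAL_ADDRESS;
    if (efer & EFER_LMA)
        return cs_l ? MODE_64BIT : MODE_COMPATIBILITY;
    if (eflags & EFLAGS_VM)
        return MODE_VIRTUAL_8086;
    return MODE_PROTECTED;
}
```

IA32_EFER.LMA is tested before EFLAGS.VM because virtual-8086 mode is not
available in IA-32e mode.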
The processor switches to SMM whenever it receives an SMI while the processor is in
real-address, protected, virtual-8086, or IA-32e modes. Upon execution of the RSM
instruction, the processor always returns to the mode it was in when the SMI
occurred.
2.3 SYSTEM FLAGS AND FIELDS IN THE EFLAGS REGISTER
The system flags and IOPL field of the EFLAGS register control I/O, maskable
hardware interrupts, debugging, task switching, and the virtual-8086 mode (see
Figure 2-4). Only privileged code (typically operating system or executive code)
should be allowed to modify these bits.

The system flags and IOPL are:
TF Trap (bit 8): Set to enable single-step mode for debugging; clear to
    disable single-step mode. In single-step mode, the processor generates a
    debug exception after each instruction. This allows the execution state of a
    program to be inspected after each instruction. If an application program
    sets the TF flag using a POPF, POPFD, or IRET instruction, a debug exception
    is generated after the instruction that follows the POPF, POPFD, or IRET.

IF Interrupt enable (bit 9): Controls the response of the processor to
    maskable hardware interrupt requests (see also: Section 6.3.2, "Maskable
    Hardware Interrupts"). The flag is set to respond to maskable hardware
    interrupts; cleared to inhibit maskable hardware interrupts. The IF flag does
    not affect the generation of exceptions or nonmaskable interrupts (NMI
    interrupts). The CPL, IOPL, and the state of the VME flag in control register
    CR4 determine whether the IF flag can be modified by the CLI, STI, POPF,
    POPFD, and IRET instructions.

IOPL I/O privilege level field (bits 12 and 13): Indicates the I/O privilege
    level (IOPL) of the currently running program or task. The CPL of the
    currently running program or task must be less than or equal to the IOPL to
    access the I/O address space. This field can only be modified by the POPF
    and IRET instructions when operating at a CPL of 0.

    The IOPL is also one of the mechanisms that controls the modification of the
    IF flag and the handling of interrupts in virtual-8086 mode when virtual
    mode extensions are in effect (when CR4.VME = 1). See also: Chapter 13,
    "Input/Output," in the Intel 64 and IA-32 Architectures Software
    Developer's Manual, Volume 1.
NT Nested task (bit 14): Controls the chaining of interrupted and called
    tasks. The processor sets this flag on calls to a task initiated with a CALL
    instruction, an interrupt, or an exception. It examines and modifies this flag
    on returns from a task initiated with the IRET instruction. The flag can be
    explicitly set or cleared with the POPF/POPFD instructions; however,
    changing the state of this flag can generate unexpected exceptions in
    application programs.

    See also: Section 7.4, "Task Linking."

Figure 2-4. System Flags in the EFLAGS Register
[Figure: bit layout of EFLAGS, bits 31:0. The system flags and fields marked in the
figure are ID (Identification Flag), VIP (Virtual Interrupt Pending), VIF (Virtual
Interrupt Flag), AC (Alignment Check), VM (Virtual-8086 Mode), RF (Resume Flag),
NT (Nested Task Flag), IOPL (I/O Privilege Level), IF (Interrupt Enable Flag), and
TF (Trap Flag). The remaining bits are status flags or reserved.]
RF Resume (bit 16): Controls the processor's response to instruction-breakpoint
    conditions. When set, this flag temporarily disables debug exceptions
    (#DB) from being generated for instruction breakpoints (although other
    exception conditions can cause an exception to be generated). When clear,
    instruction breakpoints will generate debug exceptions.

    The primary function of the RF flag is to allow the restarting of an instruction
    following a debug exception that was caused by an instruction breakpoint
    condition. Here, debug software must set this flag in the EFLAGS image on
    the stack just prior to returning to the interrupted program with IRETD (to
    prevent the instruction breakpoint from causing another debug exception).
    The processor then automatically clears this flag after the instruction
    returned to has been successfully executed, enabling instruction breakpoint
    faults again.

    See also: Section 16.3.1.1, "Instruction-Breakpoint Exception Condition."

VM Virtual-8086 mode (bit 17): Set to enable virtual-8086 mode; clear to
    return to protected mode.

    See also: Section 17.2.1, "Enabling Virtual-8086 Mode."

AC Alignment check (bit 18): Set this flag and the AM flag in control register
    CR0 to enable alignment checking of memory references; clear the AC flag
    and/or the AM flag to disable alignment checking. An alignment-check
    exception is generated when reference is made to an unaligned operand,
    such as a word at an odd byte address or a doubleword at an address which
    is not an integral multiple of four. Alignment-check exceptions are generated
    only in user mode (privilege level 3). Memory references that default to
    privilege level 0, such as segment descriptor loads, do not generate this
    exception even when caused by instructions executed in user mode.

    The alignment-check exception can be used to check alignment of data. This
    is useful when exchanging data with processors which require all data to be
    aligned. The alignment-check exception can also be used by interpreters to
    flag some pointers as special by misaligning the pointer. This eliminates
    overhead of checking each pointer and only handles the special pointer when
    used.

VIF Virtual Interrupt (bit 19): Contains a virtual image of the IF flag. This
    flag is used in conjunction with the VIP flag. The processor only recognizes
    the VIF flag when either the VME flag or the PVI flag in control register CR4 is
    set and the IOPL is less than 3. (The VME flag enables the virtual-8086 mode
    extensions; the PVI flag enables the protected-mode virtual interrupts.)

    See also: Section 17.3.3.5, "Method 6: Software Interrupt Handling," and
    Section 17.4, "Protected-Mode Virtual Interrupts."
VIP Virtual interrupt pending (bit 20): Set by software to indicate that an
    interrupt is pending; cleared to indicate that no interrupt is pending. This flag
    is used in conjunction with the VIF flag. The processor reads this flag but
    never modifies it. The processor only recognizes the VIP flag when either the
    VME flag or the PVI flag in control register CR4 is set and the IOPL is less than
    3. The VME flag enables the virtual-8086 mode extensions; the PVI flag
    enables the protected-mode virtual interrupts.

    See Section 17.3.3.5, "Method 6: Software Interrupt Handling," and Section
    17.4, "Protected-Mode Virtual Interrupts."

ID Identification (bit 21): The ability of a program or procedure to set or
    clear this flag indicates support for the CPUID instruction.
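
The bit positions listed above translate directly into masks. The following sketch
decodes the IOPL field from a saved EFLAGS image and applies the CPL-versus-IOPL
comparison described for I/O access. It is illustrative only: the macro and function
names are hypothetical, and the check deliberately ignores other mechanisms (such
as the I/O permission bit map in the TSS) that can also grant I/O access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* System flags and fields of EFLAGS, using the bit numbers given in the text. */
#define EFLAGS_TF          (1u << 8)
#define EFLAGS_IF          (1u << 9)
#define EFLAGS_IOPL_SHIFT  12
#define EFLAGS_IOPL_MASK   (3u << EFLAGS_IOPL_SHIFT)  /* bits 13:12 */
#define EFLAGS_NT          (1u << 14)
#define EFLAGS_RF          (1u << 16)
#define EFLAGS_VM          (1u << 17)
#define EFLAGS_AC          (1u << 18)
#define EFLAGS_VIF         (1u << 19)
#define EFLAGS_VIP         (1u << 20)
#define EFLAGS_ID          (1u << 21)

/* Extract the 2-bit I/O privilege level (0..3). */
static unsigned eflags_iopl(uint32_t eflags)
{
    return (eflags & EFLAGS_IOPL_MASK) >> EFLAGS_IOPL_SHIFT;
}

/* True when the CPL is numerically less than or equal to the IOPL. */
static bool io_allowed_by_iopl(uint32_t eflags, unsigned cpl)
{
    assert(cpl <= 3);
    return cpl <= eflags_iopl(eflags);
}
```

For example, an EFLAGS image of 00003202H has IF set and IOPL = 3, so code at any
privilege level passes the comparison; with 00000202H (IOPL = 0) only CPL 0 does.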
2.3.1 System Flags and Fields in IA-32e Mode
In 64-bit mode, the RFLAGS register expands to 64 bits with the upper 32 bits
reserved. System flags in RFLAGS (64-bit mode) or EFLAGS (compatibility mode)
are shown in Figure 2-4.

In IA-32e mode, the processor does not allow the VM bit to be set because
virtual-8086 mode is not supported (attempts to set the bit are ignored). Also, the
processor will not set the NT bit. The processor does, however, allow software to set
the NT bit (note that an IRET causes a general protection fault in IA-32e mode if the
NT bit is set).

In IA-32e mode, the SYSCALL/SYSRET instructions have a programmable method of
specifying which bits are cleared in RFLAGS/EFLAGS. These instructions save/restore
EFLAGS/RFLAGS.
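
The programmable clearing of flag bits on SYSCALL can be modeled as a mask
operation. The sketch below assumes the behavior given in the SYSCALL instruction
reference: the current RFLAGS value is saved (in R11) and every bit that is set in
the flag-mask MSR is then cleared in RFLAGS. The function name and parameters are
hypothetical, and the model covers only this one step of the instruction.

```c
#include <stdint.h>

/* Model of the flag handling performed by SYSCALL in 64-bit mode.
 * rflags     : RFLAGS value at the time of the SYSCALL
 * flag_mask  : contents of the flag-mask MSR (IA32_SYSCALL_FLAG_MASK)
 * saved_r11  : receives the value the processor would save in R11
 * Returns the RFLAGS value in effect when the system-call handler starts. */
static uint64_t syscall_mask_rflags(uint64_t rflags, uint64_t flag_mask,
                                    uint64_t *saved_r11)
{
    *saved_r11 = rflags;        /* saved so that SYSRET can restore it */
    return rflags & ~flag_mask; /* bits set in the mask are cleared    */
}
```

An operating system that wants interrupts disabled on entry to its system-call
handler would, under this model, set bit 9 (IF) in the mask.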
2.4 MEMORY-MANAGEMENT REGISTERS
The processor provides four memory-management registers (GDTR, LDTR, IDTR,
and TR) that specify the locations of the data structures which control segmented
memory management (see Figure 2-5). Special instructions are provided for loading
and storing these registers.
2.4.1 Global Descriptor Table Register (GDTR)
The GDTR register holds the base address (32 bits in protected mode; 64 bits in
IA-32e mode) and the 16-bit table limit for the GDT. The base address specifies the
linear address of byte 0 of the GDT; the table limit specifies the number of bytes in
the table.

The LGDT and SGDT instructions load and store the GDTR register, respectively. On
power up or reset of the processor, the base address is set to the default value of 0
and the limit is set to 0FFFFH. A new base address must be loaded into the GDTR as
part of the processor initialization process for protected-mode operation.

See also: Section 3.5.1, "Segment Descriptor Tables."
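
As an illustration of the base/limit pair held in the GDTR, the sketch below builds
the memory image that LGDT reads in 64-bit mode, assumed here to be a 16-bit limit
followed by a 64-bit base in little-endian order. It also assumes the convention
from the descriptor-table section that the limit is the offset of the last valid
byte, so a table of N 8-byte descriptors has a limit of 8N - 1. The names are
hypothetical, and the code only prepares bytes; it does not execute LGDT.

```c
#include <assert.h>
#include <stdint.h>

#define GDTR_IMAGE_SIZE   10u  /* 2-byte limit + 8-byte base (64-bit mode) */
#define DESCRIPTOR_SIZE   8u   /* size of one segment descriptor in bytes  */

/* Limit value for a GDT holding n descriptors (1 <= n <= 8192). */
static uint16_t gdt_limit_for_entries(unsigned n)
{
    assert(n >= 1 && n <= 8192);
    return (uint16_t)(n * DESCRIPTOR_SIZE - 1);
}

/* Fill out[] with the limit followed by the base, least significant byte first. */
static void gdtr_image_pack(uint8_t out[GDTR_IMAGE_SIZE],
                            uint64_t base, uint16_t limit)
{
    out[0] = (uint8_t)(limit & 0xFF);
    out[1] = (uint8_t)(limit >> 8);
    for (unsigned i = 0; i < 8; i++)
        out[2 + i] = (uint8_t)(base >> (8 * i));
}
```

With 8192 descriptors the limit reaches 0FFFFH, which matches the power-up default
stated above.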
2.4.2 Local Descriptor Table Register (LDTR)
The LDTR register holds the 16-bit segment selector, base address (32 bits in
protected mode; 64 bits in IA-32e mode), segment limit, and descriptor attributes
for the LDT. The base address specifies the linear address of byte 0 of the LDT
segment; the segment limit specifies the number of bytes in the segment. See also:
Section 3.5.1, "Segment Descriptor Tables."

The LLDT and SLDT instructions load and store the segment selector part of the LDTR
register, respectively. The segment that contains the LDT must have a segment
descriptor in the GDT. When the LLDT instruction loads a segment selector in the
LDTR, the base address, limit, and descriptor attributes from the LDT descriptor are
automatically loaded in the LDTR.

When a task switch occurs, the LDTR is automatically loaded with the segment
selector and descriptor for the LDT for the new task. The contents of the LDTR are not
automatically saved prior to writing the new LDT information into the register.

On power up or reset of the processor, the segment selector and base address are set
to the default value of 0 and the limit is set to 0FFFFH.
Figure 2-5. Memory Management Registers
[Figure: register layouts. System table registers (GDTR, IDTR): a 32(64)-bit linear
base address in bits 47(79):16 and a 16-bit table limit in bits 15:0. System segment
registers (task register, LDTR): a 16-bit segment selector in bits 15:0, plus
automatically loaded segment descriptor registers holding a 32(64)-bit linear base
address, a segment limit, and attributes.]
2.4.3 IDTR Interrupt Descriptor Table Register
The IDTR register holds the base address (32 bits in protected mode; 64 bits in
IA-32e mode) and 16-bit table limit for the IDT. The base address specifies the linear
address of byte 0 of the IDT; the table limit specifies the number of bytes in the table.
The LIDT and SIDT instructions load and store the IDTR register, respectively. On
power up or reset of the processor, the base address is set to the default value of 0
and the limit is set to 0FFFFH. The base address and limit in the register can then be
changed as part of the processor initialization process.

See also: Section 6.10, "Interrupt Descriptor Table (IDT)."
2.4.4 Task Register (TR)
The task register holds the 16-bit segment selector, base address (32 bits in
protected mode; 64 bits in IA-32e mode), segment limit, and descriptor attributes
for the TSS of the current task. The selector references the TSS descriptor in the GDT.
The base address specifies the linear address of byte 0 of the TSS; the segment limit
specifies the number of bytes in the TSS. See also: Section 7.2.4, "Task Register."

The LTR and STR instructions load and store the segment selector part of the task
register, respectively. When the LTR instruction loads a segment selector in the task
register, the base address, limit, and descriptor attributes from the TSS descriptor
are automatically loaded into the task register. On power up or reset of the processor,
the base address is set to the default value of 0 and the limit is set to 0FFFFH.

When a task switch occurs, the task register is automatically loaded with the
segment selector and descriptor for the TSS for the new task. The contents of the
task register are not automatically saved prior to writing the new TSS information
into the register.
2.5 CONTROL REGISTERS
Control registers (CR0, CR1, CR2, CR3, and CR4; see Figure 2-6) determine the
operating mode of the processor and the characteristics of the currently executing
task. These registers are 32 bits in all 32-bit modes and compatibility mode.

In 64-bit mode, control registers are expanded to 64 bits. The MOV CRn instructions
are used to manipulate the register bits. Operand-size prefixes for these instructions
are ignored. The following is also true:
• Bits 63:32 of CR0 and CR4 are reserved and must be written with zeros. Writing
  a nonzero value to any of the upper 32 bits results in a general-protection
  exception, #GP(0).
• All 64 bits of CR2 are writable by software.
• Bits 51:40 of CR3 are reserved and must be 0.
• The MOV CRn instructions do not check that addresses written to CR2 and CR3
  are within the linear-address or physical-address limitations of the
  implementation.
• Register CR8 is available in 64-bit mode only.
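
The reserved-bit rules in the list above can be expressed as simple predicates that
software might apply to a value before writing it with MOV CRn. This is a sketch
of the stated rules only (bits 63:32 of CR0 and CR4, and bits 51:40 of CR3, as given
in this edition); the function names are hypothetical, and other reserved bits within
each register are not checked.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 51:40 of CR3, stated above as reserved and required to be 0. */
#define CR3_RESERVED_51_40  (0xFFFull << 40)

/* True if writing this value to CR0 or CR4 in 64-bit mode would raise #GP(0)
 * because one of the upper 32 bits is nonzero. */
static bool cr0_cr4_upper_bits_fault(uint64_t value)
{
    return (value >> 32) != 0;
}

/* True if bits 51:40 of a prospective CR3 value are all zero. */
static bool cr3_reserved_bits_clear(uint64_t cr3)
{
    return (cr3 & CR3_RESERVED_51_40) == 0;
}
```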
The control registers are summarized below, and each architecturally defined control
field in these control registers is described individually. In Figure 2-6, the width of
the register in 64-bit mode is indicated in parentheses (except for CR0).
• CR0: Contains system control flags that control operating mode and states of
  the processor.
• CR1: Reserved.
• CR2: Contains the page-fault linear address (the linear address that caused a
  page fault).
• CR3: Contains the physical address of the base of the paging-structure
  hierarchy and two flags (PCD and PWT). Only the most-significant bits (less the
  lower 12 bits) of the base address are specified; the lower 12 bits of the address
  are assumed to be 0. The first paging structure must thus be aligned to a page
  (4-KByte) boundary. The PCD and PWT flags control caching of that paging
  structure in the processor's internal data caches (they do not control TLB caching
  of page-directory information).

  When using the physical address extension, the CR3 register contains the base
  address of the page-directory-pointer table. In IA-32e mode, the CR3 register
  contains the base address of the PML4 table.

  See also: Chapter 4, "Paging."
• CR4: Contains a group of flags that enable several architectural extensions,
  and indicate operating system or executive support for specific processor
  capabilities. The control registers can be read and loaded (or modified) using the
  move-to-or-from-control-registers forms of the MOV instruction. In protected mode,
  the MOV instructions allow the control registers to be read or loaded (at privilege
  level 0 only). This restriction means that application programs or operating-system
  procedures (running at privilege levels 1, 2, or 3) are prevented from
  reading or loading the control registers.
• CR8: Provides read and write access to the Task Priority Register (TPR). It
  specifies the priority threshold value that operating systems use to control the
  priority class of external interrupts allowed to interrupt the processor. This
  register is available only in 64-bit mode. However, interrupt filtering continues to
  apply in compatibility mode.
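
Two of the registers summarized above lend themselves to a short worked example.
The sketch below extracts the paging-structure base and the PCD/PWT flags from a
CR3 value, using bit 3 for PWT and bit 4 for PCD, and models the CR8 threshold
under the assumption (from the local APIC task-priority description) that an
external interrupt's priority class is the upper four bits of its vector and that
only classes greater than the threshold are delivered. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR3_PWT        (1ull << 3)   /* page-level write-through */
#define CR3_PCD        (1ull << 4)   /* page-level cache disable */
#define CR3_BASE_MASK  (~0xFFFull)   /* lower 12 bits are not part of the base */

/* Physical base address of the first paging structure (4-KByte aligned). */
static uint64_t cr3_base(uint64_t cr3)
{
    return cr3 & CR3_BASE_MASK;
}

/* True if an external interrupt with this vector passes the CR8 threshold.
 * Assumption: priority class = vector[7:4]; threshold = CR8[3:0]. */
static bool cr8_allows_vector(uint64_t cr8, uint8_t vector)
{
    unsigned priority_class = vector >> 4;
    unsigned threshold = (unsigned)(cr8 & 0xF);
    return priority_class > threshold;
}
```

Under these assumptions, a CR8 value of 0 admits every vector of 10H and above,
and a value of 0FH blocks all external interrupts.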
When loading a control register, reserved bits should always be set to the values
previously read. The flags in control registers are:

PG Paging (bit 31 of CR0): Enables paging when set; disables paging when
    clear. When paging is disabled, all linear addresses are treated as physical
    addresses. The PG flag has no effect if the PE flag (bit 0 of register CR0) is
    not also set; setting the PG flag when the PE flag is clear causes a
    general-protection exception (#GP). See also: Chapter 4, "Paging."

    On Intel 64 processors, enabling and disabling IA-32e mode operation also
    requires modifying CR0.PG.
CD Cache Disable (bit 30 of CR0): When the CD and NW flags are clear,
    caching of memory locations for the whole of physical memory in the
    processor's internal (and external) caches is enabled. When the CD flag is
    set, caching is restricted as described in Table 11-5. To prevent the processor
    from accessing and updating its caches, the CD flag must be set and the
    caches must be invalidated so that no cache hits can occur.

    See also: Section 11.5.3, "Preventing Caching," and Section 11.5, "Cache
    Control."

Figure 2-6. Control Registers
[Figure: bit layouts of CR0 through CR4; widths in 64-bit mode are shown in
parentheses, as in 31(63). CR0: PG (31), CD (30), NW (29), AM (18), WP (16),
NE (5), ET (4), TS (3), EM (2), MP (1), PE (0). CR1: reserved. CR2: page-fault
linear address. CR3 (PDBR): page-directory base in bits 31(63):12, PCD (4),
PWT (3). CR4: OSXSAVE (18), PCIDE (17), SMXE (14), VMXE (13),
OSXMMEXCPT (10), OSFXSR (9), PCE (8), PGE (7), MCE (6), PAE (5), PSE (4),
DE (3), TSD (2), PVI (1), VME (0). The remaining bits are reserved.]
NW Not Write-through (bit 29 of CR0): When the NW and CD flags are
    clear, write-back (for Pentium 4, Intel Xeon, P6 family, and Pentium
    processors) or write-through (for Intel486 processors) is enabled for writes that
    hit the cache and invalidation cycles are enabled. See Table 11-5 for detailed
    information about the effect of the NW flag on caching for other settings of
    the CD and NW flags.

AM Alignment Mask (bit 18 of CR0): Enables automatic alignment checking
    when set; disables alignment checking when clear. Alignment checking is
    performed only when the AM flag is set, the AC flag in the EFLAGS register is
    set, CPL is 3, and the processor is operating in either protected or
    virtual-8086 mode.

WP Write Protect (bit 16 of CR0): When set, inhibits supervisor-level
    procedures from writing into read-only pages; when clear, allows
    supervisor-level procedures to write into read-only pages (regardless of the
    U/S bit setting; see Section 4.1.3 and Section 4.6). This flag facilitates
    implementation of the copy-on-write method of creating a new process
    (forking) used by operating systems such as UNIX.

NE Numeric Error (bit 5 of CR0): Enables the native (internal) mechanism
    for reporting x87 FPU errors when set; enables the PC-style x87 FPU error
    reporting mechanism when clear. When the NE flag is clear and the IGNNE#
    input is asserted, x87 FPU errors are ignored. When the NE flag is clear and
    the IGNNE# input is deasserted, an unmasked x87 FPU error causes the
    processor to assert the FERR# pin to generate an external interrupt and to
    stop instruction execution immediately before executing the next waiting
    floating-point instruction or WAIT/FWAIT instruction.

    The FERR# pin is intended to drive an input to an external interrupt
    controller (the FERR# pin emulates the ERROR# pin of the Intel 287 and
    Intel 387 DX math coprocessors). The NE flag, IGNNE# pin, and FERR# pin
    are used with external logic to implement PC-style error reporting. Using
    FERR# and IGNNE# to handle floating-point exceptions is deprecated by
    modern operating systems; this non-native approach also limits newer
    processors to operate with one logical processor active.

    See also: "Software Exception Handling" in Chapter 8, "Programming with
    the x87 FPU," and Appendix A, "EFLAGS Cross-Reference," in the Intel 64
    and IA-32 Architectures Software Developer's Manual, Volume 1.

ET Extension Type (bit 4 of CR0): Reserved in the Pentium 4, Intel Xeon, P6
    family, and Pentium processors. In the Pentium 4, Intel Xeon, and P6 family
    processors, this flag is hardcoded to 1. In the Intel386 and Intel486
    processors, this flag indicates support of Intel 387 DX math coprocessor
    instructions when set.
TS Task Switched (bit 3 of CR0): Allows the saving of the x87
    FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch to be
    delayed until an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction is
    actually executed by the new task. The processor sets this flag on every task
    switch and tests it when executing x87
    FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instructions.
    • If the TS flag is set and the EM flag (bit 2 of CR0) is clear, a
      device-not-available exception (#NM) is raised prior to the execution of any
      x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction, with the exception
      of PAUSE, PREFETCHh, SFENCE, LFENCE, MFENCE, MOVNTI, CLFLUSH,
      CRC32, and POPCNT. See the paragraph below for the special case of the
      WAIT/FWAIT instructions.
    • If the TS flag is set and the MP flag (bit 1 of CR0) and EM flag are clear, an
      #NM exception is not raised prior to the execution of an x87 FPU
      WAIT/FWAIT instruction.
    • If the EM flag is set, the setting of the TS flag has no effect on the
      execution of x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instructions.

    Table 2-1 shows the actions taken when the processor encounters an x87
    FPU instruction based on the settings of the TS, EM, and MP flags. Tables 12-1
    and 13-1 show the actions taken when the processor encounters an
    MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction.

    The processor does not automatically save the context of the x87 FPU, XMM,
    and MXCSR registers on a task switch. Instead, it sets the TS flag, which
    causes the processor to raise an #NM exception whenever it encounters an
    x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction in the instruction
    stream for the new task (with the exception of the instructions listed above).

    The fault handler for the #NM exception can then be used to clear the TS flag
    (with the CLTS instruction) and save the context of the x87 FPU, XMM, and
    MXCSR registers. If the task never encounters an x87
    FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction, the x87
    FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context is never saved.
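
The interaction of the EM, MP, and TS flags for x87 FPU instructions, as tabulated
in Table 2-1, reduces to two Boolean expressions. The sketch below encodes them;
the enum and function names are hypothetical, and the functions cover only x87
floating-point and WAIT/FWAIT instructions, not the MMX/SSE cases of Tables 12-1
and 13-1.

```c
#include <stdbool.h>

typedef enum { ACTION_EXECUTE, ACTION_NM_EXCEPTION } x87_action;

/* Action for an x87 floating-point instruction: #NM whenever EM or TS is set.
 * MP does not influence this case. */
static x87_action x87_fp_action(bool em, bool mp, bool ts)
{
    (void)mp;
    return (em || ts) ? ACTION_NM_EXCEPTION : ACTION_EXECUTE;
}

/* Action for WAIT/FWAIT: #NM only when both MP and TS are set.
 * EM does not influence this case. */
static x87_action x87_wait_action(bool em, bool mp, bool ts)
{
    (void)em;
    return (mp && ts) ? ACTION_NM_EXCEPTION : ACTION_EXECUTE;
}
```

An #NM handler that implements lazy context saving would typically be the only
code that needs these rules; writing them out makes it easy to check a planned
CR0 setting against the table.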
Table 2-1. Action Taken By x87 FPU Instructions for Different
Combinations of EM, MP, and TS
     CR0 Flags         x87 FPU Instruction Type
EM    MP    TS      Floating-Point      WAIT/FWAIT
0     0     0       Execute             Execute
0     0     1       #NM exception       Execute
0     1     0       Execute             Execute
0     1     1       #NM exception       #NM exception
1     0     0       #NM exception       Execute
1     0     1       #NM exception       Execute
1     1     0       #NM exception       Execute
1     1     1       #NM exception       #NM exception
EM  Emulation (bit 2 of CR0). Indicates that the processor does not have an
    internal or external x87 FPU when set; indicates an x87 FPU is present when
    clear. This flag also affects the execution of
    MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instructions.
    When the EM flag is set, execution of an x87 FPU instruction generates a
    device-not-available exception (#NM). This flag must be set when the
    processor does not have an internal x87 FPU or is not connected to an
    external math coprocessor. Setting this flag forces all floating-point
    instructions to be handled by software emulation. Table 9-2 shows the
    recommended setting of this flag, depending on the IA-32 processor and x87 FPU
    or math coprocessor present in the system. Table 2-1 shows the interaction
    of the EM, MP, and TS flags.
    Also, when the EM flag is set, execution of an MMX instruction causes an
    invalid-opcode exception (#UD) to be generated (see Table 12-1). Thus, if an
    IA-32 or Intel 64 processor incorporates MMX technology, the EM flag must
    be set to 0 to enable execution of MMX instructions.
    Similarly for SSE/SSE2/SSE3/SSSE3/SSE4 extensions, when the EM flag is
    set, execution of most SSE/SSE2/SSE3/SSSE3/SSE4 instructions causes an
    invalid-opcode exception (#UD) to be generated (see Table 13-1). If an IA-32
    or Intel 64 processor incorporates the SSE/SSE2/SSE3/SSSE3/SSE4
    extensions, the EM flag must be set to 0 to enable execution of these extensions.
    SSE/SSE2/SSE3/SSSE3/SSE4 instructions not affected by the EM flag
    include: PAUSE, PREFETCHh, SFENCE, LFENCE, MFENCE, MOVNTI, CLFLUSH,
    CRC32, and POPCNT.
MP  Monitor Coprocessor (bit 1 of CR0). Controls the interaction of the
    WAIT (or FWAIT) instruction with the TS flag (bit 3 of CR0). If the MP flag is
    set, a WAIT instruction generates a device-not-available exception (#NM) if
    the TS flag is also set. If the MP flag is clear, the WAIT instruction ignores the
    setting of the TS flag. Table 9-2 shows the recommended setting of this flag,
    depending on the IA-32 processor and x87 FPU or math coprocessor present
    in the system. Table 2-1 shows the interaction of the MP, EM, and TS flags.
PE  Protection Enable (bit 0 of CR0). Enables protected mode when set;
    enables real-address mode when clear. This flag does not enable paging
    directly. It only enables segment-level protection. To enable paging, both the
    PE and PG flags must be set.
    See also: Section 9.9, "Mode Switching."
PCD Page-level Cache Disable (bit 4 of CR3). Controls caching of the first
    paging structure of the current paging-structure hierarchy. When the PCD
    flag is set, caching of the page-directory is prevented; when the flag is clear,
    the page-directory can be cached. This flag affects only the processor's
    internal caches (both L1 and L2, when present). The processor ignores this
    flag if paging is not used (the PG flag in register CR0 is clear) or the CD
    (cache disable) flag in CR0 is set.
    See also: Chapter 11, "Memory Cache Control" (for more about the use of
    the PCD flag) and Section 4.9, "Paging and Memory Typing" (for a discussion
    of a companion PCD flag in page-directory and page-table entries).
PWT Page-level Write-Through (bit 3 of CR3). Controls the write-through or
    write-back caching policy of the first paging structure of the current
    paging-structure hierarchy. When the PWT flag is set, write-through caching is
    enabled; when the flag is clear, write-back caching is enabled. This flag
    affects only internal caches (both L1 and L2, when present). The processor
    ignores this flag if paging is not used (the PG flag in register CR0 is clear) or
    the CD (cache disable) flag in CR0 is set.
    See also: Section 11.5, "Cache Control" (for more information about the use
    of this flag), and Section 4.9, "Paging and Memory Typing" (for a discussion
    of a companion PWT flag in the page-directory and page-table entries).
VME Virtual-8086 Mode Extensions (bit 0 of CR4). Enables interrupt- and
    exception-handling extensions in virtual-8086 mode when set; disables the
    extensions when clear. Use of the virtual mode extensions can improve the
    performance of virtual-8086 applications by eliminating the overhead of
    calling the virtual-8086 monitor to handle interrupts and exceptions that
    occur while executing an 8086 program and, instead, redirecting the
    interrupts and exceptions back to the 8086 program's handlers. It also provides
    hardware support for a virtual interrupt flag (VIF) to improve reliability of
    running 8086 programs in multitasking and multiple-processor environments.
    See also: Section 17.3, "Interrupt and Exception Handling in Virtual-8086
    Mode."
PVI Protected-Mode Virtual Interrupts (bit 1 of CR4). Enables hardware
    support for a virtual interrupt flag (VIF) in protected mode when set; disables
    the VIF flag in protected mode when clear.
    See also: Section 17.4, "Protected-Mode Virtual Interrupts."
TSD Time Stamp Disable (bit 2 of CR4). Restricts the execution of the
    RDTSC instruction (including the RDTSCP instruction if
    CPUID.80000001H:EDX[27] = 1) to procedures running at privilege level 0
    when set; allows the RDTSC instruction (including the RDTSCP instruction if
    CPUID.80000001H:EDX[27] = 1) to be executed at any privilege level when
    clear.
DE  Debugging Extensions (bit 3 of CR4). References to debug registers
    DR4 and DR5 cause an undefined opcode (#UD) exception to be generated
    when set; when clear, the processor aliases references to registers DR4 and DR5
    for compatibility with software written to run on earlier IA-32 processors.
    See also: Section 16.2.2, "Debug Registers DR4 and DR5."
PSE Page Size Extensions (bit 4 of CR4). Enables 4-MByte pages with 32-bit
    paging when set; restricts 32-bit paging to pages of 4 KBytes when clear.
    See also: Section 4.3, "32-Bit Paging."
PAE Physical Address Extension (bit 5 of CR4). When set, enables paging
    to produce physical addresses with more than 32 bits. When clear, restricts
    physical addresses to 32 bits. PAE must be set before entering IA-32e mode.
    See also: Chapter 4, "Paging."
MCE Machine-Check Enable (bit 6 of CR4). Enables the machine-check
    exception when set; disables the machine-check exception when clear.
    See also: Chapter 15, "Machine-Check Architecture."
PGE Page Global Enable (bit 7 of CR4). (Introduced in the P6 family
    processors.) Enables the global page feature when set; disables the global page
    feature when clear. The global page feature allows frequently used or shared
    pages to be marked as global to all users (done with the global flag, bit 8, in
    a page-directory or page-table entry). Global pages are not flushed from the
    translation-lookaside buffer (TLB) on a task switch or a write to register CR3.
    When enabling the global page feature, paging must be enabled (by setting
    the PG flag in control register CR0) before the PGE flag is set. Reversing this
    sequence may affect program correctness, and processor performance will
    be impacted.
    See also: Section 4.10, "Caching Translation Information."
PCE Performance-Monitoring Counter Enable (bit 8 of CR4). Enables
    execution of the RDPMC instruction for programs or procedures running at
    any protection level when set; the RDPMC instruction can be executed only at
    protection level 0 when clear.
OSFXSR
    Operating System Support for FXSAVE and FXRSTOR instructions
    (bit 9 of CR4). When set, this flag: (1) indicates to software that the
    operating system supports the use of the FXSAVE and FXRSTOR instructions, (2)
    enables the FXSAVE and FXRSTOR instructions to save and restore the
    contents of the XMM and MXCSR registers along with the contents of the x87
    FPU and MMX registers, and (3) enables the processor to execute
    SSE/SSE2/SSE3/SSSE3/SSE4 instructions, with the exception of PAUSE,
    PREFETCHh, SFENCE, LFENCE, MFENCE, MOVNTI, CLFLUSH, CRC32, and
    POPCNT.
    If this flag is clear, the FXSAVE and FXRSTOR instructions will save and
    restore the contents of the x87 FPU and MMX registers, but they may not
    save and restore the contents of the XMM and MXCSR registers. Also, the
    processor will generate an invalid opcode exception (#UD) if it attempts to
    execute any SSE/SSE2/SSE3 instruction, with the exception of PAUSE,
    PREFETCHh, SFENCE, LFENCE, MFENCE, MOVNTI, CLFLUSH, CRC32, and
    POPCNT. The operating system or executive must explicitly set this flag.
    NOTE
    The CPUID feature flag FXSR indicates availability of the
    FXSAVE/FXRSTOR instructions. The OSFXSR bit provides operating
    system software with a means of enabling FXSAVE/FXRSTOR to
    save/restore the contents of the x87 FPU, XMM, and MXCSR registers.
    Consequently, the OSFXSR bit indicates that the operating system
    provides context switch support for SSE/SSE2/SSE3/SSSE3/SSE4.
OSXMMEXCPT
    Operating System Support for Unmasked SIMD Floating-Point Exceptions
    (bit 10 of CR4). When set, indicates that the operating system
    supports the handling of unmasked SIMD floating-point exceptions through
    an exception handler that is invoked when a SIMD floating-point exception
    (#XF) is generated. SIMD floating-point exceptions are only generated by
    SSE/SSE2/SSE3/SSE4.1 SIMD floating-point instructions.
    The operating system or executive must explicitly set this flag. If this flag is
    not set, the processor will generate an invalid opcode exception (#UD)
    whenever it detects an unmasked SIMD floating-point exception.
VMXE
    VMX-Enable Bit (bit 13 of CR4). Enables VMX operation when set. See
    Chapter 20, "Introduction to Virtual-Machine Extensions."
SMXE
    SMX-Enable Bit (bit 14 of CR4). Enables SMX operation when set. See
    Chapter 6, "Safer Mode Extensions Reference," of the Intel 64 and IA-32
    Architectures Software Developer's Manual, Volume 2B.
PCIDE
    PCID-Enable Bit (bit 17 of CR4). Enables process-context identifiers
    (PCIDs) when set. See Section 4.10.1, "Process-Context Identifiers
    (PCIDs)." Can be set only in IA-32e mode (if IA32_EFER.LMA = 1).
OSXSAVE
    XSAVE and Processor Extended States-Enable Bit (bit 18 of CR4).
    When set, this flag: (1) indicates (via CPUID.01H:ECX.OSXSAVE[bit 27])
    that the operating system supports the use of the XGETBV, XSAVE, and
    XRSTOR instructions by general software; (2) enables the XSAVE and
    XRSTOR instructions to save and restore the x87 FPU state (including MMX
    registers), the SSE state (XMM registers and MXCSR), along with other
    processor extended states enabled in the XFEATURE_ENABLED_MASK
    register (XCR0); (3) enables the processor to execute XGETBV and XSETBV
    instructions in order to read and write XCR0. See Section 2.6 and Chapter
    13, "System Programming for Instruction Set Extensions and Processor
    Extended States."
TPL Task Priority Level (bits 3:0 of CR8). Sets the threshold value
    corresponding to the highest-priority interrupt to be blocked. A value of 0 means
    all interrupts are enabled; a value of 15 means all interrupts are disabled.
    This field is available in 64-bit mode.
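The CR4 bit positions listed above can be collected as masks, and the TSD and PCE rules written as simple predicates. The following is a minimal sketch in C: the macro and function names are hypothetical, the caller is assumed to supply the CR4 value and the current privilege level, and modes other than protected mode are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* CR4 flag masks, using the bit positions given in the descriptions above. */
#define CR4_VME        (1u << 0)
#define CR4_PVI        (1u << 1)
#define CR4_TSD        (1u << 2)
#define CR4_DE         (1u << 3)
#define CR4_PSE        (1u << 4)
#define CR4_PAE        (1u << 5)
#define CR4_MCE        (1u << 6)
#define CR4_PGE        (1u << 7)
#define CR4_PCE        (1u << 8)
#define CR4_OSFXSR     (1u << 9)
#define CR4_OSXMMEXCPT (1u << 10)
#define CR4_VMXE       (1u << 13)
#define CR4_SMXE       (1u << 14)
#define CR4_PCIDE      (1u << 17)
#define CR4_OSXSAVE    (1u << 18)

/* Hypothetical helper: RDTSC is restricted to privilege level 0
 * only when CR4.TSD is set. */
bool rdtsc_permitted(uint32_t cr4, unsigned cpl)
{
    return cpl == 0 || (cr4 & CR4_TSD) == 0;
}

/* Hypothetical helper: RDPMC may run above privilege level 0
 * only when CR4.PCE is set. */
bool rdpmc_permitted(uint32_t cr4, unsigned cpl)
{
    return cpl == 0 || (cr4 & CR4_PCE) != 0;
}

/* Hypothetical helper: extracts the TPL field, bits 3:0 of CR8. */
unsigned cr8_tpl(uint64_t cr8)
{
    return (unsigned)(cr8 & 0xFu);
}
```

Note the opposite polarity of the two flags: TSD set removes access for applications, whereas PCE set grants it.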
2.5.1 CPUID Qualification of Control Register Flags
Not all flags in control register CR4 are implemented on all processors. With the
exception of the PCE flag, they can be qualified with the CPUID instruction to
determine if they are implemented on the processor before they are used.
The CR8 register is available on processors that support Intel 64 architecture.
2.6 EXTENDED CONTROL REGISTERS (INCLUDING THE
XFEATURE_ENABLED_MASK REGISTER)
If CPUID.01H:ECX.XSAVE[bit 26] is 1, the processor supports one or more
extended control registers (XCRs). Currently, the only such register defined is
XCR0, the XFEATURE_ENABLED_MASK register. This register specifies the set of
processor states that the operating system enables on that processor, e.g., x87 FPU
states, SSE states, and other processor extended states that Intel 64 architecture
may introduce in the future. The OS programs XCR0 to reflect the features it
supports.
Figure 2-7. XFEATURE_ENABLED_MASK Register (XCR0)
[Figure: bit 63, reserved for XCR0 bit vector expansion; bits 62:2, reserved for
future processor extended states (must be 0); bit 1, SSE state; bit 0, x87 FPU/MMX
state (must be 1).]
Software can access XCR0 only if CR4.OSXSAVE[bit 18] = 1. (This bit is also readable
as CPUID.01H:ECX.OSXSAVE[bit 27].) The layout of XCR0 is architected to allow
software to use CPUID leaf function 0DH to enumerate the set of bits that the
processor supports in XCR0 (see the CPUID instruction in the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 2A). Each processor state (x87 FPU
state, SSE state, or a future processor extended state) is represented by a bit in
XCR0. The OS can enable future processor extended states in a forward manner by
specifying the appropriate bit mask value using the XSETBV instruction according to
the results of the CPUID leaf 0DH.
With the exception of bit 63, each bit in the XFEATURE_ENABLED_MASK register
(XCR0) corresponds to a subset of the processor states. XCR0 thus provides space
for up to 63 sets of processor state extensions. Bit 63 of XCR0 is reserved for future
expansion and will not represent a processor extended state.
Currently, the XFEATURE_ENABLED_MASK register (XCR0) has two processor states
defined, with up to 61 bits reserved for future processor extended states:
XCR0.X87 (bit 0): If 1, indicates x87 FPU state (including MMX register states) is
  supported in the processor. Bit 0 must be 1. An attempt to write 0 causes a #GP
  exception.
XCR0.SSE (bit 1): If 1, indicates MXCSR and XMM registers (XMM0-XMM15 in
  64-bit mode, otherwise XMM0-XMM7) are supported by XSAVE/XRSTOR in the
  processor.
Any attempt to set a reserved bit (as determined by the contents of EAX and EDX
after executing CPUID with EAX=0DH, ECX=0H) in the XFEATURE_ENABLED_MASK
register for a given processor will result in a #GP exception. An attempt to write 0 to
XFEATURE_ENABLED_MASK.x87 (bit 0) will result in a #GP exception.
If a bit in the XFEATURE_ENABLED_MASK register is 1, the XSAVE instruction can
selectively (in conjunction with a save mask) save a partial or full set of processor states
to memory (see the XSAVE instruction in the Intel 64 and IA-32 Architectures Software
Developer's Manual, Volume 2B).
After reset, all bits (except bit 0) in the XFEATURE_ENABLED_MASK register (XCR0)
are cleared to zero. XCR0[0] is set to 1.
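The two #GP conditions described in this section for writes to XCR0 can be sketched as a validity check. This is an illustrative model in C, not processor behavior in full: the function names are hypothetical, and the `supported` argument is assumed to be the EDX:EAX mask the caller obtained from CPUID with EAX=0DH, ECX=0H.

```c
#include <stdbool.h>
#include <stdint.h>

#define XCR0_X87 (1ull << 0) /* x87 FPU/MMX state, must be 1 */
#define XCR0_SSE (1ull << 1) /* SSE state (XMM registers and MXCSR) */

/* Hypothetical helper: returns true if writing `value` to XCR0 would be
 * accepted under the two rules stated in this section. `supported` is the
 * EDX:EAX mask reported by CPUID.(EAX=0DH, ECX=0H), supplied by the caller. */
bool xcr0_value_accepted(uint64_t value, uint64_t supported)
{
    if ((value & XCR0_X87) == 0)
        return false; /* writing 0 to bit 0 causes #GP */
    if ((value & ~supported) != 0)
        return false; /* setting a reserved bit causes #GP */
    return true;
}

/* Value of XCR0 after reset: only bit 0 is set. */
uint64_t xcr0_reset_value(void)
{
    return XCR0_X87;
}
```

An operating system would typically compute the mask it wants, run a check of this kind, and only then execute XSETBV.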
2.7 SYSTEM INSTRUCTION SUMMARY
System instructions handle system-level functions such as loading system registers,
managing the cache, managing interrupts, or setting up the debug registers. Many of
these instructions can be executed only by operating-system or executive
procedures (that is, procedures running at privilege level 0). Others can be executed at
any privilege level and are thus available to application programs.
Table 2-2 lists the system instructions and indicates whether they are available and
useful for application programs. These instructions are described in the Intel 64
and IA-32 Architectures Software Developer's Manual, Volumes 2A & 2B.
Table 2-2. Summary of System Instructions
Instruction      Description                               Useful to       Protected from
                                                           Application?    Application?
LLDT             Load LDT Register                         No              Yes
SLDT             Store LDT Register                        No              No
LGDT             Load GDT Register                         No              Yes
SGDT             Store GDT Register                        No              No
LTR              Load Task Register                        No              Yes
STR              Store Task Register                       No              No
LIDT             Load IDT Register                         No              Yes
SIDT             Store IDT Register                        No              No
MOV CRn          Load and store control registers          No              Yes
SMSW             Store MSW                                 Yes             No
LMSW             Load MSW                                  No              Yes
CLTS             Clear TS flag in CR0                      No              Yes
ARPL             Adjust RPL                                Yes [1, 5]      No
LAR              Load Access Rights                        Yes             No
LSL              Load Segment Limit                        Yes             No
VERR             Verify for Reading                        Yes             No
VERW             Verify for Writing                        Yes             No
MOV DRn          Load and store debug registers            No              Yes
INVD             Invalidate cache, no writeback            No              Yes
WBINVD           Invalidate cache, with writeback          No              Yes
INVLPG           Invalidate TLB entry                      No              Yes
HLT              Halt Processor                            No              Yes
LOCK (Prefix)    Bus Lock                                  Yes             No
RSM              Return from system management mode        No              Yes
RDMSR [3]        Read Model-Specific Registers             No              Yes
WRMSR [3]        Write Model-Specific Registers            No              Yes
RDPMC [4]        Read Performance-Monitoring Counter       Yes             Yes [2]
RDTSC [3]        Read Time-Stamp Counter                   Yes             Yes [2]
RDTSCP [7]       Read Serialized Time-Stamp Counter        Yes             Yes [2]
XGETBV           Return the state of the                   Yes             No
                 XFEATURE_ENABLED_MASK register
XSETBV           Enable one or more processor              No [6]          Yes
                 extended states
NOTES:
1. Useful to application programs running at a CPL of 1 or 2.
2. The TSD and PCE flags in control register CR4 control access to these instructions by application
   programs running at a CPL of 3.
3. These instructions were introduced into the IA-32 Architecture with the Pentium processor.
4. This instruction was introduced into the IA-32 Architecture with the Pentium Pro processor and
   the Pentium processor with MMX technology.
5. This instruction is not supported in 64-bit mode.
6. Application uses XGETBV to query which set of processor extended states are enabled.
7. RDTSCP was introduced in the Intel Core i7 processor.
2.7.1 Loading and Storing System Registers
The GDTR, LDTR, IDTR, and TR registers each have a load and store instruction for
loading data into and storing data from the register:
LGDT (Load GDTR Register). Loads the GDT base address and limit from
  memory into the GDTR register.
SGDT (Store GDTR Register). Stores the GDT base address and limit from
  the GDTR register into memory.
LIDT (Load IDTR Register). Loads the IDT base address and limit from
  memory into the IDTR register.
SIDT (Store IDTR Register). Stores the IDT base address and limit from the
  IDTR register into memory.
LLDT (Load LDT Register). Loads the LDT segment selector and segment
  descriptor from memory into the LDTR. (The segment selector operand can also
  be located in a general-purpose register.)
SLDT (Store LDT Register). Stores the LDT segment selector from the LDTR
  register into memory or a general-purpose register.
LTR (Load Task Register). Loads the segment selector and segment descriptor
  for a TSS from memory into the task register. (The segment selector operand can
  also be located in a general-purpose register.)
STR (Store Task Register). Stores the segment selector for the current task
  TSS from the task register into memory or a general-purpose register.
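As an illustration of the "base address and limit" memory operand used by LGDT, SGDT, LIDT, and SIDT, the sketch below builds that operand byte by byte. It assumes the 64-bit mode format (a 16-bit limit followed by a 64-bit base, 10 bytes in total), which comes from the instruction reference rather than from this section; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes the memory operand for LGDT/LIDT as used in
 * 64-bit mode (assumed layout: bytes 0-1 hold the limit, bytes 2-9 hold the
 * base, both little-endian). Returns the operand size in bytes. */
size_t encode_pseudo_descriptor64(uint8_t out[10], uint16_t limit, uint64_t base)
{
    out[0] = (uint8_t)(limit & 0xFF);
    out[1] = (uint8_t)(limit >> 8);
    for (int i = 0; i < 8; i++)
        out[2 + i] = (uint8_t)(base >> (8 * i)); /* least significant byte first */
    return 10;
}
```

Writing the bytes explicitly avoids relying on compiler-specific structure packing, which a two-field struct of these types would otherwise need.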
The LMSW (load machine status word) and SMSW (store machine status word)
instructions operate on bits 0 through 15 of control register CR0. These instructions
are provided for compatibility with the 16-bit Intel 286 processor. Programs written
to run on 32-bit IA-32 processors should not use these instructions. Instead, they
should access the control register CR0 using the MOV instruction.
The CLTS (clear TS flag in CR0) instruction is provided for use in handling a
device-not-available exception (#NM) that occurs when the processor attempts to
execute a floating-point instruction when the TS flag is set. This instruction allows
the TS flag to be cleared after the x87 FPU context has been saved, preventing
further #NM exceptions. See Section 2.5, "Control Registers," for more information
on the TS flag.
The control registers (CR0, CR1, CR2, CR3, CR4, and CR8) are loaded using the MOV
instruction. The instruction loads a control register from a general-purpose register
or stores the content of a control register in a general-purpose register.
2.7.2 Verifying of Access Privileges
The processor provides several instructions for examining segment selectors
and segment descriptors to determine if access to their associated segments
is allowed. These instructions duplicate some of the automatic access rights
and type checking done by the processor, thus allowing operating-system or
executive software to prevent exceptions from being generated.
The ARPL (adjust RPL) instruction adjusts the RPL (requestor privilege level)
of a segment selector to match that of the program or procedure that
supplied the segment selector. See Section 5.10.4, "Checking Caller Access
Privileges (ARPL Instruction)," for a detailed explanation of the function and
use of this instruction. Note that ARPL is not supported in 64-bit mode.
The LAR (load access rights) instruction verifies the accessibility of a
specified segment and loads access rights information from the segment's
segment descriptor into a general-purpose register. Software can then
examine the access rights to determine if the segment type is compatible
with its intended use. See Section 5.10.1, "Checking Access Rights (LAR
Instruction)," for a detailed explanation of the function and use of this
instruction.
The LSL (load segment limit) instruction verifies the accessibility of a
specified segment and loads the segment limit from the segment's segment
descriptor into a general-purpose register. Software can then compare the
segment limit with an offset into the segment to determine whether the
offset lies within the segment. See Section 5.10.3, "Checking That the
Pointer Offset Is Within Limits (LSL Instruction)," for a detailed explanation
of the function and use of this instruction.
The VERR (verify for reading) and VERW (verify for writing) instructions
verify if a selected segment is readable or writable, respectively, at a given
CPL. See Section 5.10.2, "Checking Read/Write Rights (VERR and VERW
Instructions)," for a detailed explanation of the function and use of these
instructions.
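Two of the checks described above lend themselves to a small software model: the RPL adjustment performed by ARPL and the offset comparison that software makes after LSL. The sketch below is illustrative only. It assumes that the RPL occupies bits 1:0 of a segment selector and that the segment is an expand-up segment whose limit is expressed in bytes; both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEL_RPL_MASK 0x3u /* assumed: RPL is bits 1:0 of a selector */

/* Hypothetical model of ARPL: if the destination selector's RPL is lower
 * (more privileged) than the source's, raise it to match and return true
 * (the instruction sets ZF in that case); otherwise leave it unchanged. */
bool arpl_adjust(uint16_t *dest, uint16_t src)
{
    unsigned dest_rpl = *dest & SEL_RPL_MASK;
    unsigned src_rpl = src & SEL_RPL_MASK;

    if (dest_rpl < src_rpl) {
        *dest = (uint16_t)((*dest & ~SEL_RPL_MASK) | src_rpl);
        return true;
    }
    return false;
}

/* Hypothetical check made with a limit obtained from LSL: returns true if
 * `size` bytes starting at `offset` lie within an expand-up segment whose
 * last valid byte offset is `limit`. Written to avoid integer overflow. */
bool offset_within_limit(uint32_t limit, uint32_t offset, uint32_t size)
{
    return size != 0 && offset <= limit && size - 1 <= limit - offset;
}
```

For instance, a selector of 0008H supplied by a caller whose own selector is 0013H (RPL 3) becomes 000BH, so later accesses through it are checked at the caller's privilege level.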
2.7.3 Loading and Storing Debug Registers
Internal debugging facilities in the processor are controlled by a set of 8 debug
registers (DR0-DR7). The MOV instruction allows setup data to be loaded to and stored
from these registers.
On processors that support Intel 64 architecture, debug registers DR0-DR7 are 64
bits. In 32-bit modes and compatibility mode, writes to a debug register fill the upper
32 bits with zeros. Reads return the lower 32 bits. In 64-bit mode, the upper 32 bits
of DR6-DR7 are reserved and must be written with zeros. Writing one to any of the
upper 32 bits causes an exception, #GP(0).
In 64-bit mode, MOV DRn instructions read or write all 64 bits of a debug register
(operand-size prefixes are ignored). All 64 bits of DR0-DR3 are writable by software.
However, MOV DRn instructions do not check that addresses written to DR0-DR3 are
in the limits of the implementation. Address matching is supported only on valid
addresses generated by the processor implementation.
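The width rules above can be summarized in a short model. This is a sketch under stated assumptions: the helper names are hypothetical, only the upper-32-bit rule for DR6 and DR7 is modeled, and other fault conditions for debug register accesses are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if a 64-bit mode MOV to DRn would raise
 * #GP(0) because a reserved upper bit of DR6 or DR7 is written as 1.
 * DR0-DR3 accept all 64 bits, so they never fault under this rule. */
bool dr_write_raises_gp(unsigned n, uint64_t value)
{
    bool upper_reserved = (n == 6 || n == 7);
    return upper_reserved && (value >> 32) != 0;
}

/* In 32-bit modes and compatibility mode, a write zero-fills bits 63:32. */
uint64_t dr_write_legacy(uint32_t value)
{
    return (uint64_t)value;
}

/* In 32-bit modes and compatibility mode, a read returns bits 31:0. */
uint32_t dr_read_legacy(uint64_t dr)
{
    return (uint32_t)dr;
}
```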
2.7.4 Invalidating Caches and TLBs
The processor provides several instructions for use in explicitly invalidating its caches
and TLB entries. The INVD (invalidate cache with no writeback) instruction
invalidates all data and instruction entries in the internal caches and sends a signal to the
external caches indicating that they should also be invalidated.
The WBINVD (invalidate cache with writeback) instruction performs the same
function as the INVD instruction, except that it writes back modified lines in its internal
caches to memory before it invalidates the caches. After invalidating the internal
caches, WBINVD signals external caches to write back modified data and invalidate
their contents.
The INVLPG (invalidate TLB entry) instruction invalidates (flushes) the TLB entry for
a specified page.
2.7.5 Controlling the Processor
The HLT (halt processor) instruction stops the processor until an enabled interrupt
(such as NMI or SMI, which are normally enabled), a debug exception, the BINIT#
signal, the INIT# signal, or the RESET# signal is received. The processor generates a
special bus cycle to indicate that the halt mode has been entered.
Hardware may respond to this signal in a number of ways. An indicator light on the
front panel may be turned on. An NMI interrupt for recording diagnostic information
may be generated. Reset initialization may be invoked (note that the BINIT# pin was
introduced with the Pentium Pro processor). If any non-wake events are pending
during shutdown, they will be handled after the wake event from shutdown is
processed (for example, A20M# interrupts).
The LOCK prefix invokes a locked (atomic) read-modify-write operation when
modifying a memory operand. This mechanism is used to allow reliable communications
between processors in multiprocessor systems, as described below:
In the Pentium processor and earlier IA-32 processors, the LOCK prefix causes
  the processor to assert the LOCK# signal during the instruction. This always
  causes an explicit bus lock to occur.
In the Pentium 4, Intel Xeon, and P6 family processors, the locking operation is
  handled with either a cache lock or bus lock. If a memory access is cacheable and
  affects only a single cache line, a cache lock is invoked and the system bus and
  the actual memory location in system memory are not locked during the
  operation. Here, other Pentium 4, Intel Xeon, or P6 family processors on the bus
  write back any modified data and invalidate their caches as necessary to
  maintain system memory coherency. If the memory access is not cacheable
  and/or it crosses a cache line boundary, the processor's LOCK# signal is asserted
  and the processor does not respond to requests for bus control during the locked
  operation.
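The cache-lock versus bus-lock rule described above for the Pentium 4, Intel Xeon, and P6 family processors can be sketched as a classification function. The names are hypothetical, and the cache line size is passed in as a parameter because this section does not state it (64 bytes is used in the example only as an assumption); `size` is assumed to be at least 1 and `line_size` nonzero.

```c
#include <stdbool.h>
#include <stdint.h>

enum lock_kind { CACHE_LOCK, BUS_LOCK };

/* Hypothetical helper: classifies a locked access of `size` bytes at `addr`.
 * A cache lock applies only if the access is cacheable and falls entirely
 * within one cache line; otherwise the LOCK# signal (bus lock) is used. */
enum lock_kind classify_locked_access(uint64_t addr, unsigned size,
                                      bool cacheable, unsigned line_size)
{
    uint64_t first_line = addr / line_size;
    uint64_t last_line = (addr + size - 1) / line_size;

    if (cacheable && first_line == last_line)
        return CACHE_LOCK;
    return BUS_LOCK;
}
```

With an assumed 64-byte line, a 4-byte operand at 103EH spans two lines and therefore takes the bus-lock path, which is one reason to keep locked operands naturally aligned.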
The RSM (return from SMM) instruction restores the processor (from a context
dump) to the state it was in prior to a system management mode (SMM) interrupt.
2.7.6 Reading Performance-Monitoring and Time-Stamp Counters
The RDPMC (read performance-monitoring counter) and RDTSC (read time-stamp
counter) instructions allow application programs to read the processor's
performance-monitoring and time-stamp counters, respectively. Processors based on Intel
NetBurst microarchitecture have eighteen 40-bit performance-monitoring
counters; P6 family processors have two 40-bit counters. Intel Atom processors
and most of the processors based on the Intel Core microarchitecture support two
types of performance monitoring counters: two programmable performance
counters similar to those available in the P6 family, and three fixed-function
performance monitoring counters.
The programmable performance counters can support counting either the occurrence
or duration of events. Events that can be monitored on programmable counters
generally are model specific (except for architectural performance events
enumerated by CPUID leaf 0AH); they may include the number of instructions decoded,
interrupts received, or the number of cache loads. Individual counters can be set up
to monitor different events. Use the system instruction WRMSR to set up values in
IA32_PERFEVTSEL0/1 (for Intel Atom, Intel Core 2, Intel Core Duo, and Intel
Pentium M processors); in one of the 45 ESCRs and one of the 18 CCCR MSRs (for
Pentium 4 and Intel Xeon processors); or in the PerfEvtSel0 or the PerfEvtSel1 MSR
(for the P6 family processors). The RDPMC instruction loads the current count from
the selected counter into the EDX:EAX registers.
Fixed-function performance counters record only specific events that are defined in
Chapter 30, "Performance Monitoring," and the width/number of
fixed-function counters are enumerated by CPUID leaf 0AH.
The time-stamp counter is a model-specific 64-bit counter that is reset to zero each
time the processor is reset. If not reset, the counter will increment ~9.5 x 10^16
times per year when the processor is operating at a clock rate of 3 GHz. At this
clock frequency, it would take over 190 years for the counter to wrap around. The
RDTSC instruction loads the current count of the time-stamp counter into the
EDX:EAX registers.
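The wrap-around figures quoted above follow from simple arithmetic, which the sketch below reproduces. It assumes a 365-day year and a counter that increments once per clock cycle at a constant rate; the function names are hypothetical.

```c
/* Hypothetical helper: counter increments per year at a clock rate of `hz`,
 * assuming a 365-day year (31,536,000 seconds) and one increment per cycle. */
double tsc_ticks_per_year(double hz)
{
    const double seconds_per_year = 365.0 * 24.0 * 3600.0;
    return hz * seconds_per_year;
}

/* Hypothetical helper: years until a 64-bit counter wraps, i.e. 2^64
 * divided by the yearly increment count. */
double tsc_years_to_wrap(double hz)
{
    const double two_pow_64 = 18446744073709551616.0;
    return two_pow_64 / tsc_ticks_per_year(hz);
}
```

At 3 GHz the first function gives roughly 9.46 x 10^16 increments per year and the second roughly 195 years, consistent with the approximate values in the text.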
See Section 30.1, "Performance Monitoring Overview," and Section 16.11,
"Time-Stamp Counter," for more information about the performance monitoring and
time-stamp counters.
The RDTSC instruction was introduced into the IA-32 architecture with the Pentium
processor. The RDPMC instruction was introduced into the IA-32 architecture with the
Pentium Pro processor and the Pentium processor with MMX technology. Earlier
Pentium processors have two performance-monitoring counters, but they can be
read only with the RDMSR instruction, and only at privilege level 0.
2.7.6.1 Reading Counters in 64-Bit Mode
In 64-bit mode, RDTSC operates the same as in protected mode. The count in the
time-stamp counter is stored in EDX:EAX (or RDX[31:0]:RAX[31:0] with
RDX[63:32]:RAX[63:32] cleared).
RDPMC requires an index to specify the offset of the performance-monitoring
counter. In 64-bit mode for Pentium 4 or Intel Xeon processor families, the index is
specified in ECX[30:0]. The current count of the performance-monitoring counter is
stored in EDX:EAX (or RDX[31:0]:RAX[31:0] with RDX[63:32]:RAX[63:32]
cleared).
2.7.7 Reading and Writing Model-Specific Registers
The RDMSR (read model-specific register) and WRMSR (write model-specific
register) instructions allow a processor's 64-bit model-specific registers (MSRs) to be
read and written, respectively. The MSR to be read or written is specified by the value
in the ECX register.
RDMSR reads the value from the specified MSR to the EDX:EAX registers; WRMSR
writes the value in the EDX:EAX registers to the specified MSR. RDMSR and WRMSR
were introduced into the IA-32 architecture with the Pentium processor.
See Section 9.4, "Model-Specific Registers (MSRs)," for more information.
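RDMSR, RDTSC, and RDPMC all return a 64-bit quantity split across EDX (high half) and EAX (low half), and WRMSR takes its input the same way. A minimal sketch of the conversion in C, with hypothetical helper names, is:

```c
#include <stdint.h>

/* Hypothetical helper: combines the EDX:EAX pair returned by RDMSR, RDTSC,
 * or RDPMC into one 64-bit value (EDX holds bits 63:32, EAX bits 31:0). */
uint64_t combine_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Hypothetical helper: splits a 64-bit value into the EDX:EAX pair that
 * WRMSR expects. */
void split_edx_eax(uint64_t value, uint32_t *edx, uint32_t *eax)
{
    *edx = (uint32_t)(value >> 32);
    *eax = (uint32_t)value;
}
```

The cast to a 64-bit type before shifting matters: shifting a 32-bit value left by 32 is undefined behavior in C.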
2.7.7.1 Reading and Writing Model-Specific Registers in 64-Bit Mode
RDMSR and WRMSR require an index to specify the address of an MSR. In 64-bit
mode, the index is 32 bits; it is specified using ECX.
2.7.8 Enabling Processor Extended States
The XSETBV instruction is required to enable OS support of individual processor
extended states in the XFEATURE_ENABLED_MASK register (see Section 2.6).
CHAPTER 3
PROTECTED-MODE MEMORY MANAGEMENT
This chapter describes the Intel 64 and IA-32 architecture's protected-mode memory
management facilities, including the physical memory requirements, segmentation
mechanism, and paging mechanism.
See also: Chapter 5, "Protection" (for a description of the processor's protection
mechanism) and Chapter 17, "8086 Emulation" (for a description of memory
addressing protection in real-address and virtual-8086 modes).
3.1 MEMORY MANAGEMENT OVERVIEW
The memory management facilities of the IA-32 architecture are divided into two
parts: segmentation and paging. Segmentation provides a mechanism of isolating
individual code, data, and stack modules so that multiple programs (or tasks) can
run on the same processor without interfering with one another. Paging provides a
mechanism for implementing a conventional demand-paged, virtual-memory system
where sections of a program's execution environment are mapped into physical
memory as needed. Paging can also be used to provide isolation between multiple
tasks. When operating in protected mode, some form of segmentation must be used.
There is no mode bit to disable segmentation. The use of paging, however, is
optional.
These two mechanisms (segmentation and paging) can be configured to support
simple single-program (or single-task) systems, multitasking systems, or
multiple-processor systems that use shared memory.
As shown in Figure 3-1, segmentation provides a mechanism for dividing the
processor's addressable memory space (called the linear address space) into
smaller protected address spaces called segments. Segments can be used to hold
the code, data, and stack for a program or to hold system data structures (such as a
TSS or LDT). If more than one program (or task) is running on a processor, each
program can be assigned its own set of segments. The processor then enforces the
boundaries between these segments and ensures that one program does not interfere
with the execution of another program by writing into the other program's segments.
The segmentation mechanism also allows typing of segments so that the operations
that may be performed on a particular type of segment can be restricted.
All the segments in a system are contained in the processor's linear address space.
To locate a byte in a particular segment, a logical address (also called a far pointer)
must be provided. A logical address consists of a segment selector and an offset. The
segment selector is a unique identifier for a segment. Among other things, it provides
an offset into a descriptor table (such as the global descriptor table, GDT) to a data
structure called a segment descriptor. Each segment has a segment descriptor, which
specifies the size of the segment, the access rights and privilege level for the
segment, the segment type, and the location of the first byte of the segment in the
linear address space (called the base address of the segment). The offset part of the
logical address is added to the base address for the segment to locate a byte within
the segment. The base address plus the offset thus forms a linear address in the
processor's linear address space.
If paging is not used, the linear address space of the processor is mapped directly
into the physical address space of the processor. The physical address space is defined as
the range of addresses that the processor can generate on its address bus.
Because multitasking computing systems commonly define a linear address space
much larger than it is economically feasible to contain all at once in physical memory,
some method of "virtualizing" the linear address space is needed. This virtualization
of the linear address space is handled through the processor's paging mechanism.
Paging supports a "virtual memory" environment where a large linear address space
is simulated with a small amount of physical memory (RAM and ROM) and some disk
storage. When using paging, each segment is divided into pages (typically 4 KBytes
each in size), which are stored either in physical memory or on the disk. The operating
system or executive maintains a page directory and a set of page tables to keep
track of the pages. When a program (or task) attempts to access an address location
in the linear address space, the processor uses the page directory and page tables to
translate the linear address into a physical address and then performs the requested
operation (read or write) on the memory location.
Figure 3-1. Segmentation and Paging
If the page being accessed is not currently in physical memory, the processor
interrupts execution of the program (by generating a page-fault exception). The operating
system or executive then reads the page into physical memory from the disk
and continues executing the program.
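The translation and page-fault behavior described above can be sketched in Python. The sketch assumes 32-bit paging with 4-KByte pages, in which a linear address splits into a 10-bit directory index, a 10-bit table index, and a 12-bit offset (the Dir, Table, and Offset fields of Figure 3-1). The dictionaries standing in for the paging structures and the names split_linear, translate, and PageFault are hypothetical; Chapter 4 defines the actual structures.

```python
class PageFault(Exception):
    """Stands in for the page-fault exception."""

def split_linear(linear):
    """Return (directory index, table index, byte offset) of a 32-bit address."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF

def translate(page_directory, linear):
    """Translate a linear address to a physical address.

    page_directory maps a directory index to a page table, and each page
    table maps a table index to the physical base address of a 4-KByte
    page. A missing entry models a page that is not present.
    """
    directory_index, table_index, offset = split_linear(linear)
    try:
        page_base = page_directory[directory_index][table_index]
    except KeyError:
        raise PageFault(f"page not present for {linear:#010x}") from None
    return page_base + offset
```

An operating system model would catch PageFault, bring the page in from disk, add the missing entry, and retry the access.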
When paging is implemented properly in the operating system or executive, the
swapping of pages between physical memory and the disk is transparent to the
correct execution of a program. Even programs written for 16-bit IA-32 processors
can be paged (transparently) when they are run in virtual-8086 mode.
3.2 USING SEGMENTS
The segmentation mechanism supported by the IA-32 architecture can be used to
implement a wide variety of system designs. These designs range from flat models
that make only minimal use of segmentation to protect programs to
multi-segmented models that employ segmentation to create a robust operating
environment in which multiple programs and tasks can be executed reliably.
The following sections give several examples of how segmentation can be employed
in a system to improve memory management performance and reliability.
3.2.1 Basic Flat Model
The simplest memory model for a system is the basic "flat model," in which the
operating system and application programs have access to a continuous, unsegmented
address space. To the greatest extent possible, this basic flat model hides the
segmentation mechanism of the architecture from both the system designer and the
application programmer.
To implement a basic flat memory model with the IA-32 architecture, at least two
segment descriptors must be created, one for referencing a code segment and one
for referencing a data segment (see Figure 3-2). Both of these segments, however,
are mapped to the entire linear address space: that is, both segment descriptors
have the same base address value of 0 and the same segment limit of 4 GBytes. By
setting the segment limit to 4 GBytes, the segmentation mechanism is kept from
generating exceptions for out-of-limit memory references, even if no physical
memory resides at a particular address. ROM (EPROM) is generally located at the top
of the physical address space, because the processor begins execution at
FFFF_FFF0H. RAM (DRAM) is placed at the bottom of the address space because the
initial base address for the DS data segment after reset initialization is 0.
3.2.2 Protected Flat Model
The protected flat model is similar to the basic flat model, except the segment limits
are set to include only the range of addresses for which physical memory actually
exists (see Figure 3-3). A general-protection exception (#GP) is then generated on
any attempt to access nonexistent memory. This model provides a minimum level of
hardware protection against some kinds of program bugs.
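The difference between the two flat models can be sketched in Python as a single limit check. The sketch is a simplification: it models one expand-up segment with base 0, and the 256-MByte memory size behind PROTECTED_FLAT_LIMIT is a hypothetical example, as are the names flat_access and GeneralProtection.

```python
class GeneralProtection(Exception):
    """Stands in for the general-protection exception (#GP)."""

def flat_access(offset, size, limit):
    """Check an access against a flat segment and return its linear address.

    limit is the last valid byte offset of the segment. The base address
    is 0 in both flat models, so the linear address equals the offset.
    """
    last = offset + size - 1
    if size < 1 or offset < 0 or last > limit:
        raise GeneralProtection(f"access at {offset:#x} exceeds limit {limit:#x}")
    return offset

BASIC_FLAT_LIMIT = 0xFFFFFFFF      # 4 GBytes: no 32-bit offset is out of limit
PROTECTED_FLAT_LIMIT = 0x0FFFFFFF  # hypothetical system with 256 MBytes of RAM
```

With BASIC_FLAT_LIMIT an access at 20000000H passes the check even if no memory exists there; with PROTECTED_FLAT_LIMIT the same access raises the modeled #GP.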
Figure 3-2. Flat Model
Figure 3-3. Protected Flat Model
More complexity can be added to this protected flat model to provide more
protection. For example, for the paging mechanism to provide isolation between user and
supervisor code and data, four segments need to be defined: code and data
segments at privilege level 3 for the user, and code and data segments at privilege
level 0 for the supervisor. Usually these segments all overlay each other and start at
address 0 in the linear address space. This flat segmentation model along with a
simple paging structure can protect the operating system from applications, and by
adding a separate paging structure for each task or process, it can also protect
applications from each other. Similar designs are used by several popular multitasking
operating systems.
3.2.3 Multi-Segment Model
A multi-segment model (such as the one shown in Figure 3-4) uses the full
capabilities of the segmentation mechanism to provide hardware-enforced protection of
code, data structures, and programs and tasks. Here, each program (or task) is given
its own table of segment descriptors and its own segments. The segments can be
completely private to their assigned programs or shared among programs. Access to
all segments and to the execution environments of individual programs running on
the system is controlled by hardware.
Access checks can be used to protect not only against referencing an address outside
the limit of a segment, but also against performing disallowed operations in certain
segments. For example, since code segments are designated as read-only segments,
hardware can be used to prevent writes into code segments. The access rights
information created for segments can also be used to set up protection rings or levels.
Protection levels can be used to protect operating-system procedures from
unauthorized access by application programs.
3.2.4 Segmentation in IA-32e Mode
In IA-32e mode of Intel 64 architecture, the effects of segmentation depend on
whether the processor is running in compatibility mode or 64-bit mode. In
compatibility mode, segmentation functions just as it does using legacy 16-bit or 32-bit
protected mode semantics.
Figure 3-4. Multi-Segment Model
In 64-bit mode, segmentation is generally (but not completely) disabled, creating a
flat 64-bit linear-address space. The processor treats the segment base of CS, DS,
ES, SS as zero, creating a linear address that is equal to the effective address. The FS
and GS segments are exceptions. These segment registers (which hold the segment
base) can be used as additional base registers in linear address calculations. They
facilitate addressing local data and certain operating system data structures.
Note that the processor does not perform segment limit checks at runtime in 64-bit
mode.
3.2.5 Paging and Segmentation
Paging can be used with any of the segmentation models described in Figures 3-2,
3-3, and 3-4. The processor's paging mechanism divides the linear address space
(into which segments are mapped) into pages (as shown in Figure 3-1). These
linear-address-space pages are then mapped to pages in the physical address space. The
paging mechanism offers several page-level protection facilities that can be used
with or instead of the segment-protection facilities. For example, it lets read-write
protection be enforced on a page-by-page basis. The paging mechanism also
provides two-level user-supervisor protection that can also be specified on a
page-by-page basis.
3.3 PHYSICAL ADDRESS SPACE
In protected mode, the IA-32 architecture provides a normal physical address space
of 4 GBytes (2^32 bytes). This is the address space that the processor can address on
its address bus. This address space is flat (unsegmented), with addresses ranging
continuously from 0 to FFFFFFFFH. This physical address space can be mapped to
read-write memory, read-only memory, and memory-mapped I/O. The memory
mapping facilities described in this chapter can be used to divide this physical
memory up into segments and/or pages.
Starting with the Pentium Pro processor, the IA-32 architecture also supports an
extension of the physical address space to 2^36 bytes (64 GBytes), with a maximum
physical address of FFFFFFFFFH. This extension is invoked in either of two ways:
• Using the physical address extension (PAE) flag, located in bit 5 of control
register CR4.
• Using the 36-bit page size extension (PSE-36) feature (introduced in the Pentium
III processors).
Physical address support has since been extended beyond 36 bits. See Chapter 4,
"Paging," for more information about 36-bit physical addressing.
3.3.1 Intel 64 Processors and Physical Address Space
On processors that support Intel 64 architecture (CPUID.80000001H:EDX[29] = 1),
the size of the physical address range is implementation-specific and indicated by
CPUID.80000008H:EAX[bits 7-0].
For the format of information returned in EAX, see "CPUID - CPU Identification" in
Chapter 3 of the Intel 64 and IA-32 Architectures Software Developer's Manual,
Volume 2A. See also: Chapter 4, "Paging."
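As an illustration, the following Python sketch extracts the physical-address width from a value assumed to have been read from CPUID.80000008H:EAX. Executing CPUID itself is outside the sketch; the function names and the sample EAX value used afterward are hypothetical.

```python
def physical_address_bits(eax):
    """Return the physical-address width reported in EAX[7:0]."""
    return eax & 0xFF

def max_physical_address(eax):
    """Return the highest physical address for the reported width."""
    return (1 << physical_address_bits(eax)) - 1
```

For a sample EAX value whose low byte is 24H (36 decimal), max_physical_address returns FFFFFFFFFH, the 36-bit maximum given in Section 3.3.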
3.4 LOGICAL AND LINEAR ADDRESSES
At the system-architecture level in protected mode, the processor uses two stages of
address translation to arrive at a physical address: logical-address translation and
linear address space paging.
Even with the minimum use of segments, every byte in the processor's address
space is accessed with a logical address. A logical address consists of a 16-bit
segment selector and a 32-bit offset (see Figure 3-5). The segment selector
identifies the segment the byte is located in and the offset specifies the location of the byte
in the segment relative to the base address of the segment.
The processor translates every logical address into a linear address. A linear address
is a 32-bit address in the processor's linear address space. Like the physical address
space, the linear address space is a flat (unsegmented), 2^32-byte address space,
with addresses ranging from 0 to FFFFFFFFH. The linear address space contains all
the segments and system tables defined for a system.
To translate a logical address into a linear address, the processor does the following:
1. Uses the offset in the segment selector to locate the segment descriptor for the
segment in the GDT or LDT and reads it into the processor. (This step is needed
only when a new segment selector is loaded into a segment register.)
2. Examines the segment descriptor to check the access rights and range of the
segment to ensure that the segment is accessible and that the offset is within the
limits of the segment.
3. Adds the base address of the segment from the segment descriptor to the offset
to form a linear address.
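The three steps above can be sketched in Python. This is a simplified model, not the processor's implementation: descriptor tables are represented as lists of dictionaries, only expand-up segments are handled, the access-rights check of step 2 is omitted, and the names logical_to_linear and GeneralProtection are hypothetical.

```python
class GeneralProtection(Exception):
    """Stands in for the general-protection exception (#GP)."""

def logical_to_linear(selector, offset, gdt, ldt):
    """Translate a 16-bit selector and 32-bit offset into a linear address.

    gdt and ldt are lists of {"base": ..., "limit": ...} dictionaries (or
    None for an unused slot), where "limit" is the last valid offset of an
    expand-up segment.
    """
    # Step 1: locate the descriptor through the selector's index and TI flag.
    table = ldt if selector & 0x4 else gdt
    index = selector >> 3
    if index >= len(table) or table[index] is None:
        raise GeneralProtection(f"no descriptor for selector {selector:#06x}")
    descriptor = table[index]
    # Step 2: check that the offset is within the segment limit.
    if offset > descriptor["limit"]:
        raise GeneralProtection(f"offset {offset:#x} beyond segment limit")
    # Step 3: add the segment base to the offset (32-bit linear address).
    return (descriptor["base"] + offset) & 0xFFFFFFFF
```

In this model step 1 runs on every call; as noted above, the processor performs it only when a new selector is loaded into a segment register.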
If paging is not used, the processor maps the linear address directly to a physical
address (that is, the linear address goes out on the processor's address bus). If the
linear address space is paged, a second level of address translation is used to
translate the linear address into a physical address.
See also: Chapter 4, "Paging."
3.4.1 Logical Address Translation in IA-32e Mode
In IA-32e mode, an Intel 64 processor uses the steps described above to translate a
logical address to a linear address. In 64-bit mode, the offset and base address of the
segment are 64 bits instead of 32 bits. The linear address format is also 64 bits wide
and is subject to the canonical form requirement.
Each code segment descriptor provides an L bit. This bit allows each code segment to
be designated as executing either 64-bit code or legacy 32-bit code.
3.4.2 Segment Selectors
A segment selector is a 16-bit identifier for a segment (see Figure 3-6). It does not
point directly to the segment, but instead points to the segment descriptor that
defines the segment. A segment selector contains the following items:
Index
(Bits 3 through 15) Selects one of 8192 descriptors in the GDT or
LDT. The processor multiplies the index value by 8 (the number of
bytes in a segment descriptor) and adds the result to the base address
of the GDT or LDT (from the GDTR or LDTR register, respectively).
TI (table indicator) flag
(Bit 2) Specifies the descriptor table to use: clearing this flag
selects the GDT; setting this flag selects the current LDT.
Requested Privilege Level (RPL)
(Bits 0 and 1) Specifies the privilege level of the selector. The
privilege level can range from 0 to 3, with 0 being the most privileged
level. See Section 5.5, "Privilege Levels," for a description of the
relationship of the RPL to the CPL of the executing program (or task) and
the descriptor privilege level (DPL) of the descriptor the segment
selector points to.
Figure 3-5. Logical Address to Linear Address Translation
The first entry of the GDT is not used by the processor. A segment selector that points
to this entry of the GDT (that is, a segment selector with an index of 0 and the TI flag
set to 0) is used as a "null segment selector." The processor does not generate an
exception when a segment register (other than the CS or SS registers) is loaded with
a null selector. It does, however, generate an exception when a segment register
holding a null selector is used to access memory. A null selector can be used to
initialize unused segment registers. Loading the CS or SS register with a null
segment selector causes a general-protection exception (#GP) to be generated.
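The selector fields and the null-selector rule can be sketched in Python as follows. The function names are hypothetical, and the table base addresses passed to descriptor_address stand in for the contents of the GDTR and LDTR registers.

```python
def decode_selector(selector):
    """Return (index, ti, rpl) for a 16-bit segment selector."""
    index = (selector >> 3) & 0x1FFF   # bits 3 through 15
    ti = (selector >> 2) & 0x1         # bit 2: 0 = GDT, 1 = LDT
    rpl = selector & 0x3               # bits 0 and 1
    return index, ti, rpl

def descriptor_address(selector, gdt_base, ldt_base):
    """Return the linear address of the descriptor the selector points to."""
    index, ti, _ = decode_selector(selector)
    table_base = ldt_base if ti else gdt_base
    return table_base + index * 8      # 8 bytes per segment descriptor

def is_null_selector(selector):
    """True when the index is 0 and the TI flag is 0 (first GDT entry)."""
    index, ti, _ = decode_selector(selector)
    return index == 0 and ti == 0
```

For example, selector 002BH decodes to index 5, TI 0, RPL 3, so its descriptor lies 40 bytes past the GDT base.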
Segment selectors are visible to application programs as part of a pointer variable,
but the values of selectors are usually assigned or modified by link editors or linking
loaders, not application programs.
3.4.3 Segment Registers
To reduce address translation time and coding complexity, the processor provides
registers for holding up to 6 segment selectors (see Figure 3-7). Each of these
segment registers supports a specific kind of memory reference (code, stack, or
data). For virtually any kind of program execution to take place, at least the
code-segment (CS), data-segment (DS), and stack-segment (SS) registers must be
loaded with valid segment selectors. The processor also provides three additional
data-segment registers (ES, FS, and GS), which can be used to make additional data
segments available to the currently executing program (or task).
Figure 3-6. Segment Selector
Bits 15:3  Index
Bit 2      TI (Table Indicator): 0 = GDT, 1 = LDT
Bits 1:0   RPL (Requested Privilege Level)
For a program to access a segment, the segment selector for the segment must have
been loaded in one of the segment registers. So, although a system can define
thousands of segments, only 6 can be available for immediate use. Other segments can
be made available by loading their segment selectors into these registers during
program execution.
Every segment register has a "visible" part and a "hidden" part. (The hidden part is
sometimes referred to as a "descriptor cache" or a "shadow register.") When a
segment selector is loaded into the visible part of a segment register, the processor
also loads the hidden part of the segment register with the base address, segment
limit, and access control information from the segment descriptor pointed to by the
segment selector. The information cached in the segment register (visible and
hidden) allows the processor to translate addresses without taking extra bus cycles
to read the base address and limit from the segment descriptor. In systems in which
multiple processors have access to the same descriptor tables, it is the responsibility
of software to reload the segment registers when the descriptor tables are modified.
If this is not done, an old segment descriptor cached in a segment register might be
used after its memory-resident version has been modified.
Two kinds of load instructions are provided for loading the segment registers:
1. Direct load instructions such as the MOV, POP, LDS, LES, LSS, LGS, and LFS
instructions. These instructions explicitly reference the segment registers.
2. Implied load instructions such as the far pointer versions of the CALL, JMP, and
RET instructions, the SYSENTER and SYSEXIT instructions, and the IRET, INT n,
INTO and INT3 instructions. These instructions change the contents of the CS
register (and sometimes other segment registers) as an incidental part of their
operation.
The MOV instruction can also be used to store the visible part of a segment register in a
general-purpose register.
Figure 3-7. Segment Registers
Registers               Visible Part      Hidden Part
CS, SS, DS, ES, FS, GS  Segment Selector  Base Address, Limit, Access Information
3.4.4 Segment Loading Instructions in IA-32e Mode
Because ES, DS, and SS segment registers are not used in 64-bit mode, their fields
(base, limit, and attribute) in segment descriptor registers are ignored. Some forms
of segment load instructions are also invalid (for example, LDS, POP ES). Address
calculations that reference the ES, DS, or SS segments are treated as if the segment
base is zero.
The processor checks that all linear-address references are in canonical form instead
of performing limit checks. Mode switching does not change the contents of the
segment registers or the associated descriptor registers. These registers are also not
changed during 64-bit mode execution, unless explicit segment loads are performed.
In order to set up compatibility mode for an application, segment-load instructions
(MOV to Sreg, POP Sreg) work normally in 64-bit mode. An entry is read from the
system descriptor table (GDT or LDT) and is loaded in the hidden portion of the
segment descriptor register. The descriptor-register base, limit, and attribute fields
are all loaded. However, the contents of the data and stack segment selector and the
descriptor registers are ignored.
When FS and GS segment overrides are used in 64-bit mode, their respective base
addresses are used in the linear address calculation: (FS or GS).base + index +
displacement. FS.base and GS.base are then expanded to the full linear-address size
supported by the implementation. The resulting effective address calculation can
wrap across positive and negative addresses; the resulting linear address must be
canonical.
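The 64-bit-mode address calculation can be sketched in Python. The sketch assumes an implementation with 48 linear-address bits, which is an illustrative choice because the supported width is implementation-specific. The names is_canonical, linear_address_64, and NonCanonicalAddress are hypothetical, and the exception stands in for whatever fault the processor delivers.

```python
MASK64 = (1 << 64) - 1

class NonCanonicalAddress(Exception):
    """Stands in for the fault raised on a non-canonical linear address."""

def is_canonical(address, linear_bits=48):
    """True if bits 63 down to (linear_bits - 1) are all 0 or all 1."""
    upper = address >> (linear_bits - 1)
    return upper == 0 or upper == (1 << (65 - linear_bits)) - 1

def linear_address_64(effective, segment="DS", fs_base=0, gs_base=0):
    """Form a 64-bit-mode linear address for an access through `segment`."""
    # CS, DS, ES, and SS bases are treated as zero; only FS and GS contribute.
    base = {"FS": fs_base, "GS": gs_base}.get(segment, 0)
    linear = (base + effective) & MASK64  # the sum may wrap around 2^64
    if not is_canonical(linear):
        raise NonCanonicalAddress(f"{linear:#018x}")
    return linear
```

No limit check appears in the sketch, mirroring the statement that canonical-form checking replaces limit checking in 64-bit mode.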
In 64-bit mode, memory accesses using FS-segment and GS-segment overrides are
not checked for a runtime limit nor subjected to attribute-checking. Normal segment
loads (MOV to Sreg and POP Sreg) into FS and GS load a standard 32-bit base value
in the hidden portion of the segment descriptor register. The base address bits above
the standard 32 bits are cleared to 0 to allow consistency for implementations that
use less than 64 bits.
The hidden descriptor register fields for FS.base and GS.base are physically mapped
to MSRs in order to load all address bits supported by a 64-bit implementation.
Software with CPL = 0 (privileged software) can load all supported linear-address bits
into FS.base or GS.base using WRMSR. Addresses written into the 64-bit FS.base and
GS.base registers must be in canonical form. A WRMSR instruction that attempts to
write a non-canonical address to those registers causes a #GP fault.
When in compatibility mode, FS and GS overrides operate as defined by 32-bit mode
behavior regardless of the value loaded into the upper 32 linear-address bits of the
hidden descriptor register base field. Compatibility mode ignores the upper 32 bits
when calculating an effective address.
A new 64-bit mode instruction, SWAPGS, can be used to load GS base. SWAPGS
exchanges the kernel data structure pointer from the IA32_KernelGSbase MSR with
the GS base register. The kernel can then use the GS prefix on normal memory
references to access the kernel data structures. An attempt to write a non-canonical value
(using WRMSR) to the IA32_KernelGSbase MSR causes a #GP fault.
3.4.5 Segment Descriptors
A segment descriptor is a data structure in a GDT or LDT that provides the processor
with the size and location of a segment, as well as access control and status
information. Segment descriptors are typically created by compilers, linkers, loaders, or the
operating system or executive, but not application programs. Figure 3-8 illustrates
the general descriptor format for all types of segment descriptors.
The flags and fields in a segment descriptor are as follows:
Segment limit field
Specifies the size of the segment. The processor puts together the
two segment limit fields to form a 20-bit value. The processor
interprets the segment limit in one of two ways, depending on the setting
of the G (granularity) flag:
• If the granularity flag is clear, the segment size can range from
1 byte to 1 MByte, in byte increments.
• If the granularity flag is set, the segment size can range from
4 KBytes to 4 GBytes, in 4-KByte increments.
The processor uses the segment limit in two different ways,
depending on whether the segment is an expand-up or an expand-down
segment. See Section 3.4.5.1, "Code- and Data-Segment
Descriptor Types," for more information about segment types. For
expand-up segments, the offset in a logical address can range from 0
to the segment limit. Offsets greater than the segment limit generate
general-protection exceptions (#GP). For expand-down segments,
the segment limit has the reverse function; the offset can range from
the segment limit plus 1 to FFFFFFFFH or FFFFH, depending on the setting of
the B flag. Offsets less than or equal to the segment limit generate
general-protection exceptions. Decreasing the value in the segment limit field
for an expand-down segment allocates new memory at the bottom of
the segment's address space, rather than at the top. IA-32 architecture
stacks always grow downwards, making this mechanism convenient
for expandable stacks.
Figure 3-8. Segment Descriptor
Second doubleword (byte offset 4):
Bits 31:24  Base 31:24
Bit 23      G
Bit 22      D/B
Bit 21      L
Bit 20      AVL
Bits 19:16  Seg. Limit 19:16
Bit 15      P
Bits 14:13  DPL
Bit 12      S
Bits 11:8   Type
Bits 7:0    Base 23:16
First doubleword (byte offset 0):
Bits 31:16  Base Address 15:00
Bits 15:0   Segment Limit 15:00
Legend:
G      Granularity
LIMIT  Segment Limit
P      Segment present
S      Descriptor type (0 = system; 1 = code or data)
TYPE   Segment type
DPL    Descriptor privilege level
AVL    Available for use by system software
BASE   Segment base address
D/B    Default operation size (0 = 16-bit segment; 1 = 32-bit segment)
L      64-bit code segment (IA-32e mode only)
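The expand-up and expand-down limit rules can be sketched in Python. The sketch takes the effective limit in bytes (after any granularity scaling) and assumes the boundary convention in which an expand-down segment's valid offsets begin at the limit plus 1; the function name offset_is_valid and its parameters are hypothetical.

```python
def offset_is_valid(offset, limit, expand_down=False, big=True):
    """Return True if `offset` passes the segment-limit check.

    limit is the effective segment limit in bytes. For expand-down
    segments, big selects the upper bound: FFFFFFFFH when the B flag is
    set, FFFFH when it is clear.
    """
    if not expand_down:
        # Expand-up: valid offsets run from 0 through the limit.
        return 0 <= offset <= limit
    # Expand-down: valid offsets run from limit + 1 through the upper bound.
    upper_bound = 0xFFFFFFFF if big else 0xFFFF
    return limit < offset <= upper_bound
```

Lowering the limit of an expand-down segment widens the valid range at its bottom, which is the behavior that suits a downward-growing stack.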
Base address fields
Defines the location of byte 0 of the segment within the 4-GByte
linear address space. The processor puts together the three base
address fields to form a single 32-bit value. Segment base addresses
should be aligned to 16-byte boundaries. Although 16-byte alignment
is not required, this alignment allows programs to maximize
performance by aligning code and data on 16-byte boundaries.
Type field
Indicates the segment or gate type and specifies the kinds of access
that can be made to the segment and the direction of growth. The
interpretation of this field depends on whether the descriptor type flag
specifies an application (code or data) descriptor or a system
descriptor. The encoding of the type field is different for code, data,
and system descriptors (see Figure 5-1). See Section 3.4.5.1, "Code-
and Data-Segment Descriptor Types," for a description of how this
field is used to specify code and data-segment types.
S (descriptor type) flag
Specifies whether the segment descriptor is for a system segment
(S flag is clear) or a code or data segment (S flag is set).
DPL (descriptor privilege level) field
Specifies the privilege level of the segment. The privilege level can
range from 0 to 3, with 0 being the most privileged level. The DPL is
used to control access to the segment. See Section 5.5, "Privilege
Levels," for a description of the relationship of the DPL to the CPL of
the executing code segment and the RPL of a segment selector.
P (segment-present) flag
Indicates whether the segment is present in memory (set) or not
present (clear). If this flag is clear, the processor generates a
segment-not-present exception (#NP) when a segment selector that
points to the segment descriptor is loaded into a segment register.
Memory management software can use this flag to control which
segments are actually loaded into physical memory at a given time. It
offers a control in addition to paging for managing virtual memory.
Figure 3-9 shows the format of a segment descriptor when the
segment-present flag is clear. When this flag is clear, the operating
system or executive is free to use the locations marked "Available" to
store its own data, such as information regarding the whereabouts of
the missing segment.
D/B (default operation size/default stack pointer size and/or upper bound) flag
Performs different functions depending on whether the segment
descriptor is an executable code segment, an expand-down data
segment, or a stack segment. (This flag should always be set to 1 for
32-bit code and data segments and to 0 for 16-bit code and data
segments.)
• Executable code segment. The flag is called the D flag and it
indicates the default length for effective addresses and operands
referenced by instructions in the segment. If the flag is set, 32-bit
addresses and 32-bit or 8-bit operands are assumed; if it is clear,
16-bit addresses and 16-bit or 8-bit operands are assumed.
The instruction prefix 66H can be used to select an operand size
other than the default, and the prefix 67H can be used to select an
address size other than the default.
• Stack segment (data segment pointed to by the SS
register). The flag is called the B (big) flag and it specifies the
size of the stack pointer used for implicit stack operations (such as
pushes, pops, and calls). If the flag is set, a 32-bit stack pointer is
used, which is stored in the 32-bit ESP register; if the flag is clear,
a 16-bit stack pointer is used, which is stored in the 16-bit SP
register. If the stack segment is set up to be an expand-down data
segment (described in the next paragraph), the B flag also
specifies the upper bound of the stack segment.
• Expand-down data segment. The flag is called the B flag and it
specifies the upper bound of the segment. If the flag is set, the
upper bound is FFFFFFFFH (4 GBytes); if the flag is clear, the
upper bound is FFFFH (64 KBytes).
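Figure 3-9. Segment Descriptor When Segment-Present Flag Is Clear
[Figure: in the second doubleword (offset 4), bits 31:16 are Available,
bit 15 (P) is 0, bits 14:13 hold DPL, bit 12 holds S, bits 11:8 hold
Type, and bits 7:0 are Available. In the first doubleword (offset 0),
bits 31:0 are Available.]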
G (granularity) flag
Determines the scaling of the segment limit field. When the
granularity flag is clear, the segment limit is interpreted in byte
units; when the flag is set, the segment limit is interpreted in
4-KByte units. (This flag does not affect the granularity of the base
address; it is always byte granular.) When the granularity flag is set,
the twelve least significant bits of an offset are not tested when
checking the offset against the segment limit. For example, when the
granularity flag is set, a limit of 0 results in valid offsets from 0
to 4095.
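
The scaling rule can be illustrated with a short sketch. It assumes the 20-bit limit field of a segment descriptor and an expand-up segment; the function name `effective_limit` is illustrative and does not come from the manual.

```python
def effective_limit(limit: int, granularity: int) -> int:
    """Return the highest valid byte offset implied by a 20-bit limit field.

    Assumes an expand-up segment; expand-down segments use the limit as a
    lower bound instead.
    """
    if not 0 <= limit <= 0xFFFFF:
        raise ValueError("the segment limit field is 20 bits wide")
    if granularity:
        # 4-KByte units: the twelve low-order offset bits are not tested,
        # so they are treated as all ones.
        return (limit << 12) | 0xFFF
    # Byte units: the limit is the last valid offset as written.
    return limit
```

With the G flag set, a limit field of FFFFFH yields the full 4-GByte range, while a limit field of 0 yields offsets 0 through 4095.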
L (64-bit code segment) flag
In IA-32e mode, bit 21 of the second doubleword of the segment
descriptor indicates whether a code segment contains native 64-bit
code. A value of 1 indicates instructions in this code segment are
executed in 64-bit mode. A value of 0 indicates the instructions in this
code segment are executed in compatibility mode. If the L bit is set,
the D bit must be cleared. When not in IA-32e mode or for non-code
segments, bit 21 is reserved and should always be set to 0.

Available and reserved bits
Bit 20 of the second doubleword of the segment descriptor is available
for use by system software.
3.4.5.1 Code- and Data-Segment Descriptor Types
When the S (descriptor type) flag in a segment descriptor is set, the descriptor is for
either a code or a data segment. The highest order bit of the type field (bit 11 of the
second doubleword of the segment descriptor) then determines whether the
descriptor is for a data segment (clear) or a code segment (set).

For data segments, the three low-order bits of the type field (bits 8, 9, and 10) are
interpreted as accessed (A), write-enable (W), and expansion-direction (E). See
Table 3-1 for a description of the encoding of the bits in the type field for code and
data segments. Data segments can be read-only or read/write segments, depending
on the setting of the write-enable bit.

Stack segments are data segments which must be read/write segments. Loading the
SS register with a segment selector for a nonwritable data segment generates a
general-protection exception (#GP). If the size of a stack segment needs to be
changed dynamically, the stack segment can be an expand-down data segment
(expansion-direction flag set). Here, dynamically changing the segment limit causes
stack space to be added to the bottom of the stack. If the size of a stack segment is
intended to remain static, the stack segment may be either an expand-up or
expand-down type.

The accessed bit indicates whether the segment has been accessed since the last
time the operating system or executive cleared the bit. The processor sets this bit
whenever it loads a segment selector for the segment into a segment register,
assuming that the type of memory that contains the segment descriptor supports
processor writes. The bit remains set until explicitly cleared. This bit can be used both
for virtual memory management and for debugging.
Table 3-1. Code- and Data-Segment Types

         Type Field       Descriptor
Decimal  11  10  9   8   Type  Description
             E   W   A
0        0   0   0   0   Data  Read-Only
1        0   0   0   1   Data  Read-Only, accessed
2        0   0   1   0   Data  Read/Write
3        0   0   1   1   Data  Read/Write, accessed
4        0   1   0   0   Data  Read-Only, expand-down
5        0   1   0   1   Data  Read-Only, expand-down, accessed
6        0   1   1   0   Data  Read/Write, expand-down
7        0   1   1   1   Data  Read/Write, expand-down, accessed
             C   R   A
8        1   0   0   0   Code  Execute-Only
9        1   0   0   1   Code  Execute-Only, accessed
10       1   0   1   0   Code  Execute/Read
11       1   0   1   1   Code  Execute/Read, accessed
12       1   1   0   0   Code  Execute-Only, conforming
13       1   1   0   1   Code  Execute-Only, conforming, accessed
14       1   1   1   0   Code  Execute/Read, conforming
15       1   1   1   1   Code  Execute/Read, conforming, accessed
For code segments, the three low-order bits of the type field are interpreted as
accessed (A), read enable (R), and conforming (C). Code segments can be
execute-only or execute/read, depending on the setting of the read-enable bit. An
execute/read segment might be used when constants or other static data have been
placed with instruction code in a ROM. Here, data can be read from the code segment
either by using an instruction with a CS override prefix or by loading a segment
selector for the code segment in a data-segment register (the DS, ES, FS, or GS
registers). In protected mode, code segments are not writable.
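
The type-field encoding for code and data segments (Table 3-1) can be decoded mechanically. The following is a minimal sketch, assuming the 4-bit type field has already been extracted from bits 11:8 of the second doubleword; the function name `describe_segment_type` is hypothetical.

```python
def describe_segment_type(type_field: int) -> str:
    """Describe a code- or data-segment type field (S flag assumed set)."""
    if not 0 <= type_field <= 15:
        raise ValueError("the type field is 4 bits wide")
    accessed = bool(type_field & 0b0001)        # bit 8: A
    if type_field & 0b1000:                     # bit 11 set: code segment
        readable = bool(type_field & 0b0010)    # bit 9: R
        conforming = bool(type_field & 0b0100)  # bit 10: C
        parts = ["Code", "Execute/Read" if readable else "Execute-Only"]
        if conforming:
            parts.append("conforming")
    else:                                       # bit 11 clear: data segment
        writable = bool(type_field & 0b0010)    # bit 9: W
        expand_down = bool(type_field & 0b0100) # bit 10: E
        parts = ["Data", "Read/Write" if writable else "Read-Only"]
        if expand_down:
            parts.append("expand-down")
    if accessed:
        parts.append("accessed")
    return parts[0] + " " + ", ".join(parts[1:])
```

For example, a type field of 7 decodes as a read/write, expand-down data segment that has been accessed, which matches the corresponding row of Table 3-1.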
Code segments can be either conforming or nonconforming. A transfer of execution
into a more-privileged conforming segment allows execution to continue at the
current privilege level. A transfer into a nonconforming segment at a different
privilege level results in a general-protection exception (#GP), unless a call gate or task
gate is used (see Section 5.8.1, "Direct Calls or Jumps to Code Segments", for more
information on conforming and nonconforming code segments). System utilities that
do not access protected facilities and handlers for some types of exceptions (such as
divide error or overflow) may be loaded in conforming code segments. Utilities that
need to be protected from less privileged programs and procedures should be placed
in nonconforming code segments.
NOTE
Execution cannot be transferred by a call or a jump to a
less-privileged (numerically higher privilege level) code segment,
regardless of whether the target segment is a conforming or
nonconforming code segment. Attempting such an execution transfer will
result in a general-protection exception.
All data segments are nonconforming, meaning that they cannot be accessed by less
privileged programs or procedures (code executing at numerically higher privilege
levels). Unlike code segments, however, data segments can be accessed by more
privileged programs or procedures (code executing at numerically lower privilege
levels) without using a special access gate.

If the segment descriptors in the GDT or an LDT are placed in ROM, the processor can
enter an indefinite loop if software or the processor attempts to update (write to) the
ROM-based segment descriptors. To prevent this problem, set the accessed bits for
all segment descriptors placed in a ROM. Also, remove operating-system or executive
code that attempts to modify segment descriptors located in ROM.
3.5 SYSTEM DESCRIPTOR TYPES
When the S (descriptor type) flag in a segment descriptor is clear, the descriptor type
is a system descriptor. The processor recognizes the following types of system
descriptors:
Local descriptor-table (LDT) segment descriptor.
Task-state segment (TSS) descriptor.
Call-gate descriptor.
Interrupt-gate descriptor.
Trap-gate descriptor.
Task-gate descriptor.

These descriptor types fall into two categories: system-segment descriptors and gate
descriptors. System-segment descriptors point to system segments (LDT and TSS
segments). Gate descriptors are in themselves "gates," which hold pointers to
procedure entry points in code segments (call, interrupt, and trap gates) or which hold
segment selectors for TSSs (task gates).

Table 3-2 shows the encoding of the type field for system-segment descriptors and
gate descriptors. Note that system descriptors in IA-32e mode are 16 bytes instead
of 8 bytes.
Table 3-2. System-Segment and Gate-Descriptor Types

         Type Field      Description
Decimal  11  10  9   8   32-Bit Mode              IA-32e Mode
0        0   0   0   0   Reserved                 Upper 8 bytes of a 16-byte descriptor
1        0   0   0   1   16-bit TSS (Available)   Reserved
2        0   0   1   0   LDT                      LDT
3        0   0   1   1   16-bit TSS (Busy)        Reserved
4        0   1   0   0   16-bit Call Gate         Reserved
5        0   1   0   1   Task Gate                Reserved
6        0   1   1   0   16-bit Interrupt Gate    Reserved
7        0   1   1   1   16-bit Trap Gate         Reserved
8        1   0   0   0   Reserved                 Reserved
9        1   0   0   1   32-bit TSS (Available)   64-bit TSS (Available)
10       1   0   1   0   Reserved                 Reserved
11       1   0   1   1   32-bit TSS (Busy)        64-bit TSS (Busy)
12       1   1   0   0   32-bit Call Gate         64-bit Call Gate
13       1   1   0   1   Reserved                 Reserved
14       1   1   1   0   32-bit Interrupt Gate    64-bit Interrupt Gate
15       1   1   1   1   32-bit Trap Gate         64-bit Trap Gate
See also: Section 3.5.1, "Segment Descriptor Tables", and Section 7.2.2, "TSS
Descriptor" (for more information on the system-segment descriptors); see Section
5.8.3, "Call Gates", Section 6.11, "IDT Descriptors", and Section 7.2.5, "Task-Gate
Descriptor" (for more information on the gate descriptors).
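
As an illustration only, the type encodings of Table 3-2 can be held in a lookup table. The names `SYSTEM_TYPES` and `describe_system_type` are hypothetical; the strings are taken from the table.

```python
# Type field value -> (meaning in 32-bit mode, meaning in IA-32e mode)
SYSTEM_TYPES = {
    0: ("Reserved", "Upper 8 bytes of a 16-byte descriptor"),
    1: ("16-bit TSS (Available)", "Reserved"),
    2: ("LDT", "LDT"),
    3: ("16-bit TSS (Busy)", "Reserved"),
    4: ("16-bit Call Gate", "Reserved"),
    5: ("Task Gate", "Reserved"),
    6: ("16-bit Interrupt Gate", "Reserved"),
    7: ("16-bit Trap Gate", "Reserved"),
    8: ("Reserved", "Reserved"),
    9: ("32-bit TSS (Available)", "64-bit TSS (Available)"),
    10: ("Reserved", "Reserved"),
    11: ("32-bit TSS (Busy)", "64-bit TSS (Busy)"),
    12: ("32-bit Call Gate", "64-bit Call Gate"),
    13: ("Reserved", "Reserved"),
    14: ("32-bit Interrupt Gate", "64-bit Interrupt Gate"),
    15: ("32-bit Trap Gate", "64-bit Trap Gate"),
}


def describe_system_type(type_field: int, ia32e: bool = False) -> str:
    """Look up a system-descriptor type field (S flag assumed clear)."""
    legacy, long_mode = SYSTEM_TYPES[type_field]
    return long_mode if ia32e else legacy
```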
3.5.1 Segment Descriptor Tables
A segment descriptor table is an array of segment descriptors (see Figure 3-10). A
descriptor table is variable in length and can contain up to 8192 (2¹³) 8-byte
descriptors. There are two kinds of descriptor tables:
The global descriptor table (GDT)
The local descriptor tables (LDT)
Figure 3-10. Global and Local Descriptor Tables
[Figure: the TI flag of a segment selector selects the table (TI = 0 for
the GDT, TI = 1 for the LDT). Each table is drawn as an array of 8-byte
descriptors at byte offsets 0, 8, 16, 24, 32, 40, 48, and 56; the first
descriptor in the GDT is not used. The GDTR register holds the GDT base
address and limit; the LDTR register holds the LDT segment selector,
base address, and limit.]
Each system must have one GDT defined, which may be used for all programs and
tasks in the system. Optionally, one or more LDTs can be defined. For example, an
LDT can be defined for each separate task being run, or some or all tasks can share
the same LDT.

The GDT is not a segment itself; instead, it is a data structure in linear address space.
The base linear address and limit of the GDT must be loaded into the GDTR register
(see Section 2.4, "Memory-Management Registers"). The base address of the GDT
should be aligned on an eight-byte boundary to yield the best processor
performance. The limit value for the GDT is expressed in bytes. As with segments, the limit
value is added to the base address to get the address of the last valid byte. A limit
value of 0 results in exactly one valid byte. Because segment descriptors are always
8 bytes long, the GDT limit should always be one less than an integral multiple of
eight (that is, 8N - 1).
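
The 8N - 1 rule can be checked with a small sketch. The helper names below are illustrative and are not part of any real API.

```python
DESCRIPTOR_SIZE = 8     # bytes per segment descriptor
MAX_DESCRIPTORS = 8192  # 2**13 entries per descriptor table


def gdt_limit(descriptor_count: int) -> int:
    """Return the GDTR limit for a GDT holding `descriptor_count` entries."""
    if not 1 <= descriptor_count <= MAX_DESCRIPTORS:
        raise ValueError("a descriptor table holds 1 to 8192 descriptors")
    # The limit is the offset of the last valid byte, hence 8N - 1.
    return DESCRIPTOR_SIZE * descriptor_count - 1


def last_valid_byte(base: int, limit: int) -> int:
    """The limit is added to the base to get the last valid address."""
    return base + limit
```

A full table of 8192 descriptors gives a limit of FFFFH, which is the largest value a 16-bit limit can hold.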
The first descriptor in the GDT is not used by the processor. A segment selector to
this "null descriptor" does not generate an exception when loaded into a
data-segment register (DS, ES, FS, or GS), but it always generates a general-protection
exception (#GP) when an attempt is made to access memory using the descriptor. By
initializing the segment registers with this segment selector, accidental reference to
unused segment registers can be guaranteed to generate an exception.
The LDT is located in a system segment of the LDT type. The GDT must contain a
segment descriptor for the LDT segment. If the system supports multiple LDTs, each
must have a separate segment selector and segment descriptor in the GDT. The
segment descriptor for an LDT can be located anywhere in the GDT. See Section 3.5,
"System Descriptor Types", for information on the LDT segment-descriptor type.

An LDT is accessed with its segment selector. To eliminate address translations when
accessing the LDT, the segment selector, base linear address, limit, and access rights
of the LDT are stored in the LDTR register (see Section 2.4, "Memory-Management
Registers").
When the GDTR register is stored (using the SGDT instruction), a 48-bit
"pseudo-descriptor" is stored in memory (see top diagram in Figure 3-11). To avoid alignment
check faults in user mode (privilege level 3), the pseudo-descriptor should be located
at an odd word address (that is, address MOD 4 is equal to 2). This causes the
processor to store an aligned word, followed by an aligned doubleword. User-mode
programs normally do not store pseudo-descriptors, but the possibility of generating
an alignment check fault can be avoided by aligning pseudo-descriptors in this way.
The same alignment should be used when storing the IDTR register using the SIDT
instruction. When storing the LDTR or task register (using the SLDT or STR
instruction, respectively), the pseudo-descriptor should be located at a doubleword address
(that is, address MOD 4 is equal to 0).
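
The layout and alignment rule for the 48-bit pseudo-descriptor can be sketched as follows. This models only the memory image (16-bit limit in the low word, 32-bit base above it, little-endian); it does not execute SGDT, and the function names are hypothetical.

```python
import struct


def pack_pseudo_descriptor(base: int, limit: int) -> bytes:
    """Build the 6-byte image of a 48-bit pseudo-descriptor."""
    # "<HI": little-endian, no padding; 16-bit limit then 32-bit base.
    return struct.pack("<HI", limit, base)


def is_odd_word_address(address: int) -> bool:
    """True if the address places the limit word so that the following
    base doubleword is 4-byte aligned (address MOD 4 equal to 2)."""
    return address % 4 == 2
```

Storing the image at an address such as 1002H puts the limit at 1002H (word aligned) and the base at 1004H (doubleword aligned).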
3.5.2 Segment Descriptor Tables in IA-32e Mode
In IA-32e mode, a segment descriptor table can contain up to 8192 (2¹³) 8-byte
descriptors. An entry in the segment descriptor table can be 8 bytes. System
descriptors are expanded to 16 bytes (occupying the space of two entries).

The GDTR and LDTR registers are expanded to hold a 64-bit base address. The
corresponding pseudo-descriptor is 80 bits (see the bottom diagram in Figure 3-11).

The following system descriptors expand to 16 bytes:
Call gate descriptors (see Section 5.8.3.1, "IA-32e Mode Call Gates")
IDT gate descriptors (see Section 6.14.1, "64-Bit Mode IDT")
LDT and TSS descriptors (see Section 7.2.3, "TSS Descriptor in 64-bit
mode").
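Figure 3-11. Pseudo-Descriptor Formats
[Figure: the 48-bit pseudo-descriptor holds the limit in bits 15:0 and
the 32-bit base address in bits 47:16. The 80-bit pseudo-descriptor
holds the limit in bits 15:0 and the 64-bit base address in bits 79:16.]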
CHAPTER 4
PAGING
Chapter 3 explains how segmentation converts logical addresses to linear addresses.
Paging (or linear-address translation) is the process of translating linear addresses
so that they can be used to access memory or I/O devices. Paging translates each
linear address to a physical address and determines, for each translation, what
accesses to the linear address are allowed (the address's access rights) and the
type of caching used for such accesses (the address's memory type).

Intel-64 processors support three different paging modes. These modes are
identified and defined in Section 4.1. Section 4.2 gives an overview of the translation
mechanism that is used in all modes. Section 4.3, Section 4.4, and Section 4.5
discuss the three paging modes in detail.

Section 4.6 details how paging determines and uses access rights. Section 4.7
discusses exceptions that may be generated by paging (page-fault exceptions).
Section 4.8 considers data which the processor writes in response to linear-address
accesses (accessed and dirty flags).

Section 4.9 describes how paging determines the memory types used for accesses to
linear addresses. Section 4.10 provides details of how a processor may cache
information about linear-address translation. Section 4.11 outlines interactions between
paging and certain VMX features. Section 4.12 gives an overview of how paging can
be used to implement virtual memory.
4.1 PAGING MODES AND CONTROL BITS
Paging behavior is controlled by the following control bits:
The WP and PG flags in control register CR0 (bit 16 and bit 31, respectively).
The PSE, PAE, PGE, and PCIDE flags in control register CR4 (bit 4, bit 5, bit 7,
and bit 17, respectively).
The LME and NXE flags in the IA32_EFER MSR (bit 8 and bit 11, respectively).

Software enables paging by using the MOV to CR0 instruction to set CR0.PG. Before
doing so, software should ensure that control register CR3 contains the physical
address of the first paging structure that the processor will use for linear-address
translation (see Section 4.2) and that structure is initialized as desired. See
Table 4-3, Table 4-7, and Table 4-12 for the use of CR3 in the different paging
modes.

Section 4.1.1 describes how the values of CR0.PG, CR4.PAE, and IA32_EFER.LME
determine whether paging is in use and, if so, which of three paging modes is in use.
Section 4.1.2 explains how to manage these bits to establish or make changes in
paging modes. Section 4.1.3 discusses how CR0.WP, CR4.PSE, CR4.PGE, CR4.PCIDE,
and IA32_EFER.NXE modify the operation of the different paging modes.
4.1.1 Three Paging Modes
If CR0.PG = 0, paging is not used. The logical processor treats all linear addresses as
if they were physical addresses. CR4.PAE and IA32_EFER.LME are ignored by the
processor, as are CR0.WP, CR4.PSE, CR4.PGE, and IA32_EFER.NXE.

Paging is enabled if CR0.PG = 1. Paging can be enabled only if protection is enabled
(CR0.PE = 1). If paging is enabled, one of three paging modes is used. The values of
CR4.PAE and IA32_EFER.LME determine which paging mode is used:
If CR0.PG = 1 and CR4.PAE = 0, 32-bit paging is used. 32-bit paging is detailed
in Section 4.3. 32-bit paging uses CR0.WP, CR4.PSE, and CR4.PGE as described
in Section 4.1.3.
If CR0.PG = 1, CR4.PAE = 1, and IA32_EFER.LME = 0, PAE paging is used. PAE
paging is detailed in Section 4.4. PAE paging uses CR0.WP, CR4.PGE, and
IA32_EFER.NXE as described in Section 4.1.3.
If CR0.PG = 1, CR4.PAE = 1, and IA32_EFER.LME = 1, IA-32e paging is used.¹
IA-32e paging is detailed in Section 4.5. IA-32e paging uses CR0.WP, CR4.PGE,
CR4.PCIDE, and IA32_EFER.NXE as described in Section 4.1.3. IA-32e paging is
available only on processors that support the Intel 64 architecture.
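
The selection rules above amount to a small decision function. The sketch below takes the three control bits as plain integers; the function name `paging_mode` and the returned strings are illustrative.

```python
def paging_mode(cr0_pg: int, cr4_pae: int, efer_lme: int) -> str:
    """Return the paging mode selected by CR0.PG, CR4.PAE, IA32_EFER.LME."""
    if not cr0_pg:
        # With CR0.PG = 0 the other two bits are ignored.
        return "none"
    if not cr4_pae:
        if efer_lme:
            # The processor does not permit this combination
            # (see Section 4.1.2), so treat it as invalid input.
            raise ValueError("PG = 1, PAE = 0, LME = 1 is not reachable")
        return "32-bit"
    return "IA-32e" if efer_lme else "PAE"
```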
The three paging modes differ with regard to the following details:
Linear-address width. The size of the linear addresses that can be translated.
Physical-address width. The size of the physical addresses produced by paging.
Page size. The granularity at which linear addresses are translated. Linear
addresses on the same page are translated to corresponding physical addresses
on the same page.
Support for execute-disable access rights. In some paging modes, software can
be prevented from fetching instructions from pages that are otherwise readable.

Table 4-1 illustrates the key differences between the three paging modes.

Because they are used only if IA32_EFER.LME = 0, 32-bit paging and PAE paging are
used only in legacy protected mode. Because legacy protected mode cannot produce
linear addresses larger than 32 bits, 32-bit paging and PAE paging translate 32-bit
linear addresses.

1. The LMA flag in the IA32_EFER MSR (bit 10) is a status bit that indicates whether the logical
processor is in IA-32e mode (and thus using IA-32e paging). The processor always sets
IA32_EFER.LMA to CR0.PG & IA32_EFER.LME. Software cannot directly modify IA32_EFER.LMA;
an execution of WRMSR to the IA32_EFER MSR ignores bit 10 of its source operand.
Because it is used only if IA32_EFER.LME = 1, IA-32e paging is used only in IA-32e
mode. (In fact, it is the use of IA-32e paging that defines IA-32e mode.) IA-32e
mode has two sub-modes:
Compatibility mode. This mode uses only 32-bit linear addresses. IA-32e paging
treats bits 47:32 of such an address as all 0.
64-bit mode. While this mode produces 64-bit linear addresses, the processor
ensures that bits 63:47 of such an address are identical.¹ IA-32e paging does not
use bits 63:48 of such addresses.
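
The requirement that bits 63:47 be identical can be tested directly. This is a minimal sketch for 48-bit linear addresses as described here; the name `is_canonical` is illustrative.

```python
def is_canonical(address: int) -> bool:
    """True if bits 63:47 of a 64-bit linear address are all 0 or all 1."""
    if not 0 <= address < 1 << 64:
        raise ValueError("expected a 64-bit address")
    upper = address >> 47  # bits 63:47, seventeen bits in total
    return upper == 0 or upper == 0x1FFFF
```

The two canonical ranges are therefore 0 through 00007FFFFFFFFFFFH and FFFF800000000000H through FFFFFFFFFFFFFFFFH.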
Table 4-1. Properties of Different Paging Modes

Paging  CR0.PG  CR4.PAE  LME in     Linear-  Physical-  Page Size(s)       Supports
Mode                     IA32_EFER  Address  Address                       Execute-
                                    Width    Width¹                        Disable?
None    0       N/A      N/A        32       32         N/A                No
32-bit  1       0        0²         32       Up to 40³  4-KByte, 4-MByte⁴  No
PAE     1       1        0          32       Up to 52   4-KByte, 2-MByte   Yes⁵
IA-32e  1       1        1          48       Up to 52   4-KByte, 2-MByte,  Yes⁵
                                                        1-GByte⁶

NOTES:
1. The physical-address width is always bounded by MAXPHYADDR; see Section 4.1.4.
2. The processor ensures that IA32_EFER.LME must be 0 if CR0.PG = 1 and CR4.PAE = 0.
3. 32-bit paging supports physical-address widths of more than 32 bits only for 4-MByte pages and
only if the PSE-36 mechanism is supported; see Section 4.1.4 and Section 4.3.
4. 4-MByte pages are used with 32-bit paging only if CR4.PSE = 1; see Section 4.3.
5. Execute-disable access rights are applied only if IA32_EFER.NXE = 1; see Section 4.6.
6. Not all processors that support IA-32e paging support 1-GByte pages; see Section 4.1.4.
1. Such an address is called canonical. Use of a non-canonical linear address in 64-bit mode
produces a general-protection exception (#GP(0)); the processor does not attempt to translate
non-canonical linear addresses using IA-32e paging.
4.1.2 Paging-Mode Enabling
If CR0.PG = 1, a logical processor is in one of three paging modes, depending on the
values of CR4.PAE and IA32_EFER.LME. Figure 4-1 illustrates how software can
enable these modes and make transitions between them. The following items identify
certain limitations and other details:
IA32_EFER.LME cannot be modified while paging is enabled (CR0.PG = 1).
Attempts to do so using WRMSR cause a general-protection exception (#GP(0)).
Paging cannot be enabled (by setting CR0.PG to 1) while CR4.PAE = 0 and
IA32_EFER.LME = 1. Attempts to do so using MOV to CR0 cause a
general-protection exception (#GP(0)).
Figure 4-1. Enabling and Changing Paging Modes
[Figure: state diagram. The states are No Paging (PG = 0, one state for
each combination of PAE and LME), 32-bit Paging (PG = 1, PAE = 0,
LME = 0), PAE Paging (PG = 1, PAE = 1, LME = 0), and IA-32e Paging
(PG = 1, PAE = 1, LME = 1). Edges are labeled Set PG, Clear PG, Set PAE,
Clear PAE, Set LME, and Clear LME; edges for disallowed changes are
labeled #GP.]
CR4.PAE cannot be cleared while IA-32e paging is active (CR0.PG = 1 and
IA32_EFER.LME = 1). Attempts to do so using MOV to CR4 cause a
general-protection exception (#GP(0)).
Regardless of the current paging mode, software can disable paging by clearing
CR0.PG with MOV to CR0.¹
Software can make transitions between 32-bit paging and PAE paging by
changing the value of CR4.PAE with MOV to CR4.
Software cannot make transitions directly between IA-32e paging and either of
the other two paging modes. It must first disable paging (by clearing CR0.PG with
MOV to CR0), then set CR4.PAE and IA32_EFER.LME to the desired values (with
MOV to CR4 and WRMSR), and then re-enable paging (by setting CR0.PG with
MOV to CR0). As noted earlier, an attempt to clear either CR4.PAE or
IA32_EFER.LME while IA-32e paging is active causes a general-protection
exception (#GP(0)).
VMX transitions allow transitions between paging modes that are not possible
using MOV to CR or WRMSR. This is because VMX transitions can load CR0, CR4,
and IA32_EFER in one operation. See Section 4.11.1.
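
The restrictions listed above can be modeled as a small state machine. The sketch below changes one bit at a time, as MOV to CR0, MOV to CR4, or WRMSR would; it deliberately omits CR4.PCIDE and VMX transitions. The names `PagingState`, `GeneralProtection`, and `write_bit` are hypothetical.

```python
from typing import NamedTuple


class GeneralProtection(Exception):
    """Stands in for a #GP(0) raised by a disallowed control-bit write."""


class PagingState(NamedTuple):
    pg: int   # CR0.PG
    pae: int  # CR4.PAE
    lme: int  # IA32_EFER.LME


def write_bit(state: PagingState, name: str, value: int) -> PagingState:
    """Apply one control-bit write, enforcing the rules in this section."""
    if name == "lme" and state.pg and value != state.lme:
        raise GeneralProtection("LME cannot be modified while CR0.PG = 1")
    if name == "pg" and value and not state.pae and state.lme:
        raise GeneralProtection("cannot set PG while PAE = 0 and LME = 1")
    if name == "pae" and not value and state.pg and state.lme:
        raise GeneralProtection("cannot clear PAE while IA-32e paging is active")
    return state._replace(**{name: value})
```

Starting from no paging, setting PAE, then LME, then PG reaches IA-32e paging; from there, the only permitted change among these three bits is clearing PG.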
4.1.3 Paging-Mode Modifiers
Details of how each paging mode operates are determined by the following control
bits:
The WP flag in CR0 (bit 16).
The PSE, PGE, and PCIDE flags in CR4 (bit 4, bit 7, and bit 17, respectively).
The NXE flag in the IA32_EFER MSR (bit 11).

CR0.WP allows pages to be protected from supervisor-mode writes. If CR0.WP = 0,
software operating with CPL < 3 (supervisor mode) can write to linear addresses with
read-only access rights; if CR0.WP = 1, it cannot. (Software operating with CPL = 3
(user mode) cannot write to linear addresses with read-only access rights,
regardless of the value of CR0.WP.) Section 4.6 explains how access rights are
determined.
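
The effect of CR0.WP on writes to read-only linear addresses can be summarized in a few lines. This sketch covers only the rule stated here, not the full access-rights determination of Section 4.6; the function name is illustrative.

```python
def may_write_read_only(cpl: int, cr0_wp: int) -> bool:
    """Can software at `cpl` write to a read-only linear address?"""
    if not 0 <= cpl <= 3:
        raise ValueError("CPL ranges from 0 to 3")
    if cpl == 3:
        # User mode can never write to read-only addresses.
        return False
    # Supervisor mode can write only while CR0.WP = 0.
    return not cr0_wp
```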
CR4.PSE enables 4-MByte pages for 32-bit paging. If CR4.PSE = 0, 32-bit paging can
use only 4-KByte pages; if CR4.PSE = 1, 32-bit paging can use both 4-KByte pages
and 4-MByte pages. See Section 4.3 for more information. (PAE paging and IA-32e
paging can use multiple page sizes regardless of the value of CR4.PSE.)

CR4.PGE enables global pages. If CR4.PGE = 0, no translations are shared across
address spaces; if CR4.PGE = 1, specified translations may be shared across address
spaces. See Section 4.10.2.4 for more information.
CR4.PCIDE enables process-context identifiers (PCIDs) for IA-32e paging
(CR4.PCIDE can be 1 only when IA-32e paging is in use). PCIDs allow a logical
processor to cache information for multiple linear-address spaces. See Section
4.10.1 for more information.

1. If CR4.PCIDE = 1, an attempt to clear CR0.PG causes a general-protection exception (#GP);
software should clear CR4.PCIDE before attempting to disable paging.
IA32_EFER.NXE enables execute-disable access rights for PAE paging and IA-32e
paging. If IA32_EFER.NXE = 0, software may fetch instructions from any linear
address that paging allows the software to read; if IA32_EFER.NXE = 1, instruction
fetches can be prevented from specified linear addresses (even if data reads from the
addresses are allowed). Section 4.6 explains how access rights are determined.
(32-bit paging always allows software to fetch instructions from any linear address that
may be read; IA32_EFER.NXE has no effect with 32-bit paging. Software that wants
to limit instruction fetches from readable pages must use either PAE paging or IA-32e
paging.)
4.1.4 Enumeration of Paging Features by CPUID
Software can discover support for different paging features using the CPUID
instruction:
PSE: page-size extensions for 32-bit paging.
If CPUID.01H:EDX.PSE [bit 3] = 1, CR4.PSE may be set to 1, enabling support
for 4-MByte pages with 32-bit paging (see Section 4.3).
PAE: physical-address extension.
If CPUID.01H:EDX.PAE [bit 6] = 1, CR4.PAE may be set to 1, enabling PAE
paging (this setting is also required for IA-32e paging).
PGE: global-page support.
If CPUID.01H:EDX.PGE [bit 13] = 1, CR4.PGE may be set to 1, enabling the
global-page feature (see Section 4.10.2.4).
PAT: page-attribute table.
If CPUID.01H:EDX.PAT [bit 16] = 1, the 8-entry page-attribute table (PAT) is
supported. When the PAT is supported, three bits in certain paging-structure
entries select a memory type (used to determine type of caching used) from the
PAT (see Section 4.9.2).
PSE-36: 36-bit page size extension.
If CPUID.01H:EDX.PSE-36 [bit 17] = 1, the PSE-36 mechanism is supported,
indicating that translations using 4-MByte pages with 32-bit paging may produce
physical addresses with more than 32 bits (see Section 4.3).
PCID: process-context identifiers.
If CPUID.01H:ECX.PCID [bit 17] = 1, CR4.PCIDE may be set to 1, enabling
process-context identifiers (see Section 4.10.1).
NX: execute disable.
If CPUID.80000001H:EDX.NX [bit 20] = 1, IA32_EFER.NXE may be set to 1,
allowing PAE paging and IA-32e paging to disable execute access to selected
pages (see Section 4.6). (Processors that do not support CPUID function
80000001H do not allow IA32_EFER.NXE to be set to 1.)
Page1GB: 1-GByte pages.
If CPUID.80000001H:EDX.Page1GB [bit 26] = 1, 1-GByte pages are supported
with IA-32e paging (see Section 4.5).
LM: IA-32e mode support.
If CPUID.80000001H:EDX.LM [bit 29] = 1, IA32_EFER.LME may be set to 1,
enabling IA-32e paging. (Processors that do not support CPUID function
80000001H do not allow IA32_EFER.LME to be set to 1.)
CPUID.80000008H:EAX[7:0] reports the physical-address width supported by
the processor. (For processors that do not support CPUID function 80000008H,
the width is generally 36 if CPUID.01H:EDX.PAE [bit 6] = 1 and 32 otherwise.)
This width is referred to as MAXPHYADDR. MAXPHYADDR is at most 52.
CPUID.80000008H:EAX[15:8] reports the linear-address width supported by the
processor. Generally, this value is 48 if CPUID.80000001H:EDX.LM [bit 29] = 1
and 32 otherwise. (Processors that do not support CPUID function 80000008H
support a linear-address width of 32.)
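
The bit positions listed above can be applied to register values that have already been obtained from CPUID. The sketch below does not execute CPUID itself; it only decodes integers supplied by the caller, and all names in it are illustrative.

```python
# Feature name -> bit position, per the list above.
EDX_01H = {"PSE": 3, "PAE": 6, "PGE": 13, "PAT": 16, "PSE-36": 17}
ECX_01H = {"PCID": 17}
EDX_80000001H = {"NX": 20, "Page1GB": 26, "LM": 29}


def paging_features(edx_01h: int, ecx_01h: int, edx_80000001h: int) -> set:
    """Return the names of the paging features whose CPUID bits are set."""
    found = set()
    for value, bits in (
        (edx_01h, EDX_01H),
        (ecx_01h, ECX_01H),
        (edx_80000001h, EDX_80000001H),
    ):
        found |= {name for name, bit in bits.items() if (value >> bit) & 1}
    return found


def address_widths(eax_80000008h: int) -> tuple:
    """Return (MAXPHYADDR, linear-address width) from EAX of 80000008H."""
    return eax_80000008h & 0xFF, (eax_80000008h >> 8) & 0xFF
```

For instance, an EAX value of 3024H from function 80000008H decodes to a 36-bit physical-address width and a 48-bit linear-address width.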
4.2 HIERARCHICAL PAGING STRUCTURES: AN OVERVIEW
All three paging modes translate linear addresses using hierarchical paging
structures. This section provides an overview of their operation. Section 4.3, Section 4.4,
and Section 4.5 provide details for the three paging modes.

Every paging structure is 4096 bytes in size and comprises a number of individual
entries. With 32-bit paging, each entry is 32 bits (4 bytes); there are thus 1024
entries in each structure. With PAE paging and IA-32e paging, each entry is 64 bits
(8 bytes); there are thus 512 entries in each structure. (PAE paging includes one
exception, a paging structure that is 32 bytes in size, containing four 64-bit entries.)
The processor uses the upper portion of a linear address to identify a series of
paging-structure entries. The last of these entries identifies the physical address of
the region to which the linear address translates (called the page frame). The lower
portion of the linear address (called the page offset) identifies the specific address
within that region to which the linear address translates.

Each paging-structure entry contains a physical address, which is either the address
of another paging structure or the address of a page frame. In the first case, the
entry is said to reference the other paging structure; in the latter, the entry is said
to map a page.
The first paging structure used for any translation is located at the physical address
in CR3. A linear address is translated using the following iterative procedure. A
portion of the linear address (initially the uppermost bits) selects an entry in a paging
structure (initially the one located using CR3). If that entry references another
paging structure, the process continues with that paging structure and with the
portion of the linear address immediately below that just used. If instead the entry
maps a page, the process completes: the physical address in the entry is that of the
page frame and the remaining lower portion of the linear address is the page offset.
The following items give an example for each of the three paging modes (each
example locates a 4-KByte page frame):
• With 32-bit paging, each paging structure comprises 1024 = 2^10 entries. For this
  reason, the translation process uses 10 bits at a time from a 32-bit linear
  address. Bits 31:22 identify the first paging-structure entry and bits 21:12
  identify a second. The latter identifies the page frame. Bits 11:0 of the linear
  address are the page offset within the 4-KByte page frame. (See Figure 4-2 for
  an illustration.)
• With PAE paging, the first paging structure comprises only 4 = 2^2 entries.
  Translation thus begins by using bits 31:30 from a 32-bit linear address to
  identify the first paging-structure entry. Other paging structures comprise
  512 = 2^9 entries, so the process continues by using 9 bits at a time. Bits 29:21
  identify a second paging-structure entry and bits 20:12 identify a third. This last
  identifies the page frame. (See Figure 4-5 for an illustration.)
• With IA-32e paging, each paging structure comprises 512 = 2^9 entries and
  translation uses 9 bits at a time from a 48-bit linear address. Bits 47:39 identify
  the first paging-structure entry, bits 38:30 identify a second, bits 29:21 a third,
  and bits 20:12 identify a fourth. Again, the last identifies the page frame. (See
  Figure 4-8 for an illustration.)
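
The three examples above can be sketched in Python as follows. This is only an illustration: the function name split_linear_address and the mode strings are assumptions made for the example, and the sketch covers only translations that end in a 4-KByte page.

```python
def split_linear_address(linear, mode):
    """Return (indices, page_offset) for a translation ending in a 4-KByte page.

    `mode` is one of the hypothetical strings "32-bit", "PAE" or "IA-32e".
    """
    # Widths of the entry-selecting fields, uppermost field first.
    widths = {
        "32-bit": [10, 10],       # bits 31:22, 21:12
        "PAE":    [2, 9, 9],      # bits 31:30, 29:21, 20:12
        "IA-32e": [9, 9, 9, 9],   # bits 47:39, 38:30, 29:21, 20:12
    }[mode]
    offset = linear & 0xFFF       # bits 11:0 are the page offset
    shift = 12 + sum(widths)      # one past the uppermost translated bit
    indices = []
    for width in widths:
        shift -= width
        indices.append((linear >> shift) & ((1 << width) - 1))
    return indices, offset


print(split_linear_address(0xC0801234, "32-bit"))  # ([770, 1], 564)
print(split_linear_address(0xC0801234, "PAE"))     # ([3, 4, 1], 564)
```

Each index selects one entry in the corresponding paging structure; 564 is 234H, the offset within the 4-KByte page frame.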
The translation process in each of the examples above completes by identifying a
page frame. However, the paging structures may be configured so that translation
terminates before doing so. This occurs if the process encounters a paging-structure
entry that is marked "not present" (because its P flag, bit 0, is clear) or in which
a reserved bit is set. In this case, there is no translation for the linear address; an
access to that address causes a page-fault exception (see Section 4.7).
In the examples above, a paging-structure entry maps a page with a 4-KByte page
frame when only 12 bits remain in the linear address; entries identified earlier always
reference other paging structures. That may not apply in other cases. The following
items identify when an entry maps a page and when it references another paging
structure:
• If more than 12 bits remain in the linear address, bit 7 (PS, page size) of the
  current paging-structure entry is consulted. If the bit is 0, the entry references
  another paging structure; if the bit is 1, the entry maps a page.
• If only 12 bits remain in the linear address, the current paging-structure entry
  always maps a page (bit 7 is used for other purposes).
If a paging-structure entry maps a page when more than 12 bits remain in the linear
address, the entry identifies a page frame larger than 4 KBytes. For example, 32-bit
paging uses the upper 10 bits of a linear address to locate the first paging-structure
entry; 22 bits remain. If that entry maps a page, the page frame is 2^22 Bytes = 4
MBytes. 32-bit paging supports 4-MByte pages if CR4.PSE = 1. PAE paging and
IA-32e paging support 2-MByte pages (regardless of the value of CR4.PSE). IA-32e
paging may support 1-GByte pages (see Section 4.1.4).
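
The rule for deciding whether an entry maps a page, and the resulting page size, can be sketched as follows. The helper names maps_page and page_size are hypothetical. The sketch assumes that the entry is present, that it sets no reserved bit, and that the paging mode permits the PS flag to be 1 at that level (for example, CR4.PSE = 1 with 32-bit paging).

```python
PS = 1 << 7  # bit 7 of a paging-structure entry (page size)


def maps_page(entry, remaining_bits):
    """True if the entry maps a page, False if it references another structure."""
    if remaining_bits == 12:
        return True            # bit 7 is used for other purposes at this level
    return bool(entry & PS)


def page_size(remaining_bits):
    """Size in bytes of the page frame when `remaining_bits` bits are left."""
    return 1 << remaining_bits


print(page_size(12))   # 4096        (4 KBytes)
print(page_size(21))   # 2097152     (2 MBytes)
print(page_size(22))   # 4194304     (4 MBytes)
print(page_size(30))   # 1073741824  (1 GByte)
```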
Paging structures are given different names based on their uses in the translation
process. Table 4-2 gives the names of the different paging structures. It also
provides, for each structure, the source of the physical address used to locate it (CR3
or a different paging-structure entry); the bits in the linear address used to select an
entry from the structure; and details about whether and how such an entry can
map a page.
Table 4-2. Paging Structures in the Different Paging Modes

Paging Structure   Entry   Paging Mode   Physical Address   Bits Selecting   Page Mapping
                   Name                  of Structure       Entry
PML4 table         PML4E   32-bit, PAE   N/A
                           IA-32e        CR3                47:39            N/A (PS must be 0)
Page-directory-    PDPTE   32-bit        N/A
pointer table              PAE           CR3                31:30            N/A (PS must be 0)
                           IA-32e        PML4E              38:30            1-GByte page if PS=1 (Note 1)
Page directory     PDE     32-bit        CR3                31:22            4-MByte page if PS=1 (Note 2)
                           PAE, IA-32e   PDPTE              29:21            2-MByte page if PS=1
Page table         PTE     32-bit        PDE                21:12            4-KByte page
                           PAE, IA-32e   PDE                20:12            4-KByte page

NOTES:
1. Not all processors allow the PS flag to be 1 in PDPTEs; see Section 4.1.4 for how to determine
   whether 1-GByte pages are supported.
2. 32-bit paging ignores the PS flag in a PDE (and uses the entry to reference a page table) unless
   CR4.PSE = 1. Not all processors allow CR4.PSE to be 1; see Section 4.1.4 for how to determine
   whether 4-MByte pages are supported with 32-bit paging.

4.3 32-BIT PAGING
A logical processor uses 32-bit paging if CR0.PG = 1 and CR4.PAE = 0. 32-bit paging
translates 32-bit linear addresses to 40-bit physical addresses (see footnote 1). Although 40 bits
corresponds to 1 TByte, linear addresses are limited to 32 bits; at most 4 GBytes of
linear-address space may be accessed at any given time.

1. Bits in the range 39:32 are 0 in any physical address used by 32-bit paging except those used to
   map 4-MByte pages. If the processor does not support the PSE-36 mechanism, this is true also
   for physical addresses used to map 4-MByte pages. If the processor does support the PSE-36
   mechanism and MAXPHYADDR < 40, bits in the range 39:MAXPHYADDR are 0 in any physical
   address used to map a 4-MByte page. (The corresponding bits are reserved in PDEs.) See Section
   4.1.4 for how to determine MAXPHYADDR and whether the PSE-36 mechanism is supported.
32-bit paging uses a hierarchy of paging structures to produce a translation for a
linear address. CR3 is used to locate the first paging structure, the page directory.
Table 4-3 illustrates how CR3 is used with 32-bit paging.

Table 4-3. Use of CR3 with 32-Bit Paging

Bit Position(s)   Contents
2:0               Ignored
3 (PWT)           Page-level write-through; indirectly determines the memory type used to access
                  the page directory during linear-address translation (see Section 4.9)
4 (PCD)           Page-level cache disable; indirectly determines the memory type used to access
                  the page directory during linear-address translation (see Section 4.9)
11:5              Ignored
31:12             Physical address of the 4-KByte aligned page directory used for linear-address
                  translation
63:32             Ignored (these bits exist only on processors supporting the Intel-64 architecture)

32-bit paging may map linear addresses to either 4-KByte pages or 4-MByte pages.
Figure 4-2 illustrates the translation process when it uses a 4-KByte page; Figure 4-3
covers the case of a 4-MByte page.

Figure 4-2. Linear-Address Translation to a 4-KByte Page using 32-Bit Paging
[Figure: the linear address is divided into Directory (bits 31:22), Table (bits 21:12), and
Offset (bits 11:0). CR3 locates the page directory; the PDE (with PS=0) locates the page
table; the PTE locates the 4-KByte page.]

Figure 4-3. Linear-Address Translation to a 4-MByte Page using 32-Bit Paging
[Figure: the linear address is divided into Directory (bits 31:22) and Offset (bits 21:0).
CR3 locates the page directory; the PDE (with PS=1) locates the 4-MByte page.]

Table 4-4. Format of a 32-Bit Page-Directory Entry that Maps a 4-MByte Page

Bit Position(s)   Contents
0 (P)             Present; must be 1 to map a 4-MByte page
1 (R/W)           Read/write; if 0, writes may not be allowed to the 4-MByte page referenced by
                  this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S)           User/supervisor; if 0, accesses with CPL=3 are not allowed to the 4-MByte page
                  referenced by this entry (see Section 4.6)
3 (PWT)           Page-level write-through; indirectly determines the memory type used to access
                  the 4-MByte page referenced by this entry (see Section 4.9)
4 (PCD)           Page-level cache disable; indirectly determines the memory type used to access
                  the 4-MByte page referenced by this entry (see Section 4.9)
5 (A)             Accessed; indicates whether software has accessed the 4-MByte page referenced
                  by this entry (see Section 4.8)
6 (D)             Dirty; indicates whether software has written to the 4-MByte page referenced by
                  this entry (see Section 4.8)
7 (PS)            Page size; must be 1 (otherwise, this entry references a page table; see Table 4-5)
8 (G)             Global; if CR4.PGE = 1, determines whether the translation is global (see Section
                  4.10); ignored otherwise
11:9              Ignored
12 (PAT)          If the PAT is supported, indirectly determines the memory type used to access the
                  4-MByte page referenced by this entry (see Section 4.9.2); otherwise, reserved
                  (must be 0) (Note 1)
(M-20):13         Bits (M-1):32 of physical address of the 4-MByte page referenced by this entry
                  (Note 2)
21:(M-19)         Reserved (must be 0)
31:22             Bits 31:22 of physical address of the 4-MByte page referenced by this entry

NOTES:
1. See Section 4.1.4 for how to determine whether the PAT is supported.
2. If the PSE-36 mechanism is not supported, M is 32, and this row does not apply. If the PSE-36
   mechanism is supported, M is the minimum of 40 and MAXPHYADDR (this row does not apply if
   MAXPHYADDR = 32). See Section 4.1.4 for how to determine MAXPHYADDR and whether the
   PSE-36 mechanism is supported.

The following items describe the 32-bit paging process in more detail as well as how
the page size is determined:
• A 4-KByte naturally aligned page directory is located at the physical address
  specified in bits 31:12 of CR3 (see Table 4-3). A page directory comprises 1024
  32-bit entries (PDEs). A PDE is selected using the physical address defined as
  follows:
    - Bits 39:32 are all 0.
    - Bits 31:12 are from CR3.
    - Bits 11:2 are bits 31:22 of the linear address.
    - Bits 1:0 are 0.
Because a PDE is identified using bits 31:22 of the linear address, it controls access
to a 4-MByte region of the linear-address space. Use of the PDE depends on CR4.PSE
and the PDE's PS flag (bit 7):
• If CR4.PSE = 1 and the PDE's PS flag is 1, the PDE maps a 4-MByte page (see
  Table 4-4). The final physical address is computed as follows:
    - Bits 39:32 are bits 20:13 of the PDE.
    - Bits 31:22 are bits 31:22 of the PDE. (The upper bits in the final physical address
      do not all come from corresponding positions in the PDE; the physical-address bits
      in the PDE are not all contiguous.)
    - Bits 21:0 are from the original linear address.
• If CR4.PSE = 0 or the PDE's PS flag is 0, a 4-KByte naturally aligned page table is
  located at the physical address specified in bits 31:12 of the PDE (see Table 4-5).
  A page table comprises 1024 32-bit entries (PTEs). A PTE is selected using the
  physical address defined as follows:
    - Bits 39:32 are all 0.
    - Bits 31:12 are from the PDE.
    - Bits 11:2 are bits 21:12 of the linear address.
    - Bits 1:0 are 0.
• Because a PTE is identified using bits 31:12 of the linear address, every PTE
  maps a 4-KByte page (see Table 4-6). The final physical address is computed as
  follows:
    - Bits 39:32 are all 0.
    - Bits 31:12 are from the PTE.
    - Bits 11:0 are from the original linear address.
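
The address computations described for 32-bit paging can be modeled with a short Python sketch. It is an illustration under stated assumptions, not a description of processor internals: translate_32bit and the read_u32 callback (which returns the 32-bit value stored at a physical address) are hypothetical names, physical memory is simulated with a dictionary, and reserved-bit and access-rights checks are omitted.

```python
P = 1 << 0    # present flag
PS = 1 << 7   # page-size flag of a PDE


def translate_32bit(linear, cr3, read_u32, cr4_pse=True):
    """Return the physical address for `linear`, or None if there is no translation."""
    # PDE address: bits 31:12 from CR3, bits 11:2 = linear[31:22], bits 1:0 = 0.
    pde_addr = (cr3 & 0xFFFFF000) | (((linear >> 22) & 0x3FF) << 2)
    pde = read_u32(pde_addr)
    if not pde & P:
        return None                      # not present: page-fault exception
    if cr4_pse and pde & PS:
        # 4-MByte page: bits 39:32 = PDE[20:13], bits 31:22 = PDE[31:22],
        # bits 21:0 come from the linear address.
        high = ((pde >> 13) & 0xFF) << 32
        return high | (pde & 0xFFC00000) | (linear & 0x003FFFFF)
    # PTE address: bits 31:12 from the PDE, bits 11:2 = linear[21:12], bits 1:0 = 0.
    pte_addr = (pde & 0xFFFFF000) | (((linear >> 12) & 0x3FF) << 2)
    pte = read_u32(pte_addr)
    if not pte & P:
        return None
    # 4-KByte page: bits 31:12 from the PTE, bits 11:0 from the linear address.
    return (pte & 0xFFFFF000) | (linear & 0xFFF)


# Simulated physical memory: a page directory at 1000H and a page table at 2000H.
memory = {
    0x1C08: 0x00002003,   # PDE 770: references the page table at 2000H
    0x2004: 0x00ABC003,   # PTE 1: maps the 4-KByte page at ABC000H
    0x1004: 0x0080A083,   # PDE 1: maps a 4-MByte page (PS = 1)
}
read = lambda addr: memory.get(addr, 0)

print(hex(translate_32bit(0xC0801234, 0x1000, read)))  # 0xabc234
print(hex(translate_32bit(0x00412345, 0x1000, read)))  # 0x500812345
```

The second result shows the non-contiguous layout of a 4-MByte mapping: bits 20:13 of the PDE (here 05H) become bits 39:32 of the physical address.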
If a paging-structure entry's P flag (bit 0) is 0 or if the entry sets any reserved bit, the
entry is used neither to reference another paging-structure entry nor to map a page.
A reference using a linear address whose translation would use such a paging-structure
entry causes a page-fault exception (see Section 4.7).

Table 4-5. Format of a 32-Bit Page-Directory Entry that References a Page Table

Bit Position(s)   Contents
0 (P)             Present; must be 1 to reference a page table
1 (R/W)           Read/write; if 0, writes may not be allowed to the 4-MByte region controlled by
                  this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S)           User/supervisor; if 0, accesses with CPL=3 are not allowed to the 4-MByte region
                  controlled by this entry (see Section 4.6)
3 (PWT)           Page-level write-through; indirectly determines the memory type used to access
                  the page table referenced by this entry (see Section 4.9)
4 (PCD)           Page-level cache disable; indirectly determines the memory type used to access
                  the page table referenced by this entry (see Section 4.9)
5 (A)             Accessed; indicates whether this entry has been used for linear-address
                  translation (see Section 4.8)
6                 Ignored
7 (PS)            If CR4.PSE = 1, must be 0 (otherwise, this entry maps a 4-MByte page; see
                  Table 4-4); otherwise, ignored
11:8              Ignored
31:12             Physical address of 4-KByte aligned page table referenced by this entry

Table 4-6. Format of a 32-Bit Page-Table Entry that Maps a 4-KByte Page

Bit Position(s)   Contents
0 (P)             Present; must be 1 to map a 4-KByte page
1 (R/W)           Read/write; if 0, writes may not be allowed to the 4-KByte page referenced by this
                  entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S)           User/supervisor; if 0, accesses with CPL=3 are not allowed to the 4-KByte page
                  referenced by this entry (see Section 4.6)
3 (PWT)           Page-level write-through; indirectly determines the memory type used to access
                  the 4-KByte page referenced by this entry (see Section 4.9)
4 (PCD)           Page-level cache disable; indirectly determines the memory type used to access
                  the 4-KByte page referenced by this entry (see Section 4.9)
5 (A)             Accessed; indicates whether software has accessed the 4-KByte page referenced
                  by this entry (see Section 4.8)
6 (D)             Dirty; indicates whether software has written to the 4-KByte page referenced by
                  this entry (see Section 4.8)
7 (PAT)           If the PAT is supported, indirectly determines the memory type used to access the
                  4-KByte page referenced by this entry (see Section 4.9.2); otherwise, reserved
                  (must be 0) (Note 1)
8 (G)             Global; if CR4.PGE = 1, determines whether the translation is global (see Section
                  4.10); ignored otherwise
11:9              Ignored
31:12             Physical address of the 4-KByte page referenced by this entry

NOTES:
1. See Section 4.1.4 for how to determine whether the PAT is supported.

With 32-bit paging, there are reserved bits only if CR4.PSE = 1:
• If the P flag and the PS flag (bit 7) of a PDE are both 1, the bits reserved depend
  on MAXPHYADDR and whether the PSE-36 mechanism is supported (see Section 4.1.4
  for how to determine MAXPHYADDR and whether the PSE-36 mechanism is supported):
    - If the PSE-36 mechanism is not supported, bits 21:13 are reserved.
    - If the PSE-36 mechanism is supported, bits 21:(M-19) are reserved, where
      M is the minimum of 40 and MAXPHYADDR.
• If the PAT is not supported (see Section 4.1.4):
    - If the P flag of a PTE is 1, bit 7 is reserved.
    - If the P flag and the PS flag of a PDE are both 1, bit 12 is reserved.
(If CR4.PSE = 0, no bits are reserved with 32-bit paging.)
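
The reserved-bit rules for 32-bit paging can be collected into a single mask computation. The following is a sketch only; the function name reserved_mask_32bit and its keyword parameters are hypothetical, and the feature flags would in practice be determined as described in Section 4.1.4.

```python
P = 1 << 0    # present flag
PS = 1 << 7   # page-size flag of a PDE


def reserved_mask_32bit(entry, is_pde, *, cr4_pse, pse36, pat, maxphyaddr):
    """Return a mask of the bits that are reserved in `entry` with 32-bit paging."""
    if not cr4_pse or not entry & P:
        return 0                          # no reserved bits in these cases
    mask = 0
    if is_pde and entry & PS:             # PDE that maps a 4-MByte page
        # Bits 21:13 are reserved without PSE-36; bits 21:(M-19) with it.
        low = min(40, maxphyaddr) - 19 if pse36 else 13
        for bit in range(low, 22):
            mask |= 1 << bit
        if not pat:
            mask |= 1 << 12               # bit 12 (PAT) is reserved
    elif not is_pde and not pat:
        mask |= 1 << 7                    # bit 7 (PAT) of a PTE is reserved
    return mask


# PDE mapping a 4-MByte page, PSE-36 supported, MAXPHYADDR = 36: bits 21:17.
print(hex(reserved_mask_32bit(0x83, True, cr4_pse=True, pse36=True,
                              pat=True, maxphyaddr=36)))   # 0x3e0000
```

An entry for which `entry & mask` is nonzero sets a reserved bit, so a translation that would use it causes a page-fault exception.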
A reference using a linear address that is successfully translated to a physical
address is performed only if allowed by the access rights of the translation; see
Section 4.6.
Figure 4-4 gives a summary of the formats of CR3 and the paging-structure entries
with 32-bit paging. For the paging-structure entries, it identifies separately the
format of entries that map pages, those that reference other paging structures, and
those that do neither because they are "not present"; bit 0 (P) and bit 7 (PS) are
highlighted because they determine how such an entry is used.
1. See Section 4.1.4 for how to determine whether the PAT is supported.
Figure 4-4. Formats of CR3 and Paging-Structure Entries with 32-Bit Paging

Fields are listed from bit 31 down to bit 0.

CR3:                Address of page directory (Note 1) | Ignored | PCD | PWT | Ignored
PDE, 4MB page:      Bits 31:22 of address of 4MB page frame | Reserved (must be 0) |
                    Bits 39:32 of address (Note 2) | PAT | Ignored | G | 1 | D | A | PCD | PWT |
                    U/S | R/W | 1
PDE, page table:    Address of page table | Ignored | 0 | Ign. | A | PCD | PWT | U/S | R/W | 1
PDE, not present:   Ignored | 0
PTE, 4KB page:      Address of 4KB page frame | Ignored | G | PAT | D | A | PCD | PWT | U/S |
                    R/W | 1
PTE, not present:   Ignored | 0

NOTES:
1. CR3 has 64 bits on processors supporting the Intel-64 architecture. These bits are ignored with
   32-bit paging.
4.4 PAE PAGING
A logical processor uses PAE paging if CR0.PG = 1, CR4.PAE = 1, and
IA32_EFER.LME = 0. PAE paging translates 32-bit linear addresses to 52-bit physical
addresses (see footnote 1). Although 52 bits corresponds to 4 PBytes, linear addresses are
limited to 32 bits; at most 4 GBytes of linear-address space may be accessed at any given time.
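
The sizes quoted for the address spaces follow directly from the address widths: an n-bit address can name 2^n bytes. The short sketch below reproduces the figures used in this chapter; the helper name address_space and the unit table are assumptions made for the example.

```python
UNITS = ["Byte", "KByte", "MByte", "GByte", "TByte", "PByte"]


def address_space(bits):
    """Return (count, unit) such that 2**bits bytes equals `count` units."""
    unit_index, remainder = divmod(bits, 10)   # each unit step is a factor of 2**10
    return 1 << remainder, UNITS[unit_index]


for bits in (32, 40, 52):
    count, unit = address_space(bits)
    print(f"{bits} bits -> {count} {unit}{'s' if count != 1 else ''}")
# 32 bits -> 4 GBytes
# 40 bits -> 1 TByte
# 52 bits -> 4 PBytes
```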
With PAE paging, a logical processor maintains a set of four (4) PDPTE registers,
which are loaded from an address in CR3. Linear addresses are translated using 4
hierarchies of in-memory paging structures, each located using one of the PDPTE
registers. (This is different from the other paging modes, in which there is one hierarchy
referenced by CR3.)
Section 4.4.1 discusses the PDPTE registers. Section 4.4.2 describes linear-address
translation with PAE paging.
4.4.1 PDPTE Registers
When PAE paging is used, CR3 references the base of a 32-Byte page-directory-
pointer table. Table 4-7 illustrates how CR3 is used with PAE paging.

Table 4-7. Use of CR3 with PAE Paging

Bit Position(s)   Contents
4:0               Ignored
31:5              Physical address of the 32-Byte aligned page-directory-pointer table used for
                  linear-address translation
63:32             Ignored (these bits exist only on processors supporting the Intel-64 architecture)

1. If MAXPHYADDR < 52, bits in the range 51:MAXPHYADDR will be 0 in any physical address used
   by PAE paging. (The corresponding bits are reserved in the paging-structure entries.) See Section
   4.1.4 for how to determine MAXPHYADDR.
2. (Note to Figure 4-4.) This example illustrates a processor in which MAXPHYADDR is 36. If this
   value is larger or smaller, the number of bits reserved in positions 20:13 of a PDE mapping a
   4-MByte page will change.

The page-directory-pointer table comprises four (4) 64-bit entries called PDPTEs.
Each PDPTE controls access to a 1-GByte region of the linear-address space.
Corresponding to the PDPTEs, the logical processor maintains a set of four (4) internal,
non-architectural PDPTE registers, called PDPTE0, PDPTE1, PDPTE2, and PDPTE3.
The logical processor loads these registers from the PDPTEs in memory as part of
certain executions of the MOV to CR instruction:
• If PAE paging would be in use following an execution of MOV to CR0 or MOV to
  CR4 (see Section 4.1.1) and the instruction is modifying any of CR0.CD, CR0.NW,
  CR0.PG, CR4.PAE, CR4.PGE, or CR4.PSE, then the PDPTEs are loaded from the
  address in CR3.
• If MOV to CR3 is executed while the logical processor is using PAE paging, the
  PDPTEs are loaded from the address being loaded into CR3.
• If PAE paging is in use and a task switch changes the value of CR3, the PDPTEs
  are loaded from the address in the new CR3 value.
• Certain VMX transitions load the PDPTE registers. See Section 4.11.1.
Unless the caches are disabled, the processor uses the WB memory type to load the
PDPTEs from memory. (Older IA-32 processors used the UC memory type when loading
the PDPTEs. This behavior is model-specific and not architectural.)
Table 4-8 gives the format of a PDPTE. If any of the PDPTEs sets both the P flag
(bit 0) and any reserved bit, the MOV to CR instruction causes a general-protection
exception (#GP(0)) and the PDPTEs are not loaded. (On some processors, reserved bits
are checked even in PDPTEs in which the P flag (bit 0) is 0.) As shown in Table 4-8,
bits 2:1, 8:5, and 63:MAXPHYADDR are reserved in the PDPTEs.

Table 4-8. Format of a PAE Page-Directory-Pointer-Table Entry (PDPTE)

Bit Position(s)   Contents
0 (P)             Present; must be 1 to reference a page directory
2:1               Reserved (must be 0)
3 (PWT)           Page-level write-through; indirectly determines the memory type used to access
                  the page directory referenced by this entry (see Section 4.9)
4 (PCD)           Page-level cache disable; indirectly determines the memory type used to access
                  the page directory referenced by this entry (see Section 4.9)
8:5               Reserved (must be 0)
11:9              Ignored
(M-1):12          Physical address of 4-KByte aligned page directory referenced by this entry
                  (Note 1)
63:M              Reserved (must be 0)

NOTES:
1. M is an abbreviation for MAXPHYADDR, which is at most 52; see Section 4.1.4.
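
The check applied when the PDPTEs are loaded can be sketched as follows. This is a simplified model: load_pdptes, pdpte_reserved_mask, and the read_u64 callback are hypothetical names, the #GP(0) exception is represented by a Python ValueError, and the sketch checks reserved bits only in PDPTEs whose P flag is 1 (some processors check them regardless).

```python
def pdpte_reserved_mask(maxphyaddr):
    """Reserved bits of a PAE PDPTE: 2:1, 8:5, and 63:MAXPHYADDR."""
    mask = (0b11 << 1) | (0b1111 << 5)
    mask |= ((1 << 64) - 1) & ~((1 << maxphyaddr) - 1)
    return mask


def load_pdptes(cr3, read_u64, maxphyaddr):
    """Return the four PDPTE values, or raise ValueError to model #GP(0)."""
    base = cr3 & 0xFFFFFFE0                  # bits 31:5 of CR3; bits 4:0 are ignored
    pdptes = [read_u64(base + 8 * i) for i in range(4)]
    reserved = pdpte_reserved_mask(maxphyaddr)
    for value in pdptes:
        if value & 1 and value & reserved:   # P = 1 and a reserved bit is set
            raise ValueError("#GP(0): present PDPTE sets a reserved bit")
    return pdptes                            # loaded only if every check passed


# Simulated memory holding a page-directory-pointer table at 3000H.
memory = {0x3000: 0x4001, 0x3018: 0x6}       # PDPTE3 is not present
print(load_pdptes(0x3000, lambda addr: memory.get(addr, 0), 36))
# [16385, 0, 0, 6]
```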
4.4.2 Linear-Address Translation with PAE Paging
PAE paging may map linear addresses to either 4-KByte pages or 2-MByte pages.
Figure 4-5 illustrates the translation process when it produces a 4-KByte page;
Figure 4-6 covers the case of a 2-MByte page.

Figure 4-5. Linear-Address Translation to a 4-KByte Page using PAE Paging
[Figure: the linear address is divided into Directory Pointer (bits 31:30), Directory
(bits 29:21), Table (bits 20:12), and Offset (bits 11:0). The directory pointer selects a
PDPTE register, whose value locates the page directory; the PDE (with PS=0) locates the
page table; the PTE locates the 4-KByte page.]

Figure 4-6. Linear-Address Translation to a 2-MByte Page using PAE Paging
[Figure: the linear address is divided into Directory Pointer (bits 31:30), Directory
(bits 29:21), and Offset (bits 20:0). The directory pointer selects a PDPTE register, whose
value locates the page directory; the PDE (with PS=1) locates the 2-MByte page.]

The following items describe the PAE paging process in more detail as well as how
the page size is determined:
• Bits 31:30 of the linear address select a PDPTE register (see Section 4.4.1); this
  is PDPTEi, where i is the value of bits 31:30 (see footnote 1). Because a PDPTE
  register is identified using bits 31:30 of the linear address, it controls access to a
  1-GByte region of the linear-address space. If the P flag (bit 0) of PDPTEi is 0, the
  processor ignores bits 63:1, and there is no mapping for the 1-GByte region
  controlled by PDPTEi. A reference using a linear address in this region causes a
  page-fault exception (see Section 4.7).
• If the P flag of PDPTEi is 1, a 4-KByte naturally aligned page directory is located at
  the physical address specified in bits 51:12 of PDPTEi (see Table 4-8 in Section
  4.4.1). A page directory comprises 512 64-bit entries (PDEs). A PDE is selected
  using the physical address defined as follows:
    - Bits 51:12 are from PDPTEi.
    - Bits 11:3 are bits 29:21 of the linear address.
    - Bits 2:0 are 0.
Because a PDE is identified using bits 31:21 of the linear address, it controls access
to a 2-MByte region of the linear-address space. Use of the PDE depends on its PS
flag (bit 7):
• If the PDE's PS flag is 1, the PDE maps a 2-MByte page (see Table 4-9). The final
  physical address is computed as follows:
    - Bits 51:21 are from the PDE.
    - Bits 20:0 are from the original linear address.
• If the PDE's PS flag is 0, a 4-KByte naturally aligned page table is located at the
  physical address specified in bits 51:12 of the PDE (see Table 4-10). A page
  table comprises 512 64-bit entries (PTEs). A PTE is selected using the
  physical address defined as follows:
    - Bits 51:12 are from the PDE.
    - Bits 11:3 are bits 20:12 of the linear address.
    - Bits 2:0 are 0.
• Because a PTE is identified using bits 31:12 of the linear address, every PTE maps
  a 4-KByte page (see Table 4-11). The final physical address is computed as
  follows:
    - Bits 51:12 are from the PTE.
    - Bits 11:0 are from the original linear address.
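
The PAE translation steps can be modeled in the same way. In this sketch, translate_pae and the read_u64 callback are hypothetical names, the four PDPTE registers are passed in as a list (the processor does not read the PDPTEs from memory during translation), and reserved-bit and access-rights checks are omitted.

```python
P = 1 << 0                               # present flag
PS = 1 << 7                              # page-size flag of a PDE
ADDR_51_12 = ((1 << 52) - 1) & ~0xFFF    # bits 51:12 of an entry


def translate_pae(linear, pdpte_regs, read_u64):
    """Return the physical address for a 32-bit `linear`, or None if unmapped."""
    pdpte = pdpte_regs[(linear >> 30) & 0x3]          # PDPTEi, i = bits 31:30
    if not pdpte & P:
        return None
    # PDE address: bits 51:12 from PDPTEi, bits 11:3 = linear[29:21].
    pde = read_u64((pdpte & ADDR_51_12) | (((linear >> 21) & 0x1FF) << 3))
    if not pde & P:
        return None
    if pde & PS:
        # 2-MByte page: bits 51:21 from the PDE, bits 20:0 from the linear address.
        return (pde & ADDR_51_12 & ~0x1FFFFF) | (linear & 0x1FFFFF)
    # PTE address: bits 51:12 from the PDE, bits 11:3 = linear[20:12].
    pte = read_u64((pde & ADDR_51_12) | (((linear >> 12) & 0x1FF) << 3))
    if not pte & P:
        return None
    # 4-KByte page: bits 51:12 from the PTE, bits 11:0 from the linear address.
    return (pte & ADDR_51_12) | (linear & 0xFFF)


pdpte_regs = [0x7001, 0, 0, 0x5001]
memory = {
    0x5020: 0x6003,               # PDE 4 of the directory at 5000H: page table at 6000H
    0x6008: 0x8000000123456003,   # PTE 1: 4-KByte page at 123456000H, XD = 1
    0x7008: 0x140000083,          # PDE 1 of the directory at 7000H: 2-MByte page
}
read = lambda addr: memory.get(addr, 0)

print(hex(translate_pae(0xC0801234, pdpte_regs, read)))  # 0x123456234
print(hex(translate_pae(0x00212345, pdpte_regs, read)))  # 0x140012345
```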
If the P flag (bit 0) of a PDE or a PTE is 0 or if a PDE or a PTE sets any reserved bit,
the entry is used neither to reference another paging-structure entry nor to map a
page. A reference using a linear address whose translation would use such a
paging-structure entry causes a page-fault exception (see Section 4.7).

1. With PAE paging, the processor does not use CR3 when translating a linear address (as it does
   in the other paging modes). It does not access the PDPTEs in the page-directory-pointer table
   during linear-address translation.

Table 4-9. Format of a PAE Page-Directory Entry that Maps a 2-MByte Page

Bit Position(s)   Contents
0 (P)             Present; must be 1 to map a 2-MByte page
1 (R/W)           Read/write; if 0, writes may not be allowed to the 2-MByte page referenced by
                  this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S)           User/supervisor; if 0, accesses with CPL=3 are not allowed to the 2-MByte page
                  referenced by this entry (see Section 4.6)
3 (PWT)           Page-level write-through; indirectly determines the memory type used to access
                  the 2-MByte page referenced by this entry (see Section 4.9)
4 (PCD)           Page-level cache disable; indirectly determines the memory type used to access
                  the 2-MByte page referenced by this entry (see Section 4.9)
5 (A)             Accessed; indicates whether software has accessed the 2-MByte page referenced
                  by this entry (see Section 4.8)
6 (D)             Dirty; indicates whether software has written to the 2-MByte page referenced by
                  this entry (see Section 4.8)
7 (PS)            Page size; must be 1 (otherwise, this entry references a page table; see
                  Table 4-10)
8 (G)             Global; if CR4.PGE = 1, determines whether the translation is global (see Section
                  4.10); ignored otherwise
11:9              Ignored
12 (PAT)          If the PAT is supported, indirectly determines the memory type used to access the
                  2-MByte page referenced by this entry (see Section 4.9.2); otherwise, reserved
                  (must be 0) (Note 1)
20:13             Reserved (must be 0)
(M-1):21          Physical address of the 2-MByte page referenced by this entry
62:M              Reserved (must be 0)
63 (XD)           If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed
                  from the 2-MByte page controlled by this entry; see Section 4.6); otherwise,
                  reserved (must be 0)

NOTES:
1. See Section 4.1.4 for how to determine whether the PAT is supported.
Table 4-10. Format of a PAE Page-Directory Entry that References a Page Table

Bit Position(s)   Contents
0 (P)             Present; must be 1 to reference a page table
1 (R/W)           Read/write; if 0, writes may not be allowed to the 2-MByte region controlled by
                  this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S)           User/supervisor; if 0, accesses with CPL=3 are not allowed to the 2-MByte region
                  controlled by this entry (see Section 4.6)
3 (PWT)           Page-level write-through; indirectly determines the memory type used to access
                  the page table referenced by this entry (see Section 4.9)
4 (PCD)           Page-level cache disable; indirectly determines the memory type used to access
                  the page table referenced by this entry (see Section 4.9)
5 (A)             Accessed; indicates whether this entry has been used for linear-address
                  translation (see Section 4.8)
6                 Ignored
7 (PS)            Page size; must be 0 (otherwise, this entry maps a 2-MByte page; see Table 4-9)
11:8              Ignored
(M-1):12          Physical address of 4-KByte aligned page table referenced by this entry
62:M              Reserved (must be 0)
63 (XD)           If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed
                  from the 2-MByte region controlled by this entry; see Section 4.6); otherwise,
                  reserved (must be 0)

The following bits are reserved with PAE paging:
• If the P flag (bit 0) of a PDE or a PTE is 1, bits 62:MAXPHYADDR are reserved.
• If the P flag and the PS flag (bit 7) of a PDE are both 1, bits 20:13 are reserved.
• If IA32_EFER.NXE = 0 and the P flag of a PDE or a PTE is 1, the XD flag (bit 63)
  is reserved.
• If the PAT is not supported (see Section 4.1.4 for how to determine whether the PAT
  is supported):
    - If the P flag of a PTE is 1, bit 7 is reserved.
    - If the P flag and the PS flag of a PDE are both 1, bit 12 is reserved.
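
The reserved-bit rules for PAE PDEs and PTEs can be expressed as a mask computation. The sketch below is illustrative only; reserved_mask_pae and its parameters are hypothetical names, and the values of MAXPHYADDR, IA32_EFER.NXE, and PAT support are passed in by the caller.

```python
P = 1 << 0    # present flag
PS = 1 << 7   # page-size flag of a PDE


def reserved_mask_pae(entry, is_pde, *, maxphyaddr, nxe, pat):
    """Return a mask of the bits that are reserved in a PAE PDE or PTE."""
    if not entry & P:
        return 0                                      # the rules apply when P = 1
    # Bits 62:MAXPHYADDR are reserved.
    mask = ((1 << 63) - 1) & ~((1 << maxphyaddr) - 1)
    if not nxe:
        mask |= 1 << 63                               # XD flag is reserved
    if is_pde and entry & PS:                         # PDE mapping a 2-MByte page
        mask |= 0xFF << 13                            # bits 20:13
        if not pat:
            mask |= 1 << 12                           # bit 12 (PAT)
    elif not is_pde and not pat:
        mask |= 1 << 7                                # bit 7 (PAT) of a PTE
    return mask


print(hex(reserved_mask_pae(0x81, True, maxphyaddr=36, nxe=True, pat=True)))
# 0x7ffffff0001fe000
```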
A reference using a linear address that is successfully translated to a physical
address is performed only if allowed by the access rights of the translation; see
Section 4.6.
Table 4-11. Format of a PAE Page-Table Entry that Maps a 4-KByte Page

Bit Position(s)   Contents
0 (P)             Present; must be 1 to map a 4-KByte page
1 (R/W)           Read/write; if 0, writes may not be allowed to the 4-KByte page referenced by this
                  entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S)           User/supervisor; if 0, accesses with CPL=3 are not allowed to the 4-KByte page
                  referenced by this entry (see Section 4.6)
3 (PWT)           Page-level write-through; indirectly determines the memory type used to access
                  the 4-KByte page referenced by this entry (see Section 4.9)
4 (PCD)           Page-level cache disable; indirectly determines the memory type used to access
                  the 4-KByte page referenced by this entry (see Section 4.9)
5 (A)             Accessed; indicates whether software has accessed the 4-KByte page referenced
                  by this entry (see Section 4.8)
6 (D)             Dirty; indicates whether software has written to the 4-KByte page referenced by
                  this entry (see Section 4.8)
7 (PAT)           If the PAT is supported, indirectly determines the memory type used to access the
                  4-KByte page referenced by this entry (see Section 4.9.2); otherwise, reserved
                  (must be 0) (Note 1)
8 (G)             Global; if CR4.PGE = 1, determines whether the translation is global (see Section
                  4.10); ignored otherwise
11:9              Ignored
(M-1):12          Physical address of the 4-KByte page referenced by this entry
62:M              Reserved (must be 0)
63 (XD)           If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed
                  from the 4-KByte page controlled by this entry; see Section 4.6); otherwise,
                  reserved (must be 0)

NOTES:
1. See Section 4.1.4 for how to determine whether the PAT is supported.
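
As an illustration of the layout in Table 4-11, the following sketch decodes a PAE PTE into its flag names and page-frame address. The names decode_pae_pte and FLAG_BITS are hypothetical; the flag labels follow the table, and MAXPHYADDR is passed in by the caller (52 is assumed by default).

```python
# Bit position -> flag name, following Table 4-11.
FLAG_BITS = {0: "P", 1: "R/W", 2: "U/S", 3: "PWT", 4: "PCD",
             5: "A", 6: "D", 7: "PAT", 8: "G", 63: "XD"}


def decode_pae_pte(pte, maxphyaddr=52):
    """Return the flags that are set in `pte` and the page-frame address."""
    frame_mask = ((1 << maxphyaddr) - 1) & ~0xFFF     # bits (M-1):12
    return {
        "flags": [name for bit, name in FLAG_BITS.items() if pte & (1 << bit)],
        "frame": pte & frame_mask,
    }


decoded = decode_pae_pte(0x8000000123456067)
print(decoded["flags"])        # ['P', 'R/W', 'U/S', 'A', 'D', 'XD']
print(hex(decoded["frame"]))   # 0x123456000
```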
Figure 4-7 gives a summary of the formats of CR3 and the paging-structure entries
with PAE paging. For the paging-structure entries, it identifies separately the format
of entries that map pages, those that reference other paging structures, and those
that do neither because they are "not present"; bit 0 (P) and bit 7 (PS) are
highlighted because they determine how a paging-structure entry is used.
Figure 4-7. Formats of CR3 and Paging-Structure Entries with PAE Paging

Fields are listed from bit 63 down to bit 0; M is an abbreviation for MAXPHYADDR (Note 1).

CR3:                  Ignored (Note 2) | Address of page-directory-pointer table | Ignored
PDPTE, present:       Reserved (Note 3) | Address of page directory | Ign. | Rsvd. | PCD | PWT |
                      Rsvd. | 1
PDPTE, not present:   Ignored | 0
PDE, 2MB page:        XD (Note 4) | Ignored | Rsvd. | Address of 2MB page frame | Reserved | PAT |
                      Ign. | G | 1 | D | A | PCD | PWT | U/S | R/W | 1
PDE, page table:      XD | Ignored | Rsvd. | Address of page table | Ign. | 0 | Ign. | A | PCD |
                      PWT | U/S | R/W | 1
PDE, not present:     Ignored | 0
PTE, 4KB page:        XD | Ignored | Rsvd. | Address of 4KB page frame | Ign. | G | PAT | D | A |
                      PCD | PWT | U/S | R/W | 1
PTE, not present:     Ignored | 0

NOTES:
1. M is an abbreviation for MAXPHYADDR.
2. CR3 has 64 bits only on processors supporting the Intel-64 architecture. These bits are ignored with
   PAE paging.
3. Reserved fields must be 0.
4. If IA32_EFER.NXE = 0 and the P flag of a PDE or a PTE is 1, the XD flag (bit 63) is reserved.
4.5 IA-32E PAGING
A logical processor uses I A- 32e paging if CR0. PG = 1, CR4. PAE = 1, and
I A32_EFER. LME = 1. Wit h I A- 32e paging, linear address are t ranslat ed using a hier-
archy of in- memory paging st ruct ures locat ed using t he cont ent s of CR3. I A- 32e
paging t ranslat es 48- bit linear addresses t o 52- bit physical addresses.
1
Alt hough 52
bit s corresponds t o 4 PByt es, linear addresses are limit ed t o 48 bit s; at most 256
TByt es of linear- address space may be accessed at any given t ime.
I A- 32e paging uses a hierarchy of paging st ruct ures t o produce a t ranslat ion for a
linear address. CR3 is used t o locat e t he first paging- st ruct ure, t he PML4 t able. Use
of CR3 wit h I A- 32e paging depends on whet her process- cont ext ident ifiers ( PCI Ds)
have been enabled by set t ing CR4. PCI DE:
Table 4- 12 illust rat es how CR3 is used wit h I A- 32e paging if CR4. PCI DE = 0.
Table 4- 13 illust rat es how CR3 is used wit h I A- 32e paging if CR4. PCI DE = 1.
After software modifies the value of CR4.PCIDE, the logical processor immediately
begins using CR3 as specified for the new value. For example, if software changes
CR4.PCIDE from 1 to 0, the current PCID immediately changes from CR3[11:0] to
000H (see also Section 4.10.4.1). In addition, the logical processor subsequently
determines the memory type used to access the PML4 table using CR3.PWT and
CR3.PCD, which had been bits 4:3 of the PCID.

1. If MAXPHYADDR < 52, bits in the range 51:MAXPHYADDR will be 0 in any physical
address used by IA-32e paging. (The corresponding bits are reserved in the
paging-structure entries.) See Section 4.1.4 for how to determine MAXPHYADDR.

Table 4-12. Use of CR3 with IA-32e Paging and CR4.PCIDE = 0
Bit Position(s)  Contents
2:0              Ignored
3 (PWT)          Page-level write-through; indirectly determines the memory type used
                 to access the PML4 table during linear-address translation (see
                 Section 4.9.2)
4 (PCD)          Page-level cache disable; indirectly determines the memory type used
                 to access the PML4 table during linear-address translation (see
                 Section 4.9.2)
11:5             Ignored
(M-1):12         Physical address of the 4-KByte aligned PML4 table used for
                 linear-address translation¹
63:M             Reserved (must be 0)
NOTES:
1. M is an abbreviation for MAXPHYADDR, which is at most 52; see Section 4.1.4.
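How CR3 is interpreted under the two CR4.PCIDE settings (Table 4-12 and Table 4-13) can be sketched as follows. The structure and function names are hypothetical. The sketch assumes the caller supplies MAXPHYADDR (at most 52), and it reports PWT = PCD = 0 when CR4.PCIDE = 1 because Section 4.9.2 selects PAT entry 0 in that case.

```c
#include <stdint.h>

#define CR4_PCIDE (1ull << 17)   /* CR4.PCIDE is bit 17 (Section 4.10.1) */

/* Hypothetical decoded view of CR3 under IA-32e paging. */
typedef struct {
    uint64_t pml4_base;  /* physical address of the 4-KByte aligned PML4 table */
    unsigned pcid;       /* current PCID (000H when CR4.PCIDE = 0) */
    unsigned pwt;        /* PWT value used when accessing the PML4 table */
    unsigned pcd;        /* PCD value used when accessing the PML4 table */
} cr3_view;

static cr3_view decode_cr3(uint64_t cr3, uint64_t cr4, unsigned maxphyaddr)
{
    cr3_view v;
    /* Bits (M-1):12 hold the PML4 table address; maxphyaddr is at most 52. */
    uint64_t addr_mask = ((1ull << maxphyaddr) - 1) & ~0xFFFull;

    v.pml4_base = cr3 & addr_mask;
    if (cr4 & CR4_PCIDE) {
        v.pcid = (unsigned)(cr3 & 0xFFF);   /* bits 11:0 are the PCID */
        v.pwt = 0;                          /* bits 4:3 are part of the PCID here */
        v.pcd = 0;
    } else {
        v.pcid = 0;
        v.pwt = (unsigned)((cr3 >> 3) & 1);
        v.pcd = (unsigned)((cr3 >> 4) & 1);
    }
    return v;
}
```

The same CR3 value therefore yields different PWT/PCD and PCID values depending on CR4.PCIDE, which is why the processor's behavior changes immediately when software toggles that flag.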
IA-32e paging may map linear addresses to 4-KByte pages, 2-MByte pages, or
1-GByte pages.¹ Figure 4-8 illustrates the translation process when it produces a
4-KByte page; Figure 4-9 covers the case of a 2-MByte page, and Figure 4-10 the case
of a 1-GByte page. The following items describe the IA-32e paging process in more
detail as well as how the page size is determined:
• A 4-KByte naturally aligned PML4 table is located at the physical address
  specified in bits 51:12 of CR3 (see Table 4-12). A PML4 table comprises 512
  64-bit entries (PML4Es). A PML4E is selected using the physical address defined
  as follows:
  - Bits 51:12 are from CR3.
  - Bits 11:3 are bits 47:39 of the linear address.
  - Bits 2:0 are all 0.
  Because a PML4E is identified using bits 47:39 of the linear address, it controls
  access to a 512-GByte region of the linear-address space.
• A 4-KByte naturally aligned page-directory-pointer table is located at the
  physical address specified in bits 51:12 of the PML4E (see Table 4-14). A
  page-directory-pointer table comprises 512 64-bit entries (PDPTEs). A PDPTE is
  selected using the physical address defined as follows:
  - Bits 51:12 are from the PML4E.
  - Bits 11:3 are bits 38:30 of the linear address.
  - Bits 2:0 are all 0.
Table 4-13. Use of CR3 with IA-32e Paging and CR4.PCIDE = 1
Bit Position(s)  Contents
11:0             PCID (see Section 4.10.1)¹
(M-1):12         Physical address of the 4-KByte aligned PML4 table used for
                 linear-address translation²
63:M             Reserved (must be 0)³
NOTES:
1. Section 4.9.2 explains how the processor determines the memory type used to access the PML4
table during linear-address translation with CR4.PCIDE = 1.
2. M is an abbreviation for MAXPHYADDR, which is at most 52; see Section 4.1.4.
3. See Section 4.10.4.1 for use of bit 63 of the source operand of the MOV to CR3 instruction.

1. Not all processors support 1-GByte pages; see Section 4.1.4.
Because a PDPTE is identified using bits 47:30 of the linear address, it controls
access to a 1-GByte region of the linear-address space. Use of the PDPTE depends on
its PS flag (bit 7):¹
• If the PDPTE's PS flag is 1, the PDPTE maps a 1-GByte page (see Table 4-15). The
  final physical address is computed as follows:
  - Bits 51:30 are from the PDPTE.
  - Bits 29:0 are from the original linear address.
• If the PDPTE's PS flag is 0, a 4-KByte naturally aligned page directory is located at
  the physical address specified in bits 51:12 of the PDPTE (see Table 4-16). A
  page directory comprises 512 64-bit entries (PDEs). A PDE is selected using the
  physical address defined as follows:
  - Bits 51:12 are from the PDPTE.
  - Bits 11:3 are bits 29:21 of the linear address.
  - Bits 2:0 are all 0.
Because a PDE is identified using bits 47:21 of the linear address, it controls access
to a 2-MByte region of the linear-address space. Use of the PDE depends on its PS
flag:

1. The PS flag of a PDPTE is reserved and must be 0 (if the P flag is 1) if 1-GByte pages are not
supported. See Section 4.1.4 for how to determine whether 1-GByte pages are supported.

Figure 4-8. Linear-Address Translation to a 4-KByte Page using IA-32e Paging
[Figure: the linear address is divided into PML4 (bits 47:39), Directory Ptr (bits
38:30), Directory (bits 29:21), Table (bits 20:12), and Offset (bits 11:0). CR3
locates the PML4 table; the PML4E, the PDPTE, the PDE with PS=0, and the PTE each
supply a 40-bit address, and the PTE's address combined with the 12-bit offset
gives the physical address within the 4-KByte page.]
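Figure 4-9. Linear-Address Translation to a 2-MByte Page using IA-32e Paging
[Figure: the linear address is divided into PML4 (bits 47:39), Directory Ptr (bits
38:30), Directory (bits 29:21), and Offset (bits 20:0). CR3 locates the PML4 table;
the PML4E and the PDPTE each supply a 40-bit address, and the PDE with PS=1
supplies a 31-bit address that is combined with the 21-bit offset to give the
physical address within the 2-MByte page.]

Figure 4-10. Linear-Address Translation to a 1-GByte Page using IA-32e Paging
[Figure: the linear address is divided into PML4 (bits 47:39), Directory Ptr (bits
38:30), and Offset (bits 29:0). CR3 locates the PML4 table; the PML4E supplies a
40-bit address, and the PDPTE with PS=1 supplies a 22-bit address that is combined
with the 30-bit offset to give the physical address within the 1-GByte page.]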
Table 4-14. Format of an IA-32e PML4 Entry (PML4E) that References a
Page-Directory-Pointer Table
Bit Position(s)  Contents
0 (P)            Present; must be 1 to reference a page-directory-pointer table
1 (R/W)          Read/write; if 0, writes may not be allowed to the 512-GByte region
                 controlled by this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S)          User/supervisor; if 0, accesses with CPL=3 are not allowed to the
                 512-GByte region controlled by this entry (see Section 4.6)
3 (PWT)          Page-level write-through; indirectly determines the memory type used
                 to access the page-directory-pointer table referenced by this entry
                 (see Section 4.9.2)
4 (PCD)          Page-level cache disable; indirectly determines the memory type used
                 to access the page-directory-pointer table referenced by this entry
                 (see Section 4.9.2)
5 (A)            Accessed; indicates whether this entry has been used for
                 linear-address translation (see Section 4.8)
6                Ignored
7 (PS)           Reserved (must be 0)
11:8             Ignored
(M-1):12         Physical address of 4-KByte aligned page-directory-pointer table
                 referenced by this entry
51:M             Reserved (must be 0)
62:52            Ignored
63 (XD)          If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are
                 not allowed from the 512-GByte region controlled by this entry; see
                 Section 4.6); otherwise, reserved (must be 0)

Table 4-15. Format of an IA-32e Page-Directory-Pointer-Table Entry (PDPTE) that
Maps a 1-GByte Page
Bit Position(s)  Contents
0 (P)            Present; must be 1 to map a 1-GByte page
1 (R/W)          Read/write; if 0, writes may not be allowed to the 1-GByte page
                 referenced by this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S)          User/supervisor; if 0, accesses with CPL=3 are not allowed to the
                 1-GByte page referenced by this entry (see Section 4.6)
3 (PWT)          Page-level write-through; indirectly determines the memory type used
                 to access the 1-GByte page referenced by this entry (see Section 4.9.2)
4 (PCD)          Page-level cache disable; indirectly determines the memory type used
                 to access the 1-GByte page referenced by this entry (see Section 4.9.2)
5 (A)            Accessed; indicates whether software has accessed the 1-GByte page
                 referenced by this entry (see Section 4.8)
6 (D)            Dirty; indicates whether software has written to the 1-GByte page
                 referenced by this entry (see Section 4.8)
7 (PS)           Page size; must be 1 (otherwise, this entry references a page
                 directory; see Table 4-16)
8 (G)            Global; if CR4.PGE = 1, determines whether the translation is global
                 (see Section 4.10); ignored otherwise
11:9             Ignored
12 (PAT)         Indirectly determines the memory type used to access the 1-GByte page
                 referenced by this entry (see Section 4.9.2)¹
29:13            Reserved (must be 0)
(M-1):30         Physical address of the 1-GByte page referenced by this entry
51:M             Reserved (must be 0)
62:52            Ignored
63 (XD)          If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are
                 not allowed from the 1-GByte page controlled by this entry; see
                 Section 4.6); otherwise, reserved (must be 0)
NOTES:
1. The PAT is supported on all processors that support IA-32e paging.
• If the PDE's PS flag is 1, the PDE maps a 2-MByte page (see Table 4-17). The final
  physical address is computed as follows:
  - Bits 51:21 are from the PDE.
  - Bits 20:0 are from the original linear address.
• If the PDE's PS flag is 0, a 4-KByte naturally aligned page table is located at the
  physical address specified in bits 51:12 of the PDE (see Table 4-18). A page table
  comprises 512 64-bit entries (PTEs). A PTE is selected using the physical address
  defined as follows:
  - Bits 51:12 are from the PDE.
  - Bits 11:3 are bits 20:12 of the linear address.
  - Bits 2:0 are all 0.
• Because a PTE is identified using bits 47:12 of the linear address, every PTE
  maps a 4-KByte page (see Table 4-19). The final physical address is computed as
  follows:
  - Bits 51:12 are from the PTE.
  - Bits 11:0 are from the original linear address.

Table 4-16. Format of an IA-32e Page-Directory-Pointer-Table Entry (PDPTE) that
References a Page Directory
Bit Position(s)  Contents
0 (P)            Present; must be 1 to reference a page directory
1 (R/W)          Read/write; if 0, writes may not be allowed to the 1-GByte region
                 controlled by this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S)          User/supervisor; if 0, accesses with CPL=3 are not allowed to the
                 1-GByte region controlled by this entry (see Section 4.6)
3 (PWT)          Page-level write-through; indirectly determines the memory type used
                 to access the page directory referenced by this entry (see
                 Section 4.9.2)
4 (PCD)          Page-level cache disable; indirectly determines the memory type used
                 to access the page directory referenced by this entry (see
                 Section 4.9.2)
5 (A)            Accessed; indicates whether this entry has been used for
                 linear-address translation (see Section 4.8)
6                Ignored
7 (PS)           Page size; must be 0 (otherwise, this entry maps a 1-GByte page; see
                 Table 4-15)
11:8             Ignored
(M-1):12         Physical address of 4-KByte aligned page directory referenced by this
                 entry
51:M             Reserved (must be 0)
62:52            Ignored
63 (XD)          If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are
                 not allowed from the 1-GByte region controlled by this entry; see
                 Section 4.6); otherwise, reserved (must be 0)

Table 4-17. Format of an IA-32e Page-Directory Entry that Maps a 2-MByte Page
Bit Position(s)  Contents
0 (P)            Present; must be 1 to map a 2-MByte page
1 (R/W)          Read/write; if 0, writes may not be allowed to the 2-MByte page
                 referenced by this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S)          User/supervisor; if 0, accesses with CPL=3 are not allowed to the
                 2-MByte page referenced by this entry (see Section 4.6)
3 (PWT)          Page-level write-through; indirectly determines the memory type used
                 to access the 2-MByte page referenced by this entry (see Section 4.9.2)
4 (PCD)          Page-level cache disable; indirectly determines the memory type used
                 to access the 2-MByte page referenced by this entry (see Section 4.9.2)
5 (A)            Accessed; indicates whether software has accessed the 2-MByte page
                 referenced by this entry (see Section 4.8)
6 (D)            Dirty; indicates whether software has written to the 2-MByte page
                 referenced by this entry (see Section 4.8)
7 (PS)           Page size; must be 1 (otherwise, this entry references a page table;
                 see Table 4-18)
8 (G)            Global; if CR4.PGE = 1, determines whether the translation is global
                 (see Section 4.10); ignored otherwise
11:9             Ignored
12 (PAT)         Indirectly determines the memory type used to access the 2-MByte page
                 referenced by this entry (see Section 4.9.2)
20:13            Reserved (must be 0)
(M-1):21         Physical address of the 2-MByte page referenced by this entry
51:M             Reserved (must be 0)
62:52            Ignored
63 (XD)          If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are
                 not allowed from the 2-MByte page controlled by this entry; see
                 Section 4.6); otherwise, reserved (must be 0)
If a paging-structure entry's P flag (bit 0) is 0 or if the entry sets any reserved bit, the
entry is used neither to reference another paging-structure entry nor to map a page.
A reference using a linear address whose translation would use such a
paging-structure entry causes a page-fault exception (see Section 4.7).
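The translation steps of Section 4.5 can be sketched as a software page walk. This is a minimal illustration, not processor behavior in full: the names `ia32e_translate`, `read_phys_fn`, `toy_mem`, `toy_read`, and `toy_write` are hypothetical, the toy memory stands in for physical memory, and the sketch checks only the P and PS flags (reserved-bit checks, access rights, and accessed/dirty updates are omitted).

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_P    (1ull << 0)              /* present flag (bit 0) */
#define ENTRY_PS   (1ull << 7)              /* page-size flag (bit 7) */
#define ADDR_51_12 0x000FFFFFFFFFF000ull    /* bits 51:12 */
#define ADDR_51_21 0x000FFFFFFFE00000ull    /* bits 51:21 */
#define ADDR_51_30 0x000FFFFFC0000000ull    /* bits 51:30 */

/* Hypothetical callback returning the 64-bit entry at a physical address. */
typedef uint64_t (*read_phys_fn)(uint64_t paddr);

/* Entry address: table base in bits 51:12, 9-bit index in bits 11:3, bits 2:0 zero. */
static uint64_t entry_addr(uint64_t base, uint64_t la, unsigned shift)
{
    return (base & ADDR_51_12) | (((la >> shift) & 0x1FF) << 3);
}

/* Returns false where the walk would end in a page fault because P = 0. */
static bool ia32e_translate(read_phys_fn rd, uint64_t cr3, uint64_t la, uint64_t *pa)
{
    uint64_t e = rd(entry_addr(cr3, la, 39));          /* PML4E */
    if (!(e & ENTRY_P)) return false;

    e = rd(entry_addr(e, la, 30));                     /* PDPTE */
    if (!(e & ENTRY_P)) return false;
    if (e & ENTRY_PS) {                                /* 1-GByte page */
        *pa = (e & ADDR_51_30) | (la & 0x3FFFFFFFull);
        return true;
    }

    e = rd(entry_addr(e, la, 21));                     /* PDE */
    if (!(e & ENTRY_P)) return false;
    if (e & ENTRY_PS) {                                /* 2-MByte page */
        *pa = (e & ADDR_51_21) | (la & 0x1FFFFFull);
        return true;
    }

    e = rd(entry_addr(e, la, 12));                     /* PTE */
    if (!(e & ENTRY_P)) return false;
    *pa = (e & ADDR_51_12) | (la & 0xFFFull);          /* 4-KByte page */
    return true;
}

/* Toy physical memory: four 4-KByte frames at physical addresses 0x1000 to 0x4FFF. */
static uint64_t toy_mem[4 * 512];

static uint64_t toy_read(uint64_t paddr)
{
    return toy_mem[(paddr - 0x1000) / 8];
}

static void toy_write(uint64_t paddr, uint64_t value)
{
    toy_mem[(paddr - 0x1000) / 8] = value;
}
```

A caller would populate one entry per level with `toy_write` (for example, a PML4 table at 0x1000 pointing to a page-directory-pointer table at 0x2000) and then call `ia32e_translate(toy_read, 0x1000, la, &pa)`. Setting the PS flag in a PDE or PDPTE ends the walk early with a larger page offset.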
Table 4-18. Format of an IA-32e Page-Directory Entry that References a Page Table
Bit Position(s)  Contents
0 (P)            Present; must be 1 to reference a page table
1 (R/W)          Read/write; if 0, writes may not be allowed to the 2-MByte region
                 controlled by this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S)          User/supervisor; if 0, accesses with CPL=3 are not allowed to the
                 2-MByte region controlled by this entry (see Section 4.6)
3 (PWT)          Page-level write-through; indirectly determines the memory type used
                 to access the page table referenced by this entry (see Section 4.9.2)
4 (PCD)          Page-level cache disable; indirectly determines the memory type used
                 to access the page table referenced by this entry (see Section 4.9.2)
5 (A)            Accessed; indicates whether this entry has been used for
                 linear-address translation (see Section 4.8)
6                Ignored
7 (PS)           Page size; must be 0 (otherwise, this entry maps a 2-MByte page; see
                 Table 4-17)
11:8             Ignored
(M-1):12         Physical address of 4-KByte aligned page table referenced by this entry
51:M             Reserved (must be 0)
62:52            Ignored
63 (XD)          If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are
                 not allowed from the 2-MByte region controlled by this entry; see
                 Section 4.6); otherwise, reserved (must be 0)
Table 4-19. Format of an IA-32e Page-Table Entry that Maps a 4-KByte Page
Bit Position(s)  Contents
0 (P)            Present; must be 1 to map a 4-KByte page
1 (R/W)          Read/write; if 0, writes may not be allowed to the 4-KByte page
                 referenced by this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S)          User/supervisor; if 0, accesses with CPL=3 are not allowed to the
                 4-KByte page referenced by this entry (see Section 4.6)
3 (PWT)          Page-level write-through; indirectly determines the memory type used
                 to access the 4-KByte page referenced by this entry (see Section 4.9.2)
4 (PCD)          Page-level cache disable; indirectly determines the memory type used
                 to access the 4-KByte page referenced by this entry (see Section 4.9.2)
5 (A)            Accessed; indicates whether software has accessed the 4-KByte page
                 referenced by this entry (see Section 4.8)
6 (D)            Dirty; indicates whether software has written to the 4-KByte page
                 referenced by this entry (see Section 4.8)
7 (PAT)          Indirectly determines the memory type used to access the 4-KByte page
                 referenced by this entry (see Section 4.9.2)
8 (G)            Global; if CR4.PGE = 1, determines whether the translation is global
                 (see Section 4.10); ignored otherwise
11:9             Ignored
(M-1):12         Physical address of the 4-KByte page referenced by this entry
51:M             Reserved (must be 0)
62:52            Ignored
63 (XD)          If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are
                 not allowed from the 4-KByte page controlled by this entry; see
                 Section 4.6); otherwise, reserved (must be 0)

The following bits are reserved with IA-32e paging:
• If the P flag of a paging-structure entry is 1, bits 51:MAXPHYADDR are reserved.
• If the P flag of a PML4E is 1, the PS flag is reserved.
• If 1-GByte pages are not supported and the P flag of a PDPTE is 1, the PS flag is
  reserved.¹
• If the P flag and the PS flag of a PDPTE are both 1, bits 29:13 are reserved.
• If the P flag and the PS flag of a PDE are both 1, bits 20:13 are reserved.
• If IA32_EFER.NXE = 0 and the P flag of a paging-structure entry is 1, the XD flag
  (bit 63) is reserved.

1. See Section 4.1.4 for how to determine whether 1-GByte pages are supported.

A reference using a linear address that is successfully translated to a physical
address is performed only if allowed by the access rights of the translation; see
Section 4.6.

Figure 4-11 gives a summary of the formats of CR3 and the IA-32e paging-structure
entries. For the paging-structure entries, it identifies separately the format of entries
that map pages, those that reference other paging structures, and those that do
neither because they are not present; bit 0 (P) and bit 7 (PS) are highlighted
because they determine how a paging-structure entry is used.
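The reserved-bit rules listed for IA-32e paging can be collected into a single check. The sketch below is illustrative only: the enum, the helper names, and the parameters (`maxphyaddr`, `nxe`, `gbyte_pages`) are hypothetical, and the caller is assumed to obtain the feature values as described in Section 4.1.4.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tag for the level at which an entry is used. */
typedef enum { LVL_PML4E, LVL_PDPTE, LVL_PDE, LVL_PTE } pg_level;

/* Mask with bits hi:lo set (requires lo <= hi <= 63). */
static uint64_t bit_range(unsigned hi, unsigned lo)
{
    uint64_t upto_hi = (hi == 63) ? ~0ull : ((1ull << (hi + 1)) - 1);
    return upto_hi & ~((1ull << lo) - 1);
}

/* True if a present entry sets a bit that is reserved under IA-32e paging. */
static bool sets_reserved_bit(uint64_t e, pg_level lvl, unsigned maxphyaddr,
                              bool nxe, bool gbyte_pages)
{
    uint64_t rsvd = 0;
    bool ps = (e >> 7) & 1;

    if (!(e & 1))
        return false;                       /* P = 0: reserved bits are not checked */
    if (maxphyaddr < 52)
        rsvd |= bit_range(51, maxphyaddr);  /* bits 51:MAXPHYADDR */
    if (lvl == LVL_PML4E)
        rsvd |= 1ull << 7;                  /* PS flag of a PML4E */
    if (lvl == LVL_PDPTE && !gbyte_pages)
        rsvd |= 1ull << 7;                  /* PS flag if 1-GByte pages unsupported */
    if (lvl == LVL_PDPTE && ps)
        rsvd |= bit_range(29, 13);          /* 1-GByte page mapping */
    if (lvl == LVL_PDE && ps)
        rsvd |= bit_range(20, 13);          /* 2-MByte page mapping */
    if (!nxe)
        rsvd |= 1ull << 63;                 /* XD flag when IA32_EFER.NXE = 0 */
    return (e & rsvd) != 0;
}
```

A walk such as the one in Section 4.5 would call this check on each entry it reads and treat a true result as a page fault with the RSVD flag set in the error code (see Section 4.7).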
4.6 ACCESS RIGHTS
There is a translation for a linear address if the processes described in Section 4.3,
Section 4.4.2, and Section 4.5 (depending upon the paging mode) complete and
produce a physical address. The accesses permitted by a translation are determined
by the access rights specified by the paging-structure entries controlling the
translation.¹ The following items detail how paging determines access rights:
• For accesses in supervisor mode (CPL < 3):
  - Data reads.
    Data may be read from any linear address with a valid translation.
  - Data writes.
    • If CR0.WP = 0, data may be written to any linear address with a valid
      translation.
    • If CR0.WP = 1, data may be written to any linear address with a valid
      translation for which the R/W flag (bit 1) is 1 in every paging-structure
      entry controlling the translation.
  - Instruction fetches.
    • For 32-bit paging or if IA32_EFER.NXE = 0, instructions may be fetched
      from any linear address with a valid translation.
    • For PAE paging or IA-32e paging with IA32_EFER.NXE = 1, instructions
      may be fetched from any linear address with a valid translation for which
      the XD flag (bit 63) is 0 in every paging-structure entry controlling the
      translation.
• For accesses in user mode (CPL = 3):
  - Data reads.
    Data may be read from any linear address with a valid translation for which
    the U/S flag (bit 2) is 1 in every paging-structure entry controlling the
    translation.
  - Data writes.
    Data may be written to any linear address with a valid translation for which
    both the R/W flag and the U/S flag are 1 in every paging-structure entry
    controlling the translation.
  - Instruction fetches.
    • For 32-bit paging or if IA32_EFER.NXE = 0, instructions may be fetched
      from any linear address with a valid translation for which the U/S flag is 1
      in every paging-structure entry controlling the translation.
    • For PAE paging or IA-32e paging with IA32_EFER.NXE = 1, instructions
      may be fetched from any linear address with a valid translation for which
      the U/S flag is 1 and the XD flag is 0 in every paging-structure entry
      controlling the translation.

1. With PAE paging, the PDPTEs do not determine access rights.

Figure 4-11. Formats of CR3 and Paging-Structure Entries with IA-32e Paging
[Figure: bit-field layouts, bits 63:0, of CR3 and of each IA-32e paging-structure
entry: PML4E (present, not present), PDPTE (1GB page, page directory, not
present), PDE (2MB page, page table, not present), and PTE (4KB page, not
present). The fields are those listed in Table 4-12 and Tables 4-14 through 4-19.]
NOTES:
1. M is an abbreviation for MAXPHYADDR.
2. Reserved fields must be 0.
3. If IA32_EFER.NXE = 0 and the P flag of a paging-structure entry is 1, the XD flag (bit 63) is reserved.
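The access-rights rules of Section 4.6 reduce to a small predicate once the flags of all controlling entries have been combined. The sketch below is an illustration under stated assumptions: the type and function names are hypothetical, and the caller is assumed to have already combined the R/W, U/S, and XD flags across every paging-structure entry controlling the translation (excluding the PDPTEs with PAE paging).

```c
#include <stdbool.h>

typedef enum { ACC_READ, ACC_WRITE, ACC_FETCH } access_kind;

/* Hypothetical summary of a valid translation's combined flags. */
typedef struct {
    bool rw;  /* R/W flag is 1 in every controlling entry */
    bool us;  /* U/S flag is 1 in every controlling entry */
    bool xd;  /* XD flag is 1 in at least one controlling entry */
} xlat_rights;

/* Hypothetical summary of the relevant processor state. */
typedef struct {
    bool cr0_wp;      /* CR0.WP */
    bool xd_enabled;  /* PAE or IA-32e paging with IA32_EFER.NXE = 1 */
} paging_cfg;

/* True if paging permits the access; user_mode means CPL = 3. */
static bool access_permitted(access_kind kind, bool user_mode,
                             xlat_rights r, paging_cfg cfg)
{
    if (user_mode && !r.us)
        return false;                 /* every user-mode access needs U/S = 1 */

    switch (kind) {
    case ACC_READ:
        return true;
    case ACC_WRITE:
        if (user_mode)
            return r.rw;              /* user writes always need R/W = 1 */
        return !cfg.cr0_wp || r.rw;   /* supervisor writes need R/W = 1 only if WP = 1 */
    case ACC_FETCH:
        return !(cfg.xd_enabled && r.xd);
    }
    return false;
}
```

For example, a supervisor-mode write to a read-only translation is permitted when CR0.WP = 0 but refused when CR0.WP = 1, while the same write in user mode is refused in both cases.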
A processor may cache information from the paging-structure entries in TLBs and
paging-structure caches (see Section 4.10). These structures may include
information about access rights. The processor may enforce access rights based on
the TLBs and paging-structure caches instead of on the paging structures in memory.
This fact implies that, if software modifies a paging-structure entry to change access
rights, the processor might not use that change for a subsequent access to an
affected linear address (see Section 4.10.4.3). See Section 4.10.4.2 for how
software can ensure that the processor uses the modified access rights.
4.7 PAGE-FAULT EXCEPTIONS
Accesses using linear addresses may cause page-fault exceptions (#PF; exception
14). An access to a linear address may cause a page-fault exception for either of two
reasons: (1) there is no valid translation for the linear address; or (2) there is a valid
translation for the linear address, but its access rights do not permit the access.
As noted in Section 4.3, Section 4.4.2, and Section 4.5, there is no valid translation
for a linear address if the translation process for that address would use a
paging-structure entry in which the P flag (bit 0) is 0 or one that sets a reserved bit.
If there is a valid translation for a linear address, its access rights are determined as
specified in Section 4.6.
Figure 4-12 illustrates the error code that the processor provides on delivery of a
page-fault exception. The following items explain how the bits in the error code
describe the nature of the page-fault exception:
• P flag (bit 0).
  This flag is 0 if there is no valid translation for the linear address because the P
  flag was 0 in one of the paging-structure entries used to translate that address.
• W/R (bit 1).
  If the access causing the page-fault exception was a write, this flag is 1;
  otherwise, it is 0. This flag describes the access causing the page-fault exception,
  not the access rights specified by paging.
• U/S (bit 2).
  If a user-mode (CPL = 3) access caused the page-fault exception, this flag is 1; it
  is 0 if a supervisor-mode (CPL < 3) access did so. This bit describes the access
  causing the page-fault exception, not the access rights specified by paging.
• RSVD flag (bit 3).
  This flag is 1 if there is no valid translation for the linear address because a
  reserved bit was set in one of the paging-structure entries used to translate that
  address. (Because reserved bits are not checked in a paging-structure entry
  whose P flag is 0, bit 3 of the error code can be set only if bit 0 is also set.)
  Bits reserved in the paging-structure entries are reserved for future functionality.
  Software developers should be aware that such bits may be used in the future
  and that a paging-structure entry that causes a page-fault exception on one
  processor might not do so in the future.
• I/D flag (bit 4).
  Use of this flag depends on the settings of CR4.PAE and IA32_EFER.NXE:
  - CR4.PAE = 0 (32-bit paging is in use) or IA32_EFER.NXE = 0.
    This flag is 0.
  - CR4.PAE = 1 (either PAE paging or IA-32e paging is in use) and
    IA32_EFER.NXE = 1.
    If the access causing the page-fault exception was an instruction fetch, this
    flag is 1; otherwise, it is 0. This flag describes the access causing the
    page-fault exception, not the access rights specified by paging.

Figure 4-12. Page-Fault Error Code
Bit(s)  Field     Value  Meaning
31:5    Reserved
4       I/D       0      The fault was not caused by an instruction fetch.
                  1      The fault was caused by an instruction fetch.
3       RSVD      0      The fault was not caused by reserved bit violation.
                  1      The fault was caused by a reserved bit set to 1 in some
                         paging-structure entry.
2       U/S       0      The access causing the fault originated when the processor
                         was executing in supervisor mode (CPL < 3).
                  1      The access causing the fault originated when the processor
                         was executing in user mode (CPL = 3).
1       W/R       0      The access causing the fault was a read.
                  1      The access causing the fault was a write.
0       P         0      The fault was caused by a non-present page.
                  1      The fault was caused by a page-level protection violation.
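A page-fault handler typically begins by decoding the error code described in Section 4.7 and Figure 4-12. The following sketch shows one way to do that; the structure and function names are hypothetical, and only bits 4:0 of the error code are interpreted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of the page-fault error code. */
typedef struct {
    bool protection_violation;  /* P (bit 0): 0 = non-present page, 1 = protection violation */
    bool write;                 /* W/R (bit 1): access was a write */
    bool user;                  /* U/S (bit 2): access made at CPL = 3 */
    bool reserved_bit;          /* RSVD (bit 3): reserved bit set in an entry */
    bool instr_fetch;           /* I/D (bit 4): access was an instruction fetch */
} pf_info;

static pf_info decode_pf_error(uint32_t error_code)
{
    pf_info info;
    info.protection_violation = (error_code >> 0) & 1;
    info.write                = (error_code >> 1) & 1;
    info.user                 = (error_code >> 2) & 1;
    info.reserved_bit         = (error_code >> 3) & 1;
    info.instr_fetch          = (error_code >> 4) & 1;
    return info;
}
```

An error code of 07H, for example, decodes as a user-mode write that violated page-level protection. The W/R, U/S, and I/D fields describe the faulting access itself, not the access rights recorded in the paging structures.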
Page-fault exceptions occur only due to an attempt to use a linear address. Failures
to load the PDPTE registers with PAE paging (see Section 4.4.1) cause
general-protection exceptions (#GP(0)) and not page-fault exceptions.
4.8 ACCESSED AND DIRTY FLAGS
For any paging-structure entry that is used during linear-address translation, bit 5 is
the accessed flag.¹ For paging-structure entries that map a page (as opposed to
referencing another paging structure), bit 6 is the dirty flag. These flags are
provided for use by memory-management software to manage the transfer of pages
and paging structures into and out of physical memory.

Whenever the processor uses a paging-structure entry as part of linear-address
translation, it sets the accessed flag in that entry (if it is not already set).

Whenever there is a write to a linear address, the processor sets the dirty flag (if it is
not already set) in the paging-structure entry that identifies the final physical
address for the linear address (either a PTE or a paging-structure entry in which the
PS flag is 1).

Memory-management software may clear these flags when a page or a paging
structure is initially loaded into physical memory. These flags are "sticky," meaning
that, once set, the processor does not clear them; only software can clear them.

A processor may cache information from the paging-structure entries in TLBs and
paging-structure caches (see Section 4.10). This fact implies that, if software
changes an accessed flag or a dirty flag from 1 to 0, the processor might not set the
corresponding bit in memory on a subsequent access using an affected linear
address (see Section 4.10.4.3). See Section 4.10.4.2 for how software can ensure
that these bits are updated as desired.
NOTE
The accesses used by the processor to set these flags may or may not
be exposed to the processor's self-modifying code detection logic. If
the processor is executing code from the same memory area that is
being used for the paging structures, the setting of these flags may
or may not result in an immediate change to the executing code
stream.

1. With PAE paging, the PDPTEs are not used during linear-address translation but only to load the
PDPTE registers for some executions of the MOV CR instruction (see Section 4.4.1). For this
reason, the PDPTEs do not contain accessed flags with PAE paging.
4.9 PAGING AND MEMORY TYPING
The memory type of a memory access refers to the type of caching used for that
access. Chapter 11, "Memory Cache Control," provides many details regarding
memory typing in the Intel-64 and IA-32 architectures. This section describes how
paging contributes to the determination of memory typing.

The way in which paging contributes to memory typing depends on whether the
processor supports the Page Attribute Table (PAT; see Section 11.12).¹ Section
4.9.1 and Section 4.9.2 explain how paging contributes to memory typing depending
on whether the PAT is supported.
4.9.1 Paging and Memory Typing When the PAT is Not Supported
(Pentium Pro and Pentium II Processors)
NOTE
The PAT is supported on all processors that support IA-32e paging.
Thus, this section applies only to 32-bit paging and PAE paging.

If the PAT is not supported, paging contributes to memory typing in conjunction with
the memory-type range registers (MTRRs) as specified in Table 11-6 in Section
11.5.2.1.

For any access to a physical address, the table combines the memory type specified
for that physical address by the MTRRs with a PCD value and a PWT value. The latter
two values are determined as follows:
• For an access to a PDE with 32-bit paging, the PCD and PWT values come from
  CR3.
• For an access to a PDE with PAE paging, the PCD and PWT values come from the
  relevant PDPTE register.
• For an access to a PTE, the PCD and PWT values come from the relevant PDE.
• For an access to the physical address that is the translation of a linear address,
  the PCD and PWT values come from the relevant PTE (if the translation uses a
  4-KByte page) or the relevant PDE (otherwise).
4.9.2 Paging and Memory Typing When the PAT is Supported
(Pentium III and More Recent Processor Families)
If the PAT is supported, paging contributes to memory typing in conjunction with the
PAT and the memory-type range registers (MTRRs) as specified in Table 11-7 in
Section 11.5.2.2.

1. The PAT is supported on Pentium III and more recent processor families. See Section 4.1.4 for
how to determine whether the PAT is supported.

The PAT is a 64-bit MSR (IA32_PAT; MSR index 277H) comprising eight (8) 8-bit
entries (entry i comprises bits 8i+7:8i of the MSR).

For any access to a physical address, the table combines the memory type specified
for that physical address by the MTRRs with a memory type selected from the PAT.
Table 11-11 in Section 11.12.3 specifies how a memory type is selected from the PAT.
Specifically, it comes from entry i of the PAT, where i is defined as follows:
• For an access to an entry in a paging structure whose address is in CR3 (e.g., the
  PML4 table with IA-32e paging):
  - For IA-32e paging with CR4.PCIDE = 1, i = 0.
  - Otherwise, i = 2*PCD+PWT, where the PCD and PWT values come from CR3.
• For an access to a PDE with PAE paging, i = 2*PCD+PWT, where the PCD and PWT
  values come from the relevant PDPTE register.
• For an access to a paging-structure entry X whose address is in another
  paging-structure entry Y, i = 2*PCD+PWT, where the PCD and PWT values come from Y.
• For an access to the physical address that is the translation of a linear address,
  i = 4*PAT+2*PCD+PWT, where the PAT, PCD, and PWT values come from the
  relevant PTE (if the translation uses a 4-KByte page), the relevant PDE (if the
  translation uses a 2-MByte page or a 4-MByte page), or the relevant PDPTE (if
  the translation uses a 1-GByte page).
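The PAT index formulas above can be written out directly. The sketch below is illustrative: the function names are hypothetical, and the position of the PAT flag (bit 7 in a PTE, bit 12 in an entry that maps a larger page) is taken from Tables 4-15, 4-17, and 4-19 for IA-32e paging.

```c
#include <stdbool.h>
#include <stdint.h>

/* i = 4*PAT + 2*PCD + PWT; pass pat = 0 for accesses to paging structures. */
static unsigned pat_index(unsigned pat, unsigned pcd, unsigned pwt)
{
    return 4 * pat + 2 * pcd + pwt;
}

/* Entry i of the PAT comprises bits 8i+7:8i of the IA32_PAT MSR (i in 0..7). */
static uint8_t pat_entry(uint64_t ia32_pat, unsigned i)
{
    return (uint8_t)(ia32_pat >> (8 * i));
}

/*
 * Index for an access to a mapped page, given the entry that maps it.
 * large_page is true for a PDE or PDPTE with PS = 1, where the PAT flag is
 * bit 12; in a PTE the PAT flag is bit 7. PCD is bit 4 and PWT is bit 3.
 */
static unsigned pat_index_for_page(uint64_t entry, bool large_page)
{
    unsigned pat = (unsigned)((entry >> (large_page ? 12 : 7)) & 1);
    unsigned pcd = (unsigned)((entry >> 4) & 1);
    unsigned pwt = (unsigned)((entry >> 3) & 1);
    return pat_index(pat, pcd, pwt);
}
```

Given the MSR value read by system software, `pat_entry(msr, pat_index_for_page(pte, false))` yields the encoded memory type that the PAT contributes for a 4-KByte page. The final memory type still depends on the MTRRs, as specified in Table 11-7.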
4.9.3 Caching Paging-Related Information about Memory Typing
A processor may cache information from the paging-structure entries in TLBs and
paging-structure caches (see Section 4.10). These structures may include
information about memory typing. The processor may use memory-typing information
from the TLBs and paging-structure caches instead of from the paging structures in
memory.

This fact implies that, if software modifies a paging-structure entry to change the
memory-typing bits, the processor might not use that change for a subsequent
translation using that entry or for access to an affected linear address. See Section
4.10.4.2 for how software can ensure that the processor uses the modified memory
typing.
4.10 CACHING TRANSLATION INFORMATION
The Intel-64 and IA-32 architectures may accelerate the address-translation process
by caching data from the paging structures on the processor. Because the processor
does not ensure that the data that it caches are always consistent with the structures
in memory, it is important for software developers to understand how and when the
processor may cache such data. They should also understand what actions software
can take to remove cached data that may be inconsistent and when it should do so.
This section provides software developers information about the relevant processor
operation.

Section 4.10.1 introduces process-context identifiers (PCIDs), which a logical
processor may use to distinguish information cached for different linear-address
spaces. Section 4.10.2 and Section 4.10.3 describe how the processor may cache
information in translation lookaside buffers (TLBs) and paging-structure caches,
respectively. Section 4.10.4 explains how software can remove inconsistent cached
information by invalidating portions of the TLBs and paging-structure caches. Section
4.10.5 describes special considerations for multiprocessor systems.
4.10.1 Process-Context Identifiers (PCIDs)
Process-context identifiers (PCIDs) are a facility by which a logical processor may
cache information for multiple linear-address spaces. The processor may retain
cached information when software switches to a different linear-address space with a
different PCID (e.g., by loading CR3; see Section 4.10.4.1 for details).
A PCID is a 12-bit identifier. Non-zero PCIDs are enabled by setting the PCIDE flag
(bit 17) of CR4. If CR4.PCIDE = 0, the current PCID is always 000H; otherwise, the
current PCID is the value of bits 11:0 of CR3. Not all processors allow CR4.PCIDE to
be set to 1; see Section 4.1.4 for how to determine whether this is allowed.
The processor ensures that CR4.PCIDE can be 1 only in IA-32e mode (thus, 32-bit
paging and PAE paging use only PCID 000H). In addition, software can change
CR4.PCIDE from 0 to 1 only if CR3[11:0] = 000H. These requirements are enforced
by the following limitations on the MOV CR instruction:
• MOV to CR4 causes a general-protection exception (#GP) if it would change
  CR4.PCIDE from 0 to 1 and either IA32_EFER.LMA = 0 or CR3[11:0] ≠ 000H.
• MOV to CR0 causes a general-protection exception if it would clear CR0.PG to 0
  while CR4.PCIDE = 1.
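The PCID rules above can be restated as a small Python model. This is only a sketch of the stated conditions: the function names are hypothetical, register values are passed in as integers, CR0.PG is taken to be bit 31 of CR0, and other reasons for which MOV to CR0 or MOV to CR4 may fault are not modeled.

```python
CR4_PCIDE = 1 << 17   # CR4.PCIDE is bit 17 of CR4
CR0_PG = 1 << 31      # CR0.PG is bit 31 of CR0
PCID_MASK = 0xFFF     # a PCID occupies bits 11:0 of CR3


def current_pcid(cr4: int, cr3: int) -> int:
    """PCID in effect: 000H when CR4.PCIDE = 0, otherwise CR3[11:0]."""
    return (cr3 & PCID_MASK) if (cr4 & CR4_PCIDE) else 0


def mov_to_cr4_faults(old_cr4: int, new_cr4: int, lma: int, cr3: int) -> bool:
    """True if the PCIDE rule alone makes MOV to CR4 raise #GP."""
    setting_pcide = not (old_cr4 & CR4_PCIDE) and bool(new_cr4 & CR4_PCIDE)
    return setting_pcide and (lma == 0 or (cr3 & PCID_MASK) != 0)


def mov_to_cr0_faults(old_cr0: int, new_cr0: int, cr4: int) -> bool:
    """True if the write would clear CR0.PG while CR4.PCIDE = 1."""
    clearing_pg = bool(old_cr0 & CR0_PG) and not (new_cr0 & CR0_PG)
    return clearing_pg and bool(cr4 & CR4_PCIDE)
```

With CR4.PCIDE = 1 and CR3 = 1005H, for instance, current_pcid returns 005H; with CR4.PCIDE = 0 it returns 000H regardless of CR3.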
When a logical processor creates entries in the TLBs (Section 4.10.2) and paging-
structure caches (Section 4.10.3), it associates those entries with the current PCID.
When using entries in the TLBs and paging-structure caches to translate a linear
address, a logical processor uses only those entries associated with the current PCID
(see Section 4.10.2.4 for an exception).
If CR4.PCIDE = 0, a logical processor does not cache information for any PCID other
than 000H. This is because (1) if CR4.PCIDE = 0, the logical processor will associate
any newly cached information with the current PCID, 000H; and (2) if MOV to CR4
clears CR4.PCIDE, all cached information is invalidated (see Section 4.10.4.1).
NOTE
In revisions of this manual that were produced when no processors
allowed CR4.PCIDE to be set to 1, Section 4.10 discussed the caching
of translation information without any reference to PCIDs. While the
section now refers to PCIDs in its specification of this caching, this
documentation change is not intended to imply any change to the
behavior of processors that do not allow CR4.PCIDE to be set to 1.
4.10.2 Translation Lookaside Buffers (TLBs)
A processor may cache information about the translation of linear addresses in
translation lookaside buffers (TLBs). In general, TLBs contain entries that map page
numbers to page frames; these terms are defined in Section 4.10.2.1. Section
4.10.2.2 describes how information may be cached in TLBs, and Section 4.10.2.3
gives details of TLB usage. Section 4.10.2.4 explains the global-page feature, which
allows software to indicate that certain translations should receive special treatment
when cached in the TLBs.
4.10.2.1 Page Numbers, Page Frames, and Page Offsets
Section 4.3, Section 4.4.2, and Section 4.5 give details of how the different paging
modes translate linear addresses to physical addresses. Specifically, the upper bits of
a linear address (called the page number) determine the upper bits of the physical
address (called the page frame); the lower bits of the linear address (called the
page offset) determine the lower bits of the physical address. The boundary
between the page number and the page offset is determined by the page size.
Specifically:
• 32-bit paging:
  - If the translation does not use a PTE (because CR4.PSE = 1 and the PS flag is
    1 in the PDE used), the page size is 4 MBytes and the page number comprises
    bits 31:22 of the linear address.
  - If the translation does use a PTE, the page size is 4 KBytes and the page
    number comprises bits 31:12 of the linear address.
• PAE paging:
  - If the translation does not use a PTE (because the PS flag is 1 in the PDE
    used), the page size is 2 MBytes and the page number comprises bits 31:21
    of the linear address.
  - If the translation does use a PTE, the page size is 4 KBytes and the page
    number comprises bits 31:12 of the linear address.
• IA-32e paging:
  - If the translation does not use a PDE (because the PS flag is 1 in the PDPTE
    used), the page size is 1 GByte and the page number comprises bits 47:30
    of the linear address.
  - If the translation does use a PDE but does not use a PTE (because the PS flag
    is 1 in the PDE used), the page size is 2 MBytes and the page number
    comprises bits 47:21 of the linear address.
  - If the translation does use a PTE, the page size is 4 KBytes and the page
    number comprises bits 47:12 of the linear address.
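The split between page number and page offset listed above can be sketched in Python as follows. The names PAGE_SHIFT and split_linear_address are hypothetical; the caller is assumed to know the page size and the linear-address width (32 bits for 32-bit and PAE paging, 48 bits for IA-32e paging).

```python
# log2 of each page size named in this section
PAGE_SHIFT = {"4K": 12, "2M": 21, "4M": 22, "1G": 30}


def split_linear_address(linear: int, page: str, width: int = 48):
    """Split a linear address into (page number, page offset).

    The page number is the upper bits of the address down to the page-size
    boundary; the page offset is everything below that boundary.
    """
    shift = PAGE_SHIFT[page]
    linear &= (1 << width) - 1          # discard bits above the address width
    return linear >> shift, linear & ((1 << shift) - 1)
```

For the 32-bit linear address 00403123H, a 4-KByte page gives page number 403H and offset 123H, while a 4-MByte page gives page number 1 and offset 3123H.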
4.10.2.2 Caching Translations in TLBs
The processor may accelerate the paging process by caching individual translations
in translation lookaside buffers (TLBs). Each entry in a TLB is an individual
translation. Each translation is referenced by a page number. It contains the following
information from the paging-structure entries used to translate linear addresses with
the page number:
• The physical address corresponding to the page number (the page frame).
• The access rights from the paging-structure entries used to translate linear
  addresses with the page number (see Section 4.6):
  - The logical-AND of the R/W flags.
  - The logical-AND of the U/S flags.
  - The logical-OR of the XD flags (necessary only if IA32_EFER.NXE = 1).
• Attributes from a paging-structure entry that identifies the final page frame for
  the page number (either a PTE or a paging-structure entry in which the PS flag is
  1):
  - The dirty flag (see Section 4.8).
  - The memory type (see Section 4.9).
(TLB entries may contain other information as well. A processor may implement
multiple TLBs, and some of these may be for special purposes, e.g., only for
instruction fetches. Such special-purpose TLBs may not contain some of this
information if it is not necessary. For example, a TLB used only for instruction
fetches need not contain information about the R/W and dirty flags.)
As noted in Section 4.10.1, any TLB entries created by a logical processor are
associated with the current PCID.
Processors need not implement any TLBs. Processors that do implement TLBs may
invalidate any TLB entry at any time. Software should not rely on the existence of
TLBs or on the retention of TLB entries.
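The way the access rights in a TLB entry are combined from the entries used in a translation can be sketched in Python. EntryFlags and effective_rights are hypothetical names, and each flag is modeled as a 0/1 integer taken from one paging-structure entry.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class EntryFlags:
    rw: int  # R/W flag of one paging-structure entry
    us: int  # U/S flag
    xd: int  # XD flag (meaningful only if IA32_EFER.NXE = 1)


def effective_rights(entries: Iterable[EntryFlags]) -> EntryFlags:
    """Combine the flags of every entry used in one translation."""
    rw, us, xd = 1, 1, 0
    for entry in entries:
        rw &= entry.rw   # logical-AND of the R/W flags
        us &= entry.us   # logical-AND of the U/S flags
        xd |= entry.xd   # logical-OR of the XD flags
    return EntryFlags(rw, us, xd)
```

A single entry with R/W = 0 anywhere in the walk therefore yields R/W = 0 in the combined rights, and a single entry with XD = 1 yields XD = 1.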
4.10.2.3 Details of TLB Use
Because the TLBs cache only valid translations, there can be a TLB entry for a page
number only if the P flag is 1 and the reserved bits are 0 in each of the
paging-structure entries used to translate that page number. In addition, the processor
does not cache a translation for a page number unless the accessed flag is 1 in each of
the paging-structure entries used during translation; before caching a translation, the
processor sets any of these accessed flags that is not already 1.
The processor may cache translations required for prefetches and for accesses that
are a result of speculative execution that would never actually occur in the executed
code path.
If the page number of a linear address corresponds to a TLB entry associated with the
current PCID, the processor may use that TLB entry to determine the page frame,
access rights, and other attributes for accesses to that linear address. In this case,
the processor may not actually consult the paging structures in memory. The
processor may retain a TLB entry unmodified even if software subsequently modifies
the relevant paging-structure entries in memory. See Section 4.10.4.2 for how
software can ensure that the processor uses the modified paging-structure entries.
If the paging structures specify a translation using a page larger than 4 KBytes, some
processors may choose to cache multiple smaller-page TLB entries for that
translation. Each such TLB entry would be associated with a page number
corresponding to the smaller page size (e.g., bits 47:12 of a linear address with
IA-32e paging), even though part of that page number (e.g., bits 20:12) is part of
the offset with respect to the page specified by the paging structures. The upper bits
of the physical address in such a TLB entry are derived from the physical address in
the PDE used to create the translation, while the lower bits come from the linear
address of the access for which the translation is created. There is no way for
software to be aware that multiple translations for smaller pages have been used for
a large page.
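To make the derivation of such a smaller-page entry concrete, the following Python sketch computes the 4-KByte page frame that a processor could cache for an access inside a 2-MByte page. The function name small_page_frame is hypothetical, and the sketch assumes the 2-MByte case only; it illustrates the bit arithmetic and says nothing about whether a given processor actually splits large pages.

```python
SIZE_4K = 1 << 12
SIZE_2M = 1 << 21


def small_page_frame(large_page_base: int, linear: int) -> int:
    """Physical address of the 4-KByte frame covering `linear` inside a
    2-MByte page whose physical base address is `large_page_base`."""
    if large_page_base % SIZE_2M:
        raise ValueError("a 2-MByte page must be 2-MByte aligned")
    # Bits 20:12 of the linear address are offset bits of the large page
    # but page-number bits of the 4-KByte TLB entry.
    within_large = linear & (SIZE_2M - 1) & ~(SIZE_4K - 1)
    return large_page_base | within_large
```

For a 2-MByte page at physical address 40000000H, an access to linear address 00603123H would correspond to the 4-KByte frame at 40003000H.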
If software modifies the paging structures so that the page size used for a 4-KByte
range of linear addresses changes, the TLBs may subsequently contain multiple
translations for the address range (one for each page size). A reference to a linear
address in the address range may use any of these translations. Which translation is
used may vary from one execution to another, and the choice may be
implementation-specific.
4.10.2.4 Global Pages
The Intel-64 and IA-32 architectures also allow for global pages when the PGE flag
(bit 7) is 1 in CR4. If the G flag (bit 8) is 1 in a paging-structure entry that maps a
page (either a PTE or a paging-structure entry in which the PS flag is 1), any TLB
entry cached for a linear address using that paging-structure entry is considered to
be global. Because the G flag is used only in paging-structure entries that map a
page, and because information from such entries is not cached in the
paging-structure caches, the global-page feature does not affect the behavior of the
paging-structure caches.
A logical processor may use a global TLB entry to translate a linear address, even if
the TLB entry is associated with a PCID different from the current PCID.
4.10.3 Paging-Structure Caches
In addition to the TLBs, a processor may cache other information about the paging
structures in memory.
4.10.3.1 Caches for Paging Structures
A processor may support any or all of the following paging-structure caches:
• PML4 cache (IA-32e paging only). Each PML4-cache entry is referenced by a
  9-bit value and is used for linear addresses for which bits 47:39 have that value.
  The entry contains information from the PML4E used to translate such linear
  addresses:
  - The physical address from the PML4E (the address of the page-directory-
    pointer table).
  - The value of the R/W flag of the PML4E.
  - The value of the U/S flag of the PML4E.
  - The value of the XD flag of the PML4E.
  - The values of the PCD and PWT flags of the PML4E.
  The following items detail how a processor may use the PML4 cache:
  - If the processor has a PML4-cache entry for a linear address, it may use that
    entry when translating the linear address (instead of the PML4E in memory).
  - The processor does not create a PML4-cache entry unless the P flag is 1 and
    all reserved bits are 0 in the PML4E in memory.
  - The processor does not create a PML4-cache entry unless the accessed flag is
    1 in the PML4E in memory; before caching a translation, the processor sets
    the accessed flag if it is not already 1.
  - The processor may create a PML4-cache entry even if there are no
    translations for any linear address that might use that entry (e.g., because the P
    flags are 0 in all entries in the referenced page-directory-pointer table).
  - If the processor creates a PML4-cache entry, the processor may retain it
    unmodified even if software subsequently modifies the corresponding PML4E
    in memory.
• PDPTE cache (IA-32e paging only).¹ Each PDPTE-cache entry is referenced by
  an 18-bit value and is used for linear addresses for which bits 47:30 have that
  value. The entry contains information from the PML4E and PDPTE used to
  translate such linear addresses:
  - The physical address from the PDPTE (the address of the page directory). (No
    PDPTE-cache entry is created for a PDPTE that maps a 1-GByte page.)
  - The logical-AND of the R/W flags in the PML4E and the PDPTE.
  - The logical-AND of the U/S flags in the PML4E and the PDPTE.
  - The logical-OR of the XD flags in the PML4E and the PDPTE.
  - The values of the PCD and PWT flags of the PDPTE.
  The following items detail how a processor may use the PDPTE cache:
  - If the processor has a PDPTE-cache entry for a linear address, it may use that
    entry when translating the linear address (instead of the PML4E and the
    PDPTE in memory).
  - The processor does not create a PDPTE-cache entry unless the P flag is 1, the
    PS flag is 0, and the reserved bits are 0 in the PML4E and the PDPTE in
    memory.
  - The processor does not create a PDPTE-cache entry unless the accessed flags
    are 1 in the PML4E and the PDPTE in memory; before caching a translation,
    the processor sets any accessed flags that are not already 1.
  - The processor may create a PDPTE-cache entry even if there are no
    translations for any linear address that might use that entry.
  - If the processor creates a PDPTE-cache entry, the processor may retain it
    unmodified even if software subsequently modifies the corresponding PML4E
    or PDPTE in memory.
  ¹ With PAE paging, the PDPTEs are stored in internal, non-architectural registers.
    The operation of these registers is described in Section 4.4.1 and differs from
    that described here.
• PDE cache. The use of the PDE cache depends on the paging mode:
  - For 32-bit paging, each PDE-cache entry is referenced by a 10-bit value and
    is used for linear addresses for which bits 31:22 have that value.
  - For PAE paging, each PDE-cache entry is referenced by an 11-bit value and is
    used for linear addresses for which bits 31:21 have that value.
  - For IA-32e paging, each PDE-cache entry is referenced by a 27-bit value and
    is used for linear addresses for which bits 47:21 have that value.
  A PDE-cache entry contains information from the PML4E, PDPTE, and PDE used to
  translate the relevant linear addresses (for 32-bit paging and PAE paging, only
  the PDE applies):
  - The physical address from the PDE (the address of the page table). (No PDE-
    cache entry is created for a PDE that maps a page.)
  - The logical-AND of the R/W flags in the PML4E, PDPTE, and PDE.
  - The logical-AND of the U/S flags in the PML4E, PDPTE, and PDE.
  - The logical-OR of the XD flags in the PML4E, PDPTE, and PDE.
  - The values of the PCD and PWT flags of the PDE.
  The following items detail how a processor may use the PDE cache (references
  below to PML4Es and PDPTEs apply only to IA-32e paging):
  - If the processor has a PDE-cache entry for a linear address, it may use that
    entry when translating the linear address (instead of the PML4E, the PDPTE,
    and the PDE in memory).
  - The processor does not create a PDE-cache entry unless the P flag is 1, the PS
    flag is 0, and the reserved bits are 0 in the PML4E, the PDPTE, and the PDE in
    memory.
  - The processor does not create a PDE-cache entry unless the accessed flag is
    1 in the PML4E, the PDPTE, and the PDE in memory; before caching a
    translation, the processor sets any accessed flags that are not already 1.
  - The processor may create a PDE-cache entry even if there are no translations
    for any linear address that might use that entry.
  - If the processor creates a PDE-cache entry, the processor may retain it
    unmodified even if software subsequently modifies the corresponding PML4E,
    the PDPTE, or the PDE in memory.
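The values that reference entries in the three caches described above are simply bit fields of the linear address. The following Python sketch extracts them; ia32e_cache_tags, pde_cache_tag, and PDE_TAG_BITS are hypothetical names, and the paging mode is passed in as a string for illustration.

```python
def ia32e_cache_tags(linear: int) -> dict:
    """Values that reference paging-structure-cache entries with IA-32e paging."""
    linear &= (1 << 48) - 1
    return {
        "PML4": linear >> 39,    # bits 47:39, a 9-bit value
        "PDPTE": linear >> 30,   # bits 47:30, an 18-bit value
        "PDE": linear >> 21,     # bits 47:21, a 27-bit value
    }


# (highest bit, lowest bit) of the PDE-cache reference value per paging mode
PDE_TAG_BITS = {"32-bit": (31, 22), "PAE": (31, 21), "IA-32e": (47, 21)}


def pde_cache_tag(linear: int, mode: str) -> int:
    """Value that references a PDE-cache entry in the given paging mode."""
    high, low = PDE_TAG_BITS[mode]
    return (linear & ((1 << (high + 1)) - 1)) >> low
```

Note that each value is a prefix of the next: the PDPTE value contains the PML4 value in its upper 9 bits, and the PDE value contains the PDPTE value in its upper 18 bits.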
Information from a paging-structure entry can be included in entries in the paging-
structure caches for other paging-structure entries referenced by the original entry.
For example, if the R/W flag is 0 in a PML4E, then the R/W flag will be 0 in any
PDPTE-cache entry for a PDPTE from the page-directory-pointer table referenced by
that PML4E. This is because the R/W flag of each such PDPTE-cache entry is the
logical-AND of the R/W flags in the appropriate PML4E and PDPTE.
The paging-structure caches contain information only from paging-structure entries
that reference other paging structures (and not those that map pages). Because the
G flag is not used in such paging-structure entries, the global-page feature does not
affect the behavior of the paging-structure caches.
The processor may create entries in paging-structure caches for translations
required for prefetches and for accesses that are a result of speculative execution
that would never actually occur in the executed code path.
As noted in Section 4.10.1, any entries created in paging-structure caches by a
logical processor are associated with the current PCID.
A processor may or may not implement any of the paging-structure caches. Software
should rely on neither their presence nor their absence. The processor may invalidate
entries in these caches at any time. Because the processor may create the cache
entries at the time of translation and not update them following subsequent
modifications to the paging structures in memory, software should take care to
invalidate the cache entries appropriately when causing such modifications. The
invalidation of TLBs and the paging-structure caches is described in Section 4.10.4.
4.10.3.2 Using the Paging-Structure Caches to Translate Linear Addresses
When a linear address is accessed, the processor uses a procedure such as the
following to determine the physical address to which it translates and whether the
access should be allowed:
• If the processor finds a TLB entry that is for the page number of the linear
  address and that is associated with the current PCID (or which is global), it may
  use the physical address, access rights, and other attributes from that entry.
• If the processor does not find a relevant TLB entry, it may use the upper bits of
  the linear address to select an entry from the PDE cache that is associated with
  the current PCID (Section 4.10.3.1 indicates which bits are used in each paging
  mode). It can then use that entry to complete the translation process (locating a
  PTE, etc.) as if it had traversed the PDE (and, for IA-32e paging, the PDPTE and
  PML4) corresponding to the PDE-cache entry.
• The following items apply when IA-32e paging is used:
  - If the processor does not find a relevant TLB entry or a relevant PDE-cache
    entry, it may use bits 47:30 of the linear address to select an entry from the
    PDPTE cache that is associated with the current PCID. It can then use that
    entry to complete the translation process (locating a PDE, etc.) as if it had
    traversed the PDPTE and the PML4 corresponding to the PDPTE-cache entry.
  - If the processor does not find a relevant TLB entry, a relevant PDE-cache
    entry, or a relevant PDPTE-cache entry, it may use bits 47:39 of the linear
    address to select an entry from the PML4 cache that is associated with the
    current PCID. It can then use that entry to complete the translation process
    (locating a PDPTE, etc.) as if it had traversed the corresponding PML4.
(Any of the above steps would be skipped if the processor does not support the cache
in question.)
If the processor does not find a TLB or paging-structure-cache entry for the linear
address, it uses the linear address to traverse the entire paging-structure hierarchy,
as described in Section 4.3, Section 4.4.2, and Section 4.5.
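The order of the lookups described in this section can be modeled with a short Python sketch for IA-32e paging. Everything here is a hypothetical stand-in for hardware state that software cannot observe: the TLB is a set of (pcid, page number, is_global) tuples for 4-KByte pages, and each paging-structure cache is a set of (pcid, value) pairs. The sketch returns the first structure that could satisfy the lookup; a real processor may skip any step.

```python
def translation_source(linear, pcid, tlb, pde_cache, pdpte_cache, pml4_cache):
    """Name the structure from which a translation could start."""
    linear &= (1 << 48) - 1
    page = linear >> 12
    # A TLB entry is usable if it is for this page number and is either
    # associated with the current PCID or global.
    if any(p == page and (c == pcid or g) for (c, p, g) in tlb):
        return "TLB"
    if (pcid, linear >> 21) in pde_cache:      # bits 47:21
        return "PDE cache"
    if (pcid, linear >> 30) in pdpte_cache:    # bits 47:30
        return "PDPTE cache"
    if (pcid, linear >> 39) in pml4_cache:     # bits 47:39
        return "PML4 cache"
    return "full walk"                          # traverse from CR3
```

The deeper caches are consulted first because an entry found there lets the processor skip more levels of the hierarchy.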
4.10.3.3 Multiple Cached Entries for a Single Paging-Structure Entry
The TLBs and paging-structure caches may contain multiple entries associated with
a single PCID and with information derived from a single paging-structure entry. The
following items give some examples for IA-32e paging:
• Suppose that two PML4Es contain the same physical address and thus reference
  the same page-directory-pointer table. Any PDPTE in that table may result in two
  PDPTE-cache entries, each associated with a different set of linear addresses.
  Specifically, suppose that the n₁-th and n₂-th entries in the PML4 table contain the
  same physical address. This implies that the physical address in the m-th PDPTE in
  the page-directory-pointer table would appear in the PDPTE-cache entries
  associated with both p₁ and p₂, where (p₁ >> 9) = n₁, (p₂ >> 9) = n₂, and
  (p₁ & 1FFH) = (p₂ & 1FFH) = m. This is because both PDPTE-cache entries use the
  same PDPTE, one resulting from a reference from the n₁-th PML4E and one from
  the n₂-th PML4E.
• Suppose that the first PML4E (i.e., the one in position 0) contains the physical
  address X in CR3 (the physical address of the PML4 table). This implies the
  following:
  - Any PML4-cache entry associated with linear addresses with 0 in bits 47:39
    contains address X.
  - Any PDPTE-cache entry associated with linear addresses with 0 in bits 47:30
    contains address X. This is because the translation for a linear address for
    which the value of bits 47:30 is 0 uses the value of bits 47:39 (0) to locate a
    page-directory-pointer table at address X (the address of the PML4 table). It
    then uses the value of bits 38:30 (also 0) to find address X again and to store
    that address in the PDPTE-cache entry.
  - Any PDE-cache entry associated with linear addresses with 0 in bits 47:21
    contains address X for similar reasons.
  - Any TLB entry for page number 0 (associated with linear addresses with 0 in
    bits 47:12) translates to page frame X >> 12 for similar reasons.
  The same PML4E contributes its address X to all these cache entries because the
  self-referencing nature of the entry causes it to be used as a PML4E, a PDPTE, a
  PDE, and a PTE.
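Both examples above can be checked with a few lines of Python. The sketch is illustrative only: pdpte_cache_tags and walk_4k are hypothetical names, paging structures are modeled as dictionaries keyed by physical address, and entries hold bare addresses with all flag bits omitted.

```python
def pdpte_cache_tags(pml4_indexes, m):
    """Values p (bits 47:30) that reference the m-th PDPTE when it is reached
    through each PML4 index n, so that (p >> 9) == n and (p & 0x1FF) == m."""
    if not 0 <= m <= 0x1FF:
        raise ValueError("a PDPTE index is a 9-bit value")
    return [(n << 9) | m for n in pml4_indexes]


def walk_4k(tables, cr3_base, linear):
    """Walk four paging levels and return the final physical frame address.

    `tables` maps a table's physical address to {index: next address}.
    """
    address = cr3_base
    for shift in (39, 30, 21, 12):           # PML4E, PDPTE, PDE, PTE
        index = (linear >> shift) & 0x1FF
        address = tables[address][index]
    return address
```

With PML4 entries 2 and 5 sharing one page-directory-pointer table, PDPTE number 7 is referenced by the values 407H and A07H. With a PML4 table at address X whose entry 0 contains X, walk_4k returns X for linear address 0, because the same entry is read at all four levels.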
4.10.4 Invalidation of TLBs and Paging-Structure Caches
As noted in Section 4.10.2 and Section 4.10.3, the processor may create entries in
the TLBs and the paging-structure caches when linear addresses are translated, and
it may retain these entries even after the paging structures used to create them have
been modified. To ensure that linear-address translation uses the modified paging
structures, software should take action to invalidate any cached entries that may
contain information that has since been modified.
4.10.4.1 Operations that Invalidate TLBs and Paging-Structure Caches
The following instructions invalidate entries in the TLBs and the paging-structure
caches:
• INVLPG. This instruction takes a single operand, which is a linear address. The
  instruction invalidates any TLB entries that are for a page number corresponding
  to the linear address and that are associated with the current PCID. It also
  invalidates any global TLB entries with that page number, regardless of PCID (see
  Section 4.10.2.4).¹ INVLPG also invalidates all entries in all paging-structure
  caches associated with the current PCID, regardless of the linear addresses to
  which they correspond.
  ¹ If the paging structures map the linear address using a page larger than 4 KBytes
    and there are multiple TLB entries for that page (see Section 4.10.2.3), the
    instruction invalidates all of them.
• MOV to CR3. The behavior of the instruction depends on the value of CR4.PCIDE:
  - If CR4.PCIDE = 0, the instruction invalidates all TLB entries associated with
    PCID 000H except those for global pages. It also invalidates all entries in all
    paging-structure caches associated with PCID 000H.
  - If CR4.PCIDE = 1 and bit 63 of the instruction's source operand is 0, the
    instruction invalidates all TLB entries associated with the PCID specified in
    bits 11:0 of the instruction's source operand except those for global pages. It
    also invalidates all entries in all paging-structure caches associated with that
    PCID. It is not required to invalidate entries in the TLBs and paging-structure
    caches that are associated with other PCIDs.
  - If CR4.PCIDE = 1 and bit 63 of the instruction's source operand is 1, the
    instruction is not required to invalidate any TLB entries or entries in paging-
    structure caches.
• MOV to CR4. The instruction invalidates all TLB entries (including global entries)
  and all entries in all paging-structure caches (for all PCIDs) if either (1) it
  changes the value of the CR4.PGE flag;¹ or (2) it changes the value of
  CR4.PCIDE from 1 to 0.
  ¹ If CR4.PGE is changing from 0 to 1, there were no global TLB entries before the
    execution; if CR4.PGE is changing from 1 to 0, there will be no global TLB
    entries after the execution.
• Task switch. If a task switch changes the value of CR3, it invalidates all TLB
  entries associated with PCID 000H except those for global pages. It also
  invalidates all entries in all paging-structure caches associated with PCID
  000H.²
  ² Task switches do not occur in IA-32e mode and thus cannot occur with IA-32e
    paging. Since CR4.PCIDE can be set only with IA-32e paging, task switches
    occur only with CR4.PCIDE = 0.
• VMX transitions. See Section 4.11.1.
The processor is always free to invalidate additional entries in the TLBs and paging-
structure caches. The following are some examples:
• INVLPG may invalidate TLB entries for pages other than the one corresponding to
  its linear-address operand. It may invalidate TLB entries and paging-structure-
  cache entries associated with PCIDs other than the current PCID.
• MOV to CR3 may invalidate TLB entries for global pages. If CR4.PCIDE = 1 and
  bit 63 of the instruction's source operand is 0, it may invalidate TLB entries and
  entries in the paging-structure caches associated with PCIDs other than the
  current PCID. It may invalidate entries if CR4.PCIDE = 1 and bit 63 of the
  instruction's source operand is 1.
• On a processor supporting Hyper-Threading Technology, invalidations performed
  on one logical processor may invalidate entries in the TLBs and paging-structure
  caches used by other logical processors.
(Other instructions and operations may invalidate entries in the TLBs and the paging-
structure caches, but the instructions identified above are recommended.)
In addition to the instructions identified above, page faults invalidate entries in the
TLBs and paging-structure caches. In particular, a page-fault exception resulting
from an attempt to use a linear address will invalidate any TLB entries that are for a
page number corresponding to that linear address and that are associated with the
current PCID. It also invalidates all entries in the paging-structure caches that would
be used for that linear address and that are associated with the current PCID.³ These
invalidations ensure that the page-fault exception will not recur (if the faulting
instruction is re-executed) if it would not be caused by the contents of the paging
structures in memory (and if, therefore, it resulted from cached entries that were not
invalidated after the paging structures were modified in memory).
³ Unlike INVLPG, page faults need not invalidate all entries in the paging-structure
  caches, only those that would be used to translate the faulting linear address.
As noted in Section 4.10.2, some processors may choose to cache multiple smaller-
page TLB entries for a translation specified by the paging structures to use a page
larger than 4 KBytes. There is no way for software to be aware that multiple
translations for smaller pages have been used for a large page. The INVLPG
instruction and page faults provide the same assurances that they provide when a
single TLB entry is used: they invalidate all TLB entries corresponding to the
translation specified by the paging structures.
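The minimum TLB invalidation required of INVLPG, MOV to CR3, and MOV to CR4, as described in this section, can be summarized in a Python model. This is a sketch under simplifying assumptions: the TLB is a set of (pcid, page number, is_global) tuples for 4-KByte pages, the function names are hypothetical, paging-structure caches are not modeled, and each function returns the entries that are allowed to survive (a processor is always free to invalidate more).

```python
NOFLUSH = 1 << 63   # bit 63 of the MOV to CR3 source operand


def invlpg(tlb, current_pcid, linear):
    """Drop entries for the page that are current-PCID or global."""
    page = linear >> 12
    return {e for e in tlb
            if not (e[1] == page and (e[0] == current_pcid or e[2]))}


def mov_to_cr3(tlb, pcide, operand):
    """Drop non-global entries of the targeted PCID unless bit 63 is set."""
    if pcide and operand & NOFLUSH:
        return set(tlb)                       # nothing need be invalidated
    pcid = (operand & 0xFFF) if pcide else 0  # PCID 000H when PCIDE = 0
    return {e for e in tlb if e[0] != pcid or e[2]}


def mov_to_cr4(tlb, pge_changed, pcide_1_to_0):
    """Drop everything, global entries included, on either trigger."""
    return set() if (pge_changed or pcide_1_to_0) else set(tlb)
```

The model highlights the asymmetry between the instructions: INVLPG removes global entries for its page regardless of PCID, MOV to CR3 never has to remove global entries, and only MOV to CR4 is required to remove entries for all PCIDs.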
4.10.4.2 Recommended Invalidation
The following items provide some recommendations regarding when software should
perform invalidations:
• If software modifies a paging-structure entry that identifies the final page frame
  for a page number (either a PTE or a paging-structure entry in which the PS flag
  is 1), it should execute INVLPG for any linear address with a page number whose
  translation uses that paging-structure entry.¹ (If the paging-structure entry may
  be used in the translation of different page numbers, as described in Section
  4.10.3.3, software should execute INVLPG for linear addresses with each of those
  page numbers; alternatively, it could use MOV to CR3 or MOV to CR4.)
  ¹ One execution of INVLPG is sufficient even for a page with size greater than
    4 KBytes.
• If software modifies a paging-structure entry that references another paging
  structure, it may use one of the following approaches depending upon the types
  and number of translations controlled by the modified entry:
  - Execute INVLPG for linear addresses with each of the page numbers with
    translations that would use the entry. However, if no page numbers that
    would use the entry have translations (e.g., because the P flags are 0 in all
    entries in the paging structure referenced by the modified entry), it remains
    necessary to execute INVLPG at least once.
  - Execute MOV to CR3 if the modified entry controls no global pages.
  - Execute MOV to CR4 to modify CR4.PGE.
• If CR4.PCIDE = 1 and software modifies a paging-structure entry that does not
  map a page or in which the G flag (bit 8) is 0, additional steps are required if the
  entry may be used for PCIDs other than the current one. Any one of the following
  suffices:
  - Execute MOV to CR4 to modify CR4.PGE, either immediately or before again
    using any of the affected PCIDs. For example, software could use different
    (previously unused) PCIDs for the processes that used the affected PCIDs.
  - For each affected PCID, execute MOV to CR3 to make that PCID current (and
    to load the address of the appropriate PML4 table). If the modified entry
    controls no global pages and bit 63 of the source operand to MOV to CR3 was
    0, no further steps are required. Otherwise, execute INVLPG for linear
    addresses with each of the page numbers with translations that would use
    the entry; if no page numbers that would use the entry have translations,
    execute INVLPG at least once.
• If software using PAE paging modifies a PDPTE, it should reload CR3 with the
  register's current value to ensure that the modified PDPTE is loaded into the
  corresponding PDPTE register (see Section 4.4.1).
• If the nature of the paging structures is such that a single entry may be used for
  multiple purposes (see Section 4.10.3.3), software should perform invalidations
  for all of these purposes. For example, if a single entry might serve as both a PDE
  and PTE, it may be necessary to execute INVLPG with two (or more) linear
  addresses, one that uses the entry as a PDE and one that uses it as a PTE.
  (Alternatively, software could use MOV to CR3 or MOV to CR4.)
As not ed in Sect ion 4.10. 2, t he TLBs may subsequent ly cont ain mult iple t ransla-
t ions for t he address range if soft ware modifies t he paging st ruct ures so t hat t he
page size used for a 4- KByt e range of linear addresses changes. A reference t o a
linear address in t he address range may use any of t hese t ranslat ions.
Soft ware wishing t o prevent t his uncert aint y should not writ e t o a paging-
st ruct ure ent ry in a way t hat would change, for any linear address, bot h t he page
size and eit her t he page frame, access right s, or ot her at t ribut es. I t can inst ead
use t he following algorit hm: first clear t he P flag in t he relevant paging- st ruct ure
ent ry ( e. g., PDE) ; t hen invalidat e any t ranslat ions for t he affect ed linear
addresses ( see Sect ion 4. 10. 4. 2) ; and t hen modify t he relevant paging- st ruct ure
ent ry t o set t he P flag and est ablish modified t ranslat ion( s) for t he new page size.
Soft ware should clear bit 63 of t he source operand t o a MOV t o CR3 inst ruct ion
t hat est ablishes a PCI D t hat had been used earlier for a different linear- address
space ( e. g., wit h a different value in bit s 51: 12 of CR3) . This ensures invalidat ion
of any informat ion t hat may have been cached for t he previous linear- address
space.
This assumes t hat bot h linear- address spaces use t he same global pages and
t hat it is t hus not necessary t o invalidat e any global TLB ent ries. I f t hat is not t he
case, soft ware should invalidat e t hose ent ries by execut ing MOV t o CR4 t o modify
CR4. PGE.
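The bit-63 rule above can be illustrated with a small sketch that composes the source operand for MOV to CR3. It is a model of the arithmetic only, not code that touches a control register. The function name and the `pcid_reused` parameter are hypothetical, and the placement of the PCID in bits 11:0 of CR3 (when CR4.PCIDE = 1) is taken from the architecture rather than from the paragraph above.

```python
# Illustrative model only: composes the 64-bit source operand for MOV to CR3.
PCID_MASK = 0xFFF                        # assumption: PCID occupies CR3 bits 11:0
ADDR_MASK = ((1 << 52) - 1) & ~0xFFF     # bits 51:12 hold the PML4 table address
NO_INVALIDATE = 1 << 63                  # bit 63 of the source operand

def cr3_operand(pml4_phys, pcid, pcid_reused):
    """Return the MOV-to-CR3 source operand (hypothetical helper).

    pcid_reused is True when this PCID was used earlier for a different
    linear-address space; bit 63 is then left clear so that cached
    information for the previous space is invalidated.
    """
    if pml4_phys & ~ADDR_MASK:
        raise ValueError("PML4 address must be 4-KByte aligned and fit in bits 51:12")
    if pcid & ~PCID_MASK:
        raise ValueError("PCID must fit in 12 bits")
    value = pml4_phys | pcid
    if not pcid_reused:
        value |= NO_INVALIDATE           # same address space: cached entries may be kept
    return value

print(hex(cr3_operand(0x1000, 5, pcid_reused=True)))
print(hex(cr3_operand(0x1000, 5, pcid_reused=False)))
```

Global TLB entries are outside this sketch; as the text notes, they require a separate MOV to CR4 that modifies CR4.PGE.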
4.10.4.3 Optional Invalidation
The following items describe cases in which software may choose not to invalidate
and the potential consequences of that choice:
If a paging-structure entry is modified to change the P flag from 0 to 1, no
invalidation is necessary. This is because no TLB entry or paging-structure cache entry
is created with information from a paging-structure entry in which the P flag is 0.¹
If a paging-structure entry is modified to change the accessed flag from 0 to 1,
no invalidation is necessary (assuming that an invalidation was performed the
last time the accessed flag was changed from 1 to 0). This is because no TLB
entry or paging-structure cache entry is created with information from a
paging-structure entry in which the accessed flag is 0.
If a paging-structure entry is modified to change the R/W flag from 0 to 1, failure
to perform an invalidation may result in a "spurious" page-fault exception (e.g.,
in response to an attempted write access) but no other adverse behavior. Such
an exception will occur at most once for each affected linear address (see Section
4.10.4.1).
If a paging-structure entry is modified to change the U/S flag from 0 to 1, failure
to perform an invalidation may result in a "spurious" page-fault exception (e.g.,
in response to an attempted user-mode access) but no other adverse behavior.
Such an exception will occur at most once for each affected linear address (see
Section 4.10.4.1).
If a paging-structure entry is modified to change the XD flag from 1 to 0, failure
to perform an invalidation may result in a "spurious" page-fault exception (e.g.,
in response to an attempted instruction fetch) but no other adverse behavior.
Such an exception will occur at most once for each affected linear address (see
Section 4.10.4.1).
If a paging-structure entry is modified to change the accessed flag from 1 to 0,
failure to perform an invalidation may result in the processor not setting that bit
in response to a subsequent access to a linear address whose translation uses the
entry. Software cannot interpret the bit being clear as an indication that such an
access has not occurred.
If software modifies a paging-structure entry that identifies the final physical
address for a linear address (either a PTE or a paging-structure entry in which the
PS flag is 1) to change the dirty flag from 1 to 0, failure to perform an invalidation
may result in the processor not setting that bit in response to a subsequent write
to a linear address whose translation uses the entry. Software cannot interpret
the bit being clear as an indication that such a write has not occurred.
The read of a paging-structure entry in translating an address being used to fetch
an instruction may appear to execute before an earlier write to that
paging-structure entry if there is no serializing instruction between the write and the
instruction fetch. Note that the invalidating instructions identified in Section
4.10.4.1 are all serializing instructions.
Section 4.10.3.3 describes situations in which a single paging-structure entry
may contain information cached in multiple entries in the paging-structure
caches. Because all entries in these caches are invalidated by any execution of
INVLPG, it is not necessary to follow the modification of such a paging-structure
entry by executing INVLPG multiple times solely for the purpose of invalidating
these multiple cached entries. (It may be necessary to do so to invalidate
multiple TLB entries.)
1. If it is also the case that no invalidation was performed the last time the P flag was changed
from 1 to 0, the processor may use a TLB entry or paging-structure cache entry that was
created when the P flag had earlier been 1.
4.10.4.4 Delayed Invalidation
Required invalidations may be delayed under some circumstances. Software
developers should understand that, between the modification of a paging-structure entry
and execution of the invalidation instruction recommended in Section 4.10.4.2, the
processor may use translations based on either the old value or the new value of the
paging-structure entry. The following items describe some of the potential
consequences of delayed invalidation:
If a paging-structure entry is modified to change the P flag from 1 to 0,
an access to a linear address whose translation is controlled by this entry may
or may not cause a page-fault exception.
If a paging-structure entry is modified to change the R/W flag from 0 to 1, write
accesses to linear addresses whose translation is controlled by this entry may or
may not cause a page-fault exception.
If a paging-structure entry is modified to change the U/S flag from 0 to 1,
user-mode accesses to linear addresses whose translation is controlled by this entry
may or may not cause a page-fault exception.
If a paging-structure entry is modified to change the XD flag from 1 to 0,
instruction fetches from linear addresses whose translation is controlled by this
entry may or may not cause a page-fault exception.
As noted in Section 8.1.1, an x87 instruction or an SSE instruction that accesses data
larger than a quadword may be implemented using multiple memory accesses. If
such an instruction stores to memory and invalidation has been delayed, some of the
accesses may complete (writing to memory) while another causes a page-fault
exception.¹ In this case, the effects of the completed accesses may be visible to
software even though the overall instruction caused a fault.
In some cases, the consequences of delayed invalidation may not affect software
adversely. For example, when freeing a portion of the linear-address space (by
marking paging-structure entries "not present"), invalidation using INVLPG may be
delayed if software does not re-allocate that portion of the linear-address space or
the memory that had been associated with it. However, because of speculative
execution (or errant software), there may be accesses to the freed portion of the
linear-address space before the invalidations occur. In this case, the following can
happen:
Reads can occur to the freed portion of the linear-address space. Therefore,
invalidation should not be delayed for an address range that has read side
effects.
1. If the accesses are to different pages, this may occur even if invalidation has not been delayed.
The processor may retain entries in the TLBs and paging-structure caches for an
extended period of time. Software should not assume that the processor will not
use entries associated with a linear address simply because time has passed.
As noted in Section 4.10.3.1, the processor may create an entry in a
paging-structure cache even if there are no translations for any linear address that might
use that entry. Thus, if software has marked "not present" all entries in a page
table, the processor may subsequently create a PDE-cache entry for the PDE that
references that page table (assuming that the PDE itself is marked "present").
If software attempts to write to the freed portion of the linear-address space, the
processor might not generate a page fault. (Such an attempt would likely be the
result of a software error.) For that reason, the page frames previously
associated with the freed portion of the linear-address space should not be
reallocated for another purpose until the appropriate invalidations have been
performed.
4.10.5 Propagation of Paging-Structure Changes to Multiple Processors
As noted in Section 4.10.4, software that modifies a paging-structure entry may
need to invalidate entries in the TLBs and paging-structure caches that were derived
from the modified entry before it was modified. In a system containing more than
one logical processor, software must account for the fact that there may be entries in
the TLBs and paging-structure caches of logical processors other than the one used
to modify the paging-structure entry. The process of propagating the changes to a
paging-structure entry is commonly referred to as "TLB shootdown."
TLB shootdown can be done using memory-based semaphores and/or interprocessor
interrupts (IPI). The following items describe a simple but inefficient example of a
TLB shootdown algorithm for processors supporting the Intel 64 and IA-32
architectures:
1. Begin barrier: Stop all but one logical processor; that is, cause all but one to
execute the HLT instruction or to enter a spin loop.
2. Allow the active logical processor to change the necessary paging-structure
entries.
3. Allow all logical processors to perform invalidations appropriate to the
modifications to the paging-structure entries.
4. Allow all logical processors to resume normal operation.
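The four steps above can be mimicked in a user-level simulation. The sketch below is a toy model, not operating-system code: Python threads stand in for logical processors, per-thread dictionaries stand in for TLBs, and `threading.Barrier` stands in for the HLT or spin-loop rendezvous. All names (`run_shootdown`, `page_table`, `tlbs`) are hypothetical.

```python
import threading

def run_shootdown(num_cpus=4):
    """Toy simulation of the simple barrier-based TLB shootdown."""
    page, old_frame, new_frame = 0x1000, 0xA000, 0xB000
    page_table = {page: old_frame}                       # shared paging structure
    tlbs = [dict(page_table) for _ in range(num_cpus)]   # each CPU caches the old entry
    barrier = threading.Barrier(num_cpus)

    def cpu(cpu_id):
        barrier.wait()                     # step 1: begin barrier, everyone stops here
        if cpu_id == 0:
            page_table[page] = new_frame   # step 2: the active CPU modifies the entry
        barrier.wait()                     # the others wait until the change is made
        tlbs[cpu_id].pop(page, None)       # step 3: every CPU invalidates its own TLB
        barrier.wait()
        # step 4: normal operation resumes; the next access refills the TLB
        tlbs[cpu_id][page] = page_table[page]

    threads = [threading.Thread(target=cpu, args=(i,)) for i in range(num_cpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tlbs

if __name__ == "__main__":
    for i, tlb in enumerate(run_shootdown()):
        print(i, {hex(k): hex(v) for k, v in tlb.items()})
```

Because every simulated processor waits at the barrier before and after the modification, none of them can observe a stale translation once it resumes; this is the property that optimized algorithms have to preserve by other means.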
Alternative, performance-optimized, TLB shootdown algorithms may be developed;
however, software developers must take care to ensure that the following conditions
are met:
All logical processors that are using the paging structures that are being modified
must participate and perform appropriate invalidations after the modifications
are made.
If the modifications to the paging-structure entries are made before the barrier
or if there is no barrier, the operating system must ensure one of the following:
(1) that the affected linear-address range is not used between the time of
modification and the time of invalidation; or (2) that it is prepared to deal with the
consequences of the affected linear-address range being used during that period.
For example, if the operating system does not allow pages being freed to be
reallocated for another purpose until after the required invalidations, writes to
those pages by errant software will not unexpectedly modify memory that is in
use.
Software must be prepared to deal with reads, instruction fetches, and prefetch
requests to the affected linear-address range that are a result of speculative
execution that would never actually occur in the executed code path.
When multiple logical processors are using the same linear-address space at the
same time, they must coordinate before any request to modify the paging-structure
entries that control that linear-address space. In these cases, the barrier in the TLB
shootdown routine may not be required. For example, when freeing a range of linear
addresses, some other mechanism can assure no logical processor is using that
range before the request to free it is made. In this case, a logical processor freeing
the range can clear the P flags in the PTEs associated with the range, free the
physical page frames associated with the range, and then signal the other logical
processors using that linear-address space to perform the necessary invalidations. All the
affected logical processors must complete their invalidations before the
linear-address range and the physical page frames previously associated with that range
can be reallocated.
4.11 INTERACTIONS WITH VIRTUAL-MACHINE EXTENSIONS (VMX)
The architecture for virtual-machine extensions (VMX) includes features that interact
with paging. Section 4.11.1 discusses ways in which VMX-specific control transfers,
called VMX transitions, specially affect paging. Section 4.11.2 gives an overview of
VMX features specifically designed to support address translation.
4.11.1 VMX Transitions
The VMX architecture defines two control transfers called VM entries and VM exits;
collectively, these are called VMX transitions. VM entries and VM exits are
described in detail in Chapter 23 and Chapter 24, respectively, in the Intel 64 and
IA-32 Architectures Software Developer's Manual, Volume 3B. The following items
identify paging-related details:
VMX transitions modify the CR0 and CR4 registers and the IA32_EFER MSR
concurrently. For this reason, they allow transitions between paging modes that
would not otherwise be possible:
VM entries allow transitions from IA-32e paging directly to either 32-bit
paging or PAE paging.
VM exits allow transitions from either 32-bit paging or PAE paging directly to
IA-32e paging.
VMX transitions that result in PAE paging load the PDPTE registers (see Section
4.4.1) as follows:
VM entries load the PDPTE registers either from the physical address being
loaded into CR3 or from the virtual-machine control structure (VMCS); see
Section 23.3.2.4.
VM exits load the PDPTE registers from the physical address being loaded into
CR3; see Section 24.5.4.
VMX transitions invalidate the TLBs and paging-structure caches based on certain
control settings. See Section 23.3.2.5 and Section 24.5.5 in the Intel 64 and
IA-32 Architectures Software Developer's Manual, Volume 3B.
4.11.2 VMX Support for Address Translation
Chapter 25, "VMX Support for Address Translation," in the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 3B describes two features of the
virtual-machine extensions (VMX) that interact directly with paging. These are
virtual-processor identifiers (VPIDs) and the extended page table mechanism
(EPT).
VPIDs provide a way for software to identify to the processor the address spaces for
different "virtual processors." The processor may use this identification to maintain
concurrently information for multiple address spaces in its TLBs and paging-structure
caches, even when non-zero PCIDs are not being used. See Section 25.1 for details.
When EPT is in use, the addresses in the paging structures are not used as physical
addresses to access memory and memory-mapped I/O. Instead, they are treated as
guest-physical addresses and are translated through a set of EPT paging structures
to produce physical addresses. EPT can also specify its own access rights and
memory typing; these are used in conjunction with those specified in this chapter.
See Section 25.2 for more information.
Both VPIDs and EPT may change the way that a processor maintains information in
TLBs and paging-structure caches and the ways in which software can manage that
information. Some of the behaviors documented in Section 4.10 may change. See
Section 25.3 for details.
4.12 USING PAGING FOR VIRTUAL MEMORY
With paging, portions of the linear-address space need not be mapped to the
physical-address space; data for the unmapped addresses can be stored externally (e.g.,
on disk). This method of mapping the linear-address space is referred to as virtual
memory or demand-paged virtual memory.
Paging divides the linear-address space into fixed-size pages that can be mapped into
the physical-address space and/or external storage. When a program (or task)
references a linear address, the processor uses paging to translate the linear address into
a corresponding physical address if such an address is defined.
If the page containing the linear address is not currently mapped into the
physical-address space, the processor generates a page-fault exception as described in
Section 4.7. The handler for page-fault exceptions typically directs the operating
system or executive to load data for the unmapped page from external storage into
physical memory (perhaps writing a different page from physical memory out to
external storage in the process) and to map it using paging (by updating the paging
structures). When the page has been loaded into physical memory, a return from the
exception handler causes the instruction that generated the exception to be
restarted.
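The fault, load, map, and restart sequence described above can be sketched as a toy simulation. This is an illustration under simplifying assumptions, not a model of real hardware: a dictionary plays the role of the paging structures, another dictionary plays the role of external storage, and a Python exception plays the role of the page-fault exception. The class and method names are hypothetical.

```python
PAGE_SIZE = 4096

class PageFault(Exception):
    """Raised when a linear address has no mapping (stands in for the page-fault exception)."""
    def __init__(self, linear):
        super().__init__(hex(linear))
        self.linear = linear

class ToyMemory:
    def __init__(self, backing_store):
        self.backing_store = backing_store   # page number -> bytes held "on disk"
        self.page_table = {}                 # page number -> bytearray in "physical memory"
        self.faults = 0

    def translate(self, linear):
        page, offset = divmod(linear, PAGE_SIZE)
        if page not in self.page_table:
            raise PageFault(linear)          # page not currently mapped
        return self.page_table[page], offset

    def handle_fault(self, fault):
        # Load the page from external storage and map it by updating the table.
        page = fault.linear // PAGE_SIZE
        self.page_table[page] = bytearray(self.backing_store[page])
        self.faults += 1

    def read_byte(self, linear):
        while True:
            try:
                frame, offset = self.translate(linear)
                return frame[offset]
            except PageFault as fault:
                self.handle_fault(fault)     # returning from the handler restarts the access

mem = ToyMemory({2: bytes([7]) * PAGE_SIZE})
print(mem.read_byte(2 * PAGE_SIZE + 5), mem.faults)
```

The sketch omits eviction (writing a different page out to make room), which the text mentions as a possible part of the handler's work.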
Paging differs from segmentation through its use of fixed-size pages. Unlike
segments, which usually are the same size as the code or data structures they hold,
pages have a fixed size. If segmentation is the only form of address translation used,
a data structure present in physical memory will have all of its parts in memory. If
paging is used, a data structure can be partly in memory and partly in disk storage.
4.13 MAPPING SEGMENTS TO PAGES
The segmentation and paging mechanisms support a wide variety of
approaches to memory management. When segmentation and paging are combined,
segments can be mapped to pages in several ways. To implement a flat
(unsegmented) addressing environment, for example, all the code, data, and stack modules
can be mapped to one or more large segments (up to 4 GBytes) that share the same
range of linear addresses (see Figure 3-2 in Section 3.2.2). Here, segments are
essentially invisible to applications and the operating system or executive. If paging
is used, the paging mechanism can map a single linear-address space (contained in
a single segment) into virtual memory. Alternatively, each program (or task) can
have its own large linear-address space (contained in its own segment), which is
mapped into virtual memory through its own paging structures.
Segments can be smaller than the size of a page. If one of these segments is placed
in a page which is not shared with another segment, the extra memory is wasted. For
example, a small data structure, such as a 1-Byte semaphore, occupies 4 KBytes if it
is placed in a page by itself. If many semaphores are used, it is more efficient to pack
them into a single page.
The Intel 64 and IA-32 architectures do not enforce correspondence between the
boundaries of pages and segments. A page can contain the end of one segment and
the beginning of another. Similarly, a segment can contain the end of one page and
the beginning of another.
Memory-management software may be simpler and more efficient if it enforces some
alignment between page and segment boundaries. For example, if a segment which
can fit in one page is placed in two pages, there may be twice as much paging
overhead to support access to that segment.
One approach to combining paging and segmentation that simplifies
memory-management software is to give each segment its own page table, as shown in
Figure 4-13. This convention gives the segment a single entry in the page directory,
and this entry provides the access control information for paging the entire segment.
Figure 4-13. Memory Management Convention That Assigns a Page Table to Each Segment
[Figure: each segment descriptor in the LDT corresponds to one PDE in the page
directory; each PDE references a page table whose PTEs map page frames.]
CHAPTER 5
PROTECTION
In protected mode, the Intel 64 and IA-32 architectures provide a protection
mechanism that operates at both the segment level and the page level. This protection
mechanism provides the ability to limit access to certain segments or pages based on
privilege levels (four privilege levels for segments and two privilege levels for pages).
For example, critical operating-system code and data can be protected by placing
them in more privileged segments than those that contain application code. The
processor's protection mechanism will then prevent application code from accessing
the operating-system code and data in any but a controlled, defined manner.
Segment and page protection can be used at all stages of software development to
assist in localizing and detecting design problems and bugs. It can also be
incorporated into end-products to offer added robustness to operating systems, utilities
software, and applications software.
When the protection mechanism is used, each memory reference is checked to verify
that it satisfies various protection checks. All checks are made before the memory
cycle is started; any violation results in an exception. Because checks are performed
in parallel with address translation, there is no performance penalty. The protection
checks that are performed fall into the following categories:
Limit checks.
Type checks.
Privilege level checks.
Restriction of addressable domain.
Restriction of procedure entry-points.
Restriction of instruction set.
Any protection violation results in an exception being generated. See Chapter 6,
"Interrupt and Exception Handling," for an explanation of the exception mechanism.
This chapter describes the protection mechanism and the violations which lead to
exceptions.
The following sections describe the protection mechanism available in protected
mode. See Chapter 17, "8086 Emulation," for information on protection in
real-address and virtual-8086 mode.
5.1 ENABLING AND DISABLING SEGMENT AND PAGE PROTECTION
Setting the PE flag in register CR0 causes the processor to switch to protected mode,
which in turn enables the segment-protection mechanism. Once in protected mode,
there is no control bit for turning the protection mechanism on or off. The part of the
segment-protection mechanism that is based on privilege levels can essentially be
disabled while still in protected mode by assigning a privilege level of 0 (most
privileged) to all segment selectors and segment descriptors. This action disables the
privilege level protection barriers between segments, but other protection checks
such as limit checking and type checking are still carried out.
Page-level protection is automatically enabled when paging is enabled (by setting the
PG flag in register CR0). Here again there is no mode bit for turning off page-level
protection once paging is enabled. However, page-level protection can be disabled by
performing the following operations:
Clear the WP flag in control register CR0.
Set the read/write (R/W) and user/supervisor (U/S) flags for each page-directory
and page-table entry.
This action makes each page a writable, user page, which in effect disables
page-level protection.
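As a sketch of the two operations above, the following code works on plain integers that stand in for CR0 and for page-directory and page-table entries. The R/W and U/S bit positions (bits 1 and 2) are those given in Section 5.2; the position of WP as bit 16 of CR0 is an assumption not stated in this section, and the helper names are hypothetical.

```python
CR0_WP = 1 << 16   # assumption: WP is bit 16 of CR0
PTE_RW = 1 << 1    # read/write flag (bit 1 of paging-structure entries)
PTE_US = 1 << 2    # user/supervisor flag (bit 2 of paging-structure entries)

def clear_wp(cr0):
    """Return the CR0 value with the WP flag cleared."""
    return cr0 & ~CR0_WP

def open_entries(entries):
    """Return copies of the entries with R/W and U/S set (writable, user pages)."""
    return [entry | PTE_RW | PTE_US for entry in entries]

cr0 = clear_wp(0x80010001)                  # PG, WP, and PE set on input
tables = open_entries([0x00001001, 0x00002003])
print(hex(cr0), [hex(e) for e in tables])
```

On a real system the modified entries would still be subject to the invalidation rules of Section 4.10.4 before the new access rights are guaranteed to take effect.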
5.2 FIELDS AND FLAGS USED FOR SEGMENT-LEVEL AND PAGE-LEVEL PROTECTION
The processor's protection mechanism uses the following fields and flags in the
system data structures to control access to segments and pages:
Descriptor type (S) flag (Bit 12 in the second doubleword of a segment
descriptor.) Determines if the segment descriptor is for a system segment or a
code or data segment.
Type field (Bits 8 through 11 in the second doubleword of a segment
descriptor.) Determines the type of code, data, or system segment.
Limit field (Bits 0 through 15 of the first doubleword and bits 16 through 19
of the second doubleword of a segment descriptor.) Determines the size of the
segment, along with the G flag and E flag (for data segments).
G flag (Bit 23 in the second doubleword of a segment descriptor.) Determines
the size of the segment, along with the limit field and E flag (for data segments).
E flag (Bit 10 in the second doubleword of a data-segment descriptor.)
Determines the size of the segment, along with the limit field and G flag.
Descriptor privilege level (DPL) field (Bits 13 and 14 in the second
doubleword of a segment descriptor.) Determines the privilege level of the
segment.
Requested privilege level (RPL) field (Bits 0 and 1 of any segment
selector.) Specifies the requested privilege level of a segment selector.
Current privilege level (CPL) field (Bits 0 and 1 of the CS segment
register.) Indicates the privilege level of the currently executing program or
procedure. The term "current privilege level (CPL)" refers to the setting of this
field.
User/supervisor (U/S) flag (Bit 2 of paging-structure entries.) Determines
the type of page: user or supervisor.
Read/write (R/W) flag (Bit 1 of paging-structure entries.) Determines the
type of access allowed to a page: read-only or read/write.
Execute-disable (XD) flag (Bit 63 of certain paging-structure entries.)
Determines the type of access allowed to a page: executable or not-executable.
Figure 5-1 shows the location of the various fields and flags in the data, code, and
system-segment descriptors; Figure 3-6 shows the location of the RPL (or CPL) field
in a segment selector (or the CS register); and Chapter 4 identifies the locations of
the U/S, R/W, and XD flags in the paging-structure entries.
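The bit positions listed above are enough to decode the protection-related fields of a descriptor by masking and shifting. The sketch below assumes the descriptor is supplied as two 32-bit integers (first and second doubleword); the function names and the example descriptor values are illustrative, not taken from this manual.

```python
def decode_descriptor(first_dword, second_dword):
    """Extract the protection fields of a segment descriptor (illustrative helper)."""
    fields = {
        # Limit: bits 0-15 of the first doubleword, bits 16-19 of the second.
        "limit": (first_dword & 0xFFFF) | (second_dword & 0x000F0000),
        "type": (second_dword >> 8) & 0xF,    # bits 8 through 11
        "s": (second_dword >> 12) & 1,        # bit 12: descriptor type flag
        "dpl": (second_dword >> 13) & 0x3,    # bits 13 and 14
        "g": (second_dword >> 23) & 1,        # bit 23: granularity
    }
    # The E flag (bit 10) is defined only for data-segment descriptors:
    # S = 1 and bit 11 clear (bit 11 distinguishes code from data).
    is_data = fields["s"] == 1 and ((second_dword >> 11) & 1) == 0
    fields["e"] = (second_dword >> 10) & 1 if is_data else None
    return fields

def rpl(selector):
    """Requested privilege level: bits 0 and 1 of a segment selector."""
    return selector & 0x3

# Example values (assumed, typical of a flat ring-0 data segment and ring-3 code segment).
print(decode_descriptor(0x0000FFFF, 0x00CF9200))
print(decode_descriptor(0x0000FFFF, 0x00CFFA00))
print(rpl(0x1B))
```

The same masking applied to the CS register value yields the CPL, since the CPL occupies the same two low-order bits.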
Many different styles of protection schemes can be implemented with these fields
and flags. When the operating system creates a descriptor, it places values in these
fields and flags in keeping with the particular protection style chosen for an operating
system or executive. Application programs do not generally access or modify these
fields and flags.
Figure 5-1. Descriptor Fields Used for Protection
[Figure: bit layouts of the Data-Segment Descriptor, Code-Segment Descriptor, and
System-Segment Descriptor, each shown as two doublewords at offsets 0 and 4.]
Legend:
A Accessed
AVL Available to Sys. Programmers
B Big
C Conforming
D Default
DPL Descriptor Privilege Level
E Expansion Direction
G Granularity
LIMIT Segment Limit
P Present
R Readable
W Writable
The following sections describe how the processor uses these fields and flags to
perform the various categories of checks described in the introduction to this chapter.
5.2.1 Code Segment Descriptor in 64-bit Mode
Code segments continue to exist in 64-bit mode even though, for address
calculations, the segment base is treated as zero. Some code-segment (CS) descriptor
content (the base address and limit fields) is ignored; the remaining fields function
normally (except for the readable bit in the type field).
Code segment descriptors and selectors are needed in IA-32e mode to establish the
processor's operating mode and execution privilege-level. The usage is as follows:
IA-32e mode uses a previously unused bit in the CS descriptor. Bit 53 is defined
as the 64-bit (L) flag and is used to select between 64-bit mode and compatibility
mode when IA-32e mode is active (IA32_EFER.LMA = 1). See Figure 5-2.
If CS.L = 0 and IA-32e mode is active, the processor is running in
compatibility mode. In this case, CS.D selects the default size for data and addresses.
If CS.D = 0, the default data and address size is 16 bits. If CS.D = 1, the
default data and address size is 32 bits.
If CS.L = 1 and IA-32e mode is active, the only valid setting is CS.D = 0. This
setting indicates a default operand size of 32 bits and a default address size
of 64 bits. The CS.L = 1 and CS.D = 1 bit combination is reserved for future
use and a #GP fault will be generated on an attempt to use a code segment
with these bits set in IA-32e mode.
In IA-32e mode, the CS descriptor's DPL is used for execution privilege checks
(as in legacy 32-bit mode).
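The CS.L and CS.D rules above amount to a small decision table, sketched here for a descriptor's second doubleword while IA-32e mode is active. Bit 53 of the 8-byte descriptor is bit 21 of the second doubleword; the D flag is taken to be bit 22 of that doubleword, which is an assumption drawn from the descriptor layout rather than from the text above. The function name and the sample values are hypothetical.

```python
L_FLAG = 1 << 21   # descriptor bit 53 = bit 21 of the second doubleword
D_FLAG = 1 << 22   # assumption: D flag is bit 22 of the second doubleword

def cs_mode(second_dword):
    """Return (sub-mode, default operand bits, default address bits) in IA-32e mode."""
    l_set = bool(second_dword & L_FLAG)
    d_set = bool(second_dword & D_FLAG)
    if l_set and d_set:
        # Reserved combination: the processor would generate #GP.
        raise ValueError("CS.L = 1 with CS.D = 1 is reserved (#GP)")
    if l_set:
        return ("64-bit mode", 32, 64)
    if d_set:
        return ("compatibility mode", 32, 32)
    return ("compatibility mode", 16, 16)

print(cs_mode(0x00AF9A00))   # L = 1, D = 0
print(cs_mode(0x00CF9A00))   # L = 0, D = 1
print(cs_mode(0x008F9A00))   # L = 0, D = 0
```

Raising a Python exception for the reserved combination is only a stand-in for the general-protection fault the processor would deliver.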
5.3 LIMIT CHECKING
The limit field of a segment descriptor prevents programs or procedures from
addressing memory locations outside the segment. The effective value of the limit
depends on the setting of the G (granularity) flag (see Figure 5-1). For data
segments, the limit also depends on the E (expansion direction) flag and the B
(default stack pointer size and/or upper bound) flag. The E flag is one of the bits in
the type field when the segment descriptor is for a data-segment type.
When the G flag is clear (byte granularity), the effective limit is the value of the
20-bit limit field in the segment descriptor. Here, the limit ranges from 0 to FFFFFH
(1 MByte). When the G flag is set (4-KByte page granularity), the processor scales
the value in the limit field by a factor of 2^12 (4 KBytes). In this case, the effective
limit ranges from FFFH (4 KBytes) to FFFFFFFFH (4 GBytes). Note that when scaling
is used (G flag is set), the lower 12 bits of a segment offset (address) are not checked
against the limit; for example, note that if the segment limit is 0, offsets 0 through
FFFH are still valid.
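The scaling rule can be written as a one-line computation. The sketch below is an illustration with a hypothetical function name; it fills the low 12 bits with ones when the G flag is set, which reproduces the stated range of FFFH through FFFFFFFFH and the fact that the low 12 offset bits are not checked.

```python
def effective_limit(limit_field, g_flag):
    """Effective segment limit from the 20-bit limit field and the G flag."""
    if not 0 <= limit_field <= 0xFFFFF:
        raise ValueError("limit field is 20 bits wide")
    if g_flag:
        # 4-KByte granularity: scale by 2**12; the low 12 bits are not checked.
        return (limit_field << 12) | 0xFFF
    return limit_field            # byte granularity: the field is used as is

for field, g in [(0x00000, False), (0xFFFFF, False), (0x00000, True), (0xFFFFF, True)]:
    print(hex(field), g, hex(effective_limit(field, g)))
```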
For all types of segments except expand-down data segments, the effective limit is
the last address that is allowed to be accessed in the segment, which is one less than
the size, in bytes, of the segment. The processor causes a general-protection
exception (or, if the segment is SS, a stack-fault exception) any time an attempt is made
to access the following addresses in a segment:
• A byte at an offset greater than the effective limit
• A word at an offset greater than the (effective-limit - 1)
• A doubleword at an offset greater than the (effective-limit - 3)
• A quadword at an offset greater than the (effective-limit - 7)
• A double quadword at an offset greater than the (effective-limit - 15)
Figure 5-2. Descriptor Fields with Flags used in IA-32e Mode
(Code-segment descriptor. Flags: A = Accessed, C = Conforming, D = Default,
DPL = Descriptor Privilege Level, G = Granularity, R = Readable,
AVL = Available to Sys. Programmers, L = 64-Bit Flag, P = Present.)
When the effective limit is FFFFFFFFH (4 GBytes), these accesses may or may not
cause the indicated exceptions. Behavior is implementation-specific and may vary
from one execution to another.
For expand-down data segments, the segment limit has the same function but is
interpreted differently. Here, the effective limit specifies the last address that is not
allowed to be accessed within the segment; the range of valid offsets is from
(effective-limit + 1) to FFFFFFFFH if the B flag is set and from (effective-limit + 1) to FFFFH
if the B flag is clear. An expand-down segment has maximum size when the segment
limit is 0.
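The limit rules for expand-up and expand-down segments can be sketched as a
single C predicate. The name limit_check_ok and its parameters are assumptions
made for illustration. The sketch treats every case deterministically, whereas
the processor's behavior for an effective limit of FFFFFFFFH is
implementation-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when an access of `size` bytes at `offset` passes the limit check. */
static bool limit_check_ok(uint32_t offset, uint32_t size, uint32_t eff_limit,
                           bool expand_down, bool b_flag)
{
    assert(size >= 1 && size <= 16);
    uint64_t first = offset;
    uint64_t last = (uint64_t)offset + size - 1;  /* 64-bit math avoids wraparound */

    if (!expand_down)
        return last <= eff_limit;                 /* last byte must not pass the limit */

    /* Expand-down: valid offsets run from (limit + 1) to FFFFFFFFH or FFFFH. */
    uint64_t upper = b_flag ? 0xFFFFFFFFull : 0xFFFFull;
    return first > eff_limit && last <= upper;
}
```

For example, with an effective limit of FFFH, a doubleword at offset FFCH passes
the check, while a doubleword at offset FFDH does not, because FFDH is greater
than (effective-limit - 3).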
Limit checking catches programming errors such as runaway code, runaway
subscripts, and invalid pointer calculations. These errors are detected when they
occur, so identification of the cause is easier. Without limit checking, these errors
could overwrite code or data in another segment.
In addition to checking segment limits, the processor also checks descriptor table
limits. The GDTR and IDTR registers contain 16-bit limit values that the processor
uses to prevent programs from selecting a segment descriptor outside the
respective descriptor tables. The LDTR and task registers contain 32-bit segment limit values
(read from the segment descriptors for the current LDT and TSS, respectively). The
processor uses these segment limits to prevent accesses beyond the bounds of the
current LDT and TSS. See Section 3.5.1, "Segment Descriptor Tables," for more
information on the GDT and LDT limit fields; see Section 6.10, "Interrupt Descriptor Table
(IDT)," for more information on the IDT limit field; and see Section 7.2.4, "Task
Register," for more information on the TSS segment limit field.
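As a rough illustration of a descriptor-table limit check, the following C
sketch tests whether the descriptor selected by a segment selector lies inside
a table with a given limit. The function name descriptor_in_table is
hypothetical, and the sketch assumes 8-byte descriptors and a limit that gives
the offset of the last valid byte in the table.

```c
#include <stdbool.h>
#include <stdint.h>

/* selector: 16-bit segment selector (bits 15:3 index, bit 2 TI, bits 1:0 RPL).
 * table_limit: offset of the last valid byte in the GDT or LDT. */
static bool descriptor_in_table(uint16_t selector, uint32_t table_limit)
{
    uint32_t first_byte = (uint32_t)(selector >> 3) * 8u;  /* index * 8 */
    return first_byte + 7u <= table_limit;   /* all 8 bytes must be in bounds */
}
```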
5.3.1 Limit Checking in 64-bit Mode
In 64-bit mode, the processor does not perform runtime limit checking on code or
data segments. However, the processor does check descriptor-table limits.
5.4 TYPE CHECKING
Segment descriptors contain type information in two places:
• The S (descriptor type) flag.
• The type field.
The processor uses this information to detect programming errors that result in an
attempt to use a segment or gate in an incorrect or unintended manner.
The S flag indicates whether a descriptor is a system type or a code or data type. The
type field provides 4 additional bits for use in defining various types of code, data,
and system descriptors. Table 3-1 shows the encoding of the type field for code and
data descriptors; Table 3-2 shows the encoding of the field for system descriptors.
The processor examines type information at various times while operating on
segment selectors and segment descriptors. The following list gives examples of
typical operations where type checking is performed (this list is not exhaustive):
• When a segment selector is loaded into a segment register: certain
  segment registers can contain only certain descriptor types, for example:
    - The CS register can be loaded only with a selector for a code segment.
    - Segment selectors for code segments that are not readable or for system
      segments cannot be loaded into data-segment registers (DS, ES, FS, and
      GS).
    - Only segment selectors of writable data segments can be loaded into the SS
      register.
• When a segment selector is loaded into the LDTR or task register, for example:
    - The LDTR can only be loaded with a selector for an LDT.
    - The task register can only be loaded with a segment selector for a TSS.
• When instructions access segments whose descriptors are already
  loaded into segment registers: certain segments can be used by
  instructions only in certain predefined ways, for example:
    - No instruction may write into an executable segment.
    - No instruction may write into a data segment if it is not writable.
    - No instruction may read an executable segment unless the readable flag is
      set.
• When an instruction operand contains a segment selector: certain
  instructions can access segments or gates of only a particular type, for example:
    - A far CALL or far JMP instruction can only access a segment descriptor for a
      conforming code segment, nonconforming code segment, call gate, task
      gate, or TSS.
    - The LLDT instruction must reference a segment descriptor for an LDT.
    - The LTR instruction must reference a segment descriptor for a TSS.
    - The LAR instruction must reference a segment or gate descriptor for an LDT,
      TSS, call gate, task gate, code segment, or data segment.
    - The LSL instruction must reference a segment descriptor for an LDT, TSS, code
      segment, or data segment.
    - IDT entries must be interrupt, trap, or task gates.
• During certain internal operations, for example:
    - On a far call or far jump (executed with a far CALL or far JMP instruction), the
      processor determines the type of control transfer to be carried out (call or
      jump to another code segment, a call or jump through a gate, or a task
      switch) by checking the type field in the segment (or gate) descriptor pointed
      to by the segment (or gate) selector given as an operand in the CALL or JMP
      instruction. If the descriptor type is for a code segment or call gate, a call or
      jump to another code segment is indicated; if the descriptor type is for a TSS
      or task gate, a task switch is indicated.
    - On a call or jump through a call gate (or on an interrupt- or exception-handler
      call through a trap or interrupt gate), the processor automatically checks that
      the segment descriptor being pointed to by the gate is for a code segment.
    - On a call or jump to a new task through a task gate (or on an interrupt- or
      exception-handler call to a new task through a task gate), the processor
      automatically checks that the segment descriptor being pointed to by the
      task gate is for a TSS.
    - On a call or jump to a new task by a direct reference to a TSS, the processor
      automatically checks that the segment descriptor being pointed to by the
      CALL or JMP instruction is for a TSS.
    - On return from a nested task (initiated by an IRET instruction), the processor
      checks that the previous task link field in the current TSS points to a TSS.
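The first group of checks in the list above (loading a segment register) can be
sketched in C as follows. The names seg_reg_kind and type_allows_load are
hypothetical. The sketch assumes the type-field encoding of Table 3-1, in which
bit 3 distinguishes code from data and bit 1 is the W flag for data segments or
the R flag for code segments; it covers only the type check, not the privilege
or limit checks.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { REG_CS, REG_SS, REG_DATA } seg_reg_kind;  /* REG_DATA: DS, ES, FS, GS */

/* s_flag: descriptor S flag (1 = code or data, 0 = system); type: 4-bit type field. */
static bool type_allows_load(seg_reg_kind reg, unsigned s_flag, unsigned type)
{
    assert(s_flag <= 1 && type <= 0xFu);
    if (s_flag == 0)
        return false;                      /* system segments are rejected */

    bool is_code = (type & 0x8u) != 0;     /* bit 3: 1 = code, 0 = data */
    bool bit1 = (type & 0x2u) != 0;        /* W (data) or R (code) */

    switch (reg) {
    case REG_CS:   return is_code;             /* code segments only */
    case REG_SS:   return !is_code && bit1;    /* writable data only */
    case REG_DATA: return !is_code || bit1;    /* data, or readable code */
    }
    return false;
}
```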
5.4.1 Null Segment Selector Checking
Attempting to load a null segment selector (see Section 3.4.2, "Segment Selectors")
into the CS or SS segment register generates a general-protection exception (#GP).
A null segment selector can be loaded into the DS, ES, FS, or GS register, but any
attempt to access a segment through one of these registers when it is loaded with a
null segment selector results in a #GP exception being generated. Loading unused
data-segment registers with a null segment selector is a useful method of detecting
accesses to unused segment registers and/or preventing unwanted accesses to data
segments.
5.4.1.1 NULL Segment Checking in 64-bit Mode
In 64-bit mode, the processor does not perform runtime checking on NULL segment
selectors. The processor does not cause a #GP fault when an attempt is made to
access memory where the referenced segment register has a NULL segment selector.
5.5 PRIVILEGE LEVELS
The processor's segment-protection mechanism recognizes 4 privilege levels,
numbered from 0 to 3. The greater numbers mean lesser privileges. Figure 5-3
shows how these levels of privilege can be interpreted as rings of protection.
The center (reserved for the most privileged code, data, and stacks) is used for the
segments containing the critical software, usually the kernel of an operating system.
Outer rings are used for less critical software. (Systems that use only 2 of the 4
possible privilege levels should use levels 0 and 3.)
The processor uses privilege levels to prevent a program or task operating at a lesser
privilege level from accessing a segment with a greater privilege, except under
controlled situations. When the processor detects a privilege level violation, it
generates a general-protection exception (#GP).
To carry out privilege-level checks between code segments and data segments, the
processor recognizes the following three types of privilege levels:
• Current privilege level (CPL): The CPL is the privilege level of the currently
  executing program or task. It is stored in bits 0 and 1 of the CS and SS segment
  registers. Normally, the CPL is equal to the privilege level of the code segment
  from which instructions are being fetched. The processor changes the CPL when
  program control is transferred to a code segment with a different privilege level.
  The CPL is treated slightly differently when accessing conforming code segments.
  Conforming code segments can be accessed from any privilege level that is equal
  to or numerically greater (less privileged) than the DPL of the conforming code
  segment. Also, the CPL is not changed when the processor accesses a conforming
  code segment that has a different privilege level than the CPL.
• Descriptor privilege level (DPL): The DPL is the privilege level of a segment
  or gate. It is stored in the DPL field of the segment or gate descriptor for the
  segment or gate. When the currently executing code segment attempts to access
  a segment or gate, the DPL of the segment or gate is compared to the CPL and
  RPL of the segment or gate selector (as described later in this section). The DPL
  is interpreted differently, depending on the type of segment or gate being
  accessed:
    - Data segment: The DPL indicates the numerically highest privilege level
      that a program or task can have to be allowed to access the segment. For
      example, if the DPL of a data segment is 1, only programs running at a CPL of
      0 or 1 can access the segment.
    - Nonconforming code segment (without using a call gate): The DPL
      indicates the privilege level that a program or task must be at to access the
      segment. For example, if the DPL of a nonconforming code segment is 0, only
      programs running at a CPL of 0 can access the segment.
    - Call gate: The DPL indicates the numerically highest privilege level that the
      currently executing program or task can be at and still be able to access the
      call gate. (This is the same access rule as for a data segment.)
    - Conforming code segment and nonconforming code segment
      accessed through a call gate: The DPL indicates the numerically lowest
      privilege level that a program or task can have to be allowed to access the
      segment. For example, if the DPL of a conforming code segment is 2,
      programs running at a CPL of 0 or 1 cannot access the segment.
    - TSS: The DPL indicates the numerically highest privilege level that the
      currently executing program or task can be at and still be able to access the
      TSS. (This is the same access rule as for a data segment.)
• Requested privilege level (RPL): The RPL is an override privilege level that
  is assigned to segment selectors. It is stored in bits 0 and 1 of the segment
  selector. The processor checks the RPL along with the CPL to determine if access
  to a segment is allowed. Even if the program or task requesting access to a
  segment has sufficient privilege to access the segment, access is denied if the
  RPL is not of sufficient privilege level. That is, if the RPL of a segment selector is
  numerically greater than the CPL, the RPL overrides the CPL, and vice versa. The
  RPL can be used to ensure that privileged code does not access a segment on
  behalf of an application program unless the program itself has access privileges
  for that segment. See Section 5.10.4, "Checking Caller Access Privileges (ARPL
  Instruction)," for a detailed description of the purpose and typical use of the RPL.
Figure 5-3. Protection Rings
(Level 0: operating system kernel. Levels 1 and 2: operating system services.
Level 3: applications.)
Privilege levels are checked when the segment selector of a segment descriptor is
loaded into a segment register. The checks used for data access differ from those
used for transfers of program control among code segments; therefore, the two
kinds of accesses are considered separately in the following sections.
5.6 PRIVILEGE LEVEL CHECKING WHEN ACCESSING DATA
SEGMENTS
To access operands in a data segment, the segment selector for the data segment
must be loaded into the data-segment registers (DS, ES, FS, or GS) or into the
stack-segment register (SS). (Segment registers can be loaded with the MOV, POP, LDS,
LES, LFS, LGS, and LSS instructions.) Before the processor loads a segment selector
into a segment register, it performs a privilege check (see Figure 5-4) by comparing
the privilege levels of the currently running program or task (the CPL), the RPL of the
segment selector, and the DPL of the segment's segment descriptor. The processor
loads the segment selector into the segment register if the DPL is numerically greater
than or equal to both the CPL and the RPL. Otherwise, a general-protection fault is
generated and the segment register is not loaded.
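The rule for loading a data-segment register can be expressed as a one-line C
predicate. The function name data_segment_load_allowed is an assumption made for
illustration; privilege levels are passed as plain integers from 0 to 3.

```c
#include <assert.h>
#include <stdbool.h>

/* True when a data-segment selector may be loaded: DPL >= CPL and DPL >= RPL
 * (numerically). Otherwise the processor raises a general-protection fault. */
static bool data_segment_load_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    assert(cpl <= 3 && rpl <= 3 && dpl <= 3);
    return dpl >= cpl && dpl >= rpl;
}
```

With a data segment at DPL 2, a call such as data_segment_load_allowed(0, 3, 2)
returns false even though the CPL is 0, because the RPL of 3 overrides the more
privileged CPL.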
Figure 5-5 shows four procedures (located in code segments A, B, C, and D), each
running at different privilege levels and each attempting to access the same data
segment.
1. The procedure in code segment A is able to access data segment E using segment
   selector E1, because the CPL of code segment A and the RPL of segment selector
   E1 are equal to the DPL of data segment E.
2. The procedure in code segment B is able to access data segment E using segment
   selector E2, because the CPL of code segment B and the RPL of segment selector
   E2 are both numerically lower than (more privileged than) the DPL of data
   segment E. A code segment B procedure can also access data segment E using
   segment selector E1.
3. The procedure in code segment C is not able to access data segment E using
   segment selector E3 (dotted line), because the CPL of code segment C and the
   RPL of segment selector E3 are both numerically greater than (less privileged
   than) the DPL of data segment E. Even if a code segment C procedure were to use
   segment selector E1 or E2, such that the RPL would be acceptable, it still could
   not access data segment E because its CPL is not privileged enough.
4. The procedure in code segment D should be able to access data segment E
   because code segment D's CPL is numerically less than the DPL of data segment
   E. However, the RPL of segment selector E3 (which the code segment D
   procedure is using to access data segment E) is numerically greater than the DPL
   of data segment E, so access is not allowed. If the code segment D procedure
   were to use segment selector E1 or E2 to access the data segment, access would
   be allowed.
Figure 5-4. Privilege Check for Data Access
(The privilege check compares the CPL from the CS register, the RPL from the
segment selector for the data segment, and the DPL from the data-segment descriptor.)
As demonstrated in the previous examples, the addressable domain of a program or
task varies as its CPL changes. When the CPL is 0, data segments at all privilege
levels are accessible; when the CPL is 1, only data segments at privilege levels 1
through 3 are accessible; when the CPL is 3, only data segments at privilege level 3
are accessible.
The RPL of a segment selector can always override the addressable domain of a
program or task. When properly used, RPLs can prevent problems caused by
accidental (or intentional) use of segment selectors for privileged data segments by less
privileged programs or procedures.
It is important to note that the RPL of a segment selector for a data segment is under
software control. For example, an application program running at a CPL of 3 can set
the RPL for a data-segment selector to 0. With the RPL set to 0, only the CPL checks,
not the RPL checks, will provide protection against deliberate, direct attempts to
violate privilege-level security for the data segment. To prevent these types of
privilege-level-check violations, a program or procedure can check access privileges
whenever it receives a data-segment selector from another procedure (see Section
5.10.4, "Checking Caller Access Privileges (ARPL Instruction)").
Figure 5-5. Examples of Accessing Data Segments From Various Privilege Levels
(Data segment E has DPL=2. Code segments A, B, C, and D run at CPL=2, 1, 3, and 0,
respectively. Segment selectors: E1 with RPL=2, E2 with RPL=1, E3 with RPL=3.)
5.6.1 Accessing Data in Code Segments
In some instances it may be desirable to access data structures that are contained in
a code segment. The following methods of accessing data in code segments are
possible:
1. Load a data-segment register with a segment selector for a nonconforming,
   readable, code segment.
2. Load a data-segment register with a segment selector for a conforming,
   readable, code segment.
3. Use a code-segment override prefix (CS) to read a readable, code segment
   whose selector is already loaded in the CS register.
The same rules for accessing data segments apply to method 1. Method 2 is always
valid because the privilege level of a conforming code segment is effectively the
same as the CPL, regardless of its DPL. Method 3 is always valid because the DPL of
the code segment selected by the CS register is the same as the CPL.
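The first two methods can be sketched as a C predicate for loading a
code-segment selector into a data-segment register. The function name
code_as_data_load_allowed is hypothetical; the sketch assumes that the segment
must be readable in both cases, as the list above states, and that the privilege
check is skipped for conforming segments.

```c
#include <assert.h>
#include <stdbool.h>

/* True when a selector for a code segment may be loaded into DS, ES, FS, or GS. */
static bool code_as_data_load_allowed(unsigned cpl, unsigned rpl, unsigned dpl,
                                      bool readable, bool conforming)
{
    assert(cpl <= 3 && rpl <= 3 && dpl <= 3);
    if (!readable)
        return false;                 /* execute-only code cannot be read as data */
    if (conforming)
        return true;                  /* method 2: valid regardless of the DPL */
    return dpl >= cpl && dpl >= rpl;  /* method 1: same rule as for data segments */
}
```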
5.7 PRIVILEGE LEVEL CHECKING WHEN LOADING THE SS
REGISTER
Privilege level checking also occurs when the SS register is loaded with the segment
selector for a stack segment. Here all privilege levels related to the stack segment
must match the CPL; that is, the CPL, the RPL of the stack-segment selector, and the
DPL of the stack-segment descriptor must be the same. If the RPL and DPL are not
equal to the CPL, a general-protection exception (#GP) is generated.
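For comparison with the data-segment rule, the stricter stack-segment rule can
be sketched in C as follows. The function name ss_load_allowed is an assumption
made for illustration, and the sketch covers only the privilege comparison, not
the type check for a writable data segment.

```c
#include <assert.h>
#include <stdbool.h>

/* True when CPL, the selector's RPL, and the descriptor's DPL are all equal. */
static bool ss_load_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    assert(cpl <= 3 && rpl <= 3 && dpl <= 3);
    return rpl == cpl && dpl == cpl;
}
```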
5.8 PRIVILEGE LEVEL CHECKING WHEN TRANSFERRING
PROGRAM CONTROL BETWEEN CODE SEGMENTS
To transfer program control from one code segment to another, the segment selector
for the destination code segment must be loaded into the code-segment register
(CS). As part of this loading process, the processor examines the segment descriptor
for the destination code segment and performs various limit, type, and privilege
checks. If these checks are successful, the CS register is loaded, program control is
transferred to the new code segment, and program execution begins at the
instruction pointed to by the EIP register.
Program control transfers are carried out with the JMP, CALL, RET, SYSENTER,
SYSEXIT, INT n, and IRET instructions, as well as by the exception and interrupt
mechanisms. Exceptions, interrupts, and the IRET instruction are special cases
discussed in Chapter 6, "Interrupt and Exception Handling." This chapter discusses
only the JMP, CALL, RET, SYSENTER, and SYSEXIT instructions.
A JMP or CALL instruction can reference another code segment in any of four ways:
• The target operand contains the segment selector for the target code segment.
• The target operand points to a call-gate descriptor, which contains the segment
  selector for the target code segment.
• The target operand points to a TSS, which contains the segment selector for the
  target code segment.
• The target operand points to a task gate, which points to a TSS, which in turn
  contains the segment selector for the target code segment.
The following sections describe the first two types of references. See Section 7.3, "Task
Switching," for information on transferring program control through a task gate
and/or TSS.
The SYSENTER and SYSEXIT instructions are special instructions for making fast calls
to and returns from operating system or executive procedures. These instructions
are discussed briefly in Section 5.8.7, "Performing Fast Calls to System Procedures
with the SYSENTER and SYSEXIT Instructions."
5.8.1 Direct Calls or Jumps to Code Segments
The near forms of the JMP, CALL, and RET instructions transfer program control
within the current code segment, so privilege-level checks are not performed. The far
forms of the JMP, CALL, and RET instructions transfer control to other code segments,
so the processor does perform privilege-level checks.
When transferring program control to another code segment without going through a
call gate, the processor examines four kinds of privilege level and type information
(see Figure 5-6):
• The CPL. (Here, the CPL is the privilege level of the calling code segment; that is,
  the code segment that contains the procedure that is making the call or jump.)
• The DPL of the segment descriptor for the destination code segment that
  contains the called procedure.
• The RPL of the segment selector of the destination code segment.
• The conforming (C) flag in the segment descriptor for the destination code
  segment, which determines whether the segment is a conforming (C flag is set)
  or nonconforming (C flag is clear) code segment. See Section 3.4.5.1, "Code-
  and Data-Segment Descriptor Types," for more information about this flag.
Figure 5-6. Privilege Check for Control Transfer Without Using a Gate
(The privilege check compares the CPL from the CS register, the RPL from the
segment selector for the code segment, and the DPL and C flag from the
destination code-segment descriptor.)
The rules that the processor uses to check the CPL, RPL, and DPL depend on the
setting of the C flag, as described in the following sections.
5.8.1.1 Accessing Nonconforming Code Segments
When accessing nonconforming code segments, the CPL of the calling procedure
must be equal to the DPL of the destination code segment; otherwise, the processor
generates a general-protection exception (#GP). For example in Figure 5-7:
• Code segment C is a nonconforming code segment. A procedure in code segment
  A can call a procedure in code segment C (using segment selector C1) because
  they are at the same privilege level (CPL of code segment A is equal to the DPL of
  code segment C).
• A procedure in code segment B cannot call a procedure in code segment C (using
  segment selector C2 or C1) because the two code segments are at different
  privilege levels.
The RPL of the segment selector that points to a nonconforming code segment has a
limited effect on the privilege check. The RPL must be numerically less than or equal
to the CPL of the calling procedure for a successful control transfer to occur. So, in the
example in Figure 5-7, the RPLs of segment selectors C1 and C2 could legally be set
to 0, 1, or 2, but not to 3.
When the segment selector of a nonconforming code segment is loaded into the CS
register, the privilege level field is not changed; that is, it remains at the CPL (which
is the privilege level of the calling procedure). This is true, even if the RPL of the
segment selector is different from the CPL.
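The two conditions for a direct far call or jump to a nonconforming code segment
can be sketched in C as follows. The function name
nonconforming_transfer_allowed is hypothetical; the sketch covers only the
privilege comparison, not the type or limit checks.

```c
#include <assert.h>
#include <stdbool.h>

/* True when CPL equals the destination DPL and the selector's RPL is
 * numerically less than or equal to the CPL. */
static bool nonconforming_transfer_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    assert(cpl <= 3 && rpl <= 3 && dpl <= 3);
    return cpl == dpl && rpl <= cpl;
}
```

Applied to the example above, a caller at CPL 2 reaches a nonconforming segment
at DPL 2 with an RPL of 0, 1, or 2, but not with an RPL of 3, and a caller at
CPL 3 is always refused.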
5.8.1.2 Accessing Conforming Code Segments
When accessing conforming code segments, the CPL of the calling procedure may be
numerically equal to or greater than (less privileged) the DPL of the destination code
segment; the processor generates a general-protection exception (#GP) only if the
CPL is less than the DPL. (The segment selector RPL for the destination code segment
is not checked if the segment is a conforming code segment.)
Figure 5-7. Examples of Accessing Conforming and Nonconforming Code Segments
From Various Privilege Levels
(Code segment A runs at CPL=2 and code segment B at CPL=3. Code segment C is
nonconforming with DPL=2; code segment D is conforming with DPL=1. Segment
selectors: C1 with RPL=2, C2 with RPL=3, D1 with RPL=2, D2 with RPL=3.)
In the example in Figure 5-7, code segment D is a conforming code segment.
Therefore, calling procedures in both code segment A and B can access code segment D
(using either segment selector D1 or D2, respectively), because they both have CPLs
that are greater than or equal to the DPL of the conforming code segment. For
conforming code segments, the DPL represents the numerically lowest
privilege level that a calling procedure may be at to successfully make a call to
the code segment.
(Note that segment selectors D1 and D2 are identical except for their respective
RPLs. But since RPLs are not checked when accessing conforming code segments,
the two segment selectors are essentially interchangeable.)
When program control is transferred to a conforming code segment, the CPL does not
change, even if the DPL of the destination code segment is less than the CPL. This
situation is the only one where the CPL may be different from the DPL of the current
code segment. Also, since the CPL does not change, no stack switch occurs.
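The conforming-segment rule, including the unchanged CPL, can be sketched in C
as follows. The type transfer_result and the function conforming_transfer are
hypothetical names introduced for illustration; the selector RPL is not a
parameter because it is not checked in this case.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool allowed;      /* false means the processor would raise #GP */
    unsigned new_cpl;  /* CPL after the transfer (meaningful when allowed) */
} transfer_result;

/* Direct far call or jump to a conforming code segment. */
static transfer_result conforming_transfer(unsigned cpl, unsigned dpl)
{
    assert(cpl <= 3 && dpl <= 3);
    transfer_result r;
    r.allowed = cpl >= dpl;  /* caller may be equally or less privileged */
    r.new_cpl = cpl;         /* the CPL does not change, so no stack switch */
    return r;
}
```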
Conforming segments are used for code modules such as math libraries and
exception handlers, which support applications but do not require access to protected
system facilities. These modules are part of the operating system or executive
software, but they can be executed at numerically higher privilege levels (less privileged
levels). Keeping the CPL at the level of a calling code segment when switching to a
conforming code segment prevents an application program from accessing
nonconforming code segments while at the privilege level (DPL) of a conforming code
segment and thus prevents it from accessing more privileged data.
Most code segments are nonconforming. For these segments, program control can
be transferred only to code segments at the same level of privilege, unless the
transfer is carried out through a call gate, as described in the following sections.
5.8.2 Gate Descriptors
To provide controlled access to code segments with different privilege levels, the
processor provides a special set of descriptors called gate descriptors. There are four
kinds of gate descriptors:
• Call gates
• Trap gates
• Interrupt gates
• Task gates
Task gates are used for task switching and are discussed in Chapter 7, "Task
Management." Trap and interrupt gates are special kinds of call gates used for calling
exception and interrupt handlers. They are described in Chapter 6, "Interrupt and Exception
Handling." This chapter is concerned only with call gates.
5.8.3 Call Gates
Call gates facilitate controlled transfers of program control between different
privilege levels. They are typically used only in operating systems or executives that use
the privilege-level protection mechanism. Call gates are also useful for transferring
program control between 16-bit and 32-bit code segments, as described in Section
18.4, "Transferring Control Among Mixed-Size Code Segments."
Figure 5-8 shows the format of a call-gate descriptor. A call-gate descriptor may
reside in the GDT or in an LDT, but not in the interrupt descriptor table (IDT). It
performs six functions:
• It specifies the code segment to be accessed.
• It defines an entry point for a procedure in the specified code segment.
• It specifies the privilege level required for a caller trying to access the procedure.
• If a stack switch occurs, it specifies the number of optional parameters to be
  copied between stacks.
• It defines the size of values to be pushed onto the target stack: 16-bit gates force
  16-bit pushes and 32-bit gates force 32-bit pushes.
• It specifies whether the call-gate descriptor is valid.
The segment selector field in a call gate specifies the code segment to be accessed.
The offset field specifies the entry point in the code segment. This entry point is
generally to the first instruction of a specific procedure. The DPL field indicates the
privilege level of the call gate, which in turn is the privilege level required to access
the selected procedure through the gate. The P flag indicates whether the call-gate
descriptor is valid. (The presence of the code segment to which the gate points is
indicated by the P flag in the code segment's descriptor.) The parameter count field
indicates the number of parameters to copy from the calling procedure's stack to the
new stack if a stack switch occurs (see Section 5.8.5, "Stack Switching"). The
parameter count specifies the number of words for 16-bit call gates and doublewords for
32-bit call gates.
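To make the field layout concrete, the following C sketch unpacks a 32-bit
call-gate descriptor from its two doublewords. The type call_gate32 and the
function decode_call_gate32 are hypothetical names; the bit positions are an
assumption based on the standard call-gate layout: in the low doubleword,
offset 15:00 in bits 15:0 and the segment selector in bits 31:16; in the high
doubleword, the parameter count in bits 4:0, the type in bits 11:8, the DPL in
bits 14:13, the P flag in bit 15, and offset 31:16 in bits 31:16.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t selector;     /* code segment to be accessed */
    uint32_t offset;       /* entry point in that segment */
    unsigned param_count;  /* parameters copied on a stack switch */
    unsigned type;         /* 4-bit type field */
    unsigned dpl;          /* privilege level required to use the gate */
    bool present;          /* P flag: gate valid */
} call_gate32;

/* low: bytes 3:0 of the descriptor; high: bytes 7:4. */
static call_gate32 decode_call_gate32(uint32_t low, uint32_t high)
{
    call_gate32 g;
    g.selector    = (uint16_t)(low >> 16);
    g.offset      = (high & 0xFFFF0000u) | (low & 0x0000FFFFu);
    g.param_count = high & 0x1Fu;
    g.type        = (high >> 8) & 0xFu;
    g.dpl         = (high >> 13) & 0x3u;
    g.present     = ((high >> 15) & 1u) != 0;
    return g;
}
```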
Figure 5-8. Call-Gate Descriptor
(Fields: Offset in Segment 31:16, P (Gate Valid), DPL (Descriptor Privilege Level),
Type, Param. Count, Segment Selector, Offset in Segment 15:00.)
Note that the P flag in a gate descriptor is normally always set to 1. If it is set to 0, a
not-present (#NP) exception is generated when a program attempts to access the
descriptor. The operating system can use the P flag for special purposes. For
example, it could be used to track the number of times the gate is used. Here, the P
flag is initially set to 0, causing a trap to the not-present exception handler. The
exception handler then increments a counter and sets the P flag to 1, so that on
returning from the handler, the gate descriptor will be valid.
5.8.3.1 IA-32e Mode Call Gates
Call-gate descriptors in 32-bit mode provide a 32-bit offset for the instruction pointer
(EIP); 64-bit extensions double the size of 32-bit mode call gates in order to store
64-bit instruction pointers (RIP). See Figure 5-9:
•  The first eight bytes (bytes 7:0) of a 64-bit mode call gate are similar but not
   identical to legacy 32-bit mode call gates. The parameter-copy-count field has
   been removed.
•  Bytes 11:8 hold the upper 32 bits of the target-segment offset in canonical form.
   A general-protection exception (#GP) is generated if software attempts to use a
   call gate with a target offset that is not in canonical form.
•  16-byte descriptors may reside in the same descriptor table with 16-bit and
   32-bit descriptors. A type field, used for consistency checking, is defined in bits
   12:8 of the 64-bit descriptor's highest dword (cleared to zero). A general-
   protection exception (#GP) results if an attempt is made to access the upper half
   of a 64-bit mode descriptor as a 32-bit mode descriptor.
•  Target code segments referenced by a 64-bit call gate must be 64-bit code
   segments (CS.L = 1, CS.D = 0). If not, the reference generates a general-
   protection exception, #GP (CS selector).
•  Only 64-bit mode call gates can be referenced in IA-32e mode (64-bit mode and
   compatibility mode). The legacy 32-bit mode call gate type (0CH) is redefined in
   IA-32e mode as a 64-bit call-gate type; no 32-bit call-gate type exists in IA-32e
   mode.
•  If a far call references a 16-bit call gate type (04H) in IA-32e mode, a general-
   protection exception (#GP) is generated.

When a call references a 64-bit mode call gate, actions taken are identical to those
taken in 32-bit mode, with the following exceptions:
•  Stack pushes are made in eight-byte increments.
•  A 64-bit RIP is pushed onto the stack.
•  Parameter copying is not performed.

Use a matching far-return instruction size for correct operation (returns from 64-bit
calls must be performed with a 64-bit operand-size return to process the stack
correctly).
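The 16-byte layout can be sketched in C as follows. This is a minimal illustration, not
a definitive implementation: the names call_gate64, make_call_gate64, and
is_canonical48 are hypothetical, and the canonical-form test assumes an implementation
with 48 linear-address bits (bits 63:47 all equal), which the text above does not state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical image of a 16-byte call gate: four doublewords. */
typedef struct {
    uint32_t dw[4];
} call_gate64;

/* ASSUMPTION: 48-bit linear addresses, so bits 63:47 must all be equal. */
static inline int is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;              /* the upper 17 bits */
    return top == 0 || top == 0x1FFFFu;
}

/* Returns 0 on success, -1 if the target offset is not canonical
 * (the case in which hardware would raise #GP when the gate is used). */
static inline int make_call_gate64(call_gate64 *g, uint16_t selector,
                                   uint64_t offset, unsigned dpl, int present)
{
    assert(g != 0 && dpl <= 3);
    if (!is_canonical48(offset))
        return -1;
    g->dw[0] = ((uint32_t)selector << 16) | (uint32_t)(offset & 0xFFFFu);
    g->dw[1] = (uint32_t)(offset & 0xFFFF0000u)
             | ((present ? 1u : 0u) << 15)
             | ((uint32_t)dpl << 13)
             | (0xCu << 8);                 /* type 0CH; no parameter count */
    g->dw[2] = (uint32_t)(offset >> 32);    /* bytes 11:8: offset 63:32     */
    g->dw[3] = 0;                           /* type field in bits 12:8 = 0  */
    return 0;
}
```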
Figure 5-9. Call-Gate Descriptor in IA-32e Mode
[Figure not reproduced. The 16-byte descriptor consists of four doublewords. Offset 0:
Segment Selector (bits 31:16), Offset in Segment 15:00 (bits 15:0). Offset 4: Offset in
Segment 31:16, P (Gate Valid), DPL (Descriptor Privilege Level), Type 1100. Offset 8:
Offset in Segment 63:32. Offset 12: Reserved, with a Type field of 00000 in bits 12:8.]
5.8.4 Accessing a Code Segment Through a Call Gate
To access a call gate, a far pointer to the gate is provided as a target operand in a
CALL or JMP instruction. The segment selector from this pointer identifies the call
gate (see Figure 5-10); the offset from the pointer is required, but not used or
checked by the processor. (The offset can be set to any value.)

When the processor has accessed the call gate, it uses the segment selector from the
call gate to locate the segment descriptor for the destination code segment. (This
segment descriptor can be in the GDT or the LDT.) It then combines the base address
from the code-segment descriptor with the offset from the call gate to form the linear
address of the procedure entry point in the code segment.

As shown in Figure 5-11, four different privilege levels are used to check the validity
of a program control transfer through a call gate:
•  The CPL (current privilege level).
•  The RPL (requestor's privilege level) of the call gate's selector.
•  The DPL (descriptor privilege level) of the call gate descriptor.
•  The DPL of the segment descriptor of the destination code segment.

The C flag (conforming) in the segment descriptor for the destination code segment
is also checked.
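Figure 5-10. Call-Gate Mechanism
[Figure not reproduced. A far pointer to the call gate supplies a segment selector and
an offset (required but not used by the processor). The selector picks the call-gate
descriptor in the descriptor table; the gate's segment selector picks the code-segment
descriptor, whose base is added to the gate's offset to give the procedure entry point.]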
The privilege checking rules are different depending on whether the control transfer
was initiated with a CALL or a JMP instruction, as shown in Table 5-1.

Figure 5-11. Privilege Check for Control Transfer with Call Gate
[Figure not reproduced. The privilege check takes four inputs: the CPL from the CS
register, the RPL from the call-gate selector, the DPL from the call gate (descriptor),
and the DPL from the destination code-segment descriptor.]

Table 5-1. Privilege Check Rules for Call Gates
Instruction   Privilege Check Rules
CALL          CPL ≤ call gate DPL; RPL ≤ call gate DPL
              Destination conforming code segment DPL ≤ CPL
              Destination nonconforming code segment DPL ≤ CPL
JMP           CPL ≤ call gate DPL; RPL ≤ call gate DPL
              Destination conforming code segment DPL ≤ CPL
              Destination nonconforming code segment DPL = CPL

The DPL field of the call-gate descriptor specifies the numerically highest privilege
level from which a calling procedure can access the call gate; that is, to access a call
gate, the CPL of a calling procedure must be equal to or less than the DPL of the call
gate. For example, in Figure 5-12, call gate A has a DPL of 3. So calling procedures at
all CPLs (0 through 3) can access this call gate, which includes calling procedures in
code segments A, B, and C. Call gate B has a DPL of 2, so only calling procedures at
a CPL of 0, 1, or 2 can access call gate B, which includes calling procedures in code
segments B and C. The dotted line shows that a calling procedure in code segment A
cannot access call gate B.
The RPL of the segment selector to a call gate must satisfy the same test as the CPL
of the calling procedure; that is, the RPL must be less than or equal to the DPL of the
call gate. In the example in Figure 5-12, a calling procedure in code segment C can
access call gate B using gate selector B2 or B1, but it could not use gate selector B3
to access call gate B.

If the privilege checks between the calling procedure and call gate are successful, the
processor then checks the DPL of the code-segment descriptor against the CPL of the
calling procedure. Here, the privilege check rules vary between CALL and JMP
instructions. Only CALL instructions can use call gates to transfer program control to
more privileged (numerically lower privilege level) nonconforming code segments;
that is, to nonconforming code segments with a DPL less than the CPL. A JMP
instruction can use a call gate only to transfer program control to a nonconforming code
segment with a DPL equal to the CPL. CALL and JMP instructions can both transfer
program control to a more privileged conforming code segment; that is, to a
conforming code segment with a DPL less than or equal to the CPL.

If a call is made to a more privileged (numerically lower privilege level)
nonconforming destination code segment, the CPL is lowered to the DPL of the destination
code segment and a stack switch occurs (see Section 5.8.5, "Stack Switching"). If a
call or jump is made to a more privileged conforming destination code segment, the
CPL is not changed and no stack switch occurs.
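The rules above can be summarized in a small C model. This is only a sketch of the
checks listed in Table 5-1, not of the full set of protection checks the processor
performs; the type and function names (gate_check_result, check_call_gate) are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { XFER_CALL, XFER_JMP } xfer_kind;

typedef struct {
    bool allowed;        /* false models a failed privilege check        */
    unsigned new_cpl;    /* CPL after the transfer (valid when allowed)  */
    bool stack_switch;   /* true when the transfer switches stacks       */
} gate_check_result;

static inline gate_check_result check_call_gate(xfer_kind kind,
                                                unsigned cpl, unsigned rpl,
                                                unsigned gate_dpl,
                                                unsigned dest_dpl,
                                                bool dest_conforming)
{
    assert(cpl <= 3 && rpl <= 3 && gate_dpl <= 3 && dest_dpl <= 3);
    gate_check_result r = { false, cpl, false };

    /* Caller versus gate: CPL <= gate DPL and RPL <= gate DPL. */
    if (cpl > gate_dpl || rpl > gate_dpl)
        return r;

    if (dest_conforming) {
        /* Conforming target: DPL <= CPL; CPL and stack are unchanged. */
        r.allowed = (dest_dpl <= cpl);
        return r;
    }

    if (kind == XFER_CALL) {
        /* Nonconforming target via CALL: DPL <= CPL. */
        if (dest_dpl > cpl)
            return r;
        r.allowed = true;
        if (dest_dpl < cpl) {          /* more privileged: lower the CPL */
            r.new_cpl = dest_dpl;
            r.stack_switch = true;
        }
        return r;
    }

    /* Nonconforming target via JMP: DPL must equal CPL. */
    r.allowed = (dest_dpl == cpl);
    return r;
}
```

For instance, a CALL from code at CPL 1 through a gate with DPL 2, using a selector with
RPL 1, to a nonconforming segment with DPL 0 is allowed and lowers the CPL to 0; the
same call with a selector whose RPL is 3 is rejected.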
Call gates allow a single code segment to have procedures that can be accessed at
different privilege levels. For example, an operating system located in a code
segment may have some services which are intended to be used by both the operating
system and application software (such as procedures for handling character
I/O). Call gates for these procedures can be set up that allow access at all privilege
levels (0 through 3). More privileged call gates (with DPLs of 0 or 1) can then be set
up for other operating system services that are intended to be used only by the operating
system (such as procedures that initialize device drivers).
5.8.5 Stack Switching
Whenever a call gate is used to transfer program control to a more privileged
nonconforming code segment (that is, when the DPL of the nonconforming destination
code segment is less than the CPL), the processor automatically switches to the
stack for the destination code segment's privilege level. This stack switching is
carried out to prevent more privileged procedures from crashing due to insufficient
stack space. It also prevents less privileged procedures from interfering (by accident
or intent) with more privileged procedures through a shared stack.
Figure 5-12. Example of Accessing Call Gates At Various Privilege Levels
[Figure not reproduced. Privilege levels run from 3 (lowest privilege) to 0 (highest
privilege). Code segment A (CPL=3) uses gate selector A (RPL=3); code segment B (CPL=2)
uses gate selector B1 (RPL=2); code segment C (CPL=1) uses gate selector B2 (RPL=1);
gate selector B3 has RPL=3. Call gate A has DPL=3 and call gate B has DPL=2. Code
segments D and E both have DPL=0: one is a nonconforming code segment (stack switch
occurs) and the other is a conforming code segment (no stack switch occurs).]
Each task must define up to 4 stacks: one for applications code (running at privilege
level 3) and one for each of the privilege levels 2, 1, and 0 that are used. (If only two
privilege levels are used [3 and 0], then only two stacks must be defined.) Each of
these stacks is located in a separate segment and is identified with a segment
selector and an offset into the stack segment (a stack pointer).

The segment selector and stack pointer for the privilege level 3 stack are located in the
SS and ESP registers, respectively, when privilege-level-3 code is being executed and
are automatically stored on the called procedure's stack when a stack switch occurs.

Pointers to the privilege level 0, 1, and 2 stacks are stored in the TSS for the currently
running task (see Figure 7-2). Each of these pointers consists of a segment selector
and a stack pointer (loaded into the ESP register). These initial pointers are strictly
read-only values. The processor does not change them while the task is running.
They are used only to create new stacks when calls are made to more privileged
levels (numerically lower privilege levels). These stacks are disposed of when a
return is made from the called procedure. The next time the procedure is called, a
new stack is created using the initial stack pointer. (The TSS does not specify a stack
for privilege level 3 because the processor does not allow a transfer of program
control from a procedure running at a CPL of 0, 1, or 2 to a procedure running at a
CPL of 3, except on a return.)

The operating system is responsible for creating stacks and stack-segment descriptors
for all the privilege levels to be used and for loading initial pointers for these
stacks into the TSS. Each stack must be read/write accessible (as specified in the
type field of its segment descriptor) and must contain enough space (as specified in
the limit field) to hold the following items:
•  The contents of the SS, ESP, CS, and EIP registers for the calling procedure.
•  The parameters and temporary variables required by the called procedure.
•  The EFLAGS register and error code, when implicit calls are made to an exception
   or interrupt handler.

The stack needs enough space to contain many frames of these items,
because procedures often call other procedures, and an operating system may
support nesting of multiple interrupts. Each stack should be large enough to allow for
the worst case nesting scenario at its privilege level.

(If the operating system does not use the processor's multitasking mechanism, it still
must create at least one TSS for this stack-related purpose.)
When a procedure call through a call gate results in a change in privilege level, the
processor performs the following steps to switch stacks and begin execution of the
called procedure at a new privilege level:
1. Uses the DPL of the destination code segment (the new CPL) to select a pointer
   to the new stack (segment selector and stack pointer) from the TSS.
2. Reads the segment selector and stack pointer for the stack to be switched to from
   the current TSS. Any limit violations detected while reading the stack-segment
   selector, stack pointer, or stack-segment descriptor cause an invalid TSS (#TS)
   exception to be generated.
3. Checks the stack-segment descriptor for the proper privileges and type and
   generates an invalid TSS (#TS) exception if violations are detected.
4. Temporarily saves the current values of the SS and ESP registers.
5. Loads the segment selector and stack pointer for the new stack in the SS and ESP
   registers.
6. Pushes the temporarily saved values for the SS and ESP registers (for the calling
   procedure) onto the new stack (see Figure 5-13).
7. Copies the number of parameters specified in the parameter count field of the call
   gate from the calling procedure's stack to the new stack. If the count is 0, no
   parameters are copied.
8. Pushes the return instruction pointer (the current contents of the CS and EIP
   registers) onto the new stack.
9. Loads the segment selector for the new code segment and the new instruction
   pointer from the call gate into the CS and EIP registers, respectively, and begins
   execution of the called procedure.
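Steps 6 through 8 determine the contents of the new stack. The following C sketch
models them for a 32-bit call gate, treating the new stack as an array of doublewords
that grows toward lower indexes. The helper names (push32, build_call_frame32) are
hypothetical, and the model ignores the limit and type checks of steps 2 and 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pushes one doubleword; sp is an index into the array, not an address. */
static inline void push32(uint32_t *stack, size_t *sp, uint32_t value)
{
    assert(*sp > 0);            /* stands in for a stack-limit check */
    stack[--*sp] = value;
}

/* old_params[0] is the doubleword at the caller's ESP (the last parameter
 * the caller pushed). Returns the new stack index after all pushes. */
static inline size_t build_call_frame32(uint32_t *new_stack, size_t new_sp,
                                        uint16_t old_ss, uint32_t old_esp,
                                        const uint32_t *old_params,
                                        unsigned param_count,
                                        uint16_t old_cs, uint32_t old_eip)
{
    assert(param_count <= 31);
    push32(new_stack, &new_sp, old_ss);     /* step 6: calling SS       */
    push32(new_stack, &new_sp, old_esp);    /* step 6: calling ESP      */
    /* Step 7: copy parameters so that their order in memory is kept. */
    for (unsigned i = param_count; i > 0; i--)
        push32(new_stack, &new_sp, old_params[i - 1]);
    push32(new_stack, &new_sp, old_cs);     /* step 8: calling CS       */
    push32(new_stack, &new_sp, old_eip);    /* step 8: calling EIP      */
    return new_sp;
}
```

With three parameters the frame occupies seven doublewords, matching the layout of the
called procedure's stack in Figure 5-13.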
See the description of the CALL instruction in Chapter 3, "Instruction Set Reference," in
the IA-32 Intel Architecture Software Developer's Manual, Volume 2, for a detailed
description of the privilege level checks and other protection checks that the
processor performs on a far call through a call gate.

The parameter count field in a call gate specifies the number of data items (up to 31)
that the processor should copy from the calling procedure's stack to the stack of the
called procedure. If more than 31 data items need to be passed to the called procedure,
one of the parameters can be a pointer to a data structure, or the saved
contents of the SS and ESP registers may be used to access parameters in the old
stack space. The size of the data items passed to the called procedure depends on
the call gate size, as described in Section 5.8.3, "Call Gates."

Figure 5-13. Stack Switching During an Interprivilege-Level Call
[Figure not reproduced. Calling procedure's stack, from higher to lower addresses:
Parameter 1, Parameter 2, Parameter 3 (ESP points here). Called procedure's stack:
Calling SS, Calling ESP, Parameter 1, Parameter 2, Parameter 3, Calling CS, Calling EIP
(ESP points here).]
5.8.5.1 Stack Switching in 64-bit Mode
Although protection-check rules for call gates are unchanged from 32-bit mode,
stack switching itself works differently in 64-bit mode.

When stacks are switched as part of a 64-bit mode privilege-level change through a
call gate, a new SS (stack segment) descriptor is not loaded; 64-bit mode only loads
an inner-level RSP from the TSS. The new SS is forced to NULL and the SS selector's
RPL field is forced to the new CPL. The new SS is set to NULL in order to handle
nested far transfers (CALLF, INTn, interrupts and exceptions). The old SS and RSP
are saved on the new stack.

On a subsequent RETF, the old SS is popped from the stack and loaded into the SS
register. See Table 5-2.

In 64-bit mode, stack operations resulting from a privilege-level-changing far call or
far return are eight bytes wide and change the RSP by eight. The mode does not
support the automatic parameter-copy feature found in 32-bit mode. The call-gate
count field is ignored. Software can access the old stack, if necessary, by referencing
the old stack-segment selector and stack pointer saved on the new process stack.

In 64-bit mode, RETF is allowed to load a NULL SS under certain conditions. If the
target mode is 64-bit mode and the target CPL <> 3, IRET allows SS to be loaded with
a NULL selector. If the called procedure itself is interrupted, the NULL SS is pushed on
the stack frame. On the subsequent RETF, the NULL SS on the stack acts as a flag to
tell the processor not to load a new SS descriptor.
Table 5-2. 64-Bit-Mode Stack Layout After CALLF with CPL Change
32-bit Mode                        IA-32e Mode
Old SS Selector    +12             +24    Old SS Selector
Old ESP            +8              +16    Old RSP
CS Selector        +4              +8     Old CS Selector
EIP                0  (ESP)        0      RIP  (RSP)
< 4 Bytes >                        < 8 Bytes >
5.8.6 Returning from a Called Procedure
The RET instruction can be used to perform a near return, a far return at the same
privilege level, and a far return to a different privilege level. This instruction is
intended to execute returns from procedures that were called with a CALL instruction.
It does not support returns from a JMP instruction, because the JMP instruction
does not save a return instruction pointer on the stack.
A near return only transfers program control within the current code segment;
therefore, the processor performs only a limit check. When the processor pops the return
instruction pointer from the stack into the EIP register, it checks that the pointer does
not exceed the limit of the current code segment.

On a far return at the same privilege level, the processor pops both a segment
selector for the code segment being returned to and a return instruction pointer from
the stack. Under normal conditions, these pointers should be valid, because they
were pushed on the stack by the CALL instruction. However, the processor performs
privilege checks to detect situations where the current procedure might have altered
the pointer or failed to maintain the stack properly.

A far return that requires a privilege-level change is only allowed when returning to a
less privileged level (that is, the DPL of the return code segment is numerically
greater than the CPL). The processor uses the RPL field from the CS register value
saved for the calling procedure (see Figure 5-13) to determine if a return to a
numerically higher privilege level is required. If the RPL is numerically greater (less
privileged) than the CPL, a return across privilege levels occurs.

The processor performs the following steps when performing a far return to a calling
procedure (see Figures 6-2 and 6-4 in the Intel 64 and IA-32 Architectures Software
Developer's Manual, Volume 1, for an illustration of the stack contents prior to
and after a return):
1. Checks the RPL field of the saved CS register value to determine if a privilege
   level change is required on the return.
2. Loads the CS and EIP registers with the values on the called procedure's stack.
   (Type and privilege level checks are performed on the code-segment descriptor
   and RPL of the code-segment selector.)
3. (If the RET instruction includes a parameter count operand and the return
   requires a privilege level change.) Adds the parameter count (in bytes obtained
   from the RET instruction) to the current ESP register value (after popping the CS
   and EIP values), to step past the parameters on the called procedure's stack. The
   resulting value in the ESP register points to the saved SS and ESP values for the
   calling procedure's stack. (Note that the byte count in the RET instruction must
   be chosen to match the parameter count in the call gate that the calling
   procedure referenced when it made the original call multiplied by the size of the
   parameters.)
4. (If the return requires a privilege level change.) Loads the SS and ESP registers
   with the saved SS and ESP values and switches back to the calling procedure's
   stack. The SS and ESP values for the called procedure's stack are discarded. Any
   limit violations detected while loading the stack-segment selector or stack
   pointer cause a general-protection exception (#GP) to be generated. The new
   stack-segment descriptor is also checked for type and privilege violations.
5. (If the RET instruction includes a parameter count operand.) Adds the parameter
   count (in bytes obtained from the RET instruction) to the current ESP register
   value, to step past the parameters on the calling procedure's stack. The resulting
   ESP value is not checked against the limit of the stack segment. If the ESP value
   is beyond the limit, that fact is not recognized until the next stack operation.
6. (If the return requires a privilege level change.) Checks the contents of the DS,
   ES, FS, and GS segment registers. If any of these registers refer to segments
   whose DPL is less than the new CPL (excluding conforming code segments), the
   segment register is loaded with a null segment selector.

See the description of the RET instruction in Chapter 4 of the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 2B, for a detailed description of
the privilege level checks and other protection checks that the processor performs on
a far return.
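Two of the decisions in the steps above lend themselves to a short C model: how the
saved CS selector's RPL classifies the return (step 1), and when a data-segment
register is loaded with a null selector (step 6). The sketch below is illustrative only;
the names far_ret_kind, classify_far_return, and must_null_data_segment are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    RET_NOT_ALLOWED,   /* would return to a more privileged level */
    RET_SAME_LEVEL,    /* no privilege change, no stack switch    */
    RET_OUTER_LEVEL    /* return to a less privileged level       */
} far_ret_kind;

/* Step 1: compare the RPL of the saved CS value with the current CPL. */
static inline far_ret_kind classify_far_return(unsigned cpl,
                                               unsigned saved_cs_rpl)
{
    assert(cpl <= 3 && saved_cs_rpl <= 3);
    if (saved_cs_rpl < cpl)
        return RET_NOT_ALLOWED;
    return (saved_cs_rpl == cpl) ? RET_SAME_LEVEL : RET_OUTER_LEVEL;
}

/* Step 6: after an outer-level return, DS, ES, FS, or GS is nulled if it
 * refers to a segment whose DPL is less than the new CPL, unless that
 * segment is a conforming code segment. */
static inline bool must_null_data_segment(unsigned seg_dpl,
                                          bool is_conforming_code,
                                          unsigned new_cpl)
{
    assert(seg_dpl <= 3 && new_cpl <= 3);
    return !is_conforming_code && seg_dpl < new_cpl;
}
```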
5.8.7 Performing Fast Calls to System Procedures with the
SYSENTER and SYSEXIT Instructions
The SYSENTER and SYSEXIT instructions were introduced into the IA-32 architecture
in the Pentium II processors for the purpose of providing a fast (low overhead)
mechanism for calling operating system or executive procedures. SYSENTER is intended
for use by user code running at privilege level 3 to access operating system or
executive procedures running at privilege level 0. SYSEXIT is intended for use by privilege
level 0 operating system or executive procedures for fast returns to privilege level 3
user code. SYSENTER can be executed from privilege levels 3, 2, 1, or 0; SYSEXIT
can only be executed from privilege level 0.

The SYSENTER and SYSEXIT instructions are companion instructions, but they do not
constitute a call/return pair. This is because SYSENTER does not save any state
information for use by SYSEXIT on a return.

The target instruction and stack pointer for these instructions are not specified
through instruction operands. Instead, they are specified through parameters
entered in MSRs and general-purpose registers.
For SYSENTER, target fields are generated using the following sources:
•  Target code segment: Reads this from IA32_SYSENTER_CS.
•  Target instruction: Reads this from IA32_SYSENTER_EIP.
•  Stack segment: Computed by adding 8 to the value in IA32_SYSENTER_CS.
•  Stack pointer: Reads this from IA32_SYSENTER_ESP.

For SYSEXIT, target fields are generated using the following sources:
•  Target code segment: Computed by adding 16 to the value in
   IA32_SYSENTER_CS.
•  Target instruction: Reads this from EDX.
•  Stack segment: Computed by adding 24 to the value in IA32_SYSENTER_CS.
•  Stack pointer: Reads this from ECX.
The SYSENTER and SYSEXIT instructions perform fast calls and returns because
they force the processor into a predefined privilege level 0 state when SYSENTER is
executed and into a predefined privilege level 3 state when SYSEXIT is executed. By
forcing predefined and consistent processor states, the number of privilege checks
ordinarily required to perform a far call to another privilege level is greatly
reduced. Also, predefining the target context state in MSRs and general-purpose
registers eliminates all memory accesses except when fetching the target code.

Any additional state that needs to be saved to allow a return to the calling procedure
must be saved explicitly by the calling procedure or be predefined through programming
conventions.
5.8.7.1 SYSENTER and SYSEXIT Instructions in IA-32e Mode
For Intel 64 processors, the SYSENTER and SYSEXIT instructions are enhanced to
allow fast system calls from user code running at privilege level 3 (in compatibility
mode or 64-bit mode) to 64-bit executive procedures running at privilege level 0.
IA32_SYSENTER_EIP MSR and IA32_SYSENTER_ESP MSR are expanded to hold
64-bit addresses. If IA-32e mode is inactive, only the lower 32-bit addresses stored
in these MSRs are used. If 64-bit mode is active, addresses stored in
IA32_SYSENTER_EIP and IA32_SYSENTER_ESP must be canonical. Note that, in
64-bit mode, IA32_SYSENTER_CS must not contain a NULL selector.

When SYSENTER transfers control, the following fields are generated and bits set:
•  Target code segment: Reads non-NULL selector from IA32_SYSENTER_CS.
•  New CS attributes: CS base = 0, CS limit = FFFFFFFFH.
•  Target instruction: Reads 64-bit canonical address from
   IA32_SYSENTER_EIP.
•  Stack segment: Computed by adding 8 to the value from
   IA32_SYSENTER_CS.
•  Stack pointer: Reads 64-bit canonical address from IA32_SYSENTER_ESP.
•  New SS attributes: SS base = 0, SS limit = FFFFFFFFH.

When the SYSEXIT instruction transfers control to 64-bit mode user code using
REX.W, the following fields are generated and bits set:
•  Target code segment: Computed by adding 32 to the value in
   IA32_SYSENTER_CS.
•  New CS attributes: L-bit = 1 (go to 64-bit mode).
•  Target instruction: Reads 64-bit canonical address in RDX.
•  Stack segment: Computed by adding 40 to the value of IA32_SYSENTER_CS.
•  Stack pointer: Update RSP using 64-bit canonical address in RCX.
When SYSEXIT transfers control to compatibility mode user code when the operand
size attribute is 32 bits, the following fields are generated and bits set:
•  Target code segment: Computed by adding 16 to the value in
   IA32_SYSENTER_CS.
•  New CS attributes: L-bit = 0 (go to compatibility mode).
•  Target instruction: Fetch the target instruction from 32-bit address in EDX.
•  Stack segment: Computed by adding 24 to the value in IA32_SYSENTER_CS.
•  Stack pointer: Update ESP from 32-bit address in ECX.
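Because every selector is derived from IA32_SYSENTER_CS by a fixed addition, the
selector arithmetic listed above can be captured in a few lines of C. The sketch below
is a model of that arithmetic only: the names selector_pair, sysenter_selectors, and
sysexit_selectors are hypothetical, and the RPL bits of the resulting selectors are not
modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t cs;   /* target code-segment selector */
    uint16_t ss;   /* stack-segment selector       */
} selector_pair;

/* SYSENTER: CS comes from IA32_SYSENTER_CS, SS is that value plus 8. */
static inline selector_pair sysenter_selectors(uint16_t sysenter_cs)
{
    assert((sysenter_cs & 0xFFFCu) != 0);   /* NULL selector is rejected */
    selector_pair p = { sysenter_cs, (uint16_t)(sysenter_cs + 8) };
    return p;
}

/* SYSEXIT: +16/+24 for a 32-bit or compatibility-mode target,
 * +32/+40 for a 64-bit mode target (REX.W form). */
static inline selector_pair sysexit_selectors(uint16_t sysenter_cs,
                                              bool to_64bit_mode)
{
    assert((sysenter_cs & 0xFFFCu) != 0);
    selector_pair p;
    p.cs = (uint16_t)(sysenter_cs + (to_64bit_mode ? 32 : 16));
    p.ss = (uint16_t)(sysenter_cs + (to_64bit_mode ? 40 : 24));
    return p;
}
```

One consequence of this arithmetic is that the descriptors for the kernel code, kernel
stack, 32-bit user code, user stack, and (on Intel 64 processors) 64-bit user code and
stack have to occupy consecutive 8-byte slots in the descriptor table.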
5.8.8 Fast System Calls in 64-bit Mode
The SYSCALL and SYSRET instructions are designed for operating systems that use a
flat memory model (segmentation is not used). The instructions, along with
SYSENTER and SYSEXIT, are suited for IA-32e mode operation. SYSCALL and
SYSRET, however, are not supported in compatibility mode. Use CPUID to check if
SYSCALL and SYSRET are available (CPUID.80000001H.EDX[bit 11] = 1).

SYSCALL is intended for use by user code running at privilege level 3 to access
operating system or executive procedures running at privilege level 0. SYSRET is
intended for use by privilege level 0 operating system or executive procedures for
fast returns to privilege level 3 user code.

Stack pointers for SYSCALL/SYSRET are not specified through model specific
registers. The clearing of bits in RFLAGS is programmable rather than fixed.
SYSCALL/SYSRET save and restore the RFLAGS register.

For SYSCALL, the processor saves RFLAGS into R11 and the RIP of the next
instruction into RCX; it then gets the privilege-level 0 target instruction and stack pointer
from:
•  Target code segment: Reads a non-NULL selector from IA32_STAR[47:32].
•  Target instruction: Reads a 64-bit canonical address from IA32_LSTAR.
•  Stack segment: Computed by adding 8 to the value in IA32_STAR[47:32].
•  System flags: The processor sets RFLAGS to the logical-AND of its current
   value with the complement of the value in the IA32_FMASK MSR.

When SYSRET transfers control to 64-bit mode user code using REX.W, the processor
gets the privilege level 3 target instruction and stack pointer from:
•  Target code segment: Reads a non-NULL selector from IA32_STAR[63:48] +
   16.
•  Target instruction: Copies the value in RCX into RIP.
•  Stack segment: IA32_STAR[63:48] + 8.
•  EFLAGS: Loaded from R11.
When SYSRET transfers control to 32-bit mode user code using a 32-bit operand size,
the processor gets the privilege level 3 target instruction and stack pointer from:
•  Target code segment: Reads a non-NULL selector from IA32_STAR[63:48].
•  Target instruction: Copies the value in ECX into EIP.
•  Stack segment: IA32_STAR[63:48] + 8.
•  EFLAGS: Loaded from R11.

It is the responsibility of the OS to ensure the descriptors in the GDT/LDT correspond
to the selectors loaded by SYSCALL/SYSRET (consistent with the base, limit, and
attribute values forced by the instructions).

Any address written to IA32_LSTAR is first checked by WRMSR to ensure canonical
form. If an address is not canonical, an exception is generated (#GP).

See Figure 5-14 for the layout of IA32_STAR, IA32_LSTAR and IA32_FMASK.
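The selector and flag computations listed for SYSCALL and SYSRET can be modeled in C as
follows. This is a sketch of the arithmetic in the lists above, with hypothetical names
(star_selectors, syscall_selectors, sysret_selectors, syscall_new_rflags); it does not
model the RPL bits of the selectors or any other state the instructions force.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t cs;
    uint16_t ss;
} star_selectors;

/* SYSCALL: CS = IA32_STAR[47:32], SS = IA32_STAR[47:32] + 8. */
static inline star_selectors syscall_selectors(uint64_t ia32_star)
{
    uint16_t base = (uint16_t)(ia32_star >> 32);
    assert(base != 0);                      /* selector must be non-NULL */
    star_selectors s = { base, (uint16_t)(base + 8) };
    return s;
}

/* SYSRET: CS = IA32_STAR[63:48] + 16 for a 64-bit target, or
 * IA32_STAR[63:48] for a 32-bit target; SS = IA32_STAR[63:48] + 8. */
static inline star_selectors sysret_selectors(uint64_t ia32_star,
                                              bool to_64bit_mode)
{
    uint16_t base = (uint16_t)(ia32_star >> 48);
    assert(base != 0);
    star_selectors s;
    s.cs = (uint16_t)(base + (to_64bit_mode ? 16 : 0));
    s.ss = (uint16_t)(base + 8);
    return s;
}

/* SYSCALL clears every RFLAGS bit that is set in IA32_FMASK. */
static inline uint64_t syscall_new_rflags(uint64_t rflags, uint32_t ia32_fmask)
{
    return rflags & ~(uint64_t)ia32_fmask;
}
```

For example, a mask with bit 9 set causes SYSCALL to enter the kernel with the
interrupt flag (IF) cleared, while the original RFLAGS value remains available in R11.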
Figure 5-14. MSRs Used by SYSCALL and SYSRET
[Figure not reproduced. IA32_STAR: bits 63:48 SYSRET CS and SS, bits 47:32 SYSCALL CS
and SS, bits 31:0 Reserved. IA32_LSTAR: bits 63:0 Target RIP for 64-bit Mode Calling
Program. IA32_FMASK: bits 63:32 Reserved, bits 31:0 SYSCALL EFLAGS Mask.]
5.9 PRIVILEGED INSTRUCTIONS
Some of the system instructions (called "privileged instructions") are protected from
use by application programs. The privileged instructions control system functions
(such as the loading of system registers). They can be executed only when the CPL is
0 (most privileged). If one of these instructions is executed when the CPL is not 0, a
general-protection exception (#GP) is generated. The following system instructions
are privileged instructions:
•  LGDT: Load GDT register.
•  LLDT: Load LDT register.
•  LTR: Load task register.
•  LIDT: Load IDT register.
•  MOV (control registers): Load and store control registers.
•  LMSW: Load machine status word.
•  CLTS: Clear task-switched flag in register CR0.
•  MOV (debug registers): Load and store debug registers.
•  INVD: Invalidate cache, without writeback.
•  WBINVD: Invalidate cache, with writeback.
•  INVLPG: Invalidate TLB entry.
•  HLT: Halt processor.
•  RDMSR: Read Model-Specific Registers.
•  WRMSR: Write Model-Specific Registers.
•  RDPMC: Read Performance-Monitoring Counter.
•  RDTSC: Read Time-Stamp Counter.
Some of the privileged instructions are available only in the more recent families of
Intel 64 and IA-32 processors (see Section 19.13, "New Instructions In the Pentium
and Later IA-32 Processors").

The PCE and TSD flags in register CR4 (bits 8 and 2, respectively) control whether the
RDPMC and RDTSC instructions, respectively, can be executed at any CPL: RDPMC is
available at any CPL when PCE is set, and RDTSC when TSD is clear.
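The effect of these two CR4 flags can be expressed as a pair of predicates. The C sketch
below assumes the CR4 layout in which TSD is bit 2 and PCE is bit 8, and models only the
CPL and CR4 dependence; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR4_TSD (1u << 2)   /* time-stamp disable                   */
#define CR4_PCE (1u << 8)   /* performance-monitoring counter enable */

/* RDTSC: always allowed at CPL 0; at other CPLs only while TSD is clear. */
static inline bool rdtsc_allowed(unsigned cpl, uint32_t cr4)
{
    assert(cpl <= 3);
    return cpl == 0 || (cr4 & CR4_TSD) == 0;
}

/* RDPMC: always allowed at CPL 0; at other CPLs only while PCE is set. */
static inline bool rdpmc_allowed(unsigned cpl, uint32_t cr4)
{
    assert(cpl <= 3);
    return cpl == 0 || (cr4 & CR4_PCE) != 0;
}
```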
5.10 POINTER VALIDATION
When operating in protected mode, the processor validates all pointers to enforce
protection between segments and maintain isolation between privilege levels.
Pointer validation consists of the following checks:
1. Checking access rights to determine if the segment type is compatible with its
   use.
2. Checking read/write rights.
3. Checking if the pointer offset exceeds the segment limit.
4. Checking if the supplier of the pointer is allowed to access the segment.
5. Checking the offset alignment.
The processor automatically performs the first, second, and third checks during
instruction execution. Software must explicitly request the fourth check by issuing an ARPL
instruction. The fifth check (offset alignment) is performed automatically at privilege
level 3 if alignment checking is turned on. Offset alignment does not affect isolation
of privilege levels.
5.10.1 Checking Access Rights (LAR Instruction)
When the processor accesses a segment using a far pointer, it performs an access
rights check on the segment descriptor pointed to by the far pointer. This check is
performed to determine if the type and privilege level (DPL) of the segment descriptor
are compatible with the operation to be performed. For example, when making a far
call in protected mode, the segment-descriptor type must be for a conforming or
nonconforming code segment, a call gate, a task gate, or a TSS. Then, if the call is to
a nonconforming code segment, the DPL of the code segment must be equal to the
CPL, and the RPL of the code segment's segment selector must be less than or equal
to the DPL. If the type or privilege level is found to be incompatible, the appropriate
exception is generated.
To prevent type incompatibility exceptions from being generated, software can check
the access rights of a segment descriptor using the LAR (load access rights)
instruction. The LAR instruction specifies the segment selector for the segment descriptor
whose access rights are to be checked and a destination register. The instruction then
performs the following operations:
1. Checks that the segment selector is not null.
2. Checks that the segment selector points to a segment descriptor that is within
the descriptor table limit (GDT or LDT).
3. Checks that the segment descriptor is a code, data, LDT, call gate, task gate, or
TSS segment-descriptor type.
4. If the segment is not a conforming code segment, checks if the segment
descriptor is visible at the CPL (that is, if the CPL and the RPL of the segment
selector are less than or equal to the DPL).
5. If the privilege level and type checks pass, loads the second doubleword of the
segment descriptor into the destination register (masked by the value
00FXFF00H, where X indicates that the corresponding 4 bits are undefined) and
sets the ZF flag in the EFLAGS register. If the segment selector is not visible at
the current privilege level or is an invalid type for the LAR instruction, the
instruction does not modify the destination register and clears the ZF flag.
Once the access rights are loaded in the destination register, software can perform
additional checks on the access rights information.
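The five LAR steps above can be sketched as a small software model. This is an illustration only, not the processor's implementation: the function name lar_model and its parameters are hypothetical, the descriptor table is reduced to an array holding the second doubleword of each descriptor, the TI bit of the selector is ignored (the caller is assumed to pass the right table), and the four undefined bits (X) are simply cleared. The list of accepted system-descriptor types follows the types named in step 3 (TSS, LDT, call gate, task gate) for legacy protected mode.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mask 00FXFF00H; this model clears the undefined nibble X (an assumption). */
#define LAR_MASK 0x00F0FF00u

/* Hypothetical model of LAR. table_high[i] is the second doubleword of
   descriptor i; table_count is the number of descriptors within the limit.
   Returns true (ZF set) and writes *dest on success; returns false
   (ZF clear) and leaves *dest unmodified otherwise. */
bool lar_model(uint16_t selector, unsigned cpl,
               const uint32_t *table_high, size_t table_count,
               uint32_t *dest)
{
    if ((selector & 0xFFFCu) == 0)          /* 1. null selector */
        return false;
    size_t index = selector >> 3;
    if (index >= table_count)               /* 2. outside the table limit */
        return false;

    uint32_t hi = table_high[index];
    unsigned type = (hi >> 8) & 0xFu;       /* type field, bits 11:8 */
    bool code_or_data = (hi >> 12) & 1u;    /* S flag, bit 12 */
    unsigned dpl = (hi >> 13) & 3u;         /* DPL, bits 14:13 */
    unsigned rpl = selector & 3u;

    if (!code_or_data) {
        /* 3. system types accepted: TSS (1, 3, 9, B), LDT (2),
              call gate (4, C), task gate (5) */
        const unsigned valid = (1u << 0x1) | (1u << 0x2) | (1u << 0x3) |
                               (1u << 0x4) | (1u << 0x5) | (1u << 0x9) |
                               (1u << 0xB) | (1u << 0xC);
        if (!((valid >> type) & 1u))
            return false;
    }

    /* 4. visibility check, skipped for conforming code segments */
    bool conforming_code = code_or_data && (type & 0x8u) && (type & 0x4u);
    if (!conforming_code && (cpl > dpl || rpl > dpl))
        return false;

    *dest = hi & LAR_MASK;                  /* 5. load masked access rights */
    return true;
}
```

In this model a caller at CPL 3 can read the access rights of a DPL-3 data segment, but the same call against a DPL-0 code segment fails and leaves the destination untouched.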
5.10.2 Checking Read/Write Rights (VERR and VERW Instructions)
When the processor accesses any code or data segment, it checks the read/write
privileges assigned to the segment to verify that the intended read or write operation is
allowed. Software can check read/write rights using the VERR (verify for reading)
and VERW (verify for writing) instructions. Both of these instructions specify the
segment selector for the segment being checked. The instructions then perform the
following operations:
1. Checks that the segment selector is not null.
2. Checks that the segment selector points to a segment descriptor that is within
the descriptor table limit (GDT or LDT).
3. Checks that the segment descriptor is a code or data-segment descriptor type.
4. If the segment is not a conforming code segment, checks if the segment
descriptor is visible at the CPL (that is, if the CPL and the RPL of the segment
selector are less than or equal to the DPL).
5. Checks that the segment is readable (for the VERR instruction) or writable (for
the VERW instruction).
The VERR instruction sets the ZF flag in the EFLAGS register if the segment is visible
at the CPL and readable; the VERW instruction sets the ZF flag if the segment is visible
and writable. (Code segments are never writable.) The ZF flag is cleared if any of these
checks fails.
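As a rough illustration of the VERR and VERW steps above, the following sketch models both instructions over a simplified descriptor table. The names verr_model, verw_model, and verify_common are hypothetical, the table is an array of second doublewords only, and the TI bit of the selector is ignored; the return value stands in for the ZF flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Steps 1-4 shared by both instructions. On success, stores the
   descriptor's type field (bits 11:8 of the second doubleword). */
static bool verify_common(uint16_t selector, unsigned cpl,
                          const uint32_t *table_high, size_t table_count,
                          unsigned *type_out)
{
    if ((selector & 0xFFFCu) == 0)        /* 1. null selector */
        return false;
    size_t index = selector >> 3;
    if (index >= table_count)             /* 2. outside the table limit */
        return false;
    uint32_t hi = table_high[index];
    if (!((hi >> 12) & 1u))               /* 3. must be code or data (S = 1) */
        return false;
    unsigned type = (hi >> 8) & 0xFu;
    unsigned dpl = (hi >> 13) & 3u;
    unsigned rpl = selector & 3u;
    bool conforming_code = (type & 0x8u) && (type & 0x4u);
    if (!conforming_code && (cpl > dpl || rpl > dpl))
        return false;                     /* 4. not visible at the CPL */
    *type_out = type;
    return true;
}

/* Model of VERR: true (ZF set) if the segment is visible and readable. */
bool verr_model(uint16_t selector, unsigned cpl,
                const uint32_t *table_high, size_t table_count)
{
    unsigned type;
    if (!verify_common(selector, cpl, table_high, table_count, &type))
        return false;
    /* Data segments are always readable; code segments need the R bit. */
    return !(type & 0x8u) || (type & 0x2u);
}

/* Model of VERW: true (ZF set) if the segment is visible and writable. */
bool verw_model(uint16_t selector, unsigned cpl,
                const uint32_t *table_high, size_t table_count)
{
    unsigned type;
    if (!verify_common(selector, cpl, table_high, table_count, &type))
        return false;
    /* Only data segments with the W bit set are writable. */
    return !(type & 0x8u) && (type & 0x2u);
}
```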
5.10.3 Checking That the Pointer Offset Is Within Limits (LSL Instruction)
When the processor accesses any segment, it performs a limit check to ensure that the
offset is within the limit of the segment. Software can perform this limit check using
the LSL (load segment limit) instruction. Like the LAR instruction, the LSL instruction
specifies the segment selector for the segment descriptor whose limit is to be
checked and a destination register. The instruction then performs the following
operations:
1. Checks that the segment selector is not null.
2. Checks that the segment selector points to a segment descriptor that is within
the descriptor table limit (GDT or LDT).
3. Checks that the segment descriptor is a code, data, LDT, or TSS segment-
descriptor type.
4. If the segment is not a conforming code segment, checks if the segment
descriptor is visible at the CPL (that is, if the CPL and the RPL of the segment
selector are less than or equal to the DPL).
5. If the privilege level and type checks pass, loads the unscrambled limit (the limit
scaled according to the setting of the G flag in the segment descriptor) into the
destination register and sets the ZF flag in the EFLAGS register. If the segment
selector is not visible at the current privilege level or is an invalid type for the LSL
instruction, the instruction does not modify the destination register and clears
the ZF flag.
Once the limit is loaded in the destination register, software can compare the segment
limit with the offset of a pointer.
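The "unscrambled" limit that LSL returns, and the comparison software would then make, can be sketched as follows. The function names are hypothetical. The sketch assumes the standard descriptor layout (limit bits 15:0 in the first doubleword, limit bits 19:16 and the G flag at bit 23 in the second) and an expand-up segment; expand-down data segments interpret the limit differently and are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reassemble the 20-bit limit from a descriptor's two doublewords and
   scale it by the G flag: with G = 1 the limit counts 4-KByte units and
   the low 12 bits of the byte limit are all ones. */
uint32_t scaled_limit(uint32_t desc_lo, uint32_t desc_hi)
{
    uint32_t raw = (desc_lo & 0x0000FFFFu) | (desc_hi & 0x000F0000u);
    bool g = (desc_hi >> 23) & 1u;
    return g ? ((raw << 12) | 0xFFFu) : raw;
}

/* True if an access of `size` bytes at `offset` stays within an
   expand-up segment whose last valid offset is `limit`. */
bool offset_in_limit(uint32_t offset, uint32_t size, uint32_t limit)
{
    return size != 0 && offset <= limit && (size - 1u) <= (limit - offset);
}
```

For example, a raw limit of FFFFFH with G = 1 scales to FFFFFFFFH (a 4-GByte segment), while the same raw limit with G = 0 describes a 1-MByte segment.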
5.10.4 Checking Caller Access Privileges (ARPL Instruction)
The requestor's privilege level (RPL) field of a segment selector is intended to carry
the privilege level of a calling procedure (the calling procedure's CPL) to a called
procedure. The called procedure then uses the RPL to determine if access to a
segment is allowed. The RPL is said to weaken the privilege level of the called
procedure to that of the RPL.
Operating-system procedures typically use the RPL to prevent less privileged
application programs from accessing data located in more privileged segments. When an
operating-system procedure (the called procedure) receives a segment selector from
an application program (the calling procedure), it sets the segment selector's RPL to
the privilege level of the calling procedure. Then, when the operating system uses
the segment selector to access its associated segment, the processor performs
privilege checks using the calling procedure's privilege level (stored in the RPL) rather
than the numerically lower privilege level (the CPL) of the operating-system
procedure. The RPL thus ensures that the operating system does not access a segment on
behalf of an application program unless that program itself has access to the
segment.
Figure 5-15 shows an example of how the processor uses the RPL field. In this
example, an application program (located in code segment A) possesses a segment
selector (segment selector D1) that points to a privileged data structure (that is, a
data structure located in a data segment D at privilege level 0).
The application program cannot access data segment D, because it does not have
sufficient privilege, but the operating system (located in code segment C) can. So, in
an attempt to access data segment D, the application program executes a call to the
operating system and passes segment selector D1 to the operating system as a
parameter on the stack. Before passing the segment selector, the (well behaved)
application program sets the RPL of the segment selector to its current privilege level
(which in this example is 3). If the operating system attempts to access data
segment D using segment selector D1, the processor compares the CPL (which is
now 0 following the call), the RPL of segment selector D1, and the DPL of data
segment D (which is 0). Since the RPL is greater than the DPL, access to data
segment D is denied. The processor's protection mechanism thus protects data
segment D from access by the operating system, because the application program's
privilege level (represented by the RPL of segment selector B) is greater than the DPL of
data segment D.
Now assume that instead of setting the RPL of the segment selector to 3, the
application program sets the RPL to 0 (segment selector D2). The operating system can
now access data segment D, because its CPL and the RPL of segment selector D2 are
both equal to the DPL of data segment D.
Because the application program is able to change the RPL of a segment selector to
any value, it can potentially use a procedure operating at a numerically lower
privilege level to access a protected data structure. This ability to lower the RPL of a
segment selector breaches the processor's protection mechanism.
Because a called procedure cannot rely on the calling procedure to set the RPL
correctly, operating-system procedures (executing at numerically lower privilege
levels) that receive segment selectors from numerically higher privilege-level
procedures need to test the RPL of the segment selector to determine if it is at the
appropriate level. The ARPL (adjust requested privilege level) instruction is provided for
this purpose. This instruction adjusts the RPL of one segment selector to match that
of another segment selector.
Figure 5-15. Use of RPL to Weaken Privilege Level of Called Procedure
[Figure: privilege levels 3 (lowest privilege) through 0 (highest privilege). The
application program (code segment A, CPL=3) calls the operating system (code
segment C, DPL=0) through call gate B (DPL=3) using gate selector B (RPL=3), and
passes a segment selector as a parameter on the stack. Access to data segment D
(DPL=0) is not allowed through segment selector D1 (RPL=3) and is allowed through
segment selector D2 (RPL=0).]
The example in Figure 5-15 demonstrates how the ARPL instruction is intended to be
used. When the operating system receives segment selector D2 from the application
program, it uses the ARPL instruction to compare the RPL of the segment selector
with the privilege level of the application program (represented by the code-segment
selector pushed onto the stack). If the RPL is less than the application program's
privilege level, the ARPL instruction changes the RPL of the segment selector to match the
privilege level of the application program (segment selector D1). Using this
instruction thus prevents a procedure running at a numerically higher privilege level from
accessing numerically lower privilege-level (more privileged) segments by lowering
the RPL of a segment selector.
Note that the privilege level of the application program can be determined by reading
the RPL field of the segment selector for the application program's code segment.
This segment selector is stored on the stack as part of the call to the operating
system. The operating system can copy the segment selector from the stack into a
register for use as an operand for the ARPL instruction.
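The D1/D2 scenario above can be traced with a small model. The names arpl_model and data_access_allowed are hypothetical, and the access check is reduced to the rule used in the example (access to a data segment is allowed only when both the CPL and the selector's RPL are less than or equal to the DPL). The boolean returned by arpl_model stands in for the ZF flag, which ARPL sets when it adjusts the RPL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of ARPL: if the RPL of *dest_sel is numerically lower than the
   RPL of src_sel, raise it to match and return true (ZF set); otherwise
   leave *dest_sel unchanged and return false (ZF clear). */
bool arpl_model(uint16_t *dest_sel, uint16_t src_sel)
{
    unsigned dest_rpl = *dest_sel & 3u;
    unsigned src_rpl = src_sel & 3u;
    if (dest_rpl < src_rpl) {
        *dest_sel = (uint16_t)((*dest_sel & 0xFFFCu) | src_rpl);
        return true;
    }
    return false;
}

/* Simplified data-segment privilege check from the example: both the
   CPL and the selector's RPL must be less than or equal to the DPL. */
bool data_access_allowed(unsigned cpl, uint16_t selector, unsigned dpl)
{
    unsigned rpl = selector & 3u;
    return cpl <= dpl && rpl <= dpl;
}
```

With a selector whose RPL was forged to 0 (D2), the operating system at CPL 0 would pass the check against a DPL-0 segment. After arpl_model is applied with the caller's code-segment selector (RPL 3) as the source, the selector carries RPL 3 (D1) and the same check fails.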
5.10.5 Checking Alignment
When the CPL is 3, alignment of memory references can be checked by setting the
AM flag in the CR0 register and the AC flag in the EFLAGS register. Unaligned memory
references generate alignment exceptions (#AC). The processor does not generate
alignment exceptions when operating at privilege level 0, 1, or 2. See Table 6-7 for a
description of the alignment requirements when alignment checking is enabled.
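The enabling condition can be written as a predicate. This is a sketch with hypothetical function names; it uses the architectural bit positions CR0.AM (bit 18) and EFLAGS.AC (bit 18), and it assumes natural alignment for power-of-two operand sizes as a stand-in for the per-data-type requirements listed in Table 6-7.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_AM    (1u << 18)  /* alignment mask flag in CR0 */
#define EFLAGS_AC (1u << 18)  /* alignment check flag in EFLAGS */

/* Alignment checking is active only when AM and AC are both set and
   the processor is running at CPL 3. */
bool alignment_check_active(uint32_t cr0, uint32_t eflags, unsigned cpl)
{
    return (cr0 & CR0_AM) && (eflags & EFLAGS_AC) && cpl == 3;
}

/* Simplified model: a reference of `size` bytes (a power of two) would
   raise #AC if checking is active and the address is not a multiple of
   the operand size. */
bool would_raise_ac(uint32_t cr0, uint32_t eflags, unsigned cpl,
                    uint32_t address, uint32_t size)
{
    return alignment_check_active(cr0, eflags, cpl) &&
           (address & (size - 1u)) != 0;
}
```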
5.11 PAGE-LEVEL PROTECTION
Page-level protection can be used alone or applied to segments. When page-level
protection is used with the flat memory model, it allows supervisor code and data
(the operating system or executive) to be protected from user code and data
(application programs). It also allows pages containing code to be write protected. When
segment- and page-level protection are combined, page-level read/write
protection allows more protection granularity within segments.
With page-level protection (as with segment-level protection), each memory
reference is checked to verify that protection checks are satisfied. All checks are made
before the memory cycle is started, and any violation prevents the cycle from
starting and results in a page-fault exception being generated. Because checks are
performed in parallel with address translation, there is no performance penalty.
The processor performs two page-level protection checks:
• Restriction of addressable domain (supervisor and user modes).
• Page type (read only or read/write).
A violation of either of these checks results in a page-fault exception being generated.
See Chapter 6, "Interrupt 14 - Page-Fault Exception (#PF)," for an explanation of the
page-fault exception mechanism. This chapter describes the protection violations
which lead to page-fault exceptions.
5.11.1 Page-Protection Flags
Protection information for pages is contained in two flags in a paging-structure entry
(see Chapter 4): the read/write flag (bit 1) and the user/supervisor flag (bit 2). The
protection checks use the flags in all paging structures.
5.11.2 Restricting Addressable Domain
The page-level protection mechanism allows restricting access to pages based on
two privilege levels:
• Supervisor mode (U/S flag is 0). Most privileged. For the operating system or
executive, other system software (such as device drivers), and protected system
data (such as page tables).
• User mode (U/S flag is 1). Least privileged. For application code and data.
The segment privilege levels map to the page privilege levels as follows. If the
processor is currently operating at a CPL of 0, 1, or 2, it is in supervisor mode; if it is
operating at a CPL of 3, it is in user mode. When the processor is in supervisor mode,
it can access all pages; when in user mode, it can access only user-level pages. (Note
that the WP flag in control register CR0 modifies the supervisor permissions, as
described in Section 5.11.3, "Page Type.")
Note that to use the page-level protection mechanism, code and data segments must
be set up for at least two segment-based privilege levels: level 0 for supervisor code
and data segments and level 3 for user code and data segments. (In this model, the
stacks are placed in the data segments.) To minimize the use of segments, a flat
memory model can be used (see Section 3.2.1, "Basic Flat Model").
Here, the user and supervisor code and data segments all begin at address zero in
the linear address space and overlay each other. With this arrangement, operating-
system code (running at the supervisor level) and application code (running at the
user level) can execute as if there are no segments. Protection between operating-
system and application code and data is provided by the processor's page-level
protection mechanism.
5.11.3 Page Type
The page-level protection mechanism recognizes two page types:
• Read-only access (R/W flag is 0).
• Read/write access (R/W flag is 1).
When the processor is in supervisor mode and the WP flag in register CR0 is clear (its
state following reset initialization), all pages are both readable and writable (write
protection is ignored). When the processor is in user mode, it can write only to user-
mode pages that are read/write accessible. User-mode pages which are read/write or
read-only are readable; supervisor-mode pages are neither readable nor writable
from user mode. A page-fault exception is generated on any attempt to violate the
protection rules.
Starting with the P6 family, Intel processors allow user-mode pages to be write-
protected against supervisor-mode access. Setting CR0.WP = 1 enables supervisor-
mode sensitivity to write-protected pages. If CR0.WP = 1, read-only pages are not
writable from any privilege level. This supervisor write-protect feature is useful for
implementing a copy-on-write strategy used by some operating systems, such as
UNIX*, for task creation (also called forking or spawning). When a new task is
created, it is possible to copy the entire address space of the parent task. This gives
the child task a complete, duplicate set of the parent's segments and pages. An
alternative copy-on-write strategy saves memory space and time by mapping the child's
segments and pages to the same segments and pages used by the parent task. A
private copy of a page gets created only when one of the tasks writes to the page. By
using the WP flag and marking the shared pages as read-only, the supervisor can
detect an attempt to write to a page, and can copy the page at that time.
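The rules of Sections 5.11.2 and 5.11.3 can be condensed into a single predicate. This is a sketch with a hypothetical function name; user_page and writable stand for the effective U/S and R/W attributes of the page, and the execute-disable bit and the overrides of Section 5.11.5 are not modeled.

```c
#include <stdbool.h>

/* Returns true if the access is permitted, false if it would cause a
   page-fault exception under the U/S and R/W rules.
   user_page: U/S flag is 1; writable: R/W flag is 1;
   cr0_wp: state of the WP flag in CR0. */
bool page_access_allowed(bool user_page, bool writable,
                         unsigned cpl, bool cr0_wp, bool is_write)
{
    bool supervisor_mode = cpl < 3;   /* CPL 0, 1, or 2 */

    if (supervisor_mode) {
        /* Supervisor mode can access all pages. Writes to read-only
           pages are blocked only when CR0.WP = 1. */
        return !is_write || writable || !cr0_wp;
    }
    /* User mode: only user pages, and writes only to read/write pages. */
    if (!user_page)
        return false;
    return !is_write || writable;
}
```

In a copy-on-write scheme as described above, the kernel would run with CR0.WP = 1 and mark shared pages read-only, so that in this model a write from any CPL returns false, which corresponds to the page fault the supervisor uses to trigger the copy.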
5.11.4 Combining Protection of Both Levels of Page Tables
For any one page, the protection attributes of its page-directory entry (first-level
page table) may differ from those of its page-table entry (second-level page table).
The processor checks the protection for a page in both its page-directory and
page-table entries. Table 5-3 shows the protection provided by the possible
combinations of protection attributes when the WP flag is clear.
5.11.5 Overrides to Page Protection
The following types of memory accesses are checked as if they are privilege-level 0
accesses, regardless of the CPL at which the processor is currently operating:
• Access to segment descriptors in the GDT, LDT, or IDT.
• Access to an inner-privilege-level stack during an inter-privilege-level call or a
call to an exception or interrupt handler, when a change of privilege level occurs.
5.12 COMBINING PAGE AND SEGMENT PROTECTION
When paging is enabled, the processor evaluates segment protection first, then
evaluates page protection. If the processor detects a protection violation at either
the segment level or the page level, the memory access is not carried out and an
exception is generated. If an exception is generated by segmentation, no paging
exception is generated.
Page-level protections cannot be used to override segment-level protection. For
example, a code segment is by definition not writable. If a code segment is paged,
setting the R/W flag for the pages to read-write does not make the pages writable.
Attempts to write into the pages will be blocked by segment-level protection checks.
Page-level protection can be used to enhance segment-level protection. For
example, if a large read-write data segment is paged, the page-protection
mechanism can be used to write-protect individual pages.
Table 5-3. Combined Page-Directory and Page-Table Protection

Page-Directory Entry        Page-Table Entry            Combined Effect
Privilege    Access Type    Privilege    Access Type    Privilege    Access Type
User         Read-Only      User         Read-Only      User         Read-Only
User         Read-Only      User         Read-Write     User         Read-Only
User         Read-Write     User         Read-Only      User         Read-Only
User         Read-Write     User         Read-Write     User         Read/Write
User         Read-Only      Supervisor   Read-Only      Supervisor   Read/Write*
User         Read-Only      Supervisor   Read-Write     Supervisor   Read/Write*
User         Read-Write     Supervisor   Read-Only      Supervisor   Read/Write*
User         Read-Write     Supervisor   Read-Write     Supervisor   Read/Write
Supervisor   Read-Only      User         Read-Only      Supervisor   Read/Write*
Supervisor   Read-Only      User         Read-Write     Supervisor   Read/Write*
Supervisor   Read-Write     User         Read-Only      Supervisor   Read/Write*
Supervisor   Read-Write     User         Read-Write     Supervisor   Read/Write
Supervisor   Read-Only      Supervisor   Read-Only      Supervisor   Read/Write*
Supervisor   Read-Only      Supervisor   Read-Write     Supervisor   Read/Write*
Supervisor   Read-Write     Supervisor   Read-Only      Supervisor   Read/Write*
Supervisor   Read-Write     Supervisor   Read-Write     Supervisor   Read/Write

NOTE:
* If CR0.WP = 1, access type is determined by the R/W flags of the page-directory and
page-table entries. If CR0.WP = 0, supervisor privilege permits read-write access.
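The pattern in Table 5-3 can be summarized as a logical AND of the flags at the two levels: the page is a user page only if both entries are marked user, and it is writable by the R/W flags only if both entries are marked read-write. The sketch below expresses that reading of the table; the type page_attr_t and the function combine_attrs are hypothetical names, and the CR0.WP = 0 case (where supervisor privilege permits read-write access regardless of R/W) is left to the caller.

```c
#include <stdbool.h>

/* Hypothetical summary of one paging-structure entry's protection bits. */
typedef struct {
    bool user;      /* U/S flag: true = user, false = supervisor */
    bool writable;  /* R/W flag: true = read-write, false = read-only */
} page_attr_t;

/* Combine page-directory and page-table attributes. The result is a
   user page only if both levels say user, and writable (by the R/W
   flags) only if both levels say read-write. */
page_attr_t combine_attrs(page_attr_t pde, page_attr_t pte)
{
    page_attr_t combined;
    combined.user = pde.user && pte.user;
    combined.writable = pde.writable && pte.writable;
    return combined;
}
```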
5.13 PAGE-LEVEL PROTECTION AND EXECUTE-DISABLE BIT
In addition to page-level protection offered by the U/S and R/W flags, paging
structures used with PAE paging and IA-32e paging (see Chapter 4) provide the execute-
disable bit. This bit offers additional protection for data pages.
An Intel 64 or IA-32 processor with the execute-disable bit capability can prevent
data pages from being used by malicious software to execute code. This capability is
provided in:
• 32-bit protected mode with PAE enabled.
• IA-32e mode.
While the execute-disable bit capability does not introduce new instructions, it does
require operating systems to use a PAE-enabled environment and establish a page-
granular protection policy for memory pages.
If the execute-disable bit of a memory page is set, that page can be used only as
data. An attempt to execute code from a memory page with the execute-disable bit
set causes a page-fault exception.
The execute-disable capability is supported only with PAE paging and IA-32e paging.
It is not supported with 32-bit paging. Existing page-level protection mechanisms
(see Section 5.11, "Page-Level Protection") continue to apply to memory pages
independent of the execute-disable setting.
5.13.1 Detecting and Enabling the Execute-Disable Capability
Software can detect the presence of the execute-disable capability using the CPUID
instruction. CPUID.80000001H:EDX.NX [bit 20] = 1 indicates the capability is
available.
If the capability is available, software can enable it by setting IA32_EFER.NXE[bit 11]
to 1. IA32_EFER is available if CPUID.80000001H:EDX[bit 20 or 29] = 1.
If the execute-disable capability is not available, a write to set IA32_EFER.NXE
produces a #GP exception. See Table 5-4.
Table 5-4. Extended Feature Enable MSR (IA32_EFER)

Bits     Field
63:12    Reserved
11       Execute-disable bit enable (NXE)
10       IA-32e mode active (LMA)
9        Reserved
8        IA-32e mode enable (LME)
7:1      Reserved
0        SysCall enable (SCE)
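The detection and enabling steps reduce to two bit tests and one bit set. The sketch below works on register values supplied by the caller, so it does not itself execute CPUID or access the MSR (reading and writing IA32_EFER requires RDMSR/WRMSR at privilege level 0). The function and macro names are hypothetical; the bit positions are those given in the text and in Table 5-4.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_EXT_EDX_NX      (1u << 20)   /* CPUID.80000001H:EDX bit 20 */
#define CPUID_EXT_EDX_INTEL64 (1u << 29)   /* CPUID.80000001H:EDX bit 29 */
#define IA32_EFER_NXE         (1ull << 11) /* execute-disable bit enable */

/* True if the EDX value returned by CPUID leaf 80000001H reports the
   execute-disable capability. */
bool nx_supported(uint32_t cpuid_80000001_edx)
{
    return (cpuid_80000001_edx & CPUID_EXT_EDX_NX) != 0;
}

/* True if IA32_EFER exists, per the text: bit 20 or bit 29 is set. */
bool efer_available(uint32_t cpuid_80000001_edx)
{
    return (cpuid_80000001_edx &
            (CPUID_EXT_EDX_NX | CPUID_EXT_EDX_INTEL64)) != 0;
}

/* Returns the IA32_EFER value to write back with NXE set, leaving the
   other fields unchanged. Only valid when nx_supported() is true;
   otherwise the write would produce #GP. */
uint64_t efer_with_nxe(uint64_t efer)
{
    return efer | IA32_EFER_NXE;
}
```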
5.13.2 Execute-Disable Page Protection
The execute-disable bit in the paging structures enhances page protection for data
pages. Instructions cannot be fetched from a memory page if IA32_EFER.NXE = 1
and the execute-disable bit is set in any of the paging-structure entries used to map
the page. Table 5-5 lists the valid usage of a page in relation to the value of the execute-
disable bit (bit 63) of the corresponding entry in each level of the paging structures.
Execute-disable protection can be activated using the execute-disable bit at any level
of the paging structure, irrespective of the corresponding entry in other levels. When
execute-disable protection is not activated, the page can be used as code or data.
In legacy PAE-enabled mode, Table 5-6 and Table 5-7 show the effect of setting the
execute-disable bit for code and data pages.
Table 5-5. IA-32e Mode Page Level Protection Matrix with Execute-Disable Bit Capability

Execute Disable Bit Value (Bit 63)                        Valid Usage
PML4          PDP           PDE           PTE
Bit 63 = 1    *             *             *             Data
*             Bit 63 = 1    *             *             Data
*             *             Bit 63 = 1    *             Data
*             *             *             Bit 63 = 1    Data
Bit 63 = 0    Bit 63 = 0    Bit 63 = 0    Bit 63 = 0    Data/Code

NOTES:
* Value not checked.
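Table 5-5 amounts to a logical OR across the levels: the page is data-only if bit 63 is set in any entry used to map it. A minimal sketch, assuming IA32_EFER.NXE = 1 and using the hypothetical function name execute_disabled, where entries holds the paging-structure entries walked for one linear address (for example PML4E, PDPTE, PDE, PTE):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGING_ENTRY_XD (1ull << 63)  /* execute-disable bit */

/* Assumes IA32_EFER.NXE = 1. Returns true if instruction fetches from
   the mapped page are blocked, that is, if any entry used in the
   translation has bit 63 set. */
bool execute_disabled(const uint64_t *entries, size_t levels)
{
    for (size_t i = 0; i < levels; i++) {
        if (entries[i] & PAGING_ENTRY_XD)
            return true;
    }
    return false;
}
```

The same function covers the legacy PAE cases by passing two entries (PDE and PTE) for a 4-KByte page or one entry (PDE) for a 2-MByte page.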
5.13.3 Reserved Bit Checking
The processor enforces reserved bit checking in paging data structure entries. The
bits being checked vary with paging mode and may vary with the size of the physical
address space.
Table 5-8 shows the reserved bits that are checked when the execute disable bit
capability is enabled (CR4.PAE = 1 and IA32_EFER.NXE = 1). Table 5-8 and Table 5-9
show the following paging modes:
• Non-PAE 4-KByte paging: 4-KByte-page only paging (CR4.PAE = 0,
CR4.PSE = 0).
• PSE36: 4-KByte and 4-MByte pages (CR4.PAE = 0, CR4.PSE = 1).
• PAE: 4-KByte and 2-MByte pages (CR4.PAE = 1, CR4.PSE = X).
The reserved bit checking depends on the physical address size supported by the
implementation, which is reported in CPUID.80000008H. See the table note.
Table 5-6. Legacy PAE-Enabled 4-KByte Page Level Protection Matrix with Execute-Disable
Bit Capability

Execute Disable Bit Value (Bit 63)    Valid Usage
PDE           PTE
Bit 63 = 1    *                       Data
*             Bit 63 = 1              Data
Bit 63 = 0    Bit 63 = 0              Data/Code

NOTE:
* Value not checked.

Table 5-7. Legacy PAE-Enabled 2-MByte Page Level Protection with Execute-Disable Bit
Capability

Execute Disable Bit Value (Bit 63)    Valid Usage
PDE
Bit 63 = 1                            Data
Bit 63 = 0                            Data/Code
If the execute disable bit capability is not enabled or not available, reserved bit checking
in 64-bit mode includes bit 63 and additional bits. This and reserved bit checking for
legacy 32-bit paging modes are shown in Table 5-9.
Table 5-8. IA-32e Mode Page Level Protection Matrix with Execute-Disable Bit
Capability Enabled

Mode     Paging Mode                     Check Bits
32-bit   4-KByte paging (non-PAE)        No reserved bits checked
         PSE36 - PDE, 4-MByte page       Bit [21]
         PSE36 - PDE, 4-KByte page       No reserved bits checked
         PSE36 - PTE                     No reserved bits checked
         PAE - PDP table entry           Bits [63:MAXPHYADDR] & [8:5] & [2:1] *
         PAE - PDE, 2-MByte page         Bits [62:MAXPHYADDR] & [20:13] *
         PAE - PDE, 4-KByte page         Bits [62:MAXPHYADDR] *
         PAE - PTE                       Bits [62:MAXPHYADDR] *
64-bit   PML4E                           Bits [51:MAXPHYADDR] *
         PDPTE                           Bits [51:MAXPHYADDR] *
         PDE, 2-MByte page               Bits [51:MAXPHYADDR] & [20:13] *
         PDE, 4-KByte page               Bits [51:MAXPHYADDR] *
         PTE                             Bits [51:MAXPHYADDR] *

NOTES:
* MAXPHYADDR is the maximum physical address size and is indicated by
CPUID.80000008H:EAX[bits 7-0].
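As an illustration of how the [51:MAXPHYADDR] range in the 64-bit rows translates into a bit mask, the following sketch builds the mask from the MAXPHYADDR value reported in CPUID.80000008H:EAX[bits 7-0]. The function names are hypothetical, and only the 64-bit PML4E, PDPTE, 4-KByte PDE, and PTE rows with the execute-disable capability enabled are modeled (no [20:13] term, no bit 63).

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract MAXPHYADDR from the EAX value of CPUID leaf 80000008H. */
unsigned maxphyaddr_from_eax(uint32_t cpuid_80000008_eax)
{
    return cpuid_80000008_eax & 0xFFu;
}

/* Mask with ones in bit positions 51 down to MAXPHYADDR inclusive.
   Returns 0 when MAXPHYADDR is 52 or more (no reserved bits in range). */
uint64_t reserved_mask_51_to_maxphyaddr(unsigned maxphyaddr)
{
    if (maxphyaddr >= 52)
        return 0;
    uint64_t bits_51_to_0 = (1ull << 52) - 1;
    uint64_t bits_below = (1ull << maxphyaddr) - 1;
    return bits_51_to_0 & ~bits_below;
}

/* True if a 64-bit-mode entry has any of the checked reserved bits set. */
bool entry_has_reserved_bits(uint64_t entry, unsigned maxphyaddr)
{
    return (entry & reserved_mask_51_to_maxphyaddr(maxphyaddr)) != 0;
}
```

For MAXPHYADDR = 36, the mask covers bits 51 through 36, which is 000FFFF000000000H.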
5.13.4 Exception Handling
When the execute disable bit capability is enabled (IA32_EFER.NXE = 1), conditions for
a page fault to occur include the same conditions that apply to an Intel 64 or IA-32
processor without the execute disable bit capability plus the following new condition: an
instruction fetch to a linear address that translates to a physical address in a memory
page that has the execute-disable bit set.
An Execute Disable Bit page fault can occur at all privilege levels. It can occur on any
instruction fetch, including (but not limited to): near branches, far branches,
CALL/RET/INT/IRET execution, sequential instruction fetches, and task switches. The
execute-disable bit in the page translation mechanism is checked only when:
• IA32_EFER.NXE = 1.
• The instruction translation look-aside buffer (ITLB) is loaded with a page that is
not already present in the ITLB.
Table 5-9. Reserved Bit Checking With Execute-Disable Bit Capability Not Enabled

Mode     Paging Mode                     Check Bits
32-bit   4-KByte paging (non-PAE)        No reserved bits checked
         PSE36 - PDE, 4-MByte page       Bit [21]
         PSE36 - PDE, 4-KByte page       No reserved bits checked
         PSE36 - PTE                     No reserved bits checked
         PAE - PDP table entry           Bits [63:MAXPHYADDR] & [8:5] & [2:1]*
         PAE - PDE, 2-MByte page         Bits [63:MAXPHYADDR] & [20:13]*
         PAE - PDE, 4-KByte page         Bits [63:MAXPHYADDR]*
         PAE - PTE                       Bits [63:MAXPHYADDR]*
64-bit   PML4E                           Bit [63], bits [51:MAXPHYADDR]*
         PDPTE                           Bit [63], bits [51:MAXPHYADDR]*
         PDE, 2-MByte page               Bit [63], bits [51:MAXPHYADDR] & [20:13]*
         PDE, 4-KByte page               Bit [63], bits [51:MAXPHYADDR]*
         PTE                             Bit [63], bits [51:MAXPHYADDR]*

NOTES:
* MAXPHYADDR is the maximum physical address size and is indicated by
CPUID.80000008H:EAX[bits 7-0].
CHAPTER 6
INTERRUPT AND EXCEPTION HANDLING
This chapter describes the interrupt and exception-handling mechanism when
operating in protected mode on an Intel 64 or IA-32 processor. Most of the information
provided here also applies to interrupt and exception mechanisms used in real-
address mode, virtual-8086 mode, and 64-bit mode.
Chapter 17, "8086 Emulation," describes information specific to interrupt and
exception mechanisms in real-address and virtual-8086 mode. Section 6.14, "Exception
and Interrupt Handling in 64-bit Mode," describes information specific to interrupt
and exception mechanisms in IA-32e mode and 64-bit sub-mode.
6.1 INTERRUPT AND EXCEPTION OVERVIEW
Interrupts and exceptions are events that indicate that a condition exists somewhere
in the system, the processor, or within the currently executing program or task that
requires the attention of a processor. They typically result in a forced transfer of
execution from the currently running program or task to a special software routine or
task called an interrupt handler or an exception handler. The action taken by a
processor in response to an interrupt or exception is referred to as servicing or
handling the interrupt or exception.
Interrupts occur at random times during the execution of a program, in response to
signals from hardware. System hardware uses interrupts to handle events external
to the processor, such as requests to service peripheral devices. Software can also
generate interrupts by executing the INT n instruction.
Exceptions occur when the processor detects an error condition while executing an
instruction, such as division by zero. The processor detects a variety of error
conditions including protection violations, page faults, and internal machine faults. The
machine-check architecture of the Pentium 4, Intel Xeon, P6 family, and Pentium
processors also permits a machine-check exception to be generated when internal
hardware errors and bus errors are detected.
When an interrupt is received or an exception is detected, the currently running
procedure or task is suspended while the processor executes an interrupt or
exception handler. When execution of the handler is complete, the processor resumes
execution of the interrupted procedure or task. The resumption of the interrupted
procedure or task happens without loss of program continuity, unless recovery from
an exception was not possible or an interrupt caused the currently running program
to be terminated.
This chapter describes the processor's interrupt and exception-handling mechanism,
when operating in protected mode. A description of the exceptions and the conditions
that cause them to be generated is given at the end of this chapter.
6.2 EXCEPTION AND INTERRUPT VECTORS
To aid in handling exceptions and interrupts, each architecturally defined exception
and each interrupt condition requiring special handling by the processor is assigned
a unique identification number, called a vector number. The processor uses the vector
number assigned to an exception or interrupt as an index into the interrupt
descriptor table (IDT). The table provides the entry point to an exception or interrupt
handler (see Section 6.10, "Interrupt Descriptor Table (IDT)").
The allowable range for vector numbers is 0 to 255. Vector numbers in the range 0
through 31 are reserved by the Intel 64 and IA-32 architectures for architecture-
defined exceptions and interrupts. Not all of the vector numbers in this range have a
currently defined function. The unassigned vector numbers in this range are
reserved. Do not use the reserved vector numbers.
Vector numbers in the range 32 to 255 are designated as user-defined interrupts and
are not reserved by the Intel 64 and IA-32 architectures. These interrupts are
generally assigned to external I/O devices to enable those devices to send interrupts to the
processor through one of the external hardware interrupt mechanisms (see Section
6.3, "Sources of Interrupts").
Table 6-1 shows vector number assignments for architecturally defined exceptions
and for the NMI interrupt. This table gives the exception type (see Section 6.5,
"Exception Classifications") and indicates whether an error code is saved on the stack
for the exception. The source of each predefined exception and the NMI interrupt is
also given.
6.3 SOURCES OF INTERRUPTS
The processor receives interrupts from two sources:
• External (hardware generated) interrupts.
• Software-generated interrupts.
6.3.1 External Interrupts
External interrupts are received through pins on the processor or through the local
APIC. The primary interrupt pins on Pentium 4, Intel Xeon, P6 family, and Pentium
processors are the LINT[1:0] pins, which are connected to the local APIC (see
Chapter 10, "Advanced Programmable Interrupt Controller (APIC)"). When the local
APIC is enabled, the LINT[1:0] pins can be programmed through the APIC's local
vector table (LVT) to be associated with any of the processor's exception or interrupt
vectors.
When the local APIC is global/hardware disabled, these pins are configured as INTR
and NMI pins, respectively. Asserting the INTR pin signals the processor that an
external interrupt has occurred. The processor reads from the system bus the interrupt
vector number provided by an external interrupt controller, such as an 8259A
(see Section 6.2, "Exception and Interrupt Vectors"). Asserting the NMI pin signals a
non-maskable interrupt (NMI), which is assigned to interrupt vector 2.
Table 6-1. Protected-Mode Exceptions and Interrupts
Vector No.  Mnemonic  Description                                  Type        Error Code  Source
0           #DE       Divide Error                                 Fault       No          DIV and IDIV instructions.
1           #DB       RESERVED                                     Fault/Trap  No          For Intel use only.
2                     NMI Interrupt                                Interrupt   No          Nonmaskable external interrupt.
3           #BP       Breakpoint                                   Trap        No          INT 3 instruction.
4           #OF       Overflow                                     Trap        No          INTO instruction.
5           #BR       BOUND Range Exceeded                         Fault       No          BOUND instruction.
6           #UD       Invalid Opcode (Undefined Opcode)            Fault       No          UD2 instruction or reserved opcode. (Note 1)
7           #NM       Device Not Available (No Math Coprocessor)   Fault       No          Floating-point or WAIT/FWAIT instruction.
8           #DF       Double Fault                                 Abort       Yes (zero)  Any instruction that can generate an exception, an NMI, or an INTR.
9                     Coprocessor Segment Overrun (reserved)       Fault       No          Floating-point instruction. (Note 2)
10          #TS       Invalid TSS                                  Fault       Yes         Task switch or TSS access.
11          #NP       Segment Not Present                          Fault       Yes         Loading segment registers or accessing system segments.
12          #SS       Stack-Segment Fault                          Fault       Yes         Stack operations and SS register loads.
13          #GP       General Protection                           Fault       Yes         Any memory reference and other protection checks.
14          #PF       Page Fault                                   Fault       Yes         Any memory reference.
15                    (Intel reserved. Do not use.)                            No
16          #MF       x87 FPU Floating-Point Error (Math Fault)    Fault       No          x87 FPU floating-point or WAIT/FWAIT instruction.
17          #AC       Alignment Check                              Fault       Yes (zero)  Any data reference in memory. (Note 3)
The processor's local APIC is normally connected to a system-based I/O APIC. Here,
external interrupts received at the I/O APIC's pins can be directed to the local APIC
through the system bus (Pentium 4, Intel Core Duo, Intel Core 2, Intel Atom, and
Intel Xeon processors) or the APIC serial bus (P6 family and Pentium processors).
The I/O APIC determines the vector number of the interrupt and sends this number
to the local APIC. When a system contains multiple processors, processors can also
send interrupts to one another by means of the system bus (Pentium 4, Intel Core
Duo, Intel Core 2, Intel Atom, and Intel Xeon processors) or the APIC serial bus (P6
family and Pentium processors).
The LINT[1:0] pins are not available on the Intel486 processor and earlier Pentium
processors that do not contain an on-chip local APIC. These processors have dedicated
NMI and INTR pins. With these processors, external interrupts are typically
generated by a system-based interrupt controller (8259A), with the interrupts being
signaled through the INTR pin.
Note that several other pins on the processor can cause a processor interrupt to
occur. However, these interrupts are not handled by the interrupt and exception
mechanism described in this chapter. These pins include the RESET#, FLUSH#,
STPCLK#, SMI#, R/S#, and INIT# pins. Whether they are included on a particular
processor is implementation dependent. Pin functions are described in the data
books for the individual processors. The SMI# pin is described in Chapter 26,
"System Management."
Table 6-1. Protected-Mode Exceptions and Interrupts (Contd.)
Vector No.  Mnemonic  Description                                  Type        Error Code  Source
18          #MC       Machine Check                                Abort       No          Error codes (if any) and source are model dependent. (Note 4)
19          #XM       SIMD Floating-Point Exception                Fault       No          SSE/SSE2/SSE3 floating-point instructions. (Note 5)
20-31                 Intel reserved. Do not use.
32-255                User Defined (Non-reserved) Interrupts       Interrupt               External interrupt or INT n instruction.
NOTES:
1. The UD2 instruction was introduced in the Pentium Pro processor.
2. Processors after the Intel386 processor do not generate this exception.
3. This exception was introduced in the Intel486 processor.
4. This exception was introduced in the Pentium processor and enhanced in the P6 family processors.
5. This exception was introduced in the Pentium III processor.
6.3.2 Maskable Hardware Interrupts
Any external interrupt that is delivered to the processor by means of the INTR pin or
through the local APIC is called a maskable hardware interrupt. Maskable hardware
interrupts that can be delivered through the INTR pin include all IA-32 architecture
defined interrupt vectors from 0 through 255; those that can be delivered through
the local APIC include interrupt vectors 16 through 255.
The IF flag in the EFLAGS register permits all maskable hardware interrupts to be
masked as a group (see Section 6.8.1, "Masking Maskable Hardware Interrupts").
Note that when interrupts 0 through 15 are delivered through the local APIC, the
APIC indicates the receipt of an illegal vector.
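As a rough illustration of the vector ranges accepted on each delivery path, the check below encodes the rules stated in this section. The enum and function names are hypothetical; the sketch does not model how the APIC reports the illegal vector.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the two delivery paths of maskable hardware interrupts. */
enum delivery_path { VIA_INTR_PIN, VIA_LOCAL_APIC };

/* Returns true if the vector is legal on the given path:
 * INTR pin: 0 through 255; local APIC: 16 through 255. */
bool vector_legal_on_path(enum delivery_path path, unsigned vector)
{
    assert(path == VIA_INTR_PIN || path == VIA_LOCAL_APIC);
    if (vector > 255u)
        return false;
    if (path == VIA_LOCAL_APIC)
        return vector >= 16u;   /* vectors 0 through 15 are illegal via the APIC */
    return true;
}
```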
6.3.3 Software-Generated Interrupts
The INT n instruction permits interrupts to be generated from within software by
supplying an interrupt vector number as an operand. For example, the INT 35
instruction forces an implicit call to the interrupt handler for interrupt 35.
Any of the interrupt vectors from 0 to 255 can be used as a parameter in this instruction.
If the processor's predefined NMI vector is used, however, the response of the
processor will not be the same as it would be from an NMI interrupt generated in the
normal manner. If vector number 2 (the NMI vector) is used in this instruction, the
NMI interrupt handler is called, but the processor's NMI-handling hardware is not
activated.
Interrupts generated in software with the INT n instruction cannot be masked by the
IF flag in the EFLAGS register.
6.4 SOURCES OF EXCEPTIONS
The processor receives exceptions from three sources:
• Processor-detected program-error exceptions.
• Software-generated exceptions.
• Machine-check exceptions.
6.4.1 Program-Error Exceptions
The processor generates one or more exceptions when it detects program errors
during the execution of an application program or the operating system or executive.
Intel 64 and IA-32 architectures define a vector number for each processor-detectable
exception. Exceptions are classified as faults, traps, and aborts (see Section
6.5, "Exception Classifications").
6.4.2 Software-Generated Exceptions
The INTO, INT 3, and BOUND instructions permit exceptions to be generated in software.
These instructions allow checks for exception conditions to be performed at
points in the instruction stream. For example, INT 3 causes a breakpoint exception to
be generated.
The INT n instruction can be used to emulate exceptions in software, but there is a
limitation. If INT n provides a vector for one of the architecturally-defined exceptions,
the processor generates an interrupt to the correct vector (to access the
exception handler) but does not push an error code on the stack. This is true even if
the associated hardware-generated exception normally produces an error code. The
exception handler will still attempt to pop an error code from the stack while handling
the exception. Because no error code was pushed, the handler will pop off and
discard the EIP instead (in place of the missing error code). This sends the return to
the wrong location.
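The missing-error-code problem can be seen in a small simulation. This is a sketch under stated assumptions, not processor code: the stack is modeled as an array of 32-bit slots, the handler is assumed to run at the same privilege level (so only EFLAGS, CS, and EIP are pushed), and all names and sample values (sim_stack, return_eip_after, 0x1000, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STACK_WORDS 8

/* Toy stack that grows toward lower indices, like the processor stack. */
struct sim_stack {
    uint32_t slot[STACK_WORDS];
    int top;                       /* index of the most recently pushed slot */
};

static void push(struct sim_stack *s, uint32_t v)
{
    assert(s->top > 0);
    s->slot[--s->top] = v;
}

static uint32_t pop(struct sim_stack *s)
{
    assert(s->top < STACK_WORDS);
    return s->slot[s->top++];
}

/* Build the frame: EFLAGS, CS, EIP, then the error code only if one is pushed. */
static void deliver(struct sim_stack *s, uint32_t eflags, uint32_t cs,
                    uint32_t eip, bool push_error_code)
{
    s->top = STACK_WORDS;
    push(s, eflags);
    push(s, cs);
    push(s, eip);
    if (push_error_code)
        push(s, 0u);               /* sample error code value */
}

/* A handler written for a hardware exception that pushes an error code:
 * it discards one slot, then IRET loads EIP from the next slot. */
static uint32_t handler_return_eip(struct sim_stack *s)
{
    (void)pop(s);                  /* what the handler believes is the error code */
    return pop(s);                 /* value that IRET loads into EIP */
}

/* EIP that execution resumes at; the interrupted EIP is 0x1000, CS is 0x08. */
uint32_t return_eip_after(bool via_int_n)
{
    struct sim_stack s;
    deliver(&s, 0x202u, 0x08u, 0x1000u, !via_int_n);
    return handler_return_eip(&s);
}
```

With a hardware-generated exception the handler resumes at 0x1000. When the same handler is reached through INT n, it discards the saved EIP and IRET loads the saved CS value (0x08 here) into EIP.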
6.4.3 Machine-Check Exceptions
The P6 family and Pentium processors provide both internal and external machine-check
mechanisms for checking the operation of the internal chip hardware and bus
transactions. These mechanisms are implementation dependent. When a machine-check
error is detected, the processor signals a machine-check exception (vector 18)
and returns an error code.
See Chapter 6, "Interrupt 18 - Machine-Check Exception (#MC)," and Chapter 15,
"Machine-Check Architecture," for more information about the machine-check
mechanism.
6.5 EXCEPTION CLASSIFICATIONS
Exceptions are classified as faults, traps, or aborts depending on the way they are
reported and whether the instruction that caused the exception can be restarted
without loss of program or task continuity.
• Faults: A fault is an exception that can generally be corrected and that, once
  corrected, allows the program to be restarted with no loss of continuity. When a
  fault is reported, the processor restores the machine state to the state prior to
  the beginning of execution of the faulting instruction. The return address (saved
  contents of the CS and EIP registers) for the fault handler points to the faulting
  instruction, rather than to the instruction following the faulting instruction.
• Traps: A trap is an exception that is reported immediately following the
  execution of the trapping instruction. Traps allow execution of a program or task
  to be continued without loss of program continuity. The return address for the
  trap handler points to the instruction to be executed after the trapping
  instruction.
• Aborts: An abort is an exception that does not always report the precise
  location of the instruction causing the exception and does not allow a restart of
  the program or task that caused the exception. Aborts are used to report severe
  errors, such as hardware errors and inconsistent or illegal values in system
  tables.
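The difference between the return addresses saved for faults and traps can be sketched as follows. The function and enum names are hypothetical. Aborts are left out deliberately, because the text does not guarantee a precise return location for them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; only faults and traps have a well-defined return address. */
enum exception_class { EXC_FAULT, EXC_TRAP };

/* insn_eip: address of the faulting or trapping instruction.
 * next_eip: address of the instruction to be executed after it
 *           (for a taken JMP, this is the jump destination). */
uint32_t saved_return_eip(enum exception_class cls,
                          uint32_t insn_eip, uint32_t next_eip)
{
    assert(cls == EXC_FAULT || cls == EXC_TRAP);
    return (cls == EXC_FAULT) ? insn_eip : next_eip;
}
```

A fault handler that returns with IRET therefore re-executes the faulting instruction, while a trap handler continues after it.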
NOTE
One exception subset normally reported as a fault is not restartable.
Such exceptions result in loss of some processor state. For example,
executing a POPAD instruction where the stack frame crosses over
the end of the stack segment causes a fault to be reported. In this
situation, the exception handler sees that the instruction pointer
(CS:EIP) has been restored as if the POPAD instruction had not been
executed. However, internal processor state (the general-purpose
registers) will have been modified. Such cases are considered
programming errors. An application causing this class of exceptions
should be terminated by the operating system.
6.6 PROGRAM OR TASK RESTART
To allow the restarting of a program or task following the handling of an exception or
an interrupt, all exceptions (except aborts) are guaranteed to be reported on
an instruction boundary. All interrupts are guaranteed to be taken on an instruction
boundary.
For fault-class exceptions, the return instruction pointer (saved when the processor
generates an exception) points to the faulting instruction. So, when a program or task
is restarted following the handling of a fault, the faulting instruction is restarted
(re-executed). Restarting the faulting instruction is commonly used to handle exceptions
that are generated when access to an operand is blocked. The most common example
of this type of fault is a page-fault exception (#PF) that occurs when a program or
task references an operand located on a page that is not in memory. When a page-fault
exception occurs, the exception handler can load the page into memory and
resume execution of the program or task by restarting the faulting instruction. To
ensure that the restart is handled transparently to the currently executing program or
task, the processor saves the necessary registers and stack pointers to allow a restart
to the state prior to the execution of the faulting instruction.
For trap-class exceptions, the return instruction pointer points to the instruction
following the trapping instruction. If a trap is detected during an instruction which
transfers execution, the return instruction pointer reflects the transfer. For example,
if a trap is detected while executing a JMP instruction, the return instruction pointer
points to the destination of the JMP instruction, not to the next address past the JMP
instruction. All trap exceptions allow program or task restart with no loss of continuity.
For example, the overflow exception is a trap exception. Here, the return
instruction pointer points to the instruction following the INTO instruction that tested
the EFLAGS.OF (overflow) flag. The trap handler for this exception resolves the overflow
condition. Upon return from the trap handler, program or task execution continues at
the instruction following the INTO instruction.
The abort-class exceptions do not support reliable restarting of the program or task.
Abort handlers are designed to collect diagnostic information about the state of the
processor when the abort exception occurred and then shut down the application and
system as gracefully as possible.
Interrupts rigorously support restarting of interrupted programs and tasks without
loss of continuity. The return instruction pointer saved for an interrupt points to the
next instruction to be executed at the instruction boundary where the processor took
the interrupt. If the instruction just executed has a repeat prefix, the interrupt is
taken at the end of the current iteration with the registers set to execute the next
iteration.
The ability of a P6 family processor to speculatively execute instructions does not
affect the taking of interrupts by the processor. Interrupts are taken at instruction
boundaries located during the retirement phase of instruction execution, so they are
always taken in the "in-order" instruction stream. See Chapter 2, "Intel 64 and IA-32
Architectures," in the Intel 64 and IA-32 Architectures Software Developer's
Manual, Volume 1, for more information about the P6 family processors' microarchitecture
and its support for out-of-order instruction execution.
Note that the Pentium processor and earlier IA-32 processors also perform varying
amounts of prefetching and preliminary decoding. With these processors as well,
exceptions and interrupts are not signaled until actual "in-order" execution of the
instructions. For a given code sample, the signaling of exceptions occurs uniformly
when the code is executed on any family of IA-32 processors (except where new
exceptions or new opcodes have been defined).
6.7 NONMASKABLE INTERRUPT (NMI)
The nonmaskable interrupt (NMI) can be generated in either of two ways:
• External hardware asserts the NMI pin.
• The processor receives a message on the system bus (Pentium 4, Intel Core Duo,
  Intel Core 2, Intel Atom, and Intel Xeon processors) or the APIC serial bus (P6
  family and Pentium processors) with a delivery mode NMI.
When the processor receives an NMI from either of these sources, the processor
handles it immediately by calling the NMI handler pointed to by interrupt vector
number 2. The processor also invokes certain hardware conditions to ensure that no
other interrupts, including NMI interrupts, are received until the NMI handler has
completed executing (see Section 6.7.1, "Handling Multiple NMIs").
Also, when an NMI is received from either of the above sources, it cannot be masked
by the IF flag in the EFLAGS register.
It is possible to issue a maskable hardware interrupt (through the INTR pin) to vector
2 to invoke the NMI interrupt handler; however, this interrupt will not truly be an NMI
interrupt. A true NMI interrupt that activates the processor's NMI-handling hardware
can only be delivered through one of the mechanisms listed above.
6.7.1 Handling Multiple NMIs
While an NMI interrupt handler is executing, the processor disables additional calls to
the NMI handler until the next IRET instruction is executed. This blocking of subsequent
NMIs prevents stacking up calls to the NMI handler. It is recommended that the
NMI interrupt handler be accessed through an interrupt gate to disable maskable
hardware interrupts (see Section 6.8.1, "Masking Maskable Hardware Interrupts"). If
the NMI handler is a virtual-8086 task with an IOPL of less than 3, an IRET instruction
issued from the handler generates a general-protection exception (see Section
17.2.7, "Sensitive Instructions"). In this case, the NMI is unmasked before the
general-protection exception handler is invoked.
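The blocking rule can be modeled as a two-state machine. The sketch below is a simplification with hypothetical names (nmi_state, nmi_arrives, iret_executed). It only shows that the handler is not re-entered before the next IRET. Whether an NMI that arrives while blocked is held pending is outside this model, and the virtual-8086 special case is not represented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the processor's NMI blocking. */
struct nmi_state {
    bool blocked;            /* true from NMI handler entry until the next IRET */
    unsigned handler_calls;  /* number of times the NMI handler was entered */
};

/* An NMI arrives. Returns true if the NMI handler is called now. */
bool nmi_arrives(struct nmi_state *s)
{
    assert(s != NULL);
    if (s->blocked)
        return false;        /* additional calls to the handler are disabled */
    s->blocked = true;
    s->handler_calls++;
    return true;
}

/* The next IRET instruction re-enables calls to the NMI handler. */
void iret_executed(struct nmi_state *s)
{
    assert(s != NULL);
    s->blocked = false;
}
```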
6.8 ENABLING AND DISABLING INTERRUPTS
The processor inhibits the generation of some interrupts, depending on the state of
the processor and of the IF and RF flags in the EFLAGS register, as described in the
following sections.
6.8.1 Masking Maskable Hardware Interrupts
The IF flag can disable the servicing of maskable hardware interrupts received on the
processor's INTR pin or through the local APIC (see Section 6.3.2, "Maskable Hardware
Interrupts"). When the IF flag is clear, the processor inhibits interrupts delivered
to the INTR pin or through the local APIC from generating an internal interrupt
request; when the IF flag is set, interrupts delivered to the INTR pin or through the local
APIC are processed as normal external interrupts.
The IF flag does not affect non-maskable interrupts (NMIs) delivered to the NMI pin
or delivery mode NMI messages delivered through the local APIC, nor does it affect
processor generated exceptions. As with the other flags in the EFLAGS register, the
processor clears the IF flag in response to a hardware reset.
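The masking rule above can be written as a predicate. This is an illustrative sketch with hypothetical names. The bit position of IF (bit 9 of EFLAGS) is not stated in this section; it is taken from the EFLAGS layout described in Section 2.3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_IF (1u << 9)   /* interrupt-enable flag, bit 9 of EFLAGS */

/* Hypothetical classification of event sources for this sketch. */
enum event_kind { EVT_MASKABLE_HW, EVT_NMI, EVT_EXCEPTION };

/* Returns true if the event is inhibited by a clear IF flag.
 * Only maskable hardware interrupts are affected by IF. */
bool inhibited_by_if(uint32_t eflags, enum event_kind kind)
{
    assert(kind == EVT_MASKABLE_HW || kind == EVT_NMI || kind == EVT_EXCEPTION);
    return kind == EVT_MASKABLE_HW && (eflags & EFLAGS_IF) == 0u;
}
```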
The fact that the group of maskable hardware interrupts includes the reserved interrupt
and exception vectors 0 through 31 can potentially cause confusion. Architecturally,
when the IF flag is set, an interrupt for any of the vectors from 0 through 31 can
be delivered to the processor through the INTR pin and any of the vectors from 16
through 31 can be delivered through the local APIC. The processor will then generate
an interrupt and call the interrupt or exception handler pointed to by the vector
number. So for example, it is possible to invoke the page-fault handler through the
INTR pin (by means of vector 14); however, this is not a true page-fault exception. It
is an interrupt. As with the INT n instruction (see Section 6.4.2, "Software-Generated
Exceptions"), when an interrupt is generated through the INTR pin to an exception
vector, the processor does not push an error code on the stack, so the exception
handler may not operate correctly.
The IF flag can be set or cleared with the STI (set interrupt-enable flag) and CLI
(clear interrupt-enable flag) instructions, respectively. These instructions may be
executed only if the CPL is equal to or less than the IOPL. A general-protection exception
(#GP) is generated if they are executed when the CPL is greater than the IOPL.
(The effect of the IOPL on these instructions is modified slightly when the virtual
mode extension is enabled by setting the VME flag in control register CR4: see
Section 17.3, "Interrupt and Exception Handling in Virtual-8086 Mode." Behavior is
also impacted by the PVI flag: see Section 17.4, "Protected-Mode Virtual Interrupts.")
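The basic privilege check for CLI and STI can be sketched as below. The function name is hypothetical, and the sketch deliberately ignores the VME and PVI modifications mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if CLI or STI may execute; false means #GP is generated.
 * Assumes the VME and PVI flags are clear. */
bool cli_sti_permitted(unsigned cpl, unsigned iopl)
{
    assert(cpl <= 3u && iopl <= 3u);   /* privilege levels are 0 through 3 */
    return cpl <= iopl;
}
```

For example, with IOPL set to 0 only code running at CPL 0 can change the IF flag with these instructions.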
The IF flag is also affected by the following operations:
• The PUSHF instruction stores all flags on the stack, where they can be examined
  and modified. The POPF instruction can be used to load the modified flags back
  into the EFLAGS register.
• Task switches and the POPF and IRET instructions load the EFLAGS register;
  therefore, they can be used to modify the setting of the IF flag.
• When an interrupt is handled through an interrupt gate, the IF flag is automatically
  cleared, which disables maskable hardware interrupts. (If an interrupt is
  handled through a trap gate, the IF flag is not cleared.)
See the descriptions of the CLI, STI, PUSHF, POPF, and IRET instructions in Chapter
3, "Instruction Set Reference, A-M," in the Intel 64 and IA-32 Architectures Software
Developer's Manual, Volume 2A, for a detailed description of the operations
these instructions are allowed to perform on the IF flag.
6.8.2 Masking Instruction Breakpoints
The RF (resume) flag in the EFLAGS register controls the response of the processor
to instruction-breakpoint conditions (see the description of the RF flag in Section 2.3,
"System Flags and Fields in the EFLAGS Register").
When set, it prevents an instruction breakpoint from generating a debug exception
(#DB); when clear, instruction breakpoints will generate debug exceptions. The
primary function of the RF flag is to prevent the processor from going into a debug
exception loop on an instruction-breakpoint. See Section 16.3.1.1, "Instruction-Breakpoint
Exception Condition," for more information on the use of this flag.
6.8.3 Masking Exceptions and Interrupts When Switching Stacks
To switch to a different stack segment, software often uses a pair of instructions, for
example:
MOV SS, AX
MOV ESP, StackTop
If an interrupt or exception occurs after the segment selector has been loaded into
the SS register but before the ESP register has been loaded, these two parts of the
logical address into the stack space are inconsistent for the duration of the interrupt
or exception handler.
To prevent this situation, the processor inhibits interrupts, debug exceptions, and
single-step trap exceptions after either a MOV to SS instruction or a POP to SS
instruction, until the instruction boundary following the next instruction is reached.
All other faults may still be generated. If the LSS instruction is used to modify the
contents of the SS register (which is the recommended method of modifying this
register), this problem does not occur.
6.9 PRIORITY AMONG SIMULTANEOUS EXCEPTIONS AND
INTERRUPTS
If more than one exception or interrupt is pending at an instruction boundary, the
processor services them in a predictable order. Table 6-2 shows the priority among
classes of exception and interrupt sources.
Table 6-2. Priority Among Simultaneous Exceptions and Interrupts
Priority      Description
1 (Highest)   Hardware Reset and Machine Checks
              - RESET
              - Machine Check
2             Trap on Task Switch
              - T flag in TSS is set
3             External Hardware Interventions
              - FLUSH
              - STOPCLK
              - SMI
              - INIT
4             Traps on the Previous Instruction
              - Breakpoints
              - Debug Trap Exceptions (TF flag set or data/I-O breakpoint)
While the priority among the classes listed in Table 6-2 is consistent throughout the
architecture, the priority of exceptions within each class is implementation-dependent and may
vary from processor to processor. The processor first services a pending exception or
interrupt from the class which has the highest priority, transferring execution to the
first instruction of the handler. Lower priority exceptions are discarded; lower priority
interrupts are held pending. Discarded exceptions are re-generated when the interrupt
handler returns execution to the point in the program or task where the exceptions
and/or interrupts occurred.
Table 6-2. Priority Among Simultaneous Exceptions and Interrupts (Contd.)
Priority      Description
5             Nonmaskable Interrupts (NMI) (see Note 1)
6             Maskable Hardware Interrupts (see Note 1)
7             Code Breakpoint Fault
8             Faults from Fetching Next Instruction
              - Code-Segment Limit Violation
              - Code Page Fault
9             Faults from Decoding the Next Instruction
              - Instruction length > 15 bytes
              - Invalid Opcode
              - Coprocessor Not Available
10 (Lowest)   Faults on Executing an Instruction
              - Overflow
              - Bound error
              - Invalid TSS
              - Segment Not Present
              - Stack fault
              - General Protection
              - Data Page Fault
              - Alignment Check
              - x87 FPU Floating-point exception
              - SIMD floating-point exception
NOTE:
1. The Intel486 processor and earlier processors group nonmaskable and maskable interrupts in
the same priority class.
6.10 INTERRUPT DESCRIPTOR TABLE (IDT)
The interrupt descriptor table (IDT) associates each exception or interrupt vector
with a gate descriptor for the procedure or task used to service the associated exception
or interrupt. Like the GDT and LDTs, the IDT is an array of 8-byte descriptors (in
protected mode). Unlike the GDT, the first entry of the IDT may contain a descriptor.
To form an index into the IDT, the processor scales the exception or interrupt vector
by eight (the number of bytes in a gate descriptor). Because there are only 256 interrupt
or exception vectors, the IDT need not contain more than 256 descriptors. It can
contain fewer than 256 descriptors, because descriptors are required only for the
interrupt and exception vectors that may occur. All empty descriptor slots in the IDT
should have the present flag for the descriptor set to 0.
The base address of the IDT should be aligned on an 8-byte boundary to maximize
performance of cache line fills. The limit value is expressed in bytes and is added to
the base address to get the address of the last valid byte. A limit value of 0 results in
exactly 1 valid byte. Because IDT entries are always eight bytes long, the limit should
always be one less than an integral multiple of eight (that is, 8N - 1).
The IDT may reside anywhere in the linear address space. As shown in Figure 6-1,
the processor locates the IDT using the IDTR register. This register holds both a
32-bit base address and 16-bit limit for the IDT.
The LIDT (load IDT register) and SIDT (store IDT register) instructions load and store
the contents of the IDTR register, respectively. The LIDT instruction loads the IDTR
register with the base address and limit held in a memory operand. This instruction
can be executed only when the CPL is 0. It normally is used by the initialization code
of an operating system when creating an IDT. An operating system also may use it to
change from one IDT to another. The SIDT instruction copies the base and limit value
stored in IDTR to memory. This instruction can be executed at any privilege level.
If a vector references a descriptor beyond the limit of the IDT, a general-protection
exception (#GP) is generated.
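The arithmetic in this section (scaling the vector by eight, the 8N - 1 limit, and the limit check) can be sketched in C. The names are hypothetical. The bounds check assumes that the whole 8-byte descriptor must lie within the limit; the text above says only that a descriptor beyond the limit causes #GP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GATE_SIZE 8u   /* bytes per gate descriptor in protected mode */

/* Limit value for an IDT that holds n descriptors: 8N - 1. */
uint16_t idt_limit_for(unsigned n_descriptors)
{
    assert(n_descriptors >= 1u && n_descriptors <= 256u);
    return (uint16_t)(n_descriptors * GATE_SIZE - 1u);
}

/* Byte offset of a vector's descriptor from the IDT base address. */
uint32_t idt_offset_of(uint8_t vector)
{
    return (uint32_t)vector * GATE_SIZE;
}

/* True if the descriptor lies within the limit; false models a #GP.
 * Assumption: the last byte of the descriptor must not exceed the limit. */
bool idt_vector_in_limit(uint8_t vector, uint16_t limit)
{
    return idt_offset_of(vector) + (GATE_SIZE - 1u) <= limit;
}
```

An IDT with all 256 descriptors has a limit of 2047 (7FFH). An IDT that covers only vectors 0 through 31 has a limit of 255, so vector 32 falls outside it.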
NOTE
Because interrupts are delivered to the processor core only once, an
incorrectly configured IDT could result in incomplete interrupt
handling and/or the blocking of interrupt delivery.
IA-32 architecture rules need to be followed for setting up IDTR
base/limit/access fields and each field in the gate descriptors. The
same rules apply for the Intel 64 architecture. This includes implicit
referencing of the destination code segment through the GDT or LDT
and accessing the stack.
6.11 IDT DESCRIPTORS
The IDT may contain any of three kinds of gate descriptors:
• Task-gate descriptor
• Interrupt-gate descriptor
• Trap-gate descriptor
Figure 6-2 shows the formats for the task-gate, interrupt-gate, and trap-gate
descriptors. The format of a task gate used in an IDT is the same as that of a task
gate used in the GDT or an LDT (see Section 7.2.5, "Task-Gate Descriptor"). The task
gate contains the segment selector for a TSS for an exception and/or interrupt
handler task.
Interrupt and trap gates are very similar to call gates (see Section 5.8.3, "Call
Gates"). They contain a far pointer (segment selector and offset) that the processor
uses to transfer program execution to a handler procedure in an exception- or
interrupt-handler code segment. These gates differ in the way the processor handles the
IF flag in the EFLAGS register (see Section 6.12.1.2, "Flag Usage By Exception- or
Interrupt-Handler Procedure").
Figure 6-1. Relationship of the IDTR and IDT
[Figure: The IDTR register holds the IDT limit in bits 15:0 and the IDT base address in
bits 47:16. The base address locates the interrupt descriptor table (IDT), which holds
the gates for interrupt #1, #2, #3, ..., #n at byte offsets 0, 8, 16, ..., (n-1)×8.]
6.12 EXCEPTION AND INTERRUPT HANDLING
The processor handles calls to exception- and interrupt-handlers similar to the way it
handles calls with a CALL instruction to a procedure or a task. When responding to an
exception or interrupt, the processor uses the exception or interrupt vector as an
index to a descriptor in the IDT. If the index points to an interrupt gate or trap gate,
the processor calls the exception or interrupt handler in a manner similar to a CALL
to a call gate (see Section 5.8.2, "Gate Descriptors," through Section 5.8.6,
"Returning from a Called Procedure"). If the index points to a task gate, the processor
executes a task switch to the exception- or interrupt-handler task in a manner similar
to a CALL to a task gate (see Section 7.3, "Task Switching").
Figure 6-2. IDT Gate Descriptors
Task Gate
  Doubleword at byte offset 4:  bits 31:16  Reserved
                                bit 15      P
                                bits 14:13  DPL
                                bits 12:8   0 0 1 0 1
                                bits 7:0    Reserved
  Doubleword at byte offset 0:  bits 31:16  TSS Segment Selector
                                bits 15:0   Reserved
Interrupt Gate
  Doubleword at byte offset 4:  bits 31:16  Offset 31..16
                                bit 15      P
                                bits 14:13  DPL
                                bits 12:8   0 D 1 1 0
                                bits 7:5    0 0 0
                                bits 4:0    Reserved
  Doubleword at byte offset 0:  bits 31:16  Segment Selector
                                bits 15:0   Offset 15..0
Trap Gate
  Doubleword at byte offset 4:  bits 31:16  Offset 31..16
                                bit 15      P
                                bits 14:13  DPL
                                bits 12:8   0 D 1 1 1
                                bits 7:5    0 0 0
                                bits 4:0    Reserved
  Doubleword at byte offset 0:  bits 31:16  Segment Selector
                                bits 15:0   Offset 15..0
Legend:
  DPL       Descriptor Privilege Level
  Offset    Offset to procedure entry point
  P         Segment Present flag
  Selector  Segment Selector for destination code segment
  D         Size of gate: 1 = 32 bits; 0 = 16 bits
  Reserved  Reserved
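The interrupt-gate and trap-gate layouts in Figure 6-2 can be sketched as a pair of 32-bit doublewords built by a helper. The struct and function names are hypothetical. The present flag is always set in this sketch, and the reserved bits are left at 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory view of one 8-byte gate descriptor. */
struct idt_gate {
    uint32_t low;    /* doubleword at byte offset 0 */
    uint32_t high;   /* doubleword at byte offset 4 */
};

/* Build an interrupt gate (is_trap == false) or trap gate (is_trap == true).
 * Bits 12:8 of the high doubleword are 0 D 1 1 T, where T is 1 for a trap gate. */
struct idt_gate make_gate(uint16_t selector, uint32_t offset,
                          unsigned dpl, bool is_trap, bool is_32bit)
{
    struct idt_gate g;
    assert(dpl <= 3u);
    g.low  = ((uint32_t)selector << 16)      /* Segment Selector */
           | (offset & 0xFFFFu);             /* Offset 15..0 */
    g.high = (offset & 0xFFFF0000u)          /* Offset 31..16 */
           | (1u << 15)                      /* P: segment present */
           | ((uint32_t)dpl << 13)           /* DPL */
           | ((is_32bit ? 1u : 0u) << 11)    /* D: size of gate */
           | (0x3u << 9)                     /* fixed bits 10:9 = 1 1 */
           | ((is_trap ? 1u : 0u) << 8);     /* 0 = interrupt gate, 1 = trap gate */
    return g;
}
```

For a 32-bit interrupt gate with selector 0008H, offset 12345678H, and DPL 0, the sketch yields the low doubleword 00085678H and the high doubleword 12348E00H.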
6.12.1 Exception- or Interrupt-Handler Procedures
An interrupt gate or trap gate references an exception- or interrupt-handler procedure
that runs in the context of the currently executing task (see Figure 6-3). The
segment selector for the gate points to a segment descriptor for an executable code
segment in either the GDT or the current LDT. The offset field of the gate descriptor
points to the beginning of the exception- or interrupt-handling procedure.
Figure 6-3. Interrupt Procedure Call
[Figure: the interrupt vector indexes the IDT to select an interrupt or trap gate. The
gate's segment selector picks a segment descriptor in the GDT or LDT, which supplies the
base address of the destination code segment. Adding the gate's offset to that base
address gives the entry point of the interrupt procedure.]
When the processor performs a call to the exception- or interrupt-handler procedure:
• If the handler procedure is going to be executed at a numerically lower privilege
  level, a stack switch occurs. When the stack switch occurs:
  a. The segment selector and stack pointer for the stack to be used by the handler are
     obtained from the TSS for the currently executing task. On this new stack, the
     processor pushes the stack segment selector and stack pointer of the interrupted
     procedure.
  b. The processor then saves the current state of the EFLAGS, CS, and EIP registers on
     the new stack (see Figure 6-4).
  c. If an exception causes an error code to be saved, it is pushed on the new stack
     after the EIP value.
• If the handler procedure is going to be executed at the same privilege level as the
  interrupted procedure:
  a. The processor saves the current state of the EFLAGS, CS, and EIP registers on the
     current stack (see Figure 6-4).
  b. If an exception causes an error code to be saved, it is pushed on the current stack
     after the EIP value.
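The push order described above can be modeled with a short C sketch. It only illustrates the order of the pushes for a 32-bit gate. The names cpu_state and build_frame are hypothetical, and the TSS lookup of the new stack is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Register state of the interrupted procedure (hypothetical helper type). */
typedef struct {
    uint32_t eflags, cs, eip, ss, esp;
} cpu_state;

/*
 * Fill 'out' with the doublewords the processor pushes, ordered from the
 * top of the handler's stack (out[0], the last value pushed) upward.
 * Returns the number of doublewords pushed (3 to 6).
 */
static size_t build_frame(const cpu_state *s, int privilege_change,
                          int has_error_code, uint32_t error_code,
                          uint32_t out[6])
{
    uint32_t pushed[6];
    size_t n = 0;

    if (privilege_change) {          /* SS and ESP only on a stack switch */
        pushed[n++] = s->ss;
        pushed[n++] = s->esp;
    }
    pushed[n++] = s->eflags;
    pushed[n++] = s->cs;
    pushed[n++] = s->eip;
    if (has_error_code)              /* pushed last, after the EIP value  */
        pushed[n++] = error_code;

    for (size_t i = 0; i < n; i++)   /* reverse: last push is the top     */
        out[i] = pushed[n - 1 - i];
    return n;
}
```

With a privilege-level change and an error code, out[] holds the error code, EIP, CS, EFLAGS, ESP and SS in that order, matching the handler's stack in Figure 6-4 read from the new ESP upward.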
To return from an exception- or interrupt-handler procedure, the handler must use the
IRET (or IRETD) instruction. The IRET instruction is similar to the RET instruction
except that it restores the saved flags into the EFLAGS register. The IOPL field of the
EFLAGS register is restored only if the CPL is 0. The IF flag is changed only if the CPL
is less than or equal to the IOPL. See Chapter 3, "Instruction Set Reference, A-M," of
the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2A, for a
description of the complete operation performed by the IRET instruction.
If a stack switch occurred when calling the handler procedure, the IRET instruction
switches back to the interrupted procedure's stack on the return.
Figure 6-4. Stack Usage on Transfers to Interrupt and Exception-Handling Routines

Stack usage with no privilege-level change (interrupted procedure's and handler's stack):
                  <- ESP before transfer to handler
  EFLAGS
  CS
  EIP
  Error Code      <- ESP after transfer to handler

Stack usage with privilege-level change:
  Interrupted procedure's stack:
                  <- ESP before transfer to handler
  Handler's stack:
  SS
  ESP
  EFLAGS
  CS
  EIP
  Error Code      <- ESP after transfer to handler

6.12.1.1 Protection of Exception- and Interrupt-Handler Procedures
The privilege-level protection for exception- and interrupt-handler procedures is similar
to that used for ordinary procedure calls when called through a call gate (see
Section 5.8.4, "Accessing a Code Segment Through a Call Gate"). The processor does not
permit transfer of execution to an exception- or interrupt-handler procedure in a less
privileged code segment (numerically greater privilege level) than the CPL.
An attempt to violate this rule results in a general-protection exception (#GP). The
protection mechanism for exception- and interrupt-handler procedures is different in the
following ways:
• Because interrupt and exception vectors have no RPL, the RPL is not checked on implicit
  calls to exception and interrupt handlers.
• The processor checks the DPL of the interrupt or trap gate only if an exception or
  interrupt is generated with an INT n, INT 3, or INTO instruction. Here, the CPL must be
  less than or equal to the DPL of the gate. This restriction prevents application
  programs or procedures running at privilege level 3 from using a software interrupt to
  access critical exception handlers, such as the page-fault handler, providing that
  those handlers are placed in more privileged code segments (numerically lower privilege
  level). For hardware-generated interrupts and processor-detected exceptions, the
  processor ignores the DPL of interrupt and trap gates.
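The two privilege rules above reduce to a pair of comparisons. The sketch below is illustrative only: the enumeration and function names are hypothetical, and the conforming-segment case is not modeled.

```c
#include <stdbool.h>

/* How the exception or interrupt was generated (hypothetical names). */
typedef enum {
    EVENT_EXTERNAL_INTERRUPT,    /* hardware-generated interrupt       */
    EVENT_PROCESSOR_EXCEPTION,   /* processor-detected exception       */
    EVENT_SOFTWARE_INT           /* INT n, INT 3, or INTO instruction  */
} event_source;

/* The gate DPL is checked only for software-generated events. */
static bool gate_dpl_check_passes(event_source src, unsigned cpl,
                                  unsigned gate_dpl)
{
    if (src != EVENT_SOFTWARE_INT)
        return true;             /* DPL of the gate is ignored         */
    return cpl <= gate_dpl;      /* otherwise a #GP results            */
}

/* The handler may not be in a less privileged segment than the CPL. */
static bool handler_segment_allowed(unsigned cpl, unsigned handler_cs_dpl)
{
    return handler_cs_dpl <= cpl;
}
```

Under these rules, a privilege-level-3 program executing INT 14 against a page-fault gate with DPL 0 fails the first check, while a page fault raised by the processor at CPL 3 passes it.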
Because exceptions and interrupts generally do not occur at predictable times, these
privilege rules effectively impose restrictions on the privilege levels at which
exception- and interrupt-handling procedures can run. Either of the following techniques
can be used to avoid privilege-level violations.
• The exception or interrupt handler can be placed in a conforming code segment. This
  technique can be used for handlers that only need to access data available on the stack
  (for example, divide error exceptions). If the handler needs data from a data segment,
  the data segment needs to be accessible from privilege level 3, which would make it
  unprotected.
• The handler can be placed in a nonconforming code segment with privilege level 0. This
  handler would always run, regardless of the CPL that the interrupted program or task is
  running at.
6.12.1.2 Flag Usage By Exception- or Interrupt-Handler Procedure
When accessing an exception or interrupt handler through either an interrupt gate or a
trap gate, the processor clears the TF flag in the EFLAGS register after it saves the
contents of the EFLAGS register on the stack. (On calls to exception and interrupt
handlers, the processor also clears the VM, RF, and NT flags in the EFLAGS register,
after they are saved on the stack.) Clearing the TF flag prevents instruction tracing
from affecting interrupt response. A subsequent IRET instruction restores the TF (and VM,
RF, and NT) flags to the values in the saved contents of the EFLAGS register on the
stack.
The only difference between an interrupt gate and a trap gate is the way the processor
handles the IF flag in the EFLAGS register. When accessing an exception- or
interrupt-handling procedure through an interrupt gate, the processor clears the IF flag
to prevent other interrupts from interfering with the current interrupt handler. A
subsequent IRET instruction restores the IF flag to its value in the saved contents of
the EFLAGS register on the stack. Accessing a handler procedure through a trap gate does
not affect the IF flag.
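The flag handling described above can be written as a bit-mask operation. The sketch below assumes the standard EFLAGS bit positions (TF = 8, IF = 9, NT = 14, RF = 16, VM = 17); the function name eflags_on_entry is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_TF (1u << 8)    /* trap flag             */
#define EFLAGS_IF (1u << 9)    /* interrupt enable flag */
#define EFLAGS_NT (1u << 14)   /* nested task           */
#define EFLAGS_RF (1u << 16)   /* resume flag           */
#define EFLAGS_VM (1u << 17)   /* virtual-8086 mode     */

/*
 * EFLAGS value seen by the handler, given the value that was saved on the
 * stack. Both gate types clear TF, VM, RF and NT; only an interrupt gate
 * also clears IF.
 */
static uint32_t eflags_on_entry(uint32_t saved_eflags, bool interrupt_gate)
{
    uint32_t f = saved_eflags & ~(EFLAGS_TF | EFLAGS_VM | EFLAGS_RF | EFLAGS_NT);
    if (interrupt_gate)
        f &= ~EFLAGS_IF;
    return f;
}
```

The saved image on the stack is unchanged, which is why a later IRET restores all of these flags to their earlier values.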
6.12.2 Interrupt Tasks
When an exception or interrupt handler is accessed through a task gate in the IDT, a task
switch results. Handling an exception or interrupt with a separate task offers several
advantages:
• The entire context of the interrupted program or task is saved automatically.
• A new TSS permits the handler to use a new privilege level 0 stack when handling the
  exception or interrupt. If an exception or interrupt occurs when the current privilege
  level 0 stack is corrupted, accessing the handler through a task gate can prevent a
  system crash by providing the handler with a new privilege level 0 stack.
• The handler can be further isolated from other tasks by giving it a separate address
  space. This is done by giving it a separate LDT.
The disadvantage of handling an interrupt with a separate task is that the amount of
machine state that must be saved on a task switch makes it slower than using an interrupt
gate, resulting in increased interrupt latency.
A task gate in the IDT references a TSS descriptor in the GDT (see Figure 6-5). A switch
to the handler task is handled in the same manner as an ordinary task switch (see
Section 7.3, "Task Switching"). The link back to the interrupted task is stored in the
previous task link field of the handler task's TSS. If an exception caused an error code
to be generated, this error code is copied to the stack of the new task.
When exception- or interrupt-handler tasks are used in an operating system, there are
actually two mechanisms that can be used to dispatch tasks: the software scheduler (part
of the operating system) and the hardware scheduler (part of the processor's interrupt
mechanism). The software scheduler needs to accommodate interrupt tasks that may be
dispatched when interrupts are enabled.
NOTE
Because IA-32 architecture tasks are not re-entrant, an interrupt-handler task must
disable interrupts between the time it completes handling the interrupt and the time it
executes the IRET instruction. This action prevents another interrupt from occurring
while the interrupt task's TSS is still marked busy, which would cause a
general-protection (#GP) exception.
Figure 6-5. Interrupt Task Switch
[Figure: the interrupt vector indexes the IDT to select a task gate. The gate's TSS
selector picks a TSS descriptor in the GDT, which supplies the base address of the TSS
for the interrupt-handling task.]

6.13 ERROR CODE
When an exception condition is related to a specific segment, the processor pushes an
error code onto the stack of the exception handler (whether it is a procedure or task).
The error code has the format shown in Figure 6-6. The error code resembles a segment
selector; however, instead of a TI flag and RPL field, the error code contains 3 flags:
EXT   External event (bit 0): When set, indicates that an event external to the program,
      such as a hardware interrupt, caused the exception.
IDT   Descriptor location (bit 1): When set, indicates that the index portion of the
      error code refers to a gate descriptor in the IDT; when clear, indicates that the
      index refers to a descriptor in the GDT or the current LDT.
TI    GDT/LDT (bit 2): Only used when the IDT flag is clear. When set, the TI flag
      indicates that the index portion of the error code refers to a segment or gate
      descriptor in the LDT; when clear, it indicates that the index refers to a
      descriptor in the current GDT.
The segment selector index field provides an index into the IDT, GDT, or current LDT to
the segment or gate selector being referenced by the error code. In some cases the error
code is null (that is, all bits in the lower word are clear). A null error code indicates
that the error was not caused by a reference to a specific segment or that a null segment
descriptor was referenced in an operation.
The format of the error code is different for page-fault exceptions (#PF). See the
"Interrupt 14 - Page-Fault Exception (#PF)" section in this chapter.
The error code is pushed on the stack as a doubleword or word (depending on the default
interrupt, trap, or task gate size). To keep the stack aligned for doubleword pushes, the
upper half of the error code is reserved. Note that the error code is not popped when the
IRET instruction is executed to return from an exception handler, so the handler must
remove the error code before executing a return.
Error codes are not pushed on the stack for exceptions that are generated externally
(with the INTR or LINT[1:0] pins) or by the INT n instruction, even if an error code is
normally produced for those exceptions.
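The error-code fields described above can be extracted with a few shifts and masks. The following is a minimal sketch, in which the type and function names are hypothetical and the index is taken to occupy bits 15:3 as in a segment selector. It does not apply to the page-fault error code, which has a different format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of a selector-style error code (hypothetical helper type). */
typedef struct {
    bool     ext;     /* bit 0: event external to the program      */
    bool     idt;     /* bit 1: index refers to a gate in the IDT  */
    bool     ti;      /* bit 2: LDT (1) or GDT (0); meaningful     */
                      /*        only when idt is clear             */
    unsigned index;   /* bits 15:3: descriptor index               */
} selector_error_code;

static selector_error_code decode_error_code(uint32_t code)
{
    selector_error_code e;
    e.ext   = (code & 1u) != 0;
    e.idt   = (code & 2u) != 0;
    e.ti    = (code & 4u) != 0;
    e.index = (code >> 3) & 0x1FFFu;   /* upper half is reserved */
    return e;
}

/* A null error code has all bits in the lower word clear. */
static bool error_code_is_null(uint32_t code)
{
    return (code & 0xFFFFu) == 0;
}
```

For instance, an error code of 006AH decodes to IDT = 1 with index 13, that is, a reference to the gate for vector 13.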
Figure 6-6. Error Code
  31:16 Reserved    15:3 Segment Selector Index    2 TI    1 IDT    0 EXT

6.14 EXCEPTION AND INTERRUPT HANDLING IN 64-BIT MODE
In 64-bit mode, interrupt and exception handling is similar to what has been described
for non-64-bit modes. The differences are as follows:
• All interrupt handlers pointed to by the IDT are in 64-bit code (this does not apply to
  the SMI handler).
• The size of interrupt-stack pushes is fixed at 64 bits, and the processor uses 8-byte,
  zero-extended stores.
• The stack pointer (SS:RSP) is pushed unconditionally on interrupts. In legacy modes,
  this push is conditional and based on a change in current privilege level (CPL).
• The new SS is set to NULL if there is a change in CPL.
• IRET behavior changes.
• There is a new interrupt stack-switch mechanism.
• The alignment of the interrupt stack frame is different.
6.14.1 64-Bit Mode IDT
Interrupt and trap gates are 16 bytes in length to provide a 64-bit offset for the
instruction pointer (RIP). The 64-bit RIP referenced by interrupt-gate descriptors allows
an interrupt service routine to be located anywhere in the linear-address space. See
Figure 6-7.

Figure 6-7. 64-Bit IDT Gate Descriptors

Interrupt/Trap Gate
  Bytes 15:12   31:0 Reserved
  Bytes 11:8    31:0 Offset 63..32
  Bytes 7:4     31:16 Offset 31..16   15 P   14:13 DPL   12 0   11:8 TYPE   7:5 0 0 0
                4:3 0 0   2:0 IST
  Bytes 3:0     31:16 Segment Selector                   15:0 Offset 15..0

  DPL        Descriptor Privilege Level
  Offset     Offset to procedure entry point
  P          Segment Present flag
  Selector   Segment Selector for destination code segment
  IST        Interrupt Stack Table

In 64-bit mode, the IDT index is formed by scaling the interrupt vector by 16. The first
eight bytes (bytes 7:0) of a 64-bit mode interrupt gate are similar but not identical to
legacy 32-bit interrupt gates. The type field (bits 11:8 in bytes 7:4) is described in
Table 3-2. The Interrupt Stack Table (IST) field (bits 4:0 in bytes 7:4) is used by the
stack switching mechanisms described in Section 6.14.5, "Interrupt Stack Table." Bytes
11:8 hold the upper 32 bits of the target RIP (interrupt segment offset) in canonical
form. A general-protection exception (#GP) is generated if software attempts to reference
an interrupt gate with a target RIP that is not in canonical form.
The target code segment referenced by the interrupt gate must be a 64-bit code segment
(CS.L = 1, CS.D = 0). If the target is not a 64-bit code segment, a general-protection
exception (#GP) is generated with the IDT vector number reported as the error code.
Only 64-bit interrupt and trap gates can be referenced in IA-32e mode (64-bit mode and
compatibility mode). Legacy 32-bit interrupt or trap gate types (0EH or 0FH) are
redefined in IA-32e mode as 64-bit interrupt and trap gate types. No 32-bit interrupt or
trap gate type exists in IA-32e mode. If a reference is made to a 16-bit interrupt or
trap gate (06H or 07H), a general-protection exception (#GP(0)) is generated.
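The 16-byte gate format and the vector scaling can be sketched in C as shown below. Several points are assumptions made for illustration: the names idt_gate64, make_gate64, idt_byte_offset64 and is_canonical48 are hypothetical, the IST index is treated as the 3-bit field shown in Figure 6-7, and the canonical check assumes a 48-bit linear-address implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* One 16-byte IA-32e mode IDT entry as four doublewords (hypothetical). */
typedef struct {
    uint32_t dw0;   /* bytes 3:0   */
    uint32_t dw1;   /* bytes 7:4   */
    uint32_t dw2;   /* bytes 11:8  */
    uint32_t dw3;   /* bytes 15:12 */
} idt_gate64;

/* Byte offset of a vector's gate from the IDT base: vector scaled by 16. */
static uint32_t idt_byte_offset64(uint8_t vector)
{
    return (uint32_t)vector * 16u;
}

/* Canonical form, assuming 48-bit linear addresses: bits 63:47 all equal. */
static bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;
    return top == 0 || top == 0x1FFFFu;
}

/* Pack a present gate; type is 0xE (interrupt) or 0xF (trap). */
static idt_gate64 make_gate64(uint16_t selector, uint64_t offset,
                              unsigned type, unsigned dpl, unsigned ist)
{
    idt_gate64 g;
    g.dw0 = ((uint32_t)selector << 16) | (uint32_t)(offset & 0xFFFFu);
    g.dw1 = (uint32_t)(offset & 0xFFFF0000u)   /* offset 31..16       */
          | (1u << 15)                         /* P = 1               */
          | ((dpl & 3u) << 13)                 /* DPL                 */
          | ((type & 0xFu) << 8)               /* type                */
          | (ist & 7u);                        /* IST index           */
    g.dw2 = (uint32_t)(offset >> 32);          /* offset 63..32       */
    g.dw3 = 0;                                 /* reserved            */
    return g;
}
```

A caller would normally check is_canonical48() on the handler address before building the gate, since a non-canonical target RIP leads to a general-protection exception when the gate is referenced.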
6.14.2 64-Bit Mode Stack Frame
In legacy mode, the size of an IDT entry (16 bits or 32 bits) determines the size of
interrupt-stack-frame pushes. SS:ESP is pushed only on a CPL change. In 64-bit mode, the
size of interrupt stack-frame pushes is fixed at eight bytes. This is because only 64-bit
mode gates can be referenced. 64-bit mode also pushes SS:RSP unconditionally, rather than
only on a CPL change.
Aside from error codes, pushing SS:RSP unconditionally presents operating systems with a
consistent interrupt-stack-frame size across all interrupts. Interrupt service-routine
entry points that handle interrupts generated by the INT n instruction or external INTR#
signal can push an additional error code place-holder to maintain consistency.
In legacy mode, the stack pointer may be at any alignment when an interrupt or exception
causes a stack frame to be pushed. This causes the stack frame and succeeding pushes done
by an interrupt handler to be at arbitrary alignments. In IA-32e mode, the RSP is aligned
to a 16-byte boundary before pushing the stack frame. The stack frame itself is aligned
on a 16-byte boundary when the interrupt handler is called. The processor can arbitrarily
realign the new RSP on interrupts because the previous (possibly unaligned) RSP is
unconditionally saved on the newly aligned stack. The previous RSP will be automatically
restored by a subsequent IRET.
Aligning the stack permits exception and interrupt frames to be aligned on a 16-byte
boundary before interrupts are re-enabled. This allows the stack to be formatted for
optimal storage of 16-byte XMM registers, which enables the interrupt handler to use
faster 16-byte aligned loads and stores (MOVAPS rather than MOVUPS) to save and restore
XMM registers.
Although the RSP alignment is always performed when LMA = 1, it is only of consequence
for the kernel-mode case where there is no stack switch or IST used. For a stack switch
or IST, the OS would have presumably put suitably aligned RSP values in the TSS.
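The alignment step and the fixed 8-byte pushes described above amount to simple address arithmetic. The sketch below is illustrative and uses hypothetical function names; it assumes no stack switch (the same-privilege kernel-mode case) and models only where RSP ends up, not the stores themselves.

```c
#include <stdbool.h>
#include <stdint.h>

/* RSP after the processor aligns it down to a 16-byte boundary. */
static uint64_t align_rsp16(uint64_t rsp)
{
    return rsp & ~(uint64_t)0xF;
}

/*
 * RSP on entry to the handler: five 8-byte pushes (SS, RSP, RFLAGS, CS,
 * RIP) below the aligned boundary, plus one more if the exception
 * supplies an error code.
 */
static uint64_t rsp_at_handler_entry(uint64_t rsp, bool has_error_code)
{
    uint64_t pushes = 5u + (has_error_code ? 1u : 0u);
    return align_rsp16(rsp) - 8u * pushes;
}
```

Starting from an unaligned RSP of 7FFF1237H, the aligned boundary is 7FFF1230H. A handler that receives no error code and pushes its own 8-byte place-holder ends up at the same RSP as one that does receive an error code, which is the consistency the text refers to.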
6.14.3 IRET in IA-32e Mode
In IA-32e mode, IRET executes with an 8-byte operand size. There is nothing that forces
this requirement. The stack is formatted in such a way that for actions where IRET is
required, the 8-byte IRET operand size works correctly.
Because interrupt stack-frame pushes are always eight bytes in IA-32e mode, an IRET must
pop eight-byte items off the stack. This is accomplished by preceding the IRET with a
64-bit operand-size prefix. The size of the pop is determined by the address size of the
instruction. The SS/ESP/RSP size adjustment is determined by the stack size.
IRET pops SS:RSP unconditionally off the interrupt stack frame only when it is executed
in 64-bit mode. In compatibility mode, IRET pops SS:RSP off the stack only if there is a
CPL change. This allows legacy applications to execute properly in compatibility mode
when using the IRET instruction. 64-bit interrupt service routines that exit with an IRET
unconditionally pop SS:RSP off of the interrupt stack frame, even if the target code
segment is running in 64-bit mode or at CPL = 0. This is because the original interrupt
always pushes SS:RSP.
In IA-32e mode, IRET is allowed to load a NULL SS under certain conditions. If the target
mode is 64-bit mode and the target CPL <> 3, IRET allows SS to be loaded with a NULL
selector. As part of the stack switch mechanism, an interrupt or exception sets the new
SS to NULL, instead of fetching a new SS selector from the TSS and loading the
corresponding descriptor from the GDT or LDT. The new SS selector is set to NULL in order
to properly handle returns from subsequent nested far transfers. If the called procedure
itself is interrupted, the NULL SS is pushed on the stack frame. On the subsequent IRET,
the NULL SS on the stack acts as a flag to tell the processor not to load a new SS
descriptor.
6.14.4 Stack Switching in IA-32e Mode
The IA-32 architecture provides a mechanism to automatically switch stack frames in
response to an interrupt. The 64-bit extensions of Intel 64 architecture implement a
modified version of the legacy stack-switching mechanism and an alternative
stack-switching mechanism called the interrupt stack table (IST).
In IA-32 modes, the legacy IA-32 stack-switch mechanism is unchanged. In IA-32e mode, the
legacy stack-switch mechanism is modified. When stacks are switched as part of a 64-bit
mode privilege-level change (resulting from an interrupt), a new SS descriptor is not
loaded. IA-32e mode loads only an inner-level RSP from the TSS. The new SS selector is
forced to NULL and the SS selector's RPL field is set to the new CPL. The new SS is set
to NULL in order to handle nested far transfers (CALLF, INT, interrupts and exceptions).
The old SS and RSP are saved on the new stack (Figure 6-8). On the subsequent IRET, the
old SS is popped from the stack and loaded into the SS register.
In summary, a stack switch in IA-32e mode works like the legacy stack switch, except that
a new SS selector is not loaded from the TSS. Instead, the new SS is forced to NULL.
6.14.5 Interrupt Stack Table
In IA-32e mode, a new interrupt stack table (IST) mechanism is available as an
alternative to the modified legacy stack-switching mechanism described above. This
mechanism unconditionally switches stacks when it is enabled. It can be enabled on an
individual interrupt-vector basis using a field in the IDT entry. This means that some
interrupt vectors can use the modified legacy mechanism and others can use the IST
mechanism.
The IST mechanism is only available in IA-32e mode. It is part of the 64-bit mode TSS.
The motivation for the IST mechanism is to provide a method for specific interrupts (such
as NMI, double-fault, and machine-check) to always execute on a known good stack. In
legacy mode, interrupts can use the task-switch mechanism to set up a known-good stack by
accessing the interrupt service routine through a task gate located in the IDT. However,
the legacy task-switch mechanism is not supported in IA-32e mode.
The IST mechanism provides up to seven IST pointers in the TSS. The pointers are
referenced by an interrupt-gate descriptor in the interrupt-descriptor table (IDT); see
Figure 6-7. The gate descriptor contains a 3-bit IST index field that provides an offset
into the IST section of the TSS. Using the IST mechanism, the processor loads the value
pointed to by an IST pointer into the RSP.
When an interrupt occurs, the new SS selector is forced to NULL and the SS selector's RPL
field is set to the new CPL. The old SS, RSP, RFLAGS, CS, and RIP are pushed onto the new
stack. Interrupt processing then proceeds as normal. If the IST index is zero, the
modified legacy stack-switching mechanism described above is used.
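The choice between the IST mechanism and the modified legacy mechanism can be summarized as a small selection function. This is a sketch under stated assumptions: tss64_stacks is a hypothetical view of the stack-pointer fields of the 64-bit TSS rather than its real memory layout, and the 16-byte alignment of RSP is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stack-pointer fields of the 64-bit TSS (hypothetical, simplified view). */
typedef struct {
    uint64_t rsp[3];   /* inner-level stacks for CPL 0, 1 and 2          */
    uint64_t ist[8];   /* ist[1]..ist[7] are the seven IST pointers;     */
                       /* ist[0] is unused (index 0 means "no IST")      */
} tss64_stacks;

/*
 * Stack pointer loaded into RSP before the frame is pushed.
 *   ist_index    3-bit IST field of the gate (0 selects the legacy path)
 *   cpl_change   true if the handler runs at a more privileged level
 *   new_cpl      privilege level of the handler (0..2 when cpl_change)
 *   current_rsp  RSP of the interrupted procedure
 */
static uint64_t select_handler_rsp(const tss64_stacks *tss, unsigned ist_index,
                                   bool cpl_change, unsigned new_cpl,
                                   uint64_t current_rsp)
{
    if ((ist_index & 7u) != 0)
        return tss->ist[ist_index & 7u];   /* unconditional switch       */
    if (cpl_change)
        return tss->rsp[new_cpl];          /* modified legacy mechanism  */
    return current_rsp;                    /* no stack switch            */
}
```

Note that a nonzero IST index selects the IST stack even when the interrupted code already runs at CPL 0, which is what lets NMI or double-fault handlers rely on a known good stack.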
Figure 6-8. IA-32e Mode Stack Usage After Privilege Level Change

Stack usage with privilege-level change (handler's stack; offsets are relative to the
stack pointer after transfer to the handler):

  Legacy Mode              IA-32e Mode
  Offset   Contents        Offset   Contents
  +20      SS              +40      SS
  +16      ESP             +32      RSP
  +12      EFLAGS          +24      RFLAGS
  +8       CS              +16      CS
  +4       EIP             +8       RIP
   0       Error Code       0       Error Code
6.15 EXCEPTION AND INTERRUPT REFERENCE
The following sections describe conditions which generate exceptions and interrupts. They
are arranged in the order of vector numbers. The information contained in these sections
is as follows:
• Exception Class: Indicates whether the exception class is a fault, trap, or abort type.
  Some exceptions can be either a fault or trap type, depending on when the error
  condition is detected. (This section is not applicable to interrupts.)
• Description: Gives a general description of the purpose of the exception or interrupt
  type. It also describes how the processor handles the exception or interrupt.
• Exception Error Code: Indicates whether an error code is saved for the exception. If
  one is saved, the contents of the error code are described. (This section is not
  applicable to interrupts.)
• Saved Instruction Pointer: Describes which instruction the saved (or return)
  instruction pointer points to. It also indicates whether the pointer can be used to
  restart a faulting instruction.
• Program State Change: Describes the effects of the exception or interrupt on the state
  of the currently running program or task and the possibilities of restarting the
  program or task without loss of continuity.
Interrupt 0 - Divide Error Exception (#DE)
Exception Class    Fault.
Description
Indicates the divisor operand for a DIV or IDIV instruction is 0 or that the result
cannot be represented in the number of bits specified for the destination operand.
Exception Error Code
None.
Saved Instruction Pointer
Saved contents of CS and EIP registers point to the instruction that generated the
exception.
Program State Change
A program-state change does not accompany the divide error, because the exception occurs
before the faulting instruction is executed.
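The two divide-error conditions can be illustrated for one concrete case, the unsigned 32-bit DIV, which divides the 64-bit value in EDX:EAX by a 32-bit divisor and places the quotient in EAX. The function name below is hypothetical, and the signed IDIV case is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if an unsigned 32-bit DIV of 'dividend' (EDX:EAX) by
 * 'divisor' would raise #DE: either the divisor is 0 or the quotient
 * does not fit in the 32-bit destination.
 */
static bool div32_raises_de(uint64_t dividend, uint32_t divisor)
{
    if (divisor == 0)
        return true;
    return dividend / divisor > UINT32_MAX;
}
```

Because the exception is a fault, a handler that returns without changing the operands simply re-executes the same DIV.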
Interrupt 1 - Debug Exception (#DB)
Exception Class    Trap or Fault. The exception handler can distinguish between traps or
                   faults by examining the contents of DR6 and the other debug registers.
Description
Indicates that one or more of several debug-exception conditions has been detected.
Whether the exception is a fault or a trap depends on the condition (see Table 6-3). See
Chapter 16, "Debugging, Profiling Branches and Time-Stamp Counter," for detailed
information about the debug exceptions.
Exception Error Code
None. An exception handler can examine the debug registers to determine which condition
caused the exception.
Saved Instruction Pointer
Fault: Saved contents of CS and EIP registers point to the instruction that generated the
exception.
Trap: Saved contents of CS and EIP registers point to the instruction following the
instruction that generated the exception.
Program State Change
Fault: A program-state change does not accompany the debug exception, because the
exception occurs before the faulting instruction is executed. The program can resume
normal execution upon returning from the debug exception handler.
Trap: A program-state change does accompany the debug exception, because the instruction
or task switch being executed is allowed to complete before the exception is generated.
However, the new state of the program is not corrupted and execution of the program can
continue reliably.

Table 6-3. Debug Exception Conditions and Corresponding Exception Classes
  Exception Condition                                                   Exception Class
  Instruction fetch breakpoint                                          Fault
  Data read or write breakpoint                                         Trap
  I/O read or write breakpoint                                          Trap
  General detect condition (in conjunction with in-circuit emulation)   Fault
  Single-step                                                           Trap
  Task-switch                                                           Trap
Interrupt 2 - NMI Interrupt
Exception Class    Not applicable.
Description
The nonmaskable interrupt (NMI) is generated externally by asserting the processor's NMI
pin or through an NMI request set by the I/O APIC to the local APIC. This interrupt
causes the NMI interrupt handler to be called.
Exception Error Code
Not applicable.
Saved Instruction Pointer
The processor always takes an NMI interrupt on an instruction boundary. The saved
contents of CS and EIP registers point to the next instruction to be executed at the
point the interrupt is taken. See Section 6.5, "Exception Classifications," for more
information about when the processor takes NMI interrupts.
Program State Change
The instruction executing when an NMI interrupt is received is completed before the NMI
is generated. A program or task can thus be restarted upon returning from an interrupt
handler without loss of continuity, provided the interrupt handler saves the state of the
processor before handling the interrupt and restores the processor's state prior to a
return.
Interrupt 3 - Breakpoint Exception (#BP)
Exception Class    Trap.
Description
Indicates that a breakpoint instruction (INT 3) was executed, causing a breakpoint trap
to be generated. Typically, a debugger sets a breakpoint by replacing the first opcode
byte of an instruction with the opcode for the INT 3 instruction. (The INT 3 instruction
is one byte long, which makes it easy to replace an opcode in a code segment in RAM with
the breakpoint opcode.) The operating system or a debugging tool can use a data segment
mapped to the same physical address space as the code segment to place an INT 3
instruction in places where it is desired to call the debugger.
With the P6 family, Pentium, Intel486, and Intel386 processors, it is more convenient to
set breakpoints with the debug registers. (See Section 16.3.2, "Breakpoint Exception
(#BP) - Interrupt Vector 3," for information about the breakpoint exception.) If more
breakpoints are needed beyond what the debug registers allow, the INT 3 instruction can
be used.
The breakpoint (#BP) exception can also be generated by executing the INT n instruction
with an operand of 3. The action of this two-byte form (opcode CD 03) is slightly
different from that of the one-byte INT 3 instruction (opcode CC); see
"INTn/INTO/INT3 - Call to Interrupt Procedure" in Chapter 3 of the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 2A.
Exception Error Code
None.
Saved Instruction Pointer
Saved contents of CS and EIP registers point to the instruction following the INT 3
instruction.
Program State Change
Even though the EIP points to the instruction following the breakpoint instruction, the
state of the program is essentially unchanged because the INT 3 instruction does not
affect any register or memory locations. The debugger can thus resume the suspended
program by replacing the INT 3 instruction that caused the breakpoint with the original
opcode and decrementing the saved contents of the EIP register. Upon returning from the
debugger, program execution resumes with the replaced instruction.
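The set-and-resume sequence a debugger performs can be sketched over a plain byte buffer. The code below is a simplified model, not a debugger: the buffer stands in for the data-segment alias of the code segment, the names breakpoint, set_breakpoint and resume_eip are hypothetical, and only the one-byte INT 3 opcode (CCH) is used.

```c
#include <stdint.h>

#define INT3_OPCODE 0xCCu   /* one-byte breakpoint instruction */

/* Bookkeeping for one software breakpoint (hypothetical helper type). */
typedef struct {
    uint32_t addr;       /* offset of the patched instruction      */
    uint8_t  original;   /* first opcode byte that was overwritten */
} breakpoint;

/* Replace the first opcode byte at 'addr' with INT 3. */
static breakpoint set_breakpoint(uint8_t *code, uint32_t addr)
{
    breakpoint bp = { addr, code[addr] };
    code[addr] = INT3_OPCODE;
    return bp;
}

/*
 * Undo the patch and return the EIP value to resume at. The saved EIP
 * points past the one-byte INT 3, so it is decremented by one.
 */
static uint32_t resume_eip(uint8_t *code, const breakpoint *bp,
                           uint32_t saved_eip)
{
    code[bp->addr] = bp->original;
    return saved_eip - 1u;
}
```

A real debugger would write the returned value back into the EIP slot of the handler's stack frame before executing IRET.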
Interrupt 4 - Overflow Exception (#OF)
Exception Class    Trap.
Description
Indicates that an overflow trap occurred when an INTO instruction was executed. The INTO
instruction checks the state of the OF flag in the EFLAGS register. If the OF flag is
set, an overflow trap is generated.
Some arithmetic instructions (such as the ADD and SUB) perform both signed and unsigned
arithmetic. These instructions set the OF and CF flags in the EFLAGS register to indicate
signed overflow and unsigned overflow, respectively. When performing arithmetic on signed
operands, the OF flag can be tested directly or the INTO instruction can be used. The
benefit of using the INTO instruction is that if the overflow exception is detected, an
exception handler can be called automatically to handle the overflow condition.
Exception Error Code
None.
Saved Instruction Pointer
The saved contents of CS and EIP registers point to the instruction following the INTO
instruction.
Program State Change
Even though the EIP points to the instruction following the INTO instruction, the state
of the program is essentially unchanged because the INTO instruction does not affect any
register or memory locations. The program can thus resume normal execution upon returning
from the overflow exception handler.
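The difference between the OF and CF results of a 32-bit ADD can be shown with a short model. The type and function names are hypothetical; the flag formulas are the standard ones for two's-complement addition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of a modeled 32-bit ADD (hypothetical helper type). */
typedef struct {
    uint32_t result;
    bool     cf;   /* unsigned overflow: carry out of bit 31        */
    bool     of;   /* signed overflow: operands share a sign that   */
                   /* the result does not                           */
} add_flags;

static add_flags add32(uint32_t a, uint32_t b)
{
    add_flags r;
    r.result = a + b;                              /* wraps modulo 2^32 */
    r.cf = r.result < a;
    r.of = (((a ^ r.result) & (b ^ r.result)) >> 31) != 0;
    return r;
}
```

An INTO placed after such an ADD would trap only in the cases where of is true, for example 7FFFFFFFH + 1, and not for FFFFFFFFH + 1, which sets CF alone.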
Interrupt 5 - BOUND Range Exceeded Exception (#BR)
Exception Class    Fault.
Description
Indicates that a BOUND-range-exceeded fault occurred when a BOUND instruction was
executed. The BOUND instruction checks that a signed array index is within the upper and
lower bounds of an array located in memory. If the array index is not within the bounds
of the array, a BOUND-range-exceeded fault is generated.
Exception Error Code
None.
Saved Instruction Pointer
The saved contents of CS and EIP registers point to the BOUND instruction that generated
the exception.
Program State Change
A program-state change does not accompany the bounds-check fault, because the operands
for the BOUND instruction are not modified. Returning from the BOUND-range-exceeded
exception handler causes the BOUND instruction to be restarted.
Interrupt 6 - Invalid Opcode Exception (#UD)
Exception Class Fault.
Description
Indicates that the processor did one of the following things:
• Attempted to execute an invalid or reserved opcode.
• Attempted to execute an instruction with an operand type that is invalid for its
  accompanying opcode; for example, the source operand for a LES instruction is
  not a memory location.
• Attempted to execute an MMX or SSE/SSE2/SSE3/SSSE3 instruction on an Intel 64 or
  IA-32 processor that does not support the MMX technology or
  SSE/SSE2/SSE3/SSSE3 extensions, respectively. CPUID feature flags MMX (EDX, bit
  23), SSE (EDX, bit 25), SSE2 (EDX, bit 26), SSE3 (ECX, bit 0), and SSSE3 (ECX, bit 9)
  indicate support for these extensions.
• Attempted to execute an MMX instruction or SSE/SSE2/SSE3/SSSE3 SIMD
  instruction (with the exception of the MOVNTI, PAUSE, PREFETCHh, SFENCE,
  LFENCE, MFENCE, CLFLUSH, MONITOR, and MWAIT instructions) when the EM
  flag in control register CR0 is set (1).
• Attempted to execute an SSE/SSE2/SSE3/SSSE3 instruction when the OSFXSR bit
  in control register CR4 is clear (0). Note this does not include the following
  SSE/SSE2/SSE3 instructions: MASKMOVQ, MOVNTQ, MOVNTI, PREFETCHh,
  SFENCE, LFENCE, MFENCE, and CLFLUSH; or the 64-bit versions of the PAVGB,
  PAVGW, PEXTRW, PINSRW, PMAXSW, PMAXUB, PMINSW, PMINUB, PMOVMSKB,
  PMULHUW, PSADBW, PSHUFW, PADDQ, PSUBQ, PALIGNR, PABSB, PABSD,
  PABSW, PHADDD, PHADDSW, PHADDW, PHSUBD, PHSUBSW, PHSUBW,
  PMADDUBSW, PMULHRSW, PSHUFB, PSIGNB, PSIGND, and PSIGNW.
• Attempted to execute an SSE/SSE2/SSE3/SSSE3 instruction on an Intel 64 or
  IA-32 processor that caused a SIMD floating-point exception when the
  OSXMMEXCPT bit in control register CR4 is clear (0).
• Executed a UD2 instruction. Note that even though it is the execution of the UD2
  instruction that causes the invalid opcode exception, the saved instruction
  pointer still points at the UD2 instruction.
• Detected a LOCK prefix that precedes an instruction that may not be locked or
  one that may be locked but the destination operand is not a memory location.
• Attempted to execute an LLDT, SLDT, LTR, STR, LSL, LAR, VERR, VERW, or ARPL
  instruction while in real-address or virtual-8086 mode.
• Attempted to execute the RSM instruction when not in SMM.
In Intel 64 and IA-32 processors that implement out-of-order execution
microarchitectures, this exception is not generated until an attempt is made to retire the result
of executing an invalid instruction; that is, decoding and speculatively attempting to
execute an invalid opcode does not generate this exception. Likewise, in the Pentium
processor and earlier IA-32 processors, this exception is not generated as the result
of prefetching and preliminary decoding of an invalid instruction. (See Section 6.5,
"Exception Classifications," for general rules for taking of interrupts and exceptions.)
The opcodes D6 and F1 are undefined opcodes reserved by the Intel 64 and IA-32
architectures. These opcodes, even though undefined, do not generate an invalid
opcode exception.
The UD2 instruction is guaranteed to generate an invalid opcode exception.
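Software normally avoids the unsupported-extension case by testing the CPUID feature flags before using MMX or SSE-family instructions. The sketch below decodes the five flags named in the list above from register values that are assumed to have been obtained by executing CPUID with EAX = 1. Obtaining those values is outside the sketch, and the names SIMD_FEATURE_BITS and decode_simd_features are hypothetical.

```python
# (register, bit) positions of the feature flags in the CPUID.01H output.
SIMD_FEATURE_BITS = {
    "MMX": ("edx", 23),
    "SSE": ("edx", 25),
    "SSE2": ("edx", 26),
    "SSE3": ("ecx", 0),
    "SSSE3": ("ecx", 9),
}


def decode_simd_features(edx, ecx):
    """Return the set of extensions reported as supported.

    edx and ecx are assumed to be the values returned by CPUID leaf 1.
    """
    regs = {"edx": edx, "ecx": ecx}
    return {
        name
        for name, (reg, bit) in SIMD_FEATURE_BITS.items()
        if (regs[reg] >> bit) & 1
    }
```

A caller would execute an SSSE3 instruction such as PSHUFB only if "SSSE3" appears in the returned set. Otherwise the instruction would be expected to raise #UD.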
Exception Error Code
None.
Saved Instruction Pointer
The saved contents of CS and EIP registers point to the instruction that generated the
exception.
Program State Change
A program-state change does not accompany an invalid-opcode fault, because the
invalid instruction is not executed.
Interrupt 7 - Device Not Available Exception (#NM)
Exception Class Fault.
Description
Indicates that one of the following three conditions occurred:
• The processor executed an x87 FPU floating-point instruction while the EM flag in
  control register CR0 was set (1). See the paragraph below for the special case of
  the WAIT/FWAIT instruction.
• The processor executed a WAIT/FWAIT instruction while the MP and TS flags of
  register CR0 were set, regardless of the setting of the EM flag.
• The processor executed an x87 FPU, MMX, or SSE/SSE2/SSE3 instruction (with
  the exception of MOVNTI, PAUSE, PREFETCHh, SFENCE, LFENCE, MFENCE, and
  CLFLUSH) while the TS flag in control register CR0 was set and the EM flag was
  clear.
The EM flag is set when the processor does not have an internal x87 FPU
floating-point unit. A device-not-available exception is then generated each time an x87 FPU
floating-point instruction is encountered, allowing an exception handler to call
floating-point instruction emulation routines.
The TS flag indicates that a context switch (task switch) has occurred since the last
time an x87 floating-point, MMX, or SSE/SSE2/SSE3 instruction was executed, but
that the context of the x87 FPU, XMM, and MXCSR registers was not saved. When
the TS flag is set and the EM flag is clear, the processor generates a
device-not-available exception each time an x87 floating-point, MMX, or SSE/SSE2/SSE3 instruction
is encountered (with the exception of the instructions listed above). The exception
handler can then save the context of the x87 FPU, XMM, and MXCSR registers before
it executes the instruction. See Section 2.5, "Control Registers," for more information
about the TS flag.
The MP flag in control register CR0 is used along with the TS flag to determine if WAIT
or FWAIT instructions should generate a device-not-available exception. It extends
the function of the TS flag to the WAIT and FWAIT instructions, giving the exception
handler an opportunity to save the context of the x87 FPU before the WAIT or FWAIT
instruction is executed. The MP flag is provided primarily for use with the Intel 286
and Intel386 DX processors. For programs running on the Pentium 4, Intel Xeon, P6
family, Pentium, or Intel486 DX processors, or the Intel 487 SX coprocessors, the MP
flag should always be set; for programs running on the Intel486 SX processor, the MP
flag should be clear.
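The interaction of the EM, MP, and TS flags can be summarized as a small decision function. The sketch below is a model of the three conditions listed in this description, not an implementation of processor behavior. The function name and the instruction-kind strings ("x87", "wait", "simd") are hypothetical, and the exempt instructions (MOVNTI, PAUSE, and so on) are not modeled. The CR0 bit positions used are MP = bit 1, EM = bit 2, and TS = bit 3.

```python
CR0_MP = 1 << 1  # Monitor Coprocessor
CR0_EM = 1 << 2  # Emulation
CR0_TS = 1 << 3  # Task Switched


def raises_device_not_available(cr0, kind):
    """Return True if an instruction of the given kind would raise #NM.

    kind is "x87" (x87 FPU instruction other than WAIT/FWAIT),
    "wait" (WAIT/FWAIT), or "simd" (MMX or SSE-family instruction).
    """
    em = bool(cr0 & CR0_EM)
    mp = bool(cr0 & CR0_MP)
    ts = bool(cr0 & CR0_TS)
    if kind == "wait":
        # WAIT/FWAIT depends on MP and TS only, regardless of EM.
        return mp and ts
    if kind == "x87":
        # Either emulation is requested or the FPU context is stale.
        return em or ts
    if kind == "simd":
        # With EM set these instructions raise #UD instead of #NM.
        return ts and not em
    raise ValueError(f"unknown instruction kind: {kind}")
```

For example, with only TS set an x87 or SSE instruction raises #NM but WAIT does not. Setting MP as well extends the exception to WAIT/FWAIT.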
Exception Error Code
None.
Saved Instruction Pointer
The saved contents of CS and EIP registers point to the floating-point instruction or
the WAIT/FWAIT instruction that generated the exception.
Program State Change
A program-state change does not accompany a device-not-available fault, because
the instruction that generated the exception is not executed.
If the EM flag is set, the exception handler can then read the floating-point
instruction pointed to by the EIP and call the appropriate emulation routine.
If the MP and TS flags are set or the TS flag alone is set, the exception handler can
save the context of the x87 FPU, clear the TS flag, and continue execution at the
interrupted floating-point or WAIT/FWAIT instruction.
Interrupt 8 - Double Fault Exception (#DF)
Exception Class Abort.
Description
Indicates that the processor detected a second exception while calling an exception
handler for a prior exception. Normally, when the processor detects another
exception while trying to call an exception handler, the two exceptions can be handled
serially. If, however, the processor cannot handle them serially, it signals the double-fault
exception. To determine when two faults need to be signalled as a double fault, the
processor divides the exceptions into three classes: benign exceptions, contributory
exceptions, and page faults (see Table 6-4).
Table 6-5 shows the various combinations of exception classes that cause a double
fault to be generated. A double-fault exception falls in the abort class of exceptions.
The program or task cannot be restarted or resumed. The double-fault handler can
be used to collect diagnostic information about the state of the machine and/or, when
possible, to shut the application and/or system down gracefully or restart the
system.
Table 6-4. Interrupt and Exception Classes
Class                              Vector Number   Description
Benign Exceptions and Interrupts   1               Debug
                                   2               NMI Interrupt
                                   3               Breakpoint
                                   4               Overflow
                                   5               BOUND Range Exceeded
                                   6               Invalid Opcode
                                   7               Device Not Available
                                   9               Coprocessor Segment Overrun
                                   16              Floating-Point Error
                                   17              Alignment Check
                                   18              Machine Check
                                   19              SIMD floating-point
                                   All             INT n
                                   All             INTR
Contributory Exceptions            0               Divide Error
                                   10              Invalid TSS
                                   11              Segment Not Present
                                   12              Stack Fault
                                   13              General Protection
Page Faults                        14              Page Fault
A segment or page fault may be encountered while prefetching instructions;
however, this behavior is outside the domain of Table 6-5. Any further faults
generated while the processor is attempting to transfer control to the appropriate fault
handler could still lead to a double-fault sequence.
If another exception occurs while attempting to call the double-fault handler, the
processor enters shutdown mode. This mode is similar to the state following
execution of an HLT instruction. In this mode, the processor stops executing instructions
until an NMI interrupt, SMI interrupt, hardware reset, or INIT# is received. The
processor generates a special bus cycle to indicate that it has entered shutdown
mode. Software designers may need to be aware of the response of hardware when
it goes into shutdown mode. For example, hardware may turn on an indicator light on
the front panel, generate an NMI interrupt to record diagnostic information, invoke
reset initialization, generate an INIT initialization, or generate an SMI. If any events
are pending during shutdown, they will be handled after a wake event from
shutdown is processed (for example, A20M# interrupts).
If a shutdown occurs while the processor is executing an NMI interrupt handler, then
only a hardware reset can restart the processor. Likewise, if the shutdown occurs
while executing in SMM, a hardware reset must be used to restart the processor.
Exception Error Code
Zero. The processor always pushes an error code of 0 onto the stack of the
double-fault handler.
Saved Instruction Pointer
The saved contents of CS and EIP registers are undefined.
Program State Change
The program state following a double-fault exception is undefined. The program or task
cannot be resumed or restarted. The only available action of the double-fault
exception handler is to collect all possible context information for use in diagnostics and
then close the application and/or shut down or reset the processor.
Table 6-5. Conditions for Generating a Double Fault
                  Second Exception
First Exception   Benign                      Contributory                Page Fault
Benign            Handle Exceptions Serially  Handle Exceptions Serially  Handle Exceptions Serially
Contributory      Handle Exceptions Serially  Generate a Double Fault     Handle Exceptions Serially
Page Fault        Handle Exceptions Serially  Generate a Double Fault     Generate a Double Fault
If the double fault occurs when any portion of the exception handling machine state
is corrupted, the handler cannot be invoked and the processor must be reset.
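Tables 6-4 and 6-5 together define a simple lookup. The following sketch encodes them as a model for reasoning about exception sequences. The function names are hypothetical. Treating vectors 32 through 255 as benign (they can only arrive as INT n or INTR) and rejecting vector 8 and the reserved vectors are assumptions of this sketch.

```python
BENIGN = frozenset({1, 2, 3, 4, 5, 6, 7, 9, 16, 17, 18, 19})
CONTRIBUTORY = frozenset({0, 10, 11, 12, 13})
PAGE_FAULT = frozenset({14})


def exception_class(vector):
    """Classify a vector number according to Table 6-4."""
    if vector in CONTRIBUTORY:
        return "contributory"
    if vector in PAGE_FAULT:
        return "page_fault"
    if vector in BENIGN or 32 <= vector <= 255:
        return "benign"
    raise ValueError(f"vector {vector} is not covered by Table 6-4")


def is_double_fault(first, second):
    """Return True if Table 6-5 calls for #DF rather than serial handling."""
    a = exception_class(first)
    b = exception_class(second)
    if a == "contributory":
        return b == "contributory"
    if a == "page_fault":
        return b in ("contributory", "page_fault")
    return False
```

For instance, a segment-not-present fault (11) raised while delivering a general-protection fault (13) yields a double fault, whereas a page fault (14) raised while delivering a general-protection fault is handled serially.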
Interrupt 9 - Coprocessor Segment Overrun
Exception Class Abort. (Intel reserved; do not use. Recent IA-32 processors
do not generate this exception.)
Description
Indicates that an Intel386 CPU-based system with an Intel 387 math coprocessor
detected a page or segment violation while transferring the middle portion of an
Intel 387 math coprocessor operand. The P6 family, Pentium, and Intel486
processors do not generate this exception; instead, this condition is detected with a general
protection exception (#GP), interrupt 13.
Exception Error Code
None.
Saved Instruction Pointer
The saved contents of CS and EIP registers point to the instruction that generated the
exception.
Program State Change
The program state following a coprocessor segment-overrun exception is
undefined. The program or task cannot be resumed or restarted. The only available action
of the exception handler is to save the instruction pointer and reinitialize the x87 FPU
using the FNINIT instruction.
Interrupt 10 - Invalid TSS Exception (#TS)
Exception Class Fault.
Description
Indicates that there was an error related to a TSS. Such an error might be detected
during a task switch or during the execution of instructions that use information from
a TSS. Table 6-6 shows the conditions that cause an invalid TSS exception to be
generated.
Table 6-6. Invalid TSS Conditions
Error Code Index                Invalid Condition
TSS segment selector index      The TSS segment limit is less than 67H for 32-bit TSS or less than 2CH for 16-bit TSS.
TSS segment selector index      During an IRET task switch, the TI flag in the TSS segment selector indicates the LDT.
TSS segment selector index      During an IRET task switch, the TSS segment selector exceeds descriptor table limit.
TSS segment selector index      During an IRET task switch, the busy flag in the TSS descriptor indicates an inactive task.
TSS segment selector index      During an IRET task switch, an attempt to load the backlink limit faults.
TSS segment selector index      During an IRET task switch, the backlink is a NULL selector.
TSS segment selector index      During an IRET task switch, the backlink points to a descriptor which is not a busy TSS.
TSS segment selector index      The new TSS descriptor is beyond the GDT limit.
TSS segment selector index      The new TSS descriptor is not writable.
TSS segment selector index      Stores to the old TSS encounter a fault condition.
TSS segment selector index      The old TSS descriptor is not writable for a jump or IRET task switch.
TSS segment selector index      The new TSS backlink is not writable for a call or exception task switch.
TSS segment selector index      The new TSS selector is null on an attempt to lock the new TSS.
TSS segment selector index      The new TSS selector has the TI bit set on an attempt to lock the new TSS.
TSS segment selector index      The new TSS descriptor is not an available TSS descriptor on an attempt to lock the new TSS.
LDT segment selector index      LDT not valid or not present.
Stack segment selector index    The stack segment selector exceeds descriptor table limit.
Stack segment selector index    The stack segment selector is NULL.
Stack segment selector index    The stack segment descriptor is a non-data segment.
Stack segment selector index    The stack segment is not writable.
Stack segment selector index    The stack segment DPL != CPL.
Stack segment selector index    The stack segment selector RPL != CPL.
Code segment selector index     The code segment selector exceeds descriptor table limit.
Code segment selector index     The code segment selector is NULL.
Code segment selector index     The code segment descriptor is not a code segment type.
Code segment selector index     The nonconforming code segment DPL != CPL.
Code segment selector index     The conforming code segment DPL is greater than CPL.
Data segment selector index     The data segment selector exceeds the descriptor table limit.
Data segment selector index     The data segment descriptor is not a readable code or data type.
Data segment selector index     The data segment descriptor is a nonconforming code type and RPL > DPL.
Data segment selector index     The data segment descriptor is a nonconforming code type and CPL > DPL.
TSS segment selector index      The TSS segment selector is NULL for LTR.
TSS segment selector index      The TSS segment selector has the TI bit set for LTR.
TSS segment selector index      The TSS segment descriptor/upper descriptor is beyond the GDT segment limit.
TSS segment selector index      The TSS segment descriptor is not an available TSS type.
TSS segment selector index      The TSS segment descriptor is an available 286 TSS type in IA-32e mode.
TSS segment selector index      The TSS segment upper descriptor is not the correct type.
TSS segment selector index      The TSS segment descriptor contains a non-canonical base.
TSS segment selector index      There is a limit violation in attempting to load SS selector or ESP from a TSS on a call or exception which changes privilege levels in legacy mode.
TSS segment selector index      There is a limit violation or canonical fault in attempting to load RSP or IST from a TSS on a call or exception which changes privilege levels in IA-32e mode.
This exception can be generated either in the context of the original task or in the
context of the new task (see Section 7.3, "Task Switching"). Until the processor has
completely verified the presence of the new TSS, the exception is generated in the
context of the original task. Once the existence of the new TSS is verified, the task
switch is considered complete. Any invalid-TSS conditions detected after this point
are handled in the context of the new task. (A task switch is considered complete
when the task register is loaded with the segment selector for the new TSS and, if the
switch is due to a procedure call or interrupt, the previous task link field of the new
TSS references the old TSS.)
The invalid-TSS handler must be a task called using a task gate. Handling this
exception inside the faulting TSS context is not recommended because the processor state
may not be consistent.
Exception Error Code
An error code containing the segment selector index for the segment descriptor that
caused the violation is pushed onto the stack of the exception handler. If the EXT flag
is set, it indicates that the exception was caused by an event external to the currently
running program (for example, if an external interrupt handler using a task gate
attempted a task switch to an invalid TSS).
Saved Instruction Pointer
If the exception condition was detected before the task switch was carried out, the
saved contents of CS and EIP registers point to the instruction that invoked the task
switch. If the exception condition was detected after the task switch was carried out,
the saved contents of CS and EIP registers point to the first instruction of the new
task.
Program State Change
The ability of the invalid-TSS handler to recover from the fault depends on the error
condition that causes the fault. See Section 7.3, "Task Switching," for more
information on the task switch process and the possible recovery actions that can be taken.
If an invalid TSS exception occurs during a task switch, it can occur before or after
the commit-to-new-task point. If it occurs before the commit point, no program state
change occurs. If it occurs after the commit point (when the segment descriptor
information for the new segment selectors has been loaded in the segment
registers), the processor will load all the state information from the new TSS before it
generates the exception. During a task switch, the processor first loads all the
segment registers with segment selectors from the TSS, then checks their contents
for validity. If an invalid TSS exception is discovered, the remaining segment
registers are loaded but not checked for validity and therefore may not be usable for
referencing memory. The invalid TSS handler should not rely on being able to use the
segment selectors found in the CS, SS, DS, ES, FS, and GS registers without causing
another exception. The exception handler should load all segment registers before
trying to resume the new task; otherwise, general-protection exceptions (#GP) may
result later under conditions that make diagnosis more difficult. The Intel
recommended way of dealing with this situation is to use a task for the invalid TSS exception
handler. The task switch back to the interrupted task from the invalid-TSS
exception-handler task will then cause the processor to check the registers as it loads them
from the TSS.
Interrupt 11 - Segment Not Present (#NP)
Exception Class Fault.
Description
Indicates that the present flag of a segment or gate descriptor is clear. The processor
can generate this exception during any of the following operations:
• While attempting to load CS, DS, ES, FS, or GS registers. [Detection of a
  not-present segment while loading the SS register causes a stack fault exception
  (#SS) to be generated.] This situation can occur while performing a task switch.
• While attempting to load the LDTR using an LLDT instruction. Detection of a
  not-present LDT while loading the LDTR during a task switch operation causes an
  invalid-TSS exception (#TS) to be generated.
• When executing the LTR instruction and the TSS is marked not present.
• While attempting to use a gate descriptor or TSS that is marked
  segment-not-present, but is otherwise valid.
An operating system typically uses the segment-not-present exception to implement
virtual memory at the segment level. If the exception handler loads the segment and
returns, the interrupted program or task resumes execution.
A not-present indication in a gate descriptor, however, does not indicate that a
segment is not present (because gates do not correspond to segments). The
operating system may use the present flag for gate descriptors to trigger exceptions of
special significance to the operating system.
A contributory exception or page fault that subsequently referenced a not-present
segment would cause a double fault (#DF) to be generated instead of #NP.
Exception Error Code
An error code containing the segment selector index for the segment descriptor that
caused the violation is pushed onto the stack of the exception handler. If the EXT flag
is set, it indicates that the exception resulted from either:
• an external event (NMI or INTR) that caused an interrupt, which subsequently
  referenced a not-present segment
• a benign exception that subsequently referenced a not-present segment
The IDT flag is set if the error code refers to an IDT entry. This occurs when the IDT
entry for an interrupt being serviced references a not-present gate. Such an event
could be generated by an INT instruction or a hardware interrupt.
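A handler usually begins by splitting the error code into its fields. The sketch below assumes the standard selector-format error code layout used by #TS, #NP, #SS, and #GP: EXT in bit 0, IDT in bit 1, TI in bit 2, and the selector index in bits 3 through 15. The function name and the dictionary keys are hypothetical.

```python
def decode_selector_error_code(code):
    """Split a selector-format exception error code into its fields."""
    return {
        "ext": bool(code & 0x1),        # caused by an external event
        "idt": bool(code & 0x2),        # index refers to an IDT gate
        "ti": bool(code & 0x4),         # 0 = GDT, 1 = LDT (valid when idt is clear)
        "index": (code >> 3) & 0x1FFF,  # descriptor or gate index
    }
```

For example, an error code of 0102H decodes to IDT = 1 and index = 20H, meaning that the gate for vector 20H in the IDT was the descriptor found not present.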
Saved Instruction Pointer
The saved contents of CS and EIP registers normally point to the instruction that
generated the exception. If the exception occurred while loading segment
descriptors for the segment selectors in a new TSS, the CS and EIP registers point to the first
instruction in the new task. If the exception occurred while accessing a gate
descriptor, the CS and EIP registers point to the instruction that invoked the access
(for example a CALL instruction that references a call gate).
Program State Change
If the segment-not-present exception occurs as the result of loading a register (CS,
DS, SS, ES, FS, GS, or LDTR), a program-state change does accompany the
exception because the register is not loaded. Recovery from this exception is possible by
simply loading the missing segment into memory and setting the present flag in the
segment descriptor.
If the segment-not-present exception occurs while accessing a gate descriptor, a
program-state change does not accompany the exception. Recovery from this
exception is possible merely by setting the present flag in the gate descriptor.
If a segment-not-present exception occurs during a task switch, it can occur before
or after the commit-to-new-task point (see Section 7.3, "Task Switching"). If it
occurs before the commit point, no program state change occurs. If it occurs after
the commit point, the processor will load all the state information from the new TSS
(without performing any additional limit, present, or type checks) before it generates
the exception. The segment-not-present exception handler should not rely on being
able to use the segment selectors found in the CS, SS, DS, ES, FS, and GS registers
without causing another exception. (See the "Program State Change" description for
"Interrupt 10 - Invalid TSS Exception (#TS)" in this chapter for additional information
on how to handle this situation.)
Interrupt 12 - Stack Fault Exception (#SS)
Exception Class Fault.
Description
Indicates that one of the following stack-related conditions was detected:
• A limit violation is detected during an operation that refers to the SS register.
  Operations that can cause a limit violation include stack-oriented instructions
  such as POP, PUSH, CALL, RET, IRET, ENTER, and LEAVE, as well as other memory
  references which implicitly or explicitly use the SS register (for example, MOV
  AX, [BP+6] or MOV AX, SS:[EAX+6]). The ENTER instruction generates this
  exception when there is not enough stack space for allocating local variables.
• A not-present stack segment is detected when attempting to load the SS register.
  This violation can occur during the execution of a task switch, a CALL instruction
  to a different privilege level, a return to a different privilege level, an LSS
  instruction, or a MOV or POP instruction to the SS register.
• A canonical violation is detected in 64-bit mode during an operation that
  references memory using the stack pointer register containing a non-canonical
  memory address.
Recovery from this fault is possible by either extending the limit of the stack segment
(in the case of a limit violation) or loading the missing stack segment into memory (in
the case of a not-present violation).
In the case of a canonical violation that was caused intentionally by software,
recovery is possible by loading the correct canonical value into RSP. Otherwise, a
canonical violation of the address in RSP likely reflects some register corruption in
the software.
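The canonical-address test mentioned above can be sketched in a few lines. The sketch assumes an implementation with 48-bit linear addresses, in which bits 63 through 47 of a canonical address are either all zeros or all ones. The function name and the va_bits parameter are hypothetical.

```python
def is_canonical(addr, va_bits=48):
    """Return True if a 64-bit address is in canonical form.

    Assumes va_bits implemented linear-address bits (48 in this sketch).
    """
    addr &= (1 << 64) - 1
    # Bits 63 down to (va_bits - 1) must all equal the sign bit.
    upper = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return upper == 0 or upper == all_ones
```

Under this assumption, 00007FFFFFFFFFFFH and FFFF800000000000H are canonical, while 0000800000000000H is not. A stack reference through RSP holding the latter value would be expected to raise #SS.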
Exception Error Code
If the exception is caused by a not-present stack segment or by overflow of the new
stack during an inter-privilege-level call, the error code contains a segment selector
for the segment that caused the exception. Here, the exception handler can test the
present flag in the segment descriptor pointed to by the segment selector to
determine the cause of the exception. For a normal limit violation (on a stack segment
already in use) the error code is set to 0.
Saved Instruction Pointer
The saved contents of CS and EIP registers generally point to the instruction that
generated the exception. However, when the exception results from attempting to
load a not-present stack segment during a task switch, the CS and EIP registers point
to the first instruction of the new task.
Program State Change
A program-state change does not generally accompany a stack-fault exception,
because the instruction that generated the fault is not executed. Here, the instruction
can be restarted after the exception handler has corrected the stack fault condition.
If a stack fault occurs during a task switch, it occurs after the commit-to-new-task
point (see Section 7.3, "Task Switching"). Here, the processor loads all the state
information from the new TSS (without performing any additional limit, present, or
type checks) before it generates the exception. The stack fault handler should thus
not rely on being able to use the segment selectors found in the CS, SS, DS, ES, FS,
and GS registers without causing another exception. The exception handler should
check all segment registers before trying to resume the new task; otherwise, general
protection faults may result later under conditions that are more difficult to diagnose.
(See the "Program State Change" description for "Interrupt 10 - Invalid TSS Exception
(#TS)" in this chapter for additional information on how to handle this situation.)
Interrupt 13 - General Protection Exception (#GP)
Exception Class Fault.
Description
Indicates that the processor detected one of a class of protection violations called
"general-protection violations." The conditions that cause this exception to be
generated comprise all the protection violations that do not cause other exceptions to be
generated (such as invalid-TSS, segment-not-present, stack-fault, or page-fault
exceptions). The following conditions cause general-protection exceptions to be
generated:
• Exceeding the segment limit when accessing the CS, DS, ES, FS, or GS
  segments.
• Exceeding the segment limit when referencing a descriptor table (except during a
  task switch or a stack switch).
• Transferring execution to a segment that is not executable.
• Writing to a code segment or a read-only data segment.
• Reading from an execute-only code segment.
• Loading the SS register with a segment selector for a read-only segment (unless
  the selector comes from a TSS during a task switch, in which case an invalid-TSS
  exception occurs).
• Loading the SS, DS, ES, FS, or GS register with a segment selector for a system
  segment.
• Loading the DS, ES, FS, or GS register with a segment selector for an
  execute-only code segment.
• Loading the SS register with the segment selector of an executable segment or a
  null segment selector.
• Loading the CS register with a segment selector for a data segment or a null
  segment selector.
• Accessing memory using the DS, ES, FS, or GS register when it contains a null
  segment selector.
• Switching to a busy task during a call or jump to a TSS.
• Using a segment selector on a non-IRET task switch that points to a TSS
  descriptor in the current LDT. TSS descriptors can only reside in the GDT. This
  condition causes a #TS exception during an IRET task switch.
• Violating any of the privilege rules described in Chapter 5, "Protection."
• Exceeding the instruction length limit of 15 bytes (this only can occur when
  redundant prefixes are placed before an instruction).
• Loading the CR0 register with a set PG flag (paging enabled) and a clear PE flag
  (protection disabled).
• Loading the CR0 register with a set NW flag and a clear CD flag.
• Referencing an entry in the IDT (following an interrupt or exception) that is not
  an interrupt, trap, or task gate.
• Attempting to access an interrupt or exception handler through an interrupt or
  trap gate from virtual-8086 mode when the handler's code segment DPL is
  greater than 0.
• Attempting to write a 1 into a reserved bit of CR4.
• Attempting to execute a privileged instruction when the CPL is not equal to 0 (see
  Section 5.9, "Privileged Instructions," for a list of privileged instructions).
• Writing to a reserved bit in an MSR.
• Accessing a gate that contains a null segment selector.
• Executing the INT n instruction when the CPL is greater than the DPL of the
  referenced interrupt, trap, or task gate.
• The segment selector in a call, interrupt, or trap gate does not point to a code
  segment.
• The segment selector operand in the LLDT instruction is a local type (TI flag is
  set) or does not point to a segment descriptor of the LDT type.
• The segment selector operand in the LTR instruction is local or points to a TSS
  that is not available.
• The target code-segment selector for a call, jump, or return is null.
• If the PAE and/or PSE flag in control register CR4 is set and the processor detects
  any reserved bits in a page-directory-pointer-table entry set to 1. These bits are
  checked during a write to control registers CR0, CR3, or CR4 that causes a
  reloading of the page-directory-pointer-table entry.
• Attempting to write a non-zero value into the reserved bits of the MXCSR register.
• Executing an SSE/SSE2/SSE3 instruction that attempts to access a 128-bit
  memory location that is not aligned on a 16-byte boundary when the instruction
  requires 16-byte alignment. This condition also applies to the stack segment.
A program or task can be restarted following any general-protection exception. If the
exception occurs while attempting to call an interrupt handler, the interrupted
program can be restarted, but the interrupt may be lost.
Exception Error Code
The processor pushes an error code onto the exception handler's stack. If the fault
condition was detected while loading a segment descriptor, the error code contains
the segment selector or IDT vector number for the descriptor; otherwise, the error
code is 0. The source of the selector in an error code may be any of the following:
• An operand of the instruction.
• A selector from a gate which is the operand of the instruction.
• A selector from a TSS involved in a task switch.
• IDT vector number.
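A handler that wants to report which descriptor was involved can split the error code into its fields. The sketch below is illustrative only: it assumes the selector-style error-code layout described in the "Error Code" section of this chapter (EXT in bit 0, IDT in bit 1, TI in bit 2, index in bits 15:3), and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a selector-style error code (layout assumed, see lead-in). */
struct selector_error_code {
    bool     external; /* EXT, bit 0: event was external to the program     */
    bool     idt;      /* IDT, bit 1: index refers to a gate in the IDT     */
    bool     ldt;      /* TI,  bit 2: index refers to the LDT (if !idt)     */
    uint16_t index;    /* bits 15:3: descriptor index or IDT vector number  */
};

/* Split a #GP error code into its fields. An error code of 0 decodes to
 * all-false/zero, which matches the "otherwise, the error code is 0" case. */
static struct selector_error_code decode_selector_error_code(uint32_t code)
{
    struct selector_error_code d;
    d.external = (code & 0x1u) != 0;
    d.idt      = (code & 0x2u) != 0;
    d.ldt      = !d.idt && (code & 0x4u) != 0; /* TI is meaningful only when IDT is clear */
    d.index    = (uint16_t)((code >> 3) & 0x1FFFu);
    return d;
}
```

Note that an all-zero error code is ambiguous between "no selector involved" and a reference to GDT entry 0, so a handler would typically treat it as "no further information".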
Saved Instruction Pointer
The saved contents of CS and EIP registers point to the instruction that generated the
exception.
Program State Change
In general, a program-state change does not accompany a general-protection
exception, because the invalid instruction or operation is not executed. An exception
handler can be designed to correct all of the conditions that cause general-protection
exceptions and restart the program or task without any loss of program continuity.
If a general-protection exception occurs during a task switch, it can occur before or
after the commit-to-new-task point (see Section 7.3, "Task Switching"). If it occurs
before the commit point, no program state change occurs. If it occurs after the
commit point, the processor will load all the state information from the new TSS
(without performing any additional limit, present, or type checks) before it generates
the exception. The general-protection exception handler should thus not rely on
being able to use the segment selectors found in the CS, SS, DS, ES, FS, and GS
registers without causing another exception. (See the "Program State Change"
description for "Interrupt 10 - Invalid TSS Exception (#TS)" in this chapter for
additional information on how to handle this situation.)
General Protection Exception in 64-bit Mode
The following conditions cause general-protection exceptions in 64-bit mode:
• If the memory address is in a non-canonical form.
• If a segment descriptor memory address is in non-canonical form.
• If the target offset in a destination operand of a call or jmp is in a non-canonical
  form.
• If a code segment or 64-bit call gate overlaps non-canonical space.
• If the code segment descriptor pointed to by the selector in the 64-bit gate
  doesn't have the L-bit set and the D-bit clear.
• If the EFLAGS.NT bit is set in IRET.
• If the stack segment selector of IRET is null when going back to compatibility
  mode.
• If the stack segment selector of IRET is null going back to CPL3 and 64-bit mode.
• If a null stack segment selector RPL of IRET is not equal to CPL going back to
  non-CPL3 and 64-bit mode.
• If the proposed new code segment descriptor of IRET has both the D-bit and the
  L-bit set.
• If the segment descriptor pointed to by the segment selector in the destination
  operand is a code segment and it has both the D-bit and the L-bit set.
• If the segment descriptor from a 64-bit call gate is in non-canonical space.
• If the DPL from a 64-bit call-gate is less than the CPL or than the RPL of the 64-bit
  call-gate.
• If the upper type field of a 64-bit call gate is not 0x0.
• If an attempt is made to load a null selector in the SS register in compatibility
  mode.
• If an attempt is made to load a null selector in the SS register in CPL3 and 64-bit
  mode.
• If an attempt is made to load a null selector in the SS register in non-CPL3 and
  64-bit mode where RPL is not equal to CPL.
• If an attempt is made to clear CR0.PG while IA-32e mode is enabled.
• If an attempt is made to set a reserved bit in CR3, CR4 or CR8.
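Several of the conditions above depend on whether an address is canonical. As a rough illustration, the following C sketch tests canonical form by checking that every bit above the implemented linear-address width equals the top implemented bit. The function name and the `linear_bits` parameter are hypothetical; 48 is assumed here as a typical width, and the actual width should be taken from CPUID on a real system.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if addr is in canonical form for a processor that implements
 * linear_bits bits of linear address (assumed 1..64, typically 48).
 * Canonical means bits 63..(linear_bits - 1) are all 0 or all 1. */
static bool is_canonical(uint64_t addr, unsigned linear_bits)
{
    uint64_t upper    = addr >> (linear_bits - 1);        /* top implemented bit and above */
    uint64_t all_ones = UINT64_MAX >> (linear_bits - 1);  /* same field with every bit set */
    return upper == 0 || upper == all_ones;
}
```

With a 48-bit width, for example, `0x00007FFFFFFFFFFF` and `0xFFFF800000000000` are the edges of the two canonical halves, and everything between them is non-canonical.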
Interrupt 14 - Page-Fault Exception (#PF)
Exception Class Fault.
Description
Indicates that, with paging enabled (the PG flag in the CR0 register is set), the
processor detected one of the following conditions while using the page-translation
mechanism to translate a linear address to a physical address:
• The P (present) flag in a page-directory or page-table entry needed for the
  address translation is clear, indicating that a page table or the page containing
  the operand is not present in physical memory.
• The procedure does not have sufficient privilege to access the indicated page
  (that is, a procedure running in user mode attempts to access a supervisor-mode
  page).
• Code running in user mode attempts to write to a read-only page. In the Intel486
  and later processors, if the WP flag is set in CR0, the page fault will also be
  triggered by code running in supervisor mode that tries to write to a read-only
  page.
• An instruction fetch to a linear address that translates to a physical address in a
  memory page with the execute-disable bit set (for information about the
  execute-disable bit, see Chapter 4, "Paging").
• One or more reserved bits in a page-directory entry are set to 1. See the
  description below of the RSVD error code flag.
The exception handler can recover from page-not-present conditions and restart the
program or task without any loss of program continuity. It can also restart the
program or task after a privilege violation, but the problem that caused the privilege
violation may be uncorrectable.
See also: Section 4.7, "Page-Fault Exceptions."
Exception Error Code
Yes (special format). The processor provides the page-fault handler with two items of
information to aid in diagnosing the exception and recovering from it:
• An error code on the stack. The error code for a page fault has a format different
  from that for other exceptions (see Figure 6-9). The error code tells the
  exception handler five things:
  - The P flag indicates whether the exception was due to a not-present page (0)
    or to either an access rights violation or the use of a reserved bit (1).
  - The W/R flag indicates whether the memory access that caused the exception
    was a read (0) or write (1).
  - The U/S flag indicates whether the processor was executing at user mode (1)
    or supervisor mode (0) at the time of the exception.
  - The RSVD flag indicates that the processor detected 1s in reserved bits of the
    page directory, when the PSE or PAE flags in control register CR4 are set to 1.
    Note:
    The PSE flag is only available in recent Intel 64 and IA-32 processors
    including the Pentium 4, Intel Xeon, P6 family, and Pentium processors.
    The PAE flag is only available on recent Intel 64 and IA-32 processors
    including the Pentium 4, Intel Xeon, and P6 family processors.
    In earlier IA-32 processors, the bit position of the RSVD flag is reserved
    and is cleared to 0.
  - The I/D flag indicates whether the exception was caused by an instruction
    fetch. This flag is reserved and cleared to 0 if CR4.PAE = 0 (32-bit paging is
    in use) or IA32_EFER.NXE = 0 (the execute-disable feature is either
    unsupported or not enabled). See Section 4.7, "Page-Fault Exceptions," for
    details.
• The contents of the CR2 register. The processor loads the CR2 register with the
  32-bit linear address that generated the exception. The page-fault handler can
  use this address to locate the corresponding page directory and page-table
  entries. Another page fault can potentially occur during execution of the
  page-fault handler; the handler should save the contents of the CR2 register
  before a second page fault can occur (see footnote 1). If a page fault is caused
  by a page-level protection violation, the access flag in the page-directory entry
  is set when the fault occurs. The behavior of IA-32 processors regarding the
  access flag in the corresponding page-table entry is model specific and not
  architecturally defined.

Figure 6-9. Page-Fault Error Code
Bits   Flag      Value  Meaning
31:5   Reserved
4      I/D       0      The fault was not caused by an instruction fetch.
                 1      The fault was caused by an instruction fetch.
3      RSVD      0      The fault was not caused by reserved bit violation.
                 1      The fault was caused by reserved bits set to 1 in a page directory.
2      U/S       0      The access causing the fault originated when the processor
                        was executing in supervisor mode.
                 1      The access causing the fault originated when the processor
                        was executing in user mode.
1      W/R       0      The access causing the fault was a read.
                 1      The access causing the fault was a write.
0      P         0      The fault was caused by a non-present page.
                 1      The fault was caused by a page-level protection violation.
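The flag layout of Figure 6-9 can be decoded with a few bit tests. The following C sketch is one possible way a handler might classify a page fault from the error code; the macro, enum, and function names are hypothetical, and reading CR2 (which requires privileged code) is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from Figure 6-9. */
#define PF_ERR_P    (1u << 0)  /* 0 = not-present page, 1 = protection/reserved-bit */
#define PF_ERR_WR   (1u << 1)  /* 0 = read, 1 = write */
#define PF_ERR_US   (1u << 2)  /* 0 = supervisor mode, 1 = user mode */
#define PF_ERR_RSVD (1u << 3)  /* 1 = reserved bits set in a paging structure */
#define PF_ERR_ID   (1u << 4)  /* 1 = instruction fetch */

enum pf_cause {
    PF_CAUSE_NOT_PRESENT,   /* candidate for demand paging */
    PF_CAUSE_RESERVED_BIT,  /* corrupted or malformed paging structure */
    PF_CAUSE_PROTECTION     /* access-rights violation */
};

/* Classify the fault; RSVD is only meaningful when P is set. */
static enum pf_cause pf_cause(uint32_t code)
{
    if (!(code & PF_ERR_P))
        return PF_CAUSE_NOT_PRESENT;
    if (code & PF_ERR_RSVD)
        return PF_CAUSE_RESERVED_BIT;
    return PF_CAUSE_PROTECTION;
}

static bool pf_is_write(uint32_t code)  { return (code & PF_ERR_WR) != 0; }
static bool pf_is_user(uint32_t code)   { return (code & PF_ERR_US) != 0; }
static bool pf_is_ifetch(uint32_t code) { return (code & PF_ERR_ID) != 0; }
```

For instance, an error code of 0x7 would decode as a user-mode write that violated page-level protection, while 0x0 would decode as a supervisor-mode read of a not-present page.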
Saved Instruction Pointer
The saved contents of CS and EIP registers generally point to the instruction that
generated the exception. If the page-fault exception occurred during a task switch,
the CS and EIP registers may point to the first instruction of the new task (as
described in the following "Program State Change" section).
Program State Change
A program-state change does not normally accompany a page-fault exception,
because the instruction that causes the exception to be generated is not executed.
After the page-fault exception handler has corrected the violation (for example,
loaded the missing page into memory), execution of the program or task can be
resumed.
When a page-fault exception is generated during a task switch, the program-state
may change, as follows. During a task switch, a page-fault exception can occur
during any of the following operations:
• While writing the state of the original task into the TSS of that task.
• While reading the GDT to locate the TSS descriptor of the new task.
• While reading the TSS of the new task.
• While reading segment descriptors associated with segment selectors from the
  new task.
• While reading the LDT of the new task to verify the segment registers stored in
  the new TSS.
In the last two cases the exception occurs in the context of the new task. The
instruction pointer refers to the first instruction of the new task, not to the instruction
which caused the task switch (or the last instruction to be executed, in the case of an
interrupt). If the design of the operating system permits page faults to occur during
task switches, the page-fault handler should be called through a task gate.
If a page fault occurs during a task switch, the processor will load all the state
information from the new TSS (without performing any additional limit, present, or type
checks) before it generates the exception. The page-fault handler should thus not
rely on being able to use the segment selectors found in the CS, SS, DS, ES, FS, and
GS registers without causing another exception. (See the "Program State Change"
description for "Interrupt 10 - Invalid TSS Exception (#TS)" in this chapter for
additional information on how to handle this situation.)
1. Processors update CR2 whenever a page fault is detected. If a second page fault occurs while an
earlier page fault is being delivered, the faulting linear address of the second fault will overwrite
the contents of CR2 (replacing the previous address). These updates to CR2 occur even if the
page fault results in a double fault or occurs during the delivery of a double fault.
Additional Exception-Handling Information
Special care should be taken to ensure that an exception that occurs during an
explicit stack switch does not cause the processor to use an invalid stack pointer
(SS:ESP). Software written for 16-bit IA-32 processors often uses a pair of
instructions to change to a new stack, for example:
MOV SS, AX
MOV SP, StackTop
When executing this code on one of the 32-bit IA-32 processors, it is possible to get
a page fault, general-protection fault (#GP), or alignment check fault (#AC) after the
segment selector has been loaded into the SS register but before the ESP register
has been loaded. At this point, the two parts of the stack pointer (SS and ESP) are
inconsistent. The new stack segment is being used with the old stack pointer.
The processor does not use the inconsistent stack pointer if the exception handler
switches to a well defined stack (that is, the handler is a task or a more privileged
procedure). However, if the exception handler is called at the same privilege level
and from the same task, the processor will attempt to use the inconsistent stack
pointer.
In systems that handle page-fault, general-protection, or alignment check
exceptions within the faulting task (with trap or interrupt gates), software executing at the
same privilege level as the exception handler should initialize a new stack by using
the LSS instruction rather than a pair of MOV instructions, as described earlier in this
note. When the exception handler is running at privilege level 0 (the normal case),
the problem is limited to procedures or tasks that run at privilege level 0, typically
the kernel of the operating system.
Interrupt 16 - x87 FPU Floating-Point Error (#MF)
Exception Class Fault.
Description
Indicates that the x87 FPU has detected a floating-point error. The NE flag in the
register CR0 must be set for an interrupt 16 (floating-point error exception) to be
generated. (See Section 2.5, "Control Registers," for a detailed description of the NE
flag.)
NOTE
SIMD floating-point exceptions (#XM) are signaled through interrupt
19.
While executing x87 FPU instructions, the x87 FPU detects and reports six types of
floating-point error conditions:
• Invalid operation (#I)
  - Stack overflow or underflow (#IS)
  - Invalid arithmetic operation (#IA)
• Divide-by-zero (#Z)
• Denormalized operand (#D)
• Numeric overflow (#O)
• Numeric underflow (#U)
• Inexact result (precision) (#P)
Each of these error conditions represents an x87 FPU exception type, and for each
exception type, the x87 FPU provides a flag in the x87 FPU status register and a mask
bit in the x87 FPU control register. If the x87 FPU detects a floating-point error and
the mask bit for the exception type is set, the x87 FPU handles the exception
automatically by generating a predefined (default) response and continuing program
execution. The default responses have been designed to provide a reasonable result
for most floating-point applications.
If the mask for the exception is clear and the NE flag in register CR0 is set, the x87
FPU does the following:
1. Sets the necessary flag in the FPU status register.
2. Waits until the next waiting x87 FPU instruction or WAIT/FWAIT instruction is
encountered in the program's instruction stream.
3. Generates an internal error signal that causes the processor to generate a
floating-point exception (#MF).
Prior to executing a waiting x87 FPU instruction or the WAIT/FWAIT instruction, the
x87 FPU checks for pending x87 FPU floating-point exceptions (as described in step 2
above). Pending x87 FPU floating-point exceptions are ignored for non-waiting x87
FPU instructions, which include the FNINIT, FNCLEX, FNSTSW, FNSTSW AX, FNSTCW,
FNSTENV, and FNSAVE instructions. Pending x87 FPU exceptions are also ignored
when executing the state management instructions FXSAVE and FXRSTOR.
All of the x87 FPU floating-point error conditions can be recovered from. The x87 FPU
floating-point-error exception handler can determine the error condition that caused
the exception from the settings of the flags in the x87 FPU status word. See
"Software Exception Handling" in Chapter 8 of the Intel 64 and IA-32 Architectures
Software Developer's Manual, Volume 1, for more information on handling x87 FPU
floating-point exceptions.
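As an illustration of how a handler might identify the error condition from the status word, the C sketch below computes which exception flags are both set and unmasked, and distinguishes the two invalid-operation subtypes. It assumes the flag and mask layout given in Volume 1 (exception flags in status-word bits 0 through 5, stack-fault flag SF in bit 6, mask bits in control-word bits 0 through 5); the names are hypothetical and the status and control words are passed in as plain integers rather than read with FNSTSW/FNSTCW.

```c
#include <stdint.h>

/* Exception flag bits, shared by the status word (flags) and the
 * control word (masks). Layout assumed from Volume 1, Chapter 8. */
#define X87_IE (1u << 0)  /* invalid operation (#I)   */
#define X87_DE (1u << 1)  /* denormalized operand (#D) */
#define X87_ZE (1u << 2)  /* divide-by-zero (#Z)       */
#define X87_OE (1u << 3)  /* numeric overflow (#O)     */
#define X87_UE (1u << 4)  /* numeric underflow (#U)    */
#define X87_PE (1u << 5)  /* inexact result (#P)       */
#define X87_SF (1u << 6)  /* stack fault (status word only) */

enum x87_invalid_kind { X87_NOT_INVALID, X87_STACK_FAULT, X87_INVALID_ARITH };

/* Flags that are set in the status word and not masked in the control word. */
static unsigned x87_unmasked_pending(uint16_t status, uint16_t control)
{
    return (unsigned)status & ~(unsigned)control & 0x3Fu;
}

/* Distinguish #IS (stack overflow/underflow) from #IA (invalid arithmetic). */
static enum x87_invalid_kind x87_invalid_kind(uint16_t status)
{
    if (!(status & X87_IE))
        return X87_NOT_INVALID;
    return (status & X87_SF) ? X87_STACK_FAULT : X87_INVALID_ARITH;
}
```

With the power-on control word 0x037F every exception is masked, so `x87_unmasked_pending` returns 0 for any status word; clearing the ZM bit (control word 0x037B) makes a set ZE flag show up as pending.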
Exception Error Code
None. The x87 FPU provides its own error information.
Saved Instruction Pointer
The saved contents of CS and EIP registers point to the floating-point or WAIT/FWAIT
instruction that was about to be executed when the floating-point-error exception
was generated. This is not the faulting instruction in which the error condition was
detected. The address of the faulting instruction is contained in the x87 FPU
instruction pointer register. See "x87 FPU Instruction and Operand (Data) Pointers" in
Chapter 8 of the Intel 64 and IA-32 Architectures Software Developer's Manual,
Volume 1, for more information about the information the FPU saves for use in handling
floating-point-error exceptions.
Program State Change
A program-state change generally accompanies an x87 FPU floating-point exception
because the handling of the exception is delayed until the next waiting x87 FPU
floating-point or WAIT/FWAIT instruction following the faulting instruction. The x87
FPU, however, saves sufficient information about the error condition to allow
recovery from the error and re-execution of the faulting instruction if needed.
In situations where non-x87 FPU floating-point instructions depend on the results of
an x87 FPU floating-point instruction, a WAIT or FWAIT instruction can be inserted in
front of a dependent instruction to force a pending x87 FPU floating-point exception
to be handled before the dependent instruction is executed. See "x87 FPU Exception
Synchronization" in Chapter 8 of the Intel 64 and IA-32 Architectures Software
Developer's Manual, Volume 1, for more information about synchronization of x87
floating-point-error exceptions.
Interrupt 17 - Alignment Check Exception (#AC)
Exception Class Fault.
Description
Indicates that the processor detected an unaligned memory operand when alignment
checking was enabled. Alignment checks are only carried out in data (or stack)
accesses (not in code fetches or system segment accesses). An example of an
alignment-check violation is a word stored at an odd byte address, or a doubleword stored
at an address that is not an integer multiple of 4. Table 6-7 lists the alignment
requirements for the various data types recognized by the processor.
Note that the alignment check exception (#AC) is generated only for data types that
must be aligned on word, doubleword, and quadword boundaries. A
general-protection exception (#GP) is generated for 128-bit data types that are not
aligned on a 16-byte boundary.
To enable alignment checking, the following conditions must be true:
• AM flag in CR0 register is set.
• AC flag in the EFLAGS register is set.
• The CPL is 3 (protected mode or virtual-8086 mode).

Table 6-7. Alignment Requirements by Data Type
Data Type                                         Address Must Be Divisible By
Word                                              2
Doubleword                                        4
Single-precision floating-point (32-bits)         4
Double-precision floating-point (64-bits)         8
Double extended-precision floating-point          8
  (80-bits)
Quadword                                          8
Double quadword                                   16
Segment Selector                                  2
32-bit Far Pointer                                2
48-bit Far Pointer                                4
32-bit Pointer                                    4
GDTR, IDTR, LDTR, or Task Register Contents       4
FSTENV/FLDENV Save Area                           4 or 2, depending on operand size
FSAVE/FRSTOR Save Area                            4 or 2, depending on operand size
Bit String                                        2 or 4, depending on the operand-size attribute
Alignment-check exceptions (#AC) are generated only when operating at privilege
level 3 (user mode). Memory references that default to privilege level 0, such as
segment descriptor loads, do not generate alignment-check exceptions, even when
caused by a memory reference made from privilege level 3.
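The enabling conditions and the divisibility rule can be combined into a simple predicate. The following C sketch is a simplified model, not processor behavior in full: it assumes the AM flag is bit 18 of CR0 and the AC flag is bit 18 of EFLAGS, considers only operands with word, doubleword, or quadword alignment requirements, and ignores implicit privilege-level-0 references and the 16-byte #GP case. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_AM    (1u << 18)  /* alignment mask flag in CR0 */
#define EFLAGS_AC (1u << 18)  /* alignment check flag in EFLAGS */

/* All three enabling conditions must hold at the same time. */
static bool alignment_checking_enabled(uint32_t cr0, uint32_t eflags, unsigned cpl)
{
    return (cr0 & CR0_AM) != 0 && (eflags & EFLAGS_AC) != 0 && cpl == 3;
}

/* Models whether a data access at addr, whose type requires the address to
 * be divisible by divisor (2, 4, or 8), would raise #AC. Other divisors are
 * outside this model and return false. */
static bool would_raise_ac(uint32_t cr0, uint32_t eflags, unsigned cpl,
                           uint64_t addr, unsigned divisor)
{
    if (divisor != 2 && divisor != 4 && divisor != 8)
        return false;
    return alignment_checking_enabled(cr0, eflags, cpl) && (addr % divisor) != 0;
}
```

Under this model a word access at address 0x1001 from CPL 3 raises #AC only when both AM and AC are set; the same access from CPL 0 never does.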
Storing the contents of the GDTR, IDTR, LDTR, or task register in memory while at
privilege level 3 can generate an alignment-check exception. Although application
programs do not normally store these registers, the fault can be avoided by aligning
the information stored on an even word-address.
The FXSAVE and FXRSTOR instructions save and restore a 512-byte data structure,
the first byte of which must be aligned on a 16-byte boundary. If the alignment-check
exception (#AC) is enabled when executing these instructions (and CPL is 3), a
misaligned memory operand can cause either an alignment-check exception or a
general-protection exception (#GP) depending on the processor implementation
(see "FXSAVE-Save x87 FPU, MMX, SSE, and SSE2 State" and "FXRSTOR-Restore
x87 FPU, MMX, SSE, and SSE2 State" in Chapter 3 of the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 2A).
The MOVUPS and MOVUPD instructions perform 128-bit unaligned loads or stores.
The LDDQU instruction loads 128-bit unaligned data. They do not generate
general-protection exceptions (#GP) when operands are not aligned on a 16-byte boundary.
If alignment checking is enabled, alignment-check exceptions (#AC) may or may not
be generated depending on processor implementation when data addresses are not
aligned on an 8-byte boundary.
FSAVE and FRSTOR instructions can generate unaligned references, which can cause
alignment-check faults. These instructions are rarely needed by application
programs.
Exception Error Code
Yes (always zero).
Saved Instruction Pointer
The saved contents of CS and EIP registers point to the instruction that generated the
exception.
Program State Change
A program-state change does not accompany an alignment-check fault, because the
instruction is not executed.
Interrupt 18 - Machine-Check Exception (#MC)
Exception Class Abort.
Description
Indicates that the processor detected an internal machine error or a bus error, or that
an external agent detected a bus error. The machine-check exception is
model-specific, available on the Pentium and later generations of processors. The
implementation of the machine-check exception is different between different processor
families, and these implementations may not be compatible with future Intel 64 or
IA-32 processors. (Use the CPUID instruction to determine whether this feature is
present.)
Bus errors detected by external agents are signaled to the processor on dedicated
pins: the BINIT# and MCERR# pins on the Pentium 4, Intel Xeon, and P6 family
processors and the BUSCHK# pin on the Pentium processor. When one of these pins
is enabled, asserting the pin causes error information to be loaded into
machine-check registers and a machine-check exception is generated.
The machine-check exception and machine-check architecture are discussed in detail
in Chapter 15, "Machine-Check Architecture." Also, see the data books for the
individual processors for processor-specific hardware information.
Exception Error Code
None. Error information is provided by machine-check MSRs.
Saved Instruction Pointer
For the Pentium 4 and Intel Xeon processors, the saved contents of extended
machine-check state registers are directly associated with the error that caused the
machine-check exception to be generated (see Section 15.3.1.2,
"IA32_MCG_STATUS MSR," and Section 15.3.2.6, "IA32_MCG Extended Machine
Check State MSRs").
For the P6 family processors, if the EIPV flag in the MCG_STATUS MSR is set, the
saved contents of CS and EIP registers are directly associated with the error that
caused the machine-check exception to be generated; if the flag is clear, the saved
instruction pointer may not be associated with the error (see Section 15.3.1.2,
"IA32_MCG_STATUS MSR").
For the Pentium processor, contents of the CS and EIP registers may not be
associated with the error.
Program State Change
The machine-check mechanism is enabled by setting the MCE flag in control register
CR4.
For the Pentium 4, Intel Xeon, P6 family, and Pentium processors, a program-state
change always accompanies a machine-check exception, and an abort class
exception is generated. For abort exceptions, information about the exception can be
collected from the machine-check MSRs, but the program cannot generally be
restarted.
If the machine-check mechanism is not enabled (the MCE flag in control register CR4
is clear), a machine-check exception causes the processor to enter the shutdown
state.
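The two decisions described in this section (exception versus shutdown, and whether the saved instruction pointer can be trusted) reduce to single-bit tests. The C sketch below is a hedged illustration: it assumes the MCE flag is bit 6 of CR4 and the EIPV flag is bit 1 of the MCG_STATUS MSR, as documented in Chapter 15; the names are hypothetical, and the register values are passed in rather than read with privileged instructions.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_MCE         (1u << 6)    /* machine-check enable flag in CR4 */
#define MCG_STATUS_EIPV (1ull << 1)  /* error-IP-valid flag in MCG_STATUS */

enum mc_response { MC_RAISE_EXCEPTION, MC_ENTER_SHUTDOWN };

/* What the processor does when a machine-check error is signaled. */
static enum mc_response machine_check_response(uint32_t cr4)
{
    return (cr4 & CR4_MCE) ? MC_RAISE_EXCEPTION : MC_ENTER_SHUTDOWN;
}

/* P6 family rule: the saved CS:EIP is directly associated with the error
 * only when EIPV is set. */
static bool saved_ip_associated_with_error(uint64_t mcg_status)
{
    return (mcg_status & MCG_STATUS_EIPV) != 0;
}
```

A handler would normally log the machine-check MSRs regardless of the EIPV result, since the program generally cannot be restarted after an abort-class exception.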
Interrupt 19 - SIMD Floating-Point Exception (#XM)
Exception Class Fault.
Description
Indicates the processor has detected an SSE/SSE2/SSE3 SIMD floating-point
exception. The appropriate status flag in the MXCSR register must be set and the particular
exception unmasked for this interrupt to be generated.
There are six classes of numeric exception conditions that can occur while executing
an SSE/SSE2/SSE3 SIMD floating-point instruction:
• Invalid operation (#I)
• Divide-by-zero (#Z)
• Denormal operand (#D)
• Numeric overflow (#O)
• Numeric underflow (#U)
• Inexact result (Precision) (#P)
The invalid operation, divide-by-zero, and denormal-operand exceptions are
pre-computation exceptions; that is, they are detected before any arithmetic operation
occurs. The numeric underflow, numeric overflow, and inexact result exceptions are
post-computation exceptions.
See "SIMD Floating-Point Exceptions" in Chapter 11 of the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 1, for additional information
about the SIMD floating-point exception classes.
When a SIMD floating-point exception occurs, the processor does either of the
following things:
• It handles the exception automatically by producing the most reasonable result
  and allowing program execution to continue undisturbed. This is the response to
  masked exceptions.
• It generates a SIMD floating-point exception, which in turn invokes a software
  exception handler. This is the response to unmasked exceptions.
Each of the six SIMD floating-point exception conditions has a corresponding flag bit
and mask bit in the MXCSR register. If an exception is masked (the corresponding
mask bit in the MXCSR register is set), the processor takes an appropriate automatic
default action and continues with the computation. If the exception is unmasked (the
corresponding mask bit is clear) and the operating system supports SIMD
floating-point exceptions (the OSXMMEXCPT flag in control register CR4 is set), a software
exception handler is invoked through a SIMD floating-point exception. If the
exception is unmasked and the OSXMMEXCPT bit is clear (indicating that the operating
system does not support unmasked SIMD floating-point exceptions), an invalid
opcode exception (#UD) is signaled instead of a SIMD floating-point exception.
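The three outcomes described above can be summarized as a small decision function. This C sketch is illustrative only: it assumes the MXCSR layout from Volume 1 (exception flags in bits 0 through 5, mask bits in bits 7 through 12) and that OSXMMEXCPT is bit 10 of CR4. The names are hypothetical, and `detected` stands for the set of conditions the current instruction has just detected, using the flag bit positions.

```c
#include <stdint.h>

#define MXCSR_FLAG_BITS  0x3Fu        /* IE, DE, ZE, OE, UE, PE in bits 0..5 */
#define MXCSR_MASK_SHIFT 7            /* IM..PM occupy bits 7..12 */
#define CR4_OSXMMEXCPT   (1u << 10)   /* OS supports unmasked SIMD FP exceptions */

enum simd_fp_outcome {
    SIMD_FP_DEFAULT_RESULT,  /* every detected condition is masked */
    SIMD_FP_RAISE_XM,        /* unmasked and OSXMMEXCPT set */
    SIMD_FP_RAISE_UD         /* unmasked and OSXMMEXCPT clear */
};

static enum simd_fp_outcome simd_fp_classify(uint32_t detected,
                                             uint32_t mxcsr, uint32_t cr4)
{
    uint32_t masks    = (mxcsr >> MXCSR_MASK_SHIFT) & MXCSR_FLAG_BITS;
    uint32_t unmasked = detected & MXCSR_FLAG_BITS & ~masks;

    if (unmasked == 0)
        return SIMD_FP_DEFAULT_RESULT;
    return (cr4 & CR4_OSXMMEXCPT) ? SIMD_FP_RAISE_XM : SIMD_FP_RAISE_UD;
}
```

With the default MXCSR value 0x1F80 all six conditions are masked, so any detected condition yields the default result; clearing the divide-by-zero mask (MXCSR 0x1D80) makes a detected divide-by-zero raise #XM or #UD depending on CR4.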
Note that because SIMD floating-point exceptions are precise and occur immediately,
the situation does not arise where an x87 FPU instruction, a WAIT/FWAIT instruction,
or another SSE/SSE2/SSE3 instruction will catch a pending unmasked SIMD
floating-point exception.
In situations where a SIMD floating-point exception occurred while the SIMD
floating-point exceptions were masked (causing the corresponding exception flag to
be set) and the SIMD floating-point exception was subsequently unmasked, then no
exception is generated when the exception is unmasked.
When SSE/ SSE2/ SSE3 SI MD float ing- point inst ruct ions operat e on packed operands
( made up of t wo or four sub- operands) , mult iple SI MD float ing- point except ion
condit ions may be det ect ed. I f no more t han one except ion condit ion is det ect ed for
one or more set s of sub- operands, t he except ion flags are set for each except ion
condit ion det ect ed. For example, an invalid except ion det ect ed for one sub- operand
will not prevent t he report ing of a divide- by- zero except ion for anot her sub- operand.
However, when t wo or more except ions condit ions are generat ed for one sub-
operand, only one except ion condit ion is report ed, according t o t he precedences
shown in Table 6- 8. This except ion precedence somet imes result s in t he higher
priorit y except ion condit ion being report ed and t he lower priorit y except ion condi-
t ions being ignored.
Exception Error Code
None.
Table 6-8. SIMD Floating-Point Exceptions Priority
Priority | Description
1 (Highest) | Invalid operation exception due to SNaN operand (or any NaN operand for maximum, minimum, or certain compare and convert operations).
2 | QNaN operand (see note 1).
3 | Any other invalid operation exception not mentioned above or a divide-by-zero exception (see note 2).
4 | Denormal operand exception (see note 2).
5 | Numeric overflow and underflow exceptions, possibly in conjunction with the inexact result exception (see note 2).
6 (Lowest) | Inexact result exception.
NOTES:
1. Though this is not an exception, the handling of a QNaN operand has precedence over lower priority exceptions. For example, a QNaN divided by zero results in a QNaN, not a divide-by-zero exception.
2. If masked, then instruction execution continues, and a lower priority exception can occur as well.
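As an illustration only, the precedence in Table 6-8 can be sketched in C as a lookup over the conditions detected for a single sub-operand. The condition names and bit values below are hypothetical (they are not MXCSR bit positions), and the sketch deliberately ignores note 2, that is, it does not model masked exceptions allowing a lower priority exception to occur as well.

```c
/* Hypothetical condition bits for ONE sub-operand (illustrative names,
   not MXCSR bit positions). */
enum simd_cond {
    COND_INVALID_SNAN   = 1u << 0, /* priority 1 */
    COND_QNAN_OPERAND   = 1u << 1, /* priority 2 (not an exception) */
    COND_INVALID_OTHER  = 1u << 2, /* priority 3 */
    COND_DIVIDE_BY_ZERO = 1u << 3, /* priority 3 */
    COND_DENORMAL       = 1u << 4, /* priority 4 */
    COND_OVERFLOW       = 1u << 5, /* priority 5 */
    COND_UNDERFLOW      = 1u << 6, /* priority 5 */
    COND_INEXACT        = 1u << 7  /* priority 6 */
};

/* Return the Table 6-8 priority level (1 = highest, 6 = lowest) of the
   highest-priority condition present in conds, or 0 if none is present. */
int simd_highest_priority(unsigned conds)
{
    if (conds & COND_INVALID_SNAN)                          return 1;
    if (conds & COND_QNAN_OPERAND)                          return 2;
    if (conds & (COND_INVALID_OTHER | COND_DIVIDE_BY_ZERO)) return 3;
    if (conds & COND_DENORMAL)                              return 4;
    if (conds & (COND_OVERFLOW | COND_UNDERFLOW))           return 5;
    if (conds & COND_INEXACT)                               return 6;
    return 0;
}
```

For the example in note 1, a QNaN operand combined with a divide-by-zero condition yields level 2, so the QNaN handling takes precedence.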
Saved Instruction Pointer
The saved contents of CS and EIP registers point to the SSE/SSE2/SSE3 instruction that was executed when the SIMD floating-point exception was generated. This is the faulting instruction in which the error condition was detected.
Program State Change
A program-state change does not accompany a SIMD floating-point exception because the handling of the exception is immediate unless the particular exception is masked. The available state information is often sufficient to allow recovery from the error and re-execution of the faulting instruction if needed.
Interrupts 32 to 255: User Defined Interrupts
Exception Class Not applicable.
Description
Indicates that the processor did one of the following things:
• Executed an INT n instruction where the instruction operand is one of the vector numbers from 32 through 255.
• Responded to an interrupt request at the INTR pin or from the local APIC when the interrupt vector number associated with the request is from 32 through 255.
Exception Error Code
Not applicable.
Saved Instruction Pointer
The saved contents of CS and EIP registers point to the instruction that follows the INT n instruction or the instruction following the instruction on which the INTR signal occurred.
Program State Change
A program-state change does not accompany interrupts generated by the INT n instruction or the INTR signal. The INT n instruction generates the interrupt within the instruction stream. When the processor receives an INTR signal, it commits all state changes for all previous instructions before it responds to the interrupt; so, program execution can resume upon returning from the interrupt handler.
CHAPTER 7
TASK MANAGEMENT
This chapter describes the IA-32 architecture's task management facilities. These facilities are only available when the processor is running in protected mode.
This chapter focuses on 32-bit tasks and the 32-bit TSS structure. For information on 16-bit tasks and the 16-bit TSS structure, see Section 7.6, "16-Bit Task-State Segment (TSS)." For information specific to task management in 64-bit mode, see Section 7.7, "Task Management in 64-bit Mode."
7.1 TASK MANAGEMENT OVERVIEW
A task is a unit of work that a processor can dispatch, execute, and suspend. It can be used to execute a program, a task or process, an operating-system service utility, an interrupt or exception handler, or a kernel or executive utility.
The IA-32 architecture provides a mechanism for saving the state of a task, for dispatching tasks for execution, and for switching from one task to another. When operating in protected mode, all processor execution takes place from within a task. Even simple systems must define at least one task. More complex systems can use the processor's task management facilities to support multitasking applications.
7.1.1 Task Structure
A task is made up of two parts: a task execution space and a task-state segment (TSS). The task execution space consists of a code segment, a stack segment, and one or more data segments (see Figure 7-1). If an operating system or executive uses the processor's privilege-level protection mechanism, the task execution space also provides a separate stack for each privilege level.
The TSS specifies the segments that make up the task execution space and provides a storage place for task state information. In multitasking systems, the TSS also provides a mechanism for linking tasks.
A task is identified by the segment selector for its TSS. When a task is loaded into the processor for execution, the segment selector, base address, limit, and segment descriptor attributes for the TSS are loaded into the task register (see Section 2.4.4, "Task Register (TR)").
If paging is implemented for the task, the base address of the page directory used by the task is loaded into control register CR3.
7.1.2 Task State
The following items define the state of the currently executing task:
• The task's current execution space, defined by the segment selectors in the segment registers (CS, DS, SS, ES, FS, and GS).
• The state of the general-purpose registers.
• The state of the EFLAGS register.
• The state of the EIP register.
• The state of control register CR3.
• The state of the task register.
• The state of the LDTR register.
• The I/O map base address and I/O map (contained in the TSS).
• Stack pointers to the privilege 0, 1, and 2 stacks (contained in the TSS).
• Link to previously executed task (contained in the TSS).
Prior to dispatching a task, all of these items are contained in the task's TSS, except the state of the task register. Also, the complete contents of the LDTR register are not contained in the TSS, only the segment selector for the LDT.
Figure 7-1. Structure of a Task
[Figure: the task-state segment (TSS), located through the task register, together with the code segment, the data segment, the stack segment for the current privilege level, the stack segments for privilege levels 0, 1, and 2, and CR3.]
7.1.3 Executing a Task
Software or the processor can dispatch a task for execution in one of the following ways:
• An explicit call to a task with the CALL instruction.
• An explicit jump to a task with the JMP instruction.
• An implicit call (by the processor) to an interrupt-handler task.
• An implicit call to an exception-handler task.
• A return (initiated with an IRET instruction) when the NT flag in the EFLAGS register is set.
All of these methods for dispatching a task identify the task to be dispatched with a segment selector that points to a task gate or the TSS for the task. When dispatching a task with a CALL or JMP instruction, the selector in the instruction may select the TSS directly or a task gate that holds the selector for the TSS. When dispatching a task to handle an interrupt or exception, the IDT entry for the interrupt or exception must contain a task gate that holds the selector for the interrupt- or exception-handler TSS.
When a task is dispatched for execution, a task switch occurs between the currently running task and the dispatched task. During a task switch, the execution environment of the currently executing task (called the task's state or context) is saved in its TSS and execution of the task is suspended. The context for the dispatched task is then loaded into the processor and execution of that task begins with the instruction pointed to by the newly loaded EIP register. If the task has not been run since the system was last initialized, the EIP will point to the first instruction of the task's code; otherwise, it will point to the next instruction after the last instruction that the task executed when it was last active.
If the currently executing task (the calling task) called the task being dispatched (the called task), the TSS segment selector for the calling task is stored in the TSS of the called task to provide a link back to the calling task.
For all IA-32 processors, tasks are not recursive. A task cannot call or jump to itself.
Interrupts and exceptions can be handled with a task switch to a handler task. Here, the processor performs a task switch to handle the interrupt or exception and automatically switches back to the interrupted task upon returning from the interrupt-handler task or exception-handler task. This mechanism can also handle interrupts that occur during interrupt tasks.
As part of a task switch, the processor can also switch to another LDT, allowing each task to have a different logical-to-physical address mapping for LDT-based segments. The page-directory base register (CR3) also is reloaded on a task switch, allowing each task to have its own set of page tables. These protection facilities help isolate tasks and prevent them from interfering with one another.
If protection mechanisms are not used, the processor provides no protection between tasks. This is true even with operating systems that use multiple privilege levels for protection. A task running at privilege level 3 that uses the same LDT and
page tables as other privilege-level-3 tasks can access code and corrupt data and the stack of other tasks.
Use of task management facilities for handling multitasking applications is optional. Multitasking can be handled in software, with each software-defined task executed in the context of a single IA-32 architecture task.
7.2 TASK MANAGEMENT DATA STRUCTURES
The processor defines five data structures for handling task-related activities:
• Task-state segment (TSS).
• Task-gate descriptor.
• TSS descriptor.
• Task register.
• NT flag in the EFLAGS register.
When operating in protected mode, a TSS and TSS descriptor must be created for at least one task, and the segment selector for the TSS must be loaded into the task register (using the LTR instruction).
7.2.1 Task-State Segment (TSS)
The processor state information needed to restore a task is saved in a system segment called the task-state segment (TSS). Figure 7-2 shows the format of a TSS for tasks designed for 32-bit CPUs. The fields of a TSS are divided into two main categories: dynamic fields and static fields.
For information about 16-bit Intel 286 processor task structures, see Section 7.6, "16-Bit Task-State Segment (TSS)." For information about 64-bit mode task structures, see Section 7.7, "Task Management in 64-bit Mode."
The processor updates dynamic fields when a task is suspended during a task switch. The following are dynamic fields:
• General-purpose register fields: State of the EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI registers prior to the task switch.
• Segment selector fields: Segment selectors stored in the ES, CS, SS, DS, FS, and GS registers prior to the task switch.
• EFLAGS register field: State of the EFLAGS register prior to the task switch.
Figure 7-2. 32-Bit Task-State Segment (TSS)
[Figure: 104-byte TSS layout shown as 32-bit rows at byte offsets 0 through 100. Fields: Previous Task Link, ESP0, SS0, ESP1, SS1, ESP2, SS2, CR3 (PDBR), EIP, EFLAGS, EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, ES, CS, SS, DS, FS, GS, LDT Segment Selector, T flag, and I/O Map Base Address. The unused upper half of each row that holds a 16-bit field is marked as reserved bits, set to 0.]
• EIP (instruction pointer) field: State of the EIP register prior to the task switch.
• Previous task link field: Contains the segment selector for the TSS of the previous task (updated on a task switch that was initiated by a call, interrupt, or exception). This field (which is sometimes called the back link field) permits a task switch back to the previous task by using the IRET instruction.
The processor reads the static fields, but does not normally change them. These fields are set up when a task is created. The following are static fields:
• LDT segment selector field: Contains the segment selector for the task's LDT.
• CR3 control register field: Contains the base physical address of the page directory to be used by the task. Control register CR3 is also known as the page-directory base register (PDBR).
• Privilege level-0, -1, and -2 stack pointer fields: These stack pointers consist of a logical address made up of the segment selector for the stack segment (SS0, SS1, and SS2) and an offset into the stack (ESP0, ESP1, and ESP2). Note that the values in these fields are static for a particular task; whereas, the SS and ESP values will change if stack switching occurs within the task.
• T (debug trap) flag (byte 100, bit 0): When set, the T flag causes the processor to raise a debug exception when a task switch to this task occurs (see Section 16.3.1.5, "Task-Switch Exception Condition").
• I/O map base address field: Contains a 16-bit offset from the base of the TSS to the I/O permission bit map and interrupt redirection bitmap. When present, these maps are stored in the TSS at higher addresses. The I/O map base address points to the beginning of the I/O permission bit map and the end of the interrupt redirection bit map. See Chapter 13, "Input/Output," in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1, for more information about the I/O permission bit map. See Section 17.3, "Interrupt and Exception Handling in Virtual-8086 Mode," for a detailed description of the interrupt redirection bit map.
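As an illustration only, the dynamic and static fields described above can be written as a C structure that follows the byte offsets of Figure 7-2. The structure and field names are hypothetical; only the offsets and the 104-byte size come from the figure. Each 16-bit selector occupies the low half of a 32-bit slot, and the upper half is reserved and set to 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the 32-bit TSS layout; names are illustrative. */
struct tss32 {
    uint16_t prev_task_link, reserved0;   /* offset 0   (dynamic) */
    uint32_t esp0;                        /* offset 4   (static)  */
    uint16_t ss0, reserved1;              /* offset 8   (static)  */
    uint32_t esp1;                        /* offset 12  (static)  */
    uint16_t ss1, reserved2;              /* offset 16  (static)  */
    uint32_t esp2;                        /* offset 20  (static)  */
    uint16_t ss2, reserved3;              /* offset 24  (static)  */
    uint32_t cr3;                         /* offset 28  (static)  */
    uint32_t eip;                         /* offset 32  (dynamic) */
    uint32_t eflags;                      /* offset 36  (dynamic) */
    uint32_t eax, ecx, edx, ebx;          /* offsets 40 to 52     */
    uint32_t esp, ebp, esi, edi;          /* offsets 56 to 68     */
    uint16_t es, reserved4;               /* offset 72  */
    uint16_t cs, reserved5;               /* offset 76  */
    uint16_t ss, reserved6;               /* offset 80  */
    uint16_t ds, reserved7;               /* offset 84  */
    uint16_t fs, reserved8;               /* offset 88  */
    uint16_t gs, reserved9;               /* offset 92  */
    uint16_t ldt_selector, reserved10;    /* offset 96  (static)  */
    uint16_t trap;                        /* offset 100, bit 0 = T flag */
    uint16_t io_map_base;                 /* offset 102 (static)  */
};

/* The part of the TSS used during a task switch is 104 bytes. */
static_assert(sizeof(struct tss32) == 104, "32-bit TSS must be 104 bytes");
static_assert(offsetof(struct tss32, ldt_selector) == 96, "LDT selector offset");
```

Because every member is naturally aligned, a conforming compiler inserts no padding; the static assertions make that assumption explicit rather than relying on it silently.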
If paging is used:
• Avoid placing a page boundary in the part of the TSS that the processor reads during a task switch (the first 104 bytes). The processor may not correctly perform address translations if a boundary occurs in this area. During a task switch, the processor reads and writes into the first 104 bytes of each TSS (using contiguous physical addresses beginning with the physical address of the first byte of the TSS). So, after TSS access begins, if part of the 104 bytes is not physically contiguous, the processor will access incorrect information without generating a page-fault exception.
• Pages corresponding to the previous task's TSS, the current task's TSS, and the descriptor table entries for each all should be marked as read/write.
• Task switches are carried out faster if the pages containing these structures are present in memory before the task switch is initiated.
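The first recommendation above can be checked when a TSS is allocated. The following C sketch assumes 4-KByte pages; the function and macro names are hypothetical. It reports whether the first 104 bytes of a TSS placed at a given linear address stay within a single page.

```c
#include <stdbool.h>
#include <stdint.h>

#define TSS_SWITCH_BYTES 104u   /* bytes read and written during a task switch */
#define PAGE_BYTES_4K    4096u  /* assumption: 4-KByte pages */

/* True if bytes [tss_base, tss_base + 104) lie within one 4-KByte page,
   so no page boundary falls in the part used during a task switch. */
bool tss_switch_area_in_one_page(uint32_t tss_base)
{
    uint32_t offset_in_page = tss_base & (PAGE_BYTES_4K - 1u);
    return offset_in_page + TSS_SWITCH_BYTES <= PAGE_BYTES_4K;
}
```

A simple way to satisfy the check for every TSS is to align each TSS on a boundary of at least 128 bytes, since 4096 is a multiple of 128 and 104 is less than 128.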
7.2.2 TSS Descriptor
The TSS, like all other segments, is defined by a segment descriptor. Figure 7-3 shows the format of a TSS descriptor. TSS descriptors may only be placed in the GDT; they cannot be placed in an LDT or the IDT.
An attempt to access a TSS using a segment selector with its TI flag set (which indicates the current LDT) causes a general-protection exception (#GP) to be generated during CALLs and JMPs; it causes an invalid TSS exception (#TS) during IRETs. A general-protection exception is also generated if an attempt is made to load a segment selector for a TSS into a segment register.
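As a minimal sketch, the selector check described above can be expressed by decoding the three fields of a 16-bit segment selector (index in bits 15:3, TI in bit 2, RPL in bits 1:0). The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a 16-bit segment selector. */
static inline unsigned selector_index(uint16_t sel) { return sel >> 3; }
static inline unsigned selector_ti(uint16_t sel)    { return (sel >> 2) & 1u; }
static inline unsigned selector_rpl(uint16_t sel)   { return sel & 3u; }

/* A selector can reference a TSS descriptor only if it selects the GDT
   (TI = 0), because TSS descriptors may only be placed in the GDT. */
bool selector_references_gdt(uint16_t sel)
{
    return selector_ti(sel) == 0;
}
```

For example, selector 0028H decodes to index 5, TI = 0, RPL = 0 and passes the check, while 002FH (index 5, TI = 1, RPL = 3) does not.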
The busy flag (B) in the type field indicates whether the task is busy. A busy task is currently running or suspended. A type field with a value of 1001B indicates an inactive task; a value of 1011B indicates a busy task. Tasks are not recursive. The processor uses the busy flag to detect an attempt to call a task whose execution has been interrupted. To ensure that only one busy flag is associated with a task, each TSS should have only one TSS descriptor that points to it.
The base, limit, and DPL fields and the granularity and present flags have functions similar to their use in data-segment descriptors (see Section 3.4.5, "Segment Descriptors"). When the G flag is 0 in a TSS descriptor for a 32-bit TSS, the limit field must have a value equal to or greater than 67H, one byte less than the minimum size
Figure 7-3. TSS Descriptor
[Figure: 8-byte TSS descriptor. Bytes 0 to 3: Base Address 15:00 in bits 31:16 and Segment Limit 15:00 in bits 15:0. Bytes 4 to 7: Base 31:24 in bits 31:24, G in bit 23, AVL in bit 20, Limit 19:16 in bits 19:16, P in bit 15, DPL in bits 14:13, Type (1 0 B 1) in bits 11:8, and Base 23:16 in bits 7:0; bits 22, 21, and 12 are 0.]
AVL Available for use by system software
B Busy flag
BASE Segment Base Address
DPL Descriptor Privilege Level
G Granularity
LIMIT Segment Limit
P Segment Present
TYPE Segment Type
of a TSS. Attempting to switch to a task whose TSS descriptor has a limit less than 67H generates an invalid-TSS exception (#TS). A larger limit is required if an I/O permission bit map is included or if the operating system stores additional data. The processor does not check for a limit greater than 67H on a task switch; however, it does check when accessing the I/O permission bit map or interrupt redirection bit map.
Any program or procedure with access to a TSS descriptor (that is, whose CPL is numerically equal to or less than the DPL of the TSS descriptor) can dispatch the task with a call or a jump.
In most systems, the DPLs of TSS descriptors are set to values less than 3, so that only privileged software can perform task switching. However, in multitasking applications, DPLs for some TSS descriptors may be set to 3 to allow task switching at the application (or user) privilege level.
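The descriptor format in Figure 7-3 can be sketched in C as two 32-bit words. The type, function, and macro names below are hypothetical, and the sketch assumes G = 0 and AVL = 0 with the present flag set.

```c
#include <stdbool.h>
#include <stdint.h>

#define TSS_TYPE_AVAILABLE 0x9u  /* 1001B: inactive task */
#define TSS_TYPE_BUSY      0xBu  /* 1011B: busy task */

struct descriptor { uint32_t low, high; };  /* bytes 0..3 and bytes 4..7 */

/* Build a present 32-bit TSS descriptor with G = 0 and AVL = 0. */
struct descriptor make_tss_descriptor(uint32_t base, uint32_t limit,
                                      unsigned dpl, bool busy)
{
    struct descriptor d;
    uint32_t type = busy ? TSS_TYPE_BUSY : TSS_TYPE_AVAILABLE;

    d.low  = ((base & 0xFFFFu) << 16)   /* Base 15:00  */
           | (limit & 0xFFFFu);         /* Limit 15:00 */
    d.high = (base & 0xFF000000u)       /* Base 31:24  */
           | (limit & 0x000F0000u)      /* Limit 19:16 */
           | (1u << 15)                 /* P = 1       */
           | ((dpl & 3u) << 13)         /* DPL         */
           | (type << 8)                /* Type 10B1   */
           | ((base >> 16) & 0xFFu);    /* Base 23:16  */
    return d;
}

/* The busy flag is bit 1 of the type field, that is, bit 9 of the high word. */
bool tss_descriptor_is_busy(struct descriptor d)
{
    return ((d.high >> 9) & 1u) != 0;
}

/* With G = 0, a 32-bit TSS needs a limit of at least 67H. */
bool tss32_limit_is_valid(uint32_t limit)
{
    return limit >= 0x67u;
}
```

For a TSS at base 12345678H with limit 67H and DPL 0, the sketch produces the words 56780067H (low) and 12008934H (high); marking the task busy changes the high word to 12008B34H.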
7.2.3 TSS Descriptor in 64-bit mode
In 64-bit mode, task switching is not supported, but TSS descriptors still exist. The format of a 64-bit TSS is described in Section 7.7.
In 64-bit mode, the TSS descriptor is expanded to 16 bytes (see Figure 7-4). This expansion also applies to an LDT descriptor in 64-bit mode. Table 3-2 provides the encoding information for the segment type field.
7.2.4 Task Register
The task register holds the 16-bit segment selector and the entire segment descriptor (32-bit base address, 16-bit segment limit, and descriptor attributes) for the TSS of the current task (see Figure 2-5). This information is copied from the TSS descriptor in the GDT for the current task. Figure 7-5 shows the path the processor uses to access the TSS (using the information in the task register).
The task register has a visible part (that can be read and changed by software) and an invisible part (maintained by the processor and inaccessible by software). The segment selector in the visible portion points to a TSS descriptor in the GDT. The processor uses the invisible portion of the task register to cache the segment descriptor for the TSS. Caching these values in a register makes execution of the task more efficient. The LTR (load task register) and STR (store task register) instructions load and read the visible portion of the task register:
Figure 7-4. Format of TSS and LDT Descriptors in 64-bit Mode
[Figure: 16-byte TSS (or LDT) descriptor. Bytes 0 to 7 have the same fields as the 8-byte descriptor: Base Address 15:00, Segment Limit 15:00, Base 23:16, Type, DPL, P, Limit 19:16, AVL, G, and Base 31:24. Bytes 8 to 11 hold Base Address 63:32. Bytes 12 to 15 are reserved, with bits 12:8 set to 0.]
AVL Available for use by system software
B Busy flag
BASE Segment Base Address
DPL Descriptor Privilege Level
G Granularity
LIMIT Segment Limit
P Segment Present
TYPE Segment Type
The LTR instruction loads a segment selector (source operand) into the task register that points to a TSS descriptor in the GDT. It then loads the invisible portion of the task register with information from the TSS descriptor. LTR is a privileged instruction that may be executed only when the CPL is 0. It is used during system initialization to put an initial value in the task register. Afterwards, the contents of the task register are changed implicitly when a task switch occurs.
The STR (store task register) instruction stores the visible portion of the task register in a general-purpose register or memory. This instruction can be executed by code running at any privilege level in order to identify the currently running task. However, it is normally used only by operating system software.
On power up or reset of the processor, the segment selector and base address are set to the default value of 0; the limit is set to FFFFH.
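Figure 7-5. Task Register
[Figure: the task register consists of a visible part (the selector) and an invisible part (base address and segment limit). The selector indexes the TSS descriptor in the GDT; the cached base address locates the TSS.]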
7.2.5 Task-Gate Descriptor
A task-gate descriptor provides an indirect, protected reference to a task (see Figure 7-6). It can be placed in the GDT, an LDT, or the IDT. The TSS segment selector field in a task-gate descriptor points to a TSS descriptor in the GDT. The RPL in this segment selector is not used.
The DPL of a task-gate descriptor controls access to the TSS descriptor during a task switch. When a program or procedure makes a call or jump to a task through a task gate, the CPL and the RPL field of the gate selector pointing to the task gate must be less than or equal to the DPL of the task-gate descriptor. Note that when a task gate is used, the DPL of the destination TSS descriptor is not used.
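The privilege rule for a call or jump through a task gate can be sketched as a single comparison. The function name is hypothetical; privilege levels are the usual values 0 (most privileged) through 3.

```c
#include <stdbool.h>

/* A call or jump through a task gate is allowed only if both the CPL and
   the RPL of the selector that points to the gate are numerically less
   than or equal to the DPL of the task-gate descriptor. The DPL of the
   destination TSS descriptor plays no part in this check. */
bool task_gate_access_allowed(unsigned cpl, unsigned gate_selector_rpl,
                              unsigned gate_dpl)
{
    return cpl <= gate_dpl && gate_selector_rpl <= gate_dpl;
}
```

With a gate DPL of 3, code at any privilege level passes the check; with a gate DPL of 0, code running at CPL 3 does not.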
A task can be accessed either through a task-gate descriptor or a TSS descriptor. Both of these structures satisfy the following needs:
• Need for a task to have only one busy flag: Because the busy flag for a task is stored in the TSS descriptor, each task should have only one TSS descriptor. There may, however, be several task gates that reference the same TSS descriptor.
• Need to provide selective access to tasks: Task gates fill this need, because they can reside in an LDT and can have a DPL that is different from the TSS descriptor's DPL. A program or procedure that does not have sufficient privilege to access the TSS descriptor for a task in the GDT (which usually has a DPL of 0) may be allowed access to the task through a task gate with a higher DPL. Task gates give the operating system greater latitude for limiting access to specific tasks.
• Need for an interrupt or exception to be handled by an independent task: Task gates may also reside in the IDT, which allows interrupts and exceptions
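Figure 7-6. Task-Gate Descriptor
[Figure: 8-byte task-gate descriptor. Bytes 0 to 3: TSS Segment Selector in bits 31:16; bits 15:0 reserved. Bytes 4 to 7: bits 31:16 reserved, P in bit 15, DPL in bits 14:13, 0 in bit 12, Type (0 1 0 1) in bits 11:8, and bits 7:0 reserved.]
DPL Descriptor Privilege Level
P Segment Present
TYPE Segment Type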
to be handled by handler tasks. When an interrupt or exception vector points to a task gate, the processor switches to the specified task.
Figure 7-7 illustrates how a task gate in an LDT, a task gate in the GDT, and a task gate in the IDT can all point to the same task.
7.3 TASK SWITCHING
The processor transfers execution to another task in one of four cases:
• The current program, task, or procedure executes a JMP or CALL instruction to a TSS descriptor in the GDT.
• The current program, task, or procedure executes a JMP or CALL instruction to a task-gate descriptor in the GDT or the current LDT.
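Figure 7-7. Task Gates Referencing the Same Task
[Figure: a task gate in the LDT, a task gate in the GDT, and a task gate in the IDT all reference the same TSS descriptor in the GDT, which in turn points to the TSS.]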
• An interrupt or exception vector points to a task-gate descriptor in the IDT.
• The current task executes an IRET when the NT flag in the EFLAGS register is set.
JMP, CALL, and IRET instructions, as well as interrupts and exceptions, are all mechanisms for redirecting a program. The referencing of a TSS descriptor or a task gate (when calling or jumping to a task) or the state of the NT flag (when executing an IRET instruction) determines whether a task switch occurs.
The processor performs the following operations when switching to a new task:
1. Obtains the TSS segment selector for the new task as the operand of the JMP or CALL instruction, from a task gate, or from the previous task link field (for a task switch initiated with an IRET instruction).
2. Checks that the current (old) task is allowed to switch to the new task. Data-access privilege rules apply to JMP and CALL instructions. The CPL of the current (old) task and the RPL of the segment selector for the new task must be less than or equal to the DPL of the TSS descriptor or task gate being referenced. Exceptions, interrupts (except for interrupts generated by the INT n instruction), and the IRET instruction are permitted to switch tasks regardless of the DPL of the destination task-gate or TSS descriptor. For interrupts generated by the INT n instruction, the DPL is checked.
3. Checks that the TSS descriptor of the new task is marked present and has a valid limit (greater than or equal to 67H).
4. Checks that the new task is available (call, jump, exception, or interrupt) or busy (IRET return).
5. Checks that the current (old) TSS, new TSS, and all segment descriptors used in the task switch are paged into system memory.
6. If the task switch was initiated with a JMP or IRET instruction, the processor clears the busy (B) flag in the current (old) task's TSS descriptor; if initiated with a CALL instruction, an exception, or an interrupt, the busy (B) flag is left set. (See Table 7-2.)
7. If the task switch was initiated with an IRET instruction, the processor clears the NT flag in a temporarily saved image of the EFLAGS register; if initiated with a CALL or JMP instruction, an exception, or an interrupt, the NT flag is left unchanged in the saved EFLAGS image.
8. Saves the state of the current (old) task in the current task's TSS. The processor finds the base address of the current TSS in the task register and then copies the states of the following registers into the current TSS: all the general-purpose registers, segment selectors from the segment registers, the temporarily saved image of the EFLAGS register, and the instruction pointer register (EIP).
9. If the task switch was initiated with a CALL instruction, an exception, or an interrupt, the processor will set the NT flag in the EFLAGS loaded from the new task. If initiated with an IRET instruction or JMP instruction, the NT flag will reflect the state of NT in the EFLAGS loaded from the new task (see Table 7-2).
10. If the task switch was initiated with a CALL instruction, JMP instruction, an exception, or an interrupt, the processor sets the busy (B) flag in the new task's TSS descriptor; if initiated with an IRET instruction, the busy (B) flag is left set.
11. Loads the task register with the segment selector and descriptor for the new task's TSS.
12. The TSS state is loaded into the processor. This includes the LDTR register, the PDBR (control register CR3), the EFLAGS register, the EIP register, the general-purpose registers, and the segment selectors. Note that a fault during the load of this state may corrupt architectural state.
13. The descriptors associated with the segment selectors are loaded and qualified. Any errors associated with this loading and qualification occur in the context of the new task.
NOTES
If all checks and saves have been carried out successfully, the processor commits to the task switch. If an unrecoverable error occurs in steps 1 through 11, the processor does not complete the task switch and ensures that the processor is returned to its state prior to the execution of the instruction that initiated the task switch.
If an unrecoverable error occurs in step 12, architectural state may be corrupted, but an attempt will be made to handle the error in the prior execution environment. If an unrecoverable error occurs after the commit point (in step 13), the processor completes the task switch (without performing additional access and segment availability checks) and generates the appropriate exception prior to beginning execution of the new task.
If exceptions occur after the commit point, the exception handler must finish the task switch itself before allowing the processor to begin executing the new task. See Chapter 6, "Interrupt 10: Invalid TSS Exception (#TS)," for more information about the effect of exceptions on a task when they occur after the commit point of a task switch.
14. Begins executing the new task. (To an exception handler, the first instruction of the new task appears not to have been executed.)
The state of the currently executing task is always saved when a successful task switch occurs. If the task is resumed, execution starts with the instruction pointed to by the saved EIP value, and the registers are restored to the values they held when the task was suspended.
When switching tasks, the new task does not inherit its privilege level from the suspended task. The new task begins executing at the privilege level specified in the CPL field of the CS register, which is loaded from the TSS. Because tasks are isolated by their separate address spaces and TSSs and because privilege
rules control access to a TSS, software does not need to perform explicit privilege checks on a task switch.
Table 7-1 shows the exception conditions that the processor checks for when switching tasks. It also shows the exception that is generated for each check if an error is detected and the segment that the error code references. (The order of the checks in the table is the order used in the P6 family processors. The exact order is model specific and may be different for other IA-32 processors.) Exception handlers designed to handle these exceptions may be subject to recursive calls if they attempt to reload the segment selector that generated the exception. The cause of the exception (or the first of multiple causes) should be fixed before reloading the selector.
Table 7-1. Exception Conditions Checked During a Task Switch
Condition Checked | Exception (note 1) | Error Code Reference (note 2)
Segment selector for a TSS descriptor references the GDT and is within the limits of the table. | #GP; #TS (for IRET) | New Task's TSS
TSS descriptor is present in memory. | #NP | New Task's TSS
TSS descriptor is not busy (for task switch initiated by a call, interrupt, or exception). | #GP (for JMP, CALL, INT) | Task's back-link TSS
TSS descriptor is busy (for task switch initiated by an IRET instruction). | #TS (for IRET) | New Task's TSS
TSS segment limit greater than or equal to 108 (for 32-bit TSS) or 44 (for 16-bit TSS). | #TS | New Task's TSS
Registers are loaded from the values in the TSS. | |
LDT segment selector of new task is valid (note 3). | #TS | New Task's LDT
Code segment DPL matches segment selector RPL. | #TS | New Code Segment
SS segment selector is valid (note 2). | #TS | New Stack Segment
Stack segment is present in memory. | #SS | New Stack Segment
Stack segment DPL matches CPL. | #TS | New Stack Segment
LDT of new task is present in memory. | #TS | New Task's LDT
CS segment selector is valid (note 3). | #TS | New Code Segment
Code segment is present in memory. | #NP | New Code Segment
Stack segment DPL matches selector RPL. | #TS | New Stack Segment
DS, ES, FS, and GS segment selectors are valid (note 3). | #TS | New Data Segment
DS, ES, FS, and GS segments are readable. | #TS | New Data Segment
The TS (task switched) flag in the control register CR0 is set every time a task switch
occurs. System software uses the TS flag to coordinate the actions of the floating-point
unit, when it generates floating-point exceptions, with the rest of the processor. The TS
flag indicates that the context of the floating-point unit may be different from that of
the current task. See Section 2.5, "Control Registers," for a detailed description of
the function and use of the TS flag.
7.4 TASK LINKING
The previous task link field of the TSS (sometimes called the "backlink") and the NT
flag in the EFLAGS register are used to return execution to the previous task.
EFLAGS.NT = 1 indicates that the currently executing task is nested within the
execution of another task.
When a CALL instruction, an interrupt, or an exception causes a task switch, the
processor copies the segment selector for the current TSS to the previous task link
field of the TSS for the new task; it then sets EFLAGS.NT = 1. If software uses an
IRET instruction to suspend the new task, the processor checks for EFLAGS.NT = 1;
it then uses the value in the previous task link field to return to the previous task. See
Figure 7-8.
When a JMP instruction causes a task switch, the new task is not nested. The
previous task link field is not used and EFLAGS.NT = 0. Use a JMP instruction to
dispatch a new task when nesting is not desired.
Table 7-1. Exception Conditions Checked During a Task Switch (Contd.)

Condition Checked | Exception (1) | Error Code Reference (2)
------------------|---------------|-------------------------
DS, ES, FS, and GS segments are present in memory. | #NP | New Data Segment
DS, ES, FS, and GS segment DPL greater than or equal to CPL (unless these are conforming segments). | #TS | New Data Segment

NOTES:
1. #NP is the segment-not-present exception, #GP is the general-protection exception, #TS is the
invalid-TSS exception, and #SS is the stack-fault exception.
2. The error code contains an index to the segment descriptor referenced in this column.
3. A segment selector is valid if it is in a compatible type of table (GDT or LDT), occupies an address
within the table's segment limit, and refers to a compatible type of descriptor (for example, a
segment selector in the CS register is valid only when it points to a code-segment descriptor).
Table 7-2 shows the busy flag (in the TSS segment descriptor), the NT flag, the
previous task link field, and TS flag (in control register CR0) during a task switch.
The NT flag may be modified by software executing at any privilege level. It is
possible for a program to set the NT flag and execute an IRET instruction. This might
randomly invoke the task specified in the previous link field of the current task's TSS.
To keep such spurious task switches from succeeding, the operating system should
initialize the previous task link field in every TSS that it creates to 0.
Figure 7-8. Nested Tasks
[Diagram: a chain of four TSSs. The top-level task (NT=0) is followed by a nested
task (NT=1), a more deeply nested task (NT=1), and the currently executing task
(EFLAGS.NT=1), which is located through the task register. The previous task link
field of each nested TSS points to the TSS of the task that precedes it in the chain.]

Table 7-2. Effect of a Task Switch on Busy Flag, NT Flag,
Previous Task Link Field, and TS Flag

Flag or Field | Effect of JMP Instruction | Effect of CALL Instruction or Interrupt | Effect of IRET Instruction
--------------|---------------------------|-----------------------------------------|---------------------------
Busy (B) flag of new task. | Flag is set. Must have been clear before. | Flag is set. Must have been clear before. | No change. Must have been set.
Busy flag of old task. | Flag is cleared. | No change. Flag is currently set. | Flag is cleared.
NT flag of new task. | Set to value from TSS of new task. | Flag is set. | Set to value from TSS of new task.
NT flag of old task. | No change. | No change. | Flag is cleared.
Previous task link field of new task. | No change. | Loaded with selector for old task's TSS. | No change.
Previous task link field of old task. | No change. | No change. | No change.
TS flag in control register CR0. | Flag is set. | Flag is set. | Flag is set.
7.4.1 Use of Busy Flag To Prevent Recursive Task Switching
A TSS allows only one context to be saved for a task; therefore, once a task is called
(dispatched), a recursive (or re-entrant) call to the task would cause the current
state of the task to be lost. The busy flag in the TSS segment descriptor is provided
to prevent re-entrant task switching and a subsequent loss of task state information.
The processor manages the busy flag as follows:
1. When dispatching a task, the processor sets the busy flag of the new task.
2. If, during a task switch, the current task is placed in a nested chain (the task
switch is being generated by a CALL instruction, an interrupt, or an exception),
the busy flag for the current task remains set.
3. When switching to the new task (initiated by a CALL instruction, interrupt, or
exception), the processor generates a general-protection exception (#GP) if the
busy flag of the new task is already set. If the task switch is initiated with an IRET
instruction, the exception is not raised because the processor expects the busy
flag to be set.
4. When a task is terminated by a jump to a new task (initiated with a JMP
instruction in the task code) or by an IRET instruction in the task code, the
processor clears the busy flag, returning the task to the "not busy" state.
The processor prevents recursive task switching by preventing a task from switching
to itself or to any task in a nested chain of tasks. The chain of nested suspended tasks
may grow to any length, due to multiple calls, interrupts, or exceptions. The busy
flag prevents a task from being invoked if it is in this chain.
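The busy-flag rules above can be modeled in a few lines of C. This is only an
illustrative sketch, not a description of how any processor implements task
switching: the task structure and the task_start, task_call, task_jmp, and task_iret
helpers are hypothetical names, and a return value of -1 stands in for the exception
(#GP or #TS) that the hardware would raise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a task: the busy flag (held in the TSS descriptor),
 * the NT flag saved for the task, and the previous task link (back link). */
typedef struct task {
    int busy;
    int nt;
    struct task *prev_link;
} task;

static task *current;   /* stands in for the task register */

/* Make t the initial running task of the model. */
void task_start(task *t) { t->busy = 1; t->nt = 0; current = t; }

/* CALL, interrupt, or exception: nest t under the current task.
 * Returns -1 (modeling #GP) if t is already busy, which covers a task
 * calling itself or any task already in the nested chain. */
int task_call(task *t) {
    if (t->busy) return -1;
    t->busy = 1;
    t->prev_link = current;   /* the old task stays busy */
    t->nt = 1;
    current = t;
    return 0;
}

/* JMP: no nesting. The old task's busy flag is cleared and the back link
 * of the new task is left unchanged. */
int task_jmp(task *t) {
    if (t->busy) return -1;
    current->busy = 0;
    t->busy = 1;
    current = t;
    return 0;
}

/* IRET with NT = 1: return to the task named by the back link, which is
 * expected to be busy already. Returns -1 if there is nothing to return to. */
int task_iret(void) {
    task *old = current;
    if (!old->nt || old->prev_link == NULL || !old->prev_link->busy) return -1;
    current = old->prev_link;
    old->busy = 0;
    old->nt = 0;
    return 0;
}
```

In this model, a chain A, B, C built with task_call rejects any further call to A, B,
or C until the corresponding task has left the chain through task_iret or task_jmp.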
The busy flag may be used in multiprocessor configurations, because the processor
follows a LOCK protocol (on the bus or in the cache) when it sets or clears the busy
flag. This lock keeps two processors from invoking the same task at the same time.
See Section 8.1.2.1, "Automatic Locking," for more information about setting the
busy flag in multiprocessor applications.
7.4.2 Modifying Task Linkages
In a uniprocessor system, in situations where it is necessary to remove a task from a
chain of linked tasks, use the following procedure to remove the task:
1. Disable interrupts.
2. Change the previous task link field in the TSS of the pre-empting task (the task
that suspended the task to be removed). It is assumed that the pre-empting task
is the next task (newer task) in the chain from the task to be removed. Change
the previous task link field to point to the TSS of the next oldest task in the chain
or to an even older task in the chain.
3. Clear the busy (B) flag in the TSS segment descriptor for the task being removed
from the chain. If more than one task is being removed from the chain, the busy
flag for each task being removed must be cleared.
4. Enable interrupts.
In a multiprocessing system, additional synchronization and serialization operations
must be added to this procedure to ensure that the TSS and its segment descriptor
are both locked when the previous task link field is changed and the busy flag is
cleared.
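The uniprocessor procedure can be sketched in C as follows. This is a simplified
model under stated assumptions: tss_model, irq_disable, irq_enable, and unlink_task
are hypothetical names, the busy flag is kept next to the back link for brevity
(in a real system it lives in the TSS segment descriptor), and the interrupt helpers
only toggle a variable where a kernel would execute CLI and STI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a TSS plus the busy flag of its descriptor. */
typedef struct tss_model {
    struct tss_model *prev_link;  /* previous task link field */
    int busy;                     /* B flag of the TSS descriptor */
} tss_model;

/* Stand-ins for disabling and enabling interrupts (assumed helpers). */
static int interrupts_enabled = 1;
static void irq_disable(void) { interrupts_enabled = 0; }
static void irq_enable(void)  { interrupts_enabled = 1; }

/* Remove victim from the chain. preempting must be the newer task whose
 * previous task link currently points at victim. Returns 0 on success. */
int unlink_task(tss_model *preempting, tss_model *victim) {
    if (preempting->prev_link != victim) return -1;
    irq_disable();                              /* step 1 */
    preempting->prev_link = victim->prev_link;  /* step 2: skip the victim */
    victim->busy = 0;                           /* step 3: clear the B flag */
    irq_enable();                               /* step 4 */
    return 0;
}
```

A multiprocessor version would also need the locking described above; this sketch
does not attempt it.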
7.5 TASK ADDRESS SPACE
The address space for a task consists of the segments that the task can access.
These segments include the code, data, stack, and system segments referenced in
the TSS and any other segments accessed by the task code. The segments are
mapped into the processor's linear address space, which is in turn mapped into the
processor's physical address space (either directly or through paging).
The LDT segment field in the TSS can be used to give each task its own LDT. Giving a
task its own LDT allows the task address space to be isolated from other tasks by
placing the segment descriptors for all the segments associated with the task in the
task's LDT.
It also is possible for several tasks to use the same LDT. This is a memory-efficient
way to allow specific tasks to communicate with or control each other, without
dropping the protection barriers for the entire system.
Because all tasks have access to the GDT, it also is possible to create shared
segments accessed through segment descriptors in this table.
If paging is enabled, the CR3 register (PDBR) field in the TSS allows each task to
have its own set of page tables for mapping linear addresses to physical addresses.
Or, several tasks can share the same set of page tables.
7.5.1 Mapping Tasks to the Linear and Physical Address Spaces
Tasks can be mapped to the linear address space and physical address space in one
of two ways:
• One linear-to-physical address space mapping is shared among all tasks.
When paging is not enabled, this is the only choice. Without paging, all linear
addresses map to the same physical addresses. When paging is enabled, this
form of linear-to-physical address space mapping is obtained by using one page
directory for all tasks. The linear address space may exceed the available
physical space if demand-paged virtual memory is supported.
• Each task has its own linear address space that is mapped to the physical
address space. This form of mapping is accomplished by using a different
page directory for each task. Because the PDBR (control register CR3) is loaded
on task switches, each task may have a different page directory.
The linear address spaces of different tasks may map to completely distinct physical
addresses. If the entries of different page directories point to different page tables
and the page tables point to different pages of physical memory, then the tasks do
not share physical addresses.
With either method of mapping task linear address spaces, the TSSs for all tasks
must lie in a shared area of the physical space, which is accessible to all tasks. This
mapping is required so that the mapping of TSS addresses does not change while the
processor is reading and updating the TSSs during a task switch. The linear address
space mapped by the GDT also should be mapped to a shared area of the physical
space; otherwise, the purpose of the GDT is defeated. Figure 7-9 shows how the
linear address spaces of two tasks can overlap in the physical space by sharing page
tables.
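The effect of sharing a page table between two page directories can be illustrated
with a small software model. The sketch below assumes 32-bit linear addresses and
4-KByte pages with a two-level lookup; page_table, page_directory, and translate are
hypothetical names, and the structures hold plain C pointers and frame numbers
instead of real PDE and PTE bit layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES 1024u

/* Simplified page table: a frame number and a present flag per entry. */
typedef struct {
    uint32_t frame[ENTRIES];
    uint8_t  present[ENTRIES];
} page_table;

/* Simplified page directory: a pointer per entry (NULL means not present). */
typedef struct {
    page_table *table[ENTRIES];
} page_directory;

/* Translate a linear address through the directory selected for a task.
 * Returns 0 and stores the physical address, or -1 if the page is unmapped. */
int translate(const page_directory *pd, uint32_t linear, uint32_t *physical) {
    uint32_t dir_index = linear >> 22;             /* bits 31:22 */
    uint32_t tbl_index = (linear >> 12) & 0x3FFu;  /* bits 21:12 */
    const page_table *pt = pd->table[dir_index];
    if (pt == NULL || !pt->present[tbl_index]) return -1;
    *physical = (pt->frame[tbl_index] << 12) | (linear & 0xFFFu);
    return 0;
}
```

If two directories contain an entry that points to the same page_table, the linear
addresses covered by that entry resolve to the same physical addresses in both
tasks, while entries that point to private tables keep the rest of each address
space separate.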
Figure 7-9. Overlapping Linear-to-Physical Mappings
[Diagram: the PDBR fields of the Task A TSS and the Task B TSS select separate page
directories. Each page directory has PDEs that point to page tables private to its
task and a PDE that points to a shared page table (Shared PT). The PTEs map page
frames belonging to Task A, to Task B, or shared by both.]
7.5.2 Task Logical Address Space
To allow the sharing of data among tasks, use the following techniques to create
shared logical-to-physical address-space mappings for data segments:
• Through the segment descriptors in the GDT: All tasks must have access
to the segment descriptors in the GDT. If some segment descriptors in the GDT
point to segments in the linear-address space that are mapped into an area of the
physical-address space common to all tasks, then all tasks can share the data
and code in those segments.
• Through a shared LDT: Two or more tasks can use the same LDT if the LDT
fields in their TSSs point to the same LDT. If some segment descriptors in a
shared LDT point to segments that are mapped to a common area of the physical
address space, the data and code in those segments can be shared among the
tasks that share the LDT. This method of sharing is more selective than sharing
through the GDT, because the sharing can be limited to specific tasks. Other
tasks in the system may have different LDTs that do not give them access to the
shared segments.
• Through segment descriptors in distinct LDTs that are mapped to
common addresses in linear address space: If this common area of the
linear address space is mapped to the same area of the physical address space
for each task, these segment descriptors permit the tasks to share segments.
Such segment descriptors are commonly called aliases. This method of sharing is
even more selective than those listed above, because other segment descriptors
in the LDTs may point to independent linear addresses which are not shared.
7.6 16-BIT TASK-STATE SEGMENT (TSS)
The 32-bit IA-32 processors also recognize a 16-bit TSS format like the one used in
Intel 286 processors (see Figure 7-10). This format is supported for compatibility
with software written to run on earlier IA-32 processors.
The following information is important to know about the 16-bit TSS.
• Do not use a 16-bit TSS to implement a virtual-8086 task.
• The valid segment limit for a 16-bit TSS is 2CH.
• The 16-bit TSS does not contain a field for the base address of the page directory,
which is loaded into control register CR3. A separate set of page tables for each
task is not supported for 16-bit tasks. If a 16-bit task is dispatched, the
page-table structure for the previous task is used.
• The I/O base address is not included in the 16-bit TSS. None of the functions of
the I/O map are supported.
• When task state is saved in a 16-bit TSS, the upper 16 bits of the EFLAGS register
and the EIP register are lost.
• When the general-purpose registers are loaded or saved from a 16-bit TSS, the
upper 16 bits of the registers are modified and not maintained.
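As a rough illustration, the 16-bit TSS can be written as a C structure whose size
matches the 2CH (44-byte) figure given above. The structure name, field names, and
the tss16_save_reg helper are assumptions made for this sketch; the field order
follows the offsets shown in Figure 7-10.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the 16-bit TSS layout; every field is one 16-bit word. */
typedef struct tss16 {
    uint16_t prev_task_link;   /* offset 0  */
    uint16_t sp0, ss0;         /* offsets 2, 4   */
    uint16_t sp1, ss1;         /* offsets 6, 8   */
    uint16_t sp2, ss2;         /* offsets 10, 12 */
    uint16_t ip;               /* offset 14 (entry point) */
    uint16_t flags;            /* offset 16 (FLAG word)   */
    uint16_t ax, cx, dx, bx;   /* offsets 18 to 24 */
    uint16_t sp, bp, si, di;   /* offsets 26 to 32 */
    uint16_t es, cs, ss, ds;   /* offsets 34 to 40 */
    uint16_t ldt_selector;     /* offset 42 */
} tss16;

static_assert(sizeof(tss16) == 0x2C, "16-bit TSS occupies 44 bytes");
static_assert(offsetof(tss16, ldt_selector) == 42, "LDT selector is the last word");

/* Saving a 32-bit register into a 16-bit TSS field keeps only the low word,
 * which is why the upper halves of EFLAGS and EIP are lost. */
uint16_t tss16_save_reg(uint32_t reg32) {
    return (uint16_t)(reg32 & 0xFFFFu);
}
```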
7.7 TASK MANAGEMENT IN 64-BIT MODE
In 64-bit mode, task structure and task state are similar to those in protected mode.
However, the task switching mechanism available in protected mode is not supported
in 64-bit mode. Task management and switching must be performed by software.
The processor issues a general-protection exception (#GP) if the following is
attempted in 64-bit mode:
• Control transfer to a TSS or a task gate using JMP, CALL, INT n, or interrupt.
• An IRET with EFLAGS.NT (nested task) set to 1.
Figure 7-10. 16-Bit TSS Format

Byte Offset | Field (bits 15:0)
------------|------------------
42 | Task LDT Selector
40 | DS Selector
38 | SS Selector
36 | CS Selector
34 | ES Selector
32 | DI
30 | SI
28 | BP
26 | SP
24 | BX
22 | DX
20 | CX
18 | AX
16 | FLAG Word
14 | IP (Entry Point)
12 | SS2
10 | SP2
8 | SS1
6 | SP1
4 | SS0
2 | SP0
0 | Previous Task Link
Although hardware task-switching is not supported in 64-bit mode, a 64-bit task
state segment (TSS) must exist. Figure 7-11 shows the format of a 64-bit TSS. The
TSS holds information important to 64-bit mode that is not directly related to the
task-switch mechanism. This information includes:
• RSPn: The full 64-bit canonical forms of the stack pointers (RSP) for privilege
levels 0-2.
• ISTn: The full 64-bit canonical forms of the interrupt stack table (IST) pointers.
• I/O map base address: The 16-bit offset to the I/O permission bit map from
the 64-bit TSS base.
The operating system must create at least one 64-bit TSS after activating IA-32e
mode. It must execute the LTR instruction (in 64-bit mode) to load the TR register
with a pointer to the 64-bit TSS responsible for both 64-bit-mode programs and
compatibility-mode programs.
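A minimal C sketch of the 64-bit TSS layout follows, using the byte offsets shown in
Figure 7-11. The structure and field names are assumptions made for illustration,
#pragma pack is used because the 64-bit fields are not naturally aligned within the
structure, and the is_canonical48 helper assumes an implementation with 48-bit
linear addresses, which this section does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)
typedef struct tss64 {
    uint32_t reserved0;     /* offset 0,   set to 0 */
    uint64_t rsp[3];        /* offset 4,   RSP0 to RSP2 */
    uint64_t reserved1;     /* offset 28,  set to 0 */
    uint64_t ist[7];        /* offset 36,  IST1 to IST7 (ist[0] is IST1) */
    uint64_t reserved2;     /* offset 92,  set to 0 */
    uint16_t reserved3;     /* offset 100, set to 0 */
    uint16_t io_map_base;   /* offset 102, offset of the I/O permission bit map */
} tss64;
#pragma pack(pop)

static_assert(sizeof(tss64) == 104, "64-bit TSS occupies 104 bytes");
static_assert(offsetof(tss64, rsp) == 4, "RSP0 starts at byte 4");
static_assert(offsetof(tss64, ist) == 36, "IST1 starts at byte 36");
static_assert(offsetof(tss64, io_map_base) == 102, "I/O map base is the last word");

/* Assumption: 48-bit linear addresses, so an address is canonical when
 * bits 63:47 are all zeros or all ones. */
int is_canonical48(uint64_t addr) {
    uint64_t top = addr >> 47;   /* the 17 bits 63:47 */
    return top == 0 || top == 0x1FFFFu;
}
```

An operating system would typically fill rsp[0] and any IST entries it uses with
canonical stack addresses before executing LTR.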
Figure 7-11. 64-Bit TSS Format

Byte Offset | Contents (bits 31:0)
------------|---------------------
100 | I/O Map Base Address (bits 31:16); Reserved (bits 15:0)
96 | Reserved
92 | Reserved
88 | IST7 (upper 32 bits)
84 | IST7 (lower 32 bits)
80 | IST6 (upper 32 bits)
76 | IST6 (lower 32 bits)
72 | IST5 (upper 32 bits)
68 | IST5 (lower 32 bits)
64 | IST4 (upper 32 bits)
60 | IST4 (lower 32 bits)
56 | IST3 (upper 32 bits)
52 | IST3 (lower 32 bits)
48 | IST2 (upper 32 bits)
44 | IST2 (lower 32 bits)
40 | IST1 (upper 32 bits)
36 | IST1 (lower 32 bits)
32 | Reserved
28 | Reserved
24 | RSP2 (upper 32 bits)
20 | RSP2 (lower 32 bits)
16 | RSP1 (upper 32 bits)
12 | RSP1 (lower 32 bits)
8 | RSP0 (upper 32 bits)
4 | RSP0 (lower 32 bits)
0 | Reserved

Reserved bits. Set to 0.
CHAPTER 8
MULTIPLE-PROCESSOR MANAGEMENT
The Intel 64 and IA-32 architectures provide mechanisms for managing and
improving the performance of multiple processors connected to the same system
bus. These include:
• Bus locking and/or cache coherency management for performing atomic
operations on system memory.
• Serializing instructions. These instructions apply only to the Pentium 4, Intel
Xeon, P6 family, and Pentium processors.
• An advanced programmable interrupt controller (APIC) located on the processor
chip (see Chapter 10, "Advanced Programmable Interrupt Controller (APIC)").
This feature was introduced by the Pentium processor.
• A second-level cache (level 2, L2). For the Pentium 4, Intel Xeon, and P6 family
processors, the L2 cache is included in the processor package and is tightly
coupled to the processor. For the Pentium and Intel486 processors, pins are
provided to support an external L2 cache.
• A third-level cache (level 3, L3). For Intel Xeon processors, the L3 cache is
included in the processor package and is tightly coupled to the processor.
• Intel Hyper-Threading Technology. This extension to the Intel 64 and IA-32
architectures enables a single processor core to execute two or more threads
concurrently (see Section 8.5, "Intel Hyper-Threading Technology and Intel
Multi-Core Technology").
These mechanisms are particularly useful in symmetric-multiprocessing (SMP)
systems. However, they can also be used when an Intel 64 or IA-32 processor and a
special-purpose processor (such as a communications, graphics, or video processor)
share the system bus.
These multiprocessing mechanisms have the following characteristics:
• To maintain system memory coherency: When two or more processors are
attempting simultaneously to access the same address in system memory, some
communication mechanism or memory access protocol must be available to
promote data coherency and, in some instances, to allow one processor to
temporarily lock a memory location.
• To maintain cache consistency: When one processor accesses data cached on
another processor, it must not receive incorrect data. If it modifies data, all other
processors that access that data must receive the modified data.
• To allow predictable ordering of writes to memory: In some circumstances, it is
important that memory writes be observed externally in precisely the same order
as programmed.
• To distribute interrupt handling among a group of processors: When several
processors are operating in a system in parallel, it is useful to have a centralized
mechanism for receiving interrupts and distributing them to available processors
for servicing.
• To increase system performance by exploiting the multi-threaded and
multi-process nature of contemporary operating systems and applications.
The caching mechanism and cache consistency of Intel 64 and IA-32 processors are
discussed in Chapter 11. The APIC architecture is described in Chapter 10. Bus and
memory locking, serializing instructions, memory ordering, and Intel
Hyper-Threading Technology are discussed in the following sections.
8.1 LOCKED ATOMIC OPERATIONS
The 32-bit IA-32 processors support locked atomic operations on locations in system
memory. These operations are typically used to manage shared data structures (such
as semaphores, segment descriptors, system segments, or page tables) in which two
or more processors may try simultaneously to modify the same field or flag. The
processor uses three interdependent mechanisms for carrying out locked atomic
operations:
• Guaranteed atomic operations
• Bus locking, using the LOCK# signal and the LOCK instruction prefix
• Cache coherency protocols that ensure that atomic operations can be carried out
on cached data structures (cache lock); this mechanism is present in the
Pentium 4, Intel Xeon, and P6 family processors
These mechanisms are interdependent in the following ways. Certain basic memory
transactions (such as reading or writing a byte in system memory) are always
guaranteed to be handled atomically. That is, once started, the processor guarantees that
the operation will be completed before another processor or bus agent is allowed
access to the memory location. The processor also supports bus locking for
performing selected memory operations (such as a read-modify-write operation in a
shared area of memory) that typically need to be handled atomically, but are not
automatically handled this way. Because frequently used memory locations are often
cached in a processor's L1 or L2 caches, atomic operations can often be carried out
inside a processor's caches without asserting the bus lock. Here the processor's
cache coherency protocols ensure that other processors that are caching the same
memory locations are managed properly while atomic operations are performed on
cached memory locations.
NOTE
Where there are contested lock accesses, software may need to
implement algorithms that ensure fair access to resources in order to
prevent lock starvation. The hardware provides no resource that
guarantees fairness to participating agents. It is the responsibility of
software to manage the fairness of semaphores and exclusive locking
functions.
The mechanisms for handling locked atomic operations have evolved with the
complexity of IA-32 processors. More recent IA-32 processors (such as the
Pentium 4, Intel Xeon, and P6 family processors) and Intel 64 provide a more refined
locking mechanism than earlier processors. These mechanisms are described in the
following sections.
8.1.1 Guaranteed Atomic Operations
The Intel486 processor (and newer processors since) guarantees that the following
basic memory operations will always be carried out atomically:
• Reading or writing a byte
• Reading or writing a word aligned on a 16-bit boundary
• Reading or writing a doubleword aligned on a 32-bit boundary
The Pentium processor (and newer processors since) guarantees that the following
additional memory operations will always be carried out atomically:
• Reading or writing a quadword aligned on a 64-bit boundary
• 16-bit accesses to uncached memory locations that fit within a 32-bit data bus
The P6 family processors (and newer processors since) guarantee that the following
additional memory operation will always be carried out atomically:
• Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache
line
Accesses to cacheable memory that are split across bus widths, cache lines, and
page boundaries are not guaranteed to be atomic by the Intel Core 2 Duo, Intel
Atom, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family, Pentium, and
Intel486 processors. The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium M,
Pentium 4, Intel Xeon, and P6 family processors provide bus control signals that
permit external memory subsystems to make split accesses atomic; however,
nonaligned data accesses will seriously impact the performance of the processor and
should be avoided.
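The alignment conditions above can be expressed as two small predicates. This is a
sketch under stated assumptions: the function names are hypothetical, and the
64-byte cache-line size is an assumed value that is not given in this section and
varies by processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size in bytes; the real value is processor specific. */
#define CACHE_LINE 64u

/* True if a size-byte access at addr is aligned on its natural boundary
 * (size must be a power of two: 1, 2, 4, or 8 for the cases listed above). */
int is_naturally_aligned(uintptr_t addr, size_t size) {
    return size != 0 && (size & (size - 1)) == 0 && (addr & (size - 1)) == 0;
}

/* True if the access does not cross a cache-line boundary, the condition
 * under which the P6 family rule covers unaligned cached accesses. */
int fits_in_cache_line(uintptr_t addr, size_t size) {
    return size != 0 && size <= CACHE_LINE &&
           (addr / CACHE_LINE) == ((addr + size - 1) / CACHE_LINE);
}
```

Software that depends on atomic loads and stores can check its data layout with
predicates like these, or simply keep shared fields naturally aligned so that the
question of split accesses never arises.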
An x87 instruction or an SSE instruction that accesses data larger than a quadword
may be implemented using multiple memory accesses. If such an instruction stores
to memory, some of the accesses may complete (writing to memory) while another
causes the operation to fault for architectural reasons (e.g., due to a page-table entry
that is marked "not present"). In this case, the effects of the completed accesses
may be visible to software even though the overall instruction caused a fault. If TLB
invalidation has been delayed (see Section 4.10.4.4), such page faults may occur
even if all accesses are to the same page.
8.1.2 Bus Locking
Intel 64 and IA-32 processors provide a LOCK# signal that is asserted automatically
during certain critical memory operations to lock the system bus or equivalent link.
While this output signal is asserted, requests from other processors or bus agents for
control of the bus are blocked. Software can specify other occasions when the LOCK
semantics are to be followed by prepending the LOCK prefix to an instruction.
In the case of the Intel386, Intel486, and Pentium processors, explicitly locked
instructions will result in the assertion of the LOCK# signal. It is the responsibility of
the hardware designer to make the LOCK# signal available in system hardware to
control memory accesses among processors.
For the P6 and more recent processor families, if the memory area being accessed is
cached internally in the processor, the LOCK# signal is generally not asserted;
instead, locking is only applied to the processor's caches (see Section 8.1.4, "Effects
of a LOCK Operation on Internal Processor Caches").
8.1.2.1 Automatic Locking
The operations on which the processor automatically follows the LOCK semantics are
as follows:
• When executing an XCHG instruction that references memory.
• When setting the B (busy) flag of a TSS descriptor: The processor tests
and sets the busy flag in the type field of the TSS descriptor when switching to a
task. To ensure that two processors do not switch to the same task
simultaneously, the processor follows the LOCK semantics while testing and setting this
flag.
• When updating segment descriptors: When loading a segment descriptor,
the processor will set the accessed flag in the segment descriptor if the flag is
clear. During this operation, the processor follows the LOCK semantics so that the
descriptor will not be modified by another processor while it is being updated. For
this action to be effective, operating-system procedures that update descriptors
should use the following steps:
  - Use a locked operation to modify the access-rights byte to indicate that the
  segment descriptor is not-present, and specify a value for the type field that
  indicates that the descriptor is being updated.
  - Update the fields of the segment descriptor. (This operation may require
  several memory accesses; therefore, locked operations cannot be used.)
  - Use a locked operation to modify the access-rights byte to indicate that the
  segment descriptor is valid and present.
The Intel386 processor always updates the accessed flag in the segment
descriptor, whether it is clear or not. The Pentium 4, Intel Xeon, P6 family,
Pentium, and Intel486 processors only update this flag if it is not already set.
• When updating page-directory and page-table entries: When updating
page-directory and page-table entries, the processor uses locked cycles to set
the accessed and dirty flags in the page-directory and page-table entries.
• Acknowledging interrupts: After an interrupt request, an interrupt controller
may use the data bus to send the interrupt vector for the interrupt to the
processor. The processor follows the LOCK semantics during this time to ensure
that no other data appears on the data bus when the interrupt vector is being
transmitted.
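The three-step descriptor update recommended above for operating-system
procedures might be sketched with C11 atomics as shown below. Everything here is an
assumption made for illustration: the descriptor structure is a simplified model of
an 8-byte segment descriptor, DESC_UPDATING is a hypothetical value chosen by the
operating system to mean "being updated", and atomic_exchange is used because
compilers for these processors typically implement it with XCHG, which follows the
LOCK semantics automatically.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Simplified model of a segment descriptor; only the access-rights byte
 * is accessed atomically. */
typedef struct {
    uint16_t limit_low;
    uint16_t base_low;
    uint8_t  base_mid;
    _Atomic uint8_t access;        /* P flag (bit 7), DPL, S, and type */
    uint8_t  limit_high_flags;     /* limit 19:16 in the low nibble */
    uint8_t  base_high;
} descriptor;

#define DESC_PRESENT  0x80u
#define DESC_UPDATING 0x00u   /* hypothetical: not present, "being updated" */

void descriptor_update(descriptor *d, uint32_t base, uint32_t limit,
                       uint8_t new_access) {
    /* Step 1: a locked operation marks the descriptor not-present. */
    atomic_exchange(&d->access, (uint8_t)DESC_UPDATING);

    /* Step 2: ordinary stores update the other fields. */
    d->limit_low = (uint16_t)(limit & 0xFFFFu);
    d->base_low  = (uint16_t)(base & 0xFFFFu);
    d->base_mid  = (uint8_t)((base >> 16) & 0xFFu);
    d->limit_high_flags = (uint8_t)((d->limit_high_flags & 0xF0u) |
                                    ((limit >> 16) & 0x0Fu));
    d->base_high = (uint8_t)(base >> 24);

    /* Step 3: a locked operation marks the descriptor valid and present. */
    atomic_exchange(&d->access, (uint8_t)(new_access | DESC_PRESENT));
}
```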
8.1.2.2 Software Controlled Bus Locking
To explicitly force the LOCK semantics, software can use the LOCK prefix with the
following instructions when they are used to modify a memory location. An
invalid-opcode exception (#UD) is generated when the LOCK prefix is used with any other
instruction or when no write operation is made to memory (that is, when the
destination operand is in a register).
• The bit test and modify instructions (BTS, BTR, and BTC).
• The exchange instructions (XADD, CMPXCHG, and CMPXCHG8B).
• The LOCK prefix is automatically assumed for the XCHG instruction.
• The following single-operand arithmetic and logical instructions: INC, DEC, NOT,
and NEG.
• The following two-operand arithmetic and logical instructions: ADD, ADC, SUB,
SBB, AND, OR, and XOR.
A locked instruction is guaranteed to lock only the area of memory defined by the
destination operand, but may be interpreted by the system as a lock for a larger
memory area.
Software should access semaphores (shared memory used for signalling between
multiple processors) using identical addresses and operand lengths. For example, if
one processor accesses a semaphore using a word access, other processors should
not access the semaphore using a byte access.
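From C, locked read-modify-write instructions are normally reached through the
C11 <stdatomic.h> interface instead of hand-written LOCK prefixes. The sketch
below is illustrative only: lock_word, lock_try_acquire, lock_release, and
counter_add are hypothetical names, and the claim that a compiler emits LOCK
CMPXCHG or LOCK XADD for these calls is typical behavior, not a guarantee. Every
access to the semaphore uses the same address and the same 32-bit operand length,
as recommended above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* A semaphore word that is always accessed as a naturally aligned doubleword. */
typedef struct {
    _Atomic uint32_t word;   /* 0 = free, 1 = held */
} lock_word;

/* Try to take the lock with a compare-and-exchange.
 * Returns 1 if the lock was acquired, 0 if it was already held. */
int lock_try_acquire(lock_word *l) {
    uint32_t expected = 0;
    return atomic_compare_exchange_strong(&l->word, &expected, 1u) ? 1 : 0;
}

/* Release the lock with an atomic doubleword store. */
void lock_release(lock_word *l) {
    atomic_store(&l->word, 0u);
}

/* Atomically add n to a shared counter and return the previous value. */
uint32_t counter_add(_Atomic uint32_t *counter, uint32_t n) {
    return atomic_fetch_add(counter, n);
}
```

A caller that needs to wait for the lock would retry lock_try_acquire in a loop;
fairness between waiting processors remains the responsibility of software.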
NOTE
Do not implement semaphores using the WC memory type. Do not
perform non-temporal stores to a cache line containing a location
used to implement a semaphore.
The integrity of a bus lock is not affected by the alignment of the memory field. The
LOCK semantics are followed for as many bus cycles as necessary to update the
entire operand. However, it is recommended that locked accesses be aligned on their
natural boundaries for better system performance:
• Any boundary for an 8-bit access (locked or otherwise).
• 16-bit boundary for locked word accesses.
• 32-bit boundary for locked doubleword accesses.
• 64-bit boundary for locked quadword accesses.
Locked operations are atomic with respect to all other memory operations and all
externally visible events. Only instruction fetch and page table accesses can pass
locked instructions. Locked instructions can be used to synchronize data written by
one processor and read by another processor.
For the P6 family processors, locked operations serialize all outstanding load and
store operations (that is, wait for them to complete). This rule is also true for the
Pentium 4 and Intel Xeon processors, with one exception. Load operations that
reference weakly ordered memory types (such as the WC memory type) may not be
serialized.
Locked instructions should not be used to ensure that data written can be fetched as
instructions.
NOTE
The locked instructions for the current versions of the Pentium 4,
Intel Xeon, P6 family, Pentium, and Intel486 processors allow data
written to be fetched as instructions. However, Intel recommends
that developers who require the use of self-modifying code use a
different synchronizing mechanism, described in the following
sections.
8.1.3 Handling Self- and Cross-Modifying Code
The act of a processor writing data into a currently executing code segment with
the intent of executing that data as code is called self-modifying code. IA-32
processors exhibit model-specific behavior when executing self-modified code,
depending upon how far ahead of the current execution pointer the code has been
modified.
As processor microarchitectures become more complex and start to speculatively
execute code ahead of the retirement point (as in P6 and more recent processor
families), the rules regarding which code should execute, pre- or post-modification,
become blurred. To write self-modifying code and ensure that it is compliant with
current and future versions of the IA-32 architectures, use one of the following
coding options:
(* OPTION 1 *)
Store modified code (as data) into code segment;
Jump to new code or an intermediate location;
Execute new code;
(* OPTION 2 *)
Store modified code (as data) into code segment;
Execute a serializing instruction; (* For example, CPUID instruction *)
Execute new code;
The use of one of these options is not required for programs intended to run on the
Pentium or Intel486 processors, but is recommended to ensure compatibility with
the P6 and more recent processor families.
Self-modifying code will execute at a lower level of performance than
non-self-modifying or normal code. The degree of the performance deterioration will
depend upon the frequency of modification and specific characteristics of the code.
The act of one processor writing data into the currently executing code segment of a
second processor with the intent of having the second processor execute that data as
code is called cross-modifying code. As with self-modifying code, IA-32 processors
exhibit model-specific behavior when executing cross-modifying code, depending
upon how far ahead of the executing processor's current execution pointer the code
has been modified.
To write cross-modifying code and ensure that it is compliant with current and future
versions of the IA-32 architecture, the following processor synchronization algorithm
must be implemented:
(* Action of Modifying Processor *)
Memory_Flag ← 0; (* Set Memory_Flag to value other than 1 *)
Store modified code (as data) into code segment;
Memory_Flag ← 1;
(* Action of Executing Processor *)
WHILE (Memory_Flag ≠ 1)
    Wait for code to update;
ELIHW;
Execute serializing instruction; (* For example, CPUID instruction *)
Begin executing modified code;
(The use of this option is not required for programs intended to run on the Intel486
processor, but is recommended to ensure compatibility with the Pentium 4, Intel
Xeon, P6 family, and Pentium processors.)
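The ordering of the flag protocol above can be sketched in software. The following is only an analogy, assuming Python threads in place of processors, a dictionary entry in place of the code segment, and a threading.Event in place of Memory_Flag; all names are hypothetical. Python has no counterpart to a serializing instruction, so that step appears only as a comment.

```python
import threading

def modifying_processor(code_slot, memory_flag):
    """Modifying side: clear the flag, store the new code, then set the flag."""
    memory_flag.clear()                  # Memory_Flag <- 0
    code_slot["fn"] = lambda: "new"      # store modified code (as data)
    memory_flag.set()                    # Memory_Flag <- 1

def executing_processor(code_slot, memory_flag, result):
    """Executing side: wait for the flag before running the updated code."""
    memory_flag.wait()                   # WHILE (Memory_Flag != 1) wait
    # A real executing processor would run a serializing instruction
    # (for example, CPUID) at this point; this sketch has no equivalent.
    result.append(code_slot["fn"]())     # begin executing modified code

def run_handshake():
    """Run both sides once and return what the executing side observed."""
    code_slot = {"fn": lambda: "old"}    # stands in for the code segment
    memory_flag = threading.Event()      # stands in for Memory_Flag
    result = []
    executor = threading.Thread(
        target=executing_processor, args=(code_slot, memory_flag, result))
    modifier = threading.Thread(
        target=modifying_processor, args=(code_slot, memory_flag))
    executor.start()
    modifier.start()
    executor.join()
    modifier.join()
    return result[0]
```

Because the executing side does not proceed until the flag is set, and the flag is set only after the new code is stored, run_handshake() observes the updated code regardless of which thread starts first.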
Like self-modifying code, cross-modifying code will execute at a lower level of
performance than non-cross-modifying (normal) code, depending upon the frequency of
modification and specific characteristics of the code.
The restrictions on self-modifying code and cross-modifying code also apply to the
Intel 64 architecture.
8.1.4 Effects of a LOCK Operation on Internal Processor Caches
For the Intel486 and Pentium processors, the LOCK# signal is always asserted on the
bus during a LOCK operation, even if the area of memory being locked is cached in
the processor.
For the P6 and more recent processor families, if the area of memory being locked
during a LOCK operation is cached in the processor that is performing the LOCK
operation as write-back memory and is completely contained in a cache line, the
processor may not assert the LOCK# signal on the bus. Instead, it will modify the
memory location internally and allow its cache coherency mechanism to ensure that
the operation is carried out atomically. This operation is called "cache locking." The
cache coherency mechanism automatically prevents two or more processors that
have cached the same area of memory from simultaneously modifying data in that
area.
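One of the conditions above, that the locked area be completely contained in a cache line, is simple address arithmetic. The sketch below assumes a 64-byte line by default (the 32-byte case is included for processors with shorter lines); the function name and parameters are hypothetical, and the check says nothing about the other condition, namely that the line is cached as write-back memory.

```python
def within_one_cache_line(address, size, line_size=64):
    """Return True if the bytes [address, address + size) share one cache line."""
    first_line = address // line_size
    last_line = (address + size - 1) // line_size
    return first_line == last_line
```

For example, an 8-byte operand at 0x1000 lies inside one 64-byte line, while an 8-byte operand at 0x103C straddles two lines and so could not be handled by cache locking alone.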
8.2 MEMORY ORDERING
The term memory ordering refers to the order in which the processor issues reads
(loads) and writes (stores) through the system bus to system memory. The Intel 64
and IA-32 architectures support several memory-ordering models depending on the
implementation of the architecture. For example, the Intel386 processor enforces
program ordering (generally referred to as strong ordering), where reads and
writes are issued on the system bus in the order they occur in the instruction stream
under all circumstances.
To allow performance optimization of instruction execution, the IA-32 architecture
allows departures from the strong-ordering model, called processor ordering, in
Pentium 4, Intel Xeon, and P6 family processors. These processor-ordering
variations (called here the memory-ordering model) allow performance enhancing
operations such as allowing reads to go ahead of buffered writes. The goal of any of
these variations is to increase instruction execution speeds, while maintaining
memory coherency, even in multiple-processor systems.
Section 8.2.1 and Section 8.2.2 describe the memory-ordering implemented by
Intel486, Pentium, Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium 4, Intel
Xeon, and P6 family processors. Section 8.2.3 gives examples illustrating the
behavior of the memory-ordering model on IA-32 and Intel-64 processors. Section
8.2.4 considers the special treatment of stores for string operations and Section
8.2.5 discusses how memory-ordering behavior may be modified through the use of
specific instructions.
8.2.1 Memory Ordering in the Intel Pentium and Intel486 Processors
The Pentium and Intel486 processors follow the processor-ordered memory model;
however, they operate as strongly-ordered processors under most circumstances.
Reads and writes always appear in programmed order at the system bus, except for
the following situation where processor ordering is exhibited. Read misses are
permitted to go ahead of buffered writes on the system bus when all the buffered
writes are cache hits and, therefore, are not directed to the same address being
accessed by the read miss.
In the case of I/O operations, both reads and writes always appear in programmed
order.
Software intended to operate correctly in processor-ordered processors (such as the
Pentium 4, Intel Xeon, and P6 family processors) should not depend on the relatively
strong ordering of the Pentium or Intel486 processors. Instead, it should ensure
that accesses to shared variables that are intended to control concurrent execution
among processors are explicitly required to obey program ordering through the use
of appropriate locking or serializing operations (see Section 8.2.5, "Strengthening or
Weakening the Memory-Ordering Model").
8.2.2 Memory Ordering in P6 and More Recent Processor Families
The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium 4, and P6 family
processors also use a processor-ordered memory-ordering model that can be further
defined as "write ordered with store-buffer forwarding." This model can be
characterized as follows.
In a single-processor system for memory regions defined as write-back cacheable,
the memory-ordering model respects the following principles (Note the
memory-ordering principles for single-processor and multiple-processor systems are
written from the perspective of software executing on the processor, where the term
"processor" refers to a logical processor. For example, a physical processor
supporting multiple cores and/or HyperThreading Technology is treated as a
multi-processor system.):
• Reads are not reordered with other reads.
• Writes are not reordered with older reads.
• Writes to memory are not reordered with other writes, with the following
exceptions:
    - writes executed with the CLFLUSH instruction;
    - streaming stores (writes) executed with the non-temporal move instructions
      (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD); and
    - string operations (see Section 8.2.4.1).
• Reads may be reordered with older writes to different locations but not with older
writes to the same location.
• Reads or writes cannot be reordered with I/O instructions, locked instructions, or
serializing instructions.
• Reads cannot pass earlier LFENCE and MFENCE instructions.
• Writes cannot pass earlier LFENCE, SFENCE, and MFENCE instructions.
• LFENCE instructions cannot pass earlier reads.
• SFENCE instructions cannot pass earlier writes.
• MFENCE instructions cannot pass earlier reads or writes.
In a multiple-processor system, the following ordering principles apply:
• Individual processors use the same ordering principles as in a single-processor
system.
• Writes by a single processor are observed in the same order by all processors.
• Writes from an individual processor are NOT ordered with respect to the writes
from other processors.
• Memory ordering obeys causality (memory ordering respects transitive
visibility).
• Any two stores are seen in a consistent order by processors other than those
performing the stores.
• Locked instructions have a total order.
See the example in Figure 8-1. Consider three processors in a system and each
processor performs three writes, one to each of three defined locations (A, B, and C).
Individually, the processors perform the writes in the same program order, but
because of bus arbitration and other memory access mechanisms, the order that the
three processors write the individual memory locations can differ each time the
respective code sequences are executed on the processors. The final values in
location A, B, and C would possibly vary on each execution of the write sequence.
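The three-processor scenario can be explored by enumerating every global write order that keeps each processor's own writes in program order. The sketch below is a counting exercise, not a model of bus arbitration; a write is represented as a hypothetical (location, processor) pair, and the "final value" of a location is taken to be the number of the processor that wrote it last.

```python
def interleavings(sequences):
    """Yield every merge of the sequences that keeps each one in order."""
    if all(len(s) == 0 for s in sequences):
        yield []
        return
    for i, seq in enumerate(sequences):
        if seq:
            rest = sequences[:i] + [seq[1:]] + sequences[i + 1:]
            for tail in interleavings(rest):
                yield [seq[0]] + tail

def final_values(order):
    """Return the last writer of each location for one global write order."""
    memory = {}
    for location, processor in order:
        memory[location] = processor
    return memory

# Each processor n writes A, then B, then C; a write is (location, n).
programs = [[(loc, n) for loc in "ABC"] for n in (1, 2, 3)]
all_orders = list(interleavings(programs))
distinct_finals = {tuple(sorted(final_values(o).items())) for o in all_orders}
```

Under these assumptions there are 1680 admissible global orders, and every one of the 27 combinations of last writers for A, B, and C is reachable, which is why the final contents of the three locations can differ from run to run.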
The processor-ordering model described in this section is virtually identical to that
used by the Pentium and Intel486 processors. The only enhancements in the Pentium
4, Intel Xeon, and P6 family processors are:
• Added support for speculative reads, while still adhering to the ordering
principles above.
• Store-buffer forwarding, when a read passes a write to the same memory
location.
• Out of order store from long string store and string move operations (see Section
8.2.4, "Out-of-Order Stores For String Operations," below).
NOTE
In P6 processor family, store-buffer forwarding to reads of WC memory from
streaming stores to the same address does not occur due to errata.
8.2.3 Examples Illustrating the Memory-Ordering Principles
This section provides a set of examples that illustrate the behavior of the
memory-ordering principles introduced in Section 8.2.2. They are designed to give
software writers an understanding of how memory ordering may affect the results of
different sequences of instructions.
These examples are limited to accesses to memory regions defined as write-back
cacheable (WB). (Section 8.2.3.1 describes other limitations on the generality of the
examples.) The reader should understand that they describe only software-visible
behavior. A logical processor may reorder two accesses even if one of the examples
indicates that they may not be reordered. Such an example states only that software
cannot detect that such a reordering occurred. Similarly, a logical processor may
execute a memory access more than once as long as the behavior visible to software
is consistent with a single execution of the memory access.
Figure 8-1. Example of Write Ordering in Multiple-Processor Systems
[Figure: Processors #1, #2, and #3 each write locations A, B, and C in program
order (processor n issues Write A.n, Write B.n, Write C.n). Each processor is
guaranteed to perform writes in program order, but writes from all processors are
not guaranteed to occur in a particular order. One example order of actual writes
from all processors to memory: A.1, B.1, A.2, A.3, C.1, B.2, C.2, B.3, C.3.]
8.2.3.1 Assumptions, Terminology, and Notation
As noted above, the examples in this section are limited to accesses to memory
regions defined as write-back cacheable (WB). They apply only to ordinary loads and
stores and to locked read-modify-write instructions. They do not necessarily apply to
any of the following: out-of-order stores for string instructions (see Section 8.2.4);
accesses with a non-temporal hint; reads from memory by the processor as part of
address translation (e.g., page walks); and updates to segmentation and paging
structures by the processor (e.g., to update "accessed" bits).
The principles underlying the examples in this section apply to individual memory
accesses and to locked read-modify-write instructions. The Intel-64
memory-ordering model guarantees that, for each of the following memory-access
instructions, the constituent memory operation appears to execute as a single memory
access:
• Instructions that read or write a single byte.
• Instructions that read or write a word (2 bytes) whose address is aligned on a 2
byte boundary.
• Instructions that read or write a doubleword (4 bytes) whose address is aligned
on a 4 byte boundary.
• Instructions that read or write a quadword (8 bytes) whose address is aligned on
an 8 byte boundary.
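The four cases in the list above reduce to a size and alignment test. The sketch below uses a hypothetical function name and covers only ordinary loads and stores; a False result means only that the list gives no single-access guarantee, not that the access is necessarily split.

```python
def appears_as_single_access(address, size):
    """Return True if an access of `size` bytes at `address` is on the list."""
    if size == 1:
        return True                    # any single byte
    if size in (2, 4, 8):
        return address % size == 0     # word, doubleword, quadword: aligned
    return False                       # other sizes are not on the list
```

For instance, a doubleword at address 0x1004 qualifies, while a doubleword at 0x1002 does not, even though it is aligned on a 2 byte boundary.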
Any locked instruction (either the XCHG instruction or another read-modify-write
instruction with a LOCK prefix) appears to execute as an indivisible and
uninterruptible sequence of load(s) followed by store(s) regardless of alignment.
Other instructions may be implemented with multiple memory accesses. From a
memory-ordering point of view, there are no guarantees regarding the relative order
in which the constituent memory accesses are made. There is also no guarantee that
the constituent operations of a store are executed in the same order as the
constituent operations of a load.
Section 8.2.3.2 through Section 8.2.3.7 give examples using the MOV instruction.
The principles that underlie these examples apply to load and store accesses in
general and to other instructions that load from or store to memory. Section 8.2.3.8
and Section 8.2.3.9 give examples using the XCHG instruction. The principles that
underlie these examples apply to other locked read-modify-write instructions.
This section uses the term "processor" to refer to a logical processor. The examples
are written using Intel-64 assembly-language syntax and use the following
notational conventions:
• Arguments beginning with an "r", such as r1 or r2, refer to registers (e.g., EAX)
visible only to the processor being considered.
• Memory locations are denoted with x, y, z.
• Stores are written as mov [_x], val, which implies that val is being stored into
the memory location x.
• Loads are written as mov r, [_x], which implies that the contents of the memory
location x are being loaded into the register r.
As noted earlier, the examples refer only to software visible behavior. When the
succeeding sections make a statement such as "the two stores are reordered," the
implication is only that "the two stores appear to be reordered from the point of view
of software."
8.2.3.2 Neither Loads Nor Stores Are Reordered with Like Operations
The Intel-64 memory-ordering model allows neither loads nor stores to be reordered
with the same kind of operation. That is, it ensures that loads are seen in program
order and that stores are seen in program order. This is illustrated by the following
example:
Example 8-1. Stores Are Not Reordered with Other Stores
Processor 0              Processor 1
mov [_x], 1              mov r1, [_y]
mov [_y], 1              mov r2, [_x]
Initially x == y == 0
r1 == 1 and r2 == 0 is not allowed
The disallowed return values could be exhibited only if processor 0's two stores are
reordered (with the two loads occurring between them) or if processor 1's two loads
are reordered (with the two stores occurring between them).
If r1 == 1, the store to y occurs before the load from y. Because the Intel-64
memory-ordering model does not allow stores to be reordered, the earlier store to x
occurs before the load from y. Because the Intel-64 memory-ordering model does
not allow loads to be reordered, the store to x also occurs before the later load from
x. Thus r2 == 1.
8.2.3.3 Stores Are Not Reordered With Earlier Loads
The Intel-64 memory-ordering model ensures that a store by a processor may not
occur before a previous load by the same processor. This is illustrated by the
following example:
Example 8-2. Stores Are Not Reordered with Older Loads
Processor 0              Processor 1
mov r1, [_x]             mov r2, [_y]
mov [_y], 1              mov [_x], 1
Initially x == y == 0
r1 == 1 and r2 == 1 is not allowed
Assume r1 == 1.
• Because r1 == 1, processor 1's store to x occurs before processor 0's load from
x.
• Because the Intel-64 memory-ordering model prevents each store from being
reordered with the earlier load by the same processor, processor 1's load from y
occurs before its store to x.
• Similarly, processor 0's load from x occurs before its store to y.
• Thus, processor 1's load from y occurs before processor 0's store to y, implying
r2 == 0.
8.2.3.4 Loads May Be Reordered with Earlier Stores to Different
Locations
The Intel-64 memory-ordering model allows a load to be reordered with an earlier
store to a different location. However, loads are not reordered with stores to the
same location.
The fact that a load may be reordered with an earlier store to a different location is
illustrated by the following example:
Example 8-3. Loads May be Reordered with Older Stores
Processor 0              Processor 1
mov [_x], 1              mov [_y], 1
mov r1, [_y]             mov r2, [_x]
Initially x == y == 0
r1 == 0 and r2 == 0 is allowed
At each processor, the load and the store are to different locations and hence may be
reordered. Any interleaving of the operations is thus allowed. One such interleaving
has the two loads occurring before the two stores. This would result in each load
returning value 0.
The fact that a load may not be reordered with an earlier store to the same location
is illustrated by the following example:
Example 8-4. Loads Are not Reordered with Older Stores to the Same Location
Processor 0
mov [_x], 1
mov r1, [_x]
Initially x == 0
r1 == 0 is not allowed
The Intel-64 memory-ordering model does not allow the load to be reordered with
the earlier store because the accesses are to the same location. Therefore, r1 == 1
must hold.
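The outcomes of Examples 8-3 and 8-4 can be reproduced with a small exhaustive search over a store-buffer model. The model is an assumption made for illustration and is not the architectural definition: each processor is given a FIFO store buffer, a load takes the newest matching entry in its own buffer and otherwise reads memory, and buffered stores drain to memory one at a time in order. The instruction encoding and the name tso_outcomes are hypothetical.

```python
def tso_outcomes(programs, regs):
    """Return every reachable tuple of final values for the registers in regs.

    programs holds one instruction list per processor, where an instruction is
    ("st", location, value) or ("ld", register, location). Memory starts at 0.
    """
    n = len(programs)
    # State: program counters, per-processor store buffers, memory, registers.
    start = ((0,) * n, ((),) * n, (), ())
    seen, stack, results = {start}, [start], set()
    while stack:
        pcs, bufs, mem, rf = stack.pop()
        mem_d, rf_d = dict(mem), dict(rf)
        successors = []
        for p in range(n):
            # Option 1: processor p executes its next instruction.
            if pcs[p] < len(programs[p]):
                ins = programs[p][pcs[p]]
                new_pcs = pcs[:p] + (pcs[p] + 1,) + pcs[p + 1:]
                if ins[0] == "st":
                    _, loc, val = ins
                    grown = bufs[p] + ((loc, val),)   # append to own buffer
                    new_bufs = bufs[:p] + (grown,) + bufs[p + 1:]
                    successors.append((new_pcs, new_bufs, mem, rf))
                else:
                    _, reg, loc = ins
                    val = mem_d.get(loc, 0)
                    for buffered_loc, buffered_val in bufs[p]:
                        if buffered_loc == loc:       # newest match wins
                            val = buffered_val
                    new_rf = dict(rf_d)
                    new_rf[reg] = val
                    successors.append(
                        (new_pcs, bufs, mem, tuple(sorted(new_rf.items()))))
            # Option 2: the oldest buffered store of p drains to memory.
            if bufs[p]:
                loc, val = bufs[p][0]
                new_mem = dict(mem_d)
                new_mem[loc] = val
                new_bufs = bufs[:p] + (bufs[p][1:],) + bufs[p + 1:]
                successors.append(
                    (pcs, new_bufs, tuple(sorted(new_mem.items())), rf))
        if not successors:                            # all done, buffers empty
            results.add(tuple(rf_d[r] for r in regs))
        for state in successors:
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return results

# Example 8-3: each processor stores to one location, then loads the other.
example_8_3 = tso_outcomes(
    [[("st", "x", 1), ("ld", "r1", "y")],
     [("st", "y", 1), ("ld", "r2", "x")]], ["r1", "r2"])

# Example 8-4: a processor stores to x, then loads x.
example_8_4 = tso_outcomes([[("st", "x", 1), ("ld", "r1", "x")]], ["r1"])
```

In this model example_8_3 contains (0, 0), matching the allowed result r1 == 0 and r2 == 0, because both loads can execute while both stores are still buffered. example_8_4 contains only (1,), because the load is satisfied either from the processor's own buffer or from memory after the store has drained.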
8.2.3.5 Intra-Processor Forwarding Is Allowed
The memory-ordering model allows concurrent stores by two processors to be seen
in different orders by those two processors; specifically, each processor may perceive
its own store occurring before that of the other. This is illustrated by the following
example:
Example 8-5. Intra-Processor Forwarding is Allowed
Processor 0              Processor 1
mov [_x], 1              mov [_y], 1
mov r1, [_x]             mov r3, [_y]
mov r2, [_y]             mov r4, [_x]
Initially x == y == 0
r2 == 0 and r4 == 0 is allowed
The memory-ordering model imposes no constraints on the order in which the two
stores appear to execute by the two processors. This fact allows processor 0 to see
its store before seeing processor 1's, while processor 1 sees its store before seeing
processor 0's. (Each processor is self consistent.) This allows r2 == 0 and r4 == 0.
In practice, the reordering in this example can arise as a result of store-buffer
forwarding. While a store is temporarily held in a processor's store buffer, it can
satisfy the processor's own loads but is not visible to (and cannot satisfy) loads by
other processors.
8.2.3.6 Stores Are Transitively Visible
The memory-ordering model ensures transitive visibility of stores; stores that are
causally related appear to all processors to occur in an order consistent with the
causality relation. This is illustrated by the following example:
Example 8-6. Stores Are Transitively Visible
Processor 0              Processor 1              Processor 2
mov [_x], 1              mov r1, [_x]
                         mov [_y], 1              mov r2, [_y]
                                                  mov r3, [_x]
Initially x == y == 0
r1 == 1, r2 == 1, r3 == 0 is not allowed
Assume that r1 == 1 and r2 == 1.
• Because r1 == 1, processor 0's store occurs before processor 1's load.
• Because the memory-ordering model prevents a store from being reordered with
an earlier load (see Section 8.2.3.3), processor 1's load occurs before its store.
Thus, processor 0's store causally precedes processor 1's store.
• Because processor 0's store causally precedes processor 1's store, the
memory-ordering model ensures that processor 0's store appears to occur before
processor 1's store from the point of view of all processors.
• Because r2 == 1, processor 1's store occurs before processor 2's load.
• Because the Intel-64 memory-ordering model prevents loads from being
reordered (see Section 8.2.3.2), processor 2's loads occur in order.
• The above items imply that processor 0's store to x occurs before processor 2's
load from x. This implies that r3 == 1.
8.2.3.7 Stores Are Seen in a Consistent Order by Other Processors
As noted in Section 8.2.3.5, the memory-ordering model allows stores by two
processors to be seen in different orders by those two processors. However, any two
stores must appear to execute in the same order to all processors other than those
performing the stores. This is illustrated by the following example:
Example 8-7. Stores Are Seen in a Consistent Order by Other Processors
Processor 0        Processor 1        Processor 2        Processor 3
mov [_x], 1        mov [_y], 1        mov r1, [_x]       mov r3, [_y]
                                      mov r2, [_y]       mov r4, [_x]
Initially x == y == 0
r1 == 1, r2 == 0, r3 == 1, r4 == 0 is not allowed
By the principles discussed in Section 8.2.3.2,
• processor 2's first and second load cannot be reordered,
• processor 3's first and second load cannot be reordered.
• If r1 == 1 and r2 == 0, processor 0's store appears to precede processor 1's
store with respect to processor 2.
• Similarly, r3 == 1 and r4 == 0 imply that processor 1's store appears to precede
processor 0's store with respect to processor 3.
Because the memory-ordering model ensures that any two stores appear to execute
in the same order to all processors (other than those performing the stores), this set
of return values is not allowed.
8.2.3.8 Locked Instructions Have a Total Order
The memory-ordering model ensures that all processors agree on a single execution
order of all locked instructions, including those that are larger than 8 bytes or are not
naturally aligned. This is illustrated by the following example:
Example 8-8. Locked Instructions Have a Total Order
Processor 0        Processor 1        Processor 2        Processor 3
xchg [_x], r1      xchg [_y], r2
                                      mov r3, [_x]       mov r5, [_y]
                                      mov r4, [_y]       mov r6, [_x]
Initially r1 == r2 == 1, x == y == 0
r3 == 1, r4 == 0, r5 == 1, r6 == 0 is not allowed
Processor 2 and processor 3 must agree on the order of the two executions of XCHG.
Without loss of generality, suppose that processor 0's XCHG occurs first.
• If r5 == 1, processor 1's XCHG into y occurs before processor 3's load from y.
• Because the Intel-64 memory-ordering model prevents loads from being
reordered (see Section 8.2.3.2), processor 3's loads occur in order and,
therefore, processor 1's XCHG occurs before processor 3's load from x.
• Since processor 0's XCHG into x occurs before processor 1's XCHG (by
assumption), it occurs before processor 3's load from x. Thus, r6 == 1.
A similar argument (referring instead to processor 2's loads) applies if processor 1's
XCHG occurs before processor 0's XCHG.
8.2.3.9 Loads and Stores Are Not Reordered with Locked Instructions
The memory-ordering model prevents loads and stores from being reordered with
locked instructions that execute earlier or later. The examples in this section illustrate
only cases in which a locked instruction is executed before a load or a store. The
reader should note that reordering is prevented also if the locked instruction is
executed after a load or a store.
The first example illustrates that loads may not be reordered with earlier locked
instructions:
Example 8-9. Loads Are not Reordered with Locks
Processor 0              Processor 1
xchg [_x], r1            xchg [_y], r3
mov r2, [_y]             mov r4, [_x]
Initially x == y == 0, r1 == r3 == 1
r2 == 0 and r4 == 0 is not allowed
As explained in Section 8.2.3.8, there is a total order of the executions of locked
instructions. Without loss of generality, suppose that processor 0's XCHG occurs first.
Because the Intel-64 memory-ordering model prevents processor 1's load from
being reordered with its earlier XCHG, processor 0's XCHG occurs before
processor 1's load. This implies r4 == 1.
A similar argument (referring instead to processor 0's accesses) applies if
processor 1's XCHG occurs before processor 0's XCHG.
The second example illustrates that a store may not be reordered with an earlier
locked instruction:
Example 8-10. Stores Are not Reordered with Locks
Processor 0              Processor 1
xchg [_x], r1            mov r2, [_y]
mov [_y], 1              mov r3, [_x]
Initially x == y == 0, r1 == 1
r2 == 1 and r3 == 0 is not allowed
Assume r2 == 1.
• Because r2 == 1, processor 0's store to y occurs before processor 1's load from
y.
• Because the memory-ordering model prevents a store from being reordered with
an earlier locked instruction, processor 0's XCHG into x occurs before its store to
y. Thus, processor 0's XCHG into x occurs before processor 1's load from y.
• Because the memory-ordering model prevents loads from being reordered (see
Section 8.2.3.2), processor 1's loads occur in order and, therefore, processor 0's
XCHG into x occurs before processor 1's load from x. Thus, r3 == 1.
8.2.4 Out-of-Order Stores For String Operations
The Intel Core 2 Duo, Intel Core, Pentium 4, and P6 family processors modify the
processor's operation during the string store operations (initiated with the MOVS and
STOS instructions) to maximize performance. Once the initial conditions for "fast
string" operation are met (as described below), the processor will, from an external
perspective, essentially operate on the string one cache line at a time. This
results in the processor looping on issuing a cache-line read for the source address
and an invalidation on the external bus for the destination address, knowing that all
bytes in the destination cache line will be modified, for the length of the string. In this
mode interrupts will only be accepted by the processor on cache line boundaries. It is
possible in this mode that the destination line invalidations, and therefore stores, will
be issued on the external bus out of order.
Code dependent upon sequential store ordering should not use the string operations
for the entire data structure to be stored. Data and semaphores should be separated.
Order-dependent code should write to a discrete semaphore variable after any
string operations to allow correctly ordered data to be seen by all processors.
Fast string operation can be disabled by clearing the fast-string-enable bit (bit 0) of
the IA32_MISC_ENABLE MSR.
Initial conditions for "fast string" operations are implementation specific. Example
conditions include:
• EDI and ESI must be 8-byte aligned for the Pentium III processor. EDI must be
8-byte aligned for the Pentium 4 processor.
• String operation must be performed in ascending address order.
• The initial operation counter (ECX) must be equal to or greater than 64.
• Source and destination must not overlap by less than a cache line (64 bytes, for
Intel Core 2 Duo, Intel Core, Pentium M, and Pentium 4 processors; 32 bytes for P6
family and Pentium processors).
• The memory type for both source and destination addresses must be either WB
or WC.
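The example conditions above can be collected into a single predicate. This is a sketch only: the real conditions are implementation specific, the function and parameter names are hypothetical, "ascending address order" is taken to mean that the direction flag (DF) is 0, and the overlap condition is read here as requiring source and destination to be at least one cache line apart.

```python
WB, WC = "WB", "WC"

def meets_example_fast_string_conditions(esi, edi, ecx, direction_flag,
                                         src_type, dst_type, line_size=64,
                                         require_esi_aligned=False):
    """Check the example fast-string conditions for a MOVS-style operation.

    require_esi_aligned=True models the Pentium III example (EDI and ESI
    aligned); the default models the Pentium 4 example (EDI only).
    """
    aligned = edi % 8 == 0 and (esi % 8 == 0 or not require_esi_aligned)
    ascending = direction_flag == 0          # DF = 0: addresses increase
    long_enough = ecx >= 64                  # initial count of at least 64
    far_apart = abs(edi - esi) >= line_size  # assumed reading of "overlap"
    cacheable = src_type in (WB, WC) and dst_type in (WB, WC)
    return aligned and ascending and long_enough and far_apart and cacheable
```

A caller that depends on sequential store ordering would still need to follow the guidance given earlier in this section rather than rely on this check returning False, since future processor families may use different conditions.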
NOTE
Initial conditions for "fast string" operation in future Intel 64 or IA-32 processor
families may differ from above.
8.2.4.1 Memory-Ordering Model for String Operations on Write-back (WB)
Memory
This section deals with the memory-ordering model for string operations on
write-back (WB) memory for the Intel 64 architecture.
The memory-ordering model respects the following principles:
1. Stores within a single string operation may be executed out of order.
2. Stores from separate string operations (for example, stores from consecutive
string operations) do not execute out of order. All the stores from an earlier string
operation will complete before any store from a later string operation.
3. String operations are not reordered with other store operations.
Fast string operations (e.g. string operations initiated with the MOVS/STOS
instructions and the REP prefix) may be interrupted by exceptions or interrupts. The
interrupts are precise but may be delayed; for example, the interruptions may be
taken at cache line boundaries, after every few iterations of the loop, or after
operating on every few bytes. Different implementations may choose different
options, or may even choose not to delay interrupt handling, so software should not
rely on the delay.
When the interrupt/trap handler is reached, the source/destination registers point to
the next string element to be operated on, while the EIP stored in the stack points to
the string instruction, and the ECX register has the value it held following the last
successful iteration. The return from that trap/interrupt handler should cause the
string instruction to be resumed from the point where it was interrupted.
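The register state seen by the handler follows from the description above by simple arithmetic. The sketch below assumes a REP STOSD (4 bytes per iteration) and a hypothetical function name; it computes the EDI and ECX values after a given number of completed iterations, with EDI moving up when the direction flag is 0 and down when it is 1.

```python
def rep_stosd_state_after(edi, ecx, completed_iterations, direction_flag=0):
    """Return (EDI, ECX) after the given number of completed iterations."""
    if not 0 <= completed_iterations <= ecx:
        raise ValueError("completed_iterations must be between 0 and ECX")
    step = 4 if direction_flag == 0 else -4   # one doubleword per iteration
    new_edi = edi + step * completed_iterations
    new_ecx = ecx - completed_iterations
    return new_edi, new_ecx
```

For a 128-iteration store starting at 0x1000 that is interrupted after 40 iterations, EDI points at 0x10A0 (the next doubleword to be written) and ECX holds 88, so resuming the instruction completes the remaining 88 stores.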
The string operation memory-ordering principles (items 2 and 3 above) should be
interpreted by taking the interruptibility of fast string operations into account. For
example, if a fast string operation gets interrupted after k iterations, then stores
performed by the interrupt handler will become visible after the fast string stores
from iteration 0 to k, and before the fast string stores from the (k+1)th iteration
onward.
Stores within a single string operation may execute out of order (item 1 above) only
if fast string operation is enabled. Fast string operations are enabled/disabled
through the IA32_MISC_ENABLE model specific register.
8.2.4.2 Examples Illustrating Memory-Ordering Principles for String
Operations
The following examples use the same notation and convention as described in
Section 8.2.3.1.
In Example 8-11, processor 0 does one round of (128 iterations) doubleword string
store operation via rep:stosd, writing the value 1 (value in EAX) into a block of 512
bytes from location _x (kept in ES:EDI) in ascending order. Since each operation
stores a doubleword (4 bytes), the operation is repeated 128 times (value in ECX).
The block of memory initially contained 0. Processor 1 is reading two memory
locations that are part of the memory block being updated by processor 0, i.e.,
reading locations in the range _x to (_x+511).
Example 8-11. Stores Within a String Operation May be Reordered
Processor 0              Processor 1
rep:stosd [_x]           mov r1, [_z]
                         mov r2, [_y]
Initially on processor 0: EAX == 1, ECX == 128, ES:EDI == _x
Initially [_x] to [_x+511] == 0, _x <= _y < _z < _x+512
r1 == 1 and r2 == 0 is allowed
It is possible for processor 1 to perceive that the repeated string stores in processor
0 are happening out of order. Assume that fast string operations are enabled on
processor 0.
In Example 8-12, processor 0 does two separate rounds of rep stosd operation of 128
doubleword stores, writing the value 1 (value in EAX) into the first block of 512 bytes
from location _x (kept in ES:EDI) in ascending order. It then writes 1 into a second
block of memory from (_x+512) to (_x+1023). All of the memory locations initially
contain 0. Processor 1 performs two load operations from the two blocks of memory.
I t is not possible in t he above example for processor 1 t o perceive any of t he st ores
from t he lat er st ring operat ion ( t o t he second 512 block) in processor 0 before seeing
t he st ores from t he earlier st ring operat ion t o t he first 512 block.
The above example assumes t hat writ es t o t he second block ( _x+ 512 t o _x+ 1023)
does not get execut ed while processor 0s st ring operat ion t o t he first block has been
int errupt ed. I f t he st ring operat ion t o t he first block by processor 0 is int errupt ed,
and a writ e t o t he second memory block is execut ed by t he int errupt handler, t hen
t hat change in t he second memory block will be visible before t he st ring operat ion t o
t he first memory block resumes.
I n Example 8- 13, processor 0 does one round of ( 128 it erat ions) doubleword st ring
st ore operat ion via rep: st osd, writ ing t he value 1 ( value in EAX) int o a block of 512
byt es from locat ion _x ( kept in ES: EDI ) in ascending order. I t t hen writ es t o a second
memory locat ion out side t he memory block of t he previous st ring operat ion.
Processor 1 performs t wo read operat ions, t he first read is from an address out side
t he 512- byt e block but t o be updat ed by processor 0, t he second ready is from inside
t he block of memory of st ring operat ion.
Processor 1 cannot perceive t he lat er st ore by processor 0 unt il it sees all t he st ores
from t he st ring operat ion. Example 8- 13 assumes t hat processor 0s st ore t o [ _z] is
Example 8-12. Stores Across String Operations Are not Reordered
Processor 0 Processor 1
rep:stosd [ _x]
mov r1, [ _z]
mov ecx, $128
mov r2, [ _y]
rep:stosd 512[ _x]
Initially on processor 0: EAX == 1, ECX==128, ES:EDI ==_x
Initially [_x] to 1023[_x]== 0, _x <= _y < _x+512 < _z < _x+1024
r1 == 1 and r2 == 0 is not allowed
Example 8-13. String Operations Are not Reordered with later Stores
Processor 0 Processor 1
rep:stosd [ _x] mov r1, [ _z]
mov [_z], $1 mov r2, [ _y]
Initially on processor 0: EAX == 1, ECX==128, ES:EDI ==_x
Initially [_y] == [_z] == 0, [_x] to 511[_x]== 0, _x <= _y < _x+512, _z is a separate memory
location
r1 == 1 and r2 == 0 is not allowed
8-22 Vol. 3
MULTIPLE-PROCESSOR MANAGEMENT
not execut ed while t he st ring operat ion has been int errupt ed. I f t he st ring operat ion
is int errupt ed and t he st ore t o [ _z] by processor 0 is execut ed by t he int errupt
handler, t hen changes t o [ _z] will become visible before t he st ring operat ion
resumes.
Example 8- 14 illust rat es t he visibilit y principle when a st ring operat ion is int errupt ed.
I n Example 8- 14, processor 0 st art ed a st ring operat ion t o writ e t o a memory block
of 512 byt es st art ing at address _x. Processor 0 got int errupt ed aft er k it erat ions of
st ore operat ions. The address _y has not yet been updat ed by processor 0 when
processor 0 got int errupt ed. The int errupt handler t hat t ook cont rol on processor 0
writ es t o t he address _z. Processor 1 may see t he st ore t o _z from t he int errupt
handler, before seeing t he remaining st ores t o t he 512- byt e memory block t hat are
execut ed when t he st ring operat ion resumes.
Example 8- 15 illust rat es t he ordering of st ring operat ions wit h earlier st ores. No
st ore from a st ring operat ion can be visible before all prior st ores are visible.
Example 8-14. Interrupted String Operation
Processor 0 Processor 1
rep:stosd [ _x] // interrupted before es:edi reach
_y
mov r1, [ _z]
mov [_z], $1 // interrupt handler mov r2, [ _y]
Initially on processor 0: EAX == 1, ECX==128, ES:EDI ==_x
Initially [_y] == [_z] == 0, [_x] to 511[_x]== 0, _x <= _y < _x+512, _z is a separate memory
location
r1 == 1 and r2 == 0 is allowed
Example 8-15. String Operations Are not Reordered with Earlier Stores
Processor 0 Processor 1
mov [_z], $1 mov r1, [ _y]
rep:stosd [ _x] mov r2, [ _z]
Initially on processor 0: EAX == 1, ECX==128, ES:EDI ==_x
Initially [_y] == [_z] == 0, [_x] to 511[_x]== 0, _x <= _y < _x+512, _z is a separate memory
location
r1 == 1 and r2 == 0 is not allowed
8.2.5 Strengthening or Weakening the Memory-Ordering Model
The Intel 64 and IA-32 architectures provide several mechanisms for strengthening
or weakening the memory-ordering model to handle special programming situations.
These mechanisms include:
• The I/O instructions, locking instructions, the LOCK prefix, and serializing
instructions force stronger ordering on the processor.
• The SFENCE instruction (introduced to the IA-32 architecture in the Pentium III
processor) and the LFENCE and MFENCE instructions (introduced in the Pentium
4 processor) provide memory-ordering and serialization capabilities for specific
types of memory operations.
• The memory type range registers (MTRRs) can be used to strengthen or weaken
memory ordering for specific areas of physical memory (see Section 11.11,
"Memory Type Range Registers (MTRRs)"). MTRRs are available only in the
Pentium 4, Intel Xeon, and P6 family processors.
• The page attribute table (PAT) can be used to strengthen memory ordering for a
specific page or group of pages (see Section 11.12, "Page Attribute Table (PAT)").
The PAT is available only in the Pentium 4, Intel Xeon, and Pentium III processors.
These mechanisms can be used as follows:
Memory mapped devices and other I/O devices on the bus are often sensitive to the
order of writes to their I/O buffers. I/O instructions (the IN and OUT instructions) can
be used to impose strong write ordering on such accesses as follows. Prior to
executing an I/O instruction, the processor waits for all previous instructions in the
program to complete and for all buffered writes to drain to memory. Only instruction
fetch and page-table walks can pass I/O instructions. Execution of subsequent
instructions does not begin until the processor determines that the I/O instruction has
been completed.
Synchronization mechanisms in multiple-processor systems may depend upon a
strong memory-ordering model. Here, a program can use a locking instruction such
as the XCHG instruction or the LOCK prefix to ensure that a read-modify-write
operation on memory is carried out atomically. Locking operations typically operate like
I/O operations in that they wait for all previous instructions to complete and for all
buffered writes to drain to memory (see Section 8.1.2, "Bus Locking").
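As an illustration of the locking approach, the following is a minimal C sketch of a spin lock built on an atomic exchange. It is not taken from this manual: the type spinlock_t and the function names are hypothetical, and the sketch assumes a C11 compiler with <stdatomic.h>. On x86 targets, compilers commonly implement the exchange with an XCHG instruction, but that mapping is a property of the compiler, not something the sketch guarantees.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef struct {
    atomic_int held;
} spinlock_t;

static void spinlock_init(spinlock_t *lock)
{
    atomic_init(&lock->held, 0);
}

/* One atomic read-modify-write: store 1 and look at the previous value.
 * The lock was acquired only if the previous value was 0. */
static bool spinlock_try_acquire(spinlock_t *lock)
{
    return atomic_exchange_explicit(&lock->held, 1, memory_order_acquire) == 0;
}

/* Spin until the exchange observes a free lock. */
static void spinlock_acquire(spinlock_t *lock)
{
    while (!spinlock_try_acquire(lock)) {
        /* busy-wait */
    }
}

/* Release with a plain atomic store; no read-modify-write is needed. */
static void spinlock_release(spinlock_t *lock)
{
    atomic_store_explicit(&lock->held, 0, memory_order_release);
}
```

A caller would bracket each access to the shared data with spinlock_acquire() and spinlock_release(). Production locks usually add a pause or back-off in the spin loop, which this sketch omits.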
Program synchronization can also be carried out with serializing instructions (see
Section 8.3). These instructions are typically used at critical procedure or task
boundaries to force completion of all previous instructions before a jump to a new
section of code or a context switch occurs. Like the I/O and locking instructions, the
processor waits until all previous instructions have been completed and all buffered
writes have been drained to memory before executing the serializing instruction.
The SFENCE, LFENCE, and MFENCE instructions provide a performance-efficient way
of ensuring load and store memory ordering between routines that produce
weakly-ordered results and routines that consume that data. The functions of these
instructions are as follows:
• SFENCE: Serializes all store (write) operations that occurred prior to the
SFENCE instruction in the program instruction stream, but does not affect load
operations.
• LFENCE: Serializes all load (read) operations that occurred prior to the LFENCE
instruction in the program instruction stream, but does not affect store
operations (see footnote 1).
• MFENCE: Serializes all store and load operations that occurred prior to the
MFENCE instruction in the program instruction stream.
Note that the SFENCE, LFENCE, and MFENCE instructions provide a more efficient
method of controlling memory ordering than the CPUID instruction.
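The producer/consumer use of the fence instructions can be sketched in C as follows. This is an illustrative sketch, not code from this manual: mailbox_t and the function names are hypothetical, and it assumes a compiler that provides the _mm_sfence and _mm_lfence intrinsics when SSE2 is enabled, falling back to a C11 atomic_thread_fence elsewhere. For ordinary write-back memory the processor-ordering model already keeps these stores in order; the fences matter when the data is produced with weakly-ordered stores.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>               /* _mm_sfence, _mm_lfence */
#define STORE_FENCE() _mm_sfence()   /* emits SFENCE */
#define LOAD_FENCE()  _mm_lfence()   /* emits LFENCE */
#else
#include <stdatomic.h>               /* portable fallback, assumption of this sketch */
#define STORE_FENCE() atomic_thread_fence(memory_order_seq_cst)
#define LOAD_FENCE()  atomic_thread_fence(memory_order_seq_cst)
#endif

/* Hypothetical shared mailbox: payload is valid only once ready != 0. */
typedef struct {
    volatile uint32_t payload;
    volatile uint32_t ready;
} mailbox_t;

/* Producer: write the data, fence, then publish the flag. */
static void mailbox_publish(mailbox_t *m, uint32_t value)
{
    m->payload = value;
    STORE_FENCE();      /* payload store is ordered before the flag store */
    m->ready = 1;
}

/* Consumer: check the flag, fence, then read the data.
 * Returns 1 and fills *out if data was available, 0 otherwise. */
static int mailbox_try_consume(mailbox_t *m, uint32_t *out)
{
    if (!m->ready) {
        return 0;
    }
    LOAD_FENCE();       /* flag load is ordered before the payload load */
    *out = m->payload;
    return 1;
}
```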
The MTRRs were introduced in the P6 family processors to define the cache
characteristics for specified areas of physical memory. The following are two examples of
how memory types set up with MTRRs can be used to strengthen or weaken memory
ordering for the Pentium 4, Intel Xeon, and P6 family processors:
• The strong uncached (UC) memory type forces a strong-ordering model on
memory accesses. Here, all reads and writes to the UC memory region appear on
the bus and out-of-order or speculative accesses are not performed. This
memory type can be applied to an address range dedicated to memory mapped
I/O devices to force strong memory ordering.
• For areas of memory where weak ordering is acceptable, the write back (WB)
memory type can be chosen. Here, reads can be performed speculatively and
writes can be buffered and combined. For this type of memory, cache locking is
performed on atomic (locked) operations that do not split across cache lines,
which helps to reduce the performance penalty associated with the use of the
typical synchronization instructions, such as XCHG, that lock the bus during the
entire read-modify-write operation. With the WB memory type, the XCHG
instruction locks the cache instead of the bus if the memory access is contained
within a cache line.
The PAT was introduced in the Pentium III processor to enhance the caching
characteristics that can be assigned to pages or groups of pages. The PAT mechanism is
typically used to strengthen caching characteristics at the page level with respect to the
caching characteristics established by the MTRRs. Table 11-7 shows the interaction of
the PAT with the MTRRs.
Intel recommends that software written to run on Intel Core 2 Duo, Intel Atom, Intel
Core Duo, Pentium 4, Intel Xeon, and P6 family processors assume the
processor-ordering model or a weaker memory-ordering model. The Intel Core 2 Duo, Intel
Atom, Intel Core Duo, Pentium 4, Intel Xeon, and P6 family processors do not
implement a strong memory-ordering model, except when using the UC memory type.
Despite the fact that Pentium 4, Intel Xeon, and P6 family processors support
processor ordering, Intel does not guarantee that future processors will support this
model. To make software portable to future processors, it is recommended that
operating systems provide critical region and resource control constructs and APIs
(application program interfaces) based on I/O, locking, and/or serializing instructions,
to be used to synchronize access to shared areas of memory in multiple-processor
systems. Also, software should not depend on processor ordering in situations where
the system hardware does not support this memory-ordering model.
1. Specifically, LFENCE does not execute until all prior instructions have completed locally, and no
later instruction begins execution until LFENCE completes. As a result, an instruction that loads
from memory and that precedes an LFENCE receives data from memory prior to completion of
the LFENCE. An LFENCE that follows an instruction that stores to memory might complete before
the data being stored have become globally visible. Instructions following an LFENCE may be
fetched from memory before the LFENCE, but they will not execute until the LFENCE completes.
8.3 SERIALIZING INSTRUCTIONS
The Intel 64 and IA-32 architectures define several serializing instructions. These
instructions force the processor to complete all modifications to flags, registers, and
memory by previous instructions and to drain all buffered writes to memory before
the next instruction is fetched and executed. For example, when a MOV to control
register instruction is used to load a new value into control register CR0 to enable
protected mode, the processor must perform a serializing operation before it enters
protected mode. This serializing operation ensures that all operations that were
started while the processor was in real-address mode are completed before the
switch to protected mode is made.
The concept of serializing instructions was introduced into the IA-32 architecture
with the Pentium processor to support parallel instruction execution. Serializing
instructions have no meaning for the Intel486 and earlier processors that do not
implement parallel instruction execution.
It is important to note that the execution of serializing instructions on P6 and more
recent processor families constrains speculative execution because the results of
speculatively executed instructions are discarded. The following instructions are
serializing instructions:
• Privileged serializing instructions: INVD, INVEPT, INVLPG, INVVPID, LGDT,
LIDT, LLDT, LTR, MOV (to control register, with the exception of MOV CR8; see
footnote 2), MOV (to debug register), WBINVD, and WRMSR.
• Non-privileged serializing instructions: CPUID, IRET, and RSM.
When the processor serializes instruction execution, it ensures that all pending
memory transactions are completed (including writes stored in its store buffer)
before it executes the next instruction. Nothing can pass a serializing instruction and
a serializing instruction cannot pass any other instruction (read, write, instruction
fetch, or I/O). For example, CPUID can be executed at any privilege level to serialize
instruction execution with no effect on program flow, except that the EAX, EBX, ECX,
and EDX registers are modified.
2. MOV CR8 is not defined architecturally as a serializing instruction.
The following instructions are memory-ordering instructions, not serializing
instructions. These drain the data memory subsystem. They do not serialize the instruction
execution stream (see footnote 3):
• Non-privileged memory-ordering instructions: SFENCE, LFENCE, and
MFENCE.
The SFENCE, LFENCE, and MFENCE instructions provide more granularity in
controlling the serialization of memory loads and stores (see Section 8.2.5, "Strengthening
or Weakening the Memory-Ordering Model").
The following additional information is worth noting regarding serializing
instructions:
• The processor does not write back the contents of modified data in its data cache
to external memory when it serializes instruction execution. Software can force
modified data to be written back by executing the WBINVD instruction, which is a
serializing instruction. The amount of time or cycles for WBINVD to complete will
vary due to the size of different cache hierarchies and other factors. As a
consequence, the use of the WBINVD instruction can have an impact on
interrupt/event response time.
• When an instruction is executed that enables or disables paging (that is, changes
the PG flag in control register CR0), the instruction should be followed by a jump
instruction. The target instruction of the jump instruction is fetched with the new
setting of the PG flag (that is, paging is enabled or disabled), but the jump
instruction itself is fetched with the previous setting. The Pentium 4, Intel Xeon,
and P6 family processors do not require the jump operation following the move to
register CR0 (because any use of the MOV instruction in a Pentium 4, Intel Xeon,
or P6 family processor to write to CR0 is completely serializing). However, to
maintain backwards and forward compatibility with code written to run on other
IA-32 processors, it is recommended that the jump operation be performed.
• Whenever an instruction is executed to change the contents of CR3 while paging
is enabled, the next instruction is fetched using the translation tables that
correspond to the new value of CR3. Therefore the next instruction and the
sequentially following instructions should have a mapping based upon the new
value of CR3. (Global entries in the TLBs are not invalidated; see Section 4.10.4,
"Invalidation of TLBs and Paging-Structure Caches.")
• The Pentium processor and more recent processor families use branch-prediction
techniques to improve performance by prefetching the destination of a branch
instruction before the branch instruction is executed. Consequently, instruction
execution is not deterministically serialized when a branch instruction is
executed.
3. LFENCE does provide some guarantees on instruction ordering. It does not execute until all prior
instructions have completed locally, and no later instruction begins execution until LFENCE
completes.
8.4 MULTIPLE-PROCESSOR (MP) INITIALIZATION
The IA-32 architecture (beginning with the P6 family processors) defines a
multiple-processor (MP) initialization protocol called the Multiprocessor Specification Version
1.4. This specification defines the boot protocol to be used by IA-32 processors in
multiple-processor systems. (Here, multiple processors is defined as two or more
processors.) The MP initialization protocol has the following important features:
• It supports controlled booting of multiple processors without requiring dedicated
system hardware.
• It allows hardware to initiate the booting of a system without the need for a
dedicated signal or a predefined boot processor.
• It allows all IA-32 processors to be booted in the same manner, including those
supporting Intel Hyper-Threading Technology.
• The MP initialization protocol also applies to MP systems using Intel 64
processors.
The mechanism for carrying out the MP initialization protocol differs depending on
the IA-32 processor family, as follows:
• For P6 family processors: The selection of the BSP and APs (see Section
8.4.1, "BSP and AP Processors") is handled through arbitration on the APIC bus,
using BIPI and FIPI messages. See Appendix C, "MP Initialization For P6 Family
Processors," for a complete discussion of MP initialization for P6 family
processors.
• Intel Xeon processors with family, model, and stepping IDs up to F09H:
The selection of the BSP and APs (see Section 8.4.1, "BSP and AP Processors") is
handled through arbitration on the system bus, using BIPI and FIPI messages
(see Section 8.4.3, "MP Initialization Protocol Algorithm for
Intel Xeon Processors").
• Intel Xeon processors with family, model, and stepping IDs of F0AH and
beyond, 6E0H and beyond, 6F0H and beyond: The selection of the BSP and
APs is handled through a special system bus cycle, without using BIPI and FIPI
message arbitration (see Section 8.4.3, "MP Initialization Protocol Algorithm for
Intel Xeon Processors").
The family, model, and stepping ID for a processor is given in the EAX register when
the CPUID instruction is executed with a value of 1 in the EAX register.
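To make the signature values above concrete, the following C sketch decodes a CPUID leaf 1 EAX value into family, model, and stepping. It is illustrative only: cpu_signature_t and decode_signature are hypothetical names, the EAX value is passed in as a parameter rather than read with CPUID, and the handling of the extended family and extended model fields follows the CPUID instruction definition rather than anything stated in this section.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    unsigned family;
    unsigned model;
    unsigned stepping;
} cpu_signature_t;

/* Decode the value returned in EAX by CPUID with EAX = 1.
 * Bits 3:0 stepping, 7:4 model, 11:8 family,
 * 19:16 extended model, 27:20 extended family. */
static cpu_signature_t decode_signature(uint32_t eax)
{
    unsigned base_family = (eax >> 8) & 0xFu;
    unsigned base_model  = (eax >> 4) & 0xFu;
    unsigned ext_model   = (eax >> 16) & 0xFu;
    unsigned ext_family  = (eax >> 20) & 0xFFu;
    cpu_signature_t sig;

    sig.stepping = eax & 0xFu;

    /* The extended family is added only when the base family is 0FH. */
    sig.family = (base_family == 0xFu) ? base_family + ext_family : base_family;

    /* The extended model is prepended only for base family 06H or 0FH. */
    if (base_family == 0x6u || base_family == 0xFu) {
        sig.model = (ext_model << 4) | base_model;
    } else {
        sig.model = base_model;
    }
    return sig;
}
```

For example, the signature F0AH decodes to family 0FH, model 0, stepping 0AH, and 6E0H decodes to family 6, model 0EH, stepping 0.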
8.4.1 BSP and AP Processors
The MP initialization protocol defines two classes of processors: the bootstrap
processor (BSP) and the application processors (APs). Following a power-up or
RESET of an MP system, system hardware dynamically selects one of the processors
on the system bus as the BSP. The remaining processors are designated as APs.
As part of the BSP selection mechanism, the BSP flag is set in the IA32_APIC_BASE
MSR (see Figure 10-5) of the BSP, indicating that it is the BSP. This flag is cleared for
all other processors.
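The BSP flag test can be sketched in C as below. This is a hedged illustration, not code from this manual: the function name is hypothetical, the MSR value is passed in as a parameter (reading it requires RDMSR at privilege level 0), and the bit position used (bit 8) is taken from the IA32_APIC_BASE layout in the figure referenced above.

```c
#include <stdint.h>

/* Bit 8 of IA32_APIC_BASE is the BSP flag (assumed from the MSR layout). */
#define APIC_BASE_BSP_FLAG (UINT64_C(1) << 8)

/* Returns 1 if the given IA32_APIC_BASE value belongs to the BSP,
 * 0 if it belongs to an AP. */
static int apic_base_is_bsp(uint64_t apic_base_msr)
{
    return (apic_base_msr & APIC_BASE_BSP_FLAG) != 0;
}
```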
The BSP executes the BIOS's boot-strap code to configure the APIC environment,
sets up system-wide data structures, and starts and initializes the APs. When the BSP
and APs are initialized, the BSP then begins executing the operating-system
initialization code.
Following a power-up or reset, the APs complete a minimal self-configuration, then
wait for a startup signal (a SIPI message) from the BSP processor. Upon receiving a
SIPI message, an AP executes the BIOS AP configuration code, which ends with the
AP being placed in halt state.
For Intel 64 and IA-32 processors supporting Intel Hyper-Threading Technology, the
MP initialization protocol treats each of the logical processors on the system bus or
coherent link domain as a separate processor (with a unique APIC ID). During
boot-up, one of the logical processors is selected as the BSP and the remainder of the
logical processors are designated as APs.
8.4.2 MP Initialization Protocol Requirements and Restrictions
The MP initialization protocol imposes the following requirements and restrictions on
the system:
• The MP protocol is executed only after a power-up or RESET. If the MP protocol
has completed and a BSP is chosen, subsequent INITs (either to a specific
processor or system wide) do not cause the MP protocol to be repeated. Instead,
each logical processor examines its BSP flag (in the IA32_APIC_BASE MSR) to
determine whether it should execute the BIOS boot-strap code (if it is the BSP) or
enter a wait-for-SIPI state (if it is an AP).
• All devices in the system that are capable of delivering interrupts to the
processors must be inhibited from doing so for the duration of the MP
initialization protocol. The time during which interrupts must be inhibited includes the
window between when the BSP issues an INIT-SIPI-SIPI sequence to an AP and
when the AP responds to the last SIPI in the sequence.
8.4.3 MP Initialization Protocol Algorithm for Intel Xeon Processors
Following a power-up or RESET of an MP system, the processors in the system
execute the MP initialization protocol algorithm to initialize each of the logical
processors on the system bus or coherent link domain. In the course of executing this
algorithm, the following boot-up and initialization operations are carried out:
1. Each logical processor is assigned a unique APIC ID, based on system topology.
The unique ID is a 32-bit value if the processor supports CPUID leaf 0BH,
otherwise the unique ID is an 8-bit value (see Section 8.4.5, "Identifying Logical
Processors in an MP System"). This ID is written into the local APIC ID register for
each processor.
2. Each logical processor is assigned a unique arbitration priority based on its
APIC ID.
3. Each logical processor executes its internal BIST simultaneously with the other
logical processors on the system bus.
4. Upon completion of the BIST, the logical processors use a hardware-defined
selection mechanism to select the BSP and the APs from the available logical
processors on the system bus. The BSP selection mechanism differs depending
on the family, model, and stepping IDs of the processors, as follows:
   • Family, model, and stepping IDs of F0AH and onwards:
     - The logical processors begin monitoring the BNR# signal, which is
       toggling. When the BNR# pin stops toggling, each processor attempts to
       issue a NOP special cycle on the system bus.
     - The logical processor with the highest arbitration priority succeeds in
       issuing a NOP special cycle and is nominated the BSP. This processor sets
       the BSP flag in its IA32_APIC_BASE MSR, then fetches and begins
       executing BIOS boot-strap code, beginning at the reset vector (physical
       address FFFF FFF0H).
     - The remaining logical processors (that failed in issuing a NOP special
       cycle) are designated as APs. They leave their BSP flags in the clear state
       and enter a wait-for-SIPI state.
   • Family, model, and stepping IDs up to F09H:
     - Each processor broadcasts a BIPI to all including self. The first processor
       that broadcasts a BIPI (and thus receives its own BIPI vector) selects
       itself as the BSP and sets the BSP flag in its IA32_APIC_BASE MSR. (See
       Appendix C.1, "Overview of the MP Initialization Process For P6 Family
       Processors," for a description of the BIPI, FIPI, and SIPI messages.)
     - The remainder of the processors (which were not selected as the BSP) are
       designated as APs. They leave their BSP flags in the clear state and enter
       a wait-for-SIPI state.
     - The newly established BSP broadcasts an FIPI message to all including
       self, which the BSP and APs treat as an end of MP initialization signal.
       Only the processor with its BSP flag set responds to the FIPI message. It
       responds by fetching and executing the BIOS boot-strap code, beginning
       at the reset vector (physical address FFFF FFF0H).
5. As part of the boot-strap code, the BSP creates an ACPI table and an MP table and
adds its initial APIC ID to these tables as appropriate.
6. At the end of the boot-strap procedure, the BSP sets a processor counter to 1,
then broadcasts a SIPI message to all the APs in the system. Here, the SIPI
message contains a vector to the BIOS AP initialization code (at 000VV000H,
where VV is the vector contained in the SIPI message).
7. The first action of the AP initialization code is to set up a race (among the APs) to
a BIOS initialization semaphore. The first AP to the semaphore begins executing
the initialization code. (See Section 8.4.4, "MP Initialization Example," for
semaphore implementation details.) As part of the AP initialization procedure,
the AP adds its APIC ID number to the ACPI and MP tables as appropriate and
increments the processor counter by 1. At the completion of the initialization
procedure, the AP executes a CLI instruction and halts itself.
8. When each of the APs has gained access to the semaphore and executed the AP
initialization code, the BSP establishes a count for the number of processors
connected to the system bus, completes executing the BIOS boot-strap code,
and then begins executing operating-system boot-strap and start-up code.
9. While the BSP is executing operating-system boot-strap and start-up code, the
APs remain in the halted state. In this state they will respond only to INITs, NMIs,
and SMIs. They will also respond to snoops and to assertions of the STPCLK# pin.
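The relationship between the SIPI vector and the AP start-up address described in step 6 (000VV000H) can be sketched in C as follows. The function names are hypothetical and the code is only an illustration of the arithmetic: the vector selects a 4-KByte page in the first MByte of memory, so the address is the vector shifted left by 12 bits.

```c
#include <stdint.h>

/* Start-up address selected by a SIPI vector VV: 000VV000H. */
static uint32_t sipi_start_address(uint8_t vector)
{
    return (uint32_t)vector << 12;
}

/* Inverse mapping: vector for a start-up page address.
 * Returns -1 if the address is not 4-KByte aligned or is not
 * below 1 MByte, since such an address cannot be encoded. */
static int sipi_vector_for(uint32_t address)
{
    if ((address & 0xFFFu) != 0 || address >= 0x100000u) {
        return -1;
    }
    return (int)(address >> 12);
}
```

For instance, a vector of 0BDH corresponds to a start-up address of 000BD000H.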
The following section gives an example (with code) of the MP initialization protocol
for multiple Intel Xeon processors operating in an MP configuration.
Appendix B, "Model-Specific Registers (MSRs)," describes how to program the
LINT[0:1] pins of the processor's local APICs after an MP configuration has been
completed.
8.4.4 MP Initialization Example
The following example illustrates the use of the MP initialization protocol to
initialize processors in an MP system after the BSP and APs have been established.
The code runs on Intel 64 or IA-32 processors that use this protocol. This includes P6
Family processors, Pentium 4 processors, Intel Core Duo, Intel Core 2 Duo and Intel
Xeon processors.
The following constants and data definitions are used in the accompanying
code examples. They are based on the addresses of the APIC registers defined in
Table 10-1.
ICR_LOW EQU 0FEE00300H
SVR EQU 0FEE000F0H
APIC_ID EQU 0FEE00020H
LVT3 EQU 0FEE00370H
APIC_ENABLED EQU 0100H
BOOT_ID DD ?
COUNT EQU 00H
VACANT EQU 00H
8.4.4.1 Typical BSP Initialization Sequence
After the BSP and APs have been selected (by means of a hardware protocol, see
Section 8.4.3, "MP Initialization Protocol Algorithm for Intel Xeon Processors"), the
BSP begins executing BIOS boot-strap code (POST) at the normal IA-32 architecture
starting address (FFFF FFF0H). The boot-strap code typically performs the following
operations:
1. Initializes memory.
2. Loads the microcode update into the processor.
3. Initializes the MTRRs.
4. Enables the caches.
5. Executes the CPUID instruction with a value of 0H in the EAX register, then reads
the EBX, ECX, and EDX registers to determine if the BSP is "GenuineIntel."
6. Executes the CPUID instruction with a value of 1H in the EAX register, then saves
the values in the EAX, ECX, and EDX registers in a system configuration space in
RAM for use later.
7. Loads start-up code for the AP to execute into a 4-KByte page in the lower 1
MByte of memory.
8. Switches to protected mode and ensures that the APIC address space is mapped
to the strong uncacheable (UC) memory type.
9. Determines the BSP's APIC ID from the local APIC ID register (default is 0). The
code snippet below is an example that applies to logical processors in a system
whose local APIC units operate in xAPIC mode, in which APIC registers are accessed
using a memory-mapped interface:
MOV ESI, APIC_ID        ; Address of local APIC ID register
MOV EAX, [ESI]
AND EAX, 0FF000000H     ; Zero out all other bits except APIC ID
MOV BOOT_ID, EAX        ; Save in memory
Saves the APIC ID in the ACPI and MP tables and optionally in the system
configuration space in RAM.
10. Converts the base address of the 4-KByte page for the AP's bootup code into an 8-bit
vector. The 8-bit vector defines the address of a 4-KByte page in the real-address
mode address space (1-MByte space). For example, a vector of 0BDH specifies a
start-up memory address of 000BD000H.
11. Enables the local APIC by setting bit 8 of the APIC spurious vector register (SVR).
MOV ESI, SVR            ; Address of SVR
MOV EAX, [ESI]
OR EAX, APIC_ENABLED    ; Set bit 8 to enable (0 on reset)
MOV [ESI], EAX
12. Sets up the LVT error handling entry by establishing an 8-bit vector for the APIC
error handler.
MOV ESI, LVT3
MOV EAX, [ESI]
AND EAX, 0FFFFFF00H     ; Clear out previous vector.
OR EAX, 000000xxH       ; xx is the 8-bit vector of the APIC error handler.
MOV [ESI], EAX
13. Initializes the Lock Semaphore variable VACANT to 00H. The APs use this
semaphore to determine the order in which they execute BIOS AP initialization
code.
14. Performs the following operation to set up the BSP to detect the presence of APs
in the system and the number of processors:
   • Sets the value of the COUNT variable to 1.
   • Starts a timer (set for an approximate interval of 100 milliseconds). In the AP
     BIOS initialization code, the AP will increment the COUNT variable to indicate
     its presence. When the timer expires, the BSP checks the value of the COUNT
     variable. If the timer expires and the COUNT variable has not been
     incremented, no APs are present or some error has occurred.
15. Broadcast s an I NI T- SI PI - SI PI I PI sequence t o t he APs t o wake t hem up and
init ialize t hem:
MOV ESI, ICR_LOW; Load address of ICR low dword into ESI.
MOV EAX, 000C4500H; Load ICR encoding for broadcast INIT IPI
; to all APs into EAX.
MOV [ESI], EAX; Broadcast INIT IPI to all APs
; 10-millisecond delay loop.
MOV EAX, 000C46XXH; Load ICR encoding for broadcast SIPI IPI
; to all APs into EAX, where XX is the vector computed in step 10.
MOV [ESI], EAX; Broadcast SIPI IPI to all APs
; 200-microsecond delay loop
MOV [ESI], EAX; Broadcast second SIPI IPI to all APs
; 200-microsecond delay loop
16. Waits for the timer interrupt.
17. Reads and evaluates the COUNT variable and establishes a processor count.
18. If necessary, reconfigures the APIC and continues with the remaining system
diagnostics as appropriate.
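The vector conversion in step 10 and the ICR values used in step 15 can be sketched in C. This is a minimal illustration under stated assumptions, not BIOS code: the helper names (sipi_vector_from_address, sipi_address_from_vector, icr_broadcast_sipi) are hypothetical, and the sketch only computes the values. Writing them to the ICR and timing the delays remain as shown in the assembly above.

```c
#include <stdint.h>

/* ICR low-dword encodings taken from step 15: destination shorthand
 * "all excluding self", level asserted, delivery mode INIT or start-up. */
#define ICR_BROADCAST_INIT 0x000C4500u
#define ICR_BROADCAST_SIPI 0x000C4600u   /* start-up vector goes in bits 7:0 */

/* Step 10: convert the base address of a 4-KByte page in the 1-MByte
 * real-address space into an 8-bit vector. Returns 0 on success, -1 if the
 * address is not 4-KByte aligned or lies at or above 1 MByte.
 * (Hypothetical helper name.) */
int sipi_vector_from_address(uint32_t start_addr, uint8_t *vector)
{
    if (start_addr >= 0x100000u || (start_addr & 0xFFFu) != 0)
        return -1;
    *vector = (uint8_t)(start_addr >> 12);
    return 0;
}

/* Inverse mapping: a vector of 0BDH gives start-up address 000BD000H. */
uint32_t sipi_address_from_vector(uint8_t vector)
{
    return (uint32_t)vector << 12;
}

/* Value to write to the ICR low dword for the broadcast SIPI. */
uint32_t icr_broadcast_sipi(uint8_t vector)
{
    return ICR_BROADCAST_SIPI | vector;
}
```

For the example in step 10, icr_broadcast_sipi(0xBD) yields 000C46BDH, which matches the 000C46XXH pattern with XX replaced by the vector.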
8.4.4.2 Typical AP Initialization Sequence
When an AP receives the SIPI, it begins executing BIOS AP initialization code at the
vector encoded in the SIPI. The AP initialization code typically performs the following
operations:
1. Waits on the BIOS initialization Lock Semaphore. When control of the semaphore
is attained, initialization continues.
2. Loads the microcode update into the processor.
3. Initializes the MTRRs (using the same mapping that was used for the BSP).
4. Enables the cache.
5. Executes the CPUID instruction with a value of 0H in the EAX register, then reads
the EBX, ECX, and EDX registers to determine if the AP is "GenuineIntel."
6. Executes the CPUID instruction with a value of 1H in the EAX register, then saves
the values in the EAX, ECX, and EDX registers in a system configuration space in
RAM for use later.
7. Switches to protected mode and ensures that the APIC address space is mapped
to the strong uncacheable (UC) memory type.
8. Determines the AP's APIC ID from the local APIC ID register, and adds it to the MP
and ACPI tables and optionally to the system configuration space in RAM.
9. Initializes and configures the local APIC by setting bit 8 in the SVR register and
setting up the LVT3 (error LVT) for error handling (as described in steps 11 and 12
in Section 8.4.4.1, "Typical BSP Initialization Sequence").
10. Configures the AP's SMI execution environment. (Each AP and the BSP must have
a different SMBASE address.)
11. Increments the COUNT variable by 1.
12. Releases the semaphore.
13. Executes the CLI and HLT instructions.
14. Waits for an INIT IPI.
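The semaphore and COUNT handshake between the BSP and the APs (steps 1, 11, and 12 above, and steps 13, 14, and 17 of the BSP sequence) can be modeled in portable C11. This is a sketch only: real BIOS code runs in assembly on each AP, the names lock_semaphore, SEM_HELD, ap_check_in, and bsp_processor_count are hypothetical, and the value used to mark the semaphore as held is an assumption (the text specifies only that it is initialized to 00H).

```c
#include <stdatomic.h>
#include <stdint.h>

#define SEM_VACANT 0x00   /* value the BSP writes in step 13 */
#define SEM_HELD   0x01   /* assumed "taken" value; not given in the text */

_Atomic uint8_t lock_semaphore = SEM_VACANT;
_Atomic uint32_t count = 1;   /* BSP sets COUNT to 1 before waking the APs */

/* AP step 1: spin until the semaphore is obtained. */
void ap_acquire(void)
{
    uint8_t expected = SEM_VACANT;
    while (!atomic_compare_exchange_weak(&lock_semaphore, &expected, SEM_HELD))
        expected = SEM_VACANT;   /* a failed exchange overwrites expected */
}

/* AP step 12: release the semaphore so the next AP can proceed. */
void ap_release(void)
{
    atomic_store(&lock_semaphore, SEM_VACANT);
}

/* One AP passing through its initialization sequence. */
void ap_check_in(void)
{
    ap_acquire();
    /* Steps 2-10 (microcode, MTRRs, cache, CPUID, APIC setup) would run here. */
    atomic_fetch_add(&count, 1);   /* step 11 */
    ap_release();
}

/* BSP step 17: read COUNT once the timer has expired. */
uint32_t bsp_processor_count(void)
{
    return atomic_load(&count);
}
```

Because each AP holds the semaphore for the whole sequence, the APs execute the initialization code one at a time, and COUNT ends up as the number of processors that responded (including the BSP).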
8.4.5 Identifying Logical Processors in an MP System
After the BIOS has completed the MP initialization protocol, each logical processor
can be uniquely identified by its local APIC ID. Software can access these APIC IDs in
any of the following ways:
- Read APIC ID for a local APIC: Code running on a logical processor can read the
APIC ID in one of two ways, depending on whether the local APIC unit is operating in
x2APIC mode (see Intel 64 Architecture x2APIC Specification) or in xAPIC
mode:
  - If the local APIC unit supports x2APIC and is operating in x2APIC mode, the
  32-bit APIC ID can be read by executing a RDMSR instruction to read the
  processor's x2APIC ID register. This method is equivalent to executing CPUID
  leaf 0BH, described below.
  - If the local APIC unit is operating in xAPIC mode, the 8-bit APIC ID can be read by
  executing a MOV instruction to read the processor's local APIC ID register
  (see Section 10.4.6, "Local APIC ID"). This is the ID to use for directing
  physical destination mode interrupts to the processor.
- Read ACPI or MP table: As part of the MP initialization protocol, the BIOS
creates an ACPI table and an MP table. These tables are defined in the
Multiprocessor Specification Version 1.4 and provide software with a list of the processors
in the system and their local APIC IDs. The format of the ACPI table is derived
from the ACPI specification, which is an industry standard power management
and platform configuration specification for MP systems.
- Read Initial APIC ID (if the processor does not support CPUID leaf 0BH): An
APIC ID is assigned to a logical processor during power up. This is the initial APIC
ID reported by CPUID.1:EBX[31:24] and may be different from the current value
read from the local APIC. The initial APIC ID can be used to determine the
topological relationship between logical processors for multi-processor systems
that do not support CPUID leaf 0BH.
Bits in the 8-bit initial APIC ID can be interpreted using several bit masks. Each
bit mask can be used to extract an identifier to represent a hierarchical level of
the multi-threading resource topology in an MP system (see Section 8.9.1,
"Hierarchical Mapping of Shared Resources"). The initial APIC ID may consist of
up to four bit fields. In a non-clustered MP system, the field consists of up to
three bit fields.
- Read 32-bit APIC ID from CPUID leaf 0BH (if the processor supports CPUID
leaf 0BH): A unique APIC ID is assigned to a logical processor during power up.
This APIC ID is reported by CPUID.0BH:EDX[31:0] as a 32-bit value. Use the
32-bit APIC ID and CPUID leaf 0BH to determine the topological relationship between
logical processors if the processor supports CPUID leaf 0BH.
Bits in the 32-bit x2APIC ID can be extracted into sub-fields using CPUID leaf 0BH
parameters (see Section 8.9.1, "Hierarchical Mapping of Shared Resources").
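The field positions named in the list above can be expressed as small C helpers that operate on register values already read by the caller. This is an illustrative sketch: the function names are hypothetical, and obtaining the raw values (via MOV from the local APIC ID register, or via the CPUID instruction) is platform-specific and not shown.

```c
#include <stdint.h>

/* xAPIC mode: the 8-bit APIC ID occupies bits 31:24 of the local APIC ID
 * register (the same field the AND 0FF000000H in Section 8.4.4.1 isolates). */
uint8_t xapic_id_from_register(uint32_t apic_id_reg)
{
    return (uint8_t)(apic_id_reg >> 24);
}

/* Initial APIC ID: CPUID.1:EBX[31:24]. */
uint8_t initial_apic_id_from_cpuid1_ebx(uint32_t ebx)
{
    return (uint8_t)(ebx >> 24);
}

/* Choose the ID to use for topology enumeration: the 32-bit value from
 * CPUID.0BH:EDX when leaf 0BH is supported, otherwise the 8-bit initial
 * APIC ID from CPUID.1:EBX[31:24]. */
uint32_t topology_apic_id(int has_leaf_0bh, uint32_t leaf0b_edx,
                          uint32_t cpuid1_ebx)
{
    if (has_leaf_0bh)
        return leaf0b_edx;
    return initial_apic_id_from_cpuid1_ebx(cpuid1_ebx);
}
```

Keeping the extraction separate from the register access makes the bit positions easy to check without privileged code.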
Figure 8-2 shows two examples of APIC ID bit fields in earlier single-core processors.
In single-core Intel Xeon processors, the APIC ID assigned to a logical processor
during power-up and initialization is 8 bits. Bits 2:1 form a 2-bit physical package
identifier (which can also be thought of as a socket identifier). In systems that
configure physical processors in clusters, bits 4:3 form a 2-bit cluster ID. Bit 0 is used
in the Intel Xeon processor MP to identify the two logical processors within the
package (see Section 8.9.3, "Hierarchical ID of Logical Processors in an MP System").
For Intel Xeon processors that do not support Intel Hyper-Threading Technology, bit
0 is always set to 0; for Intel Xeon processors supporting Intel Hyper-Threading
Technology, bit 0 performs the same function as it does for Intel Xeon processor MP.
For more recent multi-core processors, see Section 8.9.1, "Hierarchical Mapping of
Shared Resources," for a complete description of the topological relationships
between logical processors and bit field locations within an initial APIC ID across Intel
64 and IA-32 processor families.
Note that the number of bit fields and the width of bit fields are dependent on processor
and platform hardware capabilities. Software should determine these at runtime.
When initial APIC IDs are assigned to logical processors, the value of the APIC ID
assigned to a logical processor will respect the bit-field boundaries corresponding to
core, physical package, etc. Additional examples of the bit fields in the initial APIC ID
of multi-threading capable systems are shown in Section 8.9.
For P6 family processors, the APIC ID that is assigned to a processor during
power-up and initialization is 4 bits (see Figure 8-2). Here, bits 0 and 1 form a 2-bit
processor (or socket) identifier and bits 2 and 3 form a 2-bit cluster ID.
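The two early layouts described above can be decoded with fixed masks, as in the following C sketch. The struct and function names are hypothetical. Fixed masks are appropriate only for these early processors; as noted above, software for later processors should determine field widths at runtime.

```c
#include <stdint.h>

/* Fields of an APIC ID in early MP systems (hypothetical struct). */
struct early_apic_id {
    unsigned cluster;   /* 2-bit cluster ID */
    unsigned package;   /* 2-bit processor, socket, or package ID */
    unsigned logical;   /* logical processor within the package (Xeon only) */
};

/* Single-core Intel Xeon layout: bit 0 = logical processor,
 * bits 2:1 = physical package, bits 4:3 = cluster. */
struct early_apic_id decode_xeon_apic_id(uint8_t id)
{
    struct early_apic_id f;
    f.logical = id & 0x1u;
    f.package = (id >> 1) & 0x3u;
    f.cluster = (id >> 3) & 0x3u;
    return f;
}

/* P6 family layout (4-bit ID): bits 1:0 = processor, bits 3:2 = cluster. */
struct early_apic_id decode_p6_apic_id(uint8_t id)
{
    struct early_apic_id f;
    f.logical = 0;                 /* no logical-processor field in this layout */
    f.package = id & 0x3u;
    f.cluster = (id >> 2) & 0x3u;
    return f;
}
```

For example, a Xeon APIC ID of 0DH (01101B) decodes to cluster 1, package 2, logical processor 1.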
[Figure 8-2. Interpretation of APIC ID in Early MP Systems. APIC ID format for Intel Xeon processors that do not support Intel Hyper-Threading Technology: bits 7:5 reserved, bits 4:3 cluster, bits 2:1 processor ID, bit 0 = 0. APIC ID format for P6 family processors: bits 7:4 reserved, bits 3:2 cluster, bits 1:0 processor ID.]
8.5 INTEL HYPER-THREADING TECHNOLOGY AND
INTEL MULTI-CORE TECHNOLOGY
Intel Hyper-Threading Technology and Intel multi-core technology are extensions to
Intel 64 and IA-32 architectures that enable a single physical processor to execute
two or more separate code streams (called threads) concurrently. In Intel
Hyper-Threading Technology, a single processor core provides two logical processors that
share execution resources (see Section 8.7, "Intel Hyper-Threading Technology
Architecture"). In Intel multi-core technology, a physical processor package provides
two or more processor cores. Both configurations require chipsets and a BIOS that
support the technologies.
Software should not rely on processor names to determine whether a processor
supports Intel Hyper-Threading Technology or Intel multi-core technology. Use the
CPUID instruction to determine processor capability (see Section 8.6.2, "Initializing
Multi-Core Processors").
8.6 DETECTING HARDWARE MULTI-THREADING
SUPPORT AND TOPOLOGY
Use the CPUID instruction to detect the presence of hardware multi-threading
support in a physical processor. Hardware multi-threading can support several
varieties of multi-core and/or Intel Hyper-Threading Technology. The CPUID instruction
provides several sets of parameter information to aid software in enumerating topology
information. The relevant topology enumeration parameters provided by CPUID
include:
- Hardware Multi-Threading feature flag (CPUID.1:EDX[28] = 1):
Indicates when set that the physical package is capable of supporting Intel
Hyper-Threading Technology and/or multiple cores.
- Processor topology enumeration parameters for 8-bit APIC ID:
  - Addressable IDs for Logical processors in the same Package
  (CPUID.1:EBX[23:16]): Indicates the maximum number of addressable
  IDs for logical processors in a physical package. Within a physical package,
  there may be addressable IDs that are not occupied by any logical
  processors. This parameter does not represent the hardware capability of
  the physical processor. [4]
  - Addressable IDs for processor cores in the same Package [5]
  (CPUID.(EAX=4, ECX=0):EAX[31:26] + 1 = Y) [6]: Indicates the maximum
  number of addressable IDs attributable to processor cores (Y) in the physical
  package.
- Extended Processor Topology Enumeration parameters for 32-bit APIC
ID: Intel 64 processors supporting CPUID leaf 0BH will assign unique APIC IDs to
each logical processor in the system. CPUID leaf 0BH reports the 32-bit APIC ID
and provides topology enumeration parameters. See the CPUID instruction reference
pages in Intel 64 and IA-32 Architectures Software Developer's Manual,
Volume 2A.
Footnotes:
4. Operating system and BIOS may implement features that reduce the number of logical
processors available in a platform to applications at runtime to less than the number of physical
packages times the number of hardware-capable logical processors per package.
5. Software must check CPUID for its support of leaf 4 when implementing support for multi-core. If
CPUID leaf 4 is not available at runtime, software should handle the situation as if there is only
one core per package.
6. Maximum number of cores in the physical package must be queried by executing CPUID with
EAX=4 and a valid ECX input value. Valid ECX input values start from 0.
The CPUID feature flag may indicate support for hardware multi-threading when only
one logical processor is available in the package. In this case, the decimal value
represented by bits 16 through 23 in the EBX register will have a value of 1.
Software should note that the number of logical processors enabled by system
software may be less than the value of "Addressable IDs for Logical processors."
Similarly, the number of cores enabled by system software may be less than the value of
"Addressable IDs for processor cores."
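The parameters for the 8-bit APIC ID reduce to three bit-field extractions on CPUID output. The C sketch below assumes the caller has already executed CPUID and passes in the raw register values; the function names are hypothetical.

```c
#include <stdint.h>

/* CPUID.1:EDX[28]: hardware multi-threading feature flag. */
int has_hw_multithreading(uint32_t cpuid1_edx)
{
    return (int)((cpuid1_edx >> 28) & 0x1u);
}

/* CPUID.1:EBX[23:16]: maximum number of addressable IDs for logical
 * processors in the physical package. */
unsigned addressable_logical_ids(uint32_t cpuid1_ebx)
{
    return (cpuid1_ebx >> 16) & 0xFFu;
}

/* CPUID.(EAX=4, ECX=0):EAX[31:26] + 1: maximum number of addressable IDs
 * for processor cores in the physical package. If leaf 4 is unavailable,
 * footnote 5 says to treat the package as having one core; the caller is
 * expected to handle that case before calling this helper. */
unsigned addressable_core_ids(uint32_t cpuid4_eax)
{
    return ((cpuid4_eax >> 26) & 0x3Fu) + 1u;
}
```

These values are upper bounds on ID space, not counts of enabled processors, so they should not be used directly as the number of logical processors or cores present.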
Software can detect the availability of the CPUID extended topology enumeration leaf
(0BH) by performing two steps:
- Check the maximum input value for basic CPUID information by executing CPUID
with EAX = 0. If CPUID.0H:EAX is greater than or equal to 11 (0BH), then proceed
to the next step.
- Check that CPUID.(EAX=0BH, ECX=0H):EBX is non-zero.
If both of the above conditions are true, the extended topology enumeration leaf is
available. Note that the presence of CPUID leaf 0BH in a processor does not guarantee
that the local APIC supports x2APIC. If CPUID.(EAX=0BH, ECX=0H):EBX returns
zero and the maximum input value for basic CPUID information is greater than 0BH, then
CPUID leaf 0BH is not supported on that processor.
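The two-step check can be written as a single predicate over values the caller has read with CPUID. This is a minimal sketch; the function name leaf_0bh_available is hypothetical, and executing CPUID itself (for example through a compiler intrinsic) is left to the caller.

```c
#include <stdint.h>

/* max_basic_leaf: CPUID.0H:EAX
 * leaf0b_ebx:     CPUID.(EAX=0BH, ECX=0H):EBX, meaningful only when
 *                 max_basic_leaf >= 0BH
 * Returns 1 if the extended topology enumeration leaf is available. */
int leaf_0bh_available(uint32_t max_basic_leaf, uint32_t leaf0b_ebx)
{
    if (max_basic_leaf < 0x0Bu)
        return 0;               /* step 1 failed: leaf 0BH is out of range */
    return leaf0b_ebx != 0;     /* step 2: EBX must be non-zero */
}
```

A processor that reports a maximum basic leaf above 0BH but returns zero in EBX fails the second step, which matches the last sentence of the paragraph above.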
8.6.1 Initializing Processors
Supporting Hyper-Threading Technology
The initialization process for an MP system that contains processors supporting Intel
Hyper-Threading Technology is the same as for conventional MP systems (see
Section 8.4, "Multiple-Processor (MP) Initialization"). One logical processor in the
system is selected as the BSP and other processors (or logical processors) are
designated as APs. The initialization process is identical to that described in Section 8.4.3,
"MP Initialization Protocol Algorithm for Intel Xeon Processors," and Section 8.4.4,
"MP Initialization Example."
During initialization, each logical processor is assigned an APIC ID that is stored in
the local APIC ID register for each logical processor. If two or more processors
supporting Intel Hyper-Threading Technology are present, each logical processor on
the system bus is assigned a unique ID (see Section 8.9.3, "Hierarchical ID of Logical
Processors in an MP System"). Once logical processors have APIC IDs, software
communicates with them by sending APIC IPI messages.
8.6.2 Initializing Multi-Core Processors
The initialization process for an MP system that contains multi-core Intel 64 or IA-32
processors is the same as for conventional MP systems (see Section 8.4,
"Multiple-Processor (MP) Initialization"). A logical processor in one core is selected as the BSP;
other logical processors are designated as APs.
During initialization, each logical processor is assigned an APIC ID. Once logical
processors have APIC IDs, software may communicate with them by sending APIC
IPI messages.
8.6.3 Executing Multiple Threads on an Intel 64 or IA-32
Processor Supporting Hardware Multi-Threading
Upon completing the operating system boot-up procedure, the bootstrap processor
(BSP) executes operating system code. Other logical processors are placed in the
halt state. To execute a code stream (thread) on a halted logical processor, the
operating system issues an interprocessor interrupt (IPI) addressed to the halted logical
processor. In response to the IPI, the processor wakes up and begins executing the
thread identified by the interrupt vector received as part of the IPI.
To manage execution of multiple threads on logical processors, an operating system
can use conventional symmetric multiprocessing (SMP) techniques. For example, the
operating system can use a time-slice or load balancing mechanism to periodically
interrupt each of the active logical processors. Upon interrupting a logical processor,
the operating system checks its run queue for a thread waiting to be executed and
dispatches the thread to the interrupted logical processor.
8.6.4 Handling Interrupts on an IA-32 Processor Supporting
Hardware Multi-Threading
Interrupts are handled on processors supporting Intel Hyper-Threading Technology
as they are on conventional MP systems. External interrupts are received by the I/O
APIC, which distributes them as interrupt messages to specific logical processors
(see Figure 8-3).
Logical processors can also send IPIs to other logical processors by writing to the ICR
register of their local APIC (see Section 10.6, "Issuing Interprocessor Interrupts"). This
also applies to dual-core processors.
8.7 INTEL HYPER-THREADING TECHNOLOGY
ARCHITECTURE
Figure 8-4 shows a generalized view of an Intel processor supporting Intel
Hyper-Threading Technology, using the original Intel Xeon processor MP as an example.
This implementation of the Intel Hyper-Threading Technology consists of two logical
processors (each represented by a separate architectural state) which share the
processor's execution engine and the bus interface. Each logical processor also has
its own advanced programmable interrupt controller (APIC).
[Figure 8-3. Local APICs and I/O APIC in MP System Supporting Intel HT Technology. The I/O APIC in the system chip set receives external interrupts and sends interrupt messages through the bridge to two Intel processors with Intel Hyper-Threading Technology. In each processor, logical processors 0 and 1 have their own local APIC, share the processor core and bus interface, and exchange IPIs.]
8.7.1 State of the Logical Processors
The following features are part of the architectural state of logical processors within
Intel 64 or IA-32 processors supporting Intel Hyper-Threading Technology. The
features can be subdivided into three groups:
- Duplicated for each logical processor
- Shared by logical processors in a physical processor
- Shared or duplicated, depending on the implementation
The following features are duplicated for each logical processor:
- General purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, and EBP)
- Segment registers (CS, DS, SS, ES, FS, and GS)
- EFLAGS and EIP registers. Note that the CS and EIP/RIP registers for each logical
processor point to the instruction stream for the thread being executed by the
logical processor.
- x87 FPU registers (ST0 through ST7, status word, control word, tag word, data
operand pointer, and instruction pointer)
- MMX registers (MM0 through MM7)
- XMM registers (XMM0 through XMM7) and the MXCSR register
- Control registers and system table pointer registers (GDTR, LDTR, IDTR, task
register)
- Debug registers (DR0, DR1, DR2, DR3, DR6, DR7) and the debug control MSRs
- Machine check global status (IA32_MCG_STATUS) and machine check capability
(IA32_MCG_CAP) MSRs
- Thermal clock modulation and ACPI Power management control MSRs
- Time stamp counter MSRs
- Most of the other MSR registers, including the page attribute table (PAT). See the
exceptions below.
- Local APIC registers.
- Additional general purpose registers (R8-R15), XMM registers (XMM8-XMM15),
control register, IA32_EFER on Intel 64 processors.
The following features are shared by logical processors:
- Memory type range registers (MTRRs)
Whether the following features are shared or duplicated is implementation-specific:
- IA32_MISC_ENABLE MSR (MSR address 1A0H)
- Machine check architecture (MCA) MSRs (except for the IA32_MCG_STATUS and
IA32_MCG_CAP MSRs)
- Performance monitoring control and counter MSRs
[Figure 8-4. IA-32 Processor with Two Logical Processors Supporting Intel HT Technology. Logical processors 0 and 1 each have their own architectural state and local APIC; both share the execution engine and the bus interface to the system bus.]
8.7.2 APIC Functionality
When a processor supporting Intel Hyper-Threading Technology is initialized,
each logical processor is assigned a local APIC ID (see Table 10-1). The local APIC ID
serves as an ID for the logical processor and is stored in the logical processor's APIC
ID register. If two or more processors supporting Intel Hyper-Threading Technology
are present in a dual processor (DP) or MP system, each logical processor on the
system bus is assigned a unique local APIC ID (see Section 8.9.3, "Hierarchical ID of
Logical Processors in an MP System").
Software communicates with local processors using the APIC's interprocessor
interrupt (IPI) messaging facility. Setup and programming for APICs is identical in
processors that support and do not support Intel Hyper-Threading Technology. See Chapter
10, "Advanced Programmable Interrupt Controller (APIC)," for a detailed discussion.
8.7.3 Memory Type Range Registers (MTRR)
MTRRs in a processor supporting Intel Hyper-Threading Technology are shared by
logical processors. When one logical processor updates the setting of the MTRRs,
settings are automatically shared with the other logical processors in the same
physical package.
The architectures require that all MP systems based on Intel 64 and IA-32 processors
(this includes logical processors) must use an identical MTRR memory map. This
gives software a consistent view of memory, independent of the processor on which
it is running. See Section 11.11, "Memory Type Range Registers (MTRRs)," for
information on setting up MTRRs.
8.7.4 Page Attribute Table (PAT)
Each logical processor has its own PAT MSR (IA32_PAT). However, as described in
Section 11.12, "Page Attribute Table (PAT)," the PAT MSR settings must be the same
for all processors in a system, including the logical processors.
8.7.5 Machine Check Architecture
In the Intel HT Technology context as implemented by processors based on Intel
NetBurst microarchitecture, all of the machine check architecture (MCA) MSRs
(except for the IA32_MCG_STATUS and IA32_MCG_CAP MSRs) are duplicated for
each logical processor. This permits logical processors to initialize, configure, query,
and handle machine-check exceptions simultaneously within the same physical
processor. The design is compatible with machine check exception handlers that
follow the guidelines given in Chapter 15, "Machine-Check Architecture."
The IA32_MCG_STATUS MSR is duplicated for each logical processor so that its
machine check in progress bit field (MCIP) can be used to detect recursion on the
part of MCA handlers. In addition, the MSR allows each logical processor to
determine that a machine-check exception is in progress independent of the actions of
another logical processor in the same physical package.
Because the logical processors within a physical package are tightly coupled with
respect to shared hardware resources, both logical processors are notified of
machine check errors that occur within a given physical processor. If machine-check
exceptions are enabled when a fatal error is reported, all the logical processors within
a physical package are dispatched to the machine-check exception handler. If
machine-check exceptions are disabled, the logical processors enter the shutdown
state and assert the IERR# signal.
When enabling machine-check exceptions, the MCE flag in control register CR4
should be set for each logical processor.
On Intel Atom family processors that support Intel Hyper-Threading Technology, the
MCA facilities are shared between all logical processors on the same processor core.
8.7.6 Debug Registers and Extensions
Each logical processor has its own set of debug registers (DR0, DR1, DR2, DR3, DR6,
DR7) and its own debug control MSR. These can be set to control and record debug
information for each logical processor independently. Each logical processor also has
its own last branch records (LBR) stack.
8.7.7 Performance Monitoring Counters
Performance counters and their companion control MSRs are shared between the
logical processors within a processor core for processors based on Intel NetBurst
microarchitecture. As a result, software must manage the use of these resources.
The performance counter interrupts, events, and precise event monitoring support
can be set up and allocated on a per thread (per logical processor) basis.
See Section 30.9, "Performance Monitoring and Intel Hyper-Threading Technology in
Processors Based on Intel NetBurst Microarchitecture," for a discussion of
performance monitoring in the Intel Xeon processor MP.
In Intel Atom family processors that support Intel Hyper-Threading Technology, the
performance counters (general-purpose and fixed-function counters) and their
companion control MSRs are duplicated for each logical processor.
8.7.8 IA32_MISC_ENABLE MSR
The IA32_MISC_ENABLE MSR (MSR address 1A0H) is generally shared between the
logical processors in a processor core supporting Intel Hyper-Threading Technology.
However, some bit fields within the IA32_MISC_ENABLE MSR may be duplicated per
logical processor. The partition of shared or duplicated bit fields within
IA32_MISC_ENABLE is implementation dependent. Software should program
duplicated fields carefully on all logical processors in the system to ensure consistent
behavior.
8.7.9 Memory Ordering
The logical processors in an Intel 64 or IA-32 processor supporting Intel
Hyper-Threading Technology obey the same rules for memory ordering as Intel 64 or IA-32
processors without Intel HT Technology (see Section 8.2, "Memory Ordering"). Each
logical processor uses a processor-ordered memory model that can be further
defined as "write-ordered with store buffer forwarding." All mechanisms for
strengthening or weakening the memory-ordering model to handle special programming
situations apply to each logical processor.
8.7.10 Serializing Instructions
As a general rule, when a logical processor in a processor supporting Intel
Hyper-Threading Technology executes a serializing instruction, only that logical processor is
affected by the operation. An exception to this rule is the execution of the WBINVD,
INVD, and WRMSR instructions; and the MOV CR instruction when the state of the CD
flag in control register CR0 is modified. Here, both logical processors are serialized.
8.7.11 Microcode Update Resources
In an Intel processor supporting Intel Hyper-Threading Technology, the microcode
update facilities are shared between the logical processors; either logical processor
can initiate an update. Each logical processor has its own BIOS signature MSR
(IA32_BIOS_SIGN_ID at MSR address 8BH). When a logical processor performs an
update for the physical processor, the IA32_BIOS_SIGN_ID MSRs for resident logical
processors are updated with identical information. If logical processors initiate an
update simultaneously, the processor core provides the synchronization
needed to ensure that only one update is performed at a time.
Operating system microcode update drivers that adhere to Intel's guidelines do not
need to be modified to run on processors supporting Intel Hyper-Threading
Technology.
8.7.12 Self Modifying Code
Intel processors supporting Intel Hyper-Threading Technology support self-modifying
code, where data writes modify instructions cached or currently in flight. They also
support cross-modifying code, where on an MP system writes generated by one
processor modify instructions cached or currently in flight on another. See Section
8.1.3, "Handling Self- and Cross-Modifying Code," for a description of the
requirements for self- and cross-modifying code in an IA-32 processor.
8.7.13 Implementation-Specific Intel HT Technology Facilities
The following non-architectural facilities are implementation-specific in IA-32
processors supporting Intel Hyper-Threading Technology:
- Caches
- Translation lookaside buffers (TLBs)
- Thermal monitoring facilities
The Intel Xeon processor MP implementation is described in the following sections.
8.7.13.1 Processor Caches
For processors supporting Intel Hyper-Threading Technology, the caches are shared.
Any cache manipulation instruction that is executed on one logical processor has a
global effect on the cache hierarchy of the physical processor. Note the following:
- WBINVD instruction: The entire cache hierarchy is invalidated after modified
data is written back to memory. All logical processors are stopped from executing
until after the write-back and invalidate operation is completed. A special bus
cycle is sent to all caching agents. The amount of time or cycles for WBINVD to
complete will vary due to the size of different cache hierarchies and other factors.
As a consequence, the use of the WBINVD instruction can have an impact on
interrupt/event response time.
- INVD instruction: The entire cache hierarchy is invalidated without writing
back modified data to memory. All logical processors are stopped from executing
until after the invalidate operation is completed. A special bus cycle is sent to all
caching agents.
- CLFLUSH instruction: The specified cache line is invalidated from the cache
hierarchy after any modified data is written back to memory and a bus cycle is
sent to all caching agents, regardless of which logical processor caused the cache
line to be filled.
- CD flag in control register CR0: Each logical processor has its own CR0
control register, and thus its own CD flag in CR0. The CD flags for the two logical
processors are ORed together, such that when any logical processor sets its CD
flag, the entire cache is nominally disabled.
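The ORing of the per-logical-processor CD flags can be illustrated with a short C model. It is a sketch of the rule as stated, not of the hardware: the function name is hypothetical, the CR0 values are supplied by the caller, and the only architectural fact used is that CD is bit 30 of CR0.

```c
#include <stddef.h>
#include <stdint.h>

#define CR0_CD (1u << 30)   /* cache disable flag, bit 30 of CR0 */

/* Given the CR0 value of each logical processor in a physical processor,
 * return 1 if the shared cache is nominally disabled, 0 otherwise. */
int cache_nominally_disabled(const uint32_t *cr0_values, size_t n)
{
    uint32_t combined = 0;
    for (size_t i = 0; i < n; i++)
        combined |= cr0_values[i] & CR0_CD;   /* OR the per-thread CD flags */
    return combined != 0;
}
```

The practical consequence is that software must clear CD on every logical processor before the cache can be considered enabled.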
8.7.13.2 Processor Translation Lookaside Buffers (TLBs)
In processors supporting Intel Hyper-Threading Technology, data cache TLBs are
shared. The instruction cache TLB may be duplicated or shared in each logical
processor, depending on implementation specifics of different processor families.
Entries in the TLBs are tagged with an ID that indicates the logical processor that
initiated the translation. This tag applies even for translations that are marked global
using the page-global feature for memory paging. See Section 4.10, "Caching
Translation Information," for information about global translations.
When a logical processor performs a TLB invalidation operation, only the TLB entries
that are tagged for that logical processor are guaranteed to be flushed. This protocol
applies to all TLB invalidation operations, including writes to control registers CR3
and CR4 and uses of the INVLPG instruction.
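The tagging rule can be pictured with a toy software model of a shared TLB. The entry layout and all names here are hypothetical, and the model flushes exactly the tagged entries; real hardware guarantees only that those entries are flushed and may flush more.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of a shared TLB entry (hypothetical layout). */
struct tlb_entry {
    uint32_t vpn;     /* virtual page number */
    uint8_t  lp_id;   /* logical processor that initiated the translation */
    uint8_t  valid;
};

/* Model of a TLB invalidation performed by logical processor lp_id:
 * only entries tagged with lp_id are flushed. Returns the number flushed. */
size_t tlb_invalidate_for(struct tlb_entry *tlb, size_t n, uint8_t lp_id)
{
    size_t flushed = 0;
    for (size_t i = 0; i < n; i++) {
        if (tlb[i].valid && tlb[i].lp_id == lp_id) {
            tlb[i].valid = 0;
            flushed++;
        }
    }
    return flushed;
}
```

In this model, a translation for the same page cached on behalf of the other logical processor survives the invalidation, so each logical processor must perform its own invalidation when a shared mapping changes.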
8.7.13.3 Thermal Monitor
In a processor that supports Intel Hyper-Threading Technology, logical processors
share the catastrophic shutdown detector and the automatic thermal monitoring
mechanism (see Section 14.5, "Thermal Monitoring and Protection"). Sharing results
in the following behavior:
- If the processor's core temperature rises above the preset catastrophic shutdown
temperature, the processor core halts execution, which causes both logical
processors to stop execution.
- When the processor's core temperature rises above the preset automatic thermal
monitor trip temperature, the clock speed of the processor core is automatically
modulated, which affects the execution speed of both logical processors.
For software controlled clock modulation, each logical processor has its own
IA32_CLOCK_MODULATION MSR, allowing clock modulation to be enabled or
disabled on a logical processor basis. Typically, if software controlled clock
modulation is going to be used, the feature must be enabled for all the logical processors
within a physical processor and the modulation duty cycle must be set to the same
value for each logical processor. If the duty cycle values differ between the logical
processors, the processor clock will be modulated at the highest duty cycle selected.
8.7.13.4 External Signal Compatibility
This section describes the constraints on external signals received through the pins
of a processor supporting Intel Hyper-Threading Technology and how these signals
are shared between its logical processors.
STPCLK#: A single STPCLK# pin is provided on the physical package of the
Intel Xeon processor MP. External control logic uses this pin for power
management within the system. When the STPCLK# signal is asserted, the
processor core transitions to the stop-grant state, where instruction execution is
halted but the processor core continues to respond to snoop transactions.
Regardless of whether the logical processors are active or halted when the
STPCLK# signal is asserted, execution is stopped on both logical processors and
neither will respond to interrupts.
In MP systems, the STPCLK# pins on all physical processors are generally tied
together. As a result, this signal affects all the logical processors within the system
simultaneously.
LINT0 and LINT1 pins: A processor supporting Intel Hyper-Threading
Technology has only one set of LINT0 and LINT1 pins, which are shared between
the logical processors. When one of these pins is asserted, both logical
processors respond unless the pin has been masked in the APIC local vector
tables for one or both of the logical processors.
Typically in MP systems, the LINT0 and LINT1 pins are not used to deliver
interrupts to the logical processors. Instead, all interrupts are delivered to the
local processors through the I/O APIC.
A20M# pin: On an IA-32 processor, the A20M# pin is typically provided for
compatibility with the Intel 286 processor. Asserting this pin causes bit 20 of the
physical address to be masked (forced to zero) for all external bus memory
accesses. Processors supporting Intel Hyper-Threading Technology provide one
A20M# pin, which affects the operation of both logical processors within the
physical processor.
The functionality of A20M# is used primarily by older operating systems and not
used by modern operating systems. On newer Intel 64 processors, A20M# may
be absent.
8.8 MULTI-CORE ARCHITECTURE
This section describes the architecture of Intel 64 and IA-32 processors supporting
dual-core and quad-core technology. The discussion is applicable to the Intel Pentium
processor Extreme Edition, Pentium D, Intel Core Duo, Intel Core 2 Duo, Dual-core
Intel Xeon processor, Intel Core 2 Quad processors, and quad-core Intel Xeon
processors. Features vary across different microarchitectures and are detectable
using CPUID.
In general, each processor core has dedicated microarchitectural resources identical
to a single-processor implementation of the underlying microarchitecture without
hardware multi-threading capability. Each logical processor in a dual-core processor
(whether supporting Intel Hyper-Threading Technology or not) has its own APIC
functionality, PAT, machine check architecture, debug registers and extensions. Each
logical processor handles serialization instructions or self-modifying code on its own.
Memory order is handled the same way as in Intel Hyper-Threading Technology.
The topology of the cache hierarchy (with respect to whether a given cache level is
shared by one or more processor cores or by all logical processors in the physical
package) depends on the processor implementation. Software must use the
deterministic cache parameter leaf of the CPUID instruction to discover the
cache-sharing topology between the logical processors in a multi-threading environment.
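As a minimal sketch of how the deterministic cache parameter leaf (CPUID leaf 4) might be decoded, the following C fragment unpacks the EAX value returned for one sub-leaf. The struct and function names are hypothetical and introduced here only for illustration; the caller is assumed to have already executed CPUID with EAX = 4 and a valid ECX index and to pass in the returned EAX.

```c
#include <stdint.h>

/* Decoded view of the EAX output of CPUID.(EAX=04H, ECX=n).
   The type and field names are illustrative, not part of any Intel API. */
typedef struct {
    unsigned type;            /* EAX[4:0]: 0 = no more caches, 1 = data, 2 = instruction, 3 = unified */
    unsigned level;           /* EAX[7:5]: cache level, starting at 1 */
    unsigned max_sharing_ids; /* EAX[25:14] + 1: addressable IDs for logical processors sharing this cache */
    unsigned max_core_ids;    /* EAX[31:26] + 1: addressable IDs for processor cores in the package */
} cache_leaf_t;

static cache_leaf_t decode_cache_leaf(uint32_t eax)
{
    cache_leaf_t c;
    c.type            = eax & 0x1Fu;
    c.level           = (eax >> 5) & 0x7u;
    c.max_sharing_ids = ((eax >> 14) & 0xFFFu) + 1u;
    c.max_core_ids    = ((eax >> 26) & 0x3Fu) + 1u;
    return c;
}
```

Software would typically call such a decoder for ECX = 0, 1, 2, ... until the type field reads 0, and compare max_sharing_ids across levels to see which caches are shared by more than one logical processor.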
8.8.1 Logical Processor Support
The topological composition of processor cores and logical processors in a multi-core
processor can be discovered using CPUID. Within each processor core, one or more
logical processors may be available.
System software must follow the required MP initialization sequences (see
Section 8.4, "Multiple-Processor (MP) Initialization") to recognize and enable logical
processors. At runtime, software can enumerate those logical processors enabled by
system software to identify the topological relationships between these logical
processors. (See Section 8.9.5, "Identifying Topological Relationships in a MP
System.")
8.8.2 Memory Type Range Registers (MTRR)
MTRR is shared between two logical processors sharing a processor core if the
physical processor supports Intel Hyper-Threading Technology. MTRR is not shared
between logical processors located in different cores or different physical packages.
The Intel 64 and IA-32 architectures require that all logical processors in an MP
system use an identical MTRR memory map. This gives software a consistent view of
memory, independent of the processor on which it is running.
See Section 11.11, "Memory Type Range Registers (MTRRs)."
8.8.3 Performance Monitoring Counters
Performance counters and their companion control MSRs are shared between two
logical processors sharing a processor core if the processor core supports Intel
Hyper-Threading Technology and is based on Intel NetBurst microarchitecture. They
are not shared between logical processors in different cores or different physical
packages. As a result, software must manage the use of these resources, based on
the topology of performance monitoring resources. Performance counter interrupts,
events, and precise event monitoring support can be set up and allocated on a
per-thread (per-logical-processor) basis.
See Section 30.9, "Performance Monitoring and Intel Hyper-Threading Technology in
Processors Based on Intel NetBurst Microarchitecture."
8.8.4 IA32_MISC_ENABLE MSR
Some bit fields in the IA32_MISC_ENABLE MSR (MSR address 1A0H) may be shared
between two logical processors sharing a processor core, or may be shared between
different cores in a physical processor. See Appendix B, "Model-Specific Registers
(MSRs)."
8.8.5 MICROCODE UPDATE Resources
Microcode update facilities are shared between two logical processors sharing a
processor core if the physical package supports Intel Hyper-Threading Technology.
They are not shared between logical processors in different cores or different
physical packages. Either logical processor that has access to the microcode update
facility can initiate an update.
Each logical processor has its own BIOS signature MSR (IA32_BIOS_SIGN_ID at MSR
address 8BH). When a logical processor performs an update for the physical
processor, the IA32_BIOS_SIGN_ID MSRs for resident logical processors are
updated with identical information. If logical processors initiate an update
simultaneously, the processor core provides the synchronization needed to ensure that only
one update is performed at a time.
8.9 PROGRAMMING CONSIDERATIONS FOR HARDWARE
MULTI-THREADING CAPABLE PROCESSORS
In a multi-threading environment, there may be certain hardware resources that are
physically shared at some level of the hardware topology. In multi-processor
systems, the bus and memory sub-systems are typically shared physically between
multiple sockets. Within a hardware multi-threading capable processor, certain
resources are provided for each processor core, while other resources may be
provided for each logical processor (see Section 8.7, "Intel Hyper-Threading
Technology Architecture," and Section 8.8, "Multi-Core Architecture").
From a software programming perspective, control transfer of processor operation is
managed at the granularity of a logical processor (operating systems dispatch a
runnable task by allocating an available logical processor on the platform). To
manage the topology of shared resources in a multi-threading environment, it may
be useful for software to understand and manage resources that are shared by more
than one logical processor.
8.9.1 Hierarchical Mapping of Shared Resources
The APIC_ID value associated with each logical processor in a multi-processor
system is unique (see Section 8.6, "Detecting Hardware Multi-Threading Support and
Topology"). This 8-bit or 32-bit value can be decomposed into sub-fields, where each
sub-field corresponds to a hierarchical level of the topological mapping of hardware
resources.
The decomposition of an APIC_ID may consist of several sub-fields representing the
topology within a physical processor package. The higher-order bits of an APIC ID
may also be used by cluster vendors to represent the topology of cluster nodes of
each coherent multiprocessor system. If the processor does not support CPUID leaf
0BH, the 8-bit initial APIC ID can represent 4 levels of hierarchy:
Cluster: Some multi-threading environments consist of multiple clusters of
multi-processor systems. The CLUSTER_ID sub-field is usually supported by
vendor firmware to distinguish different clusters. For non-clustered systems,
CLUSTER_ID is usually 0 and system topology is reduced to three levels of
hierarchy.
Package: A multi-processor system consists of two or more sockets, each of which
mates with a physical processor package. The PACKAGE_ID sub-field
distinguishes different physical packages within a cluster.
Core: A physical processor package consists of one or more processor cores.
The CORE_ID sub-field distinguishes processor cores in a package. For a
single-core processor, the width of this bit field is 0.
SMT: A processor core provides one or more logical processors sharing
execution resources. The SMT_ID sub-field distinguishes logical processors in a
core. The width of this bit field is non-zero if a processor core provides more than
one logical processor.
SMT and CORE sub-fields are bit-wise contiguous in the APIC_ID field (see
Figure 8-5).
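The four-level decomposition can be illustrated with a short C sketch. It assumes the bit widths of the SMT, core, and package sub-fields are already known (in practice they must be derived at runtime from CPUID, as described later in this section); the type and function names are hypothetical.

```c
#include <stdint.h>

/* Sub-field values of an 8-bit initial APIC ID (illustrative type). */
typedef struct { unsigned smt, core, package, cluster; } apic_fields_t;

/* Splits an 8-bit initial APIC ID into its sub-fields, lowest field first.
   smt_w, core_w and pkg_w are the widths in bits of the SMT, core and package
   sub-fields; whatever remains above them is treated as the cluster sub-field.
   A width of 0 yields a sub-field value of 0. */
static apic_fields_t split_apic_id8(uint8_t id, unsigned smt_w, unsigned core_w, unsigned pkg_w)
{
    apic_fields_t f;
    unsigned v = id;
    f.smt     = v & ((1u << smt_w) - 1u);  v >>= smt_w;
    f.core    = v & ((1u << core_w) - 1u); v >>= core_w;
    f.package = v & ((1u << pkg_w) - 1u);  v >>= pkg_w;
    f.cluster = v;
    return f;
}
```

For a non-clustered system the package width can simply cover all remaining bits, in which case the cluster value is always 0.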
If the processor supports CPUID leaf 0BH, the 32-bit APIC ID can represent cluster
plus several levels of topology within the physical processor package. The exact
number of hierarchical levels within a physical processor package must be
enumerated through CPUID leaf 0BH. Common processor families may employ topology
similar to that represented by the 8-bit initial APIC ID. In general, CPUID leaf 0BH can
support a topology enumeration algorithm that decomposes a 32-bit APIC ID into more
than four sub-fields (see Figure 8-6).
The width of each sub-field depends on hardware and software configurations. Field
widths can be determined at runtime using the algorithm discussed below (Example
8-16 through Example 8-20).
Figure 8-7 depicts the relationships of three of the hierarchical sub-fields in a
hypothetical MP system. The values of valid APIC_IDs need not be contiguous across
package boundaries or core boundaries.
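Figure 8-5. Generalized Four level Interpretation of the APIC ID
[Figure: APIC ID bit fields, from bit X down to bit 0: Reserved, Cluster ID, Package ID,
Core ID, SMT ID. X = 31 if x2APIC is supported; otherwise X = 7.]
Figure 8-6. Conceptual Five-level Topology and 32-bit APIC ID Composition
[Figure: the panel "Physical Processor Topology" shows a package containing levels
labeled Q, R, and SMT; the panel "32-bit APIC ID Composition" shows the fields Reserved,
Cluster ID, Package ID, Q ID, R ID, and SMT ID between bit 31 and bit 0.]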
8.9.2 Hierarchical Mapping of CPUID Extended Topology Leaf
CPUID leaf 0BH provides enumeration parameters for software to identify each
hierarchy of the processor topology in a deterministic manner. Each hierarchical level of
the topology, starting from the SMT level, is represented numerically by a sub-leaf
index within the CPUID 0BH leaf. Each level of the topology is mapped to a sub-field
in the APIC ID, following the general relationship depicted in Figure 8-6. This
mechanism allows software to query the exact number of levels within a physical
processor package and the bit-width of each sub-field of the x2APIC ID directly. For
example:
Start from sub-leaf index 0 and increment ECX until CPUID.(EAX=0BH,
ECX=N):ECX[15:8] returns an invalid level type encoding. The number of
levels within the physical processor package is then N (excluding PACKAGE). Using
Figure 8-6 as an example, CPUID.(EAX=0BH, ECX=3):ECX[15:8] will report
00H, indicating sub-leaf 03H is invalid. This is also depicted by a pseudo code
example:
Example 8-16. Number of Levels Below the Physical Processor Package
Byte type = 1;
s = 0;
While ( type ) {
EAX = 0BH; // query each sub leaf of CPUID leaf 0BH
ECX = s;
CPUID;
type = ECX[15:8]; // examine level type encoding
s ++;
}
N = ECX[7:0];
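The loop in Example 8-16 could be written in C roughly as follows. This is a sketch under stated assumptions: CPUID is reached through a caller-supplied callback (a hypothetical cpuid_fn type, so the logic can be exercised without real hardware), and fake_cpuid is a stand-in that reports only an SMT level and a core level.

```c
#include <stdint.h>

/* Register outputs of one CPUID invocation. */
typedef struct { uint32_t eax, ebx, ecx, edx; } cpuid_regs_t;

/* Hypothetical callback type: executes CPUID with the given leaf and sub-leaf. */
typedef cpuid_regs_t (*cpuid_fn)(uint32_t leaf, uint32_t subleaf);

/* Returns N, the number of valid sub-leaves of leaf 0BH, which is the number
   of topology levels below the physical package. */
static unsigned count_topology_levels(cpuid_fn cpuid)
{
    unsigned n = 0;
    while (n < 256u) {                               /* the sub-leaf index is 8 bits wide */
        cpuid_regs_t r = cpuid(0x0Bu, n);
        unsigned level_type = (r.ecx >> 8) & 0xFFu;  /* ECX[15:8] */
        if (level_type == 0)                         /* 0 = invalid level type */
            break;
        n++;
    }
    return n;
}

/* Stand-in for real hardware: sub-leaf 0 is SMT (type 1, shift 1) and
   sub-leaf 1 is core (type 2, shift 4); every other sub-leaf is invalid. */
static cpuid_regs_t fake_cpuid(uint32_t leaf, uint32_t subleaf)
{
    static const uint32_t shift[] = { 1, 4 };
    cpuid_regs_t r = { 0, 0, subleaf & 0xFFu, 0 };
    if (leaf == 0x0Bu && subleaf < 2u) {
        r.eax = shift[subleaf];
        r.ecx |= (subleaf + 1u) << 8;
    }
    return r;
}
```

With the stand-in above, count_topology_levels(fake_cpuid) returns 2. On real hardware the callback would wrap the CPUID instruction, for example through a compiler intrinsic.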
Sub-leaf index 0 (ECX=0 as input) provides enumeration parameters to extract
the SMT sub-field of the x2APIC ID. If EAX = 0BH and ECX = 0 are specified as input
when executing CPUID, CPUID.(EAX=0BH, ECX=0):EAX[4:0] reports a value (a
right-shift count) that allows software to extract the part of the x2APIC ID that
distinguishes the next higher topological entities above the SMT level. This value also
corresponds to the bit-width of the sub-field of the x2APIC ID corresponding to the
hierarchical level with sub-leaf index 0.
For each subsequent higher sub-leaf index m, CPUID.(EAX=0BH,
ECX=m):EAX[4:0] reports the right-shift count that will allow software to extract
the part of the x2APIC ID that distinguishes higher-level topological entities. This
means the right-shift value of sub-leaf m corresponds to the combined width of the
least significant (m+1) subfields of the 32-bit x2APIC ID.
Example 8-17. BitWidth Determination of x2APIC ID Subfields
For m = 0, m < N, m ++;
{ cumulative_width[m] = CPUID.(EAX=0BH, ECX= m): EAX[4:0]; }
BitWidth[0] = cumulative_width[0];
For m = 1, m < N, m ++;
BitWidth[m] = cumulative_width[m] - cumulative_width[m-1];
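A compact C rendering of the subtraction in Example 8-17 is sketched below. The function name is hypothetical; the input array is assumed to already hold the EAX[4:0] value returned for each valid sub-leaf of CPUID leaf 0BH.

```c
#include <stddef.h>

/* cumulative[m] holds CPUID.(EAX=0BH, ECX=m):EAX[4:0], the combined width of
   sub-fields 0..m. width[m] receives the width of sub-field m alone. */
static void subfield_widths(const unsigned *cumulative, unsigned *width, size_t n)
{
    for (size_t m = 0; m < n; m++)
        width[m] = cumulative[m] - (m ? cumulative[m - 1] : 0u);
}
```

For instance, cumulative shift counts of 1, 4, and 6 would correspond to sub-field widths of 1, 3, and 2 bits.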
Currently, only the following encodings of hierarchical level type are defined: 0
(invalid), 1 (SMT), and 2 (core). Software must not assume any level type encoding
value to be related to any sub-leaf index, except sub-leaf 0.
Example 8-16 and Example 8-17 represent the general technique for using CPUID
leaf 0BH to enumerate processor topology of more than two levels of hierarchy inside
a physical package. Most processor families to date require only SMT and CORE
levels within a physical package. The examples in later sections focus on this
three-level topology only.
8.9.3 Hierarchical ID of Logical Processors in an MP System
For Intel 64 and IA-32 processors, system hardware establishes an 8-bit initial APIC
ID (or 32-bit APIC ID if the processor supports CPUID leaf 0BH) that is unique for
each logical processor following power-up or RESET (see Section 8.6.1). Each logical
processor on the system is allocated an initial APIC ID. BIOS may implement features
that tell the OS to support fewer than the total number of logical processors on the
system bus. Those logical processors that are not available to applications at runtime
are halted during the OS boot process. As a result, the number of valid local APIC_IDs
that can be queried by affinitizing the current thread context (see Example 8-22) is
limited to the number of logical processors enabled at runtime by the OS boot
process.
Table 8-1 shows an example of the 8-bit APIC IDs that are initially reported for logical
processors in a system with four Intel Xeon MP processors that support Intel
Hyper-Threading Technology (a total of 8 logical processors; each physical package
has two logical processors). Of the two logical processors within an Intel Xeon
processor MP, logical processor 0 is designated the primary logical processor and
logical processor 1 the secondary logical processor.
Table 8-2 shows the initial APIC IDs for a hypothetical situation with a dual processor
system. Each physical package provides two processor cores, and each processor
core also supports Intel Hyper-Threading Technology.
Figure 8-7. Topological Relationships between Hierarchical IDs in a Hypothetical MP
Platform
Table 8-1. Initial APIC IDs for the Logical Processors in a System that has Four Intel
Xeon MP Processors Supporting Intel Hyper-Threading Technology (see Note 1)
Initial APIC ID    Package ID    Core ID    SMT ID
0H                 0H            0H         0H
1H                 0H            0H         1H
2H                 1H            0H         0H
3H                 1H            0H         1H
4H                 2H            0H         0H
5H                 2H            0H         1H
6H                 3H            0H         0H
7H                 3H            0H         1H
NOTE:
1. Because information on the number of processor cores in a physical package was not available
in early single-core processors supporting Intel Hyper-Threading Technology, the core ID can be
treated as 0.
[Figure 8-7 diagram: Package 0 and Package 1 each contain Core 0 and Core 1; each core
contains logical processors T0 and T1. The three levels are labeled Package ID, Core ID,
and SMT_ID.]
8.9.3.1 Hierarchical ID of Logical Processors with x2APIC ID
Table 8-3 shows an example of possible x2APIC ID assignments for a dual processor
system that supports x2APIC. Each physical package provides four processor cores,
and each processor core also supports Intel Hyper-Threading Technology. Note that
the x2APIC IDs need not be contiguous in the system.
Table 8-2. Initial APIC IDs for the Logical Processors in a System that has Two
Physical Processors Supporting Dual-Core and Intel Hyper-Threading Technology
Initial APIC ID    Package ID    Core ID    SMT ID
0H                 0H            0H         0H
1H                 0H            0H         1H
2H                 0H            1H         0H
3H                 0H            1H         1H
4H                 1H            0H         0H
5H                 1H            0H         1H
6H                 1H            1H         0H
7H                 1H            1H         1H
Table 8-3. Example of Possible x2APIC ID Assignment in a System that has Two
Physical Processors Supporting x2APIC and Intel Hyper-Threading Technology
x2APIC ID    Package ID    Core ID    SMT ID
0H           0H            0H         0H
1H           0H            0H         1H
2H           0H            1H         0H
3H           0H            1H         1H
4H           0H            2H         0H
5H           0H            2H         1H
6H           0H            3H         0H
7H           0H            3H         1H
10H          1H            0H         0H
11H          1H            0H         1H
12H          1H            1H         0H
13H          1H            1H         1H
14H          1H            2H         0H
15H          1H            2H         1H
16H          1H            3H         0H
17H          1H            3H         1H
8.9.4 Algorithm for Three-Level Mappings of APIC_ID
Software can gather the initial APIC_IDs for each logical processor supported by the
operating system at runtime (see footnote 7) and extract identifiers corresponding to
the three levels of sharing topology (package, core, and SMT). The three-level
algorithms below focus on a non-clustered MP system for simplicity. They do not
assume APIC IDs are contiguous or that all logical processors on the platform are
enabled.
Intel supports multi-threading systems where all physical processors report identical
values in CPUID leaf 0BH, CPUID.1:EBX[23:16], CPUID.4:EAX[31:26] (see footnote 8),
and CPUID.4:EAX[25:14] (see footnote 9). The algorithms below assume the target
system has symmetry across physical package boundaries with respect to the number
of logical processors per package, number of cores per package, and cache topology
within a package.
FOOTNOTES:
7. As noted in Section 8.6 and Section 8.9.3, the number of logical processors supported by the OS
at runtime may be less than the total number of logical processors available in the platform
hardware.
8. The maximum number of addressable IDs for processor cores in a physical processor is obtained by
executing CPUID with EAX=4 and a valid ECX index. The ECX index starts at 0.
9. The maximum number of addressable IDs for processor cores sharing the target cache level is
obtained by executing CPUID with EAX = 4 and the ECX index corresponding to the target cache level.
The extraction algorithm (for three-level mappings from an APIC ID) uses the
general procedure depicted in Example 8-18, and is supplemented by more detailed
descriptions on the derivation of topology enumeration parameters for extraction bit
masks:
1. Detect hardware multi-threading support in the processor.
2. Derive a set of bit masks that can extract the sub ID of each hierarchical level of
the topology. The algorithm to derive extraction bit masks for
SMT_ID/CORE_ID/PACKAGE_ID differs based on whether the APIC ID is 32-bit (see
step 3 below) or 8-bit (see step 4 below):
3. If the processor supports CPUID leaf 0BH, each APIC ID contains a 32-bit value.
The topology enumeration parameters needed to derive three-level extraction bit
masks are:
   a. Query the right-shift value for the SMT level of the topology using CPUID leaf
      0BH with ECX = 0H as input. The number of bits to shift right on the x2APIC ID
      (EAX[4:0]) can distinguish different higher-level entities above SMT (e.g.
      processor cores) in the same physical package. This is also the width of the
      bit mask to extract the SMT_ID.
   b. Query CPUID leaf 0BH for the amount of bit shift to distinguish next higher-
      level entities (e.g. physical processor packages) in the system. This describes
      an explicit three-level-topology situation for commonly available processors.
      Consult Example 8-17 to adapt to situations beyond three-level topology of a
      physical processor. The width of the extraction bit mask can be used to derive
      the cumulative extraction bit mask to extract the sub IDs of logical processors
      (including different processor cores) in the same physical package. The
      extraction bit mask to distinguish merely different processor cores can be
      derived by XORing the SMT extraction bit mask with the cumulative
      extraction bit mask.
   c. Query the 32-bit x2APIC ID for the logical processor where the current thread
      is executing.
   d. Derive the extraction bit masks corresponding to SMT_ID, CORE_ID, and
      PACKAGE_ID, starting from SMT_ID.
   e. Apply each extraction bit mask to the 32-bit x2APIC ID to extract sub-field
      IDs.
4. If the processor does not support CPUID leaf 0BH, each initial APIC ID contains
an 8-bit value. The topology enumeration parameters needed to derive extraction
bit masks are:
   a. Query the size of address space for sub IDs that can accommodate logical
      processors in a physical processor package. This size parameter
      (CPUID.1:EBX[23:16]) can be used to derive the width of an extraction
      bit mask to enumerate the sub IDs of different logical processors in the same
      physical package.
   b. Query the size of address space for sub IDs that can accommodate processor
      cores in a physical processor package. This size parameter can be used to
      derive the width of an extraction bit mask to enumerate the sub IDs of
      processor cores in the same physical package.
   c. Query the 8-bit initial APIC ID for the logical processor where the current
      thread is executing.
   d. Derive the extraction bit masks using respective address sizes corresponding
      to SMT_ID, CORE_ID, and PACKAGE_ID, starting from SMT_ID.
   e. Apply each extraction bit mask to the 8-bit initial APIC ID to extract sub-field
      IDs.
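Step 4 can be sketched in portable C as shown below. This is an illustration under stated assumptions, not the manual's own routine: the caller supplies the two size parameters (CPUID.1:EBX[23:16] and CPUID.4:EAX[31:26] + 1) already read from hardware, the helper names are hypothetical, and the sub IDs are returned shifted down to bit 0.

```c
#include <stdint.h>

/* Smallest number of bits able to hold IDs 0 .. count-1.
   count need not be a power of 2; intended for the small counts (below 256)
   that occur with 8-bit APIC IDs. */
static unsigned mask_width(unsigned count)
{
    unsigned w = 0;
    while (count > 1u && (1u << w) < count)
        w++;
    return w;
}

typedef struct { unsigned smt, core, package; } ids3_t;

/* max_lp:   max addressable logical-processor IDs per package (CPUID.1:EBX[23:16]).
   max_core: max addressable core IDs per package (CPUID.4:EAX[31:26] + 1),
             or 1 when leaf 4 is not available. */
static ids3_t ids_from_initial_apic_id(uint8_t apic_id, unsigned max_lp, unsigned max_core)
{
    if (max_core == 0)
        max_core = 1;                        /* guard against division by zero */
    unsigned smt_w  = mask_width(max_lp / max_core);
    unsigned core_w = mask_width(max_core);
    ids3_t r;
    r.smt     = apic_id & ((1u << smt_w) - 1u);
    r.core    = ((unsigned)apic_id >> smt_w) & ((1u << core_w) - 1u);
    r.package = (unsigned)apic_id >> (smt_w + core_w);
    return r;
}
```

With the parameters of Table 8-2 (4 logical processors and 2 cores per package), initial APIC ID 6H decomposes into package 1, core 1, SMT 0, which agrees with the table.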
Example 8-18. Support Routines for Detecting Hardware Multi-Threading and Identifying the
Relationships Between Package, Core and Logical Processors
1. Detect support for hardware multi-threading in a processor.
// Returns a non-zero value if CPUID reports the presence of hardware multi-threading
// support in the physical package where the current logical processor is located.
// This does not guarantee BIOS or OS will enable all logical processors in the physical
// package and make them available to applications.
// Returns zero if hardware multi-threading is not present.
#define HWMT_BIT 0x10000000
unsigned int HWMTSupported(void)
{
  // ensure cpuid instruction is supported
  execute cpuid with eax = 0 to get vendor string
  execute cpuid with eax = 1 to get feature flag and signature
  // Check to see if this is a Genuine Intel Processor
  if (vendor string EQ GenuineIntel) {
    return (feature_flag_edx & HWMT_BIT); // bit 28
  }
  return 0;
}
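The check in Example 8-18 might look like this in compilable C. The sketch assumes the caller has already executed CPUID leaf 0 and leaf 1 and passes in the relevant registers, so the function itself is plain integer logic; the function and parameter names are hypothetical.

```c
#include <stdint.h>

#define HWMT_BIT (1u << 28)   /* CPUID.1:EDX[28] */

/* l0_ebx, l0_edx, l0_ecx: registers returned by CPUID leaf 0 (vendor string).
   l1_edx: EDX returned by CPUID leaf 1 (feature flags).
   Returns non-zero if the vendor string is "GenuineIntel" and the hardware
   multi-threading feature flag is set. */
static int hwmt_supported(uint32_t l0_ebx, uint32_t l0_edx, uint32_t l0_ecx, uint32_t l1_edx)
{
    int intel = l0_ebx == 0x756E6547u    /* "Genu" */
             && l0_edx == 0x49656E69u    /* "ineI" */
             && l0_ecx == 0x6C65746Eu;   /* "ntel" */
    return intel && (l1_edx & HWMT_BIT) != 0;
}
```

On GCC or Clang the register values could be obtained with `__get_cpuid` from `<cpuid.h>`; other toolchains provide their own intrinsics. As the comments in Example 8-18 note, a non-zero result does not guarantee that more than one logical processor is enabled.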
Example 8-19. Support Routines for Identifying Package, Core and Logical Processors from
32-bit x2APIC ID
a. Derive the extraction bitmask for logical processors in a processor core and
associated mask offset for different cores.
int DeriveSMT_Mask_Offsets (void)
{
  if (!HWMTSupported()) return -1;
  execute cpuid with eax = 11, ECX = 0;
  if (returned level type encoding in ECX[15:8] does not match SMT) return -1;
  Mask_SMT_shift = EAX[4:0]; // # bits shift right of APIC ID to distinguish different cores
  SMT_MASK = ~( (-1) << Mask_SMT_shift); // shift left to derive extraction bitmask for SMT_ID
  return 0;
}
b. Derive the extraction bitmask for processor cores in a physical processor package
and associated mask offset for different packages.
int DeriveCore_Mask_Offsets (void)
{
  unsigned subleaf = 0; // sub-leaf index, kept separate from the ECX output
  if (!HWMTSupported()) return -1;
  execute cpuid with eax = 11, ECX = subleaf;
  while( ECX[15:8] ) { // level type encoding is valid
    if (returned level type encoding in ECX[15:8] matches CORE) {
      Mask_Core_shift = EAX[4:0]; // needed to distinguish different physical packages
      COREPlusSMT_MASK = ~( (-1) << Mask_Core_shift);
      CORE_MASK = COREPlusSMT_MASK ^ SMT_MASK;
      PACKAGE_MASK = (-1) << Mask_Core_shift;
      return 0;
    }
    subleaf ++;
    execute cpuid with eax = 11, ECX = subleaf;
  }
  return -1;
}
c. Query the x2APIC ID of a logical processor.
// Returns the 32-bit x2APIC ID of the logical processor running the code.
unsigned Getx2APIC_ID (void)
{
  unsigned reg_edx = 0;
  execute cpuid with eax = 11, ECX = 0
  store returned value of edx
  return (unsigned) (reg_edx) ;
}
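The mask arithmetic of Example 8-19 can be checked with a small C sketch. It assumes the two shift counts (EAX[4:0] of the SMT-type and core-type sub-leaves) have already been read from CPUID leaf 0BH; the type and function names are hypothetical, and the accessors return zero-based sub IDs.

```c
#include <stdint.h>

/* Extraction masks and shift counts for a three-level topology (illustrative type). */
typedef struct {
    uint32_t smt_mask, core_mask, pkg_mask;
    unsigned smt_shift, core_shift;
} topo_masks_t;

/* smt_shift:  EAX[4:0] of the SMT-type sub-leaf of CPUID leaf 0BH.
   core_shift: EAX[4:0] of the core-type sub-leaf (combined SMT + core width).
   Both are 5-bit values, so every shift below is by less than 32. */
static topo_masks_t derive_masks(unsigned smt_shift, unsigned core_shift)
{
    topo_masks_t m;
    uint32_t core_plus_smt = ~(UINT32_MAX << core_shift);
    m.smt_shift  = smt_shift;
    m.core_shift = core_shift;
    m.smt_mask   = ~(UINT32_MAX << smt_shift);
    m.core_mask  = core_plus_smt ^ m.smt_mask;   /* remove the SMT bits */
    m.pkg_mask   = UINT32_MAX << core_shift;
    return m;
}

static unsigned smt_id(topo_masks_t m, uint32_t id)  { return id & m.smt_mask; }
static unsigned core_id(topo_masks_t m, uint32_t id) { return (id & m.core_mask) >> m.smt_shift; }
static unsigned pkg_id(topo_masks_t m, uint32_t id)  { return (id & m.pkg_mask) >> m.core_shift; }
```

The assignment in Table 8-3 is consistent with an SMT shift of 1 and a core shift of 4: with those values, x2APIC ID 15H yields package 1, core 2, SMT 1.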
Example 8-20. Support Routines for Identifying Package, Core and Logical Processors from
8-bit Initial APIC ID
a. Find the size of address space for logical processors in a physical processor
package.
#define NUM_LOGICAL_BITS 0x00FF0000
// Use the mask above and CPUID.1.EBX[23:16] to obtain the max number of addressable IDs
// for logical processors in a physical package.
// Returns the size of address space of logical processors in a physical processor package;
// Software should not assume the value to be a power of 2.
unsigned char MaxLPIDsPerPackage(void)
{
  unsigned int reg_ebx = 0;
  if (!HWMTSupported()) return 1;
  execute cpuid with eax = 1
  store returned value of ebx
  return (unsigned char) ((reg_ebx & NUM_LOGICAL_BITS) >> 16);
}
b. Find the size of address space for processor cores in a physical processor package.
// Returns the max number of addressable IDs for processor cores in a physical processor package;
// Software should not assume cpuid reports this value to be a power of 2.
unsigned MaxCoreIDsPerPackage(void)
{
  unsigned int reg_eax = 0;
  if (!HWMTSupported()) return 1;
  if cpuid supports leaf number 4
  { // we can retrieve multi-core topology info using leaf 4
    execute cpuid with eax = 4, ecx = 0
    store returned value of eax
    return (unsigned) ((reg_eax >> 26) + 1);
  }
  else // must be a single-core processor
    return 1;
}
c. Query the initial APIC ID of a logical processor.
#define INITIAL_APIC_ID_BITS 0xFF000000 // CPUID.1.EBX[31:24] initial APIC ID
// Returns the 8-bit unique initial APIC ID for the processor running the code.
// Software can use OS services to affinitize the current thread to each logical processor
// available under the OS to gather the initial APIC_IDs for each logical processor.
unsigned GetInitAPIC_ID (void)
{
  unsigned int reg_ebx = 0;
  execute cpuid with eax = 1
  store returned value of ebx
  return (unsigned) ((reg_ebx & INITIAL_APIC_ID_BITS) >> 24);
}
d. Find the width of an extraction bitmask from the maximum count of the bit-field
(address size).
// Returns the mask bit width of a bit field from the maximum count that bit field can represent.
// This algorithm does not assume address size to have a value equal to power of 2.
// Address size for SMT_ID can be calculated from MaxLPIDsPerPackage()/MaxCoreIDsPerPackage()
// Then use the routine below to derive the corresponding width of SMT extraction bitmask
// Address size for CORE_ID is MaxCoreIDsPerPackage(),
// Derive the bitwidth for CORE extraction mask similarly
unsigned FindMaskWidth(unsigned Max_Count)
{
  unsigned int mask_width, cnt = Max_Count;
  __asm {
    mov eax, cnt
    mov ecx, 0
    mov mask_width, ecx   ; default width is 0 (Max_Count of 1)
    dec eax               ; highest ID that must be representable
    bsr cx, ax            ; index of its most significant set bit (16-bit operands)
    jz next               ; ZF set: Max_Count - 1 is 0, keep width 0
    inc cx                ; width = bit index + 1
    mov mask_width, ecx
next:
    mov eax, mask_width
  }
  return mask_width;
}
e. Extract a sub ID from an 8-bit full ID, using address size of the sub ID and shift
count.
// The routine below can extract SMT_ID, CORE_ID, and PACKAGE_ID respectively from the
// initial APIC_ID.
// To extract SMT_ID, MaxSubIDValue is set to the address size of SMT_ID, Shift_Count = 0
// To extract CORE_ID, MaxSubIDValue is the address size of CORE_ID, Shift_Count is the
// width of the SMT extraction bitmask.
// Returns the value of the sub ID; this is not a zero-based value (the result stays
// at its original bit position and is not shifted down by Shift_Count).
unsigned char GetSubID(unsigned char Full_ID, unsigned char MaxSubIDValue, unsigned char
Shift_Count)
{
  unsigned MaskWidth = FindMaskWidth(MaxSubIDValue);
  unsigned char MaskBits = ((unsigned char) (0xff << Shift_Count)) ^
                           ((unsigned char) (0xff << (Shift_Count + MaskWidth)));
  unsigned char SubID = Full_ID & MaskBits;
  return SubID;
}
Software must not assume local APIC_ID values in an MP system are consecutive.
Non-consecutive local APIC_IDs may be the result of hardware configurations or
debug features implemented in the BIOS or OS.
An identifier for each hierarchical level can be extracted from an 8-bit APIC_ID using
the support routines illustrated in Example 8-20. The appropriate bit mask and shift
value to construct the appropriate bit mask for each level must be determined
dynamically at runtime.
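Because APIC IDs may be non-consecutive, topology counts are best obtained by comparing extracted identifiers rather than by assuming a dense range. The sketch below counts distinct packages among a list of gathered APIC IDs; the function name is hypothetical, and pkg_shift (the combined SMT and core width, assumed to be below 32) is assumed to have been determined at runtime.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the distinct package IDs among n gathered APIC IDs.
   pkg_shift is the number of low-order bits occupied by the SMT and core
   sub-fields. The IDs may appear in any order and need not be contiguous. */
static unsigned count_packages(const uint32_t *apic_ids, size_t n, unsigned pkg_shift)
{
    unsigned count = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t pkg = apic_ids[i] >> pkg_shift;
        size_t j;
        for (j = 0; j < i; j++)
            if ((apic_ids[j] >> pkg_shift) == pkg)
                break;
        if (j == i)      /* no earlier logical processor shares this package */
            count++;
    }
    return count;
}
```

The quadratic scan keeps the sketch short; for large systems a sorted array or a bitmap of package IDs seen would scale better.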
8.9.5 Identifying Topological Relationships in a MP System
To detect the number of physical packages, processor cores, or other topological
relationships in an MP system, the following procedures are recommended:
• Extract the three-level identifiers from the APIC ID of each logical processor
enabled by system software. The sequence is as follows (see the pseudo code
shown in Example 8-21 and support routines shown in Example 8-18):
  - The extraction starts from the right-most bit field, corresponding to
  SMT_ID, the innermost hierarchy in a three-level topology (see Figure
  8-7). For the right-most bit field, the shift value of the working mask is
  zero. The width of the bit field is determined dynamically using the
  maximum number of logical processors per core, which can be derived
  from information provided by CPUID.
  - To extract the next bit field, the shift value of the working mask is
  determined from the width of the bit mask of the previous step. The width
  of the bit field is determined dynamically using the maximum number of
  cores per package.
  - To extract the remaining bit field, the shift value of the working mask is
  determined from the maximum number of logical processors per package.
  The remaining bits in the APIC ID (excluding those bits already
  extracted in the two previous steps) are extracted as the third identifier.
  This applies to a non-clustered MP system, or if there is no need to
  distinguish between PACKAGE_ID and CLUSTER_ID.
  If there is a need to distinguish between PACKAGE_ID and CLUSTER_ID,
  PACKAGE_ID can be extracted using an algorithm similar to the
  extraction of CORE_ID, assuming the number of physical packages in
  each node of a clustered system is symmetric.
• Assemble the three-level identifiers of SMT_ID, CORE_ID, and PACKAGE_ID into
arrays for each enabled logical processor. This is shown in Example 8-22a.
• To detect the number of physical packages: use PACKAGE_ID to identify those
logical processors that reside in the same physical package. This is shown in
Example 8-22b. This example also depicts a technique to construct a mask to
represent the logical processors that reside in the same package.
• To detect the number of processor cores: use CORE_ID to identify those logical
processors that reside in the same core. This is shown in Example 8-22c. This
example also depicts a technique to construct a mask to represent the logical
processors that reside in the same core.
In Example 8-21, the numerical ID value can be obtained from the value extracted
with the mask by shifting it right by the shift count. The algorithms below do not shift
the value. The assumption is that the SubID values can be compared for equivalence
without the need to shift.
Example 8-21. Pseudo Code Depicting Three-level Extraction Algorithm
For Each local_APIC_ID{
// Calculate SMT_MASK, the bit mask pattern to extract SMT_ID.
// SMT_MASK is determined using topology enumeration parameters
// from CPUID leaf 0BH (Example 8-19) if that leaf is supported;
// otherwise, SMT_MASK is determined using CPUID leaf 01H and leaf 04H (Example 8-20).
// This algorithm assumes there is symmetry across core boundary, i.e. each core within a
// package has the same number of logical processors.
// SMT_ID always starts from bit 0, corresponding to the right-most bit-field.
SMT_ID = APIC_ID & SMT_MASK;
// Extract CORE_ID:
// CORE_MASK is determined in Example 8-19 or Example 8-20.
CORE_ID = (APIC_ID & CORE_MASK) ;
// Extract PACKAGE_ID:
// Assume single cluster.
// Shift out the mask width for maximum logical processors per package.
// PACKAGE_MASK is determined in Example 8-19 or Example 8-20.
PACKAGE_ID = (APIC_ID & PACKAGE_MASK) ;
}
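The mask construction that Example 8-21 relies on can be sketched in portable C. This is a minimal illustration, not the manual's reference code: the names find_mask_width, make_masks, and topo_masks are hypothetical, and the two counts passed to make_masks stand in for the maximum values that software would read from CPUID.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of bits able to encode `count` distinct IDs (0 when count is 1). */
static unsigned find_mask_width(unsigned count)
{
    unsigned width = 0;
    assert(count >= 1);
    for (count -= 1; count != 0; count >>= 1)
        width++;
    return width;
}

typedef struct {
    uint8_t smt_mask;   /* right-most field, shift count 0 */
    uint8_t core_mask;  /* next field, shifted by the SMT width */
    uint8_t pkg_mask;   /* all remaining upper bits */
} topo_masks;

/* lp_per_core and cores_per_pkg are assumed inputs (maximum counts from CPUID). */
static topo_masks make_masks(unsigned lp_per_core, unsigned cores_per_pkg)
{
    unsigned smt_w = find_mask_width(lp_per_core);
    unsigned core_w = find_mask_width(cores_per_pkg);
    topo_masks m;

    m.smt_mask = (uint8_t)~(0xFFu << smt_w);
    m.core_mask = (uint8_t)((~(0xFFu << (smt_w + core_w)) & 0xFFu) ^ m.smt_mask);
    m.pkg_mask = (uint8_t)(0xFFu << (smt_w + core_w));
    return m;
}
```

With 2 logical processors per core and 4 cores per package, the masks come out as 01H, 06H, and F8H. An 8-bit APIC ID of 1DH then yields the unshifted sub IDs 01H, 04H, and 18H, which can be compared for equality directly.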
Example 8-22. Compute the Number of Packages, Cores, and Processor Relationships in a MP
System
a) Assemble lists of PACKAGE_ID, CORE_ID, and SMT_ID of each enabled logical processor
// The BIOS and/or OS may limit the number of logical processors available to applications
// after system boot. The algorithm below computes topology for the processors visible
// to the thread that is computing it.
// Extract the 3-levels of IDs on every processor.
// SystemAffinity is a bitmask of all the processors started by the OS. Use OS specific APIs to
// obtain it.
// ThreadAffinityMask is used to affinitize the topology enumeration thread to each processor
// using OS specific APIs.
// Allocate per processor arrays to store the Package_ID, Core_ID and SMT_ID for every started
// processor.

ThreadAffinityMask = 1;
ProcessorNum = 0;
while (ThreadAffinityMask != 0 && ThreadAffinityMask <= SystemAffinity) {
// Check to make sure we can utilize this processor first.
if (ThreadAffinityMask & SystemAffinity){
Set thread to run on the processor specified in ThreadAffinityMask
Wait if necessary and ensure thread is running on specified processor
APIC_ID = GetAPIC_ID(); // 32-bit ID in Example 8-19 or 8-bit ID in Example 8-20
Extract the Package_ID, Core_ID and SMT_ID as explained in the three-level extraction
algorithm of Example 8-21
PackageID[ProcessorNum] = PACKAGE_ID;
CoreID[ProcessorNum] = CORE_ID;
SmtID[ProcessorNum] = SMT_ID;
ProcessorNum++;
}
ThreadAffinityMask <<= 1;
}
NumStartedLPs = ProcessorNum;
b) Using the list of PACKAGE_ID to count the number of physical packages in a MP system and
construct, for each package, a multi-bit mask corresponding to those logical processors residing in
the same package.
// Compute the number of packages by counting the number of processors
// with unique PACKAGE_IDs in the PackageID array.
// Compute the mask of processors in each package.
// PackageIDBucket is an array of unique PACKAGE_ID values. Allocate NumStartedLPs
// entries for this array.
// PackageProcessorMask is a corresponding array of bit masks of the processors belonging
// to the same package; these are processors with the same PACKAGE_ID.
// The algorithm below assumes there is symmetry across package boundary if more than
// one socket is populated in an MP system.
// Bucket Package IDs and compute processor mask for every package.
PackageNum = 1;
PackageIDBucket[0] = PackageID[0];
ProcessorMask = 1;
PackageProcessorMask[0] = ProcessorMask;
For (ProcessorNum = 1; ProcessorNum < NumStartedLPs; ProcessorNum++) {
ProcessorMask <<= 1;
For (i=0; i < PackageNum; i++) {
// we may be comparing bit-fields of logical processors residing in different
// packages, the code below assumes package symmetry
If (PackageID[ProcessorNum] == PackageIDBucket[i]) {
PackageProcessorMask[i] |= ProcessorMask;
Break; // found in existing bucket, skip to next iteration
}
}
if (i ==PackageNum) {
//PACKAGE_ID did not match any bucket, start new bucket
PackageIDBucket[i] = PackageID[ProcessorNum];
PackageProcessorMask[i] = ProcessorMask;
PackageNum++;
}
}
// PackageNum has the number of Packages started in OS
// PackageProcessorMask[] array has the processor set of each package
c) Using the list of CORE_ID to count the number of cores in a MP system and construct, for each
core, a multi-bit mask corresponding to those logical processors residing in the same core.
Processors in the same core can be determined by bucketing the processors with the same
PACKAGE_ID and CORE_ID. Note that the code below can bitwise OR the values of PACKAGE_ID
and CORE_ID because they have not been shifted right.
The algorithm below assumes there is symmetry across package boundary if more than one socket
is populated in an MP system.
//Bucketing PACKAGE and CORE IDs and computing processor mask for every core
CoreNum = 1;
CoreIDBucket[0] = PackageID[0] | CoreID[0];
ProcessorMask = 1;
CoreProcessorMask[0] = ProcessorMask;
For (ProcessorNum = 1; ProcessorNum < NumStartedLPs; ProcessorNum++) {
ProcessorMask <<= 1;
For (i=0; i < CoreNum; i++) {
// we may be comparing bit-fields of logical processors residing in different
// packages, the code below assumes package symmetry
If ((PackageID[ProcessorNum] | CoreID[ProcessorNum]) == CoreIDBucket[i]) {
CoreProcessorMask[i] |= ProcessorMask;
Break; // found in existing bucket, skip to next iteration
}
}
if (i == CoreNum) {
//Did not match any bucket, start new bucket
CoreIDBucket[i] = PackageID[ProcessorNum] | CoreID[ProcessorNum];
CoreProcessorMask[i] = ProcessorMask;
CoreNum++;
}
}
// CoreNum has the number of cores started in the OS
// CoreProcessorMask[] array has the processor set of each core
Other processor relationships, such as the processor mask of sibling cores, can be
computed from set operations on the PackageProcessorMask[] and
CoreProcessorMask[] arrays.
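The bucketing in Example 8-22 and the set operation mentioned above can be sketched as one small C routine. This is an illustrative sketch under stated assumptions: bucket_by_key and sibling_core_mask are hypothetical names, the keys are the unshifted sub IDs, and at most 64 logical processors are assumed so that a mask fits in a uint64_t.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Groups logical processors by key and fills masks[] with one bit per processor.
 * keys[p] is an unshifted ID such as PackageID[p] or (PackageID[p] | CoreID[p]).
 * buckets[] and masks[] must each have room for n entries.
 * Returns the number of distinct keys found.
 */
static unsigned bucket_by_key(const uint8_t *keys, unsigned n,
                              uint8_t *buckets, uint64_t *masks)
{
    unsigned count = 0;
    assert(n <= 64);
    for (unsigned p = 0; p < n; p++) {
        unsigned i;
        for (i = 0; i < count; i++) {
            if (keys[p] == buckets[i])
                break;                  /* found in an existing bucket */
        }
        if (i == count) {               /* no match: start a new bucket */
            buckets[count] = keys[p];
            masks[count] = 0;
            count++;
        }
        masks[i] |= (uint64_t)1 << p;
    }
    return count;
}

/* Processors in the same package as a given core, excluding that core itself. */
static uint64_t sibling_core_mask(uint64_t package_mask, uint64_t core_mask)
{
    return package_mask & ~core_mask;
}
```

Calling bucket_by_key once with package keys and once with combined package and core keys gives the package count, the core count, and both sets of processor masks.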
The algorithm shown above can be adapted to work with earlier generations of
single-core IA-32 processors that support Intel Hyper-Threading Technology and in
situations where the deterministic cache parameter leaf is not supported (provided
CPUID supports initial APIC ID). A reference code example is available (see Intel 64
Architecture Processor Topology Enumeration).
8.10 MANAGEMENT OF IDLE AND BLOCKED CONDITIONS
When a logical processor in an MP system (including multi-core processors or
processors supporting Intel Hyper-Threading Technology) is idle (no work to do) or blocked
(on a lock or semaphore), additional management of the core execution engine
resource can be accomplished by using the HLT (halt), PAUSE, or the
MONITOR/MWAIT instructions.
8.10.1 HLT Instruction
The HLT instruction stops the execution of the logical processor on which it is
executed and places it in a halted state until further notice (see the description of the
HLT instruction in Chapter 3 of the Intel 64 and IA-32 Architectures Software
Developer's Manual, Volume 2A). When a logical processor is halted, active logical
processors continue to have full access to the shared resources within the physical
package. Shared resources that were being used by the halted logical processor
become available to active logical processors, allowing them to execute at greater
efficiency. When the halted logical processor resumes execution, shared resources
are again shared among all active logical processors. (See Section 8.10.6.3, "Halt
Idle Logical Processors," for more information about using the HLT instruction with
processors supporting Intel Hyper-Threading Technology.)
8.10.2 PAUSE Instruction
The PAUSE instruction can improve the performance of processors supporting Intel
Hyper-Threading Technology when executing spin-wait loops and other routines
where one thread is accessing a shared lock or semaphore in a tight polling loop.
When executing a spin-wait loop, the processor can suffer a severe performance
penalty when exiting the loop because it detects a possible memory order violation
and flushes the core processor's pipeline. The PAUSE instruction provides a hint to
the processor that the code sequence is a spin-wait loop. The processor uses this hint
to avoid the memory order violation and prevent the pipeline flush. In addition, the
PAUSE instruction de-pipelines the spin-wait loop to prevent it from consuming
execution resources excessively and consuming power needlessly. (See Section
8.10.6.1, "Use the PAUSE Instruction in Spin-Wait Loops," for more information
about using the PAUSE instruction with IA-32 processors supporting Intel
Hyper-Threading Technology.)
8.10.3 Detecting Support MONITOR/MWAIT Instruction
Streaming SIMD Extensions 3 introduced two instructions (MONITOR and MWAIT) to
help multithreaded software improve thread synchronization. In the initial
implementation, MONITOR and MWAIT are available to software at ring 0. The instructions
are conditionally available at levels greater than 0. Use the following steps to detect
the availability of MONITOR and MWAIT:
• Use CPUID to query the MONITOR bit (CPUID.1.ECX[3] = 1).
• If CPUID indicates support, execute MONITOR inside a TRY/EXCEPT exception
handler and trap for an exception. If an exception occurs, MONITOR and MWAIT
are not supported at a privilege level greater than 0. See Example 8-23.
Example 8-23. Verifying MONITOR/MWAIT Support
boolean MONITOR_MWAIT_works = TRUE;
try {
_asm {
xor ecx, ecx
xor edx, edx
mov eax, MemArea
monitor
}
// Use monitor
} except (UNWIND) {
// if we get here, MONITOR/MWAIT is not supported
MONITOR_MWAIT_works = FALSE;
}
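The first detection step, testing CPUID.1.ECX[3], reduces to a single bit test once the register value is in hand. The helper below is a hedged sketch: monitor_reported and CPUID1_ECX_MONITOR_BIT are hypothetical names, and the caller is assumed to supply the ECX value returned by CPUID with EAX = 01H.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_MONITOR_BIT 3u   /* CPUID.1.ECX[3]: MONITOR/MWAIT reported */

/* cpuid1_ecx is the ECX value returned by CPUID with EAX = 01H. */
static bool monitor_reported(uint32_t cpuid1_ecx)
{
    return ((cpuid1_ecx >> CPUID1_ECX_MONITOR_BIT) & 1u) != 0;
}
```

On GCC or Clang, one way to obtain the register value is __get_cpuid from <cpuid.h>. A set bit only reports that the instructions exist; availability at a privilege level greater than 0 still requires the exception-handler probe of Example 8-23.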
8.10.4 MONITOR/MWAIT Instruction
Operating systems usually implement idle loops to handle thread synchronization. In
a typical idle-loop scenario, there could be several busy loops and they would use a
set of memory locations. An impacted processor waits in a loop and polls a memory
location to determine if there is available work to execute. The posting of work is
typically a write to memory (the work-queue of the waiting processor). The time for
initiating a work request and getting it scheduled is on the order of a few bus cycles.
From a resource sharing perspective (logical processors sharing execution
resources), use of the HLT instruction in an OS idle loop is desirable but has
implications. Executing the HLT instruction on an idle logical processor puts the targeted
processor in a non-execution state. This requires another processor (when posting
work for the halted logical processor) to wake up the halted processor using an
inter-processor interrupt. The posting and servicing of such an interrupt introduces a delay
in the servicing of new work requests.
In a shared memory configuration, exits from busy loops usually occur because of a
state change applicable to a specific memory location; such a change tends to be
triggered by writes to the memory location by another agent (typically a processor).
MONITOR/MWAIT complement the use of HLT and PAUSE to allow for efficient
partitioning and un-partitioning of shared resources among logical processors sharing
physical resources. MONITOR sets up an effective address range that is monitored for
write-to-memory activities; MWAIT places the processor in an optimized state (this
may vary between different implementations) until a write to the monitored address
range occurs.
In the initial implementation, MONITOR and MWAIT are available at CPL = 0
only.
Both instructions rely on the state of the processor's monitor hardware. The monitor
hardware can be either armed (by executing the MONITOR instruction) or triggered
(due to a variety of events, including a store to the monitored memory region). If,
upon execution of MWAIT, the monitor hardware is in a triggered state, MWAIT behaves
as a NOP and execution continues at the next instruction in the execution stream.
The state of the monitor hardware is not architecturally visible except through the
behavior of MWAIT.
Multiple events other than a write to the triggering address range can cause a
processor that executed MWAIT to wake up. These include events that would lead to
voluntary or involuntary context switches, such as:
• External interrupts, including NMI, SMI, INIT, BINIT, MCERR, A20M#
• Faults, Aborts (including Machine Check)
• Architectural TLB invalidations including writes to CR0, CR3, CR4 and certain MSR
writes; execution of LMSW (occurring prior to issuing MWAIT but after setting the
monitor)
• Voluntary transitions due to fast system call and far calls (occurring prior to
issuing MWAIT but after setting the monitor)
Power management related events (such as Thermal Monitor 2 or chipset driven
STPCLK# assertion) will not cause the monitor event pending flag to be cleared.
Faults will not cause the monitor event pending flag to be cleared.
Software should not allow for voluntary context switches in between
MONITOR/MWAIT in the instruction flow. Note that execution of MWAIT does not
re-arm the monitor hardware. This means that MONITOR/MWAIT need to be executed in
a loop. Also note that exits from the MWAIT state could be due to a condition other
than a write to the triggering address; software should explicitly check the triggering
data location to determine if the write occurred. Software should also check the value
of the triggering address following the execution of the MONITOR instruction (and prior
to the execution of the MWAIT instruction). This check is to identify any writes to the
triggering address that occurred during the course of MONITOR execution.
The address range provided to the MONITOR instruction must be of write-back
caching type. Only write-back memory type stores to the monitored address range
will trigger the monitor hardware. If the address range is not in memory of
write-back type, the address monitor hardware may not be set up properly or the monitor
hardware may not be armed. Software is also responsible for ensuring that:
• Writes that are not intended to cause the exit of a busy loop do not write to a
location within the address region being monitored by the monitor hardware.
• Writes intended to cause the exit of a busy loop are written to locations within the
monitored address region.
Not doing so will lead to more false wakeups (an exit from the MWAIT state not due
to a write to the intended data location). These have negative performance
implications. It might be necessary for software to use padding to prevent false wakeups.
CPUID provides a mechanism for determining the size of data locations for monitoring
as well as a mechanism for determining the size of the pad.
8.10.5 Monitor/Mwait Address Range Determination
To use the MONITOR/MWAIT instructions, software should know the length of the
region monitored by the MONITOR/MWAIT instructions and the coherence
line size for cache-snoop traffic in a multiprocessor system. This information can be
queried using the CPUID monitor leaf function (EAX = 05H). You will need the
smallest and largest monitor line size:
• To avoid missed wake-ups: make sure that the data structure used to monitor
writes fits within the smallest monitor line size. Otherwise, the processor may
not wake up after a write intended to trigger an exit from MWAIT.
• To avoid false wake-ups: use the largest monitor line size to pad the data
structure used to monitor writes. Software must make sure that beyond the data
structure, no unrelated data variable exists in the triggering area for MWAIT. A
pad may be needed to avoid this situation.
These two values bear no relationship to cache line size in the system and
software should not make any assumptions to that effect. Within a single-cluster system,
the two parameters should default to be the same (the size of the monitor triggering
area is the same as the system coherence line size).
Based on the monitor line sizes returned by CPUID, the OS should dynamically
allocate structures with appropriate padding. If static data structures must be used
by an OS, attempt to adapt the data structure and use a dynamically allocated data
buffer for thread synchronization. When the latter technique is not possible, consider
not using MONITOR/MWAIT when using static data structures.
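The sizing rules above can be sketched as a small C helper. This is an illustration under stated assumptions: monitor_line_sizes, decode_monitor_leaf, and padded_monitor_size are hypothetical names, and the caller is assumed to pass in the EAX and EBX values returned by CPUID with EAX = 05H, whose low 16 bits report the smallest and largest monitor line sizes in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t smallest;  /* CPUID.05H:EAX[15:0], in bytes */
    uint32_t largest;   /* CPUID.05H:EBX[15:0], in bytes */
} monitor_line_sizes;

/* eax and ebx are the raw register values returned by CPUID leaf 05H. */
static monitor_line_sizes decode_monitor_leaf(uint32_t eax, uint32_t ebx)
{
    monitor_line_sizes s = { eax & 0xFFFFu, ebx & 0xFFFFu };
    return s;
}

/*
 * Returns the number of bytes to reserve for a monitored structure of
 * data_size bytes, or 0 if the structure cannot be monitored reliably.
 * The data must fit in the smallest line (no missed wake-ups) and the
 * reservation spans the largest line (no unrelated data, no false wake-ups).
 */
static size_t padded_monitor_size(size_t data_size, monitor_line_sizes s)
{
    if (s.smallest == 0 || s.largest == 0 || data_size == 0 ||
        data_size > s.smallest)
        return 0;
    return s.largest;
}
```

The reserved block should also start on a boundary that is a multiple of the largest monitor line size, so that the padding covers the whole triggering area.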
To set up the data structure correctly for MONITOR/MWAIT on multi-clustered
systems, interaction between processors, chipsets, and the BIOS is required (system
coherence line size may depend on the chipset used in the system; the size could be
different from the processor's monitor triggering area). The BIOS is responsible for
setting the correct value for system coherence line size using the
IA32_MONITOR_FILTER_LINE_SIZE MSR. Depending on the relative magnitude of
the size of the monitor triggering area versus the value written into the
IA32_MONITOR_FILTER_LINE_SIZE MSR, the smaller of the parameters will be
reported as the Smallest Monitor Line Size. The larger of the parameters will be
reported as the Largest Monitor Line Size.
8.10.6 Required Operating System Support
This section describes changes that must be made to an operating system to run on
processors supporting Intel Hyper-Threading Technology. It also describes
optimizations that can help an operating system make more efficient use of the logical
processors sharing execution resources. The required changes and suggested
optimizations are representative of the types of modifications that appear in Windows*
XP and Linux* kernel 2.4.0 operating systems for Intel processors supporting Intel
Hyper-Threading Technology. Additional optimizations for processors supporting
Intel Hyper-Threading Technology are described in the Intel 64 and IA-32
Architectures Optimization Reference Manual.
8.10.6.1 Use the PAUSE Instruction in Spin-Wait Loops
Intel recommends that a PAUSE instruction be placed in all spin-wait loops that run
on Intel processors supporting Intel Hyper-Threading Technology and multi-core
processors.
Software routines that use spin-wait loops include multiprocessor synchronization
primitives (spin-locks, semaphores, and mutex variables) and idle loops. Such
routines keep the processor core busy executing a load-compare-branch loop while a
thread waits for a resource to become available. Including a PAUSE instruction in such
a loop greatly improves efficiency (see Section 8.10.2, "PAUSE Instruction"). The
following routine gives an example of a spin-wait loop that uses a PAUSE instruction:
Spin_Lock:
CMP lockvar, 0 ;Check if lock is free
JE Get_Lock
PAUSE ;Short delay
JMP Spin_Lock
Get_Lock:
MOV EAX, 1
XCHG EAX, lockvar ;Try to get lock
CMP EAX, 0 ;Test if successful
JNE Spin_Lock
Critical_Section:
<critical section code>
MOV lockvar, 0
...
Continue:
The spin-wait loop above uses a "test, test-and-set" technique for determining the
availability of the synchronization variable. This technique is recommended when
writing spin-wait loops.
In IA-32 processor generations earlier than the Pentium 4 processor, the PAUSE
instruction is treated as a NOP instruction.
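The same "test, test-and-set" pattern can be sketched in C11 with <stdatomic.h>. This is a hedged illustration rather than the manual's code: spin_lock_t, spin_lock, spin_try_lock, spin_unlock, and cpu_relax are hypothetical names, and the PAUSE hint is emitted through the GCC/Clang builtin __builtin_ia32_pause on x86 targets only (other compilers and targets fall back to a no-op here).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__i386__) || defined(__x86_64__)
#define cpu_relax() __builtin_ia32_pause()   /* emits PAUSE (GCC/Clang) */
#else
#define cpu_relax() ((void)0)                /* assumption: no hint available */
#endif

typedef struct { atomic_int locked; } spin_lock_t;   /* 0 = free, 1 = held */

/* Test-and-set: one atomic exchange, like the XCHG in the routine above. */
static bool spin_try_lock(spin_lock_t *l)
{
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

static void spin_lock(spin_lock_t *l)
{
    for (;;) {
        /* Test: spin on a plain load, pausing, until the lock looks free. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0)
            cpu_relax();
        if (spin_try_lock(l))
            return;
    }
}

static void spin_unlock(spin_lock_t *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

Spinning on the load keeps waiting processors from issuing locked bus operations until the lock appears free; only then is the exchange attempted.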
8.10.6.2 Potential Usage of MONITOR/MWAIT in C0 Idle Loops
An operating system may implement different handlers for different idle states. A
typical OS idle loop on an ACPI-compatible OS is shown in Example 8-24:
Example 8-24. A Typical OS Idle Loop
// WorkQueue is a memory location indicating there is a thread
// ready to run. A non-zero value for WorkQueue is assumed to
// indicate the presence of work to be scheduled on the processor.
// The idle loop is entered with interrupts disabled.
WHILE (1) {
IF (WorkQueue) THEN {
// Schedule work at WorkQueue.
} ELSE {
// No work to do - wait in appropriate C-state handler depending
// on Idle time accumulated
IF (IdleTime >= IdleTimeThreshhold) THEN {
// Call appropriate C1, C2, C3 state handler, C1 handler
// shown below
}
}
}
// C1 handler uses a Halt instruction
VOID C1Handler()
{ STI
HLT
}
The MONITOR and MWAIT instructions may be considered for use in the C0 idle state loops, if
MONITOR and MWAIT are supported.
Example 8-25. An OS Idle Loop with MONITOR/MWAIT in the C0 Idle Loop
// WorkQueue is a memory location indicating there is a thread
// ready to run. A non-zero value for WorkQueue is assumed to
// indicate the presence of work to be scheduled on the processor.
// The following example assumes that the necessary padding has been
// added surrounding WorkQueue to eliminate false wakeups
// The idle loop is entered with interrupts disabled.
WHILE (1) {
IF (WorkQueue) THEN {
// Schedule work at WorkQueue.
} ELSE {
// No work to do - wait in appropriate C-state handler depending
// on Idle time accumulated.
IF (IdleTime >= IdleTimeThreshhold) THEN {
// Call appropriate C1, C2, C3 state handler, C1
// handler shown below
MONITOR WorkQueue // Setup of EAX with WorkQueue
// LinearAddress,
// ECX, EDX = 0
IF (WorkQueue == 0) THEN { // still no work posted: safe to wait
MWAIT
}
}
}
}
// C1 handler uses a Halt instruction.
VOID C1Handler()
{ STI
HLT
}
8.10.6.3 Halt Idle Logical Processors
If one of two logical processors is idle or in a spin-wait loop of long duration, explicitly
halt that processor by means of a HLT instruction.
In an MP system, operating systems can place idle processors into a loop that
continuously checks the run queue for runnable software tasks. Logical processors that
execute idle loops consume a significant amount of the core's execution resources that
might otherwise be used by the other logical processors in the physical package. For
this reason, halting idle logical processors optimizes performance (see footnote 10).
If all logical processors within a physical package are halted, the processor will enter
a power-saving state.
8.10.6.4 Potential Usage of MONITOR/MWAIT in C1 Idle Loops
An operating system may also consider replacing HLT with MONITOR/MWAIT in its C1
idle loop. An example is shown in Example 8-26:
Example 8-26. An OS Idle Loop with MONITOR/MWAIT in the C1 Idle Loop
// WorkQueue is a memory location indicating there is a thread
// ready to run. A non-zero value for WorkQueue is assumed to
// indicate the presence of work to be scheduled on the processor.
// The following example assumes that the necessary padding has been
// added surrounding WorkQueue to eliminate false wakeups
// The idle loop is entered with interrupts disabled.
WHILE (1) {
IF (WorkQueue) THEN {
// Schedule work at WorkQueue
} ELSE {
// No work to do - wait in appropriate C-state handler depending
// on Idle time accumulated
IF (IdleTime >= IdleTimeThreshhold) THEN {
// Call appropriate C1, C2, C3 state handler, C1
// handler shown below
}
}
}
// C1 handler uses MONITOR/MWAIT in place of a Halt instruction
VOID C1Handler()
{
MONITOR WorkQueue // Setup of EAX with WorkQueue LinearAddress,
// ECX, EDX = 0
IF (WorkQueue == 0) THEN { // still no work posted: safe to wait
STI
MWAIT // EAX, ECX = 0
}
}
10. Excessive transitions into and out of the HALT state could also incur performance penalties.
Operating systems should evaluate the performance trade-offs for their operating system.
8.10.6.5 Guidelines for Scheduling Threads on Logical Processors Sharing
Execution Resources
Because the logical processors share execution resources, the order in which threads
are dispatched to logical processors for execution can affect the overall efficiency of a
system. The following guidelines are recommended for scheduling threads for execution.
• Dispatch threads to one logical processor per processor core before dispatching
threads to the other logical processor sharing execution resources in the same
processor core.
• In an MP system with two or more physical packages, distribute threads out over
all the physical processors, rather than concentrate them in one or two physical
processors.
• Use processor affinity to assign a thread to a specific processor core or package,
depending on the cache-sharing topology. The practice increases the chance that
the processor's caches will contain some of the thread's code and data when it is
dispatched for execution after being suspended.
8.10.6.6 Eliminate Execution-Based Timing Loops
Intel discourages the use of timing loops that depend on a processor's execution
speed to measure time. There are several reasons:
• Timing loops cause problems when they are calibrated on an IA-32 processor
running at one clock speed and then executed on a processor running at another
clock speed.
• Routines for calibrating execution-based timing loops produce unpredictable
results when run on an IA-32 processor supporting Intel Hyper-Threading
Technology. This is due to the sharing of execution resources between the logical
processors within a physical package.
To avoid the problems described, timing loop routines must use a timing mechanism
for the loop that does not depend on the execution speed of the logical processors in
the system. The following sources are generally available:
• A high resolution system timer (for example, an Intel 8254).
• A high resolution timer within the processor (such as the local APIC timer or the
time-stamp counter).
For additional information, see the Intel 64 and IA-32 Architectures Optimization
Reference Manual.
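As a user-level analogue of this guidance, a delay can be bounded by a clock rather than by a calibrated iteration count. The sketch below assumes a POSIX environment and uses clock_gettime with CLOCK_MONOTONIC as the time source; that choice is an assumption for illustration, since system software would read one of the hardware timers listed above. The names now_ns and delay_ns are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/*
 * Waits for at least ns nanoseconds and returns the time actually spent.
 * The loop exit depends on the clock, not on how fast the loop body runs.
 */
static uint64_t delay_ns(uint64_t ns)
{
    uint64_t start = now_ns();
    uint64_t elapsed;
    do {
        elapsed = now_ns() - start;
    } while (elapsed < ns);
    return elapsed;
}
```

Because the exit condition compares clock readings, the delay holds whether the loop runs on a faster processor or shares execution resources with another logical processor.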
8.10.6.7 Place Locks and Semaphores in Aligned, 128-Byte Blocks of
Memory
When software uses locks or semaphores to synchronize processes, threads, or other
code sections, Intel recommends that only one lock or semaphore be present within
a cache line (or 128-byte sector, if 128-byte sector is supported). In processors based
on Intel NetBurst microarchitecture (which support 128-byte sectors consisting of two
cache lines), following this recommendation means that each lock or semaphore
should be contained in a 128-byte block of memory that begins on a 128-byte
boundary. The practice minimizes the bus traffic required to service locks.
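In C11 this layout can be expressed with alignas, as the following sketch suggests. The names padded_lock_t, lock_table, lock_offset_in_block, and LOCK_BLOCK_SIZE are hypothetical; the 128-byte figure is the sector size named above and would differ on processors with other line or sector sizes.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_BLOCK_SIZE 128   /* 128-byte sector, as described in the text */

/* Aligning the member pads the whole struct out to one full block. */
typedef struct {
    alignas(LOCK_BLOCK_SIZE) volatile int value;   /* the lock word */
} padded_lock_t;

static_assert(sizeof(padded_lock_t) == LOCK_BLOCK_SIZE,
              "one lock per 128-byte block");
static_assert(alignof(padded_lock_t) == LOCK_BLOCK_SIZE,
              "each block begins on a 128-byte boundary");

/* Each array element occupies its own aligned block. */
static padded_lock_t lock_table[4];

/* Offset of a lock within its 128-byte block; 0 when correctly aligned. */
static size_t lock_offset_in_block(const padded_lock_t *l)
{
    return (size_t)((uintptr_t)l % LOCK_BLOCK_SIZE);
}
```

The static assertions make the compiler reject the layout if padding or alignment ever changes, so two locks cannot end up in the same block unnoticed.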
CHAPTER 9
PROCESSOR MANAGEMENT AND INITIALIZATION
This chapter describes the facilities provided for managing processor-wide functions
and for initializing the processor. The subjects covered include: processor
initialization, x87 FPU initialization, processor configuration, feature determination, mode
switching, the MSRs (in the Pentium, P6 family, Pentium 4, and Intel Xeon
processors), and the MTRRs (in the P6 family, Pentium 4, and Intel Xeon processors).
9.1 INITIALIZATION OVERVIEW
Following power-up or an assertion of the RESET# pin, each processor on the system
bus performs a hardware initialization of the processor (known as a hardware reset)
and an optional built-in self-test (BIST). A hardware reset sets each processor's
registers to a known state and places the processor in real-address mode. It also
invalidates the internal caches, translation lookaside buffers (TLBs) and the branch
target buffer (BTB). At this point, the action taken depends on the processor family:
• Pentium 4 and Intel Xeon processors: All the processors on the system bus
(including a single processor in a uniprocessor system) execute the multiple
processor (MP) initialization protocol. The processor that is selected through this
protocol as the bootstrap processor (BSP) then immediately starts executing
software-initialization code in the current code segment beginning at the offset in
the EIP register. The application (non-BSP) processors (APs) go into a Wait For
Startup IPI (SIPI) state while the BSP is executing initialization code. See Section
8.4, "Multiple-Processor (MP) Initialization," for more details. Note that in a
uniprocessor system, the single Pentium 4 or Intel Xeon processor automatically
becomes the BSP.
• P6 family processors: The action taken is the same as for the Pentium 4 and
Intel Xeon processors (as described in the previous paragraph).
• Pentium processors: In either a single- or dual-processor system, a single
Pentium processor is always pre-designated as the primary processor. Following
a reset, the primary processor behaves as follows in both single- and
dual-processor systems. Using the dual-processor (DP) ready initialization protocol,
the primary processor immediately starts executing software-initialization code
in the current code segment beginning at the offset in the EIP register. The
secondary processor (if there is one) goes into a halt state.
• Intel486 processor: The primary processor (or single processor in a
uniprocessor system) immediately starts executing software-initialization code in the
current code segment beginning at the offset in the EIP register. (The Intel486
does not automatically execute a DP or MP initialization protocol to determine
which processor is the primary processor.)
9-2 Vol. 3
PROCESSOR MANAGEMENT AND INITIALIZATION
The software-initialization code performs all system-specific initialization of the BSP
or primary processor and the system logic.
At this point, for MP (or DP) systems, the BSP (or primary) processor wakes up each
AP (or secondary) processor to enable those processors to execute self-configuration
code.
When all processors are initialized, configured, and synchronized, the BSP or primary
processor begins executing an initial operating-system or executive task.
The x87 FPU is also initialized to a known state during hardware reset. x87 FPU
software initialization code can then be executed to perform operations such as setting
the precision of the x87 FPU and the exception masks. No special initialization of the
x87 FPU is required to switch operating modes.
Asserting the INIT# pin on the processor invokes a similar response to a hardware
reset. The major difference is that during an INIT, the internal caches, MSRs, MTRRs,
and x87 FPU state are left unchanged (although the TLBs and BTB are invalidated as
with a hardware reset). An INIT provides a method for switching from protected to
real-address mode while maintaining the contents of the internal caches.
9.1.1 Processor State After Reset
Table 9-1 shows the state of the flags and other registers following power-up for the
Pentium 4, Intel Xeon, P6 family, and Pentium processors. The state of control
register CR0 is 60000010H (see Figure 9-1). This places the processor in real-
address mode with paging disabled.
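The reset value of CR0 can be checked by decoding it bit by bit. The following is a minimal sketch, not part of the manual: the helper name decode_cr0 and the flag table are illustrative assumptions, and bit 4 is labeled ET here although Figure 9-1 shows it as "not used".

```python
# Bit positions of the CR0 flags discussed in this chapter.
# Bit 4 is labeled ET here; Figure 9-1 shows it as "(Not used): 1".
CR0_FLAGS = {
    "PE": 0, "MP": 1, "EM": 2, "TS": 3, "ET": 4, "NE": 5,
    "WP": 16, "AM": 18, "NW": 29, "CD": 30, "PG": 31,
}

def decode_cr0(value):
    """Return a dict mapping each CR0 flag name to 0 or 1 (illustrative helper)."""
    return {name: (value >> bit) & 1 for name, bit in CR0_FLAGS.items()}

reset_state = decode_cr0(0x60000010)
set_flags = sorted(name for name, bit in reset_state.items() if bit)
print(set_flags)  # ['CD', 'ET', 'NW']
```

Only CD, NW, and bit 4 are set, so PE = 0 (real-address mode) and PG = 0 (paging disabled), which matches the description above.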
9.1.2 Processor Built-In Self-Test (BIST)
Hardware may request that the BIST be performed at power-up. The EAX register is
cleared (0H) if the processor passes the BIST. A nonzero value in the EAX register
after the BIST indicates that a processor fault was detected. If the BIST is not
requested, the content of the EAX register after a hardware reset is 0H.
The overhead for performing a BIST varies between processor families. For example,
the BIST takes approximately 30 million processor clock periods to execute on the
Pentium 4 processor. This clock count is model-specific; Intel reserves the right to
change the number of periods for any Intel 64 or IA-32 processor, without notification.
Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT

| Register | Pentium 4 and Intel Xeon Processor | P6 Family Processor | Pentium Processor |
|---|---|---|---|
| EFLAGS [1] | 00000002H | 00000002H | 00000002H |
| EIP | 0000FFF0H | 0000FFF0H | 0000FFF0H |
| CR0 | 60000010H [2] | 60000010H [2] | 60000010H [2] |
| CR2, CR3, CR4 | 00000000H | 00000000H | 00000000H |
| CS | Selector = F000H; Base = FFFF0000H; Limit = FFFFH; AR = Present, R/W, Accessed | Selector = F000H; Base = FFFF0000H; Limit = FFFFH; AR = Present, R/W, Accessed | Selector = F000H; Base = FFFF0000H; Limit = FFFFH; AR = Present, R/W, Accessed |
| SS, DS, ES, FS, GS | Selector = 0000H; Base = 00000000H; Limit = FFFFH; AR = Present, R/W, Accessed | Selector = 0000H; Base = 00000000H; Limit = FFFFH; AR = Present, R/W, Accessed | Selector = 0000H; Base = 00000000H; Limit = FFFFH; AR = Present, R/W, Accessed |
| EDX | 00000FxxH | 000n06xxH [3] | 000005xxH |
| EAX | 0 [4] | 0 [4] | 0 [4] |
| EBX, ECX, ESI, EDI, EBP, ESP | 00000000H | 00000000H | 00000000H |
| ST0 through ST7 [5] | Pwr up or Reset: +0.0; FINIT/FNINIT: Unchanged | Pwr up or Reset: +0.0; FINIT/FNINIT: Unchanged | Pwr up or Reset: +0.0; FINIT/FNINIT: Unchanged |
| x87 FPU Control Word [5] | Pwr up or Reset: 0040H; FINIT/FNINIT: 037FH | Pwr up or Reset: 0040H; FINIT/FNINIT: 037FH | Pwr up or Reset: 0040H; FINIT/FNINIT: 037FH |
| x87 FPU Status Word [5] | Pwr up or Reset: 0000H; FINIT/FNINIT: 0000H | Pwr up or Reset: 0000H; FINIT/FNINIT: 0000H | Pwr up or Reset: 0000H; FINIT/FNINIT: 0000H |
| x87 FPU Tag Word [5] | Pwr up or Reset: 5555H; FINIT/FNINIT: FFFFH | Pwr up or Reset: 5555H; FINIT/FNINIT: FFFFH | Pwr up or Reset: 5555H; FINIT/FNINIT: FFFFH |
| x87 FPU Data Operand and CS Seg. Selectors [5] | Pwr up or Reset: 0000H; FINIT/FNINIT: 0000H | Pwr up or Reset: 0000H; FINIT/FNINIT: 0000H | Pwr up or Reset: 0000H; FINIT/FNINIT: 0000H |
| x87 FPU Data Operand and Inst. Pointers [5] | Pwr up or Reset: 00000000H; FINIT/FNINIT: 00000000H | Pwr up or Reset: 00000000H; FINIT/FNINIT: 00000000H | Pwr up or Reset: 00000000H; FINIT/FNINIT: 00000000H |
| MM0 through MM7 [5] | Pwr up or Reset: 0000000000000000H; INIT or FINIT/FNINIT: Unchanged | Pentium II and Pentium III processors only. Pwr up or Reset: 0000000000000000H; INIT or FINIT/FNINIT: Unchanged | Pentium with MMX Technology only. Pwr up or Reset: 0000000000000000H; INIT or FINIT/FNINIT: Unchanged |
| XMM0 through XMM7 | Pwr up or Reset: 0000000000000000H; INIT: Unchanged | Pentium III processor only. Pwr up or Reset: 0000000000000000H; INIT: Unchanged | NA |
| MXCSR | Pwr up or Reset: 1F80H; INIT: Unchanged | Pentium III processor only. Pwr up or Reset: 1F80H; INIT: Unchanged | NA |
| GDTR, IDTR | Base = 00000000H; Limit = FFFFH; AR = Present, R/W | Base = 00000000H; Limit = FFFFH; AR = Present, R/W | Base = 00000000H; Limit = FFFFH; AR = Present, R/W |
| LDTR, Task Register | Selector = 0000H; Base = 00000000H; Limit = FFFFH; AR = Present, R/W | Selector = 0000H; Base = 00000000H; Limit = FFFFH; AR = Present, R/W | Selector = 0000H; Base = 00000000H; Limit = FFFFH; AR = Present, R/W |
| DR0, DR1, DR2, DR3 | 00000000H | 00000000H | 00000000H |
| DR6 | FFFF0FF0H | FFFF0FF0H | FFFF0FF0H |
| DR7 | 00000400H | 00000400H | 00000400H |
| Time-Stamp Counter | Power up or Reset: 0H; INIT: Unchanged | Power up or Reset: 0H; INIT: Unchanged | Power up or Reset: 0H; INIT: Unchanged |
| Perf. Counters and Event Select | Power up or Reset: 0H; INIT: Unchanged | Power up or Reset: 0H; INIT: Unchanged | Power up or Reset: 0H; INIT: Unchanged |
| All Other MSRs | Pwr up or Reset: Undefined; INIT: Unchanged | Pwr up or Reset: Undefined; INIT: Unchanged | Pwr up or Reset: Undefined; INIT: Unchanged |
| Data and Code Cache, TLBs | Invalid | Invalid | Invalid |
| Fixed MTRRs | Pwr up or Reset: Disabled; INIT: Unchanged | Pwr up or Reset: Disabled; INIT: Unchanged | Not Implemented |
| Variable MTRRs | Pwr up or Reset: Disabled; INIT: Unchanged | Pwr up or Reset: Disabled; INIT: Unchanged | Not Implemented |
| Machine-Check Architecture | Pwr up or Reset: Undefined; INIT: Unchanged | Pwr up or Reset: Undefined; INIT: Unchanged | Not Implemented |
| APIC | Pwr up or Reset: Enabled; INIT: Unchanged | Pwr up or Reset: Enabled; INIT: Unchanged | Pwr up or Reset: Enabled; INIT: Unchanged |

NOTES:
1. The 10 most-significant bits of the EFLAGS register are undefined following a reset. Software
should not depend on the states of any of these bits.
2. The CD and NW flags are unchanged, bit 4 is set to 1, all other bits are cleared.
3. Where "n" is the Extended Model Value for the respective processor.
4. If Built-In Self-Test (BIST) is invoked on power up or reset, EAX is 0 only if all tests passed. (BIST
cannot be invoked during an INIT.)
5. The state of the x87 FPU and MMX registers is not changed by the execution of an INIT.
9.1.3 Model and Stepping Information
Following a hardware reset, the EDX register contains component identification and
revision information (see Figure 9-2). For example, the model, family, and processor
type returned for the first processor in the Intel Pentium 4 family is as follows: model
(0000B), family (1111B), and processor type (00B).
The stepping ID field contains a unique identifier for the processor's stepping ID or
revision level. The extended family and extended model fields were added to the
IA-32 architecture in the Pentium 4 processors.
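The field layout of the EDX value can be illustrated with a short decoder. This is a sketch under stated assumptions: the function name decode_signature is hypothetical, the sample value 00000F12H is made up for illustration (it matches the 00000FxxH pattern in Table 9-1), and the bit positions follow the standard family/model/stepping signature layout.

```python
def decode_signature(edx):
    """Split the post-reset EDX value into its version fields (illustrative helper)."""
    return {
        "stepping": edx & 0xF,             # bits 3:0
        "model": (edx >> 4) & 0xF,         # bits 7:4
        "family": (edx >> 8) & 0xF,        # bits 11:8
        "type": (edx >> 12) & 0x3,         # bits 13:12
        "ext_model": (edx >> 16) & 0xF,    # bits 19:16
        "ext_family": (edx >> 20) & 0xFF,  # bits 27:20
    }

# Hypothetical sample value of the form 00000FxxH.
fields = decode_signature(0x00000F12)
print(fields["family"], fields["model"], fields["stepping"])  # 15 1 2
```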
Figure 9-1. Contents of CR0 Register after Reset
[Figure: CR0 bit layout after reset. PG (bit 31) = 0, paging disabled; CD (bit 30) = 1,
caching disabled; NW (bit 29) = 1, not write-through disabled; AM (bit 18) = 0, alignment
check disabled; WP (bit 16) = 0, write-protect disabled; NE (bit 5) = 0, external x87 FPU
error reporting; bit 4 (not used) = 1; TS (bit 3) = 0, no task switch; EM (bit 2) = 0,
x87 FPU instructions not trapped; MP (bit 1) = 0, WAIT/FWAIT instructions not trapped;
PE (bit 0) = 0, real-address mode. Remaining bits are reserved.]
Figure 9-2. Version Information in the EDX Register after Reset
[Figure: EDX field layout. Extended Family; Extended Model (bits 19:16); Processor Type
(bits 13:12); Family (bits 11:8, 1111B for the Pentium 4 processor family); Model
(bits 7:4, beginning with 0000B); Stepping ID (bits 3:0). Remaining bits are reserved.]
9.1.4 First Instruction Executed
The first instruction that is fetched and executed following a hardware reset is
located at physical address FFFFFFF0H. This address is 16 bytes below the
processor's uppermost physical address. The EPROM containing the software-
initialization code must be located at this address.
The address FFFFFFF0H is beyond the 1-MByte addressable range of the processor
while in real-address mode. The processor is initialized to this starting address as
follows. The CS register has two parts: the visible segment selector part and the
hidden base address part. In real-address mode, the base address is normally
formed by shifting the 16-bit segment selector value 4 bits to the left to produce a
20-bit base address. However, during a hardware reset, the segment selector in the
CS register is loaded with F000H and the base address is loaded with FFFF0000H. The
starting address is thus formed by adding the base address to the value in the EIP
register (that is, FFFF0000H + FFF0H = FFFFFFF0H).
The first time the CS register is loaded with a new value after a hardware reset, the
processor will follow the normal rule for address translation in real-address mode
(that is, [CS base address = CS segment selector * 16]). To ensure that the base
address in the CS register remains unchanged until the EPROM-based software-
initialization code is completed, the code must not contain a far jump or far call or
allow an interrupt to occur (which would cause the CS selector value to be changed).
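The address arithmetic above can be worked through in a few lines. This is only an illustrative sketch; the function names are assumptions, not part of the architecture.

```python
def linear_address(cs_base, eip):
    """Address formed by adding the hidden CS base to EIP (32-bit wrap)."""
    return (cs_base + eip) & 0xFFFFFFFF

def real_mode_base(selector):
    """Normal real-address-mode rule: base = selector * 16."""
    return (selector & 0xFFFF) << 4

# At reset: CS selector = F000H, hidden base = FFFF0000H, EIP = FFF0H.
first_fetch = linear_address(0xFFFF0000, 0xFFF0)

# If CS were reloaded with F000H (for example by a far jump), the hidden base
# would be recomputed from the selector and the same offset would map below 1 MByte.
after_reload = linear_address(real_mode_base(0xF000), 0xFFF0)

print(hex(first_fetch), hex(after_reload))  # 0xfffffff0 0xffff0
```

The second value shows why the initialization code must avoid reloading CS until it no longer depends on the FFFF0000H base.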
9.2 X87 FPU INITIALIZATION
Software-initialization code can determine whether the processor contains an
x87 FPU by using the CPUID instruction. The code must then initialize the x87 FPU
and set flags in control register CR0 to reflect the state of the x87 FPU environment.
A hardware reset places the x87 FPU in the state shown in Table 9-1. This state is
different from the state the x87 FPU is placed in following the execution of an FINIT
or FNINIT instruction (also shown in Table 9-1). If the x87 FPU is to be used, the
software-initialization code should execute an FINIT/FNINIT instruction following a
hardware reset. These instructions tag all data registers as empty, mask all
floating-point exceptions (control word 037FH, as listed in Table 9-1), set the
TOP-of-stack value to 0, and select the default rounding and precision controls
setting (round to nearest and 64-bit precision).
If the processor is reset by asserting the INIT# pin, the x87 FPU state is not changed.
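The difference between the two x87 FPU control word values in Table 9-1 (0040H after reset, 037FH after FINIT/FNINIT) can be seen by decoding their fields. The sketch below assumes the standard control word layout (exception masks in bits 5:0, precision control in bits 9:8, rounding control in bits 11:10); the helper name is hypothetical.

```python
PRECISION = {0b00: "24-bit", 0b10: "53-bit", 0b11: "64-bit"}
ROUNDING = {0b00: "nearest", 0b01: "down", 0b10: "up", 0b11: "toward zero"}

def decode_fpu_control_word(cw):
    """Decode the mask, precision, and rounding fields (illustrative helper)."""
    return {
        "exception_masks": cw & 0x3F,                            # bits 5:0
        "precision": PRECISION.get((cw >> 8) & 0x3, "reserved"),  # bits 9:8
        "rounding": ROUNDING[(cw >> 10) & 0x3],                   # bits 11:10
    }

after_reset = decode_fpu_control_word(0x0040)
after_finit = decode_fpu_control_word(0x037F)
print(after_reset)
print(after_finit)
```

After FINIT/FNINIT all six exception mask bits are set and the precision field selects 64-bit precision; after a hardware reset the mask bits are clear.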
9.2.1 Configuring the x87 FPU Environment
Initialization code must load the appropriate values into the MP, EM, and NE flags of
control register CR0. These bits are cleared on hardware reset of the processor.
Table 9-2 shows the suggested settings for these flags, depending on the IA-32
processor being initialized. Initialization code can test for the type of processor
present before setting or clearing these flags.
The EM flag determines whether floating-point instructions are executed by the x87
FPU (EM is cleared) or a device-not-available exception (#NM) is generated for all
floating-point instructions so that an exception handler can emulate the floating-
point operation (EM = 1). Ordinarily, the EM flag is cleared when an x87 FPU or math
coprocessor is present and set if they are not present. If the EM flag is set and no x87
FPU, math coprocessor, or floating-point emulator is present, the processor will hang
when a floating-point instruction is executed.
The MP flag determines whether WAIT/FWAIT instructions react to the setting of the
TS flag. If the MP flag is clear, WAIT/FWAIT instructions ignore the setting of the TS
flag; if the MP flag is set, they will generate a device-not-available exception (#NM)
if the TS flag is set. Generally, the MP flag should be set for processors with an
integrated x87 FPU and clear for processors without an integrated x87 FPU and without a
math coprocessor present. However, an operating system can choose to save the
floating-point context at every context switch, in which case there would be no need
to set the MP bit.
Table 2-1 shows the actions taken for floating-point and WAIT/FWAIT instructions
based on the settings of the EM, MP, and TS flags.
The NE flag determines whether unmasked floating-point exceptions are handled by
generating a floating-point error exception internally (NE is set, native mode) or
through an external interrupt (NE is cleared). In systems where an external interrupt
controller is used to invoke numeric exception handlers (such as MS-DOS-based
systems), the NE bit should be cleared.
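The interaction of the EM, MP, and TS flags can be summarized as two small predicates. This is a simplified sketch, not the full table referenced above: the function names are assumptions, and the rule that a set TS flag also causes #NM for floating-point instructions comes from the general definition of the TS flag rather than from the paragraphs above.

```python
def fp_instruction_raises_nm(em, ts):
    """x87 floating-point instruction: #NM if EM or TS is set."""
    return bool(em or ts)

def wait_raises_nm(mp, ts):
    """WAIT/FWAIT: #NM only if both MP and TS are set; EM is ignored."""
    return bool(mp and ts)

# Enumerate all flag combinations.
for em in (0, 1):
    for mp in (0, 1):
        for ts in (0, 1):
            print(em, mp, ts,
                  "FP:#NM" if fp_instruction_raises_nm(em, ts) else "FP:execute",
                  "WAIT:#NM" if wait_raises_nm(mp, ts) else "WAIT:execute")
```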
9.2.2 Setting the Processor for x87 FPU Software Emulation
Setting the EM flag causes the processor to generate a device-not-available
exception (#NM) and trap to a software exception handler whenever it encounters a
floating-point instruction. (Table 9-2 shows when it is appropriate to use this flag.)
Setting this flag has two functions:
• It allows x87 FPU code to run on an IA-32 processor that has neither an
  integrated x87 FPU nor is connected to an external math coprocessor, by using a
  floating-point emulator.
• It allows floating-point code to be executed using a special or nonstandard
  floating-point emulator, selected for a particular application, regardless of
  whether an x87 FPU or math coprocessor is present.
To emulate floating-point instructions, the EM, MP, and NE flags in control register CR0
should be set as shown in Table 9-3.
Regardless of the value of the EM bit, the Intel486 SX processor generates a device-
not-available exception (#NM) upon encountering any floating-point instruction.

Table 9-2. Recommended Settings of EM and MP Flags on IA-32 Processors

| EM | MP | NE | IA-32 processor |
|---|---|---|---|
| 1 | 0 | 1 | Intel486 SX, Intel386 DX, and Intel386 SX processors only, without the presence of a math coprocessor. |
| 0 | 1 | 1 or 0 * | Pentium 4, Intel Xeon, P6 family, Pentium, Intel486 DX, and Intel 487 SX processors, and Intel386 DX and Intel386 SX processors when a companion math coprocessor is present. |
| 0 | 1 | 1 or 0 * | More recent Intel 64 or IA-32 processors |

NOTE:
* The setting of the NE flag depends on the operating system being used.
9.3 CACHE ENABLING
IA-32 processors (beginning with the Intel486 processor) and Intel 64 processors
contain internal instruction and data caches. These caches are enabled by clearing
the CD and NW flags in control register CR0. (They are set during a hardware reset.)
Because all internal cache lines are invalid following reset initialization, it is not
necessary to invalidate the cache before enabling caching. Any external caches may
require initialization and invalidation using a system-specific initialization and
invalidation code sequence.
Depending on the hardware and operating system or executive requirements,
additional configuration of the processor's caching facilities will probably be required.
Beginning with the Intel486 processor, page-level caching can be controlled with the
PCD and PWT flags in page-directory and page-table entries. Beginning with the P6
family processors, the memory type range registers (MTRRs) control the caching
characteristics of the regions of physical memory. (For the Intel486 and Pentium
processors, external hardware can be used to control the caching characteristics of
regions of physical memory.) See Chapter 11, "Memory Cache Control," for detailed
information on configuration of the caching facilities in the Pentium 4, Intel Xeon, and
P6 family processors and system memory.
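As a small worked example, the CR0 image that enables the internal caches can be derived from the reset value by clearing CD (bit 30) and NW (bit 29). The helper name below is an assumption; on real hardware the resulting value would be written with a MOV to CR0 at privilege level 0.

```python
CR0_NW = 1 << 29  # not write-through
CR0_CD = 1 << 30  # cache disable

def enable_caching(cr0):
    """Return a CR0 image with CD and NW cleared (illustrative helper)."""
    return cr0 & ~(CR0_CD | CR0_NW) & 0xFFFFFFFF

print(hex(enable_caching(0x60000010)))  # 0x10
```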
Table 9-3. Software Emulation Settings of EM, MP, and NE Flags

| CR0 Bit | Value |
|---|---|
| EM | 1 |
| MP | 0 |
| NE | 1 |
9.4 MODEL-SPECIFIC REGISTERS (MSRS)
Most IA-32 processors (starting from Pentium processors) and Intel 64 processors
contain model-specific registers (MSRs). A given MSR may not be supported across
all families and models for Intel 64 and IA-32 processors. Some MSRs are designated
as architectural to simplify software programming; a feature introduced by an
architectural MSR is expected to be supported in future processors. Non-architectural
MSRs are not guaranteed to be supported or to have the same functions on future
processors.
MSRs that provide control for a number of hardware and software-related features
include:
• Performance-monitoring counters (see Chapter 20, "Introduction to Virtual-
  Machine Extensions").
• Debug extensions (see Chapter 20, "Introduction to Virtual-Machine
  Extensions").
• Machine-check exception capability and its accompanying machine-check
  architecture (see Chapter 15, "Machine-Check Architecture").
• MTRRs (see Section 11.11, "Memory Type Range Registers (MTRRs)").
• Thermal and power management.
• Instruction-specific support (for example: SYSENTER, SYSEXIT, SWAPGS, etc.).
• Processor feature/mode support (for example: IA32_EFER,
  IA32_FEATURE_CONTROL).
The MSRs can be read and written using the RDMSR and WRMSR instructions,
respectively.
When performing software initialization of an IA-32 or Intel 64 processor, many of
the MSRs will need to be initialized to set up things like performance-monitoring
events, run-time machine checks, and memory types for physical memory.
Lists of available performance-monitoring events are given in Appendix A,
"Performance Monitoring Events," and lists of available MSRs are given in Appendix B,
"Model-Specific Registers (MSRs)." The references earlier in this section show where
the functions of the various groups of MSRs are described in this manual.
9.5 MEMORY TYPE RANGE REGISTERS (MTRRS)
Memory type range registers (MTRRs) were introduced into the IA-32 architecture
with the Pentium Pro processor. They allow the type of caching (or no caching) to be
specified in system memory for selected physical address ranges. They allow
memory accesses to be optimized for various types of memory such as RAM, ROM,
frame buffer memory, and memory-mapped I/O devices.
Initializing the MTRRs is normally handled by the software initialization
code or BIOS and is not an operating system or executive function. At the very least,
all the MTRRs must be cleared to 0, which selects the uncached (UC) memory type.
See Section 11.11, "Memory Type Range Registers (MTRRs)," for detailed
information on the MTRRs.
9.6 INITIALIZING SSE/SSE2/SSE3/SSSE3 EXTENSIONS
For processors that contain SSE/SSE2/SSE3/SSSE3 extensions, steps must be taken
when initializing the processor to allow execution of these instructions.
1. Check the CPUID feature flags for the presence of the SSE/SSE2/SSE3/SSSE3
   extensions (respectively: EDX bits 25 and 26, ECX bits 0 and 9) and support for
   the FXSAVE and FXRSTOR instructions (EDX bit 24). Also check for support for
   the CLFLUSH instruction (EDX bit 19). The CPUID feature flags are loaded in the
   EDX and ECX registers when the CPUID instruction is executed with a 1 in the
   EAX register.
2. Set the OSFXSR flag (bit 9 in control register CR4) to indicate that the operating
   system supports saving and restoring the SSE/SSE2/SSE3/SSSE3 execution
   environment (XMM and MXCSR registers) with the FXSAVE and FXRSTOR
   instructions, respectively. See Section 2.5, "Control Registers," for a description of the
   OSFXSR flag.
3. Set the OSXMMEXCPT flag (bit 10 in control register CR4) to indicate that the
   operating system supports the handling of SSE/SSE2/SSE3 SIMD floating-point
   exceptions (#XF). See Section 2.5, "Control Registers," for a description of the
   OSXMMEXCPT flag.
4. Set the mask bits and flags in the MXCSR register according to the mode of
   operation desired for SSE/SSE2/SSE3 SIMD floating-point instructions. See
   "MXCSR Control and Status Register" in Chapter 10, "Programming with
   Streaming SIMD Extensions (SSE)," of the Intel 64 and IA-32 Architectures
   Software Developer's Manual, Volume 1, for a detailed description of the bits and
   flags in the MXCSR register.
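Steps 1 through 3 amount to testing and setting individual bits. The sketch below models them on plain integers: the register values passed in are sample inputs, not values read from hardware, and the function and constant names are illustrative assumptions.

```python
# CPUID.1 feature bits named in step 1.
EDX_CLFSH = 1 << 19
EDX_FXSR = 1 << 24
EDX_SSE = 1 << 25
EDX_SSE2 = 1 << 26
ECX_SSE3 = 1 << 0
ECX_SSSE3 = 1 << 9

# CR4 bits named in steps 2 and 3.
CR4_OSFXSR = 1 << 9
CR4_OSXMMEXCPT = 1 << 10

def sse_features(edx, ecx):
    """Report which extensions the CPUID.1 EDX/ECX values advertise."""
    return {
        "CLFLUSH": bool(edx & EDX_CLFSH),
        "FXSR": bool(edx & EDX_FXSR),
        "SSE": bool(edx & EDX_SSE),
        "SSE2": bool(edx & EDX_SSE2),
        "SSE3": bool(ecx & ECX_SSE3),
        "SSSE3": bool(ecx & ECX_SSSE3),
    }

def cr4_for_sse(cr4, edx):
    """Return a CR4 image with OSFXSR and OSXMMEXCPT set if FXSR and SSE exist."""
    if not ((edx & EDX_FXSR) and (edx & EDX_SSE)):
        return cr4  # leave CR4 untouched when the features are absent
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT

# Sample inputs: FXSR, SSE, SSE2 in EDX; SSE3 in ECX.
sample_edx = EDX_FXSR | EDX_SSE | EDX_SSE2
sample_ecx = ECX_SSE3
print(sse_features(sample_edx, sample_ecx))
print(hex(cr4_for_sse(0, sample_edx)))  # 0x600
```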
9.7 SOFTWARE INITIALIZATION FOR REAL-ADDRESS MODE OPERATION
Following a hardware reset (either through a power-up or the assertion of the
RESET# pin) the processor is placed in real-address mode and begins executing
software initialization code from physical address FFFFFFF0H. Software initialization code
must first set up the necessary data structures for handling basic system functions,
such as a real-mode IDT for handling interrupts and exceptions. If the processor is to
remain in real-address mode, software must then load additional operating-system
or executive code modules and data structures to allow reliable execution of
application programs in real-address mode.
If the processor is going to operate in protected mode, software must load the
necessary data structures to operate in protected mode and then switch to protected
mode. The protected-mode data structures that must be loaded are described in
Section 9.8, "Software Initialization for Protected-Mode Operation."
9.7.1 Real-Address Mode IDT
In real-address mode, the only system data structure that must be loaded into
memory is the IDT (also called the "interrupt vector table"). By default, the address
of the base of the IDT is physical address 0H. This address can be changed by using
the LIDT instruction to change the base address value in the IDTR. Software
initialization code needs to load interrupt- and exception-handler pointers into the IDT
before interrupts can be enabled.
The actual interrupt- and exception-handler code can be contained either in EPROM
or RAM; however, the code must be located within the 1-MByte addressable range of
the processor in real-address mode. If the handler code is to be stored in RAM, it
must be loaded along with the IDT.
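A real-address-mode IDT entry is a 4-byte far pointer (16-bit offset followed by 16-bit segment), so the entry for vector n sits at IDT base + 4 * n. The sketch below builds and decodes such an entry; the helper names and the sample handler location F000H:1234H are hypothetical.

```python
import struct

def ivt_entry_address(vector, idt_base=0):
    """Physical address of the entry for a vector (4 bytes per entry)."""
    return idt_base + vector * 4

def pack_ivt_entry(segment, offset):
    """Encode a handler pointer as little-endian offset, then segment."""
    return struct.pack("<HH", offset, segment)

def handler_address(entry):
    """Physical address of the handler that an entry points to."""
    offset, segment = struct.unpack("<HH", entry)
    return (segment << 4) + offset

# Hypothetical NMI handler (vector 2) at F000H:1234H.
entry = pack_ivt_entry(0xF000, 0x1234)
print(ivt_entry_address(2), entry.hex(), hex(handler_address(entry)))
# 8 341200f0 0xf1234
```

The decoded handler address stays below 1 MByte, as the text above requires.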
9.7.2 NMI Interrupt Handling
The NMI interrupt is always enabled (except when multiple NMIs are nested). If the
IDT and the NMI interrupt handler need to be loaded into RAM, there will be a period
of time following hardware reset when an NMI interrupt cannot be handled. During
this time, hardware must provide a mechanism to prevent an NMI interrupt from
halting code execution until the IDT and the necessary NMI handler software are
loaded. Here are two examples of how NMIs can be handled during the initial stages
of processor initialization:
• A simple IDT and NMI interrupt handler can be provided in EPROM. This allows an
  NMI interrupt to be handled immediately after reset initialization.
• The system hardware can provide a mechanism to enable and disable NMIs by
  passing the NMI# signal through an AND gate controlled by a flag in an I/O port.
  Hardware can clear the flag when the processor is reset, and software can set the
  flag when it is ready to handle NMI interrupts.
9.8 SOFTWARE INITIALIZATION FOR PROTECTED-MODE OPERATION
The processor is placed in real-address mode following a hardware reset. At this
point in the initialization process, some basic data structures and code modules must
be loaded into physical memory to support further initialization of the processor, as
described in Section 9.7, "Software Initialization for Real-Address Mode Operation."
Before the processor can be switched to protected mode, the software initialization
code must load a minimum number of protected-mode data structures and code
modules into memory to support reliable operation of the processor in protected
mode. These data structures include the following:
• An IDT.
• A GDT.
• A TSS.
• (Optional) An LDT.
• If paging is to be used, at least one page directory and one page table.
• A code segment that contains the code to be executed when the processor
  switches to protected mode.
• One or more code modules that contain the necessary interrupt and exception
  handlers.
Software initialization code must also initialize the following system registers before
the processor can be switched to protected mode:
• The GDTR.
• (Optional.) The IDTR. This register can also be initialized immediately after
  switching to protected mode, prior to enabling interrupts.
• Control registers CR1 through CR4.
• (Pentium 4, Intel Xeon, and P6 family processors only.) The memory type range
  registers (MTRRs).
With these data structures, code modules, and system registers initialized, the
processor can be switched to protected mode by loading control register CR0 with a
value that sets the PE flag (bit 0).
9.8.1 Protected-Mode System Data Structures
The contents of the protected-mode system data structures loaded into memory
during software initialization depend largely on the type of memory management
the protected-mode operating-system or executive is going to support: flat, flat with
paging, segmented, or segmented with paging.
To implement a flat memory model without paging, software initialization code must
at a minimum load a GDT with one code and one data-segment descriptor. A null
descriptor in the first GDT entry is also required. The stack can be placed in a normal
read/write data segment, so no dedicated descriptor for the stack is required. A flat
memory model with paging also requires a page directory and at least one page table
(unless all pages are 4 MBytes in which case only a page directory is required). See
Section 9.8.3, "Initializing Paging."
Before the GDT can be used, the base address and limit for the GDT must be loaded
into the GDTR register using an LGDT instruction.
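A minimal flat-model GDT (null, code, and data descriptors) and the 6-byte operand that LGDT reads can be sketched as follows. The helper name make_descriptor, the access-byte and flag values (9AH, 92H, and granularity/32-bit flags CH), and the load address 1000H are illustrative assumptions; they are common choices for a 4-GByte flat model, not the only valid ones.

```python
import struct

def make_descriptor(base, limit, access, flags):
    """Pack an 8-byte segment descriptor into a 64-bit integer."""
    return ((limit & 0xFFFF)
            | ((base & 0xFFFF) << 16)
            | (((base >> 16) & 0xFF) << 32)
            | ((access & 0xFF) << 40)
            | (((limit >> 16) & 0xF) << 48)
            | ((flags & 0xF) << 52)
            | (((base >> 24) & 0xFF) << 56))

null_desc = 0
# Base 0, limit FFFFFH with 4-KByte granularity covers 4 GBytes.
code_desc = make_descriptor(0, 0xFFFFF, 0x9A, 0xC)  # present, execute/read
data_desc = make_descriptor(0, 0xFFFFF, 0x92, 0xC)  # present, read/write

gdt = struct.pack("<3Q", null_desc, code_desc, data_desc)

# LGDT operand in 32-bit mode: 16-bit limit (table size - 1), 32-bit base.
GDT_BASE = 0x1000  # hypothetical physical address where the GDT is loaded
gdtr_image = struct.pack("<HI", len(gdt) - 1, GDT_BASE)

print(hex(code_desc), hex(data_desc), gdtr_image.hex())
```

The data descriptor can also serve the stack, which is why no third segment descriptor appears here.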
A multi-segmented model may require additional segments for the operating system,
as well as segments and LDTs for each application program. LDTs require segment
descriptors in the GDT. Some operating systems allocate new segments and LDTs as
they are needed. This provides maximum flexibility for handling a dynamic
programming environment. However, many operating systems use a single LDT for all tasks,
allocating GDT entries in advance. An embedded system, such as a process
controller, might pre-allocate a fixed number of segments and LDTs for a fixed
number of application programs. This would be a simple and efficient way to
structure the software environment of a real-time system.
9.8.2 Initializing Protected-Mode Exceptions and Interrupts
Software initialization code must at a minimum load a protected-mode IDT with a gate
descriptor for each exception vector that the processor can generate. If interrupt or
trap gates are used, the gate descriptors can all point to the same code segment,
which contains the necessary exception handlers. If task gates are used, one TSS
and accompanying code, data, and task segments are required for each exception
handler called with a task gate.
If hardware allows interrupts to be generated, gate descriptors must be provided in
the IDT for one or more interrupt handlers.
Before the IDT can be used, the base address and limit for the IDT must be loaded
into the IDTR register using an LIDT instruction. This operation is typically carried out
immediately after switching to protected mode.
9.8.3 Initializing Paging
Paging is controlled by the PG flag in control register CR0. When this flag is clear (its
state following a hardware reset), the paging mechanism is turned off; when it is set,
paging is enabled. Before setting the PG flag, the following data structures and
registers must be initialized:
• Software must load at least one page directory and one page table into physical
  memory. The page table can be eliminated if the page directory contains a
  directory entry pointing to itself (here, the page directory and page table reside
  in the same page), or if only 4-MByte pages are used.
• Control register CR3 (also called the PDBR register) is loaded with the physical
  base address of the page directory.
• (Optional) Software may provide one set of code and data descriptors in the GDT
  or in an LDT for supervisor mode and another set for user mode.
With this paging initialization complete, paging is enabled and the processor is
switched to protected mode at the same time by loading control register CR0 with an
image in which the PG and PE flags are set. (Paging cannot be enabled before the
processor is switched to protected mode.)
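The CR0 image described above, and the rule that PG cannot be set without PE, can be expressed with a few bit operations. This is a sketch on plain integers: the function names are assumptions, and the 4-KByte alignment check on the page directory base reflects the fact that a page directory occupies one 4-KByte page.

```python
CR0_PE = 1 << 0
CR0_PG = 1 << 31

def cr0_with_paging(cr0):
    """Return a CR0 image with both PE and PG set."""
    return cr0 | CR0_PE | CR0_PG

def is_valid_cr0(cr0):
    """PG = 1 with PE = 0 is not a valid combination."""
    return not ((cr0 & CR0_PG) and not (cr0 & CR0_PE))

def cr3_for_page_directory(phys_base):
    """CR3 image for a page directory; the base must be 4-KByte aligned."""
    if phys_base & 0xFFF:
        raise ValueError("page directory base must be 4-KByte aligned")
    return phys_base

print(hex(cr0_with_paging(0x60000010)))  # 0xe0000011
print(is_valid_cr0(0x80000000))          # False
```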
9.8.4 Initializing Multitasking
If the multitasking mechanism is not going to be used and changes between privilege
levels are not allowed, it is not necessary to load a TSS into memory or to initialize
the task register.
If the multitasking mechanism is going to be used and/or changes between privilege
levels are allowed, software initialization code must load at least one TSS and an
accompanying TSS descriptor. (A TSS is required to change privilege levels because
pointers to the privilege-level 0, 1, and 2 stack segments and the stack pointers for
these stacks are obtained from the TSS.) TSS descriptors must not be marked as
busy when they are created; they should be marked busy by the processor only as a
side-effect of performing a task switch. As with descriptors for LDTs, TSS descriptors
reside in the GDT.
After the processor has switched to protected mode, the LTR instruction can be used
to load a segment selector for a TSS descriptor into the task register. This instruction
marks the TSS descriptor as busy, but does not perform a task switch. The processor
can, however, use the TSS to locate pointers to privilege-level 0, 1, and 2 stacks. The
segment selector for the TSS must be loaded before software performs its first task
switch in protected mode, because a task switch copies the current task state into
the TSS.
After the LTR instruction has been executed, further operations on the task register
are performed by task switching. As with other segments and LDTs, TSSs and TSS
descriptors can be either pre-allocated or allocated as needed.
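
The difference between an available and a busy TSS descriptor can be illustrated with a short C sketch. This is a user-space model for illustration only, not code from this manual, and the helper names (tss_access_byte, tss_is_busy, tss_mark_busy) are hypothetical. It assumes the 32-bit TSS descriptor type encodings 1001B (available) and 1011B (busy), which differ only in bit 1 of the type field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Type-field encodings for a 32-bit TSS descriptor (system segment, S = 0). */
#define TSS32_TYPE_AVAILABLE 0x9u   /* 1001B */
#define TSS_BUSY_BIT         0x2u   /* bit 1 of the type field */
#define DESC_PRESENT         0x80u  /* P flag, bit 7 of the access byte */

/* Hypothetical helper: access byte for a newly created TSS descriptor.
 * Software creates the descriptor as "available" and never sets the busy bit. */
uint8_t tss_access_byte(unsigned dpl)
{
    return (uint8_t)(DESC_PRESENT | ((dpl & 3u) << 5) | TSS32_TYPE_AVAILABLE);
}

/* True when the descriptor's type field carries the busy flag. */
bool tss_is_busy(uint8_t access)
{
    return (access & TSS_BUSY_BIT) != 0;
}

/* Models the change the processor makes to the descriptor when it marks
 * the TSS busy (for example, as a result of LTR or a task switch). */
uint8_t tss_mark_busy(uint8_t access)
{
    return (uint8_t)(access | TSS_BUSY_BIT);
}
```

With DPL 0, the helper produces 89H for a new descriptor; the same descriptor reads 8BH once the processor has marked it busy.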
9.8.5 Initializing IA-32e Mode
On Intel 64 processors, the IA32_EFER MSR is cleared on system reset. The operating
system must be in protected mode with paging enabled before attempting to
initialize IA-32e mode. IA-32e mode operation also requires physical-address extensions
with four levels of enhanced paging structures (see Section 4.5, "IA-32e Paging").
Operating systems should follow this sequence to initialize IA-32e mode:
1. Starting from protected mode, disable paging by setting CR0.PG = 0. Use the
MOV CR0 instruction to disable paging (the instruction must be located in an
identity-mapped page).
2. Enable physical-address extensions (PAE) by setting CR4.PAE = 1. Failure to
enable PAE will result in a #GP fault when an attempt is made to initialize IA-32e
mode.
3. Load CR3 with the physical base address of the Level 4 page map table (PML4).
4. Enable IA-32e mode by setting IA32_EFER.LME = 1.
5. Enable paging by setting CR0.PG = 1. This causes the processor to set the
IA32_EFER.LMA bit to 1. The MOV CR0 instruction that enables paging and the
following instructions must be located in an identity-mapped page (until such
time that a branch to non-identity mapped pages can be effected).
64-bit mode paging tables must be located in the first 4 GBytes of physical-address
space prior to activating IA-32e mode. This is necessary because the MOV CR3
instruction used to initialize the page-directory base must be executed in legacy
mode prior to activating IA-32e mode (setting CR0.PG = 1 to enable paging).
Because MOV CR3 is executed in protected mode, only the lower 32 bits of the
register are written, limiting the table location to the low 4 GBytes of memory.
Software can relocate the page tables anywhere in physical memory after IA-32e mode
is activated.
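
The 4-GByte restriction follows directly from the 32-bit register write. The C sketch below is an illustration under stated assumptions, not part of this manual: the function names are hypothetical, and the 4-KByte alignment check assumes that the PML4 table, like other paging structures, starts on a page boundary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: can this PML4 physical address be loaded with a
 * MOV CR3 executed in legacy protected mode (a 32-bit register write)? */
bool pml4_loadable_from_legacy_mode(uint64_t pml4_phys)
{
    bool page_aligned = (pml4_phys & 0xFFFu) == 0;      /* assumed 4-KByte alignment */
    bool below_4g     = pml4_phys <= 0xFFFFFFFFull;     /* fits in 32 bits */
    return page_aligned && below_4g;
}

/* The value that reaches CR3 when only the lower 32 bits are written. */
uint64_t cr3_after_32bit_write(uint64_t requested)
{
    return requested & 0xFFFFFFFFull;
}
```

A table placed at 1_0000_2000H, for example, would be seen by the processor at 2000H, which is why initialization code keeps the initial tables below 4 GBytes and relocates them only after IA-32e mode is active.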
The processor performs 64-bit mode consistency checks whenever software
attempts to modify any of the enable bits directly involved in activating IA-32e mode
(IA32_EFER.LME, CR0.PG, and CR4.PAE). It will generate a general protection fault
(#GP) if consistency checks fail. 64-bit mode consistency checks ensure that the
processor does not enter an undefined mode or state with unpredictable behavior.
64-bit mode consistency checks fail in the following circumstances:
• An attempt is made to enable or disable IA-32e mode while paging is enabled.
• IA-32e mode is enabled and an attempt is made to enable paging prior to
enabling physical-address extensions (PAE).
• IA-32e mode is active and an attempt is made to disable physical-address
extensions (PAE).
• The current CS has the L-bit set on an attempt to activate IA-32e mode.
• The TR contains a 16-bit TSS.
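
The consistency checks listed above can be modeled as guards on three register writes. The following C sketch is a simplified software model written for illustration; it is not processor microcode and not code from this manual. The type cpu_model, the function names, and the MODEL_OK/MODEL_GP return codes are hypothetical, and the model assumes that IA32_EFER.LMA simply tracks CR0.PG AND IA32_EFER.LME.

```c
#include <stdbool.h>

typedef struct {
    bool cr0_pg;           /* CR0.PG */
    bool cr4_pae;          /* CR4.PAE */
    bool efer_lme;         /* IA32_EFER.LME, written by software */
    bool efer_lma;         /* IA32_EFER.LMA, maintained by the processor */
    bool cs_l;             /* L bit of the current code segment */
    bool tr_is_16bit_tss;  /* task register refers to a 16-bit TSS */
} cpu_model;

enum { MODEL_OK = 0, MODEL_GP = -1 };

/* Changing LME while paging is enabled fails the consistency check. */
int write_efer_lme(cpu_model *c, bool lme)
{
    if (lme != c->efer_lme && c->cr0_pg)
        return MODEL_GP;
    c->efer_lme = lme;
    return MODEL_OK;
}

/* PAE cannot be disabled while IA-32e mode is active. */
int write_cr4_pae(cpu_model *c, bool pae)
{
    if (!pae && c->efer_lma)
        return MODEL_GP;
    c->cr4_pae = pae;
    return MODEL_OK;
}

/* Enabling paging with LME = 1 activates IA-32e mode and is checked. */
int write_cr0_pg(cpu_model *c, bool pg)
{
    if (pg && !c->cr0_pg && c->efer_lme) {
        if (!c->cr4_pae || c->cs_l || c->tr_is_16bit_tss)
            return MODEL_GP;
    }
    c->cr0_pg = pg;
    c->efer_lma = pg && c->efer_lme;   /* assumed: LMA = PG AND LME */
    return MODEL_OK;
}
```

Driving the model in the documented order (PG = 0, PAE = 1, LME = 1, PG = 1) succeeds, while any of the listed violations returns MODEL_GP without changing state.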
9.8.5.1 IA-32e Mode System Data Structures
After activating IA-32e mode, the system-descriptor-table registers (GDTR, LDTR,
IDTR, TR) continue to reference legacy protected-mode descriptor tables. Tables
referenced by the descriptors all reside in the lower 4 GBytes of linear-address space.
After activating IA-32e mode, 64-bit operating systems should use the LGDT, LLDT,
LIDT, and LTR instructions to load the system-descriptor-table registers with
references to 64-bit descriptor tables.
9.8.5.2 IA-32e Mode Interrupts and Exceptions
Software must not allow exceptions or interrupts to occur between the time IA-32e
mode is activated and the update of the interrupt-descriptor-table register (IDTR)
that establishes references to a 64-bit interrupt-descriptor table (IDT). This is
because the IDT remains in legacy form immediately after IA-32e mode is activated.
If an interrupt or exception occurs prior to updating the IDTR, a legacy 32-bit
interrupt gate will be referenced and interpreted as a 64-bit interrupt gate with
unpredictable results. External interrupts can be disabled by using the CLI instruction.
Non-maskable interrupts (NMI) must be disabled using external hardware.
9.8.5.3 64-bit Mode and Compatibility Mode Operation
IA-32e mode uses two code segment-descriptor bits (CS.L and CS.D, see Figure 3-8)
to control the operating modes after IA-32e mode is initialized. If CS.L = 1 and CS.D =
0, the processor is running in 64-bit mode. With this encoding, the default operand
size is 32 bits and default address size is 64 bits. Using instruction prefixes, operand
size can be changed to 64 bits or 16 bits; address size can be changed to 32 bits.
When IA-32e mode is active and CS.L = 0, the processor operates in compatibility
mode. In this mode, CS.D controls default operand and address sizes exactly as it
does in the IA-32 architecture. Setting CS.D = 1 specifies default operand and
address size as 32 bits. Clearing CS.D to 0 specifies default operand and address size
as 16 bits (the CS.L = 1, CS.D = 1 bit combination is reserved).
Compatibility mode execution is selected on a code-segment basis. This mode allows
legacy applications to coexist with 64-bit applications running in 64-bit mode. An
operating system running in IA-32e mode can execute existing 16-bit and 32-bit
applications by clearing their code-segment descriptor's CS.L bit to 0.
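
The CS.L and CS.D encodings described above form a small truth table. The C sketch below restates that table for illustration; the type and function names are hypothetical, and the reserved combination is reported with sizes of 0 purely as a convention of this sketch.

```c
#include <stdbool.h>

typedef enum {
    SUBMODE_64BIT,       /* CS.L = 1, CS.D = 0 */
    SUBMODE_COMPAT_32,   /* CS.L = 0, CS.D = 1 */
    SUBMODE_COMPAT_16,   /* CS.L = 0, CS.D = 0 */
    SUBMODE_RESERVED     /* CS.L = 1, CS.D = 1 */
} ia32e_submode;

typedef struct {
    ia32e_submode submode;
    int operand_bits;    /* default operand size */
    int address_bits;    /* default address size */
} cs_defaults;

/* Decode the default sizes selected by CS.L and CS.D while IA-32e mode is active. */
cs_defaults decode_cs_l_d(bool l, bool d)
{
    cs_defaults r;
    if (l && d) {
        r.submode = SUBMODE_RESERVED;  r.operand_bits = 0;  r.address_bits = 0;
    } else if (l) {
        r.submode = SUBMODE_64BIT;     r.operand_bits = 32; r.address_bits = 64;
    } else if (d) {
        r.submode = SUBMODE_COMPAT_32; r.operand_bits = 32; r.address_bits = 32;
    } else {
        r.submode = SUBMODE_COMPAT_16; r.operand_bits = 16; r.address_bits = 16;
    }
    return r;
}
```

Note that 64-bit mode keeps a 32-bit default operand size; only the default address size becomes 64 bits.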
In compatibility mode, the following system-level mechanisms continue to operate
using the IA-32e-mode architectural semantics:
• Linear-to-physical address translation uses the 64-bit mode extended page-
translation mechanism.
• Interrupts and exceptions are handled using the 64-bit mode mechanisms.
• System calls (calls through call gates and SYSENTER/SYSEXIT) are handled using
the IA-32e mode mechanisms.
9.8.5.4 Switching Out of IA-32e Mode Operation
To return from IA-32e mode to paged-protected mode operation, operating systems
must use the following sequence:
1. Switch to compatibility mode.
2. Deactivate IA-32e mode by clearing CR0.PG (CR0.PG = 0). This causes the processor
to set IA32_EFER.LMA = 0. The MOV CR0 instruction used to disable paging and
subsequent instructions must be located in an identity-mapped page.
3. Load CR3 with the physical base address of the legacy page directory.
4. Disable IA-32e mode by setting IA32_EFER.LME = 0.
5. Enable legacy paged-protected mode by setting CR0.PG = 1.
6. A branch instruction must follow the MOV CR0 that enables paging. Both the MOV
CR0 and the branch instruction must be located in an identity-mapped page.
Registers only available in 64-bit mode (R8-R15 and XMM8-XMM15) are preserved
across transitions from 64-bit mode into compatibility mode then back into 64-bit
mode. However, values of R8-R15 and XMM8-XMM15 are undefined after transitions
from 64-bit mode through compatibility mode to legacy or real mode and then back
through compatibility mode to 64-bit mode.
9.9 MODE SWITCHING
To use the processor in protected mode after hardware or software reset, a mode
switch must be performed from real-address mode. Once in protected mode,
software generally does not need to return to real-address mode. To run software written
to run in real-address mode (8086 mode), it is generally more convenient to run the
software in virtual-8086 mode than to switch back to real-address mode.
9.9.1 Switching to Protected Mode
Before switching to protected mode from real mode, a minimum set of system data
structures and code modules must be loaded into memory, as described in Section
9.8, "Software Initialization for Protected-Mode Operation." Once these tables are
created, software initialization code can switch into protected mode.
Protected mode is entered by executing a MOV CR0 instruction that sets the PE flag
in the CR0 register. (In the same instruction, the PG flag in register CR0 can be set to
enable paging.) Execution in protected mode begins with a CPL of 0.
Intel 64 and IA-32 processors have slightly different requirements for switching to
protected mode. To ensure upwards and downwards code compatibility with Intel 64
and IA-32 processors, we recommend that you follow these steps:
1. Disable interrupts. A CLI instruction disables maskable hardware interrupts. NMI
interrupts can be disabled with external circuitry. (Software must guarantee that
no exceptions or interrupts are generated during the mode switching operation.)
2. Execute the LGDT instruction to load the GDTR register with the base address of
the GDT.
3. Execute a MOV CR0 instruction that sets the PE flag (and optionally the PG flag)
in control register CR0.
4. Immediately following the MOV CR0 instruction, execute a far JMP or far CALL
instruction. (This operation is typically a far jump or call to the next instruction in
the instruction stream.)
5. The JMP or CALL instruction immediately after the MOV CR0 instruction changes
the flow of execution and serializes the processor.
6. If paging is enabled, the code for the MOV CR0 instruction and the JMP or CALL
instruction must come from a page that is identity mapped (that is, the linear
address before the jump is the same as the physical address after paging and
protected mode is enabled). The target instruction for the JMP or CALL instruction
does not need to be identity mapped.
7. If a local descriptor table is going to be used, execute the LLDT instruction to load
the segment selector for the LDT in the LDTR register.
8. Execute the LTR instruction to load the task register with a segment selector to
the initial protected-mode task or to a writable area of memory that can be used
to store TSS information on a task switch.
9. After entering protected mode, the segment registers continue to hold the
contents they had in real-address mode. The JMP or CALL instruction in step 4
resets the CS register. Perform one of the following operations to update the
contents of the remaining segment registers.
   • Reload segment registers DS, SS, ES, FS, and GS. If the ES, FS, and/or GS
     registers are not going to be used, load them with a null selector.
   • Perform a JMP or CALL instruction to a new task, which automatically resets
     the values of the segment registers and branches to a new code segment.
10. Execute the LIDT instruction to load the IDTR register with the address and limit
of the protected-mode IDT.
11. Execute the STI instruction to enable maskable hardware interrupts and perform
the necessary hardware operation to enable NMI interrupts.
Random failures can occur if other instructions exist between steps 3 and 4 above.
Failures will be readily seen in some situations, such as when instructions that
reference memory are inserted between steps 3 and 4 while in system management
mode.
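
The CR0 image written in step 3 can be illustrated with a small C sketch. This is a user-space model for illustration, not privileged code and not part of this manual; the function names are hypothetical. It assumes the architectural bit positions CR0.PE = bit 0 and CR0.PG = bit 31.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE (1u << 0)    /* protection enable */
#define CR0_PG (1u << 31)   /* paging */

/* Compute the image that step 3 writes to CR0: PE set, PG optionally set.
 * All other bits of the current CR0 value are preserved. */
uint32_t cr0_enter_protected(uint32_t cr0, bool enable_paging)
{
    cr0 |= CR0_PE;
    if (enable_paging)
        cr0 |= CR0_PG;
    return cr0;
}

/* Paging cannot be enabled unless protected mode is enabled as well. */
bool cr0_image_is_valid(uint32_t cr0)
{
    return !(cr0 & CR0_PG) || (cr0 & CR0_PE);
}
```

Reading CR0, OR-ing in the flags, and writing the result back is the same read-modify-write pattern used by lines 175 through 177 of the STARTUP.ASM listing in Example 9-1.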
9.9.2 Switching Back to Real-Address Mode
The processor switches from protected mode back to real-address mode if software
clears the PE bit in the CR0 register with a MOV CR0 instruction. A procedure that
re-enters real-address mode should perform the following steps:
1. Disable interrupts. A CLI instruction disables maskable hardware interrupts. NMI
interrupts can be disabled with external circuitry.
2. If paging is enabled, perform the following operations:
   • Transfer program control to linear addresses that are identity mapped to
     physical addresses (that is, linear addresses equal physical addresses).
   • Ensure that the GDT and IDT are in identity mapped pages.
   • Clear the PG bit in the CR0 register.
   • Move 0H into the CR3 register to flush the TLB.
3. Transfer program control to a readable segment that has a limit of 64 KBytes
(FFFFH). This operation loads the CS register with the segment limit required in
real-address mode.
4. Load segment registers SS, DS, ES, FS, and GS with a selector for a descriptor
containing the following values, which are appropriate for real-address mode:
   • Limit = 64 KBytes (0FFFFH)
   • Byte granular (G = 0)
   • Expand up (E = 0)
   • Writable (W = 1)
   • Present (P = 1)
   • Base = any value
5. The segment registers must be loaded with non-null segment selectors or the
segment registers will be unusable in real-address mode. Note that if the
segment registers are not reloaded, execution continues using the descriptor
attributes loaded during protected mode.
6. Execute an LIDT instruction to point to a real-address mode interrupt table that is
within the 1-MByte real-address mode address range.
7. Clear the PE flag in the CR0 register to switch to real-address mode.
8. Execute a far JMP instruction to jump to a real-address mode program. This
operation flushes the instruction queue and loads the appropriate base and
access rights values in the CS register.
9. Load the SS, DS, ES, FS, and GS registers as needed by the real-address mode
code. If any of the registers are not going to be used in real-address mode, write
0s to them.
10. Execute the STI instruction to enable maskable hardware interrupts and perform
the necessary hardware operation to enable NMI interrupts.
NOTE
All the code that is executed in steps 1 through 9 must be in a single
page and the linear addresses in that page must be identity mapped
to physical addresses.
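
The descriptor values required by step 4 can be packed as shown in the following C sketch. It is an illustration only, not code from this manual; the type seg_desc and the function name are hypothetical. The sketch assumes the standard 8-byte segment-descriptor layout, DPL = 0, and a cleared accessed bit, which gives an access byte of 92H.

```c
#include <stdint.h>

typedef struct {
    uint32_t lo;   /* descriptor bytes 0 through 3 */
    uint32_t hi;   /* descriptor bytes 4 through 7 */
} seg_desc;

/* Hypothetical helper: build a data-segment descriptor suitable for loading
 * into SS, DS, ES, FS, and GS before returning to real-address mode. */
seg_desc make_real_mode_data_desc(uint32_t base)
{
    const uint32_t limit  = 0xFFFFu;  /* 64 KBytes; G = 0, so byte granular */
    const uint32_t access = 0x92u;    /* P = 1, DPL = 0, S = 1, data, E = 0, W = 1 */
    seg_desc d;

    d.lo = (limit & 0xFFFFu) | ((base & 0xFFFFu) << 16);
    d.hi = ((base >> 16) & 0xFFu)            /* base bits 23:16 */
         | (access << 8)                     /* access byte */
         | (((limit >> 16) & 0xFu) << 16)    /* limit bits 19:16 (zero here) */
         | (base & 0xFF000000u);             /* base bits 31:24; G, D/B stay 0 */
    return d;
}
```

Because the base may be any value, the same helper covers every segment register; only the limit and attribute fields are fixed by the real-address mode requirements.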
9.10 INITIALIZATION AND MODE SWITCHING EXAMPLE
This section provides an initialization and mode switching example that can be
incorporated into an application. This code was originally written to initialize the Intel386
processor, but it will execute successfully on the Pentium 4, Intel Xeon, P6 family,
Pentium, and Intel486 processors. The code in this example is intended to reside in
EPROM and to run following a hardware reset of the processor. The function of the
code is to do the following:
• Establish a basic real-address mode operating environment.
• Load the necessary protected-mode system data structures into RAM.
• Load the system registers with the necessary pointers to the data structures and
the appropriate flag settings for protected-mode operation.
• Switch the processor to protected mode.
Figure 9-3 shows the physical memory layout for the processor following a hardware
reset and the starting point of this example. The EPROM that contains the
initialization code resides at the upper end of the processor's physical memory address range,
starting at address FFFFFFFFH and going down from there. The address of the first
instruction to be executed is at FFFFFFF0H, the default starting address for the
processor following a hardware reset.
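
The starting address follows from the reset values of the CS base and EIP shown in Figure 9-3. The C sketch below reproduces that arithmetic for illustration; the function names are hypothetical and the code is not part of this manual.

```c
#include <stdint.h>

/* Linear address formed from a segment base and an offset (32-bit wraparound). */
uint32_t linear_address(uint32_t seg_base, uint32_t offset)
{
    return seg_base + offset;
}

/* Number of bytes between an address and the top of the 4-GByte address space. */
uint32_t bytes_to_top_of_memory(uint32_t addr)
{
    return 0u - addr;   /* unsigned arithmetic: 2^32 - addr */
}
```

With CS.BASE = FFFF0000H and EIP = 0000FFF0H the result is FFFFFFF0H, leaving 16 bytes below the top of memory. This is consistent with the first action in Table 9-4 being a jump to the entry code elsewhere in the EPROM.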
The main steps carried out in this example are summarized in Table 9-4. The source
listing for the example (with the filename STARTUP.ASM) is given in Example 9-1.
The line numbers given in Table 9-4 refer to the source listing.
The following are some additional notes concerning this example:
• When the processor is switched into protected mode, the original code segment
base-address value of FFFF0000H (located in the hidden part of the CS register)
is retained and execution continues from the current offset in the EIP register.
The processor will thus continue to execute code in the EPROM until a far jump or
call is made to a new code segment, at which time the base address in the CS
register will be changed.
• Maskable hardware interrupts are disabled after a hardware reset and should
remain disabled until the necessary interrupt handlers have been installed. The
NMI interrupt is not disabled following a reset. The NMI# pin must thus be
inhibited from being asserted until an NMI handler has been loaded and made
available to the processor.
• The use of a temporary GDT allows simple transfer of tables from the EPROM to
anywhere in the RAM area. A GDT entry is constructed with its base pointing to
address 0 and a limit of 4 GBytes. When the DS and ES registers are loaded with
this descriptor, the temporary GDT is no longer needed and can be replaced by
the application GDT.
• This code loads one TSS and no LDTs. If more TSSs exist in the application, they
must be loaded into RAM. If there are LDTs they may be loaded as well.
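
The temporary GDT entry mentioned in the notes is built from the two constants LINEAR_PROTO_LO and LINEAR_PROTO_HI defined in the STARTUP.ASM listing. The C sketch below decodes them to confirm that they describe a segment with base 0 and a 4-GByte limit. It is an illustration only; the type decoded_desc and the function decode_descriptor are hypothetical, and the field positions assume the standard 8-byte segment-descriptor layout.

```c
#include <stdint.h>

#define LINEAR_PROTO_LO 0x0000FFFFu   /* values taken from the STARTUP.ASM equates */
#define LINEAR_PROTO_HI 0x00CF9200u

typedef struct {
    uint32_t base;         /* segment base address */
    uint64_t size_bytes;   /* segment size implied by limit and granularity */
    uint8_t  access;       /* access byte (P, DPL, S, type) */
} decoded_desc;

/* Unpack base, size, and access byte from the two descriptor dwords. */
decoded_desc decode_descriptor(uint32_t lo, uint32_t hi)
{
    decoded_desc d;
    uint32_t limit = (lo & 0xFFFFu) | (hi & 0x000F0000u);   /* 20-bit limit */
    int granular   = (int)((hi >> 23) & 1u);                /* G flag */

    d.base   = (lo >> 16) | ((hi & 0xFFu) << 16) | (hi & 0xFF000000u);
    d.access = (uint8_t)(hi >> 8);
    d.size_bytes = granular ? (((uint64_t)limit + 1) << 12)  /* 4-KByte units */
                            : ((uint64_t)limit + 1);         /* byte units */
    return d;
}
```

Decoding the two constants yields base 0, access byte 92H (present, writable data segment), and a size of 1_0000_0000H bytes, that is, 4 GBytes.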
Figure 9-3. Processor State After Reset
[Figure: physical memory map from 0 to FFFF FFFFH after reset. The 64K EPROM
occupies FFFF 0000H through FFFF FFFFH; execution starts at [CS.BASE+EIP] =
FFFF FFF0H, with CS.BASE = FFFF 0000H and EIP = 0000 FFF0H. DS.BASE, ES.BASE,
SS.BASE, and ESP are all 0H.]

Table 9-4. Main Initialization Steps in STARTUP.ASM Source Listing
STARTUP.ASM Line Numbers
From  To    Description
157   157   Jump (short) to the entry code in the EPROM
162   169   Construct a temporary GDT in RAM with one entry:
            0 - null
            1 - R/W data segment, base = 0, limit = 4 GBytes
171   172   Load the GDTR to point to the temporary GDT
174   177   Load CR0 with PE flag set to switch to protected mode
179   181   Jump near to clear real mode instruction queue
184   186   Load DS, ES registers with GDT[1] descriptor, so both point to the
            entire physical memory space
9.10.1 Assembler Usage
In this example, the Intel assembler ASM386 and build tools BLD386 are used to
assemble and build the initialization code module. The following assumptions are
used when using the Intel ASM386 and BLD386 tools.
• The ASM386 will generate the right operand size opcodes according to the code-
segment attribute. The attribute is assigned either by the ASM386 invocation
controls or in the code-segment definition.
• If a code segment that is going to run in real-address mode is defined, it must be
set to a USE16 attribute. If a 32-bit operand is used in an instruction in this code
segment (for example, MOV EAX, EBX), the assembler automatically generates
an operand prefix for the instruction that forces the processor to execute a 32-bit
operation, even though its default code-segment attribute is 16-bit.
• Intel's ASM386 assembler allows specific use of the 16- or 32-bit instructions, for
example, LGDTW, LGDTD, IRETD. If the generic instruction LGDT is used, the
default-segment attribute will be used to generate the right opcode.
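
The effect of the code-segment attribute on the generated opcode can be illustrated for the MOV EAX, EBX case mentioned above. The C sketch below is not ASM386 and does not describe its internals; the function name is hypothetical. It assumes the encoding 89H /r (MOV r/m32, r32) with ModR/M byte D8H, which is one of two equivalent encodings an assembler may choose, and the operand-size override prefix 66H.

```c
#include <stddef.h>
#include <stdint.h>

/* Emit the bytes for MOV EAX, EBX as they would appear in a code segment
 * with the given default operand size (16 or 32). Returns the byte count. */
size_t encode_mov_eax_ebx(int segment_default_bits, uint8_t out[3])
{
    size_t n = 0;
    if (segment_default_bits == 16)
        out[n++] = 0x66;   /* operand-size override: force a 32-bit operation */
    out[n++] = 0x89;       /* MOV r/m32, r32 */
    out[n++] = 0xD8;       /* ModR/M: mod = 11, reg = EBX (011), r/m = EAX (000) */
    return n;
}
```

In a USE16 segment the instruction occupies three bytes (66H 89H D8H); in a USE32 segment the prefix is omitted. The STARTUP.ASM listing applies the same prefix by hand with DB 66H before LGDT and LIDT.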
Table 9-4. Main Initialization Steps in STARTUP.ASM Source Listing (Contd.)
STARTUP.ASM Line Numbers
From  To    Description
188   195   Perform specific board initialization that is imposed by the new
            protected mode
196   218   Copy the application's GDT from ROM into RAM
220   238   Copy the application's IDT from ROM into RAM
241   243   Load application's GDTR
244   245   Load application's IDTR
247   261   Copy the application's TSS from ROM into RAM
263   267   Update TSS descriptor and other aliases in GDT (GDT alias or IDT
            alias)
277   277   Load the task register (without task switch) using LTR instruction
282   286   Load SS, ESP with the value found in the application's TSS
287   287   Push EFLAGS value found in the application's TSS
288   288   Push CS value found in the application's TSS
289   289   Push EIP value found in the application's TSS
290   293   Load DS, ES with the value found in the application's TSS
296   296   Perform IRET; pop the above values and enter the application code
9.10.2 STARTUP.ASM Listing
Example 9-1 provides high-level sample code designed to move the processor into
protected mode. This listing does not include any opcode and offset information.
Example 9-1. STARTUP.ASM
MS-DOS* 5.0(045-N) 386(TM) MACRO ASSEMBLER STARTUP 09:44:51 08/19/92
PAGE 1
MS-DOS 5.0(045-N) 386(TM) MACRO ASSEMBLER V4.0, ASSEMBLY OF MODULE
STARTUP
OBJECT MODULE PLACED IN startup.obj
ASSEMBLER INVOKED BY: f:\386tools\ASM386.EXE startup.a58 pw (132 )
LINE SOURCE
1 NAME STARTUP
2
3 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
4 ;
5 ; ASSUMPTIONS:
6 ;
7 ; 1. The bottom 64K of memory is ram, and can be used for
8 ; scratch space by this module.
9 ;
10 ; 2. The system has sufficient free usable ram to copy the
11 ; initial GDT, IDT, and TSS
12 ;
13 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
14
15 ; configuration data - must match with build definition
16
17 CS_BASE EQU 0FFFF0000H
18
19 ; CS_BASE is the linear address of the segment STARTUP_CODE
20 ; - this is specified in the build language file
21
22 RAM_START EQU 400H
23
24 ; RAM_START is the start of free, usable ram in the linear
25 ; memory space. The GDT, IDT, and initial TSS will be
26 ; copied above this space, and a small data segment will be
27 ; discarded at this linear address. The 32-bit word at
28 ; RAM_START will contain the linear address of the first
29 ; free byte above the copied tables - this may be useful if
30 ; a memory manager is used.
31
32 TSS_INDEX EQU 10
33
34 ; TSS_INDEX is the index of the TSS of the first task to
35 ; run after startup
36
37
38 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
39
40 ; ------------------------- STRUCTURES and EQU ---------------
41 ; structures for system data
42
43 ; TSS structure
44 TASK_STATE STRUC
45 link DW ?
46 link_h DW ?
47 ESP0 DD ?
48 SS0 DW ?
49 SS0_h DW ?
50 ESP1 DD ?
51 SS1 DW ?
52 SS1_h DW ?
53 ESP2 DD ?
54 SS2 DW ?
55 SS2_h DW ?
56 CR3_reg DD ?
57 EIP_reg DD ?
58 EFLAGS_reg DD ?
59 EAX_reg DD ?
60 ECX_reg DD ?
61 EDX_reg DD ?
62 EBX_reg DD ?
63 ESP_reg DD ?
64 EBP_reg DD ?
65 ESI_reg DD ?
66 EDI_reg DD ?
67 ES_reg DW ?
68 ES_h DW ?
69 CS_reg DW ?
70 CS_h DW ?
71 SS_reg DW ?
72 SS_h DW ?
73 DS_reg DW ?
74 DS_h DW ?
75 FS_reg DW ?
76 FS_h DW ?
77 GS_reg DW ?
78 GS_h DW ?
79 LDT_reg DW ?
80 LDT_h DW ?
81 TRAP_reg DW ?
82 IO_map_base DW ?
83 TASK_STATE ENDS
84
85 ; basic structure of a descriptor
86 DESC STRUC
87 lim_0_15 DW ?
88 bas_0_15 DW ?
89 bas_16_23 DB ?
90 access DB ?
91 gran DB ?
92 bas_24_31 DB ?
93 DESC ENDS
94
95 ; structure for use with LGDT and LIDT instructions
96 TABLE_REG STRUC
97 table_lim DW ?
98 table_linear DD ?
99 TABLE_REG ENDS
100
101 ; offset of GDT and IDT descriptors in builder generated GDT
102 GDT_DESC_OFF EQU 1*SIZE(DESC)
103 IDT_DESC_OFF EQU 2*SIZE(DESC)
104
105 ; equates for building temporary GDT in RAM
106 LINEAR_SEL EQU 1*SIZE (DESC)
107 LINEAR_PROTO_LO EQU 00000FFFFH ; LINEAR_ALIAS
108 LINEAR_PROTO_HI EQU 000CF9200H
109
110 ; Protection Enable Bit in CR0
111 PE_BIT EQU 1B
112
113 ; ------------------------------------------------------------
114
115 ; ------------------------- DATA SEGMENT----------------------
116
117 ; Initially, this data segment starts at linear 0, according
118 ; to the processor's power-up state.
119
120 STARTUP_DATA SEGMENT RW
121
122 free_mem_linear_base LABEL DWORD
123 TEMP_GDT LABEL BYTE ; must be first in segment
124 TEMP_GDT_NULL_DESC DESC <>
125 TEMP_GDT_LINEAR_DESC DESC <>
126
127 ; scratch areas for LGDT and LIDT instructions
128 TEMP_GDT_SCRATCH TABLE_REG <>
129 APP_GDT_RAM TABLE_REG <>
130 APP_IDT_RAM TABLE_REG <>
131 ; align end_data
132 fill DW ?
133
134 ; last thing in this segment - should be on a dword boundary
135 end_data LABEL BYTE
136
137 STARTUP_DATA ENDS
138 ; ------------------------------------------------------------
139
140
141 ; ------------------------- CODE SEGMENT----------------------
142 STARTUP_CODE SEGMENT ER PUBLIC USE16
143
144 ; filled in by builder
145 PUBLIC GDT_EPROM
146 GDT_EPROM TABLE_REG <>
147
148 ; filled in by builder
149 PUBLIC IDT_EPROM
150 IDT_EPROM TABLE_REG <>
151
152 ; entry point into startup code - the bootstrap will vector
153 ; here with a near JMP generated by the builder. This
154 ; label must be in the top 64K of linear memory.
155
156 PUBLIC STARTUP
157 STARTUP:
158
159 ; DS,ES address the bottom 64K of flat linear memory
160 ASSUME DS:STARTUP_DATA, ES:STARTUP_DATA
161 ; See Figure 9-4
162 ; load GDTR with temporary GDT
163 LEA EBX,TEMP_GDT ; build the TEMP_GDT in low ram,
164 MOV DWORD PTR [EBX],0 ; where we can address
165 MOV DWORD PTR [EBX]+4,0
166 MOV DWORD PTR [EBX]+8, LINEAR_PROTO_LO
167 MOV DWORD PTR [EBX]+12, LINEAR_PROTO_HI
168 MOV TEMP_GDT_scratch.table_linear,EBX
169 MOV TEMP_GDT_scratch.table_lim,15
170
171 DB 66H; execute a 32 bit LGDT
172 LGDT TEMP_GDT_scratch
173
174 ; enter protected mode
175 MOV EBX,CR0
176 OR EBX,PE_BIT
177 MOV CR0,EBX
178
179 ; clear prefetch queue
180 JMP CLEAR_LABEL
181 CLEAR_LABEL:
182
183 ; make DS and ES address 4G of linear memory
184 MOV CX,LINEAR_SEL
185 MOV DS,CX
186 MOV ES,CX
187
188 ; do board specific initialization
189 ;
190 ;
191 ; ......
192 ;
193
194
195 ; See Figure 9-5
196 ; copy EPROM GDT to ram at:
197 ; RAM_START + size (STARTUP_DATA)
198 MOV EAX,RAM_START
199 ADD EAX,OFFSET (end_data)
200 MOV EBX,RAM_START
201 MOV ECX, CS_BASE
202 ADD ECX, OFFSET (GDT_EPROM)
203 MOV ESI, [ECX].table_linear
204 MOV EDI,EAX
205 MOVZX ECX, [ECX].table_lim
206 MOV APP_GDT_ram[EBX].table_lim,CX
207 INC ECX
208 MOV EDX,EAX
209 MOV APP_GDT_ram[EBX].table_linear,EAX
210 ADD EAX,ECX
211 REP MOVS BYTE PTR ES:[EDI],BYTE PTR DS:[ESI]
212
213 ; fixup GDT base in descriptor
214 MOV ECX,EDX
215 MOV [EDX].bas_0_15+GDT_DESC_OFF,CX
216 ROR ECX,16
217 MOV [EDX].bas_16_23+GDT_DESC_OFF,CL
218 MOV [EDX].bas_24_31+GDT_DESC_OFF,CH
219
220 ; copy EPROM IDT to ram at:
221 ; RAM_START+size(STARTUP_DATA)+SIZE (EPROM GDT)
222 MOV ECX, CS_BASE
223 ADD ECX, OFFSET (IDT_EPROM)
224 MOV ESI, [ECX].table_linear
225 MOV EDI,EAX
226 MOVZX ECX, [ECX].table_lim
227 MOV APP_IDT_ram[EBX].table_lim,CX
228 INC ECX
229 MOV APP_IDT_ram[EBX].table_linear,EAX
230 MOV EBX,EAX
231 ADD EAX,ECX
232 REP MOVS BYTE PTR ES:[EDI],BYTE PTR DS:[ESI]
233
234 ; fixup IDT pointer in GDT
235 MOV [EDX].bas_0_15+IDT_DESC_OFF,BX
236 ROR EBX,16
237 MOV [EDX].bas_16_23+IDT_DESC_OFF,BL
238 MOV [EDX].bas_24_31+IDT_DESC_OFF,BH
239
240 ; load GDTR and IDTR
241 MOV EBX,RAM_START
242 DB 66H ; execute a 32 bit LGDT
243 LGDT APP_GDT_ram[EBX]
244 DB 66H ; execute a 32 bit LIDT
245 LIDT APP_IDT_ram[EBX]
246
247 ; move the TSS
248 MOV EDI,EAX
249 MOV EBX,TSS_INDEX*SIZE(DESC)
250 MOV ECX,GDT_DESC_OFF ;build linear address for TSS
251 MOV GS,CX
252 MOV DH,GS:[EBX].bas_24_31
253 MOV DL,GS:[EBX].bas_16_23
254 ROL EDX,16
255 MOV DX,GS:[EBX].bas_0_15
256 MOV ESI,EDX
257 LSL ECX,EBX
258 INC ECX
259 MOV EDX,EAX
260 ADD EAX,ECX
261 REP MOVS BYTE PTR ES:[EDI],BYTE PTR DS:[ESI]
262
263 ; fixup TSS pointer
264 MOV GS:[EBX].bas_0_15,DX
265 ROL EDX,16
266 MOV GS:[EBX].bas_24_31,DH
267 MOV GS:[EBX].bas_16_23,DL
268 ROL EDX,16
269 ;save start of free ram at linear location RAMSTART
270 MOV free_mem_linear_base+RAM_START,EAX
271
272 ;assume no LDT used in the initial task - if necessary,
273 ;code to move the LDT could be added, and should resemble
274 ;that used to move the TSS
275
276 ; load task register
277 LTR BX ; No task switch, only descriptor loading
278 ; See Figure 9-6
279 ; load minimal set of registers necessary to simulate task
280 ; switch
281
282
283 MOV AX,[EDX].SS_reg ; start loading registers
284 MOV EDI,[EDX].ESP_reg
285 MOV SS,AX
286 MOV ESP,EDI ; stack now valid
287 PUSH DWORD PTR [EDX].EFLAGS_reg
288 PUSH DWORD PTR [EDX].CS_reg
289 PUSH DWORD PTR [EDX].EIP_reg
290 MOV AX,[EDX].DS_reg
291 MOV BX,[EDX].ES_reg
292 MOV DS,AX ; DS and ES no longer linear memory
293 MOV ES,BX
294
295 ; simulate far jump to initial task
296 IRETD
297
298 STARTUP_CODE ENDS
*** WARNING #377 IN 298, (PASS 2) SEGMENT CONTAINS PRIVILEGED
INSTRUCTION(S)
299
300 END STARTUP, DS:STARTUP_DATA, SS:STARTUP_DATA
301
302
ASSEMBLY COMPLETE, 1 WARNING, NO ERRORS.
Figure 9-4. Constructing Temporary GDT and Switching to Protected Mode (Lines
162-172 of List File)
[Figure: memory map from 0 to FFFF FFFFH. The EPROM at FFFF 0000H holds the
START entry point at [CS.BASE+EIP]; low RAM holds TEMP_GDT (GDT[0] and GDT[1],
with GDT[1] having Base=0, Limit=4G) and GDT_SCRATCH (Base, Limit). Steps shown:
jump near start, construct TEMP_GDT, LGDT, move to protected mode, DS and ES =
GDT[1] (4 GB).]

Figure 9-5. Moving the GDT, IDT, and TSS from ROM to RAM (Lines 196-261 of List
File)
[Figure: memory map from 0 to FFFF FFFFH. The GDT, IDT, and TSS in ROM are copied
to GDT RAM, IDT RAM, and TSS RAM above RAM_START. Steps shown: move the GDT,
IDT, TSS from ROM to RAM, fix aliases, LTR.]
Figure 9-6. Task Switching (Lines 282-296 of List File)
[Figure: memory map showing GDT RAM (with GDT alias and IDT alias), IDT RAM, and
TSS RAM above RAM_START. The TSS fields SS, ESP, EFLAGS, CS, EIP, DS, and ES are
loaded in this order: SS = TSS.SS, ESP = TSS.ESP, PUSH TSS.EFLAG, PUSH TSS.CS,
PUSH TSS.EIP, ES = TSS.ES, DS = TSS.DS, IRET.]

9.10.3 MAIN.ASM Source Code
The file MAIN.ASM shown in Example 9-2 defines the data and stack segments for
this application and can be substituted with the main module task written in a
high-level language that is invoked by the IRET instruction executed by STARTUP.ASM.
Example 9-2. MAIN.ASM
NAME main_module
data SEGMENT RW
    dw 1000 dup(?)
DATA ENDS
stack stackseg 800
CODE SEGMENT ER use32 PUBLIC
main_start:
    nop
    nop
    nop
CODE ENDS
END main_start, ds:data, ss:stack
9.10.4 Supporting Files
The batch file shown in Example 9-3 can be used to assemble the source code files
STARTUP.ASM and MAIN.ASM and build the final application.
Example 9-3. Batch File to Assemble and Build the Application
ASM386 STARTUP.ASM
ASM386 MAIN.ASM
BLD386 STARTUP.OBJ, MAIN.OBJ buildfile(EPROM.BLD) bootstrap(STARTUP) Bootload
BLD386 performs several operations in this example:
• It allocates physical memory locations to segments and tables.
• It generates tables using the build file and the input files.
• It links object files and resolves references.
• It generates a boot-loadable file to be programmed into the EPROM.
Example 9-4 shows the build file used as an input to BLD386 to perform the above
functions.
Example 9-4. Build File
INIT_BLD_EXAMPLE;
SEGMENT
*SEGMENTS(DPL = 0)
, startup.startup_code(BASE = 0FFFF0000H)
;
TASK
BOOT_TASK(OBJECT = startup, INITIAL,DPL = 0,
NOT INTENABLED)
, PROTECTED_MODE_TASK(OBJECT = main_module,DPL = 0,
NOT INTENABLED)
;
TABLE
GDT (
LOCATION = GDT_EPROM
, ENTRY = (
10: PROTECTED_MODE_TASK
, startup.startup_code
, startup.startup_data
, main_module.data
, main_module.code
, main_module.stack
)
),
IDT (
LOCATION = IDT_EPROM
);
MEMORY
(
RESERVE = (0..3FFFH
-- Area for the GDT, IDT, TSS copied from ROM
, 60000H..0FFFEFFFFH)
, RANGE = (ROM_AREA = ROM (0FFFF0000H..0FFFFFFFFH))
-- Eprom size 64K
, RANGE = (RAM_AREA = RAM (4000H..05FFFFH))
);
END
Table 9-5 shows the relationship of each build item with an ASM source file.
Table 9-5. Relationship Between BLD Item and ASM Source File
Item | ASM386 and Startup.A58 | BLD386 Controls and BLD file | Effect
Bootstrap | public startup; startup: | bootstrap start(startup) | Near jump at 0FFFFFFF0H to start.
GDT location | public GDT_EPROM; GDT_EPROM TABLE_REG <> | TABLE GDT(location = GDT_EPROM) | The location of the GDT will be programmed into the GDT_EPROM location.
IDT location | public IDT_EPROM; IDT_EPROM TABLE_REG <> | TABLE IDT(location = IDT_EPROM) | The location of the IDT will be programmed into the IDT_EPROM location.
9.11 MICROCODE UPDATE FACILITIES
The Pentium 4, Intel Xeon, and P6 family processors have the capability to correct
errata by loading an Intel-supplied data block into the processor. The data block is
called a microcode update. This section describes the mechanisms the BIOS needs to
provide in order to use this feature during system initialization. It also describes a
specification that permits the incorporation of future updates into a system BIOS.
Intel considers the release of a microcode update for a silicon revision to be the
equivalent of a processor stepping and completes a full-stepping level validation for
releases of microcode updates.
A microcode update is used to correct errata in the processor. The BIOS, which has
an update loader, is responsible for loading the update on processors during system
initialization (Figure 9-7). There are two steps to this process: the first is to
incorporate the necessary update data blocks into the BIOS; the second is to load update
data blocks into the processor.
Table 9-5. Relationship Between BLD Item and ASM Source File (Contd.)
Item | ASM386 and Startup.A58 | BLD386 Controls and BLD file | Effect
RAM start | RAM_START equ 400H | memory (reserve = (0..3FFFH)) | RAM_START is used as the RAM destination for moving the tables. It must be excluded from the application's segment area.
Location of the application TSS in the GDT | TSS_INDEX EQU 10 | TABLE GDT(ENTRY = (10: PROTECTED_MODE_TASK)) | Put the descriptor of the application TSS in GDT entry 10.
EPROM size and location | size and location of the initialization code | SEGMENT startup.code (base = 0FFFF0000H) ...memory (RANGE(ROM_AREA = ROM(x..y)) | Initialization code size must be less than 64K and resides at the uppermost 64K of the 4-GByte memory space.
9.11.1 Microcode Update
A microcode update consists of an Intel-supplied binary that contains a descriptive
header and data. No executable code resides within the update. Each microcode
update is tailored for a specific list of processor signatures. A mismatch of the
processor's signature with the signature contained in the update will result in a
failure to load. A processor signature includes the extended family, extended model,
type, family, model, and stepping of the processor (starting with processor family
0FH, model 03H, a given microcode update may be associated with one of multiple
processor signatures; see Section 9.11.2 for detail).
Microcode updates are composed of a multi-byte header, followed by encrypted data
and then by an optional extended signature table. Table 9-6 provides a definition of
the fields; Table 9-7 shows the format of an update.
The header is 48 bytes. The first 4 bytes of the header contain the header version.
The update header and its reserved fields are interpreted by software based upon the
header version. An encoding scheme guards against tampering and provides a
means for determining the authenticity of any given update. For microcode updates
with a data size field equal to 00000000H, the size of the microcode update is 2048
bytes. The first 48 bytes contain the microcode update header. The remaining 2000
bytes contain encrypted data.
For microcode updates with a data size not equal to 00000000H, the total size field
specifies the size of the microcode update. The first 48 bytes contain the microcode
update header. The second part of the microcode update is the encrypted data. The
data size field of the microcode update header specifies the encrypted data size; its
value must be a multiple of the size of a DWORD (4 bytes). The total size field of the
microcode update header specifies the encrypted data size plus the header size and,
when present, the size of the extended signature table; its value must be a multiple
of 1024 bytes (1 KByte). The optional extended signature table, if implemented,
follows the encrypted data, and its size is calculated by
(Total Size - (Data Size + 48)).
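The size rules above can be checked with a few lines of arithmetic. The following is a minimal offline sketch in Python, as a validation utility might do it; the function name update_layout and the choice to raise ValueError are assumptions made for illustration, not part of the specification.

```python
HEADER_SIZE = 48          # bytes, fixed size of the update header
LEGACY_DATA_SIZE = 2000   # bytes of encrypted data when Data Size == 0
LEGACY_TOTAL_SIZE = 2048  # bytes of the whole update when Data Size == 0


def update_layout(data_size: int, total_size: int) -> tuple:
    """Return (data_bytes, total_bytes, extended_table_bytes).

    data_size and total_size are the raw header fields. Raises ValueError
    when the fields violate the size rules stated in the text.
    """
    if data_size == 0:
        # Fixed-size format: 48-byte header plus 2000 bytes of data.
        return LEGACY_DATA_SIZE, LEGACY_TOTAL_SIZE, 0
    if data_size % 4 != 0:
        raise ValueError("Data Size must be a multiple of 4 bytes (one DWORD)")
    if total_size % 1024 != 0:
        raise ValueError("Total Size must be a multiple of 1024 bytes")
    # Whatever follows header and data is the extended signature table.
    ext_bytes = total_size - (data_size + HEADER_SIZE)
    if ext_bytes < 0:
        raise ValueError("Total Size is smaller than header plus data")
    return data_size, total_size, ext_bytes
```

For instance, a 4096-byte update with a Data Size of 4016 leaves 32 bytes for the extended signature table: a 20-byte table header plus one 12-byte signature structure.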
Figure 9-7. Applying Microcode Updates
[Figure: block diagram with the elements New Update, BIOS (Update Blocks and
Update Loader), and CPU.]
NOTE
The optional extended signature table is supported starting with
processor family 0FH, model 03H.
Table 9-6. Microcode Update Field Definitions
Field Name | Offset (bytes) | Length (bytes) | Description
Header Version | 0 | 4 | Version number of the update header.
Update Revision | 4 | 4 | Unique version number for the update, the basis for the update signature provided by the processor to indicate the current update functioning within the processor. Used by the BIOS to authenticate the update and verify that the processor loads successfully. The value in this field cannot be used for processor stepping identification alone. This is a signed 32-bit number.
Date | 8 | 4 | Date of the update creation in binary format: mmddyyyy (e.g. 07/18/98 is 07181998H).
Processor Signature | 12 | 4 | Extended family, extended model, type, family, model, and stepping of processor that requires this particular update revision (e.g., 00000650H). Each microcode update is designed specifically for a given extended family, extended model, type, family, model, and stepping of the processor. The BIOS uses the processor signature field in conjunction with the CPUID instruction to determine whether or not an update is appropriate to load on a processor. The information encoded within this field exactly corresponds to the bit representations returned by the CPUID instruction.
Checksum | 16 | 4 | Checksum of Update Data and Header. Used to verify the integrity of the update header and data. Checksum is correct when the summation of all the DWORDs (including the extended Processor Signature Table) that comprise the microcode update results in 00000000H.
Loader Revision | 20 | 4 | Version number of the loader program needed to correctly load this update. The initial version is 00000001H.
Processor Flags | 24 | 4 | Platform type information is encoded in the lower 8 bits of this 4-byte field. Each bit represents a particular platform type for a given CPUID. The BIOS uses the processor flags field in conjunction with the platform Id bits in MSR (17H) to determine whether or not an update is appropriate to load on a processor. Multiple bits may be set representing support for multiple platform IDs.
Data Size | 28 | 4 | Specifies the size of the encrypted data in bytes, and must be a multiple of DWORDs. If this value is 00000000H, then the microcode update encrypted data is 2000 bytes (or 500 DWORDs).
Total Size | 32 | 4 | Specifies the total size of the microcode update in bytes. It is the summation of the header size, the encrypted data size and the size of the optional extended signature table. This value is always a multiple of 1024.
Reserved | 36 | 12 | Reserved fields for future expansion.
Update Data | 48 | Data Size or 2000 | Update data.
Extended Signature Count | Data Size + 48 | 4 | Specifies the number of extended signature structures (Processor Signature[n], processor flags[n] and checksum[n]) that exist in this microcode update.
Extended Checksum | Data Size + 52 | 4 | Checksum of update extended processor signature table. Used to verify the integrity of the extended processor signature table. Checksum is correct when the summation of the DWORDs that comprise the extended processor signature table results in 00000000H.
Reserved | Data Size + 56 | 12 | Reserved fields.
Processor Signature[n] | Data Size + 68 + (n * 12) | 4 | Extended family, extended model, type, family, model, and stepping of processor that requires this particular update revision (e.g., 00000650H). Each microcode update is designed specifically for a given extended family, extended model, type, family, model, and stepping of the processor. The BIOS uses the processor signature field in conjunction with the CPUID instruction to determine whether or not an update is appropriate to load on a processor. The information encoded within this field exactly corresponds to the bit representations returned by the CPUID instruction.
Processor Flags[n] | Data Size + 72 + (n * 12) | 4 | Platform type information is encoded in the lower 8 bits of this 4-byte field. Each bit represents a particular platform type for a given CPUID. The BIOS uses the processor flags field in conjunction with the platform Id bits in MSR (17H) to determine whether or not an update is appropriate to load on a processor. Multiple bits may be set representing support for multiple platform IDs.
Checksum[n] | Data Size + 76 + (n * 12) | 4 | Used by utility software to decompose a microcode update into multiple microcode updates where each of the new updates is constructed without the optional Extended Processor Signature Table. To calculate the Checksum, substitute the Primary Processor Signature entry and the Processor Flags entry with the corresponding Extended Patch entry. Delete the Extended Processor Signature Table entries. The Checksum is correct when the summation of all DWORDs that comprise the created Extended Processor Patch results in 00000000H.
Table 9-7. Microcode Update Format
Field (sub-fields listed from bit 31 down to bit 0) | Byte offset
Header Version | 0
Update Revision | 4
Date: Month (8 bits), Day (8 bits), Year (16 bits) | 8
Processor Signature (CPUID): Reserved (4 bits), Extended Family (8 bits), Extended Model (4 bits), Reserved (2 bits), Type (2 bits), Family (4 bits), Model (4 bits), Stepping (4 bits) | 12
Checksum | 16
Loader Revision | 20
Processor Flags: Reserved (24 bits), P7, P6, P5, P4, P3, P2, P1, P0 | 24
Data Size | 28
Total Size | 32
Reserved (12 Bytes) | 36
Update Data (Data Size bytes, or 2000 Bytes if Data Size = 00000000H) | 48
Extended Signature Count 'n' | Data Size + 48
Extended Processor Signature Table Checksum | Data Size + 52
Reserved (12 Bytes) | Data Size + 56
Processor Signature[n] | Data Size + 68 + (n * 12)
Processor Flags[n] | Data Size + 72 + (n * 12)
Checksum[n] | Data Size + 76 + (n * 12)
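The header layout in Tables 9-6 and 9-7 maps directly onto a fixed binary record. The sketch below, in Python, unpacks the 48-byte header and splits the date and processor signature into their sub-fields. It assumes the image is stored little-endian, as on IA-32; the names UpdateHeader, parse_header, decode_date, and decode_signature are hypothetical helpers, not part of any Intel tool.

```python
import struct
from collections import namedtuple

# Header Version (unsigned), Update Revision (signed), seven more unsigned
# DWORDs, then 12 reserved bytes: 48 bytes in total.
_HEADER = struct.Struct("<Ii7I12s")

UpdateHeader = namedtuple(
    "UpdateHeader",
    "header_version update_revision date processor_signature checksum "
    "loader_revision processor_flags data_size total_size reserved",
)


def parse_header(blob: bytes) -> UpdateHeader:
    """Unpack the first 48 bytes of a microcode update image."""
    return UpdateHeader(*_HEADER.unpack_from(blob, 0))


def decode_date(date: int) -> tuple:
    """Split the Date field (mmddyyyy, one decimal digit per nibble)."""
    digits = f"{date:08X}"  # 0x07181998 -> "07181998"
    return int(digits[0:2]), int(digits[2:4]), int(digits[4:8])


def decode_signature(sig: int) -> dict:
    """Split a processor signature into the bit fields of Table 9-7."""
    return {
        "stepping": sig & 0xF,
        "model": (sig >> 4) & 0xF,
        "family": (sig >> 8) & 0xF,
        "type": (sig >> 12) & 0x3,
        "extended_model": (sig >> 16) & 0xF,
        "extended_family": (sig >> 20) & 0xFF,
    }
```

With the example values from Table 9-6, a Date of 07181998H decodes to month 7, day 18, year 1998, and a signature of 00000650H decodes to family 6, model 5, stepping 0.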
9.11.2 Optional Extended Signature Table
The extended signature table is a structure that may be appended to the end of the
encrypted data when the encrypted data only supports a single processor signature
(optional case). The extended signature table will always be present when the
encrypted data supports multiple processor steppings and/or models (required
case).
The extended signature table consists of a 20-byte extended signature header
structure, which contains the extended signature count, the extended processor signature
table checksum, and 12 reserved bytes (Table 9-8). Following the extended signature
header structure, the extended signature table contains 0-to-n extended
processor signature structures.
Each processor signature structure consists of the processor signature, processor
flags, and a checksum (Table 9-9).
The extended signature count in the extended signature header structure indicates
the number of processor signature structures that exist in the extended signature
table.
The extended processor signature table checksum is a checksum of all DWORDs that
comprise the extended signature table. That includes the extended signature count,
extended processor signature table checksum, 12 reserved bytes and the n
processor signature structures. A valid extended signature table exists when the
result of a DWORD checksum is 00000000H.
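The table layout and its checksum rule can be sketched in Python as follows. This is an illustrative parser under stated assumptions: the image is little-endian, the caller passes the effective data size in bytes, and the name parse_extended_table is hypothetical.

```python
import struct

HEADER_SIZE = 48       # main update header
EXT_HEADER_SIZE = 20   # count (4) + table checksum (4) + reserved (12)
EXT_ENTRY_SIZE = 12    # signature (4) + flags (4) + checksum (4)


def parse_extended_table(blob: bytes, data_size: int) -> list:
    """Return a list of (signature, flags, checksum) tuples.

    Raises ValueError when the table does not fit in the image or when
    the DWORD sum over the whole table is not zero.
    """
    start = HEADER_SIZE + data_size
    count, _table_checksum = struct.unpack_from("<2I", blob, start)
    end = start + EXT_HEADER_SIZE + count * EXT_ENTRY_SIZE
    if end > len(blob):
        raise ValueError("extended signature table runs past the image")
    # Sum every DWORD of the table, including the checksum field itself.
    dwords = struct.unpack_from(f"<{(end - start) // 4}I", blob, start)
    if (sum(dwords) & 0xFFFFFFFF) != 0:
        raise ValueError("extended signature table checksum mismatch")
    first = start + EXT_HEADER_SIZE
    return [
        struct.unpack_from("<3I", blob, first + n * EXT_ENTRY_SIZE)
        for n in range(count)
    ]
```

Summing the checksum field together with the rest of the table is what makes a valid table add up to 00000000H modulo 2^32.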
Table 9-8. Extended Processor Signature Table Header Structure
Field | Offset (bytes)
Extended Signature Count 'n' | Data Size + 48
Extended Processor Signature Table Checksum | Data Size + 52
Reserved (12 Bytes) | Data Size + 56
Table 9-9. Processor Signature Structure
Field | Offset (bytes)
Processor Signature[n] | Data Size + 68 + (n * 12)
Processor Flags[n] | Data Size + 72 + (n * 12)
Checksum[n] | Data Size + 76 + (n * 12)
9.11.3 Processor Identification
Each microcode update is designed for a specific processor or set of processors. To
determine the correct microcode update to load, software must ensure that one of
the processor signatures embedded in the microcode update matches the 32-bit
processor signature returned by the CPUID instruction when executed by the target
processor with EAX = 1. Attempting to load a microcode update in which no
embedded processor signature matches the processor signature returned by CPUID
will cause the BIOS to reject the update.
Example 9-5 shows how to check for a valid processor signature match between the
processor and microcode update.
Example 9-5. Pseudo Code to Validate the Processor Signature
ProcessorSignature ← CPUID(1):EAX
If (Update.HeaderVersion == 00000001h)
{
    // first check the ProcessorSignature field
    If (ProcessorSignature == Update.ProcessorSignature)
        Success
    // if extended signature is present
    Else If (Update.TotalSize > (Update.DataSize + 48))
    {
        //
        // Assume the Data Size has been used to calculate the
        // location of Update.ProcessorSignature[0].
        //
        For (N ← 0; ((N < Update.ExtendedSignatureCount) AND
            (ProcessorSignature != Update.ProcessorSignature[N])); N++);
        // if the loop ended when the iteration count is
        // less than the number of processor signatures in
        // the table, we have a match
        If (N < Update.ExtendedSignatureCount)
            Success
        Else
            Fail
    }
    Else
        Fail
}
Else
    Fail
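The same decision can be written compactly in Python. This sketch assumes the caller has already read CPUID(1):EAX and extracted the extended signatures (an empty sequence when the update has no extended signature table); the function name signature_matches is hypothetical.

```python
def signature_matches(cpu_signature, header_version, primary_signature,
                      extended_signatures=()):
    """Return True when the update targets the given processor signature.

    extended_signatures holds the Processor Signature[n] values from the
    optional extended signature table, or is empty when no table exists.
    """
    if header_version != 0x00000001:
        return False  # unknown header layout: do not interpret the fields
    if cpu_signature == primary_signature:
        return True   # match on the signature in the 48-byte header
    # Otherwise any entry of the extended table may match.
    return cpu_signature in extended_signatures
```

Passing the extended signatures as a plain sequence keeps the matching logic separate from the byte-level parsing of the table.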
9.11.4 Platform Identification
In addition to verifying the processor signature, the intended processor platform type
must be determined to properly target the microcode update. The intended
processor platform type is determined by reading the IA32_PLATFORM_ID register
(MSR 17H). This 64-bit register must be read using the RDMSR instruction.
The three platform ID bits, when read as a binary coded decimal (BCD) number,
indicate the bit position in the microcode update header's processor flags field associated
with the installed processor. The processor flags in the 48-byte header and the
processor flags field associated with the extended processor signature structures
may have multiple bits set. Each set bit represents a different platform ID that the
update supports.
Register Name: IA32_PLATFORM_ID
MSR Address: 017H
Access: Read Only
IA32_PLATFORM_ID is a 64-bit register accessed only when referenced as a Qword through a
RDMSR instruction.
Table 9-10. Processor Flags
Bit | Description
63:53 | Reserved
52:50 | Platform Id Bits (RO). The field gives information concerning the intended platform for the processor. See also Table 9-7.
49:0 | Reserved
Encoding of bits 52:50:
Bit 52 | Bit 51 | Bit 50 | Processor flag
0 | 0 | 0 | Processor Flag 0
0 | 0 | 1 | Processor Flag 1
0 | 1 | 0 | Processor Flag 2
0 | 1 | 1 | Processor Flag 3
1 | 0 | 0 | Processor Flag 4
1 | 0 | 1 | Processor Flag 5
1 | 1 | 0 | Processor Flag 6
1 | 1 | 1 | Processor Flag 7
To validate the platform information, software may implement an algorithm similar to
the algorithms in Example 9-6.
Example 9-6. Pseudo Code Example of Processor Flags Test
Flag ← 1 << IA32_PLATFORM_ID[52:50]
If (Update.HeaderVersion == 00000001h)
{
    If (Update.ProcessorFlags & Flag)
    {
        Load Update
    }
    Else
    {
        //
        // Assume the Data Size has been used to calculate the
        // location of Update.ProcessorSignature[N] and a match
        // on Update.ProcessorSignature[N] has already succeeded
        //
        If (Update.ProcessorFlags[n] & Flag)
        {
            Load Update
        }
    }
}
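The flag computation is a three-bit field extraction followed by a shift. The Python sketch below assumes the 64-bit IA32_PLATFORM_ID value has already been read with RDMSR and is passed in as an integer; platform_flag and flags_allow_load are hypothetical names.

```python
def platform_flag(platform_id_msr: int) -> int:
    """Return the one-hot mask selected by IA32_PLATFORM_ID bits 52:50."""
    platform_id = (platform_id_msr >> 50) & 0x7  # value 0..7
    return 1 << platform_id


def flags_allow_load(processor_flags: int, platform_id_msr: int) -> bool:
    """True when the update's processor flags include this platform."""
    return (processor_flags & platform_flag(platform_id_msr)) != 0
```

For example, platform ID bits 101b select Processor Flag 5, so the update must have bit 5 (mask 20H) set in the relevant processor flags field.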
9.11.5 Microcode Update Checksum
Each microcode update contains a DWORD checksum located in the update header. It
is software's responsibility to ensure that a microcode update is not corrupt. To check
for a corrupt microcode update, software must perform an unsigned DWORD (32-bit)
checksum of the microcode update. Even though some fields are signed, the
checksum procedure treats all DWORDs as unsigned. Microcode updates with a
header version equal to 00000001H must sum all DWORDs that comprise the
microcode update. A valid checksum check will yield a value of 00000000H. Any other
value indicates the microcode update is corrupt and should not be loaded.
The checksum algorithm shown by the pseudo code in Example 9-7 treats the
microcode update as an array of unsigned DWORDs. If the data size DWORD field at byte
offset 28 equals 00000000H, the size of the encrypted data is 2000 bytes (500
DWORDs), so the complete update, including the 48-byte header, is 2048 bytes
(512 DWORDs). Otherwise the microcode update size in DWORDs = (Total Size / 4),
where the total size is a multiple of 1024 bytes (1 KByte).
Example 9-7. Pseudo Code Example of Checksum Test
N ← 512
If (Update.DataSize != 00000000H)
    N ← Update.TotalSize / 4
ChkSum ← 0
For (I ← 0; I < N; I++)
{
    ChkSum ← ChkSum + MicrocodeUpdate[I]
}
If (ChkSum == 00000000H)
    Success
Else
    Fail
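A Python rendering of the same test is shown below as a sketch. It assumes a little-endian image held in memory and reads Data Size and Total Size from byte offsets 28 and 32 as listed in Table 9-6; the name update_checksum_ok is hypothetical.

```python
import struct


def update_checksum_ok(blob: bytes) -> bool:
    """Return True when all DWORDs of the update sum to zero mod 2**32."""
    data_size, total_size = struct.unpack_from("<2I", blob, 28)
    # Data Size == 0 means the fixed 2048-byte (512-DWORD) format.
    size = 2048 if data_size == 0 else total_size
    if size == 0 or size % 4 != 0 or size > len(blob):
        return False
    dwords = struct.unpack_from(f"<{size // 4}I", blob, 0)
    # Python integers do not wrap, so mask the sum to 32 bits.
    return (sum(dwords) & 0xFFFFFFFF) == 0
```

The explicit mask stands in for the natural 32-bit wraparound that the addition would have in a register.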
9.11.6 Microcode Update Loader
This section describes an update loader used to load an update into a Pentium 4, Intel
Xeon, or P6 family processor. It also discusses the requirements placed on the BIOS
to ensure proper loading. The update loader described contains the minimal
instructions needed to load an update. The specific instruction sequence that is required to
load an update is dependent upon the loader revision field contained within the
update header. This revision is expected to change infrequently (potentially, only
when new processor models are introduced).
Example 9-8 below represents the update loader with a loader revision of
00000001H. Note that the microcode update must be aligned on a 16-byte boundary
and the size of the microcode update must be 1-KByte granular.
Example 9-8. Assembly Code Example of Simple Microcode Update Loader
mov ecx,79h ; MSR to write in ECX
xor eax,eax ; clear EAX
xor ebx,ebx ; clear EBX
mov ax,cs ; Segment of microcode update
shl eax,4
mov bx,offset Update ; Offset of microcode update
add eax,ebx ; Linear Address of Update in EAX
add eax,48d ; Offset of the Update Data within the Update
xor edx,edx ; Zero in EDX
WRMSR ; microcode update trigger
The loader shown in Example 9-8 assumes that Update is the address of a microcode
update (header and data) embedded within the code segment of the BIOS. It also
assumes that the processor is operating in real mode. The data may reside anywhere
in memory, aligned on a 16-byte boundary, that is accessible by the processor within
its current operating mode.
Before the BIOS executes the microcode update trigger (WRMSR) instruction, the
following must be true:
• In 64-bit mode, EAX contains the lower 32 bits of the microcode update linear
address. In protected mode, EAX contains the full 32-bit linear address of the
microcode update.
• In 64-bit mode, EDX contains the upper 32 bits of the microcode update linear
address. In protected mode, EDX equals zero.
• ECX contains 79H (address of IA32_BIOS_UPDT_TRIG).
Other requirements are:
• If the update is loaded while the processor is in real mode, then the update data
may not cross a segment boundary.
• If the update is loaded while the processor is in real mode, then the update data
may not exceed a segment limit.
• If paging is enabled, pages that are currently present must map the update data.
• The microcode update data requires a 16-byte boundary alignment.
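The register setup can be modeled offline to check the address arithmetic. The Python sketch below follows Example 9-8 in pointing EDX:EAX at the update data, 48 bytes past the start of the update; it only computes register values and does not execute WRMSR. The names real_mode_linear and wrmsr_operands are hypothetical.

```python
IA32_BIOS_UPDT_TRIG = 0x79  # MSR address loaded into ECX
HEADER_SIZE = 48            # update data starts after the 48-byte header


def real_mode_linear(segment: int, offset: int) -> int:
    """Linear address of segment:offset in real mode, as in Example 9-8."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def wrmsr_operands(update_linear_address: int) -> tuple:
    """Return (ECX, EAX, EDX) for the microcode update trigger write."""
    data_address = update_linear_address + HEADER_SIZE
    if data_address % 16 != 0:
        raise ValueError("update data must be aligned on a 16-byte boundary")
    eax = data_address & 0xFFFFFFFF  # lower 32 bits
    edx = data_address >> 32         # upper 32 bits, zero below 4 GBytes
    return IA32_BIOS_UPDT_TRIG, eax, edx
```

Because the header size is a multiple of 16, the update data is 16-byte aligned exactly when the update itself is.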
9.11.6.1 Hard Resets in Update Loading
The effects of a loaded update are cleared from the processor upon a hard reset.
Therefore, each time a hard reset is asserted during the BIOS POST, the update must
be reloaded on all processors that observed the reset. The effects of a loaded update
are, however, maintained across a processor INIT. There are no side effects caused
by loading an update into a processor multiple times.
9.11.6.2 Update in a Multiprocessor System
A multiprocessor (MP) system requires loading each processor with update data
appropriate for its CPUID and platform ID bits. The BIOS is responsible for ensuring
that this requirement is met and that the loader is located in a module executed by
all processors in the system. If a system design permits multiple steppings of
Pentium 4, Intel Xeon, and P6 family processors to exist concurrently, then the BIOS
must verify individual processors against the update header information to ensure
appropriate loading. Given these considerations, it is most practical to load the
update during MP initialization.
9.11.6.3 Update in a System Supporting Intel Hyper-Threading Technology
Intel Hyper-Threading Technology has implications on the loading of the microcode
update. The update must be loaded for each core in a physical processor. Thus, for a
processor supporting Intel Hyper-Threading Technology, only one logical processor
per core is required to load the microcode update. Each individual logical processor
can independently load the update. However, MP initialization must provide some
mechanism (e.g. a software semaphore) to force serialization of microcode update
loads and to prevent simultaneous load attempts to the same core.
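The serialization requirement can be modeled with one lock per core. The Python sketch below uses threads to stand in for logical processors; it is only an illustration of the semaphore idea, not firmware code, and the names CoreUpdateState and load_once_per_core are hypothetical. Skipping the second load is a design choice of this sketch: the text permits repeated loads, provided they do not overlap.

```python
import threading


class CoreUpdateState:
    """Per-core bookkeeping shared by the logical processors of one core."""

    def __init__(self):
        self.lock = threading.Lock()  # plays the role of the semaphore
        self.loaded = False


def load_once_per_core(core, do_load):
    """Run do_load() for this core with no two loads overlapping."""
    with core.lock:
        if not core.loaded:
            do_load()  # stands in for the WRMSR trigger sequence
            core.loaded = True
```

Each logical processor calls load_once_per_core with the state object of its own core, so loads on different cores can still proceed in parallel.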
9.11.6.4 Update in a System Supporting Dual-Core Technology
Dual-core technology has implications on the loading of the microcode update. The
microcode update facility is not shared between processor cores in the same physical
package. The update must be loaded for each core in a physical processor.
If a processor core supports Intel Hyper-Threading Technology, the guideline described
in Section 9.11.6.3 also applies.
9.11.6.5 Update Loader Enhancements
The update loader presented in Section 9.11.6, "Microcode Update Loader," is a
minimal implementation that can be enhanced to provide additional functionality.
Potential enhancements are described below:
• BIOS can incorporate multiple updates to support multiple steppings of the
Pentium 4, Intel Xeon, and P6 family processors. This feature provides for
operating in a mixed stepping environment on an MP system and enables a user
to upgrade to a later version of the processor. In this case, modify the loader to
check the CPUID and platform ID bits of the processor that it is running on
against the available headers before loading a particular update. The number of
updates is only limited by available BIOS space.
• A loader can load the update and test the processor to determine if the update
was loaded correctly. See Section 9.11.7, "Update Signature and Verification."
• A loader can verify the integrity of the update data by performing a checksum on
the double words of the update summing to zero. See Section 9.11.5, "Microcode
Update Checksum."
• A loader can provide power-on messages indicating successful loading of an
update.
9.11.7 Update Signature and Verification
The Pentium 4, Intel Xeon, and P6 family processors provide capabilities to verify the
authenticity of a particular update and to identify the current update revision. This
section describes the model-specific extensions of processors that support this
feature. The update verification method below assumes that the BIOS will only verify
an update that is more recent than the revision currently loaded in the processor.
CPUID returns a value in a model specific register in addition to its usual register
return values. The semantics of CPUID cause it to deposit an update ID value in the
64-bit model-specific register at address 08BH (IA32_BIOS_SIGN_ID). If no update
is present in the processor, the value in the MSR remains unmodified. The BIOS must
pre-load a zero into the MSR before executing CPUID. If a read of the MSR at 8BH still
returns zero after executing CPUID, this indicates that no update is present.
The update ID value returned in the EDX register after RDMSR executes indicates the
revision of the update loaded in the processor. This value, in combination with the
CPUID value returned in the EAX register, uniquely identifies a particular update. The
signature ID can be directly compared with the update revision field in a microcode
update header for verification of a correct load. No consecutive updates released for
a given stepping of a processor may share the same signature. The processor
signature returned by CPUID differentiates updates for different steppings.
9.11.7.1 Determining the Signature
An update that is successfully loaded into the processor provides a signature that
matches the update revision of the currently functioning revision. This signature is
available any time after the actual update has been loaded. Requesting the signature
does not have a negative impact upon a loaded update.
The procedure for determining this signature is shown in Example 9-9.
Example 9-9. Assembly Code to Retrieve the Update Revision
MOV ECX, 08BH ;IA32_BIOS_SIGN_ID
XOR EAX, EAX ;clear EAX
XOR EDX, EDX ;clear EDX
WRMSR ;Load 0 to MSR at 8BH
MOV EAX, 1
cpuid
MOV ECX, 08BH ;IA32_BIOS_SIGN_ID
rdmsr ;Read Model Specific Register
If there is an update active in the processor, its revision is returned in the EDX
register after the RDMSR instruction executes.
IA32_BIOS_SIGN_ID Microcode Update Signature Register
MSR Address: 08BH Accessed as a Qword
Default Value: XXXX XXXX XXXX XXXXh
Access: Read/Write
The IA32_BIOS_SIGN_ID register is used to report the microcode update signature
when CPUID executes. The signature is returned in the upper DWORD (Table 9-11).
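Given the full 64-bit MSR value, the revision is simply the upper DWORD. The Python sketch below also reinterprets it as a signed number, since Table 9-6 describes the update revision as a signed 32-bit value; the function name update_revision_from_msr is hypothetical.

```python
def update_revision_from_msr(sign_id_msr: int) -> int:
    """Return the signed 32-bit revision held in IA32_BIOS_SIGN_ID[63:32].

    A result of zero means no microcode update is loaded, provided the
    MSR was cleared before CPUID was executed.
    """
    revision = (sign_id_msr >> 32) & 0xFFFFFFFF  # this is what EDX holds
    if revision & 0x80000000:
        revision -= 1 << 32  # reinterpret as two's complement
    return revision
```

The lower DWORD is reserved and is ignored here.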
9.11.7.2 Authenticating the Update
An update may be authenticated by the BIOS using the signature primitive,
described above, and the algorithm in Example 9-10.
Table 9-11. Microcode Update Signature
Bit | Description
63:32 | Microcode update signature. This field contains the signature of the currently loaded microcode update when read following the execution of the CPUID instruction, function 1. It is required that this register field be pre-loaded with zero prior to executing the CPUID, function 1. If the field remains equal to zero, then there is no microcode update loaded. Any non-zero value is the signature.
31:0 | Reserved.
Example 9-10. Pseudo Code to Authenticate the Update
Z ← Obtain Update Revision from the Update Header to be authenticated;
X ← Obtain Current Update Signature from MSR 8BH;
If (Z > X)
{
    Load Update that is to be authenticated;
    Y ← Obtain New Signature from MSR 8BH;
    If (Z == Y)
        Success
    Else
        Fail
}
Else
    Fail
Example 9-10 requires that the BIOS only authenticate updates that contain a
numerically larger revision than the currently loaded revision, where Current
Signature (X) < New Update Revision (Z). A processor with no loaded update is considered
to have a revision equal to zero.
This authentication procedure relies upon the decoding provided by the processor to
verify an update from a potentially hostile source. As an example, this mechanism in
conjunction with other safeguards provides security for dynamically incorporating
field updates into the BIOS.
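The control flow of Example 9-10 can be exercised against a simulated processor. In the Python sketch below, read_signature and load_update are caller-supplied callables that stand in for the MSR 8BH read sequence and the update loader; they and the name authenticate_update are assumptions of this illustration.

```python
def authenticate_update(update_revision, read_signature, load_update):
    """Return True when the update loads and reports its own revision.

    update_revision is Z, the revision from the update header.
    read_signature() returns the current signature (0 when none is loaded).
    load_update() attempts to load the update into the processor.
    """
    current = read_signature()          # X
    if update_revision <= current:      # only newer revisions are tried
        return False
    load_update()
    return read_signature() == update_revision  # Y must equal Z
```

A processor that rejects the update leaves its signature unchanged, so the final comparison fails and the update is treated as not authenticated.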
9.11.8 Pentium 4, Intel Xeon, and P6 Family Processor Microcode Update Specifications
This section describes the interface that an application can use to dynamically
integrate processor-specific updates into the system BIOS. In this discussion, the
application is referred to as the calling program or caller.
The real mode INT15 call specification described here is an Intel extension to an OEM
BIOS. This extension allows an application to read and modify the contents of the
microcode update data in NVRAM. The update loader, which is part of the system
BIOS, cannot be updated by the interface. All of the functions defined in the
specification must be implemented for a system to be considered compliant with the
specification. The INT15 functions are accessible only from real mode.
9.11.8.1 Responsibilities of the BIOS
If a BIOS passes the presence test (INT 15H, AX = 0D042H, BL = 0H), it must
implement all of the sub-functions defined in the INT 15H, AX = 0D042H specification.
There are no optional functions. BIOS must load the appropriate update for each
processor during system initialization.
A Header Version of an update block containing the value 0FFFFFFFFH indicates that
the update block is unused and available for storing a new update.
The BIOS is responsible for providing a region of non-volatile storage (NVRAM) for
each potential processor stepping within a system. This storage unit consists of one
or more update blocks. An update block is a contiguous 2048-byte block of memory.
The BIOS for a single processor system need only provide update blocks to store one
microcode update. If the BIOS for a multiple processor system is intended to support
mixed processor steppings, then the BIOS needs to provide enough update blocks to
store each unique microcode update or for each processor socket on the OEM's
system board.
The BIOS is responsible for managing the NVRAM update blocks. This includes
garbage collection, such as removing microcode updates that exist in NVRAM for
which a corresponding processor does not exist in the system. This specification only
provides the mechanism for ensuring security, the uniqueness of an entry, and that
stale entries are not loaded. The actual update block management is implementation
specific on a per-BIOS basis.
As an example, the BIOS may use update blocks sequentially in ascending order with
CPU signatures sorted versus the first available block. In addition, garbage collection
may be implemented as a setup option to clear all NVRAM slots or as BIOS code that
searches and eliminates unused entries during boot.
NOTES
For IA-32 processors starting with family 0FH and model 03H and
Intel 64 processors, the microcode update may be as large as 16
KBytes. Thus, BIOS must allocate 8 update blocks for each microcode
update. In an MP system, a common microcode update may be
sufficient for each socket in the system.
For IA-32 processors earlier than family 0FH and model 03H, the
microcode update is 2 KBytes. An MP-capable BIOS that supports
multiple steppings must allocate a block for each socket in the system.
A single-processor BIOS that supports variable-sized microcode
update and fixed-sized microcode update must allocate one 16-KByte
region and a second region of at least 2 KBytes.
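The block counts in these notes follow from dividing the update size by the 2048-byte block size and rounding up. A minimal Python sketch of that arithmetic, with blocks_needed as a hypothetical helper name:

```python
UPDATE_BLOCK_SIZE = 2048  # bytes per NVRAM update block


def blocks_needed(update_size_bytes: int) -> int:
    """Number of 2048-byte update blocks needed to store one update."""
    if update_size_bytes <= 0:
        raise ValueError("update size must be positive")
    # Ceiling division: a partly filled block still occupies a whole block.
    return -(-update_size_bytes // UPDATE_BLOCK_SIZE)
```

A 16-KByte update therefore needs 8 blocks, a 2-KByte update needs 1, and a 3-KByte update (total size is only required to be a multiple of 1024) needs 2.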
The following algorithm (Example 9-11) describes the steps performed during BIOS
initialization used to load the updates into the processor(s). The algorithm assumes:
• The BIOS ensures that no update contained within NVRAM has a header version
or loader version that does not match one currently supported by the BIOS.
• The update contains a correct checksum.
• The BIOS ensures that (at most) one update exists for each processor stepping.
• Older update revisions are not allowed to overwrite more recent ones.
These requirements are checked by the BIOS during the execution of the write
update function of this interface. The BIOS sequentially scans through all of the
update blocks in NVRAM starting with index 0. The BIOS scans until it finds an update
where the processor fields in the header match the processor signature (extended
family, extended model, type, family, model, and stepping) as well as the platform
bits of the current processor.
Example 9-11. Pseudo Code, Checks Required Prior to Loading an Update
For each processor in the system
{
    Determine the Processor Signature via CPUID function 1;
    Determine the Platform Bits ← 1 << IA32_PLATFORM_ID[52:50];
    For (I ← UpdateBlock 0, I < NumOfBlocks; I++)
    {
        If (Update.Header_Version == 0x00000001)
        {
            If ((Update.ProcessorSignature == Processor Signature) &&
                (Update.ProcessorFlags & Platform Bits))
            {
                Load Update.UpdateData into the Processor;
                Verify update was correctly loaded into the processor
                Go on to next processor
                Break;
            }
            Else If (Update.TotalSize > (Update.DataSize + 48))
            {
                N ← 0
                While (N < Update.ExtendedSignatureCount)
                {
                    If ((Update.ProcessorSignature[N] ==
                         Processor Signature) &&
                        (Update.ProcessorFlags[N] & Platform Bits))
                    {
                        Load Update.UpdateData into the Processor;
                        Verify update correctly loaded into the processor
                        Go on to next processor
                        Break;
                    }
                    N ← N + 1
                }
                I ← I + (Update.TotalSize / 2048)
                If ((Update.TotalSize MOD 2048) == 0)
                    I ← I + 1
            }
        }
    }
}
NOTES
The platform Id bits in IA32_PLATFORM_ID are encoded as a three-
bit binary coded decimal field. The platform bits in the microcode
update header are individually bit encoded. The algorithm must do a
translation from one format to the other prior to doing a check.
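The translation described in this note, together with the signature comparison from Example 9-11, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the `(signature, flags)` tuple layout for extended entries are hypothetical, and the caller is assumed to have already read the MSR and parsed the update header.

```python
def platform_bits(ia32_platform_id: int) -> int:
    """Translate IA32_PLATFORM_ID[52:50] into the one-bit-per-platform form.

    The MSR holds a three-bit number (0..7); the update header holds a mask
    with one bit per platform, so the translation is 1 << number.
    """
    platform_id = (ia32_platform_id >> 50) & 0x7
    return 1 << platform_id


def update_matches(signature, flags, cpu_signature, cpu_platform_bits, extended=()):
    """Check the primary header fields, then any extended signature entries.

    `extended` is a hypothetical iterable of (signature, flags) pairs taken
    from the extended processor signature table, if the update has one.
    """
    candidates = [(signature, flags), *extended]
    return any(
        sig == cpu_signature and (fl & cpu_platform_bits) != 0
        for sig, fl in candidates
    )


if __name__ == "__main__":
    msr_value = 2 << 50                  # platform Id field holds the value 2
    bits = platform_bits(msr_value)      # one-hot mask for platform 2
    print(bin(bits), update_matches(0x0F34, 0x04, 0x0F34, bits))
```

Keeping the translation in its own function makes it hard to compare the raw three-bit field against the header mask by mistake.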
When performing the INT 15H, 0D042H functions, the BIOS must assume that the
caller has no knowledge of platform specific requirements. It is the responsibility of
BIOS calls to manage all chipset and platform specific prerequisites for managing the
NVRAM device. When writing the update data using the Write Update sub-function,
the BIOS must maintain implementation specific data requirements (such as the
update of NVRAM checksum). The BIOS should also attempt to verify the success of
write operations on the storage device used to record the update.
9.11.8.2 Responsibilities of the Calling Program
This section of the document lists the responsibilities of a calling program using the
interface specifications to load microcode update(s) into BIOS NVRAM.
• The calling program should call the INT 15H, 0D042H functions from a pure real
mode program and should be executing on a system that is running in pure real
mode.
• The caller should issue the presence test function (sub function 0) and verify the
signature and return codes of that function.
• It is important that the calling program provides the required scratch RAM buffers
for the BIOS and the proper stack size as specified in the interface definition.
• The calling program should read any update data that already exists in the BIOS
in order to make decisions about the appropriateness of loading the update. The
BIOS must refuse to overwrite a newer update with an older version. The update
header contains information about version and processor specifics for the calling
program to make an intelligent decision about loading.
• There can be no ambiguous updates. The BIOS must refuse to allow multiple
updates for the same CPU to exist at the same time; it also must refuse to load
updates for processors that don't exist on the system.
• The calling application should implement a verify function that is run after the
update write function successfully completes. This function reads back the
update and verifies that the BIOS returned an image identical to the one that was
written.
Example 9-12 represents a calling program.
Example 9-12. INT 15H 0D042H Calling Program Pseudo-code
//
// We must be in real mode
//
If the system is not in Real mode
    exit
//
// Detect presence of Genuine Intel processor(s) that can be updated
// using CPUID
//
If no Intel processors exist that can be updated
    exit
//
// Detect the presence of the Intel microcode update extensions
//
If the BIOS fails the PresenceTest
    exit
//
// If the APIC is enabled, see if any other processors are out there
//
Read IA32_APICBASE
If APIC enabled
{
    Send Broadcast Message to all processors except self via APIC
    Have all processors execute CPUID, record the Processor Signature
        (i.e., Extended Family, Extended Model, Type, Family, Model,
        Stepping)
    Have all processors read IA32_PLATFORM_ID[52:50], record Platform
        Id Bits
    If current processor cannot be updated
        exit
}
//
// Determine the number of unique update blocks needed for this system
//
NumBlocks = 0
For each processor
{
    If ((this is a unique processor stepping) AND
        (we have a unique update in the database for this processor))
    {
        Checksum the update from the database;
        If Checksum fails
            exit
        NumBlocks ← NumBlocks + size of microcode update / 2048
    }
}
//
// Do we have enough update slots for all CPUs?
//
If there are more blocks required to support the unique processor
steppings than update blocks provided by the BIOS
    exit
//
// Do we need any update blocks at all? If not, we are done
//
If (NumBlocks == 0)
    exit
//
// Record updates for processors in NVRAM.
//
For (I=0; I<NumBlocks; I++)
{
    //
    // Load each Update
    //
    Issue the WriteUpdate function
    If (STORAGE_FULL) returned
    {
        Display Error -- BIOS is not managing NVRAM appropriately
        exit
    }
    If (INVALID_REVISION) returned
    {
        Display Message: More recent update already loaded in NVRAM for
            this stepping
        continue
    }
    If any other error returned
    {
        Display Diagnostic
        exit
    }
    //
    // Verify the update was loaded correctly
    //
    Issue the ReadUpdate function
    If an error occurred
    {
        Display Diagnostic
        exit
    }
    //
    // Compare the Update read to that written
    //
    If (Update read != Update written)
    {
        Display Diagnostic
        exit
    }
    I ← I + (size of microcode update / 2048)
}
//
// Enable Update Loading, and inform user
//
Issue the Update Control function with Task = Enable.
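The block-counting step of Example 9-12 can be sketched in a few lines. Everything named here is hypothetical: the `database` dictionary keyed by `(signature, platform_bits)` stands in for whatever update store the calling program uses, and rounding each update up to whole blocks is an assumption of this sketch (the pseudo-code simply divides by 2048).

```python
def blocks_needed(processors, database, block_size=2048):
    """Count update blocks required for the unique steppings in the system.

    processors: iterable of (signature, platform_bits) pairs, one per CPU.
    database:   dict mapping (signature, platform_bits) to update bytes
                (a simplification assumed for illustration).
    """
    seen = set()
    total = 0
    for key in processors:
        if key in seen or key not in database:
            continue  # duplicate stepping, or no update available for it
        seen.add(key)
        total += -(-len(database[key]) // block_size)  # round up to blocks
    return total


if __name__ == "__main__":
    cpus = [(0x0F34, 0x04), (0x0F34, 0x04), (0x06FB, 0x01), (0x0123, 0x01)]
    updates = {(0x0F34, 0x04): bytes(4096), (0x06FB, 0x01): bytes(2048)}
    print(blocks_needed(cpus, updates))
```

The caller would compare this result with the update count returned by the presence test before attempting any writes.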
9.11.8.3 Microcode Update Functions
Table 9-12 defines current Pentium 4, Intel Xeon, and P6 family processor microcode
update functions.
Table 9-12. Microcode Update Functions
Microcode Update Function   Function Number   Description                                          Required/Optional
Presence test               00H               Returns information about the supported functions.   Required
Write update data           01H               Writes one of the update data areas (slots).         Required
Update control              02H               Globally controls the loading of updates.            Required
Read update data            03H               Reads one of the update data areas (slots).          Required
9.11.8.4 INT 15H-based Interface
Intel recommends that a BIOS interface be provided that allows additional microcode
updates to be added to system flash. The INT15H interface is the Intel-defined
method for doing this.
The program that calls this interface is responsible for providing three 64-kilobyte
RAM areas for BIOS use during calls to the read and write functions. These RAM
scratch pads can be used by the BIOS for any purpose, but only for the duration of
the function call. The calling routine places real mode segments pointing to the RAM
blocks in the CX, DX and SI registers. Calls to functions in this interface must be
made with a minimum of 32 kilobytes of stack available to the BIOS.
In general, each function returns with CF cleared and AH contains the returned
status. The general return codes and other constant definitions are listed in Section
9.11.8.9, "Return Codes."
The OEM error field (AL) is provided for the OEM to return additional error informa-
tion specific to the platform. If the BIOS provides no additional information about the
error, OEM error must be set to SUCCESS. The OEM error field is undefined if AH
contains either SUCCESS (00H) or NOT_IMPLEMENTED (86H). In all other cases, it
must be set with either SUCCESS or a value meaningful to the OEM.
The following sections describe functions provided by the INT15H-based interface.
9.11.8.5 Function 00H - Presence Test
This function verifies that the BIOS has implemented required microcode update
functions. Table 9-13 lists the parameters and return codes for the function.
In order to assure that the BIOS function is present, the caller must verify the carry
flag, the return code, and the 64-bit signature. The update count reflects the number
of 2048-byte blocks available for storage within one non-volatile RAM.
The loader version number refers to the revision of the update loader program that is
included in the system BIOS image.
Table 9-13. Parameters for the Presence Test
Input
AX    Function Code      0D042H
BL    Sub-function       00H - Presence test
Output
CF    Carry Flag         Carry Set - Failure - AH contains status
                         Carry Clear - All return values valid
AH    Return Code
AL    OEM Error          Additional OEM information.
EBX   Signature Part 1   'INTE' - Part one of the signature
ECX   Signature Part 2   'LPEP' - Part two of the signature
EDX   Loader Version     Version number of the microcode update loader
SI    Update Count       Number of 2048-byte update blocks in NVRAM the BIOS
                         allocated to storing microcode updates
Return Codes (see Table 9-18 for code definitions)
SUCCESS                  The function completed successfully.
NOT_IMPLEMENTED          The function is not implemented.
9.11.8.6 Function 01H - Write Microcode Update Data
This function integrates a new microcode update into the BIOS storage device. Table
9-14 lists the parameters and return codes for the function.
Table 9-14. Parameters for the Write Update Data Function
Input
AX      Function Code    0D042H
BL      Sub-function     01H - Write update
ES:DI   Update Address   Real Mode pointer to the Intel Update structure. This buffer is
                         2048 bytes in length if the processor supports only fixed-size
                         microcode updates, or 64 KBytes in length if the processor
                         supports variable-size microcode updates.
CX      Scratch Pad1     Real mode segment address of 64 KBytes of RAM block
DX      Scratch Pad2     Real mode segment address of 64 KBytes of RAM block
SI      Scratch Pad3     Real mode segment address of 64 KBytes of RAM block
SS:SP   Stack pointer    32 KBytes of stack minimum
Output
CF      Carry Flag       Carry Set - Failure - AH contains status
                         Carry Clear - All return values valid
AH      Return Code      Status of the call
AL      OEM Error        Additional OEM information
Return Codes (see Table 9-18 for code definitions)
SUCCESS                  The function completed successfully.
NOT_IMPLEMENTED          The function is not implemented.
WRITE_FAILURE            A failure occurred because of the inability to write the
                         storage device.
ERASE_FAILURE            A failure occurred because of the inability to erase the
                         storage device.
READ_FAILURE             A failure occurred because of the inability to read the
                         storage device.
STORAGE_FULL             The BIOS non-volatile storage area is unable to
                         accommodate the update because all available update
                         blocks are filled with updates that are needed for
                         processors in the system.
Description
The BIOS is responsible for selecting an appropriate update block in the non-volatile
storage for storing the new update. The BIOS is also responsible for ensuring the
integrity of the information provided by the caller, including authenticating the
proposed update before incorporating it into storage.
Before writing the update block into NVRAM, the BIOS should ensure that the update
structure meets the following criteria in the following order:
1. The update header version should be equal to an update header version
recognized by the BIOS.
2. The update loader version in the update header should be equal to the update
loader version contained within the BIOS image.
3. The update block must checksum correctly. This checksum is computed as a 32-bit
summation of all double words in the structure, including the header, data, and
processor signature table.
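The checksum criterion in item 3 can be sketched as below. The sketch assumes that a valid update is one whose 32-bit sum of all little-endian double words comes to zero, which is the convention the manual describes for the update header's checksum field elsewhere; the function name is hypothetical.

```python
import struct


def checksum_ok(update: bytes) -> bool:
    """Return True if the 32-bit sum of all dwords in the update is zero.

    The buffer is expected to cover the whole structure: header, data and,
    if present, the extended processor signature table.
    """
    if len(update) == 0 or len(update) % 4 != 0:
        return False  # not a whole number of double words
    dwords = struct.unpack(f"<{len(update) // 4}I", update)
    return (sum(dwords) & 0xFFFFFFFF) == 0


if __name__ == "__main__":
    # Three dwords whose sum wraps to zero modulo 2**32.
    good = struct.pack("<3I", 1, 2, 0xFFFFFFFD)
    bad = struct.pack("<3I", 1, 2, 3)
    print(checksum_ok(good), checksum_ok(bad))
```

Masking the Python integer sum to 32 bits reproduces the wrap-around that a 32-bit accumulator would give.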
The BIOS selects update block(s) in non-volatile storage for storing the candidate
update. The BIOS can select any available update block as long as it guarantees that
only a single update exists for any given processor stepping in non-volatile storage.
If the update block selected already contains an update, the following additional
criteria apply to overwrite it:
• The processor signature in the proposed update must be equal to the processor
signature in the header of the current update in NVRAM (Processor Signature +
platform ID bits).
• The update revision in the proposed update should be greater than the update
revision in the header of the current update in NVRAM.
If no unused update blocks are available and the above criteria are not met, the BIOS
can overwrite update block(s) for a processor stepping that is no longer present in
the system. This can be done by scanning the update blocks and comparing the
processor steppings, identified in the MP Specification table, to the processor step-
pings that currently exist in the system.
Table 9-14. Parameters for the Write Update Data Function (Contd.)
Return Codes (continued)
CPU_NOT_PRESENT          The processor stepping does not currently exist in the
                         system.
INVALID_HEADER           The update header contains a header or loader version
                         that is not recognized by the BIOS.
INVALID_HEADER_CS        The update does not checksum correctly.
SECURITY_FAILURE         The processor rejected the update.
INVALID_REVISION         The same or more recent revision of the update exists in
                         the storage device.
Finally, before storing the proposed update in NVRAM, the BIOS must verify the
authenticity of the update via the mechanism described in Section 9.11.6, "Micro-
code Update Loader." This includes loading the update into the current processor,
executing the CPUID instruction, reading MSR 08Bh, and comparing a calculated
value with the update revision in the proposed update header for equality.
When performing the write update function, the BIOS must record the entire update,
including the header, the update data, and the extended processor signature table (if
applicable). When writing an update, the original contents may be overwritten,
assuming the above criteria have been met. It is the responsibility of the BIOS to
ensure that more recent updates are not overwritten through the use of this BIOS
call, and that only a single update exists within the NVRAM for any processor step-
ping and platform ID.
Figure 9-8 and Figure 9-9 show the process the BIOS follows to choose an update
block and ensure the integrity of the data when it stores the new microcode update.
Figure 9-8. Microcode Update Write Operation Flow [1]
[Flowchart not reproduced. Entry point: Write Microcode Update. Decision boxes and
their "No" exits: Valid Update Header Version? (Return INVALID_HEADER); Loader
Revision Match BIOS's Loader? (Return INVALID_HEADER); Does Update Checksum
Correctly? (Return INVALID_HEADER_CS); Does Update Match A CPU in The System?
(Return CPU_NOT_PRESENT). The "Yes" path continues at connector 1 in Figure 9-9.]
Figure 9-9. Microcode Update Write Operation Flow [2]
[Flowchart not reproduced. Continues from connector 1. Decision boxes: Update
Matching CPU Already In NVRAM?; Update Revision Newer Than NVRAM Update?
(No: Return INVALID_REVISION); Space Available in NVRAM?; Replacement policy
implemented? (No: Return STORAGE_FULL); Update Pass Authenticity Test?
(No: Return SECURITY_FAILURE). Final steps: Update NVRAM Record, Return SUCCESS.]
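One reading of the write-operation checks, in the order the Description lists them, can be sketched as a single decision function. The `WriteRequest` fields are hypothetical stand-ins for checks a BIOS would perform itself; only the numeric return codes come from Table 9-18.

```python
from dataclasses import dataclass
from typing import Optional

# Return code values as listed in Table 9-18.
SUCCESS, STORAGE_FULL, CPU_NOT_PRESENT = 0x00, 0x93, 0x94
INVALID_HEADER, INVALID_HEADER_CS = 0x95, 0x96
SECURITY_FAILURE, INVALID_REVISION = 0x97, 0x98


@dataclass
class WriteRequest:
    """Hypothetical summary of the facts the BIOS has gathered about an update."""
    header_version_ok: bool
    loader_version_ok: bool
    checksum_ok: bool
    cpu_present: bool
    stored_revision: Optional[int]  # revision already in NVRAM for this CPU, if any
    new_revision: int
    free_block_available: bool
    replaceable_block_available: bool  # a block for a stepping no longer present
    authentic: bool


def write_update_status(req: WriteRequest) -> int:
    """Return the status code the checks above would produce."""
    if not (req.header_version_ok and req.loader_version_ok):
        return INVALID_HEADER
    if not req.checksum_ok:
        return INVALID_HEADER_CS
    if not req.cpu_present:
        return CPU_NOT_PRESENT
    if req.stored_revision is not None:
        # An update for this CPU exists: only a newer revision may replace it.
        if req.new_revision <= req.stored_revision:
            return INVALID_REVISION
    elif not (req.free_block_available or req.replaceable_block_available):
        return STORAGE_FULL
    if not req.authentic:
        return SECURITY_FAILURE
    return SUCCESS
```

Placing the authenticity test last mirrors the text, which has the BIOS verify authenticity just before the update is stored.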
9.11.8.7 Function 02H - Microcode Update Control
This function enables loading of binary updates into the processor. Table 9-15 lists
the parameters and return codes for the function.
This control is provided on a global basis for all updates and processors. The caller
can determine the current status of update loading (enabled or disabled) without
changing the state. The function does not allow the caller to disable loading of binary
updates, as this poses a security risk.
The caller specifies the requested operation by placing one of the values from Table
9-16 in the BH register. After successfully completing this function, the BL register
contains either the enable or the disable designator. Note that if the function fails, the
update status return value is undefined.
Table 9-15. Parameters for the Control Update Sub-function
Input
AX      Function Code    0D042H
BL      Sub-function     02H - Control update
BH      Task             See the description below.
CX      Scratch Pad1     Real mode segment of 64 KBytes of RAM block
DX      Scratch Pad2     Real mode segment of 64 KBytes of RAM block
SI      Scratch Pad3     Real mode segment of 64 KBytes of RAM block
SS:SP   Stack pointer    32 kilobytes of stack minimum
Output
CF      Carry Flag       Carry Set - Failure - AH contains status
                         Carry Clear - All return values valid.
AH      Return Code      Status of the call
AL      OEM Error        Additional OEM Information.
BL      Update Status    Either enable or disable indicator
Return Codes (see Table 9-18 for code definitions)
SUCCESS                  Function completed successfully.
READ_FAILURE             A failure occurred because of the inability to read the
                         storage device.
The READ_FAILURE error code returned by this function has meaning only if the
control function is implemented in the BIOS NVRAM. The state of this feature
(enabled/disabled) can also be implemented using CMOS RAM bits where READ
failure errors cannot occur.
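As an illustration of the input side of this call, the sketch below assembles the register values from Table 9-15, using the task values given in Table 9-16 (Enable = 1, Query = 2). The function name and the dictionary representation of the registers are hypothetical; a real caller would load these values in real mode and issue INT 15H.

```python
FUNCTION_CODE = 0xD042      # AX for every function of this interface
SUBFUNC_CONTROL = 0x02      # BL: control update sub-function
TASK_ENABLE, TASK_QUERY = 1, 2  # BH task values from Table 9-16


def control_request(task: int, scratch_segments) -> dict:
    """Build the input registers for the update control sub-function.

    scratch_segments: three real mode segment values for the 64-KByte
    scratch pads, placed in CX, DX and SI.
    """
    if task not in (TASK_ENABLE, TASK_QUERY):
        raise ValueError("task must be Enable (1) or Query (2)")
    if len(scratch_segments) != 3:
        raise ValueError("three scratch pad segments are required")
    cx, dx, si = scratch_segments
    return {
        "AX": FUNCTION_CODE,
        "BX": (task << 8) | SUBFUNC_CONTROL,  # BH = task, BL = sub-function
        "CX": cx,
        "DX": dx,
        "SI": si,
    }


if __name__ == "__main__":
    regs = control_request(TASK_QUERY, (0x2000, 0x3000, 0x4000))
    print({name: hex(value) for name, value in regs.items()})
```

There is deliberately no "disable" task in the sketch, matching the statement that the caller cannot disable update loading.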
Table 9-16. Mnemonic Values
Mnemonic   Value   Meaning
Enable     1       Enable the Update loading at initialization time.
Query      2       Determine the current state of the update control without
                   changing its status.
9.11.8.8 Function 03H - Read Microcode Update Data
This function reads a currently installed microcode update from the BIOS storage into
a caller-provided RAM buffer. Table 9-17 lists the parameters and return codes.
Table 9-17. Parameters for the Read Microcode Update Data Function
Input
AX      Function Code    0D042H
BL      Sub-function     03H - Read Update
ES:DI   Buffer Address   Real Mode pointer to the Intel Update structure that will be
                         written with the binary data
ECX     Scratch Pad1     Real Mode Segment address of 64 KBytes of RAM Block
                         (lower 16 bits)
ECX     Scratch Pad2     Real Mode Segment address of 64 KBytes of RAM Block
                         (upper 16 bits)
DX      Scratch Pad3     Real Mode Segment address of 64 KBytes of RAM Block
SS:SP   Stack pointer    32 KBytes of Stack Minimum
SI      Update Number    This is the index number of the update block to be read.
                         This value is zero based and must be less than the update
                         count returned from the presence test function.
Output
CF      Carry Flag       Carry Set - Failure - AH contains Status
                         Carry Clear - All return values are valid.
AH      Return Code      Status of the Call
The read function enables the caller to read any microcode update data that already
exists in a BIOS and make decisions about the addition of new updates. As a result
of a successful call, the BIOS copies the microcode update into the location pointed
to by ES:DI, with the contents of all Update block(s) that are used to store the spec-
ified microcode update.
If the specified block is not a header block, but does contain valid data from a micro-
code update that spans multiple update blocks, then the BIOS must return Failure
with the NOT_EMPTY error code in AH.
An update block is considered unused and available for storing a new update if its
Header Version contains the value 0FFFFFFFFH after return from this function call.
The actual implementation of NVRAM storage management is not specified here and
is BIOS dependent. As an example, the actual data value used to represent an
empty block by the BIOS may be zero, rather than 0FFFFFFFFH. The BIOS is respon-
sible for translating this information into the header provided by this function.
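A caller that walks every update block could classify the results along the lines below. This is a sketch under stated assumptions: `read_update` is a hypothetical wrapper around the read function that returns the AH status and the buffer contents, and the Header Version is taken to be the first double word of the returned header.

```python
import struct

SUCCESS, NOT_EMPTY = 0x00, 0x9A        # values from Table 9-18
UNUSED_HEADER_VERSION = 0xFFFFFFFF     # marks an unused block after a read


def classify_blocks(update_count, read_update):
    """Label each update block as 'header', 'continuation', 'unused' or 'error'.

    read_update(index) is a hypothetical callable returning (ah_status, buffer).
    """
    labels = []
    for index in range(update_count):
        status, buffer = read_update(index)
        if status == NOT_EMPTY:
            labels.append("continuation")  # part of a multi-block update
        elif status != SUCCESS:
            labels.append("error")
        elif struct.unpack_from("<I", buffer, 0)[0] == UNUSED_HEADER_VERSION:
            labels.append("unused")
        else:
            labels.append("header")
    return labels


if __name__ == "__main__":
    fake_nvram = [
        (SUCCESS, struct.pack("<I", 1) + bytes(2044)),
        (NOT_EMPTY, b""),
        (SUCCESS, struct.pack("<I", 0xFFFFFFFF) + bytes(2044)),
    ]
    print(classify_blocks(len(fake_nvram), lambda i: fake_nvram[i]))
```

Because the BIOS translates its internal empty marker into 0FFFFFFFFH, the caller needs only this one comparison regardless of how NVRAM is managed.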
Table 9-17. Parameters for the Read Microcode Update Data Function (Contd.)
AL      OEM Error        Additional OEM Information
Return Codes (see Table 9-18 for code definitions)
SUCCESS                  The function completed successfully.
READ_FAILURE             There was a failure because of the inability to read the
                         storage device.
UPDATE_NUM_INVALID       Update number exceeds the maximum number of update
                         blocks implemented by the BIOS.
NOT_EMPTY                The specified update block is a subsequent block in use to
                         store a valid microcode update that spans multiple blocks.
                         The specified block is not a header block and is not empty.
9.11.8.9 Return Codes
After the call has been made, the return codes listed in Table 9-18 are available in the
AH register.
Table 9-18. Return Code Definitions
Return Code          Value   Description
SUCCESS              00H     The function completed successfully.
NOT_IMPLEMENTED      86H     The function is not implemented.
ERASE_FAILURE        90H     A failure because of the inability to erase the storage
                             device.
WRITE_FAILURE        91H     A failure because of the inability to write the storage
                             device.
READ_FAILURE         92H     A failure because of the inability to read the storage
                             device.
STORAGE_FULL         93H     The BIOS non-volatile storage area is unable to
                             accommodate the update because all available update
                             blocks are filled with updates that are needed for
                             processors in the system.
CPU_NOT_PRESENT      94H     The processor stepping does not currently exist in the
                             system.
INVALID_HEADER       95H     The update header contains a header or loader version
                             that is not recognized by the BIOS.
INVALID_HEADER_CS    96H     The update does not checksum correctly.
SECURITY_FAILURE     97H     The update was rejected by the processor.
INVALID_REVISION     98H     The same or more recent revision of the update exists
                             in the storage device.
UPDATE_NUM_INVALID   99H     The update number exceeds the maximum number of
                             update blocks implemented by the BIOS.
NOT_EMPTY            9AH     The specified update block is a subsequent block in use
                             to store a valid microcode update that spans multiple
                             blocks.
                             The specified block is not a header block and is not
                             empty.
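The return codes in Table 9-18 and the rules for the carry flag and the OEM error field can be combined into a small decoder. The sketch below is illustrative only: the function name and the tuple it returns are hypothetical, and the caller is assumed to have captured CF and AX after the INT 15H call.

```python
from enum import IntEnum


class ReturnCode(IntEnum):
    """AH values from Table 9-18."""
    SUCCESS = 0x00
    NOT_IMPLEMENTED = 0x86
    ERASE_FAILURE = 0x90
    WRITE_FAILURE = 0x91
    READ_FAILURE = 0x92
    STORAGE_FULL = 0x93
    CPU_NOT_PRESENT = 0x94
    INVALID_HEADER = 0x95
    INVALID_HEADER_CS = 0x96
    SECURITY_FAILURE = 0x97
    INVALID_REVISION = 0x98
    UPDATE_NUM_INVALID = 0x99
    NOT_EMPTY = 0x9A


def decode_result(carry_flag: bool, ax: int):
    """Split AX into the return code (AH) and the OEM error (AL).

    Returns (ok, code, oem_error). The OEM error is reported as None when AH
    is SUCCESS or NOT_IMPLEMENTED, because AL is undefined in those cases.
    Raises ValueError for an AH value that Table 9-18 does not define.
    """
    code = ReturnCode((ax >> 8) & 0xFF)
    al = ax & 0xFF
    undefined_oem = code in (ReturnCode.SUCCESS, ReturnCode.NOT_IMPLEMENTED)
    oem_error = None if undefined_oem else al
    ok = (not carry_flag) and code is ReturnCode.SUCCESS
    return ok, code, oem_error


if __name__ == "__main__":
    print(decode_result(False, 0x0000))
    print(decode_result(True, 0x9305))
```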
CHAPTER 10
ADVANCED PROGRAMMABLE
INTERRUPT CONTROLLER (APIC)
The Advanced Programmable Interrupt Controller (APIC), referred to in the following
sections as the local APIC, was introduced into the IA-32 processors with the Pentium
processor (see Section 19.27, "Advanced Programmable Interrupt Controller
(APIC)") and is included in the P6 family, Pentium 4, Intel Xeon processors, and other
more recent Intel 64 and IA-32 processor families (see Section 10.4.2, "Presence of
the Local APIC"). The local APIC performs two primary functions for the processor:
• It receives interrupts from the processor's interrupt pins, from internal sources
and from an external I/O APIC (or other external interrupt controller). It sends
these to the processor core for handling.
• In multiple processor (MP) systems, it sends and receives interprocessor
interrupt (IPI) messages to and from other logical processors on the system bus.
IPI messages can be used to distribute interrupts among the processors in the
system or to execute system wide functions (such as, booting up processors or
distributing work among a group of processors).
The external I/O APIC is part of Intel's system chip set. Its primary function is to
receive external interrupt events from the system and its associated I/O devices and
relay them to the local APIC as interrupt messages. In MP systems, the I/O APIC also
provides a mechanism for distributing external interrupts to the local APICs of
selected processors or groups of processors on the system bus.
This chapter provides a description of the local APIC and its programming interface.
It also provides an overview of the interface between the local APIC and the I/O
APIC. Contact Intel for detailed information about the I/O APIC.
When a local APIC has sent an interrupt to its processor core for handling, the
processor uses the interrupt and exception handling mechanism described in Chapter
6, "Interrupt and Exception Handling." See Section 6.1, "Interrupt and Exception
Overview," for an introduction to interrupt and exception handling.
10.1 LOCAL AND I/O APIC OVERVIEW
Each local APIC consists of a set of APIC registers (see Table 10-1) and associated
hardware that control the delivery of interrupts to the processor core and the gener-
ation of IPI messages. The APIC registers are memory mapped and can be read and
written to using the MOV instruction.
Local APICs can receive interrupts from the following sources:
• Locally connected I/O devices - These interrupts originate as an edge or
level asserted by an I/O device that is connected directly to the processor's local
interrupt pins (LINT0 and LINT1). The I/O devices may also be connected to an
8259-type interrupt controller that is in turn connected to the processor through
one of the local interrupt pins.
• Externally connected I/O devices - These interrupts originate as an edge or
level asserted by an I/O device that is connected to the interrupt input pins of an
I/O APIC. Interrupts are sent as I/O interrupt messages from the I/O APIC to one
or more of the processors in the system.
• Inter-processor interrupts (IPIs) - An Intel 64 or IA-32 processor can use
the IPI mechanism to interrupt another processor or group of processors on the
system bus. IPIs are used for software self-interrupts, interrupt forwarding, or
preemptive scheduling.
• APIC timer generated interrupts - The local APIC timer can be programmed
to send a local interrupt to its associated processor when a programmed count is
reached (see Section 10.5.4, "APIC Timer").
• Performance monitoring counter interrupts - P6 family, Pentium 4, and
Intel Xeon processors provide the ability to send an interrupt to the associated
processor when a performance-monitoring counter overflows (see Section
30.8.5.8, "Generating an Interrupt on Overflow").
• Thermal Sensor interrupts - Pentium 4 and Intel Xeon processors provide the
ability to send an interrupt to themselves when the internal thermal sensor has
been tripped (see Section 14.5.2, "Thermal Monitor").
• APIC internal error interrupts - When an error condition is recognized within
the local APIC (such as an attempt to access an unimplemented register), the
APIC can be programmed to send an interrupt to its associated processor (see
Section 10.5.3, "Error Handling").
Of these interrupt sources, the processor's LINT0 and LINT1 pins, the APIC timer, the
performance-monitoring counters, the thermal sensor, and the internal APIC error
detector are referred to as local interrupt sources. Upon receiving a signal from a
local interrupt source, the local APIC delivers the interrupt to the processor core
using an interrupt delivery protocol that has been set up through a group of APIC
registers called the local vector table or LVT (see Section 10.5.1, "Local Vector
Table"). A separate entry is provided in the local vector table for each local interrupt
source, which allows a specific interrupt delivery protocol to be set up for each
source. For example, if the LINT1 pin is going to be used as an NMI pin, the LINT1
entry in the local vector table can be set up to deliver an interrupt with vector number
2 (NMI interrupt) to the processor core.
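The LINT1-as-NMI example can be made concrete by composing the 32-bit value that would be written to an LVT entry. The bit positions used here (vector in bits 7:0, delivery mode in bits 10:8, mask in bit 16) are taken from the local vector table description in Section 10.5.1 rather than from this overview, and the function name is hypothetical, so treat this as a sketch.

```python
DELIVERY_FIXED = 0b000   # deliver the interrupt named by the vector field
DELIVERY_NMI = 0b100     # deliver an NMI to the processor core
LVT_MASKED = 1 << 16     # mask bit: inhibits delivery when set


def lvt_entry(vector: int, delivery_mode: int = DELIVERY_FIXED, masked: bool = False) -> int:
    """Compose an LVT register value from its vector, delivery mode and mask."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    value = vector | ((delivery_mode & 0x7) << 8)
    if masked:
        value |= LVT_MASKED
    return value


if __name__ == "__main__":
    # LINT1 configured as an NMI pin, written with vector number 2 as in the text.
    print(hex(lvt_entry(2, DELIVERY_NMI)))
```

Writing the composed value to the LINT1 entry would be done with an ordinary MOV to the memory-mapped register, as noted at the start of this section.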
The local APIC handles interrupts from the other two interrupt sources (externally
connected I/O devices and IPIs) through its IPI message handling facilities.
A processor can generate IPIs by programming the interrupt command register (ICR)
in its local APIC (see Section 10.6.1, "Interrupt Command Register (ICR)"). The act
of writing to the ICR causes an IPI message to be generated and issued on the
system bus (for Pentium 4 and Intel Xeon processors) or on the APIC bus (for
Pentium and P6 family processors). See Section 10.2, "System Bus Vs. APIC Bus."
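To make the ICR programming step more tangible, the sketch below composes the two 32-bit halves of an ICR value for a fixed IPI sent to a single destination. The field positions (vector in bits 7:0, delivery mode in bits 10:8, level in bit 14, destination in bits 31:24 of the high half) follow the xAPIC layout described in Section 10.6.1, not this overview, and the function name is hypothetical.

```python
DELIVERY_FIXED = 0b000
LEVEL_ASSERT = 1 << 14   # level bit, set for all IPIs other than INIT de-assert


def icr_words(vector: int, destination_apic_id: int, delivery_mode: int = DELIVERY_FIXED):
    """Return (high, low) dwords for an IPI using physical destination mode.

    The high dword carries the destination; the low dword carries the vector
    and delivery mode. Destination mode (bit 11) and shorthand (bits 19:18)
    are left at zero, meaning physical mode with no shorthand.
    """
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    if not 0 <= destination_apic_id <= 0xFF:
        raise ValueError("xAPIC destination must fit in 8 bits")
    low = vector | ((delivery_mode & 0x7) << 8) | LEVEL_ASSERT
    high = destination_apic_id << 24
    return high, low


if __name__ == "__main__":
    high, low = icr_words(0x30, 3)
    print(hex(high), hex(low))
```

Software would normally write the high half first, because it is the write to the low half that causes the IPI message to be sent.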
IPIs can be sent to other processors in the system or to the originating processor
(self-interrupts). When the target processor receives an IPI message, its local APIC
handles the message automatically (using information included in the message such
as vector number and trigger mode). See Section 10.6, "Issuing Interprocessor
Interrupts," for a detailed explanation of the local APIC's IPI message delivery and
acceptance mechanism.
The local APIC can also receive interrupts from externally connected devices through
the I/O APIC (see Figure 10-1). The I/O APIC is responsible for receiving interrupts
generated by system hardware and I/O devices and forwarding them to the local
APIC as interrupt messages.
Individual pins on the I/O APIC can be programmed to generate a specific interrupt
vector when asserted. The I/O APIC also has a "virtual wire mode" that allows it to
communicate with a standard 8259A-style external interrupt controller. Note that the
local APIC can be disabled (see Section 10.4.3, "Enabling or Disabling the Local
APIC"). This allows an associated processor core to receive interrupts directly from
an 8259A interrupt controller.
Both the local APIC and the I/O APIC are designed to operate in MP systems (see
Figures 10-2 and 10-3). Each local APIC handles interrupts from the I/O APIC, IPIs
from processors on the system bus, and self-generated interrupts. Interrupts can
also be delivered to the individual processors through the local interrupt pins;
however, this mechanism is commonly not used in MP systems.
Figure 10-1. Relationship of Local APIC and I/O APIC In Single-Processor Systems
[Figure not reproduced. Two panels. Pentium 4 and Intel Xeon Processors: a processor
core with its local APIC receives local interrupts directly and receives interrupt
messages over the system bus, through a bridge and PCI, from the I/O APIC in the
system chip set, which takes in external interrupts. Pentium and P6 Family
Processors: the same arrangement, with interrupt messages carried on the 3-wire
APIC bus.]
Figure 10-2. Local APICs and I/O APIC When Intel Xeon Processors Are Used in
Multiple-Processor Systems
[Figure not reproduced. Several processors, each a CPU with its own local APIC,
share the processor system bus. The I/O APIC in the system chip set receives
external interrupts and reaches the bus through a bridge and PCI. Interrupt
messages and IPIs travel over the system bus.]
Figure 10-3. Local APICs and I/O APIC When P6 Family Processors Are Used in
Multiple-Processor Systems
[Figure not reproduced. Processors #1 through #4, each a CPU with its own local
APIC, are connected to the I/O APIC in the system chip set by the 3-wire APIC bus.
The I/O APIC receives external interrupts. Interrupt messages and IPIs travel over
the APIC bus.]
The IPI mechanism is typically used in MP systems to send fixed interrupts (interrupts
for a specific vector number) and special-purpose interrupts to processors on
the system bus. For example, a local APIC can use an IPI to forward a fixed interrupt
to another processor for servicing. Special-purpose IPIs (including NMI, INIT, SMI
and SIPI IPIs) allow one or more processors on the system bus to perform system-wide
boot-up and control functions.
The following sections focus on the local APIC and its implementation in the
Pentium 4, Intel Xeon, and P6 family processors. In these sections, the terms "local
APIC" and "I/O APIC" refer to local and I/O APICs used with the P6 family processors
and to local and I/O xAPICs used with the Pentium 4 and Intel Xeon processors (see
Section 10.3, "The Intel 82489DX External APIC, the APIC, the xAPIC, and the
x2APIC").
10.2 SYSTEM BUS VS. APIC BUS
For the P6 family and Pentium processors, the I/O APIC and local APICs communicate
through the 3-wire inter-APIC bus (see Figure 10-3). Local APICs also use the APIC
bus to send and receive IPIs. The APIC bus and its messages are invisible to software
and are not classed as architectural.
Beginning with the Pentium 4 and Intel Xeon processors, the I/O APIC and local
APICs (using the xAPIC architecture) communicate through the system bus (see
Figure 10-2). The I/O APIC sends interrupt requests to the processors on the system
bus through bridge hardware that is part of the Intel chip set. The bridge hardware
generates the interrupt messages that go to the local APICs. IPIs between local
APICs are transmitted directly on the system bus.
10.3 THE INTEL 82489DX EXTERNAL APIC, THE APIC, THE XAPIC, AND THE X2APIC
The local APIC in the P6 family and Pentium processors is an architectural subset of
the Intel 82489DX external APIC. See Section 19.27.1, "Software Visible Differences
Between the Local APIC and the 82489DX."
The APIC architecture used in the Pentium 4 and Intel Xeon processors (called the
xAPIC architecture) is an extension of the APIC architecture found in the P6 family
processors. The primary difference between the APIC and xAPIC architectures is that
with the xAPIC architecture, the local APICs and the I/O APIC communicate through
the system bus. With the APIC architecture, they communicate through the APIC
bus (see Section 10.2, "System Bus Vs. APIC Bus"). Also, some APIC architectural
features have been extended and/or modified in the xAPIC architecture. These
extensions and modifications are described in Section 10.4 through Section 10.10.
The x2APIC architecture is an extension of the xAPIC architecture, primarily to
increase processor addressability. The x2APIC architecture provides backward
compatibility to the xAPIC architecture and forward extendability for future Intel
platform innovations. These extensions and modifications are supported by a new
mode of execution (x2APIC mode) and are detailed in Section 10.12.
10.4 LOCAL APIC
The following sections describe the architecture of the local APIC and how to detect
it, identify it, and determine its status. Descriptions of how to program the local APIC
are given in Section 10.5.1, "Local Vector Table," and Section 10.6.1, "Interrupt
Command Register (ICR)."
10.4.1 The Local APIC Block Diagram
Figure 10-4 gives a functional block diagram for the local APIC. Software interacts
with the local APIC by reading and writing its registers. APIC registers are memory-mapped
to a 4-KByte region of the processor's physical address space with an initial
starting address of FEE00000H. For correct APIC operation, this address space must
be mapped to an area of memory that has been designated as strong uncacheable
(UC). See Section 11.3, "Methods of Caching Available."
In MP system configurations, the APIC registers for Intel 64 or IA-32 processors on
the system bus are initially mapped to the same 4-KByte region of the physical
address space. Software has the option of changing the initial mapping to a different
4-KByte region for all the local APICs or of mapping the APIC registers for each local
APIC to its own 4-KByte region. Section 10.4.5, "Relocating the Local APIC Registers,"
describes how to relocate the base address for APIC registers.
On processors supporting x2APIC architecture (indicated by CPUID.01H:ECX[21] =
1), the local APIC supports operation in the xAPIC mode (as described in Section
10.4). Additionally, software can enable the local APIC to operate in x2APIC mode for
extended processor addressability (see Section 10.12).
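The feature checks mentioned in this chapter can be sketched in C as follows. This is a minimal illustration, not a definitive implementation: it assumes the caller has already executed CPUID with EAX = 1 and passes in the returned EDX and ECX values, and the function and macro names are hypothetical. The EDX bit 9 test corresponds to the local APIC presence flag described in Section 10.4.2.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature-flag bits returned by CPUID leaf 01H (names are illustrative). */
#define CPUID_01_EDX_APIC   (1u << 9)   /* on-chip local APIC present */
#define CPUID_01_ECX_X2APIC (1u << 21)  /* x2APIC architecture supported */

/* Returns true if the EDX value from CPUID.01H reports a local APIC. */
static bool cpu_has_local_apic(uint32_t cpuid_01_edx)
{
    return (cpuid_01_edx & CPUID_01_EDX_APIC) != 0;
}

/* Returns true if the ECX value from CPUID.01H reports x2APIC support. */
static bool cpu_has_x2apic(uint32_t cpuid_01_ecx)
{
    return (cpuid_01_ecx & CPUID_01_ECX_X2APIC) != 0;
}
```

Keeping the bit tests separate from the CPUID instruction itself lets the same helpers be used with values captured by any compiler intrinsic or inline-assembly wrapper.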
NOTE
For P6 family, Pentium 4, and Intel Xeon processors, the APIC
handles all memory accesses to addresses within the 4-KByte APIC
register space internally and no external bus cycles are produced. For
the Pentium processors with an on-chip APIC, bus cycles are
produced for accesses to the APIC register space. Thus, for software
intended to run on Pentium processors, system software should
explicitly not map the APIC register space to regular system memory.
Doing so can result in an invalid opcode exception (#UD) being
generated or unpredictable execution.
Figure 10-4. Local APIC Structure
[Figure: Functional block diagram of the local APIC. It shows the version, APIC ID,
logical destination, destination format, task priority, processor priority, EOI,
spurious vector, error status, and arbitration ID registers; the timer with its current
count, initial count, and divide configuration registers; the local vector table with
entries for the timer, LINT0/1, performance monitoring counters, thermal sensor, and
error interrupts; the IRR, ISR, and TMR feeding the prioritizer; and the interrupt
command register (ICR), acceptance logic, and protocol translation logic connected to
the processor system bus.]
Figure notes:
1. (Performance monitoring counters) Introduced in P6 family processors.
2. (Thermal sensor) Introduced in the Pentium 4 and Intel Xeon processors.
3. (Processor system bus) Three-wire APIC bus in P6 family and Pentium processors.
4. (Arb. ID register) Not implemented in Pentium 4 and Intel Xeon processors.
Table 10-1 shows how the APIC registers are mapped into the 4-KByte APIC register
space. Registers are 32 bits, 64 bits, or 256 bits in width; all are aligned on 128-bit
boundaries. All 32-bit registers should be accessed using 128-bit aligned 32-bit loads
or stores. Some processors may support loads and stores of less than 32 bits to some
of the APIC registers. This is model specific behavior and is not guaranteed to work
on all processors. Any FP/MMX/SSE access to an APIC register, or any access that
touches bytes 4 through 15 of an APIC register may cause undefined behavior and
must not be executed. This undefined behavior could include hangs, incorrect results
or unexpected exceptions, including machine checks, and may vary between
implementations. Wider registers (64-bit or 256-bit) must be accessed using multiple
32-bit loads or stores, with all accesses being 128-bit aligned.
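The access rules above can be illustrated with a short C sketch. It is only a sketch under stated assumptions: `regs` is a hypothetical pointer to the start of the 4-KByte register region, already mapped as uncacheable by system software (in a test it can simply point at an ordinary 4-KByte array), and the helper names are invented for illustration. The offsets come from Table 10-1. The mapping of a vector number to word `vector / 32` and bit `vector % 32` is inferred from the bit ranges the table lists for the 256-bit registers.

```c
#include <stdint.h>

#define APIC_REG_STRIDE 0x10u   /* registers are aligned on 128-bit boundaries */
#define APIC_TPR        0x080u  /* Task Priority Register */
#define APIC_ISR_BASE   0x100u  /* In-Service Register, bits 31:0 */
#define APIC_IRR_BASE   0x200u  /* Interrupt Request Register, bits 31:0 */

/* Read a register with one aligned 32-bit load; offset must be a multiple of 16. */
static uint32_t apic_read32(const volatile uint32_t *regs, uint32_t offset)
{
    return regs[offset / 4u];
}

/* Write a register with one aligned 32-bit store; offset must be a multiple of 16. */
static void apic_write32(volatile uint32_t *regs, uint32_t offset, uint32_t value)
{
    regs[offset / 4u] = value;
}

/*
 * Test the bit for one vector in a 256-bit register (ISR, TMR, or IRR).
 * Only the 32-bit word that holds the bit is read, using a single
 * 128-bit aligned 32-bit load.
 */
static int apic_test_vector_bit(const volatile uint32_t *regs,
                                uint32_t reg_base, uint8_t vector)
{
    uint32_t word = apic_read32(regs, reg_base + (vector / 32u) * APIC_REG_STRIDE);
    return (int)((word >> (vector % 32u)) & 1u);
}
```

Using a `volatile uint32_t` pointer keeps the compiler from merging, widening, or splitting the accesses, so each call maps to one 32-bit load or store.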
The local APIC registers listed in Table 10-1 are not MSRs. The only MSR associated
with the programming of the local APIC is the IA32_APIC_BASE MSR (see Section
10.4.3, "Enabling or Disabling the Local APIC").
NOTE
In processors based on Intel Microarchitecture (Nehalem) the Local
APIC ID Register is no longer Read/Write; it is Read Only.
Table 10-1 Local APIC Register Address Map
Address                        Register Name                                     Software Read/Write
FEE0 0000H                     Reserved
FEE0 0010H                     Reserved
FEE0 0020H                     Local APIC ID Register                            Read/Write.
FEE0 0030H                     Local APIC Version Register                       Read Only.
FEE0 0040H                     Reserved
FEE0 0050H                     Reserved
FEE0 0060H                     Reserved
FEE0 0070H                     Reserved
FEE0 0080H                     Task Priority Register (TPR)                      Read/Write.
FEE0 0090H                     Arbitration Priority Register (APR) (1)           Read Only.
FEE0 00A0H                     Processor Priority Register (PPR)                 Read Only.
FEE0 00B0H                     EOI Register                                      Write Only.
FEE0 00C0H                     Remote Read Register (RRD) (1)                    Read Only.
FEE0 00D0H                     Logical Destination Register                      Read/Write.
FEE0 00E0H                     Destination Format Register                       Read/Write (see Section 10.6.2.2).
FEE0 00F0H                     Spurious Interrupt Vector Register                Read/Write (see Section 10.9).
FEE0 0100H                     In-Service Register (ISR); bits 31:0              Read Only.
FEE0 0110H                     In-Service Register (ISR); bits 63:32             Read Only.
FEE0 0120H                     In-Service Register (ISR); bits 95:64             Read Only.
FEE0 0130H                     In-Service Register (ISR); bits 127:96            Read Only.
FEE0 0140H                     In-Service Register (ISR); bits 159:128           Read Only.
FEE0 0150H                     In-Service Register (ISR); bits 191:160           Read Only.
FEE0 0160H                     In-Service Register (ISR); bits 223:192           Read Only.
FEE0 0170H                     In-Service Register (ISR); bits 255:224           Read Only.
FEE0 0180H                     Trigger Mode Register (TMR); bits 31:0            Read Only.
FEE0 0190H                     Trigger Mode Register (TMR); bits 63:32           Read Only.
FEE0 01A0H                     Trigger Mode Register (TMR); bits 95:64           Read Only.
FEE0 01B0H                     Trigger Mode Register (TMR); bits 127:96          Read Only.
FEE0 01C0H                     Trigger Mode Register (TMR); bits 159:128         Read Only.
FEE0 01D0H                     Trigger Mode Register (TMR); bits 191:160         Read Only.
FEE0 01E0H                     Trigger Mode Register (TMR); bits 223:192         Read Only.
FEE0 01F0H                     Trigger Mode Register (TMR); bits 255:224         Read Only.
FEE0 0200H                     Interrupt Request Register (IRR); bits 31:0       Read Only.
FEE0 0210H                     Interrupt Request Register (IRR); bits 63:32      Read Only.
FEE0 0220H                     Interrupt Request Register (IRR); bits 95:64      Read Only.
FEE0 0230H                     Interrupt Request Register (IRR); bits 127:96     Read Only.
FEE0 0240H                     Interrupt Request Register (IRR); bits 159:128    Read Only.
FEE0 0250H                     Interrupt Request Register (IRR); bits 191:160    Read Only.
FEE0 0260H                     Interrupt Request Register (IRR); bits 223:192    Read Only.
FEE0 0270H                     Interrupt Request Register (IRR); bits 255:224    Read Only.
FEE0 0280H                     Error Status Register                             Read Only.
FEE0 0290H through FEE0 02E0H  Reserved
FEE0 02F0H                     LVT CMCI Registers                                Read/Write.
FEE0 0300H                     Interrupt Command Register (ICR); bits 0-31       Read/Write.
FEE0 0310H                     Interrupt Command Register (ICR); bits 32-63      Read/Write.
FEE0 0320H                     LVT Timer Register                                Read/Write.
FEE0 0330H                     LVT Thermal Sensor Register (2)                   Read/Write.
FEE0 0340H                     LVT Performance Monitoring Counters Register (3)  Read/Write.
FEE0 0350H                     LVT LINT0 Register                                Read/Write.
FEE0 0360H                     LVT LINT1 Register                                Read/Write.
FEE0 0370H                     LVT Error Register                                Read/Write.
FEE0 0380H                     Initial Count Register (for Timer)                Read/Write.
FEE0 0390H                     Current Count Register (for Timer)                Read Only.
FEE0 03A0H through FEE0 03D0H  Reserved
FEE0 03E0H                     Divide Configuration Register (for Timer)         Read/Write.
FEE0 03F0H                     Reserved
NOTES:
1. Not supported in the Pentium 4 and Intel Xeon processors. The Illegal Register Access bit (7) of
the ESR will not be set when writing to these registers.
2. Introduced in the Pentium 4 and Intel Xeon processors. This APIC register and its associated
function are implementation dependent and may not be present in future IA-32 or Intel 64
processors.
3. Introduced in the Pentium Pro processor. This APIC register and its associated function are
implementation dependent and may not be present in future IA-32 or Intel 64 processors.
10.4.2 Presence of the Local APIC
Beginning with the P6 family processors, the presence or absence of an on-chip local
APIC can be detected using the CPUID instruction. When the CPUID instruction is
executed with a source operand of 1 in the EAX register, bit 9 of the CPUID feature
flags returned in the EDX register indicates the presence (set) or absence (clear) of a
local APIC.
10.4.3 Enabling or Disabling the Local APIC
The local APIC can be enabled or disabled in either of two ways:
1. Using the APIC global enable/disable flag in the IA32_APIC_BASE MSR (MSR
   address 1BH; see Figure 10-5):
   • When IA32_APIC_BASE[11] is 0, the processor is functionally equivalent to
     an IA-32 processor without an on-chip APIC. The CPUID feature flag for the
     APIC (see Section 10.4.2, "Presence of the Local APIC") is also set to 0.
   • When IA32_APIC_BASE[11] is set to 0, processor APICs based on the 3-wire
     APIC bus cannot be generally re-enabled until a system hardware reset. The
     3-wire bus loses track of arbitration that would be necessary for complete
     re-enabling. Certain APIC functionality can be enabled (for example:
     performance and thermal monitoring interrupt generation).
   • For processors that use Front Side Bus (FSB) delivery of interrupts, software
     may disable or enable the APIC by setting and resetting
     IA32_APIC_BASE[11]. A hardware reset is not required to re-start APIC
     functionality, if software guarantees no interrupt will be sent to the APIC as
     IA32_APIC_BASE[11] is cleared.
   • When IA32_APIC_BASE[11] is set to 0, prior initialization to the APIC may be
     lost and the APIC may return to the state described in Section 10.4.7.1,
     "Local APIC State After Power-Up or Reset."
2. Using the APIC software enable/disable flag in the spurious-interrupt vector
   register (see Figure 10-23):
   • If IA32_APIC_BASE[11] is 1, software can temporarily disable a local APIC at
     any time by clearing the APIC software enable/disable flag in the
     spurious-interrupt vector register (see Figure 10-23). The state of the local APIC when
     in this software-disabled state is described in Section 10.4.7.2, "Local APIC
     State After It Has Been Software Disabled."
   • When the local APIC is in the software-disabled state, it can be re-enabled at
     any time by setting the APIC software enable/disable flag to 1.
For the Pentium processor, the APICEN pin (which is shared with the PICD1 pin) is
used during power-up or RESET to disable the local APIC.
Note that each entry in the LVT has a mask bit that can be used to inhibit interrupts
from being delivered to the processor from selected local interrupt sources (the
LINT0 and LINT1 pins, the APIC timer, the performance-monitoring counters, the
thermal sensor, and/or the internal APIC error detector).
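The two enable mechanisms can be sketched as pure bit manipulations in C. This is a hedged illustration only: the helper names are hypothetical, the functions operate on values the caller has already read (with RDMSR from MSR 1BH, or from the spurious-interrupt vector register), and writing the result back to hardware is left to the caller. Bit 11 is the global enable flag described above; bit 8 of the spurious-interrupt vector register is the software enable flag (see Section 10.4.7.1).

```c
#include <stdint.h>

#define IA32_APIC_BASE_MSR           0x1Bu         /* MSR address 1BH */
#define IA32_APIC_BASE_GLOBAL_ENABLE (1ull << 11)  /* APIC global enable/disable */
#define APIC_SVR_SOFTWARE_ENABLE     (1u << 8)     /* APIC software enable/disable */

/* Return the IA32_APIC_BASE value with the global enable flag set or cleared. */
static uint64_t apic_base_set_global_enable(uint64_t msr_value, int enable)
{
    if (enable)
        return msr_value | IA32_APIC_BASE_GLOBAL_ENABLE;
    return msr_value & ~IA32_APIC_BASE_GLOBAL_ENABLE;
}

/* Return the spurious-interrupt vector register value with bit 8 set or cleared. */
static uint32_t apic_svr_set_software_enable(uint32_t svr_value, int enable)
{
    if (enable)
        return svr_value | APIC_SVR_SOFTWARE_ENABLE;
    return svr_value & ~APIC_SVR_SOFTWARE_ENABLE;
}
```

For example, applying `apic_svr_set_software_enable` with `enable` set to 1 to the reset value 000000FFH yields 000001FFH, which keeps the spurious vector at FFH and software-enables the local APIC.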
10.4.4 Local APIC Status and Location
The status and location of the local APIC are contained in the IA32_APIC_BASE MSR
(see Figure 10-5). MSR bit functions are described below:
• BSP flag, bit 8: Indicates if the processor is the bootstrap processor (BSP).
  See Section 8.4, "Multiple-Processor (MP) Initialization." Following a power-up or
  RESET, this flag is set to 1 for the processor selected as the BSP and set to 0 for
  the remaining processors (APs).
• APIC Global Enable flag, bit 11: Enables or disables the local APIC (see
  Section 10.4.3, "Enabling or Disabling the Local APIC"). This flag is available in
  the Pentium 4, Intel Xeon, and P6 family processors. It is not guaranteed to be
  available or available at the same location in future Intel 64 or IA-32 processors.
• APIC Base field, bits 12 through 35: Specifies the base address of the APIC
  registers. This 24-bit value is extended by 12 bits at the low end to form the base
  address. This automatically aligns the address on a 4-KByte boundary. Following
  a power-up or RESET, the field is set to FEE0 0000H.
Bits 0 through 7, bits 9 and 10, and bits MAXPHYADDR (see footnote 1) through 63 in the
IA32_APIC_BASE MSR are reserved.
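The field layout above can be decoded as in the following C sketch. It assumes the 24-bit base field in bits 12 through 35 described in this section (processors with a larger MAXPHYADDR would need a wider mask), takes an MSR value the caller has already read, and uses a hypothetical structure and function name.

```c
#include <stdbool.h>
#include <stdint.h>

#define APIC_BASE_BSP_FLAG    (1ull << 8)     /* processor is the BSP */
#define APIC_BASE_GLOBAL_EN   (1ull << 11)    /* APIC global enable/disable */
#define APIC_BASE_FIELD_MASK  0xFFFFFF000ull  /* bits 35:12, 24-bit base field */

/* Decoded view of the IA32_APIC_BASE MSR (illustrative type). */
typedef struct {
    bool     is_bsp;         /* bit 8 */
    bool     global_enable;  /* bit 11 */
    uint64_t base;           /* physical base address, 4-KByte aligned */
} apic_base_info;

static apic_base_info decode_ia32_apic_base(uint64_t msr_value)
{
    apic_base_info info;
    info.is_bsp        = (msr_value & APIC_BASE_BSP_FLAG) != 0;
    info.global_enable = (msr_value & APIC_BASE_GLOBAL_EN) != 0;
    /* Masking keeps bits 35:12; the low 12 bits of the address are zero. */
    info.base          = msr_value & APIC_BASE_FIELD_MASK;
    return info;
}
```

For the MSR value 0000 0000 FEE0 0900H, this yields a BSP with the APIC globally enabled and a register base of FEE0 0000H.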
10.4.5 Relocating the Local APIC Registers
The Pentium 4, Intel Xeon, and P6 family processors permit the starting address of
the APIC registers to be relocated from FEE00000H to another physical address by
modifying the value in the 24-bit base address field of the IA32_APIC_BASE MSR.
This extension of the APIC architecture is provided to help resolve conflicts with
memory maps of existing systems and to allow individual processors in an MP system
to map their APIC registers to different locations in physical memory.
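Computing the new MSR value for a relocation can be sketched as follows. The helper name is hypothetical, the 24-bit base field (bits 12 through 35) is assumed, and the sketch only produces the value to be written; issuing the WRMSR and remapping the new region as uncacheable remain the responsibility of system software.

```c
#include <stdint.h>

#define APIC_BASE_FIELD_MASK 0xFFFFFF000ull  /* bits 35:12 of IA32_APIC_BASE */

/*
 * Return an IA32_APIC_BASE value whose base field points at new_base.
 * The flag bits outside the base field are preserved; bits of new_base
 * outside bits 35:12 are discarded, so the result is 4-KByte aligned.
 */
static uint64_t apic_base_relocate(uint64_t msr_value, uint64_t new_base)
{
    return (msr_value & ~APIC_BASE_FIELD_MASK) | (new_base & APIC_BASE_FIELD_MASK);
}
```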
10.4.6 Local APIC ID
At power up, system hardware assigns a unique APIC ID to each local APIC on the
system bus (for Pentium 4 and Intel Xeon processors) or on the APIC bus (for P6
family and Pentium processors). The hardware assigned APIC ID is based on system
topology and includes encoding for socket position and cluster information (see
Figure 8-2).
In MP systems, the local APIC ID is also used as a processor ID by the BIOS and the
operating system. Some processors permit software to modify the APIC ID. However,
the ability of software to modify the APIC ID is processor model specific. Because of
this, operating system software should avoid writing to the local APIC ID register. The
value returned by bits 31-24 of the EBX register (when the CPUID instruction is
executed with a source operand value of 1 in the EAX register) is always the Initial
APIC ID (determined by the platform initialization). This is true even if software has
changed the value in the Local APIC ID register.

Figure 10-5. IA32_APIC_BASE MSR (APIC_BASE_MSR in P6 Family)
[Figure: Bits 63 through MAXPHYADDR are reserved; bits MAXPHYADDR-1 through 12 hold
the APIC Base (base physical address); bit 11 is the APIC global enable/disable flag;
bits 10 and 9 are reserved; bit 8 is the BSP flag (processor is BSP); bits 7 through 0
are reserved.]
Footnote:
1. The MAXPHYADDR is 36 bits for processors that do not support CPUID leaf 80000008H, or
indicated by CPUID.80000008H:EAX[bits 7:0] for processors that support CPUID leaf 80000008H.
The processor receives the hardware assigned APIC ID (or Initial APIC ID) by
sampling pins A11# and A12# and pins BR0# through BR3# (for the Pentium 4, Intel
Xeon, and P6 family processors) and pins BE0# through BE3# (for the Pentium
processor). The APIC ID latched from these pins is stored in the APIC ID field of the
local APIC ID register (see Figure 10-6), and is used as the Initial APIC ID for the
processor.
For the P6 family and Pentium processors, the local APIC ID field in the local APIC ID
register is 4 bits. Encodings 0H through EH can be used to uniquely identify 15
different processors connected to the APIC bus. For the Pentium 4 and Intel Xeon
processors, the xAPIC specification extends the local APIC ID field to 8 bits. These
can be used to identify up to 255 processors in the system.
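Extracting the APIC ID can be sketched in C as shown below. The field positions are taken from Figure 10-6 (bits 27:24 for the 4-bit P6 family and Pentium field, bits 31:24 for the 8-bit xAPIC field) and from the CPUID description in this section; the function names are hypothetical, and the register and EBX values are assumed to have been read already by the caller.

```c
#include <stdint.h>

/* 4-bit APIC ID field (bits 27:24) used by P6 family and Pentium processors. */
static uint8_t apic_id_from_reg_p6(uint32_t apic_id_reg)
{
    return (uint8_t)((apic_id_reg >> 24) & 0x0Fu);
}

/* 8-bit APIC ID field (bits 31:24) defined by the xAPIC specification. */
static uint8_t apic_id_from_reg_xapic(uint32_t apic_id_reg)
{
    return (uint8_t)(apic_id_reg >> 24);
}

/* Initial APIC ID: bits 31:24 of EBX returned by CPUID with EAX = 1. */
static uint8_t initial_apic_id(uint32_t cpuid_01_ebx)
{
    return (uint8_t)(cpuid_01_ebx >> 24);
}
```

Because software may be able to change the local APIC ID register on some models, comparing `apic_id_from_reg_xapic` against `initial_apic_id` is one way to notice that the register no longer holds the platform-assigned value.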
10.4.7 Local APIC State
The following sections describe the state of the local APIC and its registers following
a power-up or RESET, after the local APIC has been software disabled, following an
INIT reset, and following an INIT-deassert message.
Figure 10-6. Local APIC ID Register
[Figure: P6 family and Pentium processors: bits 31:28 reserved, bits 27:24 APIC ID,
bits 23:0 reserved. Pentium 4 processors, Xeon processors, and later processors:
bits 31:24 APIC ID, bits 23:0 reserved. x2APIC mode: MSR address 802H, bits 31:0
x2APIC ID.]
Address: 0FEE0 0020H
Value after reset: 0000 0000H
x2APIC will introduce 32-bit ID; see Section 10.12.
10.4.7.1 Local APIC State After Power-Up or Reset
Following a power-up or RESET of the processor, the state of the local APIC and its
registers is as follows:
• The following registers are reset to all 0s:
  • IRR, ISR, TMR, ICR, LDR, and TPR
  • Timer initial count and timer current count registers
  • Divide configuration register
• The DFR register is reset to all 1s.
• The LVT register is reset to 0s except for the mask bits; these are set to 1s.
• The local APIC version register is not affected.
• The local APIC ID register is set to a unique APIC ID. (Pentium and P6 family
  processors only). The Arb ID register is set to the value in the APIC ID register.
• The spurious-interrupt vector register is initialized to 000000FFH. By setting bit 8
  to 0, software disables the local APIC.
• If the processor is the only processor in the system or it is the BSP in an MP
  system (see Section 8.4.1, "BSP and AP Processors"), the local APIC will respond
  normally to INIT and NMI messages, to INIT# signals and to STPCLK# signals. If
  the processor is in an MP system and has been designated as an AP, the local
  APIC will respond the same as for the BSP. In addition, it will respond to SIPI
  messages. For P6 family processors only, an AP will not respond to a STPCLK#
  signal.
10.4.7.2 Local APIC State After It Has Been Software Disabled
When the APIC software enable/disable flag in the spurious interrupt vector register
has been explicitly cleared (as opposed to being cleared during a power up or
RESET), the local APIC is temporarily disabled (see Section 10.4.3, "Enabling or
Disabling the Local APIC"). The operation and response of a local APIC while in this
software-disabled state is as follows:
• The local APIC will respond normally to INIT, NMI, SMI, and SIPI messages.
• Pending interrupts in the IRR and ISR registers are held and require masking or
  handling by the CPU.
• The local APIC can still issue IPIs. It is software's responsibility to avoid issuing
  IPIs through the IPI mechanism and the ICR register if sending interrupts
  through this mechanism is not desired.
• The reception or transmission of any IPIs that are in progress when the local APIC
  is disabled is completed before the local APIC enters the software-disabled
  state.
• The mask bits for all the LVT entries are set. Attempts to reset these bits will be
  ignored.
• (For Pentium and P6 family processors) The local APIC continues to listen to all
  bus messages in order to keep its arbitration ID synchronized with the rest of the
  system.
10.4.7.3 Local APIC State After an INIT Reset (Wait-for-SIPI State)
An INIT reset of the processor can be initiated in either of two ways:
• By asserting the processor's INIT# pin.
• By sending the processor an INIT IPI (an IPI with the delivery mode set to INIT).
Upon receiving an INIT through either of these mechanisms, the processor responds
by beginning the initialization process of the processor core and the local APIC. The
state of the local APIC following an INIT reset is the same as it is after a power-up or
hardware RESET, except that the APIC ID and arbitration ID registers are not
affected. This state is also referred to as the "wait-for-SIPI" state (see also: Section
8.4.2, "MP Initialization Protocol Requirements and Restrictions").
10.4.7.4 Local APIC State After It Receives an INIT-Deassert IPI
Only the Pentium and P6 family processors support the INIT-deassert IPI. An
INIT-deassert IPI has no effect on the state of the APIC, other than to reload the
arbitration ID register with the value in the APIC ID register.
10.4.8 Local APIC Version Register
The local APIC contains a hardwired version register. Software can use this register to
identify the APIC version (see Figure 10-7). In addition, the register specifies the
number of entries in the local vector table (LVT) for a specific implementation.
The fields in the local APIC version register are as follows:
Version         The version numbers of the local APIC:
                1XH        Local APIC. For Pentium 4 and Intel Xeon
                           processors, 14H is returned.
                0XH        82489DX external APIC.
                20H - FFH  Reserved.
Max LVT Entry   Shows the number of LVT entries minus 1. For the Pentium 4 and
                Intel Xeon processors (which have 6 LVT entries), the value
                returned in the Max LVT field is 5; for the P6 family processors
                (which have 5 LVT entries), the value returned is 4; for the
                Pentium processor (which has 4 LVT entries), the value returned
                is 3.
Suppress EOI-broadcasts
                Indicates whether software can inhibit the broadcast of EOI
                message by setting bit 12 of the Spurious Interrupt Vector
                Register; see Section 10.8.5 and Section 10.9.
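Decoding the version register can be sketched in C as follows. The bit positions are those shown in Figure 10-7 (Version in bits 7:0, Max LVT Entry in bits 23:16, EOI-broadcast suppression support in bit 24); the structure and function names are hypothetical, and the register value is assumed to have been read from FEE0 0030H by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded fields of the local APIC version register (illustrative type). */
typedef struct {
    uint8_t version;            /* bits 7:0 */
    uint8_t max_lvt_entry;      /* bits 23:16, number of LVT entries minus 1 */
    bool    eoi_bcast_suppress; /* bit 24, EOI-broadcast suppression supported */
} apic_version_info;

static apic_version_info decode_apic_version(uint32_t version_reg)
{
    apic_version_info info;
    info.version            = (uint8_t)(version_reg & 0xFFu);
    info.max_lvt_entry      = (uint8_t)((version_reg >> 16) & 0xFFu);
    info.eoi_bcast_suppress = ((version_reg >> 24) & 1u) != 0;
    return info;
}

/* Number of LVT entries implemented (Max LVT Entry plus 1). */
static unsigned apic_lvt_entry_count(apic_version_info info)
{
    return (unsigned)info.max_lvt_entry + 1u;
}

/* Version 1XH identifies an integrated local APIC; 0XH is the 82489DX. */
static bool apic_is_integrated(apic_version_info info)
{
    return (info.version & 0xF0u) == 0x10u;
}
```

With the Pentium 4 value of 14H for the version and 5 for Max LVT Entry (register value 0005 0014H), the sketch reports an integrated local APIC with 6 LVT entries.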
10.5 HANDLING LOCAL INTERRUPTS
The following sections describe facilities that are provided in the local APIC for
handling local interrupts. These include: the processor's LINT0 and LINT1 pins, the
APIC timer, the performance-monitoring counters, the thermal sensor, and the
internal APIC error detector. Local interrupt handling facilities include: the LVT, the
error status register (ESR), the divide configuration register (DCR), and the initial
count and current count registers.
10.5.1 Local Vector Table
The local vector table (LVT) allows software to specify the manner in which the local
interrupts are delivered to the processor core. It consists of the following 32-bit APIC
registers (see Figure 10-8), one for each local interrupt:
• LVT Timer Register (FEE0 0320H): Specifies interrupt delivery when the
  APIC timer signals an interrupt (see Section 10.5.4, "APIC Timer").
• LVT Thermal Monitor Register (FEE0 0330H): Specifies interrupt delivery
  when the thermal sensor generates an interrupt (see Section 14.5.2, "Thermal
  Monitor"). This LVT entry is implementation specific, not architectural. If
  implemented, it will always be at base address FEE0 0330H.
• LVT Performance Counter Register (FEE0 0340H): Specifies interrupt
  delivery when a performance counter generates an interrupt on overflow (see
  Section 30.8.5.8, "Generating an Interrupt on Overflow"). This LVT entry is
  implementation specific, not architectural. If implemented, it is not guaranteed
  to be at base address FEE0 0340H.
Figure 10-7. Local APIC Version Register
[Figure: Bits 31:25 reserved; bit 24 support for EOI-broadcast suppression; bits 23:16
Max LVT Entry; bits 15:8 reserved; bits 7:0 Version.]
Address: FEE0 0030H
Value after reset: 00BN 00VVH
V = Version, N = # of LVT entries minus 1, B = 1 if EOI-broadcast suppression supported
• LVT LINT0 Register (FEE0 0350H): Specifies interrupt delivery when an
  interrupt is signaled at the LINT0 pin.
• LVT LINT1 Register (FEE0 0360H): Specifies interrupt delivery when an
  interrupt is signaled at the LINT1 pin.
• LVT Error Register (FEE0 0370H): Specifies interrupt delivery when the
  APIC detects an internal error (see Section 10.5.3, "Error Handling").
• CMCI LVT Register (FEE0 02F0H): Specifies interrupt delivery when an
  overflow condition of corrected machine check error count reaching a threshold
  value occurred in a machine check bank supporting CMCI (see Section 15.5.1,
  "CMCI Local APIC Interface").
The LVT performance counter register and its associated interrupt were introduced in
the P6 processors and are also present in the Pentium 4 and Intel Xeon processors.
The LVT thermal monitor register and its associated interrupt were introduced in the
Pentium 4 and Intel Xeon processors.
As shown in Figures 10- 8, some of t hese fields and flags are not available ( and
reserved) for some ent ries.
The set up informat ion t hat can be specified in t he regist ers of t he LVT t able is as
follows:
Vect or I nt errupt vect or number.
Del i ver y Mode Specifies t he t ype of int errupt t o be sent t o t he processor. Some
delivery modes will only operat e as int ended when used in
conj unct ion wit h a specific t rigger mode. The allowable delivery
modes are as follows:
000 ( Fi x ed) Delivers t he int errupt specified in t he vect or
field.
010 ( SMI ) Delivers an SMI int errupt t o t he processor
core t hrough t he processor s local SMI signal
pat h. When using t his delivery mode, t he
vect or field should be set t o 00H for fut ure
compat ibilit y.
100 ( NMI ) Delivers an NMI int errupt t o t he processor.
The vect or informat ion is ignored.
101 ( I NI T) Delivers an I NI T request t o t he processor
core, which causes t he processor t o perform
an I NI T. When using t his delivery mode, t he
vect or field should be set t o 00H for fut ure
compat ibilit y.
10-18 Vol. 3
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC)
Figure 10-8. Local Vector Table (LVT)
31 0 7
Vector
Timer Mode
0: One-shot
1: Periodic
12 15 16 17 18
Delivery Mode
000: Fixed
100: NMI
Mask

0: Not Masked
1: Masked
Address: FEE0 0350H
Value After Reset: 0001 0000H
Reserved
12 13 15 16
Vector
31 0 7 8 10
Address: FEE0 0360H
Address: FEE0 0370H
Vector
Vector
Error
LINT1
LINT0
Value after Reset: 0001 0000H
Address: FEE0 0320H
111: ExtlNT
All other combinations
are Reserved
Interrupt Input
Pin Polarity
Trigger Mode
0: Edge
1: Level
Remote
IRR
Delivery Status
0: Idle
1: Send Pending
Timer
13 11 8
11
14
17
Address: FEE0 0340H
Performance
Vector
Thermal
Vector
Mon. Counters
Sensor
Address: FEE0 0330H
(Pentium 4 and Intel Xeon processors.) When a
performance monitoring counters interrupt is generated,
the mask bit for its associated LVT entry is set.
010: SMI
101: INIT
Vol. 3 10-19
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC)
111 ( Ex t I NT) Causes t he processor t o respond t o t he in-
t errupt as if t he int errupt originat ed in an
ext ernally connect ed ( 8259A- compat ible)
int errupt cont roller. A special I NTA bus cycle
corresponding t o Ext I NT, is rout ed t o t he ex-
t ernal cont roller. The ext ernal cont roller is
expect ed t o supply t he vect or informat ion.
The API C archit ect ure support s only one Ex-
t I NT source in a syst em, usually cont ained in
t he compat ibilit y bridge.
Del i ver y St at us ( Read Onl y)
I ndicat es t he int errupt delivery st at us, as follows:
0 ( I dl e) There is current ly no act ivit y for t his int er-
rupt source, or t he previous int errupt from
t his source was delivered t o t he processor
core and accept ed.
1 ( Send Pendi ng)
I ndicat es t hat an int errupt from t his source
has been delivered t o t he processor core,
but has not yet been accept ed ( see Sect ion
10. 5. 5, Local I nt errupt Accept ance ) .
Interrupt Input Pin Polarity
    Specifies the polarity of the corresponding interrupt pin: (0) active high
    or (1) active low.
Remote IRR Flag (Read Only)
    For fixed mode, level-triggered interrupts, this flag is set when the local
    APIC accepts the interrupt for servicing and is reset when an EOI command
    is received from the processor. The meaning of this flag is undefined for
    edge-triggered interrupts and other delivery modes.
Trigger Mode
    Selects the trigger mode for the local LINT0 and LINT1 pins: (0) edge
    sensitive and (1) level sensitive. This flag is only used when the delivery
    mode is Fixed. When the delivery mode is NMI, SMI, or INIT, the trigger
    mode is always edge sensitive. When the delivery mode is ExtINT, the
    trigger mode is always level sensitive. The timer and error interrupts are
    always treated as edge sensitive.
    If the local APIC is not used in conjunction with an I/O APIC and fixed
    delivery mode is selected, the Pentium 4, Intel Xeon, and P6 family
    processors will always use level-sensitive triggering, regardless of
    whether edge-sensitive triggering is selected.
Mask
    Interrupt mask: (0) enables reception of the interrupt and (1) inhibits
    reception of the interrupt. When the local APIC handles a
    performance-monitoring counters interrupt, it automatically sets the mask
    flag in the corresponding LVT entry. This flag will remain set until
    software clears it.
Timer Mode
    Selects the timer mode: (0) one-shot and (1) periodic (see Section 10.5.4,
    "APIC Timer").
10.5.2 Valid Interrupt Vectors
The Intel 64 and IA-32 architectures define 256 vector numbers, ranging from 0
through 255 (see Section 6.2, "Exception and Interrupt Vectors"). Local and I/O
APICs support 240 of these vectors (in the range of 16 to 255) as valid interrupts.
When an interrupt vector in the range of 0 to 15 is sent or received through the local
APIC, the APIC indicates an illegal vector in its Error Status Register (see Section
10.5.3, "Error Handling"). The Intel 64 and IA-32 architectures reserve vectors 16
through 31 for predefined interrupts, exceptions, and Intel-reserved encodings (see
Table 6-1). However, the local APIC does not treat vectors in this range as illegal.
When an illegal vector value (0 to 15) is written to an LVT entry and the delivery
mode is Fixed (bits 8-11 equal 0), the APIC may signal an illegal vector error, without
regard to whether the mask bit is set or whether an interrupt is actually seen on the
input.
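The three vector ranges described above can be captured in a few predicates. The following is a minimal sketch; the function names are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Vectors 0-15 are illegal for delivery through the local APIC;
 * vectors 16-255 are accepted. */
static bool apic_vector_is_legal(uint8_t vector)
{
    return vector >= 16;
}

/* Vectors 16-31 are not illegal to the APIC, but the architecture reserves
 * them for predefined interrupts, exceptions, and Intel-reserved encodings. */
static bool apic_vector_is_arch_reserved(uint8_t vector)
{
    return vector >= 16 && vector <= 31;
}

/* Vectors that are both legal for the APIC and outside the
 * architecture-reserved range (hypothetical helper). */
static bool apic_vector_is_usable(uint8_t vector)
{
    return vector >= 32;
}
```

A caller would typically check apic_vector_is_usable() before programming a vector into an LVT entry, since an illegal value may be reported in the ESR even while the entry is masked.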
10.5.3 Error Handling
The local APIC provides an error status register (ESR) that it uses to record errors
that it detects when handling interrupts (see Figure 10-9). An APIC error interrupt is
generated when the local APIC sets one of the error bits in the ESR. The LVT error
register allows selection of the interrupt vector to be delivered to the processor core
when an APIC error is detected. The LVT error register also provides a means of
masking an APIC error interrupt.
The ESR is a write/read register. A write (of any value) to the ESR must be done to
update the register before attempting to read it. This write clears any previously
logged errors and updates the ESR with any errors detected since the last write to the
ESR. Errors are collected regardless of the LVT Error mask bit, but the APIC will only
issue an interrupt due to the error if the LVT Error mask bit is cleared.
The functions of the ESR are listed in Table 10-2.
Error handling in x2APIC mode is discussed in Section 10.12.8.
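The write-before-read rule can be sketched as follows. This is an illustrative sketch, not a definitive driver: the function and constant names are hypothetical, apic_base is assumed to point at an already mapped local APIC register page, and the bit positions are those shown in Figure 10-9.

```c
#include <stdbool.h>
#include <stdint.h>

#define APIC_ESR_OFFSET 0x280u  /* ESR is at FEE0 0280H with the default base */

/* ESR flag bits as laid out in Figure 10-9 (bit 4 is reserved). */
enum {
    ESR_SEND_CHECKSUM       = 1u << 0,
    ESR_RECV_CHECKSUM       = 1u << 1,
    ESR_SEND_ACCEPT         = 1u << 2,
    ESR_RECV_ACCEPT         = 1u << 3,
    ESR_SEND_ILLEGAL_VECTOR = 1u << 5,
    ESR_RECV_ILLEGAL_VECTOR = 1u << 6,
    ESR_ILLEGAL_REG_ADDR    = 1u << 7
};

/* Write any value to the ESR to latch errors seen since the last write,
 * then read it back. Only the defined flag bits are returned. */
static uint32_t apic_read_esr(volatile uint32_t *apic_base)
{
    volatile uint32_t *esr = apic_base + APIC_ESR_OFFSET / sizeof(uint32_t);

    *esr = 0;
    return *esr & 0xEFu;
}

/* True if either illegal-vector flag is set in a value read from the ESR. */
static bool esr_has_illegal_vector(uint32_t esr)
{
    return (esr & (ESR_SEND_ILLEGAL_VECTOR | ESR_RECV_ILLEGAL_VECTOR)) != 0;
}
```

On plain memory the read simply returns the value just written; on real hardware the write is what causes the register to be refreshed with the newly detected errors.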
Figure 10-9. Error Status Register (ESR)
[Figure: ESR bit layout. Recoverable content is listed below.]
Address: FEE0 0280H
Value after reset: 0H
    Bits 31:8    Reserved
    Bit 7        Illegal Register Address (1)
    Bit 6        Received Illegal Vector
    Bit 5        Send Illegal Vector
    Bit 4        Reserved
    Bit 3        Receive Accept Error (2)
    Bit 2        Send Accept Error (2)
    Bit 1        Receive Checksum Error (2)
    Bit 0        Send Checksum Error (2)
NOTES:
1. Used in Intel Core, Pentium 4, Intel Xeon, and P6 family processors;
   reserved in the Pentium processor.
2. Only used in the P6 family and Pentium processors; reserved in Intel Core,
   Pentium 4 and Intel Xeon processors.

Table 10-2. ESR Flags
FLAG                      Function
Send Checksum Error       (P6 family and Pentium processors only) Set when the local
                          APIC detects a checksum error for a message that it sent on
                          the APIC bus.
Receive Checksum Error    (P6 family and Pentium processors only) Set when the local
                          APIC detects a checksum error for a message that it received
                          on the APIC bus.
Send Accept Error         (P6 family and Pentium processors only) Set when the local
                          APIC detects that a message it sent was not accepted by any
                          APIC on the APIC bus.
Receive Accept Error      (P6 family and Pentium processors only) Set when the local
                          APIC detects that the message it received was not accepted
                          by any APIC on the APIC bus, including itself.
Send Illegal Vector       Set when the local APIC detects an illegal vector in the
                          message that it is sending.
Receive Illegal Vector    Set when the local APIC detects an illegal vector in the
                          message it received, including an illegal vector code in the
                          local vector table interrupts or in a self-interrupt.
10.5.4 APIC Timer
The local APIC unit contains a 32-bit programmable timer that is available to
software to time events or operations. This timer is set up by programming four
registers: the divide configuration register (see Figure 10-10), the initial-count and
current-count registers (see Figure 10-11), and the LVT timer register (see
Figure 10-8).
If CPUID.06H:EAX.ARAT[bit 2] = 1, the processor's APIC timer runs at a constant
rate regardless of P-state transitions and it continues to run at the same rate in deep
C-states.
If CPUID.06H:EAX.ARAT[bit 2] = 0 or if CPUID 06H is not supported, the APIC timer
may temporarily stop while the processor is in deep C-states or during transitions
caused by Enhanced Intel SpeedStep Technology.
Table 10-2. ESR Flags (Contd.)
FLAG                      Function
Illegal Reg. Address      (Intel Core, Intel Atom, Pentium 4, Intel Xeon, and P6 family
                          processors only) Set when the processor is trying to access a
                          register in the processor's local APIC register address space
                          that is reserved (see Table 10-1). Addresses in one of the
                          0x10 byte regions marked reserved are illegal register
                          addresses.
                          The Local APIC Register Map is the address range of the APIC
                          register base address (specified in the IA32_APIC_BASE MSR)
                          plus 4 KBytes.

Figure 10-10. Divide Configuration Register
[Figure: register bit layout. Recoverable content is listed below.]
Address: FEE0 03E0H
Value after reset: 0H
    Bits 31:4    Reserved
    Bit 2        0
    Divide Value (bits 0, 1 and 3):
        000: Divide by 2
        001: Divide by 4
        010: Divide by 8
        011: Divide by 16
        100: Divide by 32
        101: Divide by 64
        110: Divide by 128
        111: Divide by 1
The time base for the timer is derived from the processor's bus clock, divided by the
value specified in the divide configuration register.
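Because the divide value is split across bits 0, 1 and 3 of the divide configuration register, decoding it takes a small amount of bit shuffling. The sketch below shows one way to do it; the function names are hypothetical, and the bus clock frequency is an input the caller is assumed to know or to have measured.

```c
#include <stdint.h>

/* Decode the divisor from a divide configuration register value.
 * The 3-bit code is formed from register bits 3, 1 and 0 (bit 2 is not
 * part of the code). Codes 000..110 select divide by 2..128; code 111
 * selects divide by 1. */
static unsigned apic_timer_divisor(uint32_t dcr)
{
    unsigned code = (dcr & 0x3u) | ((dcr >> 1) & 0x4u);

    return code == 7u ? 1u : 2u << code;
}

/* Timer tick rate in Hz for a given bus clock (integer division). */
static uint32_t apic_timer_hz(uint32_t bus_clock_hz, uint32_t dcr)
{
    return bus_clock_hz / apic_timer_divisor(dcr);
}
```

For example, with a 100 MHz bus clock and a register value of 3H (divide by 16), the timer would count at 6.25 MHz.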
The timer can be configured through the timer LVT entry for one-shot or periodic
operation. In one-shot mode, the timer is started by programming its initial-count
register. The initial count value is then copied into the current-count register and
count-down begins. After the timer reaches zero, a timer interrupt is generated and
the timer remains at its 0 value until reprogrammed.
In periodic mode, the current-count register is automatically reloaded from the
initial-count register when the count reaches 0 and a timer interrupt is generated,
and the count-down is repeated. If during the count-down process the initial-count
register is set, counting will restart, using the new initial-count value. The
initial-count register is a read-write register; the current-count register is read only.
A write of 0 to the initial-count register effectively stops the local APIC timer, in both
one-shot and periodic mode.
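The one-shot and periodic behaviors can be illustrated with a small software model. This is only a sketch of the counting rules stated above, with hypothetical names; it does not model exact hardware timing, only which tick produces an interrupt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of the timer's two count registers (illustration only). */
struct apic_timer_model {
    uint32_t initial_count;
    uint32_t current_count;
    bool periodic;              /* LVT timer mode: false = one-shot */
};

/* Writing the initial-count register copies the value into the current
 * count and (re)starts the count-down. Writing 0 stops the timer. */
static void timer_write_initial_count(struct apic_timer_model *t, uint32_t value)
{
    t->initial_count = value;
    t->current_count = value;
}

/* Advance the model by one divided clock tick.
 * Returns true when this tick generates a timer interrupt. */
static bool timer_tick(struct apic_timer_model *t)
{
    if (t->current_count == 0)
        return false;           /* stopped, or an expired one-shot */
    if (--t->current_count != 0)
        return false;           /* still counting down */
    if (t->periodic)
        t->current_count = t->initial_count;  /* automatic reload */
    return true;
}
```

In one-shot mode the model stays at 0 after the interrupt until the initial count is written again; in periodic mode it reloads and keeps producing interrupts at a fixed interval.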
The LVT timer register determines the vector number that is delivered to the
processor with the timer interrupt that is generated when the timer count reaches
zero. The mask flag in the LVT timer register can be used to mask the timer interrupt.
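A value for the LVT timer register can be assembled from the vector, mask flag, and timer mode. The sketch below assumes the bit positions of Figure 10-8 (vector in bits 7:0, mask in bit 16, timer mode in bit 17); the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define LVT_MASKED         (1u << 16)  /* mask flag: 1 = interrupt masked */
#define LVT_TIMER_PERIODIC (1u << 17)  /* timer mode: 1 = periodic */

/* Build an LVT timer register value. All other bits are left at 0. */
static uint32_t lvt_timer_value(uint8_t vector, bool periodic, bool masked)
{
    uint32_t value = vector;

    if (masked)
        value |= LVT_MASKED;
    if (periodic)
        value |= LVT_TIMER_PERIODIC;
    return value;
}
```

Note that lvt_timer_value(0, false, true) reproduces the documented reset value 0001 0000H, that is, a masked entry with vector 0.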
10.5.5 Local Interrupt Acceptance
When a local interrupt is sent to the processor core, it is subject to the acceptance
criteria specified in the interrupt acceptance flow chart in Figure 10-17. If the
interrupt is accepted, it is logged into the IRR register and handled by the processor
according to its priority (see Section 10.8.4, "Interrupt Acceptance for Fixed
Interrupts"). If the interrupt is not accepted, it is sent back to the local APIC and retried.
Figure 10-11. Initial Count and Current Count Registers
[Figure: two 32-bit registers, each holding a count in bits 31:0.]
Address: Initial Count FEE0 0380H; Current Count FEE0 0390H
Value after reset: 0H

10.6 ISSUING INTERPROCESSOR INTERRUPTS
The following sections describe the local APIC facilities that are provided for issuing
interprocessor interrupts (IPIs) from software. The primary local APIC facility for
issuing IPIs is the interrupt command register (ICR). The ICR can be used for the
following functions:
• To send an interrupt to another processor.
• To allow a processor to forward an interrupt that it received but did not service to
  another processor for servicing.
• To direct the processor to interrupt itself (perform a self interrupt).
• To deliver special IPIs, such as the start-up IPI (SIPI) message, to other
  processors.
Interrupts generated with this facility are delivered to the other processors in the
system through the system bus (for Pentium 4 and Intel Xeon processors) or the
APIC bus (for P6 family and Pentium processors). The ability for a processor to send
a lowest priority IPI is model specific and should be avoided by BIOS and operating
system software.
10.6.1 Interrupt Command Register (ICR)
The interrupt command register (ICR) is a 64-bit² local APIC register (see
Figure 10-12) that allows software running on the processor to specify and send
interprocessor interrupts (IPIs) to other processors in the system.
To send an IPI, software must set up the ICR to indicate the type of IPI message to
be sent and the destination processor or processors. (All fields of the ICR are
read-write by software with the exception of the delivery status field, which is read-only.)
The act of writing to the low doubleword of the ICR causes the IPI to be sent.
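Because the write to the low doubleword is what triggers the send, the high doubleword (destination) has to be written first. The sketch below illustrates that ordering under stated assumptions: the function name is hypothetical, apic_base is assumed to point at a mapped local APIC register page, the register offsets are those given in Figure 10-12, and waiting for the delivery status bit to read Idle before sending is a common precaution rather than a requirement stated here.

```c
#include <stdint.h>

#define APIC_ICR_LOW        0x300u      /* ICR bits 31:0  */
#define APIC_ICR_HIGH       0x310u      /* ICR bits 63:32 */
#define ICR_DELIVERY_STATUS (1u << 12)  /* read-only: 1 = Send Pending */

/* Send an IPI described by the two ICR doublewords. */
static void apic_send_ipi(volatile uint32_t *apic_base,
                          uint32_t icr_high, uint32_t icr_low)
{
    volatile uint32_t *low  = apic_base + APIC_ICR_LOW / sizeof(uint32_t);
    volatile uint32_t *high = apic_base + APIC_ICR_HIGH / sizeof(uint32_t);

    /* Wait until any previous IPI from this APIC has been sent. */
    while (*low & ICR_DELIVERY_STATUS)
        ;

    *high = icr_high;                       /* destination first */
    *low  = icr_low & ~ICR_DELIVERY_STATUS; /* this write sends the IPI */
}
```

When a destination shorthand is used, the destination field is not consulted, so a single write to the low doubleword would be sufficient.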
2. In xAPIC mode the ICR is addressed as two 32-bit registers, ICR_LOW (FEE0 0300H) and
   ICR_HIGH (FEE0 0310H).
The ICR consists of the following fields.
Vector
    The vector number of the interrupt being sent.
Delivery Mode
    Specifies the type of IPI to be sent. This field is also known as the IPI
    message type field.
    000 (Fixed)
        Delivers the interrupt specified in the vector field to the target
        processor or processors.
    001 (Lowest Priority)
        Same as fixed mode, except that the interrupt is delivered to the
        processor executing at the lowest priority among the set of processors
        specified in the destination field. The ability for a processor to send
        a lowest priority IPI is model specific and should be avoided by BIOS
        and operating system software.
    010 (SMI)
        Delivers an SMI interrupt to the target processor or processors. The
        vector field must be programmed to 00H for future compatibility.
    011 (Reserved)
    100 (NMI)
        Delivers an NMI interrupt to the target processor or processors. The
        vector information is ignored.
    101 (INIT)
        Delivers an INIT request to the target processor or processors, which
        causes them to perform an INIT. As a result of this IPI message, all
        the target processors perform an INIT. The vector field must be
        programmed to 00H for future compatibility.

Figure 10-12. Interrupt Command Register (ICR)
[Figure: ICR bit layout. Recoverable content is listed below.]
Address: FEE0 0300H (0 - 31)
         FEE0 0310H (32 - 63)
Value after Reset: 0H
    Bits 63:56   Destination Field
    Bits 55:20   Reserved
    Bits 19:18   Destination Shorthand (00: No Shorthand; 01: Self;
                 10: All Including Self; 11: All Excluding Self)
    Bits 17:16   Reserved
    Bit 15       Trigger Mode (0: Edge; 1: Level)
    Bit 14       Level (0 = De-assert; 1 = Assert)
    Bit 13       Reserved
    Bit 12       Delivery Status (0: Idle; 1: Send Pending)
    Bit 11       Destination Mode (0: Physical; 1: Logical)
    Bits 10:8    Delivery Mode (000: Fixed; 001: Lowest Priority (1); 010: SMI;
                 011: Reserved; 100: NMI; 101: INIT; 110: Start Up; 111: Reserved)
    Bits 7:0     Vector
NOTE:
1. The ability of a processor to send Lowest Priority IPI is model specific.
    101 (INIT Level De-assert)
        (Not supported in the Pentium 4 and Intel Xeon processors.) Sends a
        synchronization message to all the local APICs in the system to set
        their arbitration IDs (stored in their Arb ID registers) to the values
        of their APIC IDs (see Section 10.7, "System and APIC Bus Arbitration").
        For this delivery mode, the level flag must be set to 0 and trigger
        mode flag to 1. This IPI is sent to all processors, regardless of the
        value in the destination field or the destination shorthand field;
        however, software should specify the "all including self" shorthand.
    110 (Start-Up)
        Sends a special "start-up" IPI (called a SIPI) to the target processor
        or processors. The vector typically points to a start-up routine that
        is part of the BIOS boot-strap code (see Section 8.4,
        "Multiple-Processor (MP) Initialization"). IPIs sent with this delivery
        mode are not automatically retried if the source APIC is unable to
        deliver them. It is up to the software to determine if the SIPI was not
        successfully delivered and to reissue the SIPI if necessary.
Destination Mode
    Selects either physical (0) or logical (1) destination mode (see
    Section 10.6.2, "Determining IPI Destination").
Delivery Status (Read Only)
    Indicates the IPI delivery status, as follows:
    0 (Idle)
        Indicates that this local APIC has completed sending any previous IPIs.
    1 (Send Pending)
        Indicates that this local APIC has not completed sending the last IPI.
Level
    For the INIT level de-assert delivery mode this flag must be set to 0; for
    all other delivery modes it must be set to 1. (This flag has no meaning in
    Pentium 4 and Intel Xeon processors, and will always be issued as a 1.)
Trigger Mode
    Selects the trigger mode when using the INIT level de-assert delivery mode:
    edge (0) or level (1). It is ignored for all other delivery modes. (This
    flag has no meaning in Pentium 4 and Intel Xeon processors, and will always
    be issued as a 0.)
Destination Shorthand
    Indicates whether a shorthand notation is used to specify the destination
    of the interrupt and, if so, which shorthand is used. Destination
    shorthands are used in place of the 8-bit destination field, and can be
    sent by software using a single write to the low doubleword of the ICR.
    Shorthands are defined for the following cases: software self interrupt,
    IPIs to all processors in the system including the sender, IPIs to all
    processors in the system excluding the sender.
    00: (No Shorthand)
        The destination is specified in the destination field.
    01: (Self)
        The issuing APIC is the one and only destination of the IPI. This
        destination shorthand allows software to interrupt the processor on
        which it is executing. An APIC implementation is free to deliver the
        self-interrupt message internally or to issue the message to the bus
        and "snoop" it as with any other IPI message.
    10: (All Including Self)
        The IPI is sent to all processors in the system including the processor
        sending the IPI. The APIC will broadcast an IPI message with the
        destination field set to FH for Pentium and P6 family processors and to
        FFH for Pentium 4 and Intel Xeon processors.
    11: (All Excluding Self)
        The IPI is sent to all processors in a system with the exception of the
        processor sending the IPI. The APIC broadcasts a message with the
        physical destination mode and destination field set to FH for Pentium
        and P6 family processors and to FFH for Pentium 4 and Intel Xeon
        processors. Support for this destination shorthand in conjunction with
        the lowest-priority delivery mode is model specific. For Pentium 4 and
        Intel Xeon processors, when this shorthand is used together with lowest
        priority delivery mode, the IPI may be redirected back to the issuing
        processor.
Destination
    Specifies the target processor or processors. This field is only used when
    the destination shorthand field is set to 00B. If the destination mode is
    set to physical, then bits 56 through 59 contain the APIC ID of the target
    processor for Pentium and P6 family processors and bits 56 through 63
    contain the APIC ID of the target processor for the Pentium 4 and Intel
    Xeon processors. If the destination mode is set to logical, the
    interpretation of the 8-bit destination field depends on the settings of
    the DFR and LDR registers of the local APICs in all the processors in the
    system (see Section 10.6.2, "Determining IPI Destination").
Not all combinations of options for the ICR are valid. Table 10-3 shows the valid
combinations for the fields in the ICR for the Pentium 4 and Intel Xeon processors;
Table 10-4 shows the valid combinations for the fields in the ICR for the P6 family
processors. Also note that the lower half of the ICR may not be preserved over
transitions to the deepest C-States.
ICR operation in x2APIC mode is discussed in Section 10.12.9.
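The field descriptions above map directly onto a 64-bit value. The following sketch packs the software-writable fields using the bit positions of Figure 10-12; the structure, enumerator, and function names are hypothetical, and the read-only delivery status bit is deliberately left at 0.

```c
#include <stdbool.h>
#include <stdint.h>

enum icr_delivery_mode {
    ICR_FIXED = 0, ICR_LOWEST_PRIORITY = 1, ICR_SMI = 2,
    ICR_NMI = 4, ICR_INIT = 5, ICR_STARTUP = 6
};

enum icr_shorthand {
    ICR_NO_SHORTHAND = 0, ICR_SELF = 1,
    ICR_ALL_INCL_SELF = 2, ICR_ALL_EXCL_SELF = 3
};

/* Software-writable ICR fields (illustrative layout, not a hardware struct). */
struct icr_fields {
    uint8_t vector;                       /* bits 7:0   */
    enum icr_delivery_mode delivery_mode; /* bits 10:8  */
    bool logical_destination;             /* bit 11     */
    bool level_assert;                    /* bit 14     */
    bool level_triggered;                 /* bit 15     */
    enum icr_shorthand shorthand;         /* bits 19:18 */
    uint8_t destination;                  /* bits 63:56 */
};

/* Pack the fields into a 64-bit ICR value; reserved bits are left at 0. */
static uint64_t icr_encode(const struct icr_fields *f)
{
    uint64_t v = f->vector;

    v |= (uint64_t)((unsigned)f->delivery_mode & 7u) << 8;
    v |= (uint64_t)(f->logical_destination ? 1u : 0u) << 11;
    v |= (uint64_t)(f->level_assert ? 1u : 0u) << 14;
    v |= (uint64_t)(f->level_triggered ? 1u : 0u) << 15;
    v |= (uint64_t)((unsigned)f->shorthand & 3u) << 18;
    v |= (uint64_t)f->destination << 56;
    return v;
}
```

The upper 32 bits of the result correspond to the doubleword at FEE0 0310H and the lower 32 bits to the doubleword at FEE0 0300H.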
Table 10-3. Valid Combinations for the Pentium 4 and Intel Xeon Processors'
Local xAPIC Interrupt Command Register
Destination         Valid/       Trigger  Delivery Mode                           Destination
Shorthand           Invalid      Mode                                             Mode
No Shorthand        Valid        Edge     All Modes (1)                           Physical or Logical
No Shorthand        Invalid (2)  Level    All Modes                               Physical or Logical
Self                Valid        Edge     Fixed                                   X (3)
Self                Invalid (2)  Level    Fixed                                   X
Self                Invalid      X        Lowest Priority, NMI, INIT, SMI,        X
                                          Start-Up
All Including Self  Valid        Edge     Fixed                                   X
All Including Self  Invalid (2)  Level    Fixed                                   X
All Including Self  Invalid      X        Lowest Priority, NMI, INIT, SMI,        X
                                          Start-Up
All Excluding Self  Valid        Edge     Fixed, Lowest Priority (1,4), NMI,      X
                                          INIT, SMI, Start-Up
All Excluding Self  Invalid (2)  Level    Fixed, Lowest Priority (4), NMI,        X
                                          INIT, SMI, Start-Up
NOTES:
1. The ability of a processor to send a lowest priority IPI is model specific.
2. For these interrupts, if the trigger mode bit is 1 (Level), the local xAPIC will
   override the bit setting and issue the interrupt as an edge triggered interrupt.
3. X means the setting is ignored.
4. When using the "lowest priority" delivery mode and the "all excluding self"
   destination, the IPI can be redirected back to the issuing APIC, which is
   essentially the same as the "all including self" destination mode.
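Table 10-3 can be read as a small predicate over the shorthand, delivery mode, and trigger mode. The sketch below encodes the table rows as written; the names are hypothetical, and treating the reserved delivery mode encodings (011 and 111) as invalid is an assumption, since the table does not list them.

```c
#include <stdbool.h>

enum icr_shorthand {
    SH_NONE = 0, SH_SELF = 1, SH_ALL_INCL_SELF = 2, SH_ALL_EXCL_SELF = 3
};

#define DM_FIXED 0u  /* delivery mode encoding 000 */

/* Returns true if Table 10-3 lists the combination as Valid for the
 * Pentium 4 and Intel Xeon local xAPIC. delivery_mode is the 3-bit
 * encoding from the ICR. */
static bool icr_valid_p4_xeon(enum icr_shorthand sh, unsigned delivery_mode,
                              bool level_triggered)
{
    /* Assumption: reserved encodings are rejected outright. */
    if (delivery_mode == 3u || delivery_mode >= 7u)
        return false;

    /* Every Level row is listed as Invalid (note 2: the xAPIC overrides
     * the bit and issues the interrupt as edge triggered). */
    if (level_triggered)
        return false;

    switch (sh) {
    case SH_SELF:
    case SH_ALL_INCL_SELF:
        return delivery_mode == DM_FIXED;  /* only Fixed is valid */
    case SH_NONE:
    case SH_ALL_EXCL_SELF:
        return true;                       /* all listed modes, edge */
    }
    return false;
}
```

A check like this could run in debug builds before an ICR write; it says nothing about whether lowest priority delivery is actually supported on a given model.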
Table 10-4. Valid Combinations for the P6 Family Processors'
Local APIC Interrupt Command Register
Destination         Valid/       Trigger  Delivery Mode                           Destination
Shorthand           Invalid      Mode                                             Mode
No Shorthand        Valid        Edge     All Modes (1)                           Physical or Logical
No Shorthand        Valid (2)    Level    Fixed, Lowest Priority (1), NMI         Physical or Logical
No Shorthand        Valid (3)    Level    INIT                                    Physical or Logical
Self                Valid        Edge     Fixed                                   X (4)
Self                1            Level    Fixed                                   X
Self                Invalid (5)  X        Lowest Priority, NMI, INIT, SMI,        X
                                          Start-Up
All including Self  Valid        Edge     Fixed                                   X
All including Self  Valid (2)    Level    Fixed                                   X
All including Self  Invalid (5)  X        Lowest Priority, NMI, INIT, SMI,        X
                                          Start-Up
All excluding Self  Valid        Edge     All Modes (1)                           X
All excluding Self  Valid (2)    Level    Fixed, Lowest Priority (1), NMI         X
All excluding Self  Invalid (5)  Level    SMI, Start-Up                           X
All excluding Self  Valid (3)    Level    INIT                                    X
X                   Invalid (5)  Level    SMI, Start-Up                           X
NOTES:
1. The ability of a processor to send a lowest priority IPI is model specific.
2. Treated as edge triggered if level bit is set to 1, otherwise ignored.
3. Treated as edge triggered when Level bit is set to 1; treated as "INIT Level
   Deassert" message when level bit is set to 0 (deassert). Only INIT level deassert
   messages are allowed to have the level bit set to 0. For all other messages the
   level bit must be set to 1.
4. X means the setting is ignored.
5. The behavior of the APIC is undefined.

10.6.2 Determining IPI Destination
The destination of an IPI can be one, all, or a subset (group) of the processors on the
system bus. The sender of the IPI specifies the destination of an IPI with the
following APIC registers and fields within the registers:
• ICR Register: The following fields in the ICR register are used to specify the
  destination of an IPI:
  - Destination Mode: Selects one of two destination modes (physical or
    logical).
  - Destination Field: In physical destination mode, used to specify the APIC
    ID of the destination processor; in logical destination mode, used to specify a
    message destination address (MDA) that can be used to select specific
    processors in clusters.
  - Destination Shorthand: A quick method of specifying all processors, all
    excluding self, or self as the destination.
  - Delivery mode, Lowest Priority: Architecturally specifies that a
    lowest-priority arbitration mechanism be used to select a destination processor
    from a specified group of processors. The ability of a processor to send a
    lowest priority IPI is model specific and should be avoided by BIOS and
    operating system software.
• Local destination register (LDR): Used in conjunction with the logical
  destination mode and MDAs to select the destination processors.
• Destination format register (DFR): Used in conjunction with the logical
  destination mode and MDAs to select the destination processors.
How the ICR, LDR, and DFR are used to select an IPI destination depends on the
destination mode used: physical, logical, broadcast/self, or lowest-priority delivery
mode. These destination modes are described in the following sections.
Determination of IPI destinations in x2APIC mode is discussed in Section 10.12.10.
10.6.2.1 Physical Destination Mode
In physical destination mode, the destination processor is specified by its local APIC
ID (see Section 10.4.6, "Local APIC ID"). For Pentium 4 and Intel Xeon processors,
either a single destination (local APIC IDs 00H through FEH) or a broadcast to all
APICs (the APIC ID is FFH) may be specified in physical destination mode.
A broadcast IPI (bits 28-31 of the MDA are 1's) or I/O subsystem initiated interrupt
with lowest priority delivery mode is not supported in physical destination mode and
must not be configured by software. Also, for any non-broadcast IPI or I/O
subsystem initiated interrupt with lowest priority delivery mode, software must
ensure that APICs defined in the interrupt address are present and enabled to receive
interrupts.
For the P6 family and Pentium processors, a single destination is specified in physical
destination mode with a local APIC ID of 0H through 0EH, allowing up to 15 local
APICs to be addressed on the APIC bus. A broadcast to all local APICs is specified with
0FH.
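The physical-mode matching rule differs only in the width of the APIC ID. The sketch below models how a receiving local APIC might decide whether a physical-mode message addresses it; the function name and the xapic parameter are hypothetical, and the code does not model the restriction on broadcast with lowest priority delivery mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a physical-mode destination selects the local APIC with
 * the given ID.
 *   xapic == true : Pentium 4 / Intel Xeon, 8-bit IDs, FFH is broadcast.
 *   xapic == false: P6 family / Pentium, 4-bit IDs, 0FH is broadcast. */
static bool physical_dest_matches(uint8_t destination, uint8_t local_apic_id,
                                  bool xapic)
{
    uint8_t mask = xapic ? 0xFFu : 0x0Fu;
    uint8_t dest = destination & mask;
    uint8_t id   = local_apic_id & mask;

    return dest == mask || dest == id;
}
```

With 4-bit IDs the value 0FH is consumed by broadcast, which is why only 15 local APICs (0H through 0EH) can be addressed individually on the APIC bus.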
NOTE
The number of local APICs that can be addressed on the system bus
may be restricted by hardware.
10.6.2.2 Logical Destination Mode
In logical destination mode, IPI destination is specified using an 8-bit message
destination address (MDA), which is entered in the destination field of the ICR. Upon
receiving an IPI message that was sent using logical destination mode, a local APIC
compares the MDA in the message with the values in its LDR and DFR to determine if
it should accept and handle the IPI. For both configurations of logical destination
mode, when combined with lowest priority delivery mode, software is responsible for
ensuring that all of the local APICs included in or addressed by the IPI or I/O
subsystem interrupt are present and enabled to receive the interrupt.
Figure 10-13 shows the layout of the logical destination register (LDR). The 8-bit
logical APIC ID field in this register is used to create an identifier that can be
compared with the MDA.
NOTE
The logical APIC ID should not be confused with the local APIC ID that
is contained in the local APIC ID register.
Figure 10-14 shows the layout of the destination format register (DFR). The 4-bit
model field in this register selects one of two models (flat or cluster) that can be used
to interpret the MDA when using logical destination mode.

Figure 10-13. Logical Destination Register (LDR)
[Figure: register bit layout.]
Address: FEE0 00D0H
Value after reset: 0000 0000H
    Bits 31:24   Logical APIC ID
    Bits 23:0    Reserved

Figure 10-14. Destination Format Register (DFR)
[Figure: register bit layout.]
Address: FEE0 00E0H
Value after reset: FFFF FFFFH
    Bits 31:28   Model (Flat model: 1111B; Cluster model: 0000B)
    Bits 27:0    Reserved (All 1s)

The interpretation of MDA for the two models is described in the following
paragraphs.
1. Flat Model: This model is selected by programming DFR bits 28 through 31 to
   1111. Here, a unique logical APIC ID can be established for up to 8 local APICs by
   setting a different bit in the logical APIC ID field of the LDR for each local APIC. A
   group of local APICs can then be selected by setting one or more bits in the MDA.
   Each local APIC performs a bit-wise AND of the MDA and its logical APIC ID. If a
   true condition is detected, the local APIC accepts the IPI message. A broadcast to
   all APICs is achieved by setting the MDA to 1s.
2. Cluster Model: This model is selected by programming DFR bits 28 through 31
   to 0000. This model supports two basic destination schemes: flat cluster and
   hierarchical cluster.
   The flat cluster destination model is only supported for P6 family and Pentium
   processors. Using this model, all APICs are assumed to be connected through the
   APIC bus. Bits 60 through 63 of the MDA contain the encoded address of the
   destination cluster and bits 56 through 59 identify up to four local APICs within
   the cluster (each bit is assigned to one local APIC in the cluster, as in the flat
   connection model). To identify one or more local APICs, bits 60 through 63 of the
   MDA are compared with bits 28 through 31 of the LDR to determine if a local APIC
   is part of the cluster. Bits 56 through 59 of the MDA are compared with bits 24
   through 27 of the LDR to identify local APICs within the cluster.
   Sets of processors within a cluster can be specified by writing the target cluster
   address in bits 60 through 63 of the MDA and setting selected bits in bits 56
   through 59 of the MDA, corresponding to the chosen members of the cluster. In
   this mode, 15 clusters (with cluster addresses of 0 through 14) each having 4
   local APICs can be specified in the message. For the P6 and Pentium processors'
   local APICs, however, the APIC arbitration ID supports only 15 APIC agents.
   Therefore, the total number of processors and their local APICs supported in
   this mode is limited to 15. Broadcast to all local APICs is achieved by setting all
   destination bits to one. This guarantees a match on all clusters and selects all
   APICs in each cluster. A broadcast IPI or I/O subsystem broadcast interrupt with
   lowest priority delivery mode is not supported in cluster mode and must not be
   configured by software.
   The hierarchical cluster destination model can be used with Pentium 4, Intel
   Xeon, P6 family, or Pentium processors. With this model, a hierarchical network
   can be created by connecting different flat clusters via independent system or
   APIC buses. This scheme requires a cluster manager within each cluster, which is
   responsible for handling message passing between system or APIC buses. One
   cluster contains up to 4 agents. Thus 15 cluster managers, each with 4 agents,
   can form a network of up to 60 APIC agents. Note that hierarchical APIC networks
   require a special cluster manager device, which is not part of the local or the I/O
   APIC units.
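The flat and flat-cluster comparisons described above can be expressed as a single matching function. This is a sketch with hypothetical names: it takes raw DFR and LDR register values plus the 8-bit MDA, treats the MDA's upper nibble as the cluster address (ICR bits 60 through 63) and its lower nibble as the member bit map (ICR bits 56 through 59), and returns false for DFR model encodings that the text does not describe.

```c
#include <stdbool.h>
#include <stdint.h>

#define DFR_MODEL_FLAT    0xFu  /* DFR bits 31:28 = 1111B */
#define DFR_MODEL_CLUSTER 0x0u  /* DFR bits 31:28 = 0000B */

/* Decide whether a logical-mode message with the given MDA selects the
 * local APIC whose DFR and LDR hold the given values. */
static bool logical_dest_matches(uint32_t dfr, uint32_t ldr, uint8_t mda)
{
    uint8_t  logical_id = (uint8_t)(ldr >> 24);  /* LDR bits 31:24 */
    uint32_t model      = dfr >> 28;             /* DFR bits 31:28 */

    if (model == DFR_MODEL_FLAT)
        return (mda & logical_id) != 0;          /* bit-wise AND */

    if (model == DFR_MODEL_CLUSTER) {
        uint8_t mda_cluster = mda >> 4;
        uint8_t own_cluster = logical_id >> 4;
        /* All destination bits set to one matches every cluster. */
        bool cluster_hit = mda_cluster == 0xFu || mda_cluster == own_cluster;

        return cluster_hit && (mda & logical_id & 0x0Fu) != 0;
    }

    return false;  /* assumption: other encodings never match */
}
```

In the flat model each of up to 8 APICs owns one bit, so an MDA of FFH reaches all of them; in the cluster model the same FFH matches every cluster address and every member bit.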
NOTES
All processors that have their APIC software enabled (using the
spurious vector enable/disable bit) must have their DFRs
(Destination Format Registers) programmed identically.
The default mode for DFR is flat mode. If you are using cluster mode,
DFRs must be programmed before the APIC is software enabled.
Since some chipsets do not accurately track a system view of the
logical mode, program DFRs as soon as possible after starting the
processor.
10.6.2.3 Broadcast/Self Delivery Mode
The destination shorthand field of the ICR allows the delivery mode to be bypassed
in favor of broadcasting the IPI to all the processors on the system bus and/or back
to itself (see Section 10.6.1, "Interrupt Command Register (ICR)"). Three
destination shorthands are supported: self, all excluding self, and all including self. The
destination mode is ignored when a destination shorthand is used.
10.6.2.4 Lowest Priority Delivery Mode
With lowest priority delivery mode, the ICR is programmed to send an IPI to several
processors on the system bus, using the logical or shorthand destination mechanism
for selecting the processor. The selected processors then arbitrate with one another
over the system bus or the APIC bus, with the lowest-priority processor accepting the
IPI.
For systems based on the Intel Xeon processor, the chipset bus controller accepts
messages from the I/O APIC agents in the system and directs interrupts to the
processors on the system bus. When using the lowest priority delivery mode, the
chipset chooses a target processor to receive the interrupt out of the set of possible
targets. The Pentium 4 processor provides a special bus cycle on the system bus that
informs the chipset of the current task priority for each logical processor in the
system. The chipset saves this information and uses it to choose the lowest priority
processor when an interrupt is received.
For systems based on P6 family processors, the processor priority used in
lowest-priority arbitration is contained in the arbitration priority register (APR) in each local
APIC. Figure 10-15 shows the layout of the APR.
The APR value is computed as follows:
IF (TPR[7:4] ≥ IRRV[7:4]) AND (TPR[7:4] > ISRV[7:4])
THEN
    APR[7:0] ← TPR[7:0]
ELSE
    APR[7:4] ← max(TPR[7:4] AND ISRV[7:4], IRRV[7:4])
    APR[3:0] ← 0.
Here, the TPR value is the task priority value in the TPR (see Figure 10-18), the IRRV
value is the vector number for the highest priority bit that is set in the IRR (see
Figure 10-20) or 00H (if no IRR bit is set), and the ISRV value is the vector number for
the highest priority bit that is set in the ISR (see Figure 10-20). Following
arbitration among the destination processors, the processor with the lowest value in its
APR handles the IPI and the other processors ignore it.
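The APR rule can be modeled in software as a small pure function. The sketch below is illustrative only: the APR is computed by hardware, the function name compute_apr is hypothetical, and the comparison operators follow the reading "TPR class greater than or equal to IRRV class, and strictly greater than ISRV class".

```c
#include <stdint.h>

/* Model of the APR computation. tpr, irrv and isrv are 8-bit values whose
 * high nibble is the priority class; irrv/isrv are 00H when no bit is set. */
static uint8_t compute_apr(uint8_t tpr, uint8_t irrv, uint8_t isrv)
{
    uint8_t tpr_hi  = tpr  >> 4;
    uint8_t irrv_hi = irrv >> 4;
    uint8_t isrv_hi = isrv >> 4;

    /* Task priority dominates: APR is a copy of the whole TPR. */
    if (tpr_hi >= irrv_hi && tpr_hi > isrv_hi)
        return tpr;

    /* Otherwise take the larger of (TPR AND ISRV) and IRRV; sub-class is 0. */
    uint8_t masked = tpr_hi & isrv_hi;
    uint8_t hi = masked > irrv_hi ? masked : irrv_hi;
    return (uint8_t)(hi << 4);
}
```

With TPR = 53H, IRRV = 41H and ISRV = 30H the first branch applies and the APR equals the TPR; with TPR = 20H, IRRV = 61H and ISRV = 30H the pending request dominates and the APR becomes 60H.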

Figure 10-15. Arbitration Priority Register (APR)
Address: FEE0 0090H. Value after reset: 0H.
Bits 31:8: Reserved. Bits 7:4: Arbitration Priority. Bits 3:0: Arbitration Priority
Sub-Class.
(P6 family and Pentium processors.) For these processors, if a focus processor exists,
it may accept the interrupt, regardless of its priority. A processor is said to be the
focus of an interrupt if it is currently servicing that interrupt or if it has a pending
request for that interrupt. For Intel Xeon processors, the concept of a focus processor
is not supported.
In operating systems that use the lowest priority delivery mode but do not update the
TPR, the TPR information saved in the chipset may cause the interrupt to always be
delivered to the same processor from the logical set. This behavior is functionally
backward compatible with the P6 family processor but may have unexpected performance
implications.
10.6.3 IPI Delivery and Acceptance
When the low double-word of the ICR is written to, the local APIC creates an IPI message
from the information contained in the ICR and sends the message out on the system bus
(Pentium 4 and Intel Xeon processors) or the APIC bus (P6 family and Pentium
processors). The manner in which these IPIs are handled after being issued is described
in Section 10.8, "Handling Interrupts."
10.7 SYSTEM AND APIC BUS ARBITRATION
When several local APICs and the I/O APIC are sending IPI and interrupt messages on the
system bus (or APIC bus), the order in which the messages are sent and handled is
determined through bus arbitration.
For the Pentium 4 and Intel Xeon processors, the local and I/O APICs use the arbitration
mechanism defined for the system bus to determine the order in which IPIs are handled.
This mechanism is non-architectural and cannot be controlled by software.
For the P6 family and Pentium processors, the local and I/O APICs use an APIC-based
arbitration mechanism to determine the order in which IPIs are handled. Here, each local
APIC is given an arbitration priority from 0 to 15, which the I/O APIC uses during
arbitration to determine which local APIC should be given access to the APIC bus. The
local APIC with the highest arbitration priority always wins bus access. Upon completion
of an arbitration round, the winning local APIC lowers its arbitration priority to 0 and
the losing local APICs each raise theirs by 1.
The current arbitration priority for a local APIC is stored in a 4-bit,
software-transparent arbitration ID (Arb ID) register. During reset, this register is
initialized to the APIC ID number (stored in the local APIC ID register). The INIT
level-deassert IPI, which is issued with an ICR command, can be used to resynchronize
the arbitration priorities of the local APICs by resetting the Arb ID register of each
agent to its current APIC ID value. (The Pentium 4 and Intel Xeon processors do not
implement the Arb ID register.)
Section 10.10, "APIC Bus Message Passing Mechanism and Protocol (P6 Family, Pentium
Processors)," describes the APIC bus arbitration protocols and bus message formats,
while Section 10.6.1, "Interrupt Command Register (ICR)," describes the INIT level
de-assert IPI message.
Note that except for the SIPI (see Section 10.6.1, "Interrupt Command Register (ICR)"),
all bus messages that fail to be delivered to their specified destination or
destinations are automatically retried. Software should avoid situations in which IPIs
are sent to disabled or nonexistent local APICs, causing the messages to be resent
repeatedly.
10.8 HANDLING INTERRUPTS
When a local APIC receives an interrupt from a local source, an interrupt message from
an I/O APIC, or an IPI, the manner in which it handles the message depends on processor
implementation, as described in the following sections.
10.8.1 Interrupt Handling with the Pentium 4 and Intel Xeon Processors
With the Pentium 4 and Intel Xeon processors, the local APIC handles the local
interrupts, interrupt messages, and IPIs it receives as follows:
1. It determines if it is the specified destination or not (see Figure 10-16). If it is
the specified destination, it accepts the message; if it is not, it discards the
message.
2. If the local APIC determines that it is the designated destination for the interrupt
and if the interrupt request is an NMI, SMI, INIT, ExtINT, or SIPI, the interrupt is
sent directly to the processor core for handling.
3. If the local APIC determines that it is the designated destination for the interrupt
but the interrupt request is not one of the interrupts given in step 2, the local APIC
sets the appropriate bit in the IRR.
4. When interrupts are pending in the IRR and ISR registers, the local APIC dispatches
them to the processor one at a time, based on their priority and the current task and
processor priorities in the TPR and PPR (see Section 10.8.3.1, "Task and Processor
Priorities").
5. When a fixed interrupt has been dispatched to the processor core for handling, the
completion of the handler routine is indicated with an instruction in the interrupt
handler code that writes to the end-of-interrupt (EOI) register in the local APIC (see
Section 10.8.5, "Signaling Interrupt Servicing Completion"). The act of writing to the
EOI register causes the local APIC to delete the interrupt from its ISR queue and (for
level-triggered interrupts) send a message on the bus indicating that the interrupt
handling has been completed. (A write to the EOI register must not be included in the
handler routine for an NMI, SMI, INIT, ExtINT, or SIPI.)
Figure 10-16. Interrupt Acceptance Flow Chart for the Local APIC (Pentium 4 and Intel
Xeon Processors)
[Flow chart: wait to receive a bus message; if the message belongs to this destination,
accept the message; otherwise, discard the message.]
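Steps 2, 3 and 5 above amount to a classification by delivery mode, which can be sketched as follows. The numeric encodings are an assumption carried over from the delivery mode field described in Section 10.6.1 (they are not given in this section), and both function names are hypothetical.

```c
/* 3-bit delivery mode encodings (assumed from Section 10.6.1). */
enum delivery_mode {
    DM_FIXED = 0, DM_LOWEST_PRIORITY = 1, DM_SMI = 2,
    DM_NMI = 4, DM_INIT = 5, DM_STARTUP = 6, DM_EXTINT = 7
};

/* Step 2: these requests are sent straight to the processor core
 * and never occupy a bit in the IRR or ISR. */
static int sent_directly_to_core(enum delivery_mode m)
{
    return m == DM_SMI || m == DM_NMI || m == DM_INIT ||
           m == DM_STARTUP || m == DM_EXTINT;
}

/* Steps 3 and 5: only interrupts queued through the IRR/ISR are
 * completed with a write to the EOI register. */
static int handler_writes_eoi(enum delivery_mode m)
{
    return !sent_directly_to_core(m);
}
```

A handler can use such a predicate as a reminder that, for example, an NMI handler returns without touching the EOI register.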
10.8.2 Interrupt Handling with the P6 Family and Pentium Processors
With the P6 family and Pentium processors, the local APIC handles the local interrupts,
interrupt messages, and IPIs it receives as follows (see Figure 10-17).
1. (IPIs only) It examines the IPI message to determine if it is the specified
destination for the IPI as described in Section 10.6.2, "Determining IPI Destination."
If it is the specified destination, it continues its acceptance procedure; if it is not
the destination, it discards the IPI message. When the message specifies lowest-priority
delivery mode, the local APIC will arbitrate with the other processors that were
designated as recipients of the IPI message (see Section 10.6.2.4, "Lowest Priority
Delivery Mode").
2. If the local APIC determines that it is the designated destination for the interrupt
and if the interrupt request is an NMI, SMI, INIT, ExtINT, or INIT-deassert interrupt,
or one of the MP protocol IPI messages (BIPI, FIPI, and SIPI), the interrupt is sent
directly to the processor core for handling.
3. If the local APIC determines that it is the designated destination for the interrupt
but the interrupt request is not one of the interrupts given in step 2, the local APIC
looks for an open slot in one of its two pending interrupt queues contained in the IRR
and ISR registers (see Figure 10-20). If a slot is available (see Section 10.8.4,
"Interrupt Acceptance for Fixed Interrupts"), it places the interrupt in the slot. If a
slot is not available, it rejects the interrupt request and sends it back to the sender
with a retry message.
4. When interrupts are pending in the IRR and ISR registers, the local APIC dispatches
them to the processor one at a time, based on their priority and the current task and
processor priorities in the TPR and PPR (see Section 10.8.3.1, "Task and Processor
Priorities").
5. When a fixed interrupt has been dispatched to the processor core for handling, the
completion of the handler routine is indicated with an instruction in the interrupt
handler code that writes to the end-of-interrupt (EOI) register in the local APIC (see
Section 10.8.5, "Signaling Interrupt Servicing Completion"). The act of writing to the
EOI register causes the local APIC to delete the interrupt from its queue and (for
level-triggered interrupts) send a message on the bus indicating that the interrupt
handling has been completed. (A write to the EOI register must not be included in the
handler routine for an NMI, SMI, INIT, ExtINT, or SIPI.)
Figure 10-17. Interrupt Acceptance Flow Chart for the Local APIC (P6 Family and Pentium
Processors)
[Flow chart. Decision nodes: Belong to Destination?; Is it NMI/SMI/INIT/ExtINT?;
Delivery (Fixed or Lowest Priority); Am I Focus?; Other Focus?; Is Interrupt Slot
Available?; Is Status a Retry?; Arbitrate; Am I Winner? Outcomes: Accept Message,
Discard Message, Set Status to Retry. The focus checks are marked as P6 family processor
specific.]
The following sections describe the acceptance of interrupts and their handling by the
local APIC and processor in greater detail.
10.8.3 Interrupt, Task, and Processor Priority
For interrupts that are delivered to the processor through the local APIC, each
interrupt has an implied priority based on its vector number. The local APIC uses this
priority to determine when to service the interrupt relative to the other activities of
the processor, including the servicing of other interrupts.
For interrupt vectors in the range of 16 to 255, the interrupt priority is determined
using the following relationship:
priority = vector / 16
Here the quotient is rounded down to the nearest integer value to determine the
priority, with 1 being the lowest priority and 15 the highest. Because vectors 0 through
31 are reserved for dedicated uses by the Intel 64 and IA-32 architectures, the
priorities of user defined interrupts range from 2 to 15.
Each interrupt priority level (sometimes interpreted by software as an interrupt
priority class) encompasses 16 vectors. The priority of interrupts within a priority
level is determined by the vector number: the higher the vector number, the higher the
priority within that priority level. In determining the priority of a vector and the
ranking of vectors within a priority group, the vector number is often divided into two
parts, with the high 4 bits of the vector indicating its priority and the low 4 bits
indicating its ranking within the priority group.
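The split of a vector into priority class and ranking can be written down directly. The following is a minimal sketch with hypothetical function names; it only restates the relationship priority = vector / 16.

```c
#include <stdint.h>

/* High 4 bits: interrupt priority class (vector / 16, rounded down). */
static unsigned priority_class(uint8_t vector)
{
    return vector >> 4;
}

/* Low 4 bits: ranking of the vector within its priority class. */
static unsigned rank_in_class(uint8_t vector)
{
    return vector & 0x0Fu;
}

/* Because the class occupies the high bits, comparing vector numbers
 * directly orders interrupts by class first and by ranking second. */
static int has_higher_priority(uint8_t a, uint8_t b)
{
    return a > b;
}
```

Vector 41H, for instance, belongs to priority class 4 with ranking 1, and the first user-defined vector (32) falls in class 2.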
10.8.3.1 Task and Processor Priorities
The local APIC also defines a task priority and a processor priority that it uses in
determining the order in which interrupts should be handled. The task priority is a
software selected value between 0 and 15 (see Figure 10-18) that is written into the
task priority register (TPR). The TPR is a read/write register.
NOTE
In this discussion, the term "task" refers to a software defined task, process, thread,
program, or routine that is dispatched to run on the processor by the operating system.
It does not refer to an IA-32 architecture defined task as described in Chapter 7, "Task
Management."
The task priority allows software to set a priority threshold for interrupting the
processor. The processor will service only those interrupts that have a priority higher
than that specified in the TPR. If software sets the task priority in the TPR to 0, the
processor will handle all interrupts; if it is set to 15, all interrupts are inhibited
from being handled, except those delivered with the NMI, SMI, INIT, ExtINT,
INIT-deassert, and start-up delivery modes. This mechanism enables the operating system
to temporarily block specific interrupts (generally low priority interrupts) from
disturbing high-priority work that the processor is doing.
Note that the task priority is also used to determine the arbitration priority of the
local processor (see Section 10.6.2.4, "Lowest Priority Delivery Mode").
The processor priority is set by the processor, also to a value between 0 and 15 (see
Figure 10-19), and is written into the processor priority register (PPR). The PPR is a
read only register. The processor priority represents the current priority at which the
processor is executing. It is used to determine whether a pending interrupt can be
dispensed to the processor.

Figure 10-18. Task Priority Register (TPR)
Address: FEE0 0080H. Value after reset: 0H.
Bits 31:8: Reserved. Bits 7:4: Task Priority. Bits 3:0: Task Priority Sub-Class.
Its value in the PPR is computed as follows:
IF TPR[7:4] ≥ ISRV[7:4]
THEN
    PPR[7:0] ← TPR[7:0]
ELSE
    PPR[7:4] ← ISRV[7:4]
    PPR[3:0] ← 0
Here, the ISRV value is the vector number of the highest priority ISR bit that is set,
or 00H if no ISR bit is set. Essentially, the processor priority is set either to the
priority of the highest priority interrupt in the ISR or to the current task priority,
whichever is higher.
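As with the APR, the PPR rule is easy to model as a pure function. The sketch below is a software model only (the PPR is maintained by hardware and is read only); the names compute_ppr and can_dispense are hypothetical, and can_dispense encodes the assumption that a pending vector is dispensed only when its priority class is strictly higher than the class in the PPR.

```c
#include <stdint.h>

/* Model of the PPR computation. isrv is 00H when no ISR bit is set. */
static uint8_t compute_ppr(uint8_t tpr, uint8_t isrv)
{
    if ((tpr >> 4) >= (isrv >> 4))
        return tpr;                    /* PPR[7:0] = TPR[7:0] */
    return (uint8_t)(isrv & 0xF0u);    /* PPR[7:4] = ISRV[7:4], PPR[3:0] = 0 */
}

/* Assumption: a pending vector can be dispensed only if its class exceeds
 * the processor priority class. */
static int can_dispense(uint8_t vector, uint8_t ppr)
{
    return (vector >> 4) > (ppr >> 4);
}
```

With TPR = 35H and an interrupt at vector 62H in service, the PPR is 60H, so a pending vector 6FH waits while a pending vector 70H can be dispensed.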
10.8.4 Interrupt Acceptance for Fixed Interrupts
The local APIC queues the fixed interrupts that it accepts in one of two interrupt
pending registers: the interrupt request register (IRR) or in-service register (ISR).
These two 256-bit read-only registers are shown in Figure 10-20. The 256 bits in these
registers represent the 256 possible vectors; vectors 0 through 15 are reserved by the
APIC (see also: Section 10.5.2, "Valid Interrupt Vectors").
NOTE
All interrupts with an NMI, SMI, INIT, ExtINT, start-up, or INIT-deassert delivery mode
bypass the IRR and ISR registers and are sent directly to the processor core for
servicing.
The IRR contains the active interrupt requests that have been accepted, but not yet
dispatched to the processor for servicing. When the local APIC accepts an interrupt, it
sets the bit in the IRR that corresponds to the vector of the accepted interrupt. When
the processor core is ready to handle the next interrupt, the local APIC clears the
highest priority IRR bit that is set and sets the corresponding ISR bit. The vector for
the highest priority bit set in the ISR is then dispatched to the processor core for
servicing.

Figure 10-19. Processor Priority Register (PPR)
Address: FEE0 00A0H. Value after reset: 0H.
Bits 31:8: Reserved. Bits 7:4: Processor Priority. Bits 3:0: Processor Priority
Sub-Class.
While the processor is servicing the highest priority interrupt, the local APIC can
accept additional fixed interrupts by setting bits in the IRR. When the interrupt
service routine issues a write to the EOI register (see Section 10.8.5, "Signaling
Interrupt Servicing Completion"), the local APIC responds by clearing the highest
priority ISR bit that is set. It then repeats the process of clearing the highest
priority bit in the IRR and setting the corresponding bit in the ISR. The processor core
then begins executing the service routine for the highest priority bit set in the ISR.
If more than one interrupt is generated with the same vector number, the local APIC can
set the bit for the vector both in the IRR and the ISR. This means that for the
Pentium 4 and Intel Xeon processors, the IRR and ISR can queue two interrupts for each
interrupt vector: one in the IRR and one in the ISR. Any additional interrupts issued
for the same interrupt vector are collapsed into the single bit in the IRR.
For the P6 family and Pentium processors, the IRR and ISR registers can queue no more
than two interrupts per priority level, and will reject other interrupts that are
received within the same priority level.
If the local APIC receives an interrupt with a priority higher than that of the
interrupt currently in service, and interrupts are enabled in the processor core, the
local APIC dispatches the higher priority interrupt to the processor immediately
(without waiting for a write to the EOI register). The currently executing interrupt
handler is then interrupted so the higher-priority interrupt can be handled. When the
handling of the higher-priority interrupt has been completed, the servicing of the
interrupted interrupt is resumed.
The trigger mode register (TMR) indicates the trigger mode of the interrupt (see
Figure 10-20). Upon acceptance of an interrupt into the IRR, the corresponding TMR bit
is cleared for edge-triggered interrupts and set for level-triggered interrupts. If a
TMR bit is set when an EOI cycle for its corresponding interrupt vector is generated, an
EOI message is sent to all I/O APICs.

Figure 10-20. IRR, ISR and TMR Registers
Each register is 256 bits wide: bits 255:16 correspond to vectors 255 through 16;
bits 15:0 are reserved.
Addresses: IRR FEE0 0200H - FEE0 0270H
           ISR FEE0 0100H - FEE0 0170H
           TMR FEE0 0180H - FEE0 01F0H
Value after reset: 0H
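The interplay of the IRR, ISR and TMR described in this section can be illustrated with a small software model. This is a sketch under stated assumptions, not a description of the hardware: the real registers are read only, the structure and function names are hypothetical, and the model ignores the TPR/PPR check and the P6 two-per-priority-level limit.

```c
#include <stdint.h>

/* Software model: one bit per vector in each 256-bit register. */
struct lapic_model {
    uint32_t irr[8];
    uint32_t isr[8];
    uint32_t tmr[8];
};

static void set_bit(uint32_t *r, unsigned v)   { r[v / 32] |=  (1u << (v % 32)); }
static void clear_bit(uint32_t *r, unsigned v) { r[v / 32] &= ~(1u << (v % 32)); }
static int  test_bit(const uint32_t *r, unsigned v)
{
    return (int)((r[v / 32] >> (v % 32)) & 1u);
}

/* Vector of the highest priority bit that is set, or -1 if none is set. */
static int highest_set(const uint32_t *r)
{
    for (int v = 255; v >= 0; v--)
        if (test_bit(r, (unsigned)v))
            return v;
    return -1;
}

/* Acceptance: set the IRR bit and record the trigger mode in the TMR.
 * Repeated requests for the same vector collapse into the single IRR bit. */
static void accept_fixed(struct lapic_model *a, uint8_t vector, int level_triggered)
{
    set_bit(a->irr, vector);
    if (level_triggered)
        set_bit(a->tmr, vector);
    else
        clear_bit(a->tmr, vector);
}

/* Dispatch: move the highest priority request from the IRR to the ISR. */
static int dispatch_next(struct lapic_model *a)
{
    int v = highest_set(a->irr);
    if (v < 0)
        return -1;
    clear_bit(a->irr, (unsigned)v);
    set_bit(a->isr, (unsigned)v);
    return v;
}

/* EOI: clear the highest priority ISR bit. Returns 1 if the TMR bit was set,
 * meaning an EOI message would be sent to the I/O APICs. */
static int end_of_interrupt(struct lapic_model *a)
{
    int v = highest_set(a->isr);
    if (v < 0)
        return 0;
    clear_bit(a->isr, (unsigned)v);
    return test_bit(a->tmr, (unsigned)v);
}
```

The model makes the "two interrupts per vector" property visible: after a vector has been dispatched (ISR bit set), a second request for the same vector sets the IRR bit again, and any further requests are absorbed by that bit.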
10.8.5 Signaling Interrupt Servicing Completion
For all interrupts except those delivered with the NMI, SMI, INIT, ExtINT, start-up, or
INIT-deassert delivery mode, the interrupt handler must include a write to the
end-of-interrupt (EOI) register (see Figure 10-21). This write must occur at the end of
the handler routine, sometime before the IRET instruction. This action indicates that
the servicing of the current interrupt is complete and the local APIC can issue the next
interrupt from the ISR.
Upon receiving an EOI, the APIC clears the highest priority bit in the ISR and
dispatches the next highest priority interrupt to the processor. If the terminated
interrupt was a level-triggered interrupt, the local APIC also sends an end-of-interrupt
message to all I/O APICs.
System software may prefer to direct EOIs to specific I/O APICs rather than having the
local APIC send end-of-interrupt messages to all I/O APICs.
Software can inhibit the broadcast of EOI messages by setting bit 12 of the Spurious
Interrupt Vector Register (see Section 10.9). If this bit is set, a broadcast EOI is not
generated on an EOI cycle even if the associated TMR bit indicates that the current
interrupt was level-triggered. The default value for the bit is 0, indicating that EOI
broadcasts are performed.
Bit 12 of the Spurious Interrupt Vector Register is reserved to 0 if the processor does
not support suppression of EOI broadcasts. Support for EOI-broadcast suppression is
reported in bit 24 in the Local APIC Version Register (see Section 10.4.8); the feature
is supported if that bit is set to 1. When supported, the feature is available in both
xAPIC mode and x2APIC mode.
System software desiring to perform directed EOIs for level-triggered interrupts should
set bit 12 of the Spurious Interrupt Vector Register and follow each EOI to the local
xAPIC for a level-triggered interrupt with a directed EOI to the I/O APIC generating the
interrupt (this is done by writing to the I/O APIC's EOI register). System software
performing directed EOIs must retain a mapping associating level-triggered interrupts
with the I/O APICs in the system.
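The directed-EOI sequence can be sketched as below. Several details are assumptions made for illustration: register access goes through a plain pointer to a 32-bit register block so that the code can be exercised against an array instead of real hardware; the local APIC offsets (30H version, B0H EOI, F0H SVR) follow the register addresses given in this chapter; the I/O APIC EOI register offset (40H) and the convention of writing the vector to it come from I/O APIC documentation, not from this section; all function names are hypothetical.

```c
#include <stdint.h>

#define LAPIC_VERSION_OFFSET        0x30u       /* Local APIC Version Register */
#define LAPIC_EOI_OFFSET            0xB0u       /* EOI register */
#define LAPIC_SVR_OFFSET            0xF0u       /* Spurious Interrupt Vector Register */
#define LAPIC_VERSION_EOI_SUPPRESS  (1u << 24)  /* suppression supported */
#define SVR_SUPPRESS_EOI_BROADCAST  (1u << 12)
#define IOAPIC_EOI_OFFSET           0x40u       /* assumption: I/O APIC EOI register */

static uint32_t reg_read(volatile uint32_t *base, uint32_t offset)
{
    return base[offset / 4];
}

static void reg_write(volatile uint32_t *base, uint32_t offset, uint32_t value)
{
    base[offset / 4] = value;
}

/* Set SVR bit 12 if the version register reports support. Returns 1 on
 * success, 0 if EOI-broadcast suppression is not supported. */
static int enable_directed_eoi(volatile uint32_t *lapic)
{
    if (!(reg_read(lapic, LAPIC_VERSION_OFFSET) & LAPIC_VERSION_EOI_SUPPRESS))
        return 0;
    reg_write(lapic, LAPIC_SVR_OFFSET,
              reg_read(lapic, LAPIC_SVR_OFFSET) | SVR_SUPPRESS_EOI_BROADCAST);
    return 1;
}

/* End of a level-triggered handler: EOI to the local APIC, then a directed
 * EOI to the I/O APIC that generated the interrupt. */
static void level_triggered_eoi(volatile uint32_t *lapic,
                                volatile uint32_t *ioapic, uint8_t vector)
{
    reg_write(lapic, LAPIC_EOI_OFFSET, 0);
    reg_write(ioapic, IOAPIC_EOI_OFFSET, vector);
}
```

In a real system the ioapic argument would come from the mapping that software keeps between level-triggered interrupts and I/O APICs, as required above.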
Figure 10-21. EOI Register
Address: 0FEE0 00B0H. Value after reset: 0H. Bits 31:0.
10.8.6 Task Priority in IA-32e Mode
In IA-32e mode, operating systems can manage the 16 priority classes of external
interrupts (see Section 10.8.3, "Interrupt, Task, and Processor Priority") explicitly
using the task priority register (TPR). Operating systems can use the TPR to temporarily
block specific (low-priority) interrupts from interrupting a high-priority task. This is
done by loading TPR with a value corresponding to the highest-priority interrupt that is
to be blocked. For example:
• Loading the TPR with a value of 8 (01000B) blocks all interrupts with a priority of 8
  or less while allowing all interrupts with a priority of nine or more to be recognized.
• Loading the TPR with zero enables all external interrupts.
• Loading the TPR with 0FH (01111B) disables all external interrupts.
The TPR (shown in Figure 10-18) is cleared to 0 on reset. In 64-bit mode, software can
read and write the TPR using an alternate interface, the MOV CR8 instruction. The new
priority level is established when the MOV CR8 instruction completes execution. Software
does not need to force serialization after loading the TPR using MOV CR8.
Use of the MOV CRn instruction requires a privilege level of 0. Programs running at
privilege level greater than 0 cannot read or write the TPR. An attempt to do so causes
a general-protection exception. The TPR is abstracted from the interrupt controller
(IC), which prioritizes and manages external interrupt delivery to the processor. The IC
can be an external device, such as an APIC or 8259. Typically, the IC provides a
priority mechanism similar or identical to the TPR. The IC, however, is considered
implementation-dependent with the underlying priority mechanisms subject to change. CR8,
by contrast, is part of the Intel 64 architecture. Software can depend on this
definition remaining unchanged.
Figure 10-22 shows the layout of CR8; only the low four bits are used. The remaining 60
bits are reserved and must be written with zeros. Failure to do this causes a
general-protection exception.
10.8.6.1 Interaction of Task Priorities between CR8 and APIC
The first implementation of Intel 64 architecture includes a local advanced
programmable interrupt controller (APIC) that is similar to the APIC used with previous
IA-32 processors. Some aspects of the local APIC affect the operation of the
architecturally defined task priority register and the programming interface using CR8.
Figure 10-22. CR8 Register
Value after reset: 0H. Bits 63:4: Reserved. Bits 3:0: task priority.
Notable CR8 and APIC interactions are:
• The processor powers up with the local APIC enabled.
• The APIC must be enabled for CR8 to function as the TPR. Writes to CR8 are reflected
  into the APIC Task Priority Register.
• APIC.TPR[bits 7:4] = CR8[bits 3:0], APIC.TPR[bits 3:0] = 0. A read of CR8 returns a
  64-bit value which is the value of TPR[bits 7:4], zero extended to 64 bits.
There are no ordering mechanisms between direct updates of the APIC.TPR and CR8.
Operating software should implement either direct APIC TPR updates or CR8 style TPR
updates but not mix them. Software can use a serializing instruction (for example,
CPUID) to serialize updates between MOV CR8 and stores to the APIC.
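The CR8-to-TPR mapping and the blocking rule from Section 10.8.6 can be captured in a few pure functions. This is a model for illustration only: real code reads and writes CR8 with MOV CR8 at privilege level 0, which is not shown here, and all function names are hypothetical.

```c
#include <stdint.h>

/* A write of cr8 is reflected as TPR[7:4] = CR8[3:0], TPR[3:0] = 0. */
static uint8_t tpr_from_cr8(uint64_t cr8)
{
    return (uint8_t)((cr8 & 0xFu) << 4);
}

/* A read of CR8 returns TPR[7:4], zero extended to 64 bits. */
static uint64_t cr8_from_tpr(uint8_t tpr)
{
    return (uint64_t)(tpr >> 4);
}

/* Bits 63:4 are reserved and must be written with zeros; a value that
 * fails this check would cause a general-protection exception. */
static int cr8_value_is_valid(uint64_t value)
{
    return (value & ~(uint64_t)0xF) == 0;
}

/* An external interrupt is blocked when its priority class is less than
 * or equal to the value loaded into CR8. */
static int blocked_by_cr8(uint8_t vector, uint64_t cr8)
{
    return (uint64_t)(vector >> 4) <= (cr8 & 0xFu);
}
```

With CR8 = 8, vector 8FH (class 8) is blocked and vector 90H (class 9) is recognized, matching the first example in Section 10.8.6.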
10.9 SPURIOUS INTERRUPT
A special situation may occur when a processor raises its task priority to be greater
than or equal to the level of the interrupt for which the processor INTR signal is
currently being asserted. If at the time the INTA cycle is issued, the interrupt that
was to be dispensed has become masked (programmed by software), the local APIC will
deliver a spurious-interrupt vector. Dispensing the spurious-interrupt vector does not
affect the ISR, so the handler for this vector should return without an EOI.
The vector number for the spurious-interrupt vector is specified in the
spurious-interrupt vector register (see Figure 10-23). The functions of the fields in
this register are as follows:
Spurious Vector
    Determines the vector number to be delivered to the processor when the local APIC
    generates a spurious vector.
    (Pentium 4 and Intel Xeon processors.) Bits 0 through 7 of this field are
    programmable by software.
    (P6 family and Pentium processors.) Bits 4 through 7 of this field are programmable
    by software, and bits 0 through 3 are hardwired to logical ones. Software writes to
    bits 0 through 3 have no effect.
APIC Software Enable/Disable
    Allows software to temporarily enable (1) or disable (0) the local APIC (see
    Section 10.4.3, "Enabling or Disabling the Local APIC").
Focus Processor Checking
    Determines if focus processor checking is enabled (0) or disabled (1) when using
    the lowest-priority delivery mode. In Pentium 4 and Intel Xeon processors, this bit
    is reserved and should be cleared to 0.
Suppress EOI Broadcasts
    Determines whether an EOI for a level-triggered interrupt causes EOI messages to be
    broadcast to the I/O APICs (0) or not (1). See Section 10.8.5. The default value for
    this bit is 0, indicating that EOI broadcasts are performed. This bit is reserved to
    0 if the processor does not support EOI-broadcast suppression.
NOTE
Do not program an LVT or IOAPIC RTE with a spurious vector even if you set the mask bit.
The handler for a spurious vector does not perform an EOI. If for some reason an
interrupt is generated by an LVT or RTE entry, the bit in the in-service register will
be left set for the spurious vector. This will mask all interrupts at the same or lower
priority.
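The SVR fields described above can be expressed as bit masks. The sketch below is illustrative: the bit positions (vector in bits 7:0, software enable in bit 8, focus processor checking in bit 9, EOI-broadcast suppression in bit 12) are taken from Figure 10-23, and the helper svr_value is a hypothetical name.

```c
#include <stdint.h>

#define SVR_VECTOR_MASK            0xFFu       /* bits 7:0: spurious vector */
#define SVR_APIC_SOFTWARE_ENABLE   (1u << 8)   /* 1: APIC enabled */
#define SVR_FOCUS_CHECK_DISABLE    (1u << 9)   /* 1: focus checking disabled */
#define SVR_SUPPRESS_EOI_BROADCAST (1u << 12)  /* 1: EOI broadcasts suppressed */

/* Hypothetical helper: compose an SVR value. Bit 9 is left at 0, as required
 * on Pentium 4 and Intel Xeon processors where it is reserved. */
static uint32_t svr_value(uint8_t spurious_vector, int apic_enable,
                          int suppress_eoi_broadcast)
{
    uint32_t value = spurious_vector & SVR_VECTOR_MASK;
    if (apic_enable)
        value |= SVR_APIC_SOFTWARE_ENABLE;
    if (suppress_eoi_broadcast)
        value |= SVR_SUPPRESS_EOI_BROADCAST;
    return value;
}
```

Choosing a spurious vector whose low four bits are all ones (for example FFH or EFH) keeps the value meaningful on P6 family and Pentium processors, where those bits are described above as hardwired.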
10.10 APIC BUS MESSAGE PASSING MECHANISM AND PROTOCOL (P6 FAMILY, PENTIUM PROCESSORS)
The Pentium 4 and Intel Xeon processors pass messages among the local and I/O APICs on
the system bus, using the system bus message passing mechanism and protocol.
Figure 10-23. Spurious-Interrupt Vector Register (SVR)
Address: FEE0 00F0H. Value after reset: 0000 00FFH.
Bits 31:13: Reserved.
Bit 12: EOI-Broadcast Suppression (see note 1); 0: EOI broadcasts performed,
        1: EOI broadcasts suppressed.
Bits 11:10: Reserved.
Bit 9: Focus Processor Checking (see note 2); 0: Enabled, 1: Disabled.
Bit 8: APIC Software Enable/Disable; 0: APIC Disabled, 1: APIC Enabled.
Bits 7:0: Spurious Vector (see note 3).
1. Not supported on all processors.
2. Not supported in Pentium 4 and Intel Xeon processors.
3. For the P6 family and Pentium processors, bits 0 through 3 are always 0.
The P6 family and Pentium processors pass messages among the local and I/O APICs on the
serial APIC bus, as follows. Because only one message can be sent at a time on the APIC
bus, the I/O APIC and local APICs employ a "rotating priority" arbitration protocol to
gain permission to send a message on the APIC bus. One or more APICs may start sending
their messages simultaneously. At the beginning of every message, each APIC presents the
type of the message it is sending and its current arbitration priority on the APIC bus.
This information is used for arbitration. After each arbitration cycle (within an
arbitration round), only the potential winners keep driving the bus. By the time all
arbitration cycles are completed, there will be only one APIC left driving the bus. Once
a winner is selected, it is granted exclusive use of the bus, and will continue driving
the bus to send its actual message.
After each successfully transmitted message, all APICs increase their arbitration
priority by 1. The previous winner (that is, the one that has just successfully
transmitted its message) assumes a priority of 0 (lowest). An agent whose arbitration
priority was 15 (highest) during arbitration, but did not send a message, adopts the
previous winner's arbitration priority, incremented by 1.
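The rotating-priority update can be simulated in a few lines. This is a simplified model of the rules stated above, not of the bit-serial bus protocol itself: it selects the winner by direct comparison instead of cycle-by-cycle bus driving, it ignores the special handling of EOI messages, and the function name arbitrate is hypothetical.

```c
#include <stdint.h>

/* One arbitration round among n agents. arb_id[i] is agent i's current
 * arbitration priority (0..15, unique); requesting[i] is nonzero if agent i
 * has a message to send. Returns the index of the winner, or -1 if no agent
 * is requesting. Priorities are updated in place. */
static int arbitrate(uint8_t arb_id[], const int requesting[], int n)
{
    int winner = -1;
    for (int i = 0; i < n; i++)
        if (requesting[i] && (winner < 0 || arb_id[i] > arb_id[winner]))
            winner = i;
    if (winner < 0)
        return -1;

    uint8_t winner_old = arb_id[winner];
    for (int i = 0; i < n; i++) {
        if (i == winner)
            arb_id[i] = 0;                          /* winner drops to lowest */
        else if (arb_id[i] == 15)
            arb_id[i] = (uint8_t)(winner_old + 1);  /* 15 that did not send */
        else
            arb_id[i]++;                            /* everyone else moves up */
    }
    return winner;
}
```

Starting from priorities {0, 1, 2, 3} with agents 1 and 2 requesting, agent 2 wins the first round and the priorities become {1, 2, 0, 4}; agent 1 then wins the next round. The update keeps the priorities unique, which is what allows them to break ties.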
Note that the arbitration protocol described above is slightly different if one of the
APICs issues a special End-Of-Interrupt (EOI). This high-priority message is granted the
bus regardless of its sender's arbitration priority, unless more than one APIC issues an
EOI message simultaneously. In the latter case, the APICs sending the EOI messages
arbitrate using their arbitration priorities.
If the APICs are set up to use "lowest priority" arbitration (see Section 10.6.2.4,
"Lowest Priority Delivery Mode") and multiple APICs are currently executing at the
lowest priority (the value in the APR register), the arbitration priorities (unique
values in the Arb ID register) are used to break ties. All 8 bits of the APR are used
for the lowest priority arbitration.
10.10.1 Bus Message Formats
See Appendix F, "APIC Bus Message Formats," for a description of bus message formats
used to transmit messages on the serial APIC bus.
10.11 MESSAGE SIGNALLED INTERRUPTS
The PCI Local Bus Specification, Rev 2.2 (www.pcisig.com) introduces the concept of
message signalled interrupts. Intel processors and chipsets with this capability
currently include the Pentium 4 and Intel Xeon processors. As the specification
indicates:
"Message signalled interrupts (MSI) is an optional feature that enables PCI devices to
request service by writing a system-specified message to a system-specified address
(PCI DWORD memory write transaction). The transaction address specifies the message
destination while the transaction data specifies the message. System software is
expected to initialize the message destination and message during device configuration,
allocating one or more non-shared messages to each MSI capable function."
The capabilities mechanism provided by the PCI Local Bus Specification is used to
identify and configure MSI capable PCI devices. Among other fields, this structure
contains a Message Data Register and a Message Address Register. To request
service, the PCI device function writes the contents of the Message Data Register to
the address contained in the Message Address Register (and the Message Upper
Address register for 64-bit message addresses).
Section 10.11.1 and Section 10.11.2 provide layout details for the Message Address
Register and the Message Data Register. The operation issued by the device is a PCI
write command to the Message Address Register with the Message Data Register
contents. The operation follows semantic rules as defined for PCI write operations
and is a DWORD operation.
10.11.1 Message Address Register Format
The format of the Message Address Register (lower 32 bits) is shown in
Figure 10-24.
Figure 10-24. Layout of the MSI Message Address Register
Bits 31:20   0FEEH
Bits 19:12   Destination ID
Bits 11:4    Reserved
Bit 3        RH
Bit 2        DM
Bits 1:0     XX
Fields in the Message Address Register are as follows:
1. Bits 31-20: These bits contain a fixed value for interrupt messages (0FEEH).
This value locates interrupts at the 1-MByte area with a base address of 4G - 18M
(0FEE00000H). All accesses to this region are directed as interrupt messages. Care
must be taken to ensure that no other device claims the region as I/O space.
2. Destination ID: This field contains an 8-bit destination ID. It identifies the
message's target processor(s). The destination ID corresponds to bits 63:56 of
the I/O APIC Redirection Table Entry if the IOAPIC is used to dispatch the
interrupt to the processor(s).
3. Redirection hint indication (RH): This bit indicates whether the message
should be directed to the processor with the lowest interrupt priority among
processors that can receive the interrupt.
   • When RH is 0, the interrupt is directed to the processor listed in the
     Destination ID field.
   • When RH is 1 and the physical destination mode is used, the Destination
     ID field must not be set to 0xFF; it must point to a processor that is
     present and enabled to receive the interrupt.
   • When RH is 1 and the logical destination mode is active in a system using
     a flat addressing model, the Destination ID field must be set so that bits
     set to 1 identify processors that are present and enabled to receive the
     interrupt.
   • If RH is set to 1 and the logical destination mode is active in a system
     using the cluster addressing model, then the Destination ID field must not be
     set to 0xFF; the processors identified with this field must be present and
     enabled to receive the interrupt.
4. Destination mode (DM): This bit indicates whether the Destination ID field
should be interpreted as logical or physical APIC ID for delivery of the lowest
priority interrupt. If RH is 1 and DM is 0, the Destination ID field is in physical
destination mode and only the processor in the system that has the matching
APIC ID is considered for delivery of that interrupt (this means no re-direction).
If RH is 1 and DM is 1, the Destination ID field is interpreted as in logical
destination mode and the redirection is limited to only those processors that are
part of the logical group of processors based on the processor's logical APIC ID
and the Destination ID field in the message. The logical group of processors
consists of those identified by matching the 8-bit Destination ID with the logical
destination identified by the Destination Format Register and the Logical
Destination Register in each local APIC. The details are similar to those described
in Section 10.6.2, "Determining IPI Destination." If RH is 0, then the DM bit is
ignored and the message is sent ahead independent of whether the physical or
logical destination mode is used.
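The field layout of Figure 10-24 can be illustrated with a short C sketch that packs the lower 32 bits of a message address. This is only an illustration under stated assumptions: the helper name msi_message_address and the macro MSI_ADDR_BASE are hypothetical, and the reserved bits (11:4) and the XX bits (1:0) are simply left at 0.

```c
#include <stdint.h>

/* Fixed value 0FEEH placed in bits 31:20 (hypothetical macro name). */
#define MSI_ADDR_BASE 0xFEE00000u

/*
 * Pack the lower 32 bits of an MSI message address.
 *   dest_id : 8-bit Destination ID, bits 19:12
 *   rh      : redirection hint, bit 3 (only the low bit is used)
 *   dm      : destination mode, bit 2 (0 = physical, 1 = logical)
 * Bits 11:4 (reserved) and bits 1:0 (XX) are left at 0 in this sketch.
 */
static uint32_t msi_message_address(uint8_t dest_id, unsigned rh, unsigned dm)
{
    return MSI_ADDR_BASE
         | ((uint32_t)dest_id << 12)
         | ((uint32_t)(rh & 1u) << 3)
         | ((uint32_t)(dm & 1u) << 2);
}
```

For example, a Destination ID of 01H with RH = 1 and DM = 1 packs to 0FEE0100CH. The sketch does not enforce the restrictions on Destination ID values that apply when RH is 1; system software remains responsible for those.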
10.11.2 Message Data Register Format
The layout of the Message Data Register is shown in Figure 10-25.
Reserved fields are not assumed to be any value. Software must preserve their
contents on writes. Other fields in the Message Data Register are described below.
1. Vector: This 8-bit field contains the interrupt vector associated with the
message. Values range from 010H to 0FEH. Software must guarantee that the
field is not programmed with vector 00H to 0FH.
2. Delivery Mode: This 3-bit field specifies how the interrupt receipt is handled.
Delivery Modes operate only in conjunction with specified Trigger Modes. Correct
Trigger Modes must be guaranteed by software. Restrictions are indicated below:
   a. 000B (Fixed Mode): Deliver the signal to all the agents listed in the
      destination. The Trigger Mode for fixed delivery mode can be edge or level.
   b. 001B (Lowest Priority): Deliver the signal to the agent that is executing
      at the lowest priority of all agents listed in the destination field. The trigger
      mode can be edge or level.
   c. 010B (System Management Interrupt or SMI): The delivery mode is
      edge only. For systems that rely on SMI semantics, the vector field is ignored
      but must be programmed to all zeroes for future compatibility.
   d. 100B (NMI): Deliver the signal to all the agents listed in the destination
      field. The vector information is ignored. NMI is an edge triggered interrupt
      regardless of the Trigger Mode setting.
   e. 101B (INIT): Deliver this signal to all the agents listed in the destination
      field. The vector information is ignored. INIT is an edge triggered interrupt
      regardless of the Trigger Mode setting.
   f. 111B (ExtINT): Deliver the signal to the INTR signal of all agents in the
      destination field (as an interrupt that originated from an 8259A compatible
      interrupt controller). The vector is supplied by the INTA cycle issued by the
      activation of the ExtINT. ExtINT is an edge triggered interrupt.
Figure 10-25. Layout of the MSI Message Data Register
Bits 63:32   Reserved
Bits 31:16   Reserved
Bit 15       Trigger Mode: 0 - Edge; 1 - Level
Bit 14       Level for Trigger Mode = 0: X - Don't care
             Level for Trigger Mode = 1: 0 - Deassert; 1 - Assert
Bits 13:11   Reserved
Bits 10:8    Delivery Mode: 000 - Fixed; 001 - Lowest Priority; 010 - SMI;
             011 - Reserved; 100 - NMI; 101 - INIT; 110 - Reserved; 111 - ExtINT
Bits 7:0     Vector
3. Level: Edge triggered interrupt messages are always interpreted as assert
messages. For edge triggered interrupts this field is not used. For level triggered
interrupts, this bit reflects the state of the interrupt input.
4. Trigger Mode: This field indicates the signal type that will trigger a message.
   a. 0: Indicates edge sensitive.
   b. 1: Indicates level sensitive.
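The message data fields described in this section can be packed as in the following C sketch. The names (msi_delivery_mode, msi_message_data) are hypothetical, and the vector checks are one possible reading of the rules above: the 010H to 0FEH range is enforced only for the Fixed and Lowest Priority modes, and a zero vector is required for SMI. Reserved bits are written as 0 here, whereas software updating a live register must preserve them.

```c
#include <stdint.h>

/* Delivery Mode encodings for bits 10:8 (hypothetical enum names). */
enum msi_delivery_mode {
    MSI_DM_FIXED           = 0x0,
    MSI_DM_LOWEST_PRIORITY = 0x1,
    MSI_DM_SMI             = 0x2,
    MSI_DM_NMI             = 0x4,
    MSI_DM_INIT            = 0x5,
    MSI_DM_EXTINT          = 0x7
};

/*
 * Pack the low 32 bits of MSI message data into *out.
 *   trigger_level : bit 15 (0 = edge, 1 = level)
 *   assert_level  : bit 14 (meaningful only for level triggering)
 * Returns 0 on success, -1 if the vector violates the checks assumed here.
 */
static int msi_message_data(uint8_t vector, enum msi_delivery_mode mode,
                            unsigned trigger_level, unsigned assert_level,
                            uint32_t *out)
{
    /* Assumption: the 010H-0FEH range applies to vectored delivery modes. */
    if ((mode == MSI_DM_FIXED || mode == MSI_DM_LOWEST_PRIORITY) &&
        (vector < 0x10 || vector > 0xFE))
        return -1;
    /* SMI: the vector field must be programmed to all zeroes. */
    if (mode == MSI_DM_SMI && vector != 0)
        return -1;

    *out = (uint32_t)vector
         | ((uint32_t)mode << 8)
         | ((uint32_t)(assert_level & 1u) << 14)
         | ((uint32_t)(trigger_level & 1u) << 15);
    return 0;
}
```

Returning a status code rather than a packed value keeps an out-of-range vector from being silently programmed into a device.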
10.12 EXTENDED XAPIC (X2APIC)
The x2APIC architecture extends the xAPIC architecture (described in Section 9.4) in
a backward compatible manner and provides forward extendability for future Intel
platform innovations. Specifically, the x2APIC architecture does the following:
• Retains all key elements of compatibility to the xAPIC architecture:
   • delivery modes,
   • interrupt and processor priorities,
   • interrupt sources,
   • interrupt destination types;
• Provides extensions to scale processor addressability for both the logical and
  physical destination modes;
• Adds new features to enhance performance of interrupt delivery;
• Reduces complexity of logical destination mode interrupt delivery on link based
  platform architectures.
• Uses MSR programming interface to access APIC registers in x2APIC mode
  instead of memory-mapped interfaces. Memory-mapped interface is supported
  when operating in xAPIC mode.
10.12.1 Detecting and Enabling x2APIC Mode
Processor support for x2APIC mode can be detected by executing CPUID with EAX=1
and then checking bit 21 of ECX. If CPUID.(EAX=1):ECX.21 is set, the processor
supports the x2APIC capability and can be placed into the x2APIC mode.
System software can place the local APIC in the x2APIC mode by setting the x2APIC
mode enable bit (bit 10) in the IA32_APIC_BASE MSR at MSR address 01BH. The
layout for the IA32_APIC_BASE MSR is shown in Figure 10-26.
Table 10-5, "x2APIC Operating Mode Configurations," describes the possible
combinations of the enable bit (EN - bit 11) and the extended mode bit (EXTD - bit 10)
in the IA32_APIC_BASE MSR.
Once the local APIC has been switched to x2APIC mode (EN = 1, EXTD = 1),
switching back to xAPIC mode would require system software to disable the local
APIC unit. Specifically, attempting to write a value to the IA32_APIC_BASE MSR that
has (EN = 1, EXTD = 0) when the local APIC is enabled and in x2APIC mode causes a
general-protection exception. Once bit 10 in IA32_APIC_BASE MSR is set, the only
way to leave x2APIC mode using IA32_APIC_BASE would require a WRMSR to set
both bit 11 and bit 10 to zero. Section 10.12.5, "x2APIC State Transitions," provides a
detailed state diagram for the state transitions allowed for the local APIC.
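The detection and enabling steps above reduce to a few bit tests, sketched below in C. The functions are deliberately pure: they take the CPUID.01H:ECX value and the IA32_APIC_BASE value as arguments, because executing CPUID, RDMSR and WRMSR depends on the environment and (for the MSR instructions) on privilege level. All identifiers here are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define IA32_APIC_BASE_MSR  0x1Bu          /* MSR address 01BH */
#define APIC_BASE_EXTD      (1ull << 10)   /* x2APIC mode enable bit */
#define APIC_BASE_EN        (1ull << 11)   /* xAPIC global enable bit */
#define CPUID_01_ECX_X2APIC (1u << 21)     /* CPUID.(EAX=1):ECX.21 */

enum apic_mode { APIC_DISABLED, APIC_INVALID, APIC_XAPIC, APIC_X2APIC };

/* True if the ECX value returned by CPUID with EAX=1 reports x2APIC support. */
static bool cpu_supports_x2apic(uint32_t cpuid_01_ecx)
{
    return (cpuid_01_ecx & CPUID_01_ECX_X2APIC) != 0;
}

/* Decode the EN/EXTD combination into one of the four configurations. */
static enum apic_mode apic_mode_from_base(uint64_t apic_base)
{
    bool en   = (apic_base & APIC_BASE_EN) != 0;
    bool extd = (apic_base & APIC_BASE_EXTD) != 0;

    if (en)
        return extd ? APIC_X2APIC : APIC_XAPIC;
    return extd ? APIC_INVALID : APIC_DISABLED;
}

/* Value to write back to IA32_APIC_BASE to request x2APIC mode;
 * every other bit of the value that was read is kept unchanged. */
static uint64_t apic_base_enable_x2apic(uint64_t apic_base)
{
    return apic_base | APIC_BASE_EN | APIC_BASE_EXTD;
}
```

A caller would typically read IA32_APIC_BASE, confirm that cpu_supports_x2apic() holds and that the decoded mode is xAPIC, and then write the value returned by apic_base_enable_x2apic().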
10.12.1.1 Instructions to Access APIC Registers
In x2APIC mode, system software uses RDMSR and WRMSR to access the APIC
registers. The MSR addresses for accessing the x2APIC registers are architecturally
defined and specified in Section 10.12.1.2, "x2APIC Register Address Space."
Executing the RDMSR instruction with the APIC register address specified in ECX returns
the content of bits 0 through 31 of the APIC register in EAX. Bits 32 through 63 are
returned in register EDX - these bits are reserved if the APIC register being read is a
32-bit register. Similarly, executing the WRMSR instruction with the APIC register
address in ECX writes bits 0 to 31 of register EAX to bits 0 to 31 of the specified APIC
register. If the register is a 64-bit register then bits 0 to 31 of register EDX are written
to bits 32 to 63 of the APIC register. The Interrupt Command Register is the only APIC
register that is implemented as a 64-bit MSR. The semantics of handling reserved
bits are defined in Section 10.12.1.3, "Reserved Bit Checking."
Figure 10-26. IA32_APIC_BASE MSR Supporting x2APIC
Bits 63:36   Reserved
Bits 35:12   APIC Base: Base physical address
Bit 11       EN: xAPIC global enable/disable
Bit 10       EXTD: Enable x2APIC mode
Bit 9        Reserved
Bit 8        BSP: Processor is BSP
Bits 7:0     Reserved
Table 10-5. x2APIC Operating Mode Configurations
xAPIC global enable     x2APIC enable
(IA32_APIC_BASE[11])    (IA32_APIC_BASE[10])    Description
0                       0                       local APIC is disabled
0                       1                       Invalid
1                       0                       local APIC is enabled in xAPIC mode
1                       1                       local APIC is enabled in x2APIC mode
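The EDX:EAX convention used by RDMSR and WRMSR for the x2APIC registers can be sketched in C as a pair of helpers that split and join a 64-bit register value. The type and function names (msr_regs, msr_split, msr_join) are hypothetical; the sketch models only the register halves, not the instructions themselves.

```c
#include <stdint.h>

/* The two 32-bit halves exchanged with RDMSR/WRMSR (hypothetical type). */
struct msr_regs {
    uint32_t eax;   /* bits 31:0 of the MSR  */
    uint32_t edx;   /* bits 63:32 of the MSR */
};

/* Split a 64-bit value into the EAX/EDX pair that WRMSR consumes. */
static struct msr_regs msr_split(uint64_t value)
{
    struct msr_regs r;
    r.eax = (uint32_t)(value & 0xFFFFFFFFu);
    r.edx = (uint32_t)(value >> 32);
    return r;
}

/* Join the EAX/EDX pair returned by RDMSR into a 64-bit value. */
static uint64_t msr_join(uint32_t eax, uint32_t edx)
{
    return ((uint64_t)edx << 32) | eax;
}
```

For the 32-bit APIC registers the EDX half is reserved, so only the Interrupt Command Register carries information in both halves.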
10.12.1.2 x2APIC Register Address Space
The MSR address range 800H through BFFH is architecturally reserved and dedicated
for accessing APIC registers in x2APIC mode. Table 10-6 lists the APIC registers that
are available in x2APIC mode. When appropriate, the table also gives the offset at
which each register is available on the page referenced by IA32_APIC_BASE[35:12]
in xAPIC mode.
There is a one-to-one mapping between the x2APIC MSRs and the legacy xAPIC
register offsets with the following exceptions:
• The Destination Format Register (DFR): The DFR, supported at offset 0E0H in
  xAPIC mode, is not supported in x2APIC mode. There is no MSR with address
  80EH.
• The Interrupt Command Register (ICR): The two 32-bit registers in xAPIC mode
  (at offsets 300H and 310H) are merged into a single 64-bit MSR in x2APIC mode
  (with MSR address 830H). There is no MSR with address 831H.
• The SELF IPI register: This register is available only in x2APIC mode at address
  83FH. In xAPIC mode, there is no register defined at offset 3F0H.
Addresses in the range 800H through BFFH that are not listed in Table 10-6 (including
80EH and 831H) are reserved. Executions of RDMSR and WRMSR that attempt to access
such addresses cause general-protection exceptions.
The MSR address space is compressed to allow for future growth. Every 32-bit
register on a 128-bit boundary in the legacy MMIO space is mapped to a single MSR
in the local x2APIC MSR address space. The upper 32 bits of all x2APIC MSRs (except
for the ICR) are reserved.
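The compression of 16-byte-spaced MMIO offsets into consecutive MSR addresses amounts to MSR = 800H + (offset / 16). A minimal C sketch of that arithmetic, including the exceptions listed above, follows. The function name is hypothetical, and a return value of 0 is used here (as an assumption of this sketch) to mean "no corresponding MSR". The function performs the address arithmetic only; it does not check that the resulting MSR is one of the registers actually listed in Table 10-6.

```c
#include <stdint.h>

#define X2APIC_MSR_BASE 0x800u

/*
 * Map a legacy xAPIC MMIO register offset to its x2APIC MSR address.
 * Returns 0 when the offset has no MSR counterpart.
 */
static uint32_t xapic_offset_to_x2apic_msr(uint32_t mmio_offset)
{
    /* xAPIC registers sit on 128-bit (16-byte) boundaries below 3F0H. */
    if (mmio_offset >= 0x3F0u || (mmio_offset & 0xFu) != 0)
        return 0;
    /* DFR (0E0H) is not supported in x2APIC mode: there is no MSR 80EH. */
    if (mmio_offset == 0x0E0u)
        return 0;
    /* The high ICR dword (310H) is merged into MSR 830H: no MSR 831H. */
    if (mmio_offset == 0x310u)
        return 0;
    return X2APIC_MSR_BASE + (mmio_offset >> 4);
}
```

The SELF IPI register (MSR 83FH) has no MMIO counterpart, so it cannot be reached through this mapping.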
Table 10-6. Local APIC Register Address Map Supported by x2APIC
MSR Address     MMIO Offset     Register Name                                MSR R/W Semantics    Comments
(x2APIC mode)   (xAPIC mode)
802H            020H            Local APIC ID register                       Read-only (Note 1)   See Section 10.12.5.1 for initial values.
803H            030H            Local APIC Version register                  Read-only            Same version used in xAPIC mode and x2APIC mode.
808H            080H            Task Priority Register (TPR)                 Read/write           Bits 31:8 are reserved. (Note 2)
80AH            0A0H            Processor Priority Register (PPR)            Read-only
80BH            0B0H            EOI register                                 Write-only (Note 3)  WRMSR of a non-zero value causes #GP(0).
80DH            0D0H            Logical Destination Register (LDR)           Read-only            Read/write in xAPIC mode.
80FH            0F0H            Spurious Interrupt Vector Register (SVR)     Read/write           See Section 10.9 for reserved bits.
810H            100H            In-Service Register (ISR); bits 31:0         Read-only
811H            110H            ISR bits 63:32                               Read-only
812H            120H            ISR bits 95:64                               Read-only
813H            130H            ISR bits 127:96                              Read-only
814H            140H            ISR bits 159:128                             Read-only
815H            150H            ISR bits 191:160                             Read-only
816H            160H            ISR bits 223:192                             Read-only
817H            170H            ISR bits 255:224                             Read-only
818H            180H            Trigger Mode Register (TMR); bits 31:0       Read-only
819H            190H            TMR bits 63:32                               Read-only
81AH            1A0H            TMR bits 95:64                               Read-only
81BH            1B0H            TMR bits 127:96                              Read-only
81CH            1C0H            TMR bits 159:128                             Read-only
81DH            1D0H            TMR bits 191:160                             Read-only
81EH            1E0H            TMR bits 223:192                             Read-only
81FH            1F0H            TMR bits 255:224                             Read-only
820H            200H            Interrupt Request Register (IRR); bits 31:0  Read-only
821H            210H            IRR bits 63:32                               Read-only
822H            220H            IRR bits 95:64                               Read-only
823H            230H            IRR bits 127:96                              Read-only
824H            240H            IRR bits 159:128                             Read-only
825H            250H            IRR bits 191:160                             Read-only
826H            260H            IRR bits 223:192                             Read-only
827H            270H            IRR bits 255:224                             Read-only
828H            280H            Error Status Register (ESR)                  Read/write           WRMSR of a non-zero value causes #GP(0). See Section 10.5.3 and Section 10.12.8.
82FH            2F0H            LVT CMCI register                            Read/write           See Figure 15-10 for reserved bits.
830H (Note 4)   300H and 310H   Interrupt Command Register (ICR)             Read/write           See Figure 10-29 for reserved bits.
832H            320H            LVT Timer register                           Read/write           See Figure 10-8 for reserved bits.
833H            330H            LVT Thermal Sensor register                  Read/write           See Figure 10-8 for reserved bits.
834H            340H            LVT Performance Monitoring register          Read/write           See Figure 10-8 for reserved bits.
835H            350H            LVT LINT0 register                           Read/write           See Figure 10-8 for reserved bits.
836H            360H            LVT LINT1 register                           Read/write           See Figure 10-8 for reserved bits.
837H            370H            LVT Error register                           Read/write           See Figure 10-8 for reserved bits.
838H            380H            Initial Count register (for Timer)           Read/write
839H            390H            Current Count register (for Timer)           Read-only
83EH            3E0H            Divide Configuration Register (DCR; for      Read/write           See Figure 10-10 for reserved bits.
                                Timer)
83FH            Not available   SELF IPI (Note 5)                            Write-only           Available only in x2APIC mode.
NOTES:
1. WRMSR causes #GP(0) for read-only registers.
2. WRMSR causes #GP(0) for attempts to set a reserved bit to 1 in a read/write register (including
bits 63:32 of each register).
3. RDMSR causes #GP(0) for write-only registers.
10.12.1.3 Reserved Bit Checking
Section 10.12.1.2 and Table 10-6 specify the reserved bit definitions for the APIC
registers in x2APIC mode. Non-zero writes (by the WRMSR instruction) to reserved bits
of these registers will raise a general-protection exception while reads return
zeros (RsvdZ semantics).
In x2APIC mode, the local APIC ID register is increased to 32 bits wide. This enables
2^32 - 1 processors to be addressable in physical destination mode. This 32-bit value is
referred to as the "x2APIC ID". A processor implementation may choose to support fewer
than 32 bits in its hardware. System software should be agnostic to the actual
number of bits that are implemented. All non-implemented bits will return zeros on
reads by software.
The APIC ID value of FFFF_FFFFH and the highest value corresponding to the
implemented bit-width of the local APIC ID register in the system are reserved and cannot
be assigned to any logical processor.
In x2APIC mode, the local APIC ID register is a read-only register to system software
and will be initialized by hardware. It is accessed via the RDMSR instruction reading
the MSR at address 0802H.
Each logical processor in the system (including clusters with a communication fabric)
must be configured with a unique x2APIC ID to avoid collisions of x2APIC IDs. On
DP and high-end MP processors targeted to specific market segments and depending
on the system configuration, it is possible that logical processors in different and
unconnected clusters power up initialized with overlapping x2APIC IDs. In these
configurations, a model-specific means may be provided in those product segments
to enable BIOS and/or platform firmware to re-configure the x2APIC IDs in some
clusters to provide for unique and non-overlapping system wide IDs before
configuring the disconnected components into a single system.
10.12.2 x2APIC Register Availability
The local APIC registers can be accessed via the MSR interface only when the local
APIC has been switched to the x2APIC mode as described in Section 10.12.1.
Accessing any APIC register in the MSR address range 0800H through 0BFFH via
RDMSR or WRMSR when the local APIC is not in x2APIC mode causes a
general-protection exception. In x2APIC mode, the memory mapped interface is not
available and any access to the MMIO interface will behave similarly to that of a legacy
xAPIC in globally disabled state. Table 10-7 provides the interactions between the legacy
and extended modes and the MMIO and MSR register interfaces.
NOTES (Table 10-6, continued):
4. MSR 831H is reserved; read/write operations cause general-protection exceptions. The contents
of the APIC register at MMIO offset 310H are accessible in x2APIC mode through the MSR at
address 830H.
5. SELF IPI register is supported only in x2APIC mode.
10.12.3 MSR Access in x2APIC Mode
To allow for efficient access to the APIC registers in x2APIC mode, the serializing
semantics of WRMSR are relaxed when writing to the APIC registers. Thus, system
software should not use "WRMSR to APIC registers in x2APIC mode" as a serializing
instruction. Read and write accesses to the APIC registers will occur in program
order. A WRMSR to an APIC register may complete before all preceding stores are
globally visible; software can prevent this by inserting a serializing instruction, an
SFENCE, or an MFENCE before the WRMSR.
The RDMSR instruction is not serializing and this behavior is unchanged when
reading APIC registers in x2APIC mode. System software accessing the APIC
registers using the RDMSR instruction should not expect a serializing behavior. (Note: The
MMIO-based xAPIC interface is mapped by system software as an un-cached region.
Consequently, reads and writes to the xAPIC-MMIO interface have serializing semantics in
the xAPIC mode.)
10.12.4 VM-Exit Controls for MSRs and x2APIC Registers
The VMX architecture allows a VMM to specify lists of MSRs to be loaded or stored on
VMX transitions using the VMX-transition MSR areas (see VM-exit MSR-store address
field, VM-exit MSR-load address field, and VM-entry MSR-load address field in Intel
64 and IA-32 Architectures Software Developer's Manual, Volume 3B).
The X2APIC MSRs cannot be loaded and stored on VMX transitions. A VMX
transition fails if the VMM has specified that the transition should access any MSRs in the
address range from 0000_0800H to 0000_08FFH (the range used for accessing the
X2APIC registers). Specifically, processing of a 128-bit entry in any of the
VMX-transition MSR areas fails if bits 31:0 of that entry (represented as ENTRY_LOW_DW)
satisfy the expression: "ENTRY_LOW_DW & FFFFF800H = 00000800H". Such a
failure causes an associated VM entry to fail (by reloading host state) and causes an
associated VM exit to lead to VMX abort.
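The failure condition quoted above is a single mask-and-compare, which a VMM could apply to each entry before placing it in a VMX-transition MSR area. The C sketch below uses a hypothetical function name and implements the expression exactly as written. Note that, taken literally, the mask FFFFF800H matches every MSR index from 800H through FFFH, which is wider than the 0800H to 08FFH range named in the prose.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if bits 31:0 of a VMX-transition MSR area entry (ENTRY_LOW_DW)
 * satisfy ENTRY_LOW_DW & FFFFF800H = 00000800H, i.e. the entry would
 * cause the associated VMX transition to fail.
 */
static bool vmx_msr_entry_hits_x2apic_range(uint32_t entry_low_dw)
{
    return (entry_low_dw & 0xFFFFF800u) == 0x00000800u;
}
```

A VMM that needs to virtualize x2APIC registers therefore has to handle them by other means than the MSR load and store lists.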
Table 10-7. MSR/MMIO Interface of a Local x2APIC in Different Modes of Operation
               MMIO Interface                             MSR Interface
xAPIC mode     Available                                  General-protection exception
x2APIC mode    Behavior identical to xAPIC in globally    Available
               disabled state
10.12.5 x2APIC State Transitions
This section provides a detailed description of the x2APIC states of a local x2APIC
unit, transitions between these states as well as interactions of these states with INIT
and RESET.
10.12.5.1 x2APIC States
The states of a local x2APIC unit are listed in Table 10-5:
• APIC disabled: IA32_APIC_BASE[EN] = 0 and IA32_APIC_BASE[EXTD] = 0
• xAPIC mode: IA32_APIC_BASE[EN] = 1 and IA32_APIC_BASE[EXTD] = 0
• x2APIC mode: IA32_APIC_BASE[EN] = 1 and IA32_APIC_BASE[EXTD] = 1
• Invalid: IA32_APIC_BASE[EN] = 0 and IA32_APIC_BASE[EXTD] = 1
The state corresponding to EXTD = 1 and EN = 0 is not valid and it is not possible to get
into this state. An execution of WRMSR to the IA32_APIC_BASE MSR that attempts
a transition from a valid state to this invalid state causes a general-protection
exception. Figure 10-27 shows the comprehensive state transition diagram for a local
x2APIC unit.
On coming out of RESET, the local APIC unit is enabled and is in the xAPIC mode:
IA32_APIC_BASE[EN] = 1 and IA32_APIC_BASE[EXTD] = 0. The APIC registers are
initialized as:
• The local APIC ID is initialized by hardware with a 32-bit ID (x2APIC ID). The
  lowest 8 bits of the x2APIC ID are the legacy local xAPIC ID, and are stored in the
  upper 8 bits of the APIC register for access in xAPIC mode.
• The following APIC registers are reset to all zeros for those fields that are defined
  in the xAPIC mode:
   • IRR, ISR, TMR, ICR, LDR, TPR, Divide Configuration Register (See Chapter 8
     of "Intel 64 and IA-32 Architectures Software Developer's Manual", Vol. 3B
     for details of individual APIC registers),
   • Timer initial count and timer current count registers,
• The LVT registers are reset to 0s except for the mask bits; these are set to 1s.
• The local APIC version register is not affected.
• The Spurious Interrupt Vector Register is initialized to 000000FFH.
• The DFR (available only in xAPIC mode) is reset to all 1s.
• SELF IPI register is reset to zero.
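The relationship between the 32-bit x2APIC ID and the legacy 8-bit xAPIC ID stated in the first item above can be written out as a short C sketch. The function names are hypothetical, and the sketch assumes only what the text states: the low 8 bits of the x2APIC ID form the xAPIC ID, which appears in the upper 8 bits (31:24) of the APIC ID register in xAPIC mode.

```c
#include <stdint.h>

/* Legacy 8-bit xAPIC ID: the lowest 8 bits of the x2APIC ID. */
static uint8_t legacy_xapic_id(uint32_t x2apic_id)
{
    return (uint8_t)(x2apic_id & 0xFFu);
}

/* APIC ID register contents as seen in xAPIC mode: the xAPIC ID
 * placed in the upper 8 bits, with the remaining bits shown as 0. */
static uint32_t xapic_id_register(uint32_t x2apic_id)
{
    return (uint32_t)legacy_xapic_id(x2apic_id) << 24;
}
```

Two logical processors whose x2APIC IDs differ only above bit 7 would therefore present the same xAPIC ID, which is why systems with IDs of 255 or greater cannot be fully addressed in xAPIC mode.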
x2APIC After RESET
The valid transitions from the xAPIC mode state are:
• to the x2APIC mode by setting EXTD to 1 (resulting EN = 1, EXTD = 1). The physical
  x2APIC ID (see Figure 10-6) is preserved across this transition and the logical
  x2APIC ID (see Figure 10-30) is initialized by hardware during this transition as
  documented in Section 10.12.10.2. The state of the extended fields in other APIC
  registers, which was not initialized at RESET, is not architecturally defined across
  this transition and system software should explicitly initialize those
  programmable APIC registers.
• to the disabled state by setting EN to 0 (resulting EN = 0, EXTD = 0).
The result of an INIT in the xAPIC state places the APIC in the state with EN = 1,
EXTD = 0. The state of the local APIC ID register is preserved (the 8-bit xAPIC ID is in
the upper 8 bits of the APIC ID register). All the other APIC registers are initialized as
a result of INIT.
A RESET in this state places the APIC in the state with EN = 1, EXTD = 0. The state of
the local APIC ID register is initialized as described in Section 10.12.5.1. All the other
APIC registers are initialized as described in Section 10.12.5.1.
Figure 10-27. Local x2APIC State Transitions with IA32_APIC_BASE, INIT, and RESET
[State diagram not reproduced. States shown: Disabled (EN = 0, Extd = 0); xAPIC Mode
(EN = 1, Extd = 0); Extended Mode (EN = 1, Extd = 1); Invalid State (EN = 0, Extd = 1).
Edges are labeled with EN and Extd writes, Init, Reset, and Illegal Transition.]
x2APIC Transitions From x2APIC Mode
From the x2APIC mode, the only valid x2APIC transition using IA32_APIC_BASE is to
the state where the x2APIC is disabled by setting EN to 0 and EXTD to 0. The x2APIC
ID (32 bits) and the legacy local xAPIC ID (8 bits) are preserved across this
transition. A transition from the x2APIC mode to xAPIC mode is not valid, and the
corresponding WRMSR to the IA32_APIC_BASE MSR causes a general-protection
exception.
A RESET in this state places the x2APIC in xAPIC mode. All APIC registers (including
the local APIC ID register) are initialized as described in Section 10.12.5.1.
An INIT in this state keeps the x2APIC in the x2APIC mode. The state of the local
APIC ID register is preserved (all 32 bits). However, all the other APIC registers are
initialized as a result of the INIT transition.
x2APIC Transitions From Disabled Mode
From the disabled state, the only valid x2APIC transition using IA32_APIC_BASE is to
the xAPIC mode (EN = 1, EXTD = 0). Thus the only means to transition from x2APIC
mode to xAPIC mode is a two-step process:
• first transition from x2APIC mode to local APIC disabled mode (EN = 0, EXTD = 0),
• followed by another transition from disabled mode to xAPIC mode (EN = 1,
  EXTD = 0).
Consequently, all the APIC register states in the x2APIC, except for the x2APIC ID
(32 bits), are not preserved across mode transitions.
A RESET in the disabled state places the x2APIC in the xAPIC mode. All APIC registers
(including the local APIC ID register) are initialized as described in Section 10.12.5.1.
An INIT in the disabled state keeps the x2APIC in the disabled state.
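The WRMSR transition rules described for the xAPIC, x2APIC and disabled states can be collected into a small lookup, sketched below in C. The names are hypothetical. One point is an assumption of this sketch rather than something stated above: a write that leaves EN and EXTD unchanged is treated as allowed, since it requests no transition. INIT and RESET are not modeled.

```c
#include <stdbool.h>

enum apic_state {
    APIC_STATE_DISABLED,   /* EN = 0, EXTD = 0 */
    APIC_STATE_XAPIC,      /* EN = 1, EXTD = 0 */
    APIC_STATE_X2APIC,     /* EN = 1, EXTD = 1 */
    APIC_STATE_INVALID     /* EN = 0, EXTD = 1 */
};

/* Classify an (EN, EXTD) pair. */
static enum apic_state apic_state_of(unsigned en, unsigned extd)
{
    if (en)
        return extd ? APIC_STATE_X2APIC : APIC_STATE_XAPIC;
    return extd ? APIC_STATE_INVALID : APIC_STATE_DISABLED;
}

/*
 * True if a WRMSR to IA32_APIC_BASE may move the local APIC from
 * state 'from' to state 'to'; false where the text says the write
 * causes a general-protection exception.
 */
static bool apic_base_write_allowed(enum apic_state from, enum apic_state to)
{
    if (to == APIC_STATE_INVALID)
        return false;              /* EN = 0, EXTD = 1 is never reachable */
    if (from == to)
        return true;               /* assumption: no mode change requested */

    switch (from) {
    case APIC_STATE_DISABLED:
        return to == APIC_STATE_XAPIC;
    case APIC_STATE_XAPIC:
        return true;               /* to x2APIC or to disabled */
    case APIC_STATE_X2APIC:
        return to == APIC_STATE_DISABLED;
    default:
        return false;
    }
}
```

Under these rules, moving from x2APIC mode back to xAPIC mode takes two allowed writes, first to the disabled state and then to xAPIC mode.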
State Changes From xAPIC Mode to x2APIC Mode
After APIC register states have been initialized by software in xAPIC mode, a
transition from xAPIC mode to x2APIC mode does not affect most of the APIC register
states, except the following:
• The Logical Destination Register is not preserved.
• Any APIC ID value written to the memory-mapped local APIC ID register is not
  preserved.
• The high half of the Interrupt Command Register is not preserved.
10.12.6 System Software Transitions
This section describes implications for the x2APIC across system state transitions -
specifically initialization and booting.
Support for the x2APIC architecture can be implemented in the local APIC unit. All
existing PCI/MSI capable devices and IOxAPIC units should work with the x2APIC
extensions defined in this document. The x2APIC architecture also provides flexibility
to cope with the underlying fabrics that connect the PCI devices, IOxAPICs and Local
APIC units.
The extensions provided in this specification translate into modifications to:
• the local APIC unit,
• the underlying fabrics connecting Message Signaled Interrupts (MSI) capable PCI
  devices to local xAPICs,
• the underlying fabrics connecting the IOxAPICs to the local APIC units.
However, no modifications are required to PCI or PCIe devices that support direct
interrupt delivery to the processors via Message Signaled Interrupts. Similarly, no
modifications are required to the IOxAPIC. The routing of interrupts from these
devices in x2APIC mode leverages the interrupt remapping architecture specified in
the Intel Virtualization Technology for Directed I/O, Rev 1.2 specification.
Modifications to ACPI interfaces to support x2APIC are described in Appendix A,
"ACPI Extensions for x2APIC Support," of the Intel 64 Architecture x2APIC
Specification.
The default will be for the BIOS to pass control to the OS with the local x2APICs
in xAPIC mode if all x2APIC IDs reported by CPUID.0BH:EDX are less than 255, and
in x2APIC mode if any logical processor reports an x2APIC ID of 255 or
greater.
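The default hand-off policy in the last paragraph can be expressed as a scan over the x2APIC IDs that firmware has collected from CPUID.0BH:EDX on each logical processor. The C sketch below assumes those IDs are already gathered into an array; the function name and the array-based interface are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decide the default hand-off mode from the collected x2APIC IDs.
 * Returns true  -> hand off in x2APIC mode (some ID is 255 or greater)
 *         false -> hand off in xAPIC mode  (all IDs are less than 255)
 */
static bool bios_hands_off_in_x2apic_mode(const uint32_t *x2apic_ids,
                                          size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (x2apic_ids[i] >= 255u)
            return true;
    }
    return false;
}
```

The threshold is 255 rather than 256 because the value FFH is not usable as an individual 8-bit destination.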
10.12.7 CPUID Extensions And Topology Enumeration
For Intel 64 and IA-32 processors that support x2APIC, a value of 1 reported by
CPUID.01H:ECX[21] indicates that the processor supports x2APIC and the extended
topology enumeration leaf (CPUID.0BH).
The extended topology enumeration leaf can be accessed by executing CPUID with
EAX = 0BH. Processors that do not support x2APIC may support CPUID leaf 0BH.
Software can detect the availability of the extended topology enumeration leaf (0BH)
by performing two steps:
• Check the maximum input value for basic CPUID information by executing CPUID
  with EAX = 0. If CPUID.0H:EAX is greater than or equal to 11 (0BH), then proceed
  to the next step.
• Check that CPUID.EAX=0BH, ECX=0H:EBX is non-zero.
If both of the above conditions are true, the extended topology enumeration leaf is
available. If available, the extended topology enumeration leaf is the preferred
mechanism for enumerating topology. The presence of CPUID leaf 0BH in a processor does
not guarantee support for x2APIC. If CPUID.EAX=0BH, ECX=0H:EBX returns zero
and the maximum input value for basic CPUID information is greater than 0BH, then
the CPUID.0BH leaf is not supported on that processor.
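The two-step availability check can be sketched in C as a predicate over values that software has already obtained from CPUID. The function name is hypothetical; max_basic_leaf stands for CPUID.0H:EAX and leaf_0bh_ebx for the EBX value returned by CPUID with EAX = 0BH and ECX = 0. On a real system the second value should be read only after the first test passes.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if the extended topology enumeration leaf (CPUID leaf 0BH) is
 * available: the maximum basic leaf is at least 0BH and EBX of leaf
 * 0BH, sub-leaf 0, is non-zero.
 */
static bool cpuid_leaf_0bh_available(uint32_t max_basic_leaf,
                                     uint32_t leaf_0bh_ebx)
{
    return max_basic_leaf >= 0x0Bu && leaf_0bh_ebx != 0u;
}
```

A true result says nothing about x2APIC support, which is reported separately by CPUID.01H:ECX[21].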
The extended topology enumeration leaf is intended to assist software with
enumerating processor topology on systems that require 32-bit x2APIC IDs to address
individual logical processors. Details of CPUID leaf 0BH can be found in the reference
pages of CPUID in Chapter 3 of Intel 64 and IA-32 Architectures Software
Developer's Manual, Volume 2A.
Processor topology enumeration algorithms for processors supporting the extended
topology enumeration leaf of CPUID and processors that do not support CPUID leaf
0BH are treated in Section 8.9.4, "Algorithm for Three-Level Mappings of APIC_ID."
10.12.7.1 Consistency of APIC IDs and CPUID
The consistency of the physical x2APIC ID in MSR 802H in x2APIC mode and the 32-bit
value returned in CPUID.0BH:EDX is facilitated by processor hardware.
CPUID.0BH:EDX reports the full 32-bit ID in both xAPIC and x2APIC modes. This allows
BIOS to determine whether a system has processors with IDs exceeding the 8-bit initial
APIC ID limit (CPUID.01H:EBX[31:24]). The initial APIC ID (CPUID.01H:EBX[31:24]) is
always equal to CPUID.0BH:EDX[7:0].
If the values of CPUID.0BH:EDX reported by all logical processors in a system are
less than 255, BIOS can transfer control to the OS in xAPIC mode.
If the values of CPUID.0BH:EDX reported by some logical processors in a system are
greater than or equal to 255, BIOS must support two options to hand off to the OS:
• If BIOS enables logical processors with x2APIC IDs greater than 255, then it
  should enable x2APIC in the bootstrap processor (BSP) and all application
  processors (APs) before passing control to the OS. Applications requiring processor
  topology information must use OS-provided services based on x2APIC IDs or the
  CPUID.0BH leaf.
• If BIOS transfers control to the OS in xAPIC mode, then BIOS must ensure that
  only logical processors with a CPUID.0BH:EDX value less than 255 are enabled.
  BIOS initialization on all logical processors with CPUID.0BH:EDX values greater
  than or equal to 255 must (a) disable the APIC and execute CLI in each logical
  processor, and (b) leave these logical processors in the lowest power state so that
  they do not respond to INIT IPIs during OS boot. The BSP and all the
  enabled logical processors operate in xAPIC mode after BIOS passes control to the
  OS. Applications requiring processor topology information can use OS-provided
  legacy services based on 8-bit initial APIC IDs or legacy topology information
  from the CPUID.01H and CPUID.04H leaves. Even if BIOS passes control in xAPIC
  mode, an OS can switch the processors to x2APIC mode later. The BIOS SMM handler
  should always read the APIC_BASE_MSR, determine the APIC mode, and use the
  corresponding access method.
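The hand-off rules above can be summarized as a small decision function. This is a
sketch under stated assumptions: plan_bios_handoff, its enable_x2apic flag, and the
returned tuple are hypothetical names invented for illustration, and the function
picks xAPIC whenever every ID fits, although the text only says BIOS can do so.

```python
from typing import Iterable, List, Tuple

XAPIC_ID_LIMIT = 255  # per the text, IDs below this value may be enabled in xAPIC mode


def plan_bios_handoff(
    x2apic_ids: Iterable[int], enable_x2apic: bool
) -> Tuple[str, List[int], List[int]]:
    """Return (mode, enabled_ids, parked_ids) for a set of CPUID.0BH:EDX values."""
    ids = sorted(x2apic_ids)
    if all(i < XAPIC_ID_LIMIT for i in ids):
        # Every logical processor fits the xAPIC limit: hand off in xAPIC mode.
        return ("xAPIC", ids, [])
    if enable_x2apic:
        # Option 1: enable x2APIC on the BSP and all APs, keep every processor.
        return ("x2APIC", ids, [])
    # Option 2: stay in xAPIC mode and park processors with IDs >= 255
    # (APIC disabled, CLI executed, lowest power state).
    enabled = [i for i in ids if i < XAPIC_ID_LIMIT]
    parked = [i for i in ids if i >= XAPIC_ID_LIMIT]
    return ("xAPIC", enabled, parked)
```

For example, plan_bios_handoff([0, 1, 255, 256], enable_x2apic=False) keeps IDs 0 and
1 enabled and parks 255 and 256.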
10.12.8 Error Handling in x2APIC Mode
RDMSR and WRMSR operations to reserved addresses in x2APIC mode cause
general-protection exceptions, as do reserved-bit violations (see Section 10.12.1.3).
Beyond illegal register accesses and reserved-bit violations, other APIC errors are
logged in the Error Status Register (ESR). Writes of a non-zero value to the Error
Status Register in x2APIC mode cause general-protection exceptions. Figure 10-28
illustrates the Error Status Register in x2APIC mode.
A write to the ICR (in xAPIC and x2APIC modes) or to the SELF IPI register (x2APIC mode
only) with an illegal vector (vector ≤ 0FH) sets the Send Illegal Vector bit.
On receiving an IPI with an illegal vector (vector ≤ 0FH), the Receive Illegal Vector
bit is set. On receiving an interrupt with an illegal vector in the range 0H to 0FH,
the interrupt is not delivered to the processor, nor is an IRR bit set in that range.
Only the ESR Receive Illegal Vector bit is set.
If the ICR is programmed with lowest priority delivery mode, then the Redirectible
IPI bit is set in x2APIC mode (same as legacy xAPIC behavior) and the interrupt
is not processed.
A write to the ICR with both lowest priority delivery mode and an illegal vector sets
the Redirectible IPI error bit. The interrupt is not processed and hence the Send
Illegal Vector error bit is not set.
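As an illustration of how software might interpret a value read from the ESR (MSR
828H), the sketch below decodes the four error flags. The bit positions are taken
from the layout of Figure 10-28 and should be treated as assumptions to verify
against the figure; decode_esr and is_illegal_vector are hypothetical helper names.

```python
# Bit positions assumed from Figure 10-28 (bits 3:0 and 31:8 are reserved).
ESR_BITS = {
    4: "Redirectible IPI",
    5: "Send Illegal Vector",
    6: "Received Illegal Vector",
    7: "Illegal Register Address",
}


def decode_esr(value: int) -> list:
    """Return the names of the error flags set in an ESR value, lowest bit first."""
    return [name for bit, name in sorted(ESR_BITS.items()) if value & (1 << bit)]


def is_illegal_vector(vector: int) -> bool:
    """Vectors 0H through 0FH are the illegal vectors described in the text."""
    return 0 <= vector <= 0x0F
```

For instance, decode_esr(0x60) reports both the send and the receive illegal-vector
flags.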
10.12.9 ICR Operation in x2APIC Mode
In x2APIC mode, the layout of the Interrupt Command Register is shown in Figure
10-29. The lower 32 bits of the ICR in x2APIC mode are identical to the lower half of
the ICR in xAPIC mode, except that the Delivery Status bit is removed since it is not
needed in x2APIC mode. The destination ID field is expanded to 32 bits in x2APIC mode.
To send an IPI using the ICR, software must set up the ICR to indicate the type of IPI
message to be sent and the destination processor or processors. Self IPIs can also
be sent using the SELF IPI register (see Section 10.12.11).
A single MSR write to the Interrupt Command Register is required for dispatching an
interrupt in x2APIC mode. With the removal of the Delivery Status bit, system
software no longer has a reason to read the ICR. It remains readable only to aid in
debugging; however, software should not assume the value returned by reading the
ICR is the last written value.
Figure 10-28. Error Status Register (ESR) in x2APIC Mode
MSR Address: 828H
Bits 31:8   Reserved
Bit 7       Illegal Register Address
Bit 6       Received Illegal Vector
Bit 5       Send Illegal Vector
Bit 4       Redirectible IPI
Bits 3:0    Reserved
A destination ID value of FFFF_FFFFH is used for broadcast of interrupts in both
logical destination and physical destination modes.
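Because a single 64-bit MSR write dispatches the interrupt, the ICR value can be
composed ahead of time. The following is a minimal sketch of that composition,
assuming the field positions shown in Figure 10-29; make_x2apic_icr and its keyword
parameters are hypothetical names, and the actual WRMSR to address 830H is outside
the scope of the example.

```python
DELIVERY_MODES = {"fixed": 0b000, "smi": 0b010, "nmi": 0b100, "init": 0b101, "startup": 0b110}
SHORTHANDS = {"none": 0b00, "self": 0b01, "all_including_self": 0b10, "all_excluding_self": 0b11}
BROADCAST_ID = 0xFFFF_FFFF  # destination ID used for broadcast


def make_x2apic_icr(vector, dest_id=0, delivery_mode="fixed", logical=False,
                    assert_level=True, level_triggered=False, shorthand="none"):
    """Compose the 64-bit value for one ICR write (field positions per Figure 10-29)."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    if not 0 <= dest_id <= 0xFFFF_FFFF:
        raise ValueError("destination ID must fit in 32 bits")
    low = vector                                  # bits 7:0
    low |= DELIVERY_MODES[delivery_mode] << 8     # bits 10:8
    low |= int(logical) << 11                     # bit 11: destination mode
    low |= int(assert_level) << 14                # bit 14: level
    low |= int(level_triggered) << 15             # bit 15: trigger mode
    low |= SHORTHANDS[shorthand] << 18            # bits 19:18
    # Bit 12 (Delivery Status in xAPIC mode) is never set here.
    return (dest_id << 32) | low                  # bits 63:32: destination field
```

A fixed, edge-triggered IPI with vector 30H to x2APIC ID 5 would be
make_x2apic_icr(0x30, dest_id=5).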
10.12.10 Determining IPI Destination in x2APIC Mode
10.12.10.1 Logical Destination Mode in x2APIC Mode
In x2APIC mode, the Logical Destination Register (LDR) is increased to 32 bits wide.
It is a read-only register to system software. This 32-bit value is referred to as
the "logical x2APIC ID". System software accesses this register via the RDMSR
instruction, reading the MSR at address 80DH. Figure 10-30 provides the layout of the
Logical Destination Register in x2APIC mode.
Figure 10-29. Interrupt Command Register (ICR) in x2APIC Mode
Address: 830H (63 - 0)
Value after Reset: 0H
Bits 63:32   Destination Field
Bits 31:20   Reserved
Bits 19:18   Destination Shorthand (00: No Shorthand; 01: Self; 10: All Including Self;
             11: All Excluding Self)
Bits 17:16   Reserved
Bit 15       Trigger Mode (0: Edge; 1: Level)
Bit 14       Level (0: De-assert; 1: Assert)
Bits 13:12   Reserved
Bit 11       Destination Mode (0: Physical; 1: Logical)
Bits 10:8    Delivery Mode (000: Fixed; 001: Reserved; 010: SMI; 011: Reserved;
             100: NMI; 101: INIT; 110: Start Up; 111: Reserved)
Bits 7:0     Vector
In xAPIC mode, the Destination Format Register (DFR), accessed through the MMIO
interface, determines the choice of a flat logical mode or a clustered logical mode.
Flat logical mode is not supported in x2APIC mode. Hence the Destination Format
Register (DFR) is eliminated in x2APIC mode.
The 32-bit logical x2APIC ID field of the LDR is partitioned into two sub-fields:
• Cluster ID (LDR[31:16]): the address of the destination cluster.
• Logical ID (LDR[15:0]): defines a logical ID of the individual local x2APIC within
  the cluster specified by LDR[31:16].
This layout enables 2^16-1 clusters, each with up to 16 unique logical IDs,
effectively providing an addressability of ((2^20) - 16) processors in logical
destination mode.
It is likely that processor implementations may choose to support fewer than 16 bits
of the cluster ID or fewer than 16 bits of the logical ID in the Logical Destination
Register. However, system software should be agnostic to the number of bits
implemented in the cluster ID and logical ID sub-fields. The x2APIC hardware
initialization ensures that the appropriately initialized logical x2APIC IDs are
available to system software and that reads of non-implemented bits return zero. This
is a read-only register that software must read to determine the logical x2APIC ID of
the processor. Specifically, software can apply a 16-bit mask to the lowest 16 bits of
the logical x2APIC ID to identify the logical address of a processor within a cluster
without needing to know the number of implemented bits in the cluster ID and logical
ID sub-fields. Similarly, software can create a message destination address for the
cluster model by bitwise ORing the logical x2APIC IDs (31:0) of processors that have
matching cluster IDs (31:16).
To enable cluster ID assignment in a fashion that matches the system topology
characteristics and to enable efficient routing of logical mode lowest priority device
interrupts in link-based platform interconnects, the LDRs are initialized by hardware
based on the value of the x2APIC ID upon x2APIC state transitions. Details of this
initialization are provided in Section 10.12.10.2.
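The masking and ORing described above can be sketched as follows. The helper names
split_ldr and cluster_destination are hypothetical, and the LDR values in the usage
note are made up for illustration; real values come from reading MSR 80DH on each
logical processor.

```python
def split_ldr(ldr: int):
    """Split a 32-bit logical x2APIC ID into (cluster ID, logical ID)."""
    return (ldr >> 16) & 0xFFFF, ldr & 0xFFFF


def cluster_destination(logical_ids) -> int:
    """OR together logical x2APIC IDs that share a cluster ID into one destination."""
    ids = list(logical_ids)
    if not ids:
        raise ValueError("at least one logical x2APIC ID is required")
    cluster = split_ldr(ids[0])[0]
    dest = 0
    for ldr in ids:
        if split_ldr(ldr)[0] != cluster:
            raise ValueError("all targets must share the same cluster ID")
        dest |= ldr
    return dest
```

With the made-up values 0003_0001H, 0003_0004H, and 0003_8000H (all in cluster 3),
the combined destination is 0003_8005H. The addressability figure follows from the
same layout: (2^16 - 1) clusters of 16 logical IDs each give 2^20 - 16 processors.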
Figure 10-30. Logical Destination Register in x2APIC Mode
MSR Address: 80DH
Bits 31:0   Logical x2APIC ID
10.12.10.2 Deriving Logical x2APIC ID from the Local x2APIC ID
In x2APIC mode, the 32-bit logical x2APIC ID, which can be read from the LDR, is
derived from the 32-bit local x2APIC ID. Specifically, the 16-bit logical ID sub-field
is derived by shifting 1 left by the value of the lowest 4 bits of the x2APIC ID, i.e.,
Logical ID = 1 << x2APIC ID[3:0]. The remaining bits of the x2APIC ID then form the
cluster ID portion of the logical x2APIC ID:
Logical x2APIC ID = [(x2APIC ID[19:4] << 16) | (1 << x2APIC ID[3:0])]
The use of the lowest 4 bits in the x2APIC ID implies that at least 16 APIC IDs are
reserved for logical processors within a socket in multi-socket configurations. If more
than 16 APIC IDs are reserved for logical processors in a socket/package, then
multiple cluster IDs can exist within the package.
The LDR initialization occurs whenever x2APIC mode is enabled (see Section
10.12.5).
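The derivation can be written out directly. The sketch below mirrors the formula
Logical x2APIC ID = (x2APIC ID[19:4] << 16) | (1 << x2APIC ID[3:0]); the function name
logical_x2apic_id is hypothetical, and hardware performs this computation itself when
it initializes the LDR.

```python
def logical_x2apic_id(x2apic_id: int) -> int:
    """Derive the 32-bit logical x2APIC ID from the local x2APIC ID."""
    cluster_id = (x2apic_id >> 4) & 0xFFFF   # x2APIC ID[19:4]
    logical_id = 1 << (x2apic_id & 0xF)      # 1 << x2APIC ID[3:0]
    return (cluster_id << 16) | logical_id
```

For example, x2APIC ID 13H yields cluster ID 1 and logical ID bit 3, so the logical
x2APIC ID is 0001_0008H.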
10.12.11 SELF IPI Register
SELF IPIs are used extensively by some system software. The x2APIC architecture
introduces a new register interface. This new register is dedicated to sending
self-IPIs, with the intent of enabling a highly optimized path for them.
Figure 10-31 provides the layout of the SELF IPI register. System software only
specifies the vector associated with the interrupt to be sent. The semantics of sending
a self-IPI via the SELF IPI register are identical to sending a self-targeted,
edge-triggered, fixed interrupt with the specified vector. Specifically, the semantics
are identical to the following settings for an inter-processor interrupt sent via the
ICR: Destination Shorthand (ICR[19:18] = 01 (Self)), Trigger Mode (ICR[15] = 0 (Edge)),
Delivery Mode (ICR[10:8] = 000 (Fixed)), Vector (ICR[7:0] = Vector).
The SELF IPI register is a write-only register. A RDMSR instruction with the address
of the SELF IPI register causes a general-protection exception.
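To make the equivalence concrete, the sketch below returns the (MSR address, value)
pair for a SELF IPI register write next to the ICR write with the settings listed
above. The function names are hypothetical, and ICR fields that the text does not
name are simply left at zero here, which is an assumption of this example.

```python
SELF_IPI_MSR = 0x83F  # SELF IPI register
ICR_MSR = 0x830       # Interrupt Command Register in x2APIC mode


def self_ipi_write(vector: int):
    """(MSR address, value) for sending a self-IPI through the SELF IPI register."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    return SELF_IPI_MSR, vector  # bits 7:0 = vector, bits 31:8 reserved


def equivalent_icr_write(vector: int):
    """(MSR address, value) for the ICR settings the text lists as equivalent."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    value = vector          # ICR[7:0]   = vector
    value |= 0b000 << 8     # ICR[10:8]  = 000 (Fixed)
    value |= 0 << 15        # ICR[15]    = 0 (Edge)
    value |= 0b01 << 18     # ICR[19:18] = 01 (Self)
    return ICR_MSR, value
```

For vector 41H the two writes are (83FH, 41H) and (830H, 4_0041H).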
Figure 10-31. SELF IPI register
MSR Address: 083FH
Bits 31:8   Reserved
Bits 7:0    Vector
The handling and prioritization of a self-IPI sent via the SELF IPI register is
architecturally identical to that for an IPI sent via the ICR from a legacy xAPIC
unit. Specifically, the state of the interrupt is tracked via the Interrupt Request
Register (IRR), In-Service Register (ISR), and Trigger Mode Register (TMR) as if it
were received from the system bus. Also, sending the IPI via the SELF IPI register
ensures that the interrupt is delivered to the processor core. Specifically, completion
of the WRMSR instruction to the SELF IPI register implies that the interrupt has been
logged into the IRR. As expected for edge-triggered interrupts, depending on the
processor priority and readiness to accept interrupts, it is possible that interrupts
sent via the SELF IPI register or via the ICR with identical vectors can be combined.
CHAPTER 11
MEMORY CACHE CONTROL
This chapter describes the memory cache and cache control mechanisms, the TLBs,
and the store buffer in Intel 64 and IA-32 processors. It also describes the memory
type range registers (MTRRs) introduced in the P6 family processors and how they
are used to control caching of physical memory locations.
11.1 INTERNAL CACHES, TLBS, AND BUFFERS
The Intel 64 and IA-32 architectures support caches, translation lookaside buffers
(TLBs), and a store buffer for temporary on-chip (and external) storage of
instructions and data. (Figure 11-1 shows the arrangement of caches, TLBs, and the
store buffer for the Pentium 4 and Intel Xeon processors.) Table 11-1 shows the
characteristics of these caches and buffers for the Pentium 4, Intel Xeon, P6 family,
and Pentium processors. The sizes and characteristics of these units are machine
specific and may change in future versions of the processor. The CPUID
instruction returns the sizes and characteristics of the caches and buffers for the
processor on which the instruction is executed. See "CPUID - CPU Identification" in
Chapter 3, "Instruction Set Reference, A-M," of the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 2A.
Figure 11-1. Cache Structure of the Pentium 4 and Intel Xeon Processors
[Block diagram: System Bus (External), Physical Memory, Bus Interface Unit, L3 Cache
(Intel Xeon processors only), L2 Cache, Data Cache Unit (L1), Data TLBs, Store Buffer,
Instruction TLBs, Instruction Decoder, Trace Cache]
Figure 11-2 shows the cache arrangement of the Intel Core i7 processor.
Figure 11-2. Cache Structure of the Intel Core i7 Processors
[Block diagram: Instruction Decoder and front end, Instruction Cache, ITLB,
Out-of-Order Engine, Data Cache Unit (L1), Data TLB, STLB, L2 Cache, L3 Cache, IMC,
QPI, Chipset]
Table 11-1. Characteristics of the Caches, TLBs, Store Buffer, and
Write Combining Buffer in Intel 64 and IA-32 Processors
Cache or Buffer    Characteristics
Trace Cache (see note 1)
• Pentium 4 and Intel Xeon processors (based on Intel NetBurst
  microarchitecture): 12 Kμops, 8-way set associative.
• Intel Core i7, Intel Core 2 Duo, Intel Atom, Intel Core Duo, Intel Core
  Solo, Pentium M processor: not implemented.
• P6 family and Pentium processors: not implemented.
L1 Instruction Cache
• Pentium 4 and Intel Xeon processors (based on Intel NetBurst
  microarchitecture): not implemented.
• Intel Core i7 processor: 32-KByte, 4-way set associative.
• Intel Core 2 Duo, Intel Atom, Intel Core Duo, Intel Core Solo, Pentium M
  processor: 32-KByte, 8-way set associative.
• P6 family and Pentium processors: 8- or 16-KByte, 4-way set associative,
  32-byte cache line size; 2-way set associative for earlier Pentium
  processors.
L1 Data Cache
• Pentium 4 and Intel Xeon processors (based on Intel NetBurst
  microarchitecture): 8-KByte, 4-way set associative, 64-byte cache line
  size.
• Pentium 4 and Intel Xeon processors (based on Intel NetBurst
  microarchitecture): 16-KByte, 8-way set associative, 64-byte cache line
  size.
• Intel Atom processors: 24-KByte, 6-way set associative, 64-byte cache
  line size.
• Intel Core i7, Intel Core 2 Duo, Intel Core Duo, Intel Core Solo, Pentium M
  and Intel Xeon processors: 32-KByte, 8-way set associative, 64-byte
  cache line size.
• P6 family processors: 16-KByte, 4-way set associative, 32-byte cache
  line size; 8-KByte, 2-way set associative for earlier P6 family
  processors.
• Pentium processors: 16-KByte, 4-way set associative, 32-byte cache line
  size; 8-KByte, 2-way set associative for earlier Pentium processors.
L2 Unified Cache
• Intel Core 2 Duo and Intel Xeon processors: up to 4-MByte (or 4MBx2 in
  quad-core processors), 16-way set associative, 64-byte cache line size.
• Intel Core 2 Duo and Intel Xeon processors: up to 6-MByte (or 6MBx2 in
  quad-core processors), 24-way set associative, 64-byte cache line size.
• Intel Core i7, i5, i3 processors: 256-KByte, 8-way set associative,
  64-byte cache line size.
• Intel Atom processors: 512-KByte, 8-way set associative, 64-byte cache
  line size.
• Intel Core Duo, Intel Core Solo processors: 2-MByte, 8-way set
  associative, 64-byte cache line size.
• Pentium 4 and Intel Xeon processors: 256, 512, 1024, or 2048-KByte,
  8-way set associative, 64-byte cache line size, 128-byte sector size.
• Pentium M processor: 1 or 2-MByte, 8-way set associative, 64-byte
  cache line size.
• P6 family processors: 128-KByte, 256-KByte, 512-KByte, 1-MByte, or
  2-MByte, 4-way set associative, 32-byte cache line size.
• Pentium processor (external optional): System specific, typically 256- or
  512-KByte, 4-way set associative, 32-byte cache line size.
L3 Unified Cache
• Intel Xeon processors: 512-KByte, 1-MByte, 2-MByte, or 4-MByte, 8-way
  set associative, 64-byte cache line size, 128-byte sector size.
• Intel Core i7 processor, Intel Xeon processor 5500: up to 8-MByte,
  16-way set associative, 64-byte cache line size.
• Intel Xeon processor 5600: up to 12-MByte, 64-byte cache line size.
• Intel Xeon processor 7500: up to 24-MByte, 64-byte cache line size.
Instruction TLB (4-KByte Pages)
• Pentium 4 and Intel Xeon processors (based on Intel NetBurst
  microarchitecture): 128 entries, 4-way set associative.
• Intel Atom processors: 32 entries, fully associative.
• Intel Core i7, i5, i3 processors: 64 entries per thread (128 entries per
  core), 4-way set associative.
• Intel Core 2 Duo, Intel Core Duo, Intel Core Solo processors, Pentium M
  processor: 128 entries, 4-way set associative.
• P6 family processors: 32 entries, 4-way set associative.
• Pentium processor: 32 entries, 4-way set associative; fully set
  associative for Pentium processors with MMX technology.
Data TLB (4-KByte Pages)
• Intel Core i7, i5, i3 processors, DTLB0: 64 entries, 4-way set associative.
• Intel Core 2 Duo processors: DTLB0, 16 entries, DTLB1, 256 entries, 4
  ways.
• Intel Atom processors: 16-entry-per-thread micro-TLB, fully associative;
  64-entry DTLB, 4-way set associative; 16-entry PDE cache, fully
  associative.
• Pentium 4 and Intel Xeon processors (based on Intel NetBurst
  microarchitecture): 64 entries, fully set associative, shared with large
  page DTLB.
• Intel Core Duo, Intel Core Solo processors, Pentium M processor: 128
  entries, 4-way set associative.
• Pentium and P6 family processors: 64 entries, 4-way set associative;
  fully set associative for Pentium processors with MMX technology.
Instruction TLB (Large Pages)
• Intel Core i7, i5, i3 processors: 7 entries per thread, fully associative.
• Intel Core 2 Duo processors: 4 entries, 4 ways.
• Pentium 4 and Intel Xeon processors: large pages are fragmented.
• Intel Core Duo, Intel Core Solo, Pentium M processor: 2 entries, fully
  associative.
• P6 family processors: 2 entries, fully associative.
• Pentium processor: Uses same TLB as used for 4-KByte pages.
Data TLB (Large Pages)
• Intel Core i7, i5, i3 processors, DTLB0: 32 entries, 4-way set associative.
• Intel Core 2 Duo processors: DTLB0, 16 entries, DTLB1, 32 entries, 4
  ways.
• Intel Atom processors: 8 entries, 4-way set associative.
• Pentium 4 and Intel Xeon processors: 64 entries, fully set associative;
  shared with small page data TLBs.
• Intel Core Duo, Intel Core Solo, Pentium M processor: 8 entries, fully
  associative.
• P6 family processors: 8 entries, 4-way set associative.
• Pentium processor: 8 entries, 4-way set associative; uses same TLB as
  used for 4-KByte pages in Pentium processors with MMX technology.
Second-level Unified TLB (4-KByte Pages)
• Intel Core i7, i5, i3 processor, STLB: 512 entries, 4-way set associative.
Store Buffer
• Intel Core i7, i5, i3 processors: 32 entries.
• Intel Core 2 Duo processors: 20 entries.
• Intel Atom processors: 8 entries, used for both WC and store buffers.
• Pentium 4 and Intel Xeon processors: 24 entries.
• Pentium M processor: 16 entries.
• P6 family processors: 12 entries.
• Pentium processor: 2 buffers, 1 entry each (Pentium processors with
  MMX technology have 4 buffers for 4 entries).
Write Combining (WC) Buffer
• Intel Core 2 Duo processors: 8 entries.
• Intel Atom processors: 8 entries, used for both WC and store buffers.
• Pentium 4 and Intel Xeon processors: 6 or 8 entries.
• Intel Core Duo, Intel Core Solo, Pentium M processors: 6 entries.
• P6 family processors: 4 entries.
NOTES:
1. Introduced to the IA-32 architecture in the Pentium 4 and Intel Xeon processors.
Intel 64 and IA-32 processors may implement four types of caches: the trace cache,
the level 1 (L1) cache, the level 2 (L2) cache, and the level 3 (L3) cache. See
Figure 11-1. Cache availability is described below:
• Intel Core i7, i5, i3 processor family and Intel Xeon processor family
  based on Intel microarchitecture codename Nehalem and Intel
  microarchitecture codename Westmere: The L1 cache is divided into two
  sections: one section is dedicated to caching instructions (pre-decoded
  instructions) and the other caches data. The L2 cache is a unified data and
  instruction cache. Each processor core has its own L1 and L2. The L3 cache is an
  inclusive, unified data and instruction cache, shared by all processor cores inside
  a physical package. No trace cache is implemented.
• Intel Core 2 processor family and Intel Xeon processor family
  based on Intel Core microarchitecture: The L1 cache is divided into two
  sections: one section is dedicated to caching instructions (pre-decoded
  instructions) and the other caches data. The L2 cache is a unified data and
  instruction cache located on the processor chip; it is shared between two processor
  cores in a dual-core processor implementation. Quad-core processors have two L2
  caches, each shared by two processor cores. No trace cache is implemented.
• Intel Atom processor: The L1 cache is divided into two sections: one
  section is dedicated to caching instructions (pre-decoded instructions) and the
  other caches data. The L2 cache is a unified data and instruction cache located
  on the processor chip. No trace cache is implemented.
• Intel Core Solo and Intel Core Duo processors: The L1 cache is
  divided into two sections: one section is dedicated to caching instructions
  (pre-decoded instructions) and the other caches data. The L2 cache is a unified data
  and instruction cache located on the processor chip. It is shared between two
  processor cores in a dual-core processor implementation. No trace cache is
  implemented.
• Pentium 4 and Intel Xeon processors based on Intel NetBurst
  microarchitecture: The trace cache caches decoded instructions (μops) from
  the instruction decoder and the L1 cache contains data. The L2 and L3 caches are
  unified data and instruction caches located on the processor chip. Dual-core
  processors have two L2 caches, one in each processor core. Note that the L3 cache
  is only implemented on some Intel Xeon processors.
• P6 family processors: The L1 cache is divided into two sections: one
  dedicated to caching instructions (pre-decoded instructions) and the other to
  caching data. The L2 cache is a unified data and instruction cache located on the
  processor chip. P6 family processors do not implement a trace cache.
• Pentium processors: The L1 cache has the same structure as on P6 family
  processors. There is no trace cache. The L2 cache is a unified data and instruction
  cache external to the processor chip on earlier Pentium processors and
  implemented on the processor chip in later Pentium processors. For Pentium
  processors where the L2 cache is external to the processor, access to the cache is
  through the system bus.
For Intel Core i7 processors and processors based on Intel Core, Intel Atom, and Intel
NetBurst microarchitectures, Intel Core Duo, Intel Core Solo, and Pentium M
processors, the cache lines for the L1 and L2 caches (and L3 caches if supported) are
64 bytes wide. The processor always reads a cache line from system memory beginning
on a 64-byte boundary. (A 64-byte aligned cache line begins at an address with its 6
least-significant bits clear.) A cache line can be filled from memory with an
8-transfer burst transaction. The caches do not support partially-filled cache lines,
so caching even a single doubleword requires caching an entire line.
The L1 and L2 cache lines in the P6 family and Pentium processors are 32 bytes wide,
with cache line reads from system memory beginning on a 32-byte boundary (5
least-significant bits of a memory address clear). A cache line can be filled from
memory with a 4-transfer burst transaction. Partially-filled cache lines are not
supported.
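The alignment rule lends itself to a short worked example. The sketch below computes
the base address of the cache line containing an address and the number of lines an
access spans; the helper names are hypothetical, and the 8 bytes per burst transfer
is an assumption inferred from the 8-transfer (64-byte) and 4-transfer (32-byte)
figures above.

```python
def line_base(address: int, line_size: int = 64) -> int:
    """Address of the first byte of the cache line containing `address`."""
    # Clearing the low log2(line_size) bits gives the aligned line address.
    return address & ~(line_size - 1)


def lines_touched(address: int, length: int, line_size: int = 64) -> int:
    """Number of cache lines covered by `length` bytes starting at `address`."""
    if length <= 0:
        return 0
    first = line_base(address, line_size)
    last = line_base(address + length - 1, line_size)
    return (last - first) // line_size + 1


def burst_transfers(line_size: int, bytes_per_transfer: int = 8) -> int:
    """Transfers needed to fill one line; 8 bytes per transfer is an assumption."""
    return line_size // bytes_per_transfer
```

A doubleword at address 123EH straddles a 64-byte boundary, so reading it caches two
full lines (128 bytes), which illustrates why partially-filled lines matter for data
layout.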
The trace cache in processors based on Intel NetBurst microarchitecture is available
in all execution modes: protected mode, system management mode (SMM), and
real-address mode. The L1, L2, and L3 caches are also available in all execution
modes; however, use of them must be handled carefully in SMM (see Section 26.4.2,
"SMRAM Caching").
The TLBs store the most recently used page-directory and page-table entries. They
speed up memory accesses when paging is enabled by reducing the number of
memory accesses that are required to read the page tables stored in system
memory. The TLBs are divided into four groups: instruction TLBs for 4-KByte pages,
data TLBs for 4-KByte pages, instruction TLBs for large pages (2-MByte, 4-MByte, or
1-GByte pages), and data TLBs for large pages. The TLBs are normally active only in
protected mode with paging enabled. When paging is disabled or the processor is in
real-address mode, the TLBs maintain their contents until explicitly or implicitly
flushed (see Section 11.9, "Invalidating the Translation Lookaside Buffers (TLBs)").
Processors based on Intel Core microarchitectures implement one level of instruction
TLB and two levels of data TLB. The Intel Core i7 processor provides a second-level
unified TLB.
The store buffer is associated with the processor's instruction execution units. It
allows writes to system memory and/or the internal caches to be saved and in some
cases combined to optimize the processor's bus accesses. The store buffer is always
enabled in all execution modes.
The processor's caches are for the most part transparent to software. When enabled,
instructions and data flow through these caches without the need for explicit
software control. However, knowledge of the behavior of these caches may be useful in
optimizing software performance. For example, knowledge of cache dimensions and
replacement algorithms gives an indication of how large a data structure can be
operated on at once without causing cache thrashing.
In multiprocessor systems, maintenance of cache consistency may, in rare
circumstances, require intervention by system software. For these rare cases, the
processor provides privileged cache control instructions for use in flushing caches
and forcing memory ordering.
The Pentium III, Pentium 4, and Intel Xeon processors introduced several instructions
that software can use to improve the performance of the L1, L2, and L3 caches,
including the PREFETCHh and CLFLUSH instructions and the non-temporal move
instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD). The use of
these instructions is discussed in Section 11.5.5, "Cache Management Instructions."
11.2 CACHING TERMINOLOGY
IA-32 processors (beginning with the Pentium processor) and Intel 64 processors use
the MESI (modified, exclusive, shared, invalid) cache protocol to maintain
consistency with internal caches and caches in other processors (see Section 11.4,
"Cache Control Protocol").
When the processor recognizes that an operand being read from memory is
cacheable, the processor reads an entire cache line into the appropriate cache (L1,
L2, L3, or all). This operation is called a cache line fill. If the memory location
containing that operand is still cached the next time the processor attempts to access
the operand, the processor can read the operand from the cache instead of going back to
memory. This operation is called a cache hit.
When the processor attempts to write an operand to a cacheable area of memory, it
first checks if a cache line for that memory location exists in the cache. If a valid
cache line does exist, the processor (depending on the write policy currently in force)
can write the operand into the cache instead of writing it out to system memory. This
operation is called a write hit. If a write misses the cache (that is, a valid cache
line is not present for the area of memory being written to), the processor performs a
cache line fill (write allocation). Then it writes the operand into the cache line and
(depending on the write policy currently in force) can also write it out to memory. If
the operand is to be written out to memory, it is written first into the store buffer,
and then written from the store buffer to memory when the system bus is available.
(Note that for the Pentium processor, write misses do not result in a cache line fill;
they always result in a write to memory. For this processor, only read misses result in
cache line fills.)
When operating in an MP system, IA-32 processors (beginning with the Intel486
processor) and Intel 64 processors have the ability to snoop other processors'
accesses to system memory and to their internal caches. They use this snooping
ability to keep their internal caches consistent both with system memory and with
the caches in other processors on the bus. For example, in the Pentium and P6 family
processors, if through snooping one processor detects that another processor
intends to write to a memory location that it currently has cached in shared state,
the snooping processor will invalidate its cache line, forcing it to perform a cache
line fill the next time it accesses the same memory location.
Beginning with the P6 family processors, if a processor detects (through snooping)
that another processor is trying to access a memory location that it has modified in
its cache, but has not yet written back to system memory, the snooping processor
will signal the other processor (by means of the HITM# signal) that the cache line is
held in modified state and will perform an implicit write-back of the modified data.
The implicit write-back is transferred directly to the initial requesting processor and
snooped by the memory controller to assure that system memory has been updated.
Here, the processor with the valid data may pass the data to the other processors
without actually writing it to system memory; however, it is the responsibility of the
memory controller to snoop this operation and update memory.
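The two snooping cases described above can be placed in a small state-transition
sketch. Only the shared-line invalidation and the HITM# implicit write-back come from
the text; the remaining transitions follow the generic textbook MESI model and are
assumptions of this example, as are the function name and the action strings. Section
11.4 remains the authoritative description of the protocol.

```python
def snoop(state: str, remote_op: str):
    """Return (new_state, action) for a snooped remote 'read' or 'write'.

    `state` is the local line state: "M", "E", "S", or "I".
    """
    if state == "I":
        return "I", None  # nothing cached locally, nothing to do
    if state == "M":
        # From the text: signal HITM# and perform an implicit write-back.
        # Resulting state is the textbook choice (assumption).
        new_state = "S" if remote_op == "read" else "I"
        return new_state, "HITM# + implicit write-back"
    if remote_op == "write":
        # From the text: a remote write to a shared line invalidates the local copy.
        return "I", "invalidate"
    return "S", None  # textbook MESI: a remote read leaves the line shared
```

After snoop("S", "write") the local line is invalid, so the next local access to that
location causes a cache line fill, as described above.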
11.3 METHODS OF CACHING AVAILABLE
The processor allows any area of system memory to be cached in the L1, L2, and L3
caches. In individual pages or regions of system memory, it allows the type of
caching (also called memory type) to be specified (see Section 11.5). Memory types
currently defined for the Intel 64 and IA-32 architectures are (see Table 11-2):
• Strong Uncacheable (UC): System memory locations are not cached. All
  reads and writes appear on the system bus and are executed in program order
  without reordering. No speculative memory accesses, page-table walks, or
  prefetches of speculated branch targets are made. This type of cache-control is
  useful for memory-mapped I/O devices. When used with normal RAM, it greatly
  reduces processor performance.
  NOTE
  The behavior of FP and SSE/SSE2 operations on operands in UC
  memory is implementation dependent. In some implementations,
  accesses to UC memory may occur more than once. To ensure
  predictable behavior, use loads and stores of general purpose
  registers to access UC memory that may have read or write side
  effects.
• Uncacheable (UC-): Has the same characteristics as the strong uncacheable (UC)
  memory type, except that this memory type can be overridden by programming
  the MTRRs for the WC memory type. This memory type is available in processor
  families starting from the Pentium III processors and can only be selected through
  the PAT.
• Write Combining (WC): System memory locations are not cached (as with
  uncacheable memory) and coherency is not enforced by the processor's bus
  coherency protocol. Speculative reads are allowed. Writes may be delayed and
  combined in the write combining buffer (WC buffer) to reduce memory accesses.
  If the WC buffer is partially filled, the writes may be delayed until the next
  occurrence of a serializing event, such as an SFENCE or MFENCE instruction,
  CPUID execution, a read or write to uncached memory, an interrupt occurrence,
  or a LOCK instruction execution. This type of cache-control is appropriate for
  video frame buffers, where the order of writes is unimportant as long as the
  writes update memory so they can be seen on the graphics display. See Section
  11.3.1, "Buffering of Write Combining Memory Locations," for more information
  about caching the WC memory type. This memory type is available in the
  Pentium Pro and Pentium II processors by programming the MTRRs, or in
  processor families starting from the Pentium III processors by programming the
  MTRRs or by selecting it through the PAT.
Table 11-2. Memory Types and Their Properties

Memory Type and          Cacheable       Writeback   Allows        Memory Ordering Model
Mnemonic                                 Cacheable   Speculative
                                                     Reads
Strong Uncacheable (UC)  No              No          No            Strong Ordering
Uncacheable (UC-)        No              No          No            Strong Ordering. Can only be
                                                                   selected through the PAT. Can
                                                                   be overridden by WC in MTRRs.
Write Combining (WC)     No              No          Yes           Weak Ordering. Available by
                                                                   programming MTRRs or by
                                                                   selecting it through the PAT.
Write Through (WT)       Yes             No          Yes           Speculative Processor Ordering.
Write Back (WB)          Yes             Yes         Yes           Speculative Processor Ordering.
Write Protected (WP)     Yes for reads;  No          Yes           Speculative Processor Ordering.
                         no for writes                             Available by programming
                                                                   MTRRs.
• Write-through (WT): Writes and reads to and from system memory are
  cached. Reads come from cache lines on cache hits; read misses cause cache
  fills. Speculative reads are allowed. All writes are written to a cache line (when
  possible) and through to system memory. When writing through to memory,
  invalid cache lines are never filled, and valid cache lines are either filled or
  invalidated. Write combining is allowed. This type of cache-control is appropriate for
  frame buffers or when there are devices on the system bus that access system
  memory, but do not perform snooping of memory accesses. It enforces
  coherency between caches in the processors and system memory.
• Write-back (WB): Writes and reads to and from system memory are cached.
  Reads come from cache lines on cache hits; read misses cause cache fills.
  Speculative reads are allowed. Write misses cause cache line fills (in processor
  families starting with the P6 family processors), and writes are performed
  entirely in the cache, when possible. Write combining is allowed. The write-back
  memory type reduces bus traffic by eliminating many unnecessary writes to
  system memory. Writes to a cache line are not immediately forwarded to system
  memory; instead, they are accumulated in the cache. The modified cache lines
  are written to system memory later, when a write-back operation is performed.
  Write-back operations are triggered when cache lines need to be deallocated,
  such as when new cache lines are being allocated in a cache that is already full.
  They also are triggered by the mechanisms used to maintain cache consistency.
  This type of cache-control provides the best performance, but it requires that all
  devices that access system memory on the system bus be able to snoop memory
  accesses to ensure system memory and cache coherency.
• Write protected (WP): Reads come from cache lines when possible, and read
  misses cause cache fills. Writes are propagated to the system bus and cause
  corresponding cache lines on all processors on the bus to be invalidated.
  Speculative reads are allowed. This memory type is available in processor
  families starting from the P6 family processors by programming the MTRRs (see
  Table 11-6).
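The properties summarized in Table 11-2 can be captured in a small lookup structure, which may be convenient for tools that reason about memory types. The following is a minimal sketch only: the `MemoryType` record, the `MEMORY_TYPES` dictionary, and the helper function are hypothetical names chosen for illustration, not architectural definitions. The values are transcribed from Table 11-2.

```python
from collections import namedtuple

# Hypothetical record for one row of Table 11-2 (field names are illustrative).
MemoryType = namedtuple(
    "MemoryType",
    ["cacheable", "writeback_cacheable", "speculative_reads", "ordering"],
)

# Values transcribed from Table 11-2.
MEMORY_TYPES = {
    "UC": MemoryType("no", False, False, "strong"),
    "UC-": MemoryType("no", False, False, "strong"),
    "WC": MemoryType("no", False, True, "weak"),
    "WT": MemoryType("yes", False, True, "speculative processor"),
    "WB": MemoryType("yes", True, True, "speculative processor"),
    "WP": MemoryType("reads only", False, True, "speculative processor"),
}


def types_allowing_speculative_reads():
    """Return the mnemonics whose table row permits speculative reads."""
    return sorted(
        name for name, row in MEMORY_TYPES.items() if row.speculative_reads
    )
```

For example, `MEMORY_TYPES["WB"].writeback_cacheable` is `True`, and only the UC and UC- types are excluded from the list returned by `types_allowing_speculative_reads()`.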
Table 11-3 shows which of these caching methods are available in the Pentium, P6
Family, Pentium 4, and Intel Xeon processors.

Table 11-3. Methods of Caching Available in Intel Core 2 Duo, Intel Atom, Intel Core
Duo, Pentium M, Pentium 4, Intel Xeon, P6 Family, and Pentium Processors

Memory Type              Intel Core 2 Duo, Intel Atom,   P6 Family    Pentium
                         Intel Core Duo, Pentium M,      Processors   Processor
                         Pentium 4 and Intel Xeon
                         Processors
Strong Uncacheable (UC)  Yes                             Yes          Yes
Uncacheable (UC-)        Yes                             Yes*         No
Write Combining (WC)     Yes                             Yes          No
Write Through (WT)       Yes                             Yes          Yes
Write Back (WB)          Yes                             Yes          Yes
Write Protected (WP)     Yes                             Yes          No

NOTE:
* Introduced in the Pentium III processor; not available in the Pentium Pro or Pentium II processors.

11.3.1 Buffering of Write Combining Memory Locations
Writes to the WC memory type are not cached in the typical sense of the word
cached. They are retained in an internal write combining buffer (WC buffer) that is
separate from the internal L1, L2, and L3 caches and the store buffer. The WC buffer
is not snooped and thus does not provide data coherency. Buffering of writes to WC
memory is done to allow software a small window of time to supply more modified
data to the WC buffer while remaining as non-intrusive to software as possible. The
buffering of writes to WC memory also causes data to be collapsed; that is, multiple
writes to the same memory location will leave the last data written in the location and
the other writes will be lost.
The size and structure of the WC buffer is not architecturally defined. For the Intel
Core 2 Duo, Intel Atom, Intel Core Duo, Pentium M, Pentium 4 and Intel Xeon
processors, the WC buffer is made up of several 64-byte WC buffers. For the P6 family
processors, the WC buffer is made up of several 32-byte WC buffers.
When software begins writing to WC memory, the processor begins filling the WC
buffers one at a time. When one or more WC buffers have been filled, the processor
has the option of evicting the buffers to system memory. The protocol for evicting the
WC buffers is implementation dependent and should not be relied on by software for
system memory coherency. When using the WC memory type, software must be
sensitive to the fact that the writing of data to system memory is being delayed and
must deliberately empty the WC buffers when system memory coherency is
required.
Once the processor has started to evict data from the WC buffer into system
memory, it will make a bus-transaction style decision based on how much of the
buffer contains valid data. If the buffer is full (for example, all bytes are valid), the
processor will execute a burst-write transaction on the bus. This results in all 32
bytes (P6 family processors) or 64 bytes (Pentium 4 and more recent processors)
being transmitted on the data bus in a single burst transaction. If one or more of the
WC buffer's bytes are invalid (for example, have not been written by software), the
processor will transmit the data to memory using partial write transactions (one
chunk at a time, where a chunk is 8 bytes).
This will result in a maximum of 4 partial write transactions (for P6 family processors)
or 8 partial write transactions (for the Pentium 4 and more recent processors) for one
WC buffer of data sent to memory.
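The eviction decision described above can be illustrated with a small model. This is a sketch under stated assumptions, not a description of any particular implementation: the function name and the per-byte validity list are hypothetical, and the model assumes that a chunk containing no valid bytes produces no bus transaction (the text above states only the maximum number of partial writes).

```python
CHUNK_BYTES = 8  # a partial write moves one 8-byte chunk


def wc_eviction_transactions(valid, buffer_bytes=64):
    """Return (kind, count) for evicting one WC buffer.

    `valid` holds one boolean per byte of the buffer. Use buffer_bytes=32
    to model a P6 family WC buffer and 64 for Pentium 4 and later.
    """
    if len(valid) != buffer_bytes:
        raise ValueError("one validity flag per buffer byte is required")
    if all(valid):
        # Completely full buffer: a single burst-write transaction.
        return ("burst", 1)
    # Otherwise one partial write per chunk that holds any valid byte
    # (assumption: chunks with no valid data are not transmitted).
    chunks = [
        valid[i:i + CHUNK_BYTES] for i in range(0, buffer_bytes, CHUNK_BYTES)
    ]
    return ("partial", sum(1 for chunk in chunks if any(chunk)))
```

With this model, a 64-byte buffer that is missing a single byte is evicted as 8 partial writes instead of 1 burst, which is why software that streams data to WC memory benefits from writing whole buffers.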
The WC memory type is weakly ordered by definition. Once the eviction of a WC
buffer has started, the data is subject to the weak ordering semantics of its
definition. Ordering is not maintained between the successive allocation/deallocation of
WC buffers (for example, writes to WC buffer 1 followed by writes to WC buffer 2 may
appear as buffer 2 followed by buffer 1 on the system bus). When a WC buffer is
evicted to memory as partial writes, there is no guaranteed ordering between
successive partial writes (for example, a partial write for chunk 2 may appear on the bus
before the partial write for chunk 1 or vice versa).
The only elements of WC propagation to the system bus that are guaranteed are
those provided by transaction atomicity. For example, with a P6 family processor, a
completely full WC buffer will always be propagated as a single 32-byte burst
transaction using any chunk order. In a WC buffer eviction where data will be evicted as
partials, all data contained in the same chunk (0 mod 8 aligned) will be propagated
simultaneously. Likewise, for more recent processors starting with those based on
Intel NetBurst microarchitecture, a full WC buffer will always be propagated as a
single burst transaction, using any chunk order within a transaction. For partial
buffer propagations, all data contained in the same chunk will be propagated
simultaneously.
11.3.2 Choosing a Memory Type
The simplest system memory model does not use memory-mapped I/O with read or
write side effects, does not include a frame buffer, and uses the write-back memory
type for all memory. An I/O agent can perform direct memory access (DMA) to
write-back memory and the cache protocol maintains cache coherency.
A system can use strong uncacheable memory for other memory-mapped I/O, and
should always use strong uncacheable memory for memory-mapped I/O with read
side effects.
Dual-ported memory can be considered a write side effect, making relatively prompt
writes desirable, because those writes cannot be observed at the other port until they
reach the memory agent. A system can use strong uncacheable, uncacheable,
write-through, or write-combining memory for frame buffers or dual-ported memory that
contains pixel values displayed on a screen. Frame buffer memory is typically large (a
few megabytes) and is usually written more than it is read by the processor. Using
strong uncacheable memory for a frame buffer generates very large amounts of bus
traffic, because operations on the entire buffer are implemented using partial writes
rather than line writes. Using write-through memory for a frame buffer can displace
almost all other useful cached lines in the processor's L2 and L3 caches and L1 data
cache. Therefore, systems should use write-combining memory for frame buffers
whenever possible.
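The difference in bus traffic between the UC and WC memory types for a frame buffer can be estimated with simple arithmetic. The sketch below is illustrative only and rests on assumptions that are not part of the text above: every store to UC memory is taken to produce one partial write, every WC buffer is taken to fill completely and leave as one burst, stores are 4 bytes wide, and the line size is 64 bytes. The function name is hypothetical.

```python
def bus_writes_to_fill(buffer_bytes, store_bytes=4, line_bytes=64):
    """Return (uc_partial_writes, wc_burst_writes) for one full rewrite."""
    uc_partial_writes = -(-buffer_bytes // store_bytes)  # ceiling division
    wc_burst_writes = -(-buffer_bytes // line_bytes)
    return uc_partial_writes, wc_burst_writes


# Example: rewriting a 4-MByte frame buffer once.
uc, wc = bus_writes_to_fill(4 * 1024 * 1024)
print(uc, wc)  # 1048576 65536
```

Under these assumptions the WC mapping needs one sixteenth as many write transactions as the UC mapping for the same data.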
Software can use page-level cache control to assign appropriate effective memory
types when software will not access data structures in ways that benefit from
write-back caching. For example, software may read a large data structure once and not
access the structure again until the structure is rewritten by another agent. Such a
large data structure should be marked as uncacheable, or reading it will evict cached
lines that the processor will be referencing again.
A similar example would be a write-only data structure that is written to (to export
the data to another agent), but never read by software. Such a structure can be
marked as uncacheable, because software never reads the values that it writes
(though as uncacheable memory, it will be written using partial writes, while as
write-back memory, it will be written using line writes, which may not occur until the
other agent reads the structure and triggers implicit write-backs).
On the Pentium III, Pentium 4, and more recent processors, new instructions are
provided that give software greater control over the caching, prefetching, and the
write-back characteristics of data. These instructions allow software to use weakly
ordered or processor ordered memory types to improve processor performance, but
when necessary to force strong ordering on memory reads and/or writes. They also
allow software greater control over the caching of data. For a description of these
instructions and their intended use, see Section 11.5.5, "Cache Management
Instructions."
11.3.3 Code Fetches in Uncacheable Memory
Programs may execute code from uncacheable (UC) memory, but the implications
are different from accessing data in UC memory. When doing code fetches, the
processor never transitions from cacheable code to UC code speculatively. It also
never speculatively fetches branch targets that result in UC code.
The processor may fetch the same UC cache line multiple times in order to decode an
instruction once. It may decode consecutive UC instructions in a cacheline without
fetching between each instruction. It may also fetch additional cachelines from the
same or a consecutive 4-KByte page in order to decode one non-speculative UC
instruction (this can be true even when the instruction is contained fully in one line).
Because of the above and because cacheline sizes may change in future processors,
software should avoid placing memory-mapped I/O with read side effects in the
same page or in a subsequent page used to execute UC code.
11.4 CACHE CONTROL PROTOCOL
The following section describes the cache control protocol currently defined for the
Intel 64 and IA-32 architectures.
In the L1 data cache and in the L2/L3 unified caches, the MESI (modified, exclusive,
shared, invalid) cache protocol maintains consistency with caches of other
processors. The L1 data cache and the L2/L3 unified caches have two MESI status flags per
cache line. Each line can be marked as being in one of the states defined in Table
11-4. In general, the operation of the MESI protocol is transparent to programs.
The L1 instruction cache in P6 family processors implements only the "SI" part of the
MESI protocol, because the instruction cache is not writable. The instruction cache
monitors changes in the data cache to maintain consistency between the caches
when instructions are modified. See Section 11.6, "Self-Modifying Code," for more
information on the implications of caching instructions.
11.5 CACHE CONTROL
The Intel 64 and IA-32 architectures provide a variety of mechanisms for controlling
the caching of data and instructions and for controlling the ordering of reads and
writes between the processor, the caches, and memory. These mechanisms can be
divided into two groups:
• Cache control registers and bits: The Intel 64 and IA-32 architectures
  define several dedicated registers and various bits within control registers and
  page- and directory-table entries that control the caching of system memory
  locations in the L1, L2, and L3 caches. These mechanisms control the caching of
  virtual memory pages and of regions of physical memory.
• Cache control and memory ordering instructions: The Intel 64 and IA-32
  architectures provide several instructions that control the caching of data, the
  ordering of memory reads and writes, and the prefetching of data. These
  instructions allow software to control the caching of specific data structures, to control
  memory coherency for specific locations in memory, and to force strong memory
  ordering at specific locations in a program.
The following sections describe these two groups of cache control mechanisms.
Table 11-4. MESI Cache Line States

Cache Line State           M (Modified)      E (Exclusive)     S (Shared)          I (Invalid)
This cache line is valid?  Yes               Yes               Yes                 No
The memory copy is         Out of date       Valid             Valid               -
Copies exist in caches     No                No                Maybe               Maybe
of other processors?
A write to this line       Does not go to    Does not go to    Causes the          Goes directly to
                           the system bus.   the system bus.   processor to gain   the system bus.
                                                               exclusive
                                                               ownership of the
                                                               line.
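The rows of Table 11-4 can be expressed as a lookup keyed by line state. The following is a minimal sketch for illustration; the `Mesi` enumeration, the dictionary names, and the `describe` helper are hypothetical, and the entries simply restate the table (with `None` standing for the cell the table leaves empty). It does not model state transitions.

```python
from enum import Enum


class Mesi(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


# Each dictionary restates one row of Table 11-4.
LINE_VALID = {
    Mesi.MODIFIED: True, Mesi.EXCLUSIVE: True,
    Mesi.SHARED: True, Mesi.INVALID: False,
}
MEMORY_COPY = {
    Mesi.MODIFIED: "out of date", Mesi.EXCLUSIVE: "valid",
    Mesi.SHARED: "valid", Mesi.INVALID: None,  # no entry in the table
}
COPIES_IN_OTHER_CACHES = {
    Mesi.MODIFIED: "no", Mesi.EXCLUSIVE: "no",
    Mesi.SHARED: "maybe", Mesi.INVALID: "maybe",
}
WRITE_EFFECT = {
    Mesi.MODIFIED: "does not go to the system bus",
    Mesi.EXCLUSIVE: "does not go to the system bus",
    Mesi.SHARED: "processor gains exclusive ownership of the line",
    Mesi.INVALID: "goes directly to the system bus",
}


def describe(state):
    """Collect the Table 11-4 column for one cache line state."""
    return {
        "valid": LINE_VALID[state],
        "memory_copy": MEMORY_COPY[state],
        "other_copies": COPIES_IN_OTHER_CACHES[state],
        "write": WRITE_EFFECT[state],
    }
```

For example, `describe(Mesi("S"))["write"]` reports that a write to a shared line causes the processor to gain exclusive ownership of the line.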
11.5.1 Cache Control Registers and Bits
Figure 11-3 depicts cache-control mechanisms in IA-32 processors. Other than for
the matter of memory address space, these work the same in Intel 64 processors.
The Intel 64 and IA-32 architectures provide the following cache-control registers
and bits for use in enabling or restricting caching to various pages or regions in
memory:
• CD flag, bit 30 of control register CR0: Controls caching of system memory
  locations (see Section 2.5, "Control Registers"). If the CD flag is clear, caching is
  enabled for the whole of system memory, but may be restricted for individual
  pages or regions of memory by other cache-control mechanisms. When the CD
  flag is set, caching is restricted in the processor's caches (cache hierarchy) for
  the P6 and more recent processor families and prevented for the Pentium
  processor (see note below). With the CD flag set, however, the caches will still
  respond to snoop traffic. Caches should be explicitly flushed to ensure memory
  coherency. For highest processor performance, both the CD and the NW flags in
  control register CR0 should be cleared. Table 11-5 shows the interaction of the
  CD and NW flags.
  The effect of setting the CD flag is somewhat different for processor families
  starting with the P6 family than for the Pentium processor (see Table 11-5). To ensure
  memory coherency after the CD flag is set, the caches should be explicitly
  flushed (see Section 11.5.3, "Preventing Caching"). Setting the CD flag for the
  P6 and more recent processor families modifies cache line fill and update
  behavior. Also, setting the CD flag on these processors does not force strict
  ordering of memory accesses unless the MTRRs are disabled and/or all memory
  is referenced as uncached (see Section 8.2.5, "Strengthening or Weakening the
  Memory-Ordering Model").
Figure 11-3. Cache-Control Registers and Bits Available in Intel 64 and IA-32
Processors

[Figure: block diagram of the cache-control registers and bits. Labels recovered from
the figure:
  CR0: the CD and NW flags control overall caching of system memory.
  CR3: the PCD and PWT flags control caching of the page directory.
  CR4: the PGE flag enables global pages designated with the G flag.
  Page-directory or page-table entry: the PCD and PWT flags control page-level
  caching; the G flag (note 1) controls page-level flushing of TLBs; the PAT bit
  (note 4) is also shown.
  MTRRs (note 3) control caching of selected regions of physical memory, shown as
  the range 0 to FFFFFFFFH (note 2).
  PAT (note 4) controls caching of virtual memory pages.
  The TLBs and the store buffer are also shown.]

1. G flag only available in P6 and later processor families.
2. The maximum physical address size is reported by CPUID leaf function 80000008H.
   The maximum physical address size of FFFFFFFFFH applies only if 36-bit physical
   addressing is used.
3. MTRRs available only in P6 and later processor families; similar control available
   in Pentium processor with the KEN# and WB/WT# pins.
4. PAT available only in Pentium III and later processor families.
5. L3 in processors based on Intel NetBurst microarchitecture can be disabled using
   IA32_MISC_ENABLE MSR.
Table 11-5. Cache Operating Modes

CD  NW  Caching and Read/Write Policy                                        L1   L2/L3 (note 1)

0   0   Normal Cache Mode. Highest performance cache operation.
        • Read hits access the cache; read misses may cause replacement.     Yes  Yes
        • Write hits update the cache.                                       Yes  Yes
        • Only writes to shared lines and write misses update system         Yes  Yes
          memory.
        • Write misses cause cache line fills.                               Yes  Yes
        • Write hits can change shared lines to modified under control of    Yes
          the MTRRs and with associated read invalidation cycle.
        • (Pentium processor only.) Write misses do not cause cache line     Yes
          fills.
        • (Pentium processor only.) Write hits can change shared lines to    Yes
          exclusive under control of WB/WT#.
        • Invalidation is allowed.                                           Yes  Yes
        • External snoop traffic is supported.                               Yes  Yes

0   1   Invalid setting.                                                     NA   NA
        Generates a general-protection exception (#GP) with an error code
        of 0.

1   0   No-fill Cache Mode. Memory coherency is maintained. (note 3)
        • (Pentium 4 and later processor families.) State of processor       Yes  Yes
          after a power up or reset.
        • Read hits access the cache; read misses do not cause               Yes  Yes
          replacement (see Pentium 4 and Intel Xeon processors reference
          below).
        • Write hits update the cache.                                       Yes  Yes
        • Only writes to shared lines and write misses update system         Yes  Yes
          memory.
        • Write misses access memory.                                        Yes  Yes
        • Write hits can change shared lines to exclusive under control of   Yes  Yes
          the MTRRs and with associated read invalidation cycle.
        • (Pentium processor only.) Write hits can change shared lines to    Yes
          exclusive under control of the WB/WT#.
        • (P6 and later processor families only.) Strict memory ordering is  Yes  Yes
          not enforced unless the MTRRs are disabled and/or all memory is
          referenced as uncached (see Section 8.2.5, "Strengthening or
          Weakening the Memory-Ordering Model").
        • Invalidation is allowed.                                           Yes  Yes
        • External snoop traffic is supported.                               Yes  Yes

1   1   Memory coherency is not maintained. (notes 2, 3)
        • (P6 family and Pentium processors.) State of the processor after   Yes  Yes
          a power up or reset.
        • Read hits access the cache; read misses do not cause               Yes  Yes
          replacement.
        • Write hits update the cache and change exclusive lines to          Yes  Yes
          modified.
        • Shared lines remain shared after write hit.                        Yes  Yes
        • Write misses access memory.                                        Yes  Yes
        • Invalidation is inhibited when snooping; but is allowed with INVD  Yes  Yes
          and WBINVD instructions.
        • External snoop traffic is supported.                               No   Yes

NOTES:
1. The L2/L3 column in this table is definitive for the Pentium 4, Intel Xeon, and P6 family
   processors. It is intended to represent what could be implemented in a system based on a
   Pentium processor with an external, platform specific, write-back L2 cache.
2. The Pentium 4 and more recent processor families do not support this mode; setting the CD and
   NW bits to 1 selects the no-fill cache mode.
3. Not supported in Intel Atom processors. If CD = 1 in an Intel Atom processor, caching is disabled.

• NW flag, bit 29 of control register CR0: Controls the write policy for system
  memory locations (see Section 2.5, "Control Registers"). If the NW and CD flags
  are clear, write-back is enabled for the whole of system memory, but may be
  restricted for individual pages or regions of memory by other cache-control
  mechanisms. Table 11-5 shows how the other combinations of CD and NW flags
  affect caching.
  NOTES
  For the Pentium 4 and Intel Xeon processors, the NW flag is a don't
  care flag; that is, when the CD flag is set, the processor uses the
  no-fill cache mode, regardless of the setting of the NW flag.
  For Intel Atom processors, the NW flag is a don't care flag; that is,
  when the CD flag is set, the processor disables caching, regardless of
  the setting of the NW flag.
  For the Pentium processor, when the L1 cache is disabled (the CD and
  NW flags in control register CR0 are set), external snoops are
  accepted in DP (dual-processor) systems and inhibited in
  uniprocessor systems.
  When snoops are inhibited, address parity is not checked and
  APCHK# is not asserted for a corrupt address; however, when snoops
  are accepted, address parity is checked and APCHK# is asserted for
  corrupt addresses.
• PCD flag in the page-directory and page-table entries: Controls caching
  for individual page tables and pages, respectively (see Section 4.9, "Paging and
  Memory Typing"). This flag only has effect when paging is enabled and the CD
  flag in control register CR0 is clear. The PCD flag enables caching of the page
  table or page when clear and prevents caching when set.
• PWT flag in the page-directory and page-table entries: Controls the write
  policy for individual page tables and pages, respectively (see Section 4.9, "Paging
  and Memory Typing"). This flag only has effect when paging is enabled and the
  NW flag in control register CR0 is clear. The PWT flag enables write-back caching
  of the page table or page when clear and write-through caching when set.
• PCD and PWT flags in control register CR3: Control the global caching and
  write policy for the page directory (see Section 2.5, "Control Registers"). The PCD
  flag enables caching of the page directory when clear and prevents caching when
  set. The PWT flag enables write-back caching of the page directory when clear
  and write-through caching when set. These flags do not affect the caching and
  write policy for individual page tables. These flags only have effect when paging
  is enabled and the CD flag in control register CR0 is clear.
• G (global) flag in the page-directory and page-table entries (introduced
  to the IA-32 architecture in the P6 family processors): Controls the
  flushing of TLB entries for individual pages. See Section 4.10, "Caching
  Translation Information," for more information about this flag.
• PGE (page global enable) flag in control register CR4: Enables the
  establishment of global pages with the G flag. See Section 4.10, "Caching Translation
  Information," for more information about this flag.
• Memory type range registers (MTRRs) (introduced in P6 family
  processors): Control the type of caching used in specific regions of physical
  memory. Any of the caching types described in Section 11.3, "Methods of Caching
  Available," can be selected. See Section 11.11, "Memory Type Range Registers
  (MTRRs)," for a detailed description of the MTRRs.
• Page Attribute Table (PAT) MSR (introduced in the Pentium III processor):
  Extends the memory typing capabilities of the processor to permit memory
  types to be assigned on a page-by-page basis (see Section 11.12, "Page Attribute
  Table (PAT)").
• Third-Level Cache Disable flag, bit 6 of the IA32_MISC_ENABLE MSR
  (available only in processors based on Intel NetBurst microarchitecture):
  Allows the L3 cache to be disabled and enabled, independently of the L1 and
  L2 caches.
• KEN# and WB/WT# pins (Pentium processor): Allow external hardware to
  control the caching method used for specific areas of memory. They perform
  similar (but not identical) functions to the MTRRs in the P6 family processors.
• PCD and PWT pins (Pentium processor): These pins (which are associated
  with the PCD and PWT flags in control register CR3 and in the page-directory and
  page-table entries) permit caching in an external L2 cache to be controlled on a
  page-by-page basis, consistent with the control exercised on the L1 cache of
  these processors. The P6 and more recent processor families do not provide
  these pins because the L2 cache is internal to the chip package.
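The interaction of the CD and NW flags summarized in Table 11-5 can be sketched as a decode of a CR0 value. This is an illustrative model only: reading CR0 requires privileged code, so the value is passed in by the caller, and the function name, the mode strings, and the `nw_is_dont_care` parameter are hypothetical. The bit positions (CD is bit 30, NW is bit 29) are those given in the list above.

```python
CR0_NW = 1 << 29  # NW flag, bit 29 of CR0
CR0_CD = 1 << 30  # CD flag, bit 30 of CR0

# Operating modes from Table 11-5, keyed by (CD, NW).
CACHE_MODES = {
    (0, 0): "normal cache mode",
    (0, 1): "invalid setting (#GP with error code 0)",
    (1, 0): "no-fill cache mode, memory coherency maintained",
    (1, 1): "memory coherency not maintained",
}


def cache_mode(cr0, nw_is_dont_care=False):
    """Decode the cache operating mode selected by a CR0 value.

    Set nw_is_dont_care=True to model processors on which NW is ignored
    when CD is set (the Pentium 4 and Intel Xeon case in the notes).
    """
    cd = (cr0 >> 30) & 1
    nw = (cr0 >> 29) & 1
    if cd and nw_is_dont_care:
        nw = 0  # CD = 1 selects no-fill mode regardless of NW
    return CACHE_MODES[(cd, nw)]
```

For example, `cache_mode(CR0_CD | CR0_NW)` reports the mode in which coherency is not maintained, while the same value with `nw_is_dont_care=True` decodes to no-fill cache mode. The sketch does not cover Intel Atom processors, on which setting CD disables caching.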
11.5.2 Precedence of Cache Controls
The cache control flags and MTRRs operate hierarchically for restricting caching. That
is, if the CD flag is set, caching is prevented globally (see Table 11-5). If the CD flag
is clear, the page-level cache control flags and/or the MTRRs can be used to restrict
caching. If there is an overlap of page-level and MTRR caching controls, the
mechanism that prevents caching has precedence. For example, if an MTRR makes a region
of system memory uncacheable, a page-level caching control cannot be used to
enable caching for a page in that region. The converse is also true; that is, if a
page-level caching control designates a page as uncacheable, an MTRR cannot be used to
make the page cacheable.
In cases where there is an overlap in the assignment of the write-back and
write-through caching policies to a page and a region of memory, the write-through policy
takes precedence. The write-combining policy (which can only be assigned through
an MTRR or the PAT) takes precedence over either write-through or write-back.
The selection of memory types at the page level varies depending on whether PAT is
being used to select memory types for pages, as described in the following sections.
On processors based on Intel NetBurst microarchitecture, the third-level cache can
be disabled by bit 6 of the IA32_MISC_ENABLE MSR. Using IA32_MISC_ENABLE[bit
6] takes precedence over the CD flag, MTRRs, and PAT for the L3 cache in those
processors. That is, when the third-level cache disable flag is set (cache disabled),
the other cache controls have no effect on the L3 cache; when the flag is clear
(enabled), the cache controls have the same effect on the L3 cache as they have on
the L1 and L2 caches.
IA32_MISC_ENABLE[bit 6] is not supported in Intel Core i7 processors, nor in
processors based on Intel Core and Intel Atom microarchitectures.
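The precedence of the third-level cache disable flag can be illustrated with a small check on an IA32_MISC_ENABLE value. This is a sketch only: the MSR can be read only by privileged code, so the value is supplied by the caller, and the function name and return strings are hypothetical. It applies only to processors that support the flag.

```python
L3_CACHE_DISABLE = 1 << 6  # third-level cache disable flag, bit 6


def l3_cache_control(misc_enable):
    """Report what governs the L3 cache for a given IA32_MISC_ENABLE value."""
    if misc_enable & L3_CACHE_DISABLE:
        # The flag takes precedence: CD, MTRRs and PAT have no effect on L3.
        return "L3 disabled; other cache controls have no effect on L3"
    return "L3 enabled; CD flag, MTRRs and PAT apply as for L1 and L2"
```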
11.5.2.1 Selecting Memory Types for Pentium Pro and Pentium II Processors
The Pentium Pro and Pentium II processors do not support the PAT. Here, the
effective memory type for a page is selected with the MTRRs and the PCD and PWT bits in
the page-table or page-directory entry for the page. Table 11-6 describes the
mapping of MTRR memory types and page-level caching attributes to effective
memory types, when normal caching is in effect (the CD and NW flags in control
register CR0 are clear). Combinations that appear in gray are
implementation-defined for the Pentium Pro and Pentium II processors. System designers are
encouraged to avoid these implementation-defined combinations.
When normal caching is in effect, the effective memory type shown in Table 11-6 is
determined using the following rules:
1. If the PCD and PWT attributes for the page are both 0, then the effective
   memory type is identical to the MTRR-defined memory type.
2. If the PCD flag is set, then the effective memory type is UC.
3. If the PCD flag is clear and the PWT flag is set, the effective memory type is WT
   for the WB memory type and the MTRR-defined memory type for all other
   memory types.
4. Setting the PCD and PWT flags to opposite values is considered model-specific for
   the WP and WC memory types and architecturally-defined for the WB, WT, and
   UC memory types.
Table 11-6. Effective Page-Level Memory Type for Pentium Pro and Pentium II Processors
MTRR Memory Type(1)   PCD Value   PWT Value   Effective Memory Type
UC                    X           X           UC
WC                    0           0           WC
                      0           1           WC (*)
                      1           0           WC (*)
                      1           1           UC
WT                    0           X           WT
                      1           X           UC
WP                    0           0           WP
                      0           1           WP (*)
                      1           0           WC (*)
                      1           1           UC
WB                    0           0           WB
                      0           1           WT
                      1           X           UC
NOTES:
1. These effective memory types also apply to the Pentium 4, Intel Xeon, and Pentium III
   processors when the PAT bit is not used (set to 0) in page-table and page-directory entries.
(*) Gray shading is not reproduced in this text. The marked rows are the WC and WP
   combinations with opposite PCD and PWT values, which rule 4 above identifies as
   model-specific.
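The mapping in Table 11-6 can be sketched as a small C function. This is an illustration only: the type and function names are hypothetical (they do not come from any Intel header), and the function assumes normal caching is in effect (CR0.CD = 0 and CR0.NW = 0). It follows the table rows as printed, including the model-specific rows for WC and WP.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative names; not taken from any Intel-provided header. */
typedef enum { MT_UC, MT_WC, MT_WT, MT_WP, MT_WB } mem_type;

/* Effective page-level memory type per Table 11-6 (PAT not used).
 * Assumes normal caching is in effect (CR0.CD = 0, CR0.NW = 0). */
mem_type effective_type_no_pat(mem_type mtrr, bool pcd, bool pwt)
{
    switch (mtrr) {
    case MT_UC:
        return MT_UC;                      /* UC regardless of PCD/PWT */
    case MT_WC:
        return (pcd && pwt) ? MT_UC : MT_WC;
    case MT_WT:
        return pcd ? MT_UC : MT_WT;
    case MT_WP:
        if (!pcd)
            return MT_WP;
        return pwt ? MT_UC : MT_WC;        /* PCD=1, PWT=0 is model-specific */
    case MT_WB:
        if (pcd)
            return MT_UC;
        return pwt ? MT_WT : MT_WB;
    }
    return MT_UC;                          /* defensive default */
}
```

For example, effective_type_no_pat(MT_WB, false, true) yields MT_WT, matching the WB row with PCD = 0 and PWT = 1.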
11.5.2.2 Selecting Memory Types for Pentium III and More Recent Processor Families
The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Intel Core Solo, Pentium M,
Pentium 4, Intel Xeon, and Pentium III processors use the PAT to select effective
page-level memory types. Here, a memory type for a page is selected by the MTRRs
and the value in a PAT entry that is selected with the PAT, PCD and PWT bits in a
page-table or page-directory entry (see Section 11.12.3, "Selecting a Memory Type
from the PAT"). Table 11-7 describes the mapping of MTRR memory types and PAT
entry types to effective memory types, when normal caching is in effect (the CD and
NW flags in control register CR0 are clear).
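The mapping in Table 11-7 is a pure function of the MTRR type and the PAT entry type, so it can be sketched as a two-dimensional lookup. The enum and function names below are hypothetical, chosen for illustration; the array contents transcribe Table 11-7 and assume normal caching is in effect.

```c
#include <assert.h>

/* Illustrative names; not part of any Intel-defined interface. */
typedef enum { MTRR_UC, MTRR_WC, MTRR_WT, MTRR_WB, MTRR_WP, MTRR_COUNT } mtrr_type;
typedef enum { PAT_UC, PAT_UC_MINUS, PAT_WC, PAT_WT, PAT_WB, PAT_WP, PAT_COUNT } pat_type;
typedef enum { EFF_UC, EFF_WC, EFF_WT, EFF_WB, EFF_WP } eff_type;

/* Rows: MTRR memory type. Columns: PAT entry value. Values: Table 11-7. */
static const eff_type table_11_7[MTRR_COUNT][PAT_COUNT] = {
    /*            UC      UC-     WC      WT      WB      WP    */
    /* UC */ { EFF_UC, EFF_UC, EFF_WC, EFF_UC, EFF_UC, EFF_UC },
    /* WC */ { EFF_UC, EFF_WC, EFF_WC, EFF_UC, EFF_WC, EFF_UC },
    /* WT */ { EFF_UC, EFF_UC, EFF_WC, EFF_WT, EFF_WT, EFF_WP },
    /* WB */ { EFF_UC, EFF_UC, EFF_WC, EFF_WT, EFF_WB, EFF_WP },
    /* WP */ { EFF_UC, EFF_WC, EFF_WC, EFF_WT, EFF_WP, EFF_WP },
};

/* Effective page-level memory type when the PAT is in use. */
eff_type effective_type_pat(mtrr_type mtrr, pat_type pat)
{
    return table_11_7[mtrr][pat];
}
```

Note how a PAT entry of WC yields WC for every MTRR type, which is consistent with the earlier statement that write combining takes precedence over write-through and write-back.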
Table 11-7. Effective Page-Level Memory Types for Pentium III and More Recent
Processor Families
MTRR Memory Type   PAT Entry Value   Effective Memory Type
UC                 UC                UC (1)
                   UC-               UC (1)
                   WC                WC
                   WT                UC (1)
                   WB                UC (1)
                   WP                UC (1)
WC                 UC                UC (2)
                   UC-               WC
                   WC                WC
                   WT                UC (2,3)
                   WB                WC
                   WP                UC (2,3)
WT                 UC                UC (2)
                   UC-               UC (2)
                   WC                WC
                   WT                WT
                   WB                WT
                   WP                WP (3)
WB                 UC                UC (2)
                   UC-               UC (2)
                   WC                WC
                   WT                WT
                   WB                WB
                   WP                WP
WP                 UC                UC (2)
                   UC-               WC (3)
                   WC                WC
                   WT                WT (3)
                   WB                WP
                   WP                WP
NOTES:
1. The UC attribute comes from the MTRRs and the processors are not required to snoop their
   caches since the data could never have been cached. This attribute is preferred for performance
   reasons.
2. The UC attribute came from the page-table or page-directory entry and processors are required
   to check their caches because the data may be cached due to page aliasing, which is not
   recommended.
3. These combinations were specified as "undefined" in previous editions of the Intel 64 and IA-32
   Architectures Software Developer's Manual. However, all processors that support both the PAT
   and the MTRRs determine the effective page-level memory types for these combinations as
   given.
11.5.2.3 Writing Values Across Pages with Different Memory Types
If two adjoining pages in memory have different memory types, and a word or longer
operand is written to a memory location that crosses the page boundary between
those two pages, the operand might be written to memory twice. This action does not
present a problem for writes to actual memory; however, if a device is mapped to the
memory space assigned to the pages, the device might malfunction.
11.5.3 Preventing Caching
To disable the L1, L2, and L3 caches after they have been enabled and have received
cache fills, perform the following steps:
1. Enter the no-fill cache mode. (Set the CD flag in control register CR0 to 1 and
   the NW flag to 0.)
2. Flush all caches using the WBINVD instruction.
3. Disable the MTRRs and set the default memory type to uncached or set all MTRRs
   for the uncached memory type (see the discussion of the TYPE
   field and the E flag in Section 11.11.2.1, "IA32_MTRR_DEF_TYPE MSR").
The caches must be flushed (step 2) after the CD flag is set to ensure system memory
coherency. If the caches are not flushed, cache hits on reads will still occur and data
will be read from valid cache lines.
The three separate steps listed above address three distinct requirements:
(i) discontinue new data replacing existing data in the cache, (ii) ensure data
already in the cache are evicted to memory, (iii) ensure subsequent memory
references observe UC memory type semantics. Different processor implementations of
caching control hardware may allow some variation in the software implementation of
these three requirements. See the notes below.
NOTES
Setting the CD flag in control register CR0 modifies the processor's
caching behaviour as indicated in Table 11-5, but setting the CD flag
alone may not be sufficient across all processor families to force the
effective memory type for all physical memory to be UC, nor does it
force strict memory ordering, due to hardware implementation
variations across different processor families. To force the UC
memory type and strict memory ordering on all of physical memory,
it is sufficient to either program the MTRRs for all physical memory to
be UC memory type or disable all MTRRs.
For the Pentium 4 and Intel Xeon processors, after the sequence of
steps given above has been executed, the cache lines containing the
code between the end of the WBINVD instruction and the point at which the
MTRRs have actually been disabled may be retained in the cache
hierarchy. Here, to remove code from the cache completely, a second
WBINVD instruction must be executed after the MTRRs have been
disabled.
For Intel Atom processors, setting the CD flag forces all physical
memory to observe UC semantics (without requiring the memory type of
physical memory to be set explicitly). Consequently, software does
not need to issue a second WBINVD as some other processor
generations might require.
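The register values used in steps 1 and 3 can be sketched as pure bit manipulations. This is a minimal illustration, not a complete cache-disable routine: the helper names are hypothetical, the CR0 bit positions (CD = bit 30, NW = bit 29) come from the control register description elsewhere in this manual, and the actual CR0 write, the WBINVD instruction, and the WRMSR to IA32_MTRR_DEF_TYPE require privileged code that is only indicated in comments.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_NW (UINT32_C(1) << 29)            /* Not Write-through flag */
#define CR0_CD (UINT32_C(1) << 30)            /* Cache Disable flag     */

#define MTRR_DEF_TYPE_E    (UINT64_C(1) << 11) /* E (MTRRs enabled) flag */
#define MTRR_DEF_TYPE_TYPE UINT64_C(0xFF)      /* Type field, bits 7:0   */

/* Step 1: CR0 value for no-fill cache mode (CD = 1, NW = 0). */
uint32_t cr0_enter_no_fill(uint32_t cr0)
{
    return (cr0 | CR0_CD) & ~CR0_NW;
}

/* Step 2 happens between these helpers: privileged code executes WBINVD. */

/* Step 3: IA32_MTRR_DEF_TYPE value with all MTRRs disabled (E = 0) and
 * the default type set to UC (encoding 00H). Other bits are preserved. */
uint64_t mtrr_def_type_all_uc(uint64_t def_type)
{
    return def_type & ~(MTRR_DEF_TYPE_E | MTRR_DEF_TYPE_TYPE);
}
```

On Pentium 4 and Intel Xeon processors, a second WBINVD would follow step 3, as the note above describes.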
11.5.4 Disabling and Enabling the L3 Cache
On processors based on Intel NetBurst microarchitecture, the third-level cache can
be disabled by bit 6 of the IA32_MISC_ENABLE MSR. The third-level cache disable
flag (bit 6 of the IA32_MISC_ENABLE MSR) allows the L3 cache to be disabled and
enabled, independently of the L1 and L2 caches. Prior to using this control to disable
or enable the L3 cache, software should disable and flush all the processor caches, as
described earlier in Section 11.5.3, "Preventing Caching," to prevent loss of
information stored in the L3 cache. After the L3 cache has been disabled or enabled,
caching for the whole processor can be restored.
Newer Intel 64 processors with an L3 cache do not support IA32_MISC_ENABLE[bit 6];
on these processors, the procedure described in Section 11.5.3, "Preventing Caching,"
applies to the entire cache hierarchy.
11.5.5 Cache Management Instructions
The Intel 64 and IA-32 architectures provide several instructions for managing the
L1, L2, and L3 caches. The INVD and WBINVD instructions are system
instructions that operate on the L1, L2, and L3 caches as a whole. The PREFETCHh
and CLFLUSH instructions and the non-temporal move instructions (MOVNTI,
MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD), which were introduced in
SSE/SSE2 extensions, offer more granular control over caching.
The INVD and WBINVD instructions are used to invalidate the contents of the L1, L2,
and L3 caches. The INVD instruction invalidates all internal cache entries, then
generates a special-function bus cycle that indicates that external caches also should
be invalidated. The INVD instruction should be used with care. It does not force a
write-back of modified cache lines; therefore, data stored in the caches and not
written back to system memory will be lost. Unless there is a specific requirement or
benefit to invalidating the caches without writing back the modified lines (such as
during testing or fault recovery where cache coherency with main memory is not a
concern), software should use the WBINVD instruction.
The WBINVD instruction first writes back any modified lines in all the internal caches,
then invalidates the contents of the L1, L2, and L3 caches. It ensures that cache
coherency with main memory is maintained regardless of the write policy in effect
(that is, write-through or write-back). Following this operation, the WBINVD
instruction generates one (P6 family processors) or two (Pentium and Intel486 processors)
special-function bus cycles to indicate to external cache controllers that write-back of
modified data followed by invalidation of external caches should occur. The amount of
time or cycles for WBINVD to complete will vary due to the size of different cache
hierarchies and other factors. As a consequence, the use of the WBINVD instruction
can have an impact on interrupt/event response time.
The PREFETCHh instructions allow a program to suggest to the processor that a
cache line from a specified location in system memory be prefetched into the cache
hierarchy (see Section 11.8, "Explicit Caching").
The CLFLUSH instruction allows selected cache lines to be flushed from the cache
hierarchy. This instruction gives a program the ability to explicitly free up cache space,
when it is known that a cached section of system memory will not be accessed in the near
future.
The non-temporal move instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and
MOVNTPD) allow data to be moved from the processor's registers directly into
system memory without being also written into the L1, L2, and/or L3 caches. These
instructions can be used to prevent cache pollution when operating on data that is
going to be modified only once before being stored back into system memory. These
instructions operate on data in the general-purpose, MMX, and XMM registers.
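From application code, the non-temporal move and CLFLUSH instructions are usually reached through compiler intrinsics. The sketch below assumes a compiler that provides the SSE2 intrinsics in <emmintrin.h> (where _mm_stream_si32 corresponds to MOVNTI and _mm_clflush to CLFLUSH); the function names fill_streaming and evict_line are hypothetical. When SSE2 is not available at compile time, the code falls back to ordinary stores so that it still builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si32, _mm_clflush, _mm_sfence */
#endif

/* Fill a buffer that will not be read again soon, using non-temporal
 * stores (MOVNTI) so the data does not displace useful cache lines. */
void fill_streaming(int32_t *dst, size_t n, int32_t value)
{
    for (size_t i = 0; i < n; i++) {
#if defined(__SSE2__)
        _mm_stream_si32((int *)&dst[i], value);
#else
        dst[i] = value;          /* fallback: ordinary cached store */
#endif
    }
#if defined(__SSE2__)
    _mm_sfence();                /* order the streaming stores before later stores */
#endif
}

/* Flush the cache line containing p from the cache hierarchy (CLFLUSH). */
void evict_line(const void *p)
{
#if defined(__SSE2__)
    _mm_clflush(p);
#else
    (void)p;                     /* no portable equivalent; do nothing */
#endif
}
```

Neither helper changes the values the program observes; they only influence where the data resides in the cache hierarchy.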
11.5.6 L1 Data Cache Context Mode
L1 data cache context mode is a feature of processors based on the Intel NetBurst
microarchitecture that support Intel Hyper-Threading Technology. When
CPUID.1:ECX[bit 10] = 1, the processor supports setting L1 data cache context
mode using the L1 data cache context mode flag (IA32_MISC_ENABLE[bit 24]).
Selectable modes are adaptive mode (default) and shared mode.
The BIOS is responsible for configuring the L1 data cache context mode.
11.5.6.1 Adaptive Mode
Adaptive mode facilitates L1 data cache sharing between logical processors. When
running in adaptive mode, the L1 data cache is shared across logical processors in
the same core if:
• CR3 control registers for logical processors sharing the cache are identical.
• The same paging mode is used by logical processors sharing the cache.
In this situation, the entire L1 data cache is available to each logical processor
(instead of being competitively shared).
If CR3 values are different for the logical processors sharing an L1 data cache or the
logical processors use different paging modes, processors compete for cache
resources. This reduces the effective size of the cache for each logical processor.
Aliasing of the cache is not allowed (which prevents data thrashing).
11.5.6.2 Shared Mode
In shared mode, the L1 data cache is competitively shared between logical
processors. This is true even if the logical processors use identical CR3 registers and paging
modes.
In shared mode, linear addresses in the L1 data cache can be aliased, meaning that
one linear address in the cache can point to different physical locations. The
mechanism for resolving aliasing can lead to thrashing. For this reason,
IA32_MISC_ENABLE[bit 24] = 0 is the preferred configuration for processors based
on the Intel NetBurst microarchitecture that support Intel Hyper-Threading
Technology.
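The feature check and the mode selection described in this section reduce to two single-bit tests plus the adaptive-mode sharing condition. The sketch below uses hypothetical names and takes the register values as arguments (reading CPUID and the MSR is left to the caller). It assumes that bit 24 = 1 selects shared mode, which is inferred from the statement that bit 24 = 0 (adaptive, the default) is the preferred configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { L1_MODE_ADAPTIVE, L1_MODE_SHARED } l1_context_mode;

/* CPUID.1:ECX[bit 10] = 1 means the context mode flag can be set. */
bool l1_context_mode_supported(uint32_t cpuid1_ecx)
{
    return ((cpuid1_ecx >> 10) & 1u) != 0;
}

/* IA32_MISC_ENABLE[bit 24]: 0 = adaptive (default); 1 assumed to be shared. */
l1_context_mode l1_context_mode_from_msr(uint64_t misc_enable)
{
    return ((misc_enable >> 24) & 1u) ? L1_MODE_SHARED : L1_MODE_ADAPTIVE;
}

/* True when the entire L1 data cache is available to each logical processor:
 * adaptive mode, identical CR3 values, and the same paging mode.
 * The paging-mode arguments are opaque identifiers chosen by the caller. */
bool l1_fully_available(l1_context_mode mode,
                        uint64_t cr3_a, uint64_t cr3_b,
                        int paging_mode_a, int paging_mode_b)
{
    return mode == L1_MODE_ADAPTIVE &&
           cr3_a == cr3_b &&
           paging_mode_a == paging_mode_b;
}
```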
11.6 SELF-MODIFYING CODE
A write to a memory location in a code segment that is currently cached in the
processor causes the associated cache line (or lines) to be invalidated. This check is
based on the physical address of the instruction. In addition, the P6 family and
Pentium processors check whether a write to a code segment may modify an
instruction that has been prefetched for execution. If the write affects a prefetched
instruction, the prefetch queue is invalidated. This latter check is based on the linear
address of the instruction. For the Pentium 4 and Intel Xeon processors, a write or a
snoop of an instruction in a code segment, where the target instruction is already
decoded and resident in the trace cache, invalidates the entire trace cache. The latter
behavior means that programs that self-modify code can cause severe degradation
of performance when run on the Pentium 4 and Intel Xeon processors.
In practice, the check on linear addresses should not create compatibility problems
among IA-32 processors. Applications that include self-modifying code use the same
linear address for modifying and fetching the instruction. Systems software, such as
a debugger, that might possibly modify an instruction using a different linear address
than that used to fetch the instruction, will execute a serializing operation, such as a
CPUID instruction, before the modified instruction is executed, which will
automatically resynchronize the instruction cache and prefetch queue. (See Section 8.1.3,
"Handling Self- and Cross-Modifying Code," for more information about the use of
self-modifying code.)
For Intel486 processors, a write to an instruction in the cache will modify it in both
the cache and memory, but if the instruction was prefetched before the write, the old
version of the instruction could be the one executed. To prevent the old instruction
from being executed, flush the instruction prefetch unit by coding a jump instruction
immediately after any write that modifies an instruction.
11.7 IMPLICIT CACHING (PENTIUM 4, INTEL XEON,
AND P6 FAMILY PROCESSORS)
Implicit caching occurs when a memory element is made potentially cacheable,
although the element may never have been accessed in the normal von Neumann
sequence. Implicit caching occurs on the P6 and more recent processor families due
to aggressive prefetching, branch prediction, and TLB miss handling. Implicit caching
is an extension of the behavior of existing Intel386, Intel486, and Pentium processor
systems, since software running on these processor families also has not been able
to deterministically predict the behavior of instruction prefetch.
To avoid problems related to implicit caching, the operating system must explicitly
invalidate the cache when changes are made to cacheable data that the cache
coherency mechanism does not automatically handle. This includes writes to dual-ported
or physically aliased memory boards that are not detected by the snooping
mechanisms of the processor, and changes to page-table entries in memory.
The code in Example 11-1 shows the effect of implicit caching on page-table entries.
The linear address F000H points to physical location B000H (the page-table entry for
F000H contains the value B000H), and the page-table entry for linear address F000H
is PTE_F000.
Example 11-1. Effect of Implicit Caching on Page-Table Entries
    mov EAX, CR3          ; Invalidate the TLB
    mov CR3, EAX          ; by copying CR3 to itself
    mov PTE_F000, A000H   ; Change F000H to point to A000H
    mov EBX, [F000H]      ;
Because of speculative execution in the P6 and more recent processor families, the
last MOV instruction performed would place the value at physical location B000H into
EBX, rather than the value at the new physical address A000H. This situation is
remedied by placing a TLB invalidation between the store that changes the page-table
entry and the load that follows it.
11.8 EXPLICIT CACHING
The Pentium III processor introduced four new instructions, the PREFETCHh
instructions, that provide software with explicit control over the caching of data. These
instructions provide "hints" to the processor that the data requested by a PREFETCHh
instruction should be read into the cache hierarchy now or as soon as possible, in
anticipation of its use. The instructions provide different variations of the hint that allow
selection of the cache level into which data will be read.
The PREFETCHh instructions can help reduce the long latency typically associated
with reading data from memory and thus help prevent processor "stalls." However,
these instructions should be used judiciously. Overuse can lead to resource conflicts
and hence reduce the performance of an application. Also, these instructions should
only be used to prefetch data from memory; they should not be used to prefetch
instructions. For more detailed information on the proper use of the prefetch
instruction, refer to Chapter 7, "Optimizing Cache Usage," in the Intel 64 and IA-32
Architectures Optimization Reference Manual.
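A data prefetch hint can be issued from C without writing assembly. The sketch below assumes GCC or Clang, whose __builtin_prefetch(addr, rw, locality) builtin is compiled to a PREFETCHh instruction on processors that support it; the function name and the PREFETCH_DISTANCE value are hypothetical and would need tuning for a real workload. The hint never changes the computed result, only (possibly) its latency.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical look-ahead distance, in elements; tune per workload. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting that data a few iterations ahead
 * should be brought into the cache hierarchy. */
long sum_with_prefetch(const int *data, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* rw = 0 (read), locality = 3 (keep in all cache levels) */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        sum += data[i];
    }
    return sum;
}
```

A simple sequential scan like this one is typically handled well by hardware prefetchers, so the example illustrates the mechanism rather than a guaranteed speedup; as the text above cautions, overuse can reduce performance.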
11.9 INVALIDATING THE TRANSLATION LOOKASIDE
BUFFERS (TLBS)
The processor updates its address translation caches (TLBs) transparently to
software. Several mechanisms are available, however, that allow software and hardware
to invalidate the TLBs either explicitly or as a side effect of another operation. Most
details are given in Section 4.10.4, "Invalidation of TLBs and Paging-Structure
Caches." In addition, the following operations invalidate all TLB entries, irrespective
of the setting of the G flag:
• Asserting or de-asserting the FLUSH# pin.
• (Pentium 4, Intel Xeon, and later processors only.) Writing to an MTRR (with a
  WRMSR instruction).
• Writing to control register CR0 to modify the PG or PE flag.
• (Pentium 4, Intel Xeon, and later processors only.) Writing to control register CR4
  to modify the PSE, PGE, or PAE flag.
• Writing to control register CR4 to change the PCIDE flag from 1 to 0.
See Section 4.10, "Caching Translation Information," for additional information about
the TLBs.
11.10 STORE BUFFER
Intel 64 and IA-32 processors temporarily store each write (store) to memory in a
store buffer. The store buffer improves processor performance by allowing the
processor to continue executing instructions without having to wait until a write to
memory and/or to a cache is complete. It also allows writes to be delayed for more
efficient use of memory-access bus cycles.
In general, the existence of the store buffer is transparent to software, even in
systems that use multiple processors. The processor ensures that write operations
are always carried out in program order. It also ensures that the contents of the store
buffer are always drained to memory in the following situations:
• When an exception or interrupt is generated.
• (P6 and more recent processor families only) When a serializing instruction is
  executed.
• When an I/O instruction is executed.
• When a LOCK operation is performed.
• (P6 and more recent processor families only) When a BINIT operation is
  performed.
• (Pentium III, and more recent processor families only) When using an SFENCE
  instruction to order stores.
• (Pentium 4 and more recent processor families only) When using an MFENCE
  instruction to order stores.
The discussion of write ordering in Section 8.2, "Memory Ordering," gives a detailed
description of the operation of the store buffer.
11.11 MEMORY TYPE RANGE REGISTERS (MTRRS)
The following section pertains only to the P6 and more recent processor families.
The memory type range registers (MTRRs) provide a mechanism for associating the
memory types (see Section 11.3, "Methods of Caching Available") with
physical-address ranges in system memory. They allow the processor to optimize operations
for different types of memory such as RAM, ROM, frame-buffer memory, and
memory-mapped I/O devices. They also simplify system hardware design by
eliminating the memory control pins used for this function on earlier IA-32 processors and
the external logic needed to drive them.
The MTRR mechanism allows up to 96 memory ranges to be defined in physical
memory, and it defines a set of model-specific registers (MSRs) for specifying the
type of memory that is contained in each range. Table 11-8 shows the memory types
that can be specified and their properties; Figure 11-4 shows the mapping of physical
memory with MTRRs. See Section 11.3, "Methods of Caching Available," for a more
detailed description of each memory type.
Following a hardware reset, the P6 and more recent processor families disable all the
fixed and variable MTRRs, which in effect makes all of physical memory uncacheable.
Initialization software should then set the MTRRs to a specific, system-defined
memory map. Typically, the BIOS (basic input/output system) software configures
the MTRRs. The operating system or executive is then free to modify the memory
map using the normal page-level cacheability attributes.
In a multiprocessor system using a processor in the P6 family or a more recent
family, each processor MUST use the identical MTRR memory map so that software
will have a consistent view of memory.
NOTE
In multiple processor systems, the operating system must maintain
MTRR consistency between all the processors in the system (that is,
all processors must use the same MTRR values). The P6 and more
recent processor families provide no hardware support for
maintaining this consistency.
Table 11-8. Memory Types That Can Be Encoded in MTRRs
Memory Type and Mnemonic   Encoding in MTRR
Uncacheable (UC)           00H
Write Combining (WC)       01H
Reserved*                  02H
Reserved*                  03H
Write-through (WT)         04H
Write-protected (WP)       05H
Writeback (WB)             06H
Reserved*                  7H through FFH
NOTE:
* Use of these encodings results in a general-protection exception (#GP).
Figure 11-4. Mapping Physical Memory With MTRRs
[Figure: physical memory map, from address 0 upward]
Address Range          Size         MTRR Coverage
0H - 7FFFFH            512 KBytes   8 fixed ranges (64 KBytes each)
80000H - BFFFFH        256 KBytes   16 fixed ranges (16 KBytes each)
C0000H - FFFFFH        256 KBytes   64 fixed ranges (4 KBytes each)
100000H - FFFFFFFFH                 Variable ranges (from 4 KBytes to maximum size of
                                    physical memory)
Address ranges not mapped by an MTRR are set to a default type.
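The encodings in Table 11-8 can be sketched as a small decoder that software might use before writing a type value into an MTRR. The function names are hypothetical; the encodings are those listed in the table, and every other value is treated as reserved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mnemonic for an MTRR memory type encoding (Table 11-8),
 * or NULL for a reserved encoding. */
const char *mtrr_type_name(uint8_t encoding)
{
    switch (encoding) {
    case 0x00: return "UC";   /* Uncacheable     */
    case 0x01: return "WC";   /* Write Combining */
    case 0x04: return "WT";   /* Write-through   */
    case 0x05: return "WP";   /* Write-protected */
    case 0x06: return "WB";   /* Writeback       */
    default:   return NULL;   /* 02H, 03H, 07H through FFH are reserved */
    }
}

/* Writing a reserved encoding results in #GP, so check first. */
bool mtrr_type_is_valid(uint8_t encoding)
{
    return mtrr_type_name(encoding) != NULL;
}
```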
11.11.1 MTRR Feature Identification
The availability of the MTRR feature is model-specific. Software can determine if
MTRRs are supported on a processor by executing the CPUID instruction and reading
the state of the MTRR flag (bit 12) in the feature information register (EDX).
If the MTRR flag is set (indicating that the processor implements MTRRs), additional
information about MTRRs can be obtained from the 64-bit IA32_MTRRCAP MSR
(named MTRRcap MSR for the P6 family processors). The IA32_MTRRCAP MSR is a
read-only MSR that can be read with the RDMSR instruction. Figure 11-5 shows the
contents of the IA32_MTRRCAP MSR. The functions of the flags and field in this
register are as follows:
• VCNT (variable range registers count) field, bits 0 through 7: Indicates
  the number of variable ranges implemented on the processor.
• FIX (fixed range registers supported) flag, bit 8: Fixed range MTRRs
  (IA32_MTRR_FIX64K_00000 through IA32_MTRR_FIX4K_0F8000) are
  supported when set; no fixed range registers are supported when clear.
• WC (write combining) flag, bit 10: The write-combining (WC) memory type
  is supported when set; the WC type is not supported when clear.
• SMRR (System-Management Range Register) flag, bit 11: The
  system-management range register (SMRR) interface is supported when bit 11 is set; the
  SMRR interface is not supported when clear.
Bit 9 and bits 12 through 63 in the IA32_MTRRCAP MSR are reserved. If software
attempts to write to the IA32_MTRRCAP MSR, a general-protection exception (#GP)
is generated.
Software must read the IA32_MTRRCAP VCNT field to determine the number of variable
MTRRs and query other feature bits in IA32_MTRRCAP to determine additional
capabilities that are supported in a processor. For example, some processors may report
a value of 8 in the VCNT field; other processors may report VCNT with different
values.
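Decoding the feature information described above is a matter of extracting a few bit fields. In the sketch below the structure and function names are hypothetical, and the raw CPUID.01H:EDX and IA32_MTRRCAP values are passed in as arguments, since reading them requires the CPUID and (privileged) RDMSR instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of IA32_MTRRCAP; field names follow the text above. */
typedef struct {
    unsigned vcnt;   /* bits 7:0, number of variable ranges          */
    bool     fix;    /* bit 8, fixed range MTRRs supported           */
    bool     wc;     /* bit 10, write-combining type supported       */
    bool     smrr;   /* bit 11, SMRR interface supported             */
} mtrr_cap;

/* MTRR flag: bit 12 of the feature information register (EDX). */
bool cpuid_has_mtrr(uint32_t cpuid1_edx)
{
    return ((cpuid1_edx >> 12) & 1u) != 0;
}

mtrr_cap decode_mtrrcap(uint64_t msr_value)
{
    mtrr_cap cap;
    cap.vcnt = (unsigned)(msr_value & 0xFF);
    cap.fix  = ((msr_value >> 8) & 1u) != 0;
    cap.wc   = ((msr_value >> 10) & 1u) != 0;
    cap.smrr = ((msr_value >> 11) & 1u) != 0;
    return cap;
}
```

For instance, a raw value of 0D08H decodes to VCNT = 8 with the FIX, WC, and SMRR flags all set; this value is used only as an illustration and is not claimed for any particular processor.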
Figure 11-5. IA32_MTRRCAP Register
[Figure: register bit layout]
Bits 63:12   Reserved
Bit 11       SMRR: SMRR interface supported
Bit 10       WC: Write-combining memory type supported
Bit 9        Reserved
Bit 8        FIX: Fixed range registers supported
Bits 7:0     VCNT: Number of variable range registers
11.11.2 Setting Memory Ranges with MTRRs
The memory ranges and the types of memory specified in each range are set by three
groups of registers: the IA32_MTRR_DEF_TYPE MSR, the fixed-range MTRRs, and
the variable range MTRRs. These registers can be read and written to using the
RDMSR and WRMSR instructions, respectively. The IA32_MTRRCAP MSR indicates
the availability of these registers on the processor (see Section 11.11.1, "MTRR
Feature Identification").
11.11.2.1 IA32_MTRR_DEF_TYPE MSR
The IA32_MTRR_DEF_TYPE MSR (named MTRRdefType MSR for the P6 family
processors) sets the default properties of the regions of physical memory that are not
encompassed by MTRRs. Figure 11-6 shows the contents of this register. The functions
of the flags and field in this register are as follows:
• Type field, bits 0 through 7: Indicates the default memory type used for
  those physical memory address ranges that do not have a memory type specified
  for them by an MTRR (see Table 11-8 for the encoding of this field). The legal
  values for this field are 0, 1, 4, 5, and 6. All other values result in a
  general-protection exception (#GP) being generated.
  Intel recommends the use of the UC (uncached) memory type for all physical
  memory addresses where memory does not exist. To assign the UC type to
  nonexistent memory locations, it can either be specified as the default type in the
  Type field or be explicitly assigned with the fixed and variable MTRRs.
• FE (fixed MTRRs enabled) flag, bit 10: Fixed-range MTRRs are enabled
  when set; fixed-range MTRRs are disabled when clear. When the fixed-range
  MTRRs are enabled, they take priority over the variable-range MTRRs when
  overlaps in ranges occur. If the fixed-range MTRRs are disabled, the
  variable-range MTRRs can still be used and can map the range ordinarily covered by the
  fixed-range MTRRs.
• E (MTRRs enabled) flag, bit 11: MTRRs are enabled when set; all MTRRs are
  disabled when clear, and the UC memory type is applied to all of physical
  memory. When this flag is set, the FE flag can disable the fixed-range MTRRs;
  when the flag is clear, the FE flag has no effect. When the E flag is set, the type
  specified in the default memory type field is used for areas of memory not
  already mapped by either a fixed or variable MTRR.
Bits 8 and 9, and bits 12 through 63, in the IA32_MTRR_DEF_TYPE MSR are
reserved; the processor generates a general-protection exception (#GP) if software
attempts to write nonzero values to them.
Figure 11-6. IA32_MTRR_DEF_TYPE MSR
[Figure: register bit layout]
Bits 63:12   Reserved
Bit 11       E: MTRR enable/disable
Bit 10       FE: Fixed-range MTRRs enable/disable
Bits 9:8     Reserved
Bits 7:0     Type: Default memory type
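The constraints on IA32_MTRR_DEF_TYPE (legal Type values, reserved bits that must be written as zero) can be checked in software before issuing the WRMSR. The helpers below are a sketch with hypothetical names; they only compute and validate the 64-bit value and do not access the MSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEF_TYPE_FE        (UINT64_C(1) << 10)  /* fixed-range MTRRs enabled */
#define DEF_TYPE_E         (UINT64_C(1) << 11)  /* MTRRs enabled             */
#define DEF_TYPE_TYPE_MASK UINT64_C(0xFF)       /* Type field, bits 7:0      */
/* Reserved: bits 8, 9, and 12 through 63. */
#define DEF_TYPE_RESERVED  (~(DEF_TYPE_TYPE_MASK | DEF_TYPE_FE | DEF_TYPE_E))

static bool def_type_is_legal(uint64_t type)
{
    return type == 0 || type == 1 || type == 4 || type == 5 || type == 6;
}

/* Would writing this value avoid #GP according to the rules above? */
bool mtrr_def_type_write_ok(uint64_t value)
{
    return (value & DEF_TYPE_RESERVED) == 0 &&
           def_type_is_legal(value & DEF_TYPE_TYPE_MASK);
}

/* Compose a value; returns false (leaving *out untouched) for an illegal type. */
bool mtrr_def_type_make(uint8_t type, bool fe, bool e, uint64_t *out)
{
    if (!def_type_is_legal(type))
        return false;
    *out = (uint64_t)type | (fe ? DEF_TYPE_FE : 0) | (e ? DEF_TYPE_E : 0);
    return true;
}
```

For example, a default type of WB (06H) with both FE and E set composes to 0C06H.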
11.11.2.2 Fixed Range MTRRs
The fixed memory ranges are mapped with 11 fixed-range registers of 64 bits each.
Each of these registers is divided into 8-bit fields that are used to specify the memory
type for each of the sub-ranges the register controls:
• Register IA32_MTRR_FIX64K_00000: Maps the 512-KByte address range
  from 0H to 7FFFFH. This range is divided into eight 64-KByte sub-ranges.
• Registers IA32_MTRR_FIX16K_80000 and IA32_MTRR_FIX16K_A0000:
  Map the two 128-KByte address ranges from 80000H to BFFFFH. This range
  is divided into sixteen 16-KByte sub-ranges, 8 ranges per register.
• Registers IA32_MTRR_FIX4K_C0000 through IA32_MTRR_FIX4K_F8000:
  Map eight 32-KByte address ranges from C0000H to FFFFFH. This range is
  divided into sixty-four 4-KByte sub-ranges, 8 ranges per register.
Table 11-9 shows the relationship between the fixed physical-address ranges and the
corresponding fields of the fixed-range MTRRs; Table 11-8 shows memory type
encoding for MTRRs.
For the P6 family processors, the prefix for the fixed range MTRRs is MTRRfix.
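Locating the fixed-range field that covers a given physical address below 1 MByte follows directly from the sub-range sizes above. In this sketch the structure and function names are hypothetical; registers are identified by an index 0 through 10 in the order listed above, and field 0 is the least-significant byte of the register (covering the lowest addresses, as in Table 11-9).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static const char *const fixed_mtrr_names[11] = {
    "IA32_MTRR_FIX64K_00000",
    "IA32_MTRR_FIX16K_80000", "IA32_MTRR_FIX16K_A0000",
    "IA32_MTRR_FIX4K_C0000",  "IA32_MTRR_FIX4K_C8000",
    "IA32_MTRR_FIX4K_D0000",  "IA32_MTRR_FIX4K_D8000",
    "IA32_MTRR_FIX4K_E0000",  "IA32_MTRR_FIX4K_E8000",
    "IA32_MTRR_FIX4K_F0000",  "IA32_MTRR_FIX4K_F8000",
};

typedef struct {
    int reg;    /* index into fixed_mtrr_names            */
    int field;  /* 8-bit field number, 0 = bits 7:0       */
} fixed_mtrr_slot;

/* Returns false if pa is at or above 100000H (not a fixed range). */
bool fixed_mtrr_lookup(uint64_t pa, fixed_mtrr_slot *slot)
{
    if (pa < 0x80000) {                     /* eight 64-KByte sub-ranges */
        slot->reg = 0;
        slot->field = (int)(pa >> 16);
    } else if (pa < 0xC0000) {              /* sixteen 16-KByte sub-ranges */
        uint64_t off = pa - 0x80000;
        slot->reg = 1 + (int)(off >> 17);   /* 128 KBytes per register */
        slot->field = (int)((off >> 14) & 7);
    } else if (pa < 0x100000) {             /* sixty-four 4-KByte sub-ranges */
        uint64_t off = pa - 0xC0000;
        slot->reg = 3 + (int)(off >> 15);   /* 32 KBytes per register */
        slot->field = (int)((off >> 12) & 7);
    } else {
        return false;
    }
    return true;
}

/* Extract the memory type encoding of one field from a register value. */
uint8_t fixed_mtrr_field_type(uint64_t reg_value, int field)
{
    return (uint8_t)(reg_value >> (8 * field));
}
```

As an example, address 98000H falls in field 6 (bits 55:48) of IA32_MTRR_FIX16K_80000.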
11.11.2.3 Variable Range MTRRs
The Pentium 4, Intel Xeon, and P6 family processors permit software to specify the
memory type for m variable-size address ranges, using a pair of MTRRs for each
range. The number m of ranges supported is given in bits 7:0 of the IA32_MTRRCAP
MSR (see Figure 11-5 in Section 11.11.1).
The first entry in each pair (IA32_MTRR_PHYSBASEn) defines the base address and
memory type for the range; the second entry (IA32_MTRR_PHYSMASKn) contains a
mask used to determine the address range. The "n" suffix is in the range 0 through
m-1 and identifies a specific register pair.
For P6 family processors, the prefixes for these variable range MTRRs are
MTRRphysBase and MTRRphysMask.
Vol. 3 11-35
MEMORY CACHE CONTROL
Figure 11- 7 shows flags and fields in t hese regist ers. The funct ions of t hese flags and
fields are:
Ty pe f i el d, bi t s 0 t hr ough 7 Specifies t he memory t ype for t he range ( see
Table 11- 8 for t he encoding of t his field) .
PhysBase f i el d, bi t s 12 t hr ough ( MAXPHYADDR- 1) Specifies t he base
address of t he address range. This 24- bit value, in t he case where MAXPHYADDR
is 36 bit s, is ext ended by 12 bit s at t he low end t o form t he base address ( t his
aut omat ically aligns t he address on a 4- KByt e boundary) .
Phy sMask f i el d, bi t s 12 t hr ough ( MAXPHYADDR- 1) Specifies a mask ( 24
bit s if t he maximum physical address size is 36 bit s, 28 bit s if t he maximum
physical address size is 40 bit s) . The mask det ermines t he range of t he region
being mapped, according t o t he following relat ionships:
Address_Wit hin_Range AND PhysMask = PhysBase AND PhysMask
This value is ext ended by 12 bit s at t he low end t o form t he mask value. For
more informat ion: see Sect ion 11. 11. 3, Example Base and Mask Calcula-
t ions.
Table 11-9. Address Mapping for Fixed-Range MTRRs
Address Range (hexadecimal), listed under the register bits that control it
63-56        55-48        47-40        39-32        31-24        23-16        15-8         7-0          MTRR
70000-7FFFF  60000-6FFFF  50000-5FFFF  40000-4FFFF  30000-3FFFF  20000-2FFFF  10000-1FFFF  00000-0FFFF  IA32_MTRR_FIX64K_00000
9C000-9FFFF  98000-9BFFF  94000-97FFF  90000-93FFF  8C000-8FFFF  88000-8BFFF  84000-87FFF  80000-83FFF  IA32_MTRR_FIX16K_80000
BC000-BFFFF  B8000-BBFFF  B4000-B7FFF  B0000-B3FFF  AC000-AFFFF  A8000-ABFFF  A4000-A7FFF  A0000-A3FFF  IA32_MTRR_FIX16K_A0000
C7000-C7FFF  C6000-C6FFF  C5000-C5FFF  C4000-C4FFF  C3000-C3FFF  C2000-C2FFF  C1000-C1FFF  C0000-C0FFF  IA32_MTRR_FIX4K_C0000
CF000-CFFFF  CE000-CEFFF  CD000-CDFFF  CC000-CCFFF  CB000-CBFFF  CA000-CAFFF  C9000-C9FFF  C8000-C8FFF  IA32_MTRR_FIX4K_C8000
D7000-D7FFF  D6000-D6FFF  D5000-D5FFF  D4000-D4FFF  D3000-D3FFF  D2000-D2FFF  D1000-D1FFF  D0000-D0FFF  IA32_MTRR_FIX4K_D0000
DF000-DFFFF  DE000-DEFFF  DD000-DDFFF  DC000-DCFFF  DB000-DBFFF  DA000-DAFFF  D9000-D9FFF  D8000-D8FFF  IA32_MTRR_FIX4K_D8000
E7000-E7FFF  E6000-E6FFF  E5000-E5FFF  E4000-E4FFF  E3000-E3FFF  E2000-E2FFF  E1000-E1FFF  E0000-E0FFF  IA32_MTRR_FIX4K_E0000
EF000-EFFFF  EE000-EEFFF  ED000-EDFFF  EC000-ECFFF  EB000-EBFFF  EA000-EAFFF  E9000-E9FFF  E8000-E8FFF  IA32_MTRR_FIX4K_E8000
F7000-F7FFF  F6000-F6FFF  F5000-F5FFF  F4000-F4FFF  F3000-F3FFF  F2000-F2FFF  F1000-F1FFF  F0000-F0FFF  IA32_MTRR_FIX4K_F0000
FF000-FFFFF  FE000-FEFFF  FD000-FDFFF  FC000-FCFFF  FB000-FBFFF  FA000-FAFFF  F9000-F9FFF  F8000-F8FFF  IA32_MTRR_FIX4K_F8000
The width of the PhysMask field depends on the maximum physical address
size supported by the processor.
CPUID.80000008H reports the maximum physical address size supported by
the processor. If CPUID.80000008H is not available, software may assume
that the processor supports a 36-bit physical address size (then PhysMask is
24 bits wide and the upper 28 bits of IA32_MTRR_PHYSMASKn are reserved).
See the Note below.
• V (valid) flag, bit 11: Enables the register pair when set; disables the register
pair when clear.
All other bits in the IA32_MTRR_PHYSBASEn and IA32_MTRR_PHYSMASKn registers
are reserved; the processor generates a general-protection exception (#GP) if
software attempts to write to them.
Some mask values can result in ranges that are not continuous. In such ranges, the
area not mapped by the mask value is set to the default memory type. Intel does not
encourage the use of discontinuous ranges because they could require physical
memory to be present throughout the entire 4-GByte physical memory map. If
memory is not provided, the behavior is undefined.
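The mask relationship, and the way a mask with a hole produces a discontinuous range, can be illustrated with a short sketch. This is a software model only, not processor behavior; the function name in_variable_range and the sample mask FFFD00H are assumptions chosen for illustration, and a 36-bit physical address size is assumed.

```python
PAGE_SHIFT = 12  # PhysBase and PhysMask hold address bits 12 and up


def in_variable_range(addr, phys_base, phys_mask):
    """Return True when (addr AND PhysMask) equals (PhysBase AND PhysMask).

    phys_base and phys_mask are the register field values, that is,
    the address and mask with the 12 low-order bits removed.
    """
    base = phys_base << PAGE_SHIFT
    mask = phys_mask << PAGE_SHIFT
    return (addr & mask) == (base & mask)


# Contiguous mask: FFFE00H compares address bit 21 and above, so exactly
# the range 200000H through 3FFFFFH matches a base field of 000200H.
contiguous = [in_variable_range(a, 0x000200, 0xFFFE00)
              for a in (0x1FFFFF, 0x200000, 0x3FFFFF, 0x400000)]

# Mask with a hole: FFFD00H ignores address bit 21 but compares bit 20,
# so two separate 1-MByte pieces (000000H-0FFFFFH and 200000H-2FFFFFH)
# match a base field of 0. This is a discontinuous range.
discontinuous = [in_variable_range(a, 0x000000, 0xFFFD00)
                 for a in (0x050000, 0x150000, 0x250000, 0x350000)]
```

A mask whose set bits are not all contiguous from the top of the physical address down is what the text above calls a discontinuous range.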
Figure 11-7. IA32_MTRR_PHYSBASEn and IA32_MTRR_PHYSMASKn Variable-Range
Register Pair
[Figure: bit layout of the register pair.
IA32_MTRR_PHYSBASEn Register: bits 7:0, Type (memory type for range); bits 11:8,
reserved; bits (MAXPHYADDR-1):12, PhysBase (base address of range); bits
63:MAXPHYADDR, reserved.
IA32_MTRR_PHYSMASKn Register: bits 10:0, reserved; bit 11, V (valid); bits
(MAXPHYADDR-1):12, PhysMask (sets range mask); bits 63:MAXPHYADDR, reserved.]
MAXPHYADDR: The bit position indicated by MAXPHYADDR depends on the maximum
physical address range supported by the processor. It is reported by CPUID leaf
function 80000008H. If CPUID does not support leaf 80000008H, the processor
supports a 36-bit physical address size; PhysMask then consists of bits 35:12, and
bits 63:36 are reserved.
NOTE
It is possible for software to parse the memory descriptions that
BIOS provides by using the ACPI/INT15 e820 interface mechanism.
This information then can be used to determine how MTRRs are
initialized (for example: allowing the BIOS to define valid memory
ranges and the maximum memory range supported by the platform,
including the processor).
See Section 11.11.4.1, "MTRR Precedences," for information on overlapping variable
MTRR ranges.
11.11.2.4 System-Management Range Register Interface
If IA32_MTRRCAP[bit 11] is set, the processor supports the SMRR interface to
restrict access to a specified memory address range used by system-management
mode (SMM) software (see Section 26.4.2.1). If the SMRR interface is supported,
SMM software is strongly encouraged to use it to protect the SMI code and data
stored by the SMI handler in the SMRAM region.
The system-management range registers consist of a pair of MSRs (see Figure 11-8).
The IA32_SMRR_PHYSBASE MSR defines the base address for the SMRAM memory
range and the memory type used to access it in SMM. The IA32_SMRR_PHYSMASK
MSR contains a valid bit and a mask that determines the SMRAM address range
protected by the SMRR interface. These MSRs may be written only in SMM; an
attempt to write them outside of SMM causes a general-protection exception (see
footnote 1).
Figure 11-8 shows flags and fields in these registers. The functions of these flags and
fields are the following:
• Type field, bits 0 through 7: Specifies the memory type for the range (see
Table 11-8 for the encoding of this field).
• PhysBase field, bits 12 through 31: Specifies the base address of the
address range. The address must be less than 4 GBytes and is automatically
aligned on a 4-KByte boundary.
• PhysMask field, bits 12 through 31: Specifies a mask that determines the
range of the region being mapped, according to the following relationship:
    Address_Within_Range AND PhysMask = PhysBase AND PhysMask
This value is extended by 12 bits at the low end to form the mask value. For
more information, see Section 11.11.3, "Example Base and Mask Calculations."
• V (valid) flag, bit 11: Enables the register pair when set; disables the register
pair when clear.
1. For some processor models, these MSRs can be accessed by RDMSR and WRMSR only if the
SMRR interface has been enabled in the IA32_FEATURE_CONTROL MSR. See Appendix B.
Before attempting to access these SMRR registers, software must test bit 11 in the
IA32_MTRRCAP register. If SMRR is not supported, reads from or writes to these
registers cause general-protection exceptions.
When the valid flag in the IA32_SMRR_PHYSMASK MSR is 1, accesses to the specified
address range are treated as follows:
• If the logical processor is in SMM, accesses use the memory type in the
IA32_SMRR_PHYSBASE MSR.
• If the logical processor is not in SMM, write accesses are ignored and read
accesses return a fixed value for each byte. The uncacheable memory type (UC)
is used in this case.
The above items apply even if the address range specified overlaps with a range
specified by the MTRRs.
Figure 11-8. IA32_SMRR_PHYSBASE and IA32_SMRR_PHYSMASK SMRR Pair
[Figure: bit layout of the register pair.
IA32_SMRR_PHYSBASE Register: bits 7:0, Type (memory type for range); bits 11:8,
reserved; bits 31:12, PhysBase (base address of range); bits 63:32, reserved.
IA32_SMRR_PHYSMASK Register: bits 10:0, reserved; bit 11, V (valid); bits 31:12,
PhysMask (sets range mask); bits 63:32, reserved.]
11.11.3 Example Base and Mask Calculations
The examples in this section apply to processors that support a maximum physical
address size of 36 bits. The base and mask values entered in variable-range MTRR
pairs are 24-bit values that the processor extends to 36 bits.
For example, to enter a base address of 2 MBytes (200000H) in the
IA32_MTRR_PHYSBASE3 register, the 12 least-significant bits are truncated and the
value 000200H is entered in the PhysBase field. The same operation must be
performed on mask values. For example, to map the address range from 200000H to
3FFFFFH (2 MBytes to 4 MBytes), a mask value of FFFE00000H is required. Again, the
12 least-significant bits of this mask value are truncated, so that the value entered in
the PhysMask field of IA32_MTRR_PHYSMASK3 is FFFE00H. This mask is chosen so
that when any address in the 200000H to 3FFFFFH range is ANDed with the mask
value, it returns the same value as when the base address is ANDed with the mask
value (which is 200000H).
To map the address range from 400000H to 7FFFFFH (4 MBytes to 8 MBytes), a base
value of 000400H is entered in the PhysBase field and a mask value of FFFC00H is
entered in the PhysMask field.
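The truncation described above can be sketched as a small calculation. This is an illustrative model; the function name variable_mtrr_fields and its maxphyaddr parameter are assumptions, and the function accepts only ranges that are a power of 2 in size and aligned on their size.

```python
def variable_mtrr_fields(start, size, maxphyaddr=36):
    """Return the (PhysBase, PhysMask) field values for one range.

    start and size are in bytes. The fields are the base address and the
    mask with the 12 least-significant bits truncated.
    """
    if size < 4096 or size & (size - 1):
        raise ValueError("size must be a power of 2 of at least 4 KBytes")
    if start % size:
        raise ValueError("base must be aligned on a boundary equal to size")
    # The mask has a 1 in every address bit that must match the base.
    address_mask = ((1 << maxphyaddr) - 1) & ~(size - 1)
    return start >> 12, address_mask >> 12


# 2 MBytes to 4 MBytes, then 4 MBytes to 8 MBytes, as in the text.
fields_2m = variable_mtrr_fields(0x200000, 0x200000)
fields_4m = variable_mtrr_fields(0x400000, 0x400000)
```

With maxphyaddr set to 40, the same call yields a 28-bit mask field instead of a 24-bit one.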
Example 11-2. Setting-Up Memory for a System
Here is an example of setting up the MTRRs for a system. Assume that the system
has the following characteristics:
• 96 MBytes of system memory is mapped as write-back memory (WB) for highest
system performance.
• A custom 4-MByte I/O card is mapped to uncached memory (UC) at a base
address of 64 MBytes. This restriction forces the 96 MBytes of system memory to
be addressed from 0 to 64 MBytes and from 68 MBytes to 100 MBytes, leaving a
4-MByte hole for the I/O card.
• An 8-MByte graphics card is mapped to write-combining memory (WC) beginning
at address A0000000H.
• The BIOS area from 15 MBytes to 16 MBytes is mapped to UC memory.
The following settings for the MTRRs will yield the proper mapping of the physical
address space for this system configuration.
IA32_MTRR_PHYSBASE0 = 0000 0000 0000 0006H
IA32_MTRR_PHYSMASK0 = 0000 000F FC00 0800H
Caches 0-64 MByte as WB cache type.
IA32_MTRR_PHYSBASE1 = 0000 0000 0400 0006H
IA32_MTRR_PHYSMASK1 = 0000 000F FE00 0800H
Caches 64-96 MByte as WB cache type.
IA32_MTRR_PHYSBASE2 = 0000 0000 0600 0006H
IA32_MTRR_PHYSMASK2 = 0000 000F FFC0 0800H
Caches 96-100 MByte as WB cache type.
IA32_MTRR_PHYSBASE3 = 0000 0000 0400 0000H
IA32_MTRR_PHYSMASK3 = 0000 000F FFC0 0800H
Caches 64-68 MByte as UC cache type.
IA32_MTRR_PHYSBASE4 = 0000 0000 00F0 0000H
IA32_MTRR_PHYSMASK4 = 0000 000F FFF0 0800H
Caches 15-16 MByte as UC cache type.
IA32_MTRR_PHYSBASE5 = 0000 0000 A000 0001H
IA32_MTRR_PHYSMASK5 = 0000 000F FF80 0800H
Caches A0000000H-A07FFFFFH as WC type.
This MTRR setup uses the ability to overlap any two memory ranges (as long as the
ranges are mapped to WB and UC memory types) to minimize the number of MTRR
registers that are required to configure the memory environment. This setup also
fulfills the requirement that two register pairs are left for operating system usage.
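As a cross-check, the register values listed in Example 11-2 can be decoded back into address ranges. The sketch below is illustrative only; the function name decode_pair is an assumption, the size calculation is valid only for contiguous masks, and the type encodings (UC = 0, WC = 1, WT = 4, WP = 5, WB = 6) are the MTRR encodings referred to in Table 11-8.

```python
MTRR_TYPES = {0: "UC", 1: "WC", 4: "WT", 5: "WP", 6: "WB"}


def decode_pair(physbase, physmask, maxphyaddr=36):
    """Decode one PHYSBASEn/PHYSMASKn pair into (first, last, type, valid).

    Assumes a contiguous mask; a mask with holes does not describe a
    single first-to-last range.
    """
    all_bits = (1 << maxphyaddr) - 1
    address_bits = all_bits & ~0xFFF          # bits 12 through MAXPHYADDR-1
    base = physbase & address_bits
    mask = physmask & address_bits
    size = (~mask & all_bits) + 1             # bytes covered by the range
    valid = bool(physmask & (1 << 11))        # V flag, bit 11
    return base, base + size - 1, MTRR_TYPES[physbase & 0xFF], valid


example_11_2 = [
    (0x0000_0000_0000_0006, 0x0000_000F_FC00_0800),
    (0x0000_0000_0400_0006, 0x0000_000F_FE00_0800),
    (0x0000_0000_0600_0006, 0x0000_000F_FFC0_0800),
    (0x0000_0000_0400_0000, 0x0000_000F_FFC0_0800),
    (0x0000_0000_00F0_0000, 0x0000_000F_FFF0_0800),
    (0x0000_0000_A000_0001, 0x0000_000F_FF80_0800),
]
decoded = [decode_pair(b, m) for b, m in example_11_2]
```

Each decoded tuple should correspond to one "Caches ..." line of the example, with the last address of each range being one byte below the next boundary.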
11.11.3.1 Base and Mask Calculations for Greater-Than 36-bit Physical
Address Support
For Intel 64 and IA-32 processors that support greater than 36 bits of physical
address size, software should query CPUID.80000008H to determine the maximum
physical address. See the example.
Example 11-3. Setting-Up Memory for a System with a 40-Bit Address Size
If a processor supports 40 bits of physical address size, then the PhysMask field (in
IA32_MTRR_PHYSMASKn registers) is 28 bits instead of 24 bits. For this situation,
Example 11-2 should be modified as follows:
IA32_MTRR_PHYSBASE0 = 0000 0000 0000 0006H
IA32_MTRR_PHYSMASK0 = 0000 00FF FC00 0800H
Caches 0-64 MByte as WB cache type.
IA32_MTRR_PHYSBASE1 = 0000 0000 0400 0006H
IA32_MTRR_PHYSMASK1 = 0000 00FF FE00 0800H
Caches 64-96 MByte as WB cache type.
IA32_MTRR_PHYSBASE2 = 0000 0000 0600 0006H
IA32_MTRR_PHYSMASK2 = 0000 00FF FFC0 0800H
Caches 96-100 MByte as WB cache type.
IA32_MTRR_PHYSBASE3 = 0000 0000 0400 0000H
IA32_MTRR_PHYSMASK3 = 0000 00FF FFC0 0800H
Caches 64-68 MByte as UC cache type.
IA32_MTRR_PHYSBASE4 = 0000 0000 00F0 0000H
IA32_MTRR_PHYSMASK4 = 0000 00FF FFF0 0800H
Caches 15-16 MByte as UC cache type.
IA32_MTRR_PHYSBASE5 = 0000 0000 A000 0001H
IA32_MTRR_PHYSMASK5 = 0000 00FF FF80 0800H
Caches A0000000H-A07FFFFFH as WC type.
11.11.4 Range Size and Alignment Requirement
A range that is to be mapped to a variable-range MTRR must meet the following
power-of-2 size and alignment rules:
1. The minimum range size is 4 KBytes and the base address of the range must be
on at least a 4-KByte boundary.
2. For ranges greater than 4 KBytes, each range must be of length 2^n and its base
address must be aligned on a 2^n boundary, where n is a value equal to or greater
than 12. The base-address alignment value cannot be less than its length. For
example, an 8-KByte range cannot be aligned on a 4-KByte boundary. It must be
aligned on at least an 8-KByte boundary.
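The two rules above can be expressed as a short check, together with one possible way of splitting an arbitrary 4-KByte-aligned region into pieces that each satisfy them. Both function names are assumptions for illustration, and the greedy split is only one strategy; it makes no attempt to minimize register usage through overlapping ranges.

```python
def is_valid_variable_range(base, size):
    """True when size is a power of 2 >= 4 KBytes and base is aligned on size."""
    return size >= 4096 and size & (size - 1) == 0 and base % size == 0


def split_into_ranges(base, size):
    """Split [base, base + size) into (base, size) pieces that satisfy the rules."""
    if base % 4096 or size % 4096 or size <= 0:
        raise ValueError("base and size must be nonzero multiples of 4 KBytes")
    pieces = []
    while size > 0:
        largest = 1 << (size.bit_length() - 1)    # largest power of 2 <= size
        align = base & -base if base else largest  # alignment of current base
        chunk = min(align, largest)
        pieces.append((base, chunk))
        base += chunk
        size -= chunk
    return pieces


MB = 1 << 20
system_memory = split_into_ranges(0, 96 * MB)
```

For 96 MBytes starting at address 0, the split gives a 64-MByte piece followed by a 32-MByte piece, which matches the first two register pairs of Example 11-2.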
11.11.4.1 MTRR Precedences
The MTRRs are enabled by setting the E flag in the IA32_MTRR_DEF_TYPE MSR. If the
MTRRs are not enabled, then all memory accesses are of the UC memory type. If the
MTRRs are enabled, then the memory type used for a memory access is determined
as follows:
1. If the physical address falls within the first 1 MByte of physical memory and
fixed MTRRs are enabled, the processor uses the memory type stored for the
appropriate fixed-range MTRR.
2. Otherwise, the processor attempts to match the physical address with a memory
type set by the variable-range MTRRs:
• If one variable memory range matches, the processor uses the memory type
stored in the IA32_MTRR_PHYSBASEn register for that range.
• If two or more variable memory ranges match and the memory types are
identical, then that memory type is used.
• If two or more variable memory ranges match and one of the memory types
is UC, the UC memory type is used.
• If two or more variable memory ranges match and the memory types are WT
and WB, the WT memory type is used.
• For overlaps not defined by the above rules, processor behavior is undefined.
3. If no fixed or variable memory range matches, the processor uses the default
memory type.
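The variable-range part of these precedence rules (items 2 and 3) can be modeled as follows. This is a sketch under stated assumptions: the function name effective_variable_type is hypothetical, ranges are given as (first, last, type) tuples rather than register values, fixed-range MTRRs (item 1) are not modeled, and None stands for the undefined case.

```python
def effective_variable_type(addr, ranges, default_type):
    """Resolve the memory type for addr from overlapping variable ranges."""
    matches = {mem_type for first, last, mem_type in ranges
               if first <= addr <= last}
    if not matches:
        return default_type            # no range matches: default type
    if len(matches) == 1:
        return next(iter(matches))     # one match, or identical types
    if "UC" in matches:
        return "UC"                    # UC wins over any other type
    if matches == {"WT", "WB"}:
        return "WT"                    # WT wins over WB
    return None                        # other overlaps are undefined


MB = 1 << 20
# The ranges of Example 11-2, written as (first, last, type).
ranges = [
    (0, 64 * MB - 1, "WB"),
    (64 * MB, 96 * MB - 1, "WB"),
    (96 * MB, 100 * MB - 1, "WB"),
    (64 * MB, 68 * MB - 1, "UC"),
    (15 * MB, 16 * MB - 1, "UC"),
    (0xA0000000, 0xA07FFFFF, "WC"),
]
io_card = effective_variable_type(65 * MB, ranges, "UC")
```

Here the address at 65 MBytes matches both a WB range and a UC range, so the UC rule applies; this is the overlap that Example 11-2 relies on.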
11.11.5 MTRR Initialization
On a hardware reset, the P6 and more recent processors clear the valid flags in
variable-range MTRRs and clear the E flag in the IA32_MTRR_DEF_TYPE MSR to disable
all MTRRs. All other bits in the MTRRs are undefined.
Prior to initializing the MTRRs, software (normally the system BIOS) must initialize all
fixed-range and variable-range MTRR register fields to 0. Software can then initialize
the MTRRs according to known types of memory, including memory on devices that it
auto-configures. Initialization is expected to occur prior to booting the operating
system.
See Section 11.11.8, "MTRR Considerations in MP Systems," for information on
initializing MTRRs in MP (multiple-processor) systems.
11.11.6 Remapping Memory Types
A system designer may re-map memory types to tune performance or because a
future processor may not implement all memory types supported by the Pentium 4,
Intel Xeon, and P6 family processors. The following rules support coherent
memory-type re-mappings:
1. A memory type should not be mapped into another memory type that has a
weaker memory ordering model. For example, the uncacheable type cannot be
mapped into any other type, and the write-back, write-through, and
write-protected types cannot be mapped into the weakly ordered write-combining
type.
2. A memory type that does not delay writes should not be mapped into a memory
type that does delay writes, because applications of such a memory type may
rely on its write-through behavior. Accordingly, the write-through type cannot be
mapped into the write-back type.
3. A memory type that views write data as not necessarily stored and read back by
a subsequent read, such as the write-protected type, can only be mapped to
another type with the same behavior (and there are no others for the
Pentium 4, Intel Xeon, and P6 family processors) or to the uncacheable type.
In many specific cases, a system designer can have additional information about how
a memory type is used, allowing additional mappings. For example, write-through
memory with no associated write side effects can be mapped into write-back
memory.
11.11.7 MTRR Maintenance Programming Interface
The operating system maintains the MTRRs after booting and sets up or changes the
memory types for memory-mapped devices. The operating system should provide a
driver and application programming interface (API) to access and set the MTRRs. The
function calls MemTypeGet() and MemTypeSet() define this interface.
11.11.7.1 MemTypeGet() Function
The MemTypeGet() function returns the memory type of the physical memory range
specified by the parameters base and size. The base address is the starting physical
address and the size is the number of bytes for the memory range. The function
automatically aligns the base address and size to 4-KByte boundaries. Pseudocode
for the MemTypeGet() function is given in Example 11-4.
Example 11-4. MemTypeGet() Pseudocode
#define MIXED_TYPES -1    /* MIXED_TYPES < 0 || MIXED_TYPES > 256 */
IF CPU_FEATURES.MTRR    /* processor supports MTRRs */
    THEN
        Align BASE and SIZE to 4-KByte boundary;
        IF (BASE + SIZE) wrap 4-GByte address space
            THEN return INVALID;
        FI;
        IF MTRRdefType.E = 0
            THEN return UC;
        FI;
        FirstType ← Get4KMemType (BASE);
        /* Obtains memory type for first 4-KByte range. */
        /* See Get4KMemType (4KByteRange) in Example 11-5. */
        FOR each additional 4-KByte range specified in SIZE
            NextType ← Get4KMemType (4KByteRange);
            IF NextType ≠ FirstType
                THEN return MIXED_TYPES;
            FI;
        ROF;
        return FirstType;
    ELSE return UNSUPPORTED;
FI;
If the processor does not support MTRRs, the function returns UNSUPPORTED. If the
MTRRs are not enabled, then the UC memory type is returned. If more than one
memory type corresponds to the specified range, a status of MIXED_TYPES is
returned. Otherwise, the memory type defined for the range (UC, WC, WT, WB, or
WP) is returned.
The pseudocode for the Get4KMemType() function in Example 11-5 obtains the
memory type for a single 4-KByte range at a given physical address. The sample
code determines whether a PHY_ADDRESS falls within a fixed range by comparing
the address with the known fixed ranges: 0 to 7FFFFH (64-KByte regions), 80000H to
BFFFFH (16-KByte regions), and C0000H to FFFFFH (4-KByte regions). If an address
falls within one of these ranges, the appropriate bits within one of its MTRRs
determine the memory type.
Example 11-5. Get4KMemType() Pseudocode
IF IA32_MTRRCAP.FIX AND MTRRdefType.FE    /* fixed registers enabled */
    THEN
        IF PHY_ADDRESS is within a fixed range
            THEN return IA32_MTRR_FIX.Type;
        FI;
FI;
FOR each variable-range MTRR in IA32_MTRRCAP.VCNT
    IF IA32_MTRR_PHYSMASK.V = 0
        THEN continue;
    FI;
    IF (PHY_ADDRESS AND IA32_MTRR_PHYSMASK.Mask) =
       (IA32_MTRR_PHYSBASE.Base AND IA32_MTRR_PHYSMASK.Mask)
        THEN return IA32_MTRR_PHYSBASE.Type;
    FI;
ROF;
return MTRRdefType.Type;
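The scanning loop of MemTypeGet() in Example 11-4 can be sketched in a runnable form. This is a model, not an operating-system interface: the per-page lookup is passed in as a callable that stands in for Get4KMemType(), the has_mtrr and mtrr_enabled parameters stand in for CPU_FEATURES.MTRR and MTRRdefType.E, and the choice to round the base down and the end of the range up to 4-KByte boundaries is an assumption about what "aligns" means.

```python
MIXED_TYPES = -1
PAGE = 4096
FOUR_GBYTES = 1 << 32


def mem_type_get(base, size, get_4k_mem_type, has_mtrr=True, mtrr_enabled=True):
    """Return the memory type of [base, base + size), or a status value."""
    if not has_mtrr:
        return "UNSUPPORTED"
    start = base & ~(PAGE - 1)                       # round base down
    end = (base + size + PAGE - 1) & ~(PAGE - 1)     # round end up
    if end > FOUR_GBYTES:
        return "INVALID"                             # wraps 4-GByte space
    if not mtrr_enabled:
        return "UC"
    first_type = get_4k_mem_type(start)
    for page in range(start + PAGE, end, PAGE):      # each additional 4-KByte range
        if get_4k_mem_type(page) != first_type:
            return MIXED_TYPES
    return first_type


def sample_lookup(addr):
    """Hypothetical memory map: one UC page at 1000H, WB everywhere else."""
    return "UC" if 0x1000 <= addr < 0x2000 else "WB"


uniform = mem_type_get(0x2000, 0x4000, sample_lookup)
mixed = mem_type_get(0x0000, 0x3000, sample_lookup)
```

A caller that receives MIXED_TYPES can query smaller sub-ranges to find where the type changes.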
11.11.7.2 MemTypeSet() Function
The MemTypeSet() function in Example 11-6 sets an MTRR for the physical memory
range specified by the parameters base and size to the type specified by type. The
base address and size are multiples of 4 KBytes and the size is not 0.
Example 11-6. MemTypeSet Pseudocode
IF CPU_FEATURES.MTRR    (* processor supports MTRRs *)
    THEN
        IF BASE and SIZE are not 4-KByte aligned or size is 0
            THEN return INVALID;
        FI;
        IF (BASE + SIZE) wrap 4-GByte address space
            THEN return INVALID;
        FI;
        IF TYPE is invalid for Pentium 4, Intel Xeon, and P6 family processors
            THEN return UNSUPPORTED;
        FI;
        IF TYPE is WC and not supported
            THEN return UNSUPPORTED;
        FI;
        IF IA32_MTRRCAP.FIX is set AND range can be mapped using a fixed-range MTRR
            THEN
                pre_mtrr_change();
                update affected MTRR;
                post_mtrr_change();
            ELSE    (* try to map using a variable MTRR pair *)
                IF IA32_MTRRCAP.VCNT = 0
                    THEN return UNSUPPORTED;
                FI;
                IF conflicts with current variable ranges
                    THEN return RANGE_OVERLAP;
                FI;
                IF no MTRRs available
                    THEN return VAR_NOT_AVAILABLE;
                FI;
                IF BASE and SIZE do not meet the power of 2 requirements for variable MTRRs
                    THEN return INVALID_VAR_REQUEST;
                FI;
                pre_mtrr_change();
                Update affected MTRRs;
                post_mtrr_change();
        FI;
FI;

pre_mtrr_change()
BEGIN
    disable interrupts;
    Save current value of CR4;
    disable and flush caches;
    flush TLBs;
    disable MTRRs;
    IF multiprocessing
        THEN maintain consistency through IPIs;
    FI;
END

post_mtrr_change()
BEGIN
    flush caches and TLBs;
    enable MTRRs;
    enable caches;
    restore value of CR4;
    enable interrupts;
END
The physical address to variable range mapping algorithm in the MemTypeSet
function detects conflicts with current variable range registers by cycling through them
and determining whether the physical address in question matches any of the current
ranges. During this scan, the algorithm can detect whether any current variable
ranges overlap and can be concatenated into a single range.
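The conflict scan and the concatenation test described above might look like the following sketch. All names are hypothetical; ranges are (base, size) tuples in bytes, the caller is assumed to have already confirmed that two ranges being merged have the same memory type, and whether an overlap is actually a conflict (for example, a deliberate WB and UC overlap) is left to the caller.

```python
def ranges_overlap(a, b):
    """True when the (base, size) ranges a and b share at least one byte."""
    (a_base, a_size), (b_base, b_size) = a, b
    return a_base < b_base + b_size and b_base < a_base + a_size


def find_overlaps(new_range, current_ranges):
    """Cycle through the current ranges and collect those the new range overlaps."""
    return [r for r in current_ranges if ranges_overlap(new_range, r)]


def try_concatenate(a, b):
    """Merge two overlapping or adjacent ranges into one valid variable range.

    Returns the merged (base, size), or None when the ranges leave a gap or
    the result would break the power-of-2 size and alignment rules.
    """
    (a_base, a_size), (b_base, b_size) = a, b
    if a_base + a_size < b_base or b_base + b_size < a_base:
        return None                                  # gap between the ranges
    start = min(a_base, b_base)
    size = max(a_base + a_size, b_base + b_size) - start
    if size & (size - 1) or start % size:
        return None                                  # not a legal single range
    return start, size


MB = 1 << 20
merged = try_concatenate((0, 4 * MB), (4 * MB, 4 * MB))
```

Merging two adjacent 4-MByte ranges at 0 and 4 MBytes frees one register pair; the same two sizes at 4 and 8 MBytes cannot be merged because the result would not be aligned on its size.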
The pre_mtrr_change() function disables interrupts prior to changing the MTRRs, to
avoid executing code with a partially valid MTRR setup. The algorithm disables
caching by setting the CD flag and clearing the NW flag in control register CR0. The
caches are invalidated using the WBINVD instruction. The algorithm flushes all TLB
entries either by clearing the page-global enable (PGE) flag in control register CR4 (if
PGE was already set) or by updating control register CR3 (if PGE was already clear).
Finally, it disables MTRRs by clearing the E flag in the IA32_MTRR_DEF_TYPE MSR.
After the memory type is updated, the post_mtrr_change() function re-enables the
MTRRs and again invalidates the caches and TLBs. This second invalidation is
required because of the processor's aggressive prefetch of both instructions and
data. The algorithm restores interrupts and re-enables caching by clearing the CD
flag.
An operating system can batch multiple MTRR updates so that only a single pair of
cache invalidations occurs.
11.11.8 MTRR Considerations in MP Systems
In MP (multiple-processor) systems, the operating system must maintain MTRR
consistency between all the processors in the system. The Pentium 4, Intel Xeon, and
P6 family processors provide no hardware support to maintain this consistency. In
general, all processors must have the same MTRR values.
This requirement implies that when the operating system initializes an MP system, it
must load the MTRRs of the boot processor while the E flag in register MTRRdefType
is 0. The operating system then directs other processors to load their MTRRs with the
same memory map. After all the processors have loaded their MTRRs, the operating
system signals them to enable their MTRRs. Barrier synchronization is used to
prevent further memory accesses until all processors indicate that the MTRRs are
enabled. This synchronization is likely to be a shoot-down style algorithm, with
shared variables and interprocessor interrupts.
Any change to the value of the MTRRs in an MP system requires the operating system
to repeat the loading and enabling process to maintain consistency, using the
following procedure:
1. Broadcast to all processors to execute the following code sequence.
2. Disable interrupts.
3. Wait for all processors to reach this point.
4. Enter the no-fill cache mode. (Set the CD flag in control register CR0 to 1 and the
NW flag to 0.)
5. Flush all caches using the WBINVD instruction. Note that on a processor that
supports self-snooping (CPUID feature flag bit 27), this step is unnecessary.
6. If the PGE flag is set in control register CR4, flush all TLBs by clearing that flag.
7. If the PGE flag is clear in control register CR4, flush all TLBs by executing a MOV
from control register CR3 to another register and then a MOV from that register
back to CR3.
8. Disable all range registers (by clearing the E flag in register MTRRdefType). If
only variable ranges are being modified, software may clear the valid bits for the
affected register pairs instead.
9. Update the MTRRs.
10. Enable all range registers (by setting the E flag in register MTRRdefType). If only
variable-range registers were modified and their individual valid bits were
cleared, then set the valid bits for the affected ranges instead.
11. Flush all caches and all TLBs a second time. (The TLB flush is required for
Pentium 4, Intel Xeon, and P6 family processors. Executing the WBINVD
instruction is not needed when using Pentium 4, Intel Xeon, and P6 family
processors, but it may be needed in future systems.)
12. Enter the normal cache mode to re-enable caching. (Set the CD and NW flags in
control register CR0 to 0.)
13. Set the PGE flag in control register CR4, if cleared in Step 6 (above).
14. Wait for all processors to reach this point.
15. Enable interrupts.
11.11.9 Large Page Size Considerations
The MTRRs provide memory typing for a limited number of regions that have a
4-KByte granularity (the same granularity as 4-KByte pages). The memory type for a
given page is cached in the processor's TLBs. When using large pages (2 MBytes,
4 MBytes, or 1 GByte), a single page-table entry covers multiple 4-KByte granules,
each with a single memory type. Because the memory type for a large page is cached
in the TLB, the processor can behave in an undefined manner if a large page is
mapped to a region of memory that MTRRs have mapped with multiple memory
types.
Undefined behavior can be avoided by ensuring that all MTRR memory-type ranges
within a large page are of the same type. If a large page maps to a region of memory
containing different MTRR-defined memory types, the PCD and PWT flags in the
page-table entry should be set for the most conservative memory type for that
range. For example, a large page used for memory-mapped I/O and regular memory
is mapped as UC memory. Alternatively, the operating system can map the region
using multiple 4-KByte pages, each with its own memory type.
The requirement that all 4-KByte ranges in a large page are of the same memory
type implies that large pages with different memory types may suffer a performance
penalty, since they must be marked with the lowest common denominator memory
type. The same considerations apply to 1-GByte pages, each of which may consist of
multiple 2-MByte ranges.
The Pentium 4, Intel Xeon, and P6 family processors provide special support for the
physical memory range from 0 to 4 MBytes, which is potentially mapped by both the
fixed and variable MTRRs. This support is invoked when a Pentium 4, Intel Xeon, or
P6 family processor detects a large page overlapping the first 1 MByte of this
memory range with a memory type that conflicts with the fixed MTRRs. Here, the
processor maps the memory range as multiple 4-KByte pages within the TLB. This
operation ensures correct behavior at the cost of performance. To avoid this
performance penalty, operating-system software should reserve the large page option for
regions of memory at addresses greater than or equal to 4 MBytes.
11.12 PAGE ATTRIBUTE TABLE (PAT)
The Page Attribute Table (PAT) extends the IA-32 architecture's page-table format to
allow memory types to be assigned to regions of physical memory based on linear
address mappings. The PAT is a companion feature to the MTRRs; that is, the MTRRs
allow mapping of memory types to regions of the physical address space, whereas the
PAT allows mapping of memory types to pages within the linear address space. The
MTRRs are useful for statically describing memory types for physical ranges, and are
typically set up by the system BIOS. The PAT extends the functions of the PCD and
PWT bits in page tables to allow all five of the memory types that can be assigned
with the MTRRs (plus one additional memory type) to also be assigned dynamically
to pages of the linear address space.
The PAT was introduced to the IA-32 architecture on the Pentium III processor. It is also
available in the Pentium 4 and Intel Xeon processors.
11.12.1 Detecting Support for the PAT Feature
An operating system or executive can detect the availability of the PAT by executing
the CPUID instruction with a value of 1 in the EAX register. Support for the PAT is
indicated by the PAT flag (bit 16 of the values returned to the EDX register). If the PAT is
supported, the operating system or executive can use the IA32_PAT MSR to program
the PAT. When memory types have been assigned to entries in the PAT, software can
then use the PAT-index bit (PAT) in the page-table and page-directory entries
along with the PCD and PWT bits to assign memory types from the PAT to individual
pages.
Note that there is no separate flag or control bit in any of the control registers that
enables the PAT. The PAT is always enabled on all processors that support it, and the
table lookup always occurs whenever paging is enabled, in all paging modes.
11.12.2 IA32_PAT MSR
The IA32_PAT MSR is located at MSR address 277H (see Appendix B,
"Model-Specific Registers (MSRs)"), and this MSR will remain at the same address on
future IA-32 processors that support the PAT feature. Figure 11-9 shows the format
of the 64-bit IA32_PAT MSR.
The IA32_PAT MSR contains eight page attribute fields: PA0 through PA7. The three
low-order bits of each field are used to specify a memory type. The five high-order
bits of each field are reserved, and must be set to all 0s. Each of the eight page
attribute fields can contain any of the memory type encodings specified in
Table 11-10.
Note that for the P6 family processors, the IA32_PAT MSR is named the PAT MSR.
Bits   63-59     58-56  55-51     50-48  47-43     42-40  39-35     34-32
Field  Reserved  PA7    Reserved  PA6    Reserved  PA5    Reserved  PA4
Bits   31-27     26-24  23-19     18-16  15-11     10-8   7-3       2-0
Field  Reserved  PA3    Reserved  PA2    Reserved  PA1    Reserved  PA0
Figure 11-9. IA32_PAT MSR
Table 11-10. Memory Types That Can Be Encoded With PAT
Encoding     Mnemonic
00H          Uncacheable (UC)
01H          Write Combining (WC)
02H          Reserved*
03H          Reserved*
04H          Write Through (WT)
05H          Write Protected (WP)
06H          Write Back (WB)
07H          Uncached (UC-)
08H - FFH    Reserved*
NOTE:
* Using these encodings will result in a general-protection exception (#GP).
11.12.3 Selecting a Memory Type from the PAT
To select a memory type for a page from the PAT, a 3-bit index made up of the PAT,
PCD, and PWT bits must be encoded in the page-table or page-directory entry for the
page. Table 11-11 shows the possible encodings of the PAT, PCD, and PWT bits and
the PAT entry selected with each encoding. The PAT bit is bit 7 in page-table entries
that point to 4-KByte pages and bit 12 in paging-structure entries that point to larger
pages. The PCD and PWT bits are bits 4 and 3, respectively, in paging-structure
entries that point to pages of any size.
The PAT entry selected for a page is used in conjunction with the MTRR setting for the
region of physical memory in which the page is mapped to determine the effective
memory type for the page, as shown in Table 11-7.
Table 11-11. Selection of PAT Entries with PAT, PCD, and PWT Flags
PAT  PCD  PWT  PAT Entry
0    0    0    PAT0
0    0    1    PAT1
0    1    0    PAT2
0    1    1    PAT3
1    0    0    PAT4
1    0    1    PAT5
1    1    0    PAT6
1    1    1    PAT7
11.12.4 Programming the PAT
Table 11-12 shows the default setting for each PAT entry following a power up or
reset of the processor. The settings remain unchanged following a soft reset (INIT
reset).
Table 11-12. Memory Type Setting of PAT Entries Following a Power-up or Reset
PAT Entry    Memory Type Following Power-up or Reset
PAT0         WB
PAT1         WT
PAT2         UC-
PAT3         UC
PAT4         WB
PAT5         WT
PAT6         UC-
PAT7         UC
The values in all the entries of the PAT can be changed by writing to the IA32_PAT
MSR using the WRMSR instruction. The IA32_PAT MSR is read and write accessible
(use of the RDMSR and WRMSR instructions, respectively) to software operating at a
CPL of 0. Table 11-10 shows the allowable encoding of the entries in the PAT.
Attempting to write an undefined memory type encoding into the PAT causes a
general-protection (#GP) exception to be generated.
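The following sketch shows one way software might assemble and validate a 64-bit IA32_PAT image before issuing WRMSR. It is not a definitive implementation. It assumes the encodings of Table 11-10 (00H UC, 01H WC, 04H WT, 05H WP, 06H WB, 07H UC-) and the MSR layout described earlier in this chapter, with one PAT entry in the low bits of each byte. All identifiers are hypothetical, and the privileged WRMSR itself is not shown.

```c
#include <stdint.h>

/* Memory type encodings, assumed from Table 11-10 (names are illustrative). */
enum pat_type {
    PAT_UC = 0x00, PAT_WC = 0x01, PAT_WT = 0x04,
    PAT_WP = 0x05, PAT_WB = 0x06, PAT_UC_MINUS = 0x07
};

/* Nonzero if enc is one of the defined encodings; 02H, 03H, and 08H and
 * above are treated as reserved. */
static int pat_encoding_valid(uint8_t enc)
{
    return enc == PAT_UC || enc == PAT_WC || enc == PAT_WT ||
           enc == PAT_WP || enc == PAT_WB || enc == PAT_UC_MINUS;
}

/* Packs eight entries (PAT0 first) into a 64-bit image, one entry per byte.
 * Returns 0 on success, or -1 if any entry uses a reserved encoding, which
 * would cause #GP if written to the MSR. */
static int pat_pack(const uint8_t entries[8], uint64_t *image)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        if (!pat_encoding_valid(entries[i]))
            return -1;
        v |= (uint64_t)entries[i] << (8 * i);
    }
    *image = v;
    return 0;
}
```

Rejecting reserved encodings in software before the write avoids taking the general-protection exception at the WRMSR.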
The operating system is responsible for ensuring that changes to a PAT entry occur in
a manner that maintains the consistency of the processor caches and translation
lookaside buffers (TLB). This is accomplished by following the procedure as specified
in Section 11.11.8, "MTRR Considerations in MP Systems," for changing the value of
an MTRR in a multiple processor system. It requires a specific sequence of operations
that includes flushing the processors' caches and TLBs.
The PAT allows any memory type to be specified in the page tables, and therefore it
is possible to have a single physical page mapped to two or more different linear
addresses, each with different memory types. Intel does not support this practice
because it may lead to undefined operations that can result in a system failure. In
particular, a WC page must never be aliased to a cacheable page because WC writes
may not check the processor caches.
When remapping a page that was previously mapped as a cacheable memory type to
a WC page, an operating system can avoid this type of aliasing by doing the
following:
1. Remove the previous mapping to a cacheable memory type in the page tables;
that is, make them not present.
2. Flush the TLBs of processors that may have used the mapping, even speculatively.
3. Create a new mapping to the same physical address with a new memory type, for
instance, WC.
4. Flush the caches on all processors that may have used the mapping previously.
Note that on processors that support self-snooping (CPUID feature flag bit 27), this
step is unnecessary.
Operating systems that use a page directory as a page table (to map large pages)
and enable page size extensions must carefully scrutinize the use of the PAT index bit
for the 4-KByte page-table entries. The PAT index bit for a page-table entry (bit 7)
corresponds to the page size bit in a page-directory entry. Therefore, the operating
system can only use PAT entries PA0 through PA3 when setting the caching type for
a page table that is also used as a page directory. If the operating system attempts
to use PAT entries PA4 through PA7 when using this memory as a page table, it
effectively sets the PS bit for the access to this memory as a page directory.
Table 11-12. Memory Type Setting of PAT Entries Following a Power-up or Reset (Continued)
PAT Entry Memory Type Following Power-up or Reset
PAT4 WB
PAT5 WT
PAT6 UC-
PAT7 UC
For compatibility with earlier IA-32 processors that do not support the PAT, care
should be taken in selecting the encodings for entries in the PAT (see Section
11.12.5, "PAT Compatibility with Earlier IA-32 Processors").
11.12.5 PAT Compatibility with Earlier IA-32 Processors
For IA-32 processors that support the PAT, the IA32_PAT MSR is always active. That
is, the PCD and PWT bits in page-table entries and in page-directory entries (that
point to pages) always select a memory type for a page indirectly by selecting an
entry in the PAT. They never select the memory type for a page directly as they do in
earlier IA-32 processors that do not implement the PAT (see Table 11-6).
To allow compatibility for code written to run on earlier IA-32 processors that do not
support the PAT, the PAT mechanism has been designed to allow backward
compatibility to earlier processors. This compatibility is provided through the ordering of the
PAT, PCD, and PWT bits in the 3-bit PAT entry index. For processors that do not
implement the PAT, the PAT index bit (bit 7 in the page-table entries and bit 12 in the
page-directory entries) is reserved and set to 0. With the PAT bit reserved, only the first
four entries of the PAT can be selected with the PCD and PWT bits. At power-up or
reset (see Table 11-12), these first four entries are encoded to select the same
memory types as the PCD and PWT bits would normally select directly in an IA-32
processor that does not implement the PAT. So, if encodings of the first four entries
in the PAT are left unchanged following a power-up or reset, code written to run on
earlier IA-32 processors that do not implement the PAT will run correctly on IA-32
processors that do implement the PAT.
CHAPTER 12
INTEL MMX TECHNOLOGY SYSTEM PROGRAMMING
This chapter describes those features of the Intel MMX technology that must be
considered when designing or enhancing an operating system to support MMX
technology. It covers MMX instruction set emulation, the MMX state, aliasing of MMX
registers, saving MMX state, task and context switching considerations, exception
handling, and debugging.
12.1 EMULATION OF THE MMX INSTRUCTION SET
The IA-32 or Intel 64 architecture does not support emulation of the MMX
instructions, as it does for x87 FPU instructions. The EM flag in control register CR0
(provided to invoke emulation of x87 FPU instructions) cannot be used for MMX
instruction emulation. If an MMX instruction is executed when the EM flag is set, an
invalid opcode exception (#UD) is generated. Table 12-1 shows the interaction of the
EM, MP, and TS flags in control register CR0 when executing MMX instructions.
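The decision summarized in Table 12-1 can be modeled as a small pure function. This is a sketch for illustration, not processor-defined code; the identifiers are hypothetical. It assumes MP is set, as the table's note recommends, and uses the CR0 bit positions for EM (bit 2) and TS (bit 3).

```c
#include <stdint.h>

#define CR0_EM (1u << 2)  /* emulation flag */
#define CR0_TS (1u << 3)  /* task-switched flag */

enum mmx_action { MMX_EXECUTE, MMX_FAULT_NM, MMX_FAULT_UD };

/* Models the rows of Table 12-1 for a given CR0 value. */
static enum mmx_action mmx_action_for(uint32_t cr0)
{
    if (cr0 & CR0_EM)
        return MMX_FAULT_UD;  /* EM = 1: #UD regardless of TS */
    if (cr0 & CR0_TS)
        return MMX_FAULT_NM;  /* EM = 0, TS = 1: #NM */
    return MMX_EXECUTE;       /* EM = 0, TS = 0: instruction executes */
}
```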
12.2 THE MMX STATE AND MMX REGISTER ALIASING
The MMX state consists of eight 64-bit registers (MM0 through MM7). These registers
are aliased to the low 64 bits (bits 0 through 63) of floating-point registers R0
through R7 (see Figure 12-1). Note that the MMX registers are mapped to the
physical locations of the floating-point registers (R0 through R7), not to the relative
locations of the registers in the floating-point register stack (ST0 through ST7). As a
result, the MMX register mapping is fixed and is not affected by the value in the Top Of
Stack (TOS) field in the floating-point status word (bits 11 through 13).
Table 12-1. Action Taken By MMX Instructions for Different Combinations of EM, MP and TS
CR0 Flags
EM    MP*   TS    Action
0     1     0     Execute.
0     1     1     #NM exception.
1     1     0     #UD exception.
1     1     1     #UD exception.
NOTE:
* For processors that support the MMX instructions, the MP flag should be set.
When a value is written into an MMX register using an MMX instruction, the value also
appears in the corresponding floating-point register in bits 0 through 63. Likewise,
when a floating-point value is written into a floating-point register by an x87 FPU
instruction, the low 64 bits of that value also appear in the corresponding MMX register.
The execution of MMX instructions has several side effects on the x87 FPU state
contained in the floating-point registers, the x87 FPU tag word, and the x87 FPU
status word. These side effects are as follows:
• When an MMX instruction writes a value into an MMX register, at the same time,
  bits 64 through 79 of the corresponding floating-point register are set to all 1s.
• When an MMX instruction (other than the EMMS instruction) is executed, each of
  the tag fields in the x87 FPU tag word is set to 00B (valid). (See also Section
  12.2.1, "Effect of MMX, x87 FPU, FXSAVE, and FXRSTOR Instructions on the x87
  FPU Tag Word.")
• When the EMMS instruction is executed, each tag field in the x87 FPU tag word is
  set to 11B (empty).
• Each time an MMX instruction is executed, the TOS value is set to 000B.
Figure 12-1. Mapping of MMX Registers to Floating-Point Registers
[Figure: MMX registers MM0 through MM7 (bits 0 through 63) overlay the low 64 bits of
floating-point registers R0 through R7 (bits 0 through 79). The x87 FPU tag register
fields are shown as 00, and the TOS field of the x87 FPU status register (bits 11
through 13) is shown as 000 (TOS = 0).]
Execution of MMX instructions does not affect the other bits in the x87 FPU status
word (bits 0 through 10 and bits 14 and 15) or the contents of the other x87 FPU
registers that comprise the x87 FPU state (the x87 FPU control word, instruction
pointer, data pointer, or opcode registers).
Table 12-2 summarizes the effects of the MMX instructions on the x87 FPU state.
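The side effects listed above can be made concrete with a small software model. The structure and function names are hypothetical and exist only for this illustration; the model tracks just the fields this section discusses and is not a complete description of the x87 FPU state.

```c
#include <stdint.h>

/* Minimal model of the state touched by MMX instructions (illustrative). */
struct x87_model {
    uint64_t low[8];    /* bits 0..63 of R0..R7, aliased by MM0..MM7 */
    uint16_t high[8];   /* bits 64..79 of R0..R7 */
    uint16_t tag_word;  /* two tag bits per physical register */
    uint16_t status;    /* x87 FPU status word; TOS is bits 11..13 */
};

#define TOS_MASK (7u << 11)

/* Effects common to every MMX instruction: load the tag word and set
 * TOS to 000B, leaving the other status word bits unchanged. */
static void mmx_common(struct x87_model *s, uint16_t tags)
{
    s->tag_word = tags;
    s->status &= (uint16_t)~TOS_MASK;
}

/* Read from MMn: all tags become 00B (valid); data is unchanged. */
static uint64_t mmx_read(struct x87_model *s, unsigned n)
{
    mmx_common(s, 0x0000);
    return s->low[n & 7];
}

/* Write to MMn: bits 0..63 take the data and bits 64..79 become all 1s. */
static void mmx_write(struct x87_model *s, unsigned n, uint64_t value)
{
    s->low[n & 7] = value;
    s->high[n & 7] = 0xFFFF;
    mmx_common(s, 0x0000);
}

/* EMMS: all tags become 11B (empty); register contents are unchanged. */
static void mmx_emms(struct x87_model *s)
{
    mmx_common(s, 0xFFFF);
}
```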
12.2.1 Effect of MMX, x87 FPU, FXSAVE, and FXRSTOR
Instructions on the x87 FPU Tag Word
Table 12-3 summarizes the effect of MMX and x87 FPU instructions and the FXSAVE
and FXRSTOR instructions on the tags in the x87 FPU tag word and the corresponding
tags in an image of the tag word stored in memory.
The values in the fields of the x87 FPU tag word do not affect the contents of the MMX
registers or the execution of MMX instructions. However, the MMX instructions do
modify the contents of the x87 FPU tag word, as is described in Section 12.2, "The
MMX State and MMX Register Aliasing." These modifications may affect the operation
of the x87 FPU when executing x87 FPU instructions, if the x87 FPU state is not
initialized or restored prior to beginning x87 FPU instruction execution.
Note that the FSAVE, FXSAVE, and FSTENV instructions (which save x87 FPU state
information) read the x87 FPU tag register and contents of each of the floating-point
registers, determine the actual tag values for each register (empty, nonzero, zero, or
special), and store the updated tag word in memory. After executing these
instructions, all the tags in the x87 FPU tag word are set to empty (11B). Likewise, the
EMMS instruction clears MMX state from the MMX/floating-point registers by setting
all the tags in the x87 FPU tag word to 11B.
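To illustrate how a stored tag image can differ from the live tag word, the sketch below derives a tag from register contents in software. The classification rules (zero, special, valid) are an assumption based on the double extended-precision format described in Volume 1, not something stated in this section, and every identifier is hypothetical.

```c
#include <stdint.h>

enum x87_tag { TAG_VALID = 0, TAG_ZERO = 1, TAG_SPECIAL = 2, TAG_EMPTY = 3 };

/* Classifies the contents of a nonempty register. sign_exponent holds
 * bits 64..79 (sign in the top bit, 15-bit exponent below it). */
static enum x87_tag classify_register(uint64_t significand, uint16_t sign_exponent)
{
    uint16_t exponent = sign_exponent & 0x7FFF;
    int integer_bit = (int)(significand >> 63);

    if (exponent == 0 && significand == 0)
        return TAG_ZERO;
    if (exponent == 0x7FFF)
        return TAG_SPECIAL;   /* infinity or NaN */
    if (exponent == 0)
        return TAG_SPECIAL;   /* denormal */
    if (!integer_bit)
        return TAG_SPECIAL;   /* unsupported (unnormal) encoding */
    return TAG_VALID;
}

/* Builds a memory tag image: registers tagged empty (11B) in the live tag
 * word stay empty; all others are tagged according to their contents. */
static uint16_t build_tag_image(const uint64_t sig[8], const uint16_t se[8],
                                uint16_t live_tags)
{
    uint16_t image = 0;
    for (int i = 0; i < 8; i++) {
        unsigned live = (live_tags >> (2 * i)) & 3u;
        unsigned tag = (live == TAG_EMPTY)
                           ? (unsigned)TAG_EMPTY
                           : (unsigned)classify_register(sig[i], se[i]);
        image |= (uint16_t)(tag << (2 * i));
    }
    return image;
}
```

Under these assumptions, a register last written by an MMX instruction (bits 64 through 79 all 1s) is classified as special (10B) in the image, even though its live tag is 00B.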
Table 12-2. Effects of MMX Instructions on x87 FPU State
MMX Instruction Type | x87 FPU Tag Word | TOS Field of x87 FPU Status Word | Other x87 FPU Registers | Bits 64 Through 79 of x87 FPU Data Registers | Bits 0 Through 63 of x87 FPU Data Registers
Read from MMX register | All tags set to 00B (Valid) | 000B | Unchanged | Unchanged | Unchanged
Write to MMX register | All tags set to 00B (Valid) | 000B | Unchanged | Set to all 1s | Overwritten with MMX data
EMMS | All fields set to 11B (Empty) | 000B | Unchanged | Unchanged | Unchanged
12.3 SAVING AND RESTORING THE MMX STATE AND
REGISTERS
Because the MMX registers are aliased to the x87 FPU data registers, the MMX state
can be saved to memory and restored from memory as follows:
• Execute an FSAVE, FNSAVE, or FXSAVE instruction to save the MMX state to
  memory. (The FXSAVE instruction also saves the state of the XMM and MXCSR
  registers.)
• Execute an FRSTOR or FXRSTOR instruction to restore the MMX state from
  memory. (The FXRSTOR instruction also restores the state of the XMM and
  MXCSR registers.)
The save and restore methods described above are required for operating systems
(see Section 12.4, "Saving MMX State on Task or Context Switches"). Applications
can in some cases save and restore only the MMX registers in the following way:
• Execute eight MOVQ instructions to save the contents of the MM0 through
  MM7 registers to memory. An EMMS instruction may then (optionally) be
  executed to clear the MMX state in the x87 FPU.
• Execute eight MOVQ instructions to read the saved contents of MMX registers
  from memory into the MM0 through MM7 registers.
NOTE
The IA-32 architecture does not support scanning the x87 FPU tag
word and then only saving valid entries.
Table 12-3. Effect of the MMX, x87 FPU, and FXSAVE/FXRSTOR Instructions on the x87 FPU Tag Word
Instruction Type | Instruction | x87 FPU Tag Word | Image of x87 FPU Tag Word Stored in Memory
MMX | All (except EMMS) | All tags are set to 00B (valid). | Not affected.
MMX | EMMS | All tags are set to 11B (empty). | Not affected.
x87 FPU | All (except FSAVE, FSTENV, FRSTOR, FLDENV) | Tag for modified floating-point register is set to 00B or 11B. | Not affected.
x87 FPU and FXSAVE | FSAVE, FSTENV, FXSAVE | Tags and register values are read and interpreted; then all tags are set to 11B. | Tags are set according to the actual values in the floating-point registers; that is, empty registers are marked 11B and valid registers are marked 00B (nonzero), 01B (zero), or 10B (special).
x87 FPU and FXRSTOR | FRSTOR, FLDENV, FXRSTOR | All tags marked 11B in memory are set to 11B; all other tags are set according to the value in the corresponding floating-point register: 00B (nonzero), 01B (zero), or 10B (special). | Tags are read and interpreted, but not modified.
12.4 SAVING MMX STATE ON TASK OR CONTEXT
SWITCHES
When switching from one task or context to another, it is often necessary to save the
MMX state. As a general rule, if the existing task switching code for an operating
system includes facilities for saving the state of the x87 FPU, these facilities can also
be relied upon to save the MMX state, without rewriting the task switch code. This
reliance is possible because the MMX state is aliased to the x87 FPU state (see
Section 12.2, "The MMX State and MMX Register Aliasing").
With the introduction of the FXSAVE and FXRSTOR instructions and of
SSE/SSE2/SSE3/SSSE3 extensions, it is possible (and more efficient) to create state
saving facilities in the operating system or executive that save the x87
FPU/MMX/SSE/SSE2/SSE3/SSSE3 state in one operation. Section 13.5, "Designing
OS Facilities for Automatically Saving x87 FPU, MMX, and
SSE/SSE2/SSE3/SSSE3/SSE4 state on Task or Context Switches," describes how to
design such facilities. The techniques described in that section can be adapted to
saving only the MMX and x87 FPU state if needed.
12.5 EXCEPTIONS THAT CAN OCCUR WHEN EXECUTING
MMX INSTRUCTIONS
MMX instructions do not generate x87 FPU floating-point exceptions, nor do they
affect the processor's status flags in the EFLAGS register or the x87 FPU status word.
The following exceptions can be generated during the execution of an MMX
instruction:
• Exceptions during memory accesses:
  - Stack-segment fault (#SS).
  - General protection (#GP).
  - Page fault (#PF).
  - Alignment check (#AC), if alignment checking is enabled.
• System exceptions:
  - Invalid Opcode (#UD), if the EM flag in control register CR0 is set when an
    MMX instruction is executed (see Section 12.1, "Emulation of the MMX
    Instruction Set").
  - Device not available (#NM), if an MMX instruction is executed when the TS
    flag in control register CR0 is set. (See Section 13.5.1, "Using the TS Flag to
    Control the Saving of the x87 FPU, MMX, SSE, SSE2, SSE3 SSSE3 and SSE4
    State.")
• Floating-point error (#MF). (See Section 12.5.1, "Effect of MMX Instructions on
  Pending x87 Floating-Point Exceptions.")
Other exceptions can occur indirectly due to the faulty execution of the exception
handlers for the above exceptions.
12.5.1 Effect of MMX Instructions on Pending x87 Floating-Point
Exceptions
If an x87 FPU floating-point exception is pending and the processor encounters an
MMX instruction, the processor generates an x87 FPU floating-point error (#MF) prior
to executing the MMX instruction, to allow the pending exception to be handled by
the x87 FPU floating-point error exception handler. While this exception handler is
executing, the x87 FPU state is maintained and is visible to the handler. Upon
returning from the exception handler, the MMX instruction is executed, which will
alter the x87 FPU state, as described in Section 12.2, "The MMX State and MMX
Register Aliasing."
12.6 DEBUGGING MMX CODE
The debug facilities operate in the same manner when executing MMX instructions as
when executing other IA-32 or Intel 64 architecture instructions.
To correctly interpret the contents of the MMX or x87 FPU registers from the
FSAVE/FNSAVE or FXSAVE image in memory, a debugger needs to take account of
the relationship between the x87 FPU registers' logical locations relative to TOS and
the MMX registers' physical locations.
In the x87 FPU context, STn refers to an x87 FPU register at location n relative to the
TOS. However, the tags in the x87 FPU tag word are associated with the physical
locations of the x87 FPU registers (R0 through R7). The MMX registers always refer
to the physical locations of the registers (with MM0 through MM7 being mapped to R0
through R7). Figure 12-2 shows this relationship. Here, the inner circle refers to the
physical location of the x87 FPU and MMX registers. The outer circle refers to the x87
FPU registers' location relative to the current TOS.
When the TOS equals 0 (case A in Figure 12-2), ST0 points to the physical location
R0 on the floating-point stack. MM0 maps to ST0, MM1 maps to ST1, and so on.
When the TOS equals 2 (case B in Figure 12-2), ST0 points to the physical location
R2. MM0 maps to ST6, MM1 maps to ST7, MM2 maps to ST0, and so on.
Figure 12-2. Mapping of MMX Registers to x87 FPU Data Register Stack
[Figure: two circular diagrams showing the direction of x87 FPU push and pop.
Case A: TOS=0, with ST0 at physical location R0 (MM0).
Case B: TOS=2, with ST0 at physical location R2 (MM2), ST6 at MM0, and ST7 at MM1.
Outer circle = x87 FPU data registers' logical location relative to TOS
Inner circle = x87 FPU tags = MMX registers' location = FP registers' physical location]
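For a debugger, the relationship between MMn and STi reduces to modulo-8 arithmetic on the TOS value. The helper names below are hypothetical; the arithmetic follows from the two cases described for Figure 12-2.

```c
/* Returns i such that STi names the same physical register as MMn,
 * given the current TOS value (0..7). Unsigned wraparound followed by
 * the mask gives the modulo-8 result. */
static unsigned st_index_for_mm(unsigned mm, unsigned tos)
{
    return (mm - tos) & 7u;
}

/* Inverse mapping: the MMX register (physical location) that STi
 * refers to for the given TOS. */
static unsigned mm_index_for_st(unsigned st, unsigned tos)
{
    return (st + tos) & 7u;
}
```

With TOS equal to 2, st_index_for_mm(0, 2) is 6 and st_index_for_mm(2, 2) is 0, matching case B.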
CHAPTER 13
SYSTEM PROGRAMMING FOR INSTRUCTION SET
EXTENSIONS AND PROCESSOR EXTENDED STATES
This chapter describes system programming features for instruction set extensions
operating on the processor state extension known as the SSE state (XMM registers,
MXCSR) and for processor extended states. Instruction set extensions operating on
the SSE state include the streaming SIMD extensions (SSE), streaming SIMD
extensions 2 (SSE2), streaming SIMD extensions 3 (SSE3), Supplemental SSE3 (SSSE3),
and SSE4.
Sections 13.1 through 13.5 cover system programming requirements to enable
SSE/SSE2/SSE3/SSSE3/SSE4 extensions, providing operating system or executive
support for the SSE/SSE2/SSE3/SSSE3/SSE4 extensions, SIMD floating-point
exceptions, exception handling, and task (context) switching.
Operating system support for SSE state, once implemented using FXSAVE/FXRSTOR,
provides a limited degree of forward support for subsequent instruction set
extensions operating on the same known set of processor state. Processor extended states
refer to an extension in Intel 64 architecture that will allow system executives to
implement support for multiple processor state extensions that may be introduced
over time without requiring the system executive to be modified each time a new
processor state extension is introduced.
Managing processor extended states requires the following:
• using instructions such as XSAVE and XRSTOR to save and restore state information
  to a memory region consistent with the processor state extensions supported in
  hardware,
• using CPUID enumeration features to query the set of extended processor states
  supported by the processor,
• using the XSETBV instruction to enable individual processor state extensions,
• maintaining various system programming resources.
System programming for managing processor extended states is described in the
sections starting with Section 13.6.
13.1 PROVIDING OPERATING SYSTEM SUPPORT FOR
SSE/SSE2/SSE3/SSSE3/SSE4 EXTENSIONS
To use SSE/SSE2/SSE3/SSSE3/SSE4 extensions, the operating system or executive
must provide support for initializing the processor to use these extensions, for
handling the FXSAVE and FXRSTOR state saving instructions, and for handling SIMD
floating-point exceptions. The following sections provide system programming
guidelines for this support. Because SSE/SSE2/SSE3/SSSE3/SSE4 extensions share
the same state and experience the same sets of non-numerical and numerical exception
behavior, the guidelines that apply to SSE also apply to other sets of SIMD
extensions that operate on the same processor state and are subject to the same sets of
non-numerical and numerical exception behavior.
Chapter 11, "Programming with Streaming SIMD Extensions 2 (SSE2)," and Chapter
12, "Programming with SSE3, SSSE3 and SSE4," in the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 1, discuss support for
SSE/SSE2/SSE3/SSSE3/SSE4 from an application program's point of view.
13.1.1 Adding Support to an Operating System for
SSE/SSE2/SSE3/SSSE3/SSE4 Extensions
The following guidelines describe functions that an operating system or executive
must perform to support SSE/SSE2/SSE3/SSSE3/SSE4 extensions:
1. Check that the processor supports the SSE/SSE2/SSE3/SSSE3/SSE4 extensions.
2. Check that the processor supports the FXSAVE and FXRSTOR instructions.
3. Provide an initialization for the SSE, SSE2, SSE3, SSSE3 and SSE4 states.
4. Provide support for the FXSAVE and FXRSTOR instructions.
5. Provide support (if necessary) in non-numeric exception handlers for exceptions
generated by the SSE, SSE2, SSE3 and SSE4 instructions.
6. Provide an exception handler for the SIMD floating-point exception (#XM).
The following sections describe how to implement each of these guidelines.
13.1.2 Checking for SSE/SSE2/SSE3/SSSE3/SSE4 Extension
Support
If the processor attempts to execute an unsupported SSE/SSE2/SSE3/SSSE3/SSE4
instruction, the processor generates an invalid-opcode exception (#UD).
Before an operating system or executive attempts to use
SSE/SSE2/SSE3/SSSE3/SSE4 extensions, it should check that support is present.
Make sure:
CPUID.1:EDX.SSE[bit 25] = 1
CPUID.1:EDX.SSE2[bit 26] = 1
CPUID.1:ECX.SSE3[bit 0] = 1
CPUID.1:ECX.SSSE3[bit 9] = 1
CPUID.1:ECX.SSE4_1[bit 19] = 1
CPUID.1:ECX.SSE4_2[bit 20] = 1
To use the POPCNT instruction, software must check CPUID.1:ECX.POPCNT[bit 23] = 1.
13.1.3 Checking for Support for the FXSAVE and FXRSTOR
Instructions
A separate check must be made to ensure that the processor supports FXSAVE and
FXRSTOR. Make sure:
CPUID.1:EDX.FXSR[bit 24] = 1
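The feature checks in Sections 13.1.2 and 13.1.3 amount to testing individual bits of the ECX and EDX values returned by CPUID with EAX = 1. The sketch below decodes those bits from values supplied by the caller; how the values are obtained (inline assembly or a compiler-provided helper) is left out, and the structure and function names are hypothetical.

```c
#include <stdint.h>

/* Bit positions as listed in Sections 13.1.2 and 13.1.3. */
#define CPUID1_EDX_FXSR   (1u << 24)
#define CPUID1_EDX_SSE    (1u << 25)
#define CPUID1_EDX_SSE2   (1u << 26)
#define CPUID1_ECX_SSE3   (1u << 0)
#define CPUID1_ECX_SSSE3  (1u << 9)
#define CPUID1_ECX_SSE4_1 (1u << 19)
#define CPUID1_ECX_SSE4_2 (1u << 20)
#define CPUID1_ECX_POPCNT (1u << 23)

struct simd_support {
    int fxsr, sse, sse2, sse3, ssse3, sse4_1, sse4_2, popcnt;
};

/* Decodes the CPUID.1 ECX and EDX values into individual feature flags. */
static struct simd_support decode_cpuid_leaf1(uint32_t ecx, uint32_t edx)
{
    struct simd_support s;
    s.fxsr   = (edx & CPUID1_EDX_FXSR)   != 0;
    s.sse    = (edx & CPUID1_EDX_SSE)    != 0;
    s.sse2   = (edx & CPUID1_EDX_SSE2)   != 0;
    s.sse3   = (ecx & CPUID1_ECX_SSE3)   != 0;
    s.ssse3  = (ecx & CPUID1_ECX_SSSE3)  != 0;
    s.sse4_1 = (ecx & CPUID1_ECX_SSE4_1) != 0;
    s.sse4_2 = (ecx & CPUID1_ECX_SSE4_2) != 0;
    s.popcnt = (ecx & CPUID1_ECX_POPCNT) != 0;
    return s;
}
```

Keeping the FXSR flag separate from the SSE flags reflects the text's point that FXSAVE/FXRSTOR support requires its own check.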
13.1.4 Initialization of the SSE/SSE2/SSE3/SSSE3/SSE4 Extensions
The operating system or executive should carry out the following steps to set up
SSE/SSE2/SSE3/SSSE3/SSE4 extensions for use by application programs:
1. Set CR4.OSFXSR[bit 9] = 1. Setting this flag assumes that the operating system
provides facilities for saving and restoring SSE/SSE2/SSE3/SSSE3/SSE4 states
using FXSAVE and FXRSTOR instructions. These instructions are commonly used
to save the SSE/SSE2/SSE3/SSSE3/SSE4 state during task switches and when
invoking the SIMD floating-point exception (#XM) handler (see Section 13.4,
"Saving the SSE/SSE2/SSE3/SSSE3/SSE4 State on Task or Context Switches,"
and Section 13.1.6, "Providing a Handler for the SIMD Floating-Point Exception
(#XM)," respectively).
If the processor does not support the FXSAVE and FXRSTOR instructions,
attempting to set the OSFXSR flag will cause a general-protection exception (#GP)
to be generated.
2. Set CR4.OSXMMEXCPT[bit 10] = 1. Setting this flag assumes that the operating
system provides a SIMD floating-point exception (#XM) handler (see Section
13.1.6, "Providing a Handler for the SIMD Floating-Point Exception (#XM)").
NOTE
The OSFXSR and OSXMMEXCPT bits in control register CR4 must be
set by the operating system. The processor has no other way of
detecting operating-system support for the FXSAVE and FXRSTOR
instructions or for handling SIMD floating-point exceptions.
3. Clear CR0.EM[bit 2] = 0. This action disables emulation of the x87 FPU, which is
required when executing SSE/SSE2/SSE3/SSSE3/SSE4 instructions (see Section
2.5, "Control Registers").
4. Set CR0.MP[bit 1] = 1. This setting is the required setting for Intel 64 and IA-32
processors that support the SSE/SSE2/SSE3/SSSE3/SSE4 extensions (see
Section 9.2.1, "Configuring the x87 FPU Environment").
Table 13-1 and Table 13-2 show the actions of the processor when an
SSE/SSE2/SSE3/SSSE3/SSE4 instruction is executed, depending on the:
• OSFXSR and OSXMMEXCPT flags in control register CR4
• SSE/SSE2/SSE3/SSSE3/SSE4 feature flags returned by CPUID
• EM, MP, and TS flags in control register CR0
Table 13-1. Action Taken for Combinations of OSFXSR, OSXMMEXCPT, SSE, SSE2,
SSE3, EM, MP, and TS (see Note 1)
Column groups: CR4 (OSFXSR, OSXMMEXCPT); CPUID (SSE, SSE2, SSE3 (Note 2), SSE4_1 (Note 3)); CR0 Flags (EM, MP (Note 4), TS)
OSFXSR | OSXMMEXCPT | CPUID | EM | MP | TS | Action
0 | X (Note 5) | X | X | 1 | X | #UD exception.
1 | X | 0 | X | 1 | X | #UD exception.
1 | X | 1 | 1 | 1 | X | #UD exception.
1 | 0 | 1 | 0 | 1 | 0 | Execute instruction; #UD exception if unmasked SIMD floating-point exception is detected.
1 | 1 | 1 | 0 | 1 | 0 | Execute instruction; #XM exception if unmasked SIMD floating-point exception is detected.
1 | X | 1 | 0 | 1 | 1 | #NM exception.
NOTES:
1. For execution of any SSE/SSE2/SSE3 instruction except the PAUSE, PREFETCHh, SFENCE,
LFENCE, MFENCE, MOVNTI, and CLFLUSH instructions.
2. Exception conditions due to CR4.OSFXSR or CR4.OSXMMEXCPT do not apply to FISTTP.
3. Only applies to DPPS, DPPD, ROUNDPS, ROUNDPD, ROUNDSS, ROUNDSD.
4. For processors that support the MMX instructions, the MP flag should be set.
5. X = Don't care.
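The rows of Table 13-1 can be read as a priority-ordered decision, sketched below as a pure function. This is an illustrative model rather than a statement of how the processor evaluates the conditions internally. It assumes MP = 1 and ignores the instructions exempted by the table notes. The identifiers are hypothetical.

```c
#include <stdint.h>

#define CR0_EM          (1u << 2)
#define CR0_TS          (1u << 3)
#define CR4_OSFXSR      (1u << 9)
#define CR4_OSXMMEXCPT  (1u << 10)

enum sse_action { SSE_EXECUTE, SSE_FAULT_UD, SSE_FAULT_NM, SSE_FAULT_XM };

/* cpuid_flag_set: nonzero if the CPUID feature flag covering the
 * instruction is 1. unmasked_simd_fp: nonzero if executing the
 * instruction detects an unmasked SIMD floating-point exception. */
static enum sse_action sse_action_for(uint32_t cr0, uint32_t cr4,
                                      int cpuid_flag_set, int unmasked_simd_fp)
{
    if (!(cr4 & CR4_OSFXSR))
        return SSE_FAULT_UD;              /* row 1 */
    if (!cpuid_flag_set)
        return SSE_FAULT_UD;              /* row 2 */
    if (cr0 & CR0_EM)
        return SSE_FAULT_UD;              /* row 3 */
    if (cr0 & CR0_TS)
        return SSE_FAULT_NM;              /* row 6 */
    if (unmasked_simd_fp)                 /* rows 4 and 5 */
        return (cr4 & CR4_OSXMMEXCPT) ? SSE_FAULT_XM : SSE_FAULT_UD;
    return SSE_EXECUTE;
}
```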
The SIMD floating-point exception mask bits (bits 7 through 12), the flush-to-zero
flag (bit 15), the denormals-are-zero flag (bit 6), and the rounding control field (bits
13 and 14) in the MXCSR register should be left at their default values: the exception
mask bits set to 1 (all SIMD floating-point exceptions masked) and the other fields
cleared to 0. This permits the application to determine how these features are to be used.
13.1.5 Providing Non-Numeric Exception Handlers for Exceptions
Generated by the SSE/SSE2/SSE3/SSSE3/SSE4 Instructions
SSE/SSE2/SSE3/SSSE3/SSE4 instructions can generate the same type of memory
access exceptions (such as page fault, segment not present, and limit violations)
and other non-numeric exceptions as other Intel 64 and IA-32 architecture
instructions generate.
Ordinarily, existing exception handlers can handle these and other non-numeric
exceptions without code modification. However, depending on the mechanisms used
in existing exception handlers, some modifications might need to be made.
The SSE/SSE2/SSE3/SSSE3/SSE4 extensions can generate the non-numeric
exceptions listed below:
• Memory Access Exceptions:
  - Invalid opcode (#UD).
  - Stack-segment fault (#SS).
  - General protection (#GP). Executing most SSE/SSE2/SSE3 instructions with
    an unaligned 128-bit memory reference generates a general-protection
    exception. (The MOVUPS and MOVUPD instructions allow unaligned loads or
    stores of 128-bit memory locations, without generating a general-protection
    exception.) A 128-bit reference within the stack segment that is not aligned
    to a 16-byte boundary will also generate a general-protection exception,
    instead of a stack-segment fault exception (#SS).
  - Page fault (#PF).
  - Alignment check (#AC). When enabled, this type of alignment check
    operates on operands that are less than 128 bits in size: 16-bit, 32-bit, and
    64-bit. To enable the generation of alignment check exceptions, do the
    following:
      - Set the AM flag (bit 18 of control register CR0)
      - Set the AC flag (bit 18 of the EFLAGS register)
      - CPL must be 3
    If alignment check exceptions are enabled, 16-bit, 32-bit, and 64-bit
    misalignment will be detected for the MOVUPD and MOVUPS instructions;
    detection of 128-bit misalignment is not guaranteed and may vary with
    implementation.
• System Exceptions:
  - Invalid-opcode exception (#UD). This exception is generated when executing
    SSE/SSE2/SSE3/SSSE3 instructions under the following conditions:
      - SSE/SSE2/SSE3/SSSE3/SSE4_1/SSE4_2 feature flags returned by
        CPUID are set to 0. This condition does not affect the CLFLUSH
        instruction, nor POPCNT.
      - The CLFSH feature flag returned by the CPUID instruction is set to 0. This
        exception condition only pertains to the execution of the CLFLUSH
        instruction.
      - The POPCNT feature flag returned by the CPUID instruction is set to 0.
        This exception condition only pertains to the execution of the POPCNT
        instruction.
      - The EM flag (bit 2) in control register CR0 is set to 1, regardless of the
        value of TS flag (bit 3) of CR0. This condition does not affect the PAUSE,
        PREFETCHh, MOVNTI, SFENCE, LFENCE, MFENCE, CLFLUSH, CRC32 and
        POPCNT instructions.
      - The OSFXSR flag (bit 9) in control register CR4 is set to 0. This condition
        does not affect the PAVGB, PAVGW, PEXTRW, PINSRW, PMAXSW, PMAXUB,
        PMINSW, PMINUB, PMOVMSKB, PMULHUW, PSADBW, PSHUFW,
        MASKMOVQ, MOVNTQ, MOVNTI, PAUSE, PREFETCHh, SFENCE, LFENCE,
        MFENCE, CLFLUSH, CRC32 and POPCNT instructions.
      - Executing an instruction that causes a SIMD floating-point exception when
        the OSXMMEXCPT flag (bit 10) in control register CR4 is set to 0. See
        Section 13.5.1, "Using the TS Flag to Control the Saving of the x87 FPU,
        MMX, SSE, SSE2, SSE3 SSSE3 and SSE4 State."
  - Device not available (#NM). This exception is generated by executing an
    SSE/SSE2/SSE3/SSSE3/SSE4 instruction when the TS flag (bit 3) of CR0 is
    set to 1.
Other exceptions can occur indirectly due to faulty execution of the exception
handlers for the above exceptions.
Table 13-2. Action Taken for Combinations of OSFXSR, SSSE3, SSE4, EM, and TS
Column groups: CR4 (OSFXSR); CPUID (SSSE3, SSE4_1*, SSE4_2**); CR0 Flags (EM, TS)
OSFXSR | CPUID | EM | TS | Action
0 | X*** | X | X | #UD exception.
1 | 0 | X | X | #UD exception.
1 | 1 | 1 | X | #UD exception.
1 | 1 | 0 | 1 | #NM exception.
NOTES:
* Applies to SSE4_1 instructions except DPPS, DPPD, ROUNDPS, ROUNDPD, ROUNDSS, ROUNDSD.
** Applies to SSE4_2 instructions except CRC32 and POPCNT.
*** X = Don't care.
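Two of the conditions above lend themselves to a short illustration: the three-part test for whether alignment checking is in effect, and the 16-byte alignment test that separates #GP-generating 128-bit references from aligned ones. The function names are hypothetical, and the sketch only evaluates the conditions; it does not read CR0, EFLAGS, or the CPL itself.

```c
#include <stdint.h>

#define CR0_AM     (1u << 18)  /* alignment mask flag in CR0 */
#define EFLAGS_AC  (1u << 18)  /* alignment check flag in EFLAGS */

/* Nonzero when all three conditions for alignment check exceptions
 * (#AC) hold: CR0.AM = 1, EFLAGS.AC = 1, and CPL = 3. */
static int alignment_checking_enabled(uint32_t cr0, uint32_t eflags, unsigned cpl)
{
    return (cr0 & CR0_AM) != 0 && (eflags & EFLAGS_AC) != 0 && cpl == 3;
}

/* Nonzero when a linear address lies on a 16-byte boundary, as most
 * 128-bit memory operands require in order to avoid #GP. */
static int is_16_byte_aligned(uint64_t addr)
{
    return (addr & 0xFu) == 0;
}
```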
13.1.6 Providing a Handler for the SIMD Floating-Point Exception
(#XM)
SSE/SSE2/SSE3/SSSE3/SSE4 instructions do not generate numeric exceptions on
packed integer operations. They can generate the following numeric (SIMD
floating-point) exceptions on packed and scalar single-precision and
double-precision floating-point operations:
• Invalid operation (#I)
• Divide-by-zero (#Z)
• Denormal operand (#D)
• Numeric overflow (#O)
• Numeric underflow (#U)
• Inexact result (Precision) (#P)
These SIMD floating-point exceptions (with the exception of the denormal operand
exception) are defined in the IEEE Standard 754 for Binary Floating-Point Arithmetic
and represent the same conditions that cause x87 FPU floating-point error
exceptions (#MF) to be generated for x87 FPU instructions.
Each of these exceptions can be masked, in which case the processor returns a
reasonable result to the destination operand without invoking an exception handler.
However, if any of these exceptions are left unmasked, detection of the exception
condition results in a SIMD floating-point exception (#XM) being generated. See
Chapter 6, "Interrupt 19: SIMD Floating-Point Exception (#XM)."
To handle unmasked SIMD floating-point exceptions, the operating system or
executive must provide an exception handler. The section titled "SSE and SSE2 SIMD
Floating-Point Exceptions" in Chapter 11, "Programming with Streaming SIMD
Extensions 2 (SSE2)," of the Intel 64 and IA-32 Architectures Software Developer's
Manual, Volume 1, describes the SIMD floating-point exception classes and gives
suggestions for writing an exception handler to handle them.
To indicate that the operating system provides a handler for SIMD floating-point
exceptions (#XM), the OSXMMEXCPT flag (bit 10) must be set in control register
CR4.
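The flag handling described above can be sketched in C as pure bit arithmetic. This is a minimal illustration, not operating-system code: the function names are hypothetical, the CR4 value is passed in as an ordinary integer rather than read from the register, and the vector numbers (6 for #UD, 19 for #XM) are the architectural vector assignments.

```c
#include <stdint.h>

/* CR4 bit positions as given in the text. */
#define CR4_OSFXSR     (1u << 9)   /* OS uses FXSAVE/FXRSTOR            */
#define CR4_OSXMMEXCPT (1u << 10)  /* OS provides a handler for #XM     */

enum { VECTOR_UD = 6, VECTOR_XM = 19 };

/* Hypothetical helper: the CR4 value an OS would write once it supports
 * FXSAVE/FXRSTOR and has installed a SIMD floating-point handler. */
uint32_t cr4_enable_simd_fp(uint32_t cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}

/* Hypothetical helper: vector delivered when an unmasked SIMD
 * floating-point exception is detected. With OSXMMEXCPT clear the
 * processor raises #UD instead of #XM. */
int unmasked_simd_fp_vector(uint32_t cr4)
{
    return (cr4 & CR4_OSXMMEXCPT) ? VECTOR_XM : VECTOR_UD;
}
```

A real kernel would perform the CR4 update with privileged MOV-to-CR4 instructions; only the bit logic is modeled here.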
13.1.6.1 Numeric Error flag and IGNNE#
SSE/SSE2/SSE3/SSSE3/SSE4 extensions ignore the NE flag in control register CR0
(that is, they treat it as if it were always set) and the IGNNE# pin. When an
unmasked SIMD floating-point exception is detected, it is always reported by
generating a SIMD floating-point exception (#XM).
13.2 EMULATION OF SSE/SSE2/SSE3/SSSE3/SSE4
EXTENSIONS
The Intel 64 and IA-32 architectures do not support emulation of the
SSE/SSE2/SSE3/SSSE3/SSE4 instructions, as they do for x87 FPU instructions.
The EM flag in control register CR0 (provided to invoke emulation of x87 FPU
instructions) cannot be used to invoke emulation of SSE/SSE2/SSE3/SSSE3/SSE4
instructions. If an SSE/SSE2/SSE3/SSSE3/SSE4 instruction is executed when
CR0.EM = 1, an invalid opcode exception (#UD) is generated. See Table 13-1.
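The interaction of CR0.EM, CR4.OSFXSR and CR0.TS can be summarized as a small decision function. The sketch below is an illustration under stated assumptions: the enum and function names are hypothetical, the processor is assumed to support the instruction, the order of the checks (#UD conditions before #NM) is an assumption of this sketch, and the instructions listed earlier as unaffected by these flags (for example PAUSE, PREFETCHh, CRC32, POPCNT) are not modeled.

```c
#include <stdint.h>

#define CR0_EM     (1u << 2)  /* x87 emulation flag                 */
#define CR0_TS     (1u << 3)  /* task-switched flag                 */
#define CR4_OSFXSR (1u << 9)  /* OS supports FXSAVE/FXRSTOR         */

typedef enum {
    SSE_FAULT_NONE,  /* instruction executes normally */
    SSE_FAULT_UD,    /* invalid opcode (#UD)          */
    SSE_FAULT_NM     /* device not available (#NM)    */
} sse_fault_t;

/* Hypothetical model of the fault raised when a typical
 * SSE/SSE2/SSE3/SSSE3/SSE4 instruction is executed. */
sse_fault_t sse_instruction_fault(uint32_t cr0, uint32_t cr4)
{
    if (cr0 & CR0_EM)
        return SSE_FAULT_UD;      /* EM cannot trigger SSE emulation */
    if (!(cr4 & CR4_OSFXSR))
        return SSE_FAULT_UD;      /* OS has not enabled SSE state    */
    if (cr0 & CR0_TS)
        return SSE_FAULT_NM;      /* state may belong to another task */
    return SSE_FAULT_NONE;
}
```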
13.3 SAVING AND RESTORING THE
SSE/SSE2/SSE3/SSSE3/SSE4 STATE
The SSE/SSE2/SSE3/SSSE3/SSE4 state consists of the state of the XMM and MXCSR
registers. The recommended method for saving and restoring this state follows:
• Execute an FXSAVE instruction to save the state of the XMM and MXCSR registers
  to memory.
• Execute an FXRSTOR instruction to restore the state of the XMM and MXCSR
  registers from the image saved in memory by the FXSAVE instruction.
This save and restore method is required for all operating systems. See Section 13.5,
"Designing OS Facilities for Automatically Saving x87 FPU, MMX, and
SSE/SSE2/SSE3/SSSE3/SSE4 State on Task or Context Switches."
In some cases, applications can only save the XMM and MXCSR registers in the
following way:
• Execute MOVDQA/MOVDQU instructions to save the contents of each XMM register
  to memory.
• Execute a STMXCSR instruction to save the state of the MXCSR register to
  memory.
In some cases, applications can only restore the XMM and MXCSR registers in the
following way:
• Execute MOVDQA/MOVDQU instructions to read the saved contents of each XMM
  register from memory into the XMM registers.
• Execute a LDMXCSR instruction to restore the state of the MXCSR register from
  memory.
13.4 SAVING THE SSE/SSE2/SSE3/SSSE3/SSE4 STATE ON
TASK OR CONTEXT SWITCHES
When switching from one task or context to another, it is often necessary to save the
SSE/SSE2/SSE3/SSSE3/SSE4 state. FXSAVE and FXRSTOR instructions provide a
simple method for saving and restoring this state. See Section 13.3, "Saving and
Restoring the SSE/SSE2/SSE3/SSSE3/SSE4 State." These instructions offer the
added benefit of saving x87 FPU and MMX state as well.
Guidelines for writing such procedures are in Section 13.5, "Designing OS Facilities
for Automatically Saving x87 FPU, MMX, and SSE/SSE2/SSE3/SSSE3/SSE4 State
on Task or Context Switches."
13.5 DESIGNING OS FACILITIES FOR AUTOMATICALLY
SAVING X87 FPU, MMX, AND
SSE/SSE2/SSE3/SSSE3/SSE4 STATE ON TASK OR
CONTEXT SWITCHES
The x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 state consists of the state of the x87
FPU, MMX, XMM, and MXCSR registers. The FXSAVE and FXRSTOR instructions
provide a fast method for saving and restoring this state. If task or context switching
facilities are already implemented in an operating system or executive and they use
FSAVE/FNSAVE and FRSTOR to save the x87 FPU and MMX state, these facilities can
be extended to save and restore SSE/SSE2/SSE3/SSSE3/SSE4 state by substituting
FXSAVE/FXRSTOR for FSAVE/FNSAVE and FRSTOR.
Where task or context switching facilities must be written from scratch, several
approaches can be taken for using the FXSAVE and FXRSTOR instructions to save and
restore x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 state:
• The operating system can require applications that are intended to be run as tasks
  to take responsibility for saving the state of the x87 FPU, MMX, XMM, and MXCSR
  registers prior to a task suspension during a task switch and for restoring the
  registers when the task is resumed. This approach is appropriate for cooperative
  multitasking operating systems, where the application has control over (or is able
  to determine) when a task switch is about to occur and can save state prior to the
  task switch.
• The operating system can take the responsibility for automatically saving the x87
  FPU, MMX, XMM, and MXCSR registers as part of the task switch process (using
  an FXSAVE instruction) and automatically restoring the state of the registers
  when a suspended task is resumed (using an FXRSTOR instruction). Here, the
  x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 state must be saved as part of the task
  state. This approach is appropriate for preemptive multitasking operating
  systems, where the application cannot know when it is going to be preempted
  and cannot prepare in advance for task switching. Here, the operating system is
  responsible for saving and restoring the task and the x87
  FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 state when necessary.
• The operating system can take the responsibility for saving the x87 FPU, MMX,
  XMM, and MXCSR registers as part of the task switch process, but delay the
  saving of this state until an x87 FPU, MMX, or
  SSE/SSE2/SSE3/SSSE3/SSE4 instruction is actually executed by the new task.
  Using this approach, the x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 state is
  saved only if an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction needs
  to be executed in the new task. (See Section 13.5.1, "Using the TS Flag to
  Control the Saving of the x87 FPU, MMX, SSE, SSE2, SSE3, SSSE3 and SSE4
  State," for more information.)
13.5.1 Using the TS Flag to Control the Saving of the
x87 FPU, MMX, SSE, SSE2, SSE3, SSSE3 and SSE4 State
Saving the x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 state using FXSAVE incurs
processor overhead. If the new task does not access the x87 FPU, MMX, XMM, and
MXCSR registers, this overhead can be avoided by not automatically saving the
state on a task switch.
The TS flag in control register CR0 is provided to allow the operating system to delay
saving the x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 state until an instruction
that actually accesses this state is encountered in a new task. When the TS flag is
set, the processor monitors the instruction stream for an x87
FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction. When the processor detects
one of these instructions, it raises a device-not-available exception (#NM) prior to
executing the instruction. The device-not-available exception handler can then be
used to save the x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 state for the previous
task (using an FXSAVE instruction) and load the x87
FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 state for the current task (using an
FXRSTOR instruction). If the task never encounters an x87
FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction, the device-not-available
exception will not be raised and the task state will not be saved unnecessarily.
NOTE
The CRC32 and POPCNT instructions do not operate on the x87
FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 state. They operate on the
general-purpose registers and are not involved in the OS's lazy
FXSAVE/FXRSTOR technique.
The TS flag can be set either explicitly (by executing a MOV instruction to control
register CR0) or implicitly (using the IA-32 architecture's native task switching
mechanism). When the native task switching mechanism is used, the processor
automatically sets the TS flag on a task switch. After the device-not-available handler
has saved the x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 state, it should execute the
CLTS instruction to clear the TS flag.
Figure 13-1 gives an example of an operating system that implements x87
FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 state saving using the TS flag. In this
example, task A is the currently running task and task B is the new task. The
operating system maintains a save area for the x87
FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 state for each task and defines a variable
(x87FPU_MMX_XMM_MXCSR_StateOwner) that indicates the task that owns the
state. In this example, task A is the current owner.
On a task switch, the operating system task switching code must execute the
following pseudo-code to set the TS flag according to the current owner of the x87
FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 state. If the new task (task B in this
example) is not the current owner of this state, the TS flag is set to 1; otherwise, it is
set to 0.
IF Task_Being_Switched_To ≠ x87FPU_MMX_XMM_MXCSR_StateOwner
    THEN
        CR0.TS ← 1;
    ELSE
        CR0.TS ← 0;
FI;
Figure 13-1. Example of Saving the x87 FPU, MMX, SSE, SSE2, SSE3, and SSSE3
State During an Operating-System Controlled Task Switch
[Figure not reproduced. It shows task A (owner of the x87 FPU, MMX, XMM, and
MXCSR state) and task B, each with its own x87 FPU/MMX/XMM/MXCSR state save
area, the operating system task switching code, and the device-not-available
exception handler. When CR0.TS = 1 and an x87 FPU, MMX, or SSEx instruction is
encountered, the handler saves task A's x87 FPU/MMX/XMM/MXCSR state and loads
task B's x87 FPU/MMX/XMM/MXCSR state.]
If a new task attempts to access an x87 FPU, MMX, XMM, or MXCSR register while the
TS flag is set to 1, a device-not-available exception (#NM) is generated. The
device-not-available exception handler executes the following pseudo-code.
FXSAVE "To x87FPU/MMX/XMM/MXCSR State Save Area for Current
    x87FPU_MMX_XMM_MXCSR_StateOwner";
FXRSTOR "x87FPU/MMX/XMM/MXCSR State From Current Task's
    x87FPU/MMX/XMM/MXCSR State Save Area";
x87FPU_MMX_XMM_MXCSR_StateOwner ← Current_Task;
CR0.TS ← 0;
This exception handler code performs the following tasks:
• Saves the x87 FPU, MMX, XMM, and MXCSR registers in the state save area for the
  current owner of the x87 FPU/MMX/XMM/MXCSR state.
• Restores the x87 FPU, MMX, XMM, and MXCSR registers from the new task's save
  area for the x87 FPU/MMX/XMM/MXCSR state.
• Updates the current x87 FPU/MMX/XMM/MXCSR state owner to be the current
  task.
• Clears the TS flag.
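The lazy save scheme above can be modeled in portable C to make the ownership bookkeeping concrete. This is a simulation sketch, not kernel code: the types and function names are hypothetical, `memcpy` stands in for FXSAVE and FXRSTOR, a boolean stands in for CR0.TS, and the NULL-owner check is an addition for the case where no task has used the state yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define STATE_SIZE 512  /* size of an FXSAVE/FXRSTOR image */

typedef struct task {
    unsigned char save_area[STATE_SIZE];   /* per-task state save area */
} task_t;

typedef struct cpu {
    unsigned char regs[STATE_SIZE];  /* stands in for the live registers */
    bool cr0_ts;                     /* stands in for CR0.TS             */
    task_t *state_owner;             /* x87FPU_MMX_XMM_MXCSR_StateOwner  */
    task_t *current;                 /* currently running task           */
} cpu_t;

/* Task switching code: set TS only if the new task does not own the state. */
void task_switch(cpu_t *cpu, task_t *next)
{
    cpu->current = next;
    cpu->cr0_ts = (next != cpu->state_owner);
}

/* Device-not-available (#NM) handler. */
void device_not_available(cpu_t *cpu)
{
    assert(cpu->cr0_ts);
    if (cpu->state_owner != NULL)                       /* "FXSAVE"  */
        memcpy(cpu->state_owner->save_area, cpu->regs, STATE_SIZE);
    memcpy(cpu->regs, cpu->current->save_area, STATE_SIZE); /* "FXRSTOR" */
    cpu->state_owner = cpu->current;
    cpu->cr0_ts = false;                                /* "CLTS"    */
}

/* Models the current task executing an instruction that touches the state. */
void touch_state(cpu_t *cpu, unsigned char value)
{
    if (cpu->cr0_ts)
        device_not_available(cpu);
    cpu->regs[0] = value;
}
```

In this model a task that never calls `touch_state` after being switched in never triggers the handler, so the previous owner's state stays in the registers and no copy is made.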
13.6 XSAVE/XRSTOR AND PROCESSOR EXTENDED STATE
MANAGEMENT
The features associated with managing processor extended states include:
• An extensible data layout for existing and future processor state extensions. The
  layout of the XSAVE/XRSTOR area extends from the 512-byte FXSAVE/FXRSTOR
  layout to provide compatibility and a migration path from managing the legacy
  FXSAVE/FXRSTOR area. Specifically, the XSAVE/XRSTOR area layout consists of:
  - The FXSAVE/FXRSTOR area (512 bytes; the layout is identical to the
    FXSAVE/FXRSTOR area),
  - The XSAVE header area (64 bytes),
  - A finite set of save areas, each corresponding to a processor extended state
    (see Intel 64 and IA-32 Architectures Software Developer's Manual,
    Volume 2B, XSAVE instruction). The number of save areas and the offset and
    size of each save area are enumerated by CPUID leaf function 0DH.
• CPUID enhancement: the CPUID instruction provides information on:
  - CPUID.01H:ECX.XSAVE[bit 26]. A feature flag indicating the processor's
    support of XSAVE/XRSTOR architecture extensions.
  - CPUID.01H:ECX.OSXSAVE[bit 27]. A feature flag indicating whether the OS has
    enabled extensible state management, communicating that the OS
    supports processor extended state management.
  - CPUID leaf function 0DH enumerates the list of processor states (including
    legacy x87 FPU, SSE states and processor extended states), and the offset and
    size of the individual save area for each processor extended state.
• Control register enhancement and dedicated register for enabling each processor
  extended state: CR4.OSXSAVE[bit 18] and the XFEATURE_ENABLED_MASK
  register (XCR0) are described in Chapter 2, "System Architecture Overview."
  XCR0 can be read at all privilege levels but written only at ring 0.
• Instructions to manage the XFEATURE_ENABLED_MASK register (XCR0) and the
  XSAVE/XRSTOR area (see Intel 64 and IA-32 Architectures Software
  Developer's Manual, Volume 2B):
  - XGETBV: reads XCR0.
  - XSETBV: writes to XCR0, ring 0 only.
  - XRSTOR: restores from memory the processor states specified by a bit vector
    mask in EDX:EAX.
  - XSAVE: saves the current processor states to memory according to a bit
    vector mask in EDX:EAX.
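The fixed part of the XSAVE/XRSTOR area layout listed above can be expressed as C structures whose sizes are checked at compile time. The type and field names below are hypothetical; only the 512-byte legacy area and the 64-byte header are modeled, because the offsets of the extended save areas have to be read from CPUID leaf 0DH at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-byte XSAVE header; only XSTATE_BV is named, the rest is reserved. */
typedef struct {
    uint64_t xstate_bv;
    uint64_t reserved[7];
} xsave_header_t;

/* Fixed prefix of the XSAVE/XRSTOR area. */
typedef struct {
    uint8_t        legacy[512];  /* FXSAVE/FXRSTOR-compatible image */
    xsave_header_t header;       /* XSAVE header                    */
    /* Extended save areas follow at CPUID-enumerated offsets. */
} xsave_area_prefix_t;

static_assert(sizeof(xsave_header_t) == 64, "XSAVE header is 64 bytes");
static_assert(offsetof(xsave_area_prefix_t, header) == 512,
              "header follows the 512-byte legacy area");

/* Size of the part of the area whose layout does not depend on CPUID. */
size_t xsave_fixed_prefix_size(void)
{
    return sizeof(xsave_area_prefix_t);
}
```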
13.6.1 XSAVE Header
The header section includes an XSTATE_BV bit vector field. If the value of a bit in
HEADER.XSTATE_BV is 1, it indicates that the corresponding processor extended
state was written to the respective save area in memory by the XSAVE instruction.
If software modifies the save area image of a particular processor state component
directly, it is responsible for setting the corresponding bit in HEADER.XSTATE_BV to 1.
Otherwise, directly modified state information in a save area image may be ignored
by XRSTOR.
The order of bits in XSTATE_BV matches that of the
XFEATURE_ENABLED_MASK register (XCR0). Although XCR0 has only two bits
initially defined for state management, the general relationship between the value of
XSTATE_BV and the corresponding processor state in the XSAVE/XRSTOR layout is
depicted in Figure 13-2.
The XSAVE header is 64 bytes in length and must be aligned on a 64-byte boundary.
Therefore, the XSAVE/XRSTOR region must be aligned on a 64-byte boundary. The
format of the header is as follows (see Table 13-3):

Table 13-3. XSAVE Header Format
Bytes 15:8                 Bytes 7:0                  Byte Offset
Reserved (Must be zero)    XSTATE_BV                  0
Reserved                   Reserved (Must be zero)    16
Reserved                   Reserved                   32
Reserved                   Reserved                   48

The value of each bit in HEADER.XSTATE_BV may affect the action performed by
XRSTOR, depending on the logical value of the respective bits in the
XFEATURE_ENABLED_MASK register (XCR0), the restore bit mask (EDX:EAX input to
XRSTOR), and HEADER.XSTATE_BV. When an XRSTOR instruction is executed with a
restore bit mask selecting the i-th bit (and the corresponding XCR0 bit is
enabled), a value of "1" in the corresponding bit of HEADER.XSTATE_BV causes the
processor state to be updated with the contents of the save area read from the memory
image. A value of "0" in HEADER.XSTATE_BV causes the processor state to be
initialized by hardware supplied values instead of from memory (see the operation
detail of XRSTOR in Intel 64 and IA-32 Architectures Software Developer's Manual,
Volume 2B).

Figure 13-2. Future Layout of XSAVE/XRSTOR Area and XSTATE_BV with Five Sets
of Processor State Extensions
[Figure not reproduced. It shows the XSAVE/XRSTOR area as the FXSAVE/FXRSTOR
area (x87 FPU state and SSE state save areas), the header containing XSTATE_BV,
and the extension save areas Ext_SaveArea2 through Ext_SaveArea4, with XSTATE_BV
bit positions 0 through 4 marking which save areas are updated and which are not.]
The save area image corresponding to a bit with "0" value in HEADER.XSTATE_BV
may or may not contain the correct state information. XRSTOR will ensure the
register state for a component is properly initialized regardless of the value of the
save area when the component header bit is zero.
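The per-component behavior of XRSTOR described in this section reduces to a three-way decision on one bit position. The following sketch is a hedged model of that decision only: the enum and function names are hypothetical, the three masks are passed as plain integers, and a component whose XCR0 bit is clear is treated as "no action" (an assumption of this sketch). The special handling of MXCSR is not modeled.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    XRSTOR_NONE,  /* component not selected: state left unchanged  */
    XRSTOR_INIT,  /* selected, XSTATE_BV bit = 0: hardware init    */
    XRSTOR_LOAD   /* selected, XSTATE_BV bit = 1: load from memory */
} xrstor_action_t;

/* Hypothetical model: action taken for the component at position `bit`. */
xrstor_action_t xrstor_component_action(uint64_t xcr0,
                                        uint64_t restore_mask, /* EDX:EAX */
                                        uint64_t xstate_bv,    /* header  */
                                        unsigned bit)
{
    assert(bit < 64);
    uint64_t selector = UINT64_C(1) << bit;

    if (!(xcr0 & restore_mask & selector))
        return XRSTOR_NONE;
    return (xstate_bv & selector) ? XRSTOR_LOAD : XRSTOR_INIT;
}
```

For example, with XCR0 = 3 and a restore mask of 1, bit 0 yields a hardware initialization when XSTATE_BV bit 0 is clear and a load from memory when it is set.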
13.7 INTEROPERABILITY OF XSAVE/XRSTOR AND
FXSAVE/FXRSTOR
The FXSAVE instruction writes x87 FPU and SSE state information to a 512-byte
FXSAVE/FXRSTOR save area. FXRSTOR restores the processor's x87 FPU and SSE
states from an FXSAVE/FXRSTOR save area image. XSAVE/XRSTOR instructions
support x87 FPU and SSE states using the same layout as the FXSAVE/FXRSTOR area
to provide interoperability of FXSAVE versus XSAVE, and FXRSTOR versus XRSTOR.
XSAVE/XRSTOR provides the additional flexibility for system software to manage SSE
state independently of x87 FPU state. Thus system software that had been using
FXSAVE/FXRSTOR to manage x87 FPU and SSE states can transition to
XSAVE/XRSTOR to manage x87 FPU, SSE and other processor extended states in a
systematic and forward-looking manner.
It is also possible for system software to adopt an alternate approach of using
FXSAVE/FXRSTOR for x87 and SSE state management, and implementing forward
processor extended state management using XSAVE/XRSTOR. In this case, system
software must specify the bit vector mask in EDX:EAX appropriately when executing
XSAVE/XRSTOR instructions.
For instance, when using the XSAVE instruction, the OS can supply a bit vector in
EDX:EAX with the two least significant bits (corresponding to x87 FPU and SSE state)
equal to 0. Then, the XSAVE instruction will not write the processor's x87 FPU and
SSE state into memory. Similarly, for the XRSTOR instruction, a bit vector mask in
EDX:EAX with the two least significant bits equal to 0 will cause the XRSTOR
instruction to neither restore nor initialize the processor's x87 FPU and SSE state.
The processor's actions on the x87 FPU state, MXCSR, and XMM registers as a result
of executing XRSTOR are listed in Table 13-4 (both bit 1 and bit 0 of the
XFEATURE_ENABLED_MASK register are presumed to be 1). The x87 FPU or XMM
registers may be initialized by the processor (see XRSTOR operation in Intel 64 and
IA-32 Architectures Software Developer's Manual, Volume 2B). When the MXCSR
register is updated from memory, reserved bit checking is enforced. The
saving/restoring of MXCSR is bound to the SSE state, independent of the x87 FPU
state. The action of XSAVE is listed in Table 13-5.
XSAVE and XRSTOR instructions operating on x87 FPU or SSE state will cause a #NM
(Device Not Available) exception if CR0.TS is set. Using this feature, system
software can implement the lazy restore technique of managing x87 FPU/SSE state
using either FXSAVE/FXRSTOR or XSAVE/XRSTOR. This can be accomplished even when
FXSAVE and XSAVE instructions are intermixed.
Table 13-4. XRSTOR Action on MXCSR, x87 FPU, XMM Register
EDX:EAX        XSTATE_BV      MXCSR        XMM Registers      x87 FPU State
Bit 1  Bit 0   Bit 1  Bit 0
0      0       X      X       None         None               None
0      1       X      0       None         None               Init by processor
0      1       X      1       None         None               Load
1      0       0      X       Load/Check   Init by processor  None
1      0       1      X       Load/Check   Load               None
1      1       0      0       Load/Check   Init by processor  Init by processor
1      1       0      1       Load/Check   Init by processor  Load
1      1       1      0       Load/Check   Load               Init by processor
1      1       1      1       Load/Check   Load               Load
Table 13-5. XSAVE Action on MXCSR, x87 FPU, XMM Register
EDX:EAX        XCR0 (1)       MXCSR        XMM Registers      x87 FPU State
Bit 1  Bit 0   Bit 1  Bit 0
0      0       X      1       None         None               None
0      1       X      1       None         None               Store
1      0       0      1       None         None               None
1      0       1      1       Store        Store              None
1      1       0      1       None         None               Store
1      1       1      1       Store        Store              Store
NOTES:
1. XCR0 is the XFEATURE_ENABLED_MASK register. Note that attempts to set XCR0[0]
   to 0 cause #GP.
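Table 13-5 can be read as a simple rule: a legacy component is stored only when its bit is set in both EDX:EAX and XCR0, and MXCSR follows the SSE bit. A minimal C sketch of that reading follows; the struct and function names are hypothetical, and only bits 0 (x87 FPU) and 1 (SSE) are modeled.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool mxcsr;  /* MXCSR stored        */
    bool xmm;    /* XMM registers stored */
    bool x87;    /* x87 FPU state stored */
} xsave_stores_t;

/* Hypothetical model of which legacy state XSAVE writes to memory. */
xsave_stores_t xsave_legacy_stores(uint64_t xcr0, uint64_t save_mask)
{
    uint64_t effective = xcr0 & save_mask;   /* requested and enabled */
    xsave_stores_t s;

    s.x87   = (effective & 1u) != 0;  /* bit 0: x87 FPU state     */
    s.xmm   = (effective & 2u) != 0;  /* bit 1: SSE state         */
    s.mxcsr = s.xmm;                  /* MXCSR is bound to SSE    */
    return s;
}
```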
13.8 DETECTION, ENUMERATION, ENABLING PROCESSOR
EXTENDED STATE SUPPORT
An OS can determine if the XSAVE/XRSTOR/XGETBV/XSETBV instructions and the
XFEATURE_ENABLED_MASK register (XCR0) are available in the processor by
checking that CPUID.01H:ECX.XSAVE (bit 26) is 1. The OS must set CR4.OSXSAVE to
1 to enable the new instructions. The OS uses XSETBV to enable each processor state
component (setting the corresponding bit in XCR0 to 1) that it will manage using
XSAVE/XRSTOR. Bit 0 of XCR0 must be set to 1. The value of CR4.OSXSAVE is
reflected in CPUID.01H:ECX.OSXSAVE (bit 27) to communicate the setting to
non-privileged software.
The bits that can be enabled in the XFEATURE_ENABLED_MASK register (XCR0)
and the size of the memory region needed to save processor extended state
information are enumerated by CPUID leaf 0DH with ECX = 0 as input. The
recommended usage of XSAVE/XRSTOR by system software is to:
• Allocate a memory buffer according to the size reported by CPUID.(EAX=0DH,
  ECX=0H):ECX. The value reported by CPUID.(EAX=0DH, ECX=0H):ECX always
  includes the size of the header. Clear the entire buffer prior to its use by
  XSAVE.
• Provide EDX:EAX with all bits set to 1 for XSAVE and XRSTOR instructions.
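The allocation step above might look like the following in C11. This is a sketch under assumptions: the function name is hypothetical, the size is passed in by the caller (in practice it would come from CPUID leaf 0DH), and the request is rounded up to a multiple of 64 because `aligned_alloc` expects the size to be a multiple of the alignment.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define XSAVE_ALIGNMENT 64u  /* XSAVE/XRSTOR region alignment from the text */

/* Hypothetical helper: returns a zeroed, 64-byte aligned buffer of at
 * least `reported_size` bytes, or NULL on failure. Release with free(). */
void *alloc_xsave_area(size_t reported_size)
{
    if (reported_size == 0)
        return NULL;

    /* Round up so the size is a multiple of the alignment. */
    size_t rounded = (reported_size + (XSAVE_ALIGNMENT - 1u))
                     & ~(size_t)(XSAVE_ALIGNMENT - 1u);

    void *area = aligned_alloc(XSAVE_ALIGNMENT, rounded);
    if (area != NULL)
        memset(area, 0, rounded);   /* clear the entire buffer before use */
    return area;
}
```

Clearing the buffer also zeroes the reserved header fields, which Table 13-3 requires to be zero.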
An alternative approach is to read the master bit vector mask in EDX:EAX reported by
CPUID.(EAX=0DH, ECX=0H). This mask may be used as input to the XSAVE/XRSTOR
instructions, and provides a more constrained list of features than using all 1's in the
save mask.

Figure 13-3. OS Enabling of Processor Extended State Support
[Figure not reproduced. The flowchart boxes read: check hardware support for XSAVE,
XRSTOR, XSETBV, and XFEM via CPUID.1H:ECX.XSAVE; set CR4.OSXSAVE to 1 (XSETBV
enabled); enumerate the extended state features and the buffer size requirement;
set valid bits in XCR0 via XSETBV; clear the buffer to 0.]
The advantage of using a mask value with all bits set to 1 for XSAVE/XRSTOR is that it
can simplify system software's support for processor extended state management
when multiple generations of hardware support different numbers of processor
extended states as reported by CPUID. However, software modifications may still be
required, either because of the design of a particular system software or because of
specific details introduced by a new processor extended state.
13.8.1 Application Programming Model and Processor Extended
States
New instruction set extensions may be introduced over time that operate on a
processor extended state that must be enabled in the XFEATURE_ENABLED_MASK
register (XCR0). The general application programming model for using such
instruction set extensions is:
• Check if the OS has enabled processor extended state management. If
  CPUID.01H:ECX.OSXSAVE is 1, the OS has enabled the
  XSAVE/XRSTOR/XSETBV/XGETBV instructions and the
  XFEATURE_ENABLED_MASK register, and it has indicated support for
  processor extended state management.
  Applications do not need to check the value of CPUID.01H:ECX.XSAVE because
  CPUID.01H:ECX.OSXSAVE = 1 implies the OS has successfully verified
  CPUID.01H:ECX.XSAVE = 1. CPUID.01H:ECX.OSXSAVE reflects the value of
  CR4.OSXSAVE, and this bit cannot be set to 1 unless CPUID.01H:ECX.XSAVE = 1.
• Check whether the processor extended state component associated with a given
  instruction set extension is enabled by the OS. The bits of EDX:EAX returned by
  XGETBV as 1 indicate which processor extended state components have been
  enabled by the OS. Note that CR4.OSFXSR is not used by the OS to enable
  instruction extensions requiring processor extended state support.
• Check that the target instruction set extension is supported in the processor. Each
  new instruction set extension is expected to provide a feature flag in CPUID when
  it is introduced.
If all three requirements are met, applications can use the target new instruction set
extensions. If any of the above requirements is not met, an attempt to execute an
instruction operating on a processor extended state corresponding to a bit offset
higher than 1 in the XFEATURE_ENABLED_MASK register (XCR0) will cause a #UD
exception.
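The three checks can be combined into one predicate. The sketch below takes already-read values as parameters so that it stays portable: `cpuid1_ecx` is assumed to hold CPUID.01H:ECX, `xcr0` the EDX:EAX value returned by XGETBV, `required_state_bits` the XCR0 bits needed by the extension, and `extension_feature_flag` the extension's own CPUID feature flag. All names are hypothetical, and XGETBV should only be executed after the first check succeeds.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_OSXSAVE (1u << 27)  /* CPUID.01H:ECX.OSXSAVE */

/* Hypothetical predicate implementing the three-step application check. */
bool can_use_extension(uint32_t cpuid1_ecx,
                       uint64_t xcr0,
                       uint64_t required_state_bits,
                       bool extension_feature_flag)
{
    /* 1. The OS has enabled processor extended state management. */
    if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
        return false;

    /* 2. The OS has enabled every state component the extension needs. */
    if ((xcr0 & required_state_bits) != required_state_bits)
        return false;

    /* 3. The processor reports support for the extension itself. */
    return extension_feature_flag;
}
```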
Newer instruction extensions that operate on SSE state, but not on any processor
extended state corresponding to an XCR0 bit at an offset higher than 1, follow the
programming model described in Section 13.1 through Section 13.5. XCR0 is not
required to enable OS support for SSE state management, but CR4.OSFXSR is
required.
Figure 13-4. Application Detection of New Instruction Extensions and Processor
Extended State
[Figure not reproduced. The flowchart boxes read: check feature flag
CPUID.1H:ECX.OSXSAVE = 1? (yes: the OS provides processor extended state
management, with implied hardware support for XSAVE, XRSTOR, XGETBV, and XCR0);
check enabled state in XCR0 via XGETBV; check the feature flag for the instruction
set; the state and the enabled instructions are then OK to use.]
CHAPTER 14
POWER AND THERMAL MANAGEMENT
This chapter describes facilities of Intel 64 and IA-32 architecture used for power
management and thermal monitoring.
14.1 ENHANCED INTEL SPEEDSTEP TECHNOLOGY
Enhanced Intel SpeedStep Technology was introduced in the Pentium M processor;
it is available in Pentium 4, Intel Xeon, Intel Core Solo, Intel Core Duo, Intel
Atom and Intel Core 2 Duo processors. The technology manages processor
power consumption using performance state transitions. These states are defined as
discrete operating points associated with different frequencies.
Enhanced Intel SpeedStep Technology differs from previous generations of Intel
SpeedStep Technology in two ways:
• Centralization of the control mechanism and software interface in the processor
  by using model-specific registers.
• Reduced hardware overhead; this permits more frequent performance state
  transitions.
Previous generations of the Intel SpeedStep Technology require processors to be in a
deep sleep state, holding off bus master transfers for the duration of a performance
state transition. Performance state transitions under the Enhanced Intel SpeedStep
Technology are discrete transitions to a new target frequency.
Support is indicated by CPUID, using ECX feature bit 07 (CPUID.01H:ECX[bit 7]).
Enhanced Intel SpeedStep Technology is enabled by setting bit 16 of the
IA32_MISC_ENABLE MSR. On reset, bit 16 of IA32_MISC_ENABLE MSR is cleared.
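The detection and enabling bits named above can be captured in two small helpers. This is only a sketch of the bit manipulation: the function names are hypothetical, the CPUID.01H:ECX value and the MSR contents are passed in as integers, and the privileged RDMSR/WRMSR accesses that a real driver would need are not shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_EIST       (1u << 7)            /* feature bit 07  */
#define IA32_MISC_ENABLE_EIST (UINT64_C(1) << 16)  /* enable bit 16   */

/* Hypothetical helper: does CPUID.01H:ECX report Enhanced Intel
 * SpeedStep Technology support? */
bool eist_supported(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & CPUID1_ECX_EIST) != 0;
}

/* Hypothetical helper: IA32_MISC_ENABLE value with the enable bit set
 * and all other bits preserved. */
uint64_t misc_enable_with_eist(uint64_t misc_enable)
{
    return misc_enable | IA32_MISC_ENABLE_EIST;
}
```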
14.1.1 Software Interface For Initiating Performance State
Transitions
State transitions are initiated by writing a 16-bit value to the IA32_PERF_CTL
register (see Figure 14-2). If a transition is already in progress, the transition to
the new value takes effect subsequently.
Reads of IA32_PERF_CTL return the last targeted operating point. The current
operating point can be read from IA32_PERF_STATUS. IA32_PERF_STATUS is
updated dynamically.
The 16-bit encoding that defines valid operating points is model-specific. Applications
and performance tools are not expected to use either IA32_PERF_CTL or
IA32_PERF_STATUS and should treat both as reserved. Performance monitoring
tools can access model-specific events and report the occurrences of state
transitions.
14.2 P-STATE HARDWARE COORDINATION
The Advanced Configuration and Power Interface (ACPI) defines performance states
(P-states) that are used to facilitate system software's ability to manage processor
power consumption. Different P-states correspond to different performance levels
that are applied while the processor is actively executing instructions. Enhanced Intel
SpeedStep Technology supports P-states by providing software interfaces that control
the operating frequency and voltage of a processor.
With multiple processor cores residing in the same physical package, hardware
dependencies may exist for a subset of logical processors on a platform. These
dependencies may impose requirements that impact coordination of P-state
transitions. As a result, multi-core processors may require an OS to provide additional
software support for coordinating P-state transitions for those subsets of logical
processors.
A BIOS (following the ACPI 3.0 specification) can choose to expose P-states as
dependent and hardware-coordinated to OS power management (OSPM) policy. To
support OSPMs, multi-core processors must have additional built-in support for
P-state hardware coordination and feedback.
Intel 64 and IA-32 processors with dependent P-states amongst a subset of logical
processors permit hardware coordination of P-states and provide a hardware-
coordination feedback mechanism using IA32_MPERF MSR and IA32_APERF MSR. See
Figure 14-1 for an overview of the two 64-bit MSRs and the bullets below for a
detailed description:
• Use CPUID to check the P-state hardware coordination feedback capability bit.
  CPUID.06H:ECX[bit 0] = 1 indicates IA32_MPERF MSR and IA32_APERF MSR are
  present.
• IA32_MPERF MSR (0xE7) increments in proportion to a fixed frequency, which is
  configured when the processor is booted.
• IA32_APERF MSR (0xE8) increments in proportion to actual performance, while
  accounting for hardware coordination of P-states and TM1/TM2, or software
  initiated throttling.
• The MSRs are per logical processor; they measure performance only when the
  targeted processor is in the C0 state.
• Only the IA32_APERF/IA32_MPERF ratio is architecturally defined; software
  should not attach meaning to the content of the individual IA32_APERF or
  IA32_MPERF MSRs.
• When either MSR overflows, both MSRs are reset to zero and continue to
  increment.
• Both MSRs are full 64-bit counters. Each MSR can be written to independently.
  However, software should follow the guidelines illustrated in Example 14-1.

Figure 14-1. IA32_MPERF MSR and IA32_APERF MSR for P-state Coordination
[Figure not reproduced. It shows two 64-bit registers, each spanning bits 63:0:
IA32_MPERF (Addr: E7H) and IA32_APERF (Addr: E8H).]
If P-states are exposed by the BIOS as hardware coordinated, software is expected
to confirm processor support for P-state hardware coordination feedback and use the
feedback mechanism to make P-state decisions. The OSPM is expected to either save
away the current MSR values (for determination of the delta of the counter ratio at a
later time) or reset both MSRs (execute WRMSR with 0 to these MSRs individually) at
the start of the time window used for making the P-state decision. When not
resetting the values, overflow of the MSRs can be detected by checking whether the new
values read are less than the previously saved values.
Example 14-1 demonstrates steps for using the hardware feedback mechanism
provided by IA32_APERF MSR and IA32_MPERF MSR to determine a target P-state.
Example 14-1. Determine Target P-state From Hardware Coordinated Feedback
DWORD PercentBusy; // Percentage of processor time not idle.
// Measure PercentBusy during the previous sampling window.
// Typically, PercentBusy is measured over a time scale suitable for
// power management decisions.
//
// RDMSR of MCNT and ACNT should be performed without delay.
// Software needs to exercise care to avoid delays between
// the two RDMSRs (for example, interrupts).
MCNT = RDMSR(IA32_MPERF);
ACNT = RDMSR(IA32_APERF);
// PercentPerformance indicates the percentage of the processor
// that is in use. The calculation is based on PercentBusy
// (the percentage of processor time not idle) and on the P-state
// hardware coordinated feedback given by the ACNT/MCNT ratio.
// Note that both values need to be calculated over the same
// time window. Multiply before dividing so that integer
// arithmetic does not truncate the ratio.
PercentPerformance = (PercentBusy * ACNT) / MCNT;
// This example does not cover the additional logic or algorithms
// necessary to coordinate multiple logical processors to a target P-state.
TargetPstate = FindPstate(PercentPerformance);
if (TargetPstate != currentPstate) {
SetPState(TargetPstate);
}
// WRMSR of MCNT and ACNT should be performed without delay.
// Software needs to exercise care to avoid delays between
// the two WRMSRs (for example, interrupts).
WRMSR(IA32_MPERF, 0);
WRMSR(IA32_APERF, 0);
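The alternative mentioned earlier, saving the counter values instead of resetting them, can be sketched in C as follows. This is a minimal sketch and not part of Example 14-1: the `perf_sample` type and the `percent_performance` function are hypothetical names, and the snapshots are assumed to have been captured elsewhere with back-to-back RDMSR reads at CPL 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the two feedback counters, captured back to back. */
struct perf_sample {
    uint64_t mperf; /* IA32_MPERF (E7H) */
    uint64_t aperf; /* IA32_APERF (E8H) */
};

/*
 * Scale percent_busy by the APERF/MPERF ratio observed between two snapshots.
 * Returns false when the window is unusable: a new value lower than the saved
 * one signals an overflow (both counters restarted from zero), and an MPERF
 * delta of zero would divide by zero.
 */
bool percent_performance(struct perf_sample prev, struct perf_sample cur,
                         double percent_busy, double *out)
{
    if (cur.mperf <= prev.mperf || cur.aperf < prev.aperf)
        return false;

    double d_mperf = (double)(cur.mperf - prev.mperf);
    double d_aperf = (double)(cur.aperf - prev.aperf);

    /* Only the ratio of the two deltas is architecturally meaningful. */
    *out = percent_busy * (d_aperf / d_mperf);
    return true;
}
```

When the function returns false, a caller would typically discard the window and start a new one from the current snapshot.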
14.3 SYSTEM SOFTWARE CONSIDERATIONS AND OPPORTUNISTIC PROCESSOR PERFORMANCE OPERATION
An Intel 64 processor may support a form of processor operation that takes
advantage of design headroom to opportunistically increase performance. In Intel Core i7
processors, Intel Turbo Boost Technology can convert thermal headroom into higher
performance across multi-threaded and single-threaded workloads. In Intel Core 2
processors, Intel Dynamic Acceleration can convert thermal headroom into higher
performance if only one thread is active.
14.3.1 Intel Dynamic Acceleration
Intel Core 2 Duo processor T7700 introduces Intel Dynamic Acceleration (IDA). IDA
takes advantage of thermal design headroom and opportunistically allows a single
core to operate at a higher performance level when the operating system requests
increased performance.
14.3.2 System Software Interfaces for Opportunistic Processor Performance Operation
Opportunistic processor operation, applicable to Intel Dynamic Acceleration and Intel
Turbo Boost Technology, has the following characteristics:
• A transition from a normal state of operation (e.g., IDA/Turbo mode disengaged)
  to a target state is not guaranteed, but may occur opportunistically after the
  corresponding enable mechanism is activated, the headroom is available, and
  certain criteria are met.
• The opportunistic processor performance operation is generally transparent to
  most application software.
• System software (BIOS and operating system) must be aware of hardware
  support for opportunistic processor performance operation and may need to
  temporarily disengage opportunistic processor performance operation when it
  requires more predictable processor operation.
• When opportunistic processor performance operation is engaged, the OS should
  use hardware coordination feedback mechanisms to prevent unintended policy
  effects if it is activated during inappropriate situations.
14.3.2.1 Discover Hardware Support and Enabling of Opportunistic Processor Operation
If an Intel 64 processor has hardware support for opportunistic processor
performance operation, the power-on default state of IA32_MISC_ENABLE[38] indicates
the presence of such hardware support. For Intel 64 processors that support
opportunistic processor performance operation, the default value is 1, indicating its
presence. For processors that do not support opportunistic processor performance
operation, the default value is 0. The power-on default value of
IA32_MISC_ENABLE[38] allows BIOS to detect the presence of hardware support of
opportunistic processor performance operation.
IA32_MISC_ENABLE[38] is shared across all logical processors in a physical
package. It is written by BIOS during platform initialization to enable/disable
opportunistic processor operation in conjunction with OS power management
capabilities (see Section 14.3.2.2). BIOS can set IA32_MISC_ENABLE[38] to 1 to disable
opportunistic processor performance operation; it must clear the default value of
IA32_MISC_ENABLE[38] to 0 to enable opportunistic processor performance
operation. The OS and applications must use CPUID leaf 06H if they need to detect
processors that have opportunistic processor operation enabled.
When CPUID is executed with EAX = 06H on input, Bit 1 of EAX in Leaf 06H (i.e.,
CPUID.06H:EAX[1]) indicates that opportunistic processor performance operation, such
as IDA, has been enabled by BIOS.
Opportunistic processor performance operation can be disabled by setting bit 38 of
IA32_MISC_ENABLE. This mechanism is intended for BIOS only. If
IA32_MISC_ENABLE[38] is set, CPUID.06H:EAX[1] will return 0.
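The two checks described in this section reduce to single-bit tests. The sketch below is illustrative only: the function names are hypothetical, and the register values are assumed to have been obtained elsewhere (CPUID with EAX = 06H for the OS or an application, RDMSR at CPL 0 for BIOS).

```c
#include <stdbool.h>
#include <stdint.h>

/* OS/application view: CPUID.06H:EAX[1] = 1 means BIOS has enabled
 * opportunistic processor performance operation (for example, IDA). */
bool opportunistic_enabled(uint32_t cpuid06_eax)
{
    return (cpuid06_eax >> 1) & 1u;
}

/* BIOS view: bit 38 of the MISC_ENABLE MSR value is the disable control;
 * when it is set, CPUID.06H:EAX[1] reads as 0. */
bool opportunistic_disabled_by_bios(uint64_t misc_enable)
{
    return (misc_enable >> 38) & 1u;
}
```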
14.3.2.2 OS Control of Opportunistic Processor Performance Operation
There may be phases of software execution in which system software cannot tolerate
the non-deterministic aspects of opportunistic processor performance operation. For
example, when calibrating a real-time workload to make a CPU reservation request
to the OS, it may be undesirable to allow the possibility of the processor delivering
increased performance that cannot be sustained after the calibration phase.
System software can temporarily disengage opportunistic processor performance
operation by setting bit 32 of the IA32_PERF_CTL MSR (0199H), using a
read-modify-write sequence on the MSR. The opportunistic processor performance
operation can be re-engaged by clearing bit 32 in IA32_PERF_CTL MSR, using a
read-modify-write sequence. The DISENGAGE bit in IA32_PERF_CTL is not reflected in bit
32 of the IA32_PERF_STATUS MSR (0198H), and it is not shared between logical
processors in a physical package. In order for the OS to engage IDA/Turbo mode, the
BIOS must
• enable opportunistic processor performance operation, as described in Section
  14.3.2.1,
• expose the operating points associated with IDA/Turbo mode to the OS.
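The "modify" step of the read-modify-write sequence on IA32_PERF_CTL can be sketched as below. The macro and function names are hypothetical; the surrounding RDMSR and WRMSR of address 0199H must run at CPL 0 and are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define PERF_CTL_DISENGAGE (1ULL << 32) /* IDA/Turbo DISENGAGE, bit 32 */

/* Change only bit 32 of the value read from IA32_PERF_CTL; every other
 * bit (including the EIST transition target in bits 15:0) is preserved. */
uint64_t perf_ctl_with_disengage(uint64_t perf_ctl, bool disengage)
{
    if (disengage)
        return perf_ctl | PERF_CTL_DISENGAGE;
    return perf_ctl & ~PERF_CTL_DISENGAGE;
}
```

Because the DISENGAGE bit is not shared between logical processors, such a sequence would need to run on each logical processor whose behavior must be made predictable.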
14.3.2.3 Required Changes to OS Power Management P-state Policy
Intel Dynamic Acceleration (IDA) and Intel Turbo Boost Technology can provide
opportunistic performance greater than the performance level corresponding to the
maximum qualified frequency of the processor (see CPUID's brand string
information). System software can use a pair of MSRs to observe performance feedback.
Software must query for the presence of IA32_APERF and IA32_MPERF (see Section
14.2). The ratio between IA32_APERF and IA32_MPERF is architecturally defined, and
a value greater than unity indicates that a performance increase occurred during the
observation period due to IDA. Without incorporating such performance feedback, the
target P-state evaluation algorithm can result in a non-optimal P-state target.
There are other scenarios under which OS power management may want to disable
IDA; some of these are listed below:
• When engaging ACPI defined passive thermal management, it may be more
  effective to disable IDA for the duration of passive thermal management.
• When the user has indicated a policy preference of power savings over
  performance, OS power management may want to disable IDA while that policy is in
  effect.
Figure 14-2. IA32_PERF_CTL Register
[Figure: 64-bit register layout. Bits 63:33: Reserved; bit 32: IDA/Turbo
DISENGAGE; bits 31:16: Reserved; bits 15:0: EIST Transition Target.]
14.3.2.4 Application Awareness of Opportunistic Processor Operation (Optional)
There may be situations in which an end user or application software wishes to be aware
of turbo mode activity. It is possible for an application-level utility to periodically
check for occurrences of opportunistic processor operation. The basic elements of an
algorithm are described below, using the characteristics of Intel Turbo Boost
Technology as an example.
Using an OS-provided timer service, application software can periodically calculate
the ratio of unhalted-core-clockticks (UCC) to unhalted-reference-clockticks
(URC) on each logical processor to determine whether that logical processor had
been requested by the OS to run at some frequency higher than the invariant TSC
frequency, or whether the OS has determined that system-level demand has reduced
sufficiently to put that logical processor into a lower-performance P-state or even a
lower-activity state.
If application software has access to the base operating ratio
between the invariant TSC frequency and the base clock (133.33 MHz), it can convert
the sampled ratio into a dynamic frequency estimate for each prior sampling period.
The base operating ratio can be read from MSR_PLATFORM_INFO[15:8].
The periodic sampling technique is depicted in Figure 14-3 and described below:
• The sampling period chosen by the application (to program an OS timer service)
  should be sufficiently large to avoid excessive polling overhead to other
  applications or tasks managed by the OS.
Figure 14-3. Periodic Query of Activity Ratio of Opportunistic Processor Operation
[Figure: timeline of sample periods n-1 through n+3 for logical processors LP 0,
LP 1 and LP 2. At each sample point n, FixedCtr1 supplies the unhalted core
clockticks $UCC_{n,i}$ and FixedCtr2 supplies the unhalted reference clockticks
$URC_{n,i}$ for logical processor i.]
Logical Processor i Turbo Activity Ratio =
$$\frac{UCC_{n+1,i} - UCC_{n,i}}{URC_{n+1,i} - URC_{n,i}}$$
• When the OS timer service transfers control, the application can use RDPMC
  (with ECX = 4000_0001H) to read IA32_PERF_FIXED_CTR1 (MSR address 30AH)
  to record the unhalted core clocktick (UCC) value, followed by RDPMC
  (ECX = 4000_0002H) to read IA32_PERF_FIXED_CTR2 (MSR address 30BH) to
  record the unhalted reference clocktick (URC) value. This pair of values is needed
  for each logical processor for each sampling period.
• The application can calculate the Turbo activity ratio as the difference in
  UCC between successive samples divided by the corresponding difference in URC.
  The effective frequency of logical processor i over a sample period can be
  estimated by:
  $$\frac{UCC_{n+1,i} - UCC_{n,i}}{URC_{n+1,i} - URC_{n,i}} \times \text{Base\_operating\_ratio} \times 133.33\ \text{MHz}$$
It is possible that the OS had requested a lower-performance P-state during a
sampling period. Thus the ratio $(UCC_{n+1,i} - UCC_{n,i}) / (URC_{n+1,i} - URC_{n,i})$
can reflect the average of Turbo activity (driving the ratio above unity) and some
lower P-state transitions (causing the ratio to be < 1).
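The frequency estimate above can be sketched in C as follows. The function names are hypothetical, the counter values are assumed to have been sampled elsewhere with RDPMC, the MSR_PLATFORM_INFO value is assumed to be supplied by privileged code, and counter wraparound between samples is not handled.

```c
#include <stdint.h>

#define BASE_CLOCK_MHZ 133.33 /* base clock quoted in the text */

/* Base operating ratio, taken from MSR_PLATFORM_INFO[15:8]. */
unsigned base_operating_ratio(uint64_t platform_info)
{
    return (unsigned)((platform_info >> 8) & 0xFF);
}

/* Turbo activity ratio for one sample period: delta UCC / delta URC.
 * Returns 0.0 if no reference clockticks elapsed (for example, the
 * logical processor was idle for the whole period). */
double turbo_activity_ratio(uint64_t ucc_prev, uint64_t ucc_cur,
                            uint64_t urc_prev, uint64_t urc_cur)
{
    uint64_t d_urc = urc_cur - urc_prev;
    if (d_urc == 0)
        return 0.0;
    return (double)(ucc_cur - ucc_prev) / (double)d_urc;
}

/* Effective frequency estimate in MHz for the same sample period. */
double effective_mhz(double activity_ratio, unsigned base_ratio)
{
    return activity_ratio * base_ratio * BASE_CLOCK_MHZ;
}
```

A ratio above 1.0 suggests Turbo activity dominated the period; a ratio below 1.0 suggests lower P-states dominated.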
It is also possible that the OS might have requested C-state transitions when the demand
is low. The above ratio generally does not account for cycles during which a logical
processor was idle. On Intel Core i7 processors, an application can make use of the time
stamp counter (IA32_TSC), which runs at a constant frequency (i.e.,
Base_operating_ratio * 133.33 MHz) during C-states. Thus software can calculate ratios
that indicate the fraction of a sample period spent in the C0 state, using the unhalted
reference clockticks and the invariant TSC. Note that the estimate of the fraction spent
in C0 may be affected by the SMM handler if the system software makes use of the
FREEZE_WHILE_SMM_EN capability to freeze performance counter values while the SMM
handler is servicing an SMI (see Chapter 20, "Introduction to Virtual-Machine Extensions").
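One way to form the C0 residency estimate is sketched below. It assumes that the unhalted reference clockticks advance at the invariant TSC rate while the logical processor is in C0, so that the quotient of the two deltas approximates the C0 fraction; `c0_fraction` is a hypothetical name.

```c
#include <stdint.h>

/* Approximate fraction of a sample period spent in C0:
 * delta URC / delta TSC. Returns 0.0 if the TSC did not advance. */
double c0_fraction(uint64_t urc_prev, uint64_t urc_cur,
                   uint64_t tsc_prev, uint64_t tsc_cur)
{
    uint64_t d_tsc = tsc_cur - tsc_prev;
    if (d_tsc == 0)
        return 0.0;
    return (double)(urc_cur - urc_prev) / (double)d_tsc;
}
```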
14.3.3 Intel Turbo Boost Technology
Intel Turbo Boost Technology is supported in Intel Core i7 processors and Intel Xeon
processors based on Intel microarchitecture codename Nehalem. It uses the same
principle of leveraging thermal headroom to dynamically increase processor
performance for single-threaded and multi-threaded/multi-tasking environments. The
programming interface described in Section 14.3.2 also applies to Intel Turbo Boost
Technology.
14.3.4 Performance and Energy Bias Hint support
Intel 64 processors may support an additional software hint to guide the hardware
heuristic of power management features to favor increasing dynamic performance or
conserving energy.
Software can detect the processor's capability to support the performance-energy bias
preference hint by examining bit 3 of ECX in CPUID leaf 6. The processor supports this
capability if CPUID.06H:ECX.SETBH[bit 3] is set, which also implies the presence of a
new architectural MSR called IA32_ENERGY_PERF_BIAS (1B0H).
Software can program the lowest four bits of the IA32_ENERGY_PERF_BIAS MSR with a
value from 0 to 15. The values represent a sliding scale, where a value of 0 (the
default reset value) corresponds to a hint preference for highest performance and a
value of 15 corresponds to the maximum energy savings. A value of 7 roughly
translates into a hint to balance performance with energy consumption.
The layout of IA32_ENERGY_PERF_BIAS is shown in Figure 14-4. The scope of
IA32_ENERGY_PERF_BIAS is per logical processor, which means that each of the
logical processors in the package can be programmed with a different value. This
may be especially important in virtualization scenarios, where the performance /
energy requirements of one logical processor may differ from the other. Conflicting
"hints" from various logical processors at a higher hierarchy level will be resolved in
favor of performance over energy savings.
Software can use whatever criteria it sees fit to program the MSR with the
appropriate value. However, the value only serves as a hint to the hardware, and the actual
impact on performance and energy savings is model specific.
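A minimal sketch of the detection and programming steps follows. The names are hypothetical, the CPUID and MSR values are assumed to be supplied by the caller, and clamping out-of-range hints to 15 is a choice made for this sketch rather than documented behavior.

```c
#include <stdbool.h>
#include <stdint.h>

#define EPB_HINT_MASK 0xFULL /* bits 3:0: energy policy preference hint */

/* CPUID.06H:ECX[3] = 1 implies IA32_ENERGY_PERF_BIAS (1B0H) is present. */
bool epb_supported(uint32_t cpuid06_ecx)
{
    return (cpuid06_ecx >> 3) & 1u;
}

/* Replace the hint in bits 3:0 and leave the reserved bits untouched.
 * 0 = highest performance, 7 = roughly balanced, 15 = maximum energy savings. */
uint64_t epb_with_hint(uint64_t epb, unsigned hint)
{
    if (hint > 15)
        hint = 15; /* sketch-only policy: clamp instead of rejecting */
    return (epb & ~EPB_HINT_MASK) | hint;
}
```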
14.4 MWAIT EXTENSIONS FOR ADVANCED POWER MANAGEMENT
IA-32 processors may support a number of C-states¹ that reduce power consumption
for inactive states. Intel Core Solo and Intel Core Duo processors support both
deeper C-states and MWAIT extensions that can be used by the OS to implement power
management policy.
Figure 14-4. IA32_ENERGY_PERF_BIAS Register
[Figure: 64-bit register layout. Bits 63:4: Reserved; bits 3:0: Energy Policy
Preference Hint.]
1. The processor-specific C-states defined in MWAIT extensions can map to ACPI defined C-state
types (C0, C1, C2, C3). The mapping relationship depends on the definition of a C-state by
processor implementation and is exposed to OSPM by the BIOS using the ACPI defined _CST table.
Software should use CPUID to discover if a target processor supports the
enumeration of MWAIT extensions. If CPUID.05H.ECX[Bit 0] = 1, the target processor
supports MWAIT extensions and their enumeration (see Chapter 3, "Instruction Set
Reference, A-M," of Intel 64 and IA-32 Architectures Software Developer's Manual,
Volume 2A).
If CPUID.05H.ECX[Bit 1] = 1, the target processor supports using interrupts as
break-events for MWAIT, even when interrupts are disabled. Use this feature to
measure C-state residency as follows:
• Software can write to bit 0 in the MWAIT Extensions register (ECX) when issuing
  an MWAIT to enter into a processor-specific C-state or sub C-state.
• When a processor comes out of an inactive C-state or sub C-state, software can
  read a timestamp before an interrupt service routine (ISR) is potentially
  executed.
CPUID.05H.EDX allows software to enumerate processor-specific C-states and sub
C-states available for use with MWAIT extensions. IA-32 processors may support
more than one C-state of a given C-state type. These are called sub C-states.
Numerically higher C-states have higher power savings and latency (upon entering and
exiting) than lower-numbered C-states.
At CPL = 0, system software can specify the desired C-state and sub C-state by using the
MWAIT hints register (EAX). Processors will not go to a C-state and sub C-state deeper
than what is specified by the hint register. If CPL > 0 and if MONITOR/MWAIT is
supported at CPL > 0, the processor will only enter C1-state (regardless of the
C-state request in the hints register).
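The enumeration and hint encoding can be sketched as below. The bit layouts used here are assumptions taken from the CPUID and MWAIT instruction references rather than from this section: CPUID.05H:EDX is treated as 4-bit fields giving the number of sub C-states per C-state, and the EAX hint as target C-state minus one in bits 7:4 with the sub C-state in bits 3:0. All names are hypothetical.

```c
#include <stdint.h>

/* Bit 0 of the MWAIT Extensions register (ECX): treat interrupts as
 * break events even when interrupts are disabled. */
#define MWAIT_ECX_INTERRUPT_BREAK 0x1u

/* Number of sub C-states enumerated for processor-specific C-state n
 * (n = 0..7), assuming one 4-bit field per C-state in CPUID.05H:EDX. */
unsigned mwait_sub_cstates(uint32_t cpuid05_edx, unsigned cstate)
{
    return (cpuid05_edx >> (4 * cstate)) & 0xF;
}

/* EAX hint value, assuming bits 7:4 = target C-state minus 1 and
 * bits 3:0 = sub C-state. cstate is expected to be 1 or greater. */
uint32_t mwait_hint(unsigned cstate, unsigned sub_cstate)
{
    return (((cstate - 1) & 0xF) << 4) | (sub_cstate & 0xF);
}
```

The C-state numbers here are processor-specific MWAIT C-states, which need not match the ACPI C-state types exposed through _CST.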
Executing MWAIT generates an exception on processors operating at a privilege level
where MONITOR/MWAIT are not supported.
NOTE
If MWAIT is used to enter a C-state (including sub C-state) that is
numerically higher than C1, a store to the address range armed by the
MONITOR instruction will cause the processor to exit MWAIT if the
store was originated by other processor agents. A store from a
non-processor agent may not cause the processor to exit MWAIT.
14.5 THERMAL MONITORING AND PROTECTION
The IA-32 architecture provides the following mechanisms for monitoring
temperature and controlling thermal power:
1. The catastrophic shutdown detector forces processor execution to stop if the
   processor's core temperature rises above a preset limit.
2. Automatic and adaptive thermal monitoring mechanisms force the
   processor to reduce its power consumption in order to operate within
   predetermined temperature limits.
3. The software controlled clock modulation mechanism permits operating
   systems to implement power management policies that reduce power
   consumption; this is in addition to the reduction offered by automatic thermal
   monitoring mechanisms.
4. On-die digital thermal sensor and interrupt mechanisms permit the OS to
   manage thermal conditions natively without relying on BIOS or other system
   board components.
The first mechanism is not visible to software. The other three mechanisms are
visible to software using processor feature information returned by executing CPUID
with EAX = 1.
The second mechanism includes:
• Automatic thermal monitoring provides two modes of operation. One mode
  modulates the clock duty cycle; the second mode changes the processor's
  frequency. Both modes are used to control the core temperature of the processor.
• Adaptive thermal monitoring can provide flexible thermal management on
  processors made of multiple cores.
The third mechanism modulates the clock duty cycle of the processor. As shown in
Figure 14-5, the phrase "duty cycle" does not refer to the actual duty cycle of the
clock signal. Instead it refers to the time period during which the clock signal is
allowed to drive the processor chip. By using the stop clock mechanism to control
how often the processor is clocked, processor power consumption can be modulated.
For previous automatic thermal monitoring mechanisms, software controlled
mechanisms that changed processor operating parameters to impact changes in thermal
conditions. Software did not have native access to the native thermal condition of the
processor; nor could software alter the trigger condition that initiated software
program control.
The fourth mechanism (listed above) provides access to an on-die digital thermal
sensor using a model-specific register and uses an interrupt mechanism to alert
software to initiate digital thermal monitoring.
Figure 14-5. Processor Modulation Through Stop-Clock Mechanism
[Figure: waveforms of the clock applied to the processor and of the stop-clock
duty cycle, illustrating a 25% duty cycle (example only).]
14.5.1 Catastrophic Shutdown Detector
P6 family processors introduced a thermal sensor that acts as a catastrophic
shutdown detector. This catastrophic shutdown detector was also implemented in
Pentium 4, Intel Xeon and Pentium M processors. It is always enabled. When
processor core temperature reaches a factory preset level, the sensor trips and
processor execution is halted until after the next reset cycle.
14.5.2 Thermal Monitor
Pentium 4, Intel Xeon and Pentium M processors introduced a second temperature
sensor that is factory-calibrated to trip when the processor's core temperature
crosses a level corresponding to the recommended thermal design envelope. The
trip-temperature of the second sensor is calibrated below the temperature assigned to
the catastrophic shutdown detector.
14.5.2.1 Thermal Monitor 1
The Pentium 4 processor uses the second temperature sensor in conjunction with a
mechanism called Thermal Monitor 1 (TM1) to control the core temperature of the
processor. TM1 controls the processor's temperature by modulating the duty cycle of
the processor clock. Modulation of duty cycles is processor model specific. Note that
the processor's STPCLK# pin is not used here; the stop-clock circuitry is controlled
internally.
Support for TM1 is indicated by CPUID.1:EDX.TM[bit 29] = 1.
TM1 is enabled by setting the thermal-monitor enable flag (bit 3) in
IA32_MISC_ENABLE [see Appendix B, "Model-Specific Registers (MSRs)"]. Following
a power-up or reset, the flag is cleared, disabling TM1. BIOS is required to enable
only one automatic thermal monitoring mode. Operating systems and applications
must not disable the operation of these mechanisms.
14.5.2.2 Thermal Monitor 2
An additional automatic thermal protection mechanism, called Thermal Monitor 2
(TM2), was introduced in the Intel Pentium M processor and also incorporated in
newer models of the Pentium 4 processor family. Intel Core Duo and Solo processors,
and the Intel Core 2 Duo processor family all support TM1 and TM2. TM2 controls the
core temperature of the processor by reducing the operating frequency and voltage
of the processor and offers a higher performance level for a given level of power
reduction than TM1.
TM2 is triggered by the same temperature sensor as TM1. The mechanism to enable
TM2 may be implemented differently across various IA-32 processor families with
different CPUID signatures in the family encoding value, but will be uniform within an
IA-32 processor family.
Support for TM2 is indicated by CPUID.1:ECX.TM2[bit 8] = 1.
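The two feature checks can be sketched as single-bit tests on the registers returned by CPUID with EAX = 1. The function names are hypothetical and the register values are assumed to be supplied by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* TM1 support: CPUID.1:EDX.TM[bit 29]. */
bool has_tm1(uint32_t cpuid1_edx)
{
    return (cpuid1_edx >> 29) & 1u;
}

/* TM2 support: CPUID.1:ECX.TM2[bit 8]. */
bool has_tm2(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx >> 8) & 1u;
}
```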
14.5.2.3 Two Methods for Enabling TM2
On processors with CPUID family/model/stepping signature encoded as 0x69n or
0x6Dn (early Pentium M processors), TM2 is enabled if the TM_SELECT flag (bit 16)
of the MSR_THERM2_CTL register is set to 1 (Figure 14-6) and bit 3 of the
IA32_MISC_ENABLE register is set to 1.
Following a power-up or reset, the TM_SELECT flag may be cleared. BIOS is required
to enable either TM1 or TM2. Operating systems and applications must not disable
mechanisms that enable TM1 or TM2. If bit 3 of the IA32_MISC_ENABLE register is
set and the TM_SELECT flag of the MSR_THERM2_CTL register is cleared, TM1 is
enabled.
On processors introduced after the Pentium 4 processor (this includes most Pentium
M processors), the method used to enable TM2 is different. TM2 is enabled by setting
bit 13 of the IA32_MISC_ENABLE register to 1. This applies to the Intel Core Duo, Core
Solo, and Intel Core 2 processor families.
The target operating frequency and voltage for the TM2 transition after TM2 is
triggered are specified by the value written to MSR_THERM2_CTL, bits 15:0 (Figure 14-7).
Following a power-up or reset, BIOS is required to enable at least one of these two
thermal monitoring mechanisms. If both TM1 and TM2 are supported, BIOS may
choose to enable TM2 instead of TM1. Operating systems and applications must not
disable the mechanisms that enable TM1 or TM2; and they must not alter the value in
bits 15:0 of the MSR_THERM2_CTL register.
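The two enabling methods can be summarized in a small read-only decoder, for example for diagnostic code that reports which monitor BIOS has enabled. This is a sketch with hypothetical names; it assumes the signature is the raw EAX value from CPUID with EAX = 1 with no extended family or model bits set, and, as a simplification, it reports TM2 when both enable bits are set on later processors.

```c
#include <stdbool.h>
#include <stdint.h>

enum thermal_monitor { TM_NONE, TM_1, TM_2 };

/* True for signatures 0x69n and 0x6Dn (early Pentium M), which select
 * TM2 through TM_SELECT in MSR_THERM2_CTL. */
bool uses_tm_select(uint32_t cpuid1_eax)
{
    uint32_t family_model = cpuid1_eax >> 4; /* drop the stepping nibble */
    return family_model == 0x69 || family_model == 0x6D;
}

/* Decode which monitor is enabled from the two MSR values. */
enum thermal_monitor active_monitor(uint32_t cpuid1_eax,
                                    uint64_t misc_enable,
                                    uint64_t therm2_ctl)
{
    bool bit3 = (misc_enable >> 3) & 1u;

    if (uses_tm_select(cpuid1_eax)) {
        bool tm_select = (therm2_ctl >> 16) & 1u;
        if (!bit3)
            return TM_NONE;
        return tm_select ? TM_2 : TM_1;
    }

    /* Later processors: bit 13 of IA32_MISC_ENABLE enables TM2. */
    if ((misc_enable >> 13) & 1u)
        return TM_2;
    return bit3 ? TM_1 : TM_NONE;
}
```

Consistent with the text, such code should only read these registers; operating systems and applications must not change the enable bits.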
Figure 14-6. MSR_THERM2_CTL Register On Processors with CPUID
Family/Model/Stepping Signature Encoded as 0x69n or 0x6Dn
[Figure: register layout for bits 31:0. Bit 16: TM_SELECT; all other bits:
Reserved.]
14.5.2.4 Performance State Transitions and Thermal Monitoring
If the thermal control circuitry (TCC) for thermal monitor (TM1/TM2) is active, writes
to the IA32_PERF_CTL will effect a new target operating point as follows:
• If TM1 is enabled and the TCC is engaged, the performance state transition can
  commence before the TCC is disengaged.
• If TM2 is enabled and the TCC is engaged, the performance state transition
  specified by a write to the IA32_PERF_CTL will commence after the TCC has
  disengaged.
14.5.2.5 Thermal Status Information
The status of the temperature sensor that triggers the thermal monitor (TM1/TM2) is
indicated through the thermal status flag and thermal status log flag in the
IA32_THERM_STATUS MSR (see Figure 14-8).
The functions of these flags are:
• Thermal Status flag, bit 0: When set, indicates that the processor core
  temperature is currently at the trip temperature of the thermal monitor and that
  the processor power consumption is being reduced via either TM1 or TM2,
  depending on which is enabled. When clear, the flag indicates that the core
  temperature is below the thermal monitor trip temperature. This flag is read only.
• Thermal Status Log flag, bit 1: When set, indicates that the thermal sensor
  has tripped since the last power-up or reset or since the last time that software
  cleared this flag. This flag is a sticky bit; once set it remains set until cleared by
  software or until a power-up or reset of the processor. The default state is clear.
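Decoding the two flags can be sketched as follows. The names are hypothetical, the IA32_THERM_STATUS value is assumed to come from a privileged RDMSR, and the helper that clears the log flag assumes that software clears the sticky bit by writing it back as 0.

```c
#include <stdbool.h>
#include <stdint.h>

#define THERM_STATUS     (1ULL << 0) /* read-only: currently at trip temperature */
#define THERM_STATUS_LOG (1ULL << 1) /* sticky: tripped since last cleared */

/* True while TM1 or TM2 is reducing power consumption. */
bool thermal_monitor_active(uint64_t therm_status)
{
    return (therm_status & THERM_STATUS) != 0;
}

/* True if the sensor has tripped since the flag was last cleared. */
bool thermal_tripped_since_clear(uint64_t therm_status)
{
    return (therm_status & THERM_STATUS_LOG) != 0;
}

/* Value to write back in order to clear the sticky log flag (assumption:
 * the flag is cleared by writing 0 to bit 1). */
uint64_t therm_status_clear_log(uint64_t therm_status)
{
    return therm_status & ~THERM_STATUS_LOG;
}
```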
Figure 14-7. MSR_THERM2_CTL Register for Supporting TM2
[Figure: 64-bit register layout. Bits 63:16: Reserved; bits 15:0: TM2 Transition
Target.]
After the second temperature sensor has been tripped, the thermal monitor
(TM1/TM2) will remain engaged for a minimum time period (on the order of 1 ms).
The thermal monitor will remain engaged until the processor core temperature drops
below the preset trip temperature of the temperature sensor, taking hysteresis into
account.
While the processor is in a stop-clock state, interrupts will be blocked from
interrupting the processor. This holding off of interrupts increases the interrupt latency,
but does not cause interrupts to be lost. Outstanding interrupts remain pending until
clock modulation is complete.
The thermal monitor can be programmed to generate an interrupt to the processor
when the thermal sensor is tripped. The delivery mode, mask and vector for this
interrupt can be programmed through the thermal entry in the local APIC's LVT (see
Section 10.5.1, "Local Vector Table"). The low-temperature interrupt enable and
high-temperature interrupt enable flags in the IA32_THERM_INTERRUPT MSR (see
Figure 14-9) control when the interrupt is generated; that is, on a transition from a
temperature below the trip point to above and/or vice-versa.
• High-Temperature Interrupt Enable flag, bit 0: Enables an interrupt to be
  generated on the transition from a low temperature to a high temperature when
  set; disables the interrupt when clear (R/W).
• Low-Temperature Interrupt Enable flag, bit 1: Enables an interrupt to be
  generated on the transition from a high temperature to a low temperature when
  set; disables the interrupt when clear.
The thermal monitor interrupt can be masked by the thermal LVT entry. After a
power-up or reset, the low-temperature interrupt enable and high-temperature
interrupt enable flags in the IA32_THERM_INTERRUPT MSR are cleared (interrupts
are disabled) and the thermal LVT entry is set to mask interrupts. This interrupt
should be handled either by the operating system or system management mode
(SMM) code.
Note that the operation of the thermal monitoring mechanism has no effect upon the
clock rate of the processor's internal high-resolution timer (time stamp counter).
Figure 14-8. IA32_THERM_STATUS MSR
[Figure: 64-bit register layout. Bits 63:2: Reserved; bit 1: Thermal Status Log;
bit 0: Thermal Status.]
Figure 14-9. IA32_THERM_INTERRUPT MSR
[Figure: 64-bit register layout. Bits 63:2: Reserved; bit 1: Low-Temperature
Interrupt Enable; bit 0: High-Temperature Interrupt Enable.]
14.5.2.6 Adaptive Thermal Monitor
The Intel Core 2 Duo processor family supports an enhanced thermal management
mechanism, referred to as Adaptive Thermal Monitor (Adaptive TM).
Unlike TM2, Adaptive TM is not limited to one TM2 transition target. During a thermal
trip event, Adaptive TM (if enabled) selects an optimal target operating point based
on whether or not the current operating point has effectively cooled the processor.
Similar to TM2, Adaptive TM is enabled by BIOS. The BIOS is required to test the TM1
and TM2 feature flags and enable all available thermal control mechanisms (including
Adaptive TM) at platform initialization.
Adaptive TM is available only to a subset of processors that support TM2.
In each chip-multiprocessing (CMP) silicon die, each core has a unique thermal
sensor that triggers independently. These thermal sensors can trigger TM1 or TM2
transitions in the same manner as described in Section 14.5.2.1 and Section
14.5.2.2. The trip point of the thermal sensor is not programmable by software since
it is set during the fabrication of the processor.
Each thermal sensor in a processor core may be triggered independently to engage
thermal management features. In Adaptive TM, both cores will transition to a lower
frequency and/or lower voltage level if one sensor is triggered.
Triggering of this sensor is visible to software via the thermal interrupt LVT entry in
the local APIC of a given core.
14.5.3 Software Controlled Clock Modulation
Pentium 4, Intel Xeon and Pentium M processors also support software-controlled
clock modulation. This provides a means for operating systems to implement a power
management policy to reduce the power consumption of the processor. Here, the
stop-clock duty cycle is controlled by software through the
IA32_CLOCK_MODULATION MSR (see Figure 14-10).
The IA32_CLOCK_MODULATION MSR contains the following flag and field used to
enable software-controlled clock modulation and to select the clock modulation duty
cycle:
• On-Demand Clock Modulation Enable, bit 4: Enables on-demand software
  controlled clock modulation when set; disables software-controlled clock
  modulation when clear.
• On-Demand Clock Modulation Duty Cycle, bits 1 through 3: Selects the
  on-demand clock modulation duty cycle (see Table 14-1). This field is only active
  when the on-demand clock modulation enable flag is set.
Note that the on-demand clock modulation mechanism (like the thermal monitor)
controls the processor's stop-clock circuitry internally to modulate the clock signal.
The STPCLK# pin is not used in this mechanism.
The on-demand clock modulation mechanism can be used to control processor power
consumption. Power management software can write to the
IA32_CLOCK_MODULATION MSR to enable clock modulation and to select a modulation
duty cycle. If on-demand clock modulation and TM1 are both enabled and the
thermal status of the processor is hot (bit 0 of the IA32_THERM_STATUS MSR is set),
clock modulation at the duty cycle specified by TM1 takes precedence, regardless of
the setting of the on-demand clock modulation duty cycle.
Figure 14-10. IA32_CLOCK_MODULATION MSR
Bits 63:5    Reserved
Bit 4        On-Demand Clock Modulation Enable
Bits 3:1     On-Demand Clock Modulation Duty Cycle
Bit 0        Reserved
Table 14-1. On-Demand Clock Modulation Duty Cycle Field Encoding
Duty Cycle Field Encoding    Duty Cycle
000B                         Reserved
001B                         12.5% (Default)
010B                         25.0%
011B                         37.5%
100B                         50.0%
101B                         62.5%
110B                         75.0%
111B                         87.5%
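The following C fragment is a minimal sketch of how power management software might compose a value for the IA32_CLOCK_MODULATION MSR from the flag and field described above. It only manipulates the 64-bit value; reading and writing the MSR itself (RDMSR/WRMSR at privilege level 0) is left out. The function and macro names are hypothetical and are not defined by this manual.

```c
#include <stdint.h>

#define CLOCK_MOD_ENABLE     (1ull << 4)   /* bit 4: on-demand enable flag */
#define CLOCK_MOD_DUTY_SHIFT 1             /* duty cycle field, bits 3:1 */
#define CLOCK_MOD_DUTY_MASK  (0x7ull << CLOCK_MOD_DUTY_SHIFT)

/* Compose a new register value from the current one.
 * duty_eighths is the Table 14-1 encoding: 1..7 selects 12.5%..87.5%.
 * Encoding 0 (000B) is reserved and is rejected with -1.
 * All bits outside the flag and field are preserved. */
static inline int clock_mod_enable(uint64_t current, unsigned duty_eighths,
                                   uint64_t *result)
{
    if (duty_eighths < 1 || duty_eighths > 7)
        return -1;
    current &= ~(CLOCK_MOD_ENABLE | CLOCK_MOD_DUTY_MASK);
    current |= (uint64_t)duty_eighths << CLOCK_MOD_DUTY_SHIFT;
    current |= CLOCK_MOD_ENABLE;
    *result = current;
    return 0;
}

/* Clear the enable flag; the duty cycle field is left untouched because
 * it is only active while the enable flag is set. */
static inline uint64_t clock_mod_disable(uint64_t current)
{
    return current & ~CLOCK_MOD_ENABLE;
}

/* Selected duty cycle in tenths of a percent (625 means 62.5%).
 * Returns 0 when modulation is disabled or the field holds 000B. */
static inline unsigned clock_mod_duty_permille(uint64_t value)
{
    if (!(value & CLOCK_MOD_ENABLE))
        return 0;
    return (unsigned)((value & CLOCK_MOD_DUTY_MASK) >> CLOCK_MOD_DUTY_SHIFT) * 125u;
}
```

Each step of the duty cycle field is one eighth of the clock period, which is why the decoded value is a multiple of 12.5%.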
For Hyper-Threading Technology enabled processors, the
IA32_CLOCK_MODULATION register is duplicated for each logical processor. In order
for the on-demand clock modulation feature to work properly, the feature must be
enabled on all the logical processors within a physical processor. If the programmed
duty cycle is not identical for all the logical processors, the processor clock will
modulate to the highest duty cycle programmed.
For the P6 family processors, on-demand clock modulation was implemented
through the chipset, which controlled clock modulation through the processor's
STPCLK# pin.
14.5.4 Detection of Thermal Monitor and Software Controlled
Clock Modulation Facilities
The ACPI flag (bit 22) of the CPUID feature flags indicates the presence of the
IA32_THERM_STATUS, IA32_THERM_INTERRUPT, IA32_CLOCK_MODULATION
MSRs, and the xAPIC thermal LVT entry.
The TM1 flag (bit 29) of the CPUID feature flags indicates the presence of the
automatic thermal monitoring facilities that modulate clock duty cycles.
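As an illustration, the two feature flags can be tested as in the C sketch below. It assumes the caller has already executed CPUID with EAX = 01H and passes in the feature flags returned in EDX; how CPUID is invoked (inline assembly or a compiler intrinsic) is outside the scope of the sketch. The structure and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_EDX_ACPI (1u << 22)  /* thermal MSRs and xAPIC thermal LVT entry */
#define CPUID_EDX_TM1  (1u << 29)  /* automatic thermal monitoring facilities */

struct thermal_caps {
    bool thermal_msrs;  /* IA32_THERM_STATUS, IA32_THERM_INTERRUPT and
                           IA32_CLOCK_MODULATION are present */
    bool tm1;           /* TM1 is present */
};

/* edx: feature flags as returned in EDX by CPUID leaf 01H (assumed input). */
static inline struct thermal_caps thermal_caps_from_edx(uint32_t edx)
{
    struct thermal_caps caps;
    caps.thermal_msrs = (edx & CPUID_EDX_ACPI) != 0;
    caps.tm1          = (edx & CPUID_EDX_TM1) != 0;
    return caps;
}
```

Software would normally check thermal_msrs before touching any of the three MSRs, since accessing an unimplemented MSR raises a general-protection exception.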
14.5.5 On Die Digital Thermal Sensors
The on-die digital thermal sensor can be read using an MSR (no I/O interface). In Intel
Core Duo processors, each core has a unique digital sensor whose temperature is
accessible using an MSR. The digital thermal sensor is the preferred method for
reading the die temperature because (a) it is located closer to the hottest portions of
the die, and (b) it enables software to accurately track the die temperature and the
potential activation of thermal throttling.
14.5.5.1 Digital Thermal Sensor Enumeration
The processor supports a digital thermal sensor if CPUID.06H.EAX[0] = 1. If the
processor supports a digital thermal sensor, EBX[bits 3:0] determine the number of
thermal thresholds that are available for use.
Software sets thermal thresholds by using the IA32_THERM_INTERRUPT MSR.
Software reads the output of the digital thermal sensor using the IA32_THERM_STATUS
MSR.
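The enumeration rule above can be sketched in C as follows. The sketch assumes the caller supplies the EAX and EBX values returned by CPUID leaf 06H; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* eax: value returned in EAX by CPUID leaf 06H (assumed input). */
static inline bool dts_supported(uint32_t eax)
{
    return (eax & 1u) != 0;           /* CPUID.06H.EAX[0] */
}

/* Number of thermal thresholds available for use, from EBX[3:0].
 * Reported as 0 when no digital thermal sensor is present, because the
 * EBX field is only meaningful in that case. */
static inline unsigned dts_threshold_count(uint32_t eax, uint32_t ebx)
{
    if (!dts_supported(eax))
        return 0;
    return ebx & 0xFu;
}
```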
14.5.5.2 Reading the Digital Sensor
Unlike traditional analog thermal devices, the output of the digital thermal sensor is
a temperature relative to the maximum supported operating temperature of the
processor.
Temperature measurements returned by digital thermal sensors are always at or
below TCC activation temperature. Critical temperature conditions are detected
using the Critical Temperature Status bit. When this bit is set, the processor is
operating at a critical temperature and immediate shutdown of the system should
occur. Once the Critical Temperature Status bit is set, reliable operation is not
guaranteed.
See Figure 14-11 for the layout of IA32_THERM_STATUS MSR. Bit fields include:
Thermal Status (bit 0, RO) - This bit indicates whether the digital thermal
sensor high-temperature output signal (PROCHOT#) is currently active. Bit 0 = 1
indicates the feature is active. This bit may not be written by software; it reflects
the state of the digital thermal sensor.
Thermal Status Log (bit 1, R/WC0) - This is a sticky bit that indicates the
history of the thermal sensor high temperature output signal (PROCHOT#).
Bit 1 = 1 if PROCHOT# has been asserted since a previous RESET or the last time
software cleared the bit. Software may clear this bit by writing a zero.
PROCHOT# or FORCEPR# Event (bit 2, RO) - Indicates whether PROCHOT#
or FORCEPR# is being asserted by another agent on the platform.
Figure 14-11. IA32_THERM_STATUS Register
Bits 63:32   Reserved
Bit 31       Reading Valid
Bits 30:27   Resolution in Deg. Celsius
Bits 26:23   Reserved
Bits 22:16   Digital Readout
Bits 15:10   Reserved
Bit 9        Thermal Threshold #2 Log
Bit 8        Thermal Threshold #2 Status
Bit 7        Thermal Threshold #1 Log
Bit 6        Thermal Threshold #1 Status
Bit 5        Critical Temperature Log
Bit 4        Critical Temperature Status
Bit 3        PROCHOT# or FORCEPR# Log
Bit 2        PROCHOT# or FORCEPR# Event
Bit 1        Thermal Status Log
Bit 0        Thermal Status
PROCHOT# or FORCEPR# Log (bit 3, R/WC0) - Sticky bit that indicates
whether PROCHOT# or FORCEPR# has been asserted by another agent on the
platform since the last clearing of this bit or a reset. If bit 3 = 1, PROCHOT# or
FORCEPR# has been externally asserted. Software may clear this bit by writing a
zero. External PROCHOT# assertions are only acknowledged if the Bidirectional
Prochot feature is enabled.
Critical Temperature Status (bit 4, RO) - Indicates whether the critical
temperature detector output signal is currently active. If bit 4 = 1, the critical
temperature detector output signal is currently active.
Critical Temperature Log (bit 5, R/WC0) - Sticky bit that indicates whether
the critical temperature detector output signal has been asserted since the last
clearing of this bit or reset. If bit 5 = 1, the output signal has been asserted.
Software may clear this bit by writing a zero.
Thermal Threshold #1 Status (bit 6, RO) - Indicates whether the actual
temperature is currently higher than or equal to the value set in Thermal
Threshold #1. If bit 6 = 0, the actual temperature is lower. If bit 6 = 1, the
actual temperature is greater than or equal to TT#1. Quantitative information of
actual temperature can be inferred from Digital Readout, bits 22:16.
Thermal Threshold #1 Log (bit 7, R/WC0) - Sticky bit that indicates
whether the Thermal Threshold #1 has been reached since the last clearing of
this bit or a reset. If bit 7 = 1, the Threshold #1 has been reached. Software may
clear this bit by writing a zero.
Thermal Threshold #2 Status (bit 8, RO) - Indicates whether the actual
temperature is currently higher than or equal to the value set in Thermal
Threshold #2. If bit 8 = 0, the actual temperature is lower. If bit 8 = 1, the
actual temperature is greater than or equal to TT#2. Quantitative information of
actual temperature can be inferred from Digital Readout, bits 22:16.
Thermal Threshold #2 Log (bit 9, R/WC0) - Sticky bit that indicates
whether the Thermal Threshold #2 has been reached since the last clearing of
this bit or a reset. If bit 9 = 1, the Thermal Threshold #2 has been reached.
Software may clear this bit by writing a zero.
Digital Readout (bits 22:16, RO) - Digital temperature reading in 1 degree
Celsius relative to the TCC activation temperature.
0: TCC Activation temperature,
1: (TCC Activation - 1), etc. See the processor's data sheet for details regarding
TCC activation.
A lower reading in the Digital Readout field (bits 22:16) indicates a higher actual
temperature.
Resolution in Degrees Celsius (bits 30:27, RO) - Specifies the resolution
(or tolerance) of the digital thermal sensor. The value is in degrees Celsius. It is
recommended that new threshold values be offset from the current temperature
by at least the resolution + 1 in order to avoid hysteresis of interrupt generation.
Reading Valid (bit 31, RO) - Indicates if the digital readout in bits 22:16 is
valid. The readout is valid if bit 31 = 1.
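The bit fields of IA32_THERM_STATUS listed above can be decoded as in the following C sketch. It operates on a 64-bit value that the caller has already read with RDMSR. The TCC activation temperature is not reported by this register, so the sketch takes it as a parameter that the caller is assumed to obtain from the processor's data sheet. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct therm_reading {
    bool     valid;       /* Reading Valid, bit 31 */
    unsigned below_tcc;   /* Digital Readout, bits 22:16: degrees C below
                             the TCC activation temperature */
    unsigned resolution;  /* Resolution in Degrees Celsius, bits 30:27 */
    bool     hot;         /* Thermal Status, bit 0 */
    bool     critical;    /* Critical Temperature Status, bit 4 */
};

static inline struct therm_reading therm_status_decode(uint64_t msr)
{
    struct therm_reading r;
    r.valid      = ((msr >> 31) & 1) != 0;
    r.below_tcc  = (unsigned)((msr >> 16) & 0x7F);
    r.resolution = (unsigned)((msr >> 27) & 0xF);
    r.hot        = (msr & 1) != 0;
    r.critical   = ((msr >> 4) & 1) != 0;
    return r;
}

/* Convert the relative readout to an absolute temperature.
 * tcc_activation_c is an assumed, caller-supplied value in degrees C.
 * Returns false (and leaves *temp_c untouched) if the reading is not valid. */
static inline bool therm_absolute_c(const struct therm_reading *r,
                                    int tcc_activation_c, int *temp_c)
{
    if (!r->valid)
        return false;
    *temp_c = tcc_activation_c - (int)r->below_tcc;
    return true;
}
```

Because a readout of 0 means the die is at the TCC activation temperature, the subtraction yields a lower absolute temperature for a larger readout.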
Changes to temperature can be detected using two thresholds (see Figure 14-12);
one is set above and the other below the current temperature. These thresholds have
the capability of generating interrupts using the core's local APIC which software
must then service. Note that the local APIC entries used by these thresholds are also
used by the Intel Thermal Monitor; it is up to software to determine the source of a
specific interrupt.
See Figure 14-12 for the layout of IA32_THERM_INTERRUPT MSR. Bit fields include:
High-Temperature Interrupt Enable (bit 0, R/W) - This bit allows the BIOS
to enable the generation of an interrupt on the transition from low-temperature
to a high-temperature threshold. Bit 0 = 0 (default) disables interrupts;
bit 0 = 1 enables interrupts.
Low-Temperature Interrupt Enable (bit 1, R/W) - This bit allows the BIOS
to enable the generation of an interrupt on the transition from high-temperature
to a low-temperature (TCC de-activation). Bit 1 = 0 (default) disables interrupts;
bit 1 = 1 enables interrupts.
PROCHOT# Interrupt Enable (bit 2, R/W) - This bit allows the BIOS or OS
to enable the generation of an interrupt when PROCHOT# has been asserted by
another agent on the platform and the Bidirectional Prochot feature is enabled.
Bit 2 = 0 disables the interrupt; bit 2 = 1 enables the interrupt.
FORCEPR# Interrupt Enable (bit 3, R/W) - This bit allows the BIOS or OS to
enable the generation of an interrupt when FORCEPR# has been asserted by
another agent on the platform. Bit 3 = 0 disables the interrupt; bit 3 = 1 enables
the interrupt.
Figure 14-12. IA32_THERM_INTERRUPT Register
Bits 63:24   Reserved
Bit 23       Threshold #2 Interrupt Enable
Bits 22:16   Threshold #2 Value
Bit 15       Threshold #1 Interrupt Enable
Bits 14:8    Threshold #1 Value
Bits 7:5     Reserved
Bit 4        Overheat Interrupt Enable
Bit 3        FORCEPR# Interrupt Enable
Bit 2        PROCHOT# Interrupt Enable
Bit 1        Low Temp. Interrupt Enable
Bit 0        High Temp. Interrupt Enable
Critical Temperature Interrupt Enable (bit 4, R/W) - Enables the
generation of an interrupt when the Critical Temperature Detector has detected a
critical thermal condition. The recommended response to this condition is a
system shutdown. Bit 4 = 0 disables the interrupt; bit 4 = 1 enables the
interrupt.
Threshold #1 Value (bits 14:8, R/W) - A temperature threshold, encoded
relative to the TCC Activation temperature (using the same format as the Digital
Readout). This threshold is compared against the Digital Readout and is used to
generate the Thermal Threshold #1 Status and Log bits as well as the Threshold
#1 thermal interrupt delivery.
Threshold #1 Interrupt Enable (bit 15, R/W) - Enables the generation of
an interrupt when the actual temperature crosses the Threshold #1 setting in any
direction. Bit 15 = 0 disables the interrupt; bit 15 = 1 enables the interrupt.
Threshold #2 Value (bits 22:16, R/W) - A temperature threshold, encoded
relative to the TCC Activation temperature (using the same format as the Digital
Readout). This threshold is compared against the Digital Readout and is used to
generate the Thermal Threshold #2 Status and Log bits as well as the Threshold
#2 thermal interrupt delivery.
Threshold #2 Interrupt Enable (bit 23, R/W) - Enables the generation of
an interrupt when the actual temperature crosses the Threshold #2 setting in any
direction. Bit 23 = 0 disables the interrupt; bit 23 = 1 enables the interrupt.
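The C sketch below shows one possible way to bracket the current temperature with the two thresholds, following the recommendation to offset each threshold from the current reading by at least the sensor resolution + 1. It is an illustration under stated assumptions, not a definitive implementation: it assumes that a set enable bit (bit 15 or bit 23) enables the corresponding interrupt, as the field names and the other enable flags of this register suggest, and it arbitrarily assigns Threshold #1 to the hotter side and Threshold #2 to the cooler side. The names are hypothetical.

```c
#include <stdint.h>

#define THERM_INT_THRESH1_SHIFT  8             /* Threshold #1 Value, bits 14:8 */
#define THERM_INT_THRESH1_ENABLE (1ull << 15)  /* assumed: 1 = interrupt enabled */
#define THERM_INT_THRESH2_SHIFT  16            /* Threshold #2 Value, bits 22:16 */
#define THERM_INT_THRESH2_ENABLE (1ull << 23)  /* assumed: 1 = interrupt enabled */
#define THERM_INT_THRESH_MAX     0x7Fu         /* 7-bit threshold fields */

/* current:    present IA32_THERM_INTERRUPT value (other bits are preserved)
 * readout:    Digital Readout from IA32_THERM_STATUS (degrees below TCC)
 * resolution: Resolution field from IA32_THERM_STATUS
 * Thresholds use the Digital Readout encoding, so a SMALLER value is a
 * HIGHER temperature. */
static inline uint64_t therm_int_bracket(uint64_t current, unsigned readout,
                                         unsigned resolution)
{
    unsigned margin = (resolution & 0xFu) + 1;
    unsigned hotter, cooler;

    readout &= THERM_INT_THRESH_MAX;
    /* Clamp at 0, the TCC activation temperature itself. */
    hotter = readout > margin ? readout - margin : 0;
    /* Clamp at the largest encodable offset. */
    cooler = readout + margin > THERM_INT_THRESH_MAX ? THERM_INT_THRESH_MAX
                                                     : readout + margin;

    current &= ~(((uint64_t)THERM_INT_THRESH_MAX << THERM_INT_THRESH1_SHIFT) |
                 ((uint64_t)THERM_INT_THRESH_MAX << THERM_INT_THRESH2_SHIFT));
    current |= (uint64_t)hotter << THERM_INT_THRESH1_SHIFT;
    current |= (uint64_t)cooler << THERM_INT_THRESH2_SHIFT;
    current |= THERM_INT_THRESH1_ENABLE | THERM_INT_THRESH2_ENABLE;
    return current;
}
```

After either threshold interrupt is serviced, software would typically read the new Digital Readout and call the function again so that the thresholds continue to bracket the current temperature.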
CHAPTER 15
MACHINE-CHECK ARCHITECTURE
This chapter describes the machine-check architecture and machine-check
exception mechanism found in the Pentium 4, Intel Xeon, and P6 family
processors. See Chapter 6, "Interrupt 18 - Machine-Check Exception
(#MC)," for more information on machine-check exceptions. A brief description
of the Pentium processor's machine check capability is also given.
Additionally, a signaling mechanism for software to respond to hardware
corrected machine check errors is covered.
15.1 MACHINE-CHECK ARCHITECTURE
The Pentium 4, Intel Xeon, and P6 family processors implement a
machine-check architecture that provides a mechanism for detecting and reporting
hardware (machine) errors, such as: system bus errors, ECC errors, parity
errors, cache errors, and TLB errors. It consists of a set of model-specific
registers (MSRs) that are used to set up machine checking and additional
banks of MSRs used for recording errors that are detected.
The processor signals the detection of an uncorrected machine-check error
by generating a machine-check exception (#MC), which is an abort class
exception. The implementation of the machine-check architecture does not
ordinarily permit the processor to be restarted reliably after generating a
machine-check exception. However, the machine-check-exception handler
can collect information about the machine-check error from the
machine-check MSRs.
Starting with 45nm Intel 64 processors with a CPUID signature
DisplayFamily_DisplayModel encoding of 06H_1AH (see the CPUID instruction in
Chapter 3, "Instruction Set Reference, A-M," in the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 2A), the processor can
report information on corrected machine-check errors and deliver a
programmable interrupt for software to respond to MC errors, referred to as
corrected machine-check error interrupt (CMCI). See Section 15.5 for detail.
Intel 64 processors supporting machine-check architecture and CMCI may
also support an additional enhancement, namely, support for software
recovery from certain uncorrected recoverable machine check errors. See
Section 15.6 for detail.
15.2 COMPATIBILITY WITH PENTIUM PROCESSOR
The Pentium 4, Intel Xeon, and P6 family processors support and extend the
machine-check exception mechanism introduced in the Pentium processor.
The Pentium processor reports the following machine-check errors:
data parity errors during read cycles
unsuccessful completion of a bus cycle
The above errors are reported using the P5_MC_TYPE and P5_MC_ADDR
MSRs (implementation specific for the Pentium processor). Use the RDMSR
instruction to read these MSRs. See Appendix B, "Model-Specific Registers
(MSRs)," for the addresses.
The machine-check error reporting mechanism that Pentium processors use
is similar to that used in Pentium 4, Intel Xeon, and P6 family processors.
When an error is detected, it is recorded in P5_MC_TYPE and P5_MC_ADDR;
the processor then generates a machine-check exception (#MC).
See Section 15.3.3, "Mapping of the Pentium Processor Machine-Check
Errors to the Machine-Check Architecture," and Section 15.10.2, "Pentium
Processor Machine-Check Exception Handling," for information on
compatibility between machine-check code written to run on the Pentium processors
and code written to run on P6 family processors.
15.3 MACHINE-CHECK MSRS
Machine check MSRs in the Pentium 4, Intel Xeon, and P6 family processors
consist of a set of global control and status registers and several
error-reporting register banks. See Figure 15-1.
Each error-reporting bank is associated with a specific hardware unit (or
group of hardware units) in the processor. Use RDMSR and WRMSR to read
and to write these registers.
15.3.1 Machine-Check Global Control MSRs
The machine-check global control MSRs include the IA32_MCG_CAP,
IA32_MCG_STATUS, and IA32_MCG_CTL. See Appendix B, "Model-Specific
Registers (MSRs)," for the addresses of these registers.
15.3.1.1 IA32_MCG_CAP MSR
The IA32_MCG_CAP MSR is a read-only register that provides information
about the machine-check architecture of the processor. Figure 15-2 shows
the structure of the register in Pentium 4, Intel Xeon, and P6 family
processors.
Figure 15-1. Machine-Check MSRs
Global Control MSRs (each 64 bits, 63:0):
IA32_MCG_CAP MSR, IA32_MCG_STATUS MSR, IA32_MCG_CTL MSR
Error-Reporting Bank Registers, One Set for Each Hardware Unit (each 64 bits, 63:0):
IA32_MCi_CTL MSR, IA32_MCi_STATUS MSR, IA32_MCi_ADDR MSR,
IA32_MCi_MISC MSR, IA32_MCi_CTL2 MSR
Figure 15-2. IA32_MCG_CAP Register
Bits 63:25   Reserved
Bit 24       MCG_SER_P
Bits 23:16   MCG_EXT_CNT
Bits 15:12   Reserved
Bit 11       MCG_TES_P
Bit 10       MCG_CMCI_P
Bit 9        MCG_EXT_P
Bit 8        MCG_CTL_P
Bits 7:0     Count
Where:
Count field, bits 7:0 - Indicates the number of hardware unit error-reporting
banks available in a particular processor implementation.
MCG_CTL_P (control MSR present) flag, bit 8 - Indicates that the processor
implements the IA32_MCG_CTL MSR when set; this register is absent when clear.
MCG_EXT_P (extended MSRs present) flag, bit 9 - Indicates (when set) that the
processor implements the extended machine-check state registers found starting
at MSR address 180H; these registers are absent when clear.
MCG_CMCI_P (Corrected MC error counting/signaling extension
present) flag, bit 10 - Indicates (when set) that the extended state and
associated MSRs necessary to support the reporting of an interrupt on a
corrected MC error event and/or count threshold of corrected MC errors are
present. When this bit is set, it does not imply this feature is supported across all
banks. Software should check the availability of the necessary logic on a bank by
bank basis when using this signaling capability (i.e. bit 30 settable in the individual
IA32_MCi_CTL2 register).
MCG_TES_P (threshold-based error status present) flag, bit 11 -
Indicates (when set) that bits 56:53 of the IA32_MCi_STATUS MSR are part of
the architectural space. Bits 56:55 are reserved, and bits 54:53 are used to
report threshold-based error status. Note that when MCG_TES_P is not set, bits
56:53 of the IA32_MCi_STATUS MSR are model-specific.
MCG_EXT_CNT, bits 23:16 - Indicates the number of extended
machine-check state registers present. This field is meaningful only when the MCG_EXT_P
flag is set.
MCG_SER_P (software error recovery support present) flag, bit 24 -
Indicates (when set) that the processor supports software error recovery (see
Section 15.6), and IA32_MCi_STATUS MSR bits 56:55 are used to report the
signaling of uncorrected recoverable errors and whether software must take
recovery actions for uncorrected errors. Note that when MCG_TES_P is not set,
bits 56:53 of the IA32_MCi_STATUS MSR are model-specific. If MCG_TES_P is set
but MCG_SER_P is not set, bits 56:55 are reserved.
The effect of writing to the IA32_MCG_CAP MSR is undefined.
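The fields of IA32_MCG_CAP described above can be unpacked as in the following C sketch, which works on a value the caller has already read with RDMSR. The structure and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct mcg_cap {
    unsigned bank_count;  /* Count, bits 7:0 */
    bool     ctl_p;       /* MCG_CTL_P, bit 8 */
    bool     ext_p;       /* MCG_EXT_P, bit 9 */
    bool     cmci_p;      /* MCG_CMCI_P, bit 10 */
    bool     tes_p;       /* MCG_TES_P, bit 11 */
    unsigned ext_count;   /* MCG_EXT_CNT, bits 23:16 (0 if ext_p is clear) */
    bool     ser_p;       /* MCG_SER_P, bit 24 */
};

static inline struct mcg_cap mcg_cap_decode(uint64_t msr)
{
    struct mcg_cap cap;
    cap.bank_count = (unsigned)(msr & 0xFF);
    cap.ctl_p      = ((msr >> 8) & 1) != 0;
    cap.ext_p      = ((msr >> 9) & 1) != 0;
    cap.cmci_p     = ((msr >> 10) & 1) != 0;
    cap.tes_p      = ((msr >> 11) & 1) != 0;
    /* MCG_EXT_CNT is meaningful only when MCG_EXT_P is set. */
    cap.ext_count  = cap.ext_p ? (unsigned)((msr >> 16) & 0xFF) : 0;
    cap.ser_p      = ((msr >> 24) & 1) != 0;
    return cap;
}
```

Reporting ext_count as 0 when MCG_EXT_P is clear is a design choice of the sketch; it keeps callers from acting on a field the text declares meaningless in that case.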
15.3.1.2 IA32_MCG_STATUS MSR
The IA32_MCG_STATUS MSR describes the current state of the processor
after a machine-check exception has occurred (see Figure 15-3).
Figure 15-3. IA32_MCG_STATUS Register
Bits 63:3    Reserved
Bit 2        MCIP - Machine check in progress flag
Bit 1        EIPV - Error IP valid flag
Bit 0        RIPV - Restart IP valid flag
Where:
RIPV (restart IP valid) flag, bit 0 - Indicates (when set) that program
execution can be restarted reliably at the instruction pointed to by the instruction
pointer pushed on the stack when the machine-check exception is generated.
When clear, the program cannot be reliably restarted at the pushed instruction
pointer.
EIPV (error IP valid) flag, bit 1 - Indicates (when set) that the instruction
pointed to by the instruction pointer pushed onto the stack when the
machine-check exception is generated is directly associated with the error. When this flag
is cleared, the instruction pointed to may not be associated with the error.
MCIP (machine check in progress) flag, bit 2 - Indicates (when set) that a
machine-check exception was generated. Software can set or clear this flag. The
occurrence of a second Machine-Check Event while MCIP is set will cause the
processor to enter a shutdown state. For information on processor behavior in
the shutdown state, please refer to the description in Chapter 6, "Interrupt and
Exception Handling": "Interrupt 8 - Double Fault Exception (#DF)".
Bits 63:03 in IA32_MCG_STATUS are reserved.
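A machine-check exception handler might test the three IA32_MCG_STATUS flags as in the C sketch below. It is a minimal illustration on a value already read with RDMSR; the helper names are hypothetical, and clearing MCIP at the end of the handler is shown only because the flag is software-clearable, not as a complete handler sequence.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCG_STATUS_RIPV (1ull << 0)  /* restart IP valid */
#define MCG_STATUS_EIPV (1ull << 1)  /* error IP valid */
#define MCG_STATUS_MCIP (1ull << 2)  /* machine check in progress */

/* True if execution can be restarted reliably at the pushed instruction pointer. */
static inline bool mcg_restartable(uint64_t status)
{
    return (status & MCG_STATUS_RIPV) != 0;
}

/* True if the pushed instruction pointer is directly associated with the error. */
static inline bool mcg_error_ip_valid(uint64_t status)
{
    return (status & MCG_STATUS_EIPV) != 0;
}

/* True while a machine-check exception is being handled; a second
 * machine-check event in this state shuts the processor down. */
static inline bool mcg_in_progress(uint64_t status)
{
    return (status & MCG_STATUS_MCIP) != 0;
}

/* Value to write back once handling is complete: MCIP cleared,
 * the other bits unchanged. */
static inline uint64_t mcg_clear_mcip(uint64_t status)
{
    return status & ~MCG_STATUS_MCIP;
}
```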
15.3.1.3 IA32_MCG_CTL MSR
The IA32_MCG_CTL MSR is present if the capability flag MCG_CTL_P is set in
the IA32_MCG_CAP MSR.
IA32_MCG_CTL controls the reporting of machine-check exceptions. If
present, writing all 1s to this register enables machine-check features and
writing all 0s disables machine-check features. All other values are undefined
and/or implementation specific.
15.3.2 Error-Reporting Register Banks
Each error-reporting register bank can contain the IA32_MCi_CTL,
IA32_MCi_STATUS, IA32_MCi_ADDR, and IA32_MCi_MISC MSRs. The
number of reporting banks is indicated by bits [7:0] of IA32_MCG_CAP MSR
(address 0179H). The first error-reporting register (IA32_MC0_CTL) always
starts at address 400H.
See Appendix B, "Model-Specific Registers (MSRs)," for addresses of the
error-reporting registers in the Pentium 4 and Intel Xeon processors; and for
addresses of the error-reporting registers in P6 family processors.
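The C sketch below computes the MSR address of a register in a given bank. The text above only fixes the address of IA32_MC0_CTL (400H) and the location of the bank count; the sketch additionally assumes that each bank occupies four consecutive MSR addresses in the order CTL, STATUS, ADDR, MISC. That layout should be checked against Appendix B for the processor at hand, the P6 family in particular. All names are hypothetical.

```c
#include <stdint.h>

#define IA32_MCG_CAP_ADDR 0x179u  /* bits 7:0 hold the number of banks */
#define IA32_MC0_CTL_ADDR 0x400u  /* first error-reporting register */

enum mc_bank_reg {
    MC_BANK_CTL    = 0,
    MC_BANK_STATUS = 1,
    MC_BANK_ADDR   = 2,
    MC_BANK_MISC   = 3
};

/* mcg_cap: value read from IA32_MCG_CAP.
 * Returns 0 and stores the MSR address in *addr, or -1 if the bank index
 * is not below the count reported in IA32_MCG_CAP[7:0].
 * ASSUMPTION: four consecutive MSRs per bank (see the note above). */
static inline int mc_bank_msr(uint64_t mcg_cap, unsigned bank,
                              enum mc_bank_reg reg, uint32_t *addr)
{
    unsigned count = (unsigned)(mcg_cap & 0xFF);

    if (bank >= count)
        return -1;
    *addr = IA32_MC0_CTL_ADDR + 4u * bank + (uint32_t)reg;
    return 0;
}
```

Bounding the index by the reported bank count matters because reading or writing an unimplemented MSR causes a general-protection exception.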
15.3.2.1 IA32_MCi_CTL MSRs
The IA32_MCi_CTL MSR controls error reporting for errors produced by a
particular hardware unit (or group of hardware units). Each of the 64 flags
(EEj) represents a potential error. Setting an EEj flag enables reporting of
the associated error and clearing it disables reporting of the error. The
processor does not write changes to bits that are not implemented.
Figure 15-4 shows the bit fields of IA32_MCi_CTL.
Figure 15-4. IA32_MCi_CTL Register
Bits 63:0    EE63 through EE00, one flag per bit (bit j holds EEj)
EEj - Error reporting enable flag (where j is 00 through 63)
NOTE
For P6 family processors, processors based on Intel Core
microarchitecture (excluding processors with DisplayFamily_DisplayModel
encoding of 06H_1AH and onward): the operating system or
executive software must not modify the contents of the
IA32_MC0_CTL MSR. This MSR is internally aliased to the
EBL_CR_POWERON MSR and controls platform-specific error
handling features. System specific firmware (the BIOS) is responsible
for the appropriate initialization of the IA32_MC0_CTL MSR. P6 family
processors only allow the writing of all 1s or all 0s to the
IA32_MCi_CTL MSR.
15.3.2.2 IA32_MCi_STATUS MSRs
Each IA32_MCi_STATUS MSR contains information related to a
machine-check error if its VAL (valid) flag is set (see Figure 15-5). Software is
responsible for clearing IA32_MCi_STATUS MSRs by explicitly writing 0s to them;
writing 1s to them causes a general-protection exception.
NOTE
Figure 15-5 depicts the IA32_MCi_STATUS MSR when
IA32_MCG_CAP[24] = 1, IA32_MCG_CAP[11] = 1 and
IA32_MCG_CAP[10] = 1. When IA32_MCG_CAP[24] = 0 and
IA32_MCG_CAP[11] = 1, bits 56:55 are reserved and bits 54:53 are used for
threshold-based error reporting. When IA32_MCG_CAP[11] = 0, bits
56:53 are part of the "Other Information" field. The use of bits 54:53
for threshold-based error reporting began with Intel Core Duo
processors, and is currently used for cache memory. See Section
15.4, "Enhanced Cache Error reporting," for more information. When
IA32_MCG_CAP[10] = 0, bits 52:38 are part of the "Other Information"
field. The use of bits 52:38 for corrected MC error count was
introduced with Intel 64 processors having a CPUID
DisplayFamily_DisplayModel encoding of 06H_1AH.
Figure 15-5. IA32_MCi_STATUS Register
Bit 63       VAL - MCi_STATUS register valid
Bit 62       OVER - Error overflow
Bit 61       UC - Uncorrected error
Bit 60       EN - Error reporting enabled
Bit 59       MISCV - MCi_MISC register valid
Bit 58       ADDRV - MCi_ADDR register valid
Bit 57       PCC - Processor context corrupted
Bit 56       S - Signaling an uncorrected recoverable (UCR) error**
Bit 55       AR - Recovery action required for UCR error**
Bits 54:53   Threshold-based error status*
Bits 52:38   Corrected Error Count
Bits 37:32   Other Info
Bits 31:16   MSCOD Model Specific Error Code
Bits 15:0    MCA Error Code
* When IA32_MCG_CAP[11] (MCG_TES_P) is not set, these bits are model-specific
(part of "Other Information").
** When IA32_MCG_CAP[11] or IA32_MCG_CAP[24] are not set, these bits are reserved, or
model-specific (part of "Other Information").
Where:
MCA (machine-check architecture) error code field, bits 15:0 - Specifies
the machine-check architecture-defined error code for the machine-check error
condition detected. The machine-check architecture-defined error codes are
guaranteed to be the same for all IA-32 processors that implement the
machine-check architecture. See Section 15.9, "Interpreting the MCA Error Codes," and
Appendix E, "Interpreting Machine-Check Error Codes," for information on
machine-check error codes.
Model-specific error code field, bits 31:16 - Specifies the model-specific
error code that uniquely identifies the machine-check error condition detected.
The model-specific error codes may differ among IA-32 processors for the same
machine-check error condition. See Appendix E, "Interpreting Machine-Check
Error Codes," for information on model-specific error codes.
Reserved, Error Status, and Other Information fields, bits 56:32 -
Bits 37:32 always contain "Other Information" that is
implementation-specific and is not part of the machine-check architecture. Software that
is intended to be portable among IA-32 processors should not rely on
these values.
If IA32_MCG_CAP[10] is 0, bits 52:38 also contain "Other Information"
(in the same sense as bits 37:32).
If IA32_MCG_CAP[10] is 1, bits 52:38 are architectural (not
model-specific). In this case, bits 52:38 report the value of a 15-bit counter that
increments each time a corrected error is observed by the MCA recording
bank. This count value will continue to increment until cleared by
software. The most significant bit, 52, is a sticky count overflow bit.
If IA32_MCG_CAP[11] is 0, bits 56:53 also contain "Other Information"
(in the same sense).
If IA32_MCG_CAP[11] is 1, bits 56:53 are architectural (not
model-specific). In this case, bits 56:53 have the following functionality:
If IA32_MCG_CAP[24] is 0, bits 56:55 are reserved.
If IA32_MCG_CAP[24] is 1, bits 56:55 are defined as follows:
S (Signaling) flag, bit 56 - Signals the reporting of UCR errors in this
MC bank. See Section 15.6.2 for additional detail.
AR (Action Required) flag, bit 55 - Indicates (when set) that MCA
error code specific recovery action must be performed by system
software at the time this error was signaled. See Section 15.6.2 for
additional detail.
If the UC bit (Figure 15-5) is 1, bits 54:53 are undefined.
If the UC bit (Figure 15-5) is 0, bits 54:53 indicate the status of the
hardware structure that reported the threshold-based error. See
Table 15-1.
Table 15-1. Bits 54:53 in IA32_MCi_STATUS MSRs
when IA32_MCG_CAP[11] = 1 and UC = 0
Bits 54:53   Meaning
00           No tracking - No hardware status tracking is provided for the structure
             reporting this event.
01           Green - Status tracking is provided for the structure posting the event; the
             current status is green (below threshold). For more information, see
             Section 15.4, "Enhanced Cache Error reporting".
10           Yellow - Status tracking is provided for the structure posting the event; the
             current status is yellow (above threshold). For more information, see
             Section 15.4, "Enhanced Cache Error reporting".
11           Reserved
PCC (processor context corrupt) flag, bit 57 - Indicates (when set) that the
state of the processor might have been corrupted by the error condition detected
and that reliable restarting of the processor may not be possible. When clear, this
flag indicates that the error did not affect the processor's state. Software
restarting might be possible.
ADDRV (IA32_MCi_ADDR register valid) flag, bit 58 - Indicates (when set)
that the IA32_MCi_ADDR register contains the address where the error occurred
(see Section 15.3.2.3, "IA32_MCi_ADDR MSRs"). When clear, this flag indicates
that the IA32_MCi_ADDR register is either not implemented or does not contain
the address where the error occurred. Do not read these registers if they are not
implemented in the processor.
MISCV (IA32_MCi_MISC register valid) flag, bit 59 - Indicates (when set)
that the IA32_MCi_MISC register contains additional information regarding the
error. When clear, this flag indicates that the IA32_MCi_MISC register is either
not implemented or does not contain additional information regarding the error.
Do not read these registers if they are not implemented in the processor.
EN (error enabled) flag, bit 60 - Indicates (when set) that the error was
enabled by the associated EEj bit of the IA32_MCi_CTL register.
UC (error uncorrected) flag, bit 61 - Indicates (when set) that the processor
did not or was not able to correct the error condition. When clear, this flag
indicates that the processor was able to correct the error condition.
OVER (machine check overflow) flag, bit 62 - Indicates (when set) that a
machine-check error occurred while the results of a previous error were still in
the error-reporting register bank (that is, the VAL bit was already set in the
IA32_MCi_STATUS register). The processor sets the OVER flag and software is
responsible for clearing it. In general, enabled errors are written over disabled
errors, and uncorrected errors are written over corrected errors. Uncorrected
errors are not written over previous valid uncorrected errors. For more
information, see Section 15.3.2.2.1, "Overwrite Rules for Machine Check Overflow".
VAL (IA32_MCi_STATUS register valid) flag, bit 63 - Indicates (when set)
that the information within the IA32_MCi_STATUS register is valid. When this flag
is set, the processor follows the rules given for the OVER flag in the
IA32_MCi_STATUS register when overwriting previously valid entries. The
processor sets the VAL flag and software is responsible for clearing it.
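The following C sketch decodes an IA32_MCi_STATUS value according to the field descriptions above, using the IA32_MCG_CAP value to decide which of bits 56:38 are architectural. It is an illustration only; the structure and function names are hypothetical. Fields that are not architectural on the given processor are reported as 0 or false, so a threshold status of 0 may mean either "no tracking" or "not available".

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCG_CAP flags that gate the architectural meaning of bits 56:38. */
#define MCG_CMCI_P (1ull << 10)
#define MCG_TES_P  (1ull << 11)
#define MCG_SER_P  (1ull << 24)

struct mci_status {
    bool     val, over, uc, en, miscv, addrv, pcc;  /* bits 63 down to 57 */
    bool     s, ar;            /* bits 56, 55: need MCG_TES_P and MCG_SER_P */
    unsigned tes;              /* bits 54:53: need MCG_TES_P and UC = 0 */
    unsigned corrected_count;  /* bits 52:38: need MCG_CMCI_P */
    bool     count_overflow;   /* bit 52, sticky overflow of the count */
    unsigned mscod;            /* bits 31:16, model-specific error code */
    unsigned mca_code;         /* bits 15:0, MCA error code */
};

static inline struct mci_status mci_status_decode(uint64_t msr, uint64_t mcg_cap)
{
    struct mci_status st = {0};

    st.val   = ((msr >> 63) & 1) != 0;
    st.over  = ((msr >> 62) & 1) != 0;
    st.uc    = ((msr >> 61) & 1) != 0;
    st.en    = ((msr >> 60) & 1) != 0;
    st.miscv = ((msr >> 59) & 1) != 0;
    st.addrv = ((msr >> 58) & 1) != 0;
    st.pcc   = ((msr >> 57) & 1) != 0;

    if ((mcg_cap & MCG_TES_P) && (mcg_cap & MCG_SER_P)) {
        st.s  = ((msr >> 56) & 1) != 0;
        st.ar = ((msr >> 55) & 1) != 0;
    }
    /* Bits 54:53 are undefined when UC = 1. */
    if ((mcg_cap & MCG_TES_P) && !st.uc)
        st.tes = (unsigned)((msr >> 53) & 0x3);
    if (mcg_cap & MCG_CMCI_P) {
        st.corrected_count = (unsigned)((msr >> 38) & 0x7FFF);
        st.count_overflow  = ((msr >> 52) & 1) != 0;
    }
    st.mscod    = (unsigned)((msr >> 16) & 0xFFFF);
    st.mca_code = (unsigned)(msr & 0xFFFF);
    return st;
}
```

A caller would normally test val first and ignore every other field when it is clear, since the remaining bits are only meaningful for a valid entry.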
15.3.2.2.1 Overwrite Rules for Machine Check Overflow
Table 15-2 shows the overwrite rules for how to treat a second event if the
cache has already posted an event to the MC bank, that is, what to do if the
valid bit for an MC bank already is set to 1. When more than one structure
posts events in a given bank, these rules specify whether a new event will
overwrite a previous posting or not. These rules define a priority for
uncorrected (highest priority), yellow, and green/unmonitored (lowest priority)
status.
In Table 15-2, the values in the two left-most columns are
IA32_MCi_STATUS[54:53].
If a second event overwrites a previously posted event, the information (as
guarded by individual valid bits) in the MCi bank is entirely from the second
event. Similarly, if a first event is retained, all of the information previously
posted for that event is retained. In either case, the OVER bit
(MCi_STATUS[62]) will be set to indicate an overflow.
After software polls a posting and clears the register, the valid bit is no
longer set and therefore the meaning of the rest of the bits, including the
yellow/green/00 status field in bits 54:53, is undefined. The yellow/green
indication will only be posted for events associated with monitored structures;
otherwise the unmonitored (00) code will be posted in
MCi_STATUS[54:53].
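The overwrite priority in Table 15-2 can be sketched as a small decision function. This is only an illustration of the rules as stated in this section, not processor behavior that software can invoke; the type and function names (mc_severity, mc_overwrite, and so on) are hypothetical and do not come from the manual.

```c
#include <stdbool.h>

/* Severity ranking implied by Table 15-2 (names are illustrative only). */
enum mc_severity {
    MC_SEV_GREEN_OR_UNMONITORED = 0, /* "00/green" rows of the table */
    MC_SEV_YELLOW = 1,
    MC_SEV_UNCORRECTED = 2           /* event with the UC bit set */
};

enum mc_kept { MC_KEEP_FIRST, MC_KEEP_SECOND, MC_KEEP_EITHER };

struct mc_overwrite_result {
    enum mc_kept kept; /* whose MCA information the bank ends up holding */
    bool over;         /* OVER (bit 62): set whenever VAL was already 1 */
};

/* Decide what a bank holds when a second event arrives while VAL = 1. */
static struct mc_overwrite_result
mc_overwrite(enum mc_severity first, enum mc_severity second)
{
    struct mc_overwrite_result r = { MC_KEEP_FIRST, true };

    if (first == MC_SEV_UNCORRECTED)
        r.kept = MC_KEEP_FIRST;   /* a valid UC error is never overwritten */
    else if (second == MC_SEV_UNCORRECTED)
        r.kept = MC_KEEP_SECOND;  /* UC overwrites corrected events */
    else if (first == MC_SEV_YELLOW && second == MC_SEV_YELLOW)
        r.kept = MC_KEEP_EITHER;  /* table says "either" */
    else if (first == MC_SEV_YELLOW)
        r.kept = MC_KEEP_FIRST;   /* yellow outranks 00/green */
    else
        r.kept = MC_KEEP_SECOND;  /* first was 00/green */
    return r;
}
```

For example, mc_overwrite(MC_SEV_YELLOW, MC_SEV_GREEN_OR_UNMONITORED) reports that the first event is retained and that OVER is set.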
15.3.2.3 IA32_MCi_ADDR MSRs
The IA32_MCi_ADDR MSR contains the address of the code or data memory
location that produced the machine-check error if the ADDRV flag in the
IA32_MCi_STATUS register is set (see Figure 15-6, "IA32_MCi_ADDR
MSR"). The IA32_MCi_ADDR register is either not implemented or contains
no address if the ADDRV flag in the IA32_MCi_STATUS register is clear.
When not implemented in the processor, all reads and writes to this MSR will
cause a general-protection exception.
The address returned is an offset into a segment, a linear address, or a
physical address, depending on the error encountered. When these registers
are implemented, they can be cleared by explicitly writing 0s to them.
Writing 1s to these registers will cause a general-protection
exception. See Figure 15-6.
Table 15-2. Overwrite Rules for Enabled Errors
First Event       Second Event      UC bit   Color       MCA Info
00/green          00/green          0        00/green    second
00/green          yellow            0        yellow      second error
yellow            00/green          0        yellow      first error
yellow            yellow            0        yellow      either
00/green/yellow   UC                1        undefined   second
UC                00/green/yellow   1        undefined   first
15.3.2.4 IA32_MCi_MISC MSRs
The IA32_MCi_MISC MSR contains additional information describing the
machine-check error if the MISCV flag in the IA32_MCi_STATUS register is
set. The IA32_MCi_MISC MSR is either not implemented or does not contain
additional information if the MISCV flag in the IA32_MCi_STATUS register is
clear.
When not implemented in the processor, all reads and writes to this MSR will
cause a general-protection exception. When implemented in a processor,
these registers can be cleared by explicitly writing all 0s to them; writing 1s
to them causes a general-protection exception to be generated. This register
is not implemented in any of the error-reporting register banks for the P6
family processors.
If both MISCV and IA32_MCG_CAP[24] are set, the IA32_MCi_MISC MSR is
defined according to Figure 15-7 to support software recovery of
uncorrected errors (see Section 15.6):
Figure 15-6. IA32_MCi_ADDR MSR
Processor without support for Intel 64 architecture:
   Bits 63:36   Reserved
   Bits 35:0    Address
Processor with support for Intel 64 architecture:
   Bits 63:0    Address*
* Useful bits in this field depend on the address methodology in use when
  the register state is saved.
Recoverable Address LSB (bits 5:0): The lowest valid recoverable address bit.
Indicates the position of the least significant bit (LSB) of the recoverable error
address. For example, if the processor logs bits [43:9] of the address, the LSB
sub-field in IA32_MCi_MISC is 01001b (9 decimal). For this example, bits [8:0]
of the recoverable error address in IA32_MCi_ADDR should be ignored.
Address Mode (bits 8:6): Address mode for the address logged in
IA32_MCi_ADDR. The supported address modes are given in Table 15-3.
Model Specific Information (bits 63:9): Not architecturally defined.
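As a minimal sketch of how software might apply these two fields, the following C fragment extracts the Recoverable Address LSB and Address Mode from a raw IA32_MCi_MISC value and masks off the address bits that should be ignored. The struct and function names are hypothetical, and the code assumes the caller has already confirmed that MISCV, ADDRV, and IA32_MCG_CAP[24] are set.

```c
#include <stdint.h>

/* Architecturally defined part of IA32_MCi_MISC when UCR support applies. */
struct mc_misc_ucr {
    unsigned lsb;       /* IA32_MCi_MISC[5:0]: Recoverable Address LSB */
    unsigned addr_mode; /* IA32_MCi_MISC[8:6]: Address Mode (Table 15-3) */
};

/* Split a raw IA32_MCi_MISC value into its two architectural fields. */
static struct mc_misc_ucr mc_misc_decode(uint64_t misc)
{
    struct mc_misc_ucr f;
    f.lsb = (unsigned)(misc & 0x3F);
    f.addr_mode = (unsigned)((misc >> 6) & 0x7);
    return f;
}

/* Clear the IA32_MCi_ADDR bits below the recoverable LSB. */
static uint64_t mc_recoverable_addr(uint64_t addr, unsigned lsb)
{
    return addr & ~((UINT64_C(1) << (lsb & 0x3F)) - 1);
}
```

With the example above (LSB = 9), mc_recoverable_addr() clears bits [8:0] of the logged address.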
15.3.2.5 IA32_MCi_CTL2 MSRs
The IA32_MCi_CTL2 MSR provides the programming interface to use the
corrected MC error signaling capability that is indicated by
IA32_MCG_CAP[10] = 1. Software must check for the presence of
IA32_MCi_CTL2 on a per-bank basis.
Figure 15-7. UCR Support in IA32_MCi_MISC Register
   Bits 63:9   Model Specific Information
   Bits 8:6    Address Mode
   Bits 5:0    Recoverable Address LSB
Table 15-3. Address Mode in IA32_MCi_MISC[8:6]
IA32_MCi_MISC[8:6] Encoding   Definition
000                           Segment Offset
001                           Linear Address
010                           Physical Address
011                           Memory Address
100 to 110                    Reserved
111                           Generic
When IA32_MCG_CAP[10] = 1, the IA32_MCi_CTL2 MSR for each bank
exists, i.e. reads and writes to these MSRs are supported. However, the signaling
interface for corrected MC errors may not be supported in all banks.
The layout of IA32_MCi_CTL2 is shown in Figure 15-8:
Corrected error count threshold, bits 14:0 - Software must initialize this
field. The value is compared with the corrected error count field in
IA32_MCi_STATUS, bits 38 through 52. An overflow event is signaled to the CMCI
LVT entry (see Table 10-1) in the APIC when the count value equals the threshold
value. The new LVT entry in the APIC is at offset 02F0H from the APIC_BASE. If
the CMCI interface is not supported for a particular bank (but IA32_MCG_CAP[10]
= 1), this field will always read 0.
CMCI_EN (corrected error interrupt enable/disable/indicator), bit 30 -
Software sets this bit to enable the generation of the corrected machine-check
error interrupt (CMCI). If the CMCI interface is not supported for a particular
bank (but IA32_MCG_CAP[10] = 1), this bit is writeable but will always return 0
for that bank. This bit also indicates whether CMCI is supported in the
corresponding bank. See Section 15.5 for details of software detection of the
CMCI facility.
Some microarchitectural sub-systems that are the source of corrected MC
errors may be shared by more than one logical processor. Consequently,
the facilities for reporting MC errors and controlling mechanisms may be
shared by more than one logical processor. For example, the
IA32_MCi_CTL2 MSR is shared between logical processors sharing a
processor core. Software is responsible for programming the IA32_MCi_CTL2 MSR
in a manner consistent with CMCI delivery and usage.
After processor reset, IA32_MCi_CTL2 MSRs are zeroed.
Figure 15-8. IA32_MCi_CTL2 Register
   Bits 63:31   Reserved
   Bit 30       CMCI_EN - Enable/disable CMCI
   Bits 29:15   Reserved
   Bits 14:0    Corrected error count threshold
15.3.2.6 IA32_MCG Extended Machine Check State MSRs
The Pentium 4 and Intel Xeon processors implement a variable number of
extended machine-check state MSRs. The MCG_EXT_P flag in the
IA32_MCG_CAP MSR indicates the presence of these extended registers,
and the MCG_EXT_CNT field indicates the number of these registers actually
implemented. See Section 15.3.1.1, "IA32_MCG_CAP MSR." Also see Table
15-4.
In processors with support for Intel 64 architecture, 64-bit machine check
state MSRs are aliased to the legacy MSRs. In addition, there may be
registers beyond IA32_MCG_MISC. These may include up to five reserved MSRs
(IA32_MCG_RESERVED[1:5]) and save-state MSRs for registers introduced
in 64-bit mode. See Table 15-5.
Table 15-4. Extended Machine Check State MSRs
in Processors Without Support for Intel 64 Architecture
MSR               Address   Description
IA32_MCG_EAX      180H      Contains state of the EAX register at the time of the machine-check error.
IA32_MCG_EBX      181H      Contains state of the EBX register at the time of the machine-check error.
IA32_MCG_ECX      182H      Contains state of the ECX register at the time of the machine-check error.
IA32_MCG_EDX      183H      Contains state of the EDX register at the time of the machine-check error.
IA32_MCG_ESI      184H      Contains state of the ESI register at the time of the machine-check error.
IA32_MCG_EDI      185H      Contains state of the EDI register at the time of the machine-check error.
IA32_MCG_EBP      186H      Contains state of the EBP register at the time of the machine-check error.
IA32_MCG_ESP      187H      Contains state of the ESP register at the time of the machine-check error.
IA32_MCG_EFLAGS   188H      Contains state of the EFLAGS register at the time of the machine-check error.
IA32_MCG_EIP      189H      Contains state of the EIP register at the time of the machine-check error.
IA32_MCG_MISC     18AH      When set, indicates that a page assist or page fault occurred during DS normal operation.
Table 15-5. Extended Machine Check State MSRs
In Processors With Support For Intel 64 Architecture
MSR                      Address     Description
IA32_MCG_RAX             180H        Contains state of the RAX register at the time of the machine-check error.
IA32_MCG_RBX             181H        Contains state of the RBX register at the time of the machine-check error.
IA32_MCG_RCX             182H        Contains state of the RCX register at the time of the machine-check error.
IA32_MCG_RDX             183H        Contains state of the RDX register at the time of the machine-check error.
IA32_MCG_RSI             184H        Contains state of the RSI register at the time of the machine-check error.
IA32_MCG_RDI             185H        Contains state of the RDI register at the time of the machine-check error.
IA32_MCG_RBP             186H        Contains state of the RBP register at the time of the machine-check error.
IA32_MCG_RSP             187H        Contains state of the RSP register at the time of the machine-check error.
IA32_MCG_RFLAGS          188H        Contains state of the RFLAGS register at the time of the machine-check error.
IA32_MCG_RIP             189H        Contains state of the RIP register at the time of the machine-check error.
IA32_MCG_MISC            18AH        When set, indicates that a page assist or page fault occurred during DS normal operation.
IA32_MCG_RESERVED[1:5]   18BH-18FH   These registers, if present, are reserved.
IA32_MCG_R8              190H        Contains state of the R8 register at the time of the machine-check error.
IA32_MCG_R9              191H        Contains state of the R9 register at the time of the machine-check error.
IA32_MCG_R10             192H        Contains state of the R10 register at the time of the machine-check error.
IA32_MCG_R11             193H        Contains state of the R11 register at the time of the machine-check error.
IA32_MCG_R12             194H        Contains state of the R12 register at the time of the machine-check error.
IA32_MCG_R13             195H        Contains state of the R13 register at the time of the machine-check error.
When a machine-check error is detected on a Pentium 4 or Intel Xeon
processor, the processor saves the state of the general-purpose registers,
the R/EFLAGS register, and the R/EIP in these extended machine-check
state MSRs. This information can be used by a debugger to analyze the error.
These registers are read/write-to-zero registers. This means software can
read them, but if software writes to them, only all zeros is allowed. If
software attempts to write a non-zero value into one of these registers, a
general-protection (#GP) exception is generated. These registers are
cleared on a hardware reset (power-up or RESET), but maintain their
contents following a soft reset (INIT reset).
15.3.3 Mapping of the Pentium Processor Machine-Check Errors
to the Machine-Check Architecture
The Pentium processor reports machine-check errors using two registers:
P5_MC_TYPE and P5_MC_ADDR. The Pentium 4, Intel Xeon, and P6 family
processors map these registers to the IA32_MCi_STATUS and
IA32_MCi_ADDR in the error-reporting register bank. This bank reports on
the same type of external bus errors reported in P5_MC_TYPE and
P5_MC_ADDR.
The information in these registers can then be accessed in two ways:
By reading the IA32_MCi_STATUS and IA32_MCi_ADDR registers as part of a
general machine-check exception handler written for Pentium 4 and P6 family
processors.
By reading the P5_MC_TYPE and P5_MC_ADDR registers using the RDMSR
instruction.
The second capability permits a machine-check exception handler written to
run on a Pentium processor to be run on a Pentium 4, Intel Xeon, or P6
family processor. There is a limitation in that information returned by the
Pentium 4, Intel Xeon, and P6 family processors is encoded differently than
information returned by the Pentium processor. To run a Pentium processor
machine-check exception handler on a Pentium 4, Intel Xeon, or P6 family
processor, the handler must be written to interpret P5_MC_TYPE encodings
correctly.
Table 15-5. Extended Machine Check State MSRs
In Processors With Support For Intel 64 Architecture (Contd.)
MSR                      Address     Description
IA32_MCG_R14             196H        Contains state of the R14 register at the time of the machine-check error.
IA32_MCG_R15             197H        Contains state of the R15 register at the time of the machine-check error.
15.4 ENHANCED CACHE ERROR REPORTING
Starting with Intel Core Duo processors, cache error reporting was
enhanced. In earlier Intel processors, cache status was based on the
number of correction events that occurred in a cache. In the new paradigm,
called "threshold-based error status," cache status is based on the number
of lines (ECC blocks) in a cache that incur repeated corrections. The
threshold is chosen by Intel, based on various factors. If a processor
supports threshold-based error status, it sets IA32_MCG_CAP[11]
(MCG_TES_P) to 1; if not, to 0.
A processor that supports enhanced cache error reporting contains
hardware that tracks the operating status of certain caches and provides an
indicator of their "health." The hardware reports a "green" status when the
number of lines that incur repeated corrections is at or below a pre-defined
threshold, and a "yellow" status when the number of affected lines exceeds
the threshold. Yellow status means that the cache reporting the event is
operating correctly, but you should schedule the system for servicing within
a few weeks.
Intel recommends that you rely on this mechanism for structures supported
by threshold-based error reporting.
The CPU/system/platform response to a yellow event should be less severe
than its response to an uncorrected error. An uncorrected error means that
a serious error has actually occurred, whereas the yellow condition is a
warning that the number of affected lines has exceeded the threshold but is
not, in itself, a serious event: the error was corrected and system state was
not compromised.
The green/yellow status indicator is not a foolproof early warning for an
uncorrected error resulting from the failure of two bits in the same ECC
block. Such a failure can occur and cause an uncorrected error before the
yellow threshold is reached. However, the chance of an uncorrected error
increases as the number of affected lines increases.
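The status indicator can be read out of IA32_MCi_STATUS[54:53] once software has confirmed IA32_MCG_CAP[11]. The following is a minimal sketch only. It assumes the field encoding 01 = green and 10 = yellow (with 00 = no tracking and 11 reserved), which comes from the IA32_MCi_STATUS description elsewhere in this chapter rather than from this section, and all identifiers are hypothetical.

```c
#include <stdint.h>

#define MCG_TES_P (UINT64_C(1) << 11) /* IA32_MCG_CAP[11] */
#define MCI_VAL   (UINT64_C(1) << 63) /* IA32_MCi_STATUS[63] */
#define MCI_UC    (UINT64_C(1) << 61) /* IA32_MCi_STATUS[61] */

enum mc_cache_health {
    MC_HEALTH_UNTRACKED, /* 00, or the field is not meaningful */
    MC_HEALTH_GREEN,     /* assumed encoding 01 */
    MC_HEALTH_YELLOW,    /* assumed encoding 10 */
    MC_HEALTH_RESERVED   /* 11 */
};

/* Interpret IA32_MCi_STATUS[54:53] for a corrected, valid error log. */
static enum mc_cache_health
mc_decode_cache_health(uint64_t mcg_cap, uint64_t mci_status)
{
    /* The field is undefined without TES support, when VAL is clear,
       or for uncorrected errors (Table 15-2 lists it as undefined). */
    if (!(mcg_cap & MCG_TES_P) || !(mci_status & MCI_VAL) ||
        (mci_status & MCI_UC))
        return MC_HEALTH_UNTRACKED;

    switch ((mci_status >> 53) & 0x3) {
    case 1:  return MC_HEALTH_GREEN;
    case 2:  return MC_HEALTH_YELLOW;
    case 3:  return MC_HEALTH_RESERVED;
    default: return MC_HEALTH_UNTRACKED;
    }
}
```

A platform agent could use a MC_HEALTH_YELLOW result to schedule servicing, in line with the guidance above, rather than treating it as a failure.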
15.5 CORRECTED MACHINE CHECK ERROR INTERRUPT
Corrected machine-check error interrupt (CMCI) is an architectural
enhancement to the machine-check architecture. It provides capabilities
beyond those of threshold-based error reporting (Section 15.4). With
threshold-based error reporting, software is limited to periodic polling to
query the status of hardware corrected MC errors. CMCI provides a signaling
mechanism to deliver a local interrupt based on threshold values that
software can program using the IA32_MCi_CTL2 MSRs.
CMCI is disabled by default. System software is required to enable CMCI for
each IA32_MCi bank that supports the reporting of hardware corrected errors
if IA32_MCG_CAP[10] = 1.
System software uses the IA32_MCi_CTL2 MSR to enable/disable the CMCI
capability for each bank and to program threshold values into the
IA32_MCi_CTL2 MSR. CMCI is not affected by the CR4.MCE bit, and it is not
affected by the IA32_MCi_CTL MSRs.
To detect the existence of thresholding for a given bank, software writes only
bits 14:0 with the threshold value. If the bits persist, then thresholding is
available (and CMCI is available). If the bits are all 0's, then no thresholding
exists. To detect that CMCI signaling exists, software writes a 1 to bit 30 of
the MCi_CTL2 register. Upon a subsequent read, if bit 30 = 0, no CMCI is
available for this bank. If bit 30 = 1, then CMCI is available and enabled.
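The write-then-read-back probe described above can be sketched as follows. Because RDMSR and WRMSR are privileged, this example substitutes a simulated register (struct sim_ctl2, sim_rdmsr, sim_wrmsr) for the real IA32_MCi_CTL2 access. These helpers and the probe function are hypothetical and serve only to illustrate the detection logic.

```c
#include <stdint.h>
#include <stdbool.h>

#define CTL2_THRESHOLD_MASK UINT64_C(0x7FFF)    /* bits 14:0 */
#define CTL2_CMCI_EN        (UINT64_C(1) << 30) /* bit 30 */

/* Simulated IA32_MCi_CTL2 (assumption: real code uses RDMSR/WRMSR at CPL 0). */
struct sim_ctl2 {
    uint64_t value;
    uint64_t implemented; /* bits this bank actually stores */
};

static void sim_wrmsr(struct sim_ctl2 *m, uint64_t v)
{
    m->value = v & m->implemented; /* unimplemented bits read back as 0 */
}

static uint64_t sim_rdmsr(const struct sim_ctl2 *m)
{
    return m->value;
}

struct cmci_probe {
    bool threshold; /* bits 14:0 persisted */
    bool cmci;      /* bit 30 persisted */
};

/* Probe one bank, then restore the value found on entry. */
static struct cmci_probe cmci_probe_bank(struct sim_ctl2 *bank)
{
    struct cmci_probe p;
    uint64_t saved = sim_rdmsr(bank);

    sim_wrmsr(bank, 1); /* write only the threshold field */
    p.threshold = (sim_rdmsr(bank) & CTL2_THRESHOLD_MASK) != 0;

    sim_wrmsr(bank, sim_rdmsr(bank) | CTL2_CMCI_EN);
    p.cmci = (sim_rdmsr(bank) & CTL2_CMCI_EN) != 0;

    sim_wrmsr(bank, saved);
    return p;
}
```

Restoring the saved value is a choice made for this sketch. Because the register may be shared between logical processors, real code would also need to serialize the probe, as described in Section 15.5.2.1.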
15.5.1 CMCI Local APIC Interface
The interaction of CMCI is depicted in Figure 15-9.
Figure 15-9. CMCI Behavior
   MCi_CTL2:    bit 30 (software writes 1 to enable); error threshold in bits 14:0
   MCi_STATUS:  error count in bits 52:38
   When the error count equals the error threshold (count overflows the
   threshold), the event is signaled to the CMCI LVT in the local APIC at
   APIC_BASE + 2F0H.
CMCI interrupt delivery is configured by writing to the LVT CMCI register
entry in the local APIC register space at the default address of APIC_BASE +
2F0H. A CMCI interrupt can be delivered to more than one logical processor
if multiple logical processors are affected by the associated MC errors. For
example, if a corrected bit error in a cache shared by two logical processors
caused a CMCI, the interrupt will be delivered to both logical processors
sharing that microarchitectural sub-system. Similarly, package level errors
may cause CMCI to be delivered to all logical processors within the package.
However, system level errors will not be handled by CMCI.
The format of the LVT CMCI register is shown in Figure 15-10. The LVT entry
allows the 4 delivery modes, an 8-bit interrupt vector, and masking.
Vector, bits 7:0 - The interrupt vector number. The local APIC allows values
16-255 in this register. Values of 0 through 15 will result in an illegal vector
being logged in the APIC error status register.
Delivery mode, bits 10:8 - The following delivery modes are supported:
   000B: Fixed - Delivers an interrupt to the vector specified in bits 7:0.
   010B: SMI - Delivers an SMI interrupt to the processor core through the
   processor's local SMI signal path. When using this delivery mode, the vector
   field should be set to 00H for future compatibility.
   100B: NMI - Delivers an NMI interrupt to the logical processor affected by
   the error. The vector information is ignored.
   Entry 101B (INIT) and entry 111B (ExtINT) are not supported by the CMCI LVT.
   Other bit patterns are reserved.
Figure 15-10. Local APIC CMCI LVT Register (APIC_BASE + 2F0H)
   Bits 31:17   Reserved
   Bit 16       Mask
   Bits 15:13   Reserved
   Bit 12       Delivery Status
   Bit 11       Reserved
   Bits 10:8    Delivery Mode
   Bits 7:0     Vector Number
Delivery status, bit 12 - A read-only bit that, when set, indicates that an
interrupt from this source has been delivered to the processor core, but has not
yet been accepted.
Mask, bit 16 - When set, inhibits reception of the interrupt. (Unlike the
PerfMon LVT entry, this bit is not set when an interrupt is received.) When
clear, CMCI is not masked. The mask bit is set by default.
Bits 31:17, 15:13 and 11 are reserved.
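The field layout above lends itself to a small encoder for the value software writes to the LVT CMCI register. The following is a sketch under the constraints stated in this section (vectors 0 through 15 are illegal for fixed delivery, the vector is 00H for SMI, and only three delivery modes are accepted). The function and enum names are hypothetical, and the actual write to APIC_BASE + 2F0H is left out.

```c
#include <stdint.h>
#include <stdbool.h>

/* Delivery modes accepted by the CMCI LVT entry (bits 10:8). */
enum cmci_delivery {
    CMCI_DM_FIXED = 0x0, /* 000B */
    CMCI_DM_SMI   = 0x2, /* 010B */
    CMCI_DM_NMI   = 0x4  /* 100B */
};

#define LVT_MASK_BIT (UINT32_C(1) << 16)

/* Build an LVT CMCI value; returns false for combinations the text rules out.
   Reserved bits and the read-only delivery status bit are left as 0. */
static bool cmci_lvt_encode(enum cmci_delivery mode, uint8_t vector,
                            bool masked, uint32_t *out)
{
    if (mode != CMCI_DM_FIXED && mode != CMCI_DM_SMI && mode != CMCI_DM_NMI)
        return false; /* INIT, ExtINT and reserved patterns */
    if (mode == CMCI_DM_FIXED && vector < 16)
        return false; /* vectors 0-15 are illegal */
    if (mode == CMCI_DM_SMI)
        vector = 0;   /* 00H for future compatibility */

    *out = (uint32_t)vector | ((uint32_t)mode << 8) |
           (masked ? LVT_MASK_BIT : 0);
    return true;
}
```

For instance, a fixed-delivery, unmasked entry on vector 40H encodes to 00000040H.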
15.5.2 System Software Recommendation for Managing CMCI and Machine Check Resources
System software must enable and manage CMCI, set up interrupt handlers
to service CMCI interrupts delivered to affected logical processors, program
the CMCI LVT entry, and query machine check banks that are shared by more
than one logical processor.
This section describes techniques system software can implement to
manage CMCI initialization and to service CMCI interrupts in an efficient
manner that minimizes contention for shared MSR resources.
15.5.2.1 CMCI Initialization
Although a CMCI interrupt may be delivered to more than one logical
processor depending on the nature of the corrected MC error, only one instance of
the interrupt service routine needs to perform the necessary service and
make queries to the machine-check banks. The following steps describe a
technique that limits the amount of work the system has to do in response to
a CMCI.
To provide maximum flexibility, system software should define a per-thread data
structure for each logical processor to allow equal-opportunity and efficient
response to interrupt delivery. Specifically, the per-thread data structure should
include a set of per-bank fields to track which machine check bank it needs to
access in response to a delivered CMCI interrupt. The number of banks that
need to be tracked is determined by IA32_MCG_CAP[7:0].
Initialization of per-thread data structure. The initialization of the per-thread
data structure must be done serially on each logical processor in the system. The
sequencing order to start the per-thread initialization between different logical
processors is arbitrary, but it must observe the following specific details to
satisfy the shared nature of specific MSR resources:
a. Each thread initializes its data structure to indicate that it does not own any
   MC bank registers.
b. Each thread examines the IA32_MCi_CTL2[30] indicator for each bank to
   determine if another thread has already claimed ownership of that bank.
      If IA32_MCi_CTL2[30] has been set by another thread, this thread
      cannot own bank i and should proceed to step b. and examine the next
      machine check bank until all of the machine check banks are exhausted.
      If IA32_MCi_CTL2[30] = 0, proceed to step c.
c. Check whether writing a 1 into IA32_MCi_CTL2[30] returns 1 on a
   subsequent read, to determine whether this bank supports CMCI.
      If IA32_MCi_CTL2[30] = 0, this bank does not support CMCI. This thread
      cannot own bank i and should proceed to step b. and examine the next
      machine check bank until all of the machine check banks are exhausted.
      If IA32_MCi_CTL2[30] = 1, modify the per-thread data structure to
      indicate this thread claims ownership of the MC bank; proceed to initialize
      the error threshold count (bits 14:0) of that bank as described in
      Section 15.5.2.2, "CMCI Threshold Management." Then proceed to step b.
      and examine the next machine check bank until all of the machine check
      banks are exhausted.
After the thread has examined all of the machine check banks, it checks whether
it owns any MC banks to service CMCI. If any bank has been claimed by this thread:
   Ensure that the CMCI interrupt handler has been set up as described in
   Section 15.5.2.3, "CMCI Interrupt Handler."
   Initialize the CMCI LVT entry, as described in Section 15.5.1, "CMCI Local
   APIC Interface."
   Log and clear all of the IA32_MCi_STATUS registers for the banks that this
   thread owns. This will allow new errors to be logged.
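Steps a. through c. can be sketched as a loop over the banks. The example below is an illustration under stated assumptions, not a definitive implementation: the shared IA32_MCi_CTL2 registers are simulated by struct sim_banks (real code would issue RDMSR/WRMSR and run serially on each logical processor), MAX_BANKS is an arbitrary bound, and every identifier is hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define MAX_BANKS           32 /* illustrative bound, not architectural */
#define CTL2_CMCI_EN        (UINT64_C(1) << 30)
#define CTL2_THRESHOLD_MASK UINT64_C(0x7FFF)

/* Simulated IA32_MCi_CTL2 registers shared by sibling logical processors. */
struct sim_banks {
    unsigned count;               /* IA32_MCG_CAP[7:0] */
    uint64_t ctl2[MAX_BANKS];
    bool cmci_capable[MAX_BANKS]; /* whether bit 30 sticks when written */
};

/* Per-thread data structure with one ownership flag per bank. */
struct thread_mc_state {
    bool owns[MAX_BANKS];
    unsigned owned;
};

static void cmci_claim_banks(struct sim_banks *hw, struct thread_mc_state *t,
                             uint16_t threshold)
{
    /* Step a: start out owning nothing. */
    for (unsigned i = 0; i < MAX_BANKS; i++)
        t->owns[i] = false;
    t->owned = 0;

    for (unsigned i = 0; i < hw->count && i < MAX_BANKS; i++) {
        /* Step b: bit 30 already set means another thread owns the bank. */
        if (hw->ctl2[i] & CTL2_CMCI_EN)
            continue;

        /* Step c: write 1 to bit 30, then read it back. */
        if (hw->cmci_capable[i])
            hw->ctl2[i] |= CTL2_CMCI_EN;
        if (!(hw->ctl2[i] & CTL2_CMCI_EN))
            continue; /* bank does not support CMCI */

        t->owns[i] = true;
        t->owned++;
        hw->ctl2[i] = (hw->ctl2[i] & ~CTL2_THRESHOLD_MASK) |
                      (threshold & CTL2_THRESHOLD_MASK);
    }
}
```

If two threads that share the same banks run this one after the other, the first claims every CMCI-capable bank and the second claims none, which is the intended outcome of the ownership scheme.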
15.5.2.2 CMCI Threshold Management
The Corrected MC error threshold field, IA32_MCi_CTL2[14:0], is
architecturally defined. Specifically, all these bits are writable by software, but
different processor implementations may choose to implement fewer than 15
bits as the threshold for the overflow comparison with
IA32_MCi_STATUS[52:38]. The following describes techniques that
software can use to manage the CMCI threshold so that it remains compatible
with changes in implementation characteristics:
Software can set the initial threshold value to 1 by writing 1 to
IA32_MCi_CTL2[14:0]. This will cause an overflow condition on every corrected MC
error and generate a CMCI interrupt.
To increase the threshold and reduce the frequency of CMCI servicing:
a. Find the maximum threshold value a given processor implementation
   supports. The steps are:
      Write 7FFFH to IA32_MCi_CTL2[14:0].
      Read back IA32_MCi_CTL2; the lower 15 bits (14:0) are the
      maximum threshold supported by the processor.
b. Increase the threshold to a value below the maximum value discovered using
   step a.
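The discovery procedure in steps a. and b. might look like the following sketch. As before, the MSR is simulated (struct sim_ctl2 and its accessors are hypothetical stand-ins for RDMSR/WRMSR), and the clamping policy in cmci_raise_threshold() is one possible reading of "a value below the maximum", not a rule stated by the manual.

```c
#include <stdint.h>

#define CTL2_THRESHOLD_MASK UINT64_C(0x7FFF) /* bits 14:0 */

/* Simulated IA32_MCi_CTL2 (assumption: real code uses RDMSR/WRMSR). */
struct sim_ctl2 {
    uint64_t value;
    uint64_t implemented; /* bits this implementation stores */
};

static void sim_wrmsr(struct sim_ctl2 *m, uint64_t v)
{
    m->value = v & m->implemented;
}

static uint64_t sim_rdmsr(const struct sim_ctl2 *m)
{
    return m->value;
}

/* Step a: write 7FFFH to the threshold field and read back the maximum. */
static uint16_t cmci_max_threshold(struct sim_ctl2 *bank)
{
    uint64_t saved = sim_rdmsr(bank);
    uint16_t max;

    sim_wrmsr(bank, saved | CTL2_THRESHOLD_MASK);
    max = (uint16_t)(sim_rdmsr(bank) & CTL2_THRESHOLD_MASK);
    sim_wrmsr(bank, saved);
    return max;
}

/* Step b: program a threshold that stays below the discovered maximum.
   Returns the value written, or 0 if the bank has no thresholding. */
static uint16_t cmci_raise_threshold(struct sim_ctl2 *bank, uint16_t wanted)
{
    uint16_t max = cmci_max_threshold(bank);
    uint16_t t = wanted;

    if (max == 0)
        return 0;
    if (t >= max)
        t = (max > 1) ? (uint16_t)(max - 1) : 1;
    if (t == 0)
        t = 1; /* a threshold of 1 signals on every corrected error */

    sim_wrmsr(bank, (sim_rdmsr(bank) & ~CTL2_THRESHOLD_MASK) | t);
    return t;
}
```

On a simulated bank that implements only eight threshold bits, a request for 1000 is reduced to 254.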
15.5.2.3 CMCI Interrupt Handler
The following describes techniques system software may consider to
implement a CMCI service routine:
The service routine examines its private per-thread data structure to check which
set of MC banks it owns. If the thread does not have ownership of a
given MC bank, proceed to the next MC bank. Ownership is determined at
initialization time, which is described in Section 15.5.2.1, "CMCI Initialization."
If the thread had claimed ownership of an MC bank:
   Check for valid MC errors by testing IA32_MCi_STATUS.VALID[63].
   Log MC errors.
   Clear the MSRs of this MC bank.
If there is no valid error, proceed to the next MC bank.
When all MC banks have been processed, exit the service routine and return to
original program execution.
This technique will allow each logical processor to handle corrected MC
errors independently and requires no synchronization to access shared MSR
resources.
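A service routine along these lines could be structured as in the sketch below. The bank registers are simulated in memory, the logging step is reduced to a counter, and all names are hypothetical. A real handler would read and clear IA32_MCi_STATUS (and the related ADDR/MISC registers) with RDMSR/WRMSR and hand the data to the operating system's error log.

```c
#include <stdint.h>
#include <stdbool.h>

#define MAX_BANKS      32 /* illustrative bound, not architectural */
#define MCI_STATUS_VAL (UINT64_C(1) << 63)

/* Simulated IA32_MCi_STATUS registers. */
struct sim_mc_banks {
    unsigned count; /* IA32_MCG_CAP[7:0] */
    uint64_t status[MAX_BANKS];
};

/* Per-thread ownership flags set up during CMCI initialization. */
struct thread_mc_state {
    bool owns[MAX_BANKS];
};

/* Service only the banks this thread owns; returns the number of errors
   logged. Logging itself is a placeholder here. */
static unsigned cmci_service(struct sim_mc_banks *hw,
                             const struct thread_mc_state *t)
{
    unsigned logged = 0;

    for (unsigned i = 0; i < hw->count && i < MAX_BANKS; i++) {
        if (!t->owns[i])
            continue; /* another thread handles this bank */
        if (!(hw->status[i] & MCI_STATUS_VAL))
            continue; /* no valid error in this bank */

        logged++;          /* placeholder for "log MC errors" */
        hw->status[i] = 0; /* clear the bank so new errors can be logged */
    }
    return logged;
}
```

Because each thread touches only the banks it owns, two threads running this routine at the same time do not access the same bank.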
15.6 RECOVERY OF UNCORRECTED RECOVERABLE (UCR) ERRORS
Recovery of uncorrected recoverable machine check errors is an
enhancement in machine-check architecture. The first processor that supports this
feature is the 45nm Intel 64 processor with a CPUID signature
DisplayFamily_DisplayModel encoding of 06H_2EH. This allows system
software to perform recovery action on a certain class of uncorrected errors and
continue execution.
15.6.1 Detection of Software Error Recovery Support
Software must use bit 24 of IA32_MCG_CAP (MCG_SER_P) to detect the
presence of software error recovery support (see Figure 15-2). When
IA32_MCG_CAP[24] is set, this indicates that the processor supports
software error recovery. When this bit is clear, this indicates that there is no
support for error recovery from the processor and the primary responsibility
of the machine check handler is logging the machine check error information
and shutting down the system.
The new class of architectural MCA errors from which system software can
attempt recovery is called Uncorrected Recoverable (UCR) Errors. UCR
errors are uncorrected errors that have been detected and signaled but have
not corrupted the processor context. For certain UCR errors, this means that
once system software has performed a certain recovery action, it is possible
to continue execution on this processor. UCR error reporting provides an
error containment mechanism for data poisoning. The machine check
handler will use the error log information from the error reporting registers
to analyze and implement specific error recovery actions for UCR errors.
15.6.2 UCR Error Reporting and Logging
The IA32_MCi_STATUS MSR is used for reporting UCR errors and existing
corrected or uncorrected errors. The definitions of IA32_MCi_STATUS,
including bit fields to identify UCR errors, are shown in Figure 15-5. UCR
errors can be signaled through either the corrected machine check interrupt
(CMCI) or machine check exception (MCE) path depending on the type of the
UCR error.
When IA32_MCG_CAP[24] is set, a UCR error is indicated by the following
bit settings in the IA32_MCi_STATUS register:
   Valid (bit 63) = 1
   UC (bit 61) = 1
   PCC (bit 57) = 0
Additional information from the IA32_MCi_MISC and the IA32_MCi_ADDR
registers for the UCR error is available when the ADDRV and the MISCV
flags in the IA32_MCi_STATUS register are set (see Section 15.3.2.4). The
MCA error code field of the IA32_MCi_STATUS register indicates the type of
UCR error. System software can interpret the MCA error code field to analyze
and identify the necessary recovery action for the given UCR error.
In addition, the IA32_MCi_STATUS register bit fields, bits 56:55, are defined
(see Figure 15-5) to provide additional information to help system software
properly identify the necessary recovery action for the UCR error:
S (Signaling) flag, bit 56 - Indicates (when set) that a machine check exception
was generated for the UCR error reported in this MC bank and system software
needs to check the AR flag and the MCA error code fields in the
IA32_MCi_STATUS register to identify the necessary recovery action for this
error. When the S flag in the IA32_MCi_STATUS register is clear, this UCR error
was not signaled via a machine check exception and instead was reported as a
corrected machine check (CMC). System software is not required to take any
recovery action when the S flag in the IA32_MCi_STATUS register is clear.
AR (Action Required) flag, bit 55 - Indicates (when set) that MCA error code
specific recovery action must be performed by system software at the time this
error was signaled. This recovery action must be completed successfully before
any additional work is scheduled for this processor. When the RIPV flag in the
IA32_MCG_STATUS is clear, an alternative execution stream needs to be
provided; when the MCA error code specific recovery action
cannot be successfully completed, system software must shut down the system.
When the AR flag in the IA32_MCi_STATUS register is clear, system software may
still take MCA error code specific recovery action but this is optional; system
software can safely resume program execution at the instruction pointer saved
on the stack from the machine check exception when the RIPV flag in the
IA32_MCG_STATUS register is set.
Both the S and the AR flags in the IA32_MCi_STATUS register are defined to
be sticky bits, which means that once set, the processor does not clear them.
Only software and a good power-on reset can clear the S and the AR flags.
Both the S and the AR flags are only set when the processor reports the UCR
errors (MCG_CAP[24] is set).
15.6.3 UCR Error Classification
With the S and AR flag encoding in the IA32_MCi_STATUS register, UCR
errors can be classified as:
Uncorrected no action required (UCNA) - a UCR error that is not signaled via a
machine check exception and, instead, is reported to system software as a
corrected machine check error. UCNA errors indicate that some data in the
system is corrupted, but the data has not been consumed and the processor
state is valid, so you may continue execution on this processor. UCNA errors
require no action from system software to continue execution. A UCNA error is
indicated with UC=1, PCC=0, S=0 and AR=0 in the IA32_MCi_STATUS register.
Software recoverable action optional (SRAO) - a UCR error that is signaled via a
machine check exception and for which a system software recovery action is
optional and not required to continue execution from this machine check
exception. SRAO errors indicate that some data in the system is corrupt, but the
data has not been consumed and the processor state is valid. SRAO errors provide
the additional error information for system software to perform a recovery
action. An SRAO error is indicated with UC=1, PCC=0, S=1, EN=1 and AR=0 in the
IA32_MCi_STATUS register. Recovery actions for SRAO errors are MCA error code
specific. The MISCV and the ADDRV flags in the IA32_MCi_STATUS register are
set when the additional error information is available from the IA32_MCi_MISC
and the IA32_MCi_ADDR registers. System software needs to inspect the MCA
error code fields in the IA32_MCi_STATUS register to identify the specific
recovery action for a given SRAO error. If MISCV and ADDRV are not set, it is
recommended that no system software error recovery be performed; however,
you can resume execution.
soft ware recoverable act ion required ( SRAR) - a UCR error t hat requires syst em
soft ware t o t ake a recovery act ion on t his processor before scheduling anot her
st ream of execut ion on t his processor. SRAR errors indicat e t hat t he error was
det ect ed and raised at t he point of t he consumpt ion in t he execut ion flow. An
SRAR error is indicat ed wit h UC= 1, PCC= 0, S= 1, EN= 1 and AR= 1 in t he
I A32_MCi_STATUS regist er. Recovery act ions are MCA error code specific. The
MI SCV and t he ADDRV flags in t he I A32_MCi_STATUS regist er are set when t he
addit ional error informat ion is available from t he I A32_MCi_MI SC and t he
I A32_MCi_ADDR regist ers. Syst em soft ware needs t o inspect t he MCA error code
fields in t he I A32_MCi_STATUS regist er t o ident ify t he specific recovery act ion for
a given SRAR error. I f MI SCV and ADDRV are not set , it is recommended t hat
syst em soft ware shut down t he syst em.
Table 15-6 summarizes UCR, corrected, and uncorrected errors.

Table 15-6. MC Error Classifications

Type of Error (1)       UC  PCC  S  AR  Signaling  Software Action                  Example
Uncorrected Error (UC)  1   1    x  x   MCE        Reset the system
SRAR                    1   0    1  1   MCE        For known MCACOD, take specific  Cache to processor
                                                   recovery action; for unknown     load error
                                                   MCACOD, must bugcheck
SRAO                    1   0    1  0   MCE        For known MCACOD, take specific  Patrol scrub and
                                                   recovery action; for unknown     explicit writeback
                                                   MCACOD, OK to keep the system    poison errors
                                                   running
UCNA                    1   0    0  0   CMC        Log the error; OK to keep the    Poison detection
                                                   system running                   error
Corrected Error (CE)    0   0    x  x   CMC        Log the error; no corrective     ECC in caches and
                                                   action required                  memory

NOTES:
1. VAL=1, EN=1 for UC=1 errors; OVER=0 for UC=1 and PCC=0 errors. SRAR, SRAO and UCNA errors
   are supported by the processor only when IA32_MCG_CAP[24] (MCG_SER_P) is set.
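
The classification in Table 15-6 can be expressed as a small decision function. The following Python sketch is illustrative only: the bit positions used for VAL, UC, PCC, S and AR are the architectural positions in IA32_MCi_STATUS and are assumed here because this section does not restate them, and the names flag and classify_error are hypothetical. The result is meaningful only when IA32_MCG_CAP[24] is set.

```python
# Bit positions within IA32_MCi_STATUS (assumed architectural layout).
VAL, UC, PCC, S, AR = 63, 61, 57, 56, 55


def flag(status: int, pos: int) -> int:
    """Return the single bit at position pos of a 64-bit register value."""
    return (status >> pos) & 1


def classify_error(status: int) -> str:
    """Map a raw IA32_MCi_STATUS value to a Table 15-6 error class."""
    if not flag(status, VAL):
        return "NONE"        # nothing logged in this bank
    if not flag(status, UC):
        return "CE"          # corrected error
    if flag(status, PCC):
        return "UC"          # processor context corrupt
    s, ar = flag(status, S), flag(status, AR)
    if s and ar:
        return "SRAR"
    if s:
        return "SRAO"
    if not ar:
        return "UCNA"
    return "UNDEFINED"       # S=0, AR=1 is not listed in Table 15-6
```

A caller would typically pass each valid bank's status value through classify_error and act on the most severe class returned.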
15.6.4 UCR Error Overwrite Rules
In general, the overwrite rules are as follows:
• UCR errors overwrite corrected errors.
• Uncorrected (PCC=1) errors overwrite UCR (PCC=0) errors.
• UCR errors are not written over previous UCR errors.
• Corrected errors do not write over previous UCR errors.
Regardless of whether the first error is retained or the second error is
written over the first error, the OVER flag in the IA32_MCi_STATUS register
is set to indicate an overflow condition. Because the S flag and AR flag in the
IA32_MCi_STATUS register are defined to be sticky flags, a second event
cannot clear these two flags once set; however, the MC bank information may
be filled in for the second error. The table below shows the overwrite rules and
how to treat a second error if the first event is already logged in an MC bank,
along with the resulting bit setting of the UC, PCC, S, and AR flags in the
IA32_MCi_STATUS register. Because UCNA and SRAO errors do not require
recovery action from system software to continue program execution, a
system reset by system software is not required unless the AR flag or PCC
flag is set for the UCR overflow case (OVER=1, VAL=1, UC=1, PCC=0).
Table 15-7 lists overwrite rules for uncorrected errors, corrected errors, and
uncorrected recoverable errors.
Table 15-7. Overwrite Rules for UC, CE, and UCR Errors

First Event  Second Event  UC  PCC  S                  AR                 MCA Bank  Reset System
CE           UCR           1   0    0 if UCNA, else 1  1 if SRAR, else 0  second    yes, if AR=1
UCR          CE            1   0    0 if UCNA, else 1  1 if SRAR, else 0  first     yes, if AR=1
UCNA         UCNA          1   0    0                  0                  first     no
UCNA         SRAO          1   0    1                  0                  first     no
UCNA         SRAR          1   0    1                  1                  first     yes
SRAO         UCNA          1   0    1                  0                  first     no
SRAO         SRAO          1   0    1                  0                  first     no
SRAO         SRAR          1   0    1                  1                  first     yes
SRAR         UCNA          1   0    1                  1                  first     yes
SRAR         SRAO          1   0    1                  1                  first     yes
SRAR         SRAR          1   0    1                  1                  first     yes
UCR          UC            1   1    undefined          undefined          second    yes
UC           UCR           1   1    undefined          undefined          first     yes
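
The rows of Table 15-7 follow from two ideas: the S and AR flags are sticky, and only a more severe class replaces the bank contents. The Python sketch below encodes the table under those assumptions; the function name overwrite and the string labels are hypothetical, and pairs that the table does not list raise an error rather than guess.

```python
# (S, AR) flag values for each UCR class, taken from Table 15-6.
UCR = {"UCNA": (0, 0), "SRAO": (1, 0), "SRAR": (1, 1)}


def overwrite(first: str, second: str) -> dict:
    """Resulting flags, retained bank, and reset decision per Table 15-7.

    Event names are "CE", "UC", "UCNA", "SRAO" and "SRAR". S and AR are
    None where the table marks them undefined.
    """
    if first in UCR and second in UCR:
        # Sticky S/AR: a second UCR event can set but never clear them.
        s = UCR[first][0] | UCR[second][0]
        ar = UCR[first][1] | UCR[second][1]
        return dict(UC=1, PCC=0, S=s, AR=ar, bank="first", reset=bool(ar))
    if first == "CE" and second in UCR:
        s, ar = UCR[second]
        return dict(UC=1, PCC=0, S=s, AR=ar, bank="second", reset=bool(ar))
    if first in UCR and second == "CE":
        s, ar = UCR[first]
        return dict(UC=1, PCC=0, S=s, AR=ar, bank="first", reset=bool(ar))
    if first in UCR and second == "UC":
        return dict(UC=1, PCC=1, S=None, AR=None, bank="second", reset=True)
    if first == "UC" and second in UCR:
        return dict(UC=1, PCC=1, S=None, AR=None, bank="first", reset=True)
    raise ValueError(f"pair not covered by Table 15-7: {first}, {second}")
```

For example, overwrite("UCNA", "SRAR") keeps the first bank's information but reports S=1 and AR=1, so a reset is required.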
15.7 MACHINE-CHECK AVAILABILITY
The machine-check architecture and machine-check exception (#MC) are
model-specific features. Software can execute the CPUID instruction to
determine whether a processor implements these features. Following the
execution of the CPUID instruction, the settings of the MCA flag (bit 14) and
MCE flag (bit 7) in EDX indicate whether the processor implements the
machine-check architecture and machine-check exception.
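
As a minimal sketch of this check, the Python function below tests the two feature bits in an EDX value that the caller has already obtained from CPUID leaf 1. Python cannot execute CPUID directly, so the way EDX is read is left open; the function name machine_check_support is hypothetical.

```python
MCE_BIT = 7    # CPUID leaf 1, EDX bit 7: machine-check exception
MCA_BIT = 14   # CPUID leaf 1, EDX bit 14: machine-check architecture


def machine_check_support(edx: int) -> tuple:
    """Return (mce_supported, mca_supported) from a CPUID leaf 1 EDX value."""
    mce = bool((edx >> MCE_BIT) & 1)
    mca = bool((edx >> MCA_BIT) & 1)
    return mce, mca
```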
15.8 MACHINE-CHECK INITIALIZATION
To use the processor's machine-check architecture, software must initialize
the processor to activate the machine-check exception and the
error-reporting mechanism.
Example 15-1 gives pseudocode for performing this initialization. This
pseudocode checks for the existence of the machine-check architecture and
exception; it then enables the machine-check exception and the error-reporting
register banks. The pseudocode shown is compatible with the Pentium 4,
Intel Xeon, P6 family, and Pentium processors.
Following power up or power cycling, IA32_MCi_STATUS registers are not
guaranteed to have valid data until after they are initially cleared to zero by
software (as shown in the initialization pseudocode in Example 15-1). In
addition, when using P6 family processors, software must set MCi_STATUS
registers to zero when doing a soft-reset.
Example 15-1. Machine-Check Initialization Pseudocode
Check CPUID Feature Flags for MCE and MCA support
IF CPU supports MCE
THEN
    IF CPU supports MCA
    THEN
        IF (IA32_MCG_CAP.MCG_CTL_P = 1)
        (* IA32_MCG_CTL register is present *)
        THEN
            IA32_MCG_CTL ← 0FFFFFFFFFFFFFFFFH;
            (* enables all MCA features *)
        FI
        (* Determine number of error-reporting banks supported *)
        COUNT ← IA32_MCG_CAP.Count;
        MAX_BANK_NUMBER ← COUNT - 1;
        IF (Processor Family is 6H and Processor EXTMODEL:MODEL is less than 1AH)
        THEN
            (* Enable logging of all errors except for MC0_CTL register *)
            FOR error-reporting banks (1 through MAX_BANK_NUMBER)
            DO
                IA32_MCi_CTL ← 0FFFFFFFFFFFFFFFFH;
            OD
        ELSE
            (* Enable logging of all errors including MC0_CTL register *)
            FOR error-reporting banks (0 through MAX_BANK_NUMBER)
            DO
                IA32_MCi_CTL ← 0FFFFFFFFFFFFFFFFH;
            OD
        FI
        (* BIOS clears all errors only on power-on reset *)
        IF (BIOS detects Power-on reset)
        THEN
            FOR error-reporting banks (0 through MAX_BANK_NUMBER)
            DO
                IA32_MCi_STATUS ← 0;
            OD
        ELSE
            FOR error-reporting banks (0 through MAX_BANK_NUMBER)
            DO
                (Optional for BIOS and OS) Log valid errors
                (OS only) IA32_MCi_STATUS ← 0;
            OD
        FI
    FI
    Setup the Machine Check Exception (#MC) handler for vector 18 in IDT
    Set the MCE bit (bit 6) in CR4 register to enable Machine-Check Exceptions
FI
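
The MSR portion of Example 15-1 can be sketched in Python against caller-supplied register accessors. This is a hedged illustration, not a driver: rdmsr and wrmsr are hypothetical callbacks, the MSR addresses (179H, 17BH, 400H + 4*i) and the IA32_MCG_CAP field positions are assumed from the architectural MSR layout, and installing the #MC handler and setting CR4.MCE are left to the caller.

```python
IA32_MCG_CAP = 0x179      # assumed architectural MSR addresses
IA32_MCG_CTL = 0x17B
IA32_MC0_CTL = 0x400      # bank i: CTL at 0x400 + 4*i, STATUS at 0x401 + 4*i
ALL_ONES = 0xFFFF_FFFF_FFFF_FFFF


def init_mca(rdmsr, wrmsr, family: int, ext_model: int, clear_status: bool) -> int:
    """Enable MCA error reporting as in Example 15-1; return the bank count.

    rdmsr(addr) and wrmsr(addr, value) are hypothetical accessors supplied
    by the caller.
    """
    cap = rdmsr(IA32_MCG_CAP)
    if cap & (1 << 8):                      # MCG_CTL_P: IA32_MCG_CTL present
        wrmsr(IA32_MCG_CTL, ALL_ONES)
    count = cap & 0xFF                      # IA32_MCG_CAP.Count
    # Family 6 models before 1AH: leave MC0_CTL untouched.
    first_bank = 1 if (family == 6 and ext_model < 0x1A) else 0
    for i in range(first_bank, count):
        wrmsr(IA32_MC0_CTL + 4 * i, ALL_ONES)
    if clear_status:                        # power-on reset, or OS after logging
        for i in range(count):
            wrmsr(IA32_MC0_CTL + 4 * i + 1, 0)
    return count
```

Passing the accessors in as parameters keeps the sketch testable with a plain dictionary standing in for the MSR space.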
15.9 INTERPRETING THE MCA ERROR CODES
When the processor detects a machine-check error condition, it writes a
16-bit error code to the MCA error code field of one of the IA32_MCi_STATUS
registers and sets the VAL (valid) flag in that register. The processor may
also write a 16-bit model-specific error code in the IA32_MCi_STATUS
register depending on the implementation of the machine-check
architecture of the processor.
The MCA error codes are architecturally defined for Intel 64 and IA-32
processors. To determine the cause of a machine-check exception, the
machine-check exception handler must read the VAL flag for each
IA32_MCi_STATUS register. If the flag is set, the machine-check exception
handler must then read the MCA error code field of the register. It is the
encoding of the MCA error code field [15:0] that determines the type of error
being reported and not the register bank reporting it.
There are two types of MCA error codes: simple error codes and compound
error codes.
15.9.1 Simple Error Codes
Table 15-8 shows the simple error codes. These unique codes indicate global
error information.
15.9.2 Compound Error Codes
Compound error codes describe errors related to the TLBs, memory, caches,
bus and interconnect logic, and internal timer. A set of sub-fields is common
to all compound errors. These sub-fields describe the type of access, level
in the cache hierarchy, and type of request. Table 15-9 shows the general
form of the compound error codes.
Table 15-8. IA32_MCi_Status [15:0] Simple Error Code Encoding

Error Code                  Binary Encoding      Meaning
No Error                    0000 0000 0000 0000  No error has been reported to this bank of
                                                 error-reporting registers.
Unclassified                0000 0000 0000 0001  This error has not been classified into the
                                                 MCA error classes.
Microcode ROM Parity Error  0000 0000 0000 0010  Parity error in internal microcode ROM
External Error              0000 0000 0000 0011  The BINIT# from another processor caused
                                                 this processor to enter machine check. (1)
FRC Error                   0000 0000 0000 0100  FRC (functional redundancy check)
                                                 master/slave error
Internal Parity Error       0000 0000 0000 0101  Internal parity error.
Internal Timer Error        0000 0100 0000 0000  Internal timer error.
Internal Unclassified       0000 01xx xxxx xxxx  Internal unclassified errors. (2)

NOTES:
1. BINIT# assertion will cause a machine check exception if the processor (or any processor on the
   same external bus) has BINIT# observation enabled during power-on configuration (hardware
   strapping) and if machine check exceptions are enabled (by setting CR4.MCE = 1).
2. At least one X must equal one. Internal unclassified errors have not been classified.
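
A lookup for the simple error codes of Table 15-8 might look like the following Python sketch. The names SIMPLE_CODES and decode_simple are hypothetical; the only non-trivial case is the Internal Unclassified pattern, which matches 0000 01xx xxxx xxxx with at least one x bit set.

```python
# Exact-match simple error codes from Table 15-8.
SIMPLE_CODES = {
    0x0000: "No Error",
    0x0001: "Unclassified",
    0x0002: "Microcode ROM Parity Error",
    0x0003: "External Error",
    0x0004: "FRC Error",
    0x0005: "Internal Parity Error",
    0x0400: "Internal Timer Error",
}


def decode_simple(mcacod: int):
    """Return the Table 15-8 name for a 16-bit MCA error code, or None."""
    mcacod &= 0xFFFF
    if mcacod in SIMPLE_CODES:
        return SIMPLE_CODES[mcacod]
    # 0000 01xx xxxx xxxx with at least one x equal to one.
    if (mcacod & 0xFC00) == 0x0400 and (mcacod & 0x03FF) != 0:
        return "Internal Unclassified"
    return None          # not a simple error code; may be a compound code
```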
The Interpretation column in Table 15-9 indicates the name of a compound
error. The name is constructed by substituting mnemonics for the sub-field
names given within curly braces. For example, the error code
ICACHEL1_RD_ERR is constructed from the form:
{TT}CACHE{LL}_{RRRR}_ERR,
where {TT} is replaced by I, {LL} is replaced by L1, and {RRRR} is replaced by RD.
For more information on the Form and Interpretation columns, see
Section 15.9.2.1, "Correction Report Filtering (F) Bit," through
Section 15.9.2.5, "Bus and Interconnect Errors."
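
The name construction for cache hierarchy errors can be illustrated with a short Python sketch. It assumes the form 000F 0001 RRRR TTLL from Table 15-9 and the mnemonics from Tables 15-10 through 15-12; the function name cache_error_name is hypothetical, and the F bit (bit 12) is masked off before matching.

```python
# Mnemonics from Tables 15-10, 15-11 and 15-12.
TT = {0b00: "I", 0b01: "D", 0b10: "G"}
LL = {0b00: "L0", 0b01: "L1", 0b10: "L2", 0b11: "LG"}
RRRR = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
        5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}


def cache_error_name(mcacod: int):
    """Build {TT}CACHE{LL}_{RRRR}_ERR from a 16-bit MCA error code.

    Returns None if the code is not a cache hierarchy error or uses an
    encoding that the tables do not define.
    """
    code = mcacod & 0xEFFF                  # ignore the F bit (bit 12)
    if (code & 0xFF00) != 0x0100:           # form 000F 0001 RRRR TTLL
        return None
    tt = TT.get((code >> 2) & 0b11)
    rrrr = RRRR.get((code >> 4) & 0xF)
    ll = LL[code & 0b11]
    if tt is None or rrrr is None:
        return None
    return f"{tt}CACHE{ll}_{rrrr}_ERR"
```

With this sketch, the code 0000 0001 0001 0001 (0111H) yields ICACHEL1_RD_ERR, matching the example in the text.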
15.9.2.1 Correction Report Filtering (F) Bit
Starting with Intel Core Duo processors, bit 12 in the Form column in Table
15-9 is used to indicate that a particular posting to a log may be the last
posting for corrections in that line/entry, at least for some time:
• 0 in bit 12 indicates normal filtering (original P6/Pentium 4/Xeon processor
  meaning).
• 1 in bit 12 indicates corrected filtering (filtering is activated for the line/entry in
  the posting). Filtering means that some or all of the subsequent corrections to
  this entry (in this structure) will not be posted. The enhanced error reporting
  introduced with the Intel Core Duo processors is based on tracking the lines
  affected by repeated corrections (see Section 15.4, "Enhanced Cache Error
  Reporting"). This capability is indicated by IA32_MCG_CAP[11]. Only the first few
  correction events for a line are posted; subsequent redundant correction events
  to the same line are not posted. Uncorrected events are always posted.
The behavior of error filtering after crossing the yellow threshold is
model-specific.
15.9.2.2 Transaction Type (TT) Sub-Field
The 2-bit TT sub-field (Table 15-10) indicates the type of transaction (data,
instruction, or generic). The sub-field applies to the TLB, cache, and
interconnect error conditions. Note that interconnect error conditions are
primarily associated with P6 family and Pentium processors, which utilize an
external APIC bus separate from the system bus. The generic type is
reported when the processor cannot determine the transaction type.

Table 15-9. IA32_MCi_Status [15:0] Compound Error Code Encoding

Type                         Form                 Interpretation
Generic Cache Hierarchy      000F 0000 0000 11LL  Generic cache hierarchy error
TLB Errors                   000F 0000 0001 TTLL  {TT}TLB{LL}_ERR
Memory Controller Errors     000F 0000 1MMM CCCC  {MMM}_CHANNEL{CCCC}_ERR
Cache Hierarchy Errors       000F 0001 RRRR TTLL  {TT}CACHE{LL}_{RRRR}_ERR
Bus and Interconnect Errors  000F 1PPT RRRR IILL  BUS{LL}_{PP}_{RRRR}_{II}_{T}_ERR
15.9.2.3 Level (LL) Sub-Field
The 2-bit LL sub-field (see Table 15-11) indicates the level in the memory
hierarchy where the error occurred (level 0, level 1, level 2, or generic). The
LL sub-field also applies to the TLB, cache, and interconnect error
conditions. The Pentium 4, Intel Xeon, and P6 family processors support two
levels in the cache hierarchy and one level in the TLBs. Again, the generic
type is reported when the processor cannot determine the hierarchy level.
15.9.2.4 Request (RRRR) Sub-Field
The 4-bit RRRR sub-field (see Table 15-12) indicates the type of action
associated with the error. Actions include read and write operations, prefetches,
cache evictions, and snoops. Generic error is returned when the type of error
cannot be determined. Generic read and generic write are returned when
the processor cannot determine the type of instruction or data request that
caused the error. Eviction and snoop requests apply only to the caches. All of
the other requests apply to TLBs, caches and interconnects.
Table 15-10. Encoding for TT (Transaction Type) Sub-Field

Transaction Type  Mnemonic  Binary Encoding
Instruction       I         00
Data              D         01
Generic           G         10

Table 15-11. Level Encoding for LL (Memory Hierarchy Level) Sub-Field

Hierarchy Level  Mnemonic  Binary Encoding
Level 0          L0        00
Level 1          L1        01
Level 2          L2        10
Generic          LG        11

Table 15-12. Encoding of Request (RRRR) Sub-Field

Request Type   Mnemonic  Binary Encoding
Generic Error  ERR       0000
Generic Read   RD        0001
15.9.2.5 Bus and Interconnect Errors
The bus and interconnect errors are defined with the 2-bit PP (participation),
1-bit T (time-out), and 2-bit II (memory or I/O) sub-fields, in addition to the
LL and RRRR sub-fields (see Table 15-13). The bus error conditions are
implementation dependent and related to the type of bus implemented by
the processor. Likewise, the interconnect error conditions are predicated on
a specific implementation-dependent interconnect model that describes the
connections between the different levels of the storage hierarchy. The type
of bus is implementation dependent, and as such is not specified in this
document. A bus or interconnect transaction consists of a request involving
an address and a response.

Table 15-12. Encoding of Request (RRRR) Sub-Field (Contd.)

Request Type       Mnemonic  Binary Encoding
Generic Write      WR        0010
Data Read          DRD       0011
Data Write         DWR       0100
Instruction Fetch  IRD       0101
Prefetch           PREFETCH  0110
Eviction           EVICT     0111
Snoop              SNOOP     1000

Table 15-13. Encodings of PP, T, and II Sub-Fields

Sub-Field           Transaction                             Mnemonic    Binary Encoding
PP (Participation)  Local processor* originated request     SRC         00
                    Local processor* responded to request   RES         01
                    Local processor* observed error as      OBS         10
                    third party
                    Generic                                             11
T (Time-out)        Request timed out                       TIMEOUT     1
                    Request did not time out                NOTIMEOUT   0
II (Memory or I/O)  Memory Access                           M           00
                    Reserved                                            01
                    I/O                                     IO          10
                    Other transaction                                   11

NOTE:
* Local processor differentiates the processor reporting the error from other system components
  (including the APIC, other processors, etc.).
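
The bus and interconnect form 000F 1PPT RRRR IILL can be unpacked field by field. The Python sketch below is one possible decoder; decode_bus_error is a hypothetical name, encodings that Table 15-13 lists without a mnemonic are returned as None, and RRRR is returned as its raw 4-bit value to be looked up in Table 15-12.

```python
# Mnemonics from Table 15-13; None marks encodings with no mnemonic listed.
PP = {0b00: "SRC", 0b01: "RES", 0b10: "OBS", 0b11: None}
II = {0b00: "M", 0b01: None, 0b10: "IO", 0b11: None}
LL = {0b00: "L0", 0b01: "L1", 0b10: "L2", 0b11: "LG"}


def decode_bus_error(mcacod: int):
    """Split a bus/interconnect MCA error code into its sub-fields.

    Returns None if the code does not match 000F 1PPT RRRR IILL.
    """
    code = mcacod & 0xEFFF                  # ignore the F bit (bit 12)
    if (code & 0xE800) != 0x0800:           # bits 15:13 clear, bit 11 set
        return None
    return {
        "PP": PP[(code >> 9) & 0b11],
        "T": "TIMEOUT" if (code >> 8) & 1 else "NOTIMEOUT",
        "RRRR": (code >> 4) & 0xF,          # raw value; see Table 15-12
        "II": II[(code >> 2) & 0b11],
        "LL": LL[code & 0b11],
    }
```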
15.9.2.6 Memory Controller Errors
The memory controller errors are defined with the 3-bit MMM (memory
transaction type) and 4-bit CCCC (channel) sub-fields. The encodings for
MMM and CCCC are defined in Table 15-14.
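
A decoder for the memory controller form 000F 0000 1MMM CCCC (Table 15-9) could be sketched in Python as follows, using the mnemonics of Table 15-14. The function name decode_memory_controller_error is hypothetical; reserved MMM encodings and the "channel not specified" value are reported as None.

```python
# MMM mnemonics from Table 15-14; encodings 101-111 are reserved.
MMM = {0b000: "GEN", 0b001: "RD", 0b010: "WR", 0b011: "AC", 0b100: "MS"}


def decode_memory_controller_error(mcacod: int):
    """Return (mmm_mnemonic, channel) for a memory controller error code.

    Returns None if the code does not match 000F 0000 1MMM CCCC. The
    channel is None when CCCC = 1111 (channel not specified).
    """
    code = mcacod & 0xEFFF                  # ignore the F bit (bit 12)
    if (code & 0xEF80) != 0x0080:           # bits 15:8 clear (F aside), bit 7 set
        return None
    mmm = MMM.get((code >> 4) & 0b111)      # None for reserved encodings
    cccc = code & 0xF
    channel = None if cccc == 0b1111 else cccc
    return mmm, channel
```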
15.9.3 Architecturally Defined UCR Errors
Software recoverable compound error codes are defined in this section.
15.9.3.1 Architecturally Defined SRAO Errors
The following two SRAO errors are architecturally defined:
• UCR errors detected by memory controller scrubbing; and
• UCR errors detected during L3 cache (L3) explicit writebacks.
The MCA error code encodings for these two architecturally defined UCR
errors correspond to sub-classes of compound MCA error codes (see Table
15-9). Their values and compound encoding format are given in Table
15-15.

Table 15-14. Encodings of MMM and CCCC Sub-Fields

Sub-Field  Transaction                Mnemonic  Binary Encoding
MMM        Generic undefined request  GEN       000
           Memory read error          RD        001
           Memory write error         WR        010
           Address/Command Error      AC        011
           Memory Scrubbing Error     MS        100
           Reserved                             101-111
CCCC       Channel number             CHN       0000-1110
           Channel not specified                1111

Table 15-16 lists values of relevant bit fields of IA32_MCi_STATUS for
architecturally defined SRAO errors.
For both the memory scrubbing and L3 explicit writeback errors, the ADDRV
and MISCV flags in the IA32_MCi_STATUS register are set to indicate that
the offending physical address information is available from the
IA32_MCi_MISC and the IA32_MCi_ADDR registers. For the memory
scrubbing and L3 explicit writeback errors, the address mode in the
IA32_MCi_MISC register should be set as physical address mode (010b) and
the address LSB information in the IA32_MCi_MISC register should indicate
the lowest valid address bit in the address information provided from the
IA32_MCi_ADDR register.
An MCE signal is broadcast to all logical processors on the system on which
the UCR errors are supported. MCi_STATUS banks can be shared by logical
processors within a core or within the same package. Consequently, several
logical processors may find an SRAO error in the shared IA32_MCi_STATUS bank
while other processors do not find it in any of the IA32_MCi_STATUS banks.
Table 15-17 shows the RIPV and EIPV flag indication in the
IA32_MCG_STATUS register for the memory scrubbing and L3 explicit
writeback errors on both the reporting and non-reporting logical processors.

Table 15-15. MCA Compound Error Code Encoding for SRAO Errors

Type                   MCACOD Value  MCA Error Code Encoding (1)
Memory Scrubbing       0xC0 - 0xCF   0000_0000_1100_CCCC
                                     000F 0000 1MMM CCCC (Memory Controller Error), where
                                     Memory subfield MMM = 100B (memory scrubbing)
                                     Channel subfield CCCC = channel # or generic
L3 Explicit Writeback  0x17A         0000_0001_0111_1010
                                     000F 0001 RRRR TTLL (Cache Hierarchy Error), where
                                     Request subfield RRRR = 0111B (Eviction)
                                     Transaction Type subfield TT = 10B (Generic)
                                     Level subfield LL = 10B

NOTES:
1. Note that for both of these errors the correction report filtering (F) bit (bit 12) of the MCA error is
   0, indicating "normal" filtering.

Table 15-16. IA32_MCi_STATUS Values for SRAO Errors

SRAO Error             Valid  OVER  UC  EN  MISCV  ADDRV  PCC  S  AR  MCACOD
Memory Scrubbing       1      0     1   1   1      1      0    1  0   0xC0-0xCF
L3 Explicit Writeback  1      0     1   1   1      1      0    1  0   0x17A
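
The flag pattern of Table 15-16 and the address guidance above can be combined into a small recognizer. The Python sketch below makes two assumptions that this section does not spell out: the IA32_MCi_STATUS flag positions (bits 63 down to 55) and the IA32_MCi_MISC layout, with the address LSB in bits 5:0 and the address mode in bits 8:6. The names srao_type and error_address are hypothetical.

```python
def _bits(*positions: int) -> int:
    """Build a mask with the given bit positions set."""
    value = 0
    for pos in positions:
        value |= 1 << pos
    return value


# VAL, OVER, UC, EN, MISCV, ADDRV, PCC, S, AR (assumed architectural positions).
FLAG_MASK = _bits(63, 62, 61, 60, 59, 58, 57, 56, 55)
# Table 15-16: VAL, UC, EN, MISCV, ADDRV, S set; OVER, PCC, AR clear.
SRAO_FLAGS = _bits(63, 61, 60, 59, 58, 56)


def srao_type(status: int):
    """Return the architectural SRAO type for IA32_MCi_STATUS, or None."""
    if (status & FLAG_MASK) != SRAO_FLAGS:
        return None
    mcacod = status & 0xFFFF
    if 0xC0 <= mcacod <= 0xCF:
        return "Memory Scrubbing"
    if mcacod == 0x17A:
        return "L3 Explicit Writeback"
    return None


def error_address(addr: int, misc: int):
    """Physical address with bits below the reported LSB cleared, or None."""
    if ((misc >> 6) & 0b111) != 0b010:      # address mode must be physical
        return None
    lsb = misc & 0x3F                       # lowest valid address bit
    return addr & ~((1 << lsb) - 1)
```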
15.9.3.2 Architecturally Defined SRAR Errors
The following two SRAR errors are architecturally defined:
• UCR errors detected on data load; and
• UCR errors detected on instruction fetch.
The MCA error code encodings for these two architecturally defined UCR
errors correspond to sub-classes of compound MCA error codes (see Table
15-9). Their values and compound encoding format are given in Table
15-18.

Table 15-17. IA32_MCG_STATUS Flag Indication for SRAO Errors

                       Reporting Logical Processors  Non-reporting Logical Processors
SRAO Type              RIPV  EIPV                    RIPV  EIPV
Memory Scrubbing       1     0                       1     0
L3 Explicit Writeback  1     0                       1     0

Table 15-18. MCA Compound Error Code Encoding for SRAR Errors

Type               MCACOD Value  MCA Error Code Encoding (1)
Data Load          0x134         0000_0001_0011_0100
                                 000F 0001 RRRR TTLL (Cache Hierarchy Error), where
                                 Request subfield RRRR = 0011B (Data Load)
                                 Transaction Type subfield TT = 01B (Data)
                                 Level subfield LL = 00B (Level 0)
Instruction Fetch  0x150         0000_0001_0101_0000
                                 000F 0001 RRRR TTLL (Cache Hierarchy Error), where
                                 Request subfield RRRR = 0101B (Instruction Fetch)
                                 Transaction Type subfield TT = 00B (Instruction)
                                 Level subfield LL = 00B (Level 0)

NOTES:
1. Note that for both of these errors the correction report filtering (F) bit (bit 12) of the MCA error is
   0, indicating "normal" filtering.

Table 15-19 lists values of relevant bit fields of IA32_MCi_STATUS for
architecturally defined SRAR errors.
For both the data load and instruction fetch errors, the ADDRV and MISCV
flags in the IA32_MCi_STATUS register are set to indicate that the offending
physical address information is available from the IA32_MCi_MISC and the
IA32_MCi_ADDR registers. For the data load and instruction fetch
errors, the address mode in the IA32_MCi_MISC register should be set
as physical address mode (010b) and the address LSB information in the
IA32_MCi_MISC register should indicate the lowest valid address bit in the
address information provided from the IA32_MCi_ADDR register.
An MCE signal is broadcast to all logical processors on the system on which
the UCR errors are supported. The IA32_MCG_STATUS MSR allows system
software to distinguish the affected logical processor of an SRAR error
amongst logical processors that observed the SRAR error via a shared
MCi_STATUS bank.
Table 15-20 shows the RIPV and EIPV flag indication in the
IA32_MCG_STATUS register for the data load and instruction fetch errors on
both the reporting and non-reporting logical processors.
The affected logical processor is the one that has detected and raised an
SRAR error at the point of consumption in the execution flow. The
affected logical processor should find the Data Load or the Instruction Fetch
error information in the IA32_MCi_STATUS register that is reporting the
SRAR error.
For Data Load recoverable errors, the affected logical processor should find
that the IA32_MCG_STATUS.RIPV flag is cleared and the
IA32_MCG_STATUS.EIPV flag is set, indicating that the error is detected at
the instruction pointer saved on the stack for this machine check exception
and that restarting execution with the interrupted context is not possible.

Table 15-19. IA32_MCi_STATUS Values for SRAR Errors

SRAR Error         Valid  OVER  UC  EN  MISCV  ADDRV  PCC  S  AR  MCACOD
Data Load          1      0     1   1   1      1      0    1  1   0x134
Instruction Fetch  1      0     1   1   1      1      0    1  1   0x150

Table 15-20. IA32_MCG_STATUS Flag Indication for SRAR Errors

                   Affected Logical Processors  Non-Affected Logical Processors
SRAR Type          RIPV  EIPV                   RIPV  EIPV
Data Load          0     1                      1     0
Instruction Fetch  0     0                      1     0

For Instruction Fetch recoverable errors, the affected logical processor should
find that the RIPV flag and the EIPV flag in the IA32_MCG_STATUS register
are cleared, indicating that the instruction pointer saved on the stack may
not be associated with this error and that restarting execution with the
interrupted context is not possible.
The logical processors that observed but were not affected by an SRAR error
should find that the RIPV flag in the IA32_MCG_STATUS register is set and
the EIPV flag in the IA32_MCG_STATUS register is cleared, indicating that it
is safe to restart execution at the instruction saved on the stack for the
machine check exception on these processors after the recovery action is
successfully taken by system software.
For the Data Load and the Instruction Fetch recoverable errors, system
software may take the following recovery actions for the affected logical
processor:
• The currently executing thread cannot be continued. You must terminate the
  interrupted stream of execution and provide a new stream of execution on return
  from the machine check handler for the affected logical processor.
In addition to taking the recovery action described above, system software
may also need to disable the use of the affected page from the program. This
recovery action by system software may prevent the occurrence of future
consumption errors from that affected page.
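
The distinction between affected and non-affected logical processors in Table 15-20 can be sketched as a check on the RIPV and EIPV flags. In the Python sketch below, the IA32_MCG_STATUS bit positions (RIPV in bit 0, EIPV in bit 1) are assumed from the architectural layout, and the function name srar_role and its return strings are hypothetical.

```python
RIPV, EIPV = 0, 1                            # assumed IA32_MCG_STATUS bit positions
DATA_LOAD, INSTRUCTION_FETCH = 0x134, 0x150  # MCACOD values from Table 15-18


def srar_role(mcg_status: int, mcacod: int) -> str:
    """Classify a logical processor that sees an SRAR error in a shared bank."""
    ripv = (mcg_status >> RIPV) & 1
    eipv = (mcg_status >> EIPV) & 1
    if ripv and not eipv:
        return "non-affected"    # may resume once recovery has succeeded
    if not ripv and mcacod == DATA_LOAD and eipv:
        return "affected"        # interrupted thread must be terminated
    if not ripv and mcacod == INSTRUCTION_FETCH and not eipv:
        return "affected"
    return "unexpected"          # combination not listed in Table 15-20
```

System software would terminate the interrupted stream of execution only on the processor reported as affected, and could then retire the page identified by IA32_MCi_ADDR.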
15.9.4 Multiple MCA Errors
When multiple MCA errors are detected within a certain detection window,
the processor may aggregate the reporting of these errors together as a
single event, i.e., a single machine check exception condition. If this occurs,
system software may find multiple MCA errors logged in different MC banks
on one logical processor or find multiple MCA errors logged across different
processors for a single machine check broadcast event. In order to handle
multiple UCR errors reported from a single machine check event and
possibly recover from multiple errors, system software may consider the
following:
• Whether it can recover from multiple errors is determined by the most severe
  error reported on the system. If the most severe error is found to be an
  unrecoverable error (VAL=1, UC=1, PCC=1 and EN=1) after system software examines
  the MC banks of all processors to which the MCA signal is broadcast, recovery
  from the multiple errors is not possible and system software needs to reset the
  system.
• When multiple recoverable errors are reported and no other fatal condition (e.g.,
  overflowed condition for SRAR error) is found for the reported recoverable errors,
  it is possible for system software to recover from the multiple recoverable errors
  by taking the necessary recovery action for each individual recoverable error.
  However, system software can no longer expect a one-to-one relationship between
  the error information recorded in the IA32_MCi_STATUS register and the states of
  the RIPV and EIPV flags in the IA32_MCG_STATUS register, because the states of
  the RIPV and the EIPV flags in the IA32_MCG_STATUS register may indicate the
  information for the most severe error recorded on the processor. System
  software is required to use the RIPV flag indication in the IA32_MCG_STATUS
  register to make a final decision on the recoverability of the errors and to find
  the restartability requirement after examining the error information in each
  IA32_MCi_STATUS register in the MC banks.
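
The two considerations above amount to a worst-case scan over every valid bank followed by a check of RIPV. The Python sketch below shows one simplified policy under stated assumptions: the IA32_MCi_STATUS bit positions are the architectural ones, the function name assess and its return labels are hypothetical, and an overflowed SRAR error is the only extra fatal condition it looks for.

```python
# Assumed architectural bit positions in IA32_MCi_STATUS.
VAL, OVER, UC, EN, PCC, S, AR = 63, 62, 61, 60, 57, 56, 55


def _all_set(status: int, *positions: int) -> bool:
    """True if every listed bit is set in status."""
    return all((status >> pos) & 1 for pos in positions)


def assess(statuses, ripv: int):
    """Return (action, restartable) for the errors found in all MC banks.

    statuses: raw IA32_MCi_STATUS values gathered from the MC banks of
    every processor that received the broadcast. ripv: the RIPV flag of
    the local IA32_MCG_STATUS register.
    """
    for status in statuses:
        if not _all_set(status, VAL):
            continue                          # bank holds no valid error
        if _all_set(status, UC, PCC, EN):
            return "reset", False             # unrecoverable error
        if _all_set(status, UC, S, AR, OVER):
            return "reset", False             # overflowed SRAR error
    # Each remaining error is handled individually; RIPV decides restart.
    return "recover-each", bool(ripv)
```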
15.9.5 Machine-Check Error Codes Interpretation
Appendix E, "Interpreting Machine-Check Error Codes," provides information
on interpreting the MCA error code, model-specific error code, and other
information error code fields. For P6 family processors, information has been
included on decoding external bus errors. For Pentium 4 and Intel Xeon
processors, information is included on external bus, internal timer and cache
hierarchy errors.
15.10 GUIDELINES FOR WRITING MACHINE-CHECK SOFTWARE
The machine-check architecture and error logging can be used in three
different ways:
• To detect machine errors during normal instruction execution, using the
  machine-check exception (#MC).
• To periodically check and log machine errors.
• To examine recoverable UCR errors, determine software recoverability and
  perform recovery actions via a machine-check exception handler or a corrected
  machine-check interrupt handler.
To use the machine-check exception, the operating system or executive
software must provide a machine-check exception handler. This handler may
need to be designed specifically for each family of processors.
A special program or utility is required to log machine errors.
Guidelines for writing a machine-check exception handler or a
machine-error logging utility are given in the following sections.
15.10.1 Machine-Check Exception Handler
The machine-check exception (#MC) corresponds to vector 18. To service
machine-check exceptions, a trap gate must be added to the IDT. The
pointer in the trap gate must point to a machine-check exception handler.
Two approaches can be taken to designing the exception handler:
1. The handler can merely log all the machine status and error information, then call
   a debugger or shut down the system.
2. The handler can analyze the reported error information and, in some cases,
   attempt to correct the error and restart the processor.
For Pentium 4, Intel Xeon, P6 family, and Pentium processors, virtually all
machine-check conditions cannot be corrected (they result in abort-type
exceptions). The logging of status and error information is therefore a
baseline implementation requirement.
When recovery from a machine-check error may be possible, consider the
following when writing a machine-check exception handler:
• To determine the nature of the error, the handler must read each of the
  error-reporting register banks. The count field in the IA32_MCG_CAP register gives
  the number of register banks. The first register of register bank 0 is at address 400H.
• The VAL (valid) flag in each IA32_MCi_STATUS register indicates whether the
  error information in the register is valid. If this flag is clear, the registers in that
  bank do not contain valid error information and do not need to be checked.
• To write a portable exception handler, only the MCA error code field in the
  IA32_MCi_STATUS register should be checked. See Section 15.9, "Interpreting
  the MCA Error Codes," for information that can be used to write an algorithm to
  interpret this field.
• The PCC and OVER flags in each IA32_MCi_STATUS register, together with the
  RIPV flag in the IA32_MCG_STATUS register, indicate whether recovery from the
  error is possible. If PCC or OVER is set, recovery is not possible. If RIPV is not
  set, program execution cannot be restarted reliably. When recovery is not
  possible, the handler typically records the error information and signals an
  abort to the operating system.
• Correctable errors are corrected automatically by the processor. The UC flag in
  each IA32_MCi_STATUS register indicates whether the processor automatically
  corrected an error.
• The RIPV flag in the IA32_MCG_STATUS register indicates whether the program
  can be restarted at the instruction indicated by the instruction pointer (the
  address of the instruction pushed on the stack when the exception was
  generated). If this flag is clear, the processor may still be able to be restarted (for
  debugging purposes) but not without loss of program continuity.
• For unrecoverable errors, the EIPV flag in the IA32_MCG_STATUS register
  indicates whether the instruction indicated by the instruction pointer pushed on
  the stack (when the exception was generated) is related to the error. If the flag is
  clear, the pushed instruction may not be related to the error.
• The MCIP flag in the IA32_MCG_STATUS register indicates whether a
  machine-check exception was generated. Before returning from the machine-check
  exception handler, software should clear this flag so that it can be used reliably by
  an error logging utility. The MCIP flag also detects recursion. The machine-check
  architecture does not support recursion. When the processor detects
  machine-check recursion, it enters the shutdown state.
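
The bank scan described by these guidelines can be sketched in Python as follows. This is an illustration under stated assumptions rather than a handler implementation: rdmsr and wrmsr are hypothetical accessors supplied by the caller, the MSR addresses (IA32_MCG_CAP at 179H, IA32_MCG_STATUS at 17AH, bank i status at 401H + 4*i) and the flag positions are assumed from the architectural layout, and the function name scan_banks is hypothetical.

```python
IA32_MCG_CAP, IA32_MCG_STATUS, IA32_MC0_STATUS = 0x179, 0x17A, 0x401
VAL, OVER, PCC = 63, 62, 57        # assumed IA32_MCi_STATUS bit positions


def scan_banks(rdmsr, wrmsr):
    """Collect valid bank records; return (records, recoverable, restartable).

    Each record is (bank_index, mca_error_code, raw_status).
    """
    mcg_status = rdmsr(IA32_MCG_STATUS)
    restartable = bool(mcg_status & 1)                 # RIPV, bit 0
    count = rdmsr(IA32_MCG_CAP) & 0xFF                 # number of banks
    records, recoverable = [], True
    for i in range(count):
        status = rdmsr(IA32_MC0_STATUS + 4 * i)
        if not (status >> VAL) & 1:
            continue                                   # no valid error here
        records.append((i, status & 0xFFFF, status))
        if (status >> PCC) & 1 or (status >> OVER) & 1:
            recoverable = False
    wrmsr(IA32_MCG_STATUS, mcg_status & ~(1 << 2))     # clear MCIP, bit 2
    return records, recoverable, restartable
```

A handler built on this sketch would log the returned records, abort when recoverable or restartable is false, and otherwise interpret each MCA error code before resuming.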
Example 15-2 gives typical steps carried out by a machine-check exception
handler.
Example 15-2. Machine-Check Exception Handler Pseudocode
IF CPU supports MCE
THEN
    IF CPU supports MCA
    THEN
        call errorlogging routine; (* returns restartability *)
    FI;
ELSE (* Pentium(R) processor compatible *)
    READ P5_MC_ADDR;
    READ P5_MC_TYPE;
    report RESTARTABILITY to console;
FI;
IF error is not restartable
THEN
    report RESTARTABILITY to console;
    abort system;
FI;
CLEAR MCIP flag in IA32_MCG_STATUS;
15.10.2 Pentium Processor Machine-Check Exception Handling
Machine-check exception handlers on P6 family and later processor families
should follow the guidelines described in Section 15.10.1 and Example 15-2,
which check the processor's support of MCA.
NOTE
On processors that support MCA (CPUID.1.EDX.MCA == 1), reading
the P5_MC_TYPE and P5_MC_ADDR registers may produce invalid
data.
When machine-check exceptions are enabled for the Pentium processor
(MCE flag is set in control register CR4), the machine-check exception
handler uses the RDMSR instruction to read the error type from the
P5_MC_TYPE register and the machine check address from the
P5_MC_ADDR register. The handler then normally reports these register
values to the system console before aborting execution (see Example 15-2).
15.10.3 Logging Correctable Machine-Check Errors
The error handling routine for servicing the machine-check exceptions is
responsible for logging uncorrected errors.
If a machine-check error is correctable, the processor does not generate a
machine-check exception for it. To detect correctable machine-check errors,
a utility program must be written that reads each of the machine-check
error-reporting register banks and logs the results in an accounting file or
data structure. This utility can be implemented in any of the following
ways.
A system daemon that polls the register banks on an infrequent basis, such as
hourly or daily.
A user-initiated application that polls the register banks and records the
exceptions. Here, the actual polling service is provided by an operating-system
driver or through the system call interface.
An interrupt service routine servicing CMCI that reads the MC banks and logs the
error.
Example 15-3 gives pseudocode for an error logging utility.
Example 15-3. Machine-Check Error Logging Pseudocode
Assume that execution is restartable;
IF the processor supports MCA
  THEN
    FOR each bank of machine-check registers
      DO
        READ IA32_MCi_STATUS;
        IF VAL flag in IA32_MCi_STATUS = 1
          THEN
            IF ADDRV flag in IA32_MCi_STATUS = 1
              THEN READ IA32_MCi_ADDR;
            FI;
            IF MISCV flag in IA32_MCi_STATUS = 1
              THEN READ IA32_MCi_MISC;
            FI;
            IF MCIP flag in IA32_MCG_STATUS = 1
              (* Machine-check exception is in progress *)
              AND PCC flag in IA32_MCi_STATUS = 1
              OR RIPV flag in IA32_MCG_STATUS = 0
              (* execution is not restartable *)
              THEN
                RESTARTABILITY = FALSE;
                return RESTARTABILITY to calling procedure;
            FI;
            Save time-stamp counter and processor ID;
            Set IA32_MCi_STATUS to all 0s;
            Execute serializing instruction (e.g., CPUID);
        FI;
    OD;
FI;
If the processor supports the machine-check architecture, the utility reads
through the banks of error-reporting registers looking for valid register
entries. It then saves the values of the IA32_MCi_STATUS, IA32_MCi_ADDR,
IA32_MCi_MISC and IA32_MCG_STATUS registers for each bank that is
valid. The routine minimizes processing time by recording the raw data into
a system data structure or file, reducing the overhead associated with
polling. User utilities analyze the collected data in an off-line environment.
When the MCIP flag is set in the IA32_MCG_STATUS register, a machine-check
exception is in progress and the machine-check exception handler has
called the exception logging routine.
Once the logging process has been completed, the exception-handling
routine must determine whether execution can be restarted, which is usually
possible when damage has not occurred (the PCC flag is clear in the
IA32_MCi_STATUS register) and when the processor can guarantee that
execution is restartable (the RIPV flag is set in the IA32_MCG_STATUS
register). If execution cannot be restarted, the system is not recoverable
and the exception-handling routine should signal the console appropriately
before returning the error status to the operating system kernel for
subsequent shutdown.
The machine-check architecture allows buffering of exceptions from a given
error-reporting bank, although the Pentium 4, Intel Xeon, and P6 family
processors do not implement this feature. The error logging routine should
provide compatibility with future processors by reading each hardware
error-reporting bank's IA32_MCi_STATUS register and then writing 0s to
clear the OVER and VAL flags in this register. The error logging utility should
then re-read the IA32_MCi_STATUS register for the bank to ensure that the valid
bit is clear. The processor will write the next error into the register bank and
set the VAL flag.
Additional information that should be stored by the exception-logging
routine includes the processor's time-stamp counter value, which provides a
mechanism to indicate the frequency of exceptions. A multiprocessing
operating system stores the identity of the processor node incurring the
exception using a unique identifier, such as the processor's APIC ID (see Section
10.8, "Handling Interrupts").
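The polling and logging steps described above can be sketched in C as follows. This is a simplified illustration, not a driver: the register banks are modeled as an in-memory array (struct mc_bank), and the record layout, the function name mc_poll, and the caller-supplied time-stamp and processor ID are all assumptions made for the example. Only the IA32_MCi_STATUS bit positions (VAL bit 63, MISCV bit 59, ADDRV bit 58) are architectural. The sketch covers the polling case only and omits the restartability check of Example 15-3.

```c
#include <stddef.h>
#include <stdint.h>

#define MCI_STATUS_VAL   (1ULL << 63) /* register contents are valid */
#define MCI_STATUS_MISCV (1ULL << 59) /* IA32_MCi_MISC is valid */
#define MCI_STATUS_ADDRV (1ULL << 58) /* IA32_MCi_ADDR is valid */

/* Hypothetical in-memory stand-in for one bank's MSRs. */
struct mc_bank { uint64_t status, addr, misc; };

/* Hypothetical raw log record, analyzed later off-line. */
struct mc_record {
    unsigned bank;
    uint64_t status, addr, misc;
    uint64_t tsc;    /* time-stamp counter value at logging time */
    uint32_t cpu_id; /* e.g., the processor's APIC ID */
};

/* Scan all banks, record each valid entry, then clear the bank's status.
 * Returns the number of records written to log[]. */
static size_t mc_poll(struct mc_bank *banks, size_t nbanks,
                      struct mc_record *log, size_t log_cap,
                      uint64_t tsc, uint32_t cpu_id)
{
    size_t n = 0;
    for (size_t i = 0; i < nbanks && n < log_cap; i++) {
        uint64_t st = banks[i].status;
        if (!(st & MCI_STATUS_VAL))
            continue; /* nothing logged in this bank */
        struct mc_record *r = &log[n++];
        r->bank = (unsigned)i;
        r->status = st;
        r->addr = (st & MCI_STATUS_ADDRV) ? banks[i].addr : 0;
        r->misc = (st & MCI_STATUS_MISCV) ? banks[i].misc : 0;
        r->tsc = tsc;
        r->cpu_id = cpu_id;
        banks[i].status = 0; /* clears VAL and OVER for the next error */
    }
    return n;
}
```

Recording raw register values and deferring interpretation keeps the time spent in the polling path short, which matches the intent stated in the text.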
The basic algorithm given in Example 15-3 can be modified to provide more
robust recovery techniques. For example, software has the flexibility to
attempt recovery using information unavailable to the hardware. Specifically,
the machine-check exception handler can, after logging, carefully
analyze the error-reporting registers when the error-logging routine reports
an error that does not allow execution to be restarted. These recovery
techniques can use external bus related model-specific information provided
with the error report to localize the source of the error within the system and
determine the appropriate recovery strategy.
15.10.4 Machine-Check Software Handler Guidelines for Error Recovery
15.10.4.1 Machine-Check Exception Handler for Error Recovery
When writing a machine-check exception (MCE) handler to support software
recovery from Uncorrected Recoverable (UCR) errors, consider the
following:
When IA32_MCG_CAP[24] is zero, no recoverable errors are supported
and all machine checks are fatal exceptions. The logging of status and error
information is therefore a baseline implementation requirement.
When IA32_MCG_CAP[24] is 1, certain uncorrected errors called uncorrected
recoverable (UCR) errors may be software recoverable. The handler can analyze
the reported error information, and in some cases attempt to recover from the
uncorrected error and continue execution.
For processors with DisplayFamily_DisplayModel encoding of 06H_EH and above,
an MCA signal is broadcast to all logical processors in the system. Because
machine check MSR resources may be shared among the logical processors
on the same package/core, the MCE handler may be required to synchronize with
the other processors that received a machine check error and serialize access to
the machine check registers when analyzing, logging and clearing the
information in the machine check registers.
The VAL (valid) flag in each IA32_MCi_STATUS register indicates whether the
error information in the register is valid. If this flag is clear, the registers
in that bank do not contain valid error information and should not be checked.
The MCE handler is primarily responsible for processing uncorrected errors. The
UC flag in each IA32_MCi_STATUS register indicates whether the reported error
was corrected (UC=0) or uncorrected (UC=1). The MCE handler can optionally
log and clear the corrected errors in the MC banks if it can implement a software
algorithm to avoid undesired race conditions with the CMCI or CMC polling
handler.
For uncorrectable errors, the EIPV flag in the IA32_MCG_STATUS register
indicates (when set) that the instruction pointed to by the instruction pointer
pushed onto the stack when the machine-check exception is generated is directly
associated with the error. When this flag is cleared, the instruction pointed to
may not be associated with the error.
The MCIP flag in the IA32_MCG_STATUS register indicates whether a machine-check
exception was generated. When a machine check exception is generated,
it is expected that the MCIP flag in the IA32_MCG_STATUS register is set to 1. If
it is not set, this machine check was generated by either an INT 18 instruction or
some piece of hardware signaling an interrupt with vector 18.
When IA32_MCG_CAP[24] is 1, the following rules can apply when writing a
machine check exception (MCE) handler to support software recovery:
The PCC flag in each IA32_MCi_STATUS register indicates whether recovery from
the error is possible for uncorrected errors (UC=1). If the PCC flag is set for
uncorrected errors (UC=1), recovery is not possible. When recovery is not
possible, the MCE handler typically records the error information and signals the
operating system to reset the system.
The RIPV flag in the IA32_MCG_STATUS register indicates whether restarting the
program execution from the instruction pointer saved on the stack for the
machine check exception is possible. When the RIPV flag is set, program execution
can be restarted reliably when recovery is possible. If the RIPV flag is not set,
program execution cannot be restarted reliably. In this case the recovery
algorithm may involve terminating the current program execution and resuming
an alternate thread of execution upon return from the machine check handler
when recovery is possible. When recovery is not possible, the MCE handler
signals the operating system to reset the system.
When the EN flag is zero but the VAL and UC flags are one in the
IA32_MCi_STATUS register, the reported uncorrected error in this bank is not
enabled. As uncorrected errors with the EN flag = 0 are not the source of
machine check exceptions, the MCE handler should log and clear non-enabled
errors when the S bit is set and should continue searching for enabled errors from
the other IA32_MCi_STATUS registers. Note that when IA32_MCG_CAP[24] is 0,
any uncorrected error condition (VAL=1 and UC=1), including one with the
EN flag cleared, is fatal and the handler must signal the operating system to
reset the system. For the errors that do not generate machine check exceptions,
the EN flag has no meaning. See Appendix A: Table A-4 to find the errors that do
not generate machine check exceptions.
When the VAL flag is one, the UC flag is one, the EN flag is one and the PCC flag
is zero in the IA32_MCi_STATUS register, the error in this bank is an uncorrected
recoverable (UCR) error. The MCE handler needs to examine the S flag and the
AR flag to find the type of the UCR error for software recovery and determine if
software error recovery is possible.
When both the S and the AR flags are clear in the IA32_MCi_STATUS register for
the UCR error (VAL=1, UC=1, EN=x and PCC=0), the error in this bank is an
uncorrected no-action required error (UCNA). UCNA errors are uncorrected but
do not require any OS recovery action to continue execution. These errors
indicate that some data in the system is corrupt, but that data has not been
consumed and may not be consumed. If that data is consumed, a non-UCNA
machine check exception will be generated. UCNA errors are signaled in the same
way as corrected machine check errors and the CMCI and CMC polling handler is
primarily responsible for handling UCNA errors. Like corrected errors, the MCA
handler can optionally log and clear UCNA errors as long as it can avoid the
undesired race condition with the CMCI or CMC polling handler. As UCNA errors
are not the source of machine check exceptions, the MCA handler should
continue searching for uncorrected or software recoverable errors in all other MC
banks.
When the S flag in the IA32_MCi_STATUS register is set for the UCR error
(VAL=1, UC=1, EN=1 and PCC=0), the error in this bank is software recoverable
and it was signaled through a machine-check exception. The AR flag in the
IA32_MCi_STATUS register further clarifies the type of the software recoverable
errors.
When the AR flag in the IA32_MCi_STATUS register is clear for the software
recoverable error (VAL=1, UC=1, EN=1, PCC=0 and S=1), the error in this bank
is a software recoverable action optional (SRAO) error. The MCE handler and the
operating system can analyze IA32_MCi_STATUS[15:0] to implement an MCA
error code specific optional recovery action, but this recovery action is optional.
System software can resume the program execution from the instruction pointer
saved on the stack for the machine check exception when the RIPV flag in the
IA32_MCG_STATUS register is set.
When the OVER flag in the IA32_MCi_STATUS register is set for the SRAO error
(VAL=1, UC=1, EN=1, PCC=0, S=1 and AR=0), the MCE handler cannot take
recovery action as the information of the SRAO error in the IA32_MCi_STATUS
register was potentially lost due to the overflow condition. Since the recovery
action for SRAO errors is optional, restarting the program execution from the
instruction pointer saved on the stack for the machine check exception is still
possible for the overflowed SRAO error if the RIPV flag in the IA32_MCG_STATUS
register is set.
When the AR flag in the IA32_MCi_STATUS register is set for the software
recoverable error (VAL=1, UC=1, EN=1, PCC=0 and S=1), the error in this bank
is a software recoverable action required (SRAR) error. The MCE handler and the
operating system must take recovery action in order to continue execution after
the machine-check exception. The MCA handler and the operating system need
to analyze IA32_MCi_STATUS[15:0] to determine the MCA error code
specific recovery action. If no recovery action can be performed, the operating
system must reset the system.
When the OVER flag in the IA32_MCi_STATUS register is set for the SRAR error
(VAL=1, UC=1, EN=1, PCC=0, S=1 and AR=1), the MCE handler cannot take
recovery action as the information of the SRAR error in the IA32_MCi_STATUS
register was potentially lost due to the overflow condition. Since the recovery
action for SRAR errors must be taken, the MCE handler must signal the operating
system to reset the system.
When the MCE handler cannot find any uncorrected (VAL=1, UC=1 and EN=1) or
any software recoverable errors (VAL=1, UC=1, EN=1, PCC=0 and S=1) in any
of the IA32_MCi banks of the processors, this is an unexpected condition for the
MCE handler and the handler should signal the operating system to reset the
system.
Before returning from the machine-check exception handler, software must clear
the MCIP flag in the IA32_MCG_STATUS register. The MCIP flag is used to detect
recursion. The machine-check architecture does not support recursion. When the
processor receives a machine check while MCIP is set, it automatically enters the
shutdown state.
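The classification rules above can be condensed into a small decision function. The following C sketch is one possible reading, following the order of checks used in Example 15-4; the enumeration, the function name mc_classify, and the ser_p parameter (standing for the value of IA32_MCG_CAP[24]) are hypothetical names introduced for illustration. The IA32_MCi_STATUS bit positions are architectural.

```c
#include <stdint.h>

#define MCI_STATUS_VAL  (1ULL << 63)
#define MCI_STATUS_OVER (1ULL << 62)
#define MCI_STATUS_UC   (1ULL << 61)
#define MCI_STATUS_EN   (1ULL << 60)
#define MCI_STATUS_PCC  (1ULL << 57)
#define MCI_STATUS_S    (1ULL << 56)
#define MCI_STATUS_AR   (1ULL << 55)

/* Hypothetical classification of one bank's logged error. */
enum mc_class {
    MC_INVALID,     /* VAL=0: nothing logged */
    MC_CORRECTED,   /* UC=0 */
    MC_FATAL,       /* recovery not possible; reset the system */
    MC_NOT_ENABLED, /* UC=1, EN=0, OVER=0: log and clear */
    MC_UCNA,        /* uncorrected, no action required */
    MC_SRAO,        /* software recoverable, action optional */
    MC_SRAR         /* software recoverable, action required */
};

/* ser_p is nonzero when IA32_MCG_CAP[24] is 1. */
static enum mc_class mc_classify(uint64_t st, int ser_p)
{
    if (!(st & MCI_STATUS_VAL))
        return MC_INVALID;
    if (!(st & MCI_STATUS_UC))
        return MC_CORRECTED;
    if (!ser_p)
        return MC_FATAL; /* no software recovery: every UC error is fatal */
    if (!(st & MCI_STATUS_EN) && !(st & MCI_STATUS_OVER))
        return MC_NOT_ENABLED;
    if (st & MCI_STATUS_PCC)
        return MC_FATAL; /* processor context may be corrupted */
    if (!(st & MCI_STATUS_S))
        return (st & MCI_STATUS_AR) ? MC_FATAL /* S=0, AR=1 is illegal */
                                    : MC_UCNA;
    if (st & MCI_STATUS_AR)
        return (st & MCI_STATUS_OVER) ? MC_FATAL /* required info lost */
                                      : MC_SRAR;
    return MC_SRAO; /* with OVER=1 no action is taken, but restart may proceed */
}
```

A handler would combine the per-bank result with the RIPV flag in IA32_MCG_STATUS before deciding whether to resume execution.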
Example 15-4 gives pseudocode for an MC exception handler that supports
recovery of UCR.
Example 15-4. Machine-Check Error Handler Pseudocode Supporting UCR
MACHINE CHECK HANDLER: (* Called from INT 18 handler *)
NOERROR = TRUE;
ProcessorCount = 0;
IF CPU supports MCA
  THEN
    RESTARTABILITY = TRUE;
    IF (Processor Family = 6 AND Display_Model >= 0EH) OR (Processor Family > 6)
      THEN
        MCA_BROADCAST = TRUE;
        Acquire SpinLock;
        ProcessorCount++; (* Allowing one logical processor at a time to examine
                             machine check registers *)
        CALL MCA ERROR PROCESSING; (* returns RESTARTABILITY and NOERROR *)
      ELSE
        MCA_BROADCAST = FALSE;
        (* Implement a rendezvous mechanism with the other processors if necessary *)
        CALL MCA ERROR PROCESSING;
    FI;
  ELSE (* Pentium(R) processor compatible *)
    READ P5_MC_ADDR;
    READ P5_MC_TYPE;
    RESTARTABILITY = FALSE;
FI;
IF NOERROR = TRUE
  THEN
    IF NOT (MCG_RIPV = 1 AND MCG_EIPV = 0)
      THEN
        RESTARTABILITY = FALSE;
    FI;
FI;
IF RESTARTABILITY = FALSE
  THEN
    Report RESTARTABILITY to console;
    Reset system;
FI;
IF MCA_BROADCAST = TRUE
  THEN
    IF ProcessorCount = MAX_PROCESSORS
      AND NOERROR = TRUE
      THEN
        Report RESTARTABILITY to console;
        Reset system;
    FI;
    Release SpinLock;
    Wait till ProcessorCount = MAX_PROCESSORS on system;
    (* implement a timeout and abort function if necessary *)
FI;
CLEAR MCIP flag in IA32_MCG_STATUS;
RESUME Execution;
(* End of MACHINE CHECK HANDLER *)
MCA ERROR PROCESSING: (* MCA Error Processing Routine called from MCA Handler *)
IF MCIP flag in IA32_MCG_STATUS = 0
  THEN (* MCIP=0 upon MCA is unexpected *)
    RESTARTABILITY = FALSE;
FI;
FOR each bank of machine-check registers
  DO
    CLEAR_MC_BANK = FALSE;
    READ IA32_MCi_STATUS;
    IF VAL Flag in IA32_MCi_STATUS = 1
      THEN
        IF UC Flag in IA32_MCi_STATUS = 1
          THEN
            IF Bit 24 in IA32_MCG_CAP = 0
              THEN (* the processor does not support software error recovery *)
                RESTARTABILITY = FALSE;
                NOERROR = FALSE;
                GOTO LOG MCA REGISTER;
            FI;
            (* the processor supports software error recovery *)
            IF EN Flag in IA32_MCi_STATUS = 0 AND OVER Flag in IA32_MCi_STATUS = 0
              THEN (* It is a spurious MCA log. Log and clear the register *)
                CLEAR_MC_BANK = TRUE;
                GOTO LOG MCA REGISTER;
            FI;
            IF PCC Flag in IA32_MCi_STATUS = 1
              THEN (* processor context might have been corrupted *)
                RESTARTABILITY = FALSE;
              ELSE (* It is an uncorrected recoverable (UCR) error *)
                IF S Flag in IA32_MCi_STATUS = 0
                  THEN
                    IF AR Flag in IA32_MCi_STATUS = 0
                      THEN (* It is an uncorrected no action required (UCNA) error *)
                        GOTO CONTINUE; (* let CMCI and CMC polling handler process it *)
                      ELSE
                        RESTARTABILITY = FALSE; (* S=0, AR=1 is illegal *)
                    FI;
                FI;
                IF RESTARTABILITY = FALSE
                  THEN (* no need to take recovery action if RESTARTABILITY is already false *)
                    NOERROR = FALSE;
                    GOTO LOG MCA REGISTER;
                FI;
                (* S in IA32_MCi_STATUS = 1 *)
                IF AR Flag in IA32_MCi_STATUS = 1
                  THEN (* It is a software recoverable and action required (SRAR) error *)
                    IF OVER Flag in IA32_MCi_STATUS = 1
                      THEN
                        RESTARTABILITY = FALSE;
                        NOERROR = FALSE;
                        GOTO LOG MCA REGISTER;
                    FI;
                    IF MCACOD Value in IA32_MCi_STATUS is recognized
                      AND Current Processor is an Affected Processor
                      THEN
                        Implement MCACOD specific recovery action;
                        CLEAR_MC_BANK = TRUE;
                      ELSE
                        RESTARTABILITY = FALSE;
                    FI;
                  ELSE (* It is a software recoverable and action optional (SRAO) error *)
                    IF OVER Flag in IA32_MCi_STATUS = 0 AND
                      MCACOD in IA32_MCi_STATUS is recognized
                      THEN
                        Implement MCACOD specific recovery action;
                    FI;
                    CLEAR_MC_BANK = TRUE;
                FI; (* AR *)
            FI; (* PCC *)
            NOERROR = FALSE;
            GOTO LOG MCA REGISTER;
          ELSE (* It is a corrected error; continue to the next IA32_MCi_STATUS *)
            GOTO CONTINUE;
        FI; (* UC *)
    FI; (* VAL *)
    GOTO CONTINUE; (* VAL = 0: the bank holds no valid error information *)
    LOG MCA REGISTER:
    SAVE IA32_MCi_STATUS;
    IF MISCV in IA32_MCi_STATUS
      THEN
        SAVE IA32_MCi_MISC;
    FI;
    IF ADDRV in IA32_MCi_STATUS
      THEN
        SAVE IA32_MCi_ADDR;
    FI;
    IF CLEAR_MC_BANK = TRUE
      THEN
        SET all 0 to IA32_MCi_STATUS;
        IF MISCV in saved IA32_MCi_STATUS
          THEN
            SET all 0 to IA32_MCi_MISC;
        FI;
        IF ADDRV in saved IA32_MCi_STATUS
          THEN
            SET all 0 to IA32_MCi_ADDR;
        FI;
    FI;
    CONTINUE:
  OD;
  (* END FOR *)
RETURN;
(* End of MCA ERROR PROCESSING *)
15.10.4.2 Corrected Machine-Check Handler for Error Recovery
When writing a corrected machine check handler, which is invoked as a
result of CMCI or called from an OS CMC Polling dispatcher, consider the
following:
The VAL (valid) flag in each IA32_MCi_STATUS register indicates whether the
error information in the register is valid. If this flag is clear, the registers
in that bank do not contain valid error information and do not need to be checked.
The CMCI or CMC polling handler is responsible for logging and clearing corrected
errors. The UC flag in each IA32_MCi_STATUS register indicates whether the
reported error was corrected (UC=0) or not (UC=1).
When IA32_MCG_CAP[24] is one, the CMC handler is also responsible for
logging and clearing uncorrected no-action required (UCNA) errors. When the
UC flag is one but the PCC, S, and AR flags are zero in the IA32_MCi_STATUS
register, the reported error in this bank is an uncorrected no-action required
(UCNA) error.
In addition to corrected errors and UCNA errors, the CMC handler optionally logs
uncorrected (UC=1 and PCC=1) and software recoverable machine check errors
(UC=1, PCC=0 and S=1), but should avoid clearing those errors from the MC
banks. Clearing these errors may result in accidentally removing them
before they are actually handled and processed by the MCE handler for
attempted software error recovery.
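The division of work described above can be sketched as a per-bank decision function. This C fragment follows the guidelines in the preceding list rather than any particular implementation; the enumeration, the function name cmc_action_for, and the ser_p parameter (standing for IA32_MCG_CAP[24]) are hypothetical. Where the guidelines say logging is optional, the sketch chooses to log without clearing.

```c
#include <stdint.h>

#define MCI_STATUS_VAL (1ULL << 63)
#define MCI_STATUS_UC  (1ULL << 61)
#define MCI_STATUS_PCC (1ULL << 57)
#define MCI_STATUS_S   (1ULL << 56)
#define MCI_STATUS_AR  (1ULL << 55)

/* Hypothetical action codes for the CMCI / CMC polling handler. */
enum cmc_action {
    CMC_SKIP,          /* VAL=0: bank holds no valid error */
    CMC_LOG_AND_CLEAR, /* corrected or UCNA error: this handler owns it */
    CMC_LOG_ONLY       /* other uncorrected error: leave it for the MCE handler */
};

/* ser_p is nonzero when IA32_MCG_CAP[24] is 1. */
static enum cmc_action cmc_action_for(uint64_t st, int ser_p)
{
    if (!(st & MCI_STATUS_VAL))
        return CMC_SKIP;
    if (!(st & MCI_STATUS_UC))
        return CMC_LOG_AND_CLEAR; /* corrected error */
    if (ser_p && !(st & (MCI_STATUS_PCC | MCI_STATUS_S | MCI_STATUS_AR)))
        return CMC_LOG_AND_CLEAR; /* UCNA error */
    return CMC_LOG_ONLY; /* must not be cleared before the MCE handler sees it */
}
```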
Example 15-5 gives pseudocode for a CMCI handler with UCR support.
Example 15-5. Corrected Error Handler Pseudocode with UCR Support
Corrected Error HANDLER: (* Called from CMCI handler or OS CMC Polling Dispatcher *)
IF CPU supports MCA
  THEN
    FOR each bank of machine-check registers
      DO
        READ IA32_MCi_STATUS;
        IF VAL flag in IA32_MCi_STATUS = 1
          THEN
            IF UC Flag in IA32_MCi_STATUS = 0 (* It is a corrected error *)
              THEN
                GOTO LOG CMC ERROR;
              ELSE
                IF Bit 24 in IA32_MCG_CAP = 0
                  THEN
                    GOTO CONTINUE;
                FI;
                IF S Flag in IA32_MCi_STATUS = 0 AND AR Flag in IA32_MCi_STATUS = 0
                  THEN (* It is an uncorrected no action required error *)
                    GOTO LOG CMC ERROR;
                FI;
                IF EN Flag in IA32_MCi_STATUS = 0
                  THEN (* It is a spurious MCA error *)
                    GOTO LOG CMC ERROR;
                FI;
            FI;
        FI;
        GOTO CONTINUE;
        LOG CMC ERROR:
        SAVE IA32_MCi_STATUS;
        IF MISCV Flag in IA32_MCi_STATUS
          THEN
            SAVE IA32_MCi_MISC;
            SET all 0 to IA32_MCi_MISC;
        FI;
        IF ADDRV Flag in IA32_MCi_STATUS
          THEN
            SAVE IA32_MCi_ADDR;
            SET all 0 to IA32_MCi_ADDR;
        FI;
        SET all 0 to IA32_MCi_STATUS;
        CONTINUE:
    OD;
    (* END FOR *)
FI;
CHAPTER 16
DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER
Intel 64 and IA-32 architectures provide debug facilities for use in debugging code
and monitoring performance. These facilities are valuable for debugging application
software, system software, and multitasking operating systems. Debug support is
accessed using debug registers (DR0 through DR7) and model-specific registers
(MSRs):
Debug registers hold the addresses of memory and I/O locations called
breakpoints. Breakpoints are user-selected locations in a program, a data-storage area
in memory, or specific I/O ports. They are set where a programmer or system
designer wishes to halt execution of a program and examine the state of the
processor by invoking debugger software. A debug exception (#DB) is generated
when a memory or I/O access is made to a breakpoint address.
MSRs monitor branches, interrupts, and exceptions; they record addresses of the
last branch, interrupt or exception taken and the last branch taken before an
interrupt or exception.
16.1 OVERVIEW OF DEBUG SUPPORT FACILITIES
The following processor facilities support debugging and performance monitoring:
Debug exception (#DB): Transfers program control to a debug procedure or
task when a debug event occurs.
Breakpoint exception (#BP): See breakpoint instruction (INT 3) below.
Breakpoint-address registers (DR0 through DR3): Specify the
addresses of up to 4 breakpoints.
Debug status register (DR6): Reports the conditions that were in effect
when a debug or breakpoint exception was generated.
Debug control register (DR7): Specifies the forms of memory or I/O access
that cause breakpoints to be generated.
T (trap) flag, TSS: Generates a debug exception (#DB) when an attempt is
made to switch to a task with the T flag set in its TSS.
RF (resume) flag, EFLAGS register: Suppresses multiple exceptions to the
same instruction.
TF (trap) flag, EFLAGS register: Generates a debug exception (#DB) after
every execution of an instruction.
Breakpoint instruction (INT 3): Generates a breakpoint exception (#BP)
that transfers program control to the debugger procedure or task. This
instruction is an alternative way to set code breakpoints. It is especially useful
when more than four breakpoints are desired, or when breakpoints are being
placed in the source code.
Last branch recording facilities: Store branch records in the last branch
record (LBR) stack MSRs for the most recent taken branches, interrupts, and/or
exceptions. A branch record consists of a branch-from and a branch-to
instruction address. Send branch records out on the system bus as branch trace
messages (BTMs).
These facilities allow a debugger to be called as a separate task or as a procedure in
the context of the current program or task. The following conditions can be used to
invoke the debugger:
Task switch to a specific task.
Execution of the breakpoint instruction.
Execution of any instruction.
Execution of an instruction at a specified address.
Read or write to a specified memory address/range.
Write to a specified memory address/range.
Input from a specified I/O address/range.
Output to a specified I/O address/range.
Attempt to change the contents of a debug register.
16.2 DEBUG REGISTERS
Eight debug registers (see Figure 16-1) control the debug operation of the processor.
These registers can be written to and read using the move to/from debug register
form of the MOV instruction. A debug register may be the source or destination
operand for one of these instructions.
Debug registers are privileged resources; a MOV instruction that accesses these
registers can only be executed in real-address mode, in SMM or in protected mode at
a CPL of 0. An attempt to read or write the debug registers from any other privilege
level generates a general-protection exception (#GP).
The primary function of the debug registers is to set up and monitor from 1 to 4
breakpoints, numbered 0 through 3. For each breakpoint, the following information
can be specified:
The linear address where the breakpoint is to occur.
The length of the breakpoint location (1, 2, or 4 bytes).
The operation that must be performed at the address for a debug exception to be
generated.
Whether the breakpoint is enabled.
Whether the breakpoint condition was present when the debug exception was
generated.
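To make the per-breakpoint information concrete, the following C sketch composes a DR7 value for one breakpoint. It only computes the 32-bit value; loading it into DR7 requires a MOV to the debug register at CPL 0 and is not shown. The function and enumerator names are hypothetical. The bit positions (Ln at bit 2n, Gn at bit 2n+1, R/Wn at bits 16+4n and 17+4n, LENn at bits 18+4n and 19+4n) and the field encodings used here (R/W: 00 execute, 01 write, 10 I/O, 11 read/write; LEN: 00 one byte, 01 two bytes, 11 four bytes) are taken from the architecture's DR7 field definitions rather than from the list above.

```c
#include <stdint.h>

/* R/Wn field encodings; BP_IO is meaningful only when CR4.DE is set. */
enum bp_rw  { BP_EXEC = 0, BP_WRITE = 1, BP_IO = 2, BP_RW = 3 };
/* LENn field encodings; execution breakpoints must use BP_LEN1. */
enum bp_len { BP_LEN1 = 0, BP_LEN2 = 1, BP_LEN4 = 3 };

/* Return dr7 with breakpoint n (0..3) configured and enabled.
 * global != 0 sets Gn (all tasks); otherwise Ln (current task) is set. */
static uint32_t dr7_set_bp(uint32_t dr7, unsigned n,
                           enum bp_rw rw, enum bp_len len, int global)
{
    unsigned shift = 16 + 4 * n;              /* start of R/Wn and LENn */
    dr7 &= ~(0xFu << shift);                  /* clear R/Wn and LENn */
    dr7 &= ~(0x3u << (2 * n));                /* clear Ln and Gn */
    dr7 |= ((uint32_t)rw | ((uint32_t)len << 2)) << shift;
    dr7 |= 1u << (2 * n + (global ? 1 : 0));  /* set Ln or Gn */
    return dr7;
}
```

For example, a local 4-byte write breakpoint in slot 0 is dr7_set_bp(0, 0, BP_WRITE, BP_LEN4, 0); the linear address itself goes into DR0.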
The following paragraphs describe the functions of flags and fields in the debug
registers.
Figure 16-1. Debug Registers
[Figure: 32-bit layouts of debug registers DR0 through DR7. DR0 through DR3 hold
the linear addresses of breakpoints 0 through 3. DR4 and DR5 are reserved. DR6
contains the B0 through B3 flags (bits 0 through 3) and the BD (bit 13), BS
(bit 14), and BT (bit 15) flags. DR7 contains the L0 through L3 and G0 through
G3 flags (bits 0 through 7), the LE (bit 8), GE (bit 9), and GD (bit 13) flags,
and the R/W0 through R/W3 and LEN0 through LEN3 fields (bits 16 through 31).]
16.2.1 Debug Address Registers (DR0-DR3)
Each of the debug-address registers (DR0 through DR3) holds the 32-bit linear
address of a breakpoint (see Figure 16-1). Breakpoint comparisons are made before
physical address translation occurs. The contents of debug register DR7 further
specify breakpoint conditions.
16.2.2 Debug Registers DR4 and DR5
Debug registers DR4 and DR5 are reserved when debug extensions are enabled
(when the DE flag in control register CR4 is set) and attempts to reference the DR4
and DR5 registers cause invalid-opcode exceptions (#UD). When debug extensions
are not enabled (when the DE flag is clear), these registers are aliased to debug
registers DR6 and DR7.
16.2.3 Debug Status Register (DR6)
The debug status register (DR6) reports debug conditions that were sampled at the
time the last debug exception was generated (see Figure 16-1). Updates to this
register only occur when an exception is generated. The flags in this register show
the following information:
B0 through B3 (breakpoint condition detected) flags (bits 0 through 3):
Each flag indicates (when set) that its associated breakpoint condition was met
when a debug exception was generated. These flags are set if the condition described for
each breakpoint by the LENn and R/Wn flags in debug control register DR7 is
true. They may or may not be set if the breakpoint is not enabled by the Ln or the
Gn flags in register DR7. Therefore on a #DB, a debug handler should check only
those B0-B3 bits which correspond to an enabled breakpoint.
BD (debug register access detected) flag (bit 13): Indicates that the next
instruction in the instruction stream accesses one of the debug registers (DR0
through DR7). This flag is enabled when the GD (general detect) flag in debug
control register DR7 is set. See Section 16.2.4, "Debug Control Register (DR7),"
for further explanation of the purpose of this flag.
BS (single step) flag (bit 14): Indicates (when set) that the debug exception
was triggered by the single-step execution mode (enabled with the TF flag in the
EFLAGS register). The single-step mode is the highest-priority debug exception.
When the BS flag is set, any of the other debug status bits also may be set.
BT (task switch) flag (bit 15): Indicates (when set) that the debug
exception resulted from a task switch where the T flag (debug trap flag) in the
TSS of the target task was set. See Section 7.2.1, "Task-State Segment (TSS),"
for the format of a TSS. There is no flag in debug control register DR7 to enable
or disable this exception; the T flag of the TSS is the only enabling flag.
Certain debug exceptions may clear bits 0-3. The remaining contents of the DR6
register are never cleared by the processor. To avoid confusion in identifying debug
exceptions, debug handlers should clear the register before returning to the
interrupted task.
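The recommendation to examine only those B0-B3 bits that correspond to enabled breakpoints can be sketched as follows. The function name dr6_enabled_hits is hypothetical; the sketch operates on DR6 and DR7 values that a debug handler is assumed to have already read at CPL 0.

```c
#include <stdint.h>

#define DR6_BD (1u << 13) /* debug register access detected */
#define DR6_BS (1u << 14) /* single step */
#define DR6_BT (1u << 15) /* task switch */

/* Return a 4-bit mask of breakpoints whose Bn flag is set in DR6 and
 * whose Ln or Gn enable flag is set in DR7. Bn flags of disabled
 * breakpoints are ignored because their state is not guaranteed. */
static unsigned dr6_enabled_hits(uint32_t dr6, uint32_t dr7)
{
    unsigned hits = 0;
    for (unsigned n = 0; n < 4; n++) {
        unsigned hit = (dr6 >> n) & 1u;           /* Bn */
        unsigned enabled = (dr7 >> (2 * n)) & 3u; /* Ln | Gn */
        if (hit && enabled)
            hits |= 1u << n;
    }
    return hits;
}
```

A handler would additionally test DR6_BD, DR6_BS, and DR6_BT against the DR6 value to identify the other causes before clearing the register.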
16.2.4 Debug Control Register (DR7)
The debug cont rol regist er ( DR7) enables or disables breakpoint s and set s break-
point condit ions ( see Figure 16- 1) . The flags and fields in t his regist er cont rol t he
following t hings:
L0 through L3 (local breakpoint enable) flags (bits 0, 2, 4, and 6):
Enables (when set) the breakpoint condition for the associated breakpoint for the
current task. When a breakpoint condition is detected and its associated Ln flag
is set, a debug exception is generated. The processor automatically clears these
flags on every task switch to avoid unwanted breakpoint conditions in the new
task.
G0 through G3 (global breakpoint enable) flags (bits 1, 3, 5, and 7):
Enables (when set) the breakpoint condition for the associated breakpoint for all
tasks. When a breakpoint condition is detected and its associated Gn flag is set,
a debug exception is generated. The processor does not clear these flags on a
task switch, allowing a breakpoint to be enabled for all tasks.
LE and GE (local and global exact breakpoint enable) flags (bits 8, 9):
This feature is not supported in the P6 family processors, later IA-32 processors,
and Intel 64 processors. When set, these flags cause the processor to detect the
exact instruction that caused a data breakpoint condition. For backward and
forward compatibility with other Intel processors, we recommend that the LE and
GE flags be set to 1 if exact breakpoints are required.
GD (general detect enable) flag (bit 13): Enables (when set) debug-register
protection, which causes a debug exception to be generated prior to any
MOV instruction that accesses a debug register. When such a condition is
detected, the BD flag in debug status register DR6 is set prior to generating the
exception. This condition is provided to support in-circuit emulators.
When the emulator needs to access the debug registers, emulator software can
set the GD flag to prevent interference from the program currently executing on
the processor.
The processor clears the GD flag upon entering the debug exception handler,
to allow the handler access to the debug registers.
R/W0 through R/W3 (read/write) fields (bits 16, 17, 20, 21, 24, 25, 28,
and 29): Specifies the breakpoint condition for the corresponding breakpoint.
The DE (debug extensions) flag in control register CR4 determines how the bits in
the R/Wn fields are interpreted. When the DE flag is set, the processor interprets
bits as follows:
    00   Break on instruction execution only.
    01   Break on data writes only.
    10   Break on I/O reads or writes.
    11   Break on data reads or writes but not instruction fetches.
When the DE flag is clear, the processor interprets the R/Wn bits the same as for
the Intel386 and Intel486 processors, which is as follows:
    00   Break on instruction execution only.
    01   Break on data writes only.
    10   Undefined.
    11   Break on data reads or writes but not instruction fetches.
LEN0 through LEN3 (Length) fields (bits 18, 19, 22, 23, 26, 27, 30, and
31): Specify the size of the memory location at the address specified in the
corresponding breakpoint address register (DR0 through DR3). These fields are
interpreted as follows:
    00   1-byte length.
    01   2-byte length.
    10   Undefined (or 8-byte length, see note below).
    11   4-byte length.
If the corresponding R/Wn field in register DR7 is 00 (instruction execution),
then the LENn field should also be 00. The effect of using other lengths is
undefined. See Section 16.2.5, "Breakpoint Field Recognition," below.
NOTES
For Pentium 4 and Intel Xeon processors with a CPUID signature
corresponding to family 15 (model 3, 4, and 6), breakpoint
conditions permit specifying 8-byte length on data read/write with an
encoding of 10B in the LENn field.
Encoding 10B is also supported in processors based on Intel Core
microarchitecture or enhanced Intel Core microarchitecture, with the
respective CPUID signatures corresponding to family 6, model 15,
and family 6, display_model value 23. Encoding 10B is supported
in processors based on Intel Atom microarchitecture, with a CPUID
signature of family 6, display_model value 28. Encoding 10B is
undefined for other processors.
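The field positions described in this section follow a regular pattern: the
enable flags for breakpoint n sit at bits 2n (Ln) and 2n+1 (Gn), and the R/Wn
and LENn fields occupy the four bits starting at bit 16+4n. The sketch below
composes a DR7 value from those rules. The enum and function names are
hypothetical, and actually loading the value into DR7 (a privileged MOV) is
deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* R/Wn encodings; encoding 2 (I/O) is only defined when CR4.DE is set. */
enum dr7_rw { DR7_RW_EXEC = 0, DR7_RW_WRITE = 1, DR7_RW_IO = 2, DR7_RW_RDWR = 3 };

/* LENn encodings; encoding 2 means 8 bytes only on the processors listed
 * in the notes above and is undefined elsewhere. */
enum dr7_len { DR7_LEN_1 = 0, DR7_LEN_2 = 1, DR7_LEN_8 = 2, DR7_LEN_4 = 3 };

#define DR7_GD (1u << 13) /* general detect enable */

/* Return dr7 with breakpoint n (0-3) configured and enabled.
 * global != 0 sets Gn, otherwise Ln is set. */
static uint32_t dr7_enable(uint32_t dr7, unsigned n,
                           enum dr7_rw rw, enum dr7_len len, int global)
{
    assert(n < 4);
    /* Instruction breakpoints should use LENn = 00. */
    assert(rw != DR7_RW_EXEC || len == DR7_LEN_1);

    unsigned shift = 16 + 4 * n;
    dr7 &= ~(0xFu << shift);                              /* clear R/Wn and LENn */
    dr7 |= ((uint32_t)rw | ((uint32_t)len << 2)) << shift;
    dr7 |= 1u << (2 * n + (global ? 1 : 0));              /* Ln or Gn */
    return dr7;
}
```

For example, dr7_enable(0, 0, DR7_RW_WRITE, DR7_LEN_4, 0) describes a local
4-byte write breakpoint on the address held in DR0.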
16.2.5 Breakpoint Field Recognition
Breakpoint address registers (debug registers DR0 through DR3) and the LENn fields
for each breakpoint define a range of sequential byte addresses for a data or I/O
breakpoint. The LENn fields permit specification of a 1-, 2-, 4-, or 8-byte range,
beginning at the linear address specified in the corresponding debug register (DRn).
Two-byte ranges must be aligned on word boundaries; 4-byte ranges must be
aligned on doubleword boundaries. I/O addresses are zero-extended (from 16 to 32
bits, for comparison with the breakpoint address in the selected debug register).
These requirements are enforced by the processor; it uses LENn field bits to mask
the lower address bits in the debug registers. Unaligned data or I/O breakpoint
addresses do not yield valid results.
A data breakpoint for reading or writing data is triggered if any of the bytes
participating in an access is within the range defined by a breakpoint address
register and its LENn field. Table 16-1 provides an example setup of debug
registers and data accesses that would subsequently trap or not trap on the
breakpoints.
A data breakpoint for an unaligned operand can be constructed using two
breakpoints, where each breakpoint is byte-aligned and the two breakpoints
together cover the operand. The breakpoints generate exceptions only for the
operand, not for neighboring bytes.
Instruction breakpoint addresses must have a length specification of 1 byte (the
LENn field is set to 00). Code breakpoints for other operand sizes are undefined.
The processor recognizes an instruction breakpoint address only when it points to
the first byte of an instruction. If the instruction has prefixes, the breakpoint
address must point to the first prefix.
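The matching rule for data breakpoints amounts to a range-overlap test after
the low address bits are masked by the length. The following sketch models
that rule in software so that the cases in Table 16-1 can be checked by hand;
it is an illustration under stated assumptions, not a description of the
processor's internal logic. The structure data_bp and the function
data_bp_hits are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of one data breakpoint. */
struct data_bp {
    uint64_t addr;   /* linear address held in DRn */
    unsigned len;    /* range length in bytes: 1, 2, 4 or 8 */
    bool on_read;    /* true for R/Wn = 11, false for R/Wn = 01 */
};

/* True if an access of `size` bytes at `addr` touches the breakpoint range. */
static bool data_bp_hits(const struct data_bp *bp, uint64_t addr,
                         unsigned size, bool is_write)
{
    assert(bp->len == 1 || bp->len == 2 || bp->len == 4 || bp->len == 8);
    assert(size > 0);

    if (!is_write && !bp->on_read)
        return false;                 /* write-only breakpoint, read access */

    /* The LENn bits mask the low address bits of the breakpoint address. */
    uint64_t lo = bp->addr & ~(uint64_t)(bp->len - 1);
    uint64_t hi = lo + bp->len;       /* exclusive upper bound */

    /* Any byte of the access inside [lo, hi) triggers the breakpoint. */
    return addr < hi && addr + size > lo;
}
```

With DR2 modelled as {0xB0002, 2, true}, a 4-byte access at B0001H hits
because it covers B0002H and B0003H, while a 2-byte access at B0000H does not.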
Table 16-1. Breakpoint Examples

Debug Register Setup
Debug Register   R/Wn                     Breakpoint Address   LENn
DR0              R/W0 = 11 (Read/Write)   A0001H               LEN0 = 00 (1 byte)
DR1              R/W1 = 01 (Write)        A0002H               LEN1 = 00 (1 byte)
DR2              R/W2 = 11 (Read/Write)   B0002H               LEN2 = 01 (2 bytes)
DR3              R/W3 = 01 (Write)        C0000H               LEN3 = 11 (4 bytes)

Data Accesses
Operation                    Address   Access Length (In Bytes)
Data operations that trap
- Read or write              A0001H    1
- Read or write              A0001H    2
- Write                      A0002H    1
- Write                      A0002H    2
- Read or write              B0001H    4
- Read or write              B0002H    1
- Read or write              B0002H    2
- Write                      C0000H    4
- Write                      C0001H    2
- Write                      C0003H    1
16.2.6 Debug Registers and Intel 64 Processors
For Intel 64 architecture processors, debug registers DR0-DR7 are 64 bits. In
16-bit or 32-bit modes (protected mode and compatibility mode), writes to a debug
register fill the upper 32 bits with zeros. Reads from a debug register return the
lower 32 bits. In 64-bit mode, MOV DRn instructions read or write all 64 bits.
Operand-size prefixes are ignored.
In 64-bit mode, the upper 32 bits of DR6 and DR7 are reserved and must be written
with zeros. Writing 1 to any of the upper 32 bits results in a #GP(0) exception
(see Figure 16-2). All 64 bits of DR0-DR3 are writable by software. However,
MOV DRn instructions do not check that addresses written to DR0-DR3 are in the
linear-address limits of the processor implementation (address matching is
supported only on valid addresses generated by the processor implementation).
Breakpoint conditions for 8-byte memory read/writes are supported in all modes.
Table 16-1. Breakpoint Examples (Contd.)

Data Accesses
Operation                          Address   Access Length (In Bytes)
Data operations that do not trap
- Read or write                    A0000H    1
- Read                             A0002H    1
- Read or write                    A0003H    4
- Read or write                    B0000H    2
- Read                             C0000H    2
- Read or write                    C0004H    4
16.3 DEBUG EXCEPTIONS
The Intel 64 and IA-32 architectures dedicate two interrupt vectors to handling
debug exceptions: vector 1 (debug exception, #DB) and vector 3 (breakpoint
exception, #BP). The following sections describe how these exceptions are
generated and typical exception handler operations.
16.3.1 Debug Exception (#DB) - Interrupt Vector 1
The debug-exception handler is usually a debugger program or part of a larger
software system. The processor generates a debug exception for any of several
conditions. The debugger checks flags in the DR6 and DR7 registers to determine
which condition caused the exception and which other conditions might apply.
Table 16-2 shows the states of these flags following the generation of each kind
of breakpoint condition.
Instruction-breakpoint and general-detect conditions (see Section 16.3.1.3,
"General-Detect Exception Condition") result in faults; other debug-exception
conditions result in traps. A single debug exception may report one or both
classes at one time. The following sections describe each class of debug
exception.
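The breakpoint rows of Table 16-2 can be expressed as a small lookup: a
breakpoint condition n is reported when Bn is set in DR6 and Ln or Gn is set in
DR7, and the R/Wn field then selects the kind of breakpoint and its exception
class. The sketch below assumes DR6 and DR7 have already been read into
integers; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum db_cause { DB_NONE, DB_INSN_BP, DB_DATA_WRITE_BP, DB_IO_BP, DB_DATA_RW_BP };
enum db_class { DB_FAULT, DB_TRAP };

/* Classify breakpoint condition n (0-3) from saved DR6 and DR7 values. */
static enum db_cause db_breakpoint_cause(uint64_t dr6, uint64_t dr7, unsigned n)
{
    assert(n < 4);
    unsigned bn = (unsigned)((dr6 >> n) & 1u);            /* Bn in DR6 */
    unsigned enabled = (unsigned)((dr7 >> (2 * n)) & 3u); /* Ln or Gn in DR7 */
    if (!bn || !enabled)
        return DB_NONE;

    switch ((dr7 >> (16 + 4 * n)) & 3u) {                 /* R/Wn field */
    case 0:  return DB_INSN_BP;
    case 1:  return DB_DATA_WRITE_BP;
    case 2:  return DB_IO_BP;
    default: return DB_DATA_RW_BP;
    }
}

/* Instruction breakpoints are faults; data and I/O breakpoints are traps. */
static enum db_class db_cause_class(enum db_cause c)
{
    assert(c != DB_NONE);
    return c == DB_INSN_BP ? DB_FAULT : DB_TRAP;
}
```

The single-step (BS), general-detect (BD) and task-switch (BT) rows do not
depend on DR7 and would be tested separately.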
Figure 16-2. DR6/DR7 Layout on Processors Supporting Intel 64 Technology
[Figure not reproduced: bits 63:32 of DR6 and DR7 are reserved; bits 31:0 hold
the DR6 and DR7 flags and fields described in the preceding sections.]
See also: Chapter 6, "Interrupt 1 - Debug Exception (#DB)," in the Intel 64 and
IA-32 Architectures Software Developer's Manual, Volume 3A.
16.3.1.1 Instruction-Breakpoint Exception Condition
The processor reports an instruction breakpoint when it attempts to execute an
instruction at an address specified in a breakpoint-address register (DR0 through
DR3) that has been set up to detect instruction execution (R/W flag is set to 0).
Upon reporting the instruction breakpoint, the processor generates a fault-class
debug exception (#DB) before it executes the target instruction for the
breakpoint.
Instruction breakpoints are the highest priority debug exceptions. They are
serviced before any other exceptions detected during the decoding or execution of
an instruction. However, if a code instruction breakpoint is placed on an
instruction located immediately after a POP SS/MOV SS instruction, the breakpoint
may not be triggered. In most situations, POP SS/MOV SS will inhibit such
interrupts (see "MOV - Move" and "POP - Pop a Value from the Stack" in Chapters 3
and 4 of the Intel 64 and IA-32 Architectures Software Developer's Manual,
Volumes 2A & 2B).
Because the debug exception for an instruction breakpoint is generated before the
instruction is executed, if the instruction breakpoint is not removed by the
exception handler, the processor will detect the instruction breakpoint again when
the instruction is restarted and generate another debug exception. To prevent
looping on an instruction breakpoint, the Intel 64 and IA-32 architectures provide
the RF flag (resume flag) in the EFLAGS register (see Section 2.3, "System Flags
and Fields in the EFLAGS Register," in the Intel 64 and IA-32 Architectures
Software Developer's Manual, Volume 3A). When the RF flag is set, the processor
ignores instruction breakpoints.

Table 16-2. Debug Exception Conditions

Debug or Breakpoint Condition            DR6 Flags Tested   DR7 Flags Tested   Exception Class
Single-step trap                         BS = 1                                Trap
Instruction breakpoint, at addresses     Bn = 1 and         R/Wn = 0           Fault
defined by DRn and LENn                  (Gn or Ln = 1)
Data write breakpoint, at addresses      Bn = 1 and         R/Wn = 1           Trap
defined by DRn and LENn                  (Gn or Ln = 1)
I/O read or write breakpoint, at         Bn = 1 and         R/Wn = 2           Trap
addresses defined by DRn and LENn        (Gn or Ln = 1)
Data read or write (but not              Bn = 1 and         R/Wn = 3           Trap
instruction fetches), at addresses       (Gn or Ln = 1)
defined by DRn and LENn
General detect fault, resulting from     BD = 1                                Fault
an attempt to modify debug registers
(usually in conjunction with
in-circuit emulation)
Task switch                              BT = 1                                Trap
All Intel 64 and IA-32 processors manage the RF flag as follows. The RF flag is
cleared at the start of the instruction after the check for code breakpoint, CS
limit violation and FP exceptions. Task switches and IRETD/IRETQ instructions
transfer the RF image from the TSS/stack to the EFLAGS register.
When calling an event handler, Intel 64 and IA-32 processors establish the value
of the RF flag in the EFLAGS image pushed on the stack:
- For any fault-class exception except a debug exception generated in response to
  an instruction breakpoint, the value pushed for RF is 1.
- For any interrupt arriving after any iteration of a repeated string instruction
  but the last iteration, the value pushed for RF is 1.
- For any trap-class exception generated by any iteration of a repeated string
  instruction but the last iteration, the value pushed for RF is 1.
- For other cases, the value pushed for RF is the value that was in EFLAGS.RF at
  the time the event handler was called. This includes:
    - Debug exceptions generated in response to instruction breakpoints
    - Hardware-generated interrupts arriving between instructions (including
      those arriving after the last iteration of a repeated string instruction)
    - Trap-class exceptions generated after an instruction completes (including
      those generated after the last iteration of a repeated string instruction)
    - Software-generated interrupts (RF is pushed as 0, since it was cleared at
      the start of the software interrupt)
As noted above, the processor does not set the RF flag prior to calling the debug
exception handler for debug exceptions resulting from instruction breakpoints. The
debug exception handler can prevent recurrence of the instruction breakpoint by
setting the RF flag in the EFLAGS image on the stack. If the RF flag in the EFLAGS
image is set when the processor returns from the exception handler, it is copied
into the RF flag in the EFLAGS register by IRETD/IRETQ or a task switch that
causes the return. The processor then ignores instruction breakpoints for the
duration of the next instruction. (Note that the POPF, POPFD, and IRET
instructions do not transfer the RF image into the EFLAGS register.) Setting the
RF flag does not prevent other types of debug-exception conditions (such as I/O or
data breakpoints) from being detected, nor does it prevent non-debug exceptions
from being generated.
For the Pentium processor, when an instruction breakpoint coincides with another
fault-type exception (such as a page fault), the processor may generate one
spurious debug exception after the second exception has been handled, even though
the debug exception handler set the RF flag in the EFLAGS image. To prevent a
spurious exception with Pentium processors, all fault-class exception handlers
should set the RF flag in the EFLAGS image.
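The rules for the pushed RF value, and the handler's response to an instruction
breakpoint, can be summarized in a few lines. This is a minimal sketch that
models the rules as stated above; the event_kind enumeration and both function
names are hypothetical, and locating the EFLAGS image in a real exception frame
depends on the operating mode and is not shown. RF is bit 16 of EFLAGS.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_RF (1u << 16) /* resume flag */

/* Hypothetical classification of the event that invokes a handler. */
enum event_kind {
    EV_INSN_BP_DEBUG_FAULT, /* #DB caused by an instruction breakpoint */
    EV_OTHER_FAULT,         /* any other fault-class exception */
    EV_INTERRUPT,           /* hardware or software interrupt */
    EV_TRAP                 /* trap-class exception */
};

/* Value of RF in the EFLAGS image pushed for the handler.
 * before_last_rep_iteration: the event occurred after an iteration of a
 * repeated string instruction other than the last one.
 * rf_now: EFLAGS.RF at the time the handler is called. */
static bool rf_pushed(enum event_kind kind, bool before_last_rep_iteration,
                      bool rf_now)
{
    if (kind == EV_OTHER_FAULT)
        return true;
    if ((kind == EV_INTERRUPT || kind == EV_TRAP) && before_last_rep_iteration)
        return true;
    return rf_now; /* all other cases keep the current value */
}

/* What a #DB handler does to avoid re-triggering an instruction breakpoint:
 * set RF in the saved image so that IRETD/IRETQ restores it. */
static uint32_t eflags_image_set_rf(uint32_t image)
{
    return image | EFLAGS_RF;
}
```

Setting RF in the image only suppresses instruction breakpoints for the next
instruction; data and I/O breakpoints remain active.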
16.3.1.2 Data Memory and I/O Breakpoint Exception Conditions
Data memory and I/O breakpoints are reported when the processor attempts to
access a memory or I/O address specified in a breakpoint-address register (DR0
through DR3) that has been set up to detect data or I/O accesses (R/W flag is set
to 1, 2, or 3). The processor generates the exception after it executes the
instruction that made the access, so these breakpoint conditions cause a
trap-class exception to be generated.
Because data breakpoints are traps, the original data is overwritten before the
trap exception is generated. If a debugger needs to save the contents of a write
breakpoint location, it should save the original contents before setting the
breakpoint. The handler can report the saved value after the breakpoint is
triggered. The address in the debug registers can be used to locate the new value
stored by the instruction that triggered the breakpoint.
Intel486 and later processors ignore the GE and LE flags in DR7. In Intel386
processors, exact data breakpoint matching does not occur unless it is enabled by
setting the LE and/or the GE flags.
P6 family processors are unable to report data breakpoints exactly for the REP
MOVS and REP STOS instructions until the completion of the iteration after the
iteration in which the breakpoint occurred.
For repeated INS and OUTS instructions that generate an I/O-breakpoint debug
exception, the processor generates the exception after the completion of the first
iteration. Repeated INS and OUTS instructions generate a memory-breakpoint debug
exception after the iteration in which the memory address breakpoint location is
accessed.
16.3.1.3 General-Detect Exception Condition
When the GD flag in DR7 is set, the general-detect debug exception occurs when a
program attempts to access any of the debug registers (DR0 through DR7) at the
same time they are being used by another application, such as an emulator or
debugger. This protection feature guarantees full control over the debug registers
when required. The debug exception handler can detect this condition by checking
the state of the BD flag in the DR6 register. The processor generates the
exception before it executes the MOV instruction that accesses a debug register,
which causes a fault-class exception to be generated.
16.3.1.4 Single-Step Exception Condition
The processor generates a single-step debug exception if (while an instruction is
being executed) it detects that the TF flag in the EFLAGS register is set. The
exception is a trap-class exception, because the exception is generated after the
instruction is executed. The processor will not generate this exception after the
instruction that sets the TF flag. For example, if the POPF instruction is used to
set the TF flag, a single-step trap does not occur until after the instruction
that follows the POPF instruction.
The processor clears the TF flag before calling the exception handler. If the TF
flag was set in a TSS at the time of a task switch, the exception occurs after the
first instruction is executed in the new task.
The TF flag normally is not cleared by privilege changes inside a task. The INT n
and INTO instructions, however, do clear this flag. Therefore, software debuggers
that single-step code must recognize and emulate INT n or INTO instructions rather
than executing them directly. To maintain protection, the operating system should
check the CPL after any single-step trap to see if single stepping should continue
at the current privilege level.
The interrupt priorities guarantee that, if an external interrupt occurs, single
stepping stops. When both an external interrupt and a single-step interrupt occur
together, the single-step interrupt is processed first. This operation clears the
TF flag. After saving the return address or switching tasks, the external
interrupt input is examined before the first instruction of the single-step
handler executes. If the external interrupt is still pending, then it is serviced.
The external interrupt handler does not run in single-step mode. To single step an
interrupt handler, single step an INT n instruction that calls the interrupt
handler.
16.3.1.5 Task-Switch Exception Condition
The processor generates a debug exception after a task switch if the T flag of the
new task's TSS is set. This exception is generated after program control has
passed to the new task, and prior to the execution of the first instruction of
that task. The exception handler can detect this condition by examining the BT
flag of the DR6 register.
If entry 1 (#DB) in the IDT is a task gate, the T bit of the corresponding TSS
should not be set. Failure to observe this rule will put the processor in a loop.
16.3.2 Breakpoint Exception (#BP) - Interrupt Vector 3
The breakpoint exception (interrupt 3) is caused by execution of an INT 3
instruction. See Chapter 6, "Interrupt 3 - Breakpoint Exception (#BP)." Debuggers
use breakpoint exceptions in the same way that they use the breakpoint registers;
that is, as a mechanism for suspending program execution to examine registers and
memory locations. With earlier IA-32 processors, breakpoint exceptions are used
extensively for setting instruction breakpoints.
With the Intel386 and later IA-32 processors, it is more convenient to set
breakpoints with the breakpoint-address registers (DR0 through DR3). However, the
breakpoint exception still is useful for breakpointing debuggers, because a
breakpoint exception can call a separate exception handler. The breakpoint
exception is also useful when it is necessary to set more breakpoints than there
are debug registers or when breakpoints are being placed in the source code of a
program under development.
16.4 LAST BRANCH, INTERRUPT, AND EXCEPTION
RECORDING OVERVIEW
P6 family processors introduced the ability to set breakpoints on taken branches,
interrupts, and exceptions, and to single-step from one branch to the next. This
capability has been modified and extended in the Pentium 4, Intel Xeon, Pentium M,
Intel Core Solo, Intel Core Duo, Intel Core 2 Duo, Intel Core i7 and Intel Atom
processors to allow logging of branch trace messages in a branch trace store (BTS)
buffer in memory.
See the following sections for processor-specific implementation of last branch,
interrupt and exception recording:
- Section 16.5, "Last Branch, Interrupt, and Exception Recording (Intel Core 2
  Duo and Intel Atom Processor Family)"
- Section 16.6, "Last Branch, Interrupt, and Exception Recording (Intel Core i7
  Processor Family)"
- Section 16.7, "Last Branch, Interrupt, and Exception Recording (Processors
  based on Intel NetBurst Microarchitecture)"
- Section 16.8, "Last Branch, Interrupt, and Exception Recording (Intel Core
  Solo and Intel Core Duo Processors)"
- Section 16.9, "Last Branch, Interrupt, and Exception Recording (Pentium M
  Processors)"
- Section 16.10, "Last Branch, Interrupt, and Exception Recording (P6 Family
  Processors)"
The following subsections of Section 16.4 describe common features of profiling
branches. These features are generally enabled using the IA32_DEBUGCTL MSR
(older processors may have implemented a subset or model-specific features; see
definitions of MSR_DEBUGCTLA, MSR_DEBUGCTLB, MSR_DEBUGCTL).
16.4.1 IA32_DEBUGCTL MSR
The IA32_DEBUGCTL MSR provides bit field controls to enable debug trace
interrupts, debug trace stores, trace messages, single stepping on branches, and
last branch record recording, and to control freezing of the LBR stack or
performance counters on a PMI request. The IA32_DEBUGCTL MSR is located at
register address 01D9H.
See Figure 16-3 for the MSR layout and the bullets below for a description of the
flags:
- LBR (last branch/interrupt/exception) flag (bit 0): When set, the
  processor records a running trace of the most recent branches, interrupts,
  and/or exceptions taken by the processor (prior to a debug exception being
  generated) in the last branch record (LBR) stack. For more information, see
  Section 16.5.1, "LBR Stack."
- BTF (single-step on branches) flag (bit 1): When set, the processor treats
  the TF flag in the EFLAGS register as a "single-step on branches" flag rather
  than a "single-step on instructions" flag. This mechanism allows
  single-stepping the processor on taken branches, interrupts, and exceptions.
  See Section 16.4.3, "Single-Stepping on Branches, Exceptions, and Interrupts,"
  for more information about the BTF flag.
- TR (trace message enable) flag (bit 6): When set, branch trace messages
  are enabled. When the processor detects a taken branch, interrupt, or
  exception, it sends the branch record out on the system bus as a branch trace
  message (BTM). See Section 16.4.4, "Branch Trace Messages," for more
  information about the TR flag.
- BTS (branch trace store) flag (bit 7): When set, the flag enables BTS
  facilities to log BTMs to a memory-resident BTS buffer that is part of the DS
  save area. See Section 16.4.9, "BTS and DS Save Area."
- BTINT (branch trace interrupt) flag (bit 8): When set, the BTS facilities
  generate an interrupt when the BTS buffer is full. When clear, BTMs are logged
  to the BTS buffer in a circular fashion. See Section 16.4.5, "Branch Trace
  Store (BTS)," for a description of this mechanism.
- BTS_OFF_OS (branch trace off in privileged code) flag (bit 9): When set,
  BTS or BTM is skipped if CPL is 0. See Section 16.7.2.
- BTS_OFF_USR (branch trace off in user code) flag (bit 10): When set,
  BTS or BTM is skipped if CPL is greater than 0. See Section 16.7.2.
Figure 16-3. IA32_DEBUGCTL MSR for Processors based on Intel Core
microarchitecture
[Figure not reproduced: bit 0 LBR, bit 1 BTF, bit 6 TR, bit 7 BTS, bit 8 BTINT,
bit 9 BTS_OFF_OS, bit 10 BTS_OFF_USR, bit 11 FREEZE_LBRS_ON_PMI, bit 12
FREEZE_PERFMON_ON_PMI, bit 14 FREEZE_WHILE_SMM_EN; remaining bits reserved.]
- FREEZE_LBRS_ON_PMI flag (bit 11): When set, the LBR stack is frozen on a
  hardware PMI request (e.g. when a counter overflows and is configured to
  trigger PMI).
- FREEZE_PERFMON_ON_PMI flag (bit 12): When set, a PMI request clears
  each of the ENABLE fields of the MSR_PERF_GLOBAL_CTRL MSR (see Figure 30-3) to
  disable all the counters.
- FREEZE_WHILE_SMM_EN (bit 14): If this bit is set, upon the delivery of an
  SMI, the processor will clear all the enable bits of IA32_PERF_GLOBAL_CTRL,
  save a copy of the content of IA32_DEBUGCTL and disable the LBR, BTF, TR, and
  BTS fields of IA32_DEBUGCTL before transferring control to the SMI handler.
  Subsequently, after the SMI handler issues RSM to complete its service, the
  enable bits of IA32_PERF_GLOBAL_CTRL will be set to 1 and the saved copy of
  IA32_DEBUGCTL prior to SMI delivery will be restored. Note that system software
  must check IA32_DEBUGCTL. to determine if the processor supports the
  FREEZE_WHILE_SMM_EN control bit. FREEZE_WHILE_SMM_EN is supported if
  IA32_PERF_CAPABILITIES.FREEZE_WHILE_SMM[Bit 12] is reporting 1. See
  Section 30.11 for details of detecting the presence of the
  IA32_PERF_CAPABILITIES MSR.
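The bit assignments listed above translate directly into mask constants. The
sketch below defines them and models two of the rules from the text: the
IA32_PERF_CAPABILITIES check for FREEZE_WHILE_SMM_EN support, and the
IA32_DEBUGCTL value in effect while an SMI handler runs. The macro and
function names are hypothetical, and reading or writing the MSR itself
(RDMSR/WRMSR at privilege level 0) is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define IA32_DEBUGCTL_ADDR             0x1D9u /* register address 01D9H */

#define DEBUGCTL_LBR                   (1ull << 0)
#define DEBUGCTL_BTF                   (1ull << 1)
#define DEBUGCTL_TR                    (1ull << 6)
#define DEBUGCTL_BTS                   (1ull << 7)
#define DEBUGCTL_BTINT                 (1ull << 8)
#define DEBUGCTL_BTS_OFF_OS            (1ull << 9)
#define DEBUGCTL_BTS_OFF_USR           (1ull << 10)
#define DEBUGCTL_FREEZE_LBRS_ON_PMI    (1ull << 11)
#define DEBUGCTL_FREEZE_PERFMON_ON_PMI (1ull << 12)
#define DEBUGCTL_FREEZE_WHILE_SMM_EN   (1ull << 14)

/* FREEZE_WHILE_SMM_EN is supported if bit 12 of IA32_PERF_CAPABILITIES is 1.
 * The caller is assumed to have read that MSR already. */
static bool freeze_while_smm_supported(uint64_t perf_capabilities)
{
    return ((perf_capabilities >> 12) & 1u) != 0;
}

/* Model of the IA32_DEBUGCTL value in effect inside the SMI handler:
 * with FREEZE_WHILE_SMM_EN set, LBR, BTF, TR and BTS are disabled. */
static uint64_t debugctl_seen_by_smi_handler(uint64_t debugctl)
{
    if (!(debugctl & DEBUGCTL_FREEZE_WHILE_SMM_EN))
        return debugctl;
    return debugctl & ~(DEBUGCTL_LBR | DEBUGCTL_BTF | DEBUGCTL_TR | DEBUGCTL_BTS);
}
```

The saved copy is restored after RSM, so the model only describes the window
in which the SMI handler executes.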
16.4.2 Monitoring Branches, Exceptions, and Interrupts
When the LBR flag (bit 0) in the IA32_DEBUGCTL MSR is set, the processor
automatically begins recording branch records for taken branches, interrupts, and
exceptions (except for debug exceptions) in the LBR stack MSRs.
When the processor generates a debug exception (#DB), it automatically clears the
LBR flag before executing the exception handler. This action does not clear
previously stored LBR stack MSRs. The branch records for the last four taken
branches, interrupts and/or exceptions are retained for analysis.
A debugger can use the linear addresses in the LBR stack to re-set breakpoints in
the breakpoint address registers (DR0 through DR3). This allows a backward trace
from the manifestation of a particular bug toward its source.
If the LBR flag is cleared and the TR flag in the IA32_DEBUGCTL MSR remains set,
the processor will continue to update LBR stack MSRs. This is because BTM
information must be generated from entries in the LBR stack. A #DB does not
automatically clear the TR flag.
16.4.3 Single-Stepping on Branches, Exceptions, and Interrupts
When software sets both the BTF flag (bit 1) in the IA32_DEBUGCTL MSR and the TF
flag in the EFLAGS register, the processor generates a single-step debug exception
the next time it takes a branch, services an interrupt, or generates an exception.
This mechanism allows the debugger to single-step on control transfers caused by
branches, interrupts, and exceptions. This "control-flow single stepping" helps
isolate a bug to a particular block of code before instruction single-stepping
further narrows the search. If the BTF flag is set when the processor generates a
debug exception, the processor clears the BTF flag along with the TF flag. The
debugger must reset the BTF and TF flags before resuming program execution to
continue control-flow single stepping.
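The flag changes that accompany a debug exception can be captured in a small
state model: a #DB clears LBR, BTF and TF but leaves TR set, and the debugger
must set BTF and TF again to continue stepping on branches. The following is a
sketch of that bookkeeping only, using a hypothetical step_state structure as a
stand-in for the processor state; it does not access the real MSR or EFLAGS
register.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEBUGCTL_LBR (1ull << 0)
#define DEBUGCTL_BTF (1ull << 1)
#define DEBUGCTL_TR  (1ull << 6)
#define EFLAGS_TF    (1u << 8)  /* trap flag */

/* Hypothetical snapshot of the state relevant to branch stepping. */
struct step_state {
    uint64_t debugctl; /* IA32_DEBUGCTL value */
    uint32_t eflags;   /* EFLAGS value of the program being debugged */
};

/* Effect of a debug exception as described in the text:
 * LBR, BTF and TF are cleared; TR is not. */
static struct step_state on_debug_exception(struct step_state s)
{
    s.debugctl &= ~(DEBUGCTL_LBR | DEBUGCTL_BTF);
    s.eflags &= ~EFLAGS_TF;
    return s;
}

/* What the debugger restores before resuming to keep stepping on branches. */
static struct step_state rearm_branch_step(struct step_state s)
{
    s.debugctl |= DEBUGCTL_BTF;
    s.eflags |= EFLAGS_TF;
    return s;
}

/* The LBR stack MSRs keep updating while either LBR or TR is set. */
static bool lbr_stack_still_updating(uint64_t debugctl)
{
    return (debugctl & (DEBUGCTL_LBR | DEBUGCTL_TR)) != 0;
}
```

A debugger that also wants last-branch recording to continue would set LBR
again at the same point, since the processor does not restore it.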
16.4.4 Branch Trace Messages
Setting the TR flag (bit 6) in the IA32_DEBUGCTL MSR enables branch trace
messages (BTMs). Thereafter, when the processor detects a branch, exception, or
interrupt, it sends a branch record out on the system bus as a BTM. A debugging
device that is monitoring the system bus can read these messages and synchronize
operations with taken branch, interrupt, and exception events.
When interrupts or exceptions occur in conjunction with a taken branch, additional
BTMs are sent out on the bus, as described in Section 16.4.2, "Monitoring
Branches, Exceptions, and Interrupts."
Unlike the P6 family and Core family processors, the Pentium 4, Atom, and Intel
Xeon processors can collect branch records in the LBR stack MSRs while at the same
time sending/storing BTMs when both the TR and LBR flags are set in the
IA32_DEBUGCTL MSR (in the case of the Pentium 4 processor, MSR_DEBUGCTLA).
16.4.5 Branch Trace Store (BTS)
A trace of taken branches, interrupts, and exceptions is useful for debugging code by
providing a method of determining the decision path taken to reach a particular code
location. The LBR flag (bit 0) of IA32_DEBUGCTL provides a mechanism for capturing
records of taken branches, interrupts, and exceptions and saving them in the last
branch record (LBR) stack MSRs; setting the TR flag sends them out onto the
system bus as BTMs. The branch trace store (BTS) mechanism provides the additional
capability of saving the branch records in a memory-resident BTS buffer, which
is part of the DS save area. The BTS buffer can be configured to be circular so that
the most recent branch records are always available, or it can be configured to
generate an interrupt when the buffer is nearly full so that all the branch records can
be saved. The BTINT flag (bit 8) can be used to enable the generation of an interrupt
when the BTS buffer is full. See Section 16.4.9.2, "Setting Up the DS Save Area," for
additional details.
Setting the BTS flag alone can greatly reduce the performance of the processor.
The CPL-qualified branch trace storing mechanism can help mitigate the performance
impact of sending/logging branch trace messages.
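As an illustration of the flag combinations described above, the following C sketch builds a candidate IA32_DEBUGCTL value for BTS operation. The helper name debugctl_enable_bts is hypothetical. LBR (bit 0) and BTINT (bit 8) are stated in this section; the TR (bit 6) and BTS (bit 7) positions are assumptions taken from the register layout described in Section 16.4.1. The sketch only computes a value and does not perform the privileged MSR write.

```c
#include <stdint.h>

/* LBR and BTINT positions are given in the text; TR and BTS positions are
 * assumed from the IA32_DEBUGCTL layout (Section 16.4.1). */
#define DEBUGCTL_LBR   (1ULL << 0)
#define DEBUGCTL_TR    (1ULL << 6)
#define DEBUGCTL_BTS   (1ULL << 7)
#define DEBUGCTL_BTINT (1ULL << 8)

/* Hypothetical helper: returns the value to write so that branch records
 * are stored in the BTS buffer. With interrupt_when_full == 0 the BTINT
 * flag is cleared (circular buffer); otherwise it is set. All other bits
 * of the current value are preserved. */
uint64_t debugctl_enable_bts(uint64_t current, int interrupt_when_full)
{
    uint64_t v = current | DEBUGCTL_TR | DEBUGCTL_BTS;

    if (interrupt_when_full)
        v |= DEBUGCTL_BTINT;
    else
        v &= ~DEBUGCTL_BTINT;
    return v;
}
```

Starting from a value that is read back from the MSR, rather than from zero, keeps unrelated flags such as LBR unchanged.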
16.4.6 CPL-Qualified Branch Trace Mechanism
The CPL-qualified branch trace mechanism is available to a subset of Intel 64 and IA-32
processors that support the branch trace storing mechanism. The processor supports
the CPL-qualified branch trace mechanism if CPUID.01H:ECX[bit 4] = 1.
The CPL-qualified branch trace mechanism is described in Section 16.4.9.4. System
software can selectively specify CPL qualification so that branch trace messages
associated with a specified privilege level are not sent or stored. Two bit fields,
BTS_OFF_USR (bit 10) and BTS_OFF_OS (bit 9), are provided in the debug control
register to specify the CPL of BTMs that will not be logged in the BTS buffer or sent
on the bus.
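A minimal sketch of the check and the two filter bits described above, in C. The function names are hypothetical, and the caller is assumed to supply the ECX value already obtained from CPUID leaf 01H and the current debug control register value; nothing here executes CPUID or writes an MSR.

```c
#include <stdint.h>
#include <stdbool.h>

#define CPUID1_ECX_DS_CPL    (1u << 4)     /* CPUID.01H:ECX[bit 4] */
#define DEBUGCTL_BTS_OFF_OS  (1ULL << 9)   /* do not log BTMs at CPL = 0 */
#define DEBUGCTL_BTS_OFF_USR (1ULL << 10)  /* do not log BTMs at CPL > 0 */

/* Hypothetical helper: true if CPL-qualified branch tracing is reported. */
bool cpl_qualified_bts_supported(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & CPUID1_ECX_DS_CPL) != 0;
}

/* Hypothetical helper: returns the debug control value with the CPL filter
 * replaced. The value is returned unchanged when the feature is not
 * reported by CPUID. */
uint64_t debugctl_set_cpl_filter(uint64_t debugctl, uint32_t cpuid1_ecx,
                                 bool skip_os, bool skip_user)
{
    if (!cpl_qualified_bts_supported(cpuid1_ecx))
        return debugctl;

    debugctl &= ~(DEBUGCTL_BTS_OFF_OS | DEBUGCTL_BTS_OFF_USR);
    if (skip_os)
        debugctl |= DEBUGCTL_BTS_OFF_OS;
    if (skip_user)
        debugctl |= DEBUGCTL_BTS_OFF_USR;
    return debugctl;
}
```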
16.4.7 Freezing LBR and Performance Counters on PMI
Many issues may generate a performance monitoring interrupt (PMI); a PMI service
handler must determine the cause in order to handle the situation. Two capabilities that
allow a PMI service routine to improve branch tracing and performance monitoring
are:
• Freezing LBRs on PMI (bit 11): The processor freezes LBRs on a PMI request
  by clearing the LBR bit (bit 0) in IA32_DEBUGCTL. Software must then re-enable
  IA32_DEBUGCTL[0] to continue monitoring branches. When using this feature,
  software should be careful about writes to IA32_DEBUGCTL to avoid re-enabling
  LBRs by accident if they were just disabled.
• Freezing PMCs on PMI (bit 12): The processor freezes the performance
  counters on a PMI request by clearing the MSR_PERF_GLOBAL_CTRL MSR (see
  Figure 30-3). The PMCs affected include both general-purpose counters and
  fixed-function counters (see Section 30.4.1, "Fixed-function Performance
  Counters"). Software must re-enable counts by writing 1s to the corresponding
  enable bits in MSR_PERF_GLOBAL_CTRL before leaving a PMI service routine to
  continue counter operation.
Freezing of LBRs and PMCs on PMIs occurs when:
• A performance counter had an overflow and was programmed to signal a PMI in
  case of an overflow.
  - For the general-purpose counters, this is done by setting bit 20 of the
    IA32_PERFEVTSELx register.
  - For the fixed-function counters, this is done by setting the 3rd bit in the
    corresponding 4-bit control field of the MSR_PERF_FIXED_CTR_CTRL register
    (see Figure 30-1) or IA32_FIXED_CTR_CTRL MSR (see Figure 30-2).
• The PEBS buffer is almost full and reaches the interrupt threshold.
• The BTS buffer is almost full and reaches the interrupt threshold.
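The two PMI-enable settings listed above can be sketched as bit operations in C. The function names are hypothetical. The sketch assumes that "the 3rd bit" of a fixed-function control field means bit 3 of that field counting from zero, so that counter i uses bit 4*i + 3; check this against the register figures before relying on it.

```c
#include <stdint.h>

/* Bit 20 of IA32_PERFEVTSELx: signal a PMI when the counter overflows. */
#define PERFEVTSEL_PMI_ON_OVERFLOW (1ULL << 20)

/* Hypothetical helper for a general-purpose counter's event select value. */
uint64_t perfevtsel_enable_pmi(uint64_t evtsel)
{
    return evtsel | PERFEVTSEL_PMI_ON_OVERFLOW;
}

/* Hypothetical helper for the fixed-function control register. Each
 * counter owns a 4-bit field; the PMI bit is assumed to be bit 3 of the
 * field, i.e. register bit 4 * counter_index + 3. */
uint64_t fixed_ctr_ctrl_enable_pmi(uint64_t ctrl, unsigned counter_index)
{
    return ctrl | (1ULL << (4 * counter_index + 3));
}
```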
16.4.8 LBR Stack
The last branch record stack and top-of-stack (TOS) pointer MSRs are supported
across Intel 64 and IA-32 processor families. However, the number of MSRs in the
LBR stack and the valid range of the TOS pointer value can vary between different
processor families. Table 16-3 lists the LBR stack size and TOS pointer range for
several processor families according to the CPUID signatures of the
DisplayFamily/DisplayModel encoding (see the CPUID instruction in Chapter 3 of the
Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2A).
The last branch recording mechanism tracks not only branch instructions (like JMP,
Jcc, LOOP, and CALL instructions), but also other operations that cause a change in
the instruction pointer (like external interrupts, traps, and faults). The branch
recording mechanisms generally employ a set of MSRs, referred to as the last branch
record (LBR) stack. The size and exact locations of the LBR stack are generally
model-specific.
• Last Branch Record (LBR) Stack: The LBR stack consists of N pairs of MSRs (N is
  listed in the LBR stack size column of Table 16-3) that store the source and
  destination addresses of recent branches (see Figure 16-3):
  - MSR_LASTBRANCH_0_FROM_IP (address is model specific) through the next
    consecutive (N-1) MSR addresses store source addresses.
  - MSR_LASTBRANCH_0_TO_IP (address is model specific) through the next
    consecutive (N-1) MSR addresses store destination addresses.
• Last Branch Record Top-of-Stack (TOS) Pointer: The least significant M
  bits of the TOS Pointer MSR (MSR_LASTBRANCH_TOS, address is model specific)
  contain an M-bit pointer to the MSR in the LBR stack that contains the most
  recent branch, interrupt, or exception recorded. The valid range of the M-bit TOS
  pointer is given in Table 16-3.
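The relationship between the TOS pointer and the N-entry stack can be sketched in C as a ring-buffer walk. This is only an illustration: the type and function names are hypothetical, the stack is assumed to have been copied from the MSRs into an array beforehand, and the sketch assumes that successively older records sit at decreasing indices (wrapping around), which this section does not state explicitly.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical snapshot of one FROM/TO MSR pair. */
struct lbr_entry {
    uint64_t from;
    uint64_t to;
};

/* Copies a snapshot of the LBR stack into out[], most recent record first.
 * stack[i] holds the pair read from MSR index i; n is the stack size from
 * Table 16-3 and must be a power of two (4, 8, or 16 there). Only the low
 * M = log2(n) bits of the TOS MSR value are used. Assumes older records
 * are found at decreasing indices, wrapping from 0 to n - 1. */
void lbr_newest_first(const struct lbr_entry *stack, size_t n,
                      uint64_t tos_msr, struct lbr_entry *out)
{
    size_t tos = (size_t)(tos_msr & (n - 1));

    for (size_t k = 0; k < n; k++)
        out[k] = stack[(tos + n - k) & (n - 1)];
}
```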
Table 16-3. LBR Stack Size and TOS Pointer Range
DisplayFamily_DisplayModel        Size of LBR Stack   Range of TOS Pointer
06_1AH, 06_1EH, 06_1FH, 06_2EH    16                  0 to 15
06_17H, 06_1DH                    4                   0 to 3
06_0FH                            4                   0 to 3
06_1CH                            8                   0 to 7
16.4.8.1 LBR Stack and Intel 64 Processors
LBR MSRs are 64 bits wide. If IA-32e mode is disabled, only the lower 32 bits of the
address are recorded. If IA-32e mode is enabled, the processor writes 64-bit values
into the MSR.
In 64-bit mode, last branch records store 64-bit addresses; in compatibility mode,
the upper 32 bits of last branch records are cleared.
Software should query the architectural MSR IA32_PERF_CAPABILITIES[5:0]
for the format of the address that is stored in the LBR stack. Four formats are
defined by the following encoding:
• 000000B (32-bit record format): Stores the 32-bit offset in the current CS of the
  respective source/destination.
• 000001B (64-bit LIP record format): Stores the 64-bit linear address of the
  respective source/destination.
• 000010B (64-bit EIP record format): Stores the 64-bit offset (effective
  address) of the respective source/destination.
• 000011B (64-bit EIP record format and flags): Stores the 64-bit offset
  (effective address) of the respective source/destination. LBR flags are supported
  in the upper bits of the FROM register in the LBR stack. See the LBR stack details
  below for flag support and definition.
Processor support for the architectural MSR IA32_PERF_CAPABILITIES is
indicated by CPUID.01H:ECX[PERF_CAPAB_MSR] (bit 15).
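The four encodings above map naturally onto an enumeration. The following C sketch decodes bits 5:0 of a value that the caller is assumed to have already read from IA32_PERF_CAPABILITIES; the enumerator and function names are hypothetical, and encodings not listed in this section are reported as unknown rather than guessed.

```c
#include <stdint.h>

/* Hypothetical names for the LBR address formats listed above. */
enum lbr_format {
    LBR_FMT_UNKNOWN         = -1,
    LBR_FMT_32BIT           = 0,  /* 000000B */
    LBR_FMT_64BIT_LIP       = 1,  /* 000001B */
    LBR_FMT_64BIT_EIP       = 2,  /* 000010B */
    LBR_FMT_64BIT_EIP_FLAGS = 3   /* 000011B */
};

/* Decodes IA32_PERF_CAPABILITIES[5:0] from an already-read MSR value. */
enum lbr_format lbr_format_from_perf_capabilities(uint64_t perf_cap)
{
    unsigned fmt = (unsigned)(perf_cap & 0x3F);

    return fmt <= 3 ? (enum lbr_format)fmt : LBR_FMT_UNKNOWN;
}
```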
16.4.8.2 LBR Stack and IA-32 Processors
The LBR MSRs in IA-32 processors introduced prior to Intel 64 architecture store the
32-bit "To Linear Address" and "From Linear Address" using the high and low half of
each 64-bit MSR.
Figure 16-4. 64-bit Address Layout of LBR MSR
[Figure: bits 63:0 of MSR_LASTBRANCH_0_FROM_IP through MSR_LASTBRANCH_(N-1)_FROM_IP
hold the Source Address; bits 63:0 of MSR_LASTBRANCH_0_TO_IP through
MSR_LASTBRANCH_(N-1)_TO_IP hold the Destination Address.]
16.4.8.3 Last Exception Records and Intel 64 Architecture
Intel 64 and IA-32 processors also provide MSRs that store the branch record for the
last branch taken prior to an exception or an interrupt. The locations of the last
exception record (LER) MSRs are model specific. The MSRs that store last exception
records are 64 bits wide. If IA-32e mode is disabled, only the lower 32 bits of the
address are recorded. If IA-32e mode is enabled, the processor writes 64-bit values
into the MSR. In 64-bit mode, last exception records store 64-bit addresses; in
compatibility mode, the upper 32 bits of last exception records are cleared.
16.4.9 BTS and DS Save Area
The Debug store (DS) feature flag (bit 21), returned by CPUID.1:EDX[21], indicates
that the processor provides the debug store (DS) mechanism. This mechanism
allows BTMs to be stored in a memory-resident BTS buffer. See Section 16.4.5,
"Branch Trace Store (BTS)." Precise event-based sampling (PEBS, see Section
30.4.4, "Precise Event Based Sampling (PEBS)") also uses the DS save area
provided by the debug store mechanism. When CPUID.1:EDX[21] is set, the following
BTS facilities are available:
• The BTS_UNAVAILABLE flag in the IA32_MISC_ENABLE MSR indicates (when
  clear) the availability of the BTS facilities, including the ability to set the BTS and
  BTINT bits in the MSR_DEBUGCTLA MSR.
• The IA32_DS_AREA MSR can be programmed to point to the DS save area.
The debug store (DS) save area is a software-designated area of memory that is
used to collect the following two types of information:
• Branch records: When the BTS flag in the IA32_DEBUGCTL MSR is set, a
  branch record is stored in the BTS buffer in the DS save area whenever a taken
  branch, interrupt, or exception is detected.
• PEBS records: When a performance counter is configured for PEBS, a PEBS
  record is stored in the PEBS buffer in the DS save area after the counter overflow
  occurs. This record contains the architectural state of the processor (state of the
  8 general purpose registers, EIP register, and EFLAGS register) at the next
  occurrence of the PEBS event that caused the counter to overflow. When the
  state information has been logged, the counter is automatically reset to a
  preselected value, and event counting begins again. This feature is available only
  for a subset of the performance events on processors that support PEBS.
NOTES
The DS save area and recording mechanism are not available in SMM.
The feature is disabled on transition to SMM. Similarly, DS
recording is disabled on the generation of a machine check exception
and is cleared on processor RESET and INIT. DS recording is available
in real-address mode.
The BTS and PEBS facilities may not be available on all processors.
The availability of these facilities is indicated by the
BTS_UNAVAILABLE and PEBS_UNAVAILABLE flags, respectively, in
the IA32_MISC_ENABLE MSR (see Appendix B).
The DS save area is divided into three parts (see Figure 16-5): buffer management
area, branch trace store (BTS) buffer, and PEBS buffer. The buffer management area
is used to define the location and size of the BTS and PEBS buffers. The processor
then uses the buffer management area to keep track of the branch and/or PEBS
records in their respective buffers and to record the performance counter reset value.
The linear address of the first byte of the DS buffer management area is specified
with the IA32_DS_AREA MSR.
The fields in the buffer management area are as follows:
• BTS buffer base: Linear address of the first byte of the BTS buffer. This
  address should point to a natural doubleword boundary.
• BTS index: Linear address of the first byte of the next BTS record to be written
  to. Initially, this address should be the same as the address in the BTS buffer
  base field.
• BTS absolute maximum: Linear address of the next byte past the end of the
  BTS buffer. This address should be a multiple of the BTS record size (12 bytes)
  plus 1.
• BTS interrupt threshold: Linear address of the BTS record on which an
  interrupt is to be generated. This address must point to an offset from the BTS
  buffer base that is a multiple of the BTS record size. Also, it must be several
  records short of the BTS absolute maximum address to allow a pending interrupt
  to be handled prior to the processor writing the BTS absolute maximum record.
• PEBS buffer base: Linear address of the first byte of the PEBS buffer. This
  address should point to a natural doubleword boundary.
• PEBS index: Linear address of the first byte of the next PEBS record to be
  written to. Initially, this address should be the same as the address in the PEBS
  buffer base field.
• PEBS absolute maximum: Linear address of the next byte past the end of the
  PEBS buffer. This address should be a multiple of the PEBS record size (40 bytes)
  plus 1.
• PEBS interrupt threshold: Linear address of the PEBS record on which an
  interrupt is to be generated. This address must point to an offset from the PEBS
  buffer base that is a multiple of the PEBS record size. Also, it must be several
  records short of the PEBS absolute maximum address to allow a pending
  interrupt to be handled prior to the processor writing the PEBS absolute maximum
  record.
Figure 16-5. DS Save Area
[Figure: the IA32_DS_AREA MSR points to the DS buffer management area, which holds
the following fields at the offsets shown: BTS Buffer Base (0H), BTS Index (4H), BTS
Absolute Maximum (8H), BTS Interrupt Threshold (CH), PEBS Buffer Base (10H), PEBS
Index (14H), PEBS Absolute Maximum (18H), PEBS Interrupt Threshold (1CH), PEBS
Counter Reset (20H and 24H), and Reserved (up to 30H). The BTS buffer holds Branch
Record 0 through Branch Record n; the PEBS buffer holds PEBS Record 0 through PEBS
Record n.]
• PEBS counter reset value: A 40-bit value that the counter is to be reset to
  after state information has been collected following counter overflow. This value
  allows state information to be collected after a preset number of events have been
  counted.
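The field list above, together with the offsets in Figure 16-5, can be expressed as a C structure for the non-IA-32e layout. This is a sketch under assumptions: the structure and member names are hypothetical, the 40-bit counter reset value is assumed to occupy the two doublewords at 20H and 24H, and the reserved space is assumed to run up to 30H as the figure suggests. The compile-time checks document those assumed offsets.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical view of the 32-bit DS buffer management area. */
struct ds_mgmt_area32 {
    uint32_t bts_buffer_base;           /* 0H  */
    uint32_t bts_index;                 /* 4H  */
    uint32_t bts_absolute_maximum;      /* 8H  */
    uint32_t bts_interrupt_threshold;   /* CH  */
    uint32_t pebs_buffer_base;          /* 10H */
    uint32_t pebs_index;                /* 14H */
    uint32_t pebs_absolute_maximum;     /* 18H */
    uint32_t pebs_interrupt_threshold;  /* 1CH */
    uint32_t pebs_counter_reset_lo;     /* 20H: bits 31:0 of the reset value */
    uint32_t pebs_counter_reset_hi;     /* 24H: bits 39:32 in the low byte   */
    uint32_t reserved[2];               /* 28H through 2FH (assumed)         */
};

_Static_assert(offsetof(struct ds_mgmt_area32, bts_interrupt_threshold) == 0x0C,
               "BTS interrupt threshold at CH");
_Static_assert(offsetof(struct ds_mgmt_area32, pebs_counter_reset_lo) == 0x20,
               "PEBS counter reset at 20H");
_Static_assert(sizeof(struct ds_mgmt_area32) == 0x30, "area ends at 30H");

/* Returns the 40-bit PEBS counter reset value held in the area. */
uint64_t ds32_counter_reset(const struct ds_mgmt_area32 *ds)
{
    return ((uint64_t)(ds->pebs_counter_reset_hi & 0xFFu) << 32) |
           ds->pebs_counter_reset_lo;
}
```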
Figure 16-6 shows the structure of a 12-byte branch record in the BTS buffer. The
fields in each record are as follows:
• Last branch from: Linear address of the instruction from which the branch,
  interrupt, or exception was taken.
• Last branch to: Linear address of the branch target or the first instruction in
  the interrupt or exception service routine.
• Branch predicted: Bit 4 of this field indicates whether the branch that was taken
  was predicted (set) or not predicted (clear).
Figure 16-7 shows the structure of the 40-byte PEBS records. Nominally the register
values are those at the beginning of the instruction that caused the event. However,
there are cases where the registers may be logged in a partially modified state. The
linear IP field shows the value in the EIP register translated from an offset into the
current code segment to a linear address.
Figure 16-6. 32-bit Branch Trace Record Format
[Figure: each field occupies bits 31:0. Last Branch From is at offset 0H, Last Branch
To at 4H, and Branch Predicted (bit 4) at 8H.]
16.4.9.1 DS Save Area and IA-32e Mode Operation
When IA-32e mode is active (IA32_EFER.LMA = 1), the structure of the DS save area
is shown in Figure 16-8. The organization of each field in IA-32e mode operation is
similar to that of non-IA-32e mode operation. However, each field now stores a
64-bit address. The IA32_DS_AREA MSR holds the 64-bit linear address of the first
byte of the DS buffer management area.
Figure 16-7. PEBS Record Format
[Figure: each field occupies bits 31:0. EFLAGS is at offset 0H, Linear IP at 4H, EAX at
8H, EBX at CH, ECX at 10H, EDX at 14H, ESI at 18H, EDI at 1CH, EBP at 20H, and ESP
at 24H.]
When IA-32e mode is active, the structure of a branch trace record is similar to that
shown in Figure 16-6, but each field is 8 bytes in length. This makes each BTS record
24 bytes (see Figure 16-9). The structure of a PEBS record is similar to that shown in
Figure 16-7, but each field is 8 bytes in length and the architectural state includes
registers R8 through R15. This makes the size of a PEBS record in 64-bit mode 144
bytes (see Figure 16-10).
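The record sizes quoted above follow directly from the field counts: 3 fields of 8 bytes for a branch record and 18 fields of 8 bytes for a PEBS record. A C sketch with hypothetical structure names, whose member order follows Figure 16-9 and Figure 16-10, makes the arithmetic checkable at compile time:

```c
#include <stdint.h>

/* Hypothetical view of a 64-bit branch trace record (Figure 16-9). */
struct bts_record64 {
    uint64_t last_branch_from;  /* 0H  */
    uint64_t last_branch_to;    /* 8H  */
    uint64_t flags;             /* 10H: bit 4 is "branch predicted" */
};

/* Hypothetical view of a 64-bit PEBS record (Figure 16-10). */
struct pebs_record64 {
    uint64_t rflags, rip;                             /* 0H, 8H    */
    uint64_t rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp;  /* 10H..48H  */
    uint64_t r8_to_r15[8];                            /* 50H..88H  */
};

_Static_assert(sizeof(struct bts_record64) == 24, "3 fields x 8 bytes");
_Static_assert(sizeof(struct pebs_record64) == 144, "18 fields x 8 bytes");

#define BTS_FLAG_PREDICTED (1ULL << 4)

/* Returns 1 if the "branch predicted" bit is set in the record. */
int bts_branch_predicted(const struct bts_record64 *r)
{
    return (r->flags & BTS_FLAG_PREDICTED) != 0;
}
```

Code that consumes the predicted bit should first confirm that the processor reports it, since this chapter notes that some microarchitectures do not support it.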
Figure 16-8. IA-32e Mode DS Save Area
[Figure: the IA32_DS_AREA MSR points to the DS buffer management area, which holds
the following fields at the offsets shown: BTS Buffer Base (0H), BTS Index (8H), BTS
Absolute Maximum (10H), BTS Interrupt Threshold (18H), PEBS Buffer Base (20H), PEBS
Index (28H), PEBS Absolute Maximum (30H), PEBS Interrupt Threshold (38H), PEBS
Counter Reset (40H), and Reserved (48H, ending at 50H). The BTS buffer holds Branch
Record 0 through Branch Record n; the PEBS buffer holds PEBS Record 0 through PEBS
Record n.]
Fields in the buffer management area of a DS save area are described in Section
16.4.9.
The formats of a branch trace record and a PEBS record are the same as the 64-bit
record formats shown in Figure 16-9 and Figure 16-10, with the exception that the
branch predicted bit is not supported by Intel Core microarchitecture or Intel Atom
microarchitecture. The 64-bit record formats for BTS and PEBS apply to the DS save
area for all operating modes.
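Figure 16-9. 64-bit Branch Trace Record Format
[Figure: each field occupies bits 63:0. Last Branch From is at offset 0H, Last Branch
To at 8H, and Branch Predicted (bit 4) at 10H.]
Figure 16-10. 64-bit PEBS Record Format
[Figure: each field occupies bits 63:0. RFLAGS is at offset 0H, RIP at 8H, RAX at 10H,
RBX at 18H, RCX at 20H, RDX at 28H, RSI at 30H, RDI at 38H, RBP at 40H, RSP at 48H,
and R8 through R15 at 50H through 88H.]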
The procedures used to program the IA32_DEBUGCTL MSR to set up a BTS buffer or a
CPL-qualified BTS are described in Section 16.4.9.3 and Section 16.4.9.4.
Required elements for writing a DS interrupt service routine are largely the same on
processors that support using the DS save area for BTS or PEBS records. However, on
processors based on Intel NetBurst microarchitecture, re-enabling counting
requires writing to CCCRs, whereas a DS interrupt service routine on processors based
on Intel Core or Intel Atom microarchitecture should:
• Re-enable the enable bits in the IA32_PERF_GLOBAL_CTRL MSR if it is servicing an
  overflow PMI due to PEBS.
• Clear overflow indications by writing to IA32_PERF_GLOBAL_OVF_CTRL when a
  counting configuration is changed. This includes bit 62 (ClrOvfBuffer) and the
  overflow indication of counters used in either PEBS or general-purpose counting
  (specifically: bits 0 or 1; see Figure 30-3).
16.4.9.2 Setting Up the DS Save Area
To save branch records with the BTS buffer, the DS save area must first be set up in
memory as described in the following procedure (see Section 30.4.4.1, "Setting up
the PEBS Buffer," for instructions for setting up a PEBS buffer in the DS
save area):
1. Create the DS buffer management information area in memory (see Section
   16.4.9, "BTS and DS Save Area," and Section 16.4.9.1, "DS Save Area and IA-32e
   Mode Operation"). Also see the additional notes in this section.
2. Write the base linear address of the DS buffer management area into the
   IA32_DS_AREA MSR.
3. Set up the performance counter entry in the xAPIC LVT for fixed delivery and
   edge sensitive. See Section 10.5.1, "Local Vector Table."
4. Establish an interrupt handler in the IDT for the vector associated with the
   performance counter entry in the xAPIC LVT.
5. Write an interrupt service routine to handle the interrupt. See Section 16.4.9.5,
   "Writing the DS Interrupt Service Routine."
The following restrictions should be applied to the DS save area.
• The three DS save area sections should be allocated from a non-paged pool, and
  marked accessed and dirty. It is the responsibility of the operating system to
  keep the pages that contain the buffer present and to mark them accessed and
  dirty. The implication is that the operating system cannot do "lazy" page-table
  entry propagation for these pages.
• The DS save area can be larger than a page, but the pages must be mapped to
  contiguous linear addresses. The buffer may share a page, so it need not be
  aligned on a 4-KByte boundary. For performance reasons, the base of the buffer
  must be aligned on a doubleword boundary and should be aligned on a cache line
  boundary.
• It is recommended that the buffer size for the BTS buffer and the PEBS buffer be
  an integer multiple of the corresponding record sizes.
• The precise event records buffer should be large enough to hold the number of
  precise event records that can occur while waiting for the interrupt to be
  serviced.
• The DS save area should be in kernel space. It must not be on the same page as
  code, to avoid triggering self-modifying code actions.
• There are no memory type restrictions on the buffers, although it is
  recommended that the buffers be designated as WB memory type for
  performance considerations.
• Either the system must be prevented from entering A20M mode while the DS save
  area is active, or bit 20 of all addresses within buffer bounds must be 0.
• Pages that contain buffers must be mapped to the same physical addresses for all
  processes, such that any change to control register CR3 will not change the DS
  addresses.
• The DS save area is expected to be used only on systems with an enabled APIC. The
  LVT Performance Counter entry in the APIC must be initialized to use an interrupt
  gate instead of the trap gate.
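Some of the restrictions above are simple arithmetic on the buffer's base address and size, and can be sketched as a pre-flight check in C. The function name and parameters are hypothetical, and the sketch covers only the doubleword alignment, whole-record size, and A20M (bit 20) conditions; paging, memory type, and APIC requirements are outside its scope.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical check of a BTS or PEBS buffer placement.
 * base, size_bytes: linear address and length of the buffer.
 * record_size:      size of one record in the buffer.
 * a20m_possible:    true if the system may enter A20M mode while the DS
 *                   save area is active. */
bool ds_buffer_layout_ok(uint64_t base, uint64_t size_bytes,
                         uint64_t record_size, bool a20m_possible)
{
    if (record_size == 0 || size_bytes < record_size)
        return false;
    if (base % 4 != 0)                    /* doubleword alignment */
        return false;
    if (size_bytes % record_size != 0)    /* whole number of records */
        return false;
    if (a20m_possible) {
        uint64_t last = base + size_bytes - 1;

        /* Bit 20 is 0 for every address in the buffer only if it is 0 at
         * the base and the buffer stays inside one 1-MByte block. */
        if ((base & (1ULL << 20)) != 0 || (base >> 20) != (last >> 20))
            return false;
    }
    return true;
}
```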
16.4.9.3 Setting Up the BTS Buffer
Three flags in the MSR_DEBUGCTLA MSR (see Table 16-4), IA32_DEBUGCTL (see
Figure 16-3), or MSR_DEBUGCTLB (see Figure 16-16) control the generation of
branch records and storing of them in the BTS buffer; these are TR, BTS, and BTINT.
The TR flag enables the generation of BTMs. The BTS flag determines whether the
BTMs are sent out on the system bus (clear) or stored in the BTS buffer (set). BTMs
cannot be simultaneously sent to the system bus and logged in the BTS buffer. The
BTINT flag enables the generation of an interrupt when the BTS buffer is full. When
this flag is clear, the BTS buffer is a circular buffer.
The following procedure describes how to set up a DS save area to collect branch
records in the BTS buffer:
1. Place values in the BTS buffer base, BTS index, BTS absolute maximum, and BTS
   interrupt threshold fields of the DS buffer management area to set up the BTS
   buffer in memory.
Table 16-4. IA32_DEBUGCTL Flag Encodings
TR   BTS   BTINT   Description
0    X     X       Branch trace messages (BTMs) off
1    0     X       Generate BTMs
1    1     0       Store BTMs in the BTS buffer, used here as a circular buffer
1    1     1       Store BTMs in the BTS buffer, and generate an interrupt when
                   the buffer is nearly full
2. Set the TR and BTS flags in IA32_DEBUGCTL for Intel Core Solo and Intel
   Core Duo processors or later processors (or in the MSR_DEBUGCTLA MSR for
   processors based on Intel NetBurst microarchitecture, or in MSR_DEBUGCTLB for
   Pentium M processors).
3. Clear the BTINT flag in the corresponding IA32_DEBUGCTL (or MSR_DEBUGCTLA
   MSR, or MSR_DEBUGCTLB) if a circular BTS buffer is desired.
NOTES
If the buffer size is set to less than the minimum allowable value (i.e.,
BTS absolute maximum < 1 + size of BTS record), the results of BTS
are undefined.
To prevent an interrupt from being generated when working with a
circular BTS buffer, software must set the BTS interrupt threshold to a value
greater than the BTS absolute maximum (both are fields of the DS buffer
management area). Clearing the BTINT flag alone is not enough.
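Step 1 of the procedure and the second note can be sketched in C as follows. All names are hypothetical, and the structure holds only the four BTS fields rather than the full management area. The sketch uses the 24-byte IA-32e record size and takes BTS absolute maximum to be the address of the next byte past the end of the buffer; how that reading relates to the "plus 1" wording in Section 16.4.9 is an assumption that should be checked against the target processor.

```c
#include <stdint.h>

#define BTS_RECORD_SIZE64 24u

/* Hypothetical subset of the DS buffer management area (BTS fields only). */
struct bts_fields {
    uint64_t buffer_base;
    uint64_t index;
    uint64_t absolute_maximum;
    uint64_t interrupt_threshold;
};

/* Fills the BTS fields for a buffer of nrecords records at linear address
 * base. For a circular buffer the threshold is placed above the absolute
 * maximum so that it is never reached; otherwise it is placed headroom
 * records short of the end. Returns 0 on success, -1 on bad arguments. */
int bts_setup(struct bts_fields *f, uint64_t base, uint64_t nrecords,
              uint64_t headroom, int circular)
{
    if (nrecords == 0)
        return -1;
    if (!circular && (headroom == 0 || headroom >= nrecords))
        return -1;

    f->buffer_base = base;
    f->index = base;                 /* first record goes at the base */
    f->absolute_maximum = base + nrecords * BTS_RECORD_SIZE64;
    if (circular)
        f->interrupt_threshold = f->absolute_maximum + BTS_RECORD_SIZE64;
    else
        f->interrupt_threshold =
            f->absolute_maximum - headroom * BTS_RECORD_SIZE64;
    return 0;
}
```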
16.4.9.4 Setting Up CPL-Qualified BTS
If the processor supports the CPL-qualified last branch recording mechanism, the
generation of branch records and storing of them in the BTS buffer are determined by
TR, BTS, BTS_OFF_OS, BTS_OFF_USR, and BTINT. The encoding of these five bits is
shown in Table 16-5.
Table 16-5. CPL-Qualified Branch Trace Store Encodings
TR   BTS   BTS_OFF_OS   BTS_OFF_USR   BTINT   Description
0    X     X            X             X       Branch trace messages (BTMs) off
1    0     X            X             X       Generate BTMs but do not store BTMs
1    1     0            0             0       Store all BTMs in the BTS buffer, used
                                              here as a circular buffer
1    1     1            0             0       Store BTMs with CPL > 0 in the BTS buffer
1    1     0            1             0       Store BTMs with CPL = 0 in the BTS buffer
1    1     1            1             X       Generate BTMs but do not store BTMs
1    1     0            0             1       Store all BTMs in the BTS buffer; generate
                                              an interrupt when the buffer is nearly full
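The encodings of Table 16-5 can be read as a small decision procedure. The C sketch below decodes a debug control value into one of the table's outcomes; the enumeration, structure, and function names are hypothetical, and the TR (bit 6) and BTS (bit 7) positions are assumptions taken from the register layout rather than from this section.

```c
#include <stdint.h>

#define DEBUGCTL_TR          (1ULL << 6)   /* assumed position */
#define DEBUGCTL_BTS         (1ULL << 7)   /* assumed position */
#define DEBUGCTL_BTINT       (1ULL << 8)
#define DEBUGCTL_BTS_OFF_OS  (1ULL << 9)
#define DEBUGCTL_BTS_OFF_USR (1ULL << 10)

enum bts_action {
    BTS_OFF,            /* BTMs off */
    BTS_NOT_STORED,     /* BTMs generated but not stored */
    BTS_STORE_ALL,      /* all BTMs stored */
    BTS_STORE_CPL_GT0,  /* only BTMs with CPL > 0 stored */
    BTS_STORE_CPL_EQ0   /* only BTMs with CPL = 0 stored */
};

struct bts_mode {
    enum bts_action action;
    int interrupt;      /* 1: interrupt when the buffer is nearly full */
};

/* Decodes the five control bits according to Table 16-5. */
struct bts_mode bts_decode(uint64_t debugctl)
{
    struct bts_mode m = { BTS_OFF, 0 };
    int off_os  = (debugctl & DEBUGCTL_BTS_OFF_OS) != 0;
    int off_usr = (debugctl & DEBUGCTL_BTS_OFF_USR) != 0;

    if (!(debugctl & DEBUGCTL_TR))
        return m;
    if (!(debugctl & DEBUGCTL_BTS) || (off_os && off_usr)) {
        m.action = BTS_NOT_STORED;
        return m;
    }
    if (off_os)
        m.action = BTS_STORE_CPL_GT0;
    else if (off_usr)
        m.action = BTS_STORE_CPL_EQ0;
    else
        m.action = BTS_STORE_ALL;
    m.interrupt = (debugctl & DEBUGCTL_BTINT) != 0;
    return m;
}
```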
16.4.9.5 Writing the DS Interrupt Service Routine
The BTS, non-precise event-based sampling, and PEBS facilities share the same
interrupt vector and interrupt service routine (called the debug store interrupt
service routine or DS ISR). To handle BTS, non-precise event-based sampling, and
PEBS interrupts, separate handler routines must be included in the DS ISR. Use the
following guidelines when writing a DS ISR to handle BTS, non-precise event-based
sampling, and/or PEBS interrupts.
• The DS interrupt service routine (ISR) must be part of a kernel driver and operate
  at a current privilege level of 0 to secure the buffer storage area.
• Because the BTS, non-precise event-based sampling, and PEBS facilities share
  the same interrupt vector, the DS ISR must check for all the possible causes of
  interrupts from these facilities and pass control on to the appropriate handler.
  BTS and PEBS buffer overflow would be the sources of the interrupt if the buffer
  index matches/exceeds the interrupt threshold specified. Detection of
  non-precise event-based sampling as the source of the interrupt is accomplished by
  checking for counter overflow.
• There must be separate save areas, buffers, and state for each processor in an
  MP system.
• Upon entering the ISR, branch trace messages and PEBS should be disabled to
  prevent race conditions during access to the DS save area. This is done by
  clearing the TR flag in IA32_DEBUGCTL (or the MSR_DEBUGCTLA MSR) and by
  clearing the precise event enable flag in the MSR_PEBS_ENABLE MSR. These
  settings should be restored to their original values when exiting the ISR.
• The processor will not disable the DS save area when the buffer is full and the
  circular mode has not been selected. The current DS setting must be retained
  and restored by the ISR on exit.
• After reading the data in the appropriate buffer, up to but not including the
  current index into the buffer, the ISR must reset the buffer index to the beginning
  of the buffer. Otherwise, everything up to the index will look like new entries upon
  the next invocation of the ISR.
Table 16-5. CPL-Qualified Branch Trace Store Encodings (Contd.)
TR   BTS   BTS_OFF_OS   BTS_OFF_USR   BTINT   Description
1    1     1            0             1       Store BTMs with CPL > 0 in the BTS buffer;
                                              generate an interrupt when the buffer is
                                              nearly full
1    1     0            1             1       Store BTMs with CPL = 0 in the BTS buffer;
                                              generate an interrupt when the buffer is
                                              nearly full
• The ISR must clear the mask bit in the performance counter LVT entry.
• The ISR must re-enable the counters to count via
  IA32_PERF_GLOBAL_CTRL/IA32_PERF_GLOBAL_OVF_CTRL if it is servicing an
  overflow PMI due to PEBS (or via the CCCR's ENABLE bit on processors based on Intel
  NetBurst microarchitecture).
• The Pentium 4 processor and Intel Xeon processor mask PMIs upon receiving an
  interrupt. Clear this condition before leaving the interrupt handler.
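The guideline about reading records up to the current index and then resetting the index can be sketched in C. This is a simplified illustration: the names are hypothetical, pointers stand in for the linear-address fields of the buffer management area, and the sketch assumes branch tracing has already been disabled as the guidelines above require, so no concurrent writes occur while it runs.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical 64-bit branch trace record. */
struct bts_record64 {
    uint64_t from;
    uint64_t to;
    uint64_t flags;
};

/* Hypothetical pointer view of the BTS buffer base and BTS index fields. */
struct bts_view {
    struct bts_record64 *base;
    struct bts_record64 *index;
};

/* Copies the records from base up to, but not including, index into out[],
 * then resets index to base so the same records are not seen as new on the
 * next invocation. Returns the number of records copied; records beyond
 * out_cap are dropped. */
size_t bts_drain(struct bts_view *v, struct bts_record64 *out, size_t out_cap)
{
    size_t n = (size_t)(v->index - v->base);

    if (n > out_cap)
        n = out_cap;
    for (size_t i = 0; i < n; i++)
        out[i] = v->base[i];
    v->index = v->base;
    return n;
}
```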
16.5 LAST BRANCH, INTERRUPT, AND EXCEPTION
RECORDING (INTEL CORE 2 DUO AND INTEL ATOM PROCESSOR FAMILY)
The Intel Core 2 Duo processor family and Intel Xeon processors based on Intel Core
microarchitecture or enhanced Intel Core microarchitecture provide last branch,
interrupt, and exception recording. The facilities described in this section also apply
to the Intel Atom processor family. These capabilities are similar to those found in
Pentium 4 processors, including support for the following facilities:
• Debug Trace and Branch Recording Control: The IA32_DEBUGCTL MSR
  provides bit fields for software to configure mechanisms related to debug trace,
  branch recording, branch trace store, and performance counter operations. See
  Section 16.4.1 for a description of the flags. See Figure 16-3 for the MSR layout.
• Last branch record (LBR) stack: There is a collection of MSR pairs that
  store the source and destination addresses related to recently executed
  branches. See Section 16.5.1.
• Monitoring and single-stepping of branches, exceptions, and interrupts:
  See Section 16.4.2 and Section 16.4.3. In addition, the ability to freeze the
  LBR stack on a PMI request is available.
  The Intel Atom processor family clears the TR flag when the
  FREEZE_LBRS_ON_PMI flag is set.
• Branch trace messages: See Section 16.4.4.
• Last exception records: See Section 16.7.3.
• Branch trace store and CPL-qualified BTS: See Section 16.4.5.
• FREEZE_LBRS_ON_PMI flag (bit 11): See Section 16.4.7.
• FREEZE_PERFMON_ON_PMI flag (bit 12): See Section 16.4.7.
• FREEZE_WHILE_SMM_EN (bit 14): FREEZE_WHILE_SMM_EN is supported
  if IA32_PERF_CAPABILITIES.FREEZE_WHILE_SMM[Bit 12] reports 1. See
  Section 16.4.1.
16.5.1 LBR Stack
The last branch record stack and top-of-stack (TOS) pointer MSRs are supported
across Intel Core 2, Intel Xeon, and Intel Atom processor families. Four pairs of MSRs
are supported in the LBR stack:
• Last Branch Record (LBR) Stack
  - MSR_LASTBRANCH_0_FROM_IP (address 40H) through
    MSR_LASTBRANCH_3_FROM_IP (address 43H) store source addresses.
  - MSR_LASTBRANCH_0_TO_IP (address 60H) through
    MSR_LASTBRANCH_3_TO_IP (address 63H) store destination addresses.
• Last Branch Record Top-of-Stack (TOS) Pointer: The least significant 2
  bits of the TOS Pointer MSR (MSR_LASTBRANCH_TOS, address 1C9H) contain a
  pointer to the MSR in the LBR stack that contains the most recent branch,
  interrupt, or exception recorded.
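Because the four FROM and four TO MSRs occupy consecutive addresses, the address of entry i can be computed rather than tabulated. A minimal C sketch using the addresses given above follows; the helper names are hypothetical, and the helpers only compute MSR addresses and mask the TOS value, leaving the privileged reads to the caller.

```c
#include <stdint.h>

#define MSR_LASTBRANCH_0_FROM_IP 0x40u
#define MSR_LASTBRANCH_0_TO_IP   0x60u
#define MSR_LASTBRANCH_TOS       0x1C9u
#define LBR_ENTRIES              4u

/* MSR address holding the source address of LBR entry i (taken modulo 4). */
uint32_t lbr_from_msr(unsigned i)
{
    return MSR_LASTBRANCH_0_FROM_IP + (i % LBR_ENTRIES);
}

/* MSR address holding the destination address of LBR entry i. */
uint32_t lbr_to_msr(unsigned i)
{
    return MSR_LASTBRANCH_0_TO_IP + (i % LBR_ENTRIES);
}

/* Index of the most recent entry: the low 2 bits of the TOS MSR value. */
unsigned lbr_tos_index(uint64_t tos_msr_value)
{
    return (unsigned)(tos_msr_value & 0x3);
}
```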
For compatibility, the MSR_LER_TO_LIP and the MSR_LER_FROM_LIP MSRs duplicate
the functions of the LastExceptionToIP and LastExceptionFromIP MSRs found in P6
family processors.
16.6 LAST BRANCH, INTERRUPT, AND EXCEPTION
RECORDING (INTEL CORE I7 PROCESSOR FAMILY)
The I nt el Core i7 processor family and I nt el Xeon processors based on I nt el


microarchit ect ure codename Nehalem support last branch int errupt and except ion
recording. These capabilit ies are similar t o t hose found in I nt el Core 2 processors and
adds addit ional capabilit ies:
Debug Tr ace and Br anch Recor di ng Cont r ol The I A32_DEBUGCTL MSR
provides bit fields for soft ware t o configure mechanisms relat ed t o debug t race,
branch recording, branch t race st ore, and performance count er operat ions. See
Sect ion 16. 4. 1 for a descript ion of t he flags. See Figure 16- 11 for t he MSR layout .
Last br anch r ecor d ( LBR) st ack There are 16 MSR pairs t hat st ore t he
source and dest inat ion addresses relat ed t o recent ly execut ed branches. See
Sect ion 16. 6. 1.
Moni t or i ng and si ngl e- st eppi ng of br anches, ex cept i ons, and i nt er r upt s
See Sect ion 16. 4. 2 and Sect ion 16. 4. 3. I n addit ion, t he abilit y t o freeze t he
LBR st ack on a PMI request is available.
Br anch t r ace messages The I A32_DEBUGCTL MSR provides bit fields for
soft ware t o enable each logical processor t o generat e branch t race messages.
See Sect ion 16. 4. 4. However, not all BTM messages are observable using t he
I nt el

QPI link.
Last ex cept i on r ecor ds See Sect ion 16. 7. 3.
16-34 Vol. 3
DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER
Br anch t r ace st or e and CPL- qual i f i ed BTS See Sect ion 16. 4. 6 and Sect ion
16. 4. 5.
FREEZE_LBRS_ON_PMI f l ag ( bi t 11) see Sect ion 16. 4.7.
FREEZE_PERFMON_ON_PMI f l ag ( bi t 12) see Sect ion 16. 4. 7.
FREEZE_WHI LE_SMM_EN ( bi t 14) FREEZE_WHI LE_SMM_EN is support ed
if I A32_PERF_CAPABI LI TI ES. FREEZE_WHI LE_SMM[ Bit 12] is report ing 1. See
Sect ion 16. 4. 1.
Processors based on Intel microarchitecture codename Nehalem provide additional
capabilities:
• Independent control of uncore PMI: The IA32_DEBUGCTL MSR provides a
  bit field (see Figure 16-11) for software to enable each logical processor to
  receive an uncore counter overflow interrupt.
• LBR filtering: Processors based on Intel microarchitecture codename Nehalem
  support filtering of LBR based on a combination of CPL and branch type conditions.
  When LBR filtering is enabled, the LBR stack only captures the subset of branches
  that are specified by MSR_LBR_SELECT.
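The IA32_DEBUGCTL flags listed above can be composed and inspected with plain bit
arithmetic. The following is a minimal sketch, not a driver: the bit positions are
those given in the flag descriptions and in Figure 16-11, while the names
DEBUGCTL_FLAGS, encode_debugctl and decode_debugctl are hypothetical helpers
introduced here for illustration. Reading or writing the MSR itself (RDMSR/WRMSR at
privilege level 0) is outside the scope of the sketch.

```python
# Bit positions of IA32_DEBUGCTL flags on processors based on Intel
# microarchitecture codename Nehalem, per Figure 16-11 and the list above.
DEBUGCTL_FLAGS = {
    "LBR": 0,
    "BTF": 1,
    "TR": 6,
    "BTS": 7,
    "BTINT": 8,
    "BTS_OFF_OS": 9,
    "BTS_OFF_USR": 10,
    "FREEZE_LBRS_ON_PMI": 11,
    "FREEZE_PERFMON_ON_PMI": 12,
    "UNCORE_PMI_EN": 13,
    "FREEZE_WHILE_SMM_EN": 14,
}


def encode_debugctl(*names):
    """Return a register value with the named flags set (hypothetical helper)."""
    value = 0
    for name in names:
        value |= 1 << DEBUGCTL_FLAGS[name]
    return value


def decode_debugctl(value):
    """Return the names of the flags that are set in a register value."""
    return [name for name, bit in DEBUGCTL_FLAGS.items() if (value >> bit) & 1]
```

For example, encode_debugctl("LBR", "FREEZE_LBRS_ON_PMI") yields a value with bits 0
and 11 set, which is the combination used to record branches and freeze the LBR
stack on a PMI request.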
16.6.1 LBR Stack
Processors based on Intel microarchitecture codename Nehalem provide 16 pairs of
MSRs to record last branch record information. The layout of each MSR pair is shown
in Table 16-6 and Table 16-7.
Figure 16-11. IA32_DEBUGCTL MSR for Processors based
on Intel microarchitecture codename Nehalem
[Register layout figure. Fields shown: LBR, last branch/interrupt/exception (bit 0);
BTF, single-step on branches (bit 1); Reserved (bits 5:2); TR, trace messages enable
(bit 6); BTS, branch trace store (bit 7); BTINT, branch trace interrupt (bit 8);
BTS_OFF_OS, BTS off in OS (bit 9); BTS_OFF_USR, BTS off in user code (bit 10);
FREEZE_LBRS_ON_PMI (bit 11); FREEZE_PERFMON_ON_PMI (bit 12); UNCORE_PMI_EN (bit 13);
FREEZE_WHILE_SMM_EN (bit 14); Reserved (bits 31:15).]
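The record layout described in Table 16-6 and Table 16-7 (a 48-bit linear address in
bits 47:0, sign-extended from bit 47, with bit 63 of the FROM register carrying the
MISPRED flag) can be decoded as sketched below. This is an illustration only: the
function names are hypothetical, the raw 64-bit values are assumed to have been read
with RDMSR elsewhere, and the MISPRED bit is returned as a raw bit so that its
interpretation follows Table 16-6.

```python
ADDR_MASK = (1 << 48) - 1          # bits 47:0 hold the linear address
U64_MASK = (1 << 64) - 1


def sign_extend_48(raw):
    """Rebuild a canonical 64-bit linear address from bits 47:0 of raw."""
    addr = raw & ADDR_MASK
    if addr & (1 << 47):
        # Bit 47 is set: replicate it into bits 63:48.
        addr |= U64_MASK & ~ADDR_MASK
    return addr


def decode_from_ip(raw):
    """Split a FROM_IP value into its address and raw MISPRED bit (bit 63)."""
    return {"address": sign_extend_48(raw), "mispred_bit": (raw >> 63) & 1}


def decode_to_ip(raw):
    """Return the branch target address held in a TO_IP value."""
    return sign_extend_48(raw)
```

Deriving the address from bits 47:0 rather than masking off bit 63 keeps the result
canonical even though the top bit of the FROM register is not part of the address.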
Table 16-6. IA32_LASTBRANCH_x_FROM_IP
Bit Field   Bit Offset  Access  Description
Data        47:0        R/O     The linear address of the branch instruction itself;
                                this is the "branch from" address
SIGN_EXT    62:48       R/O     Sign extension of bit 47 of this register
MISPRED     63          R/O     When set, indicates the branch was mispredicted;
                                otherwise, the branch was predicted

Table 16-7. IA32_LASTBRANCH_x_TO_IP
Bit Field   Bit Offset  Access  Description
Data        47:0        R/O     The linear address of the target of the branch
                                instruction; this is the "branch to" address
SIGN_EXT    63:48       R/O     Sign extension of bit 47 of this register

Processors based on Intel microarchitecture codename Nehalem have an LBR MSR
stack as shown in Table 16-8.

Table 16-8. LBR Stack Size and TOS Pointer Range
DisplayFamily_DisplayModel  Size of LBR Stack  Range of TOS Pointer
06_1AH                      16                 0 to 15

16.6.2 Filtering of Last Branch Records
MSR_LBR_SELECT is cleared to zero at RESET, and LBR filtering is disabled, i.e. all
branches will be captured. MSR_LBR_SELECT provides bit fields to specify the
conditions of subsets of branches that will not be captured in the LBR. The layout of
MSR_LBR_SELECT is shown in Table 16-9.

Table 16-9. MSR_LBR_SELECT
Bit Field       Bit Offset  Access  Description
CPL_EQ_0        0           R/W     When set, do not capture branches occurring in ring 0
CPL_NEQ_0       1           R/W     When set, do not capture branches occurring in ring >0
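Because each MSR_LBR_SELECT bit suppresses one class of branches, a filter is built
by OR-ing together the bits for everything that should not be recorded. The sketch
below uses the bit offsets listed in Table 16-9 (including its continuation); the
names LBR_SELECT_BITS and lbr_select_mask are hypothetical and only illustrate the
arithmetic. Bits 63:9 are reserved and are left at zero.

```python
# Bit offsets from Table 16-9. A set bit means "do not capture".
LBR_SELECT_BITS = {
    "CPL_EQ_0": 0,
    "CPL_NEQ_0": 1,
    "JCC": 2,
    "NEAR_REL_CALL": 3,
    "NEAR_IND_CALL": 4,
    "NEAR_RET": 5,
    "NEAR_IND_JMP": 6,
    "NEAR_REL_JMP": 7,
    "FAR_BRANCH": 8,
}


def lbr_select_mask(*suppressed):
    """Return an MSR_LBR_SELECT value that suppresses the named branch classes."""
    mask = 0
    for name in suppressed:
        mask |= 1 << LBR_SELECT_BITS[name]
    return mask


# Example: keep only near calls executed outside ring 0 by suppressing
# ring-0 branches and every other branch type.
USER_CALLS_ONLY = lbr_select_mask(
    "CPL_EQ_0", "JCC", "NEAR_RET", "NEAR_IND_JMP", "NEAR_REL_JMP", "FAR_BRANCH"
)
```

A value of zero, the RESET state, suppresses nothing, so all branches are captured.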
16.7 LAST BRANCH, INTERRUPT, AND EXCEPTION
RECORDING (PROCESSORS BASED ON INTEL
NETBURST MICROARCHITECTURE)
Pentium 4 and Intel Xeon processors based on Intel NetBurst microarchitecture
provide the following methods for recording taken branches, interrupts and
exceptions:
• Store branch records in the last branch record (LBR) stack MSRs for the most
  recent taken branches, interrupts, and/or exceptions. A branch record
  consists of a branch-from and a branch-to instruction address.
• Send the branch records out on the system bus as branch trace messages
  (BTMs).
• Log BTMs in a memory-resident branch trace store (BTS) buffer.
To support these functions, the processor provides the following MSRs and related
facilities:
• MSR_DEBUGCTLA MSR: Enables last branch, interrupt, and exception
  recording; single-stepping on taken branches; branch trace messages (BTMs);
  and branch trace store (BTS). This register is named DebugCtlMSR in the P6
  family processors.
• Debug store (DS) feature flag (CPUID.1:EDX.DS[bit 21]): Indicates that
  the processor provides the debug store (DS) mechanism, which allows BTMs to
  be stored in a memory-resident BTS buffer.
• CPL-qualified debug store (DS) feature flag (CPUID.1:ECX.DS-CPL[bit
  4]): Indicates that the processor provides a CPL-qualified debug store (DS)
  mechanism, which allows software to selectively skip sending and storing BTMs,
  according to specified current privilege level settings, into a memory-resident
  BTS buffer.
Table 16-9. MSR_LBR_SELECT (Contd.)
Bit Field       Bit Offset  Access  Description
JCC             2           R/W     When set, do not capture conditional branches
NEAR_REL_CALL   3           R/W     When set, do not capture near relative calls
NEAR_IND_CALL   4           R/W     When set, do not capture near indirect calls
NEAR_RET        5           R/W     When set, do not capture near returns
NEAR_IND_JMP    6           R/W     When set, do not capture near indirect jumps
NEAR_REL_JMP    7           R/W     When set, do not capture near relative jumps
FAR_BRANCH      8           R/W     When set, do not capture far branches
Reserved        63:9                Must be zero
• IA32_MISC_ENABLE MSR: Indicates that the processor provides the BTS
  facilities.
• Last branch record (LBR) stack: The LBR stack is a circular stack that
  consists of four MSRs (MSR_LASTBRANCH_0 through MSR_LASTBRANCH_3) for
  the Pentium 4 and Intel Xeon processor family [CPUID family 0FH, models 0H-
  02H]. The LBR stack consists of 16 MSR pairs (MSR_LASTBRANCH_0_FROM_LIP
  through MSR_LASTBRANCH_15_FROM_LIP and MSR_LASTBRANCH_0_TO_LIP
  through MSR_LASTBRANCH_15_TO_LIP) for the Pentium 4 and Intel Xeon
  processor family [CPUID family 0FH, model 03H].
• Last branch record top-of-stack (TOS) pointer: The TOS Pointer MSR
  contains a 2-bit pointer (0-3) to the MSR in the LBR stack that contains the most
  recent branch, interrupt, or exception recorded for the Pentium 4 and Intel Xeon
  processor family [CPUID family 0FH, models 0H-02H]. This pointer becomes a
  4-bit pointer (0-15) for the Pentium 4 and Intel Xeon processor family [CPUID
  family 0FH, model 03H]. See also: Table 16-10, Figure 16-12, and Section
  16.7.2, "LBR Stack for Processors Based on Intel NetBurst Microarchitecture."
• Last exception record: See Section 16.7.3, "Last Exception Records."
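The two CPUID feature flags named above are single-bit tests on the values that
CPUID leaf 1 returns in EDX and ECX. A minimal sketch follows; it assumes the
register values have already been obtained by executing CPUID with EAX = 1 (not
shown, since that requires native code), and the function names are hypothetical.

```python
DS_BIT = 21      # CPUID.1:EDX.DS[bit 21]
DS_CPL_BIT = 4   # CPUID.1:ECX.DS-CPL[bit 4]


def has_debug_store(cpuid1_edx):
    """True if the debug store (DS) mechanism is reported as available."""
    return bool((cpuid1_edx >> DS_BIT) & 1)


def has_cpl_qualified_debug_store(cpuid1_ecx):
    """True if the CPL-qualified debug store mechanism is reported."""
    return bool((cpuid1_ecx >> DS_CPL_BIT) & 1)
```

Software would normally check has_debug_store before programming a BTS buffer, and
has_cpl_qualified_debug_store before relying on the BTS_OFF_OS or BTS_OFF_USR flags.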
16.7.1 MSR_DEBUGCTLA MSR
The MSR_DEBUGCTLA MSR enables and disables the various last branch recording
mechanisms described in the previous section. This register can be written to using
the WRMSR instruction, when operating at privilege level 0 or when in real-address
mode. A protected-mode operating system procedure is required to provide user
access to this register. Figure 16-12 shows the flags in the MSR_DEBUGCTLA MSR.
The functions of these flags are as follows:
• LBR (last branch/interrupt/exception) flag (bit 0): When set, the
  processor records a running trace of the most recent branches, interrupts, and/or
  exceptions taken by the processor (prior to a debug exception being generated)
  in the last branch record (LBR) stack. Each branch, interrupt, or exception is
  recorded as a 64-bit branch record. The processor clears this flag whenever a
  debug exception is generated (for example, when an instruction or data
  breakpoint or a single-step trap occurs). See Section 16.7.2, "LBR Stack for
  Processors Based on Intel NetBurst Microarchitecture."
• BTF (single-step on branches) flag (bit 1): When set, the processor treats
  the TF flag in the EFLAGS register as a "single-step on branches" flag rather than
  a "single-step on instructions" flag. This mechanism allows single-stepping the
  processor on taken branches, interrupts, and exceptions. See Section 16.4.3,
  "Single-Stepping on Branches, Exceptions, and Interrupts."
• TR (trace message enable) flag (bit 2): When set, branch trace messages
  are enabled. Thereafter, when the processor detects a taken branch, interrupt, or
  exception, it sends the branch record out on the system bus as a branch trace
  message (BTM). See Section 16.4.4, "Branch Trace Messages."
• BTS (branch trace store) flag (bit 3): When set, enables the BTS facilities to
  log BTMs to a memory-resident BTS buffer that is part of the DS save area. See
  Section 16.4.9, "BTS and DS Save Area."
• BTINT (branch trace interrupt) flag (bit 4): When set, the BTS facilities
  generate an interrupt when the BTS buffer is full. When clear, BTMs are logged to
  the BTS buffer in a circular fashion. See Section 16.4.5, "Branch Trace Store (BTS)."
• BTS_OFF_OS (disable ring 0 branch trace store) flag (bit 5): When set,
  enables the BTS facilities to skip sending/logging CPL_0 BTMs to the memory-
  resident BTS buffer. See Section 16.7.2, "LBR Stack for Processors Based on Intel
  NetBurst Microarchitecture."
• BTS_OFF_USR (disable non-ring 0 branch trace store) flag (bit 6): When set,
  enables the BTS facilities to skip sending/logging non-CPL_0 BTMs to the
  memory-resident BTS buffer. See Section 16.7.2, "LBR Stack for Processors
  Based on Intel NetBurst Microarchitecture."

    The initial implementation of BTS_OFF_USR and BTS_OFF_OS in
    MSR_DEBUGCTLA is shown in Figure 16-12. The BTS_OFF_USR and
    BTS_OFF_OS fields may be implemented on other model-specific
    debug control registers at different locations.

See Appendix B, "Model-Specific Registers (MSRs)," for a detailed description of each
of the last branch recording MSRs.
16.7.2 LBR Stack for Processors Based on Intel NetBurst Microarchitecture
The LBR stack is made up of LBR MSRs that are treated by the processor as a circular
stack. The TOS pointer (MSR_LASTBRANCH_TOS MSR) points to the LBR MSR (or
LBR MSR pair) that contains the most recent (last) branch record placed on the stack.
Prior to placing a new branch record on the stack, the TOS is incremented by 1. When
the TOS pointer reaches its maximum value, it wraps around to 0. See Table 16-10
and Figure 16-12.

Figure 16-12. MSR_DEBUGCTLA MSR for Pentium 4 and Intel Xeon Processors
[Register layout figure. Fields shown: LBR, last branch/interrupt/exception (bit 0);
BTF, single-step on branches (bit 1); TR, trace messages enable (bit 2); BTS, branch
trace store (bit 3); BTINT, branch trace interrupt (bit 4); BTS_OFF_OS, disable
storing CPL_0 BTS (bit 5); BTS_OFF_USR, disable storing non-CPL_0 BTS (bit 6);
Reserved (bits 31:7).]
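The circular behavior of the TOS pointer can be sketched with modular arithmetic.
The helpers below are hypothetical and only model the index bookkeeping: next_tos
gives the slot that the next record will occupy, and newest_to_oldest lists the
stack slots in the order a debugger would read them to walk backward through recent
branches. The stack size (4 or 16) is assumed to be taken from Table 16-10.

```python
def next_tos(tos, stack_size):
    """TOS value after the processor records one more branch."""
    return (tos + 1) % stack_size


def newest_to_oldest(tos, stack_size):
    """LBR slot indices ordered from the most recent record to the oldest."""
    return [(tos - i) % stack_size for i in range(stack_size)]
```

With a 4-entry stack and TOS equal to 1, for instance, the most recent record is in
slot 1 and the oldest surviving record is in slot 2.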
Table 16-10. LBR MSR Stack Size and TOS Pointer Range for the Pentium 4 and the
Intel Xeon Processor Family
DisplayFamily_DisplayModel              Size of LBR Stack  Range of TOS Pointer
Family 0FH, Models 0H-02H;              4                  0 to 3
MSRs at locations 1DBH-1DEH.
Family 0FH, Models; MSRs at             16                 0 to 15
locations 680H-68FH.
Family 0FH, Model 03H; MSRs             16                 0 to 15
at locations 6C0H-6CFH.

The registers in the LBR MSR stack and the MSR_LASTBRANCH_TOS MSR are read-only
and can be read using the RDMSR instruction.
Figure 16-13 shows the layout of a branch record in an LBR MSR (or MSR pair). Each
branch record consists of two linear addresses, which represent the "from" and "to"
instruction pointers for a branch, interrupt, or exception. The contents of the from
and to addresses differ, depending on the source of the branch:
• Taken branch: If the record is for a taken branch, the "from" address is the
  address of the branch instruction and the "to" address is the target instruction of
  the branch.
• Interrupt: If the record is for an interrupt, the "from" address is the return
  instruction pointer (RIP) saved for the interrupt and the "to" address is the
  address of the first instruction in the interrupt handler routine. The RIP is the
  linear address of the next instruction to be executed upon returning from the
  interrupt handler.
• Exception: If the record is for an exception, the "from" address is the linear
  address of the instruction that caused the exception to be generated and the "to"
  address is the address of the first instruction in the exception handler routine.
Additional information is saved if an exception or interrupt occurs in conjunction with
a branch instruction. If a branch instruction generates a trap type exception, two
branch records are stored in the LBR stack: a branch record for the branch instruction
followed by a branch record for the exception.
If a branch instruction is immediately followed by an interrupt, a branch record is
stored in the LBR stack for the branch instruction followed by a record for the
interrupt.
16.7.3 Last Exception Records
The Pentium 4, Intel Xeon, Pentium M, Intel Core Solo, Intel Core Duo, Intel
Core 2 Duo, Intel Core i7 and Intel Atom processors provide two MSRs (the
MSR_LER_TO_LIP and the MSR_LER_FROM_LIP MSRs) that duplicate the functions
of the LastExceptionToIP and LastExceptionFromIP MSRs found in the P6 family
processors. The MSR_LER_TO_LIP and MSR_LER_FROM_LIP MSRs contain a branch
record for the last branch that the processor took prior to an exception or interrupt
being generated.

Figure 16-13. LBR MSR Branch Record Layout for the Pentium 4
and Intel Xeon Processor Family
[Register layout figure. CPUID Family 0FH, Models 0H-02H: MSR_LASTBRANCH_0 through
MSR_LASTBRANCH_3, each 64-bit MSR divided at bits 32/31 into a To Linear Address
field and a From Linear Address field. CPUID Family 0FH, Model 03H-04H:
MSR_LASTBRANCH_0_FROM_LIP through MSR_LASTBRANCH_15_FROM_LIP, each holding a From
Linear Address field and a Reserved field, and MSR_LASTBRANCH_0_TO_LIP through
MSR_LASTBRANCH_15_TO_LIP, each holding a To Linear Address field and a Reserved
field.]
16.8 LAST BRANCH, INTERRUPT, AND EXCEPTION
RECORDING (INTEL CORE SOLO AND INTEL
CORE DUO PROCESSORS)
Intel Core Solo and Intel Core Duo processors provide last branch, interrupt, and
exception recording. This capability is almost identical to that found in Pentium 4 and
Intel Xeon processors. There are differences in the stack and in some MSR names
and locations.
Note the following:
• IA32_DEBUGCTL MSR: Enables debug trace interrupt, debug trace store,
  trace messages enable, performance monitoring breakpoint flags, single
  stepping on branches, and last branch. IA32_DEBUGCTL MSR is located at
  register address 01D9H.
  See Figure 16-14 for the layout and the entries below for a description of the
  flags:
  - LBR (last branch/interrupt/exception) flag (bit 0): When set, the
    processor records a running trace of the most recent branches, interrupts,
    and/or exceptions taken by the processor (prior to a debug exception being
    generated) in the last branch record (LBR) stack. For more information, see
    the "Last Branch Record (LBR) Stack" bullet below.
  - BTF (single-step on branches) flag (bit 1): When set, the processor
    treats the TF flag in the EFLAGS register as a "single-step on branches" flag
    rather than a "single-step on instructions" flag. This mechanism allows
    single-stepping the processor on taken branches, interrupts, and exceptions.
    See Section 16.4.3, "Single-Stepping on Branches, Exceptions, and
    Interrupts," for more information about the BTF flag.
  - TR (trace message enable) flag (bit 6): When set, branch trace
    messages are enabled. When the processor detects a taken branch,
    interrupt, or exception, it sends the branch record out on the system bus as
    a branch trace message (BTM). See Section 16.4.4, "Branch Trace Messages,"
    for more information about the TR flag.
  - BTS (branch trace store) flag (bit 7): When set, the flag enables BTS
    facilities to log BTMs to a memory-resident BTS buffer that is part of the DS
    save area. See Section 16.4.9, "BTS and DS Save Area."
  - BTINT (branch trace interrupt) flag (bit 8): When set, the BTS
    facilities generate an interrupt when the BTS buffer is full. When clear, BTMs are
    logged to the BTS buffer in a circular fashion. See Section 16.4.5, "Branch Trace
    Store (BTS)," for a description of this mechanism.
• Debug store (DS) feature flag (bit 21), returned by the CPUID
  instruction: Indicates that the processor provides the debug store (DS)
  mechanism, which allows BTMs to be stored in a memory-resident BTS buffer.
  See Section 16.4.5, "Branch Trace Store (BTS)."
• Last Branch Record (LBR) Stack: The LBR stack consists of 8 MSRs
  (MSR_LASTBRANCH_0 through MSR_LASTBRANCH_7); bits 31-0 hold the "from"
  address, bits 63-32 hold the "to" address (MSR addresses start at 40H). See
  Figure 16-15.
• Last Branch Record Top-of-Stack (TOS) Pointer: The TOS Pointer MSR
  contains a 3-bit pointer (bits 2-0) to the MSR in the LBR stack that contains the
  most recent branch, interrupt, or exception recorded. For Intel Core Solo and
  Intel Core Duo processors, this MSR is located at register address 01C9H.
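On these processors each LBR entry is a single 64-bit MSR that packs both
addresses, so a record is unpacked by splitting it at bit 32. The sketch below
follows the bit ranges and register addresses quoted in the list above (from address
in bits 31-0, to address in bits 63-32, stack starting at 40H, 3-bit TOS pointer);
the constant and function names are hypothetical, and the raw MSR values are assumed
to come from RDMSR.

```python
LBR_BASE_ADDR = 0x40    # address of MSR_LASTBRANCH_0
LBR_STACK_SIZE = 8      # MSR_LASTBRANCH_0 through MSR_LASTBRANCH_7
TOS_MSR_ADDR = 0x1C9    # TOS pointer MSR


def lastbranch_msr_address(index):
    """MSR address of MSR_LASTBRANCH_<index>."""
    if not 0 <= index < LBR_STACK_SIZE:
        raise ValueError("LBR index out of range")
    return LBR_BASE_ADDR + index


def tos_index(tos_msr_value):
    """Extract the 3-bit TOS pointer (bits 2-0)."""
    return tos_msr_value & 0x7


def split_branch_record(raw):
    """Return (from_address, to_address) from one packed 64-bit record."""
    return raw & 0xFFFFFFFF, (raw >> 32) & 0xFFFFFFFF
```

A reader would first take tos_index of the value at TOS_MSR_ADDR, then read the MSR
at lastbranch_msr_address of that index and split it to obtain the newest record.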
For compatibility, the Intel Core Solo and Intel Core Duo processors provide two 32-bit
MSRs (the MSR_LER_TO_LIP and the MSR_LER_FROM_LIP MSRs) that duplicate
functions of the LastExceptionToIP and LastExceptionFromIP MSRs found in P6 family
processors.
For details, see Section 16.7, "Last Branch, Interrupt, and Exception Recording
(Processors based on Intel NetBurst Microarchitecture)," and Appendix B.7, "MSRs
In Intel Core Solo and Intel Core Duo Processors."

Figure 16-14. IA32_DEBUGCTL MSR for Intel Core Solo
and Intel Core Duo Processors
[Register layout figure. Fields shown: LBR, last branch/interrupt/exception (bit 0);
BTF, single-step on branches (bit 1); Reserved (bits 5:2); TR, trace messages enable
(bit 6); BTS, branch trace store (bit 7); BTINT, branch trace interrupt (bit 8);
Reserved (bits 31:9).]
Figure 16-15. LBR Branch Record Layout for the Intel Core Solo
and Intel Core Duo Processor
[Register layout figure. MSR_LASTBRANCH_0 through MSR_LASTBRANCH_7: each 64-bit MSR
is divided at bits 32/31 into a To Linear Address field and a From Linear Address
field.]

16.9 LAST BRANCH, INTERRUPT, AND EXCEPTION
RECORDING (PENTIUM M PROCESSORS)
Like the Pentium 4 and Intel Xeon processor family, Pentium M processors provide
last branch, interrupt, and exception recording. The capability operates almost
identically to that found in Pentium 4 and Intel Xeon processors. There are differences
in the shape of the stack and in some MSR names and locations. Note the following:
• MSR_DEBUGCTLB MSR: Enables debug trace interrupt, debug trace store,
  trace messages enable, performance monitoring breakpoint flags, single
  stepping on branches, and last branch. For Pentium M processors, this MSR is
  located at register address 01D9H. See Figure 16-16 and the entries below for a
  description of the flags.
  - LBR (last branch/interrupt/exception) flag (bit 0): When set, the
    processor records a running trace of the most recent branches, interrupts,
    and/or exceptions taken by the processor (prior to a debug exception being
    generated) in the last branch record (LBR) stack. For more information, see
    the "Last Branch Record (LBR) Stack" bullet below.
  - BTF (single-step on branches) flag (bit 1): When set, the processor
    treats the TF flag in the EFLAGS register as a "single-step on branches" flag
    rather than a "single-step on instructions" flag. This mechanism allows
    single-stepping the processor on taken branches, interrupts, and exceptions.
    See Section 16.4.3, "Single-Stepping on Branches, Exceptions, and
    Interrupts," for more information about the BTF flag.
  - PBi (performance monitoring/breakpoint pins) flags (bits 5-2):
    When these flags are set, the performance monitoring/breakpoint pins on the
    processor (BP0#, BP1#, BP2#, and BP3#) report breakpoint matches in the
    corresponding breakpoint-address registers (DR0 through DR3). The
    processor asserts then deasserts the corresponding BPi# pin when a
    breakpoint match occurs. When a PBi flag is clear, the performance
    monitoring/breakpoint pins report performance events. Processor execution
    is not affected by reporting performance events.
  - TR (trace message enable) flag (bit 6): When set, branch trace
    messages are enabled. When the processor detects a taken branch,
    interrupt, or exception, it sends the branch record out on the system bus as a
    branch trace message (BTM). See Section 16.4.4, "Branch Trace Messages,"
    for more information about the TR flag.
  - BTS (branch trace store) flag (bit 7): When set, enables the BTS
    facilities to log BTMs to a memory-resident BTS buffer that is part of the DS
    save area. See Section 16.4.9, "BTS and DS Save Area."
  - BTINT (branch trace interrupt) flag (bit 8): When set, the BTS
    facilities generate an interrupt when the BTS buffer is full. When clear, BTMs are
    logged to the BTS buffer in a circular fashion. See Section 16.4.5, "Branch Trace
    Store (BTS)," for a description of this mechanism.
• Debug store (DS) feature flag (bit 21), returned by the CPUID
  instruction: Indicates that the processor provides the debug store (DS)
  mechanism, which allows BTMs to be stored in a memory-resident BTS buffer.
  See Section 16.4.5, "Branch Trace Store (BTS)."
• Last Branch Record (LBR) Stack: The LBR stack consists of 8 MSRs
  (MSR_LASTBRANCH_0 through MSR_LASTBRANCH_7); bits 31-0 hold the "from"
  address, bits 63-32 hold the "to" address. For Pentium M processors, these MSRs
  are located at register addresses 040H-047H. See Figure 16-17.
• Last Branch Record Top-of-Stack (TOS) Pointer: The TOS Pointer MSR
  contains a 3-bit pointer (bits 2-0) to the MSR in the LBR stack that contains the
  most recent branch, interrupt, or exception recorded. For Pentium M processors,
  this MSR is located at register address 01C9H.
Figure 16-16. MSR_DEBUGCTLB MSR for Pentium M Processors
[Register layout figure. Fields shown: LBR, last branch/interrupt/exception (bit 0);
BTF, single-step on branches (bit 1); PB3/2/1/0, performance monitoring breakpoint
flags (bits 5:2); TR, trace messages enable (bit 6); BTS, branch trace store (bit 7);
BTINT, branch trace interrupt (bit 8); Reserved (bits 31:9).]
For more detail on these capabilities, see Section 16.7.3, "Last Exception Records,"
and Appendix B.8, "MSRs In the Pentium M Processor."

Figure 16-17. LBR Branch Record Layout for the Pentium M Processor
[Register layout figure. MSR_LASTBRANCH_0 through MSR_LASTBRANCH_7: each 64-bit MSR
is divided at bits 32/31 into a To Linear Address field and a From Linear Address
field.]

16.10 LAST BRANCH, INTERRUPT, AND EXCEPTION
RECORDING (P6 FAMILY PROCESSORS)
The P6 family processors provide five MSRs for recording the last branch, interrupt,
or exception taken by the processor: DEBUGCTLMSR, LastBranchToIP,
LastBranchFromIP, LastExceptionToIP, and LastExceptionFromIP. These registers can be
used to collect last branch records, to set breakpoints on branches, interrupts, and
exceptions, and to single-step from one branch to the next.
See Appendix B, "Model-Specific Registers (MSRs)," for a detailed description of each
of the last branch recording MSRs.
16.10.1 DEBUGCTLMSR Register
The version of the DEBUGCTLMSR register found in the P6 family processors enables
last branch, interrupt, and exception recording; taken branch breakpoints; the
breakpoint reporting pins; and trace messages. This register can be written to using
the WRMSR instruction, when operating at privilege level 0 or when in real-address
mode. A protected-mode operating system procedure is required to provide user
access to this register. Figure 16-18 shows the flags in the DEBUGCTLMSR register
for the P6 family processors. The functions of these flags are as follows:
• LBR (last branch/interrupt/exception) flag (bit 0): When set, the
  processor records the source and target addresses (in the LastBranchToIP,
  LastBranchFromIP, LastExceptionToIP, and LastExceptionFromIP MSRs) for the
  last branch and the last exception or interrupt taken by the processor prior to a
  debug exception being generated. The processor clears this flag whenever a
  debug exception, such as an instruction or data breakpoint or single-step trap,
  occurs.
• BTF (single-step on branches) flag (bit 1): When set, the processor treats
  the TF flag in the EFLAGS register as a "single-step on branches" flag. See
  Section 16.4.3, "Single-Stepping on Branches, Exceptions, and Interrupts."
• PBi (performance monitoring/breakpoint pins) flags (bits 2 through 5):
  When these flags are set, the performance monitoring/breakpoint pins on the
  processor (BP0#, BP1#, BP2#, and BP3#) report breakpoint matches in the
  corresponding breakpoint-address registers (DR0 through DR3). The processor
  asserts then deasserts the corresponding BPi# pin when a breakpoint match
  occurs. When a PBi flag is clear, the performance monitoring/breakpoint pins
  report performance events. Processor execution is not affected by reporting
  performance events.
• TR (trace message enable) flag (bit 6): When set, trace messages are
  enabled as described in Section 16.4.4, "Branch Trace Messages." Setting this
  flag greatly reduces the performance of the processor. When trace messages are
  enabled, the values stored in the LastBranchToIP, LastBranchFromIP,
  LastExceptionToIP, and LastExceptionFromIP MSRs are undefined.
Figure 16-18. DEBUGCTLMSR Register (P6 Family Processors)
[Register layout figure. Fields shown: LBR, last branch/interrupt/exception (bit 0);
BTF, single-step on branches (bit 1); PB0 through PB3, performance
monitoring/breakpoint pins (bits 2 through 5); TR, trace messages enable (bit 6);
Reserved (bits 31:7).]

16.10.2 Last Branch and Last Exception MSRs
The LastBranchToIP and LastBranchFromIP MSRs are 32-bit registers for recording
the instruction pointers for the last branch, interrupt, or exception that the processor
took prior to a debug exception being generated. When a branch occurs, the
processor loads the address of the branch instruction into the LastBranchFromIP MSR
and loads the target address for the branch into the LastBranchToIP MSR.
When an interrupt or exception occurs (other than a debug exception), the address
of the instruction that was interrupted by the exception or interrupt is loaded into the
LastBranchFromIP MSR and the address of the exception or interrupt handler that is
called is loaded into the LastBranchToIP MSR.
The LastExceptionToIP and LastExceptionFromIP MSRs (also 32-bit registers) record
the instruction pointers for the last branch that the processor took prior to an
exception or interrupt being generated. When an exception or interrupt occurs, the
contents of the LastBranchToIP and LastBranchFromIP MSRs are copied into these
registers before the to and from addresses of the exception or interrupt are recorded
in the LastBranchToIP and LastBranchFromIP MSRs.
These registers can be read using the RDMSR instruction.
Note that the values stored in the LastBranchToIP, LastBranchFromIP,
LastExceptionToIP, and LastExceptionFromIP MSRs are offsets into the current code
segment, as opposed to linear addresses, which are saved in last branch records for
the Pentium 4 and Intel Xeon processors.
16.10.3 Monitoring Branches, Exceptions, and Interrupts
When the LBR flag in the DEBUGCTLMSR register is set, the processor automatically
begins recording branches that it takes, exceptions that are generated (except for
debug exceptions), and interrupts that are serviced. Each time a branch, exception,
or interrupt occurs, the processor records the to and from instruction pointers in the
LastBranchToIP and LastBranchFromIP MSRs. In addition, for interrupts and
exceptions, the processor copies the contents of the LastBranchToIP and
LastBranchFromIP MSRs into the LastExceptionToIP and LastExceptionFromIP MSRs prior
to recording the to and from addresses of the interrupt or exception.
When the processor generates a debug exception (#DB), it automatically clears the
LBR flag before executing the exception handler, but does not touch the last branch
and last exception MSRs. The addresses for the last branch, interrupt, or exception
taken are thus retained in the LastBranchToIP and LastBranchFromIP MSRs and the
addresses of the last branch prior to an interrupt or exception are retained in the
LastExceptionToIP and LastExceptionFromIP MSRs.
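The update rules described above can be modeled as a small state machine. The class
below is a teaching sketch only, with hypothetical names; it models the four MSRs as
two (from, to) pairs and is not a description of the hardware implementation.

```python
class P6BranchRecorder:
    """Toy model of LastBranch*/LastException* updates on P6 family processors."""

    def __init__(self):
        self.lbr_flag = True                 # DEBUGCTLMSR.LBR, assumed set by software
        self.last_branch = (None, None)      # (LastBranchFromIP, LastBranchToIP)
        self.last_exception = (None, None)   # (LastExceptionFromIP, LastExceptionToIP)

    def branch(self, from_ip, to_ip):
        """A taken branch overwrites the last-branch pair."""
        if self.lbr_flag:
            self.last_branch = (from_ip, to_ip)

    def interrupt_or_exception(self, from_ip, handler_ip):
        """Non-debug interrupt/exception: save the old pair, then record."""
        if self.lbr_flag:
            self.last_exception = self.last_branch
            self.last_branch = (from_ip, handler_ip)

    def debug_exception(self):
        """#DB clears the LBR flag and leaves all four MSRs untouched."""
        self.lbr_flag = False
```

After a debug exception the model ignores further events until lbr_flag is set
again, which mirrors why a debug handler sees the state as it was at the time of
the exception.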
The debugger can use the last branch, interrupt, and/or exception addresses in
combination with code-segment selectors retrieved from the stack to reset
breakpoints in the breakpoint-address registers (DR0 through DR3), allowing a backward
trace from the manifestation of a particular bug toward its source. Because the
instruction pointers recorded in the LastBranchToIP, LastBranchFromIP,
LastExceptionToIP, and LastExceptionFromIP MSRs are offsets into a code segment,
software must determine the segment base address of the code segment associated with
the control transfer to calculate the linear address to be placed in the
breakpoint-address registers. The segment base address can be determined by reading
the segment selector for the code segment from the stack and using it to locate the
segment descriptor for the segment in the GDT or LDT. The segment base address can
then be read from the segment descriptor.
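The offset-to-linear conversion described above can be sketched as follows. The code
assumes the standard 8-byte IA-32 segment descriptor format, in which the 32-bit
base is scattered across bytes 2, 3, 4 and 7, and assumes the descriptor bytes have
already been copied out of the GDT or LDT. The function names are hypothetical.

```python
def descriptor_table_offset(selector):
    """Byte offset of the descriptor within its table (index * 8)."""
    return selector & 0xFFF8


def selector_uses_ldt(selector):
    """True if the selector's TI bit (bit 2) selects the LDT rather than the GDT."""
    return bool(selector & 0x4)


def segment_base(descriptor):
    """Assemble the 32-bit base address from an 8-byte segment descriptor."""
    if len(descriptor) != 8:
        raise ValueError("a segment descriptor is 8 bytes")
    return (
        descriptor[2]
        | (descriptor[3] << 8)
        | (descriptor[4] << 16)
        | (descriptor[7] << 24)
    )


def linear_address(descriptor, offset):
    """Linear address for a recorded code-segment offset (32-bit wraparound)."""
    return (segment_base(descriptor) + offset) & 0xFFFFFFFF
```

The result of linear_address is the value a debugger would place in one of DR0
through DR3 to set a breakpoint at a recorded branch source or target.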
Before resuming program execution from a debug-exception handler, the handler
must set the LBR flag again to re-enable last branch and last exception/interrupt
recording.
16.11 TIME-STAMP COUNTER
The Intel 64 and IA-32 architectures (beginning with the Pentium processor) define a
time-stamp counter mechanism that can be used to monitor and identify the relative
time occurrence of processor events. The counter's architecture includes the
following components:
• TSC flag: A feature bit that indicates the availability of the time-stamp counter.
  The counter is available in an implementation if CPUID.1:EDX.TSC[bit 4] = 1.
• IA32_TIME_STAMP_COUNTER MSR (called TSC MSR in P6 family and
  Pentium processors): The MSR used as the counter.
• RDTSC instruction: An instruction used to read the time-stamp counter.
• TSD flag: A control register flag that restricts use of the RDTSC instruction.
  When CR4.TSD[bit 2] = 1, RDTSC can be executed only at privilege level 0; when
  the flag is clear, RDTSC can be executed at any privilege level.
The time-stamp counter (as implemented in the P6 family, Pentium, Pentium M,
Pentium 4, Intel Xeon, Intel Core Solo and Intel Core Duo processors and later
processors) is a 64-bit counter that is set to 0 following a RESET of the processor.
Following a RESET, the counter increments even when the processor is halted by the
HLT instruction or the external STPCLK# pin. Note that the assertion of the external
DPSLP# pin may cause the time-stamp counter to stop.
Processor families increment the time-stamp counter differently:
• For Pentium M processors (family [06H], models [09H, 0DH]); for Pentium 4
  processors, Intel Xeon processors (family [0FH], models [00H, 01H, or 02H]);
  and for P6 family processors: the time-stamp counter increments with every
  internal processor clock cycle.
  The internal processor clock cycle is determined by the current core-clock to bus-
  clock ratio. Intel SpeedStep technology transitions may also impact the
  processor clock.
• For Pentium 4 processors, Intel Xeon processors (family [0FH], models [03H and
  higher]); for Intel Core Solo and Intel Core Duo processors (family [06H], model
  [0EH]); for the Intel Xeon processor 5100 series and Intel Core 2 Duo processors
  (family [06H], model [0FH]); for Intel Core 2 and Intel Xeon processors (family
  [06H], display_model [17H]); for Intel Atom processors (family [06H],
  display_model [1CH]): the time-stamp counter increments at a constant rate.
  That rate may be set by the maximum core-clock to bus-clock ratio of the
  processor or may be set by the maximum resolved frequency at which the
  processor is booted. The maximum resolved frequency may differ from the
  maximum qualified frequency of the processor; see Section 30.10.5 for more
  detail.
  The specific processor configuration determines the behavior. Constant TSC
  behavior ensures that the duration of each clock tick is uniform and supports the
  use of the TSC as a wall clock timer even if the processor core changes frequency.
  This is the architectural behavior moving forward.
NOTE
To determine average processor clock frequency, Intel recommends
the use of EMON logic to count processor core clocks over the period
of time for which the average is required. See Section 30.10,
"Counting Clocks," and Appendix A, "Performance-Monitoring Events,"
for more information.
The RDTSC instruction reads the time-stamp counter and is guaranteed to return a
monotonically increasing unique value whenever executed, except for a 64-bit
counter wraparound. Intel guarantees that the time-stamp counter will not wrap
around within 10 years after being reset. The period for counter wrap is longer for
Pentium 4, Intel Xeon, P6 family, and Pentium processors.
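To put the wraparound guarantee in perspective, the following sketch estimates how long a 64-bit counter takes to wrap at a given tick rate. The function name and the sample rates are illustrative assumptions, not values taken from any processor specification.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_until_wrap(tick_rate_hz: float, counter_bits: int = 64) -> float:
    """Years for a counter that starts at 0 to overflow at the given tick rate."""
    return (2 ** counter_bits) / tick_rate_hz / SECONDS_PER_YEAR
```

For a hypothetical 3 GHz tick rate this gives roughly 195 years, comfortably beyond the 10-year guarantee.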
Normally, the RDTSC instruction can be executed by programs and procedures
running at any privilege level and in virtual-8086 mode. The TSD flag allows use of
this instruction to be restricted to programs and procedures running at privilege level
0. A secure operating system would set the TSD flag during system initialization to
disable user access to the time-stamp counter. An operating system that disables
user access to the time-stamp counter should emulate the instruction through a
user-accessible programming interface.
The RDTSC instruction is not a serializing instruction and is not ordered with respect
to other instructions. It does not necessarily wait until all previous instructions have
been executed before reading the counter. Similarly, subsequent instructions may
begin execution before the RDTSC instruction operation is performed.
The RDMSR and WRMSR instructions read and write the time-stamp counter, treating
the time-stamp counter as an ordinary MSR (address 10H). In the Pentium 4, Intel
Xeon, and P6 family processors, all 64 bits of the time-stamp counter are read using
RDMSR (just as with RDTSC). When WRMSR is used to write the time-stamp counter
on processors before family [0FH], models [03H, 04H]: only the low-order 32 bits of
the time-stamp counter can be written (the high-order 32 bits are cleared to 0). For
family [0FH], models [03H, 04H, 06H]; for family [06H], model [0EH, 0FH]; for
family [06H], display_model [17H, 1AH, 1CH, 1DH]: all 64 bits are writable.
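The two WRMSR behaviors can be modeled as follows. This is a sketch only: the function name and the boolean parameter are hypothetical, and the caller is assumed to have already decided, from the family and model lists above, which behavior applies.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def tsc_after_wrmsr(value: int, all_64_bits_writable: bool) -> int:
    """Model the TSC contents immediately after WRMSR writes `value` to MSR 10H."""
    value &= MASK64
    if all_64_bits_writable:
        return value
    # Older processors: only the low 32 bits are written; the high 32 bits are cleared.
    return value & MASK32
```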
16.11.1 Invariant TSC
The time-stamp counter in newer processors may support an enhancement, referred
to as invariant TSC. A processor's support for invariant TSC is indicated by
CPUID.80000007H:EDX[8].
The invariant TSC will run at a constant rate in all ACPI P-, C-, and T-states. This is
the architectural behavior moving forward. On processors with invariant TSC
support, the OS may use the TSC for wall clock timer services (instead of ACPI or
HPET timers). TSC reads are much more efficient and do not incur the overhead
associated with a ring transition or access to a platform resource.
16.11.2 IA32_TSC_AUX Register and RDTSCP Support
Processors based on Intel microarchitecture codename Nehalem provide an auxiliary
TSC register, IA32_TSC_AUX, that is designed to be used in conjunction with
IA32_TSC. IA32_TSC_AUX provides a 32-bit field that is initialized by privileged
software with a signature value (for example, a logical processor ID).
The primary usage of IA32_TSC_AUX in conjunction with IA32_TSC is to allow
software to read the 64-bit time stamp in IA32_TSC and signature value in
IA32_TSC_AUX with the instruction RDTSCP in an atomic operation. RDTSCP returns
the 64-bit time stamp in EDX:EAX and the 32-bit TSC_AUX signature value in ECX.
The atomicity of RDTSCP ensures that no context switch can occur between the reads
of the TSC and TSC_AUX values.
Support for RDTSCP is indicated by CPUID.80000001H:EDX[27]. As with the RDTSC
instruction, non-ring 0 access is controlled by CR4.TSD (Time Stamp Disable flag).
User mode software can use RDTSCP to detect if CPU migration has occurred
between successive reads of the TSC. It can also be used to adjust for per-CPU
differences in TSC values in a NUMA system.
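The migration check described above can be sketched as follows. The code assumes that the EDX, EAX, and ECX values returned by RDTSCP have already been captured (for example, by a small native helper); `TscSample`, `decode_rdtscp`, and `elapsed_ticks` are illustrative names, not part of any Intel-defined interface.

```python
from typing import NamedTuple, Optional


class TscSample(NamedTuple):
    tsc: int   # 64-bit time stamp assembled from EDX:EAX
    aux: int   # 32-bit IA32_TSC_AUX signature returned in ECX


def decode_rdtscp(edx: int, eax: int, ecx: int) -> TscSample:
    """Combine the three registers returned by RDTSCP into one sample."""
    tsc = ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)
    return TscSample(tsc=tsc, aux=ecx & 0xFFFFFFFF)


def elapsed_ticks(start: TscSample, end: TscSample) -> Optional[int]:
    """Return the tick delta, or None if the signature changed between reads."""
    if start.aux != end.aux:
        return None   # the thread probably migrated; the delta is not trustworthy
    return end.tsc - start.tsc
```

Returning `None` rather than a number forces the caller to decide how to handle a migrated measurement, for instance by repeating it.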
CHAPTER 17
8086 EMULATION
IA-32 processors (beginning with the Intel386 processor) provide two ways to
execute new or legacy programs that are assembled and/or compiled to run on an
Intel 8086 processor:
• Real-address mode.
• Virtual-8086 mode.
Figure 2-3 shows the relationship of these operating modes to protected mode and
system management mode (SMM).
When the processor is powered up or reset, it is placed in the real-address mode.
This operating mode almost exactly duplicates the execution environment of the
Intel 8086 processor, with some extensions. Virtually any program assembled and/or
compiled to run on an Intel 8086 processor will run on an IA-32 processor in this
mode.
When running in protected mode, the processor can be switched to virtual-8086
mode to run 8086 programs. This mode also duplicates the execution environment of
the Intel 8086 processor, with extensions. In virtual-8086 mode, an 8086 program
runs as a separate protected-mode task. Legacy 8086 programs are thus able to run
under an operating system (such as Microsoft Windows*) that takes advantage of
protected mode and to use protected-mode facilities, such as the protected-mode
interrupt- and exception-handling facilities. Protected-mode multitasking permits
multiple virtual-8086 mode tasks (with each task running a separate 8086 program)
to be run on the processor along with other non-virtual-8086 mode tasks.
This section describes both the basic real-address mode execution environment and
the virtual-8086-mode execution environment, available on the IA-32 processors
beginning with the Intel386 processor.
17.1 REAL-ADDRESS MODE
The IA-32 architecture's real-address mode runs programs written for the Intel 8086,
Intel 8088, Intel 80186, and Intel 80188 processors, or for the real-address mode of
the Intel 286, Intel386, Intel486, Pentium, P6 family, Pentium 4, and Intel Xeon
processors.
The execution environment of the processor in real-address mode is designed to
duplicate the execution environment of the Intel 8086 processor. To an 8086
program, a processor operating in real-address mode behaves like a high-speed
8086 processor. The principal features of this architecture are defined in Chapter 3,
"Basic Execution Environment," of the Intel 64 and IA-32 Architectures Software
Developer's Manual, Volume 1.
The following is a summary of the core features of the real-address mode execution
environment as would be seen by a program written for the 8086:
• The processor supports a nominal 1-MByte physical address space (see Section
  17.1.1, "Address Translation in Real-Address Mode," for specific details). This
  address space is divided into segments, each of which can be up to 64 KBytes in
  length. The base of a segment is specified with a 16-bit segment selector, which
  is shifted left by 4 bits to form a 20-bit offset from address 0 in the address space. An
  operand within a segment is addressed with a 16-bit offset from the base of the
  segment. A physical address is thus formed by adding the offset to the 20-bit
  segment base (see Section 17.1.1, "Address Translation in Real-Address Mode").
• All operands in native 8086 code are 8-bit or 16-bit values. (Operand size
  override prefixes can be used to access 32-bit operands.)
• Eight 16-bit general-purpose registers are provided: AX, BX, CX, DX, SP, BP, SI,
  and DI. The extended 32-bit registers (EAX, EBX, ECX, EDX, ESP, EBP, ESI, and
  EDI) are accessible to programs that explicitly perform a size override operation.
• Four segment registers are provided: CS, DS, SS, and ES. (The FS and GS
  registers are accessible to programs that explicitly access them.) The CS register
  contains the segment selector for the code segment; the DS and ES registers
  contain segment selectors for data segments; and the SS register contains the
  segment selector for the stack segment.
• The 8086 16-bit instruction pointer (IP) is mapped to the lower 16 bits of the EIP
  register. Note that this register is a 32-bit register and unintentional address wrapping
  may occur.
• The 16-bit FLAGS register contains status and control flags. (This register is
  mapped to the 16 least significant bits of the 32-bit EFLAGS register.)
• All of the Intel 8086 instructions are supported (see Section 17.1.3, "Instructions
  Supported in Real-Address Mode").
• A single, 16-bit-wide stack is provided for handling procedure calls and
  invocations of interrupt and exception handlers. This stack is contained in the
  stack segment identified with the SS register. The SP (stack pointer) register
  contains an offset into the stack segment. The stack grows down (toward lower
  segment offsets) from the stack pointer. The BP (base pointer) register also
  contains an offset into the stack segment that can be used as a pointer to a
  parameter list. When a CALL instruction is executed, the processor pushes the
  current instruction pointer (the 16 least-significant bits of the EIP register and,
  on far calls, the current value of the CS register) onto the stack. On a return,
  initiated with a RET instruction, the processor pops the saved instruction pointer
  from the stack into the EIP register (and CS register on far returns). When an
  implicit call to an interrupt or exception handler is executed, the processor
  pushes the EIP, CS, and EFLAGS (low-order 16 bits only) registers onto the
  stack. On a return from an interrupt or exception handler, initiated with an IRET
  instruction, the processor pops the saved instruction pointer and EFLAGS image
  from the stack into the EIP, CS, and EFLAGS registers.
• A single interrupt table, called the "interrupt vector table" or "interrupt table," is
  provided for handling interrupts and exceptions (see Figure 17-2). The interrupt
  table (which has 4-byte entries) takes the place of the interrupt descriptor table
  (IDT, with 8-byte entries) used when handling protected-mode interrupts and
  exceptions. Interrupt and exception vector numbers provide an index to entries
  in the interrupt table. Each entry provides a pointer (called a "vector") to an
  interrupt- or exception-handling procedure. See Section 17.1.4, "Interrupt and
  Exception Handling," for more details. It is possible for software to relocate the
  IDT by means of the LIDT instruction on IA-32 processors beginning with the
  Intel386 processor.
• The x87 FPU is active and available to execute x87 FPU instructions in
  real-address mode. Programs written to run on the Intel 8087 and Intel 287 math
  coprocessors can be run in real-address mode without modification.
The following extensions to the Intel 8086 execution environment are available in the
IA-32 architecture's real-address mode. If backwards compatibility to Intel 286 and
Intel 8086 processors is required, these features should not be used in new programs
written to run in real-address mode.
• Two additional segment registers (FS and GS) are available.
• Many of the integer and system instructions that have been added to later IA-32
  processors can be executed in real-address mode (see Section 17.1.3,
  "Instructions Supported in Real-Address Mode").
• The 32-bit operand prefix can be used in real-address mode programs to execute
  the 32-bit forms of instructions. This prefix also allows real-address mode
  programs to use the processor's 32-bit general-purpose registers.
• The 32-bit address prefix can be used in real-address mode programs, allowing
  32-bit offsets.
The following sections describe address formation, registers, available instructions,
and interrupt and exception handling in real-address mode. For information on I/O in
real-address mode, see Chapter 13, "Input/Output," of the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 1.
17.1.1 Address Translation in Real-Address Mode
In real-address mode, the processor does not interpret segment selectors as indexes
into a descriptor table; instead, it uses them directly to form linear addresses as the
8086 processor does. It shifts the segment selector left by 4 bits to form a 20-bit
base address (see Figure 17-1). The offset into a segment is added to the base
address to create a linear address that maps directly to the physical address space.
When using 8086-style address translation, it is possible to specify addresses larger
than 1 MByte. For example, with a segment selector value of FFFFH and an offset of
FFFFH, the linear (and physical) address would be 10FFEFH (just under 1 MByte plus 64
KBytes). The 8086 processor, which can form addresses only up to 20 bits long,
truncates the high-order bit, thereby wrapping this address to FFEFH. When operating
in real-address mode, however, the processor does not truncate such an address and
uses it as a physical address. (Note, however, that for IA-32 processors beginning
with the Intel486 processor, the A20M# signal can be used in real-address mode to
mask address line A20, thereby mimicking the 20-bit wrap-around behavior of the
8086 processor.) Care should be taken to ensure that A20M#-based address wrapping
is handled correctly in multiprocessor-based systems.
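The address arithmetic described in this section can be sketched in a few lines. The function name and the `a20_masked` parameter are illustrative; masking bit 20 stands in for the effect of the A20M# signal and reproduces the 8086 wraparound for the addresses reachable with a 16-bit selector and a 16-bit offset.

```python
def real_mode_linear(selector: int, offset: int, a20_masked: bool = False) -> int:
    """Form a real-address-mode linear address from a 16-bit selector and offset."""
    linear = ((selector & 0xFFFF) << 4) + (offset & 0xFFFF)
    if a20_masked:
        linear &= ~(1 << 20)   # force address line A20 to 0 (8086-style wrap)
    return linear
```

With selector FFFFH and offset FFFFH, the function yields 10FFEFH without masking and FFEFH with masking, matching the example in the text.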
The IA-32 processors beginning with the Intel386 processor can generate 32-bit
offsets using an address override prefix; however, in real-address mode, the value of
a 32-bit offset may not exceed FFFFH without causing an exception.
For full compatibility with Intel 286 real-address mode, pseudo-protection faults
(interrupt 12 or 13) occur if a 32-bit offset is generated outside the range 0 through
FFFFH.
17.1.2 Registers Supported in Real-Address Mode
The register set available in real-address mode includes all the registers defined for
the 8086 processor plus the new registers introduced in later IA-32 processors, such
as the FS and GS segment registers, the debug registers, the control registers, and
the floating-point unit registers. The 32-bit operand prefix allows a real-address
mode program to use the 32-bit general-purpose registers (EAX, EBX, ECX, EDX,
ESP, EBP, ESI, and EDI).
17.1.3 Instructions Supported in Real-Address Mode
The following instructions make up the core instruction set for the 8086 processor. If
backwards compatibility to the Intel 286 and Intel 8086 processors is required, only
these instructions should be used in a new program written to run in real-address
mode.
Figure 17-1. Real-Address Mode Address Translation
[Figure: the 16-bit segment selector, shifted left by 4 bits (bits 19:4, with bits 3:0 set to 0), forms the base; the 16-bit effective address (bits 15:0, with bits 19:16 set to 0) forms the offset; base + offset = 20-bit linear address.]
• Move (MOV) instructions that move operands between general-purpose
  registers, segment registers, and between memory and general-purpose
  registers.
• The exchange (XCHG) instruction.
• Load segment register instructions LDS and LES.
• Arithmetic instructions ADD, ADC, SUB, SBB, MUL, IMUL, DIV, IDIV, INC, DEC,
  CMP, and NEG.
• Logical instructions AND, OR, XOR, and NOT.
• Decimal instructions DAA, DAS, AAA, AAS, AAM, and AAD.
• Stack instructions PUSH and POP (to general-purpose registers and segment
  registers).
• Type conversion instructions CWD, CDQ, CBW, and CWDE.
• Shift and rotate instructions SAL, SHL, SHR, SAR, ROL, ROR, RCL, and RCR.
• TEST instruction.
• Control instructions JMP, Jcc, CALL, RET, LOOP, LOOPE, and LOOPNE.
• Interrupt instructions INT n, INTO, and IRET.
• EFLAGS control instructions STC, CLC, CMC, CLD, STD, LAHF, SAHF, PUSHF, and
  POPF.
• I/O instructions IN, INS, OUT, and OUTS.
• Load effective address (LEA) instruction, and translate (XLATB) instruction.
• LOCK prefix.
• Repeat prefixes REP, REPE, REPZ, REPNE, and REPNZ.
• Processor halt (HLT) instruction.
• No operation (NOP) instruction.
The following instructions, added to later IA-32 processors (some in the Intel 286
processor and the remainder in the Intel386 processor), can be executed in
real-address mode, if backwards compatibility to the Intel 8086 processor is not required.
• Move (MOV) instructions that operate on the control and debug registers.
• Load segment register instructions LSS, LFS, and LGS.
• Generalized multiply instructions and multiply immediate data.
• Shift and rotate by immediate counts.
• Stack instructions PUSHA, PUSHAD, POPA and POPAD, and PUSH immediate
  data.
• Move with sign extension (MOVSX) and move with zero extension (MOVZX) instructions.
• Long-displacement Jcc instructions.
• Exchange instructions CMPXCHG, CMPXCHG8B, and XADD.
• String instructions MOVS, CMPS, SCAS, LODS, and STOS.
• Bit test and bit scan instructions BT, BTS, BTR, BTC, BSF, and BSR; the
  byte-set-on-condition instruction SETcc; and the byte swap (BSWAP) instruction.
• Double shift instructions SHLD and SHRD.
• EFLAGS control instructions PUSHF and POPF.
• ENTER and LEAVE control instructions.
• BOUND instruction.
• CPU identification (CPUID) instruction.
• System instructions CLTS, INVD, WBINVD, INVLPG, LGDT, SGDT, LIDT, SIDT,
  LMSW, SMSW, RDMSR, WRMSR, RDTSC, and RDPMC.
Execution of any of the other IA-32 architecture instructions (not given in the
previous two lists) in real-address mode results in an invalid-opcode exception (#UD)
being generated.
17.1.4 Interrupt and Exception Handling
When operating in real-address mode, software must provide interrupt and
exception-handling facilities that are separate from those provided in protected mode.
Even during the early stages of processor initialization when the processor is still in
real-address mode, elementary real-address mode interrupt and exception-handling
facilities must be provided to ensure reliable operation of the processor, or the
initialization code must ensure that no interrupts or exceptions will occur.
The IA-32 processors handle interrupts and exceptions in real-address mode in a manner
similar to the way they handle them in protected mode. When a processor receives an
interrupt or generates an exception, it uses the vector number of the interrupt or
exception as an index into the interrupt table. (In protected mode, the interrupt table is
called the interrupt descriptor table (IDT), but in real-address mode, the table is
usually called the interrupt vector table, or simply the interrupt table.) The entry
in the interrupt vector table provides a pointer to an interrupt- or exception-handler
procedure. (The pointer consists of a segment selector for a code segment and a
16-bit offset into the segment.) The processor performs the following actions to make an
implicit call to the selected handler:
1. Pushes the current values of the CS and EIP registers onto the stack. (Only the 16
   least-significant bits of the EIP register are pushed.)
2. Pushes the low-order 16 bits of the EFLAGS register onto the stack.
3. Clears the IF flag in the EFLAGS register to disable interrupts.
4. Clears the TF, RF, and AC flags in the EFLAGS register.
5. Transfers program control to the location specified in the interrupt vector table.
An IRET instruction at the end of the handler procedure reverses these steps to
return program control to the interrupted program. Exceptions do not return error
codes in real-address mode.
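The implicit call and the matching IRET can be modeled with a small simulation. This is a sketch under stated assumptions: `RealModeCpu` is a hypothetical class, only the low 16 bits of EFLAGS are modeled (so only IF and TF are cleared here), the interrupt table is assumed to start at physical address 0, and the push order used (FLAGS, then CS, then IP) is the stack layout that IRET unwinds in reverse rather than a restatement of the step numbering above.

```python
import struct

IF_FLAG = 1 << 9   # interrupt enable flag
TF_FLAG = 1 << 8   # trap flag


class RealModeCpu:
    """Toy model of the registers involved in a real-address-mode interrupt."""

    def __init__(self, memory: bytearray):
        self.mem = memory
        self.cs = self.ip = self.ss = self.sp = 0
        self.flags = 0x0002   # low 16 bits of EFLAGS only

    def _push16(self, value: int) -> None:
        self.sp = (self.sp - 2) & 0xFFFF
        struct.pack_into("<H", self.mem, (self.ss << 4) + self.sp, value & 0xFFFF)

    def _pop16(self) -> int:
        (value,) = struct.unpack_from("<H", self.mem, (self.ss << 4) + self.sp)
        self.sp = (self.sp + 2) & 0xFFFF
        return value

    def interrupt(self, vector: int) -> None:
        # Each 4-byte table entry holds the handler offset, then its segment selector.
        offset, segment = struct.unpack_from("<HH", self.mem, vector * 4)
        self._push16(self.flags)
        self._push16(self.cs)
        self._push16(self.ip)
        self.flags &= ~(IF_FLAG | TF_FLAG) & 0xFFFF
        self.cs, self.ip = segment, offset

    def iret(self) -> None:
        self.ip = self._pop16()
        self.cs = self._pop16()
        self.flags = self._pop16()
```

After `interrupt()` the stack pointer has moved down by 6 bytes, and a following `iret()` restores CS, IP, the flags image, and SP to their earlier values.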
The interrupt vector table is an array of 4-byte entries (see Figure 17-2). Each entry
consists of a far pointer to a handler procedure, made up of a segment selector and
an offset. The processor scales the interrupt or exception vector by 4 to obtain an
offset into the interrupt table. Following reset, the base of the interrupt vector table
is located at physical address 0 and its limit is set to 3FFH. In the Intel 8086
processor, the base address and limit of the interrupt vector table cannot be
changed. In the later IA-32 processors, the base address and limit of the interrupt
vector table are contained in the IDTR register and can be changed using the LIDT
instruction.
(For backward compatibility to Intel 8086 processors, the default base address and
limit of the interrupt vector table should not be changed.)
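The table lookup can be sketched as follows. `ivt_lookup` is an illustrative name, the defaults mirror the reset values of the base (0) and limit (3FFH), and the entry layout (16-bit offset at the lower address, followed by the 16-bit segment selector) follows Figure 17-2.

```python
import struct


def ivt_lookup(memory: bytes, vector: int, idt_base: int = 0, idt_limit: int = 0x3FF):
    """Return (segment, offset) of the handler for `vector` in real-address mode."""
    if not 0 <= vector <= 255:
        raise ValueError("vector must be in the range 0..255")
    entry_offset = vector * 4            # the vector number is scaled by 4
    if entry_offset + 3 > idt_limit:
        raise ValueError("entry lies beyond the interrupt table limit")
    offset, segment = struct.unpack_from("<HH", memory, idt_base + entry_offset)
    return segment, offset
```

For example, with the default base, the entry for vector 8 occupies bytes 20H through 23H of physical memory.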
Table 17-1 shows the interrupt and exception vectors that can be generated in
real-address mode and virtual-8086 mode, and in the Intel 8086 processor. See Chapter
6, "Interrupt and Exception Handling," for a description of the exception conditions.
Figure 17-2. Interrupt Vector Table in Real-Address Mode
[Figure: the IDTR points to the base of the table. Entries 0 through 255 are 4 bytes each, at byte offsets 0, 4, 8, 12, and so on; each entry holds a 16-bit offset (at byte 0 of the entry) followed by a 16-bit segment selector (at byte 2 of the entry).]
* Interrupt vector number 0 selects entry 0 (called "interrupt vector 0") in the interrupt
vector table. Interrupt vector 0 in turn points to the start of the interrupt handler
for interrupt 0.
17.2 VIRTUAL-8086 MODE
Virtual-8086 mode is actually a special type of task that runs in protected mode.
When the operating system or executive switches to a virtual-8086-mode task, the
processor emulates an Intel 8086 processor. The execution environment of the
processor while in the 8086-emulation state is the same as is described in Section
17.1, "Real-Address Mode," for real-address mode, including the extensions. The
major difference between the two modes is that in virtual-8086 mode the 8086
emulator uses some protected-mode services (such as the protected-mode interrupt
and exception-handling and paging facilities).
As in real-address mode, any new or legacy program that has been assembled
and/or compiled to run on an Intel 8086 processor will run in a virtual-8086-mode
task. Several 8086 programs can also be run as virtual-8086-mode tasks
concurrently with normal protected-mode tasks, using the processor's multitasking
facilities.
Table 17-1. Real-Address Mode Exceptions and Interrupts

Vector No.  Description                    Real-Address Mode  Virtual-8086 Mode  Intel 8086 Processor
0           Divide Error (#DE)             Yes                Yes                Yes
1           Debug Exception (#DB)          Yes                Yes                No
2           NMI Interrupt                  Yes                Yes                Yes
3           Breakpoint (#BP)               Yes                Yes                Yes
4           Overflow (#OF)                 Yes                Yes                Yes
5           BOUND Range Exceeded (#BR)     Yes                Yes                Reserved
6           Invalid Opcode (#UD)           Yes                Yes                Reserved
7           Device Not Available (#NM)     Yes                Yes                Reserved
8           Double Fault (#DF)             Yes                Yes                Reserved
9           (Intel reserved. Do not use.)  Reserved           Reserved           Reserved
10          Invalid TSS (#TS)              Reserved           Yes                Reserved
11          Segment Not Present (#NP)      Reserved           Yes                Reserved
12          Stack Fault (#SS)              Yes                Yes                Reserved
13          General Protection (#GP)*      Yes                Yes                Reserved
14          Page Fault (#PF)               Reserved           Yes                Reserved
15          (Intel reserved. Do not use.)  Reserved           Reserved           Reserved
16          Floating-Point Error (#MF)     Yes                Yes                Reserved
17          Alignment Check (#AC)          Reserved           Yes                Reserved
18          Machine Check (#MC)            Yes                Yes                Reserved
17.2.1 Enabling Virtual-8086 Mode
The processor runs in virtual-8086 mode when the VM (virtual machine) flag in the
EFLAGS register is set. This flag can only be set when the processor switches to a
new protected-mode task or resumes virtual-8086 mode via an IRET instruction.
System software cannot change the state of the VM flag directly in the EFLAGS
register (for example, by using the POPFD instruction). Instead it changes the flag in
the image of the EFLAGS register stored in the TSS or on the stack following a call to
an interrupt- or exception-handler procedure. For example, software sets the VM flag
in the EFLAGS image in the TSS when first creating a virtual-8086 task.
The processor tests the VM flag under three general conditions:
• When loading segment registers, to determine whether to use 8086-style
  address translation.
• When decoding instructions, to determine which instructions are not supported in
  virtual-8086 mode and which instructions are sensitive to IOPL.
• When checking privileged instructions, on page accesses, or when performing
  other permission checks. (Virtual-8086 mode always executes at CPL 3.)
17.2.2 Structure of a Virtual-8086 Task
A virtual-8086-mode task consists of the following items:
• A 32-bit TSS for the task.
• The 8086 program.
• A virtual-8086 monitor.
• 8086 operating-system services.
The TSS of the new task must be a 32-bit TSS, not a 16-bit TSS, because the 16-bit
TSS does not load the most-significant word of the EFLAGS register, which contains
the VM flag. All TSSs, stacks, data, and code used to handle exceptions when in
virtual-8086 mode must also be 32-bit segments.
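The reason a 16-bit TSS cannot start a virtual-8086 task can be illustrated with a short model. The function names are hypothetical, and the VM flag is taken to be bit 17 of EFLAGS, which places it in the most-significant word as the text states.

```python
VM_FLAG = 1 << 17   # VM flag, located in the upper word of EFLAGS


def eflags_loaded_from_tss(eflags_image: int, tss_is_32bit: bool) -> int:
    """Model which EFLAGS bits reach the processor when a task's TSS is loaded."""
    if tss_is_32bit:
        return eflags_image & 0xFFFFFFFF
    # A 16-bit TSS holds only the low word, so the upper word loads as 0.
    return eflags_image & 0xFFFF


def enters_virtual_8086(eflags_image: int, tss_is_32bit: bool) -> bool:
    """Return True if the task would start in virtual-8086 mode."""
    return bool(eflags_loaded_from_tss(eflags_image, tss_is_32bit) & VM_FLAG)
```

With the same EFLAGS image, the task starts in virtual-8086 mode only when the image is loaded from a 32-bit TSS.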
Table 17-1. Real-Address Mode Exceptions and Interrupts (Contd.)

Vector No.  Description                    Real-Address Mode  Virtual-8086 Mode  Intel 8086 Processor
19-31       (Intel reserved. Do not use.)  Reserved           Reserved           Reserved
32-255      User Defined Interrupts        Yes                Yes                Yes

NOTE:
* In the real-address mode, vector 13 is the segment overrun exception. In protected and
virtual-8086 modes, this exception covers all general-protection error conditions, including traps
to the virtual-8086 monitor from virtual-8086 mode.
The processor enters virtual-8086 mode to run the 8086 program and returns to
protected mode to run the virtual-8086 monitor.
The virtual-8086 monitor is a 32-bit protected-mode code module that runs at a CPL
of 0. The monitor consists of initialization, interrupt- and exception-handling, and I/O
emulation procedures that emulate a personal computer or other 8086-based
platform. Typically, the monitor is either part of or closely associated with the
protected-mode general-protection (#GP) exception handler, which also runs at a CPL of 0. As
with any protected-mode code module, code-segment descriptors for the
virtual-8086 monitor must exist in the GDT or in the task's LDT. The virtual-8086 monitor
also may need data-segment descriptors so it can examine the IDT or other parts of
the 8086 program in the first 1 MByte of the address space. The linear addresses
above 10FFEFH are available for the monitor, the operating system, and other system
software.
The 8086 operating-system services consist of a kernel and/or operating-system
procedures that the 8086 program makes calls to. These services can be
implemented in either of the following two ways:
• They can be included in the 8086 program. This approach is desirable for either
  of the following reasons:
    • The 8086 program code modifies the 8086 operating-system services.
    • There is not sufficient development time to merge the 8086 operating-system
      services into the main operating system or executive.
• They can be implemented or emulated in the virtual-8086 monitor. This approach
  is desirable for any of the following reasons:
    • The 8086 operating-system procedures can be more easily coordinated
      among several virtual-8086 tasks.
    • Memory can be saved by not duplicating 8086 operating-system procedure
      code for several virtual-8086 tasks.
    • The 8086 operating-system procedures can be easily emulated by calls to the
      main operating system or executive.
The approach chosen for implementing the 8086 operating-system services may
result in different virtual-8086-mode tasks using different 8086 operating-system
services.
17.2.3 Paging of Virtual-8086 Tasks
Even though a program running in virtual-8086 mode can use only 20-bit linear
addresses, the processor converts these addresses into 32-bit linear addresses
before mapping them to the physical address space. If paging is being used, the
8086 address space for a program running in virtual-8086 mode can be paged and
located in a set of pages in physical address space. If paging is used, it is transparent
to the program running in virtual-8086 mode just as it is for any task running on the
processor.
Paging is not necessary for a single virtual-8086-mode task, but paging is useful or
necessary in the following situations:
• When running multiple virtual-8086-mode tasks. Here, paging allows the lower 1
  MByte of the linear address space for each virtual-8086-mode task to be mapped
  to a different physical address location.
• When emulating the 8086 address-wraparound that occurs at 1 MByte. When
  using 8086-style address translation, it is possible to specify addresses larger
  than 1 MByte. These addresses automatically wrap around in the Intel 8086
  processor (see Section 17.1.1, "Address Translation in Real-Address Mode"). If
  any 8086 programs depend on address wraparound, the same effect can be
  achieved in a virtual-8086-mode task by mapping the linear addresses between
  100000H and 110000H and linear addresses between 0 and 10000H to the same
  physical addresses.
• When sharing the 8086 operating-system services or ROM code that is common
  to several 8086 programs running as different virtual-8086-mode tasks.
• When redirecting or trapping references to memory-mapped I/O devices.
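The wraparound emulation mentioned in the list above amounts to aliasing two linear ranges onto the same physical frames. The sketch below models that with a plain dictionary instead of real page tables; the 4-KByte page size, the function names, and the starting frame number are assumptions made for illustration, and only the two aliased 64-KByte ranges are mapped.

```python
PAGE_SIZE = 4096   # assumed 4-KByte pages


def build_wraparound_map(first_frame: int) -> dict:
    """Map linear 0..FFFFH and 100000H..10FFFFH onto the same 16 physical frames."""
    page_to_frame = {}
    for page in range(0x10000 // PAGE_SIZE):
        frame = first_frame + page
        page_to_frame[page] = frame                              # low 64 KBytes
        page_to_frame[(0x100000 // PAGE_SIZE) + page] = frame    # alias above 1 MByte
    return page_to_frame


def translate(page_to_frame: dict, linear: int) -> int:
    """Translate a linear address to a physical address using the toy mapping."""
    frame = page_to_frame[linear // PAGE_SIZE]
    return frame * PAGE_SIZE + linear % PAGE_SIZE
```

With this mapping, linear addresses 10FFEFH and FFEFH resolve to the same physical address, which is the effect an 8086 program that relies on wraparound expects.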
17.2.4 Protection within a Virtual-8086 Task
Protection is not enforced between the segments of an 8086 program. Either of the
following techniques can be used to protect the system software running in a
virtual-8086-mode task from the 8086 program:
- Reserve the first 1 MByte plus 64 KBytes of each task's linear address space for
  the 8086 program. An 8086 processor task cannot generate addresses outside
  this range.
- Use the U/S flag of page-table entries to protect the virtual-8086 monitor and
  other system software in the virtual-8086 mode task space. When the processor
  is in virtual-8086 mode, the CPL is 3. Therefore, an 8086 processor program has
  only user privileges. If the pages of the virtual-8086 monitor have supervisor
  privilege, they cannot be accessed by the 8086 program.
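
As a rough illustration of the second technique, the following C sketch builds
32-bit page-table entries with and without the U/S flag. The helper names are
hypothetical, and the check is simplified: it looks at a single page-table entry
and ignores the U/S flag of the page-directory entry, which the processor also
consults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_P   (1u << 0)   /* present */
#define PTE_RW  (1u << 1)   /* read/write */
#define PTE_US  (1u << 2)   /* user/supervisor: 1 = user-accessible */

/* Build a 32-bit (non-PAE) page-table entry for a 4-KByte page. */
static uint32_t make_pte(uint32_t phys_addr, bool writable, bool user)
{
    assert((phys_addr & 0xFFFu) == 0);   /* frame must be page aligned */
    return phys_addr | PTE_P | (writable ? PTE_RW : 0u) | (user ? PTE_US : 0u);
}

/* Code in virtual-8086 mode runs at CPL 3, so it can reach only pages that
 * are present and marked user-accessible. */
static bool v86_can_access(uint32_t pte)
{
    return (pte & PTE_P) != 0 && (pte & PTE_US) != 0;
}
```

A monitor following this scheme would map its own pages with `user` set to
false and the pages of the 8086 program with `user` set to true.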
17.2.5 Entering Virtual-8086 Mode
Figure 17-3 summarizes the methods of entering and leaving virtual-8086 mode.
The processor switches to virtual-8086 mode in either of the following situations:
- Task switch when the VM flag is set to 1 in the EFLAGS register image stored in
  the TSS for the task. Here the task switch can be initiated in either of two ways:
    - A CALL or JMP instruction.
    - An IRET instruction, where the NT flag in the EFLAGS image is set to 1.
- Return from a protected-mode interrupt or exception handler when the VM flag is
  set to 1 in the EFLAGS register image on the stack.
When a task switch is used to enter virtual-8086 mode, the TSS for the
virtual-8086-mode task must be a 32-bit TSS. (If the new TSS is a 16-bit TSS, the upper
word of the EFLAGS register is not in the TSS, causing the processor to clear the VM
flag when it loads the EFLAGS register.) The processor updates the VM flag prior to
loading the segment registers from their images in the new TSS. The new setting of
the VM flag determines whether the processor interprets the contents of the segment
registers as 8086-style segment selectors or protected-mode segment selectors.
When the VM flag is set, the segment registers are loaded from the TSS, using
8086-style address translation to form base addresses.
See Section 17.3, "Interrupt and Exception Handling in Virtual-8086 Mode," for
information on entering virtual-8086 mode on a return from an interrupt or exception
handler.
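
The entry conditions in this section reduce to tests on the VM flag (bit 17 of
EFLAGS). The C sketch below is illustrative only; the function names are
hypothetical. It models the two points made above (a 16-bit TSS cannot carry
the VM flag, and a stacked EFLAGS image with VM set returns to virtual-8086
mode) together with the requirement, stated in Section 17.3.1.1, that the CPL
be 0 when the IRET executes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_VM  (1u << 17)  /* virtual-8086 mode flag */

/* A 16-bit TSS holds only the low word of EFLAGS, so VM (bit 17) is lost. */
static uint32_t eflags_from_tss(uint32_t image, bool tss_is_32bit)
{
    return tss_is_32bit ? image : (image & 0xFFFFu);
}

/* True if a task switch that loads this EFLAGS image from the new TSS
 * leaves the processor in virtual-8086 mode. */
static bool task_switch_enters_v86(uint32_t image, bool tss_is_32bit)
{
    return (eflags_from_tss(image, tss_is_32bit) & EFLAGS_VM) != 0;
}

/* True if an IRET that pops this EFLAGS image from the stack enters
 * virtual-8086 mode; the VM flag is only honored when CPL is 0. */
static bool iret_enters_v86(uint32_t stacked_eflags, unsigned cpl)
{
    assert(cpl <= 3);
    return cpl == 0 && (stacked_eflags & EFLAGS_VM) != 0;
}
```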
Figure 17-3. Entering and Leaving Virtual-8086 Mode
[Figure: state diagram relating real-address mode (real mode code), protected mode
(protected-mode tasks, protected-mode interrupt and exception handlers, and the
virtual-8086 monitor), and virtual-8086 mode (virtual-8086 mode tasks running 8086
programs). Arrow labels: RESET; PE=1; PE=0 or RESET; Task Switch, VM=1; Task Switch,
VM=0; Interrupt or Exception, VM=0; #GP Exception; CALL; RET; IRET; Redirect Interrupt
to 8086 Program Interrupt or Exception Handler. The numbered arrows are described in
the notes.]
NOTES:
1. Task switch carried out in either of two ways:
   - CALL or JMP where the VM flag in the EFLAGS image is 1.
   - IRET where VM is 1 and NT is 1.
2. Hardware interrupt or exception; software interrupt (INT n) when IOPL is 3.
3. General-protection exception caused by software interrupt (INT n), IRET,
   POPF, PUSHF, IN, or OUT when IOPL is less than 3.
4. Normal return from protected-mode interrupt or exception handler.
5. A return from the 8086 monitor to redirect an interrupt or exception back
   to an interrupt or exception handler in the 8086 program running in
   virtual-8086 mode.
6. Internal redirection of a software interrupt (INT n) when VME is 1,
   IOPL is less than 3, and the redirection bit is 1.
17.2.6 Leaving Virtual-8086 Mode
The processor can leave the virtual-8086 mode only through an interrupt or exception.
The following are situations where an interrupt or exception will lead to the
processor leaving virtual-8086 mode (see Figure 17-3):
- The processor services a hardware interrupt generated to signal the suspension
  of execution of the virtual-8086 application. This hardware interrupt may be
  generated by a timer or other external mechanism. Upon receiving the hardware
  interrupt, the processor enters protected mode and switches to a protected-mode
  (or another virtual-8086 mode) task either through a task gate in the
  protected-mode IDT or through a trap or interrupt gate that points to a handler
  that initiates a task switch. A task switch from a virtual-8086 task to another task
  loads the EFLAGS register from the TSS of the new task. The value of the VM flag
  in the new EFLAGS determines if the new task executes in virtual-8086 mode or
  not.
- The processor services an exception caused by code executing in the virtual-8086
  task or services a hardware interrupt that belongs to the virtual-8086 task.
  Here, the processor enters protected mode and services the exception or
  hardware interrupt through the protected-mode IDT (normally through an
  interrupt or trap gate) and the protected-mode exception and interrupt
  handlers. The processor may handle the exception or interrupt within the context
  of the virtual-8086 task and return to virtual-8086 mode on a return from the
  handler procedure. The processor may also execute a task switch and handle the
  exception or interrupt in the context of another task.
- The processor services a software interrupt generated by code executing in the
  virtual-8086 task (such as a software interrupt to call an MS-DOS* operating
  system routine). The processor provides several methods of handling these
  software interrupts, which are discussed in detail in Section 17.3.3, "Class 3:
  Software Interrupt Handling in Virtual-8086 Mode." Most of them involve the
  processor entering protected mode, often by means of a general-protection
  (#GP) exception. In protected mode, the processor can send the interrupt to the
  virtual-8086 monitor for handling and/or redirect the interrupt back to the
  application program running in the virtual-8086 mode task for handling.
  IA-32 processors that incorporate the virtual mode extension (enabled with the
  VME flag in control register CR4) are capable of redirecting software-generated
  interrupts back to the program's interrupt handlers without leaving virtual-8086
  mode. See Section 17.3.3.4, "Method 5: Software Interrupt Handling," for more
  information on this mechanism.
- A hardware reset initiated by asserting the RESET or INIT pin is a special kind of
  interrupt. When a RESET or INIT is signaled while the processor is in virtual-8086
  mode, the processor leaves virtual-8086 mode and enters real-address mode.
- Execution of the HLT instruction in virtual-8086 mode will cause a
  general-protection (#GP) fault, which the protected-mode handler generally sends
  to the virtual-8086 monitor. The virtual-8086 monitor then determines the correct
  execution sequence after verifying that it was entered as a result of a HLT
  execution.
See Section 17.3, "Interrupt and Exception Handling in Virtual-8086 Mode," for
information on leaving virtual-8086 mode to handle an interrupt or exception generated
in virtual-8086 mode.
17.2.7 Sensitive Instructions
When an IA-32 processor is running in virtual-8086 mode, the CLI, STI, PUSHF, POPF,
INT n, and IRET instructions are sensitive to IOPL. The IN, INS, OUT, and OUTS
instructions, which are sensitive to IOPL in protected mode, are not sensitive in
virtual-8086 mode.
The CPL is always 3 while running in virtual-8086 mode; if the IOPL is less than 3, an
attempt to use the IOPL-sensitive instructions listed above triggers a
general-protection exception (#GP). These instructions are sensitive to IOPL to give
the virtual-8086 monitor a chance to emulate the facilities they affect.
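
The rule in this section can be written as a small predicate. The C sketch
below is a simplified model with hypothetical names: it covers only the
behavior described here (virtual mode extensions not in use) and does not
account for the VME-specific handling described in Sections 17.3.2 and 17.3.3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_IOPL_SHIFT 12
#define EFLAGS_IOPL_MASK  (3u << EFLAGS_IOPL_SHIFT)   /* bits 12 and 13 */

enum v86_insn {
    INSN_CLI, INSN_STI, INSN_PUSHF, INSN_POPF, INSN_INT_N, INSN_IRET,
    INSN_IN, INSN_INS, INSN_OUT, INSN_OUTS, INSN_OTHER
};

/* Extract the 2-bit I/O privilege level from an EFLAGS value. */
static unsigned eflags_iopl(uint32_t eflags)
{
    return (eflags & EFLAGS_IOPL_MASK) >> EFLAGS_IOPL_SHIFT;
}

/* Instructions that are IOPL-sensitive in virtual-8086 mode. The I/O
 * instructions are deliberately absent from this list. */
static bool v86_iopl_sensitive(enum v86_insn insn)
{
    assert(insn <= INSN_OTHER);
    switch (insn) {
    case INSN_CLI: case INSN_STI: case INSN_PUSHF:
    case INSN_POPF: case INSN_INT_N: case INSN_IRET:
        return true;
    default:
        return false;
    }
}

/* True if executing `insn` in virtual-8086 mode (CPL 3) raises #GP
 * because of the IOPL check. */
static bool v86_iopl_fault(enum v86_insn insn, uint32_t eflags)
{
    return v86_iopl_sensitive(insn) && eflags_iopl(eflags) < 3;
}
```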
17.2.8 Virtual-8086 Mode I/O
Many 8086 programs written for non-multitasking systems directly access I/O ports.
This practice may cause problems in a multitasking environment. If more than one
program accesses the same port, they may interfere with each other. Most
multitasking systems require application programs to access I/O ports through the
operating system. This results in simplified, centralized control.
The processor provides I/O protection for creating I/O that is compatible with the
environment and transparent to 8086 programs. Designers may take any of several
possible approaches to protecting I/O ports:
- Protect the I/O address space and generate exceptions for all attempts to
  perform I/O directly.
- Let the 8086 program perform I/O directly.
- Generate exceptions on attempts to access specific I/O ports.
- Generate exceptions on attempts to access specific memory-mapped I/O ports.
The method of controlling access to I/O ports depends upon whether they are
I/O-port mapped or memory mapped.
17.2.8.1 I/O-Port-Mapped I/O
The I/O permission bit map in the TSS can be used to generate exceptions on
attempts to access specific I/O port addresses. The I/O permission bit map of each
virtual-8086-mode task determines which I/O addresses generate exceptions for
that task. Because each task may have a different I/O permission bit map, the
addresses that generate exceptions for one task may be different from the addresses
for another task. This differs from protected mode in which, if the CPL is less than or
equal to the IOPL, I/O access is allowed without checking the I/O permission bit map.
See Chapter 13, "Input/Output," in the Intel 64 and IA-32 Architectures Software
Developer's Manual, Volume 1, for more information about the I/O permission bit
map.
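
The bit-map check can be sketched in C as follows. This is a software model
with a hypothetical function name, not the processor's implementation. It
assumes the conventions described in Volume 1: one bit per I/O port, a set bit
denies access, a multi-byte access needs every covered bit clear, and ports
beyond the end of the map are treated as denied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true if an access of `width` bytes (1, 2, or 4) starting at `port`
 * is permitted by the bit map. A set bit, or a bit that lies beyond the end
 * of the map, denies the access (the processor would raise #GP).
 */
static bool io_bitmap_allows(const uint8_t *bitmap, size_t bitmap_bytes,
                             uint16_t port, unsigned width)
{
    assert(width == 1 || width == 2 || width == 4);
    for (unsigned i = 0; i < width; i++) {
        uint32_t p = (uint32_t)port + i;   /* each byte of the access */
        size_t byte = p >> 3;
        if (byte >= bitmap_bytes)
            return false;                  /* not covered by the map */
        if (bitmap[byte] & (1u << (p & 7)))
            return false;                  /* port is protected */
    }
    return true;
}
```

A monitor that wants to trap a task's accesses to a given port would set that
port's bit in the task's bit map and emulate the access in its
general-protection handler.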
17.2.8.2 Memory-Mapped I/O
In systems which use memory-mapped I/O, the paging facilities of the processor can
be used to generate exceptions for attempts to access I/O ports. The virtual-8086
monitor may use paging to control memory-mapped I/O in these ways:
- Map part of the linear address space of each task that needs to perform I/O to the
  physical address space where I/O ports are placed. By putting the I/O ports at
  different addresses (in different pages), the paging mechanism can enforce
  isolation between tasks.
- Map part of the linear address space to pages that are not-present. This
  generates an exception whenever a task attempts to perform I/O to those pages.
  System software then can interpret the I/O operation being attempted.
Software emulation of the I/O space may require too much operating system
intervention under some conditions. In these cases, it may be possible to generate an
exception for only the first attempt to access I/O. The system software then may
determine whether a program can be given exclusive control of I/O temporarily; if
so, the protection of the I/O space may be lifted and the program allowed to run at
full speed.
17.2.8.3 Special I/O Buffers
Buffers of intelligent controllers (for example, a bit-mapped frame buffer) also can be
emulated using page mapping. The linear space for the buffer can be mapped to a
different physical space for each virtual-8086-mode task. The virtual-8086 monitor
then can control which virtual buffer to copy onto the real buffer in the physical
address space.
17.3 INTERRUPT AND EXCEPTION HANDLING
IN VIRTUAL-8086 MODE
When the processor receives an interrupt or detects an exception condition while in
virtual-8086 mode, it invokes an interrupt or exception handler, just as it does in
protected or real-address mode. The interrupt or exception handler that is invoked
and the mechanism used to invoke it depend on the class of interrupt or exception
that has been detected or generated and the state of various system flags and fields.
In virtual-8086 mode, the interrupts and exceptions are divided into three classes for
the purposes of handling:
- Class 1: All processor-generated exceptions and all hardware interrupts,
  including the NMI interrupt and the hardware interrupts sent to the processor's
  external interrupt delivery pins. All class 1 exceptions and interrupts are handled
  by the protected-mode exception and interrupt handlers.
- Class 2: Special case for maskable hardware interrupts (Section 6.3.2,
  "Maskable Hardware Interrupts") when the virtual mode extensions are enabled.
- Class 3: All software-generated interrupts, that is, interrupts generated with
  the INT n instruction (see footnote 1).
The method the processor uses to handle class 2 and 3 interrupts depends on the
setting of the following flags and fields:
- IOPL field (bits 12 and 13 in the EFLAGS register): Controls how class 3
  software interrupts are handled when the processor is in virtual-8086 mode (see
  Section 2.3, "System Flags and Fields in the EFLAGS Register"). This field also
  controls the enabling of the VIF and VIP flags in the EFLAGS register when the
  VME flag is set. The VIF and VIP flags are provided to assist in the handling of
  class 2 maskable hardware interrupts.
- VME flag (bit 0 in control register CR4): Enables the virtual mode extension
  for the processor when set (see Section 2.5, "Control Registers").
- Software interrupt redirection bit map (32 bytes in the TSS, see
  Figure 17-5): Contains 256 flags that indicate how class 3 software
  interrupts should be handled when they occur in virtual-8086 mode. A software
  interrupt can be directed either to the interrupt and exception handlers in the
  currently running 8086 program or to the protected-mode interrupt and
  exception handlers.
- The virtual interrupt flag (VIF) and virtual interrupt pending flag (VIP)
  in the EFLAGS register: Provide virtual interrupt support for the handling
  of class 2 maskable hardware interrupts (see Section 17.3.2, "Class 2: Maskable
  Hardware Interrupt Handling in Virtual-8086 Mode Using the Virtual Interrupt
  Mechanism").
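
For readers who prefer code, the flags and fields listed above can be expressed
as C masks and accessors. The names below are hypothetical. The bit positions
of IOPL, VIF, VIP, and CR4.VME are architectural; the bit ordering used for the
redirection bit map (vector n in bit n mod 8 of byte n / 8) is an assumption
made for this sketch, and the helper returns the raw bit without interpreting
which handler it selects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_IOPL_SHIFT  12
#define EFLAGS_IOPL_MASK   (3u << EFLAGS_IOPL_SHIFT)  /* bits 12 and 13 */
#define EFLAGS_VIF         (1u << 19)                 /* virtual interrupt flag */
#define EFLAGS_VIP         (1u << 20)                 /* virtual interrupt pending */
#define CR4_VME            (1u << 0)                  /* virtual mode extensions */
#define REDIR_BITMAP_BYTES 32                         /* 256 flags, one per vector */

static unsigned eflags_iopl(uint32_t eflags)
{
    return (eflags & EFLAGS_IOPL_MASK) >> EFLAGS_IOPL_SHIFT;
}

/* The VIF and VIP flags are active in virtual-8086 mode only when CR4.VME is
 * set and IOPL is less than 3. */
static bool vif_vip_enabled(uint32_t cr4, uint32_t eflags)
{
    return (cr4 & CR4_VME) != 0 && eflags_iopl(eflags) < 3;
}

/* Raw redirection flag for a software-interrupt vector (assumed bit order). */
static unsigned redir_bit(const uint8_t bitmap[REDIR_BITMAP_BYTES], uint8_t vector)
{
    assert(bitmap != 0);
    return (bitmap[vector >> 3] >> (vector & 7)) & 1u;
}
```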
NOTE
The VME flag, software interrupt redirection bit map, and VIF and VIP
flags are only available in IA-32 processors that support the virtual
mode extensions. These extensions were introduced in the IA-32
architecture with the Pentium processor.
Footnote 1: The INT 3 instruction is a special case (see the description of the INT n
instruction in Chapter 3, "Instruction Set Reference, A-M," of the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 2A).

The following sections describe the actions that the processor takes and the possible
actions of interrupt and exception handlers for the two classes of interrupts described
in the previous paragraphs. These sections describe three possible types of interrupt
and exception handlers:
- Protected-mode interrupt and exception handlers: These are the
  standard handlers that the processor calls through the protected-mode IDT.
- Virtual-8086 monitor interrupt and exception handlers: These handlers
  are resident in the virtual-8086 monitor, and they are commonly accessed
  through a general-protection exception (#GP, interrupt 13) that is directed to the
  protected-mode general-protection exception handler.
- 8086 program interrupt and exception handlers: These handlers are part
  of the 8086 program that is running in virtual-8086 mode.
The following sections describe how these handlers are used, depending on the
selected class and method of interrupt and exception handling.
17.3.1 Class 1: Hardware Interrupt and Exception Handling in
Virtual-8086 Mode
In virtual-8086 mode, the Pentium, P6 family, Pentium 4, and Intel Xeon processors
handle hardware interrupts and exceptions in the same manner as they are handled
by the Intel486 and Intel386 processors. They invoke the protected-mode interrupt
or exception handler that the interrupt or exception vector points to in the IDT. Here,
the IDT entry must contain either a 32-bit trap or interrupt gate or a task gate. The
following sections describe various ways that a virtual-8086 mode interrupt or
exception can be handled after the protected-mode handler has been invoked.
See Section 17.3.2, "Class 2: Maskable Hardware Interrupt Handling in Virtual-8086
Mode Using the Virtual Interrupt Mechanism," for a description of the virtual interrupt
mechanism that is available for handling maskable hardware interrupts while in
virtual-8086 mode. When this mechanism is either not available or not enabled,
maskable hardware interrupts are handled in the same manner as exceptions, as
described in the following sections.
17.3.1.1 Handling an Interrupt or Exception Through a Protected-Mode
Trap or Interrupt Gate
When an interrupt or exception vector points to a 32-bit trap or interrupt gate in the
IDT, the gate must in turn point to a nonconforming, privilege-level 0, code segment.
When accessing this code segment, the processor performs the following steps.
1. Switches to 32-bit protected mode and privilege level 0.
2. Saves the state of the processor on the privilege-level 0 stack. The states of the
   EIP, CS, EFLAGS, ESP, SS, ES, DS, FS, and GS registers are saved (see
   Figure 17-4).
3. Clears the segment registers. Saving the DS, ES, FS, and GS registers on the
   stack and then clearing the registers lets the interrupt or exception handler safely
   save and restore these registers regardless of the type of segment selectors they
   contain (protected-mode or 8086-style). The interrupt and exception handlers,
   which may be called in the context of either a protected-mode task or a
   virtual-8086-mode task, can use the same code sequences for saving and restoring
   the registers for any task. Clearing these registers before execution of the IRET
   instruction does not cause a trap in the interrupt handler. Interrupt procedures
   that expect values in the segment registers or that return values in the segment
   registers must use the register images saved on the stack for privilege level 0.
4. Clears VM, NT, RF and TF flags (in the EFLAGS register). If the gate is an interrupt
   gate, clears the IF flag.
5. Begins executing the selected interrupt or exception handler.
If the trap or interrupt gate references a procedure in a conforming segment or in a
segment at a privilege level other than 0, the processor generates a
general-protection exception (#GP). Here, the error code is the segment selector of
the code segment to which a call was attempted.
Figure 17-4. Privilege Level 0 Stack After Interrupt or
Exception in Virtual-8086 Mode
[Figure: two stack layouts, each listed from the ESP value taken from the TSS (highest
address) down to the new ESP (lowest address).]
Without error code: Unused, Old GS, Old FS, Old DS, Old ES, Old SS, Old ESP,
Old EFLAGS, Old CS, Old EIP (new ESP points here).
With error code: Unused, Old GS, Old FS, Old DS, Old ES, Old SS, Old ESP,
Old EFLAGS, Old CS, Old EIP, Error Code (new ESP points here).
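
The stack layout in Figure 17-4 maps naturally onto a C structure. The sketch
below is an illustration with hypothetical names, assuming each saved item
occupies one 32-bit stack slot; it reads a frame starting at the new ESP,
skipping the error code when one was pushed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define EFLAGS_VM (1u << 17)

/* Register images on the privilege-level 0 stack, lowest address first. */
struct v86_frame {
    uint32_t eip;      /* Old EIP */
    uint32_t cs;       /* Old CS (selector in the low 16 bits) */
    uint32_t eflags;   /* Old EFLAGS, VM flag still set */
    uint32_t esp;      /* Old ESP */
    uint32_t ss;       /* Old SS */
    uint32_t es;       /* Old ES */
    uint32_t ds;       /* Old DS */
    uint32_t fs;       /* Old FS */
    uint32_t gs;       /* Old GS */
};

_Static_assert(sizeof(struct v86_frame) == 36, "nine 32-bit stack slots");

/* Copy the frame out of the stack; `new_esp` points at the error code when
 * the exception pushed one, otherwise at Old EIP. */
static struct v86_frame v86_frame_read(const uint32_t *new_esp, bool has_error_code)
{
    struct v86_frame f;
    assert(new_esp != NULL);
    memcpy(&f, new_esp + (has_error_code ? 1 : 0), sizeof f);
    return f;
}

/* A handler can test the saved VM flag to learn whether the interrupted
 * code was running in virtual-8086 mode. */
static bool v86_frame_from_v86(const struct v86_frame *f)
{
    return (f->eflags & EFLAGS_VM) != 0;
}
```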
Interrupt and exception handlers can examine the VM flag on the stack to determine
if the interrupted procedure was running in virtual-8086 mode. If so, the interrupt or
exception can be handled in one of three ways:
- The protected-mode interrupt or exception handler that was called can handle
  the interrupt or exception.
- The protected-mode interrupt or exception handler can call the virtual-8086
  monitor to handle the interrupt or exception.
- The virtual-8086 monitor (if called) can in turn pass control back to the 8086
  program's interrupt and exception handler.
If the interrupt or exception is handled with a protected-mode handler, the handler
can return to the interrupted program in virtual-8086 mode by executing an IRET
instruction. This instruction loads the EFLAGS and segment registers from the
images saved in the privilege level 0 stack (see Figure 17-4). A set VM flag in the
EFLAGS image causes the processor to switch back to virtual-8086 mode. The CPL at
the time the IRET instruction is executed must be 0; otherwise, the processor does
not change the state of the VM flag.
The virtual-8086 monitor runs at privilege level 0, like the protected-mode interrupt
and exception handlers. It is commonly closely tied to the protected-mode
general-protection exception (#GP, vector 13) handler. If the protected-mode interrupt
or exception handler calls the virtual-8086 monitor to handle the interrupt or exception,
the return from the virtual-8086 monitor to the interrupted virtual-8086 mode
program requires two return instructions: a RET instruction to return to the
protected-mode handler and an IRET instruction to return to the interrupted
program.
The virtual-8086 monitor has the option of directing the interrupt and exception back
to an interrupt or exception handler that is part of the interrupted 8086 program, as
described in Section 17.3.1.2, "Handling an Interrupt or Exception With an 8086
Program Interrupt or Exception Handler."
17.3.1.2 Handling an Interrupt or Exception With an 8086 Program
Interrupt or Exception Handler
Because it was designed to run on an 8086 processor, an 8086 program running in a
virtual-8086-mode task contains an 8086-style interrupt vector table, which starts at
linear address 0. If the virtual-8086 monitor correctly directs an interrupt or
exception vector back to the virtual-8086-mode task it came from, the handlers in the
8086 program can handle the interrupt or exception. The virtual-8086 monitor must
carry out the following steps to send an interrupt or exception back to the 8086
program:
1. Use the 8086 interrupt vector to locate the appropriate handler procedure in the
   8086 program interrupt table.
2. Store the EFLAGS (low-order 16 bits only), CS and EIP values of the 8086
   program on the privilege-level 3 stack. This is the stack that the
   virtual-8086-mode task is using. (The 8086 handler may use or modify this
   information.)
3. Change the return link on the privilege-level 0 stack to point to the
   privilege-level 3 handler procedure.
4. Execute an IRET instruction to pass control to the 8086 program handler.
5. When the IRET instruction from the privilege-level 3 handler triggers a
   general-protection exception (#GP) and thus effectively again calls the
   virtual-8086 monitor, restore the return link on the privilege-level 0 stack to
   point to the original, interrupted, privilege-level 3 procedure.
6. Copy the low order 16 bits of the EFLAGS image from the privilege-level 3 stack
   to the privilege-level 0 stack (because some 8086 handlers modify these flags to
   return information to the code that caused the interrupt).
7. Execute an IRET instruction to pass control back to the interrupted 8086
   program.
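
Steps 1 through 3 above can be sketched in C against a simulated memory image.
Everything here is hypothetical: the `v86_frame` structure stands for the
register images on the privilege-level 0 stack, `mem` stands for the task's
linear address space as seen by the monitor, and the helpers are illustrative
names. The sketch assumes the standard 8086 vector table layout (4 bytes per
vector, offset word followed by segment word) and does not handle a stack
pointer of FFFFH, where a 16-bit write would straddle the segment boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Register images the processor saved on the privilege-level 0 stack. */
struct v86_frame {
    uint32_t eip, cs, eflags, esp, ss;
};

static uint16_t rd16(const uint8_t *mem, uint32_t lin)
{
    return (uint16_t)(mem[lin] | (mem[lin + 1] << 8));
}

static void wr16(uint8_t *mem, uint32_t lin, uint16_t v)
{
    mem[lin] = (uint8_t)(v & 0xFF);
    mem[lin + 1] = (uint8_t)(v >> 8);
}

/* Push one word on the 8086 program's own (privilege-level 3) stack. */
static void v86_push16(uint8_t *mem, struct v86_frame *f, uint16_t value)
{
    uint16_t sp = (uint16_t)(f->esp - 2);            /* SP wraps at 64 KBytes */
    wr16(mem, ((f->ss & 0xFFFFu) << 4) + sp, value); /* 8086-style SS:SP */
    f->esp = (f->esp & 0xFFFF0000u) | sp;
}

/* Redirect `vector` to the handler in the 8086 program's vector table. */
static void v86_reflect_interrupt(uint8_t *mem, struct v86_frame *f, uint8_t vector)
{
    assert(mem != 0 && f != 0);
    /* Step 1: look up the handler in the table at linear address 0. */
    uint16_t handler_ip = rd16(mem, vector * 4u);
    uint16_t handler_cs = rd16(mem, vector * 4u + 2);
    /* Step 2: store FLAGS (low 16 bits), CS, and IP on the level-3 stack. */
    v86_push16(mem, f, (uint16_t)f->eflags);
    v86_push16(mem, f, (uint16_t)f->cs);
    v86_push16(mem, f, (uint16_t)f->eip);
    /* Step 3: point the return link at the 8086 handler. */
    f->cs = handler_cs;
    f->eip = handler_ip;
}
```

After this routine returns, the monitor would execute the IRET of step 4. The
return path (steps 5 through 7) is not shown; it reverses the process when the
8086 handler's own IRET traps back into the monitor.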
Note that if an operating system intends to support all 8086 MS-DOS-based
programs, it is necessary to use the actual 8086 interrupt and exception handlers
supplied with the program. The reason for this is that some programs modify their
own interrupt vector table to substitute (or hook in series) their own specialized
interrupt and exception handlers.
17.3.1.3 Handling an Interrupt or Exception Through a Task Gate
When an interrupt or exception vector points to a task gate in the IDT, the processor
performs a task switch to the selected interrupt- or exception-handling task. The
following actions are carried out as part of this task switch:
1. The EFLAGS register with the VM flag set is saved in the current TSS.
2. The link field in the TSS of the called task is loaded with the segment selector of
   the TSS for the interrupted virtual-8086-mode task.
3. The EFLAGS register is loaded from the image in the new TSS, which clears the
   VM flag and causes the processor to switch to protected mode.
4. The NT flag in the EFLAGS register is set.
5. The processor begins executing the selected interrupt- or exception-handler
   task.
When an IRET instruction is executed in the handler task and the NT flag in the
EFLAGS register is set, the processor switches from a protected-mode interrupt- or
exception-handler task back to a virtual-8086-mode task. Here, the EFLAGS and
segment registers are loaded from images saved in the TSS for the
virtual-8086-mode task. If the VM flag is set in the EFLAGS image, the processor
switches back to virtual-8086 mode on the task switch. The CPL at the time the IRET
instruction is executed must be 0; otherwise, the processor does not change the state
of the VM flag.
17.3.2 Class 2: Maskable Hardware Interrupt Handling in
Virtual-8086 Mode Using the Virtual Interrupt Mechanism
Maskable hardware interrupts are those interrupts that are delivered through the
INTR# pin or through an interrupt request to the local APIC (see Section 6.3.2,
"Maskable Hardware Interrupts"). These interrupts can be inhibited (masked) from
interrupting an executing program or task by clearing the IF flag in the EFLAGS
register.
When the VME flag in control register CR4 is set and the IOPL field in the EFLAGS
register is less than 3, two additional flags are activated in the EFLAGS register:
- VIF (virtual interrupt) flag, bit 19 of the EFLAGS register.
- VIP (virtual interrupt pending) flag, bit 20 of the EFLAGS register.
These flags provide the virtual-8086 monitor with more efficient control over
handling maskable hardware interrupts that occur during virtual-8086 mode tasks.
They also reduce interrupt-handling overhead, by eliminating the need for all
IF-related operations (such as PUSHF, POPF, CLI, and STI instructions) to trap to the
virtual-8086 monitor. The purpose and use of these flags are as follows.
NOTE
The VIF and VIP flags are only available in IA-32 processors that
support the virtual mode extensions. These extensions were
introduced in the IA-32 architecture with the Pentium processor.
When this mechanism is either not available or not enabled,
maskable hardware interrupts are handled as class 1 interrupts.
Here, if VIF and VIP flags are needed, the virtual-8086 monitor can
implement them in software.
Existing 8086 programs commonly set and clear the IF flag in the EFLAGS register to
enable and disable maskable hardware interrupts, respectively; for example, to
disable interrupts while handling another interrupt or an exception. This practice
works well in single-task environments, but can cause problems in multitasking and
multiple-processor environments, where it is often desirable to prevent an
application program from having direct control over the handling of hardware
interrupts. When using earlier IA-32 processors, this problem was often solved by
creating a virtual IF flag in software. The IA-32 processors (beginning with the
Pentium processor) provide hardware support for this virtual IF flag through the VIF
and VIP flags.
The VIF flag is a virtualized version of the IF flag, which an application program
running from within a virtual-8086 task can use to control the handling of maskable
hardware interrupts. When the VIF flag is enabled, the CLI and STI instructions
operate on the VIF flag instead of the IF flag. When an 8086 program executes the
CLI instruction, the processor clears the VIF flag to request that the virtual-8086
monitor inhibit maskable hardware interrupts from interrupting program execution;
when it executes the STI instruction, the processor sets the VIF flag requesting that
the virtual-8086 monitor enable maskable hardware interrupts for the 8086
program. But actually the IF flag, managed by the operating system, always controls
whether maskable hardware interrupts are enabled. Also, if under these
circumstances an 8086 program tries to read or change the IF flag using the PUSHF or
POPF instructions, the processor will change the VIF flag instead, leaving IF unchanged.
The VIP flag provides software a means of recording the existence of a deferred (or
pending) maskable hardware interrupt. This flag is read by the processor but never
explicitly written by the processor; it can only be written by software.
If the IF flag is set and the VIF and VIP flags are enabled, and the processor receives
a maskable hardware interrupt (interrupt vector 0 through 255), the processor
performs and the interrupt handler software should perform the following
operations:
1. The processor invokes the protected-mode interrupt handler for the interrupt
   received, as described in the following steps. These steps are almost identical to
   those described for method 1 interrupt and exception handling in Section
   17.3.1.1, "Handling an Interrupt or Exception Through a Protected-Mode Trap or
   Interrupt Gate":
   a. Switches to 32-bit protected mode and privilege level 0.
   b. Saves the state of the processor on the privilege-level 0 stack. The states of
      the EIP, CS, EFLAGS, ESP, SS, ES, DS, FS, and GS registers are saved (see
      Figure 17-4).
   c. Clears the segment registers.
   d. Clears the VM flag in the EFLAGS register.
   e. Begins executing the selected protected-mode interrupt handler.
2. The recommended action of the protected-mode interrupt handler is to read the
   VM flag from the EFLAGS image on the stack. If this flag is set, the handler makes
   a call to the virtual-8086 monitor.
3. The virtual-8086 monitor should read the VIF flag in the EFLAGS register.
   - If the VIF flag is clear, the virtual-8086 monitor sets the VIP flag in the
     EFLAGS image on the stack to indicate that there is a deferred interrupt
     pending and returns to the protected-mode handler.
   - If the VIF flag is set, the virtual-8086 monitor can handle the interrupt if it
     belongs to the 8086 program running in the interrupted virtual-8086 task;
     otherwise, it can call the protected-mode interrupt handler to handle the
     interrupt.
4. The protected-mode handler executes a return to the program executing in
   virtual-8086 mode.
5. Upon returning to virtual-8086 mode, the processor continues execution of the
   8086 program.
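
The monitor's decision in step 3 can be sketched as a small C routine. The
names are hypothetical, and the sketch assumes the monitor inspects the VIF
flag in the EFLAGS image that the processor saved on the privilege-level 0
stack for the interrupted task.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EFLAGS_VIF (1u << 19)   /* virtual interrupt flag */
#define EFLAGS_VIP (1u << 20)   /* virtual interrupt pending flag */

enum v86_irq_action {
    V86_IRQ_DEFERRED,   /* VIP recorded; deliver later */
    V86_IRQ_DELIVER     /* 8086 program has virtual interrupts enabled */
};

/* Decide what to do with a maskable hardware interrupt that arrived while
 * a virtual-8086 task was running. `stacked_eflags` points at the EFLAGS
 * image saved for that task. */
static enum v86_irq_action monitor_on_maskable_irq(uint32_t *stacked_eflags)
{
    assert(stacked_eflags != NULL);
    if ((*stacked_eflags & EFLAGS_VIF) == 0) {
        /* The program executed CLI: remember the interrupt and return. */
        *stacked_eflags |= EFLAGS_VIP;
        return V86_IRQ_DEFERRED;
    }
    return V86_IRQ_DELIVER;
}
```

When the result is `V86_IRQ_DELIVER`, the monitor either handles the interrupt
itself or passes it on, depending on whether the interrupt belongs to the
interrupted 8086 program.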
When the 8086 program is ready to receive maskable hardware interrupts, it
executes the STI instruction to set the VIF flag (enabling maskable hardware
interrupts). Prior to setting the VIF flag, the processor automatically checks the VIP
flag and does one of the following, depending on the state of the flag:
- If the VIP flag is clear (indicating no pending interrupts), the processor sets the
  VIF flag.
- If the VIP flag is set (indicating a pending interrupt), the processor generates a
  general-protection exception (#GP).
The recommended action of the protected-mode general-protection exception
handler is to then call the virtual-8086 monitor and let it handle the pending
interrupt. After handling the pending interrupt, the typical action of the virtual-8086
monitor is to clear the VIP flag and set the VIF flag in the EFLAGS image on the stack,
and then execute a return to the virtual-8086 mode. The next time the processor
receives a maskable hardware interrupt, it will then handle it as described in steps 1
through 5 earlier in this section.
I f t he processor finds t hat bot h t he VI F and VI P flags are set at t he beginning of an
inst ruct ion, it generat es a general- prot ect ion except ion. This act ion allows t he
virt ual- 8086 monit or t o handle t he pending int errupt for t he virt ual- 8086 mode t ask
for which t he VI F flag is enabled. Not e t hat t his sit uat ion can only occur immediat ely
following execut ion of a POPF or I RET inst ruct ion or upon ent ering a virt ual- 8086
mode t ask t hrough a t ask swit ch.
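The interplay of the VIF and VIP flags described above can be sketched as a small software model. This is an illustration only, not processor or monitor code: the function names are hypothetical, and the flags are modeled as bits 19 (VIF) and 20 (VIP) of a 32-bit EFLAGS value.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_VIF (1u << 19) /* virtual interrupt flag */
#define EFLAGS_VIP (1u << 20) /* virtual interrupt pending flag */

/* Monitor side (hypothetical helper): a maskable hardware interrupt arrived
 * for the 8086 program. Returns true if delivery was deferred (VIF clear, so
 * VIP is set in the saved EFLAGS image); false if it may be delivered now. */
bool monitor_defer_interrupt(uint32_t *eflags_image)
{
    if ((*eflags_image & EFLAGS_VIF) == 0) {
        *eflags_image |= EFLAGS_VIP;
        return true;
    }
    return false;
}

/* Processor side (model): STI executed by the 8086 program with VME = 1 and
 * IOPL < 3. Returns true if #GP is raised instead of setting VIF. */
bool sti_raises_gp(uint32_t *eflags)
{
    if (*eflags & EFLAGS_VIP)
        return true;            /* pending interrupt: #GP, VIF unchanged */
    *eflags |= EFLAGS_VIF;      /* nothing pending: virtual interrupts enabled */
    return false;
}

/* Monitor side (hypothetical helper): called from the #GP path after the
 * deferred interrupt has been serviced. */
void monitor_finish_pending(uint32_t *eflags_image)
{
    *eflags_image &= ~EFLAGS_VIP;
    *eflags_image |= EFLAGS_VIF;
}
```

In this model, a deferred interrupt leaves VIP set, the next STI reports #GP, and the monitor's clean-up leaves the image with VIF set and VIP clear, matching the sequence in the text.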
Note that the states of the VIF and VIP flags are not modified in real-address mode or
during transitions between real-address and protected modes.
NOTE
The virtual interrupt mechanism described in this section is also
available for use in protected mode; see Section 17.4,
"Protected-Mode Virtual Interrupts."
17.3.3 Class 3: Software Interrupt Handling in Virtual-8086 Mode
When the processor receives a software interrupt (an interrupt generated with the
INT n instruction) while in virtual-8086 mode, it can use any of six different methods
to handle the interrupt. The method selected depends on the settings of the VME flag
in control register CR4, the IOPL field in the EFLAGS register, and the software
interrupt redirection bit map in the TSS. Table 17-2 lists the six methods of handling
software interrupts in virtual-8086 mode and the respective settings of the VME flag,
IOPL field, and the bits in the interrupt redirection bit map for each method. The table
also summarizes the various actions the processor takes for each method.
The VME flag enables the virtual mode extensions for the Pentium and later IA-32
processors. When this flag is clear, the processor responds to interrupts and
exceptions in virtual-8086 mode in the same manner as an Intel386 or Intel486
processor does. When this flag is set, the virtual mode extension provides the
following enhancements to virtual-8086 mode:
• Speeds up the handling of software-generated interrupts in virtual-8086 mode by
  allowing the processor to bypass the virtual-8086 monitor and redirect software
  interrupts back to the interrupt handlers that are part of the currently running
  8086 program.
• Supports virtual interrupts for software written to run on the 8086 processor.
The IOPL value interacts with the VME flag and the bits in the interrupt redirection bit
map to determine how specific software interrupts should be handled.
The software interrupt redirection bit map (see Figure 17-5) is a 32-byte field in the
TSS. This map is located directly below the I/O permission bit map in the TSS. Each
bit in the interrupt redirection bit map is mapped to an interrupt vector. Bit 0 in the
interrupt redirection bit map (which maps to vector zero in the interrupt table) is
located at the I/O base map address in the TSS minus 32 bytes. When a bit in this bit
map is set, it indicates that the associated software interrupt (interrupt generated
with an INT n instruction) should be handled through the protected-mode IDT and
interrupt and exception handlers. When a bit in this bit map is clear, the processor
redirects the associated software interrupt back to the interrupt table in the 8086
program (located at linear address 0 in the program's address space).
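As an illustration of the layout just described, the following sketch looks up the redirection bit for a vector in a memory image of the TSS. The function name and the byte-array view of the TSS are hypothetical. The sketch assumes that the bit for vector n sits in byte n/8 of the map at bit position n mod 8 (the same indexing the I/O permission bit map uses) and that the I/O map base is at least 32.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the software interrupt `vector` is directed to the
 * protected-mode handler (bit set), false if it is redirected to the 8086
 * program (bit clear).
 * `tss` points at the first byte of a TSS image and `io_map_base` is the
 * I/O map base offset stored in that TSS (assumed to be >= 32). */
bool redirect_bit_is_set(const uint8_t *tss, uint16_t io_map_base, uint8_t vector)
{
    /* The 32-byte map ends where the I/O permission bit map begins. */
    size_t bitmap_start = (size_t)io_map_base - 32;
    uint8_t byte = tss[bitmap_start + vector / 8];
    return ((byte >> (vector % 8)) & 1u) != 0;
}
```

For example, with an I/O map base of 88H, the bit for vector 21H would be bit 1 of the byte at TSS offset 6CH under these assumptions.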
NOTE
The software interrupt redirection bit map does not affect
hardware-generated interrupts and exceptions. Hardware-generated interrupts
and exceptions are always handled by the protected-mode interrupt
and exception handlers.
Table 17-2. Software Interrupt Handling Methods While in Virtual-8086 Mode
Method  VME  IOPL  Bit in Redir.  Processor Action
                   Bitmap*
1       0    3     X              Interrupt directed to a protected-mode interrupt handler:
                                  • Switches to privilege-level 0 stack
                                  • Pushes GS, FS, DS and ES onto privilege-level 0 stack
                                  • Pushes SS, ESP, EFLAGS, CS and EIP of interrupted task
                                    onto privilege-level 0 stack
                                  • Clears VM, RF, NT, and TF flags
                                  • If serviced through interrupt gate, clears IF flag
                                  • Clears GS, FS, DS and ES to 0
                                  • Sets CS and EIP from interrupt gate
2       0    < 3   X              Interrupt directed to protected-mode general-protection
                                  exception (#GP) handler.
3       1    < 3   1              Interrupt directed to a protected-mode general-protection
                                  exception (#GP) handler; VIF and VIP flag support for
                                  handling class 2 maskable hardware interrupts.
4       1    3     1              Interrupt directed to protected-mode interrupt handler
                                  (see method 1 processor action).
5       1    3     0              Interrupt redirected to 8086 program interrupt handler:
                                  • Pushes EFLAGS
                                  • Pushes CS and EIP (lower 16 bits only)
                                  • Clears IF flag
                                  • Clears TF flag
                                  • Loads CS and EIP (lower 16 bits only) from selected entry
                                    in the interrupt vector table of the current virtual-8086
                                    task
6       1    < 3   0              Interrupt redirected to 8086 program interrupt handler; VIF
                                  and VIP flag support for handling class 2 maskable hardware
                                  interrupts:
                                  • Pushes EFLAGS with IOPL set to 3 and VIF copied to IF
                                  • Pushes CS and EIP (lower 16 bits only)
                                  • Clears the VIF flag
                                  • Clears TF flag
                                  • Loads CS and EIP (lower 16 bits only) from selected entry
                                    in the interrupt vector table of the current virtual-8086
                                    task
NOTE:
* When set to 0, software interrupt is redirected back to the 8086 program interrupt handler;
when set to 1, interrupt is directed to protected-mode handler.
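The selection logic in Table 17-2 can be written as a short decision function. This is a minimal sketch of the table only; the function name is hypothetical, and the redirection bit is ignored when VME is 0 (the "X" entries).

```c
#include <stdbool.h>

/* Returns the Table 17-2 method number (1 through 6) for a software
 * interrupt taken in virtual-8086 mode.
 * vme:       CR4.VME
 * iopl:      EFLAGS.IOPL (0 through 3)
 * redir_bit: the vector's bit in the software interrupt redirection bit map */
int v86_soft_int_method(bool vme, unsigned iopl, bool redir_bit)
{
    if (!vme)
        return (iopl == 3) ? 1 : 2;   /* redirection bit is "don't care" */
    if (redir_bit)
        return (iopl == 3) ? 4 : 3;   /* directed to protected-mode handling */
    return (iopl == 3) ? 5 : 6;       /* redirected to the 8086 program */
}
```

Methods 5 and 6 are the only outcomes that stay in virtual-8086 mode; all others enter a protected-mode handler, either the vectored one (methods 1 and 4) or the #GP handler (methods 2 and 3).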
Redirecting software interrupts back to the 8086 program potentially speeds up
interrupt handling because a switch back and forth between virtual-8086 mode and
protected mode is not required. This latter interrupt-handling technique is
particularly useful for 8086 operating systems (such as MS-DOS) that use the INT n
instruction to call operating system procedures.
The CPUID instruction can be used to verify that the virtual mode extension is
implemented on the processor. Bit 1 of the feature flags register (EDX) indicates the
availability of the virtual mode extension (see "CPUID: CPU Identification" in
Chapter 3, "Instruction Set Reference, A-M," of the Intel 64 and IA-32 Architectures
Software Developer's Manual, Volume 2A).
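A minimal sketch of the feature test follows. To stay portable it takes the feature flags as an argument instead of executing CPUID itself; the caller is assumed to have obtained the value from EDX after executing CPUID with EAX set to 1. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_1_EDX_VME (1u << 1)  /* bit 1 of the feature flags in EDX */

/* Returns true if the feature flags report the virtual mode extension.
 * `cpuid_1_edx` is the EDX value returned by CPUID with EAX = 1. */
bool cpu_has_vme(uint32_t cpuid_1_edx)
{
    return (cpuid_1_edx & CPUID_1_EDX_VME) != 0;
}
```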
The following sections describe the six methods (or mechanisms) for handling
software interrupts in virtual-8086 mode. See Section 17.3.2, "Class 2: Maskable
Hardware Interrupt Handling in Virtual-8086 Mode Using the Virtual Interrupt
Mechanism," for a description of the use of the VIF and VIP flags in the EFLAGS
register for handling maskable hardware interrupts.
Figure 17-5. Software Interrupt Redirection Bit Map in TSS
[Figure not reproduced. It shows the task-state segment (TSS): the I/O Map Base
field in the doubleword at offset 64H locates the I/O permission bit map, and the
32-byte software interrupt redirection bit map lies directly below that bit map.
Figure notes: the I/O map base must not exceed DFFFH; the last byte of the bit map
must be followed by a byte with all bits set (1 1 1 1 1 1 1 1).]
17.3.3.1 Method 1: Software Interrupt Handling
When the VME flag in control register CR4 is clear and the IOPL field is 3, a Pentium
or later IA-32 processor handles software interrupts in the same manner as they are
handled by an Intel386 or Intel486 processor. It executes an implicit call to the
interrupt handler in the protected-mode IDT pointed to by the interrupt vector. See
Section 17.3.1, "Class 1: Hardware Interrupt and Exception Handling in Virtual-8086
Mode," for a complete description of this mechanism and its possible uses.
17.3.3.2 Methods 2 and 3: Software Interrupt Handling
When a software interrupt occurs in virtual-8086 mode and the method 2 or 3
conditions are present, the processor generates a general-protection exception (#GP).
Method 2 is enabled when the VME flag is set to 0 and the IOPL value is less than 3.
Here the IOPL value is used to bypass the protected-mode interrupt handlers and
cause any software interrupt that occurs in virtual-8086 mode to be treated as a
protected-mode general-protection exception (#GP). The general-protection
exception handler calls the virtual-8086 monitor, which can then emulate an
8086-program interrupt handler or pass control back to the 8086 program's handler,
as described in Section 17.3.1.2, "Handling an Interrupt or Exception With an 8086
Program Interrupt or Exception Handler."
Method 3 is enabled when the VME flag is set to 1, the IOPL value is less than 3, and
the corresponding bit for the software interrupt in the software interrupt redirection
bit map is set to 1. Here, the processor performs the same operation as it does for
method 2 software interrupt handling. If the corresponding bit for the software
interrupt in the software interrupt redirection bit map is set to 0, the interrupt is handled
using method 6 (see Section 17.3.3.5, "Method 6: Software Interrupt Handling").
17.3.3.3 Method 4: Software Interrupt Handling
Method 4 handling is enabled when the VME flag is set to 1, the IOPL value is 3, and
the bit for the interrupt vector in the redirection bit map is set to 1. Method 4
software interrupt handling allows method 1 style handling when the virtual mode
extension is enabled; that is, the interrupt is directed to a protected-mode handler (see
Section 17.3.3.1, "Method 1: Software Interrupt Handling").
17.3.3.4 Method 5: Software Interrupt Handling
Method 5 software interrupt handling provides a streamlined method of redirecting
software interrupts (invoked with the INT n instruction) that occur in virtual-8086
mode back to the 8086 program's interrupt vector table and its interrupt handlers.
Method 5 handling is enabled when the VME flag is set to 1, the IOPL value is 3, and
the bit for the interrupt vector in the redirection bit map is set to 0. The processor
performs the following actions to make an implicit call to the selected 8086 program
interrupt handler:
1. Pushes the low-order 16 bits of the EFLAGS register onto the stack.
2. Pushes the current values of the CS and EIP registers onto the current stack.
(Only the 16 least-significant bits of the EIP register are pushed and no stack
switch occurs.)
3. Clears the IF flag in the EFLAGS register to disable interrupts.
4. Clears the TF flag in the EFLAGS register.
5. Locates the 8086 program interrupt vector table at linear address 0 for the
8086-mode task.
6. Loads the CS and EIP registers with values from the interrupt vector table entry
pointed to by the interrupt vector number. Only the 16 low-order bits of the EIP
are loaded and the 16 high-order bits are set to 0. The interrupt vector table is
assumed to be at linear address 0 of the current virtual-8086 task.
7. Begins executing the selected interrupt handler.
An IRET instruction at the end of the handler procedure reverses these steps to
return program control to the interrupted 8086 program.
Note that with method 5 handling, a mode switch from virtual-8086 mode to
protected mode does not occur. The processor remains in virtual-8086 mode
throughout the interrupt-handling operation.
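The seven steps above can be modeled in a few lines of C. The sketch below is a software model for illustration, not a description of processor internals: the register structure, the `mem` array standing in for the task's linear address space, and all function names are hypothetical. It assumes the real-address-mode vector table layout (4 bytes per entry, offset word first, then segment word) and that `ip` already holds the return address.

```c
#include <stdint.h>

#define FLAG_TF (1u << 8)
#define FLAG_IF (1u << 9)

/* Hypothetical register snapshot of the interrupted virtual-8086 task. */
typedef struct {
    uint16_t cs, ip, ss, sp;   /* ip models the low 16 bits of EIP */
    uint32_t eflags;
} v86_regs;

/* Hypothetical linear address space; sized to cover FFFF:FFFF plus one word. */
static uint8_t mem[0x110000];

static uint16_t read16(uint32_t addr)
{
    return (uint16_t)(mem[addr] | (mem[addr + 1] << 8));
}

static void push16(v86_regs *r, uint16_t value)
{
    r->sp = (uint16_t)(r->sp - 2);            /* same SS:SP, no stack switch */
    uint32_t addr = ((uint32_t)r->ss << 4) + r->sp;
    mem[addr] = (uint8_t)(value & 0xFF);
    mem[addr + 1] = (uint8_t)(value >> 8);
}

/* Models the method 5 implicit call to the 8086 program's handler. */
void redirect_to_8086_handler(v86_regs *r, uint8_t vector)
{
    push16(r, (uint16_t)r->eflags);           /* step 1: low 16 bits of EFLAGS */
    push16(r, r->cs);                         /* step 2: CS, then 16-bit IP */
    push16(r, r->ip);
    r->eflags &= ~(FLAG_IF | FLAG_TF);        /* steps 3 and 4 */
    uint32_t entry = (uint32_t)vector * 4;    /* step 5: table at linear 0 */
    r->ip = read16(entry);                    /* step 6: new IP and CS */
    r->cs = read16(entry + 2);
}                                             /* step 7: execution resumes at CS:IP */
```

The model never leaves the 16-bit register view, which reflects the point made above that no switch to protected mode occurs with method 5.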
The method 5 handling actions are virtually identical to the actions the processor
takes when handling software interrupts in real-address mode. The benefit of using
method 5 handling to access the 8086 program handlers is that it avoids the
overhead of methods 2 and 3 handling, which requires first going to the virtual-8086
monitor, then to the 8086 program handler, then back again to the virtual-8086
monitor, before returning to the interrupted 8086 program (see Section 17.3.1.2,
"Handling an Interrupt or Exception With an 8086 Program Interrupt or Exception
Handler").
NOTE
Methods 1 and 4 handling can handle a software interrupt in a
virtual-8086 task with a regular protected-mode handler, but this approach
requires all virtual-8086 tasks to use the same software interrupt
handlers, which generally does not give sufficient latitude to the
programs running in the virtual-8086 tasks, particularly MS-DOS
programs.
17.3.3.5 Method 6: Software Interrupt Handling
Method 6 handling is enabled when the VME flag is set to 1, the IOPL value is less
than 3, and the bit for the interrupt or exception vector in the redirection bit map is
set to 0. With method 6 interrupt handling, software interrupts are handled in the
same manner as was described for method 5 handling (see Section 17.3.3.4,
"Method 5: Software Interrupt Handling").
Method 6 differs from method 5 in that with the IOPL value set to less than 3, the VIF
and VIP flags in the EFLAGS register are enabled, providing virtual interrupt support
for handling class 2 maskable hardware interrupts (see Section 17.3.2, "Class 2:
Maskable Hardware Interrupt Handling in Virtual-8086 Mode Using the Virtual
Interrupt Mechanism"). These flags provide the virtual-8086 monitor with an
efficient means of handling maskable hardware interrupts that occur during a
virtual-8086 mode task. Also, because the IOPL value is less than 3 and the VIF flag is
enabled, the information pushed on the stack by the processor when invoking the
interrupt handler is slightly different between methods 5 and 6 (see Table 17-2).
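The difference in the pushed flags image can be sketched as follows. The function names are hypothetical. The sketch assumes, in line with the method 5 description, that a 16-bit image is pushed in both cases, and it applies the Table 17-2 rule for method 6 (IOPL reads as 3 and IF is a copy of VIF).

```c
#include <stdint.h>

#define FLAG_IF    (1u << 9)
#define FLAG_IOPL  (3u << 12)   /* both IOPL bits (bits 12 and 13) */
#define EFLAGS_VIF (1u << 19)

/* Method 5: the low-order 16 bits of EFLAGS are pushed unchanged. */
uint16_t method5_flags_image(uint32_t eflags)
{
    return (uint16_t)(eflags & 0xFFFFu);
}

/* Method 6: the pushed image shows IOPL = 3 and IF mirrors VIF. */
uint16_t method6_flags_image(uint32_t eflags)
{
    uint32_t image = eflags & 0xFFFFu;
    image |= FLAG_IOPL;
    if (eflags & EFLAGS_VIF)
        image |= FLAG_IF;
    else
        image &= ~FLAG_IF;
    return (uint16_t)image;
}
```

The 8086 program therefore sees its virtual interrupt state in the IF position of the saved flags, not the real IF flag.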
17.4 PROTECTED-MODE VIRTUAL INTERRUPTS
The IA-32 processors (beginning with the Pentium processor) also support the VIF
and VIP flags in the EFLAGS register in protected mode by setting the PVI
(protected-mode virtual interrupt) flag in the CR4 register. Setting the PVI flag allows
applications running at privilege level 3 to execute the CLI and STI instructions without
causing a general-protection exception (#GP) or affecting hardware interrupts.
When the PVI flag is set to 1, the CPL is 3, and the IOPL is less than 3, the STI and
CLI instructions set and clear the VIF flag in the EFLAGS register, leaving IF
unaffected. In this mode of operation, an application running in protected mode and at a
CPL of 3 can inhibit interrupts in the same manner as is described in Section 17.3.2,
"Class 2: Maskable Hardware Interrupt Handling in Virtual-8086 Mode Using the
Virtual Interrupt Mechanism," for a virtual-8086 mode task. When the application
executes the CLI instruction, the processor clears the VIF flag. If the processor
receives a maskable hardware interrupt, the processor invokes the protected-mode
interrupt handler. This handler checks the state of the VIF flag in the EFLAGS register.
If the VIF flag is clear (indicating that the active task does not want to have interrupts
handled now), the handler sets the VIP flag in the EFLAGS image on the stack and
returns to the privilege-level 3 application, which continues program execution.
When the application executes a STI instruction to set the VIF flag, the processor
automatically invokes the general-protection exception handler, which can then
handle the pending interrupt. After handling the pending interrupt, the handler
typically sets the VIF flag and clears the VIP flag in the EFLAGS image on the stack and
executes a return to the application program. The next time the processor receives a
maskable hardware interrupt, the processor will handle it in the normal manner for
interrupts received while the processor is operating at a CPL of 3.
As with the virtual mode extension (enabled with the VME flag in the CR4 register),
the protected-mode virtual interrupt extension only affects maskable hardware
interrupts (interrupt vectors 32 through 255). NMI interrupts and exceptions are
handled in the normal manner.
When protected-mode virtual interrupts are disabled (that is, when the PVI flag in
control register CR4 is set to 0, the CPL is less than 3, or the IOPL value is 3), then
the CLI and STI instructions execute in a manner compatible with the Intel486
processor. That is, if the CPL is greater (less privileged) than the I/O privilege level
(IOPL), a general-protection exception occurs. If the IOPL value is 3, CLI and STI
clear or set the IF flag, respectively.
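The rules for CLI and STI in protected mode can be summarized in a small model. The names below are hypothetical and the function only models the flag effects described in this section: with PVI active (PVI = 1, CPL = 3, IOPL < 3) the instructions operate on VIF, and STI with VIP set raises #GP; otherwise the Intel486-compatible rule applies.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLAG_IF    (1u << 9)
#define EFLAGS_VIF (1u << 19)
#define EFLAGS_VIP (1u << 20)

typedef enum { INSN_STI, INSN_CLI } insn;

/* Models CLI/STI in protected mode. Returns true if the instruction raises
 * #GP; otherwise updates *eflags and returns false. */
bool pm_cli_sti(insn op, bool pvi, unsigned cpl, unsigned iopl, uint32_t *eflags)
{
    if (pvi && cpl == 3 && iopl < 3) {        /* virtual interrupts active */
        if (op == INSN_CLI) {
            *eflags &= ~EFLAGS_VIF;           /* IF is left unaffected */
            return false;
        }
        if (*eflags & EFLAGS_VIP)
            return true;                      /* pending interrupt: #GP */
        *eflags |= EFLAGS_VIF;
        return false;
    }
    if (cpl > iopl)
        return true;                          /* Intel486-compatible #GP */
    if (op == INSN_CLI)
        *eflags &= ~FLAG_IF;
    else
        *eflags |= FLAG_IF;
    return false;
}
```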
PUSHF, POPF, IRET and INT execute as they do on the Intel486 processor, regardless of
whether protected-mode virtual interrupts are enabled.
It is only possible to enter virtual-8086 mode through a task switch or the execution
of an IRET instruction, and it is only possible to leave virtual-8086 mode by faulting
to a protected-mode interrupt handler (typically the general-protection exception
handler, which in turn calls the virtual-8086 monitor). In both cases, the
EFLAGS register is saved and restored. This is not true, however, in protected mode
when the PVI flag is set and the processor is not in virtual-8086 mode. Here, it is
possible to call a procedure at a different privilege level, in which case the EFLAGS
register is not saved or modified. However, the states of VIF and VIP flags are never
examined by the processor when the CPL is not 3.
CHAPTER 18
MIXING 16-BIT AND 32-BIT CODE
Program modules written to run on IA-32 processors can be either 16-bit modules or
32-bit modules. Table 18-1 shows the characteristics of 16-bit and 32-bit modules.
The IA-32 processors function most efficiently when executing 32-bit program
modules. They can, however, also execute 16-bit program modules, in any of the
following ways:
• In real-address mode.
• In virtual-8086 mode.
• In system management mode (SMM).
• As a protected-mode task, when the code, data, and stack segments for the task
  are all configured as 16-bit segments.
• By integrating 16-bit and 32-bit segments into a single protected-mode task.
• By integrating 16-bit operations into 32-bit code segments.
Real-address mode, virtual-8086 mode, and SMM are native 16-bit modes. A legacy
program assembled and/or compiled to run on an Intel 8086 or Intel 286 processor
should run in real-address mode or virtual-8086 mode without modification.
Sixteen-bit program modules can also be written to run in real-address mode for handling
system initialization or to run in SMM for handling system management functions.
See Chapter 17, "8086 Emulation," for detailed information on real-address mode
and virtual-8086 mode; see Chapter 26, "System Management Mode," for
information on SMM.
This chapter describes how to integrate 16-bit program modules with 32-bit program
modules when operating in protected mode and how to mix 16-bit and 32-bit code
within 32-bit code segments.
Table 18-1. Characteristics of 16-Bit and 32-Bit Program Modules
Characteristic                                          16-Bit Program Modules  32-Bit Program Modules
Segment Size                                            0 to 64 KBytes          0 to 4 GBytes
Operand Sizes                                           8 bits and 16 bits      8 bits and 32 bits
Pointer Offset Size (Address Size)                      16 bits                 32 bits
Stack Pointer Size                                      16 bits                 32 bits
Control Transfers Allowed to Code Segments of This Size 16 bits                 32 bits
18.1 DEFINING 16-BIT AND 32-BIT PROGRAM MODULES
The following IA-32 architecture mechanisms are used to distinguish between and
support 16-bit and 32-bit segments and operations:
• The D (default operand and address size) flag in code-segment descriptors.
• The B (default stack size) flag in stack-segment descriptors.
• 16-bit and 32-bit call gates, interrupt gates, and trap gates.
• Operand-size and address-size instruction prefixes.
• 16-bit and 32-bit general-purpose registers.
The D flag in a code-segment descriptor determines the default operand-size and
address-size for the instructions of a code segment. (In real-address mode and
virtual-8086 mode, which do not use segment descriptors, the default is 16 bits.) A
code segment with its D flag set is a 32-bit segment; a code segment with its D flag
clear is a 16-bit segment.
The B flag in the stack-segment descriptor specifies the size of the stack pointer (the
32-bit ESP register or the 16-bit SP register) used by the processor for implicit stack
references. The B flag for all data descriptors also controls the upper address range for
expand-down segments.
When transferring program control to another code segment through a call gate,
interrupt gate, or trap gate, the operand size used during the transfer is determined
by the type of gate used (16-bit or 32-bit), not by the D flag or prefix of the transfer
instruction. The gate type determines how return information is saved on the stack
(or stacks).
For most efficient and trouble-free operation of the processor, 32-bit programs or
tasks should have the D flag in the code-segment descriptor and the B flag in the
stack-segment descriptor set, and 16-bit programs or tasks should have these flags
clear. Program control transfers from 16-bit segments to 32-bit segments (and vice
versa) are handled most efficiently through call, interrupt, or trap gates.
Instruction prefixes can be used to override the default operand size and address size
of a code segment. These prefixes can be used in real-address mode as well as in
protected mode and virtual-8086 mode. An operand-size or address-size prefix only
changes the size for the duration of the instruction.
18.2 MIXING 16-BIT AND 32-BIT OPERATIONS WITHIN A
CODE SEGMENT
The following two instruction prefixes allow mixing of 32-bit and 16-bit operations
within one segment:
• The operand-size prefix (66H)
• The address-size prefix (67H)
These prefixes reverse the default size selected by the D flag in the code-segment
descriptor. For example, the processor can interpret the (MOV mem, reg) instruction
in any of four ways:
• In a 32-bit code segment:
  - Moves 32 bits from a 32-bit register to memory using a 32-bit effective
    address.
  - If preceded by an operand-size prefix, moves 16 bits from a 16-bit register to
    memory using a 32-bit effective address.
  - If preceded by an address-size prefix, moves 32 bits from a 32-bit register to
    memory using a 16-bit effective address.
  - If preceded by both an address-size prefix and an operand-size prefix, moves
    16 bits from a 16-bit register to memory using a 16-bit effective address.
• In a 16-bit code segment:
  - Moves 16 bits from a 16-bit register to memory using a 16-bit effective
    address.
  - If preceded by an operand-size prefix, moves 32 bits from a 32-bit register to
    memory using a 16-bit effective address.
  - If preceded by an address-size prefix, moves 16 bits from a 16-bit register to
    memory using a 32-bit effective address.
  - If preceded by both an address-size prefix and an operand-size prefix, moves
    32 bits from a 32-bit register to memory using a 32-bit effective address.
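The eight cases above reduce to one rule: each prefix toggles the corresponding default chosen by the D flag. A minimal sketch of that rule follows; the type and function names are hypothetical.

```c
#include <stdbool.h>

typedef struct {
    int operand_bits;   /* 16 or 32 */
    int address_bits;   /* 16 or 32 */
} insn_sizes;

/* d_flag:    D flag of the code segment (true = 32-bit segment)
 * prefix_66: instruction carries the operand-size prefix (66H)
 * prefix_67: instruction carries the address-size prefix (67H) */
insn_sizes effective_sizes(bool d_flag, bool prefix_66, bool prefix_67)
{
    insn_sizes s;
    /* A prefix reverses the default, so the result is 32 bits exactly when
     * the D flag and the prefix differ. */
    s.operand_bits = (d_flag != prefix_66) ? 32 : 16;
    s.address_bits = (d_flag != prefix_67) ? 32 : 16;
    return s;
}
```

For example, `effective_sizes(false, true, false)` describes a MOV in a 16-bit segment with only the operand-size prefix: a 32-bit operand with a 16-bit effective address.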
The previous examples show that any instruction can generate any combination of
operand size and address size regardless of whether the instruction is in a 16- or
32-bit segment. The choice of the 16- or 32-bit default for a code segment is
normally based on the following criteria:
• Performance: Always use 32-bit code segments when possible. They run
  much faster than 16-bit code segments on P6 family processors, and somewhat
  faster on earlier IA-32 processors.
• The operating system the code segment will be running on: If the
  operating system is a 16-bit operating system, it may not support 32-bit program
  modules.
• Mode of operation: If the code segment is being designed to run in
  real-address mode, virtual-8086 mode, or SMM, it must be a 16-bit code segment.
• Backward compatibility to earlier IA-32 processors: If a code segment
  must be able to run on an Intel 8086 or Intel 286 processor, it must be a 16-bit
  code segment.
18.3 SHARING DATA AMONG MIXED-SIZE CODE
SEGMENTS
Data segments can be accessed from both 16-bit and 32-bit code segments. When a
data segment that is larger than 64 KBytes is to be shared among 16- and 32-bit
code segments, the data that is to be accessed from the 16-bit code segments must
be located within the first 64 KBytes of the data segment. The reason for this is that
16-bit pointers by definition can only point to the first 64 KBytes of a segment.
A stack that spans less than 64 KBytes can be shared by both 16- and 32-bit code
segments. This class of stacks includes:
• Stacks in expand-up segments with the G (granularity) and B (big) flags in the
  stack-segment descriptor clear.
• Stacks in expand-down segments with the G and B flags clear.
• Stacks in expand-up segments with the G flag set and the B flag clear and where
  the stack is contained completely within the lower 64 KBytes. (Offsets greater
  than FFFFH can be used for data, other than the stack, which is not shared.)
See Section 3.4.5, "Segment Descriptors," for a description of the G and B flags and
the expand-down stack type.
The B flag cannot, in general, be used to change the size of stack used by a 16-bit
code segment. This flag controls the size of the stack pointer only for implicit stack
references such as those caused by interrupts, exceptions, and the PUSH, POP, CALL,
and RET instructions. It does not control explicit stack references, such as accesses
to parameters or local variables. A 16-bit code segment can use a 32-bit stack only if
the code is modified so that all explicit references to the stack are preceded by the
32-bit address-size prefix (causing those references to use 32-bit addressing) and all
explicit writes to the stack pointer are preceded by a 32-bit operand-size prefix.
In 32-bit, expand-down segments, all offsets may be greater than 64 KBytes;
therefore, 16-bit code cannot use this kind of stack segment unless the code segment is
modified to use 32-bit addressing.
18.4 TRANSFERRING CONTROL AMONG MIXED-SIZE CODE
SEGMENTS
There are three ways for a procedure in a 16-bit code segment to safely make a call
to a 32-bit code segment:
• Make the call through a 32-bit call gate.
• Make a 16-bit call to a 32-bit interface procedure. The interface procedure then
  makes a 32-bit call to the intended destination.
• Modify the 16-bit procedure, inserting an operand-size prefix before the call, to
  change it to a 32-bit call.
Likewise, there are three ways for a procedure in a 32-bit code segment to safely make
a call to a 16-bit code segment:
• Make the call through a 16-bit call gate. Here, the EIP value at the CALL
  instruction cannot exceed FFFFH.
• Make a 32-bit call to a 16-bit interface procedure. The interface procedure then
  makes a 16-bit call to the intended destination.
• Modify the 32-bit procedure, inserting an operand-size prefix before the call,
  changing it to a 16-bit call. Be certain that the return offset does not exceed
  FFFFH.
These methods of transferring program control overcome the following architectural
limitations imposed on calls between 16-bit and 32-bit code segments:
• Pointers from 16-bit code segments (which by default can only be 16 bits) cannot
  be used to address data or code located beyond FFFFH in a 32-bit segment.
• The operand-size attributes for a CALL and its companion RET instruction
  must be the same to maintain stack coherency. This is also true for implicit calls
  to interrupt and exception handlers and their companion IRET instructions.
• A 32-bit parameter (particularly a pointer parameter) greater than FFFFH
  cannot be squeezed into a 16-bit parameter location on a stack.
• The size of the stack pointer (SP or ESP) changes when switching between 16-bit
  and 32-bit code segments.
These limitations are discussed in greater detail in the following sections.
18.4.1 Code-Segment Pointer Size
For control-transfer instructions that use a pointer to identify the next instruction
(that is, those that do not use gates), the operand-size attribute determines the size
of the offset portion of the pointer. The implications of this rule are as follows:
• A JMP, CALL, or RET instruction from a 32-bit segment to a 16-bit segment is
  always possible using a 32-bit operand size, providing the 32-bit pointer does not
  exceed FFFFH.
• A JMP, CALL, or RET instruction from a 16-bit segment to a 32-bit segment
  cannot address a destination greater than FFFFH, unless the instruction is given
  an operand-size prefix.
See Section 18.4.5, "Writing Interface Procedures," for an interface procedure that
can transfer program control from 16-bit segments to destinations in 32-bit
segments beyond FFFFH.
18.4.2 Stack Management for Control Transfer
Because the stack is managed differently for 16-bit procedure calls than for 32-bit
calls, the operand-size attribute of the RET instruction must match that of the CALL
instruction (see Figure 18-1). On a 16-bit call, the processor pushes the contents of
the 16-bit IP register and (for calls between privilege levels) the 16-bit SP register.
The matching RET instruction must also use a 16-bit operand size to pop these 16-bit
values from the stack into the 16-bit registers.
A 32-bit CALL instruction pushes the contents of the 32-bit EIP register and (for
inter-privilege-level calls) the 32-bit ESP register. Here, the matching RET instruction
must use a 32-bit operand size to pop these 32-bit values from the stack into the
32-bit registers. If the two parts of a CALL/RET instruction pair do not have matching
operand sizes, the stack will not be managed correctly and the values of the
instruction pointer and stack pointer will not be restored to correct values.
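The size mismatch can be made concrete with a little arithmetic. The sketch below is illustrative only and uses hypothetical names. It assumes far calls as in Figure 18-1, where each saved item (CS, IP or EIP, and for a privilege transition SS and SP or ESP) occupies one stack slot of the operand size, and it leaves parameters out of the count.

```c
#include <stdbool.h>

/* Bytes of return information pushed by a far CALL (parameters excluded). */
unsigned far_call_frame_bytes(unsigned operand_bits, bool privilege_change)
{
    unsigned slot = operand_bits / 8;            /* 2 or 4 bytes per item */
    unsigned items = privilege_change ? 4 : 2;   /* CS:(E)IP, plus SS:(E)SP */
    return slot * items;
}

/* For a same-privilege far call: bytes by which the stack pointer ends up
 * wrong when RET uses a different operand size than its CALL. */
int ret_mismatch_bytes(unsigned call_bits, unsigned ret_bits)
{
    return (int)far_call_frame_bytes(call_bits, false)
         - (int)far_call_frame_bytes(ret_bits, false);
}
```

Under these assumptions a 32-bit call returned from with a 16-bit RET leaves 4 bytes of the frame on the stack, and the 16-bit values that RET did pop are not the saved CS and EIP.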
Figure 18-1. Stack after Far 16- and 32-Bit Calls
[Figure not reproduced. Two panels, "Without Privilege Transition" and "With
Privilege Transition," each show a 32-bit-wide stack (bits 31 to 0), with the
direction of stack growth marked, after a 16-bit call and after a 32-bit call.
Without a privilege transition, the stack holds PARM 2, PARM 1, CS, and IP (16-bit
call) or EIP (32-bit call). With a privilege transition, it holds SS, SP (16-bit
call) or ESP (32-bit call), PARM 2, PARM 1, CS, and IP or EIP.]
While executing 32-bit code, if a call is made to a 16-bit code segment which is at the
same or a more privileged level (that is, the DPL of the called code segment is less
than or equal to the CPL of the calling code segment) through a 16-bit call gate, then
the upper 16 bits of the ESP register may be unreliable upon returning to the 32-bit
code segment (that is, after executing a RET in the 16-bit code segment).
When the CALL instruction and its matching RET instruction are in code segments
that have D flags with the same values (that is, both are 32-bit code segments or
both are 16-bit code segments), the default settings may be used. When the CALL
instruction and its matching RET instruction are in segments which have different
D-flag settings, an operand-size prefix must be used.
18.4.2.1 Controlling the Operand-Size Attribute For a Call
Three things can determine the operand size of a call:
The D flag in the segment descriptor for the calling code segment.
An operand-size instruction prefix.
The type of call gate (16-bit or 32-bit), if a call is made through a call gate.
When a call is made with a pointer (rather than a call gate), the D flag for the calling
code segment determines the operand size for the CALL instruction. This operand-size
attribute can be overridden by prepending an operand-size prefix to the CALL
instruction. So, for example, if the D flag for a code segment is set for 16 bits and the
operand-size prefix is used with a CALL instruction, the processor will cause the
information stored on the stack to be stored in 32-bit format. If the call is to a 32-bit code
segment, the instructions in that code segment will be able to read the stack
coherently. Also, a RET instruction from the 32-bit code segment without an operand-size
prefix will maintain stack coherency with the 16-bit code segment being returned to.
When a CALL instruction references a call-gate descriptor, the type of call is
determined by the type of call gate (16-bit or 32-bit). The offset to the destination in the
code segment being called is taken from the gate descriptor; therefore, if a 32-bit call
gate is used, a procedure in a 16-bit code segment can call a procedure located more
than 64 KBytes from the base of a 32-bit code segment, because a 32-bit call gate
uses a 32-bit offset.
Note that regardless of the operand size of the call and how it is determined, the size
of the stack pointer used (SP or ESP) is always controlled by the B flag in the
stack-segment descriptor currently in use (that is, when B is clear, SP is used, and when B
is set, ESP is used).
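The selection rules above can be summarized in a short sketch. The Python below is an illustrative model only; the function names and parameters (call_operand_size, stack_pointer_register, d_flag, has_prefix, gate_bits, b_flag) are hypothetical and simply encode the three determining factors and the B-flag rule described in this section.

```python
from typing import Optional


def call_operand_size(d_flag: bool, has_prefix: bool = False,
                      gate_bits: Optional[int] = None) -> int:
    """Return the operand size (16 or 32) used by a CALL.

    d_flag     -- D flag of the calling code segment (True = 32-bit default)
    has_prefix -- True if an operand-size prefix precedes the CALL
    gate_bits  -- 16 or 32 if the call goes through a call gate, else None
    """
    if gate_bits is not None:
        # Through a call gate, the gate type determines the type of call.
        return gate_bits
    default = 32 if d_flag else 16
    if has_prefix:
        # The prefix overrides the default selected by the D flag.
        return 16 if default == 32 else 32
    return default


def stack_pointer_register(b_flag: bool) -> str:
    """The B flag of the stack segment alone selects SP or ESP."""
    return "ESP" if b_flag else "SP"
```

For example, call_operand_size(d_flag=False, has_prefix=True) models a prefixed CALL in a 16-bit code segment and yields a 32-bit operand size, independent of which stack-pointer register is in use.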
An unmodified 16-bit code segment that has run successfully on an 8086 processor
or in real mode on a later IA-32 architecture processor will have its D flag clear and
will not use operand-size override prefixes. As a result, all CALL instructions in this
code segment will use the 16-bit operand-size attribute. Procedures in these code
segments can be modified to safely call procedures in 32-bit code segments in either
of two ways:
Relink the CALL instruction to point to 32-bit call gates (see Section 18.4.2.2,
"Passing Parameters With a Gate").
Add a 32-bit operand-size prefix to each CALL instruction.
18.4.2.2 Passing Parameters With a Gate
When referencing 32-bit gates with 16-bit procedures, it is important to consider the
number of parameters passed in each procedure call. The count field of the gate
descriptor specifies the size of the parameter string to copy from the current stack to
the stack of a more privileged (numerically lower privilege level) procedure. The
count field of a 16-bit gate specifies the number of 16-bit words to be copied,
whereas the count field of a 32-bit gate specifies the number of 32-bit doublewords
to be copied. The count field for a 32-bit gate must thus be half the number of
words being placed on the stack by a 16-bit procedure. Also, the 16-bit
procedure must use an even number of words as parameters.
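The relationship between the two count fields is simple arithmetic, sketched below in Python. The function name gate32_count is hypothetical; the sketch only encodes the halving rule and the even-word requirement stated above.

```python
def gate32_count(words16: int) -> int:
    """Count-field value for a 32-bit gate called from a 16-bit procedure.

    words16 is the number of 16-bit parameter words the caller pushes.
    A 32-bit gate copies doublewords, so the count is half that number,
    and an odd number of words cannot be expressed.
    """
    if words16 < 0:
        raise ValueError("word count cannot be negative")
    if words16 % 2 != 0:
        raise ValueError("a 16-bit caller must pass an even number of words")
    return words16 // 2
```

A caller that pushes six 16-bit words therefore needs a 32-bit gate whose count field is 3.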
18.4.3 Interrupt Control Transfers
A program-control transfer caused by an exception or interrupt is always carried out
through an interrupt or trap gate (located in the IDT). Here, the type of the gate
(16-bit or 32-bit) determines the operand-size attribute used in the implicit call to
the exception or interrupt handler procedure in another code segment.
A 32-bit interrupt or trap gate provides a safe interface to a 32-bit exception or
interrupt handler when the exception or interrupt occurs in either a 32-bit or a 16-bit code
segment. It is sometimes impractical, however, to place exception or interrupt
handlers in 16-bit code segments, because only 16-bit return addresses are saved on
the stack. If an exception or interrupt occurs in a 32-bit code segment when the EIP
was greater than FFFFH, the 16-bit handler procedure cannot provide the correct
return address.
18.4.4 Parameter Translation
When segment offsets or pointers (which contain segment offsets) are passed as
parameters between 16-bit and 32-bit procedures, some translation is required. If a
32-bit procedure passes a pointer to data located beyond 64 KBytes to a 16-bit
procedure, the 16-bit procedure cannot use it. Except for this limitation, interface
code can perform any format conversion between 32-bit and 16-bit pointers that
may be needed.
Parameters passed by value between 32-bit and 16-bit code also may require
translation between 32-bit and 16-bit formats. The form of the translation is
application-dependent.
18.4.5 Writing Interface Procedures
Placing interface code between 32-bit and 16-bit procedures can be the solution to
the following interface problems:
Allowing procedures in 16-bit code segments to call procedures with offsets
greater than FFFFH in 32-bit code segments.
Matching operand-size attributes between companion CALL and RET instructions.
Translating parameters (data), including managing parameter strings with a
variable count or an odd number of 16-bit words.
The possible invalidation of the upper bits of the ESP register.
The interface procedure is simplified when these rules are followed:
1. The interface procedure must reside in a 32-bit code segment (the D flag for the
code-segment descriptor is set).
2. All procedures that may be called by 16-bit procedures must have offsets not
greater than FFFFH.
3. All return addresses saved by 16-bit procedures must have offsets not greater
than FFFFH.
The interface procedure becomes more complex if any of these rules are violated. For
example, if a 16-bit procedure calls a 32-bit procedure with an entry point beyond
FFFFH, the interface procedure will need to provide the offset to the entry point. The
mapping between 16- and 32-bit addresses is only performed automatically when a
call gate is used, because the gate descriptor for a call gate contains a 32-bit
address. When a call gate is not used, the interface code must provide the 32-bit
address.
The structure of the interface procedure depends on the types of calls it is going to
support, as follows:
Calls from 16-bit procedures to 32-bit procedures: Calls to the interface
procedure from a 16-bit code segment are made with 16-bit CALL instructions
(by default, because the D flag for the calling code-segment descriptor is clear),
and 16-bit operand-size prefixes are used with RET instructions to return from
the interface procedure to the calling procedure. Calls from the interface
procedure to 32-bit procedures are performed with 32-bit CALL instructions (by
default, because the D flag for the interface procedure's code segment is set),
and returns from the called procedures to the interface procedure are performed
with 32-bit RET instructions (also by default).
Calls from 32-bit procedures to 16-bit procedures: Calls to the interface
procedure from a 32-bit code segment are made with 32-bit CALL instructions
(by default), and returns to the calling procedure from the interface procedure
are made with 32-bit RET instructions (also by default). Calls from the interface
procedure to 16-bit procedures require the CALL instructions to have
operand-size prefixes, and returns from the called procedures to the interface
procedure are performed with 16-bit RET instructions (by default).
CHAPTER 19
ARCHITECTURE COMPATIBILITY
Intel 64 and IA-32 processors are binary compatible. Compatibility means that,
within limited constraints, programs that execute on previous generations of
processors will produce identical results when executed on later processors. The
compatibility constraints and any implementation differences between the Intel 64 and IA-32
processors are described in this chapter.
Each new processor has enhanced the software-visible architecture from that found
in earlier Intel 64 and IA-32 processors. Those enhancements have been defined
with consideration for compatibility with previous and future processors. This chapter
also summarizes the compatibility considerations for those extensions.
19.1 PROCESSOR FAMILIES AND CATEGORIES
IA-32 processors are referred to in several different ways in this chapter, depending
on the type of compatibility information being related, as described in the following:
IA-32 Processors: All the Intel processors based on the Intel IA-32 Architecture,
which include the 8086/88, Intel 286, Intel386, Intel486, Pentium,
Pentium Pro, Pentium II, Pentium III, Pentium 4, and Intel Xeon processors.
32-bit Processors: All the IA-32 processors that use a 32-bit architecture,
which include the Intel386, Intel486, Pentium, Pentium Pro, Pentium II,
Pentium III, Pentium 4, and Intel Xeon processors.
16-bit Processors: All the IA-32 processors that use a 16-bit architecture,
which include the 8086/88 and Intel 286 processors.
P6 Family Processors: All the IA-32 processors that are based on the P6
microarchitecture, which include the Pentium Pro, Pentium II, and Pentium III
processors.
Pentium 4 Processors: A family of IA-32 and Intel 64 processors that are
based on the Intel NetBurst microarchitecture.
Intel Pentium M Processors: A family of IA-32 processors that are based
on the Intel Pentium M processor microarchitecture.
Intel Core Duo and Solo Processors: Families of IA-32 processors that
are based on an improved Intel Pentium M processor microarchitecture.
Intel Xeon Processors: A family of IA-32 and Intel 64 processors that are
based on the Intel NetBurst microarchitecture. This family includes the Intel Xeon
processor and the Intel Xeon processor MP based on the Intel NetBurst
microarchitecture. Intel Xeon processors 3000, 3100, 3200, 3300, 5100, 5200,
5300, 5400, 7200, 7300 series are based on Intel Core microarchitectures and
support Intel 64 architecture.
Pentium D Processors: A family of dual-core Intel 64 processors that
provides two processor cores in a physical package. Each core is based on the
Intel NetBurst microarchitecture.
Pentium Processor Extreme Editions: A family of dual-core Intel 64
processors that provides two processor cores in a physical package. Each core is
based on the Intel NetBurst microarchitecture and supports Intel
Hyper-Threading Technology.
Intel Core 2 Processor family: A family of Intel 64 processors that are
based on the Intel Core microarchitecture. Intel Pentium Dual-Core processors
are also based on the Intel Core microarchitecture.
Intel Atom Processors: A family of IA-32 and Intel 64 processors that are
based on the Intel Atom microarchitecture.
19.2 RESERVED BITS
Throughout this manual, certain bits are marked as reserved in many register and
memory layout descriptions. When bits are marked as undefined or reserved, it is
essential for compatibility with future processors that software treat these bits as
having a future, though unknown, effect. Software should follow these guidelines in
dealing with reserved bits:
Do not depend on the states of any reserved bits when testing the values of
registers or memory locations that contain such bits. Mask out the reserved bits
before testing.
Do not depend on the states of any reserved bits when storing them to memory
or to a register.
Do not depend on the ability to retain information written into any reserved bits.
When loading a register, always load the reserved bits with the values indicated
in the documentation, if any, or reload them with values previously read from the
same register.
Software written for an existing IA-32 processor that handles reserved bits correctly will
port to future IA-32 processors without generating protection exceptions.
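The guidelines above amount to two habits: mask reserved bits before testing, and preserve them across a read-modify-write. The Python sketch below illustrates both on plain integers. The helper names and the example mask are hypothetical; a real register's reserved-bit mask must come from that register's documentation.

```python
def defined_bits_equal(value: int, expected: int, reserved_mask: int) -> bool:
    """Compare two register images while ignoring the reserved bits."""
    return (value & ~reserved_mask) == (expected & ~reserved_mask)


def merge_preserving_reserved(current: int, new_fields: int,
                              reserved_mask: int) -> int:
    """Build a value to write back: reserved bits are reloaded from the value
    previously read, and only the defined bits are taken from `new_fields`."""
    return (current & reserved_mask) | (new_fields & ~reserved_mask)


# Hypothetical layout for illustration: the upper 16 bits are reserved.
EXAMPLE_RESERVED = 0xFFFF0000
previously_read = 0xABCD0001
to_write = merge_preserving_reserved(previously_read, 0xFFFF0002,
                                     EXAMPLE_RESERVED)
```

Here to_write keeps the reserved upper half exactly as it was read and changes only the defined lower half.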
19.3 ENABLING NEW FUNCTIONS AND MODES
Most of the new control functions defined for the P6 family and Pentium processors
are enabled by new mode flags in the control registers (primarily register CR4). This
register is undefined for IA-32 processors earlier than the Pentium processor.
Attempting to access this register with an Intel486 or earlier IA-32 processor results
in an invalid-opcode exception (#UD). Consequently, programs that execute
correctly on the Intel486 or earlier IA-32 processor cannot erroneously enable these
functions. Attempting to set a reserved bit in register CR4 to a value other than its
original value results in a general-protection exception (#GP). So, programs that
execute on the P6 family and Pentium processors cannot erroneously enable
functions that may be implemented in future IA-32 processors.
The P6 family and Pentium processors do not check for attempts to set reserved bits
in model-specific registers; however, these bits may be checked on more recent
processors. It is the obligation of the software writer to enforce this discipline. These
reserved bits may be used in future Intel processors.
19.4 DETECTING THE PRESENCE OF NEW FEATURES
THROUGH SOFTWARE
Software can check for the presence of new architectural features and extensions in
either of two ways:
1. Test for the presence of the feature or extension. Software can test for the
presence of new flags in the EFLAGS register and control registers. If these flags
are reserved (meaning not present in the processor executing the test), an
exception is generated. Likewise, software can attempt to execute a new
instruction, which results in an invalid-opcode exception (#UD) being generated
if it is not supported.
2. Execute the CPUID instruction. The CPUID instruction (added to the IA-32 in the
Pentium processor) indicates the presence of new features directly.
See Chapter 14, "Processor Identification and Feature Determination," in the Intel
64 and IA-32 Architectures Software Developer's Manual, Volume 1, for detailed
information on detecting new processor features and extensions.
19.5 INTEL MMX TECHNOLOGY
The Pentium processor with MMX technology introduced the MMX technology and a
set of MMX instructions to the IA-32. The MMX instructions are described in Chapter
9, "Programming with Intel MMX Technology," in the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 1, and in the Intel 64 and IA-32
Architectures Software Developer's Manual, Volumes 2A & 2B. The MMX technology
and MMX instructions are also included in the Pentium II, Pentium III, Pentium 4, and
Intel Xeon processors.
19.6 STREAMING SIMD EXTENSIONS (SSE)
The Streaming SIMD Extensions (SSE) were introduced in the Pentium III processor.
The SSE extensions consist of a new set of instructions and a new set of registers.
The new registers include the eight 128-bit XMM registers and the 32-bit MXCSR
control and status register. These instructions and registers are designed to allow
SIMD computations to be made on single-precision floating-point numbers. Several
of these new instructions also operate on the MMX registers. SSE instructions and
registers are described in Chapter 10, "Programming with Streaming SIMD
Extensions (SSE)," in the Intel 64 and IA-32 Architectures Software Developer's Manual,
Volume 1, and in the Intel 64 and IA-32 Architectures Software Developer's
Manual, Volumes 2A & 2B.
19.7 STREAMING SIMD EXTENSIONS 2 (SSE2)
The Streaming SIMD Extensions 2 (SSE2) were introduced in the Pentium 4 and Intel
Xeon processors. They consist of a new set of instructions that operate on the XMM
and MXCSR registers and perform SIMD operations on double-precision
floating-point values and on integer values. Several of these new instructions also operate on
the MMX registers. SSE2 instructions and registers are described in Chapter 11,
"Programming with Streaming SIMD Extensions 2 (SSE2)," in the Intel 64 and
IA-32 Architectures Software Developer's Manual, Volume 1, and in the Intel 64
and IA-32 Architectures Software Developer's Manual, Volumes 2A & 2B.
19.8 STREAMING SIMD EXTENSIONS 3 (SSE3)
The Streaming SIMD Extensions 3 (SSE3) were introduced in Pentium 4 processors
supporting Intel Hyper-Threading Technology and Intel Xeon processors. SSE3
extensions include 13 instructions. Ten of these 13 instructions support the single
instruction multiple data (SIMD) execution model used with SSE/SSE2 extensions.
One SSE3 instruction accelerates x87-style programming for conversion to integer.
The remaining two instructions (MONITOR and MWAIT) accelerate synchronization
of threads. SSE3 instructions are described in Chapter 12, "Programming with SSE3,
SSSE3 and SSE4," in the Intel 64 and IA-32 Architectures Software Developer's
Manual, Volume 1, and in the Intel 64 and IA-32 Architectures Software
Developer's Manual, Volumes 2A & 2B.
19.9 ADDITIONAL STREAMING SIMD EXTENSIONS
The Supplemental Streaming SIMD Extensions 3 (SSSE3) were introduced in the
Intel Core 2 processor and Intel Xeon processor 5100 series. Streaming SIMD
Extensions 4 (SSE4) provide 54 new instructions, introduced in 45nm Intel Xeon processors and
Intel Core 2 processors. SSSE3, SSE4.1 and SSE4.2 instructions are described in
Chapter 12, "Programming with SSE3, SSSE3 and SSE4," in the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 1, and in the Intel 64 and
IA-32 Architectures Software Developer's Manual, Volumes 2A & 2B.
19.10 INTEL HYPER-THREADING TECHNOLOGY
Intel Hyper-Threading Technology provides two logical processors that can execute
two separate code streams (called threads) concurrently by using shared resources
in a single processor core or in a physical package.
This feature was introduced in the Intel Xeon processor MP and later steppings of the
Intel Xeon processor, and Pentium 4 processors supporting Intel Hyper-Threading
Technology. The feature is also found in the Pentium processor Extreme Edition. See
also: Section 8.7, "Intel Hyper-Threading Technology Architecture."
Intel Atom processors also support Intel Hyper-Threading Technology.
19.11 MULTI-CORE TECHNOLOGY
The Pentium D processor and Pentium processor Extreme Edition provide two
processor cores in each physical processor package. See also: Section 8.5, "Intel
Hyper-Threading Technology and Intel Multi-Core Technology," and Section 8.8,
"Multi-Core Architecture." Intel Core 2 Duo, Intel Pentium Dual-Core processors,
Intel Xeon processors 3000, 3100, 5100, 5200 series provide two processor cores in
each physical processor package. Intel Core 2 Extreme, Intel Core 2 Quad
processors, Intel Xeon processors 3200, 3300, 5300, 5400, 7300 series provide four
processor cores in each physical processor package.
19.12 SPECIFIC FEATURES OF DUAL-CORE PROCESSOR
Dual-core processors may have some processor-specific features. Use CPUID feature
flags to detect the availability of features. Note the following:
CPUID Brand String: On Pentium processor Extreme Edition, the processor will
report the correct brand string only after the correct microcode updates are
loaded.
Enhanced Intel SpeedStep Technology: This feature is supported in
Pentium D processor but not in Pentium processor Extreme Edition.
19.13 NEW INSTRUCTIONS IN THE PENTIUM AND LATER
IA-32 PROCESSORS
Table 19-1 identifies the instructions introduced into the IA-32 in the Pentium
processor and later IA-32 processors.
19.13.1 Instructions Added Prior to the Pentium Processor
The following instructions were added in the Intel486 processor:
BSWAP (byte swap) instruction.
XADD (exchange and add) instruction.
CMPXCHG (compare and exchange) instruction.
INVD (invalidate cache) instruction.
WBINVD (write-back and invalidate cache) instruction.
INVLPG (invalidate TLB entry) instruction.
The following instructions were added in the Intel386 processor:
LSS, LFS, and LGS (load SS, FS, and GS registers).
Long-displacement conditional jumps.
Table 19-1. New Instructions in the Pentium Processor and Later IA-32 Processors

Instruction                                     CPUID Identification Bits              Introduced In
CMOVcc (conditional move)                       EDX, Bit 15                            Pentium Pro processor
FCMOVcc (floating-point conditional move)       EDX, Bits 0 and 15                     Pentium Pro processor
FCOMI (floating-point compare and set EFLAGS)   EDX, Bits 0 and 15                     Pentium Pro processor
RDPMC (read performance monitoring counters)    EAX, Bits 8-11, set to 6H; see Note 1  Pentium Pro processor
UD2 (undefined)                                 EAX, Bits 8-11, set to 6H              Pentium Pro processor
CMPXCHG8B (compare and exchange 8 bytes)        EDX, Bit 8                             Pentium processor
CPUID (CPU identification)                      None; see Note 2                       Pentium processor
RDTSC (read time-stamp counter)                 EDX, Bit 4                             Pentium processor
RDMSR (read model-specific register)            EDX, Bit 5                             Pentium processor
WRMSR (write model-specific register)           EDX, Bit 5                             Pentium processor
MMX Instructions                                EDX, Bit 23                            Pentium processor

NOTES:
1. The RDPMC instruction was introduced in the P6 family of processors and added to later model
Pentium processors. This instruction is model specific in nature and not architectural.
2. The CPUID instruction is available in all Pentium and P6 family processors and in later models of
the Intel486 processors. The ability to set and clear the ID flag (bit 21) in the EFLAGS register
indicates the availability of the CPUID instruction.

The following instructions were also added in the Intel386 processor:
Single-bit instructions.
Bit scan instructions.
Double-shift instructions.
Byte set on condition instruction.
Move with sign/zero extension.
Generalized multiply instruction.
MOV to and from control registers.
MOV to and from test registers (now obsolete).
MOV to and from debug registers.
RSM (resume from SMM). This instruction was introduced in the Intel386 SL and
Intel486 SL processors.
The following instructions were added in the Intel 387 math coprocessor:
FPREM1.
FUCOM, FUCOMP, and FUCOMPP.
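The CPUID identification bits listed in Table 19-1 can be decoded from register images as in the following Python sketch. It assumes the EAX and EDX values were already obtained by executing CPUID (Python cannot issue the instruction directly); the function and dictionary names are hypothetical, and only the bit positions given in the table are used.

```python
# EDX bit positions taken from Table 19-1.
EDX_FPU = 1 << 0     # used together with bit 15 for FCMOVcc and FCOMI
EDX_TSC = 1 << 4     # RDTSC
EDX_MSR = 1 << 5     # RDMSR and WRMSR
EDX_CX8 = 1 << 8     # CMPXCHG8B
EDX_CMOV = 1 << 15   # CMOVcc
EDX_MMX = 1 << 23    # MMX instructions


def family_from_eax(eax: int) -> int:
    """Extract EAX bits 8-11, the field Table 19-1 compares against 6H."""
    return (eax >> 8) & 0xF


def decode_features(eax: int, edx: int) -> dict:
    """Map the register images to the instruction groups in Table 19-1."""
    is_family_6 = family_from_eax(eax) == 0x6
    return {
        "CMOVcc": bool(edx & EDX_CMOV),
        "FCMOVcc/FCOMI": bool(edx & EDX_FPU) and bool(edx & EDX_CMOV),
        "RDPMC": is_family_6,   # model specific; see Note 1 of the table
        "UD2": is_family_6,
        "CMPXCHG8B": bool(edx & EDX_CX8),
        "RDTSC": bool(edx & EDX_TSC),
        "RDMSR/WRMSR": bool(edx & EDX_MSR),
        "MMX": bool(edx & EDX_MMX),
    }
```

For instance, an EDX image with bit 15 set but bit 0 clear reports CMOVcc as present while FCMOVcc and FCOMI are reported as absent, because those two require both bits.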
19.14 OBSOLETE INSTRUCTIONS
The MOV to and from test registers instructions were removed from the Pentium
processor and future IA-32 processors. Execution of these instructions generates an
invalid-opcode exception (#UD).
19.15 UNDEFINED OPCODES
All new instructions defined for IA-32 processors use binary encodings that were
reserved on earlier-generation processors. Attempting to execute a reserved opcode
always results in an invalid-opcode (#UD) exception being generated. Consequently,
programs that execute correctly on earlier-generation processors cannot erroneously
execute these instructions and thereby produce unexpected results when executed
on later IA-32 processors.
19.16 NEW FLAGS IN THE EFLAGS REGISTER
The section titled "EFLAGS Register" in Chapter 3, "Basic Execution Environment," of
the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1,
shows the configuration of flags in the EFLAGS register for the P6 family processors.
No new flags have been added to this register in the P6 family processors. The flags
added to this register in the Pentium and Intel486 processors are described in the
following sections.
The following flags were added to the EFLAGS register in the Pentium processor:
VIF (virtual interrupt flag), bit 19.
VIP (virtual interrupt pending), bit 20.
ID (identification flag), bit 21.
The AC flag (bit 18) was added to the EFLAGS register in the Intel486 processor.
19.16.1 Using EFLAGS Flags to Distinguish Between 32-Bit IA-32
Processors
The following bits in the EFLAGS register can be used to differentiate between
the 32-bit IA-32 processors:
Bit 18 (the AC flag) can be used to distinguish an Intel386 processor from the P6
family, Pentium, and Intel486 processors. Since it is not implemented on the
Intel386 processor, it will always be clear.
Bit 21 (the ID flag) indicates whether an application can execute the CPUID
instruction. The ability to set and clear this bit indicates that the processor is a P6
family or Pentium processor. The CPUID instruction can then be used to
determine which processor.
Bits 19 (the VIF flag) and 20 (the VIP flag) will always be zero on processors that
do not support virtual mode extensions, which includes all 32-bit processors prior
to the Pentium processor.
See Chapter 14, "Processor Identification and Feature Determination," in the Intel
64 and IA-32 Architectures Software Developer's Manual, Volume 1, for more
information on identifying processors.
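The decision logic described above can be sketched as follows. This Python model is only an illustration: real detection code writes EFLAGS with PUSHF/POPF-style sequences, which is simulated here by a mask of implemented flag bits. The names can_toggle and classify, and the returned strings, are hypothetical.

```python
AC_FLAG = 1 << 18  # added in the Intel486 processor
ID_FLAG = 1 << 21  # added in the Pentium processor


def can_toggle(implemented_mask: int, flag: int) -> bool:
    """Simulate writing EFLAGS with `flag` set and reading it back.

    Bits not implemented by the modeled processor always read as zero,
    so a write to them does not stick.
    """
    written = flag & implemented_mask
    return written == flag


def classify(implemented_mask: int) -> str:
    """Apply the AC-flag and ID-flag tests from this section in order."""
    if not can_toggle(implemented_mask, AC_FLAG):
        return "Intel386 processor"
    if not can_toggle(implemented_mask, ID_FLAG):
        return "Intel486 processor (no CPUID)"
    return "CPUID available; use it to determine which processor"
```

The ordering matters: the AC test separates the Intel386 processor first, and only then is the ID flag consulted to decide whether CPUID can be executed.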
19.17 STACK OPERATIONS
This section identifies the differences in stack implementation between the various
IA-32 processors.
19.17.1 PUSH SP
The P6 family, Pentium, Intel486, Intel386, and Intel 286 processors push a different
value on the stack for a PUSH SP instruction than the 8086 processor. These
processors push the value of the SP register before it is decremented as part of the
push operation; the 8086 processor pushes the value of the SP register after it is
decremented. If the value pushed is important, replace PUSH SP instructions with the
following three instructions:
PUSH BP
MOV BP, SP
XCHG BP, [BP]
This code functions as the 8086 processor PUSH SP instruction on the P6 family,
Pentium, Intel486, Intel386, and Intel 286 processors.
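A small simulation makes the equivalence easy to check. The Python sketch below models only SP, BP, and word-sized stack memory; the class and method names (Cpu16, push_sp_8086, push_sp_286_and_later, push_sp_portable) are hypothetical, and the model ignores segmentation.

```python
class Cpu16:
    """Minimal model of SP, BP, and a word-addressed stack."""

    def __init__(self, sp: int, bp: int) -> None:
        self.sp = sp
        self.bp = bp
        self.mem = {}  # address -> 16-bit word

    def push(self, value: int) -> None:
        self.sp = (self.sp - 2) & 0xFFFF
        self.mem[self.sp] = value & 0xFFFF

    def push_sp_8086(self) -> None:
        # The 8086 stores the value of SP after it is decremented.
        self.push((self.sp - 2) & 0xFFFF)

    def push_sp_286_and_later(self) -> None:
        # Later processors store the value of SP before it is decremented.
        self.push(self.sp)

    def push_sp_portable(self) -> None:
        self.push(self.bp)       # PUSH BP
        self.bp = self.sp        # MOV BP, SP
        addr = self.bp           # XCHG BP, [BP]
        self.bp, self.mem[addr] = self.mem[addr], addr


old = Cpu16(sp=0x0100, bp=0x1234)
old.push_sp_8086()

new = Cpu16(sp=0x0100, bp=0x1234)
new.push_sp_portable()

native = Cpu16(sp=0x0100, bp=0x1234)
native.push_sp_286_and_later()
```

After the three-instruction sequence, the word at the top of the stack equals the decremented SP, matching the 8086 behavior, and BP holds its original value again.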
19.17.2 EFLAGS Pushed on the Stack
The setting of the stored values of bits 12 through 15 (which includes the IOPL field
and the NT flag) in the EFLAGS register by the PUSHF instruction, by interrupts, and
by exceptions is different with the 32-bit IA-32 processors than with the 8086 and
Intel 286 processors. The differences are as follows:
8086 processor: bits 12 through 15 are always set.
Intel 286 processor: bits 12 through 15 are always cleared in real-address mode.
32-bit processors in real-address mode: bit 15 (reserved) is always cleared, and
bits 12 through 14 have the last value loaded into them.
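The three cases can be expressed as masks applied to the 16-bit flags image, as in the Python sketch below. The function name pushed_flags and the processor labels are hypothetical; the sketch covers only the real-address-mode behavior listed above.

```python
def pushed_flags(flags: int, processor: str) -> int:
    """Return the low 16 bits of the flags image as stored on the stack.

    `flags` holds the values last loaded into the flags register;
    `processor` is one of "8086", "286", or "32-bit" (real-address mode).
    """
    flags &= 0xFFFF
    if processor == "8086":
        return flags | 0xF000   # bits 12 through 15 are always set
    if processor == "286":
        return flags & 0x0FFF   # bits 12 through 15 are always cleared
    if processor == "32-bit":
        return flags & 0x7FFF   # bit 15 cleared; bits 12-14 keep their value
    raise ValueError("unknown processor label: " + processor)
```

Because the stored images differ in exactly these bits, the same masks also show why older software sometimes inspected a pushed flags word to tell these processor generations apart.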
19.18 X87 FPU
This section addresses the issues that must be faced when porting floating-point
software designed to run on earlier IA-32 processors and math coprocessors to a
Pentium 4, Intel Xeon, P6 family, or Pentium processor with integrated x87 FPU. To
software, a Pentium 4, Intel Xeon, or P6 family processor looks very much like a
Pentium processor. Floating-point software which runs on a Pentium or Intel486 DX
processor, or on an Intel486 SX processor/Intel 487 SX math coprocessor system or
an Intel386 processor/Intel 387 math coprocessor system, will run with at most
minor modifications on a Pentium 4, Intel Xeon, or P6 family processor. To port code
directly from an Intel 286 processor/Intel 287 math coprocessor system or an
Intel 8086 processor/8087 math coprocessor system to a Pentium 4, Intel Xeon, P6
family, or Pentium processor, certain additional issues must be addressed.
In the following sections, the term "32-bit x87 FPUs" refers to the P6 family, Pentium,
and Intel486 DX processors, and to the Intel 487 SX and Intel 387 math
coprocessors; the term "16-bit IA-32 math coprocessors" refers to the Intel 287 and 8087
math coprocessors.
19.18.1 Control Register CR0 Flags
The ET, NE, and MP flags in control register CR0 control the interface between the
integer unit of an IA-32 processor and either its internal x87 FPU or an external math
coprocessor. The effects of these flags in the various IA-32 processors are described in
the following paragraphs.
The ET (extension type) flag (bit 4 of the CR0 register) is used in the Intel386
processor to indicate whether the math coprocessor in the system is an Intel 287
math coprocessor (flag is clear) or an Intel 387 DX math coprocessor (flag is set).
This bit is hardwired to 1 in the P6 family, Pentium, and Intel486 processors.
The NE (Numeric Exception) flag (bit 5 of the CR0 register) is used in the P6 family,
Pentium, and Intel486 processors to determine whether unmasked floating-point
exceptions are reported internally through interrupt vector 16 (flag is set) or
externally through an external interrupt (flag is clear). On a hardware reset, the NE flag is
initialized to 0, so software using the automatic internal error-reporting mechanism
must set this flag to 1. This flag is nonexistent on the Intel386 processor.
As on the Intel 286 and Intel386 processors, the MP (monitor coprocessor) flag (bit 1
of register CR0) determines whether the WAIT/FWAIT instructions or waiting-type
floating-point instructions trap when the context of the x87 FPU is different from that
of the currently-executing task. If the MP and TS flags are set, then a WAIT/FWAIT
instruction and waiting instructions will cause a device-not-available exception
(interrupt vector 7). The MP flag is used on the Intel 286 and Intel386 processors to
support the use of a WAIT/FWAIT instruction to wait on a device other than a math
coprocessor. The device reports its status through the BUSY# pin. Since the P6
family, Pentium, and Intel486 processors do not have such a pin, the MP flag has no
relevant use and should be set to 1 for normal operation.
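The flag positions named above can be decoded from a CR0 image as in the following Python sketch. The MP, ET, and NE bit positions come from this section; the TS position (bit 3) is taken from the architectural CR0 layout rather than from this section, and the function names are hypothetical.

```python
CR0_MP = 1 << 1  # monitor coprocessor
CR0_TS = 1 << 3  # task switched (position per the CR0 layout, not this section)
CR0_ET = 1 << 4  # extension type
CR0_NE = 1 << 5  # numeric exception


def wait_causes_device_not_available(cr0: int) -> bool:
    """WAIT/FWAIT raises interrupt vector 7 when both MP and TS are set."""
    return bool(cr0 & CR0_MP) and bool(cr0 & CR0_TS)


def fp_error_reporting(cr0: int) -> str:
    """Describe how unmasked floating-point exceptions are reported."""
    if cr0 & CR0_NE:
        return "internal (interrupt vector 16)"
    return "external interrupt"
```

A CR0 image of 2AH (MP, TS, and NE set) therefore models a system where WAIT/FWAIT traps after a task switch and floating-point errors are reported through vector 16.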
19.18.2 x87 FPU Status Word
This section identifies differences in the x87 FPU status word for the different IA-32
processors and math coprocessors, the reason for the differences, and their impact
on software.
19.18.2.1 Condition Code Flags (C0 through C3)
The following information pertains to differences in the use of the condition code
flags (C0 through C3) located in bits 8, 9, 10, and 14 of the x87 FPU status word.
After execution of an FINIT instruction or a hardware reset on a 32-bit x87 FPU, the
condition code flags are set to 0. The same operations on a 16-bit IA-32 math
coprocessor leave these flags intact (they contain their prior value). This difference in
operation has no impact on software and provides a consistent state after reset.
Transcendental instruction results in the core range of the P6 family and Pentium
processors may differ from the Intel486 DX processor and Intel 487 SX math
coprocessor by 2 to 3 units in the last place (ulps) (see "Transcendental Instruction
Accuracy" in Chapter 8, "Programming with the x87 FPU," of the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 1). As a result, the value saved
in the C1 flag may also differ.
After an incomplete FPREM/FPREM1 instruction, the C0, C1, and C3 flags are set to 0
on the 32-bit x87 FPUs. After the same operation on a 16-bit IA-32 math
coprocessor, these flags are left intact.
On the 32-bit x87 FPUs, the C2 flag serves as an incomplete flag for the FPTAN
instruction. On the 16-bit IA-32 math coprocessors, the C2 flag is undefined for the
FPTAN instruction. This difference has no impact on software, because Intel 287 or 8087
programs do not check C2 after an FPTAN instruction. The use of this flag on later
processors allows fast checking of operand range.
19.18.2.2 Stack Fault Flag
When unmasked stack overflow or underflow occurs on a 32-bit x87 FPU, the IE flag
(bit 0) and the SF flag (bit 6) of the x87 FPU status word are set to indicate a stack
fault, and condition code flag C1 is set or cleared to indicate overflow or underflow,
respectively. When unmasked stack overflow or underflow occurs on a 16-bit IA-32
math coprocessor, only the IE flag is set. Bit 6 is reserved on these processors. The
addition of the SF flag on a 32-bit x87 FPU has no impact on software. Existing
exception handlers need not change, but may be upgraded to take advantage of the
additional information.
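An upgraded handler could use the extra status word bits roughly as follows. This is
an illustrative sketch that operates on a status word value already stored to memory
(for example by FNSTSW); the function names are hypothetical, and the bit positions
are the ones given in Sections 19.18.2.1 and 19.18.2.2.

```python
from typing import Optional

SW_IE = 1 << 0   # invalid-operation flag (bit 0)
SW_SF = 1 << 6   # stack fault flag (bit 6; reserved on 16-bit math coprocessors)
CC_BITS = {"C0": 8, "C1": 9, "C2": 10, "C3": 14}


def condition_codes(status_word: int) -> dict:
    """Extract C0 through C3 from an x87 FPU status word."""
    return {name: (status_word >> bit) & 1 for name, bit in CC_BITS.items()}


def stack_fault_kind(status_word: int) -> Optional[str]:
    """Return 'overflow' or 'underflow' for a stack fault, or None if there is none.

    Only meaningful for status words produced by a 32-bit x87 FPU.
    """
    if not (status_word & SW_IE and status_word & SW_SF):
        return None
    return "overflow" if condition_codes(status_word)["C1"] else "underflow"
```

A status word of 0241H (IE, SF, and C1 set) is classified as a stack overflow; 0041H
(IE and SF set, C1 clear) is classified as a stack underflow.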
19.18.3 x87 FPU Control Word
Only affine closure is supported for infinity control on a 32-bit x87 FPU. The infinity
control flag (bit 12 of the x87 FPU control word) remains programmable on these
processors, but has no effect. This change was made to conform to the IEEE
Standard 754 for Binary Floating-Point Arithmetic. On a 16-bit IA-32 math coprocessor,
both affine and projective closures are supported, as determined by the setting of bit
12. After a hardware reset, the default value of bit 12 is projective. Software that
requires projective infinity arithmetic may give different results.
19.18.4 x87 FPU Tag Word
When loading the tag word of a 32-bit x87 FPU, using an FLDENV, FRSTOR, or
FXRSTOR (Pentium III processor only) instruction, the processor examines the
incoming tag and classifies the location only as empty or non-empty. Thus, tag
values of 00, 01, and 10 are interpreted by the processor to indicate a non-empty
location. The tag value of 11 is interpreted by the processor to indicate an empty
location. Subsequent operations on a non-empty register always examine the value
in the register, not the value in its tag. The FSTENV, FSAVE, and FXSAVE (Pentium III
processor only) instructions examine the non-empty registers and put the correct
values in the tags before storing the tag word.
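The empty/non-empty classification can be illustrated on a 16-bit tag word image as
stored in memory by FSTENV or FSAVE. This is a sketch only; the function names are
hypothetical, and the assumption that physical register 0 occupies bits 1:0 of the
tag word (with tag value 01 meaning zero) comes from the general x87 tag word layout
rather than from this section.

```python
TAG_NAMES = {0b00: "valid", 0b01: "zero", 0b10: "special", 0b11: "empty"}


def split_tags(tag_word: int) -> list:
    """Return the eight 2-bit tags; index 0 is physical register 0 (bits 1:0)."""
    return [(tag_word >> (2 * i)) & 0b11 for i in range(8)]


def as_loaded_by_32bit_fpu(tag_word: int) -> list:
    """Model how a 32-bit x87 FPU classifies each incoming tag on load."""
    return ["empty" if tag == 0b11 else "non-empty" for tag in split_tags(tag_word)]
```

With a tag word of FFFCH, register 0 is classified as non-empty and the other seven
registers as empty, regardless of whether the incoming tag for register 0 said valid,
zero, or special.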
The corresponding tag for a 16-bit IA-32 math coprocessor is checked before each
register access to determine the class of operand in the register; the tag is updated
after every change to a register so that the tag always reflects the most recent status
of the register. Software can load a tag with a value that disagrees with the contents
of a register (for example, the register contains a valid value, but the tag says
special). Here, the 16-bit IA-32 math coprocessors honor the tag and do not examine
the register.
Software written to run on a 16-bit IA-32 math coprocessor may not operate
correctly on a 32-bit x87 FPU, if it uses the FLDENV, FRSTOR, or FXRSTOR
instructions to change tags to values (other than to empty) that are different from actual
register contents.
The encoding in the tag word for the 32-bit x87 FPUs for unsupported data formats
(including pseudo-zero and unnormal) is special (10B), to comply with IEEE Standard
754. The encoding in the 16-bit IA-32 math coprocessors for pseudo-zero and
unnormal is valid (00B) and the encoding for other unsupported data formats is
special (10B). Code that recognizes the pseudo-zero or unnormal format as valid
must therefore be changed if it is ported to a 32-bit x87 FPU.
19.18.5 Data Types
This section discusses the differences in data types among the various x87 FPUs and
math coprocessors.
19.18.5.1 NaNs
The 32-bit x87 FPUs distinguish between signaling NaNs (SNaNs) and quiet NaNs
(QNaNs). These x87 FPUs only generate QNaNs and normally do not generate an
exception upon encountering a QNaN. An invalid-operation exception (#I) is
generated only upon encountering an SNaN, except for the FCOM, FIST, and FBSTP
instructions, which also generate an invalid-operation exception for a QNaN. This
behavior matches IEEE Standard 754.
The 16-bit IA-32 math coprocessors only generate one kind of NaN (the equivalent of
a QNaN), but they raise an invalid-operation exception upon encountering any kind of
NaN.
When porting software written to run on a 16-bit IA-32 math coprocessor to a 32-bit
x87 FPU, uninitialized memory locations that contain QNaNs should be changed to
SNaNs to cause the x87 FPU or math coprocessor to fault when uninitialized memory
locations are referenced.
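To make the SNaN/QNaN distinction concrete, the sketch below classifies a
double-real (64-bit) bit pattern and produces one SNaN pattern that could be used to
pre-fill uninitialized storage. The function names and the particular fill pattern are
assumptions chosen for illustration; the only encoding fact relied on is that the most
significant fraction bit distinguishes a QNaN (1) from an SNaN (0).

```python
import struct


def classify_nan(bits: int) -> str:
    """Classify a 64-bit double-real bit pattern as 'snan', 'qnan', or 'not-nan'."""
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if exponent != 0x7FF or fraction == 0:
        return "not-nan"  # finite value or infinity
    quiet_bit = (fraction >> 51) & 1
    return "qnan" if quiet_bit else "snan"


def snan_fill_pattern() -> bytes:
    """Return the little-endian bytes of one SNaN pattern (an arbitrary choice)."""
    return struct.pack("<Q", 0x7FF0000000000001)
```

The integer bit pattern is inspected directly rather than through a Python float,
because moving an SNaN through floating-point registers may quiet it on some
platforms.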
19.18.5.2 Pseudo-zero, Pseudo-NaN, Pseudo-infinity, and Unnormal
Formats
The 32-bit x87 FPUs neither generate nor support the pseudo-zero, pseudo-NaN,
pseudo-infinity, and unnormal formats. Whenever they encounter them in an
arithmetic operation, they raise an invalid-operation exception. The 16-bit IA-32 math
coprocessors define and support special handling for these formats. Support for
these formats was dropped to conform with IEEE Standard 754 for Binary
Floating-Point Arithmetic.
This change should not impact software ported from 16-bit IA-32 math coprocessors
to 32-bit x87 FPUs. The 32-bit x87 FPUs do not generate these formats, and
therefore will not encounter them unless software explicitly loads them in the data
registers. The only effect may be in how software handles the tags in the tag word (see
also: Section 19.18.4, "x87 FPU Tag Word").
19.18.6 Floating-Point Exceptions
This section identifies the implementation differences in exception handling for
floating-point instructions in the various x87 FPUs and math coprocessors.
19.18.6.1 Denormal Operand Exception (#D)
When the denormal operand exception is masked, the 32-bit x87 FPUs automatically
normalize denormalized numbers when possible, whereas the 16-bit IA-32 math
coprocessors return a denormal result. A program written to run on a 16-bit IA-32
math coprocessor that uses the denormal exception solely to normalize
denormalized operands is redundant when run on the 32-bit x87 FPUs. If such a program is run
on 32-bit x87 FPUs, performance can be improved by masking the denormal
exception. Floating-point programs run faster when the FPU performs normalization of
denormalized operands.
The denormal operand exception is not raised for transcendental instructions and the
FXTRACT instruction on the 16-bit IA-32 math coprocessors. This exception is raised
for these instructions on the 32-bit x87 FPUs. The exception handlers ported to these
latter processors need to be changed only if the handlers give special treatment to
different opcodes.
19.18.6.2 Numeric Overflow Exception (#O)
On the 32-bit x87 FPUs, when the numeric overflow exception is masked and the
rounding mode is set to chop (toward 0), the result is the largest positive or smallest
negative number. The 16-bit IA-32 math coprocessors do not signal the overflow
exception when the masked response is not ∞; that is, they signal overflow only
when the rounding control is not set to round to 0. If rounding is set to chop (toward
0), the result is positive or negative ∞. Under the most common rounding modes, this
difference has no impact on existing software.
If rounding is toward 0 (chop), a program on a 32-bit x87 FPU produces, under
overflow conditions, a result that is different in the least significant bit of the significand,
compared to the result on a 16-bit IA-32 math coprocessor. The reason for this
difference is IEEE Standard 754 compatibility.
When the overflow exception is not masked, the precision exception is flagged on the
32-bit x87 FPUs. When the result is stored in the stack, the significand is rounded
according to the precision control (PC) field of the FPU control word or according to
the opcode. On the 16-bit IA-32 math coprocessors, the precision exception is not
flagged and the significand is not rounded. The impact on existing software is that if
the result is stored on the stack, a program running on a 32-bit x87 FPU produces a
different result under overflow conditions than on a 16-bit IA-32 math coprocessor.
The difference is apparent only to the exception handler. This difference is for IEEE
Standard 754 compatibility.
19.18.6.3 Numeric Underflow Exception (#U)
When the underflow exception is masked on the 32-bit x87 FPUs, the underflow
exception is signaled when both the result is tiny and denormalization results in a
loss of accuracy. When the underflow exception is unmasked and the instruction is
supposed to store the result on the stack, the significand is rounded to the
appropriate precision (according to the PC flag in the FPU control word, for those
instructions controlled by PC, otherwise to extended precision), after adjusting the
exponent.
When the underflow exception is masked on the 16-bit IA-32 math coprocessors and
rounding is toward 0, the underflow exception flag is raised on a tiny result,
regardless of loss of accuracy. When the underflow exception is not masked and the
destination is the stack, the significand is not rounded, but instead is left as is.
When the underflow exception is masked, this difference has no impact on existing
software. The underflow exception occurs less often when rounding is toward 0.
When the underflow exception is not masked, a program running on a 32-bit x87 FPU
produces a different result during underflow conditions than on a 16-bit IA-32 math
coprocessor if the result is stored on the stack. The difference is only in the least
significant bit of the significand and is apparent only to the exception handler.
19.18.6.4 Exception Precedence
There is no difference in the precedence of the denormal-operand exception on the
32-bit x87 FPUs, whether it be masked or not. When the denormal-operand
exception is not masked on the 16-bit IA-32 math coprocessors, it takes precedence over
all other exceptions. This difference causes no impact on existing software, but some
unneeded normalization of denormalized operands is prevented on the Intel486
processor and Intel 387 math coprocessor.
19.18.6.5 CS and EIP For FPU Exceptions
On the Intel 32-bit x87 FPUs, the values from the CS and EIP registers saved for
floating-point exceptions point to any prefixes that come before the floating-point
instruction. On the 8087 math coprocessor, the saved CS and IP registers point to
the floating-point instruction.
19.18.6.6 FPU Error Signals
The floating-point error signals to the P6 family, Pentium, and Intel486 processors do
not pass through an interrupt controller; an INT# signal from an Intel 387, Intel 287,
or 8087 math coprocessor does. If an 8086 processor uses another exception for
the 8087 interrupt, both exception vectors should call the floating-point-error
exception handler. Some instructions in a floating-point-error exception handler may need
to be deleted if they use the interrupt controller. The P6 family, Pentium, and Intel486
processors have signals that, with the addition of external logic, support reporting for
emulation of the interrupt mechanism used in many personal computers.
On the P6 family, Pentium, and Intel486 processors, an undefined floating-point
opcode will cause an invalid-opcode exception (#UD, interrupt vector 6). Undefined
floating-point opcodes, like legal floating-point opcodes, cause a device-not-available
exception (#NM, interrupt vector 7) when either the TS or EM flag in control register
CR0 is set. The P6 family, Pentium, and Intel486 processors do not check for
floating-point error conditions on encountering an undefined floating-point opcode.
19.18.6.7 Assertion of the FERR# Pin
When using the MS-DOS compatibility mode for handling floating-point exceptions,
the FERR# pin must be connected to an input to an external interrupt controller. An
external interrupt is then generated when the FERR# output drives the input to the
interrupt controller and the interrupt controller in turn drives the INTR pin on the
processor.
For the P6 family and Intel386 processors, an unmasked floating-point exception
always causes the FERR# pin to be asserted upon completion of the instruction that
caused the exception. For the Pentium and Intel486 processors, an unmasked
floating-point exception may cause the FERR# pin to be asserted either at the end of
the instruction causing the exception or immediately before execution of the next
floating-point instruction. (Note that the next floating-point instruction would not be
executed until the pending unmasked exception has been handled.) See Appendix D,
"Guidelines for Writing x87 FPU Exception Handlers," in the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 1, for a complete description of
the required mechanism for handling floating-point exceptions using the MS-DOS
compatibility mode.
Using FERR# and IGNNE# to handle floating-point exceptions is deprecated by
modern operating systems; this approach also limits newer processors to operate
with one logical processor active.
19.18.6.8 Invalid Operation Exception On Denormals
An invalid-operation exception is not generated on the 32-bit x87 FPUs upon
encountering a denormal value when executing an FSQRT, FDIV, or FPREM instruction or upon
conversion to BCD or to integer. The operation proceeds by first normalizing the
value. On the 16-bit IA-32 math coprocessors, upon encountering this situation, the
invalid-operation exception is generated. This difference has no impact on existing
software. Software running on the 32-bit x87 FPUs continues to execute in cases
where the 16-bit IA-32 math coprocessors trap. The reason for this change was to
eliminate an exception from being raised.
19.18.6.9 Alignment Check Exceptions (#AC)
If alignment checking is enabled, a misaligned data operand on the P6 family,
Pentium, and Intel486 processors causes an alignment check exception (#AC) when
a program or procedure is running at privilege-level 3, except for the stack portion of
the FSAVE/FNSAVE, FXSAVE, FRSTOR, and FXRSTOR instructions.
19.18.6.10 Segment Not Present Exception During FLDENV
On the Intel486 processor, when a segment not present exception (#NP) occurs in
the middle of an FLDENV instruction, it can happen that part of the environment is
loaded and part not. In such cases, the FPU control word is left with a value of 007FH.
The P6 family and Pentium processors ensure the internal state is correct at all times
by attempting to read the first and last bytes of the environment before updating the
internal state.
19.18.6.11 Device Not Available Exception (#NM)
The device-not-available exception (#NM, interrupt 7) will occur in the P6 family,
Pentium, and Intel486 processors as described in Section 2.5, "Control Registers,"
Table 2-1, and Chapter 6, "Interrupt 7: Device Not Available Exception (#NM)."
19.18.6.12 Coprocessor Segment Overrun Exception
The coprocessor segment overrun exception (interrupt 9) does not occur in the P6
family, Pentium, and Intel486 processors. In situations where the Intel 387 math
coprocessor would cause an interrupt 9, the P6 family, Pentium, and Intel486
processors simply abort the instruction. To avoid undetected segment overruns, it is
recommended that the floating-point save area be placed in the same page as the TSS. This
placement will prevent the FPU environment from being lost if a page fault occurs
during the execution of an FLDENV, FRSTOR, or FXRSTOR instruction while the
operating system is performing a task switch.
19.18.6.13 General Protection Exception (#GP)
A general-protection exception (#GP, interrupt 13) occurs if the starting address of a
floating-point operand falls outside a segment's size. An exception handler should be
included to report these programming errors.
19.18.6.14 Floating-Point Error Exception (#MF)
In real mode and protected mode (not including virtual-8086 mode), interrupt vector
16 must point to the floating-point exception handler. In virtual-8086 mode, the
virtual-8086 monitor can be programmed to accommodate a different location of the
interrupt vector for floating-point exceptions.
19.18.7 Changes to Floating-Point Instructions
This section identifies the differences in floating-point instructions for the various
Intel FPU and math coprocessor architectures, the reason for the differences, and
their impact on software.
19.18.7.1 FDIV, FPREM, and FSQRT Instructions
The 32-bit x87 FPUs support operations on denormalized operands and, when
detected, an underflow exception can occur, for compatibility with the IEEE Standard
754. The 16-bit IA-32 math coprocessors do not operate on denormalized operands
or return underflow results. Instead, they generate an invalid-operation exception
when they detect an underflow condition. An existing underflow exception handler
will require change only if it gives different treatment to different opcodes. Also, it is
possible that fewer invalid-operation exceptions will occur.
19.18.7.2 FSCALE Instruction
With the 32-bit x87 FPUs, the range of the scaling operand is not restricted. If (0 <
|ST(1)| < 1), the scaling factor is 0; therefore, ST(0) remains unchanged. If the
rounded result is not exact or if there was a loss of accuracy (masked underflow), the
precision exception is signaled. With the 16-bit IA-32 math coprocessors, the range
of the scaling operand is restricted. If (0 < |ST(1)| < 1), the result is undefined and
no exception is signaled. The impact of this difference on existing software is that
different results are delivered on the 32-bit and 16-bit FPUs and math coprocessors
when (0 < |ST(1)| < 1).
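The 32-bit behavior for fractional scaling operands can be sketched on ordinary
doubles. This is a rough model, not the FPU algorithm: the function name is
hypothetical, it assumes finite operands, and it ignores extended precision and
exception flags. It relies on the scaling operand being truncated toward zero before
use, which is why any value with magnitude below 1 yields a scaling factor of 0.

```python
import math


def fscale(st0: float, st1: float) -> float:
    """Scale st0 by 2 raised to st1 truncated toward zero (finite operands only)."""
    return math.ldexp(st0, math.trunc(st1))
```

With this model, `fscale(3.0, 0.75)` and `fscale(3.0, -0.75)` both leave the value at
3.0, matching the statement that ST(0) remains unchanged when (0 < |ST(1)| < 1).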
19.18.7.3 FPREM1 Instruction
The 32-bit x87 FPUs compute a partial remainder according to IEEE Standard 754.
This instruction does not exist on the 16-bit IA-32 math coprocessors. The
availability of the FPREM1 instruction has no impact on existing software.
19.18.7.4 FPREM Instruction
On the 32-bit x87 FPUs, the condition code flags C0, C3, C1 in the status word
correctly reflect the three low-order bits of the quotient following execution of the
FPREM instruction. On the 16-bit IA-32 math coprocessors, the quotient bits are
incorrect when performing a reduction of ($64^N + M$) when ($N \geq 1$) and M is 1 or 2. This
difference does not affect existing software; software that works around the bug
should not be affected.
19.18.7.5 FUCOM, FUCOMP, and FUCOMPP Instructions
When executing the FUCOM, FUCOMP, and FUCOMPP instructions, the 32-bit x87
FPUs perform unordered compare according to IEEE Standard 754. These
instructions do not exist on the 16-bit IA-32 math coprocessors. The availability of these
new instructions has no impact on existing software.
19.18.7.6 FPTAN Instruction
On the 32-bit x87 FPUs, the range of the operand for the FPTAN instruction is much
less restricted ($|ST(0)| < 2^{63}$) than on earlier math coprocessors. The instruction
reduces the operand internally using an internal $\pi/4$ constant that is more accurate.
The range of the operand is restricted to ($|ST(0)| < \pi/4$) on the 16-bit IA-32 math
coprocessors; the operand must be reduced to this range using FPREM. This change
has no impact on existing software.
19.18.7.7 Stack Overflow
On the 32-bit x87 FPUs, if an FPU stack overflow occurs when the invalid-operation
exception is masked, the FPU returns the real, integer, or BCD-integer indefinite
value to the destination operand, depending on the instruction being executed. On
the 16-bit IA-32 math coprocessors, the original operand remains unchanged
following a stack overflow, but it is loaded into register ST(1). This difference has no
impact on existing software.
19.18.7.8 FSIN, FCOS, and FSINCOS Instructions
On the 32-bit x87 FPUs, these instructions perform three common trigonometric
functions. These instructions do not exist on the 16-bit IA-32 math coprocessors. The
availability of these instructions has no impact on existing software, but using them
provides a performance upgrade.
19.18.7.9 FPATAN Instruction
On the 32-bit x87 FPUs, the range of operands for the FPATAN instruction is
unrestricted. On the 16-bit IA-32 math coprocessors, the absolute value of the operand in
register ST(0) must be smaller than the absolute value of the operand in register
ST(1). This difference has impact on existing software.
19.18.7.10 F2XM1 Instruction
The 32-bit x87 FPUs support a wider range of operands ($-1 < ST(0) < +1$) for the
F2XM1 instruction. The supported operand range for the 16-bit IA-32 math
coprocessors is ($0 \leq ST(0) \leq 0.5$). This difference has no impact on existing software.
19.18.7.11 FLD Instruction
On the 32-bit x87 FPUs, when using the FLD instruction to load an extended-real
value, a denormal-operand exception is not generated because the instruction is not
arithmetic. The 16-bit IA-32 math coprocessors do report a denormal-operand
exception in this situation. This difference does not affect existing software.
On the 32-bit x87 FPUs, loading a denormal value that is in single- or double-real
format causes the value to be converted to extended-real format. Loading a
denormal value on the 16-bit IA-32 math coprocessors causes the value to be
converted to an unnormal. If the next instruction is FXTRACT or FXAM, the 32-bit x87
FPUs will give a different result than the 16-bit IA-32 math coprocessors. This change
was made for IEEE Standard 754 compatibility.
On the 32-bit x87 FPUs, loading an SNaN that is in single- or double-real format
causes the FPU to generate an invalid-operation exception. The 16-bit IA-32 math
coprocessors do not raise an exception when loading a signaling NaN. The
invalid-operation exception handler for 16-bit math coprocessor software needs to be
updated to handle this condition when porting software to 32-bit FPUs. This change
was made for IEEE Standard 754 compatibility.
19.18.7.12 FXTRACT Instruction
On the 32-bit x87 FPUs, if the operand is 0 for the FXTRACT instruction, the
divide-by-zero exception is reported and -∞ is delivered to register ST(1). If the operand is
+∞, no exception is reported. If the operand is 0 on the 16-bit IA-32 math
coprocessors, 0 is delivered to register ST(1) and no exception is reported. If the operand is
+∞, the invalid-operation exception is reported. These differences have no impact on
existing software. Software usually bypasses 0 and ∞. This change is due to the IEEE
Standard 754 recommendation to fully support the "logb" function.
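The 32-bit behavior for a zero operand can be modeled on doubles as below. This is
a sketch under stated assumptions: the function name and the returned
(significand, exponent, zero-divide flag) tuple are hypothetical, only finite operands
are handled, and returning the zero operand itself as the significand is an
assumption that this section does not spell out.

```python
import math


def fxtract(x: float):
    """Return (significand, exponent, zero_divide) for a finite double x.

    For nonzero x the significand has magnitude in [1, 2) and
    x == significand * 2**exponent.
    """
    if math.isinf(x) or math.isnan(x):
        raise ValueError("this sketch handles finite operands only")
    if x == 0.0:
        # 32-bit x87 behavior: the exponent result is minus infinity and the
        # divide-by-zero exception is reported.
        return (x, -math.inf, True)
    mantissa, exponent = math.frexp(x)  # 0.5 <= |mantissa| < 1
    return (mantissa * 2.0, float(exponent - 1), False)
```

The exponent returned here plays the role of the logb result, which is why an
operand of 0 maps to -∞ rather than to 0 as on the 16-bit math coprocessors.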
19.18.7.13 Load Constant Instructions
On 32-bit x87 FPUs, rounding control is in effect for the load constant instructions.
Rounding control is not in effect for the 16-bit IA-32 math coprocessors. Results for
the FLDPI, FLDLN2, FLDLG2, and FLDL2E instructions are the same as for the 16-bit
IA-32 math coprocessors when rounding control is set to round to nearest or round
to +∞. They are the same for the FLDL2T instruction when rounding control is set to
round to nearest, round to -∞, or round to zero. Results are different from the 16-bit
IA-32 math coprocessors in the least significant bit of the mantissa if rounding
control is set to round to -∞ or round to 0 for the FLDPI, FLDLN2, FLDLG2, and
FLDL2E instructions; they are different for the FLDL2T instruction if round to +∞ is
specified. These changes were implemented for compatibility with IEEE Standard
754 for Floating-Point Arithmetic recommendations.
19.18.7.14 FSETPM Instruction
With the 32-bit x87 FPUs, the FSETPM instruction is treated as NOP (no operation).
This instruction informs the Intel 287 math coprocessor that the processor is in
protected mode. This change has no impact on existing software. The 32-bit x87
FPUs handle all addressing and exception-pointer information, whether in protected
mode or not.
19.18.7.15 FXAM Instruction
With the 32-bit x87 FPUs, if the FPU encounters an empty register when executing
the FXAM instruction, it does not generate combinations of C0 through C3 equal to 1101 or
1111. The 16-bit IA-32 math coprocessors may generate these combinations, among
others. This difference has no impact on existing software; it provides a performance
upgrade to provide repeatable results.
19.18.7.16 FSAVE and FSTENV Instructions
With the 32-bit x87 FPUs, the address of a memory operand pointer stored by FSAVE
or FSTENV is undefined if the previous floating-point instruction did not refer to
memory.
19.18.8 Transcendental Instructions
The floating-point results of the P6 family and Pentium processors for transcendental
instructions in the core range may differ from the Intel486 processors by about 2 or
3 ulps (see "Transcendental Instruction Accuracy" in Chapter 8, "Programming with
the x87 FPU," of the Intel 64 and IA-32 Architectures Software Developer's Manual,
Volume 1). Condition code flag C1 of the status word may differ as a result. The exact
threshold for underflow and overflow will vary by a few ulps. The P6 family and
Pentium processors' results will have a worst case error of less than 1 ulp when
rounding to the nearest-even and less than 1.5 ulps when rounding in other modes.
The transcendental instructions are guaranteed to be monotonic, with respect to the
input operands, throughout the domain supported by the instruction.
Transcendental instructions may generate different results in the round-up flag (C1)
on the 32-bit x87 FPUs. The round-up flag is undefined for these instructions on the
16-bit IA-32 math coprocessors. This difference has no impact on existing software.
19.18.9 Obsolete Instructions
The 8087 math coprocessor instructions FENI and FDISI and the Intel 287 math
coprocessor instruction FSETPM are treated as integer NOP instructions in the 32-bit
x87 FPUs. If these opcodes are detected in the instruction stream, no specific
operation is performed and no internal states are affected.
19.18.10 WAIT/FWAIT Prefix Differences
On the Intel486 processor, when a WAIT/FWAIT instruction precedes a floating-point
instruction (one which itself automatically synchronizes with the previous
floating-point instruction), the WAIT/FWAIT instruction is treated as a no-op. Pending
floating-point exceptions from a previous floating-point instruction are processed not
on the WAIT/FWAIT instruction but on the floating-point instruction following the
WAIT/FWAIT instruction. In such a case, the report of a floating-point exception may
appear one instruction later on the Intel486 processor than on a P6 family or Pentium
FPU, or on an Intel 387 math coprocessor.
19.18.11 Operands Split Across Segments and/or Pages
On the P6 family, Pentium, and Intel486 processor FPUs, when the first half of an
operand to be written is inside a page or segment and the second half is outside, a
memory fault can cause the first half to be stored but not the second half. In this
situation, the Intel 387 math coprocessor stores nothing.
19.18.12 FPU Instruction Synchronization
On the 32-bit x87 FPUs, all floating-point instructions are automatically
synchronized; that is, the processor automatically waits until the previous floating-point
instruction has completed before completing the next floating-point instruction. No
explicit WAIT/FWAIT instructions are required to assure this synchronization. For the
8087 math coprocessors, explicit waits are required before each floating-point
instruction to ensure synchronization. Although 8087 programs having explicit WAIT
instructions execute perfectly on the 32-bit IA-32 processors without reassembly,
these WAIT instructions are unnecessary.
19.19 SERIALIZING INSTRUCTIONS
Certain instructions have been defined to serialize instruction execution to ensure
that modifications to flags, registers and memory are completed before the next
instruction is executed (or in P6 family processor terminology "committed to machine
state"). Because the P6 family processors use branch-prediction and out-of-order
execution techniques to improve performance, instruction execution is not generally
serialized until the results of an executed instruction are committed to machine state
(see Chapter 2, "Intel 64 and IA-32 Architectures," in the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 1).
As a result, at places in a program or task where it is critical to have execution
completed for all previous instructions before executing the next instruction (for
example, at a branch, at the end of a procedure, or in multiprocessor dependent
code), it is useful to add a serializing instruction. See Section 8.3, "Serializing
Instructions," for more information on serializing instructions.
19.20 FPU AND MATH COPROCESSOR INITIALIZATION
Table 9-1 shows the states of the FPUs in the P6 family, Pentium, Intel486 processors
and of the Intel 387 math coprocessor and Intel 287 coprocessor following a
power-up, reset, or INIT, or following the execution of an FINIT/FNINIT instruction. The
following is some additional compatibility information concerning the initialization of
x87 FPUs and math coprocessors.
19.20.1 Intel 387 and Intel 287 Math Coprocessor Initialization
Following an Intel386 processor reset, the processor identifies its coprocessor type
(Intel 287 or Intel 387 DX math coprocessor) by sampling its ERROR# input some
time after the falling edge of the RESET# signal and before execution of the first
floating-point instruction. The Intel 287 coprocessor keeps its ERROR# output in inactive
state after hardware reset; the Intel 387 coprocessor keeps its ERROR# output in
active state after hardware reset.
Upon hardware reset or execution of the FINIT/FNINIT instruction, the Intel 387
math coprocessor signals an error condition. The P6 family, Pentium, and Intel486
processors, like the Intel 287 coprocessor, do not.
19.20.2 Intel486 SX Processor and Intel 487 SX Math Coprocessor Initialization
When initializing an Intel486 SX processor and an Intel 487 SX math coprocessor,
the initialization routine should check for the presence of the math coprocessor and
should set the FPU-related flags (EM, MP, and NE) in control register CR0 accordingly
(see Section 2.5, "Control Registers," for a complete description of these flags). Table
19-2 gives the recommended settings for these flags when the math coprocessor is
present. The FSTCW instruction will give a value of FFFFH for the Intel486 SX
microprocessor and 037FH for the Intel 487 SX math coprocessor.
The EM and MP flags in register CR0 are interpreted as shown in Table 19-3.
Table 19-2. Recommended Values of the EM, MP, and NE Flags for Intel486 SX
Microprocessor/Intel 487 SX Math Coprocessor System
CR0 Flags   Intel486 SX Processor Only   Intel 487 SX Math Coprocessor Present
EM          1                            0
MP          0                            1
NE          1                            0, for MS-DOS* systems
                                         1, for user-defined exception handler
Following is an example code sequence to initialize the system and check for the
presence of Intel486 SX processor/Intel 487 SX math coprocessor.
    fninit                                      ;initialize the FPU, if one is present
    fstcw mem_loc                               ;store the control word in memory
    mov ax, mem_loc
    cmp ax, 037fh                               ;037fh is the control word after FNINIT
    jz Intel487_SX_Math_CoProcessor_present     ;ax=037fh
    jmp Intel486_SX_microprocessor_present      ;ax=ffffh
If the Intel 487 SX math coprocessor is not present, the following code can be run to
set the CR0 register for the Intel486 SX processor.
    mov eax, cr0
    and eax, 0fffffffdh    ;make MP=0
    or eax, 0024h          ;make EM=1, NE=1
    mov cr0, eax
This initialization will cause any floating-point instruction to generate a
device-not-available exception (#NM), interrupt 7. The software emulation will then take
control to execute these instructions. This code is not required if an Intel 487 SX math
coprocessor is present in the system. In that case, the typical initialization routine for
the Intel486 SX microprocessor will be adequate.
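The detection test and the CR0 settings of Table 19-2 can also be modeled in C as plain bit arithmetic. This is a minimal sketch, not initialization code: the function names and the `user_handler` parameter are hypothetical, and a real routine must read and write CR0 with privileged instructions as in the assembly sequences in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CR0 flag masks (architectural bit positions). */
#define CR0_MP (1u << 1)   /* monitor coprocessor */
#define CR0_EM (1u << 2)   /* emulation */
#define CR0_NE (1u << 5)   /* numeric error */

/* True when the value stored by FSTCW after FNINIT is 037FH, which the
   text gives as the value for an Intel 487 SX math coprocessor. */
bool math_coprocessor_present(uint16_t fstcw_value)
{
    return fstcw_value == 0x037Fu;
}

/* Hypothetical helper: applies the Table 19-2 settings to a CR0 image.
   user_handler selects NE=1 (user-defined exception handler) instead of
   NE=0 (MS-DOS systems) when the coprocessor is present. */
uint32_t recommended_cr0(uint32_t cr0, bool coprocessor_present, bool user_handler)
{
    cr0 &= ~(CR0_EM | CR0_MP | CR0_NE);
    if (!coprocessor_present)
        return cr0 | CR0_EM | CR0_NE;      /* EM=1, MP=0, NE=1 */
    cr0 |= CR0_MP;                         /* EM=0, MP=1 */
    return user_handler ? (cr0 | CR0_NE) : cr0;
}
```

For the processor-only case the result matches the assembly sequence above: the MP bit is cleared and 0024H is ORed in.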
Also, when designing an Intel486 SX processor based system with an Intel 487 SX
math coprocessor, timing loops should be independent of clock speed and clocks per
instruction. One way to attain this is to implement these loops in hardware and not in
software (for example, BIOS).
Table 19-3. EM and MP Flag Interpretation
EM   MP   Interpretation
0    0    Floating-point instructions are passed to FPU; WAIT/FWAIT
          and other waiting-type instructions ignore TS.
0    1    Floating-point instructions are passed to FPU; WAIT/FWAIT
          and other waiting-type instructions test TS.
1    0    Floating-point instructions trap to emulator; WAIT/FWAIT and
          other waiting-type instructions ignore TS.
1    1    Floating-point instructions trap to emulator; WAIT/FWAIT and
          other waiting-type instructions test TS.
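Table 19-3 amounts to two independent decisions, one per flag. The sketch below decodes them from a CR0 image; the struct and function names are illustrative assumptions, not part of any Intel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_MP (1u << 1)   /* monitor coprocessor */
#define CR0_EM (1u << 2)   /* emulation */

/* Hypothetical result type mirroring the two columns of Table 19-3. */
struct fp_dispatch {
    bool traps_to_emulator;   /* EM=1: floating-point instructions trap to emulator */
    bool wait_tests_ts;       /* MP=1: WAIT/FWAIT and waiting-type instructions test TS */
};

struct fp_dispatch interpret_em_mp(uint32_t cr0)
{
    struct fp_dispatch d;
    d.traps_to_emulator = (cr0 & CR0_EM) != 0;
    d.wait_tests_ts = (cr0 & CR0_MP) != 0;
    return d;
}
```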
19.21 CONTROL REGISTERS
The following sections identify the new control registers and control register flags
and fields that were introduced to the 32-bit IA-32 in various processor families. See
Figure 2-6 for the location of these flags and fields in the control registers.
The Pentium III processor introduced one new control flag in control register CR4:
• OSXMMEXCPT (bit 10): The OS will set this bit if it supports unmasked SIMD
  floating-point exceptions.
The Pentium II processor introduced one new control flag in control register CR4:
• OSFXSR (bit 9): The OS supports saving and restoring the Pentium III processor
  state during context switches.
The Pentium Pro processor introduced three new control flags in control register CR4:
• PAE (bit 5): Physical address extension. Enables paging mechanism to
  reference extended physical addresses when set; restricts physical addresses to
  32 bits when clear (see also: Section 19.22.1.1, "Physical Memory Addressing
  Extension").
• PGE (bit 7): Page global enable. Inhibits flushing of frequently-used or shared
  pages on CR3 writes (see also: Section 19.22.1.2, "Global Pages").
• PCE (bit 8): Performance-monitoring counter enable. Enables execution of the
  RDPMC instruction at any protection level.
The content of CR4 is 0H following a hardware reset.
Control register CR4 was introduced in the Pentium processor. This register contains
flags that enable certain new extensions provided in the Pentium processor:
• VME: Virtual-8086 mode extensions. Enables support for a virtual interrupt flag
  in virtual-8086 mode (see Section 17.3, "Interrupt and Exception Handling in
  Virtual-8086 Mode").
• PVI: Protected-mode virtual interrupts. Enables support for a virtual interrupt
  flag in protected mode (see Section 17.4, "Protected-Mode Virtual Interrupts").
• TSD: Time-stamp disable. Restricts the execution of the RDTSC instruction to
  procedures running at privilege level 0.
• DE: Debugging extensions. Causes an undefined opcode (#UD) exception to be
  generated when debug registers DR4 and DR5 are referenced, for improved
  performance (see Section 19.23.3, "Debug Registers DR4 and DR5").
• PSE: Page size extensions. Enables 4-MByte pages with 32-bit paging when set
  (see Section 4.3, "32-Bit Paging").
• MCE: Machine-check enable. Enables the machine-check exception, allowing
  exception handling for certain hardware error conditions (see Chapter 15,
  "Machine-Check Architecture").
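The CR4 flags listed above can be summarized as a lookup from flag to the processor that introduced it. This is a sketch under stated assumptions: the enum and function names are hypothetical, and the bit positions of the Pentium flags (VME 0, PVI 1, TSD 2, DE 3, PSE 4, MCE 6) are the architectural positions shown in Figure 2-6 rather than values given in the list itself.

```c
#include <assert.h>
#include <stdint.h>

#define CR4_VME        (1u << 0)
#define CR4_PVI        (1u << 1)
#define CR4_TSD        (1u << 2)
#define CR4_DE         (1u << 3)
#define CR4_PSE        (1u << 4)
#define CR4_PAE        (1u << 5)
#define CR4_MCE        (1u << 6)
#define CR4_PGE        (1u << 7)
#define CR4_PCE        (1u << 8)
#define CR4_OSFXSR     (1u << 9)
#define CR4_OSXMMEXCPT (1u << 10)

/* Ordered oldest to newest so that generations can be compared. */
enum ia32_generation {
    GEN_PENTIUM, GEN_PENTIUM_PRO, GEN_PENTIUM_II, GEN_PENTIUM_III, GEN_UNKNOWN
};

/* Returns the processor that introduced a single CR4 flag. */
enum ia32_generation cr4_flag_introduced_by(uint32_t flag)
{
    switch (flag) {
    case CR4_VME: case CR4_PVI: case CR4_TSD:
    case CR4_DE:  case CR4_PSE: case CR4_MCE:
        return GEN_PENTIUM;
    case CR4_PAE: case CR4_PGE: case CR4_PCE:
        return GEN_PENTIUM_PRO;
    case CR4_OSFXSR:
        return GEN_PENTIUM_II;
    case CR4_OSXMMEXCPT:
        return GEN_PENTIUM_III;
    default:
        return GEN_UNKNOWN;   /* not covered by this section */
    }
}

/* Mask of the CR4 flags described above that exist on gen or earlier. */
uint32_t cr4_flags_available(enum ia32_generation gen)
{
    uint32_t mask = 0;
    for (unsigned bit = 0; bit <= 10; bit++)
        if (cr4_flag_introduced_by(1u << bit) <= gen)
            mask |= 1u << bit;
    return mask;
}
```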
The Intel486 processor introduced five new flags in control register CR0:
• NE: Numeric error. Enables the normal mechanism for reporting floating-point
  numeric errors.
• WP: Write protect. Write-protects read-only pages against supervisor-mode
  accesses.
• AM: Alignment mask. Controls whether alignment checking is performed.
  Operates in conjunction with the AC (Alignment Check) flag.
• NW: Not write-through. Enables write-throughs and cache invalidation cycles
  when clear and disables invalidation cycles and write-throughs that hit in the
  cache when set.
• CD: Cache disable. Enables the internal cache when clear and disables the
  cache when set.
The Intel486 processor introduced two new flags in control register CR3:
• PCD: Page-level cache disable. The state of this flag is driven on the PCD# pin
  during bus cycles that are not paged, such as interrupt acknowledge cycles, when
  paging is enabled. The PCD# pin is used to control caching in an external cache
  on a cycle-by-cycle basis.
• PWT: Page-level write-through. The state of this flag is driven on the PWT# pin
  during bus cycles that are not paged, such as interrupt acknowledge cycles, when
  paging is enabled. The PWT# pin is used to control write through in an external
  cache on a cycle-by-cycle basis.
19.22 MEMORY MANAGEMENT FACILITIES
The following sections describe the new memory management facilities available in
the various IA-32 processors and some compatibility differences.
19.22.1 New Memory Management Control Flags
The Pentium Pro processor introduced three new memory management features:
physical memory addressing extension, the global bit in page-table entries, and
general support for larger page sizes. These features are only available when
operating in protected mode.
19.22.1.1 Physical Memory Addressing Extension
The new PAE (physical address extension) flag in control register CR4, bit 5, may
enable additional address lines on the processor, allowing extended physical
addresses. This option can only be used when paging is enabled, using a new
page-table mechanism provided to support the larger physical address range (see
Section 4.1, "Paging Modes and Control Bits").
19.22.1.2 Global Pages
The new PGE (page global enable) flag in control register CR4, bit 7, provides a
mechanism for preventing frequently used pages from being flushed from the
translation lookaside buffer (TLB). When this flag is set, frequently used pages (such as
pages containing kernel procedures or common data tables) can be marked global by
setting the global flag in a page-directory or page-table entry.
On a task switch or a write to control register CR3 (which normally causes the TLBs
to be flushed), the entries in the TLB marked global are not flushed. Marking pages
global in this manner prevents unnecessary reloading of the TLB due to TLB misses
on frequently used pages. See Section 4.10, "Caching Translation Information," for a
detailed description of this mechanism.
19.22.1.3 Larger Page Sizes
The P6 family processors support large page sizes. For 32-bit paging, this facility is
enabled with the PSE (page size extension) flag in control register CR4, bit 4. When
this flag is set, the processor supports either 4-KByte or 4-MByte page sizes. PAE
paging and IA-32e paging support 2-MByte pages regardless of the value of CR4.PSE
(see Section 4.4, "PAE Paging," and Section 4.5, "IA-32e Paging"). See Chapter 4,
"Paging," for more information about large page sizes.
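The rule above for which large page size applies can be written as a small decision function. This is an illustrative sketch only: the enum and function names are assumptions, and it covers just the page sizes named in this section.

```c
#include <assert.h>
#include <stdint.h>

#define CR4_PSE (1u << 4)   /* page size extension flag, CR4 bit 4 */

/* Hypothetical labels for the paging modes named in the text. */
enum paging_mode { PAGING_32BIT, PAGING_PAE, PAGING_IA32E };

/* Returns the large-page size in bytes, or 0 when only 4-KByte pages
   are available in the given mode. */
uint32_t large_page_size(enum paging_mode mode, uint32_t cr4)
{
    switch (mode) {
    case PAGING_32BIT:
        return (cr4 & CR4_PSE) ? (4u << 20) : 0u;   /* 4 MBytes needs CR4.PSE */
    case PAGING_PAE:
    case PAGING_IA32E:
        return 2u << 20;                            /* 2 MBytes, CR4.PSE ignored */
    }
    return 0u;
}
```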
19.22.2 CD and NW Cache Control Flags
The CD and NW flags in control register CR0 were introduced in the Intel486
processor. In the P6 family and Pentium processors, these flags are used to
implement a writeback strategy for the data cache; in the Intel486 processor, they
implement a write-through strategy. See Table 11-5 for a comparison of these bits on the
P6 family, Pentium, and Intel486 processors. For complete information on caching,
see Chapter 11, "Memory Cache Control."
19.22.3 Descriptor Types and Contents
Operating-system code that manages space in descriptor tables often stores an
invalid value in the access-rights field of descriptor-table entries to identify unused
entries. Access rights values of 80H and 00H remain invalid for the P6 family,
Pentium, Intel486, Intel386, and Intel 286 processors. Other values that were invalid
on the Intel 286 processor may be valid on the 32-bit processors because uses for
these bits have been defined.
19.22.4 Changes in Segment Descriptor Loads
On the Intel386 processor, loading a segment descriptor always causes a locked read
and write to set the accessed bit of the descriptor. On the P6 family, Pentium, and
Intel486 processors, the locked read and write occur only if the bit is not already set.
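The difference can be modeled as a function that reports whether a locked read and write would take place. This is a behavioral sketch, not hardware-accurate code: the names are hypothetical, and it assumes the accessed flag is bit 0 of the descriptor's access-rights byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DESC_ACCESSED 0x01u   /* accessed bit in the access-rights byte */

/* Hypothetical grouping of the processors compared in the text. */
enum cpu_family { CPU_INTEL386, CPU_INTEL486_OR_LATER };

/* Models a segment descriptor load. Sets the accessed bit and returns
   true when a locked read and write is performed. */
bool mark_descriptor_accessed(enum cpu_family cpu, uint8_t *access_rights)
{
    bool already_set = (*access_rights & DESC_ACCESSED) != 0;
    if (cpu == CPU_INTEL386 || !already_set) {
        *access_rights |= DESC_ACCESSED;
        return true;
    }
    return false;   /* bit already set: no locked cycle on Intel486 and later */
}
```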
19.23 DEBUG FACILITIES
The P6 family and Pentium processors include extensions to the Intel486 processor
debugging support for breakpoints. To use the new breakpoint features, it is
necessary to set the DE flag in control register CR4.
19.23.1 Differences in Debug Register DR6
It is not possible to write a 1 to reserved bit 12 in debug status register DR6 on the
P6 family and Pentium processors; however, it is possible to write a 1 in this bit on the
Intel486 processor. See Table 9-1 for the different setting of this register following a
power-up or hardware reset.
19.23.2 Differences in Debug Register DR7
The P6 family and Pentium processors determine the type of breakpoint access by
the R/W0 through R/W3 fields in debug control register DR7 as follows:
00   Break on instruction execution only.
01   Break on data writes only.
10   Undefined if the DE flag in control register CR4 is cleared; break on I/O reads
     or writes but not instruction fetches if the DE flag in control register CR4 is
     set.
11   Break on data reads or writes but not instruction fetches.
On the P6 family and Pentium processors, reserved bits 11, 12, 14 and 15 are
hardwired to 0. On the Intel486 processor, however, bit 12 can be set. See Table 9-1 for
the different settings of this register following a power-up or hardware reset.
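The R/W encodings above can be decoded from a DR7 image as follows. This is a sketch under assumptions: the enum and function names are illustrative, and the field positions (R/W0 in bits 17:16, R/W1 in bits 21:20, R/W2 in bits 25:24, R/W3 in bits 29:28) come from the DR7 layout described elsewhere in this manual, not from this section.

```c
#include <assert.h>
#include <stdint.h>

#define CR4_DE (1u << 3)   /* debugging extensions flag */

/* Hypothetical labels for the four R/W encodings. */
enum bp_kind {
    BP_EXECUTE,           /* 00: instruction execution only */
    BP_DATA_WRITE,        /* 01: data writes only */
    BP_IO_READ_WRITE,     /* 10 with CR4.DE set: I/O reads or writes */
    BP_DATA_READ_WRITE,   /* 11: data reads or writes */
    BP_UNDEFINED          /* 10 with CR4.DE clear */
};

/* Decodes the R/Wn field of DR7 for breakpoint n (0 through 3). */
enum bp_kind dr7_breakpoint_kind(uint32_t dr7, unsigned n, uint32_t cr4)
{
    assert(n < 4);
    unsigned rw = (dr7 >> (16u + 4u * n)) & 0x3u;
    switch (rw) {
    case 0:  return BP_EXECUTE;
    case 1:  return BP_DATA_WRITE;
    case 2:  return (cr4 & CR4_DE) ? BP_IO_READ_WRITE : BP_UNDEFINED;
    default: return BP_DATA_READ_WRITE;
    }
}
```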
19.23.3 Debug Registers DR4 and DR5
Although the DR4 and DR5 registers are documented as reserved, previous
generations of processors aliased references to these registers to debug registers DR6 and
DR7, respectively. When debug extensions are not enabled (the DE flag in control
register CR4 is cleared), the P6 family and Pentium processors remain compatible
with existing software by allowing these aliased references. When debug extensions
are enabled (the DE flag is set), attempts to reference registers DR4 or DR5 will
result in an invalid-opcode exception (#UD).
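The aliasing rule can be expressed as a mapping from the register number named in an instruction to the register actually accessed. The function name and the `DEBUG_REG_UD` marker are hypothetical; this is a model of the rule, not emulator code.

```c
#include <assert.h>
#include <stdint.h>

#define CR4_DE       (1u << 3)   /* debugging extensions flag */
#define DEBUG_REG_UD (-1)        /* hypothetical marker: #UD is raised */

/* Returns the debug register actually accessed for a reference to DRn,
   or DEBUG_REG_UD when the reference raises an invalid-opcode exception. */
int resolve_debug_register(unsigned n, uint32_t cr4)
{
    assert(n <= 7);
    if (n == 4 || n == 5)
        return (cr4 & CR4_DE) ? DEBUG_REG_UD : (int)n + 2;   /* DR4->DR6, DR5->DR7 */
    return (int)n;
}
```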
19.24 RECOGNITION OF BREAKPOINTS
For the Pentium processor, it is recommended that debuggers execute the LGDT
instruction before returning to the program being debugged to ensure that
breakpoints are detected. This operation does not need to be performed on the P6 family,
Intel486, or Intel386 processors.
The implementation of test registers on the Intel486 processor used for testing the
cache and TLB has been redesigned using MSRs on the P6 family and Pentium
processors. (Note that MSRs used for this function are different on the P6 family and
Pentium processors.) The MOV to and from test register instructions generate
invalid-opcode exceptions (#UD) on the P6 family processors.
19.25 EXCEPTIONS AND/OR EXCEPTION CONDITIONS
This section describes the new exceptions and exception conditions added to the
32-bit IA-32 processors and implementation differences in existing exception handling.
See Chapter 6, "Interrupt and Exception Handling," for a detailed description of the
IA-32 exceptions.
The Pentium III processor introduced new state with the XMM registers. Computations
involving data in these registers can produce exceptions. A new MXCSR
control/status register is used to determine which exception or exceptions have
occurred. When an exception associated with the XMM registers occurs, an interrupt
is generated.
• SIMD floating-point exception (#XF, interrupt 19): New exceptions associated
  with the SIMD floating-point registers and resulting computations.
No new exceptions were added with the Pentium Pro and Pentium II processors. The
set of available exceptions is the same as for the Pentium processor. However, the
following exception condition was added to the IA-32 with the Pentium Pro
processor:
• Machine-check exception (#MC, interrupt 18): New exception conditions. Many
  exception conditions have been added to the machine-check exception and a new
  architecture has been added for handling and reporting on hardware errors. See
  Chapter 15, "Machine-Check Architecture," for a detailed description of the new
  conditions.
The following exceptions and/or exception conditions were added to the IA-32 with
the Pentium processor:
• Machine-check exception (#MC, interrupt 18): New exception. This exception
  reports parity and other hardware errors. It is a model-specific exception and
  may not be implemented or implemented differently in future processors. The
  MCE flag in control register CR4 enables the machine-check exception. When this
  bit is clear (which it is at reset), the processor inhibits generation of the
  machine-check exception.
• General-protection exception (#GP, interrupt 13): New exception condition
  added. An attempt to write a 1 to a reserved bit position of a special register
  causes a general-protection exception to be generated.
• Page-fault exception (#PF, interrupt 14): New exception condition added. When
  a 1 is detected in any of the reserved bit positions of a page-table entry,
  page-directory entry, or page-directory pointer during address translation, a page-fault
  exception is generated.
The following exception was added to the Intel486 processor:
• Alignment-check exception (#AC, interrupt 17): New exception. Reports
  unaligned memory references when alignment checking is being performed.
The following exceptions and/or exception conditions were added to the Intel386
processor:
• Divide-error exception (#DE, interrupt 0)
  - Change in exception handling. Divide-error exceptions on the Intel386
    processors always leave the saved CS:IP value pointing to the instruction that
    failed. On the 8086 processor, the CS:IP value points to the next instruction.
  - Change in exception handling. The Intel386 processors can generate the
    largest negative number as a quotient for the IDIV instruction (80H and
    8000H). The 8086 processor generates a divide-error exception instead.
• Invalid-opcode exception (#UD, interrupt 6): New exception condition added.
  Improper use of the LOCK instruction prefix can generate an invalid-opcode
  exception.
• Page-fault exception (#PF, interrupt 14): New exception condition added. If
  paging is enabled in a 16-bit program, a page-fault exception can be generated
  as follows. Paging can be used in a system with 16-bit tasks if all tasks use the
  same page directory. Because there is no place in a 16-bit TSS to store the PDBR
  register, switching to a 16-bit task does not change the value of the PDBR
  register. Tasks ported from the Intel 286 processor should be given 32-bit TSSs
  so they can make full use of paging.
• General-protection exception (#GP, interrupt 13): New exception condition
  added. The Intel386 processor sets a limit of 15 bytes on instruction length. The
  only way to violate this limit is by putting redundant prefixes before an
  instruction. A general-protection exception is generated if the limit on instruction
  length is violated. The 8086 processor has no instruction length limit.
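The IDIV quotient change in the divide-error item above can be illustrated with a model of the 8-bit form of the instruction (16-bit dividend, 8-bit divisor). This is a sketch with hypothetical names. It only models the quotient range check; the 16-bit form behaves the same way with 8000H as the boundary value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the two behaviors compared in the text. */
enum cpu_kind { CPU_8086, CPU_INTEL386 };

/* Models 8-bit IDIV. Returns false when a divide-error exception (#DE)
   would be raised; otherwise stores the quotient and returns true. */
bool idiv8(enum cpu_kind cpu, int16_t dividend, int8_t divisor, int8_t *quotient)
{
    if (divisor == 0)
        return false;
    int q = dividend / divisor;   /* C truncates toward zero, as IDIV does */
    /* The 8086 rejects the largest negative quotient (80H, that is -128). */
    int min_q = (cpu == CPU_8086) ? -127 : -128;
    if (q < min_q || q > 127)
        return false;
    *quotient = (int8_t)q;
    return true;
}
```

With a dividend of -128 and a divisor of 1, the Intel386 model yields the quotient 80H while the 8086 model reports a divide error.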
19.25.1 Machine-Check Architecture
The Pentium Pro processor introduced a new architecture to the IA-32 for handling
and reporting on machine-check exceptions. This machine-check architecture
(described in detail in Chapter 15, "Machine-Check Architecture") greatly expands
the ability of the processor to report on internal hardware errors.
19.25.2 Priority of Exceptions
The priority of exceptions is broken down into several major categories:
1. Traps on the previous instruction
2. External interrupts
3. Faults on fetching the next instruction
4. Faults in decoding the next instruction
5. Faults on executing an instruction
There are no changes in the priority of these major categories between the different
processors; however, exceptions within these categories are implementation
dependent and may change from processor to processor.
19.26 INTERRUPTS
The following differences in handling interrupts are found among the IA-32
processors.
19.26.1 Interrupt Propagation Delay
External hardware interrupts may be recognized on different instruction boundaries
on the P6 family, Pentium, Intel486, and Intel386 processors, due to the superscalar
designs of the P6 family and Pentium processors. Therefore, the EIP pushed onto the
stack when servicing an interrupt may be different for the P6 family, Pentium,
Intel486, and Intel386 processors.
19.26.2 NMI Interrupts
After an NMI interrupt is recognized by the P6 family, Pentium, Intel486, Intel386,
and Intel 286 processors, the NMI interrupt is masked until the first IRET instruction
is executed, unlike the 8086 processor.
19.26.3 IDT Limit
The LIDT instruction can be used to set a limit on the size of the IDT. A double-fault
exception (#DF) is generated if an interrupt or exception attempts to read a vector
beyond the limit. Shutdown then occurs on the 32-bit IA-32 processors if the
double-fault handler vector is beyond the limit. (The 8086 processor has neither a
shutdown mode nor a limit.)
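The limit check described above can be sketched as a two-step test. This is a simplified model that follows the wording of this section only. The names are hypothetical, and `entry_size` is a parameter because the size of an IDT entry depends on the operating mode (8 bytes is assumed in the example for protected-mode gate descriptors).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DOUBLE_FAULT_VECTOR 8u

/* Hypothetical outcome labels. */
enum idt_outcome { IDT_DELIVER, IDT_DOUBLE_FAULT, IDT_SHUTDOWN };

/* True when the last byte of the entry for vector lies within the limit. */
bool idt_entry_in_limit(unsigned vector, uint32_t limit, unsigned entry_size)
{
    return (uint32_t)vector * entry_size + (entry_size - 1u) <= limit;
}

/* Models delivery of a vector against the IDT limit set by LIDT. */
enum idt_outcome idt_lookup(unsigned vector, uint32_t limit, unsigned entry_size)
{
    assert(vector < 256 && entry_size > 0);
    if (idt_entry_in_limit(vector, limit, entry_size))
        return IDT_DELIVER;
    if (idt_entry_in_limit(DOUBLE_FAULT_VECTOR, limit, entry_size))
        return IDT_DOUBLE_FAULT;
    return IDT_SHUTDOWN;   /* the double-fault vector is also beyond the limit */
}
```

For example, a limit of FFH with 8-byte entries covers vectors 0 through 31, so vector 32 leads to a double fault, and a limit of 3FH leaves even vector 8 out of range.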
19.27 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC)
The Advanced Programmable Interrupt Controller (APIC), referred to in this book as
the local APIC, was introduced into the IA-32 processors with the Pentium
processor (beginning with the 735/90 and 815/100 models) and is included in the
Pentium 4, Intel Xeon, and P6 family processors. The features and functions of the
local APIC are derived from the Intel 82489DX external APIC, which was used with
the Intel486 and early Pentium processors. Additional refinements of the local APIC
architecture were incorporated in the Pentium 4 and Intel Xeon processors.
19.27.1 Software Visible Differences Between the Local APIC and the 82489DX
The following features of the local APIC differ from those found in the
82489DX external APIC:
• When the local APIC is disabled by clearing the APIC software enable/disable flag
  in the spurious-interrupt vector MSR, the state of its internal registers is
  unaffected, except that the mask bits in the LVT are all set to block local
  interrupts to the processor. Also, the local APIC ceases accepting IPIs except for
  INIT, SMI, NMI, and start-up IPIs. In the 82489DX, when the local unit is
  disabled, all the internal registers including the IRR, ISR and TMR are cleared and
  the mask bits in the LVT are set. In this state, the 82489DX local unit will accept
  only the reset deassert message.
• In the local APIC, NMI and INIT (except for INIT deassert) are always treated as
  edge triggered interrupts, even if programmed otherwise. In the 82489DX, these
  interrupts are always level triggered.
• In the local APIC, IPIs generated through the ICR are always treated as edge
  triggered (except INIT Deassert). In the 82489DX, the ICR can be used to
  generate either edge or level triggered IPIs.
• In the local APIC, the logical destination register supports 8 bits; in the 82489DX,
  it supports 32 bits.
• In the local APIC, the APIC ID register is 4 bits wide; in the 82489DX, it is 8 bits
  wide.
• The remote read delivery mode provided in the 82489DX and local APIC for
  Pentium processors is not supported in the local APIC in the Pentium 4, Intel
  Xeon, and P6 family processors.
• For the 82489DX, in the lowest priority delivery mode, all the target local APICs
  specified by the destination field participate in the lowest priority arbitration. For
  the local APIC, only those local APICs which have free interrupt slots will
  participate in the lowest priority arbitration.
19.27.2 New Features Incorporated in the Local APIC for the P6 Family and Pentium Processors
The local APIC in the Pentium and P6 family processors has the following new
features not found in the 82489DX external APIC.
• Cluster addressing is supported in logical destination mode.
• Focus processor checking can be enabled/disabled.
• Interrupt input signal polarity can be programmed for the LINT0 and LINT1 pins.
• An SMI IPI is supported through the ICR and I/O redirection table.
• An error status register is incorporated into the LVT to log and report APIC errors.
In the P6 family processors, the local APIC incorporates an additional LVT register to
handle performance monitoring counter interrupts.
19.27.3 New Features Incorporated in the Local APIC of the Pentium 4 and Intel Xeon Processors
The local APIC in the Pentium 4 and Intel Xeon processors has the following new
features not found in the P6 family and Pentium processors and in the 82489DX.
• The local APIC ID is extended to 8 bits.
• A thermal sensor register is incorporated into the LVT to handle thermal sensor
  interrupts.
• The ability to deliver lowest-priority interrupts to a focus processor is no
  longer supported.
• The flat cluster logical destination mode is not supported.
19.28 TASK SWITCHING AND TSS
This section identifies the implementation differences of task switching, additions to
the TSS and the handling of TSSs and TSS segment selectors.
19.28.1 P6 Family and Pentium Processor TSS
When the virtual mode extensions are enabled (by setting the VME flag in control
register CR4), the TSS in the P6 family and Pentium processors contains an interrupt
redirection bit map, which is used in virtual-8086 mode to redirect interrupts back to
an 8086 program.
19.28.2 TSS Selector Writes
During task state saves, the Intel486 processor writes 2-byte segment selectors into
a 32-bit TSS, leaving the upper 16 bits undefined. For performance reasons, the P6
family and Pentium processors write 4-byte segment selectors into the TSS, with the
upper 2 bytes being 0. For compatibility reasons, code should not depend on the
value of the upper 16 bits of the selector in the TSS.
19.28.3 Order of Reads/Writes to the TSS
The order of reads and writes into the TSS is processor dependent. If a TSS crosses a
page boundary (which is not recommended), the P6 family and Pentium processors
may generate page-fault addresses in control register CR2 that differ, within the
same TSS area, from those generated by the Intel486 and Intel386 processors.
19.28.4 Using a 16-Bit TSS with 32-Bit Constructs
Task switches using 16-bit TSSs should be used only for pure 16-bit code. Any new
code written using 32-bit constructs (operands, addressing, or the upper word of the
EFLAGS register) should use only 32-bit TSSs. This is because the 32-bit
processors do not save the upper 16 bits of EFLAGS to a 16-bit TSS. A task switch
back to a 16-bit task that was executing in virtual mode will never re-enable the
virtual mode, as this flag was not saved in the upper half of the EFLAGS value in the
TSS. Therefore, it is strongly recommended that any code using 32-bit constructs
use a 32-bit TSS to ensure correct behavior in a multitasking environment.
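The loss of the virtual-mode flag can be shown with a round trip through a 16-bit flags slot. This is a minimal model with hypothetical function names. It assumes the VM flag is bit 17 of EFLAGS and that the upper half reads back as zero when the task is resumed.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_VM (1u << 17)   /* virtual-8086 mode flag, upper half of EFLAGS */

/* A 16-bit TSS has room only for the lower 16 bits of EFLAGS. */
uint16_t save_flags_to_tss16(uint32_t eflags)
{
    return (uint16_t)(eflags & 0xFFFFu);
}

/* On the switch back, the upper 16 bits are not restored from the TSS. */
uint32_t restore_flags_from_tss16(uint16_t saved_flags)
{
    return (uint32_t)saved_flags;
}
```

A task that was running with VM set resumes with VM clear, which is why the text recommends a 32-bit TSS for any code that uses 32-bit constructs.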
19.28.5 Differences in I/O Map Base Addresses
The Intel486 processor considers the TSS segment to be a 16-bit segment and wraps
around the 64K boundary. Any I/O accesses check for permission to access this I/O
address at the I/O base address plus the I/O offset. If the I/O map base address
exceeds the specified limit of 0DFFFH, an I/O access will wrap around and obtain the
permission for the I/O address at an incorrect location within the TSS. A TSS limit
violation does not occur in this situation on the Intel486 processor. However, the P6
family and Pentium processors consider the TSS to be a 32-bit segment and a limit
violation occurs when the I/O base address plus the I/O offset is greater than the TSS
limit. By following the recommended specification for the I/O base address to be less
than 0DFFFH, the Intel486 processor will not wrap around and access incorrect
locations within the TSS for I/O port validation and the P6 family and Pentium processors
will not experience general-protection exceptions (#GP). Figure 19-1 demonstrates
the different areas accessed by the Intel486 and the P6 family and Pentium
processors.
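The two behaviors can be compared with a small model of the address computation. This is a sketch with hypothetical names. It takes the "I/O offset" of the text as an input rather than deriving it from a port number, and it uses the values shown in Figure 19-1 (base FFFFH, offset 10H) as the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for the permission lookup. */
struct io_check_result {
    bool gp_fault;     /* true when a general-protection exception occurs */
    uint32_t offset;   /* offset within the TSS used for the check */
};

/* Intel486: the TSS is treated as a 16-bit segment, so the sum wraps at 64K
   and no limit violation is reported. */
struct io_check_result io_check_intel486(uint16_t io_map_base, uint16_t io_offset)
{
    struct io_check_result r;
    r.gp_fault = false;
    r.offset = ((uint32_t)io_map_base + io_offset) & 0xFFFFu;
    return r;
}

/* P6 family and Pentium: the TSS is treated as a 32-bit segment, so a sum
   beyond the TSS limit is a limit violation. */
struct io_check_result io_check_p6_pentium(uint16_t io_map_base, uint16_t io_offset,
                                           uint32_t tss_limit)
{
    struct io_check_result r;
    r.offset = (uint32_t)io_map_base + io_offset;
    r.gp_fault = r.offset > tss_limit;
    return r;
}
```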
19.29 CACHE MANAGEMENT
The P6 family processors include two levels of internal caches: L1 (level 1) and L2
(level 2). The L1 cache is divided into an instruction cache and a data cache; the L2
cache is a general-purpose cache. See Section 11.1, "Internal Caches, TLBs, and
Buffers," for a description of these caches. (Note that although the Pentium II
processor L2 cache is physically located on a separate chip in the cassette, it is
considered an internal cache.)
The Pentium processor includes separate level 1 instruction and data caches. The
data cache supports a writeback (or alternatively write-through, on a line by line
basis) policy for memory updates.
The Intel486 processor includes a single level 1 cache for both instructions and data.
The meaning of the CD and NW flags in control register CR0 has been redefined for
the P6 family and Pentium processors. For these processors, the recommended value
(00B) enables writeback for the data cache of the Pentium processor and for the L1
data cache and L2 cache of the P6 family processors. In the Intel486 processor,
setting these flags to (00B) enables write-through for the cache.
Figure 19-1. I/O Map Base Address Differences
[Figure: two TSS segment diagrams, each spanning offsets 0H to FFFFH with the I/O
map base address at FFFFH.
Intel486 Processor: FFFFH + 10H = FH for I/O validation. I/O access at port 10H
checks bitmap at I/O map base address FFFFH + 10H = offset 10H. Offset FH from
beginning of TSS segment results because wraparound occurs.
P6 family and Pentium Processors: FFFFH + 10H = outside segment for I/O validation.
I/O access at port 10H checks bitmap at I/O address FFFFH + 10H, which exceeds
segment limit. Wrap around does not occur, general-protection exception (#GP)
occurs.]
External system hardware can force the Pentium processor to disable caching or to
use the write-through cache policy should that be required. In the P6 family
processors, the MTRRs can be used to override the CD and NW flags (see Table 11-6).
The P6 family and Pentium processors support page-level cache management in the
same manner as the Intel486 processor by using the PCD and PWT flags in control
register CR3, the page-directory entries, and the page-table entries. The Intel486
processor, however, is not affected by the state of the PWT flag since the internal
cache of the Intel486 processor is a write-through cache.
19.29.1 Self-Modifying Code with Cache Enabled
On the Intel486 processor, a write to an instruction in the cache will modify it in both
the cache and memory. If the instruction was prefetched before the write, however,
the old version of the instruction could be the one executed. To prevent this problem,
it is necessary to flush the instruction prefetch unit of the Intel486 processor by
coding a jump instruction immediately after any write that modifies an instruction.
The P6 family and Pentium processors, however, check whether a write may modify
an instruction that has been prefetched for execution. This check is based on the
linear address of the instruction. If the linear address of an instruction is found to be
present in the prefetch queue, the P6 family and Pentium processors flush the
prefetch queue, eliminating the need to code a jump instruction after any writes that
modify an instruction.
Because the linear address of the write is checked against the linear address of the
instructions that have been prefetched, special care must be taken for self-modifying
code to work correctly when the physical addresses of the instruction and the written
data are the same, but the linear addresses differ. In such cases, it is necessary to
execute a serializing operation to flush the prefetch queue after the write and before
executing the modified instruction. See Section 8.3, "Serializing Instructions," for
more information on serializing instructions.
NOTE
The check on linear addresses described above is not in pract ice a
concern for compat ibilit y. Applicat ions t hat include self- modifying
code use t he same linear address for modifying and fet ching t he
inst ruct ion. Syst em soft ware, such as a debugger, t hat might
possibly modify an inst ruct ion using a different linear address t han
t hat used t o fet ch t he inst ruct ion must execut e a serializing
operat ion, such as I RET, before t he modified inst ruct ion is execut ed.
19.29.2 Disabling the L3 Cache
A unified third-level (L3) cache in processors based on Intel NetBurst
microarchitecture (see Section 11.1, "Internal Caches, TLBs, and Buffers") provides the third-level
cache disable flag, bit 6 of the IA32_MISC_ENABLE MSR. The third-level cache
disable flag allows the L3 cache to be disabled and enabled, independently of the L1
and L2 caches (see Section 11.5.4, "Disabling and Enabling the L3 Cache"). The
third-level cache disable flag applies only to processors based on Intel NetBurst
microarchitecture. Processors with L3 and based on other microarchitectures do not
support the third-level cache disable flag.
19.30 PAGING
This section identifies enhancements made to the paging mechanism and
implementation differences in the paging mechanism for various IA-32 processors.
19.30.1 Large Pages
The Pentium processor extended the memory management/paging facilities of the
IA-32 to allow large (4 MByte) page sizes (see Section 4.3, "32-Bit Paging"). The
first P6 family processor (the Pentium Pro processor) added a 2 MByte page size to
the IA-32 in conjunction with the physical address extension (PAE) feature (see
Section 4.4, "PAE Paging").
The availability of large pages with 32-bit paging on any IA-32 processor can be
determined via feature bit 3 (PSE) of register EDX after the CPUID instruction has
been executed with an argument of 1. (Large pages are always available with PAE
paging and IA-32e paging.) Intel processors that do not support the CPUID
instruction support only 32-bit paging and do not support page size enhancements. (See
"CPUID - CPU Identification" in Chapter 3, "Instruction Set Reference, A-M," in the
Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2A, and
AP-485, Intel Processor Identification and the CPUID Instruction, for more information
on the CPUID instruction.)
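The PSE test described above reduces to checking one bit of the EDX value returned by CPUID with an argument of 1. The following is a minimal sketch in Python, assuming the EDX value has already been obtained by some other means; the function name and the sample values are hypothetical and used only for illustration.

```python
PSE_BIT = 3  # feature bit 3 (PSE) of EDX, as stated in the text above


def supports_large_pages_32bit(cpuid1_edx: int) -> bool:
    """Return True if the PSE bit is set in a CPUID(1) EDX value."""
    return bool((cpuid1_edx >> PSE_BIT) & 1)


# Hypothetical EDX values, not taken from any real processor.
print(supports_large_pages_32bit(0x00000008))  # only bit 3 set
print(supports_large_pages_32bit(0x00000001))  # bit 3 clear
```

The sketch covers only 32-bit paging; as noted above, large pages are always available with PAE paging and IA-32e paging, so no feature-bit test is involved there.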
19.30.2 PCD and PWT Flags
The PCD and PWT flags were introduced to the IA-32 in the Intel486 processor to
control the caching of pages:
• PCD (page-level cache disable) flag: Controls caching on a page-by-page basis.
• PWT (page-level write-through) flag: Controls the write-through/writeback
caching policy on a page-by-page basis. Since the internal cache of the Intel486
processor is a write-through cache, it is not affected by the state of the PWT flag.
19.30.3 Enabling and Disabling Paging
Paging is enabled and disabled by loading a value into control register CR0 that
modifies the PG flag. For backward and forward compatibility with all IA-32 processors,
Intel recommends that the following operations be performed when enabling or
disabling paging:
1. Execute a MOV CR0, REG instruction to either set (enable paging) or clear
(disable paging) the PG flag.
2. Execute a near JMP instruction.
The sequence bounded by the MOV and JMP instructions should be identity mapped
(that is, the instructions should reside on a page whose linear and physical addresses
are identical).
For the P6 family processors, the MOV CR0, REG instruction is serializing, so the
jump operation is not required. However, for backwards compatibility, the JMP
instruction should still be included.
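The value loaded in step 1 can be illustrated with a small model. This is only a sketch of the bit arithmetic, not of the privileged MOV CR0 and JMP instructions themselves, which cannot be expressed in Python. The bit positions (PE is bit 0 and PG is bit 31 of CR0) and the rule that PG cannot be set while PE is clear come from the CR0 description elsewhere in the manual, not from this section; the function name is hypothetical.

```python
CR0_PE = 1 << 0   # protection enable (assumed from the CR0 description)
CR0_PG = 1 << 31  # paging (assumed from the CR0 description)


def cr0_with_paging(cr0: int, enable: bool) -> int:
    """Return the 32-bit CR0 value to load in order to set or clear PG."""
    if enable:
        if not cr0 & CR0_PE:
            # Setting PG with PE clear is rejected by the processor.
            raise ValueError("PG cannot be set while PE is clear")
        return (cr0 | CR0_PG) & 0xFFFFFFFF
    return cr0 & ~CR0_PG & 0xFFFFFFFF


print(hex(cr0_with_paging(0x00000011, True)))   # PG set, other bits kept
print(hex(cr0_with_paging(0x80000011, False)))  # PG cleared, other bits kept
```

All other CR0 bits are passed through unchanged, which mirrors the usual read-modify-write of the register.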
19.31 STACK OPERATIONS
This section identifies the differences in the stack mechanism for the various IA-32
processors.
19.31.1 Selector Pushes and Pops
When pushing a segment selector onto the stack, the Pentium 4, Intel Xeon, P6
family, and Intel486 processors decrement the ESP register by the operand size and
then write 2 bytes. If the operand size is 32 bits, the upper two bytes of the write are
not modified. The Pentium processor decrements the ESP register by the operand
size and determines the size of the write by the operand size. If the operand size is
32 bits, the upper two bytes are written as 0s.
When popping a segment selector from the stack, the Pentium 4, Intel Xeon, P6
family, and Intel486 processors read 2 bytes and increment the ESP register by the
operand size of the instruction. The Pentium processor determines the size of the
read from the operand size and increments the ESP register by the operand size.
It is possible to align a 32-bit selector push or pop such that the operation generates
an exception on a Pentium processor and not on a Pentium 4, Intel Xeon, P6 family,
or Intel486 processor. This could occur if the third and/or fourth byte of the operation
lies beyond the limit of the segment or if the third and/or fourth byte of the operation
is located on a non-present or inaccessible page.
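The difference between the two push behaviors can be made concrete with a small memory model. The sketch below is an illustration only: the stack is a plain bytearray, segment limits and paging are ignored, and the function name and parameters are hypothetical.

```python
def push_selector(stack: bytearray, esp: int, selector: int,
                  operand_size: int, pentium: bool) -> int:
    """Model a segment-selector push and return the new ESP value."""
    if operand_size not in (2, 4):
        raise ValueError("operand size must be 2 or 4 bytes")
    esp -= operand_size
    # The Pentium processor sizes the write by the operand size; the other
    # processors named in the text always write exactly 2 bytes.
    width = operand_size if pentium else 2
    stack[esp:esp + width] = (selector & 0xFFFF).to_bytes(width, "little")
    return esp


# Pre-fill the stack with 0xAA so that untouched bytes are easy to spot.
for pentium in (False, True):
    stack = bytearray(b"\xAA" * 8)
    esp = push_selector(stack, 8, 0x1234, 4, pentium)
    print(pentium, esp, stack[esp:esp + 4].hex())
```

With a 32-bit operand size, the non-Pentium case leaves the upper two bytes at their previous contents, while the Pentium case writes them as zeros. Those two extra bytes are the ones that can cross a segment limit or page boundary as described above.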
For a POP-to-memory instruction that meets the following conditions:
• The stack segment size is 16-bit.
• The instruction uses any 32-bit addressing form with the SIB byte specifying ESP
as the base register.
• The initial stack pointer is FFFCH (32-bit operand) or FFFEH (16-bit operand) and
will wrap around to 0H as a result of the POP operation.
The result of the memory write is implementation-specific. For example, in P6 family
processors, the result of the memory write is SS:0H plus any scaled index and
displacement. In Pentium processors, the result of the memory write may be either a
stack fault (real mode or protected mode with stack segment size of 64 KByte), or
write to SS:10000H plus any scaled index and displacement (protected mode and
stack segment size exceeds 64 KByte).
19.31.2 Error Code Pushes
The Intel486 processor implements the error code pushed on the stack as a 16-bit
value. When pushed onto a 32-bit stack, the Intel486 processor only pushes 2 bytes
and updates ESP by 4. The P6 family and Pentium processors' error code is a full 32
bits with the upper 16 bits set to zero. The P6 family and Pentium processors,
therefore, push 4 bytes and update ESP by 4. Any code that relies on the state of the upper
16 bits may produce inconsistent results.
19.31.3 Fault Handling Effects on the Stack
During the handling of certain instructions, such as CALL and PUSHA, faults may
occur in different sequences for the different processors. For example, during far
calls, the Intel486 processor pushes the old CS and EIP before a possible branch fault
is resolved. A branch fault is a fault from a branch instruction occurring from a
segment limit or access rights violation. If a branch fault is taken, the Intel486 and
P6 family processors will have corrupted memory below the stack pointer. However,
the ESP register is backed up to make the instruction restartable. The P6 family
processors issue the branch before the pushes. Therefore, if a branch fault does
occur, these processors do not corrupt memory below the stack pointer. This
implementation difference, however, does not constitute a compatibility problem, as only
values at or above the stack pointer are considered to be valid. Other operations that
encounter faults may also corrupt memory below the stack pointer and this behavior
may vary on different implementations.
19.31.4 Interlevel RET/IRET From a 16-Bit Interrupt or Call Gate
If a call or interrupt is made from a 32-bit stack environment through a 16-bit gate,
only 16 bits of the old ESP can be pushed onto the stack. On the subsequent
RET/IRET, the 16-bit ESP is popped but the full 32-bit ESP is updated since control is
being resumed in a 32-bit stack environment. The Intel486 processor writes the SS
selector into the upper 16 bits of ESP. The P6 family and Pentium processors write
zeros into the upper 16 bits.
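The two outcomes for the upper half of ESP can be written out as a short model. This is a sketch under the assumption that only the final register value matters; the function name and the processor labels are hypothetical.

```python
def esp_after_16bit_gate_return(popped_sp: int, ss_selector: int,
                                processor: str) -> int:
    """Model the 32-bit ESP after RET/IRET through a 16-bit gate."""
    low = popped_sp & 0xFFFF
    if processor == "intel486":
        # The Intel486 processor places the SS selector in bits 31:16.
        return ((ss_selector & 0xFFFF) << 16) | low
    if processor in ("pentium", "p6"):
        # The P6 family and Pentium processors write zeros to bits 31:16.
        return low
    raise ValueError(f"unknown processor label: {processor!r}")


print(hex(esp_after_16bit_gate_return(0x0FF0, 0x0018, "intel486")))
print(hex(esp_after_16bit_gate_return(0x0FF0, 0x0018, "p6")))
```

In both cases the upper 16 bits of the original ESP are lost, so software that returns through a 16-bit gate cannot rely on them being preserved.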
19.32 MIXING 16- AND 32-BIT SEGMENTS
The features of the 16-bit Intel 286 processor are an object-code compatible subset
of those of the 32-bit IA-32 processors. The D (default operation size) flag in
segment descriptors indicates whether the processor treats a code or data segment
as a 16-bit or 32-bit segment; the B (default stack size) flag in segment descriptors
indicates whether the processor treats a stack segment as a 16-bit or 32-bit
segment.
The segment descriptors used by the Intel 286 processor are supported by the 32-bit
IA-32 processors if the Intel-reserved word (highest word) of the descriptor is clear.
On the 32-bit IA-32 processors, this word includes the upper bits of the base address
and the segment limit.
The segment descriptors for data segments, code segments, local descriptor tables
(there are no descriptors for global descriptor tables), and task gates are the same
for the 16- and 32-bit processors. Other 16-bit descriptors (TSS segment, call gate,
interrupt gate, and trap gate) are supported by the 32-bit processors.
The 32-bit processors also have descriptors for TSS segments, call gates, interrupt
gates, and trap gates that support the 32-bit architecture. Both kinds of descriptors
can be used in the same system.
For those segment descriptors common to both 16- and 32-bit processors, clear bits
in the reserved word cause the 32-bit processors to interpret these descriptors
exactly as an Intel 286 processor does, that is:
• Base Address: The upper 8 bits of the 32-bit base address are clear, which limits
base addresses to 24 bits.
• Limit: The upper 4 bits of the limit field are clear, restricting the value of the
limit field to 64 KBytes.
• Granularity bit: The G (granularity) flag is clear, indicating the value of the
16-bit limit is interpreted in units of 1 byte.
• Big bit: In a data-segment descriptor, the B flag is clear in the segment
descriptor used by the 32-bit processors, indicating the segment is no larger than
64 KBytes.
• Default bit: In a code-segment descriptor, the D flag is clear, indicating 16-bit
addressing and operands are the default. In a stack-segment descriptor, the D
flag is clear, indicating use of the SP register (instead of the ESP register) and a
64-KByte maximum segment limit.
For information on mixing 16- and 32-bit code in applications, see Chapter 18,
"Mixing 16-Bit and 32-Bit Code."
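The effect of a clear reserved word can be checked by decoding a descriptor byte by byte. The sketch below assumes the standard 8-byte segment-descriptor layout described in the segmentation chapter of this manual (limit 15:0 in bytes 0-1, base 23:0 in bytes 2-4, access byte in byte 5, limit 19:16 and the G and D/B flags in byte 6, base 31:24 in byte 7). The function name and the sample descriptor are hypothetical.

```python
def decode_descriptor(raw: bytes) -> dict:
    """Decode the base, limit, and selected flags of an 8-byte descriptor."""
    if len(raw) != 8:
        raise ValueError("a segment descriptor is 8 bytes long")
    limit = raw[0] | (raw[1] << 8) | ((raw[6] & 0x0F) << 16)
    base = raw[2] | (raw[3] << 8) | (raw[4] << 16) | (raw[7] << 24)
    return {
        "base": base,
        "limit": limit,
        "access": raw[5],
        "g": (raw[6] >> 7) & 1,    # granularity flag
        "d_b": (raw[6] >> 6) & 1,  # default/big flag
    }


# A 286-style data-segment descriptor: the highest word (bytes 6-7) is clear.
desc = decode_descriptor(bytes([0xFF, 0xFF, 0x56, 0x34, 0x12, 0x93, 0x00, 0x00]))
print(hex(desc["base"]), hex(desc["limit"]), desc["g"], desc["d_b"])
```

Because bytes 6 and 7 are zero, the decoded base fits in 24 bits, the limit cannot exceed FFFFH, and both the G and D/B flags read as clear, which matches the list above.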
19.33 SEGMENT AND ADDRESS WRAPAROUND
This section discusses differences in segment and address wraparound between the
P6 family, Pentium, Intel486, Intel386, Intel 286, and 8086 processors.
19.33.1 Segment Wraparound
On the 8086 processor, an attempt to access a memory operand that crosses offset
65,535 (0FFFFH) or offset 0 (for example, moving a word to offset 65,535 or
pushing a word when the stack pointer is set to 1) causes the offset to wrap around
modulo 65,536 (010000H). With the Intel 286 processor, any base and offset
combination that addresses beyond 16 MBytes wraps around to the first MByte of the address
space. The P6 family, Pentium, Intel486, and Intel386 processors in real-address
mode generate an exception in these cases:
• A general-protection exception (#GP) if the segment is a data segment (that is,
if the CS, DS, ES, FS, or GS register is being used to address the segment).
• A stack-fault exception (#SS) if the segment is a stack segment (that is, if the SS
register is being used).
An exception to this behavior occurs when a stack access is data aligned, and the
stack pointer is pointing to the last aligned piece of data of that size at the top of the
stack (ESP is FFFFFFFCH). When this data is popped, no segment limit violation
occurs and the stack pointer will wrap around to 0.
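The 8086 offset wraparound described at the start of this section is plain modulo-65,536 arithmetic on the offset. The following sketch lists the offsets touched by the two example accesses from the text; the function names are hypothetical, and the model covers only the 8086 behavior, not the exceptions raised by later processors.

```python
SEGMENT_SIZE = 0x10000  # 65,536: offsets on the 8086 wrap modulo this value


def wrapped_offsets(offset: int, size: int) -> list:
    """Return the offsets touched by a `size`-byte access on the 8086."""
    return [(offset + i) % SEGMENT_SIZE for i in range(size)]


def push_word_8086(sp: int) -> tuple:
    """Return (new SP, offsets written) for a word push on the 8086."""
    new_sp = (sp - 2) % SEGMENT_SIZE
    return new_sp, wrapped_offsets(new_sp, 2)


# Moving a word to offset 65,535 touches the last and the first byte.
print([hex(o) for o in wrapped_offsets(0xFFFF, 2)])
# Pushing a word when SP is 1 wraps SP to FFFFH and splits the write.
sp, written = push_word_8086(1)
print(hex(sp), [hex(o) for o in written])
```

In both examples the second byte of the word lands at offset 0, which is the access that the later processors report as #GP or #SS instead.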
The address space of the P6 family, Pentium, and Intel486 processors may
wraparound at 1 MByte in real-address mode. An external A20M# pin forces wraparound
if enabled. On Intel 8086 processors, it is possible to specify addresses greater than
1 MByte. For example, with a selector value FFFFH and an offset of FFFFH, the
effective address would be 10FFEFH (1 MByte plus 65519 bytes). The 8086 processor,
which can form addresses up to 20 bits long, truncates the uppermost bit, which
wraps this address to FFEFH. However, the P6 family, Pentium, and Intel486
processors do not truncate this bit if A20M# is not enabled.
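The FFFFH:FFFFH example can be reproduced with the real-address-mode rule that the address is the selector shifted left by 4 bits plus the offset. The sketch below is a minimal model with hypothetical function names; it treats an enabled A20M# as forcing address bit 20 to zero, which is the wraparound described above.

```python
def real_mode_address(selector: int, offset: int) -> int:
    """Return the untruncated real-address-mode address (up to 21 bits)."""
    return ((selector & 0xFFFF) << 4) + (offset & 0xFFFF)


def address_8086(selector: int, offset: int) -> int:
    """The 8086 forms 20-bit addresses, so bit 20 is truncated."""
    return real_mode_address(selector, offset) & 0xFFFFF


def address_with_a20m(selector: int, offset: int, a20m_enabled: bool) -> int:
    """Later processors keep bit 20 unless A20M# masking is enabled."""
    address = real_mode_address(selector, offset)
    return address & ~(1 << 20) if a20m_enabled else address


print(hex(real_mode_address(0xFFFF, 0xFFFF)))         # 0x10ffef
print(hex(address_8086(0xFFFF, 0xFFFF)))              # 0xffef
print(hex(address_with_a20m(0xFFFF, 0xFFFF, True)))   # 0xffef
print(hex(address_with_a20m(0xFFFF, 0xFFFF, False)))  # 0x10ffef
```

The value 10FFEFH is 100000H (1 MByte) plus FFEFH (65519), which agrees with the figures given in the text.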
If a stack operation wraps around the address limit, shutdown occurs. (The 8086
processor does not have a shutdown mode or a limit.)
The behavior when executing near the limit of a 4-GByte selector (limit = 0xFFFFFFFF)
is different between the Pentium Pro and the Pentium 4 family of processors. On the
Pentium Pro, an instruction that crosses the limit (for example, a two-byte instruction
such as INC EAX, encoded as 0xFF 0xC0, starting exactly at the limit) faults for
a segment violation; a one-byte instruction at 0xFFFFFFFF does not cause an
exception. On the Pentium 4 microprocessor family, neither of these situations causes
a fault.
Segment wraparound and the functionality of A20M# are used primarily by older
operating systems and are not used by modern operating systems. On newer Intel 64
processors, A20M# may be absent.
19.34 STORE BUFFERS AND MEMORY ORDERING
The Pentium 4, Intel Xeon, and P6 family processors provide a store buffer for
temporary storage of writes (stores) to memory (see Section 11.10, "Store Buffer").
Writes stored in the store buffer(s) are always written to memory in program order,
with the exception of "fast string" store operations (see Section 8.2.4, "Out-of-Order
Stores For String Operations").
The Pentium processor has two store buffers, one corresponding to each of the
pipelines. Writes in these buffers are always written to memory in the order they were
generated by the processor core.
Only memory writes are buffered; I/O writes are not. The
Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors do not
synchronize the completion of memory writes on the bus and instruction execution after a
write. An I/O, locked, or serializing instruction needs to be executed to synchronize
writes with the next instruction (see Section 8.3, "Serializing Instructions").
The Pentium 4, Intel Xeon, and P6 family processors use processor ordering to
maintain consistency in the order that data is read (loaded) and written (stored) in a
program and the order the processor actually carries out the reads and writes. With
this type of ordering, reads can be carried out speculatively and in any order, reads
can pass buffered writes, and writes to memory are always carried out in program
order. (See Section 8.2, "Memory Ordering," for more information about processor
ordering.) The Pentium III processor introduced a new instruction to serialize writes
and make them globally visible. Memory ordering issues can arise between a
producer and a consumer of data. The SFENCE instruction provides a
performance-efficient way of ensuring ordering between routines that produce weakly-ordered
results and routines that consume this data.
No re-ordering of reads occurs on the Pentium processor, except under the condition
noted in Section 8.2.1, "Memory Ordering in the Intel Pentium and Intel486
Processors," and in the following paragraph describing the Intel486 processor.
Specifically, the store buffers are flushed before the IN instruction is executed. No
reads (as a result of cache miss) are reordered around previously generated writes
sitting in the store buffers. The implication of this is that the store buffers will be
flushed or emptied before a subsequent bus cycle is run on the external bus.
On both the Intel486 and Pentium processors, under certain conditions, a memory
read will go onto the external bus before the pending memory writes in the buffer
even though the writes occurred earlier in the program execution. A memory read
will only be reordered in front of all writes pending in the buffers if all writes pending
in the buffers are cache hits and the read is a cache miss. Under these conditions, the
Intel486 and Pentium processors will not read from an external memory location that
needs to be updated by one of the pending writes.
During a locked bus cycle, the Intel486 processor will always access external
memory; it will never look for the location in the on-chip cache. All data pending in
the Intel486 processor's store buffers will be written to memory before a locked cycle
is allowed to proceed to the external bus. Thus, the locked bus cycle can be used for
eliminating the possibility of reordering read cycles on the Intel486 processor. The
Pentium processor does check its cache on a read-modify-write access and, if the
cache line has been modified, writes the contents back to memory before locking the
bus. The P6 family processors write to their cache on a read-modify-write operation
(if the access does not split across a cache line) and do not write back to system
memory. If the access does split across a cache line, they lock the bus and access
system memory.
I/O reads are never reordered in front of buffered memory writes on an IA-32
processor. This ensures an update of all memory locations before reading the status
from an I/O device.
19.35 BUS LOCKING
The Intel 286 processor performs bus locking differently than the Intel P6 family,
Pentium, Intel486, and Intel386 processors. Programs that use forms of memory
locking specific to the Intel 286 processor may not run properly when run on later
processors.
A locked instruction is guaranteed to lock only the area of memory defined by the
destination operand, but may lock a larger memory area. For example, typical 8086
and Intel 286 configurations lock the entire physical memory space. Programmers
should not depend on this.
On the Intel 286 processor, the LOCK prefix is sensitive to IOPL. If the CPL is greater
than the IOPL, a general-protection exception (#GP) is generated. On the Intel386
DX, Intel486, Pentium, and P6 family processors, no check against IOPL is
performed.
The Pentium processor automatically asserts the LOCK# signal when acknowledging
external interrupts. After signaling an interrupt request, an external interrupt
controller may use the data bus to send the interrupt vector to the processor. After
receiving the interrupt request signal, the processor asserts LOCK# to ensure that no
other data appears on the data bus until the interrupt vector is received. This bus
locking does not occur on the P6 family processors.
19.36 BUS HOLD
Unlike the 8086 and Intel 286 processors, but like the Intel386 and Intel486
processors, the P6 family and Pentium processors respond to requests for control of the bus
from other potential bus masters, such as DMA controllers, between transfers of
parts of an unaligned operand, such as two words which form a doubleword. Unlike
the Intel386 processor, the P6 family, Pentium, and Intel486 processors respond to
bus hold during reset initialization.
19.37 MODEL-SPECIFIC EXTENSIONS TO THE IA-32
Certain extensions to the IA-32 are specific to a processor or family of IA-32
processors and may not be implemented, or may not be implemented in the same way, in future
processors. The following sections describe these model-specific extensions. The CPUID
instruction indicates the availability of some of the model-specific features.
19.37.1 Model-Specific Registers
The Pentium processor introduced a set of model-specific registers (MSRs) for use in
controlling hardware functions and performance monitoring. To access these MSRs,
two new instructions were added to the IA-32 architecture: read MSR (RDMSR) and
write MSR (WRMSR). The MSRs in the Pentium processor are not guaranteed to be
duplicated or provided in the next generation IA-32 processors.
The P6 family processors greatly increased the number of MSRs available to
software. See Appendix B, "Model-Specific Registers (MSRs)," for a complete list of the
available MSRs. The new registers control the debug extensions, the performance
counters, the machine-check exception capability, the machine-check architecture,
and the MTRRs. These registers are accessible using the RDMSR and WRMSR
instructions. Specific information on some of these new MSRs is provided in the following
sections. As with the Pentium processor MSRs, the P6 family processor MSRs are not
guaranteed to be duplicated or provided in the next generation IA-32 processors.
19.37.2 RDMSR and WRMSR Instructions
The RDMSR (read model-specific register) and WRMSR (write model-specific
register) instructions recognize a much larger number of model-specific registers in
the P6 family processors. (See "RDMSR - Read from Model Specific Register" and
"WRMSR - Write to Model Specific Register" in the Intel 64 and IA-32 Architectures
Software Developer's Manual, Volumes 2A & 2B for more information.)
19.37.3 Memory Type Range Registers
Memory type range registers (MTRRs) are a new feature introduced into the IA-32 in
the Pentium Pro processor. MTRRs allow the processor to optimize memory
operations for different types of memory, such as RAM, ROM, frame buffer memory, and
memory-mapped I/O.
MTRRs are MSRs that contain an internal map of how physical address ranges are
mapped to various types of memory. The processor uses this internal memory map
to determine the cacheability of various physical memory locations and the optimal
method of accessing memory locations. For example, if a memory location is
specified in an MTRR as write-through memory, the processor handles accesses to this
location as follows. It reads data from that location in lines and caches the read data
or maps all writes to that location to the bus and updates the cache to maintain cache
coherency. In mapping the physical address space with MTRRs, the processor
recognizes five types of memory: uncacheable (UC); uncacheable, speculatable,
write-combining (WC); write-through (WT); write-protected (WP); and writeback (WB).
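The five memory types can be represented as a small enumeration. The numeric encodings used below (UC = 00H, WC = 01H, WT = 04H, WP = 05H, WB = 06H) are not stated in this section; they are taken from the MTRR memory-type encoding table in the memory cache control chapter and should be treated as an assumption here. The class and function names are hypothetical.

```python
from enum import IntEnum


class MemoryType(IntEnum):
    """MTRR memory types (encodings assumed from the MTRR encoding table)."""
    UC = 0x00  # uncacheable
    WC = 0x01  # uncacheable, speculatable, write-combining
    WT = 0x04  # write-through
    WP = 0x05  # write-protected
    WB = 0x06  # writeback


def decode_memory_type(encoding: int) -> MemoryType:
    """Map an 8-bit type field to a memory type; reject other encodings."""
    try:
        return MemoryType(encoding)
    except ValueError:
        raise ValueError(f"reserved memory type encoding: {encoding:#04x}") from None


print(decode_memory_type(0x06).name)
print([t.name for t in MemoryType])
```

Rejecting the unlisted encodings keeps the model limited to the five types named above.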
Earlier IA-32 processors (such as the Intel486 and Pentium processors) used the
KEN# (cache enable) pin and external logic to maintain an external memory map and
signal cacheable accesses to the processor. The MTRR mechanism simplifies
hardware designs by eliminating the KEN# pin and the external logic required to drive it.
See Chapter 9, "Processor Management and Initialization," and Appendix B,
"Model-Specific Registers (MSRs)," for more information on the MTRRs.
19.37.4 Machine-Check Exception and Architecture
The Pentium processor introduced a new exception called the machine-check
exception (#MC, interrupt 18). This exception is used to detect hardware-related errors,
such as a parity error on a read cycle.
The P6 family processors extend the types of errors that can be detected and that
generate a machine-check exception. They also provide a new machine-check
architecture for recording information about a machine-check error and provide extended
recovery capability.
The machine-check architecture provides several banks of reporting registers for
recording machine-check errors. Each bank of registers is associated with a specific
hardware unit in the processor. The primary focus of the machine checks is on bus
and interconnect operations; however, checks are also made of translation lookaside
buffer (TLB) and cache operations.
The machine-check architecture can correct some errors automatically and allow for
reliable restart of instruction execution. It also collects sufficient information for
software to use in correcting other machine errors not corrected by hardware.
See Chapter 15, "Machine-Check Architecture," for more information on the
machine-check exception and the machine-check architecture.
19.37.5 Performance-Monitoring Counters
The P6 family and Pentium processors provide two performance-monitoring counters
for use in monitoring internal hardware operations. The number of performance
monitoring counters and associated programming interfaces may be implementation
specific for Pentium 4 and Pentium M processors. Later processors may have
implemented these as part of an architectural performance monitoring feature. The
architectural and non-architectural performance monitoring interfaces for different
processor families are described in Chapter 30, "Performance Monitoring." Appendix
A, "Performance-Monitoring Events," lists all the events that can be counted for
architectural performance monitoring events and non-architectural events. The
counters are set up, started, and stopped using two MSRs and the RDMSR and
WRMSR instructions. For the P6 family processors, the current count for a particular
counter can be read using the new RDPMC instruction.
The performance-monitoring counters are useful for debugging programs, optimizing
code, diagnosing system failures, or refining hardware designs. See Chapter 30,
"Performance Monitoring," for more information on these counters.
19.38 TWO WAYS TO RUN INTEL 286 PROCESSOR TASKS
When porting 16-bit programs to run on 32-bit IA-32 processors, there are two
approaches to consider:
• Porting an entire 16-bit software system to a 32-bit processor, complete with the
old operating system, loader, and system builder. Here, all tasks will have 16-bit
TSSs. The 32-bit processor is being used as if it were a faster version of the 16-bit
processor.
• Porting selected 16-bit applications to run in a 32-bit processor environment with
a 32-bit operating system, loader, and system builder. Here, the TSSs used to
represent 286 tasks should be changed to 32-bit TSSs. It is possible to mix 16-
and 32-bit TSSs, but the benefits are small and the problems are great. All tasks
in a 32-bit software system should have 32-bit TSSs. It is not necessary to
change the 16-bit object modules themselves; TSSs are usually constructed by
the operating system, by the loader, or by the system builder. See Chapter 18,
"Mixing 16-Bit and 32-Bit Code," for more detailed information about mixing
16-bit and 32-bit code.
Because the 32-bit processors use the contents of the reserved word of 16-bit
segment descriptors, 16-bit programs that place values in this word may not run
correctly on the 32-bit processors.