Concept, Design, and Implementation of a Slimline Boot Firmware for Linux on Power Architecture

— Diploma Thesis —
Heiko Joerg Schick


Matriculation Number: 66714
Hugo-Bertsch-Str. 16
72459 Albstadt
Tel.: 07431 / 971370
E-Mail: info@schihei.de
Dr. rer. nat. Otto Wohlmuth

IBM Deutschland Entwicklung GmbH
Open System Firmware Design & Development
Schoenaicherstr. 220
Tel.: 07031 / 16-3529
E-Mail: wohlmuth@de.ibm.com
Prof. Dr. Martin Rieger

Fachhochschule Albstadt-Sigmaringen
Fachbereich Engineering
Poststr. 6
Tel.: 07431 / 579-124
E-Mail: rieger@fh-albsig.de
Disclaimer
I hereby declare that I have written the presented work independently, using only the
listed sources and facilities.
Albstadt, August 25, 2004 Heiko Joerg Schick
Credits
I would like to thank Dr. rer. nat. Otto Wohlmuth, who gave me a great deal of guidance,
taught me many useful techniques, and helped me all the time. Thanks to Prof.
Dr. Martin Rieger for his supervision and remarks.
I would also like to thank my family for their patience and support, especially my father,
who gave me a taste for computer science, my twin brother, who helped to find
appropriate I2C hardware for testing my ideas, and my sister for proofreading this diploma thesis.
Many thanks to Hartmut Penner, Segher Boessenkool, and Benjamin Herrenschmidt for their
help in understanding firmware basics and concepts, the PowerPC architecture, the magic and
beauty of Forth, and the Linux/PPC64 kernel.
Thanks to all the colleagues at IBM Deutschland Entwicklung GmbH for their help in creating
this project.
Contents
1 Introduction
2 Basic Technologies
2.1 Open Firmware
2.2 Hardware
2.2.1 IBM JS20 Blade Server
2.2.2 IBM PowerPC 970
2.2.3 Miscellaneous Devices
3 Firmware Anatomy
3.1 Open Firmware
3.2 Common Hardware Reference Platform
3.3 RISC Platform Architecture
3.4 Apple Firmware
3.5 LinuxBIOS
3.6 OpenBIOS
3.7 Extensible Firmware Interface
4 Programming Language “Forth”
4.1 Forth Introduction
4.2 Elements of Forth
4.3 Implementation Strategies
4.4 Requirements
4.5 Forth Systems
4.6 Advantages and Range of Application
5 Linux/PPC64 Boot Procedure
5.1 Linux/PPC64 Overview
5.2 Low-Level Support and Setup
5.3 Interfacing to Open Firmware
6 Slimline Prototype Firmware
6.1 Low-Level Firmware
6.1.1 Control Flow
6.1.2 Basic Components
6.1.3 Auxiliary Components
6.2 Open Firmware
6.2.1 Control Flow
6.2.2 Basic Components
6.2.3 Boot Components
6.2.4 Auxiliary Components
7 Agnostic Device Drivers
7.1 Motivation
7.2 How it works
7.3 Packaging Options
7.4 Virtual Machine
7.4.1 Design Strategies
7.4.2 Design Goal
7.5 Components of the Virtual Machine
7.5.1 Front-End
7.5.2 Byte-Code Verifier
7.5.3 Inner and Outer Interpreter
7.5.4 Data and Return Stack
7.5.5 Token-Tables
7.5.6 The Doers
7.5.7 Back-End
7.6 Byte-Code
7.6.1 Byte-Code Header
7.6.2 Byte-Code Encoding
7.6.3 Control Structures
7.7 ADD to Linux I2C Binding
7.7.1 I2C Bus
7.7.2 I2C in Linux 2.6
7.7.3 Byte-Code to Linux I2C Binding
7.8 Further Opportunities
8 Conclusions
A Glossary
B ADD Byte-Code Functions
Bibliography
List of Figures
2.1 Open Firmware Structure
2.2 IBM PowerPC 970 Architecture
3.1 Open Firmware
3.2 Common Hardware Reference Platform
3.3 RISC Platform Architecture
3.4 Apple Firmware
3.5 LinuxBIOS
3.6 Extensible Firmware Interface
4.1 Structure of a Dictionary Entry
5.1 Linux/PPC64 Boot Procedure
5.2 Call Tree for prom_init
6.1 Low-Level Firmware
6.2 Open Firmware
7.1 ADD – How It Works
7.2 ADD – Packaging Options
7.3 ADD – Component Overview
List of Tables
5.1 Physical Memory Layout
7.1 Comparison: Run-Time Abstraction Services and Platform Expert
7.2 Token-Table Segmentation
7.3 ADD Byte-Code Header
7.4 I2C ADD Byte-Code Functions
Listings
7.1 Constant-Doer
7.2 Variable-Doer
7.3 Colon Definition-Doer
7.4 i2c_driver structure used for the I2C Chip Driver
Chapter 1
Introduction
As a consequence of Moore’s Law, computer systems become not only smaller, faster, and cheaper,
but also more complex. The growing volume of software and hardware development
tends to increase administration and programming effort. Configuration and debugging
take more and more time. A big handicap of modern systems is the hardware. Defective
hardware components are mostly not recognized until they fail. The problem
is that not only can the whole computer system break down; other components can be
damaged, too. The cause of a defect can often be found only through complex analyses. Clues and
information are sparse, and debug tools are often only available in development environments.
As a result, one of the biggest problems in firmware development is simplifying
the whole software design without curtailing functionality and flexibility. Leading
companies like IBM, Apple, and Intel have addressed the problem and invest enormous research and
development effort in firmware specifications.
The benefits are obvious:
- Flexible interfaces during boot-time and run-time.
- Extended debug facilities in case of software and hardware problems.
- Small firmware layers without overdesigned functionality.
- Highly portable boot firmware which runs on almost every hardware.
- Hardware drivers which permit different packaging models.
- Software which can be customized for performance and real-time requirements.
Many manufacturers have adopted Open Firmware. Open Firmware is a hardware-independent
firmware, developed by Sun Microsystems and used in modern workstations and servers. It
is accessed through a Forth-based language interface and is described by the IEEE standard 1275.
Intel is trying to establish its own BIOS standard, the Extensible
Firmware Interface (EFI), which describes a new model for the interface between the operating
system and the platform firmware. This interface contains platform-related information, plus
boot-time and run-time service calls that are available to the operating system and its loader.
Intel’s goal is that these components provide a standard environment for booting an
operating system and running pre-boot applications.
IBM uses the RISC Platform Architecture (RPA), which is based on Open Firmware. The
biggest difference from Open Firmware is that the RISC Platform Architecture implements a
hypervisor which allows the execution of several operating systems on the same machine. This
design is mostly used on big mainframe machines. RPA also implements Run-Time Abstraction
Services (RTAS), which provide hardware-specific functions, including functions for accessing
the real-time clock, non-volatile RAM (NVRAM), restart, shutdown, and PCI configuration
cycles. These functions are implemented under a hardware-independent synchronous interface.
Apple uses an Open Firmware based concept, but without a hypervisor and without the
Run-Time Abstraction Services. Instead, Apple implemented a new and complex software
package intended to avoid the drawbacks of such hardware-abstraction concepts.
These and other concepts differ significantly in basic structure and
implementation. Every concept has drawbacks and advantages. To arrive at a “Concept, Design, and
Implementation of a Slimline Boot Firmware for Linux on Power Architecture” it is necessary
to understand these basics completely. Chapters 2, 3, and 4 introduce these technologies and
give implementation details.
Chapters 5 and 6 deal with the boot process and the control flow of a slimline boot firmware.
Furthermore, design aspects and implementations are specified and described in more detail.
In chapter 7, a new hardware-abstraction mechanism and its implementation are introduced.
This new technology is intended to avoid the drawbacks of existing concepts. It is called “Agnostic
Device Driver” and shows how a byte-code program can be placed into an Open Firmware data
structure, which is later used by the operating system.
Chapter 2
Basic Technologies
This chapter describes the basic technologies that are used in existing PowerPC systems.
It is intended as a starting point for understanding Open
Firmware and the hardware of an IBM JS20, a 64-bit PowerPC processor-based 2-way blade
server.
2.1 Open Firmware

The IEEE Standard 1275–1994, Standard for Boot (Initialization Configuration) Firmware,
Core Requirements and Practices, is the first non-proprietary open standard for boot firmware
that is usable on different processors and buses. Firmware which complies with this standard
(also known as “Open Firmware”) includes a processor-independent device interface that
allows add-in devices to identify themselves and to supply a single boot driver that can be used,
unchanged, on any CPU. In addition, Open Firmware includes a user interface with
powerful scripting and debugging support and a client interface that allows operating systems
and their loaders to use Open Firmware services during the configuration and initialization
process. Open Firmware stores all information about the hardware in a tree structure
called the device tree. This device tree supports multiple interconnected system buses to offer a
framework for “plug and play”-type auto configuration across different buses.
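The device tree described above can be pictured as nested nodes that each carry named properties. The following Python sketch is only an illustration under stated assumptions: the node names, the property values, and the helper `find_node` are hypothetical and are not taken from any real Open Firmware implementation.

```python
# Illustrative model of an Open Firmware style device tree.
# All node names and property values below are made-up examples.

class Node:
    def __init__(self, name, properties=None):
        self.name = name
        self.properties = dict(properties or {})
        self.children = {}

    def add(self, child):
        self.children[child.name] = child
        return child


def find_node(root, path):
    """Walk a slash-separated path such as '/pci/ethernet' from the root."""
    node = root
    for part in path.strip("/").split("/"):
        if part == "":
            continue                  # the path '/' refers to the root itself
        node = node.children[part]    # raises KeyError for unknown nodes
    return node


root = Node("/", {"model": "example-board"})
pci = root.add(Node("pci", {"device_type": "pci"}))
pci.add(Node("ethernet", {"device_type": "network", "reg": [0x1000, 0x100]}))

eth = find_node(root, "/pci/ethernet")
print(eth.properties["device_type"])
```

An operating system could walk such a structure to discover which devices exist and how to address them, which is the role the device tree plays for auto configuration.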
Open Firmware was designed to support a variety of different processor Instruction Set Architectures (ISAs)
and different buses, which is why it is used on over a million machines and supported by several
system vendors. For example, provisions for PCI, Futurebus+, VME+D, and SMBus already
exist and can be used for card identification and booting.
Besides this, Open Firmware uses the “plug-in driver” technique to make use of new devices for
booting or message display without modification to the main Open Firmware system ROM.
Each device has its own plug-in driver, normally located in a ROM on the device itself. Such
a driver is written in FCode and not in machine language. FCode is a machine-independent,
byte-coded “intermediate language” for the Forth programming
language; therefore FCode drivers can be used on different hardware models. Plug-in
device cards can use FCode to report their characteristics to the firmware and the system
software. Such characteristics may include the device name, model, revision level, device type,
register locations, interrupt levels, supported features, and any other identification
information that makes sense for the particular device. System software, like an operating system, can
use this information for automatic configuration. All information is stored in a processor-
and architecture-independent format that can easily be retrieved and decoded. The main part of
Open Firmware is developed in the programming language Forth. Forth was originally
developed in the early 1970s by Charles H. Moore at the National Radio Astronomy Observatory.
It was used for controlling radio telescopes with all associated scientific instruments and for
high-speed data acquisition and graphical analysis. Forth is an industry-standard interactive
programming language and is based on a stack-oriented “virtual machine” that may be easily
and efficiently implemented on any system.
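To make the idea of a byte-coded program running on a stack-oriented virtual machine more concrete, here is a minimal Python sketch. The opcode numbers and the `run` function are invented for this illustration; they do not correspond to real FCode token numbers or to any particular Forth system.

```python
# Toy byte-code interpreter for a stack machine.
# The opcode values are hypothetical and are NOT real FCode tokens.
OP_LIT, OP_ADD, OP_MUL, OP_DUP = 0x01, 0x02, 0x03, 0x04


def run(code):
    """Execute a byte sequence and return the final data stack."""
    stack = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if op == OP_LIT:              # push the following byte as a literal
            stack.append(code[pc])
            pc += 1
        elif op == OP_ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == OP_MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == OP_DUP:            # duplicate the top stack item
            stack.append(stack[-1])
        else:
            raise ValueError(f"unknown opcode {op:#x}")
    return stack


# Corresponds to the Forth phrase "3 4 + dup *", i.e. (3 + 4) squared.
program = bytes([OP_LIT, 3, OP_LIT, 4, OP_ADD, OP_DUP, OP_MUL])
print(run(program))
```

Because the program is just a sequence of bytes interpreted by `run`, the same sequence could be executed on any processor for which such an interpreter exists, which is the property that makes byte-coded drivers portable.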
Figure 2.1: Open Firmware Structure
2.2 Hardware
2.2.1 IBM JS20 Blade Server
The IBM JS20 blade server is designed for exceptional performance in compute-intensive applications and high
throughput from processor to memory and I/O. Together, these design goals make it an excellent
choice for tasks in bioinformatics, digital signal processing, scientific computing, and Linux
clustering. To meet these requirements the JS20 has two 1.6 GHz PowerPC 970 processors
with full-speed 512 KB ECC L2 cache, system memory with ECC support, a dual-channel
EIDE (ATA-100) controller, and two full-duplex dual Gigabit Ethernet PCI connections for
high-speed network connection.
2.2.2 IBM PowerPC 970
The IBM PowerPC 970 is the first 64-bit high-performance RISC processor for mainstream
desktop usage. It can be characterized as “wide and deep”, which means that the PowerPC
970 combines both design philosophies of modern chip design. In other words, it has an
extremely wide execution core and a 16-stage pipeline. On the other hand, with a maximum
of 2 GHz it does not reach the clock speed of a Pentium 4, but it was designed from the
ground up with multiprocessing in mind. Instead of increasing the clock speed to get higher
performance, this processor is normally used in an SMP system. The L1 cache of the PowerPC
970 is split into an instruction cache (i-cache) and a data cache (d-cache). Its instruction
cache is roughly twice the size of that of its predecessors. This is necessary because the longer
pipeline makes the performance penalty for cache misses much higher. When you combine the
32 KB d-cache with the sizable 512 KB L2, the 900 MHz DDR frontside bus, and the support
for up to 8 data prefetch streams, it is clear that this chip was designed for floating-point-
and SIMD-intensive applications.
2.2.3 Miscellaneous Devices

National Semiconductor PC87417 LPC Server I/O device
Generally, the PC87417 is targeted at a wide range of servers and workstations. It provides
support for serial ports, an IEEE 1284 parallel port, a floppy disk controller, a keyboard and mouse
controller, an LPC bus interface, system wakeup control, a real-time clock, and general-purpose I/O
ports.
AMD-8111 HyperTransport I/O Hub
The AMD-8111 HyperTransport I/O Hub replaces what is traditionally called the “Southbridge”.
This device integrates storage, connectivity, audio, I/O expansion, security, and system
management functions into a single component.
Figure 2.2: IBM PowerPC 970 Architecture; 64-bit data, 48-bit addresses (4TB), native 32-bit compatibility;
2LSU, 2IU, 2FPU, 2VPU (VALU+VPERM, 128-bits); up to 212 instructions in flight.
AMD-8131 HyperTransport PCI-X Tunnel
This high-speed device provides two independent high-performance PCI-X bus bridges inte-
grated with a high-speed HyperTransport technology tunnel. This tunnel function provides
connection to other HyperTransport technology devices.
Chapter 3
Firmware Anatomy
The aim of this chapter is to describe the structure of existing boot firmware and all
associated mechanisms. It gives more details of a typical Open Firmware implementation and
explains boot firmware that is based on this standard, such as the Common Hardware Reference
Platform or the RISC Platform Architecture. Because it is necessary to understand
competing implementations, such as Intel’s Extensible Firmware Interface or LinuxBIOS, this chapter
also includes a short overview of existing commercial and open source implementations.
3.1 Open Firmware

Every Open Firmware compliant boot firmware is divided into separate layers, which are
stacked on top of each other. The low-level firmware forms the lowest layer and initializes the
hardware to a consistent state. This layer often implements debugging facilities and service
routines. For example, a serial interface or an optical device can be used to print checkpoint,
status, and error information. Furthermore, this layer sets up all exception handlers and includes
handling for systems with more than one processor as well as some routines for the
client interface of Open Firmware that follows later. Open Firmware is normally started by the low-level
firmware, which loads the Forth system. The Forth system forms the skeleton of
Open Firmware, because the user interface, client interface, and device interface are mostly
written in the programming language Forth. The device interface loads the boot sequence of an
operating system from a storage device and executes it. During the boot phases, the
operating system can obtain hardware and system information from Open Firmware over the
client interface. This information is used to handle the hardware properly.
After the operating system has started completely and has taken control of the system, Open
Firmware is no longer available, because it is overwritten by the operating system during the
boot process.
3.2 Common Hardware Reference Platform

The Common Hardware Reference Platform (CHRP) is a PowerPC hardware platform developed
by Apple, IBM, and Motorola. CHRP is a superset of PReP¹ and was designed in 1996 with
openness of hardware and software in mind: it used many off-the-shelf components and was
supposed to run quite a few operating systems. In addition, any CHRP software which
does not require the Mac ROM, serial ports, or ADB ports should run on PReP machines. The
intention of CHRP is to make it possible for computer vendors to build Macintosh clones as
well as PowerPC based Windows NT computers. To achieve this, CHRP uses Open Firmware
and RTAS to reach a high level of hardware abstraction. During the boot process of the operating
system, RTAS is “initiated” by Open Firmware on request of the operating system, loaded into
memory, and made available to the operating system. RTAS, which stands for Run-Time
Abstraction Services, encapsulates some of the machine-dependent operations for PowerPC
computers in a machine-independent package. The operating system can call RTAS to do
things such as start and stop processors in an SMP configuration, display status indicators,
shut down the system, and read/write NVRAM without having to know the details of how the
low-level functions are implemented on particular platforms. Open Firmware, RTAS, and any
legacy firmware together form a collection often called “System Firmware”.
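The division of labour described above can be sketched as follows: the operating system requests a service by name and never touches the platform-specific implementation behind it. This Python sketch is a simplified model only; the class `Rtas`, the way services are registered, and the `(status, result)` return convention are assumptions made for illustration and do not reproduce the actual RTAS binary calling interface.

```python
# Simplified model of a run-time abstraction layer.
# Class name, registration scheme, and return convention are illustrative assumptions.

class Rtas:
    def __init__(self):
        self._services = {}

    def register(self, name, handler):
        """Firmware side: install a platform-specific handler."""
        self._services[name] = handler

    def call(self, name, *args):
        """OS side: request a service; returns (status, result)."""
        handler = self._services.get(name)
        if handler is None:
            return (-1, None)         # service not offered by this platform
        return (0, handler(*args))


# Platform-specific part: a dictionary stands in for real NVRAM hardware.
nvram = {}

rtas = Rtas()
rtas.register("nvram-store", lambda offset, value: nvram.__setitem__(offset, value))
rtas.register("nvram-fetch", lambda offset: nvram.get(offset, 0))

# Operating-system part: only service names and arguments are visible here.
rtas.call("nvram-store", 16, 0xAB)
print(rtas.call("nvram-fetch", 16))
```

Porting the operating system to another platform would, in this model, only require different handlers to be registered; the calling code stays unchanged.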
Figure 3.1: Open Firmware
Figure 3.2: Common Hardware Reference Platform
3.3 RISC Platform Architecture

The RISC Platform Architecture (RPA) is essentially a combination of its predecessor, the
Common Hardware Reference Platform, and some IBM extensions. This platform architecture
officially came into being in August of 1997. A key benefit of the RPA specification is that
hardware platform developers have degrees of freedom of implementation below
the level of architected interfaces and therefore have the opportunity to add unique value.
In addition, RPA also includes a Hypervisor on top of the low-level firmware layer.
This Hypervisor owns all system resources and provides an abstraction layer through which
device access and control are arbitrated. Because of this, it is possible to run several operating
systems at the same time on one system.
¹ The PowerPC Reference Platform (PReP) was a system standard, designed by IBM, intended to ensure
compatibility among PowerPC based systems built by different companies.
Figure 3.3: RISC Platform Architecture
3.4 Apple Firmware

Apple’s firmware stack is based on Open Firmware. Apple has no RTAS to do hardware
abstraction for the operating system. Instead, Apple implements Platform Expert. Platform
Expert consists of three components, which are placed in the device tree of Open Firmware and
in Mac OS X. Platform Expert Data is stored in the device tree as a sequence of big numbers.
These numbers are properties of nodes and can be fetched by Mac OS X over the client
interface. After Mac OS X has obtained the information, it can be processed by Platform Expert
or Platform Expert Code. The difference between Platform Expert and Platform Expert
Code is that Platform Expert Code implements exclusively machine-dependent operations. One
drawback of the whole Platform Expert concept is that it was designed only for Mac OS X
and is quite inflexible with respect to packaging and maintenance.
Figure 3.4: Apple Firmware
3.5 LinuxBIOS
LinuxBIOS is an open source replacement for the BIOSes found on x86, AMD64, Alpha, and
PowerPC systems. The LinuxBIOS project was started at the Los Alamos National Lab
(LANL) in September 1999 to get better control during boot time in large cluster environments.
The original idea of LinuxBIOS was to load the Linux kernel from the ROM and build a boot
loader on top. Nowadays, it is better described as: “Bring a computer far enough that
it is possible to boot a Linux kernel”. LinuxBIOS initializes the hardware, sets up all exception
vectors, loads an ELF file, and executes it. In other words, it acts like low-level firmware
with an ELF loader included. Because of the ELF loader, LinuxBIOS can load several kinds of ELF
images (hereafter called payloads), which leads to four main scenarios for how LinuxBIOS can
be used.
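The first step of any ELF loader is to validate the file header and find the entry address to jump to. The following Python sketch shows that step for a big-endian ELF64 image, the layout used on 64-bit PowerPC. It is an illustration and not LinuxBIOS code: the function name `elf_entry_point` is hypothetical, and the header used for demonstration is synthetic rather than a real payload.

```python
import struct

# ELF64 file header in big-endian ("MSB") byte order, 64 bytes in total.
ELF64_HEADER = struct.Struct(">16sHHIQQQIHHHHHH")
ELFCLASS64, ELFDATA2MSB = 2, 2


def elf_entry_point(image):
    """Return the entry address of a big-endian ELF64 image."""
    if len(image) < ELF64_HEADER.size:
        raise ValueError("image too short for an ELF64 header")
    fields = ELF64_HEADER.unpack_from(image)
    ident, entry = fields[0], fields[4]   # e_ident and e_entry
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    if ident[4] != ELFCLASS64 or ident[5] != ELFDATA2MSB:
        raise ValueError("only big-endian ELF64 is handled in this sketch")
    return entry


# Build a synthetic header for demonstration (not a real kernel image).
# Values: e_type=2 (executable), e_machine=21 (PowerPC 64-bit), e_entry=0x10000.
ident = b"\x7fELF" + bytes([ELFCLASS64, ELFDATA2MSB, 1]) + bytes(9)
header = ELF64_HEADER.pack(ident, 2, 21, 1, 0x10000, 64, 0, 0, 64, 56, 0, 0, 0, 0)
print(hex(elf_entry_point(header)))
```

A complete loader would additionally copy each loadable program segment to its target address before transferring control to the entry point; that part is omitted here.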
Variation A
This was the original concept of LinuxBIOS. LinuxBIOS replaces the normal BIOS code on
the motherboard with the Linux kernel itself, so that the machine boots into
Linux within seconds of being turned on. Nevertheless, this solution is only useful during
hardware bring-up. The problem is that packaging is inflexible, because every time the kernel
changes it is necessary to rewrite the flash. The next drawback is that, when the flashed
kernel is defective, the complete hardware cannot be used, because either the firmware or the
Linux kernel is broken.
Figure 3.5: LinuxBIOS
Variation B
The idea of this variation is to use separate kernels for the firmware and the Linux system. To
achieve this, the firmware kernel implements a special system call (kexec, LOBOS, or Two Kernel
Monte) which can load and execute another Linux kernel. Depending on the functionality
of the firmware kernel, the kernel for the Linux system can be loaded from a file system
on a hard disk or via the network. This solution may solve the inflexible packaging, but it still has
some other problems. One major problem is that firmware which makes use of this special
system call can only boot Linux. A second problem is that the system call needed to load and
execute a Linux kernel is not available on all platforms. Apart from this, the idea of using
two separate kernels could be a great solution for machines which only need to boot Linux as
their main operating system.
Variation C
Operating systems like Windows 2000 and BSD need old-style PC-BIOS interrupt support during
the boot sequence. To support this functionality, two additional layers are placed on top of
LinuxBIOS. The first layer is a small wrapper program that transfers information from
LinuxBIOS to Bochs BIOS without requiring modifications to Bochs BIOS. This layer is
named Adhesive Loader (ADLO). ADLO is responsible for making sure the ROMs that make
up Bochs BIOS and the VGA BIOS are stored at the expected addresses. It also performs the
task of copying Bochs BIOS from its original location into shadow RAM. Additionally, LinuxBIOS
stores some tables (e.g. memory map, IRQ routing) in a portable format. The problem is that
this format does not conform to the format in which they are stored in a PC-BIOS. ADLO converts these
tables to a format understood by Bochs BIOS. Bochs BIOS was written for the Bochs IA-32
emulation project to emulate an AMI BIOS. The primary job of Bochs BIOS is to set up the
Interrupt Vector Table and supply an entry point for each of its BIOS services. With these two
layers, ADLO and Bochs BIOS, it is possible to boot operating systems which need PC-BIOS
support. This solution is not interesting for PowerPC platforms, because no operating system
on such platforms uses PC-BIOS services.
Variation D
Sometimes a Linux kernel cannot be used to boot another Linux kernel, as is done in
variation B. The problem is mostly that the Linux kernel is too big to fit in the flash memory
or in the BIOS ROM. In such a case LinuxBIOS can boot a boot manager as payload. But this
solution has the problem that every platform has its own boot manager, which is completely
different. Intel machines, for example, use LILO, Grub, or FILO, and PowerPC platforms use
yaboot as boot manager. FILO is a small boot manager which can load boot images from local
file systems without the help of legacy BIOS services, which makes it attractive for porting
to further platforms. It is also possible to use Etherboot as payload to support booting via
the network. Etherboot is a software package for creating ROM images that can download code
over an Ethernet network to be executed on an x86 computer. Many network adapters have
a socket where a ROM chip can be installed. Etherboot is code that can be put in such a
ROM. Etherboot is normally used for booting diskless PCs. A last option is for
LinuxBIOS to load OpenBIOS. OpenBIOS is an open source project which aims at
a 100% IEEE 1275–1994 compliant boot firmware.
3.6 OpenBIOS
OpenBIOS is a free portable firmware implementation. The goal is to implement a 100% IEEE
1275–1994 (referred to as Open Firmware) compliant firmware. Among its features, Open
Firmware provides an instruction set independent device interface. This can be used to boot
the operating system from expansion cards without native initialization code. It is one goal of
OpenBIOS to work on all common platforms, like x86, Alpha, AMD64, and IPF. Additionally,
OpenBIOS targets the embedded systems sector, where a sane and unified firmware is a crucial
design goal. Open Firmware is found on many servers and workstations, and there are several
commercial implementations from Sun, Apple, IBM, CodeGen, and others. Even though
OpenBIOS has made quite some progress with its several components, there is a lot of work
to be done to get OpenBIOS booting an operating system. The basic development environment
is functional, but some parts of the device initialization infrastructure are still incomplete. The
OpenBIOS development environment consists of a Forth kernel (stack-based virtual machine) and an FCode
tokenizer and detokenizer (assembler/disassembler for Forth bytecode drivers).
3.7 Extensible Firmware Interface

The Extensible Firmware Interface (EFI) is Intel’s answer to the need for an interface between the
operating system and the platform firmware. EFI is a modular, platform-independent
architecture that can perform boot and other BIOS functions. It is driver based, clean, scalable, and
modular across different companies and platforms. EFI was mainly designed for IA-32, Intel
Itanium, and Intel XScale platforms. EFI takes the form of data tables that contain
platform-related information, plus boot and runtime service calls that are available to the operating system
loader and the operating system itself.
Figure 3.6: Extensible Firmware Interface
The Boot Services provide an interface for devices and system functionality that can be used
during boot time. Device access is abstracted through handles and protocols. During boot,
system resources are owned by the firmware and are controlled through boot services interface
functions. These functions can be characterized as global or handle-based. Runtime Services
are a minimal set of services which ensure an appropriate abstraction of base platform
hardware resources that may be needed by the operating system during its normal operation after
the boot phases. Besides this, EFI implements a Boot Manager and a Virtual Machine. The
Boot Manager is a firmware policy engine that can be configured by modifying architecturally
defined global NVRAM variables and can load EFI drivers and EFI applications. EFI drivers
are EFI Byte Code programs and run in the EFI Byte Code Virtual Machine. This virtual
machine provides platform- and processor-independent mechanisms to achieve a high level of
abstraction, operating system independence, and exclusive use of EFI Services.
For Intel, the Extensible Firmware Interface is an innovative concept for next-generation
computers, but the idea of a boot firmware that offers services to the operating system at boot
time and at run time, stores platform information, and uses a byte-code driver model is not
completely new. Exactly this behavior was described eight years ago in the IEEE Standard
for Boot (Initialization Configuration) Firmware: Core Requirements and Practices (IEEE
Std. 1275–1994).
Chapter 4
Programming Language “Forth”
The programming language Forth is the basis of every Open Firmware based boot firmware.
Understanding how a Forth system works, both as an interpreter and as a compiler, is necessary
for the following chapters. This chapter presents the elements of a Forth system and
describes the different implementation strategies. At the moment, four open-source
implementations exist that could be used for a slimline boot firmware. To determine which
functionality is necessary and which can be left out, this chapter includes a detailed list of
requirements and discusses advantages and drawbacks.
4.1 Forth Introduction

Programming in Forth demands a different way of thinking from the developer. This is due to
the fact that Forth is an extensible language with an interactive development methodology. For
example, a programmer can implement support for object-oriented programming in Forth itself,
within the running Forth system. The syntax of Forth is extremely
simple and is similar to postfix notation. Forth programs are simple lists of words, where
new words are defined as a sequence of previously defined words. The true power of Forth
lies in the ability to switch between interpretation and compilation mode. Forth systems
use two levels of interpretation: a text interpreter and an address interpreter. The text
interpreter extracts whitespace-separated character strings, which are entered via the keyboard
or read from a file. In interpretation mode the Forth system executes the corresponding word
immediately. A compiled Forth program is a collection of words, each of which contains a
statically allocated list of pointers to other words. In the end, the pointers lead to assembly
language primitives. The Forth address interpreter is used to execute compiled words,
classically implemented as threaded code.
4.2 Elements of Forth

Dictionary
The dictionary contains all executable “words” in a Forth system. Forth words are functionally
analogous to subroutines and equivalent to commands in other languages. A word is created by
a colon definition.
The basic form of such a colon definition is:

: <name> <words to be executed> ;
“:” is a word like any other, but it creates a new entry in the dictionary containing the word
name and places the interpreter in compilation mode. While in compilation mode, the compiler
extracts all words from the input stream and compiles pointers to them into the new word’s
definition in the dictionary. The compilation ends with the word “;”. The dictionary is
traditionally implemented as a linked list with variable-length entries, which are the Forth
words themselves. In interpretation mode, the text interpreter searches the dictionary by
sequentially matching names in the source text against compiled words in the dictionary.
Figure 4.1: Structure of a Dictionary Entry (Indirect Threaded Code); the Control Bit controls the type and
the use of the Definition, the Parameter Field can include compiled Addresses which are used
by the Address Interpreter.
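The linked-list organization of the dictionary can be illustrated with a minimal C sketch. The structure layout and the names dict_entry, dict_add, and dict_find are assumptions made for this illustration only; they are not taken from any of the Forth systems discussed here, and real entries are variable-length rather than fixed-size structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout of one dictionary entry; field names are illustrative. */
struct dict_entry {
    struct dict_entry *link;        /* previously defined word, NULL at the end */
    unsigned char      flags;       /* control bits, e.g. "immediate" */
    const char        *name;        /* name of the word */
    void             (*code)(void); /* code field */
};

/* Add an entry at the head of the list, so that newer words shadow older ones. */
static struct dict_entry *dict_add(struct dict_entry *latest,
                                   struct dict_entry *entry)
{
    assert(entry != NULL);
    entry->link = latest;
    return entry;
}

/* Search sequentially from the newest entry to the oldest one. */
static struct dict_entry *dict_find(struct dict_entry *latest, const char *name)
{
    for (struct dict_entry *e = latest; e != NULL; e = e->link)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}
```

Because the search starts at the most recent entry, redefining a word does not remove the old definition; it merely hides it from later lookups.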
Data Stack
Forth implements a cell-wide push-down LIFO (last-in, first-out) data stack. The purpose
of the data stack is to hold numerical operands for Forth commands. Forth includes several
words to manipulate the data stack, for example to swap, duplicate, or delete stack elements.
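A data stack of this kind might be sketched in C as follows. The cell type, the fixed depth of 64, and the function names are assumptions chosen for the sketch; the stack-effect comments use the usual Forth notation ( before -- after ).

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t cell;      /* one cell: wide enough to hold an address */

#define STACK_DEPTH 64      /* arbitrary depth chosen for this sketch */

static cell data_stack[STACK_DEPTH];
static int  depth;          /* number of cells currently on the stack */

static void push(cell x) { assert(depth < STACK_DEPTH); data_stack[depth++] = x; }
static cell pop(void)    { assert(depth > 0); return data_stack[--depth]; }

/* dup  ( x -- x x ) */
static void forth_dup(void)  { cell x = pop(); push(x); push(x); }

/* drop ( x -- ) */
static void forth_drop(void) { (void)pop(); }

/* swap ( x1 x2 -- x2 x1 ) */
static void forth_swap(void)
{
    cell x2 = pop();
    cell x1 = pop();
    push(x2);
    push(x1);
}
```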
Return Stack
The return stack is implemented like the data stack; it is also a cell-wide push-down
LIFO stack. Because the system itself depends on its contents, programs manipulate it only in
a restricted way (ANS Forth provides words such as >R and R> for this purpose). The main tasks
of the return stack are to hold return addresses and loop parameters, and to save temporary
data and the interpreter pointer.
Text Interpreter
Every command typed by a user, read from stored source code on a disk, or evaluated from
a string is executed by the text interpreter. The first step of this interpreter is to parse
the given string. This is done by skipping leading spaces and parsing it with space (ASCII
0x20) as delimiter. Then the dictionary is searched for a definition which matches the current
token received from the parsed string. When a match occurs, the text interpreter performs
the interpretation or compilation behaviors of the definition. If no match is found, the text
interpreter tries to convert the token to a binary number. After successful conversion the
number is placed on the stack, otherwise the word “abort” is executed.
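The steps described above (parse a token, search the dictionary, fall back to number conversion, otherwise abort) can be sketched in C. This is a simplified illustration that covers interpretation mode only; the two-word table stands in for a real dictionary, the function name interpret is an assumption, and a return value of -1 marks the point where a real system would execute “abort”.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STACK_DEPTH 32
static long stack[STACK_DEPTH];
static int  depth;

static void push(long x) { assert(depth < STACK_DEPTH); stack[depth++] = x; }
static long pop(void)    { assert(depth > 0); return stack[--depth]; }

static void do_add(void) { long b = pop(); long a = pop(); push(a + b); }
static void do_dup(void) { long a = pop(); push(a); push(a); }

/* A tiny stand-in for the dictionary. */
static const struct { const char *name; void (*code)(void); } words[] = {
    { "+", do_add }, { "dup", do_dup },
};

/* Interpret one line; returns 0 on success, -1 where a real system would abort. */
static int interpret(const char *line)
{
    char buf[128];
    if (strlen(line) >= sizeof buf)
        return -1;
    strcpy(buf, line);

    /* strtok skips leading spaces and splits at the space delimiter. */
    for (char *tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " ")) {
        size_t i, n = sizeof words / sizeof words[0];
        for (i = 0; i < n; i++)
            if (strcmp(tok, words[i].name) == 0)
                break;
        if (i < n) {                /* found: execute the word */
            words[i].code();
            continue;
        }
        char *end;
        long value = strtol(tok, &end, 10);
        if (*end != '\0') {         /* neither a word nor a number */
            depth = 0;              /* an abort clears the data stack */
            return -1;
        }
        push(value);
    }
    return 0;
}
```

In this sketch, interpreting the line "2 3 +" leaves the single value 5 on the stack.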
Address Interpreter
The internal engine of a Forth system is referred to as the address interpreter and is distinct
from the text interpreter, which processes source code and user input. The text interpreter
extracts strings separated by spaces and checks whether each word is in the dictionary. If the
word is found, it is executed by the address interpreter, which processes all addresses compiled
in the parameter field of a word definition by executing the definitions they point to.
The address interpreter has two important properties. First, it is fast, often requiring as few
as one or two machine instructions per address. Second, it makes Forth definitions extremely
compact, as each reference requires only one cell.
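An address interpreter in the indirect-threaded style could look roughly like the following C sketch. The struct layout and all names are assumptions for illustration; a portable C loop is used here, whereas a hand-written assembler version is what reaches the one or two machine instructions per address mentioned above. The word double is compiled by hand to show what the compiler would produce for “: double dup + ;”.

```c
#include <assert.h>
#include <stddef.h>

struct word;
typedef void (*code_fn)(struct word *self);

struct word {
    code_fn       code;   /* code field: routine that executes this word */
    struct word **body;   /* parameter field: compiled addresses (colon words) */
    long          value;  /* parameter field of literal words in this sketch */
};

static long          dstack[32];
static int           dsp;
static struct word **rstack[32];
static int           rsp;
static struct word **ip;           /* interpreter pointer */

static void push(long x) { assert(dsp < 32); dstack[dsp++] = x; }
static long pop(void)    { assert(dsp > 0); return dstack[--dsp]; }

/* Code field routines. */
static void do_colon(struct word *self) { assert(rsp < 32); rstack[rsp++] = ip; ip = self->body; }
static void do_exit(struct word *self)  { (void)self; assert(rsp > 0); ip = rstack[--rsp]; }
static void do_lit(struct word *self)   { push(self->value); }
static void do_add(struct word *self)   { (void)self; long b = pop(); long a = pop(); push(a + b); }
static void do_dup(struct word *self)   { (void)self; long a = pop(); push(a); push(a); }

/* Run one word to completion. */
static void execute(struct word *w)
{
    int base = rsp;
    w->code(w);
    while (rsp > base) {           /* until the outermost colon word has exited */
        struct word *next = *ip++; /* fetch the next compiled address */
        next->code(next);          /* jump indirectly through its code field */
    }
}

/* : double  dup + ;   compiled by hand */
static struct word  w_exit = { do_exit, NULL, 0 };
static struct word  w_add  = { do_add,  NULL, 0 };
static struct word  w_dup  = { do_dup,  NULL, 0 };
static struct word *double_body[] = { &w_dup, &w_add, &w_exit };
static struct word  w_double = { do_colon, double_body, 0 };
```

Each reference in double_body occupies exactly one pointer, which is the compactness property described above.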
Data Types and Defining Words
The primary unit of information (and almost the only data type) in the architecture of a Forth
system is the cell. A cell has the word length of the processor and is also the size of an address
and of a single item on a stack. It can be a flag, character, number, execution token,
or address, which means that Forth systems do not have compiler services like type checking,
macro preprocessing, or common subexpression elimination. Forth also provides a basic set of
words used to define objects of various kinds. As with other features of Forth, the set of such
commands may be expanded. A word is defined when an entry is created in the dictionary.
CREATE is the basic word that does this; it may be used by VARIABLE, CONSTANT, and other
defining words to perform the initial functions of setting up the dictionary entry.
CREATE <name> Constructs a dictionary entry for name. Execution of name will return the
address of its data space. No data space is allocated for name, however; this must be
done by subsequent actions such as ALLOT.
: <name> Creates a definition for name, called a colon definition. Enter compilation state
and start compiling the definition. The execution behavior of name will be determined
by the previously defined words that follow, which are compiled into the body of the
definition. name cannot be found in the dictionary until the definition is ended. At
execution time, the stack effects of name depend on its behavior.
VARIABLE <name> Defines a single-cell variable. Execution of name will return the address of
its data space.
CONSTANT <name> Defines a single-precision constant name whose value is x, the item taken from
the top of the data stack when the constant is defined.
DEFER <name> Defines name to be an execution variable. When name is executed, the execution
token stored in name’s data area will be retrieved and the behavior associated with that
token will be performed.
VALUE <name> Defines a single-precision data object name whose initial value is x, the item
taken from the top of the data stack when the object is defined.
4.3 Implementation Strategies

Different models exist to implement the Forth virtual machine. These models are:
• Indirect-Threaded Code:
This was the original design, and remains the most common method. Pointers to
previously defined words are compiled into the execution word’s parameter field. The code
field of the execution word contains a pointer to machine code for an address interpreter,
which sequentially executes those definitions by performing indirect jumps through the
instruction pointer, which is used to keep its place. When a definition calls another
definition, the current instruction pointer is pushed onto the return stack; when the called
definition is finished, the saved instruction pointer is popped off the return stack.
• Direct-Threaded Code:
In this model, the code field contains the actual machine code for the address interpreter,
instead of a pointer to it. This is somewhat faster, but typically costs extra bytes for
some classes of words. It is most prevalent on 32-bit systems.
• Subroutine-Threaded Code:
In this model, the compiler places a jump-to-subroutine instruction with the destination
address in-line. On 16-bit systems, this technique costs extra bytes for each compiled
reference. It is often slower than direct-threaded code, but it is an enabling technique
that allows the progression to native code generation.
• Native Code Generation:
Going one step beyond subroutine-threaded code, this technique generates in-line machine
instructions for simple primitives, such as “+”, and jumps to other high-level routines.
The resulting code can be much faster, at the cost of size and compiler complexity.
Native code can also be more difficult to debug than (indirect-)threaded code.
• Token Threading:
This technique compiles references to other words by using a token, such as an index into a
table, which is more compact than an absolute address. Apart from that, such an
implementation is equivalent to an indirect-threaded model.
4.4 Requirements
Various Forth systems exist on the market. They differ in threading, design, implementation,
implementation language, and complexity. To choose an appropriate Forth system for the
prototype implementation it is necessary to define some requirements first.
1. The Forth system should use indirect threading. Indirect threading is less efficient
than direct threading, but it is easier to debug, because in indirect-threaded implementations
the code field can support non-primitives, as is done for variables. A further reason is
that dictionary entries contain no machine code for primitives.
2. C should be used as the programming language for the Forth system. A Forth system could
also easily be implemented in assembler, but assembler code is harder to maintain than
C code, and languages like C++ or Java still mean too much overhead for firmware
development compared with a pure C implementation.
3. The design should be simple, and it must be possible to implement custom extensions.
4. It should be a fully ANS Forth compliant implementation and must be distributed under
an open-source license (e.g. GPL or BSD).
4.5 Forth Systems

Gforth
Gforth is a fast and portable implementation of the ANS Forth language. It offers some nice
features such as input completion and history, backtraces, a decompiler, and support for local
variables, and is well documented. Gforth combines traditional implementation techniques
with newer techniques for portability and performance: its inner interpreter is direct threaded
with several optimizations, but it is also possible to use a traditional-style indirect threaded
interpreter. Gforth is distributed under the GNU General Public License.
Portable Forth Environment
The Portable Forth Environment (PFE) is based on the ANSI Standard for Forth. The PFE
was created by Dirk-Uwe Zoller and was maintained up to the 0.9.x versions (1993-1995).
Tektronix adopted the PFE package in 1998 and made a number of extensions.
It is now fully multithreaded and features a module system. It is possible to load
additional C objects at runtime to extend the Forth dictionary. It is best suited to embedded
environments, since the terminal driver and the initialization routines can easily be changed.
Ficl
Ficl is a programming language interpreter designed to be embedded into other systems as a
command, macro, and development prototyping language. The name is an acronym for “Forth
Inspired Command Language”. According to its developers, it is easy to port, easy to integrate,
and fast, and it is distributed under a BSD-style license. Ficl is also compliant with ANS Forth
and has a small memory footprint.
Paflof
Paflof is a fully ANS Forth compliant Forth system and is portable to nearly every system. It
was created by Segher Boessenkool and is distributed under the BSD license. The current
implementation of the virtual machine is very clean and small. It fits uncompressed into less
than 40k of flash memory. Paflof needs Perl to create the initial dictionary and preferably a C99
compliant compiler which supports the restrict keyword and C++ style comments. It can
also run hosted in the user space of a UNIX-style operating system. It is extensible, too:
primitives to read and write processor registers, for example, can easily be implemented. These
properties make Paflof an ideal base for a slimline, Open Firmware based boot firmware
implementation.
4.6 Advantages and Range of Application

No single solution exists for embedded programming. Projects differ too widely in scale.
Real-time signal generators may need hand-optimized program code, but these programs take
only a few hundred lines of code. For such applications, assembly language is the only way
to go. Other jobs require extensive user interfacing and hundreds of thousands of lines of
code. There, the most economical solution is to program in C on top of an operating system.
Alongside these two approaches, Forth has found its place. Forth is not a new language. It has
been commercially available for over 25 years and has its own ANSI standard, but it is not
widely used. There are probably fewer than a hundred full-time Forth programmers in the country.
Nevertheless, the programming language Forth is not out of date, because of the following
advantages:
1. Forth remains one of the few environments which are totally comprehensible by one person.
This is a big plus for developers who work on safety-critical systems.
2. Forth makes the best of a slow microprocessor with little RAM. Embedded systems
mostly include such a processor, without 16 MB of RAM or hard disk support. In
such scenarios Forth can be an appropriate solution.
3. There is no substitute for an interactive interpreter when it comes to debugging and program
development in embedded systems. A compile-test cycle often takes more than 10
seconds, which is clumsy. In Forth a subroutine can be written and tested instantly. Besides
this, it is possible to include simple features in Forth to read and write processor registers
or memory.
4. Forth is an extensible programming language. This means that if the language does not
support some features or capabilities which are necessary, it is easily possible to add
them, not as subroutines, but as a part of the programming language itself.
Because of all these benefits, Forth is not only used in Open Firmware. The NASA Goddard
Space Flight Center uses it for spacecraft flight system controllers, on-board payload
experiment controllers, ground support systems (e.g., communications controllers and data
processing systems), and to test flight and ground systems¹. Furthermore, Forth is used in a
portable assistive and therapeutic communication device for people with aphasia, which was
developed by the Rehab R&D Center², and in a computer-controlled electromechanical
fingerspelling hand that offers deaf-blind individuals access to computers, communication
devices, and person-to-person conversations³. These are just a few examples of where Forth is
still in use. It is a programming language which is still alive and quite a good environment,
not only for embedded systems.
¹ http://forth.gsfc.nasa.gov/
² http://guide.stanford.edu/Projects/CommlProd.html
³ http://guide.stanford.edu/TTran/ttralph.html
Chapter 5
Linux/PPC64 Boot Procedure
Like every program, the Linux kernel must pass through a load and initialization phase before
the real work can be done. While this first phase is quite unspectacular in normal applications,
the kernel, as the central layer of the system, is confronted with some exceptional problems.
The boot process itself can be divided into three phases:
• Loading the kernel into RAM and setting up a minimal runtime environment.
• Jumping into the platform-dependent machine code of the kernel to perform the
system-specific initialization of all elementary functions.
• Jumping into the platform-independent initialization code, which completes the
initialization of all subsystems and is followed by a changeover to normal operation.
For firmware development, the first and second phases are important, because in these phases
the kernel communicates with the firmware or processes firmware data. Concentrating on
these phases is needed to get a better understanding of Open Firmware, firmware services, and
the firmware concepts that follow later.
5.1 Linux/PPC64 Overview

The design of Linux/PPC64 targets execution on all of IBM’s recent platforms that use
the 64-bit PowerPC processor, including both pSeries and iSeries systems. On pSeries
systems, Linux runs directly on the hardware, whereas on iSeries it runs in logical partitions.
With this feature multiple instances of Linux/PPC64 can run on one system. The kernel
implementation has been designed to run in an iSeries logical partition or natively on a pSeries
system. The Linux/PPC64 kernel implements two data structures which are used to store
processor and system-wide information. The first, the paca (processor address communication
area), contains information unique to each processor; therefore an array of pacas is created,
one for each logical processor. The paca is mostly used to provide save locations during
interrupt processing. The second data structure is the naca (node address communications area).
This data structure is used to hold system-wide information such as the number of processors in
a system or a partition, the size of real memory available to the kernel, and cache
characteristics. In addition, this data structure also contains a field that points to a data
area used by the hypervisor to transfer system configuration data to the kernel.
The early phases of boot and initialization differ between pSeries and iSeries platforms. For
the implementation of a slimline boot firmware only the pSeries kernel is of interest, because
this kernel is closer to a hypervisor-less implementation than the iSeries kernel. First, the
pSeries kernel is loaded by a bootloader (e.g. Yaboot, or directly by Open Firmware) into a
contiguous block of real memory and gets control with relocation disabled. Initialization code
in the kernel interacts with Open Firmware to accomplish the following tasks:
1. Determine the system configuration (e.g. real memory and device tree).
2. Instantiate the Run-Time Abstraction Services.
3. Move secondary processors from spinning in Open Firmware to spinning in a kernel loop.
This initialization code then relocates the kernel to the real address 0x0, creates a kernel stack
and the TOC, builds initial hardware page tables and segment page tables, and initializes the
naca pointers. The naca is always located at a fixed real address (0x4000) in order to facilitate
debugging. Table 5.1 shows the complete physical memory layout. Finally, relocation is enabled
and the common pSeries and iSeries code is executed.
Real address range   Contents
0x0000 – 0x00ff      Secondary processor spin code
0x0100 – 0x2fff      pSeries interrupt prologs
0x3000 – 0x3fff      Interrupt support
0x4000 – 0x4fff      naca
0x5000 – 0x5fff      systemcfg
0x6000               iSeries and common interrupt prologs
0x9000 – 0x9fff      Initial segment table
Table 5.1: Physical Memory Layout
5.2 Low-Level Support and Setup

All files needed for the complete Linux kernel low-level support and setup are stored in the
directory arch/ppc64/kernel. Hereafter, only the pSeries platform is described, because its
design is closer to a hypervisor-less implementation and is therefore a good starting point for a
bare metal Linux/PPC64 implementation. The file head.S contains the low-level support
and setup for Linux/PPC64 platforms, including trap and interrupt dispatch. On entry to
this code the following assumptions are made:
1. The MMU is off and Open Firmware is running in real mode.
2. The kernel is entered at __start.
The entry point of the Linux/PPC64 kernel is __start. On pSeries platforms a branch to the
label __start_initialization_pSeries is taken. As its first task, this code fragment saves all
parameters (client interface handler and client program arguments) which were passed from
Open Firmware to the client program. Then 64-bit mode is enabled and a relocation offset is put
in r3. This relocation offset is necessary because the PPC64 Linux kernel is not yet running at
its target address (KERNELBASE), since Open Firmware still occupies the low address region.
The Linux/PPC64 kernel needs to communicate with Open Firmware and obtain information from
it, so during this time it must share one memory space with it. Next, the function prom_init
is executed. This function performs all interaction with the Open Firmware client interface (see
Section 5.3 for more information).
After the communication with Open Firmware is finished, a branch to the label
__970_cpu_preinit is taken, which sets up some critical PowerPC 970 SPRs before the MMU is
switched off. Now the Linux kernel is copied from the address where it is currently running to
its target address at KERNELBASE. This is done in two steps with the functions copy_and_flush
and copy_to_here. This procedure overwrites the Open Firmware exception vectors, and the main
kernel code begins execution. The segment table (stab_initialize) and the hashed page
table (htab_initialize) are initialized to get an initial memory mapping. Both functions need
initialized systemcfg and naca pointers. The kernel now branches to start_here_common,
where execution converges for all platforms and which sets up the systemcfg and paca
pointers. Last of all, setup_system (common boot and setup code) is executed, followed by
start_kernel.
With this, the end of the platform-dependent initialization is reached. start_kernel acts
as a dispatcher function and executes platform-dependent and platform-independent code. This
function mainly calls the high-level initialization routines of all subsystems and, as its first
job, prints the Linux startup banner. The boot process is not described any further here,
because the kernel has overwritten Open Firmware and is in its high-level initialization phase.
Figure 5.1: Linux/PPC64 Boot Procedure
5.3 Interfacing to Open Firmware

All procedures for interfacing to Open Firmware are stored in the file prom.c. This file
includes the function prom_init, which is called by __start_initialization_pSeries, defined
in head.S. prom_init is called very early, before the kernel text and data have been mapped
to KERNELBASE, so references to extern and static variables must be relocated explicitly. Open
Firmware may have mapped I/O devices into the area starting at KERNELBASE, particularly
on CHRP machines. This means that it is not possible to call Open Firmware safely once
the kernel has been mapped to KERNELBASE. Therefore, all Open Firmware calls must be
made within prom_init, and all routines called within it must be relocated. prom_init calls
prom_init_client_services, which initializes the interface to Open Firmware. All subsequent
functions that use the client interface services need this initialization. The standard
output device is initialized via prom_init_stdout in order to print debug and error messages
through Open Firmware. After that, the Linux/PPC64 kernel stores all system-wide and
processor-specific information in the naca data structure. Some of this information is obtained
with prom_initialize_naca. If the kernel is running on an SMP (symmetric multiprocessing)
machine, some extra handling is necessary for the additional processors. For example, Open
Firmware and the complete low-level initialization of the kernel run on only one CPU.
If a machine has two CPUs, the second CPU waits in a slave loop until it is
released at the request of the first one. The Linux kernel takes control of the second CPU, which
is spinning in Open Firmware, with the function prom_hold_cpus. This function executes a
client interface call that tells Open Firmware that the additional CPUs should stop spinning in
Open Firmware and continue execution at the given address. In the case of the
Linux kernel this address is the location of a second slave loop in kernel space. That means
that the additional CPUs are released from the Open Firmware slave loop and placed into another
slave loop, which is under the control of the Linux kernel. The last job is to copy
the whole device tree (copy_device_tree). After everything is done, the function prom_init
returns and the Open Firmware client interface services can no longer be used.
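How a client program passes a request to Open Firmware can be illustrated with a hosted C sketch. It follows the general shape of the IEEE 1275 client interface, in which the caller fills an argument array (service name, number of arguments, number of return values, then the argument and return cells) and calls the firmware entry point with its address. The names prom_args, call_firmware, and mock_firmware, as well as the handle value 0x1234, are assumptions made for this illustration; the sketch is not the kernel's own implementation in prom.c.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t prom_cell;

/* Argument array in the style of the IEEE 1275 client interface. */
struct prom_args {
    prom_cell service;    /* pointer to the service name string */
    prom_cell nargs;      /* number of input arguments */
    prom_cell nret;       /* number of return values */
    prom_cell args[10];   /* input arguments followed by return values */
};

typedef int (*prom_entry)(struct prom_args *);

/* Marshal the arguments, call the firmware entry point, and return the
   first return cell (or -1 if the call itself failed). */
static prom_cell call_firmware(prom_entry entry, const char *service,
                               int nargs, int nret, ...)
{
    struct prom_args pa;
    va_list ap;

    assert(nargs >= 0 && nret >= 0 && nargs + nret <= 10);
    pa.service = (prom_cell)service;
    pa.nargs = (prom_cell)nargs;
    pa.nret = (prom_cell)nret;
    va_start(ap, nret);
    for (int i = 0; i < nargs; i++)
        pa.args[i] = va_arg(ap, prom_cell);
    va_end(ap);
    for (int i = 0; i < nret; i++)
        pa.args[nargs + i] = 0;

    if (entry(&pa) != 0)
        return (prom_cell)-1;
    return nret > 0 ? pa.args[nargs] : 0;
}

/* Mock firmware for hosted testing: implements only "finddevice". */
static int mock_firmware(struct prom_args *pa)
{
    const char *service = (const char *)pa->service;
    if (strcmp(service, "finddevice") == 0 && pa->nargs == 1 && pa->nret == 1) {
        const char *path = (const char *)pa->args[0];
        pa->args[1] = strcmp(path, "/chosen") == 0 ? 0x1234 : (prom_cell)-1;
        return 0;
    }
    return -1;            /* unknown service */
}
```

In the real kernel the entry point is the client interface handler saved at startup, and every argument must be passed as a full cell, which is why the sketch casts the path string to prom_cell at the call site.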
Figure 5.2: Call Tree for prom_init; API functions like prom_print, prom_print_nl, prom_print_hex,
call_prom, prom_panic, etc. are not included.
Chapter 6
Slimline Prototype Firmware
This chapter describes a first concept of a slimline boot firmware which is based on Open
Firmware. In this blueprint, which is still partial in design and open to further development,
the attention is put on packages and their functionality. The aim is to show which packages are
needed in the low-level firmware and in Open Firmware. The concept takes into account the
technologies introduced in Chapters 2, 3, and 4 and the Linux/PPC64 boot process illustrated in
Chapter 5. The structure of the slimline prototype firmware stack looks similar to that of Open
Firmware, see Section 3.1. The first level is formed by the low-level firmware, followed by Open
Firmware and, on top, the operating system. Components like RTAS or a hypervisor
are not included, because they would increase the size and complexity of the whole
firmware stack. To achieve hardware abstraction, the prototype uses “Agnostic Device Drivers”.
This new technology is described in detail in Chapter 7 and avoids synchronous and
high-latency call paths. Furthermore, this technology enables new ways of packaging firmware
code. The prototype uses the Forth engine “Paflof” as its basis and as the execution
environment for Open Firmware. This Forth engine is described in Section 4.5.
6.1 Low-Level Firmware

The idea is that the low-level firmware layer hides the complexity of the hardware. Open
Firmware should be able to simply request handling for SMP systems or communication with the
service processor without having to worry about exactly which bits have to be wiggled in what
order. The layer includes a package of low-level routines and libraries that have been designed
to help developers rapidly bring up Open Firmware on PowerPC based development platforms,
such as the IBM JS20 or the Momentum 970 Evaluation Platform. In addition, its interface to Open
Firmware hides and encapsulates intellectual property which should not be given to Open
Firmware developers. For the Open Firmware layer that follows, the low-level firmware performs
some early system configuration, e.g. memory setup and initialization. In short, the
low-level firmware does all the jobs that are needed to start the basis of Open Firmware: the
Forth engine.
Figure 6.1: Low-Level Firmware
6.1.1 Control Flow
As its main job, the low-level firmware brings the machine into a state consistent enough
that a Forth engine can be loaded and executed. As an additional task, the low-level firmware
must check whether functional impairments exist and must handle these failures properly. The
first task is to make sure that only one CPU continues with execution. All remaining CPUs
must be placed in a loop until they are released. Once this is done, the next task initializes
the serial port in order to print checkpoint and debug information or error codes. The serial
port is normally the only way to print information at such an early stage, because handling it
is quite easy and quickly implemented. The bootstrap component can now load code into the
L2 cache and execute it from there. This code configures and initializes the memory and tests it
for bad memory regions. If everything went well, it copies the rest of the executable code into
memory and continues execution there. The low-level firmware now sets up GPIOs and I2C buses
and devices. Finally, it establishes an interface which can encapsulate intellectual property
or system services. This interface is necessary, for example, to release all looping processors.
During the whole execution time, the low-level firmware uses auxiliary components to read and
write on the PCI bus, to handle the SPU, to talk to a watchdog, or to read and write processor
registers.
6.1.2 Basic Components

SMP Handling
Component: SMP Handling
Description:
This package must handle the additional processors in an SMP system. These processors can
be placed in sleep mode or put into a spinning loop. The package must also communicate with
the interface to Open Firmware to release all spinning and sleeping processors. The most
important function of this package is to make sure that only one CPU continues
with the execution of the complete firmware code.
Functions:
master_execution
slave_loop
slave_free
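One conceivable way to realize these three functions is sketched below in hosted C11. The signatures, the cpu_id parameter, and the use of C11 atomics are assumptions of this sketch; the thesis only names the functions, and real firmware on Power hardware would typically rely on processor-specific means (for example load-reserve/store-conditional sequences) rather than on a C library.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int master_cpu = -1;   /* -1: no master elected yet */
static atomic_int release_flag;      /* set by the master to release the slaves */

/* Returns 1 for exactly one caller (the master) and 0 for all others. */
static int master_execution(int cpu_id)
{
    int expected = -1;

    assert(cpu_id >= 0);
    return atomic_compare_exchange_strong(&master_cpu, &expected, cpu_id);
}

/* Called by the master once the slaves may continue. */
static void slave_free(void)
{
    atomic_store(&release_flag, 1);
}

/* Slaves spin here until the master releases them. */
static void slave_loop(void)
{
    while (!atomic_load(&release_flag))
        ;   /* real firmware would spin at low priority or sleep here */
}
```

Each processor would call master_execution with its own identifier at startup; the one that receives 1 continues with the boot sequence, and all others enter slave_loop.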
Serial Port
Component: Serial Port
Description:
The serial port package is needed to print and receive information over the serial port.
During development it is necessary, because it is the only way to print error information.
Later on, it is used to print checkpoint information at such an early stage. Furthermore,
it is also used to select one of several startup modes. For example, when “v” is pressed during
startup, the firmware enters a special verbose mode and can print more information
or debug output.
Functions:
serial_init
serial_write_byte
serial_write_word
serial_write_long
serial_write_double
serial_write_hex
serial_write_cp
serial_write_ec
serial_write_di
serial_write_nl
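As an illustration of how small such output routines can be, the following hosted C sketch implements three of the listed functions. The signatures and the 16-digit output format are assumptions; the transmit register of a real UART is replaced here by a buffer (tx_log) so that the code can be run and checked on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the UART transmit register: output is captured in a buffer. */
static char   tx_log[64];
static size_t tx_len;

static void serial_write_byte(char c)
{
    assert(tx_len < sizeof tx_log - 1);
    tx_log[tx_len++] = c;
    tx_log[tx_len] = '\0';
}

/* Print a 64-bit value as 16 hexadecimal digits, most significant digit first. */
static void serial_write_hex(uint64_t value)
{
    static const char digits[] = "0123456789abcdef";

    for (int shift = 60; shift >= 0; shift -= 4)
        serial_write_byte(digits[(value >> shift) & 0xf]);
}

/* Print a line break as carriage return plus line feed. */
static void serial_write_nl(void)
{
    serial_write_byte('\r');
    serial_write_byte('\n');
}
```

No division and no library call is needed, which matters at a stage where neither memory nor a C runtime is fully available.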
Bootstrap
Component: Bootstrap
Description:
At startup it is not possible to use the memory, because the hardware is not yet properly
initialized or configured. The job of the bootstrap code is to copy firmware code into the
processor cache. This code initializes and tests the memory. Later on, it copies
the rest of the firmware from NVRAM to memory and begins with the execution of
the whole low-level firmware.
Functions:
copy_to_cache
execute_from_cache
copy_to_memory
execute_from_memory
Memory
Component: Memory
Description:
Some helper functions to initialize, read, write, and test the memory are implemented in
this component. Here, it is possible to run different test and error patterns to check whether
some regions of the memory are defective.
Functions:
mem_configure
mem_init
mem_test
write_8
write_16
write_16_le
write_32
write_32_le
write_64
read_8
read_16
read_16_le
read_32
read_32_le
read_64
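The difference between the plain and the “le” accessors, and a very simple pattern test, can be sketched in C as follows. The signatures are assumptions, as is the interpretation that the plain variants use big-endian byte order (the native order of the targeted Power systems) while the “le” variants use little-endian order. The accessors work byte by byte so that the sketch behaves identically on any host; the pattern test is destructive and far simpler than a production memory test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit store and load in big-endian byte order. */
static void write_16(volatile uint8_t *addr, uint16_t v)
{
    addr[0] = (uint8_t)(v >> 8);
    addr[1] = (uint8_t)(v & 0xff);
}

static uint16_t read_16(const volatile uint8_t *addr)
{
    return (uint16_t)((addr[0] << 8) | addr[1]);
}

/* 16-bit store and load in little-endian byte order. */
static void write_16_le(volatile uint8_t *addr, uint16_t v)
{
    addr[0] = (uint8_t)(v & 0xff);
    addr[1] = (uint8_t)(v >> 8);
}

static uint16_t read_16_le(const volatile uint8_t *addr)
{
    return (uint16_t)((addr[1] << 8) | addr[0]);
}

/* Destructive pattern test: returns -1 if the region is good, otherwise
   the offset of the first byte that did not read back correctly. */
static long mem_test(volatile uint8_t *base, size_t len)
{
    static const uint8_t patterns[] = { 0x55, 0xaa };

    assert(base != NULL);
    for (size_t p = 0; p < sizeof patterns; p++) {
        for (size_t i = 0; i < len; i++)
            base[i] = patterns[p];
        for (size_t i = 0; i < len; i++)
            if (base[i] != patterns[p])
                return (long)i;
    }
    return -1;
}
```

The alternating patterns 0x55 and 0xaa toggle every bit of each byte once, which detects stuck bits but not address-line faults.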
I/O
Component: I/O
Description:
The I/O package sets up GPIO and rudimentary input and output devices.
Functions:
io_setup
I2C
Component: I2C
Description:
To send and receive messages on the I2C bus it is necessary to implement some functions.
This package initializes all I2C buses and devices. It also implements core functions to
send and receive messages.
Functions:
i2c_init
i2c_send
i2c_recv
IP Interface
Component: IP Interface
Description:
The IP interface can hide intellectual property from Open Firmware programmers. For
example: Special protocols to talk with the service processors are implemented in this
package.
6.1.3 Auxiliary Components

Processor
Component: Processor
Description:
This package implements functions to read and write processor registers. Such registers
could be normal GPRs or FPRs, or special (undocumented) registers that change the
behavior of the whole processor.
Exception Handling
Component: Exception Handling
Description:
The exception package includes all exception handlers of the low-level firmware.
PCI
Component: PCI
Description:
To read from and write to the PCI bus, it is necessary to implement functionality which
initializes the PCI bus. This package is also useful for reading and writing bytes or words in big
and little endian format. Furthermore, it could include code to walk over the PCI bus or
probe all PCI devices.
Functions:
pci write 8
pci write 16
pci write 32
pci write 64
pci read 8
pci read 16
pci read 32
pci read 64
SPU
Component: SPU
Description:
The SPU package includes all code which handles, for example, power management via the
service processor. It is necessary to talk quite often with a service processor for different
purposes. The protocol for this communication is stored here and can also be used for
enabling and disabling the watchdog.
Functions:
enable watchdog
disable watchdog
reboot
halt
suspend
manage spu
6.2 Open Firmware

Open Firmware is a portable boot firmware system. Boot firmware is the ROM-based software
that controls a computer from the time that it is turned on until the primary operating system
has taken control of the machine. The main function of boot firmware is to initialize the
hardware and then to “boot” the primary operating system. Secondary functions include
testing the hardware, managing hardware configuration information, and providing tools for
debugging in case of faulty hardware or software. Open Firmware is portable in the sense,
that its design is not tied to any particular processor family, nor to any particular expansion
bus. For more information on Open Firmware see Section 2.1.
6.2.1 Control Flow
Open Firmware is not directly executed by the low-level firmware. The low-level firmware
actually loads and executes a small wrapper component. This wrapper copies all needed
exception vectors and the Forth engine to a specified address in memory. If everything
was copied successfully, the wrapper starts executing the Forth engine. The wrapper
is needed because start addresses or interfaces could differ in every low-level firmware.
After that, the Forth engine begins to execute the Forth code which implements most of the Open
Firmware functionality. The first task of this Forth code is to initialize the serial port or the
frame buffer device to obtain an output channel. This code can also set the serial port as the input
device. With this option, the Forth engine can be programmed or used interactively over
the serial port to debug errors or to set up Open Firmware environment variables. The device
interface component also includes code to build the device tree, acquire the boot mode, and
start the boot process of a client program. The device tree is created in two stages. The
first stage executes code which inserts hard-coded information into the device tree and the
second stage executes code which inserts information dynamically. For example, to get
this information the whole PCI bus can be probed and every device found can be integrated
into the device tree with its properties. Furthermore, the device interface can execute an FCode
program which resides on the PCI device itself. This code can identify the device or insert
information into the device tree. Next, the device interface can load an ELF image from
the network or a hard disk. To realize this functionality, additional packages are used, as shown in
Figure 6.2. The last job is to load this ELF image with an ELF loader and execute it. This
ELF image could be the Linux kernel. Now, the client program can communicate with Open
Firmware over the client interface to get the device tree, etc. Once this is done, the
client program overwrites Open Firmware and gets complete control of the machine.
6.2.2 Basic Components

Wrapper
Component: Wrapper
Description:
Because every low-level firmware could be different, it is useful to have a
wrapper. This wrapper can handle different addresses where the low-level firmware is placed and
different target addresses where the low-level firmware wants to start the Forth system,
which forms the beginning of the Open Firmware execution environment.
Low-Level Startup Code

Component: Low-Level Startup Code
Description:
The low-level startup code copies all exception vectors of Open Firmware. It can copy
code and data sections, too.
Forth Engine
Component: Forth Engine
Description:
This component builds the skeletal structure of Open Firmware – the Forth system.
Without such a package, no Forth code could be interpreted, compiled, and executed.
Serial Port
Component: Serial Port
Description:
Like the serial port component in the low-level firmware, this package is needed to print
and receive information over the serial port. The only difference is that this package
initializes the serial port in a more effective way and sets it as the standard input and output
device.
Functions:
>serial
serial!
serial@
serial-emit
serial-key
serial-init
serial-fini
Frame Buffer
Component: Frame Buffer
Description:
The frame buffer component can be used to print information over channels other than the serial
port. With this component it is possible to print information via the graphics card,
too.
Functions:
>fb
fb!
fb@
fb-emit
fb-key
fb-init
fb-fini
Additional Data:
• IEEE Std 1275-1994. IEEE Standard for Boot (Initialization Configuration)
Firmware, 1994, See esp. A. 2, “Specification”, p. 144.
Device Interface
The device interface allows Open Firmware to identify and use plug-in devices. The interface
is based on a byte-coded programming language known as FCode. The FCode language is
evaluated by an Open Firmware component known as the FCode evaluator. The Open Firmware
device interface specifies the behavior of a firmware system so that, when compliant devices
are added to a computer system whose firmware is compliant, the firmware may determine the
characteristics of those devices and may use them for various purposes, such as text display
and program loading. A standard FCode evaluator provides a defined environment for the
execution of standard FCode programs. A standard FCode evaluator is typically a component
of the boot firmware associated with a CPU board. A standard FCode program is a program
written in the FCode language that obeys prescribed rules for program structure and usage.
Consequently, its behavior is predictable when executed by a standard FCode evaluator. A
standard FCode program is typically resident on a plug-in device. A common use of a standard
FCode program is to implement a standard package that is relevant to the kind of device with
which the FCode program is associated.
IEEE Std 1275-1994. IEEE Standard for Boot (Initialization Configuration) Firmware,
1994, See esp. Chap. 5, “Device Interface”, p. 45.
User Interface
The user interface allows a person to use Open Firmware services for such purposes as configu-
ration management and debugging of hardware, software, and firmware. The interface consists
of facilities for keyboard input, line editing, display output, and an evaluator (the Forth com-
mand interpreter) for the Forth programming language. It also specifies the behavior of a
firmware system so that a human may interact with it for such purposes as configuration
management, control of the booting process, and the debugging of hardware, client programs,
device drivers, and the firmware itself. A standard command interpreter accepts and executes
commands, typically entered interactively by a human, according to defined command editing,
syntax, and semantic rules. A standard command interpreter is typically a component of the
boot firmware associated with a CPU board. A command group is a set of commands with
defined behaviors, the group as a whole providing some particular capability (for example,
one group of commands is concerned with client program debugging). Each command in the
group may be executed via a standard command interpreter. A standard program is a program,
written in the language defined by the specification of the standard command interpreter in
conjunction with the specification of one or more command groups, that obeys prescribed rules
for program structure and usage. Consequently, its behavior is predictable when executed by
a standard command interpreter. A standard program is typically either entered interactively
by a human, downloaded from some storage device, or stored within the script.
IEEE Std 1275-1994. IEEE Standard for Boot (Initialization Configuration) Firmware,
1994, See esp. Chap. 7, “User Interface”, p. 71.
Client Interface
The client interface allows client programs (programs that have been loaded and executed
under the control of Open Firmware) to make use of services provided by Open Firmware.
The interface consists of a set of software procedures and a mechanism for calling and passing
arguments and results to and from those procedures. The Open Firmware client interface
specifies the behavior of a firmware system so that client programs (programs that are loaded
into and execute from RAM) begin their execution with a predictable machine state and may
use various Open Firmware facilities. The client interface consists of both the specification of
the machine environment that exists when the client program begins execution and the set
of services that Open Firmware provides for the program’s use. Client interface services are
those services that Open Firmware provides to client programs, including device tree access,
memory allocation, mapping, console I/O, mass storage and network I/O, and other services.
IEEE Std 1275-1994. IEEE Standard for Boot (Initialization Configuration) Firmware,
1994, See esp. Chap. 6, “Client Interface”, p. 63.
6.2.3 Boot Components

ELF Loader
Component: ELF Loader
Description:
The ELF loader must handle ELF images properly. Its job is to copy all loadable
segments of an ELF file to the right place in memory.
Functions:
elf-boot
elf-check-header
elf-load-file
elf-load32
elf-load64
elf-load-segments
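The header check that decides between the 32-bit and the 64-bit load path could look roughly like the following sketch. It is written in C rather than Forth for brevity. The identification bytes (magic number, class, data encoding) follow the ELF specification, while the function name, the return type, and the decision to accept only big-endian images are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum elf_kind { ELF_INVALID = 0, ELF_32 = 32, ELF_64 = 64 };

/*
 * Inspect the identification bytes at the start of an image:
 * bytes 0-3 hold the magic number, byte 4 the class (1 = 32-bit,
 * 2 = 64-bit), byte 5 the data encoding (1 = little, 2 = big endian).
 */
enum elf_kind elf_check_header(const uint8_t *image, size_t len)
{
    assert(image != NULL);
    if (len < 16)
        return ELF_INVALID;          /* too short for e_ident */
    if (image[0] != 0x7f || image[1] != 'E' ||
        image[2] != 'L' || image[3] != 'F')
        return ELF_INVALID;
    if (image[5] != 2)
        return ELF_INVALID;          /* assumption: big endian only */
    switch (image[4]) {
    case 1:
        return ELF_32;
    case 2:
        return ELF_64;
    default:
        return ELF_INVALID;
    }
}
```

A caller would dispatch on the result, for instance to a 32-bit or 64-bit segment loader, and refuse to boot on ELF_INVALID.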
File Systems
Component: File Systems
Description:
To read from and write to different file systems it is necessary to implement this package.
With the file system package it is possible to read a kernel image from Ext2, ReiserFS,
ISO9660, and other file systems.
Functions:
ext2-open
ext2-close
ext2-read
ext2-seek
iso9660-open
iso9660-close
iso9660-read
iso9660-seek
raiserfs-open
raiserfs-close
raiserfs-read
raiserfs-seek
xfs-open
xfs-close
xfs-read
xfs-seek
Network Protocols
Component: Network Protocols
Description:
To read and write over different network protocols it is necessary to implement this
package. With the network protocol package it is possible to read a kernel image via TFTP,
BOOTP, etc.
Additional Data:
• Bill Croft and John Gilmore. Bootstrap Protocol (BOOTP), RFC 951, September
1985.
• R. Droms. Dynamic Host Configuration Protocol (DHCP), RFC 2131, March 1997.
• K. Sollins. The TFTP Protocol, Rev. 2, RFC 1350, July 1992.
• J. Postel and J. Reynolds. Telnet Protocol Specification, RFC 854, May 1983.
• J. Postel. User Datagram Protocol, RFC 768, August 1980.
• Information Sciences Institute, University of Southern California. Transmission
Control Protocol, RFC 793, September 1981.
IDE / ATA
Component: IDE / ATA
Description:
This package implements a driver for IDE hard drives.
USB
Component: USB
Description:
This package implements a driver for USB (OHCI, UHCI, etc.).
Ethernet
Component: Ethernet
Description:
This package implements a driver for an Ethernet card to send and receive packets over the
network.
Functions:
TODO TODO
6.2.4 Auxiliary Components

Data Structures
Component: Data Structures
Description:
Open Firmware uses different data structures for building the device tree or the list of
properties of a device. Some library functions for trees and linked lists are implemented
in this package.
Functions:
list-insert
list-delete
list-search
tree-insert
tree-insert-child
tree-insert-sibling
tree-delete
tree-search
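A first-child/next-sibling layout is one common way to build such a tree with only two links per node. The sketch below is written in C for brevity and assumes this layout. The structure, the field names, and the function signatures are illustrative and do not reproduce the Forth words listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One device tree node; every name in this structure is an assumption. */
struct tree_node {
    const char *name;
    struct tree_node *parent;
    struct tree_node *child;    /* first child */
    struct tree_node *sibling;  /* next node on the same level */
};

/* Append node as the last child of parent. */
void tree_insert_child(struct tree_node *parent, struct tree_node *node)
{
    struct tree_node **link;

    assert(parent != NULL && node != NULL);
    link = &parent->child;
    while (*link != NULL)
        link = &(*link)->sibling;
    node->parent = parent;
    node->sibling = NULL;
    *link = node;
}

/* Depth-first search for a node by name; returns NULL if it is absent. */
struct tree_node *tree_search(struct tree_node *root, const char *name)
{
    struct tree_node *hit;

    for (; root != NULL; root = root->sibling) {
        if (strcmp(root->name, name) == 0)
            return root;
        hit = tree_search(root->child, name);
        if (hit != NULL)
            return hit;
    }
    return NULL;
}
```

Walking the sibling chain through a pointer-to-pointer avoids a special case for the first child.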
PCI
Component: PCI
Description:
The PCI package must insert dynamic content into the device tree of Open Firmware.
This content can be obtained by running a stored FCode program which resides on the
device itself or by doing a PCI bus walk, which fetches all information stored in the PCI
configuration space.
Functions:
pci-probe-devices
pci-probe-mf
pci-create-props
pci-class-code2name
pci-class,CCSSPP
pci-class,CCSS
pci-VVVV,DDDD.RR
pci-SSSS,ssss
...
pci-VVVV,DDDD
pci-enable-bridge
pci-mf?
pci-bridge?
pci-device?
>config
Additional Data:
• IEEE Std 1275-1994. PCI Bus Binding to: IEEE Standard for Boot (Initialization
Configuration) Firmware, Rev. 2.1, August 1998.
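How the probing code could derive device names and classify devices from configuration space values is sketched below in C. The header-type bits are defined by the PCI specification. The string formats (lower-case hexadecimal, vendor and device ID without leading zeros, class code with six digits) are an assumption about the convention of the PCI bus binding and should be checked against that document. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bit 7 of the header-type register marks a multi-function device. */
int pci_is_multifunction(uint8_t header_type)
{
    return (header_type & 0x80) != 0;
}

/* Header layout in bits 0-6: 0 = ordinary device, 1 = PCI-to-PCI bridge. */
int pci_is_bridge(uint8_t header_type)
{
    return (header_type & 0x7f) == 0x01;
}

/* Format a "pciVVVV,DDDD" style name from vendor and device ID. */
int pci_name_vendor_device(char *buf, size_t len,
                           uint16_t vendor, uint16_t device)
{
    assert(buf != NULL);
    return snprintf(buf, len, "pci%x,%x",
                    (unsigned)vendor, (unsigned)device);
}

/* Format a "pciclass,CCSSPP" style name from the 24-bit class code. */
int pci_name_class(char *buf, size_t len, uint32_t class_code)
{
    assert(buf != NULL);
    return snprintf(buf, len, "pciclass,%06x",
                    (unsigned)(class_code & 0xffffff));
}
```

The multi-function test matters for the bus walk: functions 1 to 7 of a slot only need to be probed when function 0 reports the multi-function bit.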
Figure 6.2: Open Firmware
Chapter 7
Agnostic Device Drivers
One main goal of this diploma thesis is to introduce a new hardware abstraction mechanism
which is fast and flexible with regard to packaging. The chapter shows the advantages and drawbacks
of existing technologies and the problems associated with them. As a result, an executable
prototype is introduced with a detailed description of all components and their functionality. This
new approach currently runs on Linux, but could easily be adapted to any existing or new
boot firmware or operating system. Agnostic Device Drivers (ADD) is a technology by which
binary program code can be integrated into the device tree of Open Firmware and later
executed in the kernel of the running operating system. ADDs typically control devices like I2C
and GPIO. Preferably this code is very similar to Open Firmware code (FCode) to leverage
existing tools and experience. In a system with a service processor (SPU), the functionality of
these services can be implemented as a protocol or wrapper to the SPU itself. The agnostic
behavior is achieved by running this interpreted code directly in the operating system. This
avoids synchronous and high-latency call-paths.
7.1 Motivation
At the moment, two hardware abstraction mechanisms exist which are based on Open Firmware.
Run-Time Abstraction Services (RTAS) are specified in the Common Hardware Reference Platform and
in the RISC Platform Architecture (see Sections 3.2 and 3.3 for detailed information). RTAS
is packaged with the firmware code and stored in the NVRAM of the current system. The
operating system initiates RTAS via Open Firmware at boot time. Because of this,
RTAS can only be packaged as firmware code, and a change in RTAS also means rewriting the
current firmware in NVRAM with the new version. RTAS implements several calls which can
later be used by the operating system. These calls mostly handle power management, reading from
and writing to NVRAM or PCI configuration space, and time management. The problem with
these calls is that RTAS is designed as a synchronous interface. When an operating system
makes such an RTAS call, it must wait until the call is completed. A modern computer uses a
sophisticated thermal calibration system with more than twenty sensors, fans, and sometimes liquid
cooling. The algorithms for such systems are quite complex and hard to program.
Implementing this functionality with RTAS is not possible because of the synchronous interface
and the fact that a thermal calibration system must be called often and repeatedly. This
would slow down the operating system. The other problem is that the algorithm must be
implemented in the kernel of the operating system itself. RTAS can only get the values from the
sensors or switch fans on. The complete policy and logic cannot be implemented with RTAS. Apple
uses its own hardware abstraction concept with an asynchronous interface. This technology
is called Platform Expert and is described in Section 2.1. Unlike RTAS, Platform Expert has no
high-latency call-paths. The main issue with Platform Expert is that it uses three different
components. These components are packaged with the firmware code and the operating system.
When a new machine comes out, the Platform Expert Data, the Platform Expert, and the
machine-dependent part of the Platform Expert may all change. This
means changes in the firmware and the operating system. Like RTAS, Platform Expert has
the drawback that the policy and the logic of driver programs must be integrated into the
operating system. Furthermore, Platform Expert was designed for Mac OS X. This means it
is tightly coupled with Mac OS X and cannot be used in Linux. Table 7.1 shows a comparison
of RTAS and Platform Expert.
                                 Run-Time Abstraction Services   Platform Expert
Performance                      –                               ++
Flexibility                      –                               +/–
Packaging                        –                               ––
High-Level Language Facilities   –                               –
Table 7.1: Comparison: Run-Time Abstraction Services and Platform Expert
The target of Agnostic Device Drivers is to have an interface without high-latency call-paths,
flexible packaging mechanisms, and good porting features for a new operating system.
7.2 How it works

ADD byte-code programs are created from textual Forth source code by a program called a
tokenizer. A tokenizer reads a sequence of textual Forth words and writes the corresponding
sequence of ADD code bytes. The mapping from textual Forth words to ADD code bytes is
nearly one-to-one, and the preferred source format is very similar to a standard Forth pro-
gram. The ADD byte-code program is placed in the device tree and is later used by an ADD
byte-code evaluator. Such an ADD byte-code evaluator reads a sequence of bytes representing
ADD byte-code numbers and executes or compiles the associated ADD byte-code functions.
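The nearly one-to-one mapping from Forth words to code bytes can be illustrated with a very small tokenizer. The following C sketch is not the tokenizer used for ADD. The dictionary contents and the token numbers are placeholders, and real tokenizers additionally handle numbers, headers, and defining words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct word_token {
    const char *word;
    uint8_t token;
};

/* Placeholder dictionary: words and token numbers are assumptions. */
static const struct word_token dictionary[] = {
    { "dup", 0x47 }, { "drop", 0x46 }, { "swap", 0x49 }, { "+", 0x1e },
};

#define DICT_LEN (sizeof(dictionary) / sizeof(dictionary[0]))

/*
 * Translate whitespace-separated words in src into one byte each.
 * Returns the number of bytes written, or -1 if a word is unknown
 * or the output buffer is too small.
 */
int tokenize(const char *src, uint8_t *out, size_t out_len)
{
    size_t n = 0;

    assert(src != NULL && out != NULL);
    while (*src != '\0') {
        size_t len, i;

        src += strspn(src, " \t\n");      /* skip separators */
        len = strcspn(src, " \t\n");      /* length of the next word */
        if (len == 0)
            break;
        for (i = 0; i < DICT_LEN; i++)
            if (strlen(dictionary[i].word) == len &&
                strncmp(dictionary[i].word, src, len) == 0)
                break;
        if (i == DICT_LEN || n == out_len)
            return -1;
        out[n++] = dictionary[i].token;
        src += len;
    }
    return (int)n;
}
```

Because each known word becomes exactly one byte, the byte-code stays compact and the evaluator can dispatch on it without any further parsing.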
The ADD byte-code program is stored in the device tree of Open Firmware. During the boot
phases of the operating system, this byte-code program can be fetched via the client interface
and can be used directly and asynchronously in an evaluator, which runs in the kernel of the
operating system. The structure of the device tree and the functionality of the client interface
are specified in IEEE 1275-1994, Standard for Boot (Initialization Configuration) Firmware,
Core Requirements and Practices.
The key benefit of this behavior is that it gives firmware and hardware vendors the freedom to
implement functions which are later executed by the operating system in an efficient and fast
way. The operating system does not have to know all the details of the hardware, so power
handling, I2C, and GPIO tasks can be easily implemented. With this flexible functionality the
complete RTAS interface could be replaced, thereby avoiding slow synchronous
and high-latency call-paths.
Figure 7.1: ADD – How It Works
7.3 Packaging Options

A problem with RTAS and Platform Expert is that they are inflexible with regard to
packaging. The concept of Agnostic Device Drivers offers two packaging options
to avoid such problems.
1. The byte-code program could be placed in the device tree of Open Firmware. In this
option, the byte-code program is packaged with the firmware code and stored in the
NVRAM of the machine.
2. This option implements an interface to copy byte-code programs to the ADD virtual
machine, which is running in the kernel of the operating system. This transaction is
done during run-time of the operating system. With this option, a developer can create and
test byte-code programs without a long development cycle. Repeated reboots
of the machine are not necessary.
Finally, the biggest advantage is the availability of high-level language facilities, which make it
possible to implement the logic or policy directly in the ADD byte-code programs. It is possible to use
control structures (loops, branches, etc.) and defining words in byte-code programs. Defining
words are special functions to create constants, variables, and subroutines.
With these facilities the Agnostic Device Driver concept does not have the same problems
as the existing hardware abstraction mechanisms.
Figure 7.2: ADD – Packaging Options

1: ADD Byte-Code is included in the device tree of Open Firmware.
2: ADD Byte-Code is inserted via a user transaction.
7.4 Virtual Machine

The original meaning of virtual machine is the creation of a number of identical
execution environments on a single computer, each of which exactly emulates the host
computer. This provides each user with the illusion of having an entire computer, but one that is
their “private” machine, isolated from other users, all on a single physical machine. More generally,
a virtual machine is an abstract computing architecture or computational engine that is
independent of any particular hardware or operating system and runs on top of a real
hardware platform and operating system. Programs for such a virtual machine run on virtually
any hardware for which the virtual machine is available. To achieve this, the virtual
machine borrows detailed functionality from its host machine or operating system and introduces
typically its own instruction set, which is used for the execution environments. This instruction
set is independent of the architecture of the operating system or the host hardware. Virtual
machines often have their own memory subsystem and control or limit access through the virtual
machine’s native function interface. The design and implementation of a virtual machine is
influenced by factors like size, portability, performance, memory consumption, and security.
7.4.1 Design Strategies
At the moment, four main design strategies exist for implementing a virtual machine.
These design strategies are:
• Interpreted:
Embedded devices often use interpreted virtual machines. Such an interpreted virtual
machine is quickly and easily implemented or ported to new hardware. On the other hand, it
has poor performance, because it executes one byte-code at a time. In terms of performance,
this implementation strategy is the worst of all the options.
• Just-In-Time:
This implementation takes advantage of knowing the hardware, which makes it more
complex to implement. The performance is far above that of interpreters (with a pause up
front), because immediately prior to executing a program the virtual machine
compiles it for the corresponding architecture. A better name for this technology is
“better-late-than-never compiler”.
• Hotspot:
Hotspot works by analyzing code as it runs and finding the hotspots. It halts program
execution to take time to optimize those pieces. A virtual machine that uses hotspot is
best suited for long-running applications. Micro benchmarks on such an
implementation are not representative.
• Hybrid:
Hotspot is the flagship, but other JITs do this to some degree. A hybrid JIT compiles only code
that will run a lot and wastes no time on compiling initialization code. This virtual
machine has the best overall performance, but also the most complex design.
7.4.2 Design Goal
The goal for the ADD virtual machine is a small-footprint virtual machine for the
operating system to control resource-constrained devices. It should be easy to understand
and maintain. Besides this, it should be small without sacrificing the features needed for programming
drivers for I2C or GPIO devices. Dynamic compilation or other performance techniques are
not necessary, but it should run in the Linux kernel. An interpreted virtual machine
meets all these requirements.
7.5 Components of the Virtual Machine

Figure 7.3 shows all components which are described in the following.
Figure 7.3: ADD – Component Overview
7.5.1 Front-End
The front-end component handles the loading of byte-code into the virtual machine. The ADD
virtual machine supports two ways of loading byte-code. The byte-code
can be loaded from the device tree, which was fetched during the boot process, or later during
the run-time of the operating system. To load byte-code programs from the device tree, the
front-end walks through the complete device tree and searches for ADD byte-code programs.
If a program is found, the front-end checks whether additional properties for the byte-code program
exist in the device tree and passes them to the virtual machine. The virtual machine can use these
properties to set up the environment or to control the program execution. When a program is
loaded during the run-time of the operating system, the program byte-code must be copied from
user space into kernel space. In the case of Linux, this is done by a
character device driver which copies the ADD byte-code into kernel space, so that the virtual
machine can execute it. The advantage of this scenario is that a developer can quickly program and
test ADD programs without restarting the operating system or the machine with new
firmware code.
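The device tree walk of the front-end can be pictured with the following user-space C sketch. The node and property structures, the property name "add-bytecode", and the callback interface are assumptions made for illustration. In the Linux kernel the walk would use the kernel's own device tree structures instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct property {
    const char *name;
    const void *value;
    size_t length;
    struct property *next;
};

struct dt_node {
    struct property *props;
    struct dt_node *child;
    struct dt_node *sibling;
};

/* Linear search through the property list of one node. */
const struct property *find_property(const struct dt_node *node,
                                     const char *name)
{
    const struct property *p;

    assert(node != NULL && name != NULL);
    for (p = node->props; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/*
 * Visit every node and hand each byte-code program found to load().
 * load may be NULL, in which case programs are only counted.
 * Returns the number of programs found.
 */
int add_frontend_walk(const struct dt_node *node,
                      void (*load)(const void *code, size_t length))
{
    int found = 0;

    for (; node != NULL; node = node->sibling) {
        /* "add-bytecode" is a hypothetical property name */
        const struct property *p = find_property(node, "add-bytecode");

        if (p != NULL) {
            if (load != NULL)
                load(p->value, p->length);
            found++;
        }
        found += add_frontend_walk(node->child, load);
    }
    return found;
}
```

Additional properties of the same node could be looked up with find_property in the same way and passed along with the program.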
7.5.2 Byte-Code Verifier
Programs are tested for validity by a byte-code verifier. The byte-code verifier of the virtual
machine reads the header of an ADD program and checks if it is valid. The header includes
the checksum and the length of the corresponding program. The checksum is calculated by
using two’s complement addition and ignoring overflow. The program length is the quadlet
(32-bit) count of the bytes in the program, including the body and the header. These two values
must also be calculated by the byte-code verifier and checked against the values in the
program header.
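A verifier along these lines can be sketched in a few lines of C. The header layout used here (one start byte, one format byte, a 16-bit checksum, and a 32-bit length, all big-endian, with the checksum covering only the program body) is an assumption modeled on the FCode header. The actual ADD header may differ, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed header: start byte, format byte, checksum (2), length (4). */
#define ADD_HEADER_SIZE 8

/* Sum all bytes with 16-bit two's complement addition; overflow wraps. */
uint16_t add_checksum(const uint8_t *data, size_t len)
{
    uint16_t sum = 0;

    while (len-- > 0)
        sum = (uint16_t)(sum + *data++);
    return sum;
}

/* Returns 1 if stored length and checksum match the image, else 0. */
int add_verify(const uint8_t *image, size_t image_len)
{
    uint16_t stored_sum;
    uint32_t stored_len;

    assert(image != NULL);
    if (image_len < ADD_HEADER_SIZE)
        return 0;
    stored_sum = (uint16_t)((image[2] << 8) | image[3]);
    stored_len = ((uint32_t)image[4] << 24) | ((uint32_t)image[5] << 16) |
                 ((uint32_t)image[6] << 8) | (uint32_t)image[7];
    if (stored_len != image_len)
        return 0;                   /* length covers header and body */
    return add_checksum(image + ADD_HEADER_SIZE,
                        image_len - ADD_HEADER_SIZE) == stored_sum;
}
```

Checking the length first ensures that the checksum loop never reads beyond the bytes that were actually loaded.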
7.5.3 Inner and Outer Interpreter
In a token-threaded virtual machine, each execution token is an offset into a table of code
fields. The inner interpreter fetches the execution token pointed to by the instruction pointer
and indexes into the token table, where it fetches the code field address of the word. Parameter
fields are used in various ways, depending upon the type of the entry in the token table. The
inner interpreter executes colon definitions which are implemented as primitives and handles
control structures, constants, and variables. One reason for a threaded approach is that all of
the altered bindings are conveniently contained in a single table. Each task can be provided
with a save buffer to hold its version of the altered token table. To perform a context switch, it
is sufficient to copy the token table to the old task’s save buffer and to copy the
new task’s save buffer into the token table. If the source code of the inner interpreter
is available, the inner interpreter can instead locate the token table through an active token table pointer. This
eliminates all the copying and provides a context switch that only requires swapping pointers to
the token table rather than swapping the contents of the table. The outer interpreter of the virtual
machine parses all byte-codes which are not implemented as primitives. This is necessary
for the byte-code programs themselves and for built-in byte-code functions. Such functions are
implemented in the virtual machine as byte-code and not in C or assembler. If the outer
interpreter gets such a byte-code, it saves the return address on the return stack and executes
the corresponding primitive function via the inner interpreter. If the corresponding byte-code
is not a primitive, it repeats this step until it reaches a byte-code which is implemented as a
primitive.
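The token-threaded dispatch and the context switch by pointer swap can be illustrated with a deliberately reduced model. In the following C sketch the virtual machine has only an accumulator instead of stacks, and all names (struct vm, vm_run, the two tables) are hypothetical. It only shows that switching the active token-table pointer changes the binding of a token without copying any table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct vm;
typedef void (*code_field)(struct vm *vm);

struct vm {
    const code_field *token_table; /* active token-table pointer */
    size_t table_len;
    const uint8_t *ip;             /* instruction pointer */
    int acc;                       /* stand-in for the data stack */
    int running;
};

static void op_halt(struct vm *vm) { vm->running = 0; }
static void op_inc(struct vm *vm)  { vm->acc += 1; }
static void op_dec(struct vm *vm)  { vm->acc -= 1; }

/* Two task-specific tables which bind token 1 differently. */
const code_field table_a[] = { op_halt, op_inc };
const code_field table_b[] = { op_halt, op_dec };

/* Inner interpreter: fetch a token, index the table, run the code field. */
void vm_run(struct vm *vm, const uint8_t *program)
{
    assert(vm != NULL && program != NULL);
    vm->ip = program;
    vm->running = 1;
    while (vm->running) {
        uint8_t token = *vm->ip++;

        if (token >= vm->table_len)
            break;                 /* unknown token: stop */
        vm->token_table[token](vm);
    }
}
```

Running the program { 1, 1, 1, 0 } with table_a raises the accumulator by three. Assigning table_b to token_table and running the same bytes again lowers it by three, which is the pointer-swap context switch in miniature.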
7.5.4 Data and Return Stack
As a virtual stack machine, the ADD virtual machine implements two stacks. The data
stack is used to hold numeric operands. When a number is pushed onto or popped off the
stack, the remaining numbers are not moved. Instead, a pointer is adjusted to indicate the
last used position in a stack memory array. The top-of-stack pointer is kept in a register. The
standard token-table in the ADD virtual machine provides words for simple manipulation of
operands on the stack: SWAP, DUP, DROP, 2SWAP, etc. (see Appendix B for all functions). In
general, the data stack is used to pass parameters to colon definitions or to hand back return codes.
The ADD virtual machine also implements a return stack. Like the data stack, the return
stack is a LIFO list. It is mostly used for system functions of the virtual machine, but may
also be accessed directly by an ADD byte-code program. The return stack serves purposes
like holding return addresses for nested definitions and loop parameters. Because the return
stack has multiple uses, care must be exercised to avoid conflicts when accessing it directly.
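The pointer-based stack handling described above, together with three of the stack words, can be written down as a short C sketch. The cell type, the stack depth, and the function names are assumptions. The same structure would serve for both the data stack and the return stack.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_DEPTH 64

typedef int32_t cell_t;

struct stack {
    cell_t mem[STACK_DEPTH];
    int top;                /* index of the last used position, -1 if empty */
};

void stack_init(struct stack *s)
{
    s->top = -1;
}

/* Pushing only moves the pointer; the stored cells stay where they are. */
void push(struct stack *s, cell_t v)
{
    assert(s->top < STACK_DEPTH - 1);   /* overflow check */
    s->mem[++s->top] = v;
}

cell_t pop(struct stack *s)
{
    assert(s->top >= 0);                /* underflow check */
    return s->mem[s->top--];
}

/* DUP ( a -- a a ) */
void op_dup(struct stack *s)
{
    cell_t a = pop(s);

    push(s, a);
    push(s, a);
}

/* DROP ( a -- ) */
void op_drop(struct stack *s)
{
    (void)pop(s);
}

/* SWAP ( a b -- b a ) */
void op_swap(struct stack *s)
{
    cell_t b = pop(s);
    cell_t a = pop(s);

    push(s, b);
    push(s, a);
}
```

Expressing the stack words through push and pop keeps the overflow and underflow checks in one place.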
7.5.5 Token-Tables
The ADD virtual machine uses three different token-tables. Every token-table has its own
byte-code and application range (see Table 7.2).
• Built-In Token-Table:
This table includes all functionality which is hard-wired into the virtual machine.
An ADD program cannot change this table; it can only read from it.
Built-in byte-codes are implemented in C (the implementation language of the
virtual machine) or in the ADD byte-code language itself.
• Vendor Token-Table:
This table includes all functions that interact with the back-end. Functions like reading
from and writing to the I2C bus are implemented in this table. Furthermore, the vendor token-table
includes functions with real-time and performance requirements. No function in
this table is implemented in the byte-code language of the virtual machine; all of them are
written in its implementation language.
• Local Token-Table:
When an ADD program defines its own variables, constants, or colon definitions, a new entry
with the corresponding token value is created in this table. This table is readable and writable
for an ADD program and differs between ADD programs. In the case of a
multi-tasking virtual machine, every thread must have its own local token-table.
Byte-Code Range    Table
0x000 – 0x5FF      Built-In Tokens
0x600 – 0x7FF      Vendor Tokens
0x800 – 0xFFF      Local Tokens
Table 7.2: Token-Table Segmentation
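The segmentation in Table 7.2 translates directly into a range check. The following C sketch uses the ranges from the table, whereas the enumeration and the function names are hypothetical and not part of the ADD sources.

```c
#include <assert.h>
#include <stdint.h>

enum add_table {
    ADD_TT_BUILTIN,
    ADD_TT_VENDOR,
    ADD_TT_LOCAL,
    ADD_TT_INVALID
};

/* Map a byte-code number to its token-table according to Table 7.2. */
enum add_table add_token_table(uint16_t token)
{
    if (token <= 0x5ff)
        return ADD_TT_BUILTIN;
    if (token <= 0x7ff)
        return ADD_TT_VENDOR;
    if (token <= 0xfff)
        return ADD_TT_LOCAL;
    return ADD_TT_INVALID;
}

/* Index of a valid token inside its own table. */
uint16_t add_token_index(uint16_t token)
{
    assert(add_token_table(token) != ADD_TT_INVALID);
    switch (add_token_table(token)) {
    case ADD_TT_VENDOR:
        return (uint16_t)(token - 0x600);
    case ADD_TT_LOCAL:
        return (uint16_t)(token - 0x800);
    default:
        return token;               /* built-in range starts at 0 */
    }
}

/* Only the local token-table may be modified by an ADD program. */
int add_token_writable(uint16_t token)
{
    return add_token_table(token) == ADD_TT_LOCAL;
}
```

With this split, each table can be stored as a separate array whose size matches the width of its range.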
7.5.6 The Doers
Every non-primitive entry in a token-table has a function pointer to a “doer”. A doer is a
machine code fragment that handles these entries in a proper way. At the moment, three doer
functions exist, for constants, variables, and colon definitions. The advantage of a doer is that
the outer interpreter can handle all non-primitives in the same way. This works because it
only has to call the function pointer, which points to the corresponding doer.
• Constant-Doer:
The value of a constant is stored in the parameter field of the token-table entry.
When a constant word is executed, this doer takes the value from the parameter field and
pushes it onto the data stack.
/**
 * Doer code to handle constants in ADD byte-code programs.
 */
void
add_do_con(type_w fcode, type_c **ip, struct add_tt_entry *ttp)
{
#ifdef __DEBUG__
    printk("{%s}", ttp[fcode].name);
#endif
    add_check_stack(0, 1);
    {
        cell tmp;
        tmp.u = ttp[fcode].parameter;
        add_push(tmp);
    }
    return;
}
Listing 7.1: Constant-Doer
• Variable-Doer:
When a variable is created, the virtual machine allocates the needed memory and stores
the address of this memory region in the parameter field of its token-table entry.
When a variable word is executed, this doer takes the address from the parameter field and
pushes it onto the data stack.
/**
 * Doer code to handle variables in ADD byte-code programs.
 */
void
add_do_var(type_w fcode, type_c **ip, struct add_tt_entry *ttp)
{
#ifdef __DEBUG__
#endif
    add_check_stack(0, 1);
    {
        cell tmp;
        tmp.u = (type_l)(&(ttp[fcode].parameter));
        add_push(tmp);
    }
    return;
}
Listing 7.2: Variable-Doer
• Colon Definition-Doer:
For a colon definition, the virtual machine stores the start address of its byte-code
in the code field address (CFA) field of the corresponding token-table
entry. The doer for colon definitions must push the current instruction pointer onto the
return stack and set it to the address which is stored in the CFA field. After that, the
outer interpreter executes the colon definition until it is completed.
Finally, the instruction pointer saved on the return stack is restored by the inner interpreter.
/**
 * Doer code to handle colon definitions in ADD byte-code
 * programs.
 */
void
add_do_col(type_w fcode, type_c **ip, struct add_tt_entry *ttp)
{
#ifdef __DEBUG__
    printk("{%s}", ttp[fcode].name);
#endif
    add_check_stack_r(0, 1);
    {
        cell tmp;
        tmp.u = (type_u)*ip;        /* save the current instruction pointer */
        add_push_r(tmp);
    }
    (*ip) = ttp[fcode].cfa;         /* continue at the colon definition */
    return;
}
Listing 7.3: Colon Definition-Doer
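The three doers share one signature, which is what allows the interpreter to treat every
non-primitive in the same way. The following self-contained sketch illustrates this idea
outside the kernel. The entry layout, the stack helpers, and all names in it are simplified
assumptions made for illustration; they are not the data structures of the ADD virtual machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified token-table entry (not the thesis layout). */
struct tt_entry;
typedef void (*doer_fn)(uint16_t fcode, const uint8_t **ip,
                        struct tt_entry *ttp);

struct tt_entry {
    doer_fn        doer;       /* executed for every non-primitive word */
    intptr_t       parameter;  /* constant value or variable storage */
    const uint8_t *cfa;        /* start of byte-code for colon definitions */
};

/* Tiny data and return stacks; bounds checks are omitted for brevity. */
static intptr_t data_stack[16];
static intptr_t return_stack[16];
static int dsp, rsp;

void     push(intptr_t v)   { data_stack[dsp++] = v; }
intptr_t pop(void)          { return data_stack[--dsp]; }
void     push_r(intptr_t v) { return_stack[rsp++] = v; }

/* Constant: push the value stored in the parameter field. */
void do_con(uint16_t fcode, const uint8_t **ip, struct tt_entry *ttp)
{
    (void)ip;
    push(ttp[fcode].parameter);
}

/* Variable: push the address of the storage cell. */
void do_var(uint16_t fcode, const uint8_t **ip, struct tt_entry *ttp)
{
    (void)ip;
    push((intptr_t)&ttp[fcode].parameter);
}

/* Colon definition: save ip on the return stack, continue at the CFA. */
void do_col(uint16_t fcode, const uint8_t **ip, struct tt_entry *ttp)
{
    push_r((intptr_t)*ip);
    *ip = ttp[fcode].cfa;
}

/* The dispatcher needs no knowledge of the kind of word it executes. */
void execute_token(uint16_t fcode, const uint8_t **ip, struct tt_entry *ttp)
{
    ttp[fcode].doer(fcode, ip, ttp);
}
```

Because the dispatcher only calls through the function pointer, adding a new kind of defining
word in such a design would only require a new doer, not a change to the dispatcher.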
7.5.7 Back-End
Every operating system offers different functions or calls for using the hardware. To keep the
design modular, the back-end forms the interface between the virtual machine and the kernel of
the running operating system. All functions that use the back-end are stored in the vendor
token-table, because they are typically implemented as primitives that use kernel-specific
functions. It is also possible to implement custom functionality in the back-end and in the
vendor token-table. The benefit of programming functionality as a primitive in the vendor
token-table, rather than in the byte-code language itself, is that the routine can be optimized.
This is necessary when program code has to meet performance or real-time requirements.
7.6 Byte-Code
A programmer is greatly influenced by the language in which programs are written;
there is an overwhelming tendency to prefer constructions that are simplest in that
language, rather than those that are best for the machine. By understanding a
machine-oriented language, the programmer will tend to use a much more efficient
method; it is much closer to reality.
Donald E. Knuth, The Art of Computer Programming, Volume 1: Fundamental Algorithms, 1997.
7.6.1 Byte-Code Header
The ADD byte-code header data type appears only at the beginning of an ADD program
following one of the functions start0, start1, start2, or start4. It contains information
about the ADD program as a whole. That information is provided for the benefit of external
software that may wish to characterize the ADD program. A standard ADD virtual machine
is permitted to skip and ignore the ADD byte-code header information, or to use it to verify
that the ADD program is intact.
Byte  Name                 Description
0     header-data-type     ADD byte-code header data type (e.g. start1).
1     format               The value 0x08 in this field indicates that this ADD program is
                           intended to operate with boot firmware that complies with the
                           Open Firmware standard. The values 0x09 through 0xFF are
                           reserved for future revisions.
2     checksum-high        High byte of the body checksum. The checksum is the doublet-sized
                           (16-bit) sum of the bytes of the program body (excluding the
                           header), calculated using two’s complement addition and ignoring
                           overflow.
3     checksum-low         Low byte of the body checksum.
4     length-high          Most significant byte of the program length. The program length
                           is the quadlet-sized (32-bit) number of bytes in the program,
                           including the body and the header.
5     length-high-middle   High middle byte of the program length.
6     length-low-middle    Low middle byte of the program length.
7     length-low           Least significant byte of the program length.
Table 7.3: ADD Byte-Code Header
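A virtual machine that chooses to verify the header could proceed roughly as in the following
sketch. It is a minimal illustration that follows the field layout of Table 7.3; the function
name add_header_ok and the decision to reject images shorter than the stated length are
assumptions, and the format byte is deliberately not checked.

```c
#include <stddef.h>
#include <stdint.h>

#define ADD_HEADER_SIZE 8u   /* bytes 0..7 of Table 7.3 */

/*
 * Hypothetical helper: returns 1 if the length and checksum fields of
 * the header match the program image, 0 otherwise.
 */
int add_header_ok(const uint8_t *prog, size_t size)
{
    uint32_t length;
    uint16_t stored, sum = 0;
    size_t i;

    if (size < ADD_HEADER_SIZE)
        return 0;

    /* Both fields are stored most significant byte first. */
    stored = (uint16_t)((prog[2] << 8) | prog[3]);
    length = ((uint32_t)prog[4] << 24) | ((uint32_t)prog[5] << 16) |
             ((uint32_t)prog[6] << 8)  |  (uint32_t)prog[7];

    /* The length includes the header and must fit into the image. */
    if (length < ADD_HEADER_SIZE || length > size)
        return 0;

    /* 16-bit sum over the body only; overflow wraps modulo 2^16. */
    for (i = ADD_HEADER_SIZE; i < length; i++)
        sum = (uint16_t)(sum + prog[i]);

    return sum == stored;
}
```

For a body consisting of the three bytes 0xFF, 0x0F, and 0x00, the checksum is 0x010E and the
length field is 11 (eight header bytes plus three body bytes).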
7.6.2 Byte-Code Encoding
The following byte-code formats are used to encode ADD programs. An ADD program consists
of a sequence of bytes, which are read as byte-code numbers (ADD#). Some ADD# use an
additional byte to represent the byte-code number. Those functions are recognized
during interpretation of the ADD program. Some byte-codes take arguments that control the
interpretation in the virtual machine or the compilation with a tokenizer.
ADD#
The byte values 0x00 and 0x10 ... 0xFF each encode an ADD# with a size of one byte. The
values 0x01 ... 0x0F introduce a two-byte ADD#.
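The rule above can be sketched as a small fetch routine. The sketch assumes that the first byte
of a two-byte ADD# supplies the upper bits and the second byte the lower eight bits, which is
consistent with the numbers 0x600 to 0x608 listed in Appendix B; the function name is
hypothetical.

```c
#include <stdint.h>

/* Reads one ADD# at *ip and advances *ip past it. */
uint16_t add_fetch_token(const uint8_t **ip)
{
    uint16_t token = *(*ip)++;

    /* 0x01 ... 0x0F: a second byte follows (assumed to be the low byte). */
    if (token >= 0x01 && token <= 0x0F)
        token = (uint16_t)((token << 8) | *(*ip)++);

    return token;
}
```

With this convention, the byte sequence 0x06 0x02 yields the ADD# 0x602 (i2c-ping), while the
single byte 0x47 yields 0x047 (dup).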
ADD-num32
The byte value 0x10 (b(lit), see Appendix B) is followed by a 32-bit integer number.
ADD-string
ADD-string encodes a text string. The byte value 0x12 encodes a string where the first byte
(count-byte) is the length of the string (0 to 255), not including the count byte. Subsequent
bytes are the bytes of the string.
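A counted string of this form can be read without copying, as the following sketch shows. The
structure add_string and the function name are assumptions made for illustration; the sketch
expects the instruction pointer to stand on the count byte, that is, directly after the 0x12
byte-code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view onto a string embedded in the byte-code. */
struct add_string {
    const uint8_t *text;   /* first character, not NUL-terminated */
    size_t         len;    /* 0 to 255, taken from the count byte */
};

/* Reads the count byte at *ip and skips *ip past the string body. */
struct add_string add_fetch_string(const uint8_t **ip)
{
    struct add_string s;

    s.len  = *(*ip)++;   /* the count byte itself is not part of the text */
    s.text = *ip;
    *ip   += s.len;
    return s;
}
```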
ADD-offset
ADD-offset encodes an 8-bit signed (two’s complement) offset or a 16-bit signed (two’s
complement) offset. An ADD-offset specifies the number of bytes in the ADD program between two
corresponding components of a control flow construct.
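Decoding either form amounts to a sign extension. In the sketch below, the flag wide selects
the 16-bit form; that this form is switched on by the offset16 function from Appendix B and
stored high byte first are assumptions borrowed from Open Firmware FCode, and the function
name is hypothetical.

```c
#include <stdint.h>

/* Reads one ADD-offset at *ip, advances *ip, and returns the signed value. */
int add_fetch_offset(const uint8_t **ip, int wide)
{
    int value;

    if (wide) {
        /* 16-bit form, assumed to be stored high byte first */
        value = ((*ip)[0] << 8) | (*ip)[1];
        *ip += 2;
        return value >= 0x8000 ? value - 0x10000 : value;
    }

    /* 8-bit form */
    value = *(*ip)++;
    return value >= 0x80 ? value - 0x100 : value;
}
```

The byte 0xFE therefore decodes to -2 in the 8-bit form, and the bytes 0xFF 0xF0 decode to -16
in the 16-bit form.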
7.6.3 Control Structures
A conditional or looping control transfer is represented by a pair of ADD byte-code functions.
The ADD-offset is calculated as the number of bytes from the first byte of the offset
to the byte after the target of the control transfer. A positive offset corresponds to a transfer
of control in the forward direction, and a negative offset corresponds to the backward direction.
7.7 ADD to Linux I2C Binding

7.7.1 I2C Bus
Inter-Integrated Circuit (I2C) is a serial computer bus invented by Philips. It is used to
connect low-speed peripherals in an embedded system or on a motherboard. The original system was
created in the early 1980s as a battery control interface, but it was later used as a simple
internal bus system for building control electronics with various Philips chips. I2C uses only
two bi-directional lines, clock and data, both running at +5 V and pulled high with resistors.
The bus operates in several modes, the most common being the 100 kbit/s standard mode and
a 10 kbit/s low-speed mode. Clock frequencies down to zero are allowed. Buses of this type
became popular when engineers realized that much of the expense of an integrated circuit
results from the size of the package and the number of pins. A larger package has more pins and
therefore needs more assembly steps during manufacturing, more area on a printed circuit board,
more weight, and more connections that can fail. All of these cost money to make, assemble, and
test, and can increase operational expenses (fuel) or decrease convenience (weight is critical
in cell phones, for example). A particular strength of I2C is that a microcontroller can control
a network of device chips with just two general-purpose I/O pins and software. Over 1000 master
and/or slave devices (depending on the mode used) can co-exist on the same two-line bus.
Although I2C is much slower than most bus systems, its low cost makes it an excellent choice for
peripherals that have to exist but need not be fast. The bus is often used for built-in tests,
volume, tone, and color balance controls, low-speed analog-to-digital and digital-to-analog
converters, real-time clocks, small non-volatile memories (used to preserve user-settable
options), control of clock generators (for computers that can vary their clock speeds), and
integrated circuits that combine a shift register and power transistors. Chips can also be added
to or removed from the bus while the system is running, which makes I2C ideal for environments
requiring hot-swappable components. The basic bus has a seven-bit address space, allowing up to
112 nodes on one bus (16 of the 128 addresses are reserved). The first standardized version,
version 1.0, was released in 1992. It added a new fast mode at 400 kbit/s and a ten-bit
addressing mode to support up to 1024 nodes. Version 2.0 from 1998 added a high-speed mode at
3.4 Mbit/s, while reducing the voltage and current requirements when run in that mode (thus
saving power as well as being faster). The latest version, 2.1 from 2000, is a minor cleanup of
version 2.0. The System Management Bus (SMBus) is similar to the I2C bus, but with differences
in clock frequency range and voltage levels, and an optional extra interrupt-request wire.
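The figure of 112 usable nodes can be checked with a few lines of code. The sketch relies on
the two reserved address groups of the I2C-bus specification, 0000xxx (0x00 to 0x07) and
1111xxx (0x78 to 0x7F); the function names are hypothetical.

```c
#include <stdint.h>

/* Returns 1 if a 7-bit I2C address lies in a reserved group. */
int i2c_addr_reserved(uint8_t addr)
{
    return addr <= 0x07 || (addr >= 0x78 && addr <= 0x7F);
}

/* Counts the 7-bit addresses that remain for ordinary devices. */
int i2c_usable_addresses(void)
{
    int addr, n = 0;

    for (addr = 0; addr < 128; addr++)
        if (!i2c_addr_reserved((uint8_t)addr))
            n++;
    return n;
}
```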
7.7.2 I2C in Linux 2.6
I2C is commonly used in embedded systems so that different components can communicate. For
example, PC motherboards use I2C to talk to different sensor chips. Those sensors typically
report fan speeds, processor temperatures, and a whole raft of other system hardware
information. The protocol is also used in some RAM chips to report information about the DIMM
itself back to the operating system.
The I2C kernel code is split into a number of logical components: the I2C core (1), I2C adapter
drivers, I2C algorithm drivers, and I2C chip drivers:
- I2C Adapter Driver:
An I2C adapter driver implements the I2C bus driver. Each specific I2C adapter driver
depends on one I2C algorithm driver.
- I2C Algorithm Driver:
An I2C algorithm is used by the I2C adapter driver to talk to the I2C bus. Most I2C
adapter drivers define and use their own I2C algorithms. For some classes of I2C
bus drivers, a number of I2C algorithm drivers have already been written. Some adapter
drivers need a generic I2C bit-shift algorithm.
- I2C Chip Drivers:
An I2C chip driver controls the process of talking to an individual I2C device that lives
on an I2C bus. I2C chip devices usually monitor a number of different physical quantities
on a motherboard, such as fan speeds, temperature values, and voltages.
(1) The I2C core component is not a part of this diploma thesis.
7.7.3 Byte-Code to Linux I2C Binding
One main goal of Agnostic Device Driver is to control I2C chip devices. The ADD virtual
machine borrows and uses functionality of the operating system it is running in. This
means that ADD needs a binding from its own I2C functions to the I2C functions of the
operating system. In the case of Linux, the best option was to implement the ADD virtual machine
as an I2C chip driver. Such an I2C chip driver can have several clients, which control and talk
to the I2C chip devices.
The i2c_driver structure describes an I2C chip driver for the ADD virtual machine. This
structure is defined in the include/linux/i2c.h file. Only the following fields are necessary
to create a working chip driver:
- struct module *owner: set to the value THIS_MODULE, which allows proper module
reference counting.
- char name[I2C_NAME_SIZE]: set to a descriptive name of the I2C chip driver. This
value shows up in the sysfs file name created for every I2C chip device.
- unsigned int flags: set to the value I2C_DF_NOTIFY so that the chip driver is
notified of any new I2C devices loaded after this driver is loaded. This field will probably
go away soon, as almost all drivers set it.
- int (*attach_adapter)(struct i2c_adapter *): called whenever a new I2C bus
driver is loaded in the system. This function is described in more detail below.
- int (*detach_client)(struct i2c_client *): called when the i2c_client device
is to be removed from the system. More information about this function is provided
below.
The following code is from the I2C chip driver of the ADD virtual machine. It shows how the
struct i2c_driver structure is set up:
struct i2c_driver add_driver = {
    .owner          = THIS_MODULE,
    .name           = "add",
    .flags          = I2C_DF_NOTIFY,
    .attach_adapter = add_attach_adapter,
    .detach_client  = add_detach_client,
};
Listing 7.4: i2c_driver structure used for the I2C Chip Driver
After the I2C chip driver has been registered in __init add_init(void) by calling
i2c_add_driver with the i2c_driver structure as parameter, the attach_adapter function is called
whenever an I2C bus driver is loaded. Normally, this function checks whether any I2C devices are
present on the I2C bus to which the client driver wants to attach. Almost all I2C chip drivers
call the core I2C function i2c_detect to determine this. The i2c_detect function takes a
function pointer to the chip detection routine of the chip driver, which is called if a matching
client is found.
The ADD virtual machine cannot use this function for I2C device detection, because it was
designed for sensors and not for use in a virtual machine. Instead, the attach_adapter function
exports the i2c_adapter structure for use in the inner interpreter. Because the i2c_detect
function is not usable, the inner interpreter has to provide this functionality itself. The ADD
byte-code function i2c-ping does so. The usual way to write an I2C chip driver in the ADD
byte-code language is the following:
1. Use i2c-ping to detect whether a matching client with the address addr is attached to the
I2C bus.
2. Create a client with i2c-new, which is attached to the address addr. The function i2c-new
returns a client-handle. This client-handle should be stored in a variable so that it can be
reused.
3. Use the read or write functions (shown in Table 7.4) on the client-handle.
4. When the client is no longer needed, the allocated memory can be freed with i2c-delete.
Function     Stack Comment
i2c-new      ( addr -- client-handle )
i2c-delete   ( client-handle -- )
i2c-ping     ( addr -- status )
i2c-b@       ( client-handle reg -- byte )
i2c-w@       ( client-handle reg -- word )
i2c-l@       ( client-handle reg -- long )
i2c-b!       ( client-handle byte reg -- )
i2c-w!       ( client-handle word reg -- )
i2c-l!       ( client-handle long reg -- )
Table 7.4: I2C ADD Byte-Code Functions
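The four steps can be tried out against a simulated bus. The following sketch is a user-space
stand-in written for illustration only: the C function names mirror the byte-code functions of
Table 7.4 but are hypothetical, the simulated device at address 0x48 is invented, and a nonzero
i2c-ping status is assumed to mean that a device answered.

```c
#include <stdint.h>
#include <stdlib.h>

/* Simulated bus: a single device at SIM_ADDR with 256 byte registers. */
#define SIM_ADDR 0x48
static uint8_t sim_regs[256];

struct client { uint8_t addr; };   /* stand-in for a client-handle */

/* i2c-ping ( addr -- status ): nonzero if a device answers (assumption). */
int i2c_ping(uint8_t addr) { return addr == SIM_ADDR; }

/* i2c-new ( addr -- client-handle ) */
struct client *i2c_new(uint8_t addr)
{
    struct client *c = malloc(sizeof *c);

    if (c)
        c->addr = addr;
    return c;
}

/* i2c-b! ( client-handle byte reg -- ) */
void i2c_b_store(struct client *c, uint8_t byte, uint8_t reg)
{
    if (c->addr == SIM_ADDR)
        sim_regs[reg] = byte;
}

/* i2c-b@ ( client-handle reg -- byte ) */
uint8_t i2c_b_fetch(struct client *c, uint8_t reg)
{
    return c->addr == SIM_ADDR ? sim_regs[reg] : 0xFF;
}

/* i2c-delete ( client-handle -- ) */
void i2c_delete(struct client *c) { free(c); }

/* Runs steps 1 to 4; returns the byte read back, or -1 on failure. */
int add_i2c_demo(uint8_t addr)
{
    struct client *c;
    int value;

    if (!i2c_ping(addr))             /* step 1: is a device present? */
        return -1;
    c = i2c_new(addr);               /* step 2: create and keep the handle */
    if (!c)
        return -1;
    i2c_b_store(c, 0x5A, 0x01);      /* step 3: write, then read back */
    value = i2c_b_fetch(c, 0x01);
    i2c_delete(c);                   /* step 4: release the client */
    return value;
}
```

In a byte-code program the same sequence would be expressed with the words i2c-ping, i2c-new,
i2c-b!, i2c-b@, and i2c-delete, with the client-handle kept in a variable between the calls.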
7.8 Further Opportunities

At the moment, the concept of Agnostic Device Drivers is in a prototype state. The virtual
machine implements control structures and defining words. It is possible to program I2C drivers
without giving up the comfort of high-level language facilities.
However, this concept still leaves room for improvements and further opportunities. Applications
with real-time or performance requirements can be implemented in C or assembler and integrated
into the inner interpreter as vendor token-table functions. The only overhead is the few
cycles needed to fetch the byte-code and to look up the corresponding function pointer in the
vendor table. The virtual machine executes only one program, which it receives as a pointer to
the byte-code. If the virtual machine needs to run threads, extra source code must be added
that saves the current instruction pointer and the pointer to the token-table currently in use.
The local token-table must also be saved or protected from being overwritten, because this table
differs for every program. If thread support is integrated, the virtual machine will also need
options to control scheduling. A byte-code program could use properties (for the virtual
machine) in the device tree to control the scheduling. For example, these properties can specify
that the byte-code program is only executed during start-up of the operating system to
initialize or deactivate hardware components. The properties can also tell the virtual machine
that a byte-code program is to be executed every ten seconds with a high or low priority.
Writing drivers can take some time, especially for temperature control algorithms. If the
virtual machine is placed not only in the kernel of the operating system but also in Open
Firmware, a firmware programmer can make use of ADD byte-code programs, too.
Chapter 8
Conclusions
At present, several companies are competing to push their firmware specifications to a
level where they are accepted as de facto standards and come into common use. This step is
logical, because the firmware forms the interface between the hardware and the operating system.
The company that controls a firmware standard can influence which hardware and operating
systems are chosen and how a motherboard is laid out. This is one reason why
Intel wants to see the Extensible Firmware Interface on almost every computer system.
A major problem of Open Firmware is that its working group has sunk into hibernation.
Supplements that extend Open Firmware to new hardware do not exist. Companies that use
Open Firmware are drifting apart in implementation and strategy. Technologies like Agnostic
Device Driver can influence the direction, but it is also necessary to reactivate the Open
Firmware working group.
As shown in the last chapter, Agnostic Device Driver is a platform-independent concept and
works well on nearly every operating system and hardware. It is fast and easy to understand.
Nevertheless, there is a need for an open source firmware implementation that can be used
by everybody who is interested. Such an open source firmware implementation should not
include complex software layers. The motto is: “keep it simple, small, and beautiful.” An open
source community wants a piece of software that gets the best out of the hardware.
In the future, the role of boot firmware will grow, and it will surely be interesting to see how
things develop. Of course, I will continue to work with the PowerPC architecture and the
Linux/PPC64 kernel, and I will certainly keep an eye on Open Firmware.
Appendix A
Glossary
This glossary contains an alphabetical list of terms, phrases, and abbreviations used in this
diploma thesis.
ADD Agnostic Device Driver
ADLO Adhesive Loader
CFA Code Field Address
CHRP Common Hardware Reference Platform
CI Client Interface
DI Device Interface
EFI Extensible Firmware Interface
FPU Floating-Point Unit
GPIO General Purpose I/O
HTAB Hashed Page Table
I2C Inter-Integrated Circuit
IU Integer Unit
LSU Load Store Unit
NACA Node Address Communications Area
OF Open Firmware
PACA Processor Address Communication Area
PowerPC Performance Optimization With Enhanced RISC PC
PReP PowerPC Reference Platform
RPA RISC Platform Architecture
RTAS Run-Time Abstraction Services
SLOF Slimline Open Firmware
SIMD Single Instruction Multiple Data
SMP Symmetric Multi-Processors
SPR Special Purpose Register
SPU Service Processing Unit
STAB Segment Table
UI User Interface
VPU Vector Processing Unit
Appendix B
ADD Byte-Code Functions
ADD# Function Stack Comment

0x000 end0
0x010 b(lit) ( -- n ) ( F: /32bit/ -- )
0x011 b(’) ( -- xt ) ( F: /ADD#/ -- )
0x012 b(") ( -- str len ) ( F: /ADD-string/ -- )
0x013 bbranch ( -- ) ( F: /off/ -- )
0x014 b?branch ( bool -- ) ( F: /off/ -- )
0x015 b(loop) ( -- ) ( F: /off/ -- )
0x016 b(+loop) ( n -- ) ( F: /off/ -- )
0x017 b(do) ( limit start -- ) ( F: /off/ -- )
0x018 b(?do) ( limit start -- ) ( F: /off/ -- )
0x019 i ( -- index ) ( R: sys -- sys )
0x01a j ( -- index ) ( R: sys -- sys )
0x01b b(leave) ( -- )
0x01c b(of) ( sel of-val -- sel | <nil> ) ( F: /off/ -- )
0x01d execute ( ... xt -- ??? )
0x01e + ( n1 n2 -- sum )
0x01f - ( n1 n2 -- diff )
0x020 * ( n1 n2 -- prod )
0x021 / ( n1 n2 -- quot )
0x022 mod ( n1 n2 -- rem )
0x023 and ( x1 x2 -- x3 )
0x024 or ( x1 x2 -- x3 )
0x025 xor ( x1 x2 -- x3 )
0x026 invert ( x1 -- x2 )
0x027 lshift ( x1 u -- x2 )
0x028 rshift ( x1 u -- x2 )
0x029 >>a ( x1 u -- x2 )
0x02a /mod ( n1 n2 -- rem quot )
0x02b u/mod ( u1 u2 -- urem uquot )
0x02c negate ( n1 -- n2 )
0x02d abs ( n -- u )
0x02e min ( n1 n2 -- n1|n2 )
0x02f max ( n1 n2 -- n1|n2 )
0x030 >r ( x -- ) ( R: -- x )
0x031 r> ( -- x ) ( R: x -- )
0x032 r@ ( -- x ) ( R: x -- x )
0x033 exit ( -- ) ( R: sys -- )
0x034 0= ( n -- bool )
0x035 0<> ( n -- bool )
0x036 0< ( n -- bool )
0x037 0<= ( n -- bool )
0x038 0> ( n -- bool )
0x039 0>= ( n -- bool )
0x03a < ( n1 n2 -- bool )
0x03b > ( n1 n2 -- bool )
0x03c = ( x1 x2 -- bool )
0x03d <> ( x1 x2 -- bool )
0x03e u> ( u1 u2 -- bool )
0x03f u<= ( u1 u2 -- bool )
0x040 u< ( u1 u2 -- bool )
0x041 u>= ( u1 u2 -- bool )
0x042 >= ( n1 n2 -- bool )
0x043 <= ( n1 n2 -- bool )
0x044 between ( n min max -- bool )
0x045 within ( n min max -- bool )
0x046 drop ( x -- )
0x047 dup ( x -- x x )
0x048 over ( x1 x2 -- x1 x2 x1 )
0x049 swap ( x1 x2 -- x2 x1 )
0x04a rot ( x1 x2 x3 -- x2 x3 x1 )
0x04b -rot ( x1 x2 x3 -- x3 x1 x2 )
0x04c tuck ( x1 x2 -- x2 x1 x2 )
0x04d nip ( x1 x2 -- x2 )
0x04e pick ( xu ... x1 x0 u -- xu ... x1 x0 xu )
0x04f roll ( xu ... x1 x0 u -- xu-1 .. x1 x0 xu )
0x050 ?dup ( x -- 0 | x x )
0x051 depth ( -- u )
0x052 2drop ( x1 x2 -- )
0x053 2dup ( x1 x2 -- x1 x2 x1 x2 )
0x054 2over ( x1 x2 x3 x4 -- x1 x2 x3 x4 x1 x2 )
0x055 2swap ( x1 x2 x3 x4 -- x3 x4 x1 x2 )
0x056 2rot ( x1 x2 x3 x4 x5 x6 -- x3 x4 x5 x6 x1 x2 )
0x057 2/ ( x1 -- x2 )
0x058 u2/ ( x1 -- x2 )
0x059 2* ( x1 -- x2 )
0x05a /c ( -- 1 )
0x05b /w ( -- 2 )
0x05c /l ( -- 4 )
0x05d /n ( -- n )
0x05e ca+ ( a1 index -- a2 )
0x05f wa+ ( a1 index -- a2 )
0x060 la+ ( a1 index -- a2 )
0x061 na+ ( a1 index -- a2 )
0x062 char+ ( a1 -- a2 )
0x063 wa1+ ( a1 -- a2 )
0x064 la1+ ( a1 -- a2 )
0x065 cell+ ( a1 -- a2 )
0x066 chars ( n1 -- n2 )
0x067 /w* ( n1 -- n2 )
0x068 /l* ( n1 -- n2 )
0x069 cells ( n1 -- n2 )
0x06a on ( a -- )
0x06b off ( a -- )
0x06c +! ( n a -- )
0x06d @ ( a -- x )
0x06e l@ ( a -- q )
0x06f w@ ( a -- w )
0x070 <w@ ( a -- n )
0x071 c@ ( a -- b )
0x072 ! ( x a -- )
0x073 l! ( q a -- )
0x074 w! ( w a -- )
0x075 c! ( b a -- )
0x076 2@ ( a -- x1 x2 )
0x077 2! ( x1 x2 a -- )
0x078 move ( src dst len -- )
0x079 fill ( a len b -- )
0x07a comp ( a1 a2 len -- bool )
0x07b noop ( -- )
0x07c lwsplit ( q -- w.lo w.hi )
0x07d wljoin ( w.lo w.hi -- q )
0x07e lbsplit ( q -- b.lo b2 b3 b.hi )
0x07f bljoin ( b.lo b2 b3 b.hi -- q )
0x080 wbflip ( w1 -- w2 )
0x081 upc ( c1 -- c2 )
0x082 lcc ( c1 -- c2 )
0x083 pack ( str len a -- pstr )
0x084 count ( pstr -- str len )
0x085 body> ( a -- xt )
0x086 >body ( xt -- a )
0x087 add-revision ( -- n )
0x088 span ( -- a )
0x089 unloop ( -- ) ( R: sys -- )
0x08a expect ( a len -- )
0x08b alloc-mem ( len -- a )
0x08c free-mem ( a len -- )
0x08d key? ( -- bool )
0x08e key ( -- char )
0x08f emit ( char -- )
0x090 type ( str len -- )
0x091 (cr ( -- )
0x092 cr ( -- )
0x093 #out ( -- a )
0x094 #line ( -- a )
0x095 hold ( char -- )
0x096 <# ( -- )
0x097 u#> ( u -- str len )
0x098 sign ( n -- )
0x099 u# ( u1 -- u2 )
0x09a u#s ( u -- )
0x09b u. ( u -- )
0x09c u.r ( u size -- )
0x09d . ( n -- )
0x09e .r ( n size -- )
0x09f .s ( ... -- ... )
0x0a0 base ( -- a )
0x0a2 $number ( a len -- true | n false )
0x0a3 digit ( c base -- digit true | c false )
0x0a4 -1 ( -- -1 )
0x0a5 0 ( -- 0 )
0x0a6 1 ( -- 1 )
0x0a7 2 ( -- 2 )
0x0a8 3 ( -- 3 )
0x0a9 bl ( -- 0x20 )
0x0aa bs ( -- 0x08 )
0x0ab bell ( -- 0x07 )
0x0ac bounds ( n cnt -- n+cnt n )
0x0ad here ( -- a )
0x0ae aligned ( n -- n|a )
0x0af wbsplit ( w -- b.lo b.hi )
0x0b0 bwjoin ( b.lo b.hi -- w )
0x0b1 b(<mark) ( -- )
0x0b2 b(>resolve) ( -- )
0x0b5 new-token ( F: /ADD#/ -- )
0x0b6 named-token ( F: /ADD-string ADD#/ -- )
0x0b7 b(:) ( -- ) ( E: ... -- ??? )
0x0b8 b(value) ( x -- ) ( E: -- x )
0x0b9 b(variable) ( -- ) ( E: -- a )
0x0ba b(constant) ( n -- ) ( E: -- n )
0x0bb b(create) ( -- ) ( E: -- a )
0x0bc b(defer) ( -- ) ( E: ... -- ??? )
0x0bd b(buffer:) ( size -- ) ( E: -- a )
0x0be b(field) ( offset size -- offset+size ) ( E: a -- a+offset )
0x0c0 instance ( -- )
0x0c2 b(;) ( -- )
0x0c3 b(to) ( params -- ) ( F: /ADD#/ -- )
0x0c4 b(case) ( sel -- sel )
0x0c5 b(endcase) ( sel | <nil> -- )
0x0c6 b(endof) ( -- ) ( F: /off/ -- )
0x0c7 # ( ud1 -- ud2 )
0x0c8 #s ( ud -- 0 0 )
0x0c9 #> ( ud -- str len )
0x0ca external-token ( F: /ADD-string ADD#/ -- )
0x0cb $find ( str len -- xt true | str len false )
0x0cc offset16 ( -- )
0x0cd evaluate ( ... str len -- ??? )
0x0d0 c, ( b -- )
0x0d1 w, ( w -- )
0x0d2 l, ( q -- )
0x0d3 , ( x -- )
0x0d4 um* ( u1 u2 -- d.prod )
0x0d5 um/mod ( ud u -- urem uquot )
0x0d8 d+ ( d1 d2 -- d.sum )
0x0d9 d- ( d1 d2 -- d.diff )
0x0da get-token ( ADD# -- xt imm? )
0x0db set-token ( xt imm? ADD# -- )
0x0dc state ( -- a )
0x0dd compile, ( xt -- )
0x0de behavior ( defer-xt -- contents-xt )
0x0f0 start0 ( -- )
0x0f1 start1 ( -- )
0x0f2 start2 ( -- )
0x0f3 start4 ( -- )
0x600 i2c-new ( addr -- client-handle )
0x601 i2c-delete ( client-handle -- )
0x602 i2c-ping ( addr -- status )
0x603 i2c-b@ ( client-handle reg -- byte )
0x604 i2c-w@ ( client-handle reg -- word )
0x605 i2c-l@ ( client-handle reg -- long )
0x606 i2c-b! ( client-handle byte reg -- )
0x607 i2c-w! ( client-handle word reg -- )
0x608 i2c-l! ( client-handle long reg -- )
