
Crusoe Processor

Seminar Guide : Mr. Jayadas C K

Submitted By : Sarath Kumar T S 42/02

ACKNOWLEDGEMENT

First and foremost I would like to thank God almighty for guiding me throughout this seminar.

I would like to express my heartfelt thanks to respected Principal Prof. Jyothi John for allowing me to conduct the seminar.

I would also like to express my sincere gratitude to the Head Of Department, Electronics and Communication, Prof. T.K.Mani, and Seminar coordinator Mrs. Laila D for their never ending support.

I would also like to give a huge thanks to my guide, Mr. Jayadas C.K, for all the valuable tips without which this seminar wouldn't have seen daylight.

Last, but not the least, I would like to thank all my well wishers and friends for their moral support and co-operation.

ABSTRACT

The ever-changing technology market continues to drive the need for ever more compact designs. This in turn drives the need for optimized system components that offer high performance but are smaller in size, consume less power for longer battery life, and run cooler without the need for fans.

Transmeta's Crusoe processor enables a new class of low-power computing with x86 software compatibility. The Crusoe processor consists of a hardware engine based on a VLIW architecture, surrounded by a software layer that morphs x86 code into VLIW code.

The software part of the processor saves millions of transistors, thereby reducing both the power required and the die size. This low power requirement makes the processor well suited to mobile computers, extending battery life while retaining performance.

INDEX
Introduction
1 Technology Perspective
1.1 Processor Fundamentals
1.2 Code Morphing Software
1.3 Hardware-Software Line
1.4 Decoding and Scheduling
1.5 Caching
1.6 Filtering
1.7 Prediction and Path Selection
1.8 Long Run Power Management
2 Processor Details
2.1 Crusoe Processor Model TM5800
2.2 Crusoe Processor Model TM5900
3 Applications
3.1 Ultra Low Power Crusoe CPU Board
3.2 Mobile Internet Computing Market
3.3 Module embeds latest Crusoe processor
Conclusion
References

INTRODUCTION

The world is about to be introduced to a new kind of processor: one whose CPU core has a very different instruction set from that of the application binaries it is capable of executing. Two examples are Intel's Itanium family (based on the HP/Intel IA-64 architecture) and Transmeta's Crusoe. Both Itanium and Crusoe promise leading-edge performance on legacy applications (PA-RISC applications in the case of Itanium, and x86 applications in the case of Crusoe). Yet neither has a PA-RISC or x86 execution engine as its core. Both processors rely on a combination of dynamic translation software and hardware assist to convert a PA-RISC or x86 instruction stream to their core's native instruction stream on the fly.

Transmeta's approach is much more ambitious: the Crusoe CPU core's native instruction set architecture is completely hidden from the application software, making the dynamic software layer appear as an inherent part of the microprocessor itself. These are early steps towards a trend that I believe we will see more of in the future: the breakdown of the notion that an executable is an immutable object. The past few decades of advances in compilers, operating systems and architectures have rested on an implied assumption that the bits in a binary are exactly the bits executed by the machine. For example, in most systems the text segment of a binary image is write-protected by default, and dynamic code modification generally involves a very high cost (such as an expensive I-cache flush). Even the legal system is unprepared for the impending change: software copyright law explicitly forbids disassembly and modification of binary bits.

At the beginning of this century, Transmeta Corporation introduced the Crusoe processors, an x86-compatible family of solutions that combines strong performance with remarkably low power consumption. As might be expected, a new technology for designing and implementing microprocessors underlies the development of these products. As might not be expected, the new technology is fundamentally software-based: the power savings come from replacing a large number of transistors with software.

The Crusoe processor solutions consist of a hardware engine logically surrounded by a software layer. The engine is a very long instruction word (VLIW) CPU capable of executing up to four operations in each clock cycle. The VLIW's native instruction set bears no resemblance to the x86 instruction set; it has been designed purely for fast, low-power implementation using conventional CMOS fabrication. The surrounding software layer gives x86 programs the impression that they are running on x86 hardware. The software is called Code Morphing software because it dynamically morphs x86 instructions into VLIW instructions. The Code Morphing software includes a number of advanced features to achieve good system-level performance. Code Morphing support facilities are also built into the underlying CPUs. In other words, the Transmeta designers have judiciously rendered some functions in hardware and some in software, according to the product design goals and constraints. Different goals and constraints in future products may result in different hardware-software partitioning.

Transmeta's Code Morphing technology changes the entire approach to designing microprocessors. By demonstrating that practical microprocessors can be implemented as hardware-software hybrids, Transmeta has dramatically expanded the design space that microprocessor designers explore for optimum solutions. Microprocessor development teams may now enlist software experts and expertise, working largely in parallel with hardware engineers to bring products to market faster. Upgrades to the software portion of a microprocessor can be rolled out independently from the chip. Finally, decoupling the hardware design from the system and application software that use it frees hardware designers to evolve and eventually replace their designs without perturbing legacy software.

TECHNOLOGY PERSPECTIVE

The Transmeta designers have decoupled the x86 instruction set architecture (ISA) from the underlying processor hardware, which allows the hardware to be very different from a conventional x86 implementation. For the same reason, the underlying hardware can be changed radically without affecting legacy x86 software: each new CPU design only requires a new version of the code morphing software to translate x86 instructions to the new CPU's native instruction set. For the initial Transmeta products, models TM3120 and TM5400, the hardware designers opted for minimal space and power. By eliminating roughly three quarters of the logic transistors that would be required for an all-hardware design of similar performance, the designers have likewise reduced the power requirements and die size. However, future hardware designs can emphasize different factors and accordingly use different implementation techniques.

Finally, the code morphing software itself offers opportunities to improve performance without altering the underlying hardware. The current system is a first-generation embodiment of a new technology that can be further optimized with experience and experimentation. Because the code morphing software would typically reside in standard Flash ROMs on the motherboard, improved versions can even be downloaded into processors in the field.

Processor Fundamentals

With the code morphing software handling x86 compatibility, Transmeta hardware designers created a very simple, high-performance VLIW engine with two integer units, a floating-point unit, a memory (load/store) unit, and a branch unit. A Crusoe processor long instruction word, called a molecule, can be 64 bits or 128 bits long and contain up to four RISC-like instructions called atoms. All atoms within a molecule are executed in parallel, and the molecule format directly determines how atoms get routed to functional units; this greatly simplifies the decode and dispatch hardware. Figure 1.1 shows a sample 128-bit molecule and the straightforward mapping from atom slots to functional units. Molecules are executed in order, so there is no complex out-of-order hardware. To keep the processor running at full speed, molecules are packed as fully as possible with atoms. In a later section, we describe how the code morphing software accomplishes this.

Figure 1.1 A molecule can contain up to four atoms, which are executed in parallel. (The figure shows a 128-bit molecule whose atoms FADD, ADD, LD and BRCC map to the Floating-Point Unit, Integer ALU, Load/Store Unit and Branch Unit respectively.)

The integer register file has 64 registers, %r0 through %r63. By convention, the code morphing software allocates some of these to hold x86 state while others contain state internal to the system, or can be used as temporary registers, e.g., for register renaming in software. In the assembly code example in this paper, we write one molecule per line, with atoms separated by semicolons. The destination register of an atom is specified first; a .c opcode suffix designates an operation that sets the condition codes. Where a register holds x86 state, we use the x86 name for the register (e.g., %eax instead of the less descriptive %r0).
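As a rough illustration of the molecule format described above, the following Python sketch packs a stream of atoms, in program order, into molecules of at most four atoms. The functional-unit counts follow the text; the atom tuple layout, the name `pack_molecules`, and the greedy strategy are assumptions made purely for illustration and are not Transmeta's actual scheduler. Branch semantics are ignored.

```python
from collections import Counter

# Functional units per the text: two integer units, one floating-point
# unit, one load/store unit and one branch unit.
UNIT_LIMITS = {"alu": 2, "fpu": 1, "lsu": 1, "br": 1}
MAX_ATOMS = 4  # a molecule holds at most four atoms


def pack_molecules(atoms):
    """Greedily pack atoms, in program order, into molecules.

    Each atom is a hypothetical tuple (opcode, unit, dest, sources).
    A new molecule is started when the current one is full, when the
    required functional unit is already occupied, or when the atom reads
    or writes a register written earlier in the same molecule (atoms in
    one molecule execute in parallel, so they must be independent).
    """
    molecules, current, used, written = [], [], Counter(), set()
    for opcode, unit, dest, sources in atoms:
        conflict = (
            len(current) >= MAX_ATOMS
            or used[unit] >= UNIT_LIMITS[unit]
            or any(src in written for src in sources)
            or (dest is not None and dest in written)
        )
        if conflict and current:
            molecules.append(current)
            current, used, written = [], Counter(), set()
        current.append(opcode)
        used[unit] += 1
        if dest is not None:
            written.add(dest)
    if current:
        molecules.append(current)
    return molecules


example = [
    ("fadd", "fpu", "%f1", ["%f2", "%f3"]),
    ("add", "alu", "%r1", ["%r2", "%r3"]),
    ("ld", "lsu", "%r4", ["%r5"]),
    ("brcc", "br", None, []),
    ("sub", "alu", "%r6", ["%r1", "%r4"]),
]
print(pack_molecules(example))
```

The first four atoms are independent and use different units, so they share one molecule; the fifth atom starts a second molecule because the first is full.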

Figure 1.2 Conventional superscalar out-of-order CPUs use hardware to create and dispatch micro-ops that can execute in parallel. (The figure shows x86 instructions passing through superscalar Decode Units and Translate Units to become micro-ops, which flow through the Dispatch Unit and Functional Units to the In-Order Retire Unit.)

Superscalar out-of-order x86 processors, such as the Pentium II and Pentium III, also have multiple functional units that can execute RISC-like operations (micro-ops) in parallel. Figure 1.2 depicts the hardware these designs use to translate x86 instructions into micro-ops and schedule (dispatch) the micro-ops to make the best use of the functional units. Since the dispatch unit reorders the micro-ops as required to keep the functional units busy, a separate piece of hardware, the in-order retire unit, is needed to effectively reconstruct the order of the original x86 instructions and ensure that they take effect in proper order. Clearly, this type of processor hardware is much more complex than the Crusoe processor's simple VLIW engine.

Because the x86 instruction set is quite complex, the decoding and dispatching hardware requires large quantities of power-hungry logic transistors; the chip dissipates heat in rough proportion to their numbers. Table 1.1 compares the sizes of Intel mobile and Crusoe processor models.

                    Mobile PII   Mobile PII       Mobile PIII   TM3120     TM5400
Process             .25 µm       .25 µm shrink    .18 µm        .22 µm     .18 µm
On-chip L1 cache    32 KB        32 KB            32 KB         96 KB      128 KB
On-chip L2 cache    0            256 KB           256 KB        0          256 KB
Die size            130 sq mm    180 sq mm        106 sq mm     77 sq mm   73 sq mm

Table 1.1 The code morphing software simplifies chip hardware.

Code Morphing Software

The code morphing software is fundamentally a dynamic translation system, a program that compiles instructions for one instruction set architecture (in this case, the x86 target ISA) into instructions for another ISA (the VLIW host ISA). The code morphing software resides in a ROM and is the first program to start executing when the processor boots. The code morphing software presents the x86 ISA and is the only thing x86 code sees; the only program written directly for the VLIW engine is the code morphing software itself. Figure 1.3 shows the relationship between x86 software, the code morphing software, and the Crusoe processor.

Figure 1.3 The code morphing software mediates between x86 software and Crusoe processor

Because the code morphing software insulates x86 programs, including a PC's BIOS and operating system, from the hardware engine's native instruction set, that native instruction set can be changed arbitrarily without affecting any x86 software at all. The only program that needs to be ported is the code morphing software itself, and Transmeta does that work for each architectural change.

Coincidentally, hiding the chip's ISA behind a software layer also avoids a problem that in the past hampered the acceptance of VLIW machines. A VLIW exposes details of the processor pipeline to the compiler; hence any change to that pipeline would require all existing binaries to be recompiled to make them run on the new hardware. Note that even traditional x86 processors suffer from a related problem: while old applications will run correctly on a new processor, they usually need to be recompiled to take full advantage of the new processor implementation. This is not a problem on Crusoe processors, since, in effect, the code morphing software always transparently recompiles and optimizes the x86 code it is running.

The flexibility of the software translation approach comes at a price: the processor has to dedicate some of its cycles to running the code morphing software, cycles that a conventional x86 processor could use to execute application code. To deliver good practical system performance, Transmeta has carefully designed the code morphing software for maximum efficiency and low overhead.

Hardware-Software Line

Virtualizing an x86 CPU is a challenging undertaking because of the complexity of the x86 architecture. Choosing which functions to implement in hardware and which in software is a major engineering challenge, involving issues such as cost and complexity, overall performance and power consumption. Clearly, there are many possible choices, influenced by market demands or the latest hardware technologies available. For its initial products, Transmeta has drawn the line between hardware and software so that software handles the complex task of decoding x86 instructions and generating explicitly parallel molecules, which the hardware executes using a very simple, high-speed VLIW engine. A few unique hardware features, described later in this paper, were added to better support dynamic translation. The hardware-software line might be drawn differently for another kind of product, for example, a high-end server processor.

Decoding and Scheduling

Conventional x86 superscalar processors fetch x86 binary instructions from memory and decode them into micro-operations, which are then reordered by out-of-order dispatch hardware and fed to the functional units for parallel execution. In contrast (besides being a software rather than a hardware solution), Code Morphing can translate an entire group of x86 instructions at once, creating a translation, whereas a superscalar x86 processor translates single instructions in isolation. Moreover, while a traditional x86 processor translates each x86 instruction every time it is executed, Transmeta's software translates instructions once, saving the resulting translation in a translation cache. The next time the (now translated) x86 code is executed, the system skips the translation step and directly executes the existing optimized translation.

Implementing the translation step in software as opposed to hardware opens up new opportunities and challenges. Since an out-of-order processor has to translate and schedule instructions every time they execute, it must do so very quickly. This seriously limits the kinds of transformations it can perform. The Code Morphing approach, on the other hand, can amortize the cost of translation over many executions, allowing it to use much more sophisticated translation and scheduling algorithms. Likewise, the amount of power consumed for the translation process is amortized, as opposed to having to pay it on every execution. Finally, the translation software can optimize the generated code and potentially reduce the number of instructions executed in a translation. In other words, Code Morphing can speed up execution while at the same time reducing power!
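The amortization argument above can be made concrete with a little arithmetic. The sketch below computes the average cost per execution when a one-time translation cost is spread over repeated runs of the same block. The cycle counts are invented for illustration only; the source gives no actual figures.

```python
def average_cost(translate_cost, run_cost, executions):
    """Average cycles per execution when translation is paid only once."""
    if executions <= 0:
        raise ValueError("executions must be positive")
    return (translate_cost + run_cost * executions) / executions


# Hypothetical numbers: 10,000 cycles to translate a block once,
# 10 cycles to run the translated block each time.
for n in (1, 10, 1000, 100000):
    print(n, average_cost(10_000, 10, n))
```

With these assumed numbers the average cost falls from 10,010 cycles for a single execution to about 10.1 cycles after 100,000 executions, which is why a block that repeats often can justify an expensive optimizing translation.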

Caching

The translation cache, along with the Code Morphing code, resides in a separate memory space that is inaccessible to x86 code. (For better performance, the Code Morphing software copies itself from ROM to DRAM at initialization time.) The size of this memory space can be set at boot time, or the operating system can make the size adjustable. As with all caching, the Code Morphing software's technique of reusing translations takes advantage of locality of reference. Specifically, the translation system exploits the high repeat rates (the number of times a translated block is executed on average) seen in real-life applications. After a block has been translated once, repeated execution hits in the translation cache and the hardware can then execute the optimized translation at full speed. Some benchmark programs attempt to exercise a large set of features in a small amount of time, with little repetition, a pattern that differs significantly from normal usage.

The overhead of Code Morphing translation is obviously more evident in those benchmarks. Furthermore, as an application executes, Code Morphing learns more about the program and improves it so it will execute faster and faster. Today's benchmarks have not been written with a processor in mind that gets faster over time, and may charge Code Morphing for the learning phase without waiting for the payback. As a result, some benchmarks do not accurately predict the performance of Crusoe processors. On typical applications, due to their high repeat rates, Code Morphing has the opportunity to optimize execution and amortize any initial translation overhead. As an example, consider a multimedia application such as playing a DVD: before the first video frame has been drawn, the DVD decoder will have been fully translated and optimized, incurring no further overhead during the playing time of the DVD.
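The translate-once, reuse-many-times behaviour described above can be sketched as a small cache keyed by the x86 address of a block. The class name `TranslationCache`, the `translate` callback, and the string used as a stand-in for a translation are all hypothetical; the real translation cache stores native VLIW code, and its eviction and invalidation policies are not described in the source.

```python
class TranslationCache:
    """Toy translation cache keyed by the x86 address of a code block."""

    def __init__(self, translate):
        self._translate = translate   # hypothetical translator callback
        self._cache = {}
        self.translations = 0         # times the translation cost was paid
        self.executions = 0           # block executions served in total

    def run(self, address):
        """Return the translation for a block, translating only on a miss."""
        self.executions += 1
        if address not in self._cache:
            self._cache[address] = self._translate(address)
            self.translations += 1
        return self._cache[address]

    def repeat_rate(self):
        """Average number of executions per translated block."""
        if self.translations == 0:
            return 0.0
        return self.executions / self.translations


cache = TranslationCache(lambda addr: f"vliw@{addr:#x}")
for addr in [0x1000, 0x2000] * 50:   # two hot blocks, each run 50 times
    cache.run(addr)
print(cache.translations, cache.executions, cache.repeat_rate())
```

In this toy run only two translations are performed for one hundred block executions, giving a repeat rate of 50. A benchmark that touches every block once would instead show a repeat rate close to 1, which is the situation the text describes as unrepresentative.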

Filtering

It is well known that in typical applications, a very small fraction of the application's code (often less than 10%, sometimes as little as 1%) accounts for more than 95% of execution time. Therefore, the translation system needs to choose carefully how much effort to spend on translating and optimizing a given piece of x86 code. Obviously, we want to lavish the optimizer's full attention on the most frequently executed code but not waste it on code that executes only once. The Code Morphing software includes in its arsenal a wide choice of execution modes for x86 code, ranging from interpretation (which has no translation overhead at all, but executes x86 code more slowly), through translation using very simple-minded code generation, all the way to highly optimized code (which takes longest to generate, but which runs fastest once translated). A sophisticated set of heuristics helps choose among these execution modes based on dynamic feedback information gathered during actual execution of the code.
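A minimal sketch of such a filtering heuristic, assuming the only feedback used is a per-block execution count, might look as follows. The threshold values and the name `choose_mode` are invented for illustration; the source does not disclose the actual heuristics, which it describes as considerably more sophisticated.

```python
# Hypothetical thresholds; the source does not give real values.
TRANSLATE_AFTER = 10     # executions before a simple translation is generated
OPTIMIZE_AFTER = 1000    # executions before the block is fully optimized


def choose_mode(execution_count):
    """Pick an execution mode for a block from how often it has run."""
    if execution_count >= OPTIMIZE_AFTER:
        return "optimized"   # slowest to generate, fastest to run
    if execution_count >= TRANSLATE_AFTER:
        return "simple"      # quick, simple-minded code generation
    return "interpret"       # no translation overhead, slowest execution


for count in (1, 10, 999, 1000):
    print(count, choose_mode(count))
```

Code that runs once stays in the interpreter and never pays for translation, while the small fraction of hot code eventually reaches the optimized tier.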

Prediction and Path Selection

One of the many ways in which the Code Morphing software can gather feedback about the x86 program is to instrument translations: the translator adds code whose sole purpose is to collect information such as block execution frequencies, or branch history. This data can be used later to decide when and what to optimize and translate. For example, if a given conditional x86 branch is highly biased (e.g., usually taken), the system can likewise bias its optimizations to favor the most frequently taken path. Alternatively, for more balanced branches (taken as often as not, for example), the translator can decide to speculatively execute code from both paths and select the correct result later. Analogously, knowing how often a piece of x86 code is executed helps decide how much to try to optimize that code. It would be extremely difficult to make similar decisions in a traditional hardware-only x86 implementation.
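The branch-bias decision described above can be sketched from two profile counters per branch. The function name `classify_branch`, the returned labels, and the 90% bias threshold are assumptions for illustration and are not taken from the source.

```python
def classify_branch(taken, not_taken, bias_threshold=0.9):
    """Suggest an optimization strategy from profiled branch outcomes.

    taken / not_taken are counts collected by instrumented translations.
    bias_threshold is a hypothetical cut-off for a "highly biased" branch.
    """
    total = taken + not_taken
    if total == 0:
        return "unknown"             # no profile data gathered yet
    ratio = taken / total
    if ratio >= bias_threshold:
        return "favor-taken"         # optimize for the taken path
    if ratio <= 1 - bias_threshold:
        return "favor-not-taken"     # optimize for the fall-through path
    return "speculate-both"          # balanced: run both paths, select later


print(classify_branch(95, 5))
print(classify_branch(50, 50))
```

A branch taken 95 times out of 100 is treated as biased, while an evenly split branch becomes a candidate for speculative execution of both paths.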

Long Run Power Management

Although the Code Morphing software's primary responsibility is ensuring x86 compatibility, it also provides interfaces to capabilities available only in Crusoe processor models. Long Run power management is one example: a facility that can further minimize the processor's already low power consumption. In a mobile setting, most conventional x86 CPUs regulate their power consumption by rapidly alternating between running the processor at full speed and (in effect) turning the processor off. Different performance levels can be obtained by varying the on/off ratio (the duty cycle). However, with this approach, the processor may be shut off just when a time-critical application needs it. The result may be glitches, such as dropped frames during movie playback, that are perceptible (and annoying) to a user.

In contrast, Crusoe processors can adjust their power consumption without turning themselves off; instead, they can adjust their clock frequency on the fly. They do so extremely quickly, and without requiring an operating system reboot or having to go through a slow sequence of suspending to and restarting from RAM. As a result, software can continuously monitor the demands on the processor and dynamically pick just the right clock speed (and hence power consumption) needed to run the application, no more and no less. Since the switching happens so quickly, it is not noticeable to the user. Finally, the Code Morphing software can also adjust the Crusoe processor's voltage on the fly (since at a lower operating frequency, a lower voltage can be used). Because power varies linearly with clock speed and with the square of the voltage, adjusting both can produce cubic reductions in power consumption, whereas conventional CPUs can adjust power only linearly. For example, assume an application program only requires 90% of the processor's speed. On a conventional processor, throttling back the processor speed by 10% cuts power by 10%, whereas under the same conditions, Long Run power management can reduce power by almost 30%, a noticeable advantage!
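The 10% versus almost 30% comparison follows from the stated relation that power is proportional to clock frequency times the square of the voltage. The short calculation below reproduces it under the simplifying assumption that voltage can be lowered in the same proportion as frequency; that proportionality is an assumption made here, not a figure from the source, and the function name `relative_power` is hypothetical.

```python
def relative_power(freq_scale, voltage_scale=1.0):
    """Power relative to full speed, using P proportional to f * V**2."""
    return freq_scale * voltage_scale ** 2


# Application needs only 90% of full speed.
throttling_saving = 1 - relative_power(0.9)        # frequency only
longrun_saving = 1 - relative_power(0.9, 0.9)      # frequency and voltage

print(f"frequency only:        {throttling_saving:.1%}")
print(f"frequency and voltage: {longrun_saving:.1%}")
```

Scaling frequency alone gives a 10.0% saving, while scaling both gives 1 - 0.9^3 = 27.1%, consistent with the "almost 30%" quoted in the text.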

Processor Details

2.1

Crusoe Processor Model TM5800

2.1.1 Features

=> VLIW processor and x86 Code Morphing software provide x86-compatible mobile platform solution
=> Processors fabricated in latest 0.13 µm process technology operate up to 800 MHz at very low power levels
=> Standard product speeds of 667, 700, 733, and 800 MHz
=> Integrated 64K-byte L1 instruction cache, 64K-byte L1 data cache, and 512K-byte L2 write-back cache
=> Integrated Northbridge core logic features facilitate compact system designs
=> DDR SDRAM memory controller with 100-133 MHz, 2.5V interface
=> SDRAM memory controller with 100-133 MHz, 3.3V interface
=> PCI bus controller (PCI 2.1 compliant) with 33 MHz, 3.3V interface
=> Long Run advanced power management with ultra-low power operation extends mobile battery life
=> 4-1.0 W @ 367-800 MHz, 0.9-1.3V running typical multimedia applications
=> 50 mW min deep sleep
=> Full System Management Mode (SMM) support
=> Compact 474-pin ceramic package, fully pin compatible with all existing processors

The Transmeta Crusoe processor is an ultra-low power, high-speed microprocessor based on an advanced VLIW core architecture. When used in conjunction with Transmeta's x86 Code Morphing software, the Crusoe processor provides x86-compatible software execution using dynamic binary code translation, without requiring code recompilation. In addition to the VLIW core, the processor incorporates separate 64K-byte instruction and data caches, a large 512K-byte L2 write-back cache, a 64-bit DDR SDRAM memory controller, a 64-bit SDR SDRAM memory controller, and a 32-bit PCI controller. These additional functional units, which are typically part of the core system logic that surrounds the microprocessor, allow the Crusoe processor to provide a highly integrated and cost-effective platform solution for the x86 mobile market. The processor core operates from a 0.9-1.3V supply, resulting in extremely low power consumption, even at high operating frequencies. With power consumption during typical operation as low as 150 milliwatts, the Crusoe processor is the most energy-efficient high-performance x86-compatible mobile solution ever offered. Transmeta, Crusoe, Code Morphing, and Long Run are trademarks of Transmeta Corporation.

Architecture

The Crusoe processor incorporates integer and floating-point execution units, separate instruction and data caches, a level-2 write-back cache, a memory management unit, and multimedia instructions. In addition to these traditional processor features, the device integrates a DDR SDRAM memory controller, an SDR SDRAM memory controller, a PCI bus controller and a serial ROM interface controller. These additional units are usually part of the core system logic that surrounds the microprocessor. The VLIW processor, in combination with Code Morphing software and the additional system core logic units, allows the Crusoe processor to provide a highly integrated, ultra-low power, high-performance platform solution for the x86 mobile market. The Crusoe processor block diagram is shown in Figure 2.1.

Figure 2.1

Processor Core

The Crusoe processor core architecture is relatively simple by conventional standards. It is based on a Very Long Instruction Word (VLIW) 128-bit instruction set. Within this VLIW architecture, the control logic of the processor is kept very simple and software is used to control the scheduling of instructions. This allows a simplified and very straightforward hardware implementation with an in-order 7-stage integer pipeline and a 10-stage floating-point pipeline. By streamlining the processor hardware and reducing the control logic transistor count, the performance-to-power consumption ratio can be greatly improved over traditional x86 architectures. The Crusoe processor includes a 64K-byte 8-way set-associative Level 1 (L1) instruction cache and a 64K-byte 16-way set-associative L1 data cache. The TM5800 model also includes an integrated 512K-byte Level 2 (L2) write-back cache for improved effective memory bandwidth and enhanced performance. This cache architecture assures maximum internal memory bandwidth for performance-intensive mobile applications, while maintaining the same low-power implementation that provides a superior performance-to-power consumption ratio relative to previous x86 implementations.

Apart from having execution hardware for logical, arithmetic, shift, and floating-point instructions, as in conventional processors, the Crusoe processor differs markedly from traditional x86 designs. To ease the translation process from x86 to the core VLIW instruction set, the hardware generates the same condition codes as conventional x86 processors and operates on the same 80-bit floating-point numbers. Also, the Translation Look-aside Buffer (TLB) has the same protection bits and address mapping as x86 processors. The software component of this solution is used to emulate all other features of the x86 architecture. The software that converts x86 programs into the core VLIW instructions is called Code Morphing software. Together, the Code Morphing software and the VLIW core act as an x86-compatible solution, as shown in Figure 2.2.

Figure 2.2

The typical behavior of the Code Morphing software is to execute a loop which decodes and executes x86 instructions. The first few times a specific x86 code sequence is executed, Code Morphing interprets the code by decoding the instructions one byte at a time and then dispatching execution to corresponding VLIW native instruction subroutines. Once the x86 code has been executed several times, Code Morphing translates the x86 instructions into highly optimized and extremely fast VLIW native instructions, executes the translated code, and caches the native instruction translations for future use. If the same x86 code is required to execute again, the high-performance cached translations are executed immediately and no re-translation is required.

Integrated DDR SDRAM Memory Controller

The DDR SDRAM interface is the highest performance memory interface available on the Crusoe processor. The DDR SDRAM controller supports only Double Data Rate (DDR) SDRAM and transfers data at a rate that is twice the clock frequency of the interface. The DDR SDRAM controller supports up to two banks, the equivalent of two Dual In-line Memory Modules (DIMMs), of DDR SDRAM using a 64-bit wide interface. The DDR SDRAM memory can be populated with 64M-bit, 128M-bit, or 256M-bit devices. For highest performance, it is recommended that the DDR SDRAM devices be soldered to the motherboard rather than incorporated on DIMMs. Also, to reduce signal loading, only x8 or x16 devices should be used. The frequency setting for the DDR SDRAM interface is initialized during the power-on boot sequence. Although the processor supports a DDR interface frequency in the range of 1/2 to 1/15 of the core frequency, the recommended interface frequency is between 100 and 133 MHz.

Integrated SDR SDRAM Memory Controller

The SDR SDRAM memory controller supports up to four banks, equivalent to two Small Outline Dual In-line Memory Modules (SO-DIMMS), of Single Data Rate (SDR) SDRAM that can be configured as 64-bit SO-DIMMs. These SO-DIMMs can be populated with 64M-bit, 128M-bit or 256M-bit devices. All SO-DIMMs must use the same frequency SDRAMs, but there are no restrictions on mixing different SO-DIMM configurations into each SO-DIMM slot. The frequency setting for the SDR SDRAM interface is initialized during the power-on boot sequence. Although the processor supports an SDR interface frequency in the range of 1/2 to 1/15 of the core frequency, the recommended interface frequency is between 100 and 133 MHz. It is also recommended that a maximum of 8 devices per SO-DIMM be used in order to operate at the required frequency with the proper signal integrity.

Integrated PCI Controller

The Crusoe processor includes a PCI bus controller that is PCI 2.1 compliant. The PCI bus is 32 bits wide, operates at 33 MHz, and is compatible with 3.3V signal levels. It is not 5V tolerant, however. The PCI controller provides a PCI host bridge, the PCI bus arbiter, and a DMA controller. The PCI bus can sustain 132 Mbytes/sec bursts for reads and writes on 4K-byte blocks. The PCI controller snoops ahead on PCI-to-DRAM reads and writes. The 16-Dword CPU-to-PCI write buffer converts sequential memory-mapped I/O writes to PCI bursts. The DMA controller handles PCI-to-DRAM reads and writes. The 16-Dword PCI-to-DRAM write buffer converts one 16-Dword burst to eight separate address/data pairs. The 16-Dword DRAM-to-PCI read-ahead buffer permits continuation of read-ahead activity after hitting in the buffer. The PCI controller tri-states the PCI bus when hot docking.
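The 132 Mbytes/sec PCI burst figure follows directly from the bus width and clock rate, and the same arithmetic gives theoretical peak rates for the two memory interfaces described earlier. The sketch below assumes 1 Mbyte = 10^6 bytes and the top recommended interface clock of 133 MHz; only the PCI number is stated in the source, so the SDR and DDR results are computed upper bounds, not quoted or measured figures. The function name is hypothetical.

```python
def peak_bandwidth_mb_s(width_bits, clock_mhz, transfers_per_clock=1):
    """Theoretical peak transfer rate in Mbytes/sec (10**6 bytes)."""
    return width_bits // 8 * clock_mhz * transfers_per_clock


pci = peak_bandwidth_mb_s(32, 33)        # 32-bit bus at 33 MHz
sdr = peak_bandwidth_mb_s(64, 133)       # 64-bit SDR SDRAM at 133 MHz
ddr = peak_bandwidth_mb_s(64, 133, 2)    # DDR moves data twice per clock

print(pci, sdr, ddr)
```

The PCI result matches the 132 Mbytes/sec quoted above, and the DDR peak is exactly twice the SDR peak at the same clock, reflecting the double data rate.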

Serial ROM Interface

The Crusoe processor serial ROM interface is a five-pin interface used to read data from a serial flash ROM. The flash ROM is 1M-byte in size and provides nonvolatile storage for the Code Morphing software. During the boot process, the Code Morphing code is copied from the ROM to the Code Morphing memory space in SDRAM. Once transferred, the Code Morphing code requires 16M-bytes of memory space. The portion of SDRAM space reserved for Code Morphing software is not visible to x86 code. Transmeta supplies programming information for the flash ROM device. This interface may also be used for in-system reprogramming of the flash ROM.

Software Compatibility

When used in conjunction with Transmeta's x86 Code Morphing software, the Crusoe processor provides x86-compatible software execution without requiring code recompilation. Systems based on this solution are capable of executing all standard x86-compatible operating systems and applications, including Microsoft Windows 9x, Windows ME, Windows 2000, and Linux.

Operating Power and Power Management

The Crusoe processor operates from a 0.9-1.3V core voltage supply at extremely low power levels, even while the device is operating at very high performance. The TM5800 model incorporates Long Run adaptive power management technology. Long Run power management dynamically reduces the core CPU power consumption to near-optimal levels in response to processor work load requirements. Long Run achieves this dynamic power reduction by varying the CPU clock and core power supply voltage in response to adaptive power management protocols that monitor processor load demands and control processor power and performance levels. The Long Run power management approach is particularly effective in applications that run predominantly in the normal (active) power state, as described below.

Additionally, the Crusoe processor supports ACPI-compliant power management modes by incorporating five distinct power states: Normal, Auto Halt, Quick Start, Deep Sleep and Off. These power states may be used to reduce the operating power of the processor during system states that require little or no CPU activity. Table 1 lists the recommended state of the processor for each of the ACPI global system states.
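The reason Long Run varies both the clock and the core voltage can be illustrated with the standard CMOS relation that dynamic power is roughly proportional to V^2 x f. The sketch below is a toy model under stated assumptions: the operating points and the selection policy are hypothetical, only the 0.9-1.3 V range comes from the text, and it does not represent Transmeta's actual algorithm.

```python
# Illustrative model of voltage/frequency scaling.
# The operating points are hypothetical; only the 0.9-1.3 V core voltage
# range is taken from the text.
OPERATING_POINTS = [  # (clock in MHz, core voltage in V), slowest first
    (300, 0.90),
    (500, 1.00),
    (700, 1.15),
    (900, 1.30),
]


def relative_dynamic_power(freq_mhz, volts, ref=OPERATING_POINTS[-1]):
    """Dynamic CMOS power scales as V^2 * f; return it relative to the fastest point."""
    ref_freq, ref_volts = ref
    return (volts ** 2 * freq_mhz) / (ref_volts ** 2 * ref_freq)


def pick_operating_point(load_fraction):
    """Choose the slowest point whose clock still covers the demanded load.

    load_fraction is the share of full-speed throughput currently needed (0..1).
    This simple policy is an assumption made for illustration.
    """
    needed_mhz = load_fraction * OPERATING_POINTS[-1][0]
    for freq, volts in OPERATING_POINTS:
        if freq >= needed_mhz:
            return freq, volts
    return OPERATING_POINTS[-1]


for load in (0.2, 0.5, 1.0):
    freq, volts = pick_operating_point(load)
    print(load, freq, volts, round(relative_dynamic_power(freq, volts), 2))
```

In this toy model, dropping the clock alone would save power only linearly, whereas lowering the voltage along with it gives an additional quadratic saving, which is why both are adjusted together.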

Crusoe Processor Model TM5900

Features

=> VLIW processor and x86 Code Morphing software provide an x86-compatible mobile platform solution
=> Core operating frequencies up to 1000 MHz
=> Integrated 64 KByte L1 instruction and data caches, and 256 KByte (TM5700) or 512 KByte (TM5900) L2 write-back cache
=> Integrated Northbridge core logic features facilitate compact system designs
   - DDR SDRAM memory controller with 83-133 MHz, 2.5 V interface
   - PCI bus controller (PCI 2.1 compliant) with 33 MHz, 3.3 V interface
=> Long Run advanced power management with ultra-low power operation extends battery life
   - < 1 W running typical multimedia applications
   - < 150 mW typical in Deep Sleep
=> Long Run thermal management (Cool Run) dynamically adapts to system thermal environment
=> Power management controls for ACPI-compliant modes
=> Full System Management Mode (SMM) support
=> Compact (21 mm x 21 mm) 399-contact flip-chip organic ball-grid array (FCOBGA) package

The processor core operates from a 0.80-1.40 V supply, resulting in extremely low power consumption even at high operating frequencies. The processor typically consumes below 1 Watt under normal operating conditions. When operating in Deep Sleep, power consumption typically drops below 150 mW.

2.2.2

Architecture

The Transmeta Crusoe processor model TM5900 is an ultra-low power, high-speed microprocessor based on an advanced VLIW core architecture. When used in conjunction with Transmeta's x86 Code Morphing software, the TM5900 processor provides x86-compatible software execution using dynamic binary code translation, without requiring code recompilation. In addition to the VLIW core, the processor incorporates separate 64 Kbyte L1 instruction and data caches, a large L2 write-back cache (256 Kbytes on TM5700, 512 Kbytes on TM5900), a 64-bit DDR SDRAM memory controller, and a 32-bit PCI controller. These additional functional units, which are typically part of the chipset system logic that surrounds the microprocessor, allow the TM5900 processor to provide a highly integrated, cost-effective solution for x86 platforms requiring superior energy efficiency, low power consumption, and low heat generation.

Figure 2.3 Block diagram of the Crusoe TM5900 processor

2.2.3

Processor Core

The TM5900 processor core is based on a Very Long Instruction Word (VLIW) instruction set of 64 or 128 bits. Within this VLIW architecture, the control logic of the processor is kept very simple, and software is used to control the scheduling of instructions. This allows a simplified and very straightforward hardware implementation with an in-order 7-stage integer pipeline and a 10-stage floating point pipeline. By streamlining the processor hardware and reducing control logic transistor count, the performance-to-power consumption ratio (energy efficiency) can be greatly improved over traditional x86 architectures.

In addition to having the execution hardware for logical, arithmetic, shift, and floating point instructions, as in conventional processors, the TM5900 processor uses a combination of software and hardware to offer full x86 compatibility. The processor hardware generates the same condition codes as conventional x86 processors and operates on the same 80-bit floating point numbers. Also, the translation look-aside buffer (TLB) has the same protection bits and address mapping as x86 processors. The software component of this processor solution is used to emulate all other features of the x86 architecture. The software that converts x86 programs into the core VLIW instructions is called Code Morphing software. The combination of Code Morphing software and the VLIW core together acts as an x86-compatible processor solution, as shown in Figure 2.4.

The typical behavior of Code Morphing software is to execute a loop, which decodes and executes x86 instructions. The first few times a specific x86 code sequence is executed, Code Morphing software interprets the code by decoding the instructions one at a time and then dispatching execution to corresponding VLIW native instruction subroutines. Once the x86 code has been executed several times, Code Morphing software translates the x86 instructions into highly optimized and extremely fast native VLIW instructions, executes the translated code, and caches the native instruction translations for future use. If the same x86 code is required to execute again, the high-performance cached translations are executed immediately and no re-translation is required.
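The interpret-first, translate-when-hot behavior described above can be sketched as a small toy model. Everything in it is an illustrative assumption: the two-operation "guest" instruction format, the HOT_THRESHOLD value, and the use of a generated Python function as a stand-in for native VLIW code. It is not Transmeta's implementation.

```python
# Toy model of the interpret-then-translate loop described in the text.
# HOT_THRESHOLD and the guest instruction format are hypothetical.
HOT_THRESHOLD = 3  # translate a block after this many interpreted runs


def interpret(block, regs):
    """Slow path: decode and execute one guest instruction at a time."""
    for op, dst, value in block:
        if op == "mov":
            regs[dst] = value
        elif op == "add":
            regs[dst] = regs.get(dst, 0) + value
        else:
            raise ValueError(f"unknown op {op!r}")


def translate(block):
    """Generate one function for the whole block (stand-in for native code)."""
    lines = ["def translated(regs):"]
    for op, dst, value in block:
        if op == "mov":
            lines.append(f"    regs[{dst!r}] = {value!r}")
        elif op == "add":
            lines.append(f"    regs[{dst!r}] = regs.get({dst!r}, 0) + {value!r}")
        else:
            raise ValueError(f"unknown op {op!r}")
    lines.append("    return regs")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["translated"]


class CodeMorpher:
    def __init__(self):
        self.run_counts = {}         # guest address -> times interpreted
        self.translation_cache = {}  # guest address -> translated function
        self.stats = {"interpreted": 0, "translated": 0, "cache_hits": 0}

    def execute(self, address, block, regs):
        cached = self.translation_cache.get(address)
        if cached is not None:       # fast path: reuse the cached translation
            self.stats["cache_hits"] += 1
            cached(regs)
            return
        count = self.run_counts.get(address, 0)
        if count < HOT_THRESHOLD:    # cold code: interpret it
            self.run_counts[address] = count + 1
            self.stats["interpreted"] += 1
            interpret(block, regs)
            return
        native = translate(block)    # hot code: translate once, then cache
        self.translation_cache[address] = native
        self.stats["translated"] += 1
        native(regs)


morpher = CodeMorpher()
regs = {"eax": 0}
block = [("mov", "ebx", 2), ("add", "eax", 5)]
for _ in range(10):
    morpher.execute(0x1000, block, regs)
print(regs["eax"], morpher.stats)
```

Keying the cache on the guest address means that code executed only once or twice never pays the translation cost, while frequently executed code pays it exactly once.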

Figure 2.4

2.2.4

DDR SDRAM Power Saving Modes

In addition to the power management states defined in previous sections, TM5900 processors provide additional power saving modes for DDR SDRAM. These power saving modes can be enabled during normal operation by programming processor-specific PSR configuration registers (CD_MISC). In these power saving modes, the clock enable lines to the SDRAM are active only when the SDRAM is being accessed or refreshed. This decreases power dissipation, but increases the latency for a memory cycle by one SDRAM clock. Refer to the TM5700/TM5900 BIOS Programmer's Guide and the TM5900 Development and Manufacturing Guide for power saving mode programming information.

DDR Memory Interface

TM5900 processors include an integrated high performance DDR (double data rate) SDRAM controller and interface. The DDR controller supports only DDR SDRAM and transfers data at a rate that is twice the clock frequency of the interface. The DDR SDRAM controller supports the equivalent of two DIMMs (up to four ranks) of DDR SDRAM using a 64-bit wide interface. The DDR SDRAM interface does not support parity bits.

Systems based on TM5900 processors require careful consideration of the memory subsystem design. There are unique characteristics of the TM5900 memory controller that place specific constraints on the allowable system memory configurations, as described in the sections below. The design guidelines provided below allow system designs that provide robust memory configurations that support a variety of memory upgrade scenarios.
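As a rough illustration of what "twice the clock frequency" means for a 64-bit interface, the sketch below computes the theoretical peak transfer rate at the two ends of the 83-133 MHz interface range listed in the features. The function name is illustrative, and the figures are theoretical peaks that ignore refresh, precharge, and other overheads.

```python
# Theoretical peak DDR transfer rate for a 64-bit interface.
# Assumption: two transfers per clock, no refresh or command overhead.
DDR_BUS_WIDTH_BITS = 64         # interface width (from the text)
TRANSFERS_PER_CLOCK = 2         # double data rate


def ddr_peak_mbytes_per_s(clock_mhz, width_bits=DDR_BUS_WIDTH_BITS):
    """Transfers per clock times clock rate (MHz) times bytes per transfer."""
    return TRANSFERS_PER_CLOCK * clock_mhz * (width_bits // 8)


for clock_mhz in (83, 133):     # ends of the supported interface clock range
    print(clock_mhz, ddr_peak_mbytes_per_s(clock_mhz))
```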

Code Morphing Software Memory

Code Morphing software uses a portion (typically 16-24 Mbytes) of the installed system memory for its working area and to store code translations. This portion of memory is completely isolated from the operating system-accessible (x86) memory area. Code Morphing software memory is statically configured in the OEM configuration table.

PCI Interface

The TM5900 processor PCI bus is revision 2.1 compliant. The PCI bus is 32 bits wide, operates at 33 MHz and is compatible with 3.3 V levels (but is not 5 V tolerant). The PCI controller on the TM5900 provides a PCI host bridge, the central resource and a DMA controller. The TM5700/TM5900 PCI bus can sustain 132 Mbytes/sec bursts for reads and writes on 4 Kbyte blocks. The PCI controller snoops ahead on PCI-to-DRAM reads and writes. The 16 DWORD processor-to-PCI write buffer converts sequential memory mapped I/O writes to PCI bursts. The DMA controller handles PCI-to-DRAM reads and writes. The 16 DWORD PCI-to-DRAM write buffer converts one 16 DWORD burst to eight separate address/data pairs. The 16 DWORD DRAM-to-PCI read ahead buffer permits continuation of read ahead after hitting in the buffer. The PCI controller tri-states the PCI bus when hot docking.
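The idea of converting sequential memory mapped I/O writes into bursts can be sketched as follows. The 16 DWORD depth is taken from the text; the flush policy (close a burst when the next address is not consecutive or the buffer is full) is an assumption made for illustration, since the source does not describe the controller's actual rules.

```python
# Sketch of write combining in a 16 DWORD write buffer.
# The flush policy is a hypothetical simplification.
BUFFER_DWORDS = 16   # buffer depth (from the text)
DWORD_BYTES = 4


def combine_into_bursts(writes):
    """Group (address, data) DWORD writes into bursts of consecutive addresses.

    Returns a list of (start_address, [data, ...]) tuples.
    """
    bursts = []
    current = []
    for address, data in writes:
        sequential = bool(current) and address == current[-1][0] + DWORD_BYTES
        if current and (not sequential or len(current) == BUFFER_DWORDS):
            bursts.append((current[0][0], [d for _, d in current]))
            current = []
        current.append((address, data))
    if current:
        bursts.append((current[0][0], [d for _, d in current]))
    return bursts


# Twenty consecutive DWORD writes followed by one write elsewhere.
writes = [(0xA0000 + DWORD_BYTES * i, i) for i in range(20)] + [(0xB0000, 99)]
bursts = combine_into_bursts(writes)
print([(hex(start), len(data)) for start, data in bursts])
```

With this policy, the 21 individual writes in the example collapse into three bus transactions, which is the kind of saving a write buffer is meant to provide.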

Applications

3.1

Ultra Low Power Crusoe CPU Board

GESPAC's PCISYS-59 is a new single-slot 3U CompactPCI CPU board based on the Transmeta Crusoe (x86 compatible) microprocessor. This smart processor is a flexible and efficient hardware-software hybrid that replaces millions of power-hungry transistors with software and consumes 60 to 70 percent less power whilst running much cooler than competing chips. Ideally suited for systems requiring high performance and a small, robust form factor, the PCISYS-59 is a good fit for embedded real-time machine control, data and multimedia applications. The new processor board fills the existing gap between the PCISYS-58 and the PCISYS-PII/III in terms of overall performance while keeping low power consumption and an excellent performance-to-price ratio. Key points of the design are the low power, the passive heat sink operation capability and the wide range of peripherals.

In addition to all of the standard PC functions, the PCISYS-59 offers many extended features, including two Fast Ethernet controllers, two USB 1.1 host ports, one IEEE 1394 (FireWire) channel, a high-performance PCI graphics controller with 4 MB of fast VRAM, an analog monitor output and a DVI-I digital output (PanelLink), and a SoundBlaster Pro compatible sound system. Flash disk capability for high-reliability data storage up to 192+ MB is also supported by using either the ATA-Disk-Chip or CompactFlash technology.

The board is equipped with 128 MB of DDR SDRAM running at 266 MHz. One SODIMM socket allows the installation of 256+ MB of additional SDRAM for optimum performance, while the board boots from 256 kB of Flash EPROM. The 10Base-T/100Base-TX Ethernet interfaces are implemented using two Intel 82559ER controllers, while the dual USB 1.1 is handled directly by the ALI M1535D PCI-to-ISA bridge. All memories are directly driven by the TM5800 integrated North Bridge. An OHCI-compatible FireWire controller adds plug-and-play, high data rate serial channel networking.

Two mezzanine modules are available to bring connections to the external world. The first one, referenced as PINSYS-59SND, supports the sound extension, the PS/2 keyboard and mouse connectors and a floppy header. The second module (PINSYS-59COM) supports the serial communications ports and a 10Base-T/100Base-TX Fast Ethernet port. The PCISYS-59's 4HP front panel includes status LEDs and standard connectors for dual USB, Fast Ethernet, FireWire and DVI-I.

The PCISYS-59 is fully compatible with EMC regulations and with the CompactPCI PICMG 2.0 R3.0 specifications. The board handles 32-bit, 33 MHz CompactPCI bus access and 3.3V/5V signaling. The PCISYS-59 is 100% PC compatible, allowing it to run popular operating systems such as DOS, Windows 9X, Windows NT/2000, Windows XP, and Linux, as well as embedded real-time operating systems such as VxWorks and QNX.

3.2

Mobile Internet Computing Market

The evolving class of Mobile Internet Computers includes a rich set of products that spans from Web pads to ultra-light (less than four pound) Mobile PCs that share the common need of x86 software compatibility and long battery life. It represents a significant shift from today's mostly stationary laptops or incompatible handheld devices to a platform that offers greater mobility and access to the Web from almost anywhere at any time.

Ultra-light Mobile PCs operating with the Windows operating system and Microsoft Office applications can take advantage of the Crusoe processor's low power to increase the average user's productivity by operating on a single battery for up to a full work day. Crusoe-based Internet devices such as Web pads and mobile clients can take advantage of the Mobile Linux operating system to create a robust yet economical machine that can handle all the required Internet plug-in applications. Mobile Linux offers an additional advantage in that it is an operating system that can be stored in solid state Flash ROM, thus removing the need for an expensive hard disk drive.

"Cellular phones became more pervasive when they were made smaller and provided greater battery life," said Dave Ditzel, Transmeta's CEO. "We believe that Crusoe will bring about a change of similar magnitude in Mobile Internet Computers."

Commenting on the current state of the mobile market, analyst Martin Reynolds of the market research firm Dataquest concurs with the need for a new mobile processor. "When people build mobile computers today, they use what's basically a desktop processor in a different package," he said. "There's definitely room for a fresh approach."

"Our customers are telling us that significant battery life improvement is the most requested feature by a margin of two to one. That's why Crusoe's low power is so important," said Transmeta's Jim Chapman, vice president of sales and marketing. "The current mobile market needs to evolve from today's heavier (six to ten pound) laptops to lighter weight, high performance mobile PCs. Crusoe will help propel that change."

3.3

Module embeds latest Crusoe processor

The new EM05 embedded system module (ESM) is based on Transmeta's new ultra compact, highly efficient TM5900 Crusoe processor. The EM05 ESM was developed for use in cost-critical, deeply embedded industrial PCs and is ideally suited for harsh industrial environments. The EM05 is fully x86 compatible and runs either Windows or Linux-based software applications, such as infotainment applications in trains, planes, buses and other mobile areas. The EM05 ESM uses an 850MHz Transmeta Crusoe TM5900 processor with impressive performance. The TM5900 includes a fully integrated Northbridge, allowing for the implementation of a complete PC using a very small form factor at an attractive price point. The TM5900 processor used in conjunction with the EM05 offers an extremely energy efficient solution, which can support an extended industrial temperature range, making it a great choice for a wide variety of embedded applications. Additionally, Transmeta offers an extended availability programme for the TM5900 for up to five years.

"The new embedded system module from MEN is a compelling design ideal for use in rugged environments and is based on our new ultra-compact Crusoe TM5900 processor," said Chris Russell, European Operations Manager for Transmeta. "MEN's embedded solutions closely mirror the features that have made Crusoe successful, a robust compact design and long-term availability combined with a great price/performance ratio."

The EM05 comes with two Fast Ethernet channels and two serial interfaces, with options for RS232, RS422 or RS485, which can be made available on the front of the board. The ESM comes with an SO-DIMM slot for up to 512Mbyte of the new 133MHz DDR SDRAM and a CompactFlash slot for application software. The ESM graphics as well as other functions are made available through the EM05's J2 system connector to the carrier card. Software support on the EM05 includes full compatibility with Windows, Linux and VxWorks.

Thanks to Altera's 250,000-gate Cyclone FPGA, the EM05 can be enhanced by a wide variety of application-specific I/O. The standard version of the EM05 starter kit includes additional functionality such as a TFT connection, USB and IDE interfaces, and floppy drive and keyboard/mouse implementation through the FPGA via connectors on the carrier card. The FPGAs can accommodate two additional CAN controllers, multiple UARTs and audio functions. All FPGA components can be used in VHDL and are internally linked using a standardized Wishbone bus. All physical interfaces for the FPGA functionality are available on a carrier card. The starter kit (including the EM05, carrier card, PSU and various cables/adapters) will cost Eur 1435 plus VAT.

Conclusion
Transmeta set out to expand the reach of microprocessors into new markets by dramatically changing the way microprocessors are designed. The initial market is mobile computing, in which complex power-hungry processors have forced users to give up either battery running time or performance. The Crusoe processor solutions have been designed for lightweight (two to four pound) mobile computers and Internet access devices such as handhelds and web pads. They can give these devices PC capabilities and unplugged running times of up to a day.

To design the Crusoe processor chips, the Transmeta engineers did not resort to exotic fabrication processes. Instead they rethought the fundamentals of microprocessor design. Rather than throwing hardware at design problems, they chose an innovative approach that employs a unique combination of hardware and software. Using software to decompose complex instructions into simple atoms and to schedule and optimize the atoms for parallel execution saves millions of logic transistors and cuts power consumption on the order of 60-70% over conventional approaches, while at the same time enabling aggressive code optimization techniques that are simply not feasible in traditional x86 implementations. Transmeta's Code Morphing software and fast VLIW hardware, working together, achieve low power consumption without sacrificing high performance for real-world applications.

