
INTRODUCTION TO DSP

OBJECTIVES
This introduction is designed to answer several basic questions for the beginning student and to familiarize the
student experienced in digital signal processing with the design and architecture of the Texas Instruments
TMS320C54x device. The topics presented in this section include:
The definition of digital signal processing (DSP)
The benefits of digital signal processors
Practical applications or uses of digital signal processing
General DSP design and architecture
Specific DSP architecture
[Figure: What is DSP? Analog computer versus digital computer: input, ADC, DSP (digital data such as 1010 1001), DAC, output.]
Definition of a Digital Signal Processor
A digital signal processor (DSP) is an integrated circuit designed for high-speed data manipulations, and is
used in audio, communications, image manipulation, and other data-acquisition and data-control
applications.
How Digital Signal Processing Works
To explain how digital signal processing works, you must understand the difference between analog and
digital signals. Analog signals, which include sound intensity, pressure, light intensity, etc., are
continuously variable. Each of our senses is sensitive to different kinds of analog signals. Our ears are
sensitive to sound, our eyes are sensitive to light, and so on. Once we receive a signal, our sensory organs
convert it to an electrical signal and send it to our analog computer (the brain). Our brains are very powerful
parallel computers whose performance currently is unmatched by any digital computer. Our brains not only
analyze the information received, but also make decisions using this data.
Digital signals are those that are transmitted within or between computers, in which information is
represented by discrete states (for example, high voltages and low voltages) rather than by continuously
variable levels in a continuous stream, as in an analog signal.
How Analog and Digital Signals Work Together
Digital technology, such as the personal computer (PC), assists us in many ways: writing documents, spell
checking, and drawing. Unfortunately, the world is analog, and electronic analog computers are not as
versatile as digital computers. Therefore, in order to make use of the tremendous processing power that
digital technology offers us, we must do the following:
Convert the analog signals into electrical signals, using a transducer (such as a microphone, as
shown in the diagram).
Digitize these signals (i.e., convert them from analog to digital using an analog-to-digital converter
(ADC)), as shown in the diagram.
Once the signal is in digital form, our computer can easily process it through a digital signal processor. The
DSP specializes in processing these signals, which makes it slightly different from microcomputers,
microcontrollers, or general-purpose microprocessors.
After the DSP has processed the signal, the output signal must be converted back to analog form so that we
can sense it. This is the digital-to-analog converter (DAC) stage in the diagram. A loudspeaker, for
example, would convert the analog signal coming from the DAC into sound.
So, we can see that to process the signal digitally, we need to convert it at least twice. Is it worth it? As you
will see, it really is, at least until someone designs an analog computer as versatile as a digital one.
[Figure: Multiply and add. Add: 1 + 2 = 3 (0001 + 0010 = 0011). Multiply: 5 * 3 = 15, formed from partial products that are shifted and added. The most common operation in DSP is the multiply, add, and accumulate (MAC) sequence A = B*C + D, E = F*G + A, and so on. A MAC operation typically takes 70 clock cycles with ordinary processors and 1 clock cycle with digital signal processors.]
Why Do We Need Digital Signal Processors?
Why do we need a digital signal processor? Can we not use a general-purpose microprocessor to process
signals as well? Let us try to answer this question by giving an example of some arithmetic operations
performed by DSPs.
Add and Subtract
Add and subtract operations are performed quite simply by general-purpose microprocessors in a single or
very few clock cycles. Binary addition is similar to decimal addition. Our example shows adding 1 plus 2. The
result is the decimal 3.
Multiply and Divide
The multiply and divide operations are more complex. A digital multiply operation consists of a series of
shift and add operations. Our example shows a multiplication of 3 by 5. Division, which is more complex,
will not be discussed here. It is discussed in the TMS320C54x DSP Reference Set, Vol. 2: Mnemonic Instruction
Set, Chapter 2 (reference number SPRU172B), where the description of the subtract conditionally (SUBC)
instruction explains the process.
General-purpose microprocessors are quite slow in performing multiply and divide operations. They will
typically sequentially execute a series of shift, add, and subtract operations from their microcode to perform
a single multiply operation, and may consume many cycles to complete.
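The shift and add procedure described above can be sketched in a few lines of Python. This is only an illustration of the sequential method, not the microcode of any particular processor; the function name and the step counter are our own assumptions.

```python
def shift_add_multiply(a, b):
    """Multiply two non-negative integers using only shifts and adds.

    Returns the product and the number of sequential steps taken.
    """
    product = 0
    steps = 0
    while b:
        if b & 1:            # lowest bit of the multiplier is 1
            product += a     # add the shifted multiplicand
        a <<= 1              # shift the multiplicand left (times 2)
        b >>= 1              # move on to the next multiplier bit
        steps += 1
    return product, steps


product, steps = shift_add_multiply(5, 3)
print(product, steps)
```

Each pass through the loop stands for work that a general-purpose processor performs one step after another, which is why the cycle count grows with the width of the multiplier.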
The DSP performs multiplication in a single cycle by implementing all shift and add operations in parallel.
The circuitry is relatively complex and consumes a considerable number of transistors. The benefit is very
fast multiplication, which is required for processing most digital signals.
When general-purpose DSPs are not fast enough, the signal is either processed using analog circuits (which
may have some drawbacks), or in specialized DSP hardware designed only for that task. This eliminates
many of the benefits of a programmable DSP.
Digital signal processing, by its nature, requires many calculations of the form:
A = B*C + D
This may appear to be a simple task, but when speed is also required, we find that specialized, dedicated
hardware to perform this task is very useful.
Multiply, Accumulate (MAC)
Most DSPs have a specialized instruction that allows them to multiply, add, and save the result in a single
cycle. This instruction is usually called MAC (short for Multiply, Accumulate).
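To see why calculations of the form A = B*C + D are so common, consider a running sum of products. The sketch below is plain Python with a hypothetical function name; on a DSP, each pass through the loop would correspond to a single MAC instruction.

```python
def mac_sum(coeffs, samples):
    """Accumulate coeffs[i] * samples[i], one multiply-accumulate per step."""
    acc = 0
    for c, x in zip(coeffs, samples):
        acc = c * x + acc    # A = B*C + D, where D is the running total
    return acc


print(mac_sum([1, 2, 3], [4, 5, 6]))
```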
[Figure: Drop in multiplication times. Multiply time in ns (0 to 600) plotted against year (1971, 1976, 1998), falling to 5 ns by 1998.]
We have established that for DSPs, we need specialized hardware that is capable of performing multiply and
accumulate functions in the shortest possible time (preferably in a single cycle). However, the central
problem remains. How can we achieve a fast multiply operation? Without a fast multiplier, a worthwhile
DSP design would only be a dream.
Designing fast multipliers was one of the greatest challenges in digital design up until the 1980s. In the
1970s, several of the world's leading research laboratories sought to make fast digital multipliers a reality.
Multiply Times
In 1971, Lincoln Laboratories designed a multiplier using 10,000 integrated circuits, performing the
operation in just 600 ns. By the mid-1970s, multiply times of 200 ns were becoming commonplace. This
made it possible to design acceptable digital signal processors. These early designs were expensive and
bulky, but they showed that fast multiplication was possible.
In the early 1980s, single-chip DSPs with good performance started to appear, and ever since, multiply times
have continued to drop. Today's 16-bit fixed-point devices can achieve multiply times of 5 ns. Given the
origins of this technology, this is a remarkable achievement.
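[Figure: Digital computers. In the von Neumann machine, a single memory holding the stored program and data shares one address bus (A) and one data bus (D) with the arithmetic logic unit and the input/output unit. In the Harvard architecture, the stored program and the stored data are in separate memories, each with its own address (A) and data (D) buses to the arithmetic logic unit and the input/output unit.]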
Now let us have a closer look at the internal architecture of computers so we can see how this has affected
the design of DSP chips.
Stored Program Machines
Computers need instructions to operate. At every clock cycle, they must be told what to do. If the
instructions are stored, the computer just has to fetch and execute them. Such computers are called stored
program machines. Our computer typically fetches an instruction and then data, operates on the data, and
returns the resulting data to the store.
Stored program machines use two well-known and widely used computer architectures: von Neumann and
Harvard. The diagram shows the structure of the two architectures.
von Neumann Architecture
The von Neumann machines store program and data in the same memory area. In this type of machine,
an instruction contains the operation command and the address of the data on which the operation is
performed. There are two basic operation units within these machines: the arithmetic logic unit (ALU) and
the input/output unit. The ALU performs the core operations: multiply, add, subtract, and many more. It is
on these very simple core operations that complex software, such as word processing software, can be built.
The input/output unit manages the flow of external data for the machine.
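A toy stored program machine makes the idea concrete. The sketch below is an assumption made for illustration only: the four operations, the accumulator, and the memory layout are invented and do not correspond to any real instruction set. Note that instructions and data sit in the same memory, and that each instruction carries an operation and the address of its data.

```python
def run(memory):
    """Run a toy stored-program machine until it reaches HALT.

    Instructions and data share one memory, as in a von Neumann machine.
    Each instruction is an (operation, address) pair.
    """
    acc = 0      # accumulator inside the ALU
    pc = 0       # program counter: address of the next instruction
    while True:
        op, addr = memory[pc]          # fetch the instruction
        pc += 1
        if op == "LOAD":
            acc = memory[addr]
        elif op == "ADD":
            acc = acc + memory[addr]
        elif op == "MUL":
            acc = acc * memory[addr]
        elif op == "STORE":
            memory[addr] = acc
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown operation: {op}")


memory = [
    ("LOAD", 6),    # 0: acc = B
    ("MUL", 7),     # 1: acc = B * C
    ("ADD", 8),     # 2: acc = B * C + D
    ("STORE", 9),   # 3: A = acc
    ("HALT", 0),    # 4: stop
    None,           # 5: unused
    5, 3, 2,        # 6..8: the data B, C, D
    0,              # 9: the result A
]
print(run(memory)[9])
```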
Harvard Architecture
The primary difference between Harvard architecture and von Neumann architecture is that with Harvard,
program and data memories are physically separate and have separate transmission paths. This enables the machine to
transfer instructions and data simultaneously. Such a structure can greatly enhance performance, because
instructions and data can be fetched simultaneously. Harvard machines also have ALUs and input/output
units.
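A rough count of memory cycles shows where the gain comes from. The model below is a deliberately idealized assumption: every instruction needs exactly one instruction fetch and one data access, each taking one cycle, and nothing else limits speed. Real processors are more complicated.

```python
def von_neumann_cycles(n_instructions):
    # One shared memory: the instruction fetch and the data access
    # of every instruction must take turns on the same bus.
    return 2 * n_instructions


def harvard_cycles(n_instructions):
    # Separate memories: while instruction i accesses its data,
    # instruction i + 1 is fetched in the same cycle.
    if n_instructions == 0:
        return 0
    return n_instructions + 1


for n in (1, 10, 1000):
    print(n, von_neumann_cycles(n), harvard_cycles(n))
```

Under these assumptions, a long program runs in roughly half the memory cycles on the Harvard machine.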
Von Neumann and Harvard Architecture History
The history of these two architectures is very interesting. The Harvard architecture was developed by
Howard Aiken in the late 1930s at Harvard University, with the Harvard Mark I becoming operational in
1944. The University of Pennsylvania followed in 1946 with the development of the Electronic Numerical
Integrator and Computer (ENIAC).
John von Neumann, a Hungarian-born mathematician, suggested a simpler and lower cost architecture,
namely a single memory for program and data. This simple solution has set the standard ever since. In
1951, the Institute for Advanced Study in Princeton built one of the first von Neumann machines.
Which Architecture is Best Suited for DSP?
Common general-purpose personal computers use processors designed with the von Neumann architecture,
while the Harvard architecture is more commonly used in specialized microprocessors for real-time and
embedded applications.
DSPs typically use Harvard architecture, although von Neumann DSPs also exist. Many signal and image
processing applications require fast, real-time machines. The drawback to using a true Harvard architecture
is that since it uses separate program and data memories, it needs twice as many address and data pins on
the chip and twice as much external memory. Unfortunately, as the number of pins or chips increases, so
does the price.
Electronic designers, who have had to tackle problems like these before, have come up with an elegant
solution: a single data and address bus is used externally, while two (or more) separate buses for program
and data are used internally. Timing (multiplexing) handles the separation of program and data
information. In one clock cycle, the program information flows on the pins, and in the second cycle, data
follows on the same pins. Program and data information is then routed onto separate internal program and
data buses. Such machines are called modified Harvard architecture processors because the internal
architecture is Harvard while the external architecture is von Neumann. The performance of modified
Harvard architecture processors typically compares well with the performance of true Harvard architecture
processors because most DSP chips also incorporate multiple internal RAM/ROM blocks for high-use
instructions and data. This significantly reduces the time spent on the sequential external program and data
accesses associated with classic von Neumann processors.
[Figure: A typical DSP system. A DSP chip connected to memory, optional converters (analog-to-digital and digital-to-analog), and communication ports (serial and parallel).]
Components of a Typical DSP System
Typical DSP systems consist of a DSP chip, memory, possibly an analog-to-digital converter (ADC), a
digital-to-analog converter (DAC), and communication channels. Not all DSP systems have the same
architecture with the same components. The selection of components in a DSP system depends on the
application. For example, a sound system would probably require A/D and D/A converters, whereas an
image processing system may not.
DSP Chip
A DSP chip can contain many hardware elements; some of the more common ones are listed below.
Central Arithmetic Unit
This part of the DSP performs major arithmetic functions such as multiplication and addition. It is
the part that makes the DSP so fast in comparison with traditional processors.
Auxiliary Arithmetic Unit
DSPs frequently have an auxiliary arithmetic unit that performs pointer arithmetic, mathematical
calculations, or logical operations in parallel with the main arithmetic unit.
Serial Ports
DSPs normally have internal serial ports for high-speed communication with other DSPs and data
converters. These serial ports are directly connected to the internal buses to improve performance,
to reduce external address decoding problems, and to reduce cost.
Memory
Memory holds information, data, and instructions for DSPs and is an essential part of any DSP system.
Although DSPs are intelligent machines, they still need to be told what to do. Memory devices hold a series
of instructions that tell the DSP which operations to perform on the data (i.e., information). In many cases,
the DSP reads some data, operates on it, and writes it back. Almost all DSP systems have some type of
memory device, whether it is on-chip memory or off-chip memory; however, on-chip memory operates
faster.
A/D and D/A Converters
Converters provide the translator function for the DSP. Since the DSP can only operate on digital data,
analog signals from the outside world must be converted to digital signals. When the DSP provides an
output, it may need to be converted back to an analog signal to be perceived by the outside world.
Analog-to-digital converters (ADCs) accept analog input and turn it into digital data that consist of only 0s
and 1s. Digital-to-analog converters (DACs) perform the reverse process; they accept digital data and
convert it to a continuous analog signal.
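The two conversions can be modeled in software. The sketch below assumes an ideal 8-bit converter with a 1 V reference; the function names, the bit width, and the test signal are our own choices, not properties of any particular device.

```python
import math


def adc(voltage, bits=8, v_ref=1.0):
    """Convert a voltage in [0, v_ref) to an unsigned integer code."""
    levels = 2 ** bits
    code = int(voltage / v_ref * levels)
    return max(0, min(levels - 1, code))   # clip to the valid code range


def dac(code, bits=8, v_ref=1.0):
    """Convert an integer code back to a voltage (centre of its step)."""
    levels = 2 ** bits
    return (code + 0.5) * v_ref / levels


# One cycle of a sine wave, offset so that it stays between 0.1 V and 0.9 V.
samples = [0.5 + 0.4 * math.sin(2 * math.pi * n / 16) for n in range(16)]
codes = [adc(v) for v in samples]
restored = [dac(c) for c in codes]
worst = max(abs(a - b) for a, b in zip(samples, restored))
print(codes[:4], worst)
```

The restored signal differs from the original by at most half a step, which is the price paid for representing a continuous value with a finite number of bits.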
Ports
Communication ports are necessary for a DSP system. Raw information is received and processed; then that
information is transmitted to the outside world through these ports. For example, a DSP system could output
information to a printer through a port. The most common ports are serial and parallel ports. A serial port
accepts a serial (single) stream of data and converts it to the processor format. When the processor wishes to
output serial data, the port accepts processor data and converts it to a serial stream (e.g., modem connections
on PCs). A parallel port does the same job, except the output and input are in parallel (simultaneous)
format. The most common example of a parallel port is a printer port on a PC.
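The job of a serial port can be sketched as a pair of conversions between a parallel word and a stream of single bits. The bit order used below (least significant bit first) and the function names are assumptions; real ports define their own framing and bit order.

```python
def to_serial(word, width=8):
    """Split a parallel word into a list of bits, least significant first."""
    return [(word >> i) & 1 for i in range(width)]


def from_serial(bits):
    """Reassemble bits (least significant first) into a parallel word."""
    word = 0
    for i, bit in enumerate(bits):
        word |= bit << i
    return word


bits = to_serial(0xA9)               # 0xA9 is 1010 1001 in binary
print(bits, hex(from_serial(bits)))
```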
[Figure: Practical DSP systems: hi-fi equipment, toys, videophones, modems, phone systems, 3D graphics, image processing, and more.]
Practical Applications for DSP Systems
Since their introduction to the market, DSPs have found a wide variety of applications. They are used in
everyday hi-fi systems as well as high-end virtual-reality applications. Generally, DSP is not an expensive
technology. Some practical DSP systems are:
Hi-Fi Equipment
Toys
Videophones
Modems
Phone Systems
3D Graphics Systems
Image Processing Systems
Hi-Fi Equipment (Music Systems)
DSPs are now being used in sound processors that can create the illusion of three-dimensional sound or
modify the acoustics of a room to give the illusion of very large rooms and auditoriums. The result is movie
theater quality sound in a home music system.
Toys
Today, DSP technology is integrated in children's toys. Talking toys are commonplace; by pressing the
picture of a dog, children can hear it bark. They can also learn their alphabet by singing along with a
teaching toy. This clearly demonstrates that DSP technology is not expensive.
Videophones
Videophones will affect the lives of people from all walks of life. They are quickly improving in quality. It
is only a matter of time before prices drop and videophones become widely used. DSPs are used for
compression and decompression of images in videophones. There are several international standards for
compressing moving images. Programmable DSPs are the perfect answer to evolving standards since this
may only require a software update.
Modems
As the Internet continues to grow, so has the use of modems. To be able to handle the ever-increasing
communications load, modems have become faster and more efficient. DSPs perform vital functions in
modems such as modulating the digital bit stream into a signal compatible with a phone line, canceling line
echoes, and compressing and decompressing data.
Phone Systems
These days, it is quite common to call a company and be answered by a machine that provides alternatives
such as "Say 1 for sales," "Say 2 for technical support," and so on. These phone systems use DSPs to
perform the function of voice recognition. DSPs are also commonly used in the communications industry for
the add-on features you can get from your telephone company, such as caller ID, voice messaging, and call back.
3D Graphics Systems
Most flight simulators use 3D real-time graphics to enhance realism. To calculate the necessary details in
three dimensions (and to be able to do this 30 times every second) requires very efficient and powerful
processors. DSPs are now widely used in virtual-reality applications.
Image Processing Systems
Personal handheld digital cameras are also now becoming widespread. DSPs are used to perform the
conversion of charge-coupled device (CCD) chip analog voltages (video) to compressed data, which is then
stored digitally in nonvolatile EEPROM (electrically erasable programmable ROM). The DSP also senses the buttons,
controls exposure times, provides the CCD gate timing, and downloads images to the PC.
DSPs are also used extensively in image processing, such as robot vision, machine vision, image
compression, and fingerprint recognition. A simple example of an image-processing application is the
inspection of printed circuit boards. The system works by recording the image of a working board and
comparing (subtracting) it to newly manufactured ones as they pass beneath a CCD camera. These systems
also use the efficient multiply and add cycles in DSPs to perform two-dimensional filtering.
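The comparison step of such an inspection system can be sketched as a pixel-by-pixel subtraction. The tiny 3 x 3 greyscale images, the threshold of 50, and the function names below are assumptions chosen only to keep the example short.

```python
def difference_image(reference, candidate):
    """Subtract two equally sized greyscale images pixel by pixel."""
    return [
        [abs(r - c) for r, c in zip(ref_row, cand_row)]
        for ref_row, cand_row in zip(reference, candidate)
    ]


def defects(reference, candidate, threshold=50):
    """Return (row, column) positions where the boards differ noticeably."""
    diff = difference_image(reference, candidate)
    return [
        (y, x)
        for y, row in enumerate(diff)
        for x, value in enumerate(row)
        if value > threshold
    ]


good_board = [[10, 10, 200], [10, 200, 10], [200, 10, 10]]
new_board = [[12, 9, 198], [10, 20, 10], [201, 10, 10]]
print(defects(good_board, new_board))
```

Small differences caused by lighting or camera noise fall below the threshold, while the missing feature in the middle of the new board is reported.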
Analog Advantages
Low cost and simplicity in some applications
Attenuators/amplifiers
Simple filters
Wide bandwidth (GHz)
Low signal levels
Infinite effective sampling rate
Infinite resolution in frequency
No aliasing/reconstruction issues
Infinite resolution in amplitude
No quantization noise
Digital Signal Processing (DSP) Advantages
Repeatability
Low sensitivity to component tolerances
Low sensitivity to temperature changes
Low sensitivity to aging effects
Nearly identical performance from unit to unit
Matched circuits cost less
High noise immunity
In many applications DSP offers higher
performance and lower cost
CD players versus phonographic turntable
[Figure: Why digital processing? Signal path: ADC, process, DAC. Advantages to digital processing: programmability, stability, repeatability, special applications.]
So, Why Convert From Analog to Digital?
Some applications require analog designs, and some require digital designs. To process signals digitally,
they must be converted from analog form to digital numbers. After a signal is processed, it is then often converted
back to analog form. Considering this overhead, digital processing must offer some clear advantages. These
include:
Programmability
Stability
Repeatability
Special Applications
[Figure: Programmability. One hardware = many tasks: the same hardware loaded with software 1, software 2, ..., software N can act as a low-pass filter, a music synthesizer, ..., or a motor control. Upgradability and flexibility: a digital upgrade means developing new code; an analog upgrade means soldering in new components.]
Programmability
A single piece of DSP hardware can perform many functions. For example, a multimedia PC can
play music and also function as a word processor if it is loaded with suitable programs. This ability to use
the same hardware for many functions provides important flexibility. You can implement any new function
you think of, as long as you can program it.
Upgradability
Once you have designed and implemented your system, you may want to upgrade or add new functions.
Perhaps you would like to adapt your system to a new environment. With a digital system, this means
modifying your code. With an analog system, this could involve obtaining and soldering in new
components, or even a complete redesign.
Flexibility
A single DSP board can be made to perform many functions by simply loading new programs into it. In our
demonstrations, we are using the same DSK board as a music tune generator and as a low-pass filter by
simply loading it with different software. This flexibility reduces design time and complexity. With analog
circuits, a new circuit has to be designed for each new function.
Stability
The stability of analog circuits depends upon several factors. Analog circuits are affected by temperature
and aging, among other things. Also, two analog systems using the same design and components may differ
in performance.
[Figure: Analog variability. Analog circuits are affected by temperature, aging, and tolerance of components; two analog systems using the same design and components may differ in performance. Example: 1k + 10 years = 1.1k.]
Temperature
Analog components such as resistors, capacitors, diodes, and operational amplifiers are affected by
temperature, humidity, and aging. A temperature-sensitive analog circuit may perform quite differently in
the UK than in Egypt, where the temperatures are different. This could prove disastrous for a company that
sells its products worldwide.
Digital circuits do not gradually change their characteristics with time, temperature, or humidity. They
either work or they don't work. In other words, digital circuits are repeatable as long as they are designed
with enough tolerance to operate properly over the range of expected conditions.
Aging
The effects of component aging can be detrimental to analog circuits as characteristics and performance
change. These effects can sometimes be anticipated, or their effect may not be critical. Analog designers
must be aware of these effects.
Tolerances
Components such as resistors and capacitors have tolerances. If a component value is only accurate to
within 10%, two apparently identical analog circuits could perform differently enough to cause operational
problems. This can make design, manufacturing, and support expensive.
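A short calculation shows how quickly tolerances add up. As an assumed example, take a first-order RC low-pass filter built from a 1 k resistor and a 100 nF capacitor, both with 10% tolerance; the component values are ours, and the cut-off formula is the standard f = 1 / (2 pi R C).

```python
import math


def cutoff_hz(r_ohms, c_farads):
    """Cut-off frequency of a first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)


R, C, TOL = 1_000.0, 100e-9, 0.10   # 1 kohm, 100 nF, 10% parts (assumed values)

nominal = cutoff_hz(R, C)
lowest = cutoff_hz(R * (1 + TOL), C * (1 + TOL))    # both parts at the high limit
highest = cutoff_hz(R * (1 - TOL), C * (1 - TOL))   # both parts at the low limit
print(round(nominal), round(lowest), round(highest))
```

Two units built from the same parts list can therefore have cut-off frequencies anywhere from roughly 17% below to 23% above the nominal value.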
[Figure: Digital repeatability. Perfect reproducibility: nearly identical performance from unit to unit, performance not affected by tolerance, no drift in performance due to temperature or aging, guaranteed accuracy. A CD player always plays the same music quality.]
Digital Repeatability
A properly designed digital circuit will produce the same result every time, in addition to being identical
from unit to unit. If the same multiplication is performed on 500 computers, all 500 computers should
produce the same result. Component tolerances, aging, and temperature drifts also do not affect digital
circuits nearly as much.
A properly designed digital circuit will produce the same results in the UK as in Egypt, even when the
temperatures are different. On the other hand, 500 analog circuits could produce a range of results.
In digital circuitry, logical 1s and 0s are defined when an analog voltage is above or below an analog voltage
threshold. For a digital circuit to be repeatable, the analog voltage which represents the logical 1s and 0s
needs to be sufficiently greater or less than the threshold so as not to be affected by circuit variations or
noise. The only concern is that timing restrictions and maximum device ratings should not be exceeded. If
proper digital inputs are not maintained, the 1s and 0s can be corrupted, making a normally repeatable
digital circuit suddenly fail. On the other hand, analog circuit characteristics will tend to gradually drift.
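The role of the threshold can be illustrated with a small simulation. The logic levels of 0 V and 3 V, the threshold of 1.5 V, and the uniformly distributed noise are all assumptions made for this sketch and are not taken from any device data sheet.

```python
import random


def to_logic(voltage, threshold=1.5):
    """Interpret an analog voltage as a logical 1 or 0."""
    return 1 if voltage > threshold else 0


def receive(bits, noise_volts, rng, low=0.0, high=3.0):
    """Send bits as low/high voltages, add bounded noise, and decode them."""
    received = []
    for bit in bits:
        voltage = (high if bit else low) + rng.uniform(-noise_volts, noise_volts)
        received.append(to_logic(voltage))
    return received


rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(1000)]
clean = receive(bits, noise_volts=1.0, rng=rng)    # noise stays inside the margin
print(clean == bits)
```

As long as the noise is smaller than the 1.5 V margin on either side of the threshold, every bit is recovered exactly. Once the noise exceeds the margin, some bits flip, and the circuit fails abruptly rather than drifting.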
Digital accuracy is determined by the number of bits used and is guaranteed to remain the same. With
analog circuits, the number of bits is effectively infinite, but the effects of noise, tolerances, and linearity can
rapidly diminish performance.
A digital CD player consistently produces the same high-quality digital music and is primarily only limited
by the analog components that are still required. Analog components in a CD player include the DAC,
laser, laser pickup, read head actuator, spindle motor and headphones.
[Figure: Performance. Some special functions are best implemented digitally: lossless compression, adaptive filters (gain versus frequency, with frequencies f1 and f2 marked), and linear phase filters (phase versus frequency).]
Compression
Storage media such as hard disk drives and satellite communications links for telephone and video are
examples where resources are limited in terms of the available size and bandwidths. More would be better,
but installing additional hardware tends to be very expensive. In these cases, costs are passed on to the
consumer in one way or another. An example would be the substantial cost difference between a 20-minute
and a 2-minute phone call, especially if the call is long distance.
Although the prices for installing more advanced hardware tend to be on a downward curve, our need for
more information is on an even more aggressive upward curve. Data compression can be a valuable tool for
providing adequate performance from available resources, and at a reasonable cost.
Let us consider the example of a satellite link or transmission channel. If one megabyte of data is
compressed down to half a megabyte and then transmitted, a decompressor can then recover the original data
at the other end. Considering that the transmission line is only aware that half a megabyte of data has been
passed, the data channel bandwidth is effectively doubled.
A DSP can compress raw binary data and signals through the use of appropriate software programs.
Lossless compression programs are suitable for exact binary data transfers. On the other hand, programs
designed for compressing speech and video offer much higher compression ratios but with some loss in
signal quality. Analog circuits can also be used for some very simple forms of lossless compression but offer
very little flexibility.
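Lossless compression is easy to try out with Python's standard zlib module. The sketch below uses one megabyte of deliberately repetitive sample data, which is an assumption made so that the effect is obvious; real data usually compresses much less.

```python
import zlib

original = b"DSP " * 262_144            # 1 MiB of highly repetitive sample data
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# The ratio is the factor by which the effective channel bandwidth grows.
ratio = len(original) / len(compressed)
print(len(original), len(compressed), restored == original)
```

The decompressor recovers the original data exactly, so the channel only ever carries the smaller compressed block.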
Adaptive Filters
DSP systems have been developed that cancel some of the noise within cabins of cars, helicopters, and
airplanes. The noises cancelled were those caused by engine vibrations. The noise cancellation systems
used the engine speed as a reference and produced an anti-noise signal from speakers to cancel the cabin
noise. Feedback from microphones in each headrest (or headphone) was used to adapt the characteristics of
the anti-noise until the best possible noise reduction was achieved. The system then continued to adapt
periodically to track changes in the cabin noise.
A DSP system can easily adapt to some changes in environmental variables. An adaptive algorithm simply
calculates the new parameters required and stores them back in main memory, overwriting the previous
values. A very basic level of adaptation is possible in analog systems, but the complete change of a complex
set of filter characteristics (used in noise cancellation) is beyond the practical scope of analog signal
processing.
A notch filter with a steep cut-off frequency would be one example of a filter that might be needed to
implement noise cancellation. In this case, the DSP has the ability to recalculate suitable notches to remove
the vibration noise as the engine RPM changes. It is virtually impossible to produce the many required
tunable filters using analog techniques alone.
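The adaptation loop can be sketched with the least mean squares (LMS) algorithm. The text does not say which algorithm the systems described here used, so LMS is our assumption, as are the single engine tone, the step size mu, and all signal values. The reference is derived from the engine speed, and the error plays the role of the microphone feedback.

```python
import math


def lms_cancel(noise, reference, mu=0.05):
    """Adapt two weights so that the weighted reference cancels the noise.

    noise      -- samples picked up by the microphone
    reference  -- (in-phase, quadrature) pairs derived from the engine speed
    Returns the residual error samples and the final weights.
    """
    w = [0.0, 0.0]
    errors = []
    for d, (x0, x1) in zip(noise, reference):
        anti_noise = w[0] * x0 + w[1] * x1     # current estimate of the noise
        e = d - anti_noise                     # what the microphone still hears
        w[0] += mu * e * x0                    # LMS update: overwrite old weights
        w[1] += mu * e * x1
        errors.append(e)
    return errors, w


omega = 2 * math.pi * 0.05                     # engine tone at 0.05 cycles per sample (assumed)
n_samples = 2000
reference = [(math.sin(omega * n), math.cos(omega * n)) for n in range(n_samples)]
noise = [0.8 * math.sin(omega * n + 0.6) for n in range(n_samples)]

errors, weights = lms_cancel(noise, reference)
start = max(abs(e) for e in errors[:20])
end = max(abs(e) for e in errors[-20:])
print(start, end)
```

The residual is large at first and shrinks as the weights settle. If the amplitude or phase of the noise changed, the same loop would simply keep adjusting the two stored weights.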
Linear Phase Filters
There are some valuable signal processing techniques that are difficult or impossible to implement with analog
circuits. The classic example is the linear phase filter, which is difficult to design in analog and, even
then, works only over a limited bandwidth. With a digitally implemented filter, it is possible to keep the phase shift of
each component frequency proportional to its frequency, so that every component is delayed by the same amount of time. This is possible by using a finite impulse
response (FIR) filter. This term will be explained later.
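A quick numerical check shows the linear phase property. The sketch below assumes a five-tap FIR filter with symmetric coefficients, chosen by us for illustration; for such a filter the delay at every frequency should equal (N - 1) / 2 samples, which is 2 samples here.

```python
import cmath


def frequency_response(taps, omega):
    """Response of an FIR filter at omega radians per sample."""
    return sum(h * cmath.exp(-1j * omega * n) for n, h in enumerate(taps))


taps = [1, 2, 3, 2, 1]                 # symmetric coefficients (assumed example)
delay = (len(taps) - 1) / 2            # expected delay in samples

for omega in (0.1, 0.3, 0.5):
    h = frequency_response(taps, omega)
    phase_delay = -cmath.phase(h) / omega
    print(omega, round(phase_delay, 6))
```

The phase shift grows in proportion to frequency, so every component is delayed by the same two samples and the shape of the waveform is preserved.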
[Figure: DSP development. Tools of the trade: software design, code in a high-level language or assembler (ADD A, B becomes 11100010010100001001), test with the emulator and DSP; if the result is not OK, return to design, otherwise release the product.]
The Program
The DSP chip is a piece of hardware that cannot function without the intelligence of a program. A program
is a series of instructions that perform certain functions. In our demonstrations, we will see some examples
of programming to compose simple musical tunes. To write these programs, we must use the tools of the
trade.
Assemblers
Assemblers generate machine-level code from text instructions. Let us assume we were given the following
two lines to remember:
ADD A, B
111000100101010001001
Since we understand written words better than a series of 1s and 0s, which line is easier to understand and
remember? Assemblers take our text instructions and convert them into machine language. This relieves us
of the burden of having to remember binary instructions for the DSP. We will talk more about assembly
language in the next chapter.
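What an assembler does can be sketched with a toy example. The opcode and register encodings below are entirely made up for illustration and are not the TMS320C54x encodings; a real assembler also handles labels, addressing modes, and much more.

```python
OPCODES = {"ADD": 0b0001, "SUB": 0b0010, "MPY": 0b0011}   # made-up encodings
REGISTERS = {"A": 0b00, "B": 0b01}                        # made-up encodings


def assemble(line):
    """Translate one line such as 'ADD A, B' into an 8-bit string."""
    mnemonic, operands = line.split(maxsplit=1)
    src, dst = (name.strip() for name in operands.split(","))
    word = (OPCODES[mnemonic] << 4) | (REGISTERS[src] << 2) | REGISTERS[dst]
    return format(word, "08b")


print(assemble("ADD A, B"))
```

The programmer writes the readable text on the left, and the assembler produces the pattern of 1s and 0s that the processor actually executes.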
High-Level Language
High-level languages are like assembly languages, but much friendlier. Assembly languages have very basic
instructions, such as multiply, add, and compare. High-level languages have higher-level instructions, such
as print, and repeat until equal to zero. Therefore, it is easier to write programs in high-level languages.
While it is easier to write in high-level languages, assembly language can produce programs that are able to
execute faster. For this reason, both have their uses in DSPs. Sometimes it is necessary to write time-critical
sections of a program in assembly. A complete program may have sections of code in assembly and sections
in a high-level language. It is easy to combine both types of code into a single executable program.
Assembly and high-level programming languages make it possible to program DSPs to perform a variety of
functions.
Simulators
Flight simulators make you feel as if you are in the cockpit of a plane without the cost of an actual airplane,
fuel, or risk of crashing. Likewise, a DSP simulator is a software implementation of a DSP chip. A
simulator typically runs on a computer (PC or workstation), simulating almost all of the functionality of the
DSP. Simulators are used to analyze the feasibility of designs before the designs are committed to hardware.
They are also very useful in determining whether or not a particular design will work.
Emulators
An emulator allows us to directly control and debug the results of instructions executing on the DSP.
Modern emulators do not replace the DSP chip on the board but exert their control through a serial
emulation scan path. Using these devices, it is possible to see all of the internal changes in the device at
each step. Developers can execute the instructions one step at a time, check voltage levels for correct
operation, and check each result in their own time. Emulators are invaluable tools in development
environments.
Debugger
A debugger interface is used to display program execution information in a useable format for the
programmer. The data displayed in the debugger windows is essentially a formatted data print of the
contents of the DSP memory. This memory is simply loaded into the PC using either an emulator or a
communications link with the PC using appropriate software.
For example, the memory window can display (and edit) data in hexadecimal, float or integer formats, but
the data is nonetheless binary 1s and 0s to the DSP. Likewise, the disassembly window is simply a
reformatting of the binary value into a recognizable alphanumeric mnemonic.
The CPU register window is a little different in that the C54x register values are not directly accessible as
memory data because they are not memory mapped registers. For the scan emulator, this job is quite easy
since the scan path simply routes through the internal registers of the DSP. For the DSK, this task is
accomplished using a special program that saves and restores the CPU registers from the DSP main memory.
Other than this, the data displayed in the CPU register window is simply another data form.
Debuggers consist of a user interface on the host PC computer, which can control and modify the contents
and execution of the chip. The user interface displays the contents of RAM, registers, and the disassembly of
the currently loaded program. The major advantage of debuggers over simulators is that they operate in real
time, allowing the designer to assess the performance of the system in a real-time environment.
Development Cycle
After the feasibility of the design is established through simulation, program design can begin. First, the
software is designed. This stage determines the complexity and the modules of the code. The modules of
software are written and tested, and then the full system is put together and tested. If everything works as
required, the result is version 1.0 of the product on the market. If it does not work as required, the process is
repeated until it does. When new requirements and improvements emerge as a result of user feedback, a new
version is produced via the same process.
Number Systems
Represent numbers digitally
Digit number   7    6    5    4    3    2    1    0
Weight         2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
Decimal        128  64   32   16   8    4    2    1
Any number can be represented as a series of 1s and 0s
Decimal 3 in binary
Digit number   7    6    5    4    3    2    1    0
Binary         0    0    0    0    0    0    1    1      (0000 0011)
Decimal        0  + 0  + 0  + 0  + 0  + 0  + 2  + 1  = 3
Decimal 26 in binary
Digit number   7    6    5    4    3    2    1    0
Binary         0    0    0    1    1    0    1    0      (0001 1010)
Decimal        0  + 0  + 0  + 16 + 8  + 0  + 2  + 0  = 26
Number Systems
Let us now consider decimal, binary and hexadecimal (hex) number systems. The human-friendly decimal
system uses ten digits, 0 to 9, for number representation. Numbers larger than 9 are represented by carrying
a digit to the left. The number 10 represents one complete count of ten (digit 1 x 10) followed by a 0.
Binary
To represent numbers digitally, we are only allowed two binary digits, logic 1 and logic 0. Large numbers
can be represented in the binary system; however, more digits are needed to represent the same number in
the binary than are needed in the decimal system.
Consider the representation of decimal number 3 in binary, as shown in the preceding example. The value
of each binary digit is determined by its position. In the binary system, the maximum value the first digit
can have is 1. The second digit has a maximum value of 2. In an 8-bit system the decimal number 3 is
represented by setting the two least significant bits (LSBs) to 1, or 0000 0011b.
To represent the decimal number 26, higher order binary digits are set to achieve the value 0001 1010b.
In an 8-bit binary system, the largest decimal number that can be represented is 2^8 - 1 = 255 = 11111111b.
And the largest decimal number that can be represented in a 16-bit binary system is 2^16 - 1 = 65,535.
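The positional weights described above can be checked with a few lines of Python. This is only a minimal sketch for illustration; the helper names `to_binary` and `bit_weights` are hypothetical and are not part of the course material.

```python
def to_binary(value, bits=8):
    """Return an unsigned integer as a grouped binary string, e.g. '0000 0011'."""
    if not 0 <= value < 2 ** bits:
        raise ValueError(f"{value} does not fit in {bits} unsigned bits")
    digits = format(value, f"0{bits}b")
    return " ".join(digits[i:i + 4] for i in range(0, bits, 4))


def bit_weights(value, bits=8):
    """List the positional weight (2**n) of every bit that is set, MSB first."""
    return [2 ** n for n in range(bits - 1, -1, -1) if value & (1 << n)]


print(to_binary(3), bit_weights(3))      # decimal 3 = 2 + 1
print(to_binary(26), bit_weights(26))    # decimal 26 = 16 + 8 + 2
print(2 ** 8 - 1, 2 ** 16 - 1)           # largest 8-bit and 16-bit unsigned values
```

Summing the list returned by `bit_weights` gives back the original decimal number, which is the same calculation the slide performs by hand.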
Binary and Hex
Decimal digits: 0, 1, 2, ..., 9
Binary digits:  0, 1
Hex digits:     0, 1, 2, ..., 9, A, B, C, D, E, F
16 decimal = 0x10 hex
20 decimal = 0x14 hex
4 bits of the binary system are represented by a single hex digit
Digit number   3    2    1    0
Weight         2^3  2^2  2^1  2^0
Binary         1    1    1    1      (1111)
Decimal        8  + 4  + 2  + 1  = 15
Hex            F
Decimal 26 in binary and hex
Digit number   7    6    5    4    3    2    1    0
Binary         0    0    0    1    1    0    1    0      (0001 1010)
Decimal        0  + 0  + 0  + 16 + 8  + 0  + 2  + 0  = 26
Hex            1                   A                     (1A)
Binary and Hexadecimal
Another useful number system is base 16 or hexadecimal (hex). After digit 9, the letters A to F are
used to represent the values 10 through 15. The largest decimal number that can be represented
with a single-digit hex number is 15, which is F in hex. To represent decimal number 16 in hex, the next
digit position is used: 0x10 hex. To distinguish hex numbers from decimal we will use a preceding 0x.
Another common convention is to follow the number with an 'h'. In this case, the first hexadecimal digit
must be a decimal digit (0-9) so that the resulting string is not confused with a symbol in a program. For
example, 0F3h would be a valid hexadecimal representation while F3h could be confused with a symbol.
A similar convention to the trailing 'h' for hexadecimal is used for binary numbers by following the binary 1
and 0 digits with a 'b'. Because the first character in the digit stream is a numeric 0 or 1 and the last
character is a trailing 'b', the character string can be identified as only being a binary value.
Hex notation is very useful because large numbers can be represented with fewer digits than with the binary
or decimal system. The hex format is also extremely convenient for digital or binary systems because each
hex digit replaces exactly four binary digits. This is because the biggest single hex number (0xF, or 0Fh) is
represented exactly in four binary digits: 0xF hex = 1111b.
The 8-bit binary representation of decimal 26 is 00011010b, or 0x1A (hex) which is much shorter. The hex
system may look confusing at first, but when you need to convert to binary numbers, or represent large
numbers, you will soon realize the benefits.
We will explain how to convert from hex to decimal and back in Chapter 6.
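The three notations used in this section (a leading 0x, a trailing 'h', and a trailing 'b') can be illustrated with a small parser. This is a sketch under stated assumptions: `parse_literal` is a hypothetical helper written for this example and does not reproduce the rules of any particular assembler.

```python
def parse_literal(text):
    """Parse '0x1A', '0F3h', '00011010b' or a plain decimal string into an int."""
    s = text.strip().lower()
    if s.startswith("0x"):
        return int(s[2:], 16)
    if not s[0].isdigit():
        # A leading letter (as in 'F3h') would read as a symbol name.
        raise ValueError(f"{text!r} starts with a letter, so it reads as a symbol")
    if s.endswith("h"):
        return int(s[:-1], 16)
    if s.endswith("b") and set(s[:-1]) <= {"0", "1"}:
        return int(s[:-1], 2)
    return int(s, 10)


print(parse_literal("0x1A"), parse_literal("00011010b"))  # both are decimal 26
print(parse_literal("0F3h"))                               # valid hex with a leading 0
print(format(0x1A, "08b"))                                 # each hex digit maps to four bits
```

Passing "F3h" to this sketch raises a `ValueError`, which mirrors the reason the text gives for requiring a leading numeric digit.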
Signed Integers
Signed magnitude integers
Decimal   Signed hex     Sign   Number (binary)
 2        00 00 00 02    0      000 0000 0000 0000 0000 0000 0000 0010
 3        00 00 00 03    0      000 0000 0000 0000 0000 0000 0000 0011
-2        80 00 00 02    1      000 0000 0000 0000 0000 0000 0000 0010
-3        80 00 00 03    1      000 0000 0000 0000 0000 0000 0000 0011
Signed Integers
To perform arithmetic, we need to be able to represent signed numbers. In the binary system, the most
significant bit (MSB) is used to indicate the sign of a number. When the MSB is set to 1, the number is
negative and when it is set to 0, the number is positive. Two conventions, signed magnitude and signed
twos complement (2s complement) exist for representing signed numbers.
The signed magnitude convention is familiar to us since this is how we represent negative decimal numbers.
For example, a +/- symbol takes the place of the sign bit, so the negative of +10 is written as -10. However, this
leaves an interesting question about the value of 0. Is it +0 or -0, or are they the same? Another issue is
how to simplify the digital hardware in a DSP or microprocessor since smaller and faster circuits are an
advantage.
Twos Complement Notation
Digit number   7     6    5    4    3    2    1    0
Weight         -2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
Decimal        -128  64   32   16   8    4    2    1
Decimal 3
Binary twos complement   0     0    0    0    0    0    1    1
Decimal calculation      0    +0   +0   +0   +0   +0   +2   +1  = 3
Decimal -2
Binary twos complement   1     1    1    1    1    1    1    0
Decimal calculation      -128 +64  +32  +16  +8   +4   +2   +0  = -2
The signed magnitude table shown earlier gives the binary notation for the signed integers 2, 3, -2 and -3 in a
signed 32-bit system. The two positive integers are represented in the standard fashion. The two negative
numbers have the top bit set to 1, but the rest of the representation remains the same.
This sign-representation system is not convenient for a binary or digital machine. The machine needs to
assess the sign bit and then carry out addition or subtraction, depending on the state of the sign bit. A
more convenient system would use twos complement notation to perform both addition and subtraction with
the same hardware.
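The signed magnitude layout, and the +0 versus -0 question it raises, can be sketched in Python. The function names below are hypothetical and the 32-bit width simply follows the table; this is an illustration, not a description of any DSP hardware.

```python
def to_signed_magnitude(value, bits=32):
    """Encode an integer with a sign bit in the MSB and the magnitude in the remaining bits."""
    magnitude = abs(value)
    if magnitude >= 1 << (bits - 1):
        raise OverflowError(f"{value} does not fit in {bits} signed magnitude bits")
    sign = 1 if value < 0 else 0
    return (sign << (bits - 1)) | magnitude


def from_signed_magnitude(word, bits=32):
    """Decode a signed magnitude word back to an integer."""
    sign = (word >> (bits - 1)) & 1
    magnitude = word & ((1 << (bits - 1)) - 1)
    return -magnitude if sign else magnitude


for n in (2, 3, -2, -3):
    print(n, format(to_signed_magnitude(n), "08X"))

# Two different bit patterns both decode to zero: "+0" and "-0".
print(from_signed_magnitude(0x00000000), from_signed_magnitude(0x80000000))
```

Because two distinct patterns mean zero, and because the sign must be inspected before choosing between an add and a subtract, hardware built on this format needs extra decision logic.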
Twos Complement Notation
To make it easier to understand twos complement notation, our example uses an 8-bit binary representation.
For positive numbers, such as the example +3, the MSB is set to 0, and the other bit values are exactly the
same as in standard binary notation.
The twos complement notation of a negative number is quite different. If the MSB is set to 1, the MSB
represents a negative value for that bit position. The top bit in an 8-bit system would therefore represent
negative 2^7, or -128, with the rest of the bits again representing positive values. The sum of the decimal
values of each bit that is set gives us the number's decimal value. To represent -2 in twos complement, the
top bit is set to 1, representing -128, and the lower-order bits are set to make the result of the addition of all
bits equal to -2. In this case, -2 = -128 + 126.
The hardware method that is used to implement a twos complement converter and adder is even simpler.
This method negates a number by simply inverting all the bits and adding a 1 (as a carry bit) to the LSB. If
a 1 is added to the LSB, this causes a carry into the upper bits, which may ripple carry bits all the way to the
top bit. During addition, each binary bit cell receives two bits from the two operands, plus any carry that
may propagate from the next lower-bit cell. The 1 that is added into the LSB is simply implemented as a
carry bit as if it were coming from the next lower-bit cell.
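Both views of twos complement described above, the negative weight on the MSB and the invert-and-add-1 negation, can be sketched as follows. The function names are hypothetical and the 8-bit width matches the example in the text.

```python
def twos_complement_value(word, bits=8):
    """Interpret a bit pattern as twos complement: the MSB weighs -2**(bits-1)."""
    msb = (word >> (bits - 1)) & 1
    lower = word & ((1 << (bits - 1)) - 1)
    return lower - msb * (1 << (bits - 1))


def negate(word, bits=8):
    """Negate by inverting every bit and adding 1, discarding any carry out of the top bit."""
    mask = (1 << bits) - 1
    return ((word ^ mask) + 1) & mask


plus_two = 0b00000010
minus_two = negate(plus_two)
print(format(minus_two, "08b"), twos_complement_value(minus_two))  # pattern and value of -2
print(twos_complement_value(0b00000011))                            # +3 is unchanged
```

Applying `negate` twice returns the original pattern, so the same inverter and adder serve for both directions.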
The largest negative and positive values in twos complement form for an 8-bit system are as shown:
Most positive   +2^7 - 1 = +127   >> 0111 1111b >> 0 + 64 + 32 + 16 + 8 + 4 + 2 + 1
Most negative   -2^7 = -128       >> 1000 0000b >> -128 + 0
A 16-bit system would have a range of
Most positive   +2^15 - 1 = +32767
Most negative   -2^15 = -32768
For a 32-bit system such as a TMS320C31 32-bit processor
Most positive   +2^31 - 1 = +2147483647
Most negative   -2^31 = -2147483648
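The three ranges listed above follow one pattern, which a short sketch makes explicit. `twos_complement_range` is a hypothetical helper used only for this illustration.

```python
def twos_complement_range(bits):
    """Return (most negative, most positive) for a twos complement word of the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


for bits in (8, 16, 32):
    lowest, highest = twos_complement_range(bits)
    print(f"{bits:2d} bits: {lowest} to +{highest}")
```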
Fixed-Point Notation
Conventions
Number range is between 1 and -1
Decimal point is always in a fixed location (e.g., 0.74, 0.34, etc.)
Multiplying a fraction by a fraction always results in a fraction and will
not produce an overflow (e.g., 0.99 x 0.9999 = less than 1)
Successive additions may cause overflow
Why?
Signal processing is multiplication-intensive
Fixed-point notation prevents overflow (useful with a small dynamic range)
Fixed-point notation is less expensive
How is fixed-point notation realized in a DSP?
Most fixed-point DSPs are 16 bits
The range of numbers that can be represented is 32767 to -32768
The most common fixed-point format is Q15
Q15 Notation
Bit 15: sign     Bits 14 to 0: twos complement number
Fixed-Point Notation
Fixed-point notation, sometimes called fractional-point notation or Q format, uses an implied binary point
to represent binary fractions. This point always remains at a fixed location. The dynamic range of a
processor is the range between the smallest and the largest number it can represent. When the dynamic
range is limited, overflows become likely.
In a 16-bit processor, the dynamic range is 32767 to -32768. Such a small dynamic range can easily create
overflows. For example, 200 x 350 = 70000, which is an overflow!
However, if the number range is limited, or more precisely scaled, to +1 to -1, a multiplication could never
produce an overflow. For example, the multiplication of two fractional numbers within the range of -1 to 1
must always produce a result that is also a fraction. The result is therefore confined to be within the range of
-1 to 1. Unfortunately, successive additions can produce overflow values outside the range of -1 to 1. This
point should be remembered when performing fixed-point arithmetic.
Signal processing is both multiplication and addition intensive. An overflow can have serious consequences
(e.g., unintentionally clipping a large signal). A fixed-point system can solve this problem by either
checking for overflows after each math operation, or by knowing that the inputs and outputs of the operation
are input bounded or well behaved.
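The 200 x 350 overflow can be reproduced in a short sketch. The two helpers are assumptions made for this example: `wrap16` models a register that silently keeps only the low 16 bits, and `saturate16` models clipping to the nearest representable value. Neither is claimed to match the behavior of a specific device.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def wrap16(value):
    """Keep only the low 16 bits and reinterpret them as twos complement."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def saturate16(value):
    """Clip a result to the 16-bit range instead of letting it wrap."""
    return max(INT16_MIN, min(INT16_MAX, value))


product = 200 * 350
print(product, wrap16(product), saturate16(product))  # true result, wrapped, clipped

# Fractions stay in range under multiplication, but not necessarily under addition.
print(0.99 * 0.9999)   # still less than 1
print(0.75 + 0.5)      # exceeds 1, so a fractional format would overflow
```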
Why Use a Fixed-Point System?
The cost of implementing many DSP systems is strongly dependent on the amount of chip silicon used to get
the job done, with most of the chip silicon being either in the processor or in the surrounding memory. If the
chip silicon is mostly used for data storage, such as long audio delay buffers, video or coefficient tables, the
difference between 16- and 32-bit data storage can be as much as 2:1.
Furthermore, routing twice as many signals around the chip and system board can consume extra space and
drive up the power consumption. Another advantage of short 16-bit fixed-point chips is that by making the
core processor small, not only are the chips smaller and less expensive, they are also usually a bit faster.
This may again lower the price of the DSP chip that, in price-sensitive volume applications, is an important
consideration. However, if a 16-bit system must also perform 32-bit operations, these advantages can be lost
and end up costing more. If a system can tolerate a smaller dynamic range and resolution, then the use of
16-bit data can be an economic advantage.
Fixed-Point Q Notation
As we have seen in multiplication and addition, overflows can be a problem for fixed-point DSPs. To
eliminate this problem, a programming convention called Q format is introduced where fixed-point DSPs
operate on fractional numbers which, by definition, cannot saturate. The principle of Q notation is the
application of a simple scaling coefficient to convert fractions to integers that a fixed-point DSP is designed
to handle. (Note that this is not an issue for floating-point DSP).
The letter Q represents the Quantity of fractional bits and the number following the Q indicates the number
of bits that are used for the fraction. This divides the number into an upper and lower region of bits where
the upper region contains the sign bit and any whole integer values, and the lower bits hold the fraction.
Any Q format is possible, but Q15 is the most widespread in 16-bit DSPs and Q31 is most often used for 32-
bit DSPs.
In Q15 format, an imaginary decimal point is placed between bits 15 and 14. The upper region in this case is
only one MSB (for a 16-bit DSP) which is essentially the sign bit, or bits 15-31 in a 32-bit DSP. The
remaining 15 bits are used to represent the fractional part of the number. To convert a Q-format integer to a
floating-point value, a scaling coefficient is needed. If the Q number is 15, the coefficient or resolution of
the fraction will be 2^-15 or 30.518e-6.
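Converting a Q15 integer back to a fraction is a single multiplication by the resolution. The sketch below assumes a scale of exactly 2^15; the names `Q15_SCALE`, `Q15_RESOLUTION`, and `q15_to_float` are hypothetical.

```python
Q15_SCALE = 1 << 15               # 32768
Q15_RESOLUTION = 1 / Q15_SCALE    # value of one least significant bit, 2**-15


def q15_to_float(n):
    """Convert a 16-bit Q15 integer (-32768..32767) to its fractional value."""
    if not -32768 <= n <= 32767:
        raise OverflowError(f"{n} is outside the 16-bit range")
    return n * Q15_RESOLUTION


print(Q15_RESOLUTION)          # about 30.518e-6
print(q15_to_float(16384))     # 0.5
print(q15_to_float(32767))     # largest fraction, just under 1
print(q15_to_float(-32768))    # most negative fraction, -1.0
```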
Q15 Format
Dynamic range in Q15
Number                    Biggest   Smallest
Fractional number         0.999     -1.000
Scaled integer for Q15    32767     -32768
Number representations in Q15
Decimal   Q15 = Decimal x 2^15   Q15 Integer
0.5       0.5 x 32767            16384
0.05      0.05 x 32767           1638
0.0012    0.0012 x 32767         39
Rules for operations
Avoid operations with numbers larger than 1
2.0 x (0.5 x 0.45) = (0.2 x 0.5 x 0.45) x 10 = (0.5 x 0.45) + (0.5 x 0.45)
Scale numbers before the operation
0.5 in Q15 = 0.5 x 32767 = 16384
Dynamic Range in Q15
The dynamic range, or ratio of largest to smallest magnitude levels, is the same for Q15 and normal integers.
It is the scaling coefficient that sets the two apart, and other than this, you may have difficulty knowing
which format is in use. As mentioned previously, to prevent overflows the inputs and outputs can be
constrained to fractions in the range of -1 to 1 by simply applying a scaling coefficient.
Number Representation in Q15
Scaling a number is simple:
Integer = Q15_fractional_number x 2^15
The second table on the slide shows several examples of scaling.
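The scaling rule can be sketched in Python. This example assumes a scale of exactly 2^15 = 32768 and truncation toward zero, a combination that reproduces the integers shown on the slide; the function name `float_to_q15` is hypothetical, and a real tool chain may round differently.

```python
Q15_SCALE = 1 << 15  # 2**15 = 32768


def float_to_q15(x):
    """Scale a fraction to a Q15 integer, truncating toward zero."""
    n = int(x * Q15_SCALE)
    if not -32768 <= n <= 32767:
        raise OverflowError(f"{x} cannot be represented in Q15")
    return n


for fraction in (0.5, 0.05, 0.0012, 0.45):
    print(fraction, float_to_q15(fraction))
```

Note that +1.0 itself scales to 32768, which does not fit in 16 bits, so this sketch rejects it; only values up to about 0.99997 are representable on the positive side.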
Rules for Operations
The most important rule in using the Q15 fixed-point format is to avoid using a number larger than 1 or
smaller than -1. There are some instances where this can be safely violated. For example, a property of a
2s complement adder is that if an addition overflow occurs, exceeding the available 16-bit range, a
subtraction can unwrap the result back down into a valid range. Generally, however, it is best to avoid the
problem in the first place. If a dynamic range greater than 32767 to -32768 (i.e., a 16-bit system) is
required, it is also possible to perform longhand arithmetic in pieces, but this consumes CPU cycles and
data memory.
The bottom portion of the slide shows an example where multiplying 0.5 and 0.45 (unscaled for clarity)
results in another fraction, which is not a problem. Multiplying the product by 2 can be done using two
methods. One method is to multiply one of the inputs by 2 first. If the result of this intermediate operation
exceeds +/-1.0, we will have a problem. The inputs could be scaled down first and then scaled up
afterwards, but this is also far from efficient. An alternative method is to add the product to itself,
effectively multiplying by 2. This is one of the difficulties of using the fixed-point operation. The
programmer needs to think about these issues and plan ahead.
Another important rule is that all numbers must be scaled to the same Q format (Q15 in our examples),
placing the decimal points of both operands in the same place, before an addition or subtraction is
performed. Generally, this is also practiced in multiplication. However since the scaling coefficients are
multiplied, the correct fraction can be retrieved using yet another scaling constant. Nevertheless, mixing Q
formats is not desired.
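The two ways of computing 2 x (0.5 x 0.45) can be compared using Q15 integers. This is a sketch only: `q15_mul` and `fits_q15` are hypothetical helpers, and the rounding constant is an assumption chosen so that the product matches the 7373 shown on the slide.

```python
def fits_q15(n):
    """True if n is a valid 16-bit Q15 integer."""
    return -32768 <= n <= 32767


def q15_mul(a, b):
    """Multiply two Q15 integers; the Q30 product is rounded and shifted back to Q15."""
    return (a * b + (1 << 14)) >> 15


half = 16384      # 0.5 in Q15
x = 14745         # 0.45 in Q15

# Method 1: double an input first. 2 x 0.5 = 1.0, which Q15 cannot hold.
print(fits_q15(2 * half))

# Method 2: multiply first, then add the product to itself.
product = q15_mul(half, x)
doubled = product + product
print(product, doubled, fits_q15(doubled), doubled / 32768)
```

The second method never leaves the valid range because every intermediate value is still a fraction.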
Q15 Operations
Addition
Decimal                 Q15                      Scale back (Q15 / 32767)
0.5 + 0.05 = 0.55       16384 + 1638 = 18022     0.55
0.5 - 0.05 = 0.45       16384 - 1638 = 14746     0.45
Multiplication
2 x 0.5 x 0.45 =
Decimal                 Q15                             Back to Q15 (Product / 32767)   Scale back (Q15 / 32767)
0.5 x 0.45 = 0.225      16384 x 14745 = 241582080       7373
0.225 + 0.225 = 0.45    7373 + 7373 = 14746                                             0.45
Q15 Addition
Q15-format addition is shown in the first example above. The numbers 0.5 and 0.05 are each scaled by
32767 (Q15 coefficient) and then added. Since both numbers are scaled to the same Q format, the decimal
point in both will be in the same place (bit 15). The sum is then scaled back to verify the result.
In the second example, the correct subtraction (sum of two's complement) is 14746, and the expected scaled
result is 14746 / 32767 = 0.45.
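The addition and subtraction examples can be reproduced with a 16-bit adder model. This is a sketch: `add16` is a hypothetical helper that wraps on overflow, and the scale-back step here divides by 2^15 rather than 32767, which gives the same result to two decimal places.

```python
Q15_SCALE = 1 << 15


def add16(a, b):
    """Add two 16-bit twos complement words, keeping only the low 16 bits."""
    total = (a + b) & 0xFFFF
    return total - 0x10000 if total & 0x8000 else total


half, small = 16384, 1638            # 0.5 and 0.05 in Q15

total = add16(half, small)
difference = add16(half, -small)     # subtraction as the sum of a twos complement
print(total, round(total / Q15_SCALE, 2))
print(difference, round(difference / Q15_SCALE, 2))

# Wrap-around: 0.75 + 0.5 overflows, but subtracting 0.5 again unwraps the result.
wrapped = add16(24576, 16384)
print(wrapped, add16(wrapped, -16384))
```

The last line illustrates the property mentioned under Rules for Operations: an intermediate overflow in a twos complement adder is harmless as long as the final result is back inside the 16-bit range.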
Q15 Multiplication
When scaled numbers are multiplied, the scaling coefficients are multiplied. To compensate, a second
scaling factor that will put the data in the correct bit position is used. The Q15 multiplication shown gives
an idea of how large the numbers can get. But as we can also see, the division by the Q-15 coefficient scales
the number back down and we get the correct result.
Anticipating this, the multiplier on a 16-bit DSP produces a 32-bit result. In actuality, the result is packed
into the upper bits and comes with two sign bits. The programmer can either downshift to the lower 16 bits,
or can left-shift up by one bit before storing the upper bits. Both methods will produce the same result, but
the DSP is usually optimized to do the up-shift by 1 bit and store, so this is normally what is done.
We can see how a Q15*Q15 multiply works by examining the process in long hand. In particular when the
scaling coefficients are multiplied the result is a new scaling constant with a Q value equal to the sum of the
two Q constants used on the operands. Given A and B in Q15 format, the result C=A*B is
C = (A * 2^15) * (B * 2^15) = A*B * 2^30
C = (A * Qx) * (B * Qy) = A*B * Q(x+y)
It is evident that the output is no longer in Q15 format. To compensate, we need to ask where the new
decimal point is. By noting that 2^30 is the same as saying Q30, we know that the decimal point is at bit 30 of
the 32-bit result. To get back to 2^15 (Q15), we can multiply by 2^-15 (a shift right by 15), or multiply yet
again by 2^1 to get a Q31 result. In this case, the correct bits are packed into the upper 16 bits of the DSP register.
The fixed-point Q format has the advantage of preventing overflows but certainly introduces complications
for the DSP programmer.
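The two ways of returning a Q30 product to Q15 can be sketched and compared. The function names are hypothetical, Python integers stand in for the 32-bit product register, and both versions truncate rather than round; this is an illustration, not the instruction sequence of any particular DSP.

```python
def q15_mul_shift_right(a, b):
    """Q15 x Q15 gives Q30; shift right by 15 and keep the low 16 bits' worth."""
    product = a * b
    return product >> 15


def q15_mul_shift_left(a, b):
    """Q15 x Q15 gives Q30; shift left by 1 to Q31, then keep the upper 16 bits."""
    product = a * b
    return (product << 1) >> 16


a, b = 16384, 14745   # 0.5 and 0.45 in Q15
print(a * b)                                              # the Q30 product
print(q15_mul_shift_right(a, b), q15_mul_shift_left(a, b))
```

Both functions return the same integer for any pair of inputs, which is why the choice between them is left to whichever form the hardware executes more cheaply.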
TMS Floating-Point Format
TMS single-precision floating-point format
Bit No.   31 ... 24   23      22 ... 0
Field     e           s       f
Width     8 bits      1 bit   23 bits
e = exponent is a signed twos complement 8-bit field and determines the location of the binary Q point
s = sign of mantissa (s = 0 positive, s = 1 negative)
f = fractional part of the mantissa; an implied 1.0 is added to this fraction
but is not allocated in the bit field since this value is always present
Conversion equations
Equation   Sign    Binary            Decimal
1          s = 0   X = 01.f x 2^e    X = 01.f x 2^e
2          s = 1   X = 10.f x 2^e    X = (-2 + 0.f) x 2^e
Special case
s = 0, e = -128: X = 0
Exponent (e)
Decimal             0    1    127   -1   -128
Hex (twos comp.)    00   01   7F    FF   80
Copyright 1998, Texas Instruments Incorporated All Rights Reserved
Floating-Point Formats
Although the C54x device is fixed-point, floating-point formats are worth a brief look. A popular floating-point
standard (used, for example, in TMS320C67x devices) is IEEE 754. The differences between various floating-point
formats are actually insignificant, and conversion can be performed in ASIC hardware or software.
TMS320 Single-Precision Floating-Point Format
The preceding table shows the TMS320 single-precision floating-point bit assignments. The top eight bits
represent the exponent (e) in twos complement notation. Bit 23 (s) is the sign bit of the mantissa, and the
lower 23 bits are the fraction (f) of the mantissa. A value of 1.0 is also implied in the mantissa, but is not
allocated a bit position since it is always present. This format is called floating-point because the implied
binary point floats around, depending on how large the exponent is. The exponent is essentially a variable Q
value that is automatically adjusted for maximum precision and range by the hardware.
Conversion Equations
The middle table on the slide shows the conversion equations for the TMS320 single-precision floating-point
format. The second column shows the binary and the third column shows the decimal version of the same
equation. The decimal version of the equation is easier to understand. There are two different equations for
positive and negative mantissa. We will use decimal examples of both equations to aid in the understanding
of this format.
The representation of 0.0 is a special case where any number with an exponent of -128 (0x80) is treated as
zero. Since -128 is the smallest possible value for the exponent, the scaling coefficient for these numbers
would produce very small values. The convention used in the assembler is to represent zero as 0x80000000.
For example, all of the following numbers are treated as the value 0:
0x80000000
0x80123456
0x80876345
This is a special case worth remembering.
Floating-Point Numbers
Calculate 1.0e0
In hex      00 00 00 00
In binary   0000 0000 0000 0000 0000 0000 0000 0000
s = 0, so Equation 1 applies: X = 01.f x 2^e
f = 0, e = 0
X = 01.0 x 2^0 = 1.0
Calculate 1.5e01
In hex      03 70 00 00
In binary   0000 0011 0111 0000 0000 0000 0000 0000
s = 0, so Equation 1 applies: X = 01.f x 2^e
0000 0011   e = 3
s111 ...    f = 0.5 + 0.25 + 0.125 = 0.875
X = 01.875 x 2^3 = 15.0 decimal
Floating-Point Numbers
Let us now find the binary representation of 1.0e0.
Since this is a positive number, the sign bit s=0. Therefore, Equation 1 applies. The fractional part of the
mantissa (f) is 0 (f=0), and the exponent (e) is also 0 (e=0). Now that we know the decimal values for all the
appropriate parts, we can express the 32-bit binary format. The fractional part of the mantissa (f) is zero, and
is represented by setting bits 0 to 22 and the sign bit to 0. This leaves the top eight bits for the exponent (e),
which are also set to 0. The top part of the slide shows the binary and hex representation for 1.0e0.
The binary representation of the floating-point number decimal 1.5e01 is next. This number is positive,
which implies that the sign bit s = 0 and that Equation 1 applies. Knowing that 1.875 x 8 = 15, the
fractional part of the mantissa is 0.875 (the 1.0 is implied) and the exponent e = 3 (2^3 = 8). Adding fractions
0.5, 0.25 and 0.125 together yields 0.875, which corresponds to setting the top three bits (20, 21 and 22) of
the fraction to 1. The binary representation of the floating-point value is shown in the bottom part of the
slide.
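The conversion equations and the zero special case can be collected into a small decoder. This is a sketch based only on the field layout and equations given in this section; `decode_tms320_float` is a hypothetical name, and the code is not taken from any Texas Instruments tool.

```python
def decode_tms320_float(word):
    """Decode a 32-bit word: e = bits 31-24 (twos complement), s = bit 23, f = bits 22-0."""
    e = (word >> 24) & 0xFF
    if e >= 0x80:
        e -= 0x100                        # interpret the exponent as twos complement
    s = (word >> 23) & 1
    f = (word & 0x7FFFFF) / (1 << 23)     # fraction 0.f
    if e == -128:
        return 0.0                        # special case: any word with exponent -128 is zero
    mantissa = (-2.0 + f) if s else (1.0 + f)   # Equation 2 or Equation 1
    return mantissa * 2.0 ** e


print(decode_tms320_float(0x00000000))   # 1.0e0
print(decode_tms320_float(0x03700000))   # 1.5e01
print(decode_tms320_float(0x80000000))   # zero
print(decode_tms320_float(0x80123456))   # also treated as zero
```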
Calculating negative floating-point numbers is slightly different. Although it is important to understand
how binary representations correlate with decimal floating-point values, it is rarely necessary to perform the
calculation.
More on Floating Point
Calculate -2.0e0
In hex      00 80 00 00
In binary   0000 0000 1000 0000 0000 0000 0000 0000
s = 1, so Equation 2 applies: X = (-2.0 + 0.f) x 2^e
f = 0, e = 0
X = (-2.0 + 0.0) x 2^0 = -2.0
Addition
1.5 + (-2.0) = -0.5
Multiplication
1.5e00 x 1.5e01 = 2.25e01 = 22.5
Negative Floating-Point Numbers
The binary representation in TMS320 format for -2.0e0 is now considered. Since the number is negative, the
sign bit s = 1, and Equation 2 is applied. The mantissa is actually in twos complement, so the fraction f (0.f)
is added to a decimal value of -2.0. The mantissa, -2.0 + f, is then multiplied by the exponent multiplier, 2^e.
To arrive at a value of -2.0 the fraction (f) and the exponent (e) are therefore both 0 with the sign bit set to 1.
The binary representation for -2.0e0 is shown on the top portion of the slide.
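Going in the other direction, a value can be packed into the e, s, and f fields by normalizing the mantissa to the range of Equation 1 or Equation 2. The sketch below is an assumption-laden illustration: `encode_tms320_float` is a hypothetical helper, it truncates the fraction to 23 bits, and it performs no rounding or exponent range checks.

```python
import math


def encode_tms320_float(x):
    """Pack x into the e/s/f layout described in the text (sketch only, no range checks)."""
    if x == 0.0:
        return 0x80000000                      # exponent -128 marks zero
    fraction, exponent = math.frexp(x)         # x = fraction * 2**exponent, 0.5 <= |fraction| < 1
    if x > 0:
        s, e = 0, exponent - 1
        f = fraction * 2.0 - 1.0               # mantissa 1.f in [1.0, 2.0)
    elif fraction == -0.5:
        s, e = 1, exponent - 2
        f = 0.0                                # mantissa exactly -2.0
    else:
        s, e = 1, exponent - 1
        f = fraction * 2.0 + 2.0               # mantissa -2.0 + 0.f in (-2.0, -1.0)
    f_bits = int(f * (1 << 23))
    return ((e & 0xFF) << 24) | (s << 23) | f_bits


for value in (1.0, 15.0, -2.0, -0.5, 22.5):
    print(value, format(encode_tms320_float(value), "08X"))
```

For -2.0 the sketch yields sign bit 1 with both exponent and fraction equal to 0, matching the worked example.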
Addition and Multiplication
Addition and multiplication of floating-point numbers are simplicity itself. The bottom portion of the slide
shows an example of each. The DSP programmer does not need to do any scaling or take any special
precautions before or after an addition, subtraction or multiply since this is all done in hardware. This is one
of the reasons why it is easier to program floating-point DSP devices.
Dynamic Range
Ranges of number systems
Numbers                   Base 2                 Decimal              Twos Complement Hex
Largest Integer           2^31 - 1               2 147 483 647        7F FF FF FF
Smallest Integer          -2^31                  -2 147 483 648       80 00 00 00
Largest Q15               2^15 - 1               32 767               7F FF
Smallest Q15              -2^15                  -32 768              80 00
Largest Floating Point    (2 - 2^-23) x 2^127    3.402823 x 10^38     7F 7F FF FF
Smallest Floating Point   -2 x 2^127             -3.402823 x 10^38    7F 80 00 00
The dynamic range of floating-point representation is very large
Conclusion
Largest integer x (1.5 x 10^29) ~= largest floating point
Largest Q15 x (1.03 x 10^34) ~= largest floating point
Comparison of Dynamic Ranges
The dynamic range in a number system means the distance, in unit steps, between the largest and smallest
number in that system. The larger the dynamic range is, the less potential it has of creating overflow
conditions. Some signal-processing applications need a larger dynamic range than others. For example, a
radar application may be trying to extract a tiny signal of only a few microvolts buried in noise with an average
level of several volts.
The top table on the slide shows the dynamic ranges of a 32-bit signed integer notation, a fixed-point Q15
format used by 16-bit fixed-point DSPs, and a 32-bit TMS320 single-precision floating-point format. It is
clear that the TMS320 floating-point format has a larger dynamic range, but to fully appreciate the
difference in dynamic range, you must multiply the largest integer with a constant value to reach the biggest
value in TMS320 single-precision floating-point format. This constant is very large, indicating the vast
difference in the dynamic ranges of the 32-bit signed integer notation and the TMS320 single-precision
floating-point notation. The same comparison with the Q15 format reveals an even bigger difference.
Clearly, the TMS320 single-precision floating-point format has a much larger dynamic range than the other
number systems. The TMS320C67x single-precision floating-point architecture is just one reason for its
popularity in certain signal-processing applications.
Note that the resolution in TMS320 single-precision floating-point is 24 bits. This extra precision is a big
benefit in applications such as digital audio. Humans also tend to respond to audio in a logarithmic way that
is very similar to floating point. Applications and demonstration examples that take advantage of this can be
downloaded from the Texas Instruments web site.
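The constants quoted in the conclusion of the slide can be recomputed directly from the largest value of each format. This is a quick check written for illustration; the variable names are hypothetical.

```python
largest_int32 = 2 ** 31 - 1
largest_q15 = 2 ** 15 - 1
largest_float = (2 - 2 ** -23) * 2.0 ** 127   # largest TMS320 single-precision value

ratio_int = largest_float / largest_int32
ratio_q15 = largest_float / largest_q15

print(f"largest floating point       = {largest_float:.6e}")
print(f"multiplier from 32-bit int   = {ratio_int:.2e}")
print(f"multiplier from Q15          = {ratio_q15:.2e}")
```

Both multipliers come out at the orders of magnitude given on the slide, roughly 10^29 for the 32-bit integer and 10^34 for Q15.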
Fixed vs. Floating Point
DSP devices are designed as floating point or fixed point
Fixed-point devices are usually 16 bits, e.g. TMS320C5x
Floating-point devices are usually 32 bits, e.g. TMS320C3x
Floating-point devices usually have a full set of fixed-point instructions
Floating-point devices are easier to program
Fixed-point devices can emulate floating point in software
Comparison
Characteristic Floating point Fixed point
Dynamic range much larger smaller
Resolution comparable comparable
Speed comparable comparable
Ease of programming much easier more difficult
Compiler efficiency more efficient less efficient
Power consumption comparable comparable
Chip cost comparable comparable
System cost comparable comparable
Design cost less more
Time to market faster slower
DSP Devices
DSP devices are designed as fixed- or floating-point devices. The design philosophy, data paths, and
internal modules of each device are different. Generally, fixed-point devices address high-volume and
inexpensive applications while floating-point devices target high-performance applications. But these
differences are becoming hard to distinguish because the price of floating-point devices continues to fall.
Fixed-point devices, such as the TMS320C54x device, are usually 16 bits with fewer external pins.
Floating-point devices, such as the TMS320C67x device, are commonly 32 bits. Floating-point devices
usually have a full set of fixed-point instructions and can be used as fixed-point processors without any speed
penalty. Fixed-point devices can emulate floating-point devices in software, but there is a speed penalty
because the conversion from fixed- to floating-point is performed in software.
Comparison of Fixed vs. Floating-Point Devices
A table comparison of fixed- and floating-point devices shows clearly the key characteristics of each of the
systems. Floating point relieves the designer of any consideration of dynamic range in the
design, but can cost more in CPU and additional memory costs.
The speed of fixed-point systems will tend to be slightly higher and their power consumption lower, yet the
parallelism and greater precision of 32-bit data can sometimes easily outweigh any speed penalty.
Floating-point devices are much easier to program; there is less concern with scaling, dynamic-range issues
and, in most cases, resolution. Resolution is often determined by bus width but this also drives system cost.
C compilers for floating-point devices are much more efficient than C compilers for fixed-point devices.
The primary reason for this is that the fixed-point devices do not have large register sets and therefore need
software modules for number conversions to provide a reasonable C interface. For example, when the C
programmer declares a floating-point number, an assembler routine needs to convert this format into fixed-
point format and back again after the processor has executed the necessary operations. Fixed-point devices
typically have 1 or 2 accumulators, whereas the C67x floating-point family has sixteen 32-bit registers that
can be used for math operations. Having more registers to work with is an advantage for a C compiler.
These are important points in choosing a device for an application. Programs written in C will tend to favor
floating-point devices.
Power consumption depends heavily on both the system architecture and the software that is used. In a
CMOS design, power is consumed when a capacitive node is charged from one supply rail to the other. If
the change in state does not occur, no power is consumed. Since the processor, memory, and surrounding
system board may consist of millions of internal and external nodes, it is important to toggle as few as
possible to complete a given task. The other variable is the capacitance of each node, which should be
minimized. Simply put, more toggles per second and higher capacitance equate to more power usage.
Power consumption is therefore related to clock rates and data-bus width. If it takes fewer cycles to get the
same job done on a wider bus, the net power usage may be the same or even better. For example, a 16-bit
fixed-point device might use a similar amount of power when compared to a 32-bit floating-point device
using only 16 bits of its data bus.
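The relationship between toggling, capacitance, and power can be sketched numerically. The snippet below uses the common first-order approximation that each charging event draws C x V^2 of energy from the supply; the node counts, 10 pF capacitance, 3.3 V supply, and clock rates are hypothetical values chosen only for illustration, not figures for any particular device.

```python
def dynamic_power(charge_events_per_second, capacitance_farads, supply_volts):
    """First-order CMOS switching power: events/s * C * V^2 (in watts)."""
    return charge_events_per_second * capacitance_farads * supply_volts ** 2


# Hypothetical comparison: the same job done on 16 bus lines at 100 MHz
# or on 32 bus lines at 50 MHz (half as many cycles on a bus twice as wide).
NODE_CAPACITANCE = 10e-12   # 10 pF per node (assumed)
SUPPLY = 3.3                # volts (assumed)

narrow_bus = dynamic_power(16 * 100e6, NODE_CAPACITANCE, SUPPLY)
wide_bus = dynamic_power(32 * 50e6, NODE_CAPACITANCE, SUPPLY)
print(narrow_bus, wide_bus)
```

Under these assumptions both configurations toggle the same number of nodes per second, so the estimated power is the same, which is the point the text makes about wider buses.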
The cost of floating-point chips is also becoming comparable to that of traditional fixed-point devices.
Floating-point system costs need not be high just because the devices use 32 bits internally. Minimum
systems are made possible through the efficient use of internal RAM and fewer external components. The
DSP Starter Kits and all of the applications that run on them are an excellent example of a minimal
system.
The cost of programming, measured in the salary dollars paid out to a programmer, can also be a deciding
factor. Selling many units to a very large market can absorb the extra time required for a fixed-point design.
For smaller markets, or when time to market is important, the low design costs of floating-point are very
beneficial.
Selecting a device for a particular application is a complex decision and should be considered carefully along
with any other points that are specific to the design. Our discussion highlighted some of the more important
considerations.
[Figure: TMS320 family overview, with typical applications]
16-Bit Fixed-Point Devices
C1x: Hard-disk controllers
C2x: Fax machines
C2xx: Embedded control
C5x: Voice processing
C54x: Digital cellular phones
32-Bit Floating-Point Devices
C3x: Videophones
C4x: Parallel processing
Other Devices
C6x: Advanced VLIW processor; wireless base stations and pooled modems
C8x: Video conferencing
TMS320 Family
The Texas Instruments TMS320 family of DSP devices covers a wide range, from a 16-bit fixed-point device
to a single-chip parallel-processor device. In the past, DSPs were used only in specialized applications. Now
they are in many mass-market consumer products that are continuously entering new market segments.
Let us briefly consider the Texas Instruments TMS320 family of DSP devices and their typical applications.
C1x, C2x, C2xx, C5x, C54x
The width of the data bus on these devices is 16 bits. All have modified Harvard architectures. They have
been used in toys, hard disk drives, modems, cellular phones, and active car suspensions.
C3x
The width of the data bus in the C3x series is 32 bits. Because of the reasonable cost and floating-point
performance, these are suitable for many applications. These include almost any filters, analyzers, hi-fi
systems, voice-mail, imaging, bar-code readers, motor control, 3D graphics, or scientific processing.
C4x
This range is designed for parallel processing. The C4x devices have a 32-bit data bus and are floating-point.
They have an optimized on-chip communication channel, which enables a number of them to be put together
to form a parallel-processing cluster. The C4x range devices have been used in virtual reality, image
recognition, telecom routing, and parallel-processing systems.
C6x
The C6x devices feature VelociTI, an advanced very long instruction word (VLIW) architecture developed
by Texas Instruments. Eight functional units, including two multipliers and six arithmetic logic units
(ALUs), provide 1600 MIPS of cost-effective performance. The C6x DSPs are optimized for multi-channel,
multifunction applications, including wireless base stations, pooled modems, remote-access servers, digital
subscriber loop systems, cable modems, and multi-channel telephone systems.
C8x
The C80 is the first processor in this range. It has parallel processing on a single piece of silicon with four
advanced DSPs (ADSPs) and a RISC master processor. It is used in high-performance video telephony, 3D
computer graphics, virtual reality, and a number of multimedia applications. A lower-cost version, the C82,
features two ADSPs and the RISC master processor.
[Figure: TMS320C54x architecture block diagram. Blocks shown: system control interface; program address generation logic (PAGEN) with PC, IPTR, RC, BRC, RSA, and REA; data address generation logic (DAGEN) with ARAU0, ARAU1, AR0-AR7, ARP, BK, DP, and SP; memory and external interface; peripheral interface; address buses PAB, CAB, DAB, and EAB; program and data buses PB, CB, DB, and EB; EXP encoder; T register; 17 x 17 multiplier with fractional control; 40-bit adder with zero-detect, saturation, and rounding logic; 40-bit ALU; 40-bit accumulators A and B; barrel shifter; and the compare unit with TRN, TC, and MSW/LSW select.]
Legend: A = accumulator A, B = accumulator B, C = CB data bus, D = DB data bus, E = EB data bus, M = MAC unit, P = PB program bus, S = barrel shifter, T = T register, U = ALU
Copyright 1998, Texas Instruments Incorporated All Rights Reserved
TMS320C54x Architecture
The C54x DSPs use an advanced modified Harvard architecture that maximizes processing power with eight
buses. Separate program and data spaces allow simultaneous access to program instructions and data,
providing a high degree of parallelism. For example, three reads and one write can be performed in a single
cycle. Instructions with parallel store and application-specific instructions fully utilize this architecture. In
addition, data can be transferred between data and program spaces. Such parallelism supports a powerful set
of arithmetic, logic, and bit-manipulation operations that can all be performed in a single machine cycle.
Also, the C54x includes the control mechanisms to manage interrupts, repeated operations, and function
calling.
Fixed-point processors represent numbers as a magnitude and sign within a certain number of bits. For the
C54x, it is 16 bits. This is in contrast to floating-point processors, which represent numbers as a magnitude
scaled by an exponent. Fixed-point processors have a smaller dynamic range (the range of numbers that
can be represented) than floating-point processors, but are also less complex and consequently less
expensive. If the extended dynamic range is not needed, a fixed-point processor may be a more cost-efficient
choice.
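To make the dynamic-range limit concrete, the sketch below quantizes real values into a 16-bit fractional format. The Q15 convention used here (one sign bit and 15 fractional bits, covering the range -1 to just under 1) is an assumption for illustration; the text above only states that the C54x word is 16 bits, and the helper names are hypothetical.

```python
import math

Q15_SCALE = 1 << 15  # 2**15 steps per unit in the assumed Q15 format


def to_q15(x):
    """Quantize a real value to a 16-bit Q15 integer, saturating at the limits."""
    n = int(round(x * Q15_SCALE))
    return max(-Q15_SCALE, min(Q15_SCALE - 1, n))


def from_q15(n):
    """Convert a Q15 integer back to a real value."""
    return n / Q15_SCALE


def dynamic_range_db(bits):
    """Ratio of the largest magnitude to the smallest nonzero step, in dB."""
    return 20 * math.log10(2 ** (bits - 1))


print(to_q15(0.5), to_q15(1.0), round(dynamic_range_db(16), 1))
```

A value of 1.0 does not fit and saturates to the largest code, 32767. A 16-bit word gives roughly 90 dB of dynamic range under this definition, and anything beyond that must be handled by scaling in software.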
Bus Structure
The C54x architecture is built around eight major 16-bit buses (four program/data buses and four address
buses):
The program bus (PB) carries the instruction code and immediate operands from program
memory.
Three data buses (CB, DB, and EB) interconnect to various elements, such as the CPU, data
address generation logic, program address generation logic, on-chip peripherals, and data
memory.
The CB and DB carry the operands that are read from data memory.
The EB carries the data to be written to memory.
Four address buses (PAB, CAB, DAB, and EAB) carry the addresses needed for instruction
execution.
The C54x can generate up to two data-memory addresses per cycle using the two auxiliary
register arithmetic units (ARAU0 and ARAU1).
The PB can carry data operands stored in program space (for instance, a coefficient table) to the multiplier
and adder for multiply/accumulate operations or to a destination in data space for data move instructions
(MVPD and READA). This capability, in conjunction with the feature of dual-operand read, supports the
execution of single-cycle, 3-operand instructions such as the FIRS instruction. The C54x also has an on-chip
bidirectional bus for accessing on-chip peripherals. This bus is connected to DB and EB through the
bus exchanger in the CPU interface. Accesses that use this bus can require two or more cycles for reads and
writes, depending on the peripheral's structure. Table 1 summarizes the buses used by various types of
accesses.
Table 1. Bus Usage for Read and Write Accesses
Access Type
Address Bus Data Bus
PAB CAB DAB EAB PB CB DB EB
Program read
Program write
Data single read
Data dual read
Data long (32-bit) read (hw) (lw) (hw) (lw)
Data single write
Data read/data write
Dual read/coefficient read
Peripheral read
Peripheral write
Legend: hw = high 16-bit word, lw = low 16-bit word
Internal Memory Organization
The C54x memory is organized into three individually selectable spaces: program, data, and I/O space. All
C54x devices contain both random-access memory (RAM) and read-only memory (ROM). Among the
devices, two types of RAM are represented: dual-access RAM (DARAM) and single-access RAM (SARAM).
Table 2 shows how much ROM, DARAM, and SARAM are available on the different C54x devices. The
C54x also has 26 CPU registers plus peripheral registers that are mapped in data-memory space.
Table 2. Program and Data Memory on the TMS320C54x Devices
Memory Type      541   542   543   545   546   548   549   5402  5410  5420
ROM              28K   2K    2K    48K   48K   2K    16K   4K    16K   0
  Program        20K   2K    2K    32K   32K   2K    16K   4K    16K   0
  Program/data   8K    0     0     16K   16K   0     16K   4K    0     0
DARAM†           5K    10K   10K   6K    6K    8K    8K    16K   8K    32K
SARAM†           0     0     0     0     0     24K   24K   0     56K   168K
† You can configure the dual-access RAM (DARAM) and single-access RAM (SARAM) as data memory or program/data memory.
[Figure: ALU functional diagram. Input multiplexers select the X and Y operands of the 40-bit ALU from accumulators A and B, the T register, the CB (CB15-CB0) and DB (DB15-DB0) data buses, and the 40-bit barrel shifter output, with sign control governed by SXM. The ALU is controlled by OVM and C16 and updates the C, OVA/OVB, ZA/ZB, and TC bits. The accumulators receive the ALU result or the 40-bit MAC output.]
Legend: A = accumulator A, B = accumulator B, C = CB data bus, D = DB data bus, M = MAC unit, S = barrel shifter, T = T register, U = ALU
Central Processing Unit (CPU)
The C54x CPU contains:
40-bit arithmetic logic unit (ALU)
Two 40-bit accumulators
Barrel shifter
17 x 17-bit multiplier
40-bit adder
Compare, select, and store unit (CSSU)
Data address generation unit
Program address generation unit
Arithmetic Logic Unit (ALU)
The C54x performs 2s-complement arithmetic with a 40-bit arithmetic logic unit (ALU) and two 40-bit
accumulators (accumulators A and B). The ALU can also perform Boolean operations. The ALU uses these
inputs:
16-bit immediate value
16-bit word from data memory
16-bit value in the temporary register, T
Two 16-bit words from data memory
32-bit word from data memory
40-bit word from either accumulator
Accumulators
Accumulators A and B (see Figure 1) store the output from the ALU or the multiplier/adder block. They can
also provide a second input to the ALU; accumulator A can be an input to the multiplier/adder. Each
accumulator is divided into three parts:
Guard bits (bits 39-32)
High-order word (bits 31-16)
Low-order word (bits 15-0)
Instructions are provided for storing the guard bits, for storing the high- and the low-order accumulator
words in data memory, and for transferring 32-bit accumulator words in or out of data memory. Also, either
of the accumulators can be used as temporary storage for the other.
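The three accumulator fields can be illustrated with simple bit masks. This is a minimal sketch that models the 40-bit accumulator as a Python integer; the function names are hypothetical and are not C54x instructions.

```python
MASK40 = (1 << 40) - 1


def split_accumulator(acc):
    """Split a 40-bit accumulator value into (guard, high, low) fields."""
    acc &= MASK40                 # keep only bits 39-0
    guard = (acc >> 32) & 0xFF    # guard bits, bits 39-32
    high = (acc >> 16) & 0xFFFF   # high-order word, bits 31-16
    low = acc & 0xFFFF            # low-order word, bits 15-0
    return guard, high, low


def to_signed40(acc):
    """Interpret a 40-bit pattern as a 2s-complement integer."""
    acc &= MASK40
    return acc - (1 << 40) if acc & (1 << 39) else acc


# Adding two large positive 32-bit values overflows 32 bits,
# but the result still fits in 40 bits and stays positive.
total = 0x7FFFFFFF + 0x7FFFFFFF
print(split_accumulator(total), to_signed40(total))
```

The guard bits give headroom for a run of additions, so intermediate sums can exceed 32 bits without wrapping.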
Barrel Shifter
The C54x barrel shifter has a 40-bit input connected to the accumulators or to data memory (using CB or
DB), and a 40-bit output connected to the ALU or to data memory (using EB). The barrel shifter can
produce a left shift of 0 to 31 bits and a right shift of 0 to 16 bits on the input data. The shift requirements
are defined in the shift count field of the instruction, the shift count field (ASM) of status register ST1, or in
the temporary register T (when it is designated as a shift count register).
The barrel shifter and the exponent encoder normalize the values in an accumulator in a single cycle. The
LSBs of the output are filled with 0s, and the MSBs can be either zero filled or sign extended, depending on
the state of the sign-extension mode bit (SXM) in ST1. Additional shift capabilities enable the processor to
perform numerical scaling, bit extraction, extended arithmetic, and overflow prevention operations.
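The shifting and sign-extension behavior described above can be sketched as follows. This models only a 16-bit memory operand shifted into a 40-bit result; the function name and the convention that a negative count means a right shift are assumptions made for the example.

```python
MASK40 = (1 << 40) - 1


def barrel_shift(word16, shift, sxm=True):
    """Shift a 16-bit operand by -16..31 bits and return a 40-bit pattern.

    A positive shift is a left shift (LSBs filled with 0s); a negative
    shift is a right shift. With sxm set, the operand is sign-extended
    first; otherwise the upper bits are zero-filled.
    """
    if not -16 <= shift <= 31:
        raise ValueError("shift must be in the range -16..31")
    value = word16 & 0xFFFF
    if sxm and value & 0x8000:
        value -= 1 << 16          # sign-extend the 16-bit operand
    if shift >= 0:
        result = value << shift
    else:
        result = value >> -shift  # arithmetic shift for negative ints
    return result & MASK40


print(hex(barrel_shift(0x8000, 16, sxm=True)))
print(hex(barrel_shift(0x8000, 16, sxm=False)))
```

With sign extension on, 0x8000 shifted left by 16 fills the guard bits with 1s (0xff80000000); with it off, the same operand yields 0x80000000.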
Multiplier/Adder Unit
The multiplier/adder unit performs 17 x 17-bit 2s-complement multiplication with a 40-bit addition in a
single instruction cycle. In the C54x architecture, the multiplier is 17 x 17 bits wide so that it can
multiply a signed number by an unsigned number. Although the original data from memory is 16 bits wide,
each operand is extended with a 17th bit (a copy of the sign bit for signed operands, zero for unsigned
operands) so that both kinds of number can be used by the same 2s-complement multiplier.
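A short sketch shows why a 17th bit lets one 2s-complement multiplier handle both signed and unsigned 16-bit operands. It assumes signed operands are sign-extended and unsigned operands are zero-extended into bit 16; the function names are hypothetical.

```python
def extend17(word16, signed):
    """Extend a 16-bit pattern to a value that fits in 17-bit 2s complement."""
    word16 &= 0xFFFF
    if signed and word16 & 0x8000:
        return word16 - 0x10000   # sign-extend: bit 16 copies the sign bit
    return word16                 # zero-extend: bit 16 is 0


def multiply17(a16, b16, a_signed=True, b_signed=True):
    """Multiply two 16-bit patterns after extending each to 17 bits."""
    return extend17(a16, a_signed) * extend17(b16, b_signed)


# The same bit pattern 0xFFFF is -1 when signed and 65535 when unsigned.
print(multiply17(0xFFFF, 0xFFFF))               # signed x signed
print(multiply17(0xFFFF, 0xFFFF, True, False))  # signed x unsigned
```

Every extended operand lies between -65536 and 65535, which is exactly the range of a 17-bit 2s-complement number.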
The multiplier/adder block consists of several elements: a multiplier, an adder, signed/unsigned input
control logic, fractional control logic, a zero detector, a rounder (2s complement), overflow/saturation logic,
and a 16-bit temporary storage register (T). The multiplier has two inputs: one input is selected from T, a
data-memory operand, or high part of accumulator A; the other is selected from program memory, data
memory, accumulator A, or an immediate value.
The fast, on-chip multiplier allows the C54x to perform operations such as convolution,
correlation, and filtering efficiently. In addition, the multiplier and ALU together execute multiply/accumulate (MAC)
computations and ALU operations in parallel in a single instruction cycle. This function is used in
determining the Euclidean distance and in implementing symmetrical and least mean square (LMS) filters,
which are required for complex DSP algorithms.
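The benefit of pairing an ALU addition with a MAC can be sketched for a symmetric filter. Because the coefficients of a symmetric FIR filter mirror each other, the two samples that share a coefficient can be added first and then multiplied once. The code below is an illustrative model with hypothetical function names, not C54x code, and it assumes an even number of taps.

```python
def symmetric_fir_output(window, half_coeffs):
    """One output of an even-length symmetric FIR: add paired samples, then MAC."""
    n = len(window)
    if n != 2 * len(half_coeffs):
        raise ValueError("window must be twice as long as half_coeffs")
    acc = 0
    for k, h in enumerate(half_coeffs):
        # The add is the ALU's job; the multiply/accumulate is the MAC's job.
        acc += (window[k] + window[n - 1 - k]) * h
    return acc


def direct_fir_output(window, coeffs):
    """Reference: one multiply/accumulate per tap."""
    return sum(x * h for x, h in zip(window, coeffs))


half = [1, 2, 3]
full = half + half[::-1]          # symmetric coefficient set
samples = [4, -1, 7, 0, 2, 5]
print(symmetric_fir_output(samples, half), direct_fir_output(samples, full))
```

Both functions give the same result, but the symmetric form needs only half as many multiplications.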
Compare, Select, and Store Unit (CSSU)
The compare, select, and store unit (CSSU) performs maximum comparisons between the accumulator's
high and low words, allows both the test/control flag bit (TC) in status register ST0 and the transition register
(TRN) to keep their transition histories, and selects the larger word in the accumulator to store into data
memory. The CSSU also accelerates Viterbi-type butterfly computations with optimized on-chip hardware.
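The compare-and-select step can be modeled in a few lines. This is only a sketch of the idea: the polarity of the decision bit (0 when the high word wins, 1 otherwise), the 16-bit width of TRN, and the function names are assumptions, so check the instruction set reference before relying on them.

```python
def to_signed16(word):
    """Interpret the low 16 bits of word as a 2s-complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def compare_select(acc, trn):
    """Pick the larger of the accumulator's high and low words.

    Returns (selected value, updated TRN, TC). TRN is shifted left by one
    and the decision bit is recorded in its least significant bit.
    """
    high = to_signed16(acc >> 16)
    low = to_signed16(acc)
    if high > low:
        selected, decision = high, 0   # assumed polarity
    else:
        selected, decision = low, 1
    trn = ((trn << 1) | decision) & 0xFFFF
    return selected, trn, decision


print(compare_select(0x00050003, 0b0))   # high word (5) is larger
print(compare_select(0x00030005, 0b1))   # low word (5) is larger
```

In a Viterbi decoder, the two words would hold competing path metrics, and the history kept in TRN is later used to trace back the surviving path.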
On-Chip ROM
The on-chip ROM is part of the program memory space and, in some cases, part of the data memory space.
The amount of on-chip ROM available on each device varies, as shown in Table 2. On devices with a small
amount of ROM (2K words), the ROM contains a bootloader that is useful for booting to faster on-chip or
external RAM. For bootloading details on all C54x devices except the 548 and 549, see TMS320C54x DSP
Reference Set, Volume 4: Applications Guide, SPRU173.
On devices with larger amounts of ROM, a portion of the ROM may be mapped into both data and program
space (except the 5410). The larger ROMs are also custom ROMs: you provide the code or data to be
programmed into the ROM in object file format, and Texas Instruments generates the appropriate process
mask to program the ROM.
On-Chip Dual-Access RAM (DARAM)
The DARAM is composed of several blocks. Because each DARAM block can be accessed twice per
machine cycle, the central processing unit (CPU) and peripherals such as the buffered serial port (BSP) and
host port interface (HPI) can read from and write to a DARAM memory address in the same cycle. The
DARAM is always mapped in data space and is primarily intended to store data values. It can also be
mapped into program space and used to store program code.
On-Chip Single-Access RAM (SARAM)
The SARAM is composed of several blocks. Each block is accessible once per machine cycle for either a
read or a write. The SARAM is always mapped in data space and is primarily intended to store data values.
It can also be mapped into program space and used to store program code.
On-Chip Memory Security
The C54x maskable memory security option protects the contents of on-chip memories. When you designate
this option, no instruction that has originated externally can access the on-chip memory spaces.
[Figure: Two C54x memory maps]
541 Program Memory
OVLY = 0:   0000h-13FFh  External
OVLY = 1:   0000h-007Fh  Reserved
            0080h-13FFh  On-chip DARAM
            1400h-8FFFh  External
MP/MC = 0:  9000h-FF7Fh  On-chip ROM
            FF80h-FFFFh  Interrupt vectors (internal)
MP/MC = 1:  9000h-FF7Fh  External
            FF80h-FFFFh  Interrupt vectors (external)
541 Data Memory
            0000h-005Fh  Memory-mapped registers
            0060h-007Fh  Scratch-pad DARAM
            0080h-13FFh  On-chip DARAM
            1400h-DFFFh  External
DROM = 0:   E000h-FFFFh  External
DROM = 1:   E000h-FEFFh  On-chip ROM
            FF00h-FFFFh  Reserved
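The 541 program memory map can be read as a lookup that depends on the OVLY and MP/MC bits. The sketch below encodes the address ranges listed in the map; the function name and the string labels are hypothetical, and only the 541 program space is modeled.

```python
def classify_541_program_address(addr, ovly, mp_mc):
    """Return the region that a 541 program address falls in.

    ovly and mp_mc are the states (True = 1) of the OVLY and MP/MC bits.
    """
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("address must be in the range 0000h-FFFFh")
    if addr <= 0x13FF:
        if not ovly:
            return "external"
        return "reserved" if addr <= 0x007F else "on-chip DARAM"
    if addr <= 0x8FFF:
        return "external"
    if addr <= 0xFF7F:
        return "external" if mp_mc else "on-chip ROM"
    return "interrupt vectors (external)" if mp_mc else "interrupt vectors (internal)"


print(classify_541_program_address(0x0080, ovly=True, mp_mc=False))
print(classify_541_program_address(0xFF80, ovly=False, mp_mc=True))
```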
Memory-Mapped Registers
The data memory space contains memory-mapped registers for the CPU and the on-chip peripherals. These
registers are located on data page 0, simplifying access to them. The memory-mapped access provides a
convenient way to save and restore the registers for context switches and to transfer information between the
accumulators and the other registers.
[Figure: Direct addressing block diagram. DAGEN forms the effective address (EA) from the 7 LSBs of the instruction register (IR), called dma, combined with either the 9-bit DP or the 16-bit SP as selected by CPL: with CPL = 0, EA = DP : offset(IR); with CPL = 1, EA = SP + offset(IR). The address is driven onto DAB(16) for a read, EAB(16) for a write, or CAB(16) for a 32-bit read; data is carried on data buses DB(16) and EB(16).]
Legend: EA = effective address, IR = instruction register
Data Addressing
The C54x offers seven basic data addressing modes:
Immediate addressing uses the instruction to encode a fixed value.
Absolute addressing uses the instruction to encode a fixed address.
Accumulator addressing uses accumulator A to access a location in program memory as data.
Direct addressing uses seven bits of the instruction to encode the lower seven bits of an address.
The seven bits are used with the data page pointer (DP) or the stack pointer (SP) to determine
the actual memory address.
Indirect addressing uses the auxiliary registers to access memory.
Memory-mapped register addressing uses the memory-mapped registers without modifying
either the current DP value or the current SP value.
Stack addressing manages adding and removing items from the system stack.
During the execution of instructions using direct, indirect, or memory-mapped register addressing, the data
address generation logic (DAGEN) computes the addresses of data-memory operands.
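Direct addressing can be sketched as simple address arithmetic. The example assumes a 9-bit DP, a 16-bit SP, and a CPL bit that selects between them (0 for DP, 1 for SP); the function name and the 16-bit wraparound of the SP-relative sum are assumptions made for illustration.

```python
def direct_address(dma, cpl, dp=0, sp=0):
    """Form a 16-bit data address from the 7-bit offset in the instruction."""
    offset = dma & 0x7F                       # lower seven bits of the address
    if cpl == 0:
        # DP-relative: the 9-bit DP supplies the upper bits (DP : offset).
        return ((dp & 0x1FF) << 7) | offset
    # SP-relative: the offset is added to the stack pointer (SP + offset).
    return (sp + offset) & 0xFFFF


print(hex(direct_address(0x25, cpl=0, dp=3)))        # page 3, offset 25h
print(hex(direct_address(0x10, cpl=1, sp=0x2000)))   # 10h above the stack pointer
```

With a 9-bit DP and a 7-bit offset, the data space divides into 512 pages of 128 words each, which is why registers located on data page 0 are easy to reach.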
[Figure: C54x program memory addressing. The program address generation logic (PAGEN) contains the program counter (PC) and the repeat registers RC, BRC, RSA, and REA, and supplies addresses to program memory.]
Program Memory Addressing
Program memory is usually addressed on a C54x device with the program counter (PC). With some
instructions, however, absolute addressing may be used to access data items that have been stored in program
memory.
The PC is loaded by the program-address generation logic (PAGEN) and is used to fetch individual
instructions. Typically, the PAGEN increments the PC as sequential instructions are fetched. However, the
PAGEN may load the PC with a nonsequential value as a result of some instructions or other operations.
Operations that cause a discontinuity include branches, calls, returns, conditional operations,
single-instruction repeats, multiple-instruction repeats, reset, and interrupts. For calls and interrupts, the current
PC is saved onto the stack; it is referenced by the stack pointer (SP). When the called function or interrupt
service routine is finished, the PC value that was saved is restored from the stack via a return instruction.
For a detailed discussion of the hardware and software factors in program address generation, see Chapter 7,
Program Memory Addressing.
[Figure: C54x pipeline. The six stages, in time order, and the operations performed in each]
Prefetch: Loads PAB with the PC's contents.
Fetch: Loads PB with the fetched instruction word, then loads IR with the contents of PB.
Decode: Decodes the IR's contents.
Access: Loads DAB with the data1 read address, if required; loads CAB with the data2 read address, if required; updates auxiliary registers and stack pointer.
Read: Loads DB with the data1 read operand; loads CB with the data2 read operand; loads EAB with the data3 write address, if required.
Execute/write: Executes the instruction and loads EB with write data.
Pipeline Operation
An instruction pipeline consists of a sequence of operations that occur during the execution of an instruction.
The C54x pipeline has six levels: prefetch, fetch, decode, access, read, and execute. At each of the levels, an
independent operation occurs. Because these operations are independent, from one to six instructions can be
active in any given cycle, each instruction at a different stage of completion. Typically, the pipeline is full
with a sequential set of instructions, each at one of the six stages. When a PC discontinuity occurs, such as
during a branch, call, or return, one or more stages of the pipeline may be temporarily unused. For more
details about the pipeline operation, see Chapter 7, Pipeline.
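The overlap of instructions in the six stages can be visualized with a small model. The sketch assumes an ideal, stall-free stream of sequential instructions that each spend one cycle per stage; the function names are hypothetical.

```python
STAGES = ["prefetch", "fetch", "decode", "access", "read", "execute"]


def pipeline_snapshot(cycle, num_instructions):
    """Map each busy stage to the index of the instruction it holds at `cycle`."""
    snapshot = {}
    for stage_index, stage in enumerate(STAGES):
        instruction = cycle - stage_index   # instruction i enters prefetch at cycle i
        if 0 <= instruction < num_instructions:
            snapshot[stage] = instruction
    return snapshot


def total_cycles(num_instructions):
    """Cycles needed to complete all instructions in the ideal case."""
    if num_instructions == 0:
        return 0
    return num_instructions + len(STAGES) - 1


print(pipeline_snapshot(0, 10))   # only the first instruction is active
print(pipeline_snapshot(5, 10))   # pipeline full: six instructions active
print(total_cycles(10))
```

Once the pipeline is full, one instruction completes every cycle. A branch or other PC discontinuity would empty some of these slots, which this simple model does not show.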
On-Chip Peripherals
All the C54x devices have the same CPU, but different on-chip peripherals are connected to their CPUs. The
C54x devices have these on-chip peripheral options:
General-purpose I/O pins: XF and BIO
Timer
Clock generator
Host port interface
8-bit standard (542, 545, 548, 549)
8-bit enhanced (5402, 5410; see note below)
16-bit enhanced (5420; see note below)
Synchronous serial port (541, 545, and 546)
Buffered serial port (542, 543, 545, 546, 548, and 549)
Multichannel buffered serial port (McBSP) (5402, 5410, and 5420; see note below)
Time-division multiplexed (TDM) serial port (542, 543, 548, and 549)
Software-programmable wait-state generator
Programmable bank-switching module
Note: Enhanced Peripherals. For more detailed information on the enhanced peripherals, see SPRU302,
TMS320C54x DSP, Enhanced Peripherals: Volume 5.
General-Purpose I/O Pins
Each C54x device has two general-purpose I/O pins: BIO and XF. BIO is an input pin that can be used to
monitor the status of external devices. XF is a software-controlled output pin that allows you to signal
external devices.
Software-Programmable Wait-State Generator
The software-programmable wait-state generator extends external bus cycles by up to seven machine cycles (14
machine cycles in the 549, 5402, 5410, and 5420) to interface with slower off-chip memory and I/O
devices. The software wait-state generator is incorporated without any external hardware. For off-chip
memory accesses, from zero to seven wait states can be specified within the software wait-state register
(SWWSR) for each 32K-word block of program and data memory, and for the 64K-word block of I/O space.
Programmable Bank-Switching Logic
The programmable bank-switching logic can automatically insert one cycle when an access crosses memory
bank boundaries inside program memory or data memory. One cycle can also be inserted when an access
crosses from program memory to data memory. This extra cycle prevents bus contention by allowing
memory devices to release the bus before other devices start driving the bus. The size of memory bank for
bank switching is defined by the bank switching control register (BSCR).
Host Port Interface
The host port interface (HPI) is a parallel port that provides an interface to a host processor. Information is
exchanged between the C54x and the host processor through C54x on-chip memory that is accessible to both
the host processor and the C54x. Table 3 identifies the HPI-equipped C54x devices.
Table 3. Host Port Interfaces on the TMS320C54x DSP
Host Port Interface   541  542  543  545  546  548  549  5402  5410  5420
Standard 8-bit HPI    0    1    0    1    0    1    1    0     0     0
Enhanced 8-bit HPI    0    0    0    0    0    0    0    1     1     0
Enhanced 16-bit HPI   0    0    0    0    0    0    0    0     0     1
Hardware Timer
The C54x features a 16-bit timing circuit with a 4-bit prescaler. The timer counter is decremented by 1 at
every CLKOUT cycle. Each time the counter decrements to 0, a timer interrupt is generated. The timer can
be stopped, restarted, reset, or disabled by specific status bits.
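As a rough illustration of how the counter and prescaler set the interrupt rate, the sketch below divides the CLKOUT frequency by the two reload values. The parameter names and the "value plus one" convention for both the 16-bit period and the 4-bit prescaler are assumptions about how the counters reload, so treat the formula as an estimate to check against the device documentation.

```python
def timer_interrupt_rate_hz(clkout_hz, period, prescale=0):
    """Estimate the timer interrupt rate for a down-counter with a prescaler.

    period is the 16-bit reload value and prescale the 4-bit prescaler
    reload value; each is assumed to count value + 1 steps before reloading.
    """
    if not 0 <= period <= 0xFFFF:
        raise ValueError("period must fit in 16 bits")
    if not 0 <= prescale <= 0xF:
        raise ValueError("prescale must fit in 4 bits")
    return clkout_hz / ((prescale + 1) * (period + 1))


# Hypothetical 40 MHz CLKOUT: two ways to get a 1 kHz interrupt.
print(timer_interrupt_rate_hz(40e6, period=39999))
print(timer_interrupt_rate_hz(40e6, period=9999, prescale=3))
```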
Clock Generator
The clock generator consists of an internal oscillator and a phase-locked loop (PLL) circuit. The clock
generator can be driven internally by a crystal resonator with the internal oscillator or externally by a clock
source. The PLL circuit can generate an internal CPU clock by multiplying the clock source by a specific
factor; thus, you can use a clock source with a lower frequency than that of the CPU.
[Figure: Serial port interface block diagram. On the receive side, data arriving on DR is clocked by CLKR and framed by FSR into the 16-bit RSR; the RSR-to-DRR transfer generates RINT, and the 16-bit DRR is read over the 16-bit data bus. On the transmit side, data written over the data bus to the 16-bit DXR is transferred to the 16-bit XSR, which generates XINT, and is shifted out on DX, clocked by CLKX and framed by FSX. Each side has its own load control logic and byte/word counter.]
Serial Ports
The serial ports on the C54x vary by device, and are represented by four types: synchronous, buffered,
multichannel buffered (McBSP), and time-division multiplexed (TDM). See Table 4 for the number of each
type on the various C54x devices. The sections that follow provide an introduction to the four types of serial
ports. For more details about these ports, see Chapter 9, Serial Ports. For detailed information about the
McBSPs, see volume 5 of this reference set: TMS320C54x DSP, Enhanced Peripherals, literature number
SPRU302.
Table 4. Serial Port Interfaces on the TMS320C54x Devices
Serial Ports            541  542  543  545  546  548  549  5402  5410  5420
Synchronous             2    0    0    1    1    0    0    0     0     0
Buffered                0    1    1    1    1    2    2    0     0     0
Multichannel Buffered   0    0    0    0    0    0    0    2     3     6
TDM                     0    1    1    0    0    1    1    0     0     0
Synchronous Serial Ports
Synchronous serial ports are high-speed, full-duplex serial ports that provide direct communication with
serial devices such as codecs, analog-to-digital (A/D) converters, and other serial systems. When more than
one synchronous serial port resides on a C54x, these ports are identical but independent. Each synchronous
serial port can operate at up to one-fourth the machine cycle rate (CLKOUT). The synchronous serial port
transmitter and receiver are double buffered and individually controlled by maskable external interrupt
signals. Data is framed either as bytes or as words.
Buffered Serial Ports
A buffered serial port (BSP) is a synchronous serial port that is enhanced with an autobuffering unit and is
clocked at the full CLKOUT rate. It is full-duplex and double-buffered to offer flexible data stream
length. The autobuffering unit supports high-speed transfers and reduces the overhead of servicing
interrupts.
Multichannel Buffered Serial Ports (McBSPs)
The McBSP is an enhanced buffered serial port that includes the following standard features: buffered data
registers, full duplex communication, and independent clocking and framing for receive and transmit. In
addition, the McBSP includes the following enhanced features: internal programmable clock and frame
generation, multichannel mode, and general purpose I/O. For detailed information about the McBSPs, see
volume 5 of this reference set: TMS320C54x DSP, Enhanced Peripherals, literature number SPRU302.
TDM Serial Ports
A time-division multiplexed (TDM) serial port is a synchronous serial port that is enhanced to allow
time-division multiplexing of the data. It can be configured for either synchronous operations or for TDM
operations and is commonly used in multiprocessor applications.
[Figure: C54x external bus interface. Relative to CLKOUT, the internal PB fetch, CB/DB reads, and EB write appear in turn as a write, reads, and a fetch on the external address bus A(22-0) and data bus D(15-0).]
External Bus Interface
The C54x can address up to 64K words of data memory, 64K words of program memory (8M words in the
548, 549, and 5410; 1M words in the 5402; 256K words in the 5420), and up to 64K words of 16-bit
parallel I/O ports. Accesses to either external memory or I/O ports take place through the external interface.
Individual space-select signals, DS, PS, and IS, allow the selection of physically separate spaces.
The interface's external ready input signal and software-generated wait states allow the processor to
interface with memory and I/O devices of many different speeds. The interface's hold modes allow an
external device to take control of the C54x buses; in this way, an external device can access the resources in
the program, data, and I/O spaces.
External memory can be accessed by most C54x instructions. However, accessing I/O ports requires the use
of special instructions: PORTR and PORTW.
IEEE Standard 1149.1 Scanning Logic
The IEEE Standard 1149.1 scanning-logic circuitry is used for emulation and testing purposes only. This
logic provides the boundary scan to and from the interfacing devices. Also, it can be used to test pin-to-pin
continuity as well as to perform operational tests on devices peripheral to the C54x. The IEEE Standard
1149.1 scanning logic is interfaced to internal scanning-logic circuitry that has access to all of the on-chip
resources. Thus, the C54x can perform on-board emulation using the IEEE Standard 1149.1 serial scan pins
and the emulation-dedicated pins.
