Documente Academic
Documente Profesional
Documente Cultură
4. Dual address generators. The processor has two data address gen-
erators (DAGs) that provide immediate or indirect (pre- and
post-modify) addressing. Modulus, bit-reverse, and broadcast oper-
ations are supported with no constraints on data buffer placement.
5. Efficient program sequencing. In addition to zero-overhead loops,
the processor supports single-cycle setup and exit for loops. Loops
are both nestable (six levels in hardware) and interruptable. The
processors support both delayed and non-delayed branches.
Architectural Overview
The SHARC processors form a complete system-on-a-chip, integrating a
large, high speed SRAM and I/O peripherals supported by I/O buses. The
following sections summarize the features of each functional block.
Processor Core
The processor core consists of two processing elements (each with three
computation units and data register file), a program sequencer, two
DAGs, a timer, and an instruction cache. All processing occurs in the pro-
cessor core. The following list and Figure 1-1 describes some of the
features of the SHARC core processor.
PM ADDRESS 24
DMD/PMD 64 5 STAGE
PROGRAM SEQUENCER
PM DATA 48
DAG1 DAG2
16x32 16x32
PM ADDRESS 32
SYSTEM
I/F
DM ADDRESS 32
USTAT
PM DATA 64 4x32-BIT
PX
DM DATA 64
64-BIT
RF DATA RF
ALU Rx/Fx SWAP Sx/SFx
MULTIPLIER SHIFTER ALU SHIFTER MULTIPLIER
PEx PEy
16x40-BIT 16x40-BIT
STYKx STYKy
generate an interrupt and asserts their timer expired output. The count
register is automatically reloaded from a 32-bit period register and the
countdown resumes immediately.
Instruction cache. The program sequencer includes a 32-word instruction
cache that effectively provides three-bus operation for fetching an
instruction and two data values. The cache is selective; only instructions
whose fetches conflict with data accesses using the PM bus are cached.
This caching allows full speed execution of core, looped operations such as
digital filter multiply-accumulates, and FFT butterfly processing. For
more information on the cache, refer to “Operating Modes” on page 4-88.
Data bus exchange. The data bus exchange (PX) register permits data to be
passed between the 64-bit PM data bus and the 64-bit DM data bus, or
between the 40-bit register file and the PM/DM data bus. These registers
contain hardware to handle the data width difference. For more informa-
tion, see “Register Files” on page 2-1.
JTAG Port
The JTAG port supports the IEEE standard 1149.1 Joint Test Action
Group (JTAG) standard for system test. This standard defines a method
for serially scanning the I/O status of each component in a system. Emula-
tors use the JTAG port to monitor and control the processor during
emulation. Emulators using this port provide full speed emulation with
access to inspect and modify memory, registers, and processor stacks.
JTAG-based emulation is non-intrusive and does not effect target system
loading or timing.
Core Buses
The processor core has two buses—PM data and DM data. The PM bus is
used to fetch instructions from memory, but may also be used to fetch
data. The DM bus can only be used to fetch data from memory. In con-
junction with the cache, this Super Harvard Architecture allows the core
to fetch an instruction and two pieces of data in the same cycle that a data
I/O Buses
The I/O buses are used solely by the IOP to facilitate DMA transfers.
These buses give the I/O processor access to internal memory for DMA
without delaying the processor core (in the absence of memory block con-
flicts). One of the I/O buses is used for all peripherals (SPORT, SPI, IDP,
UART, TWI etc.) while the second I/O bus is only used for the external
port. The address bus is 19 bits wide, and both I/O data buses are 32 bits
wide.
Bus capacities. The PM and DM address buses are both 32 bits wide,
while the PM and DM data buses are both 64 bits wide.
These two buses provide a path for the contents of any register in the pro-
cessor to be transferred to any other register or to any data memory
location in a single cycle. When fetching data over the PM or DM bus, the
address comes from one of two sources: an absolute value specified in the
instruction (direct addressing) or the output of a data address generator
(indirect addressing). These two buses share the same port of the memory.
Each of the four memory blocks can be accessed by any of the two dedi-
cated core and I/O buses assuming the accesses are conflict free.
Data transfers. Nearly every register in the processor core is classified as a
universal register ( Ureg). Instructions allow the transfer of data between
any two universal registers or between a universal register and memory.
This support includes transfers between control registers, status registers,
and data registers in the register file. The bus connect (PX) registers permit
data to be passed between the 64-bit PM data bus and the 64-bit DM data
bus, or between the 40-bit register file and the PM/DM data bus. These
registers contain hardware to handle the data width difference. For more
information, see “Processing Element Registers” on page A-14.
GPIO Flags 4 11 15 15
Data Sizes
64-bit (LW) No Yes Yes Yes
48-bit (NW) Yes Yes Yes Yes
40-bit (NW) Yes Yes Yes Yes
32-bit (NW) Yes Yes Yes Yes
16-bit (SW) Yes Yes Yes Yes
Development Tools
The processor is supported by a complete set of software and hardware
development tools, including Analog Devices’ emulators and the Cross-
Core Embedded Studio or VisualDSP++ development environment. (The
emulator hardware that supports other Analog Devices processors also
emulates the processor.)
The development environments support advanced application code devel-
opment and debug with features such as:
• Create, compile, assemble, and link application programs written
in C++, C, and assembly
• Load, run, step, halt, and set breakpoints in application programs
• Read and write data and program memory
• Read and write core and peripheral registers
• Plot memory
Analog Devices DSP emulators use the IEEE 1149.1 JTAG test access
port to monitor and control the target board processor during emulation.
The emulator provides full speed emulation, allowing inspection and
modification of memory, registers, and processor stacks. Nonintrusive
in-circuit emulation is assured by the use of the processor JTAG inter-
face—the emulator does not affect target system loading or timing.
Software tools also include Board Support Packages (BSPs). Hardware
tools also include standalone evaluation systems (boards and extenders). In
addition to the software and hardware development tools available from
Analog Devices, third parties provide a wide range of tools supporting the
Blackfin processors. Third party software tools include DSP libraries,
real-time operating systems, and block diagram design tools.
Features
The register files have the following features.
• The non memory-mapped registers are called universal registers
and can be used by almost all instructions
• Data registers are used for computation units
• Complementary data registers are used for the complementary
computation units
• System registers are used for bit manipulation
Functional Description
The following sections provide a functional description of the register
files.
ASTATx Element x arithmetic status flags, bit test flag, and so on.
csreg ASTATy Element y arithmetic status flags, bit test flag, and so on.
Data Registers
Each of the processor’s processing elements has a data register file, which
is a set of data registers that transfers data between the data buses and the
computational units. These registers also provide local storage for oper-
ands and results.
The two register files consist of 16 primary registers and 16 alternate (sec-
ondary) registers. The data registers are 40 bits wide. Within these
registers, 32-bit data is left-justified. If an operation specifies a 32-bit data
transfer to these 40-bit registers, the eight LSBs are ignored on register
reads, and the LSBs are cleared to zeros on writes.
Program memory data accesses and data memory accesses to and from the
register file(s) occur on the PM data (PMD) bus and DM data (DMD)
bus, respectively. One PMD bus access for each processing element and/or
one DMD bus access for each processing element can occur in one cycle.
Transfers between the register files and the DMD or PMD buses can
move up to 64 bits of valid data on each bus.
Note that 16 data registers are sufficient to store the intermediate result of
a FFT radix-4 butterfly stage.
the two processing elements. Identical instructions execute on the PEx and
PEy units; the difference is the data. The data registers for PEy operations
are identified (implicitly) from the PEx registers in the instruction. This
implicit relationship between PEx and PEy data registers corresponds to
the complementary register pairs in Table 2-3.
R0 R1 S0 S1
R2 R3 S2 S3
R4 R5 S4 S5
R6 R7 S6 S7
R8 R9 S8 S9
1 For fixed-point operations, the prefixes are Rx (PEx) or Sx (PEy). For floating-point operations,
the prefixes are Fx (PEx) or SFx (PEy)
R7 <-> S7;
R2 <-> S0;
R1 = MODE1;
R1 = BSET R1 by 21; /* sets PEYEN bit */
R1 = BSET R1 by 24; /* sets CBUFEN bit */
MODE1 = R1;
R1 = dm(SYSCTL);
R1 = BSET R1 by 11; /* sets IMDW2 bit 11 */
R1 = BSET R1 by 12; /* sets IMDW3 bit 12 */
dm(SYSCTL) = R1;
BTST R1 by 11; /* clears SZ bit */
IF SZ jump func;
BTST R1 by 12; /* clears SZ bit */
IF SZ jump func;
The core has four user status registers (USTAT4–1) also classified as system
registers but for general-purpose use. These registers allow flexible manip-
ulation/testing of single or multiple individual bits in a register without
affecting neighbor bits as shown in the following example.
USTAT1= dm(SYSCTL);
BIT SET USTAT1 IMDW2|IMDW3; /* sets bits 12-11 */
dm(SYSCTL)=USTAT1;
USTAT1= dm(SYSCTL);
BIT TST USTAT1 IMDW2|IMDW3; /* test bits 12-11 */
PX2 PX1 PX
31 0 31 0 31 0 31 0
The USTAT4-1 and PX2-1 registers allow load and store operations from
memory. However, direct computations using universal registers is not
supported and therefore a data move to the data register is required.
The alignment of PX1 and PX2 within PX appears in Figure 2-2. The com-
bined PX register is an universal register (UREG) that is accessible for
register-to-register or memory-to-register transfers.
PX to DREG Transfers
The PX register to data register transfers are either 40-bit transfers for the
combined PX or 32-bit transfers for PX1 or PX2. Figure 2-2 shows the bit
alignment and gives an example of instructions for register-to-register
transfers. shows that during a transfer between PX1 or PX2 and a data regis-
ter (Dreg), the bus transfers the upper 32 bits of the register file and
zero-fills the eight least significant bits (LSBs). During a transfer between
the combined PX register and a register file, the bus transfers the upper 40
bits of PX and zero-fills the lower 24 bits.
All transfers between the PX register (or any other internal register or
memory) and any I/O processor register are 32-bit transfers (least
PX to Memory Transfers
The PX register-to-internal memory transfers over the DMD or PMD bus
are either 48-bit transfers for the combined PX or 32-bit transfers (on bits
31-0 of the bus) for PX1 or PX2. Figure 2-3 shows these transfers.
Figure 2-3 also shows that during a transfer between PX1 or PX2 and inter-
nal memory, the bus transfers the lower 32 bits of the register. During a
transfer between the combined PX register and internal memory, the bus
transfers the upper 48 bits of PX and zero-fills the lower 16 bits.
PX to Memory LW Transfers
Figure 2-4 shows the transfer size between PX and internal memory over
the PMD or DMD bus when using the long word (LW) option.
The LW notation in Figure 2-4 shows an important feature of PX regis-
ter-to-internal memory transfers over the PM or DM data bus for the
combined PX register. The PX register transfers to memory are 48-bit
(three column) transfers on bits 63-16 of the PM or DM data bus, unless a
long word transfer is used, or the transfer is forced to be 64-bit (four col-
umn) with the LW (long word) mnemonic.
The LW mnemonic affects data accesses that use the NW (normal word)
addresses irrespective of the settings of the PEYEN (processor element Y
enable) and IMDWx (internal memory data width) bits.
PX = PM (0xB8000)(LW);
DM (LW) or PM (LW)
Data Bus Transfer
64 bits
63 31 0
64 bits
63 31 0
Combined PX
I0 = 0X4F800;
M0 = 0X1;
L0 = 0x0;
DM(I0,M0) = 0xabbaabba;
Operating Modes
The following sections detail the operation of the register files.
Note that there is a one cycle latency from the time when writes are
made to the register until an alternate register set can be
MODE1
accessed.
The alternate register sets for data and results are shown in Figure 2-5. For
more information on alternate data address generator registers, see “Alter-
nate (Secondary) DAG Registers” on page 6-28. Bits in the MODE1 register
can activate independent alternate data register sets: the lower half (R0–
R7) and the upper half (R8–R15). To share data between contexts, a pro-
gram places the data to be shared in one half of either the current
processing element’s register file or the opposite processing element’s reg-
ister file and activates the alternate register set of the other half. For
information on how to activate alternate data registers, see the description
of the MODE1 register below. The register files consist of a primary set of 16
x 40-bit registers and an alternate set of 16 x 40-bit registers.
RF DATA RF
Rx/Fx SWAP Sx/SFx
PEx PEy
16x40-BIT 16x40-BIT
R0 R8 S0 S8
R1 R9 S1 S9
R2 R10 S2 S10
R3 R11 S3 S11
R4 R12 S4 S12
R5 R13 S5 S13
R6 R14 S6 S14
R7 R15 S7 S15
AVAILABLE REGISTERS-SISD MODE PEx UNIT AVAILABLE REGISTERS-SIMD MODE PEy UNIT
USTAT3 USTAT4
PX1 PX2
• Transfers between DAG and other system registers and the PX reg-
ister as shown in the following example:
I0 = PX; /* Moves PX1 to I0 */
PX = I0; /* Loads both PX1 and PX2 with I0 */
LCNTR = PX; /* Loads LCNTR with PX1 */
PX = PC; /* Loads both PX1 and PX2 with PC */
PX = USTAT1;
PX1 PX2
32 bits 32 bits
31 0 31 0
32 bits 32 bits
31 0 31 0
USTAT1 USTAT2