v 6.1
User's Guide
Preface
1 Introduction to the Software Development Tools
1.1 Software Development Tools Overview
1.2 C/C++ Compiler Overview
1.2.1 ANSI/ISO Standard
1.2.2 Output Files
1.2.3 Compiler Interface
1.2.4 Utilities
List of Figures
1-1 TMS320C6000 Software Development Flow
3-1 Compiling a C/C++ Program With Optimization
3-2 Software-Pipelined Loop
4-1 4-Bank Interleaved Memory
4-2 4-Bank Interleaved Memory With Two Memory Spaces
7-1 Char and Short Data Storage Format
7-2 32-Bit Data Storage Format
7-3 40-Bit Data Storage Format Signed 40-bit long
7-4 Unsigned 40-bit long
7-5 64-Bit Data Storage Format Signed 64-bit long
7-6 Unsigned 64-bit long
7-7 Double-Precision Floating-Point Data Storage Format
7-8 Bit-Field Packing in Big-Endian and Little-Endian Formats
7-9 Register Argument Conventions
7-10 Format of Initialization Records in the .cinit Section
7-11 Format of Initialization Records in the .pinit Section
7-12 Autoinitialization at Run Time
7-13 Initialization at Load Time
8-1 Interaction of Data Structures in I/O Functions
8-2 The First Three Streams in the Stream Table
List of Tables
2-1 Options That Control the Compiler
2-2 Options That Control Symbolic Debugging and Profiling
2-3 Options That Change the Default File Extensions
2-4 Options That Specify Files
2-5 Options That Specify Directories
2-6 Options That Are Machine-Specific
2-7 Options That Control Parsing
2-8 Parser Options That Control Preprocessing
2-9 Parser Options That Control Diagnostics
2-10 Options That Control Optimization
2-11 Options That Control the Assembler
2-12 Options That Control the Linker
2-13 Compiler Backwards-Compatibility Options Summary
2-14 C6000 Predefined Macro Names
2-15 Raw Listing File Identifiers
2-16 Raw Listing File Diagnostic Identifiers
3-1 Options That You Can Use With --opt_level=3
3-2 Selecting a File-Level Optimization Option
3-3 Selecting a Level for the --gen_opt_info Option
3-4 Selecting a Level for the --call_assumptions Option
3-5 Special Considerations When Using the --call_assumptions Option
4-1 Options That Affect the Assembly Optimizer
4-2 Assembly Optimizer Directives Summary
5-1 Initialized Sections Created by the Compiler
5-2 Uninitialized Sections Created by the Compiler
6-1 TMS320C6000 C/C++ Data Types
6-2 Valid Control Registers
6-3 GCC Extensions Supported
• In syntax descriptions, the instruction, command, or directive is in a bold typeface and parameters are
in an italic typeface. Portions of a syntax that are in bold should be entered as shown; portions of a
syntax that are in italics describe the type of information that should be entered.
• Square brackets ( [ and ] ) identify an optional parameter. If you use an optional parameter, you specify
the information within the brackets. Unless the square brackets are in the bold typeface, do not enter
the brackets themselves. The following is an example of a command that has an optional parameter:
cl6x [options] [filenames] [--run_linker [link_options] [object files]]
• Braces ( { and } ) indicate that you must choose one of the parameters within the braces; you do not
enter the braces themselves. This is an example of a command with braces that are not included in the
actual syntax but indicate that you must specify either the --rom_model or --ram_model option:
• In assembler syntax statements, column 1 is reserved for the first character of a label or symbol. If the
label or symbol is optional, it is usually not shown. If it is a required parameter, it is shown starting
against the left margin of the box, as in the example below. No instruction, command, directive, or
parameter, other than a symbol or label, can begin in column 1.
symbol .usect "section name", size in bytes[, alignment]
• Some directives can have a varying number of parameters. For example, the .byte directive can have
up to 100 parameters. This syntax is shown as [, ..., parameter].
• The TMS320C6200 core is referred to as C6200. The TMS320C6400 core is referred to as C6400.
The TMS320C6700 core is referred to as C6700. TMS320C6000 and C6000 can refer to any of C6200,
C6400, C6400+, C6700, C6700+, or C6740.
Related Documentation
You can use the following books to supplement this user's guide:
ANSI X3.159-1989, Programming Language - C (Alternate version of the 1989 C Standard), American
National Standards Institute
C: A Reference Manual (fourth edition), by Samuel P. Harbison and Guy L. Steele Jr., published by
Prentice Hall, Englewood Cliffs, New Jersey
DWARF Debugging Information Format Version 3, DWARF Debugging Information Format Workgroup,
Free Standards Group, 2005 (http://dwarfstd.org)
ISO/IEC 14882-1998, International Standard - Programming Languages - C++ (The C++ Standard),
International Organization for Standardization
ISO/IEC 9899:1989, International Standard - Programming Languages - C (The 1989 C Standard),
International Organization for Standardization
ISO/IEC 9899:1999, International Standard - Programming Languages - C (The C Standard),
International Organization for Standardization
Programming Embedded Systems in C and C++, by Michael Barr, Andy Oram (Editor), published by
O'Reilly & Associates; ISBN: 1565923545, February 1999
Programming in C, Steve G. Kochan, Hayden Book Company
The Annotated C++ Reference Manual, Margaret A. Ellis and Bjarne Stroustrup, published by
Addison-Wesley Publishing Company, Reading, Massachusetts, 1990
The C Programming Language (second edition), by Brian W. Kernighan and Dennis M. Ritchie,
published by Prentice-Hall, Englewood Cliffs, New Jersey, 1988
The C++ Programming Language (second edition), Bjarne Stroustrup, published by Addison-Wesley
Publishing Company, Reading, Massachusetts, 1990
Tool Interface Standards (TIS) DWARF Debugging Information Format Specification Version 2.0,
TIS Committee, 1995
The TMS320C6000 is supported by a set of software development tools, which includes an optimizing
C/C++ compiler, an assembly optimizer, an assembler, a linker, and assorted utilities.
This chapter provides an overview of these tools and introduces the features of the optimizing C/C++
compiler. The assembly optimizer is discussed in Chapter 4. The assembler and link step are discussed in
detail in the TMS320C6000 Assembly Language Tools User's Guide.
Figure 1-1. TMS320C6000 Software Development Flow
[Diagram not reproduced. Labeled elements: C/C++ source files, macro source files, C/C++ compiler,
linear assembly, assembly optimizer, assembler source, assembly-optimized file, assembler, archiver,
macro library, object files, library-build process, run-time-support library, library of object files,
linker, debugging tools, executable object file, hex-conversion utility.]
1.2.4 Utilities
The following features pertain to the compiler utilities:
• Library-build process
The library-build process lets you custom-build object libraries from source for any combination of
run-time models. For more information, see Section 8.5.
• C++ name demangler
The C++ name demangler (dem6x) is a debugging aid that translates each mangled name it detects to
its original name found in the C++ source code. For more information, see Chapter 9.
The compiler translates your source program into code that the TMS320C6000 can execute. Source code
must be compiled, assembled, and linked to create an executable object file. All of these steps are
executed at once by using the compiler.
--compile_only Suppresses the linker and overrides the --run_linker option, which
specifies linking. The --compile_only option's short form is -c. Use this
option when you have --run_linker specified in the C6X_C_OPTION
environment variable and you do not want to link. See Section 5.1.3.
--define_name=name[=def] Predefines the constant name for the preprocessor. This is equivalent to
inserting #define name def at the top of each C source file. If the
optional [=def] is omitted, the name is set to 1. The --define_name
option's short form is -D.
If you want to define a quoted string and keep the quotation marks, do
one of the following:
• For Windows®, use --define_name=name="\"string def\"". For
example, --define_name=car="\"sedan\""
• For UNIX®, use --define_name=name='"string def"'. For example,
--define_name=car='"sedan"'
• For Code Composer Studio, enter the definition in a file and include
that file with the --cmd_file option.
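As a hedged illustration of how a quoted-string definition is consumed in source, the sketch below assumes the file is built with one of the --define_name=car=... forms shown above. The #ifndef fallback and the helper names car_model and car_model_is are assumptions added for illustration so that the file also builds when the option is absent.

```c
#include <string.h>

/* Fallback used only when the build does not pass --define_name=car=...
 * (an assumption for this sketch; the option itself needs no fallback). */
#ifndef car
#define car "sedan"
#endif

/* Hypothetical helper: returns the string the preprocessor substituted
 * for the name car. */
const char *car_model(void)
{
    return car;
}

/* Returns 1 when the configured model equals expected, otherwise 0. */
int car_model_is(const char *expected)
{
    return strcmp(car_model(), expected) == 0;
}
```

Because the quotation marks survive the command line, car expands to a string literal and can be used anywhere a const char * is expected.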
--exceptions Enables support of C++ exception handling. The compiler will generate
code to handle try/catch/throw statements in C++ code. See Section 6.5.
--fp_mode={relaxed|strict} Supports relaxed floating-point mode. In this mode, if the result of a
double-precision floating-point expression is assigned to a
single-precision floating-point or an integer, the computations in the
expression are converted to single-precision computations. Any
double-precision constants in the expression are also converted to
single-precision if they can be correctly represented as single-precision
constants. This behavior does not conform to the ISO standard, but it results in
faster code, with some loss in accuracy. In the following example, where
N is a number, iN=integer variable, fN=float variable, dN=double
variable:
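A host-side sketch of the two behaviors, written out by hand rather than produced by the compiler: eval_strict keeps the intermediate arithmetic in double precision, while eval_relaxed performs the same expression in single precision, which is the conversion the relaxed mode is described as making when the result is assigned to a float. The function names and the constant 2.5 (chosen because it is exactly representable in single precision) are assumptions for illustration.

```c
/* Strict-style evaluation: operands are promoted to double, the constant
 * stays double, and only the final result is narrowed to float. */
float eval_strict(float f2, float f3)
{
    return (float)((double)f2 + (double)f3 * 2.5);
}

/* Relaxed-style evaluation: because the result is assigned to a float,
 * the multiply and add are done in single precision and the constant is
 * used as a single-precision constant. */
float eval_relaxed(float f2, float f3)
{
    return f2 + f3 * 2.5f;
}

/* Absolute difference between the two evaluations, as a double. */
double eval_gap(float f2, float f3)
{
    double d = (double)eval_strict(f2, f3) - (double)eval_relaxed(f2, f3);
    return d < 0.0 ? -d : d;
}
```

For inputs whose intermediate results are exactly representable the two forms agree; for others they can differ by a rounding step, which is the loss in accuracy the option description mentions.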
--keep_asm Retains the assembly language output from the compiler or assembly
optimizer. Normally, the compiler deletes the output assembly language
file after assembly is complete. The --keep_asm option's short form is -k.
--preinclude=filename Includes the source code of filename at the beginning of the compilation.
This can be used to establish standard macro definitions. The filename is
searched for in the directories on the include search list. The files are
processed in the order in which they were specified.
--quiet Suppresses banners and progress information from all the tools. Only
source filenames and error messages are output. The --quiet option's
short form is -q.
--run_linker Runs the linker on the specified object files. The --run_linker option and
its parameters follow all other options on the command line. All
arguments that follow --run_linker are passed to the linker. The
--run_linker option's short form is -z. See Section 5.1.
--sat_reassoc={on|off} Enables or disables the reassociation of saturating arithmetic.
--skip_assembler Compiles only. The specified source files are compiled but not
assembled or linked. The --skip_assembler option's short form is -n. This
option overrides --run_linker. The output is assembly language output
from the compiler.
--src_interlist Invokes the interlist feature, which interweaves optimizer comments or
C/C++ source with assembly source. If the optimizer is invoked
(--opt_level=n option), optimizer comments are interlisted with the
assembly language output of the compiler, which may rearrange code
significantly. If the optimizer is not invoked, C/C++ source statements are
interlisted with the assembly language output of the compiler, which
allows you to inspect the code generated for each C/C++ statement. The
--src_interlist option implies the --keep_asm option. The --src_interlist
option's short form is -s.
--tool_version Prints the version number for each tool in the compiler. No compiling
occurs.
--undefine_name=name Undefines the predefined constant name. This option overrides any
--define_name options for the specified constant. The --undefine_name
option's short form is -U.
--verbose Displays progress information and toolset version while compiling.
Resets the --quiet option.
--symdebug:profile_coff Adds the necessary debug directives to the object file which are
needed by the profiler to allow function level profiling with minimal
impact on optimization (when used). Using --symdebug:coff may
hinder some optimizations to ensure that debug ability is
maintained, while this option will not hinder optimization.
You can set breakpoints and profile on function-level boundaries in
Code Composer Studio, but you cannot single-step through code
as with full debug ability.
--symdebug:skeletal Generates as much symbolic debugging information as possible
without hindering optimization. Generally, this consists of
global-scope information only. This option reflects the default
behavior of the compiler.
See Section 2.3.11 for a list of deprecated symbolic debugging options.
For information about how you can alter the way that the compiler interprets individual filenames, see
Section 2.3.6. For information about how you can alter the way that the compiler interprets and names the
extensions of assembly source and object files, see Section 2.3.9.
You can use wildcard characters to compile or assemble multiple files. Wildcard specifications vary by
system; use the appropriate form listed in your operating system manual. For example, to compile all of
the files in a directory with the extension .cpp, enter the following:
cl6x *.cpp
For example, if you have a C source file called file.s and an assembly language source file called assy,
use the --asm_file and --c_file options to force the correct interpretation:
cl6x --c_file=file.s --asm_file=assy
The following example assembles the file fit.rrr and creates an object file named fit.o:
cl6x --asm_extension=.rrr --obj_extension=.o fit.rrr
The period (.) in the extension is optional. You can also write the example above as:
cl6x --asm_extension=rrr --obj_extension=o fit.rrr
--list_directory=directory Specifies the destination directory for assembly listing files and
cross-reference listing files. The default is to use the same directory as
the object file directory. For example:
cl6x --list_directory=d:\listing
--copy_file=filename Copies the specified file for the assembly module; acts like a .copy
directive. The file is inserted before source file statements. The copied file
appears in the assembly listing files.
--cross_reference Produces a symbolic cross-reference in the listing file.
--include_file=filename Includes the specified file for the assembly module; acts like a .include
directive. The file is included before source file statements. The included
file does not appear in the assembly listing files.
--machine_regs Displays reg operands as machine registers in the assembly file for
debugging purposes.
--no_compress Prevents compression in the assembler. For C6400+ and C6740,
compression is the changing of 32-bit instructions to 16-bit instructions,
where possible/profitable.
--no_reload_errors Turns off all reload-related loop buffer error messages in assembly code
for C6400+ and C6740.
--output_all_syms Puts labels in the symbol table. Label definitions are written to the COFF
symbol table for use with symbolic debugging.
--syms_ignore_case Makes letter case insignificant in the assembly language source files. For
example, --syms_ignore_case makes the symbols ABC and abc
equivalent. If you do not use this option, case is significant (this is the
default).
Additionally, the --symdebug:profile_coff option has been added to enable function-level profiling of
optimized code with symbolic debugging using the STABS debugging format (the --symdebug:coff or -gt
option).
Since C6400+ and C6740 produce only DWARF debug information, the -gp, -gt/--symdebug:coff, and
--symdebug:profile_coff options are not supported for C6400+ and C6740.
Environment variable options are specified in the same way and have the same meaning as they do on
the command line. For example, if you want to always run quietly (the --quiet option), enable C/C++
source interlisting (the --src_interlist option), and link (the --run_linker option) for Windows, set up the
C6X_C_OPTION environment variable as follows:
set C6X_C_OPTION=--quiet --src_interlist --run_linker
In the following examples, each time you run the compiler, it runs the linker. Any options following
--run_linker on the command line or in C6X_C_OPTION are passed to the linker. Thus, you can use the
C6X_C_OPTION environment variable to specify default compiler and linker options and then specify
additional compiler and linker options on the command line. If you have set --run_linker in the environment
variable and want to compile only, use the compiler --compile_only option. These additional examples
assume C6X_C_OPTION is set as shown above:
cl6x *.c                                     ; compiles and links
cl6x --compile_only *.c                      ; only compiles
cl6x *.c --run_linker lnk.cmd                ; compiles and links using a command file
cl6x --compile_only *.c --run_linker lnk.cmd ; only compiles (--compile_only overrides --run_linker)
For details on compiler options, see Section 2.3. For details on linker options, see Section 5.2.
The pathnames are directories that contain input files. The pathnames must follow these constraints:
• Pathnames must be separated with a semicolon.
• Spaces or tabs at the beginning or end of a path are ignored. For example, the space before and after
the semicolon in the following is ignored:
set C6X_C_DIR=c:\path\one\to\tools ; c:\path\two\to\tools
• Spaces and tabs are allowed within paths to accommodate Windows directories that contain spaces.
For example, the pathnames in the following are valid:
set C6X_C_DIR=c:\first path\to\tools;d:\second path\to\tools
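The constraints above can be sketched as a small parser. This is an illustration of the stated rules only, not the tools' actual implementation; the function name split_search_paths, the size limits, and the choice to skip empty or over-long entries are assumptions.

```c
#include <string.h>

#define MAX_PATHS 8        /* illustrative limit on the number of entries */
#define MAX_PATH_LEN 128   /* illustrative limit on one entry, with null  */

/* Splits value at semicolons, drops spaces and tabs at the start and end
 * of each entry, and keeps spaces inside an entry. Empty entries and
 * entries that do not fit in MAX_PATH_LEN are skipped (an assumption).
 * Returns the number of entries stored in out. */
int split_search_paths(const char *value, char out[MAX_PATHS][MAX_PATH_LEN])
{
    int count = 0;
    const char *p = value;

    while (*p != '\0' && count < MAX_PATHS) {
        const char *end = strchr(p, ';');
        const char *next;
        size_t len;

        if (end == NULL) {
            end = p + strlen(p);   /* last entry: runs to end of string */
            next = end;
        } else {
            next = end + 1;        /* continue after the semicolon */
        }

        /* Ignore spaces and tabs at the beginning of the entry. */
        while (p < end && (*p == ' ' || *p == '\t'))
            p++;
        /* Ignore spaces and tabs at the end of the entry. */
        while (end > p && (end[-1] == ' ' || end[-1] == '\t'))
            end--;

        len = (size_t)(end - p);
        if (len > 0 && len < MAX_PATH_LEN) {
            memcpy(out[count], p, len);
            out[count][len] = '\0';
            count++;
        }
        p = next;
    }
    return count;
}
```

Applied to the two C6X_C_DIR values shown above, the sketch yields two entries each, with the spaces around the semicolon removed in the first case and the embedded spaces preserved in the second.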
The environment variable remains set until you reboot the system or reset the variable by entering:
Carefully organizing the include directives across multiple files so that their header files maximize common
usage can increase the compile time savings when using precompiled headers.
A precompiled header file is produced only if the header stop point and the code prior to it meet certain
requirements.
(1) Specified by the ISO standard
You can use the names listed in Table 2-14 in the same manner as any other defined name. For example,
printf ( "%s %s" , __TIME__ , __DATE__);
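A slightly fuller sketch of the same idea, assuming only the ISO-specified forms of the two macros ("hh:mm:ss" for __TIME__ and "Mmm dd yyyy" for __DATE__). The helper name build_stamp and the 21-byte buffer requirement are assumptions for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Writes "<time> <date>" into buf using the predefined macros.
 * buf must hold at least 21 bytes: 8 for __TIME__, 1 space,
 * 11 for __DATE__, and the terminating null.
 * Returns the number of characters written, excluding the null. */
int build_stamp(char *buf)
{
    return sprintf(buf, "%s %s", __TIME__, __DATE__);
}
```

The values are fixed when the file is compiled, so the stamp identifies the build rather than the moment the program runs.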
The table below shows how to invoke the compiler. Select the command for your operating system:
Operating System    Enter
UNIX                cl6x --include_path=tools/files source.c
Windows             cl6x --include_path=c:\tools\files source.c
2.6.8 Generating a List of Files Included With the #include Directive (--preproc_includes Option)
The --preproc_includes option performs preprocessing only, but instead of writing preprocessed output,
writes a list of files included with the #include directive. If you do not supply an optional filename, the list is
written to a file with the same name as the source file but with a .pp extension.
By default, the source line is omitted. Use the --verbose_diagnostics compiler option to enable the display
of the source line and the error position. The above example makes use of this option.
The message identifies the file and line involved in the diagnostic, and the source line itself (with the
position indicated by the ^ character) follows the message. If several diagnostics apply to one source line,
each diagnostic has the form shown; the text of the source line is displayed several times, with an
appropriate position indicated each time.
Long messages are wrapped to additional lines, when necessary.
You can use the --display_error_number command-line option to request that the diagnostic's numeric
identifier be included in the diagnostic message. When displayed, the diagnostic identifier also indicates
whether the diagnostic can have its severity overridden on the command line. If the severity can be
overridden, the diagnostic identifier includes the suffix -D (for discretionary); otherwise, no suffix is
present. For example:
"Test_name.c", line 7: error #64-D: declaration does not declare anything
struct {};
^
"Test_name.c", line 9: error #77: this declaration has no storage class or type specifier
xxxxx;
^
Because an error is determined to be discretionary based on the error severity associated with a specific
context, an error can be discretionary in some cases and not in others. All warnings and remarks are
discretionary.
For some messages, a list of entities (functions, local variables, source files, etc.) is useful; the entities are
listed following the initial error message:
"test.c", line 4: error: more than one instance of overloaded function "f"
matches the argument list:
function "f(int)"
function "f(float)"
argument types are: (double)
f(1.5);
^
In some cases, additional context information is provided. Specifically, the context information is useful
when the front end issues a diagnostic while doing a template instantiation or while generating a
constructor, destructor, or assignment operator function. For example:
"test.c", line 7: error: "A::A()" is inaccessible
B x;
^
detected during implicit generation of "B::B()" at line 7
Without the context information, it is difficult to determine to what the error refers.
If you invoke the compiler with the --quiet option, this is the result:
"err.c", line 9: warning: statement is unreachable
"err.c", line 12: warning: statement is unreachable
Because it is standard programming practice to include break statements at the end of each case arm to
avoid the fall-through condition, these warnings can be ignored. Using the --display_error_number option,
you can find out the diagnostic identifier for these warnings. Here is the result:
[err.c]
"err.c", line 9: warning #111-D: statement is unreachable
"err.c", line 12: warning #111-D: statement is unreachable
Next, you can use the diagnostic identifier of 111 as the argument to the --diag_remark option to treat this
warning as a remark. This compilation now produces no diagnostic messages (because remarks are
disabled by default).
Although this type of control is useful, it can also be extremely dangerous. The compiler often emits
messages that indicate a less than obvious problem. Be careful to analyze all diagnostics emitted before
using the suppression options.
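A sketch of source that can draw this kind of warning: each case arm returns and is then followed by a conventional break that control never reaches. The function priority_of and its values are hypothetical and are not the err.c of the example; whether a warning is actually issued depends on the compiler and the options in effect.

```c
/* Maps a small command code to a priority. Every arm returns, so each
 * trailing break is unreachable; this is the pattern a compiler may
 * report as "statement is unreachable". */
int priority_of(int command)
{
    switch (command) {
    case 0:
        return 10;
        break;      /* unreachable: follows a return */
    case 1:
        return 20;
        break;      /* unreachable: follows a return */
    default:
        return 0;
    }
}
```

Here the warnings are harmless, which is the situation in which demoting them to remarks is reasonable; the same diagnostic on other code may point at a genuine logic error.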
The --gen_acp_raw option also includes diagnostic identifiers as defined in Table 2-16.
S One of the identifiers in Table 2-16 that indicates the severity of the diagnostic
filename The source file
line number The line number in the source file
column number The column number in the source file
diagnostic The message text for the diagnostic
Diagnostics after the end of file are indicated as the last line of the file with a column number of 0. When
diagnostic message text requires more than one line, each subsequent line contains the same file, line,
and column information but uses a lowercase version of the diagnostic identifier. For more information
about diagnostic messages, see Section 2.7.
/*****************************************************************************/
/* string.h vx.xx (Excerpted) */
/* Copyright (c) 1993-1999 Texas Instruments Incorporated */
/*****************************************************************************/
#ifdef _INLINE
#define _IDECL static inline
#else
#define _IDECL extern _CODE_ACCESS
#endif
#ifdef _INLINE
/****************************************************************************/
/* strlen */
/****************************************************************************/
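static inline size_t strlen(const char *string)
{
    size_t n = (size_t)-1;
    const char *s = string - 1;

    /* Loop and return are not shown in the excerpt; assumed from the initial values. */
    do n++; while (*++s);
    return n;
}
#endif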
/****************************************************************************/
/* strlen */
/****************************************************************************/
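#undef _INLINE
#include <string.h>

_CODE_ACCESS size_t strlen(const char *string)
{
    size_t n = (size_t)-1;
    const char *s = string - 1;

    /* Loop and return are not shown in the excerpt; assumed from the initial values. */
    do n++; while (*++s);
    return n;
}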
RTS Library Files Are Not Built With the --interrupt_threshold Option
Note: The run-time-support library files provided with the compiler are not built with the interrupt
flexibility option. Please refer to the readme file to see how the run-time-support library files
were built for your release. See Section 8.5 to build your own run-time-support library files
with the interrupt flexibility option.
The --c_src_interlist option prevents the compiler from deleting the interlisted assembly language output
file. The output assembly file, function.asm, is assembled normally.
When you invoke the interlist feature without the optimizer, the interlist runs as a separate pass between
the code generator and the assembler. It reads both the assembly and C/C++ source files, merges them,
and writes the C/C++ statements into the assembly file as comments.
Using the --c_src_interlist option can cause performance and/or code size degradation.
Example 2-4 shows a typical interlisted assembly file.
For more information about using the interlist feature with the optimizer, see Section 3.13.
_main:
--entry_hook[=name] Enables entry hooks. If specified, the hook function is called name. Otherwise,
the default entry hook function name is __entry_hook.
--entry_param{=name|address|none} Specify the parameters to the hook function. The name parameter specifies
that the name of the calling function is passed to the hook function as an
argument. In this case the signature for the hook function is:
void hook(const char *name);
The address parameter specifies that the address of the calling function is
passed to the hook function. In this case the signature for the hook function is:
void hook(void (*addr)());
The none parameter specifies that the hook is called with no parameters. This
is the default. In this case the signature for the hook function is:
void hook(void);
--exit_hook[=name] Enables exit hooks. If specified, the hook function is called name. Otherwise,
the default exit hook function name is __exit_hook.
--exit_param{=name|address|none} Specify the parameters to the hook function. The name parameter specifies
that the name of the calling function is passed to the hook function as an
argument. In this case the signature for the hook function is:
void hook(const char *name);
The address parameter specifies that the address of the calling function is
passed to the hook function. In this case the signature for the hook function is:
void hook(void (*addr)());
The none parameter specifies that the hook is called with no parameters. This
is the default. In this case the signature for the hook function is:
void hook(void);
The presence of the hook options creates an implicit declaration of the hook function with the given
signature. If a declaration or definition of the hook function appears in the compilation unit compiled with
the options, it must agree with the signatures listed above.
In C++, the hooks are declared extern "C". Thus you can define them in C (or assembly) without being
concerned with name mangling.
Hooks can be declared inline, in which case the compiler tries to inline them using the same criteria as
other inline functions.
Entry hooks and exit hooks are independent. You can enable one but not the other, or both. The same
function can be used as both the entry and exit hook.
You must take care to avoid recursive calls to hook functions. The hook function should not call any
function which itself has hook calls inserted. To help prevent this, hooks are not generated for inline
functions, or for the hook functions themselves.
See Section 6.8.17 for information about the NO_HOOKS pragma.
The --remove_hooks_when_inlining option removes entry/exit hooks for functions that are auto-inlined by
the optimizer.
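A minimal sketch of a hook pair using the name form of the signature listed above. The function names trace_entry and trace_exit are hypothetical, chosen as they might be passed with --entry_hook=trace_entry --entry_param=name and --exit_hook=trace_exit --exit_param=name. So that the sketch can be tried on a compiler without hook support, the calls are written by hand at the points where the options would be expected to place them; with the options enabled those explicit calls would be omitted.

```c
/* Bookkeeping kept deliberately simple: the hooks call no other
 * functions, which avoids the recursion hazard described above. */
int hook_depth = 0;               /* current nesting of traced calls */
int hook_max_depth = 0;           /* deepest nesting seen so far     */
unsigned long hook_entries = 0;   /* total number of traced entries  */

/* Entry hook, name form: receives the name of the calling function. */
void trace_entry(const char *name)
{
    (void)name;
    hook_entries++;
    if (++hook_depth > hook_max_depth)
        hook_max_depth = hook_depth;
}

/* Exit hook, name form. */
void trace_exit(const char *name)
{
    (void)name;
    hook_depth--;
}

/* Hand-instrumented stand-ins for functions compiled with the options. */
int square(int x)
{
    int result;
    trace_entry("square");
    result = x * x;
    trace_exit("square");
    return result;
}

int sum_of_squares(int a, int b)
{
    int result;
    trace_entry("sum_of_squares");
    result = square(a) + square(b);
    trace_exit("sum_of_squares");
    return result;
}
```

After one call to sum_of_squares the counters record three entries and a maximum depth of two, which is the kind of call-profile data hooks are typically used to collect.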
The compiler tools can perform many optimizations to improve the execution speed and reduce the size of
C and C++ programs by simplifying loops, software pipelining, rearranging statements and expressions,
and allocating variables into registers.
This chapter describes how to invoke different levels of optimization and describes which optimizations are
performed at each level. This chapter also describes how you can use the Interlist feature when
performing optimization and how you can profile or debug optimized code.
The easiest way to invoke optimization is to use the compiler program, specifying the --opt_level=n option
on the compiler command line. You can use -On to alias the --opt_level option. The n denotes the level of
optimization (0, 1, 2, and 3), which controls the type and degree of optimization.
• --opt_level=0 or -O0
– Performs control-flow-graph simplification
– Allocates variables to registers
– Performs loop rotation
– Eliminates unused code
– Simplifies expressions and statements
– Expands calls to functions declared inline
• --opt_level=1 or -O1
Performs all --opt_level=0 (-O0) optimizations, plus:
– Performs local copy/constant propagation
– Removes unused assignments
– Eliminates local common expressions
• --opt_level=2 or -O2
Performs all --opt_level=1 (-O1) optimizations, plus:
– Performs software pipelining (see Section 3.2)
– Performs loop optimizations
– Eliminates global common subexpressions
– Eliminates global unused assignments
– Converts array references in loops to incremented pointer form
– Performs loop unrolling
The optimizer uses --opt_level=2 (or -O2) as the default if you use --opt_level (-O) without an
optimization level.
• --opt_level=3 or -O3
Performs all --opt_level=2 (or -O2) optimizations, plus:
– Removes all functions that are never called
– Simplifies functions with return values that are never used
– Inlines calls to small functions
– Reorders function declarations so that the attributes of called functions are known when the caller
is optimized
– Propagates arguments into function bodies when all calls pass the same value in the same
argument position
– Identifies file-level variable characteristics
If you use --opt_level=3 (or -O3), see Section 3.6 and Section 3.7 for more information.
The levels of optimizations described above are performed by the stand-alone optimization pass. The
code generator performs several additional optimizations, particularly processor-specific optimizations. It
does so regardless of whether you invoke the optimizer. These optimizations are always enabled,
although they are more effective when the optimizer is used.
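As an illustration of one transformation listed under --opt_level=2, the sketch below shows the same loop written with array references and in incremented pointer form. The function names are hypothetical, and the second function is only a hand-written approximation of the kind of rewrite the list describes, not actual compiler output.

```c
#include <stddef.h>

/* Array-reference form, as the loop is typically written. */
long sum_indexed(const short *a, size_t n)
{
    long sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Incremented-pointer form: the index arithmetic is replaced by a
   pointer that advances through the array.                         */
long sum_pointer(const short *a, size_t n)
{
    long sum = 0;
    const short *p = a;
    const short *end = a + n;

    while (p < end)
        sum += *p++;
    return sum;
}
```

Both functions return the same result for the same input; only the addressing style differs.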
Figure 3-2. Software-Pipelined Loop

A1
B1 A2                     Pipelined-loop prolog
C1 B2 A3
D1 C2 B3 A4
E1 D2 C3 B4 A5            Kernel
   E2 D3 C4 B5
      E3 D4 C5            Pipelined-loop epilog
         E4 D5
            E5
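The overlap pattern of the software-pipelined loop figure can be reproduced with a short host-side sketch. It assumes a loop with stages lettered A, B, C, ... and iterations numbered from 1; the function names pipeline_row and print_pipeline are hypothetical. Each row lists the stage and iteration pairs that execute in parallel during one step.

```c
#include <stdio.h>
#include <stddef.h>

/* Writes the stage/iteration labels that are active in the given step
   (0-based) into out, oldest iteration first.                          */
void pipeline_row(int step, int stages, int iterations, char *out, size_t outsz)
{
    size_t len = 0;
    int k;

    if (outsz == 0)
        return;
    out[0] = '\0';
    for (k = 1; k <= iterations; k++) {
        int stage = step - (k - 1);   /* stage occupied by iteration k */
        int n;

        if (stage < 0 || stage >= stages)
            continue;                 /* not started yet, or already done */
        n = snprintf(out + len, outsz - len, "%s%c%d",
                     len ? " " : "", 'A' + stage, k);
        if (n < 0 || (size_t)n >= outsz - len)
            break;                    /* stop when the buffer is full */
        len += (size_t)n;
    }
}

/* Prints every step: prolog rows, the kernel row(s), then epilog rows. */
void print_pipeline(int stages, int iterations)
{
    char row[128];
    int step;

    for (step = 0; step < stages + iterations - 1; step++) {
        pipeline_row(step, stages, iterations, row, sizeof row);
        puts(row);
    }
}
```

Calling print_pipeline(5, 5) prints nine rows; the fifth row, where all five stages are busy, corresponds to the kernel.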
If you enter comments on instructions in your linear assembly input file, the compiler moves the comments
to the output file along with additional information. It attaches a 2-tuple <x, y> to the comments to specify
the iteration and cycle of the loop an instruction is on in the software pipeline. The zero-based number x
represents the iteration the instruction is on during the first execution of the loop kernel. The zero-based
number y represents the cycle that the instruction is scheduled on within a single iteration of the loop.
For more information about software pipelining, see the TMS320C6000 Programmer's Guide.
The terms defined below appear in the software pipelining information. For more information on each
term, see the TMS320C6000 Programmer's Guide.
• Loop unroll factor. The number of times the loop was unrolled specifically to increase performance
based on the resource bound constraint in a software pipelined loop.
• Known minimum trip count. The minimum number of times the loop will be executed.
• Known maximum trip count. The maximum number of times the loop will be executed.
• Known max trip count factor. Factor that would always evenly divide the loop's trip count. This
information can be used to possibly unroll the loop.
• Loop label. The label you specified for the loop in the linear assembly input file. This field is not
present for C/C++ code.
• Loop carried dependency bound. The distance of the largest loop carry path. A loop carry path
occurs when one iteration of a loop writes a value that must be read in a future iteration. Instructions
that are part of the loop carry bound are marked with the ^ symbol.
• Initiation interval (ii). The number of cycles between the initiation of successive iterations of the loop.
The smaller the initiation interval, the fewer cycles it takes to execute a loop.
• Resource bound. The most used resource constrains the minimum initiation interval. If four
instructions require a .D unit, they require at least two cycles to execute (4 instructions/2 parallel .D
units).
• Unpartitioned resource bound. The best possible resource bound values before the instructions in
the loop are partitioned to a particular side.
• Partitioned resource bound (*). The resource bound values after the instructions are partitioned.
• Resource partition. This table summarizes how the instructions have been partitioned. This
information can be used to help assign functional units when writing linear assembly. Each table entry
has values for the A-side and B-side registers. An asterisk is used to mark those entries that determine
the resource bound value. The table entries represent the following terms:
– .L units is the total number of instructions that require .L units.
– .S units is the total number of instructions that require .S units.
– .D units is the total number of instructions that require .D units.
– .M units is the total number of instructions that require .M units.
– .X cross paths is the total number of .X cross paths.
– .T address paths is the total number of address paths.
– Long read path is the total number of long read port paths.
– Long write path is the total number of long write port paths.
– Logical ops (.LS) is the total number of instructions that can use either the .L or .S unit.
– Addition ops (.LSD) is the total number of instructions that can use the .L, .S, or .D unit.
• Bound(.L .S .LS). The resource bound value as determined by the number of instructions that use the
.L and .S units. It is calculated with the following formula:
Bound(.L .S .LS ) = ceil((.L + .S + .LS) / 2)
• Bound(.L .S .D .LS .LSD). The resource bound value as determined by the number of instructions that
use the .D, .L, and .S units. It is calculated with the following formula:
Bound(.L .S .D .LS .LSD) = ceil((.L + .S + .D + .LS + .LSD) / 3)
• Minimum required memory pad. The number of bytes that are read if speculative execution is
enabled. See Section 3.2.3 for more information.
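The two Bound formulas and the .D unit example above reduce to integer ceiling divisions. The sketch below restates them in C; the function names are hypothetical, and the arguments are the instruction counts taken from the resource partition table.

```c
/* Integer ceiling of a / b, for b > 0. */
static unsigned ceil_div(unsigned a, unsigned b)
{
    return (a + b - 1) / b;
}

/* Minimum cycles when `instructions` compete for `units` parallel units,
   for example 4 instructions on 2 .D units -> 2 cycles.                  */
unsigned unit_bound(unsigned instructions, unsigned units)
{
    return ceil_div(instructions, units);
}

/* Bound(.L .S .LS) = ceil((.L + .S + .LS) / 2) */
unsigned bound_l_s_ls(unsigned l, unsigned s, unsigned ls)
{
    return ceil_div(l + s + ls, 2);
}

/* Bound(.L .S .D .LS .LSD) = ceil((.L + .S + .D + .LS + .LSD) / 3) */
unsigned bound_l_s_d_ls_lsd(unsigned l, unsigned s, unsigned d,
                            unsigned ls, unsigned lsd)
{
    return ceil_div(l + s + d + ls + lsd, 3);
}
```

The largest of the individual bounds limits how small the initiation interval can be.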
This example shows that on cycle 0 (first execute packet) of the loop kernel, registers A0, A1, A2, A6, A7,
A8, A9, B0, B1, B2, B4, B5, B6, B7, B8, and B9 are all live during this cycle.
3.2.3 Collapsing Prologs and Epilogs for Improved Performance and Code Size
When a loop is software pipelined, a prolog and epilog are generally required. The prolog is used to pipe
up the loop and the epilog is used to pipe down the loop.
In general, a loop must execute a minimum number of iterations before the software-pipelined version can
be safely executed. If the minimum known trip count is too small, either a redundant loop is added or
software pipelining is disabled. Collapsing the prolog and epilog of a loop can reduce the minimum trip
count necessary to safely execute the pipelined loop.
Collapsing can also substantially reduce code size. Some of this code size growth is due to the redundant
loop. The remainder is due to the prolog and epilog.
The prolog and epilog of a software-pipelined loop consist of up to p-1 stages of length ii, where p is the
number of iterations that are executed in parallel during the steady state and ii is the cycle time for the
pipelined loop body. During prolog and epilog collapsing the compiler tries to collapse as many stages as
possible. However, over-collapsing can have a negative performance impact. Thus, by default, the
compiler attempts to collapse as many stages as possible without sacrificing performance. When the
--opt_for_space=0 or --opt_for_space=1 options are invoked, the compiler increasingly favors code size
over performance.
If the minimum safe trip count is greater than the minimum known trip count, use of --speculate_loads is
highly recommended, not only for code size, but for performance.
When using --speculate_loads, you must ensure that potentially speculated loads will not cause illegal
reads. This can be done by padding the data sections and/or stack, as needed, by the required memory
pad in both directions. The required memory pad for a given software-pipelined loop is also provided in the
comment block for that loop.
;* Minimum required memory pad : 5 bytes
For safety, the example loop requires that array data referenced within this loop be preceded and followed
by a pad of at least 5 bytes. This pad can consist of other program data. The pad will not be modified. In
many cases, the threshold value (namely, the minimum value of the argument to --speculate_loads that is
needed to achieve a particular schedule and level of collapsing) is the same as the pad. However, when it
is not, the comment block will also include the minimum threshold value. In the case of this loop, the
threshold value must be at least 7 to achieve this level of collapsing.
However, you need to consider whether a larger threshold value would facilitate additional collapsing. This
information is also provided, if applicable. For example, in the above comment block, a threshold value of
14 might facilitate further collapsing.
When the C6000 tools cannot determine the trip count for a loop, then by default two loops and control
logic are generated. The first loop is not pipelined, and it executes if the run-time trip count is less than the
loop's minimum trip count. The second loop is the software pipelined loop, and it executes when the
run-time trip count is greater than or equal to the minimum trip count. At any given time, one of the loops
is a redundant loop. For example:
foo(N) /* N is the trip count */
{
    for (I=0; I < N; I++) /* I is the trip counter */
}
After finding a software pipeline for the loop, the compiler transforms foo() as below, assuming the
minimum trip count for the loop is 3. Two versions of the loop would be generated and the following
comparison would be used to determine which version should be executed:
foo(N)
{
    if (N < 3)
    {
        for (I=0; I < N; I++) /* Unpipelined version */
    }
    else
    {
        for (I=0; I < N; I++) /* Pipelined version */
    }
}
foo(50); /* Execute software pipelined loop */
foo(2); /* Execute loop (unpipelined)*/
You may be able to help the compiler avoid producing redundant loops with the use of
--program_level_compile --opt_level=3 (see Section 3.7) or the use of the MUST_ITERATE pragma (see
Section 6.8.15).
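The run-time selection between the two loop versions, and the way a guaranteed minimum trip count makes it unnecessary, can be sketched as follows. MIN_TRIP, select_loop, and sum_at_least_3 are hypothetical names; the value 3 is the minimum trip count assumed in the example above. The guard macro __TI_COMPILER_VERSION__ is assumed to be defined by the TI compiler; it is used here only so that other compilers never see the pragma.

```c
#define MIN_TRIP 3   /* minimum trip count assumed for the pipelined loop */

enum loop_version { LOOP_UNPIPELINED, LOOP_PIPELINED };

/* The comparison that chooses between the two generated loops. */
enum loop_version select_loop(int n)
{
    return (n < MIN_TRIP) ? LOOP_UNPIPELINED : LOOP_PIPELINED;
}

/* When the caller guarantees n >= 3, the pragma states that fact so the
   comparison and the unpipelined copy are not needed.                   */
int sum_at_least_3(const int *a, int n)
{
    int i;
    int sum = 0;

#ifdef __TI_COMPILER_VERSION__
#pragma MUST_ITERATE(3)
#endif
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```

With these definitions, select_loop(50) picks the pipelined version and select_loop(2) picks the unpipelined one, matching the two calls to foo() above.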
3.4 Utilizing the Loop Buffer Using SPLOOP on C6400+ and C6740
The C6400+ and C6740 ISAs have a loop buffer that improves performance and reduces code size for
software pipelined loops. The loop buffer provides the following benefits:
• Code size. A single iteration of the loop is stored in program memory.
• Interrupt latency. Loops executing out of the loop buffer are interruptible.
• Improves performance for loops with unknown trip counts and eliminates redundant loops.
• Reduces or eliminates the need for speculated loads.
• Reduces power usage.
You can tell that the compiler is using the loop buffer when you find SPLOOP(D/W) at the beginning of a
software pipelined loop followed by an SPKERNEL at the end. Refer to the TMS320C6400/C6400+ CPU
and Instruction Set Reference Guide for information on SPLOOP.
When the --opt_for_space option is not used, the compiler will not use the loop buffer if it can find a faster
software pipelined loop without it. When using the --opt_for_space option, the compiler will use the loop
buffer when it can.
The compiler does not generate code for the loop buffer (SPLOOP/D/W) when any of the following occur:
• ii (initiation interval) > 14 cycles
• Dynamic length (of a single iteration) > 48 cycles
• The optimizer completely unrolls the loop
• Code contains elements that disqualify normal software pipelining (call in loop, complex control code in
loop, etc.). See the TMS320C6000 Programmer's Guide for more information.
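The disqualifying conditions listed above can be collected into a single check. This is only a sketch of the documented limits, not the compiler's internal logic; the structure loop_info and the function loop_buffer_candidate are hypothetical.

```c
/* Hypothetical summary of one software-pipelined loop. */
struct loop_info {
    int ii;                       /* initiation interval, in cycles        */
    int dynamic_length;           /* cycles in a single iteration          */
    int fully_unrolled;           /* nonzero if the loop was fully unrolled */
    int pipelining_disqualified;  /* nonzero for calls, complex control    */
};

/* Returns nonzero when none of the listed conditions rules out the
   loop buffer.                                                        */
int loop_buffer_candidate(const struct loop_info *li)
{
    if (li->ii > 14)
        return 0;
    if (li->dynamic_length > 48)
        return 0;
    if (li->fully_unrolled)
        return 0;
    if (li->pipelining_disqualified)
        return 0;
    return 1;
}
```

Passing this check does not mean the loop buffer is used: without --opt_for_space, the compiler still prefers a faster software pipelined loop that does not use the buffer when it can find one.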
In certain circumstances, the compiler reverts to a different --call_assumptions level from the one you
specified, or it might disable program-level optimization altogether. Table 3-5 lists the combinations of
--call_assumptions levels and conditions that cause the compiler to revert to other --call_assumptions
levels.
In some situations when you use --program_level_compile and --opt_level=3, you must use a
--call_assumptions option or the FUNC_EXT_CALLED pragma. See Section 3.7.2 for information about
these situations.
The run-time-support function and pdd6x append to their respective output files and do not overwrite
them. This enables collection of data sets from multiple runs of the application.
You can specify two environment variables to control the destination of the code-coverage information file.
• The TI_COVDIR environment variable specifies the directory where the code-coverage file should be
generated. The default is the directory where the compiler is invoked.
• The TI_COVDATA environment variable specifies the name of the code-coverage data file generated
by the compiler. The default is filename.csv where filename is the base-name of the file being compiled.
For example, if foo.c is being compiled, the default code-coverage data file name is foo.csv.
If the code-coverage data file already exists, the compiler appends the new dataset at the end of the file.
The full filename, function name, and comments appear within quotation marks ("). For example:
"/some_dir/zlib/c64p/deflate.c","_deflateInit2_",216,5,1,"( strm->zalloc )"
Other tools, such as a spreadsheet program, can be used to format and view the code coverage data.
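A line in this format can also be read with a few lines of C. The sketch below parses the layout of the sample record shown above: two quoted strings, three numbers, and a quoted comment. This section does not state what the three numeric fields mean, so the sketch stores them as generic values. The names cov_record and parse_cov_line are hypothetical.

```c
#include <stdio.h>

struct cov_record {
    char file[256];     /* quoted full filename               */
    char func[64];      /* quoted function name               */
    long n[3];          /* the three unquoted numeric fields  */
    char comment[256];  /* quoted trailing comment            */
};

/* Returns 1 if the line matches the sample layout, 0 otherwise.
   Limitation: an empty quoted field ("") is not accepted.        */
int parse_cov_line(const char *line, struct cov_record *r)
{
    int got = sscanf(line,
                     "\"%255[^\"]\",\"%63[^\"]\",%ld,%ld,%ld,\"%255[^\"]\"",
                     r->file, r->func,
                     &r->n[0], &r->n[1], &r->n[2],
                     r->comment);
    return got == 6;
}
```

Because the data file is appended to on every run, a reader would typically call such a function once per line and accumulate the records.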
API
  TI_start_pprof_collection()   Clears the profile counters to file
  TI_stop_pprof_collection()    Writes out all profile counters to file
  PPHDNL                        Device driver handle for low-level C I/O based driver for
                                writing out profile data from a target program
Files Created
  *.pdat                        Profile data file, which is created by executing an
                                instrumented program and used as input to the profile
                                data decoder
  *.prf                         Profiling feedback file, which is created by the profile
                                data decoder and used as input to the re-compilation step
3.9.1 Use the --aliased_variables Option When Certain Aliases are Used
The compiler, when invoked with optimization, assumes that any variable whose address is passed as an
argument to a function is not subsequently modified by an alias set up in the called function. Examples
include:
• Returning the address from a function
• Assigning the address to a global variable
If you use aliases like this in your code, you must use the --aliased_variables option when you are
optimizing your code. For example, if your code is similar to this, use the --aliased_variables option:
int *glob_ptr;
g()
{
int x = 1;
int *p = f(&x);
*p = 5; /* p aliases x */
*glob_ptr = 10; /* glob_ptr aliases x */
h(x);
}
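The listing above calls f() without showing it. The following self-contained sketch supplies one possible definition, written as an assumption to match the two cases listed earlier (returning the address and assigning it to a global variable). It returns x instead of calling h() so that the effect of the aliases is visible.

```c
int *glob_ptr;

/* Assumed definition of f(): it sets up both kinds of alias. */
int *f(int *arg)
{
    glob_ptr = arg;   /* assigns the address to a global variable */
    return arg;       /* returns the address from the function    */
}

int g(void)
{
    int x = 1;
    int *p = f(&x);

    *p = 5;           /* p aliases x        */
    *glob_ptr = 10;   /* glob_ptr aliases x */
    return x;         /* x is 10 when both stores are honored */
}
```

If the optimizer assumed that x could not be modified through p or glob_ptr, it might treat x as still holding 1, which is why code like this needs the --aliased_variables option.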
p[2] = 5;
}
• The --no_bad_aliases option indicates that indirect references on two pointers, P and Q, are not
aliases if P and Q are distinct parameters of the same function activated by the same call at run time.
If you have code similar to the following example, do not use the --no_bad_aliases option:
g(int j)
{
int a[20];
int g()
{
return f(5, -4); /* -4 is a negative index */
return f(0, 96); /* 96 exceeds 20 as an index */
return f(4, 16); /* This one is OK */
}
When you use the --c_src_interlist and --optimizer_interlist options with optimization, the compiler inserts
its comments and the interlist feature runs before the assembler, merging the original C/C++ source into
the assembly file.
Example 3-3 shows the function from Example 3-2 compiled with the optimization (--opt_level=2) and the
--c_src_interlist and --optimizer_interlist options. The assembly file contains compiler comments and C
source interlisted with assembly code.
Example 3-2. The Function From Example 2-4 Compiled with the --opt_level=2 and
--optimizer_interlist Options
_main:
;** 5 ----------------------- printf("Hello, world\n");
;** 6 ----------------------- return 0;
STW .D2 B3,*SP--(12)
.line 3
B .S1 _printf
NOP 2
MVKL .S1 SL1+0,A0
MVKH .S1 SL1+0,A0
|| MVKL .S2 RL0,B3
STW .D2 A0,*+SP(4)
|| MVKH .S2 RL0,B3
RL0: ; CALL OCCURS
.line 4
ZERO .L1 A4
.line 5
LDW .D2 *++SP(12),B3
NOP 4
B .S2 B3
NOP 5
; BRANCH OCCURS
Example 3-3. The Function From Example 2-4 Compiled with the --opt_level=2, --optimizer_interlist,
and --c_src_interlist Options
_main:
;** 5 ----------------------- printf("Hello, world\n");
;** 6 ----------------------- return 0;
STW .D2 B3,*SP--(12)
;------------------------------------------------------------------------------
; 5 | printf("Hello, world\n");
;------------------------------------------------------------------------------
B .S1 _printf
NOP 2
MVKL .S1 SL1+0,A0
MVKH .S1 SL1+0,A0
|| MVKL .S2 RL0,B3
STW .D2 A0,*+SP(4)
|| MVKH .S2 RL0,B3
RL0: ; CALL OCCURS
;------------------------------------------------------------------------------
; 6 | return 0;
;------------------------------------------------------------------------------
ZERO .L1 A4
LDW .D2 *++SP(12),B3
NOP 4
B .S2 B3
NOP 5
; BRANCH OCCURS
The assembly optimizer allows you to write assembly code without being concerned with the pipeline
structure of the C6000 or assigning registers. It accepts linear assembly code, which is assembly code
that may have had register-allocation performed and is unscheduled. The assembly optimizer assigns
registers and uses loop optimizations to turn linear assembly into highly parallel assembly.
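[Figure: code development flow. Each phase ends with a Profile step and an "Efficient enough?" check;
Yes leads to Complete. Phase 2: Refine C/C++ code (Compile, Profile). If the code is not efficient
enough and more C/C++ optimizations are possible, refine the C/C++ code again; otherwise go to
Phase 3. Phase 3: Write linear assembly (Assembly optimize, Profile). If the code is not efficient
enough, write/refine the linear assembly again.]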
• TMS320C6000 instructions
When you are writing your linear assembly, your code does not need to indicate the following:
– Pipeline latency
– Register usage
– Which unit is being used
As with other code generation tools, you might need to modify your linear assembly code until you are
satisfied with its performance. When you do this, you will probably want to add more detail to your
linear assembly. For example, you might want to partition or assign some registers.
The C6000 assembly optimizer reads up to 200 characters per line. Any characters beyond 200 are
truncated. Keep the operational part of your source statements (that is, everything other than comments)
less than 200 characters in length for correct assembly. Your comments can extend beyond the character
limit, but the truncated portion is not included in the .asm file.
Follow these guidelines in writing linear assembly code:
• All statements must begin with a label, a blank, an asterisk, or a semicolon.
• Labels are optional; if used, they must begin in column 1.
• One or more blanks must separate each field. Tab characters are interpreted as blanks. You must
separate the operand list from the preceding field with a blank.
• Comments are optional. Comments that begin in column 1 can begin with an asterisk or a semicolon (*
or ;) but comments that begin in any other column must begin with a semicolon.
• If you set up a conditional instruction, the register must be surrounded by square brackets.
• A mnemonic cannot begin in column 1 or it is interpreted as a label.
Refer to the TMS320C6000 Assembly Language Tools User's Guide for information on the syntax of
C6000 instructions, including conditional instructions, labels, and operands.
loop: .trip 25
LDW *a_0++[2], val0 ; load a[0-1]
LDW *b_0++[2], val1 ; load b[0-1]
MPY val0, val1, prod1 ; a[0] * b[0]
MPYH val0, val1, prod2 ; a[1] * b[1]
ADD prod1, prod2, tmp0 ; sum0 += (a[0]*b[0]) +
ADD tmp0, sum0, sum0 ; (a[1]*b[1])
.return sum
.endproc
int sum, I;
The old method of partitioning registers indirectly by partitioning instructions can still be used. Side and
functional unit specifiers can still be used on instructions. However, functional unit specifiers (.L/.S/.D/.M)
are ignored. Side specifiers are translated into partitioning constraints on the corresponding symbolic
names, if any. For example:
MV .1 x, y ; translated to .REGA y
LDW .D2T2 *u, v:w ; translated to .REGB u, v, w
There are several ways to enter the unit specifier field in linear assembly. Of these, only the specific
register side information is recognized and used:
• You can specify the particular functional unit (for example, .D1).
• You can specify the .D1 or .D2 functional unit followed by T1 or T2 to specify that the nonmemory
operand is on a specific register side. T1 specifies side A and T2 specifies side B. For example:
LDW .D1T2 *A3[A4], B3
LDW .D1T2 *src, dst
• You can specify only the data path (for example, .1), and the assembly optimizer assigns the functional
type (for example, .L1).
For more information on functional units refer to the TMS320C6000 CPU and Instruction Set Reference
Guide.
.reg t0,t1,p,i,sh:sl
MVK 100,i
ZERO sh
ZERO sl
.return sh:sl
.endproc
To disable this format with symbolic names and display assembly instructions with actual registers
instead, compile with the --machine_regs option.
By default, the compiler generates near calls and the linker utilizes trampolines if the
near call will not reach its destination. To force a far call, you must explicitly load the
address of the function into a register, and then issue an indirect call. For example:
MVK func,reg
MVKH func,reg
.call reg(op1) ; forcing a far call
If you want to use * for indirection, you must abide by C/C++ syntax rules, and use the
following alternate syntax:
.call [ret_reg =] (* ireg)([arg1, arg2,...])
For example:
.call (*driver)(op1, op2) ; indirect call
.reg driver
.call driver(op1, op2) ; also an indirect call
Here are other valid examples that use the .call syntax.
.call fir(x, h, y) ; void function
Since you can use machine register names anywhere you can use symbolic registers, it
may appear you can change the function calling convention. For example:
.call A6 = compute()
The compiler assumes that it is safe to speculate any load using an explicitly declared
circular addressing variable as the address pointer and may exploit this assumption to
perform optimizations.
When a symbol is declared with the .circ directive, it is not necessary to declare that
symbol with the .reg directive.
The .circ directive is equivalent to using .map with a circular declaration.
Example Here the symbolic name Ri is assigned to actual machine register Mi and Ri is declared
as potentially being used for circular addressing.
.CIRC R1/M1, R2/M2 ...
LOOP:
AND cword,mask,cond ; cond = codeword & mask
[cond] MVK 1,cond ; !(!(cond))
CMPEQ theta,cond,if ; (theta == !(!(cond)))
LDH *a++,ai ; a[i]
[if] ADD sum,ai,sum ; sum += a[i]
[!if] SUB sum,ai,sum ; sum -= a[i]
SHL mask,1,mask ; mask = mask << 1
[cntr] ADD -1,cntr,cntr ; decrement counter
[cntr] B LOOP ; for LOOP
.return sum
.endproc
When a symbol is declared with the .map directive, it is not necessary to declare that
symbol with the .reg directive.
Example Here the .map directive is used to assign x to register A6 and y to register B7. The
symbols are used with a move statement.
.map x/A6, y/B7
MV x, y ; equivalent to MV A6, B7
The symbol used to name a memory reference has the same syntax restrictions as any
assembly symbol. (For more information about symbols, refer to the TMS320C6000
Assembly Language Tools User's Guide.) It is in the same space as the symbolic
registers. You cannot use the same name for both a symbolic register and a
memory reference annotation.
The .mdep directive tells the assembly optimizer that there is a dependence between
two memory references.
The .mdep directive is valid only within procedures; that is, within occurrences of the
.proc and .endproc directive pair or the .cproc and .endproc directive pair.
Example Here is an example in which .mdep is used to indicate a dependence between two
memory references.
.mdep ld1, st1
The .mptr directive tells the assembly optimizer that when the symbol or memref is used
as a memory pointer in an LD(B/BU)(H/HU)(W) or ST(B/H/W) instruction, it is initialized
to point to base + offset and is incremented by stride each time through the loop.
The .mptr directive is valid within procedures only; that is, within occurrences of the .proc
and .endproc directive pair or the .cproc and .endproc directive pair.
The symbolic addresses used for base symbol names are in a name space separate
from all other labels. This means that a symbolic register or assembly label can have the
same name as a memory bank base name. For example:
.mptr Darray,Darray
Example Here is an example in which .mptr is used to avoid memory bank conflicts.
_blkcp: .cproc I
loop: .trip 50
; potential conflict
LDW *ptr1++, tmp1 ; load *0, bank 0
STW tmp1, *ptr2++{foo} ; store *8, bank 0
.endproc
Syntax .no_mdep
Description The .no_mdep directive tells the assembly optimizer that no memory dependencies
occur within that function, with the exception of any dependencies pointed to with the
.mdep directive.
Example Here is an example in which .no_mdep is used.
fn: .cproc dst, src, cnt
.no_mdep ;no memory aliasing in this function
...
.endproc
There is no guarantee that the symbol will be assigned to any register in the specified
group. The compiler may ignore the preference.
When a symbol is declared with the .pref directive, it is not necessary to declare that
variable with the .reg directive.
Example Here x is given a preference to be assigned to either A6 or B7. However, it would be
correct for the compiler to assign x to B3 (for example) instead.
.PREF x/A6/B7 ; Preference to assign x to either A6 or B7
A value is live in if it has been defined before the procedure and is used as an input to
the procedure. A value is live out if it has been defined before or within the procedure
loop:
LDW *B4++, A1
MV A1, B1
STW B1, *A4++
ADD -4, B0, B0
[B0] B loop
.endproc
Example 1 This example uses the same code as the block move example shown for .proc/.endproc
but the .reg directive is used:
move .cproc dst, src, cnt
.endproc
Notice how this example differs from the .proc example: symbolic registers declared with
.reg are allocated as machine registers.
Example 2 The code in the following example is invalid, because a variable defined by the .reg
directive cannot be used outside of the defined procedure:
move .proc A4
.reg tmp
LDW *A4++, tmp
MV tmp, B5
.endproc
MV tmp, B6 ; WRONG: tmp is invalid outside of the procedure
The .rega and .regb directives are valid within procedures only; that is, within
occurrences of the .proc and .endproc directive pair or the .cproc and .endproc directive
pair.
When a symbol is declared with the .rega or .regb directive, it is not necessary to declare
that symbol with the .reg directive.
The old method of partitioning registers indirectly by partitioning instructions can still be
used. Side and functional unit specifiers can still be used on instructions. However,
functional unit specifiers (.L/.S/.D/.M) and crosspath information are ignored. Side
specifiers are translated into partitioning constraints on the corresponding symbol
names, if any. For example:
MV .1X z, y ; translated to .REGA y
LDW .D2T2 *u, v:w ; translated to .REGB u, v, w
Example 1 The .reserve in this example guarantees that the assembly optimizer does not use A10
to A13 or B10 to B13 for the variables tmp1 to tmp5:
test .proc a4, b4
.reg tmp1, tmp2, tmp3, tmp4, tmp5
.reserve a10, a11, a12, a13, b10, b11, b12, b13
.....
.endproc a4
Example This example uses a symbolic register, tmp, and a machine-register, A5, as .return
arguments:
.cproc ...
.reg tmp
...
.return tmp = legal symbolic name
...
.return a5 = legal actual name
If the assembly optimizer cannot ensure that the trip count is large enough to pipeline a
loop for maximum performance, a pipelined version and an unpipelined version of the
same loop are generated. This makes one of the loops a redundant loop. The pipelined
or the unpipelined loop is executed based on a comparison between the trip count and
the number of iterations of the loop that can execute in parallel. If the trip count is
greater than or equal to the number of parallel iterations, the pipelined loop is executed;
otherwise, the unpipelined loop is executed. For more information about redundant
loops, see Section 3.3.
You are not required to specify a .trip directive with every loop; however, you should use
.trip if you know that a loop iterates some number of times. This generally means that
redundant loops are not generated (unless the minimum value is really small) saving
code size and execution time.
If you know that a loop always executes the same number of times whenever it is called,
define maximum value (where maximum value equals minimum value) as well. The
compiler may now be able to unroll your loop thereby increasing performance.
When you are compiling with the interrupt flexibility option (--interrupt_threshold=n),
using a .trip maximum value allows the compiler to determine the maximum number of
cycles that the loop can execute. Then, the compiler compares that value to the
threshold value given by the --interrupt_threshold option. See Section 2.12 for more
information.
• Direct branches to the label associated with a .proc directive are not allowed. If you require a branch
back to the start of the linear assembly function, then use the .call directive. Here is an example of a
direct branch to the label of a .proc directive:
_func: .proc
...
B _func <= illegal
...
.endproc
• An .if/.endif loop must be entirely inside or outside of a .proc or .cproc region. It is not allowed to have
part of an .if/.endif loop inside of a .proc or .cproc region and the other part of the .if/.endif loop outside
of the .proc or .cproc region. Here are two examples of legal .if/.endif loops. The first loop is outside a
.cproc region, the second loop is inside a .proc region:
.if
.cproc
...
.endproc
.endif
.proc
.if
...
.endif
.endproc
Here are two examples of .if/.endif loops that are partly inside and partly outside of a .cproc or .proc
region:
.if
.cproc
.endif
.endproc
.proc
.if
...
.else
.endproc
.endif
• The following assembly instructions cannot be used from linear assembly:
– EFI
– SPLOOP, SPLOOPD and SPLOOPW and all other loop-buffer related instructions
– C6700+ instructions
– ADDKSP and DP-relative addressing
Figure 4-1. 4-Bank Interleaved Memory

Bank 0            Bank 1            Bank 2            Bank 3
0      1          2      3          4      5          6      7
8      9          10     11         12     13         14     15
...
8N     8N + 1     8N + 2 8N + 3     8N + 4 8N + 5     8N + 6 8N + 7
For devices that have more than one memory space (Figure 4-2), an access to bank 0 in one memory
space does not interfere with an access to bank 0 in another memory space, and no pipeline stall occurs.
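The bank layout can be expressed as a small calculation. The sketch below assumes the 4-bank arrangement of the figure, in which each bank holds two consecutive bytes, so byte addresses 0 and 1 fall in bank 0, 2 and 3 in bank 1, and address 8 wraps back to bank 0. The names bank_of and accesses_conflict are hypothetical, and the conflict test only models accesses that each touch a single bank.

```c
#define BANK_COUNT     4   /* banks per memory space         */
#define BYTES_PER_BANK 2   /* consecutive bytes in each bank */

/* Bank that holds the given byte address. */
unsigned bank_of(unsigned long byte_addr)
{
    return (unsigned)((byte_addr / BYTES_PER_BANK) % BANK_COUNT);
}

/* Nonzero when two single-bank accesses issued in the same cycle use the
   same bank of the same memory space.                                    */
int accesses_conflict(unsigned space_a, unsigned long addr_a,
                      unsigned space_b, unsigned long addr_b)
{
    return space_a == space_b && bank_of(addr_a) == bank_of(addr_b);
}
```

For example, byte addresses 0 and 8 share bank 0, so they conflict within one memory space but not when they lie in two different memory spaces.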
Figure 4-2. 4-Bank Interleaved Memory With Two Memory Spaces

                  Bank 0            Bank 1            Bank 2            Bank 3
Memory space 0    8N     8N + 1     8N + 2 8N + 3     8N + 4 8N + 5     8N + 6 8N + 7
Memory space 1    8M     8M + 1     8M + 2 8M + 3     8M + 4 8M + 5     8M + 6 8M + 7
For example:
.mptr a_0,a+0,16
.mptr a_4,a+4,16
LDW *a_0++[4], val1 ; base=a, offset=0, stride=16
LDW *a_4++[4], val2 ; base=a, offset=4, stride=16
.mptr dptr,D+0,8
LDH *dptr++, d0 ; base=D, offset=0, stride=8
LDH *dptr++, d1 ; base=D, offset=2, stride=8
LDH *dptr++, d2 ; base=D, offset=4, stride=8
LDH *dptr++, d3 ; base=D, offset=6, stride=8
In this example, the offset for dptr is updated after every memory access. The offset is updated only when
the pointer is modified by a constant. This occurs for the pre/post increment/decrement addressing modes.
See the .mptr topic for more information.
Example 4-4 shows loads and stores extracted from a loop that is being software pipelined.
Example 4-4. Load and Store Instructions That Specify Memory Bank Information
.mptr Ain,IN,-16
.mptr Bin,IN-4,-16
.mptr Aco,COEF,16
.mptr Bco,COEF+4,16
.mptr Aout,optr+0,4
.mptr Bout,optr+2,4
_dot: .cproc a, b
.reg sum0, sum1, I
.reg val1, val2, prod1, prod2
loop: .trip 50
LDW *a++,val1 ; load a[0-1] bank0
LDW *b++,val2 ; load b[0-1] bank2
MPY val1,val2,prod1 ; a[0] * b[0]
MPYH val1,val2,prod2 ; a[1] * b[1]
ADD prod1,sum0,sum0 ; sum0 += a[0] * b[0]
ADD prod2,sum1,sum1 ; sum1 += a[1] * b[1]
It is not always possible to control fully how arrays and other memory objects are aligned. This is
especially true when a pointer is passed into a function and that pointer may have different alignments
each time the function is called. A solution to this problem is to write a dot product routine that cannot
have memory hits. This would eliminate the need for the arrays to use different memory banks.
If the dot product loop kernel is unrolled once, then four LDW instructions execute in the loop kernel.
Assuming that nothing is known about the bank alignment of arrays a and b (except that they are word
aligned), the only safe assumptions that can be made about the array accesses are that a[0-1] cannot
conflict with a[2-3] and that b[0-1] cannot conflict with b[2-3]. Example 4-8 shows the unrolled loop kernel.
Example 4-8. Dot Product From Example 4-6 Unrolled to Prevent Memory Bank Conflicts
ADD 4,a_0,a_4
ADD 4,b_0,b_4
MVK 25,i ; I = 100/4
ZERO sum0 ; multiply result = 0
ZERO sum1 ; multiply result = 0
.mptr a_0,a+0,8
.mptr a_4,a+4,8
.mptr b_0,b+0,8
.mptr b_4,b+4,8
loop: .trip 25
The goal is to find a software pipeline in which the following instructions are in parallel:
LDW *a0++[2],val1 ; load a[0-1] bankx
|| LDW *a2++[2],val2 ; load a[2-3] bankx+2
LDW *b0++[2],val1 ; load b[0-1] banky
|| LDW *b2++[2],val2 ; load b[2-3] banky+2
Without the .mptr directives in Example 4-8, the loads of a[0-1] and b[0-1] are scheduled in parallel, and
the loads of a[2-3] and b[2-3] might be scheduled in parallel. This results in a 50% chance that a memory
conflict will occur on every cycle. However, the loop kernel shown in Example 4-9 can never have a
memory bank conflict.
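For orientation, the arithmetic performed by the unrolled kernel can be written in C. This is only a sketch of the computation, not of the schedule; the name dot_unrolled is hypothetical, and n is assumed to be a multiple of 4.

```c
/* C model of the unrolled kernel: each iteration consumes four 16-bit
 * elements from each array (two word loads per array in the assembly).
 * Assumes n is a multiple of 4. */
int dot_unrolled(const short *a, const short *b, int n)
{
    int sum0 = 0, sum1 = 0;
    int i;

    for (i = 0; i < n; i += 4) {
        sum0 += a[i]     * b[i];       /* low half of a[0-1], b[0-1]  */
        sum1 += a[i + 1] * b[i + 1];   /* high half of a[0-1], b[0-1] */
        sum0 += a[i + 2] * b[i + 2];   /* low half of a[2-3], b[2-3]  */
        sum1 += a[i + 3] * b[i + 3];   /* high half of a[2-3], b[2-3] */
    }
    return sum0 + sum1;
}
```

The two accumulators mirror sum0 and sum1 in the linear assembly; they are combined only once, after the loop.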
In Example 4-6, if .mptr directives had been used to specify that a and b point to different bases, then the
assembly optimizer would never find a schedule for a 1-cycle loop kernel, because there would always be
a memory bank conflict. However, it would find a schedule for a 2-cycle loop kernel.
.mptr a,RS
.mptr b,RS
.mptr c,XY
.mptr d,XY+2
LDW *a++[i0a],A0 ; a and b always conflict with each other
LDW *b++[i0b],B0 ;
STH A1,*c++[i1a] ; c and d never conflict with each other
STH B2,*d++[i1b] ;
This means that whenever ld1 accesses memory at location X, some later time in code execution, st1 may
also access location X. This is equivalent to adding a dependence between these two instructions. In
terms of the software pipeline, these two instructions must remain in the same order. The ld1 reference
must always occur before the st1 reference; the instructions cannot even be scheduled in parallel.
It is important to note the directional sense of the directive from ld1 to st1. The opposite, from st1 to ld1, is
not implied. In terms of the software pipeline, while every ld1 must occur before every st1, it is still legal to
schedule the ld1 from iteration n+1 before the st1 from iteration n.
Example 4-12 is a picture of the software pipeline with the instructions from two different iterations in
different columns. In the actual instruction sequence, instructions on the same horizontal line are in
parallel.
Example 4-12. Software Pipeline Using .mdep ld1, st1
STW { st1 }
If that schedule does not work because the iteration n st1 might write a value the iteration n+1 ld1 should
read, then you must note a dependence relationship from st1 to ld1.
.mdep st1, ld1
Both directives together force the software pipeline shown in Example 4-13.
Example 4-13. Software Pipeline Using .mdep st1, ld1 and .mdep ld1, st1
...
STW { st1 }
LDW { ld1 }
...
STW { st1 }
Indexed addressing, *+base[index], is a good example of an addressing mode where you typically do not
know anything about the relative sequence of the memory accesses, except they sometimes access the
same location. To correctly model this case, you need to note the dependence relation in both directions,
and you need to use both directives.
.mdep ld1, st1
.mdep st1, ld1
.return tmp
.endproc
• Example 2
Here, .mdep r2, r1 indicates that STW must occur before LDW. Since STW is after LDW in the code,
the dependence relation is across loop iterations. The STW instruction writes a value that may be read
by the LDW instruction on the next iteration. In this case, a 6-cycle recurrence is created.
fn: .cproc dst, src, cnt
.reg tmp
.no_mdep
.mdep r2, r1
.endproc
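The cross-iteration dependence that .mdep r2, r1 describes can be reproduced in plain C. The sketch below assumes a hypothetical function add_one with one load and one store per iteration; it is an illustration of the aliasing problem, not the body of the linear assembly example.

```c
/* Hypothetical C loop with one load and one store per iteration. If dst
 * and src overlap, the store in iteration n can feed the load in
 * iteration n+1, so the two accesses cannot be freely reordered. */
void add_one(int *dst, const int *src, int cnt)
{
    int i;

    for (i = 0; i < cnt; i++) {
        int tmp = src[i];   /* plays the role of the LDW */
        dst[i] = tmp + 1;   /* plays the role of the STW */
    }
}
```

Calling add_one(buf + 1, buf, 3) on a buffer that starts {1, 0, 0, 0} leaves {1, 2, 3, 4}: each store is read back by the next iteration's load. A schedule that hoisted the next load above the store would compute a different result, which is what the dependence prevents.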
Volatile References
Note: For volatile references, use .volatile rather than .mdep.
The C/C++ compiler and assembly language tools provide two methods for linking your programs:
• You can compile individual modules and link them together. This method is especially useful when you
have multiple source files.
• You can compile and link in one step. This method is useful when you have a single source module.
This chapter describes how to invoke the linker with each method. It also discusses special requirements
of linking C/C++ code, including the run-time-support libraries, specifying the type of initialization, and
allocating the program into memory. For a complete description of the linker, see the TMS320C6000
Assembly Language Tools User's Guide.
5.1 Invoking the Linker Through the Compiler (-z Option)
5.2 Linker Options
5.3 Linker Code Optimizations
5.4 Controlling the Linking Process
When you specify a library as linker input, the linker includes and links only those library members that
resolve undefined references. The linker uses a default allocation algorithm to allocate your program into
memory. You can use the MEMORY and SECTIONS directives in the linker command file to customize
the allocation process. For information, see the TMS320C6000 Assembly Language Tools User's Guide.
You can link a C/C++ program consisting of modules prog1.obj, prog2.obj, and prog3.obj, with an
executable filename of prog.out with the command:
cl6x --run_linker --rom_model prog1 prog2 prog3 --output_file=prog.out
--library=rts6200.lib
The --run_linker option divides the command line into the compiler options (the options before
--run_linker) and the linker options (the options following --run_linker). The --run_linker option must follow
all source files and compiler options on the command line.
All arguments that follow --run_linker on the command line are passed to the linker. These arguments can
be linker command files, additional object files, linker options, or libraries. These arguments are the same
as described in Section 5.1.1.
All arguments that precede --run_linker on the command line are compiler arguments. These arguments
can be C/C++ source files, assembly files, linear assembly files, or compiler options. These arguments are
described in Section 2.2.
You can compile and link a C/C++ program consisting of modules prog1.c, prog2.c, and prog3.c, with an
executable filename of prog.out with the command:
cl6x prog1.c prog2.c prog3.c --run_linker --rom_model --output_file=prog.out --library=rts6200.lib
--library=libraryname Names an archive library file or linker command filename as linker
input. The libraryname is an archive library name and must follow
operating system conventions. The --library option's short form is -l.
--linker_help Produces a help listing displaying syntax and available options
--make_global=global_symbol Defines global_symbol as global even if the global symbol has
been made static with the --make_static option
--make_static Makes all global symbols static; global symbols are essentially
hidden. This allows external symbols with the same name (in
different files) to be treated as unique.
--map_file=filename Produces a map or listing of the input and output sections, including
null areas, and places the listing in filename. The filename must
follow operating system conventions.
--mapfile_contents=filter[,filter] Controls the information that appears in the map file. Enter
--mapfile_contents=help on the command line to produce a listing
of available options.
--no_demangle Disables demangling of symbol names in diagnostics
--no_sym_merge Disables merge of symbolic debugging information in COFF object
files. The linker keeps the duplicate entries of symbolic debugging
information commonly generated when a C program is compiled for
debugging. (Deprecated option; use the strip utility described in the
TMS320C6000 Assembly Language Tools User's Guide.)
--no_sym_table Creates a smaller output section by stripping symbol table
information and line number entries from the output module.
--no_warnings Suppresses warning diagnostics (errors are still issued). See
Section 5.4.1.2 for more information.
--output_file=filename Names the executable output module. The filename must follow
operating system conventions. If the --output_file option is not used,
the default filename is a.out.
--priority Satisfies each unresolved reference by the first library that contains
a definition for that symbol
--ram_model Initializes variables at load time. See Section 7.8.5 for more
information.
--relocatable Retains relocation entries in the output module.
--reread_libs Forces rereading of libraries. The linker continues to reread libraries
until no more references can be resolved.
--rom_model Autoinitializes variables at run time. See Section 7.8.4 for more
information.
--run_abs Produces an absolute listing file.
--scan_libraries Scans all libraries during a link for duplicate definitions of
symbols that are actually included in the link.
--set_error_limit=num Sets the error limit to num. The linker abandons linking after this
number of errors. (The default is 100.) See Section 5.4.1.2 for more
information.
--stack_size=size Sets the C/C++ system stack size to size bytes and defines a
global symbol that specifies the stack size. The default is 1K bytes.
For more information on linker options, see the TMS320C6000 Assembly Language Tools User's Guide.
In addition to placing each function in a separate subsection, the compiler also annotates that subsection
with a conditional linking directive, .clink. This directive marks the section as a candidate to be removed if
it is not referenced by any other section in the program. The compiler does not place a .clink directive in a
subsection for a trap or interrupt function, as these may be needed by a program even though there is no
symbolic reference to them anywhere in the program.
If a section that has been marked for conditional linking is never referenced by any other section in the
program, that section is removed from the program. Conditional linking is disabled when performing a
partial link or when relocation information is kept with the output of the link. Conditional linking can also be
disabled with the --disable_clink link option.
Regardless of the method you choose for invoking the linker, special requirements apply when linking
C/C++ programs. You must:
• Include the compiler's run-time-support library
• Specify the type of initialization
• Determine how you want to allocate your program into memory
This section discusses how these factors are controlled and provides an example of the standard default
linker command file.
For more information about how to operate the linker, see the linker description in the TMS320C6000
Assembly Language Tools User's Guide.
You must link all C/C++ programs with a run-time-support library. The library contains standard C/C++
functions as well as functions used by the compiler to manage the C/C++ environment. You must use the
--library linker option to specify which C6000 run-time-support library to use. The --library option also tells
the linker to look at the --search_path options and then the C6X_C_DIR environment variable to find an
archive path or object file. To use the --library linker option, type on the command line:
cl6x --run_linker {--rom_model | --ram_model} filenames --library=libraryname
When you link your program, you must specify where to allocate the sections in memory. In general,
initialized sections are linked into ROM or RAM; uninitialized sections are linked into RAM. With the
exception of .text, the initialized and uninitialized sections created by the compiler cannot be allocated into
internal program memory. See Section 7.1.1 for a complete description of how the compiler uses these
sections.
The linker provides MEMORY and SECTIONS directives for allocating sections. For more information
about allocating sections into memory, see the TMS320C6000 Assembly Language Tools User's Guide.
The MEMORY directive, and possibly the SECTIONS directive, might require modification to work with your
system. See the TMS320C6000 Assembly Language Tools User's Guide for more information on these directives.
Example 5-1. Sample Link Step Command File
--rom_model
--heap_size=0x2000
--stack_size=0x0100
--library=rts6200.lib
MEMORY
{
VECS: o = 0x00000000 l = 0x000000400 /* reset & interrupt vectors */
PMEM: o = 0x00000400 l = 0x00000FC00 /* intended for initialization */
BMEM: o = 0x80000000 l = 0x000010000 /* .bss, .sysmem, .stack, .cinit */
}
SECTIONS
{
vectors > VECS
.text > PMEM
.data > BMEM
.stack > BMEM
.bss > BMEM
.sysmem > BMEM
.cinit > BMEM
.const > BMEM
.cio > BMEM
.far > BMEM
}
The C/C++ compiler supports the C/C++ language standard that was developed by a committee of the
American National Standards Institute (ANSI/ISO) to standardize the C programming language.
The C++ language supported by the C6000 is defined by the ANSI/ISO/IEC 14882-1998 standard with
certain exceptions.
(1) Figures are minimum precision.
6.4 Keywords
The C6000 C/C++ compiler supports the standard const, register, restrict, and volatile keywords. In
addition, the C6000 C/C++ compiler extends the C/C++ language through the support of the cregister,
interrupt, near, and far keywords.
Using the const keyword, you can define large constant tables and allocate them into system ROM. For
example, to allocate a ROM table, you could use the following definition:
far const int digits[] = {0,1,2,3,4,5,6,7,8,9};
The cregister keyword can be used only in file scope. The cregister keyword is not allowed on any
declaration within the boundaries of a function. It can only be used on objects of type integer or pointer.
The cregister keyword is not allowed on objects of any floating-point type or on any structure or union
objects.
The cregister keyword does not imply that the object is volatile. If the control register being referenced is
volatile (that is, can be modified by some external control), then the object must be declared with the
volatile keyword also.
To use the control registers in Table 6-2, you must declare each register as follows. The c6x.h include file
defines all the control registers through this syntax:
extern cregister volatile unsigned int register ;
Once you have declared the register, you can use the register name directly. IFR is read only. See
the TMS320C62x DSP CPU and Instruction Set Reference Guide, TMS320C64x/C64x+ DSP CPU and
Instruction Set Reference Guide, or TMS320C67x/C67x+ DSP CPU and Instruction Set Reference Guide
for detailed information on the control registers.
See Example 6-1 for an example that declares and uses control registers.
Example 6-1. Define and Use Control Registers
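A minimal sketch of declaring and using one control register follows. It assumes the interrupt enable register IER and the predefined macro _TMS320C6X; the host stand-in variable and the helper names enable_interrupt and interrupt_enabled are hypothetical and exist only so the sketch also builds with an ordinary C compiler.

```c
#ifdef _TMS320C6X
/* Declaration form described above; c6x.h provides the same declarations. */
extern cregister volatile unsigned int IER;
#else
/* Host stand-in so the sketch also builds with an ordinary C compiler. */
volatile unsigned int IER;
#endif

/* Hypothetical helper: set one bit in the interrupt enable register. */
void enable_interrupt(unsigned int bit)
{
    IER |= 1u << bit;
}

/* Hypothetical helper: test one bit in the interrupt enable register. */
int interrupt_enabled(unsigned int bit)
{
    return (int)((IER >> bit) & 1u);
}
```

Once declared, the register name is used like any other volatile object, so ordinary read-modify-write expressions apply.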
The name c_int00 is the C/C++ entry point. This name is reserved for the system reset interrupt. This
special interrupt routine initializes the system and calls the function main. Because it has no caller, c_int00
does not save any registers.
Use the alternate keyword, __interrupt, if you are writing code for strict ANSI/ISO mode (using the
--strict_ansi compiler option).
far keyword The compiler cannot access the data item via the DP. This can be required if the
total amount of program data is larger than the offset allowed (32K) from the DP.
For example:
MVKL _address,a1
MVKH _address,a1
LDW *a1,a0
When data objects do not have the near or far keyword specified, the compiler will use far accesses to
aggregate data and near accesses to non-aggregate data. For more information on the data memory
model and ways to control accesses to data, see Section 7.1.5.1.
far keyword You tell the compiler that the call is not within ± 1 M word of the caller.
MVKL _func,a1
MVKH _func,a1
B a1
By default, the compiler generates small-memory model code, which means that every function call is
handled as if it were declared near, unless it is actually declared far.
For more information on function calls, see Section 7.1.6.
Example 6-3 illustrates using the restrict keyword when passing arrays to a function. Here, the arrays c
and d should not overlap, nor should c and d point to the same array.
Example 6-3. Use of the restrict Type Qualifier With Arrays
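A minimal sketch of such a function, assuming C99 pointer syntax for restrict; the function name add_and_bump and the loop body are illustrative choices, not taken from this guide.

```c
/* Sketch: c and d are promised not to overlap, so the compiler may treat
 * loads and stores to the two arrays as independent of each other. */
void add_and_bump(int * restrict c, int * restrict d, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        c[i] += d[i];
        d[i] += 1;
    }
}
```

Passing the same array, or overlapping parts of one array, for both parameters breaks the promise and leaves the result undefined.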
In this example, *ctrl is a loop-invariant expression, so the loop is optimized down to a single-memory
read. To correct this, define *ctrl as:
volatile unsigned int *ctrl;
Here the *ctrl pointer is intended to reference a hardware location, such as an interrupt flag.
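A minimal sketch of a polling loop that relies on this declaration. The function name wait_for_flag and the value 0xFF are arbitrary choices for illustration.

```c
/* Sketch: poll a status word until it reads 0xFF. Because the pointee is
 * volatile, every loop iteration performs a real memory read. */
void wait_for_flag(volatile unsigned int *ctrl)
{
    while (*ctrl != 0xFF)
        ;
}
```

Without the volatile qualifier, the optimizer is free to read *ctrl once and spin forever on the cached value.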
The compiler copies the argument string directly into your output file. The assembler text must be
enclosed in double quotes. All the usual character string escape codes retain their definitions. For
example, you can insert a .byte directive that contains quotes as follows:
asm("STR: .byte \"abc\"");
The inserted code must be a legal assembly language statement. Like all assembly language statements,
the line of code inside the quotes must begin with a label, a blank, a tab, or a comment (asterisk or
semicolon). The compiler performs no checking on the string; if there is an error, the assembler detects it.
For more information about the assembly language statements, see the TMS320C6000 Assembly
Language Tools User's Guide.
The asm statements do not follow the syntactic restrictions of normal C/C++ statements. Each can appear
as a statement or a declaration, even outside of blocks. This is useful for inserting directives at the very
beginning of a compiled module.
Use the alternate statement __asm("assembler text") if you are writing code for strict ANSI/ISO C mode
(using the --strict_ansi option).
The CODE_SECTION pragma is useful if you have code objects that you want to link into an area
separate from the .text section.
The following examples demonstrate the use of the CODE_SECTION pragma.
int fn(int x)
{
return x;
}
.sect "my_sect"
.global _fn
;******************************************************************************
;* FUNCTION NAME: _fn *
;* *
;* Regs Modified : SP *
;* Regs Used : A4,B3,SP *
;* Local Frame Size : 0 Args + 4 Auto + 0 Save = 4 byte *
;******************************************************************************
_fn:
;** --------------------------------------------------------------------------*
RET .S2 B3 ; |6|
SUB .D2 SP,8,SP ; |4|
STW .D2T1 A4,*+SP(4) ; |4|
ADD .S2 8,SP,SP ; |6|
NOP 2
; BRANCH OCCURS ; |6|
Both global and local variables can be aligned with the DATA_MEM_BANK pragma. The
DATA_MEM_BANK pragma must reside inside the function that contains the local variable being aligned.
The symbol can also be used as a parameter in the DATA_SECTION pragma.
When optimization is enabled, the tools may or may not use the stack to store the values of local
variables.
The DATA_MEM_BANK pragma allows you to align data on any data memory bank that can hold data of
the type size of the symbol. This is useful if you need to align data in a particular way to avoid memory
bank conflicts in your hand-coded assembly code, rather than padding with zeros and having to account
for the padding in your code.
This pragma increases the amount of space used in data memory by a small amount as padding is used
to align data onto the correct bank.
For C6200, the code in Example 6-6 guarantees that array x begins at an address ending in 4 or c (in
hexadecimal), and that array y begins at an address ending in 4 or c. The alignment for array y affects its
stack placement. Array z is placed in the .z_sect section, and begins at an address ending in 0 or 8.
Example 6-6. Using the DATA_MEM_BANK Pragma
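#pragma DATA_MEM_BANK (x, 2);
short x[100];
#pragma DATA_MEM_BANK (z, 0);
#pragma DATA_SECTION (z, ".z_sect");
short z[100];
void main()
{
#pragma DATA_MEM_BANK (y, 2);
short y[100];
...
}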
The DATA_SECTION pragma is useful if you have data objects that you want to link into an area separate
from the .bss section. If you allocate a global variable using a DATA_SECTION pragma and you want to
reference the variable in C code, you must declare the variable as extern far.
Example 6-7 through Example 6-9 demonstrate the use of the DATA_SECTION pragma.
Example 6-7. Using the DATA_SECTION Pragma C Source File
char bufferA[512];
#pragma DATA_SECTION(bufferB, "my_sect")
char bufferB[512];
.global _bufferA
.bss _bufferA,512,4
.global _bufferB
_bufferB: .usect "my_sect",512,4
Except for _c_int00, which is the name reserved for the system reset interrupt for C/C++ programs, the
name of the interrupt (the func argument) does not need to conform to a naming convention.
When you use program-level optimization, you may need to use the FUNC_EXT_CALLED pragma with
certain options. See Section 3.7.2.
The code for the function will return via the IRP (interrupt return pointer).
Except for _c_int00, which is the name reserved for the system reset interrupt for C programs, the name
of the interrupt (the func argument) does not need to conform to a naming convention.
The arguments min and max are programmer-guaranteed minimum and maximum trip counts. The trip
count is the number of times a loop iterates. The trip count of the loop must be evenly divisible by multiple.
All arguments are optional. For example, if the trip count could be 5 or greater, you can specify the
argument list as follows:
#pragma MUST_ITERATE(5);
However, if the trip count could be any nonzero multiple of 5, the pragma would look like this:
#pragma MUST_ITERATE(5, , 5); /* Note the blank field for max */
It is sometimes necessary for you to provide min and multiple in order for the compiler to perform
unrolling. This is especially the case when the compiler cannot easily determine how many iterations the
loop will perform (that is, the loop has a complex exit condition).
When specifying a multiple via the MUST_ITERATE pragma, results of the program are undefined if the
trip count is not evenly divisible by multiple. Also, results of the program are undefined if the trip count is
less than the minimum or greater than the maximum specified.
If no min is specified, zero is used. If no max is specified, the largest possible number is used. If multiple
MUST_ITERATE pragmas are specified for the same loop, the smallest max and largest min are used.
In this example, the compiler attempts to generate a software pipelined loop even without the pragma.
However, if MUST_ITERATE is not specified for a loop such as this, the compiler generates code to
bypass the loop, to account for the possibility of 0 iterations. With the pragma specification, the compiler
knows that the loop iterates at least once and can eliminate the loop-bypassing code.
MUST_ITERATE can specify a range for the trip count as well as a factor of the trip count. For example:
#pragma MUST_ITERATE(8, 48, 8);
This example tells the compiler that the loop executes between 8 and 48 times and that the trip_count
variable is a multiple of 8 (8, 16, 24, 32, 40, 48). The multiple argument allows the compiler to unroll the
loop.
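A minimal sketch of a loop carrying this pragma. The function name sum_block is hypothetical, and the pragma is guarded with the predefined _TMS320C6X macro so that other compilers skip it.

```c
/* Sketch: n is promised to be 8, 16, ..., 48. Calling with any other
 * count is undefined once the pragma is in effect. */
int sum_block(const short *x, int n)
{
    int i, sum = 0;

#ifdef _TMS320C6X           /* only the TI compiler understands the pragma */
    #pragma MUST_ITERATE(8, 48, 8);
#endif
    for (i = 0; i < n; i++)
        sum += x[i];
    return sum;
}
```

The pragma sits directly before the loop it describes. The responsibility for keeping n inside the stated range rests with the caller.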
You should also consider using MUST_ITERATE for loops with complicated bounds. In the following
example:
for(i2 = ipos[2]; i2 < 40; i2 += 5) { ...
The compiler would have to generate a divide function call to determine, at run time, the exact number of
iterations performed. The compiler will not do this. In this case, using MUST_ITERATE to specify that the
loop always executes eight times allows the compiler to attempt to generate a software pipelined loop:
#pragma MUST_ITERATE(8, 8);
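The arithmetic behind the trip count can be sketched as follows. The helper trip_count is hypothetical and assumes a positive step; it shows the division the compiler would otherwise need at run time.

```c
/* Hypothetical helper: number of iterations of
 *     for (i = start; i < limit; i += step)
 * for step > 0. Note the division. */
int trip_count(int start, int limit, int step)
{
    if (start >= limit)
        return 0;
    return (limit - start + step - 1) / step;
}
```

For a limit of 40 and a step of 5, the count is 8 only when the starting value lies between 0 and 4. MUST_ITERATE(8, 8) is therefore a promise about the starting value, and the result is undefined if the promise does not hold.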
The code generated for the function returns via the NRP, rather than via the IRP as for a function
declared with the interrupt keyword or INTERRUPT pragma.
Except for _c_int00, which is the name reserved for the system reset interrupt for C programs, the name
of the interrupt (function) does not need to conform to a naming convention.
Where min and max are the minimum and maximum trip counts of the loop in the common case. The trip
count is the number of times a loop iterates. Both arguments are optional.
For example, PROB_ITERATE could be applied to a loop that executes for eight iterations in the majority
of cases (but sometimes may execute more or less than eight iterations):
#pragma PROB_ITERATE(8, 8);
If only the minimum expected trip count is known (say it is 5), the pragma would look like this:
#pragma PROB_ITERATE(5);
This pragma guarantees that the alignment of the named type or the base type of the named typedef is at
least equal to that of the expression. (The alignment may be greater as required by the compiler.) The
alignment must be a power of 2. The type must be a type or a typedef name. If a type, it must be either a
structure tag or a union tag. If a typedef, its base type must be either a structure tag or a union tag.
Since ANSI/ISO C declares that a typedef is simply an alias for a type (i.e. a struct) this pragma can be
applied to the struct, the typedef of the struct, or any typedef derived from them, and affects all aliases of
the base type.
This example aligns any st_tag structure variables on a page boundary:
typedef struct st_tag
{
int a;
short b;
} st_typedef;
#pragma STRUCT_ALIGN (st_tag, 128);
Any use of STRUCT_ALIGN with a basic type (int, short, float) or a variable results in an error.
If possible, the compiler unrolls the loop so there are n copies of the original loop. The compiler only
unrolls if it can determine that unrolling by a factor of n is safe. In order to increase the chances the loop is
unrolled, the compiler needs to know certain properties:
• The loop iterates a multiple of n times. This information can be specified to the compiler via the
multiple argument in the MUST_ITERATE pragma.
• The smallest possible number of iterations of the loop
• The largest possible number of iterations of the loop
The compiler can sometimes obtain this information itself by analyzing the code. However, sometimes the
compiler can be overly conservative in its assumptions and therefore generates more code than is
necessary when unrolling. This can also lead to not unrolling at all.
Furthermore, if the mechanism that determines when the loop should exit is complex, the compiler may
not be able to determine these properties of the loop. In these cases, you must tell the compiler the
properties of the loop by using the MUST_ITERATE pragma.
Specifying #pragma UNROLL(1); asks that the loop not be unrolled. Automatic loop unrolling also is not
performed in this case.
If multiple UNROLL pragmas are specified for the same loop, it is undefined which pragma is used, if any.
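A minimal sketch that pairs an unroll request with the trip-count information the compiler needs. The function name scale is hypothetical, and both pragmas are guarded with the predefined _TMS320C6X macro so that other compilers skip them.

```c
/* Sketch: multiply n elements by k, where n is promised to be a nonzero
 * multiple of 2 so that unrolling by 2 needs no cleanup code. */
void scale(short *x, int n, short k)
{
    int i;

#ifdef _TMS320C6X
    #pragma MUST_ITERATE(2, , 2);  /* trip count is a nonzero multiple of 2 */
    #pragma UNROLL(2);             /* request two copies of the loop body   */
#endif
    for (i = 0; i < n; i++)
        x[i] = (short)(x[i] * k);
}
```

The unroll remains a request: the compiler still decides whether unrolling by 2 is safe.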
The linkname of foo is _foo__Fi, indicating that foo is a function that takes a single argument of type int.
To aid inspection and debugging, a name demangling utility is provided that demangles names into those
found in the original C++ source. See Chapter 9 for more information.
.bss: {} = 0x00;
...
}
Because the linker writes a complete load image of the zeroed .bss section into the output COFF file, this
method can have the unwanted effect of significantly increasing the size of the output file (but not the
program).
6.10.2 Initializing Static and Global Variables With the const Type Qualifier
Static and global variables of type const without explicit initializations are similar to other static and global
variables because they might not be preinitialized to 0 (for the same reasons discussed in Section 6.10).
For example:
const int zero; /* may not be initialized to 0 */
However, the initialization of const global and static variables is different because these variables are
declared and initialized in a section called .const. For example:
const int zero = 0; /* guaranteed to be 0 */
This feature is particularly useful for declaring a large table of constants, because neither time nor space
is wasted at system startup to initialize the table. Additionally, the linker can be used to place the .const
section in ROM.
You can use the DATA_SECTION pragma to put the variable in a section other than .const. For example,
the following C code:
#pragma DATA_SECTION (zero, ".mysect");
const int zero=0;
To simplify the process of compiling existing C programs with the ANSI/ISO C/C++ compiler, the compiler
has a K&R option (--kr_compatible) that modifies some semantic rules of the language for compatibility
with older code. In general, the --kr_compatible option relaxes requirements that are stricter for ANSI/ISO
C than for K&R C. The --kr_compatible option does not disable any new features of the language such as
function prototypes, enumerations, initializations, or preprocessor constructs. Instead, --kr_compatible
simply liberalizes the ANSI/ISO rules without revoking any of the features.
The specific differences between the ANSI/ISO version of C and the K&R version of C are as follows:
• The integral promotion rules have changed regarding promoting an unsigned type to a wider signed
type. Under K&R C, the result type was an unsigned version of the wider type; under ANSI/ISO, the
result type is a signed version of the wider type. This affects operations that perform differently when
applied to signed or unsigned operands; namely, comparisons, division (and mod), and right shift:
unsigned short u;
int i;
if (u < i) /* SIGNED comparison, unless --kr_compatible used */
• ANSI/ISO prohibits combining two pointers to different types in an operation. In most K&R compilers,
this situation produces only a warning. Such cases are still diagnosed when --kr_compatible is used,
but with less severity:
int *p;
char *q = p; /* error without --kr_compatible, warning with --kr_compatible */
• External declarations with no type or storage class (only an identifier) are illegal in ANSI/ISO but legal
in K&R:
a; /* illegal unless --kr_compatible used */
• ANSI/ISO interprets file scope definitions that have no initializers as tentative definitions. In a single
module, multiple definitions of this form are fused together into a single definition. Under K&R, each
definition is treated as a separate definition, resulting in multiple definitions of the same object and
usually an error. For example:
int a;
int a; /* illegal if --kr_compatible used, OK if not */
Under ANSI/ISO, the result of these two definitions is a single definition for the object a. For most K&R
compilers, this sequence is illegal, because int a is defined twice.
• ANSI/ISO prohibits, but K&R allows, objects with external linkage to be redeclared as static:
extern int a;
static int a; /* illegal unless --kr_compatible used */
• Unrecognized escape sequences in string and character constants are explicitly illegal under ANSI/ISO
but ignored under K&R:
char c = '\q'; /* same as 'q' if --kr_compatible used, error if not */
• ANSI/ISO specifies that bit fields must be of type int or unsigned. With --kr_compatible, bit fields can
be legally defined with any integral type. For example:
struct s
{
short f : 2; /* illegal unless --kr_compatible used */
};
• K&R syntax allows a trailing comma in enumerator lists:
enum { a, b, c, }; /* illegal unless --kr_compatible used */
• K&R syntax allows trailing tokens on preprocessor directives:
#endif NAME /* illegal unless --kr_compatible used */
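The integral promotion difference in the first item of this list can be observed directly. In the sketch below, less_ansi uses the ANSI/ISO rule as written, and less_kr emulates the K&R result with explicit casts; both function names are hypothetical, and int is assumed to be wider than short.

```c
/* ANSI/ISO value-preserving promotion: u is promoted to int (assuming
 * int is wider than short), so the comparison is signed. */
int less_ansi(unsigned short u, int i)
{
    return u < i;
}

/* Emulates the K&R unsigned-preserving result by forcing both operands
 * to unsigned int before comparing. */
int less_kr(unsigned short u, int i)
{
    return (unsigned int)u < (unsigned int)i;
}
```

With u equal to 1 and i equal to -1, the signed comparison is false, while the unsigned comparison is true because -1 converts to the largest unsigned value.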
6.11.2 Enabling Strict ANSI/ISO Mode and Relaxed ANSI/ISO Mode (--strict_ansi and
--relaxed_ansi Options)
Use the --strict_ansi option when you want to compile under strict ANSI/ISO mode. In this mode, error
messages are provided when non-ANSI/ISO features are used, and language extensions that could
invalidate a strictly conforming program are disabled. Examples of such extensions are the inline and asm
keywords.
Run-Time Environment
This chapter describes the TMS320C6000 C/C++ run-time environment. To ensure successful execution
of C/C++ programs, it is critical that all run-time code maintain this environment. It is also important to
follow the guidelines in this chapter if you write assembly language functions that interface with C/C++
code.
7.1.1 Sections
The compiler produces relocatable blocks of code and data called sections. The sections are allocated
into memory in a variety of ways to conform to a variety of system configurations. For more information
about sections and allocating them, see the introductory object module information in the TMS320C6000
Assembly Language Tools User's Guide.
There are two basic types of sections:
• Initialized sections contain data or executable code. The C/C++ compiler creates the following
initialized sections:
– The .cinit section contains tables for initializing variables and constants.
– The .const section contains string literals, floating-point constants, and data defined with the
C/C++ qualifier const (provided the constant is not also defined as volatile).
– The .pinit section contains the table for calling global object constructors at run time.
– The .switch section contains jump tables for large switch statements.
– The .text section contains all the executable code.
• Uninitialized sections reserve space in memory (usually RAM). A program can use this space at run
time to create and store variables. The compiler creates the following uninitialized sections:
– The .bss section reserves space for global and static variables. When you specify the
--rom_model linker option, at program startup, the C boot routine copies data out of the .cinit
section (which can be in ROM) and stores it in the .bss section. The compiler defines the global
symbol $bss and assigns $bss the value of the starting address of the .bss section.
– The .far section reserves space for global and static variables that are declared far.
– The .stack section allocates memory for the system stack. This memory passes arguments to
functions and allocates local variables.
– The .sysmem section reserves space for dynamic memory allocation. The reserved space is used
by the malloc, calloc, realloc, and new functions. If a C/C++ program does not use these functions,
the compiler does not create the .sysmem section.
Stack Overflow
Note: The compiler provides no means to check for stack overflow during compilation or at run
time. Place the beginning of the .stack section in the first address after an unmapped
memory space so stack overflow will cause a simulator fault. This makes this problem easy
to detect. Be sure to allow enough space for the stack to grow.
The --mem_model:data options do not affect the access to objects explicitly declared with the near or far
keyword.
By default, all run-time-support data is defined as far.
For more information on near and far accesses to data, see Section 6.4.4.
Consts that are declared far, either explicitly through the far keyword or implicitly using
--mem_model:const are always placed in the .const section.
.global _a
.bss _a,4,4
All near direct accesses are relative to the DP.
• Near indirect memory access
MVK (_a - $bss),A0
ADD DP,A0,A0
The expression (_a - $bss) calculates the offset of the symbol _a from the start of the .bss section. The
compiler defines the global $bss in generated assembly code. The value of $bss is the starting
address of the .bss section.
• Initialized near pointers
The .cinit record for an initialized near pointer value is stored as an offset from the beginning of the
.bss section. During the autoinitialization of global variables, the data page pointer is added to these
offsets. (See Section 7.8.3.)
7.2.1.2 enum, float, and int Data Types (signed and unsigned)
The int, unsigned int, enum, and float data types are stored in memory as 32-bit objects (see Figure 7-2).
Objects of these types are loaded to and stored from bits 0-31 of a register. In big-endian mode, 4-byte
objects are loaded to registers by moving the first byte (that is, the lower address) of memory to bits 24-31
of the register, moving the second byte of memory to bits 16-23, moving the third byte to bits 8-15, and
moving the fourth byte to bits 0-7. In little-endian mode, 4-byte objects are loaded to registers by moving
the first byte (that is, the lower address) of memory to bits 0-7 of the register, moving the second byte to
bits 8-15, moving the third byte to bits 16-23, and moving the fourth byte to bits 24-31.
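The byte placement described above can be modeled in portable C. The following is a minimal sketch for illustration only, not compiler output; the function name load_word and its big_endian flag are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assemble a 32-bit register value from four consecutive bytes in memory.
   mem[0] is the byte at the lowest address. (Hypothetical helper.) */
static unsigned long load_word(const unsigned char *mem, int big_endian)
{
    assert(mem != NULL);
    if (big_endian) {
        /* first byte -> bits 24-31, second -> 16-23, third -> 8-15, fourth -> 0-7 */
        return ((unsigned long)mem[0] << 24) | ((unsigned long)mem[1] << 16) |
               ((unsigned long)mem[2] << 8)  |  (unsigned long)mem[3];
    }
    /* first byte -> bits 0-7, second -> 8-15, third -> 16-23, fourth -> 24-31 */
    return  (unsigned long)mem[0]        | ((unsigned long)mem[1] << 8) |
           ((unsigned long)mem[2] << 16) | ((unsigned long)mem[3] << 24);
}
```

For the bytes 0x12, 0x34, 0x56, 0x78 stored at increasing addresses, this model yields 0x12345678 in big-endian mode and 0x78563412 in little-endian mode.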
[Figure: register storage formats. Each layout shows a 32-bit register, bit 31 (MS) to bit 0 (LS), holding a signed integer (I), an unsigned integer (U), or the mantissa bits (M) of a floating-point value.
LEGEND: S = sign, U = unsigned integer, I = signed integer, M = mantissa, E = exponent, X = unused, MS = most significant, LS = least significant]
The parameter d is the offset to be added to the beginning of the class object for this pointer. The
parameter I is the index into the virtual function table, offset by 1. The index enables the NULL pointer to
be represented. Its value is -1 if the function is nonvirtual. The parameter f is the pointer to the member
function if it is nonvirtual, when I is 0. The 0 is the offset to the virtual function pointer within the class
object.
A0 represents the least significant bit of the field A; A1 represents the next least significant bit, etc. Again,
storage of bit fields in memory is done with a byte-by-byte, rather than bit-by-bit, transfer.
Big-endian memory
Byte 0 Byte 1 Byte 2 Byte 3
A A A A A A A B B B B B B B B B B C C C D D E E E E E E E E E X
6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 2 1 0 1 0 8 7 6 5 4 3 2 1 0 X
Little-endian register
MS LS
X E E E E E E E E E D D C C C B B B B B B B B B B A A A A A A A
X 8 7 6 5 4 3 2 1 0 1 0 2 1 0 9 8 7 6 5 4 3 2 1 0 6 5 4 3 2 1 0
31 0
Little-endian memory
Byte 0 Byte 1 Byte 2 Byte 3
B A A A A A A A B B B B B B B B E E D D C C C B X E E E E E E E
0 6 5 4 3 2 1 0 8 7 6 5 4 3 2 1 1 0 1 0 2 1 0 9 X 8 7 6 5 4 3 2
LEGEND: X = not used, MS = most significant, LS = least significant
All other control registers are not saved or restored by the compiler.
The compiler assumes that control registers not listed in Table 7-2 that can have an effect on compiled
code have default values. For example, the compiler assumes all circular addressing-enabled registers
are set for linear addressing (the AMR is used to enable circular addressing). Enabling circular addressing
and then calling a C/C++ function without restoring the AMR to a default setting violates the calling
convention. You must be certain that control registers which affect compiler-generated code have a default
value when calling a C/C++ function from assembly.
Assembly language programmers must be aware that the linker assumes B15 contains the stack pointer.
The linker needs to save and restore values on the stack in trampoline code that it generates. If you do
not use B15 as the stack pointer in assembly code, you should use the linker option that disables
trampolines, --trampolines=off. Otherwise, trampolines could corrupt memory and overwrite register
values.
A4 A4 B4 A6
int func2( int a, float b, int c, struct A d, float e, int f, int g);
A4         A4     B4       A6     B6          A8       B8     A10
int func3( int a, double b, float c, long double d);
A4 A4 B5:B4 A6 B7:B6
/* NOTE: The following function has a variable number of arguments */
A4 A4 B4 A6 stack ...
struct A func4( int y);
A3 A4
If the function returns a structure, the caller allocates space for the structure and passes the address of
the return space to the called function in A3. To return a structure, the called function copies the
structure to the memory block pointed to by the extra argument.
In this way, the caller can be smart about telling the called function where to return the structure. For
example, in the statement s = f(x), where s is a structure and f is a function that returns a structure, the
caller can actually make the call as f(&s, x). The function f then copies the return structure directly into
s, performing the assignment automatically.
If the caller does not use the return structure value, an address value of 0 can be passed as the first
argument. This directs the called function not to copy the return structure.
You must properly declare functions that return structures, both at the point where they are
called (so that the extra argument is passed) and at the point where they are defined (so the
function knows to copy the result).
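The structure-return convention described above can be pictured at the source level. This is a minimal sketch under stated assumptions: struct A's members, f, f_lowered, and call_site_demo are hypothetical names, and f_lowered only imitates in C what the compiler arranges through register A3.

```c
#include <assert.h>
#include <stddef.h>

struct A { int lo; int hi; };   /* hypothetical structure */

/* Source-level form: f returns a structure by value. */
static struct A f(int x)
{
    struct A r;
    r.lo = x;
    r.hi = x * 2;
    return r;
}

/* Imitation of the lowered form: the caller passes the address of the
   return space as an extra first argument. A null address means the
   caller ignores the result, so no copy is made. */
static void f_lowered(struct A *ret, int x)
{
    struct A r;
    r.lo = x;
    r.hi = x * 2;
    if (ret != NULL)
        *ret = r;
}

static void call_site_demo(void)
{
    struct A s, t;
    s = f(21);             /* what the programmer writes */
    f_lowered(&t, 21);     /* what the call effectively becomes */
    assert(s.lo == t.lo && s.hi == t.hi);
    f_lowered(NULL, 21);   /* result unused: address 0 is passed */
}
```

The null check in f_lowered corresponds to the case where the caller does not use the returned structure.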
7. Any register numbered A10 to A15 or B10 to B15 that was saved in Step 1 is restored.
8. If A15 was used as a frame pointer (FP), the old value of A15 is restored from the stack. The space
allocated for the function in Step 1 is reclaimed at the end of the function by adding a constant to
register B15 (SP).
9. The function returns by jumping to the value of the return register (B3) or the saved value of the return
register.
A function accesses its stack arguments and local nonregister variables indirectly through register A15
(FP) or through register B15 (SP), one of which points to the top of the stack. Since the stack grows
toward smaller addresses, the local and argument data for a function are accessed with a positive offset
from FP or SP. Local variables, temporary storage, and the area reserved for stack arguments to functions
called by this function are accessed with offsets smaller than the constant subtracted from FP or SP at the
beginning of the function.
Stack arguments passed to this function are accessed with offsets greater than or equal to the constant
subtracted from register FP or SP at the beginning of the function. The compiler attempts to keep register
arguments in their original registers if optimization is used or if they are defined with the register keyword.
Otherwise, the arguments are copied to the stack to free those registers for further allocation.
For information on whether FP or SP is used to access local variables, temporary storage, and stack
arguments, see Section 7.4.2. For more information on the C/C++ System stack, see Section 7.1.2.
extern "C" {
extern int asmfunc(int a); /* declare external as function */
int gvar = 4; /* define global variable */
}
void main()
{
int i = 5;
.global _asmfunc
.global _gvar
_asmfunc:
LDW *+b14(_gvar),A3
NOP 4
ADD a3,a4,a3
STW a3,*b14(_gvar)
MV a3,a4
B b3
NOP 5
In the C++ program in Example 7-1, the extern declaration of asmfunc is optional because the return type
is int. Like C/C++ functions, you need to declare assembly functions only if they return noninteger values
or pass noninteger parameters.
SP Semantics
Note: The stack pointer must always be 8-byte aligned. This is automatically performed by the C
compiler and system initialization code in the run-time-support libraries. Any hand assembly
code that has interrupts enabled or calls a function defined in C or linear assembly source
should also reserve a multiple of 8 bytes on the stack.
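As a small illustration of the multiple-of-8 rule in this note, the amount to reserve can be computed by rounding the needed byte count up. The helper name frame_size and the STACK_ALIGN macro are hypothetical; this is a sketch, not code taken from the run-time-support library.

```c
#include <assert.h>

#define STACK_ALIGN 8u   /* the stack pointer must stay 8-byte aligned */

/* Round a frame size in bytes up to the next multiple of 8.
   (Hypothetical helper; assumes bytes_needed is far below UINT_MAX.) */
static unsigned frame_size(unsigned bytes_needed)
{
    unsigned size = (bytes_needed + (STACK_ALIGN - 1u)) & ~(STACK_ALIGN - 1u);
    assert(size % STACK_ALIGN == 0 && size >= bytes_needed);
    return size;
}
```

A hand-written assembly function that needs 12 bytes of local storage would therefore reserve 16 bytes.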
Stack Allocation
Note: Even though the compiler guarantees a doubleword alignment of the stack and the stack
pointer (SP) points to the next free location in the stack space, there is only enough
guaranteed room to store one 32-bit word at that location. The called function must allocate
space to store the doubleword.
Because you are referencing only the symbol's value as stored in the symbol table, the symbol's declared
type is unimportant. In Example 7-5, int is used. You can reference linker-defined symbols in a similar
manner.
The intrinsics listed in Table 7-3 are included for all C6000 devices. They correspond to the indicated
C6000 assembly language instruction(s). See the TMS320C6000 CPU and Instruction Set Reference
Guide for more information.
See Table 7-4 for the listing of C6400-specific intrinsics. See Table 7-5 for the listing of C6400+- and
C6740-specific intrinsics. See Table 7-6 for the listing of C6700-specific intrinsics.
(1)
See the TMS320C6000 Programmer's Guide for more information.
(2)
See Section 7.5.6 for details on manipulating 8-byte data quantities.
The intrinsics listed in Table 7-4 are included only for C6400 devices. The intrinsics shown correspond to
the indicated C6000 assembly language instruction(s). See the TMS320C6000 CPU and Instruction Set
Reference Guide for more information.
See Table 7-3 for the listing of generic C6000 intrinsics. See Table 7-5 for the listing of C6400+- and
C6740-specific intrinsics. See Table 7-6 for the listing of C6700-specific intrinsics.
(1)
See Section 7.5.6 for details on manipulating 8-byte data quantities.
(2)
See the TMS320C6000 Programmer's Guide for more information.
The intrinsics listed in Table 7-5 are included only for C6400+ and C6740 devices. The intrinsics shown
correspond to the indicated C6000 assembly language instruction(s). See the TMS320C6000 CPU and
Instruction Set Reference Guide for more information.
See Table 7-3 for the listing of generic C6000 intrinsics. See Table 7-4 for the general listing of
C6400-specific intrinsics. See Table 7-6 for the listing of C6700-specific intrinsics.
The intrinsics listed in Table 7-6 are included only for C6700 devices. The intrinsics shown correspond to
the indicated C6000 assembly language instruction(s). See the TMS320C6000 CPU and Instruction Set
Reference Guide for more information.
See Table 7-3 for the listing of generic C6000 intrinsics. See Table 7-4 for the listing of C6400-specific
intrinsics. See Table 7-5 for the listing of C6400+- and C6740-specific intrinsics.
The _disable_interrupts() and _enable_interrupts( ) intrinsics both return an unsigned int that can be
subsequently passed to _restore_interrupts( ) to restore the previous interrupt state. These intrinsics
provide a barrier to optimization and are therefore appropriate for implementing a critical (or atomic)
section. For example,
unsigned int restore_value;
restore_value = _disable_interrupts();
if (sem) sem--;
_restore_interrupts(restore_value);
The example code disables interrupts so that the value of sem read for the conditional clause does not
change before the modification of sem in the then clause. The intrinsics are barriers to optimization, so the
memory reads and writes of sem do not cross the _disable_interrupts or _restore_interrupts locations.
Overwrites CSR
Note: The _restore_interrupts( ) intrinsic overwrites the CSR control register with the value in the
argument. Any CSR bits changed since the _disable_interrupts( ) intrinsic or
_enable_interrupts( ) intrinsic will be lost.
On C6400+ and C6740, the _restore_interrupts( ) intrinsic does not use the RINT instruction.
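The save-and-restore pattern can be tried on a host machine with stand-ins for the intrinsics. In this sketch, mock_csr, mock_disable_interrupts, mock_restore_interrupts, and sem_try_take are hypothetical names; the mock models only the global interrupt enable bit (bit 0 of the CSR) and, unlike the real intrinsics, provides no optimization barrier.

```c
#include <assert.h>

#define MOCK_GIE 0x1u   /* global interrupt enable, bit 0 of the CSR */

static unsigned mock_csr = MOCK_GIE;   /* stand-in for the CSR control register */

/* Stand-in for _disable_interrupts(): clear GIE, return the previous CSR. */
static unsigned mock_disable_interrupts(void)
{
    unsigned previous = mock_csr;
    mock_csr &= ~MOCK_GIE;
    return previous;
}

/* Stand-in for _restore_interrupts(): overwrite the whole CSR. */
static void mock_restore_interrupts(unsigned saved)
{
    mock_csr = saved;
}

/* Take one unit from a counting semaphore inside a critical section.
   Returns 1 on success, 0 if the semaphore was already empty. */
static int sem_try_take(volatile int *sem)
{
    unsigned saved;
    int taken = 0;

    assert(sem != 0);
    saved = mock_disable_interrupts();
    if (*sem) {
        (*sem)--;
        taken = 1;
    }
    mock_restore_interrupts(saved);
    return taken;
}
```

Because the restore step writes back the saved value rather than unconditionally enabling interrupts, critical sections written this way can nest: restoring the inner value leaves interrupts disabled until the outer value is restored.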
*high = _hi(d);
*low = _lo(d);
}
7.5.7 Using MUST_ITERATE and _nassert to Enable SIMD and Expand Compiler
Knowledge of Loops
Through the use of MUST_ITERATE and _nassert, you can guarantee that a loop executes a certain
number of times.
This example tells the compiler that the loop is guaranteed to run exactly 10 times:
#pragma MUST_ITERATE(10,10);
for (i = 0; i < trip_count; i++) { ...
MUST_ITERATE can also be used to specify a range for the trip count as well as a factor of the trip count.
For example:
#pragma MUST_ITERATE(8,48,8);
for (i = 0; i < trip; i++) { ...
This example tells the compiler that the loop executes between 8 and 48 times and that the trip variable is
a multiple of 8 (8, 16, 24, 32, 40, 48). The compiler can now use all this information to generate the best
loop possible by unrolling better even when the --interrupt_threshold=n option is used to specify that
interrupts do occur every n cycles.
The TMS320C6000 Programmer's Guide states that one of the ways to refine C/C++ code is to use word
accesses to operate on 16-bit data stored in the high and low parts of a 32-bit register. Examples using
casts to int pointers are shown with the use of intrinsics to use certain instructions like _mpyh. This can be
automated by using the _nassert() intrinsic to specify that 16-bit short arrays are aligned on a 32-bit
(word) boundary.
The following two examples generate the same assembly code:
• Example 1
int dot_product(short *x, short *y, short z)
{
int *w_x = (int *)x;
int *w_y = (int *)y;
int sum1 = 0, sum2 = 0, i;
for (i = 0; i < z/2; i++)
{
sum1 += _mpy(w_x[i], w_y[i]);
sum2 += _mpyh(w_x[i], w_y[i]);
}
return (sum1 + sum2);
}
• Example 2
int dot_product(short *x, short *y, short z)
{
int sum = 0, i;
The following subsections describe methods you can use to ensure the data referenced by ptr is aligned.
You have to employ one of these methods at every place in your code where f() is called.
...
f(buffer);
When compiling for C6400, C6400+, and C6740 devices, such an array is automatically aligned to an
8-byte boundary. When compiling for C6200 or C6700, such an array is automatically aligned to a 4-byte
boundary, or, if the base type requires it, an 8-byte boundary. This is true whether the array is global,
static, or local. This automatic alignment is all that is required to achieve SIMD optimization on those
respective devices. You still need to include the _nassert because, in the general case, the compiler
cannot guarantee that ptr holds the address of a properly aligned array.
If you always pass the base address of an array to pointers like ptr, then you can use the following macro
to reflect that fact.
#if defined(_TMS320C6400)
#define ALIGNED_ARRAY(ptr) _nassert((int) ptr % 8 == 0)
#elif defined(_TMS320C6200) || defined(_TMS320C6700)
#define ALIGNED_ARRAY(ptr) _nassert((int) ptr % 4 == 0)
#else
#define ALIGNED_ARRAY(ptr) /* empty */
#endif
The macro works regardless of which C6x device you build for, or if you port the code to another target.
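The alignment assertion and the trip-count pragma can be combined in code that also builds on a host compiler for testing. The following is a sketch under stated assumptions: the ASSUME macro and the function name dot_product_aligned are hypothetical, and the code assumes that the TI compiler predefines __TI_COMPILER_VERSION__. On a host build the claim is checked with assert instead of being passed to the optimizer.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__TI_COMPILER_VERSION__)
#define ASSUME(cond) _nassert(cond)   /* optimizer hint on the TI compiler */
#else
#define ASSUME(cond) assert(cond)     /* host build: check the claim instead */
#endif

/* Dot product of two short arrays that the caller promises are word aligned.
   n is promised to be at least 4 and a multiple of 4. */
static int dot_product_aligned(const short *x, const short *y, int n)
{
    int sum = 0, i;

    ASSUME(((size_t)x & 0x3) == 0);
    ASSUME(((size_t)y & 0x3) == 0);
#if defined(__TI_COMPILER_VERSION__)
#pragma MUST_ITERATE(4, , 4)
#endif
    for (i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

Checking the promise on the host catches callers that pass an unaligned address before the code reaches the target, where a false _nassert would silently produce wrong results.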
This code passes an unaligned address to ptr, thus violating the presumption coded in the _nassert().
There is no direct remedy for this case. Avoid this practice whenever possible.
If you are using BIOS memory allocation routines, be sure to pass the alignment factor as the last
argument using the syntax that follows:
buffer = MEM_alloc(segid, 100 * sizeof(short), 8);
See the TMS320C6000 DSP/BIOS Help for more information about BIOS memory allocation routines and
the segid parameter in particular.
struct s
{
...
short buf1[50];
...
} g;
...
f(g.buf1);
class c
{
public :
short buf1[50];
void mfunc(void);
...
};
void c::mfunc()
{
f(buf1);
...
}
The most straightforward way to align an array in a structure or class is to declare, right before the array,
a scalar that requires the desired alignment. So, if you want 8-byte alignment, use a long or double. If you
want 4-byte alignment, use an int or float. For example:
struct s
{
long not_used; /* 8-byte aligned */
short buffer[50]; /* also 8-byte aligned */
...
};
If you want to declare several arrays contiguously, and maintain a given alignment, you can do so by
keeping the array size, measured in bytes, an exact multiple of the desired alignment. For example:
struct s
{
long not_used; /* 8-byte aligned */
short buf1[50]; /* also 8-byte aligned */
short buf2[50]; /* 4-byte aligned */
...
};
Because the size of buf1 is 50 * 2 bytes per short = 100 bytes, and 100 is a multiple of 4 but not of 8,
buf2 is only aligned on a 4-byte boundary. Padding buf1 out to 52 elements makes buf2 8-byte aligned.
Within a structure or class, there is no way to enforce an array alignment greater than 8. For the purposes
of SIMD optimization, this is not necessary.
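The effect of padding buf1 can be checked with offsetof. This sketch uses double as the leading 8-byte scalar so that it behaves the same on a host compiler, where the size of long differs from the C6000; the structure names, offset_mod8, and check_assumptions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct unpadded {
    double not_used;   /* 8-byte scalar placed first */
    short  buf1[50];   /* 100 bytes: a multiple of 4 but not of 8 */
    short  buf2[50];
};

struct padded {
    double not_used;
    short  buf1[52];   /* 104 bytes: a multiple of 8 */
    short  buf2[50];
};

/* The arithmetic below assumes these sizes. */
static void check_assumptions(void)
{
    assert(sizeof(double) == 8);
    assert(sizeof(short) == 2);
}

/* Offset of a member modulo 8. A result of 0 means the member is 8-byte
   aligned whenever the structure itself is 8-byte aligned. */
static size_t offset_mod8(size_t offset)
{
    return offset % 8;
}
```

Under those size assumptions, offsetof(struct unpadded, buf2) is 108 (remainder 4), while offsetof(struct padded, buf2) is 112 (remainder 0).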
If a C/C++ interrupt routine does not call any other functions, only those registers that the interrupt handler
attempts to define are saved and restored. However, if a C/C++ interrupt routine does call other functions,
these functions can modify unknown registers that the interrupt handler does not use. For this reason, the
routine saves all usable registers if any other functions are called. Interrupts branch to the interrupt return
pointer (IRP). Do not call interrupt handling functions directly.
Interrupts can be handled directly with C/C++ functions by using the interrupt pragma or the interrupt
keyword. For more information, see Section 6.8.14 and Section 6.4.3.
You are responsible for handling the AMR control register and the SAT bit in the CSR correctly inside an
interrupt. By default, the compiler does not do anything extra to save/restore the AMR and the SAT bit.
Macros for handling the SAT bit and the AMR register are included in the c6x.h header file.
For example, you are using circular addressing in some hand assembly code (that is, the AMR does not
equal 0). This hand assembly code can be interrupted into a C code interrupt service routine. The C code
interrupt service routine assumes that the AMR is set to 0. You need to define a local unsigned int
temporary variable and call the SAVE_AMR and RESTORE_AMR macros at the beginning and end of
your C interrupt service routine to correctly save/restore the AMR inside the C interrupt service routine.
Example 7-11. AMR and SAT Handling
#include <c6x.h>
/* restore the AMR for your hand assembly code before exiting */
RESTORE_AMR(temp_amr);
}
If you need to save/restore the SAT bit (i.e. you were performing saturated arithmetic when interrupted
into the C interrupt service routine which may also perform some saturated arithmetic) in your C interrupt
service routine, it can be done in a similar way as the above example using the SAVE_SAT and
RESTORE_SAT macros.
For C6400+ and C6740, the compiler saves and restores the ILC and RILC control registers if needed.
Initializing Variables
Note: In ANSI/ISO C, global and static variables that are not explicitly initialized must be set to 0
before program execution. The C/C++ compiler does not perform any preinitialization of
uninitialized variables. Explicitly initialize any variable that must have an initial value of 0.
The easiest method is to set a fill value of 0 in the linker control map for the .bss section.
You cannot use these methods with code that is burned into ROM.
Global variables are either autoinitialized at run time or at load time. For information, see Section 7.8.4
and Section 7.8.5. Also see Section 6.10.
int x;
short i = 23;
int *p = &x;
int a[5] = {1,2,3,4,5};
.global _x
.bss _x,4,4
.sect ".cinit:c"
.align 8
.field (CIR - $) - 8, 32
.field _i+0,32
.field 23,16 ; _i @ 0
.sect ".text"
.global _i
_i: .usect ".bss:c",2,2
.sect ".cinit:c"
.align 4
.field _x,32 ; _p @ 0
.sect ".text"
.global _p
_p: .usect ".bss:c",4,4
Example 7-13. Initialized Information for Variables Defined in Example 7-12 (continued)
.sect ".cinit"
.align 8
.field IR_1,32
.field _a+0,32
.field 1,32 ; _a[0] @ 0
.field 2,32 ; _a[1] @ 32
.field 3,32 ; _a[2] @ 64
.field 4,32 ; _a[3] @ 96
.field 5,32 ; _a[4] @ 128
IR_1: .set 20
.sect ".text"
.global _a
.bss _a,20,4
;**********************************************************************
;* MARK THE END OF THE SCALAR INIT RECORD IN CINIT:C *
;**********************************************************************
The .cinit section must contain only initialization tables in this format. When interfacing assembly language
modules, do not use the .cinit section for any other purpose.
The table in the .pinit section simply consists of a list of addresses of constructors to be called (see
Figure 7-11). The constructors appear in the table after the .cinit initialization.
Address of constructor 1
Address of constructor 2
Address of constructor 3
•
•
•
Address of constructor n
When you use the --rom_model or --ram_model option, the linker combines the .cinit sections from all the
C modules and appends a null word to the end of the composite .cinit section. This terminating record
appears as a record with a size field of 0 and marks the end of the initialization tables.
Likewise, the --rom_model or --ram_model link option causes the linker to combine all of the .pinit sections
from all C/C++ modules and append a null word to the end of the composite .pinit section. The boot
routine knows the end of the global constructor table when it encounters a null constructor address.
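The two termination rules above (a record with a size of 0 for .cinit, a null address for .pinit) can be sketched as loops in C. This is a simplified model, not the boot routine's source: struct init_record keeps a pointer to its data instead of packing the data inline as the real .cinit records do, and all names (run_cinit, run_pinit, ctor_one, ctor_two, ctor_calls) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for one .cinit record. */
struct init_record {
    size_t      size;   /* bytes of initialization data; 0 ends the table */
    void       *dest;   /* address of the variable to initialize */
    const void *data;   /* the initialization data */
};

typedef void (*ctor_fn)(void);

/* Copy every record into its variable and stop at the zero-size
   terminator. Returns the number of records processed. */
static size_t run_cinit(const struct init_record *rec)
{
    size_t count = 0;
    assert(rec != NULL);
    for (; rec->size != 0; rec++) {
        memcpy(rec->dest, rec->data, rec->size);
        count++;
    }
    return count;
}

/* Call each constructor in table order and stop at the null address.
   Returns the number of constructors called. */
static size_t run_pinit(const ctor_fn *table)
{
    size_t count = 0;
    assert(table != NULL);
    for (; *table != NULL; table++) {
        (*table)();
        count++;
    }
    return count;
}

/* Demonstration constructors that record that they ran. */
static int ctor_calls;
static void ctor_one(void) { ctor_calls += 1; }
static void ctor_two(void) { ctor_calls += 10; }
```

In this model the variables are initialized first and the constructors run afterward, matching the order in which the text describes the two tables.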
The const-qualified variables are initialized differently; see Section 6.4.1.
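[Figure: Autoinitialization at run time. The loader places the .cinit section (initialization tables) in memory (EXT_MEM); at startup, the boot routine copies the data into the .bss section (D_MEM).]
[Figure: Initialization at load time. The loader reads the .cinit section and initializes the .bss section directly.]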
Regardless of the use of the --rom_model or --ram_model options, the .pinit section is always loaded and
processed at run time.
Some of the tasks that a C/C++ program performs (such as I/O, dynamic memory allocation, string
operations, and trigonometric functions) are not part of the C/C++ language itself. However, the ANSI/ISO
C standard defines a set of run-time-support functions that perform these tasks. The C/C++ compiler
implements the complete ISO standard library except for those facilities that handle locale issues
(properties that depend on local language, nationality, or culture). Using the ANSI/ISO standard library
ensures a consistent set of functions that provide for greater portability.
In addition to the ANSI/ISO-specified functions, the run-time-support library includes routines that give you
processor-specific commands and direct C language I/O requests. These are detailed in Section 8.1 and
Section 8.2.
A library-build process is provided with the code generation tools that lets you create customized
run-time-support libraries. This process is described in Section 8.5.
SPRU187O – May 2008 Using Run-Time-Support Functions and Building Libraries 189
C and C++ Run-Time Support Libraries www.ti.com
TI does not provide documentation that covers the functionality of the C++ library. We suggest referring to
one of the following sources:
• The Standard C++ Library: A Tutorial and Reference, Nicolai M. Josuttis, Addison-Wesley, ISBN
0-201-37926-0
• The C++ Programming Language (Third or Special Editions), Bjarne Stroustrup, Addison-Wesley,
ISBN 0-201-88954-4 or 0-201-70073-5
• Dinkumware's online reference at http://dinkumware.com/manuals
www.ti.com The C I/O Functions
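#include <stdio.h>
int main(void)
{
    FILE *fid;
    fid = fopen("myfile", "w");
    if (fid == NULL)
        return 1;   /* the file could not be opened */
    fprintf(fid, "Hello, world\n");
    fclose(fid);
    return 0;
}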
Issuing the following compiler command compiles, links, and creates the file main.out from the
run-time-support library:
cl6x main.c --run_linker --heap_size=400 --library=rts6200.lib --output_file=main.out
The first three streams in the stream table are predefined to be stdin, stdout, and stderr and they point to
the host device and associated device drivers.
At the next level are the user-definable device-level drivers. They map directly to the low-level I/O
functions. The run-time-support library includes the device drivers necessary to perform I/O on the host on
which the debugger is running.
The specifications for writing device-level routines to interface with the low-level routines follow. Each
function must set up and maintain its own data structures as needed. Some function definitions perform no
action and should just return.
www.ti.com add_device — Add Device to Device Table
close — Close File or Device for I/O www.ti.com
www.ti.com open — Open File or Device for I/O
read — Read Characters from Buffer www.ti.com
www.ti.com unlink — Delete File
write — Write Characters to Buffer www.ti.com
2. Use the low-level function add_device() to add your device to the device_table. The device table is a
statically defined array that supports n devices, where n is defined by the macro _NDEVICE found in
stdio.h/cstdio. The structure representing a device is also defined in stdio.h/cstdio and is composed of
the following fields:
name               String for device name
flags              Flags that specify whether the device supports multiple streams or not
function pointers  Pointers to the device-level functions:
                   • CLOSE   • LSEEK   • OPEN   • READ
                   • RENAME  • UNLINK  • WRITE
The first entry in the device table is predefined to be the host device on which the debugger is running.
The low-level routine add_device() finds the first empty position in the device table and initializes the
device fields with the passed-in arguments. For a complete description, see the add_device function.
3. Once the device is added, call fopen() to open a stream and associate it with that device. Use
devicename:filename as the first argument to fopen().
Example 8-1 illustrates adding and using a device for C I/O:
Example 8-1. Program for C I/O Device
#include <stdio.h>
/****************************************************************************/
/* Declarations of the user-defined device drivers */
/****************************************************************************/
extern int my_open(const char *path, unsigned flags, int fno);
extern int my_close(int fno);
extern int my_read(int fno, char *buffer, unsigned count);
extern int my_write(int fno, const char *buffer, unsigned count);
extern long my_lseek(int fno, long offset, int origin);
extern int my_unlink(const char *path);
extern int my_rename(const char *old_name, char *new_name);
main()
{
FILE *fid;
add_device("mydevice", _MSA, my_open, my_close, my_read, my_write, my_lseek,
my_unlink, my_rename);
fid = fopen("mydevice:test","w");
fprintf(fid,"Hello, world\n");
fclose(fid);
}
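The table behavior described in the steps above (add a device to the first empty position, then select it with a devicename:filename path) can be modeled with a small mock. This sketch does not reproduce the run-time-support library's internals: MOCK_NDEVICE, struct mock_device, mock_add_device, mock_find_device, and demo_open are hypothetical, and the mock stores only an open function rather than the full set of device-level functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_NDEVICE 3   /* hypothetical table size for this model */

typedef int (*open_fn_t)(const char *path, unsigned flags, int fno);

struct mock_device {
    char      name[9];   /* empty string marks a free slot */
    unsigned  flags;
    open_fn_t open_fn;
};

static struct mock_device mock_table[MOCK_NDEVICE];

/* Place the device in the first empty slot.
   Returns the slot index, or -1 if the name is unusable or the table is full. */
static int mock_add_device(const char *name, unsigned flags, open_fn_t open_fn)
{
    int slot;

    assert(name != NULL);
    if (name[0] == '\0' || strlen(name) >= sizeof mock_table[0].name)
        return -1;
    for (slot = 0; slot < MOCK_NDEVICE; slot++) {
        if (mock_table[slot].name[0] == '\0') {
            strcpy(mock_table[slot].name, name);
            mock_table[slot].flags = flags;
            mock_table[slot].open_fn = open_fn;
            return slot;
        }
    }
    return -1;
}

/* Resolve the device part of "devicename:filename".
   Returns the slot index, or -1 if there is no matching device. */
static int mock_find_device(const char *path)
{
    const char *colon;
    size_t len;
    int slot;

    assert(path != NULL);
    colon = strchr(path, ':');
    if (colon == NULL)
        return -1;
    len = (size_t)(colon - path);
    for (slot = 0; slot < MOCK_NDEVICE; slot++) {
        const char *n = mock_table[slot].name;
        if (n[0] != '\0' && strlen(n) == len && strncmp(n, path, len) == 0)
            return slot;
    }
    return -1;
}

/* Placeholder device-level open routine for the demonstration. */
static int demo_open(const char *path, unsigned flags, int fno)
{
    (void)path;
    (void)flags;
    return fno;
}
```

In this model the first registration plays the role of the predefined host device, and a later registration such as "mydevice" lands in the next free slot.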
www.ti.com Handling Reentrancy (_register_lock() and _register_unlock() Functions)
The arguments to _register_lock() and _register_unlock() should be functions which take no arguments
and return no values, and which implement some sort of global semaphore locking:
extern volatile sig_atomic_t *sema = SHARED_SEMAPHORE_LOCATION;
static int sema_depth = 0;
static void my_lock(void)
{
while (ATOMIC_TEST_AND_SET(sema, MY_UNIQUE_ID) != MY_UNIQUE_ID);
sema_depth++;
}
static void my_unlock(void)
{
if (!--sema_depth) ATOMIC_CLEAR(sema);
}
The run-time-support nests calls to _lock(), so the primitives must keep track of the nesting level.
Library-Build Process www.ti.com
If you are using Code Composer Studio, include the C6700 FastMath library in your project, and ensure it
appears before the standard run-time-support library in the Link Order tab in the Build Options dialog box.
For details, refer to the TMS320C67x FastRTS Library Programmer's Reference.
For information on the C6700 FastMath source library, fastmathc67x.src, see Section 8.4.
Chapter 9
C++ Name Demangler
The C++ compiler implements function overloading, operator overloading, and type-safe linking by
encoding a function's signature in its link-level name. The process of encoding the signature into the
linkname is often referred to as name mangling. When you inspect mangled names, such as in assembly
files or linker output, it can be difficult to associate a mangled name with its corresponding name in the
C++ source code. The C++ name demangler is a debugging aid that translates each mangled name it
detects to its original name found in the C++ source code.
These topics tell you how to invoke and use the C++ name demangler. The C++ name demangler reads
in input, looking for mangled names. All unmangled text is copied to output unaltered. All mangled names
are demangled before being copied to output.
By default, the C++ name demangler outputs to standard out. You can use the -o file option if you want to
output to a file.
Example 9-1. C++ Code for calories_in_a_banana
class banana {
public:
    int calories(void);
    banana();
    ~banana();
};
int calories_in_a_banana(void)
{
    banana x;
    return x.calories();
}
Example 9-2. Resulting Assembly for calories_in_a_banana
_calories_in_a_banana__Fv:
;** ----------------------------------------------------------------------*
CALL .S1 ___ct__6bananaFv ; |10|
STW .D2T2 B3,*SP--(16) ; |9|
MVKL .S2 RL0,B3 ; |10|
MVKH .S2 RL0,B3 ; |10|
ADD .S1X 8,SP,A4 ; |10|
NOP 1
RL0: ; CALL OCCURS ; |10|
CALL .S1 _calories__6bananaFv ; |12|
MVKL .S2 RL1,B3 ; |12|
ADD .S1X 8,SP,A4 ; |12|
MVKH .S2 RL1,B3 ; |12|
NOP 2
RL1: ; CALL OCCURS ; |12|
CALL .S1 ___dt__6bananaFv ; |13|
STW .D2T1 A4,*+SP(4) ; |12|
ADD .S1X 8,SP,A4 ; |13|
MVKL .S2 RL2,B3 ; |13|
MVK .S2 0x2,B4 ; |13|
MVKH .S2 RL2,B3 ; |13|
RL2: ; CALL OCCURS ; |13|
LDW .D2T1 *+SP(4),A4 ; |12|
LDW .D2T2 *++SP(16),B3 ; |13|
NOP 4
RET .S2 B3 ; |13|
NOP 5
; BRANCH OCCURS ; |13|
Executing the C++ name demangler demangles all names that it believes to be mangled. If you enter:
dem6x calories_in_a_banana.asm
the result is shown in Example 9-3. The linknames in Example 9-2 (___ct__6bananaFv,
_calories__6bananaFv, and ___dt__6bananaFv) are demangled.
Example 9-3. Result After Running the C++ Name Demangler
calories_in_a_banana():
;** ----------------------------------------------------------------------*
CALL .S1 banana::banana() ; |10|
STW .D2T2 B3,*SP--(16) ; |9|
MVKL .S2 RL0,B3 ; |10|
MVKH .S2 RL0,B3 ; |10|
ADD .S1X 8,SP,A4 ; |10|
NOP 1
RL0: ; CALL OCCURS ; |10|
CALL .S1 banana::calories() ; |12|
MVKL .S2 RL1,B3 ; |12|
ADD .S1X 8,SP,A4 ; |12|
MVKH .S2 RL1,B3 ; |12|
NOP 2
RL1: ; CALL OCCURS ; |12|
CALL .S1 banana::~banana() ; |13|
STW .D2T1 A4,*+SP(4) ; |12|
ADD .S1X 8,SP,A4 ; |13|
MVKL .S2 RL2,B3 ; |13|
MVK .S2 0x2,B4 ; |13|
MVKH .S2 RL2,B3 ; |13|
RL2: ; CALL OCCURS ; |13|
LDW .D2T1 *+SP(4),A4 ; |12|
LDW .D2T2 *++SP(16),B3 ; |13|
NOP 4
RET .S2 B3 ; |13|
NOP 5
; BRANCH OCCURS ; |13|
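To make the mapping between Example 9-2 and Example 9-3 concrete, the following is a minimal sketch of a decoder that handles only the name shapes seen in those examples: an optional length-prefixed class name, the special names __ct and __dt, and the argument list Fv (no arguments). It is an illustration written for this purpose and is not the algorithm that dem6x uses; the function name demangle_simple and the buffer sizes are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 and fills out on success, 0 if the name is not recognized. */
static int demangle_simple(const char *mangled, char *out, size_t out_size)
{
    const char *name, *sep, *p;
    char cls[64] = "";
    char func[64];
    size_t name_len;

    if (mangled[0] != '_' || mangled[1] == '\0')
        return 0;
    name = mangled + 1;                /* drop the leading linkname underscore */

    /* Find the "__" that ends the function name. Searching from name + 1
       skips the leading "__" of special names such as __ct and __dt. */
    sep = strstr(name + 1, "__");
    if (sep == NULL)
        return 0;
    name_len = (size_t)(sep - name);
    if (name_len >= sizeof func)
        return 0;
    memcpy(func, name, name_len);
    func[name_len] = '\0';

    p = sep + 2;
    if (isdigit((unsigned char)*p)) {  /* <length><class name> */
        char *end;
        unsigned long n = strtoul(p, &end, 10);
        if (n == 0 || n >= sizeof cls || strlen(end) < n)
            return 0;
        memcpy(cls, end, n);
        cls[n] = '\0';
        p = end + n;
    }
    if (strcmp(p, "Fv") != 0)          /* only "function with no arguments" */
        return 0;

    if (cls[0] == '\0')
        snprintf(out, out_size, "%s()", func);
    else if (strcmp(func, "__ct") == 0)
        snprintf(out, out_size, "%s::%s()", cls, cls);    /* constructor */
    else if (strcmp(func, "__dt") == 0)
        snprintf(out, out_size, "%s::~%s()", cls, cls);   /* destructor */
    else
        snprintf(out, out_size, "%s::%s()", cls, func);
    return 1;
}
```

For instance, passing ___dt__6bananaFv produces banana::~banana(), while a label such as RL0 is rejected and would be copied through unchanged by a caller, mirroring how the demangler leaves unmangled text unaltered.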
Glossary
absolute lister— A debugging tool that allows you to create assembler listings that contain absolute
addresses.
alias disambiguation— A technique that determines when two pointer expressions cannot point to the
same location, allowing the compiler to freely optimize such expressions.
aliasing— The ability for a single object to be accessed in more than one way, such as when two pointers
point to a single object. It can disrupt optimization, because any indirect reference could refer to
any other object.
allocation— A process in which the linker calculates the final memory addresses of output sections.
ANSI— American National Standards Institute; an organization that establishes standards voluntarily
followed by industries.
archive library— A collection of individual files grouped into a single file by the archiver.
archiver— A software program that collects several individual files into a single file called an archive
library. With the archiver, you can add, delete, extract, or replace members of the archive library.
assembler— A software program that creates a machine-language program from a source file that
contains assembly language instructions, directives, and macro definitions. The assembler
substitutes absolute operation codes for symbolic operation codes and absolute or relocatable
addresses for symbolic addresses.
assembly optimizer— A software program that optimizes linear assembly code, which is assembly code
that has not been register-allocated or scheduled. The assembly optimizer is automatically invoked
with the compiler program, cl6x, when one of the input files has a .sa extension.
assignment statement— A statement that initializes a variable with a value.
autoinitialization— The process of initializing global C variables (contained in the .cinit section) before
program execution begins.
autoinitialization at run time— An autoinitialization method used by the linker when linking C code. The
linker uses this method when you invoke it with the --rom_model link option. The linker loads the
.cinit section of data tables into memory, and variables are initialized at run time.
big endian— An addressing protocol in which bytes are numbered from left to right within a word. More
significant bytes in a word have lower numbered addresses. Endian ordering is hardware-specific
and is determined at reset. See also little endian.
block— A set of statements that are grouped together within braces and treated as an entity.
.bss section— One of the default object file sections. You use the assembler .bss directive to reserve a
specified amount of space in the memory map that you can use later for storing data. The .bss
section is uninitialized.
byte— Per ANSI/ISO C, the smallest addressable unit that can hold a character.
C/C++ compiler— A software program that translates C source statements into assembly language
source statements.
code generator— A compiler tool that takes the file produced by the parser or the optimizer and produces
an assembly language source file.
COFF— Common object file format; a system of object files configured according to a standard developed
by AT&T. These files are relocatable in memory space.
command file— A file that contains options, filenames, directives, or commands for the linker or hex
conversion utility.
comment— A source statement (or portion of a source statement) that documents or improves readability
of a source file. Comments are not compiled, assembled, or linked; they have no effect on the
object file.
compiler program— A utility that lets you compile, assemble, and optionally link in one step. The
compiler runs one or more source modules through the compiler (including the parser, optimizer,
and code generator), the assembler, and the linker.
compression— The assembler process of converting 32-bit instructions into 16-bit instructions (C6400+
and C6740 only). Depending on the --opt_for_space level, the compiler selects and tailors certain
instructions so that the assembler can convert them to 16-bit instructions. Compression can be
turned off with the --no_compress option.
configured memory— Memory that the linker has specified for allocation.
constant— A type whose value cannot change.
cross-reference listing— An output file created by the assembler that lists the symbols that were defined,
what line they were defined on, which lines referenced them, and their final values.
.data section— One of the default object file sections. The .data section is an initialized section that
contains initialized data. You can use the .data directive to assemble code into the .data section.
direct call— A function call where one function calls another using the function's name.
directives— Special-purpose commands that control the actions and functions of a software tool (as
opposed to assembly language instructions, which control the actions of a device).
disambiguation— See alias disambiguation.
dynamic memory allocation— A technique used by several functions (such as malloc, calloc, and
realloc) to dynamically allocate memory for variables at run time. This is accomplished by defining a
large memory pool (heap) and using the functions to allocate memory from the heap.
ELF— Executable and linking format; a system of object files configured according to the System V
Application Binary Interface specification.
emulator— A hardware development system that duplicates the TMS320C6000 operation.
entry point— A point in target memory where execution starts.
environment variable— A system symbol that you define and assign to a string. Environment variables
are often included in Windows batch files or UNIX shell scripts such as .cshrc or .profile.
epilog— The portion of code in a function that restores the stack and returns. See also pipelined-loop
epilog.
executable module— A linked object file that can be executed in a target system.
expression— A constant, a symbol, or a series of constants and symbols separated by arithmetic
operators.
external symbol— A symbol that is used in the current program module but defined or declared in a
different program module.
little endian— An addressing protocol in which bytes are numbered from right to left within a word. More
significant bytes in a word have higher numbered addresses. Endian ordering is hardware-specific
and is determined at reset. See also big endian.
live in— A value that is defined before a procedure and used as an input to that procedure.
live out— A value that is defined within a procedure and used as an output from that procedure.
loader— A device that places an executable module into system memory.
loop unrolling— An optimization that expands small loops so that each iteration of the loop appears in
your code. Although loop unrolling increases code size, it can improve the performance of your
code.
macro— A user-defined routine that can be used as an instruction.
macro call— The process of invoking a macro.
macro definition— A block of source statements that define the name and the code that make up a
macro.
macro expansion— The process of inserting source statements into your code in place of a macro call.
map file— An output file, created by the linker, that shows the memory configuration, section composition,
section allocation, symbol definitions and the addresses at which the symbols were defined for your
program.
memory map— A map of target system memory space that is partitioned into functional blocks.
name mangling— A compiler-specific feature that encodes a function name with information regarding the
function's arguments and return types.
object file— An assembled or linked file that contains machine-language object code.
object library— An archive library made up of individual object files.
object module— A linked, executable object file that can be downloaded and executed on a target
system.
operand— An argument of an assembly language instruction, assembler directive, or macro directive that
supplies information to the operation performed by the instruction or directive.
optimizer— A software tool that improves the execution speed and reduces the size of C programs. See
also assembly optimizer.
options— Command-line parameters that allow you to request additional or specific functions when you
invoke a software tool.
output module— A linked, executable object file that is downloaded and executed on a target system.
output section— A final, allocated section in a linked, executable module.
parser— A software tool that reads the source file, performs preprocessing functions, checks the syntax,
and produces an intermediate file used as input for the optimizer or code generator.
partitioning— The process of assigning a data path to each instruction.
pipelined-loop epilog— The portion of code that drains a pipeline in a software-pipelined loop. See also
epilog.
pipelined-loop prolog— The portion of code that primes the pipeline in a software-pipelined loop. See
also prolog.
pipelining— A technique where a second instruction begins executing before the first instruction has
been completed. You can have several instructions in the pipeline, each at a different processing
stage.
subsection— A relocatable block of code or data that ultimately will occupy continuous space in the
memory map. Subsections are smaller sections within larger sections. Subsections give you tighter
control of the memory map.
symbol— A string of alphanumeric characters that represents an address or a value.
symbol table— A portion of a COFF object file that contains information about the symbols that are
defined and used by the file.
symbolic debugging— The ability of a software tool to retain symbolic information that can be used by a
debugging tool such as a simulator or an emulator.
target system— The system on which the object code you have developed is executed.
.text section— One of the default object file sections. The .text section is initialized and contains
executable code. You can use the .text directive to assemble code into the .text section.
trigraph sequence— A 3-character sequence that has a meaning (as defined by the ISO 646-1983
Invariant Code Set). These characters cannot be represented in the C character set and are
expanded to one character. For example, the trigraph ??' is expanded to ^.
trip count— The number of times that a loop executes before it terminates.
unconfigured memory— Memory that is not defined as part of the memory map and cannot be loaded
with code or data.
uninitialized section— An object file section that reserves space in the memory map but that has no actual
contents. These sections are built with the .bss and .usect directives.
unsigned value— A value that is treated as a nonnegative number, regardless of its actual sign.
variable— A symbol representing a quantity that can assume any of a set of values.
veneer— A sequence of instructions that serves as an alternate entry point into a routine if a state
change is required.
word— A 32-bit addressable location in target memory.