Transitioning to Cortex-M3 based MCUs

Paul Kimelman - CTO


Luminary Micro, Inc
Contents
ƒ Introduction
ƒ C/C++ with no assembly code needed?
ƒ Performance and code size – what you should know
ƒ Interrupts and style of application
ƒ High integration to save BOM cost
ƒ Use of special instructions – from C/C++

Confidential 2
Introduction
ƒ I assume you have heard Shyam’s presentation
ƒ Focus on transition vs. porting
ƒ Porting is mostly about making it work
ƒ Transitioning covers:
ƒ Change in peripheral interface
ƒ Change in application approach
ƒ Porting issues for performance, size, behavior
ƒ Will cover issues coming from 8-bit/16-bit, ARM7, and ARM9
ƒ Time to unlearn bad habits forced on you
ƒ Do not be fooled by MHz or “MIPS”
ƒ Application style to best fit the needs
ƒ More integrated in HW and more done in SW
ƒ Focus on code/data size, performance, BOM cost, power

C/C++ vs. Assembly
ƒ Why do you normally end up having to write assembly?
ƒ Vector table (needs “call” or “mov”, etc)
ƒ Interrupt entry/exit stubs
ƒ Keyword usually will not support priority nesting
ƒ Compiled code too big and/or slow – parts of application must be hand coded
ƒ Specialized features, not compiler friendly
ƒ Initialization code – unless acceptable one in C runtime lib
ƒ So, why not with Cortex-M3?
ƒ Vector table is C array of pointers. 1st entry is Stack pointer.
ƒ All ISRs are normal C functions with no special keyword, even Reset
ƒ Priority nesting supported in all cases (including faults, system handling)
ƒ Instruction set is compiler friendly.
ƒ Compilers can detect cases of special instructions
ƒ e.g. REVerse and REV16
ƒ ((x & 0x00ff) << 8) | ((x & 0xff00) >> 8)
ƒ Initialization is C function (ResetISR) with stack already setup by HW
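
The "vector table is a C array of pointers" point can be sketched as follows. This is a host-compilable illustration, not a linker-ready startup file: the names vector_entry_t, vector_table, stack, SysTickISR, DefaultISR and dispatch are hypothetical, and on a real part the array would also need toolchain-specific section placement so that it lands at address 0.

```c
#include <stdint.h>

/* Each vector is either the initial stack pointer (entry 0 only)
 * or a pointer to an ordinary C function. */
typedef union {
    void (*handler)(void);
    void *stack_top;
} vector_entry_t;

static uint32_t stack[64];            /* hypothetical stack area */
static volatile int reset_ran, ticks;

/* Plain C functions: no interrupt keyword, no assembly stub. */
static void ResetISR(void)   { reset_ran = 1; }
static void SysTickISR(void) { ticks++; }
static void DefaultISR(void) { }

/* Index = exception number. Only a few entries are filled in here. */
static const vector_entry_t vector_table[16] = {
    [0]  = { .stack_top = &stack[64] },   /* initial stack pointer */
    [1]  = { .handler = ResetISR },       /* Reset */
    [2]  = { .handler = DefaultISR },     /* NMI */
    [3]  = { .handler = DefaultISR },     /* HardFault */
    [15] = { .handler = SysTickISR },     /* SysTick */
};

/* Host-side stand-in for what the hardware does on an exception:
 * fetch the pointer for exception n (n >= 1) and call it. */
static void dispatch(unsigned n)
{
    if (n >= 1 && n < 16 && vector_table[n].handler)
        vector_table[n].handler();
}
```

On a target, dispatch() does not exist; the core performs the lookup itself, which is why the handlers need no special entry or exit code.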

Coming from PIC
ƒ Cortex-M3 uses standard C code
ƒ No #fuses or #uses, set configuration as and when needed
ƒ No #INT_xxx function tags – just normal C functions
ƒ Just point to function from one or more vectors in C array
ƒ ISRs can call other functions, have all registers available
ƒ Stack may be common or separate one for all ISRs
ƒ Hardware routes directly to each ISR – no software looking at flags
ƒ Can change ISRs dynamically (vector table can be moved to SRAM or
elsewhere in Flash, not just one “alternate” as in PIC24+)
ƒ At least 8 priorities (vs. 2 on 8-bit, 7 on PIC24+), easily set/changed, with
priority masking. Faults can be prioritized also
ƒ NMI for safety use - cannot be masked off
ƒ All GPIOs/Peripherals are direct writable/readable/configurable.
ƒ ARM GPIOs allow up to 8 GPIOs to be accessed in one LDR/STR
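
A minimal sketch of how one store can touch only selected pins, assuming the address-masked data register scheme used by Stellaris GPIO ports, where address bits [9:2] act as a per-pin mask. The slide does not spell out the mechanism, so treat that as an assumption; the function names and the example base address are illustrative.

```c
#include <stdint.h>

/* Address of the data-register alias that affects only the pins set in
 * pin_mask (bit n = pin n). Assumes mask bits map to address bits [9:2]. */
static inline uintptr_t gpio_data_addr(uintptr_t port_base, uint8_t pin_mask)
{
    return port_base + ((uintptr_t)pin_mask << 2);
}

/* Write to the selected pins with a single store. port_data points at the
 * start of the port's data-register window, viewed as 32-bit words, so the
 * word index equals the pin mask. On hardware, pins outside the mask are
 * left unchanged, so no read-modify-write sequence is needed. */
static inline void gpio_write_pins(volatile uint32_t *port_data,
                                   uint8_t pin_mask, uint8_t value)
{
    port_data[pin_mask] = value;
}
```

For example, with a port base of 0x40004000 the alias for all eight pins is 0x400043FC, and the alias for pin 0 alone is 0x40004004.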

Coming from 8051
ƒ One unified address space, divided into ½GB regions
ƒ Same instructions used for all locations, no speed penalty
ƒ Code (Flash) from 0, SRAM (internal) from 0x20000000, Peripherals (internal)
from 0x40000000
ƒ External RAM/Peripherals in middle (but not likely used)
ƒ System registers (interrupt controller, etc) from 0xE0000000
ƒ Bit access for 1st 1MB of RAM and 1st 1MB of Peripherals
ƒ Same model as 8051 (can access same location by bits & byte/half/word)
ƒ RMW is atomic
ƒ Does not need special instructions, so compiler friendly
ƒ Any pointer or variable may be used
ƒ System/peripheral registers (SFR) are memory mapped
ƒ All accessible using normal C code
ƒ Similar interrupt model: enables, priority, fixed assignments
ƒ More than two levels of priority, no SW save of PSW/ACC/etc, vector pointers
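
The bit access described above works through alias regions: every bit in the first 1MB of SRAM or peripheral space has its own word address. A sketch of the address arithmetic, assuming the standard Cortex-M3 alias bases of 0x22000000 (SRAM) and 0x42000000 (peripherals); the function name is hypothetical.

```c
#include <stdint.h>

/* Map (byte address, bit number) to the bit-band alias word address.
 * Valid only for addresses in the first 1MB of the 0x20000000 or
 * 0x40000000 region, and for bit numbers 0..7 of that byte. */
static inline uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    uint32_t region = addr & 0xF0000000u;   /* 0x20000000 or 0x40000000 */
    uint32_t offset = addr - region;        /* byte offset into the region */
    return region + 0x02000000u + offset * 32u + bit * 4u;
}
```

Because the alias is an ordinary address, a plain pointer store such as *(volatile uint32_t *)alias = 1 sets the bit, which is what makes the scheme compiler friendly.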

Coming from MSP430
ƒ Cortex-M3 uses standard C/C++
ƒ 32-bit words vs. 16-bit
ƒ Contiguous RAM, starts at ½GB
ƒ Contiguous Flash for code and data/configuration, starts at 0
ƒ Multiply (and Divide) is safe for interrupts
ƒ Interrupts are also vectored
ƒ User set priority vs. position in table
ƒ Nesting is automatic (by priority) vs. GIE
ƒ No special code or special work in C
ƒ RISC oriented instruction set
ƒ 13 general purpose registers (3 more are for SP, LR, PC)
ƒ Constants from MOV instruction
ƒ No indirect addressing, but PC-relative address “literals” in Flash
ƒ Most instructions are 1 cycle, not related to size
ƒ ARM GPIOs are similar design
ƒ But consistent (all follow same rules) and more pin control

Performance
ƒ Do not be fooled by MHz or “MIPS”
ƒ Only valid measure is amount of work done in a given period of time
ƒ Faster MHz for many processors/MCUs barely runs faster
ƒ Introduces code wait states (Flash and/or RAM)
ƒ Stalls on peripherals – often big part of application
ƒ Non-deterministic behavior
ƒ Instruction “prediction”, branch caches, caches, etc
ƒ MIPS is measure of instruction set style, not work
ƒ E.g. 3-12 cycle hardware 32-bit/32-bit DIVide vs. 50+ cycles in software
ƒ 50+ cycles has higher MIPS!
ƒ Less RISCy instructions get more work done, lower MIPS number
ƒ Better code density from less RISCy instructions
ƒ DMIPS mainly tells you about 3 things – memcpy(), strcmp(), and div
ƒ Div and strcmp() are often “gamed” by compiler vendors
ƒ More and more compilers “cheat” using auto-inlining, whole program opt

Peripherals and their bus interface
ƒ Wait states on peripherals are a hidden cost
ƒ Watch for slower peripheral buses – on any processor
ƒ When peripheral bus is slower than core clock – wait states
ƒ Impacts even stores when you have to write more than one in a row
ƒ Impacts maximum toggle rate of GPIOs, ability to feed/drain data, etc
ƒ You want a fast bus, regardless of peripheral rate
ƒ Wait states means processor stalled
ƒ Affects what you can do, but also interrupt latency
ƒ Ideal is feed/drain peripheral FIFO quickly, then have lots of time
before need to service peripheral again
ƒ Even more critical if you have to bit-bang through GPIOs

Performance – interrupt overhead
ƒ Real measure: Time from HW trigger to 1st line of real user code
ƒ Longest instruction which stops interrupt adds to latency (tA)
ƒ e.g. LDM of 8 elements on ARM7 holds off interrupts for 10 cycles if from 0 wait state
memory, for 26 cycles if 2 wait state peripheral, etc
ƒ Cortex-M3 uses interrupt-continue for LDM/STM and abandon for DIV/UMLAL/etc
ƒ Pushing registers, messing with modes, etc (tB)
ƒ Many example applications use direct entry, but that does not scale to multiple interrupts
or multiple at same time (nesting based on priority)
ƒ Often more than 20 cycles of difference in timing when allow nesting
ƒ Cortex-M3 does in HW in 12 cycles (saves registers and loads pipeline).
ƒ So, user code is now running – but be aware of function prologue on any.
ƒ Popping registers and resetting interrupt controller (tD)
ƒ Even when leaving one to enter another – pops all then pushes again
ƒ Cortex-M3 “tail chains” – skips pop/push and just jumps to new ISR (skips tE)
ƒ Higher priority interrupt held off by any of above
ƒ This is the case you have to allow for. If no nesting, then add longest ISR! (tC)
ƒ Cortex-M3: full priorities and nesting, pre-empt anytime, take over during transitions
[Figure: interrupt timeline marking the intervals tA, tB, tC, tD, tE]

Interrupt jitter
ƒ Interrupt jitter is variability of response to interrupt trigger
(external or internal)
ƒ Priority jitter is a given (higher priority interrupt should delay lower
one).
ƒ Jitter on high priority interrupt is a serious matter.
ƒ Most common jitter cause is high priority interrupt being held off
when in overhead for lower priority one (registers/mode-save).
ƒ Even worse is case where processor does not allow nesting
ƒ High priority interrupt delayed by length of lower priority ISR
[Figure: a trigger followed by the time to ISR on different invocations, along time axis t. The range of time before the ISR is serviced is the jitter.]

Interrupt response jitter
ƒ If you have two (or more) interrupts, what happens when they intersect?
ƒ Gpio1 is higher priority than Gpio2. Gpio2 is fixed periodic.
ƒ Both ISRs take the same time (for this example).
ƒ Shows skew in start time for Gpio2 and Gpio1.
ƒ Expect priority based jitter for Gpio2
ƒ Issue is Gpio1 jitter (purple double line)

[Figure: three timing diagrams (CM3; Other w/nesting; Other, no nesting), each aligned to the point where Gpio2 triggers. Key: Interrupt entry overhead, Interrupt exit overhead, Pre-empted.]
Cases shown:
1. Gpio2 completes before Gpio1 (no delay for either)
2. Gpio1 comes when in Gpio2 (Gpio2 gets delayed if nesting)
3. Gpio1 comes before Gpio2 (Gpio2 gets delayed)
4. Gpio1 comes during overhead (Depends on processor)

Interrupt response jitter
ƒ If you have two (or more) interrupts, what happens when they intersect?
ƒ Gpio1 is higher priority than Gpio2. Gpio1 is fixed periodic (this time).
ƒ Both ISRs take the same time (for this example).
ƒ Shows skew in start time for Gpio2 and Gpio1.
ƒ Expect priority based jitter for Gpio2
ƒ Issue is jitter for Gpio1 (purple double line)

[Figure: three timing diagrams (CM3; Other w/nesting; Other, no nesting), each aligned to the point where Gpio1 triggers.]
Cases shown:
1. Gpio1 completes before Gpio2 (no delay for either)
2. Gpio2 comes when in Gpio1 (Gpio2 gets delayed - pri)
3. Gpio2 comes before Gpio1 (Gpio2 gets pre-empted if can)
4. Gpio2 comes earlier before Gpio1 (Gpio2 gets pre-empted if can)

Effect of Critical sections
ƒ Critical sections tend to “pack” interrupts at the enable point.
ƒ This is made worse when triggers are result of outputs (cycles)
ƒ The input/output cycle moves against enable
ƒ Over time, more do this, and inputs tend to land on each other
ƒ Long latency instructions, on processors that block, do this too
ƒ CM3 provides ways to mitigate most and avoid many
ƒ When you need them, use priority masking (BASEPRI) not disable
ƒ Don’t punish the ISRs that are not using the critical data!
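
A host-side model of the priority-masking rule, to show why masking is gentler than a global disable. This is a sketch, not target code: on a Cortex-M3 the register would be written with an MSR instruction or a compiler intrinsic, and the hardware keeps the priority in the upper bits of an 8-bit field, which this model ignores. All names and the mask level 5 are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

static uint8_t basepri;    /* model of BASEPRI; 0 means no masking */

/* Lower number = higher priority. An interrupt runs only if its
 * priority is numerically below the current mask. */
static bool irq_can_run(uint8_t irq_priority)
{
    return basepri == 0 || irq_priority < basepri;
}

/* Model of a BASEPRI_MAX write: it can only tighten the mask, never
 * loosen it, so no compare is needed by the caller. Returns old value. */
static uint8_t basepri_max_write(uint8_t new_mask)
{
    uint8_t old = basepri;
    if (new_mask != 0 && (basepri == 0 || new_mask < basepri))
        basepri = new_mask;
    return old;
}

/* Model of a plain BASEPRI write, used to restore the saved mask. */
static void basepri_write(uint8_t value)
{
    basepri = value;
}

static int shared_counter;  /* data shared with ISRs at priority 5 and lower */

static void update_shared(void)
{
    uint8_t saved = basepri_max_write(5);  /* hold off 5 and lower urgency */
    shared_counter++;                      /* critical section */
    basepri_write(saved);                  /* restore previous mask */
}
```

While the critical section runs, irq_can_run(4) is still true, so higher priority ISRs that never touch the shared data see no added latency.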

Performance and size
ƒ Code ported from 8-bit/16-bit may bloat on 32-bit
ƒ Short/char locals can cause 40%+ increase in size and speed impact
ƒ Use ints (unsigned, int, long, unsigned long) – they are optimal
ƒ Can up-cast from smaller global/statics (e.g. extern short x; int lx = (int)x;)
ƒ Do not take address of local, forces to stack – otherwise in register only
ƒ How you access peripherals affects performance and size a lot
ƒ Casted constants may be worst way! (e.g. *((unsigned*)0x40001008))
ƒ Smaller number of larger functions more optimal (opposite of 8-bit)
ƒ Back-to-back loads from peripheral is faster and smaller
ƒ Avoiding back-to-back stores to peripheral is faster
ƒ Use optimizer
ƒ Many 8-bit/16-bit compilers have no real optimizer – very important on 32-bit
ƒ Code size and performance are dramatically affected (often >30%)
ƒ Check if compiler defaults to optimize for size or speed – not consistent
ƒ Use volatile for peripheral pointers (#define or not) and peripheral objects
ƒ Optimizer may get rid of code, reverse order, or otherwise “optimize”
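
One alternative to repeated casted constants is a register struct reached through a single volatile-qualified base pointer. The sketch below is hedged: the register layout, the names uart_regs_t and uart_drain, and the meaning of STATUS bit 0 are all hypothetical, and whether this is smaller or faster depends on the compiler.

```c
#include <stdint.h>

/* Hypothetical peripheral register block. Every field is volatile so the
 * optimizer cannot remove, merge, or reorder the accesses. */
typedef struct {
    volatile uint32_t DATA;
    volatile uint32_t STATUS;
    volatile uint32_t CTRL;
} uart_regs_t;

/* Drain up to max words from the peripheral into out[].
 * One base pointer is held in a register; each access is then a load
 * with a small immediate offset rather than a fresh 32-bit constant. */
static uint32_t uart_drain(uart_regs_t *u, uint32_t *out, uint32_t max)
{
    uint32_t n = 0;                        /* int-sized local */
    while (n < max && (u->STATUS & 1u))    /* bit 0: data available (assumed) */
        out[n++] = u->DATA;
    return n;
}
```

On a target the pointer would come from a fixed address, for example a macro that casts the peripheral base address to uart_regs_t *, so the function can be exercised on a host against an ordinary struct.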

Using locals smaller than register size
Locals of size int:

typedef int BASE;
BASE foo(BASE last, BASE x, BASE y) {
   0: 2300        movs   r3, #0
   2: e002        b.n    a <foo+0xa>
  BASE i;
  for (i = 0; i < last; i++)
    x += (y * x);
   4: fb02 1101   mla    r1, r2, r1, r1
   8: 3301        adds   r3, #1
   a: 4283        cmp    r3, r0
   c: dbfa        blt.n  4 <foo+0x4>
   e: ebc2 0001   rsb    r0, r2, r1
  return(x-y);}
  12: 4770        bx     lr

Locals of size short int (half word):

typedef short BASE;
BASE foo(BASE last, BASE x, BASE y) {
   0: f04f 0c00   mov.w  ip, #0 ; 0x0
   4: e004        b.n    10 <foo+0x10>
  BASE i;
  for (i = 0; i < last; i++)
    x += (y * x);
   6: fb02 1301   mla    r3, r2, r1, r1
   a: f10c 0c01   add.w  ip, ip, #1 ; 0x1
   e: b219        sxth   r1, r3
  10: fa0f f38c   sxth.w r3, ip
  14: 4283        cmp    r3, r0
  16: dbf6        blt.n  6 <foo+0x6>
  18: ebc2 0001   rsb    r0, r2, r1
  1c: b200        sxth   r0, r0
  return(x-y);}
  1e: 4770        bx     lr

ƒ A short int local added 12 extra bytes to a function of 20 bytes.


ƒ Worse, it has added 2 extra cycles to each iteration (a 5 cycle loop)
ƒ Note: ARM7/ARM9 using Thumb code is 28 bytes with int (but much slower)
ƒ 40 bytes with short int (so, +12)
ƒ Extra 12 bytes for the short int for Thumb is due to using shift-left and then shift-right
to sign- or zero-extend, so 4 extra cycles per loop.
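
When the data really is 16-bit (for example globals kept small to save RAM), one way to avoid the per-iteration extends is to widen once on entry and narrow once on exit. A sketch under that assumption; the globals and the wrapper name are hypothetical, and the two versions only agree while intermediate values fit in 16 bits.

```c
/* Hypothetical 16-bit globals, kept short to save RAM. */
static short g_last = 3, g_x = 2, g_y = 1;

/* Same loop as above, but every local is register-sized. */
static int foo_int(int last, int x, int y)
{
    int i;
    for (i = 0; i < last; i++)
        x += (y * x);
    return x - y;
}

/* Widen the shorts once, do the work in ints, narrow the result once. */
static short foo_from_globals(void)
{
    int last = g_last, x = g_x, y = g_y;
    return (short)foo_int(last, x, y);
}
```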

Application style
ƒ Application design affects performance, size, power use
ƒ Three most common types
ƒ Pure interrupt
ƒ Polling (PLC, DSP style, event/PID loop, etc)
ƒ Polling/RTOS with ISRs
ƒ Many people move to polling due to processor issues
ƒ When 30% or more lost to interrupts, context switching, etc, what
choice?
ƒ Pure interrupt ideal for many smaller applications
ƒ Polling/RTOS with ISRs gives excellent design options
ƒ Communications in ISRs
ƒ Time critical operations in ISRs
ƒ The rest is easier to design and program

Application design – mixed example
Motor control ISRs (e.g. PWM, ADC)

Communication ISRs (e.g. ENET, CAN)

Main application (foreground)


t
ƒ Main application runs as foreground (base level)
ƒ Easy to write since no “factoring” – just normal application or RTOS based
ƒ Can use PLC style state-machine poll loop safely: ISRs keep data available
ƒ ISRs for Motor control are highest priority(ies)
ƒ PWM, ADCs, Timer(s), Fault (may be highest), Temp sensor, etc
ƒ ISRs for communications below that
ƒ Ethernet, CAN, and/or serial
ƒ May use other priorities as needed
ƒ Very fast interrupt response time, true nested interrupts, priority masking, easy ISR setup
all contribute to making an easy solution
ƒ Application uses priority masking vs. interrupt-disable if needs critical region

Avoiding interrupt latency on Cortex-M3
ƒ I have critical data, don’t I just create latency with int disable?
ƒ Three easy ways to avoid this
ƒ BASEPRI and BASEPRI_MAX: set priority to mask, don’t disable
ƒ If critical data used by priorities 5 to 7, set BASEPRI to 5
ƒ Interrupts 0 to 4 can still activate as normal (e.g. motor control)
ƒ BASEPRI_MAX will only change if makes higher priority mask
ƒ No compare needed. Set, critical-section, restore w/BASEPRI
ƒ Exclusives (LDREX/STREX for byte, half, word)
ƒ Much better than test-and-set
ƒ ISRs can set/clear data non-locking/non-blocking
ƒ main loop and lower priority ISRs just try again – no block/lock
ƒ E.g. RTOS queues between thread/ISR with no critical section
ƒ Bit band forms atomic read-modify-write on SRAM and Peripherals
ƒ Set population/claim/request bits
ƒ E.g. Thread-wake population bit + PendSV
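
The "just try again" pattern can be written in portable C11 atomics. Compilers targeting Cortex-M3 typically lower these operations to LDREX/STREX retry loops, but that is a toolchain assumption worth checking in the generated code. The flag word and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t pending_flags;   /* e.g. one request bit per source */

/* Set request bits without disabling interrupts. If an ISR changes the
 * word between the read and the write, the exchange fails and we retry. */
static void flags_set(uint32_t bits)
{
    uint32_t old = atomic_load_explicit(&pending_flags, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               &pending_flags, &old, old | bits,
               memory_order_acq_rel, memory_order_relaxed)) {
        /* old now holds the current value; loop and try again */
    }
}

/* Atomically fetch and clear all pending bits, e.g. from the main loop. */
static uint32_t flags_take(void)
{
    return atomic_exchange_explicit(&pending_flags, 0, memory_order_acq_rel);
}
```

Neither function blocks or masks anything, so a high priority ISR calling flags_set() is never held off by the main loop calling flags_take().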

Polling vs. interrupt
ƒ Polling is poor use of processor (wastes time)
ƒ Introduces jitter (based on loop size, load time, etc)
ƒ Performance degrades quickly as more checks added
ƒ Most common reason it is used: it is easier to understand
ƒ If interrupt overhead is low, interrupts are the better use of the processor
ƒ Some processors add so much overhead that polling is better
ƒ Cortex-M3 offers low overhead and low latency
ƒ With multiple priorities and low latency, easily understood behavior
ƒ FIFOed communication peripherals offer best of both
ƒ Amortize whatever interrupt overhead, but no extra spins polling
ƒ If interrupt overhead is too high, then FIFO needed just to work at all

Poll loop – simple example
for (i = 0; i < loops; i++)
{   // poll loop
    if (HWREG(GPIO_IN_PORT_CLOCK))
    {   // detect high, drive high
        HWREG(GPIO_SCOPE_PORT) = PIN_OUT_SCOPE;
        break;
    }
    else
    {   // detect low, drive low
        HWREG(GPIO_SCOPE_PORT) = 0;
    }
}
// now capture data

Polling: read of input, write output
[Scope capture: C-M3 at 50MHz, INPUT (CLOCK) vs. OUTPUT (SCOPE). Fastest response 190ns; jitter 255ns.]

Polling: ARM7 (same clock speed)
[Scope capture: ARM7 at 60MHz, INPUT (CLOCK) vs. OUTPUT (SCOPE). Fastest response 230ns; jitter 500ns.]

Interrupt driven
ƒ CM3 – can nest and prioritize, etc
void GPIO_trigger_ISR(void) { // on falling
    // define locals for rest of routine here
    HWREG(GPIO_SCOPE_PORT) = PIN_OUT_SCOPE; // drive high
    // now capture data, etc

    HWREG(GPIO_SCOPE_PORT) = 0; // drive low
    return; // done
}

ƒ ARM7 version – no nest: one at a time (fastest if no nest)


__irq __arm void GPIO_trigger_ISR(void) { // on falling
    // __irq means pushes registers at start, pops at
    // end, and special return instruction
    // define locals for rest of routine here
    HWREG(GPIO_SCOPE_PORT) = PIN_OUT_SCOPE; // drive high
    // now capture data, etc

    VICVectAddr = 0; // clear VIC
    HWREG(GPIO_SCOPE_PORT) = 0; // drive low
    return; // done
}

Interrupt: drive high on falling edge (+work)
[Scope capture: C-M3 at 50MHz; traces FRAME, CLOCK, DATA, SCOPE (ISR). Labels on the capture: 540ns Min, 27 cyc, ~640ns, NO JITTER.]
Cost of interrupt (12), prologue of function, GPIO address load, STR and propagation. Will be less for some functions.
Interrupt: ARM7 (same operation)
[Scope capture: ARM7 at 60MHz; traces FRAME, CLOCK, DATA, SCOPE (ISR). Labels on the capture: 820 ns Min, 39 cyc, ~1340ns.]
Note: time will increase if the interrupt arrives during a long instruction. Nesting support adds >18 cycles (mode change).

RTOS
ƒ Concerns about using an RTOS
ƒ Efficiency of Task Switching
ƒ Extra Memory Used
ƒ Cortex-M3 has many RTOS-friendly features
ƒ Faster/easier context switch - PendSV
ƒ Separation of service call (SVC) and context switch
ƒ Option of separate thread vs. interrupt/system stack
ƒ User/privilege for those that need it (use SVC vs. call)
ƒ Standard timer, SYSTICK
ƒ Standard interrupt controller
ƒ MPU for safety

PendSV for context switch
ƒ PendSV is software triggered exception
ƒ Pended, so executes when priority allows
ƒ Can be set by scheduler, ISR, or system code
ƒ Can be used with SVC or not (all privileged code)
ƒ Set at low(est) priority in the system
ƒ Ensures it is the last handler to run (tail chaining)
ƒ On entry, half of interrupted thread is already saved
ƒ Steps are simple:
ƒ Save other half on old process stack
ƒ Retrieve new process stack from TCB
ƒ Switch process stack
ƒ Load half of new process context from process stack
ƒ Exception return (loads rest in HW)
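
The steps above can be walked through on a small host-side model. This is a sketch only: a real PendSV handler is a few instructions operating on the actual PSP and registers, while here the CPU state, the TCB layout, and all names are hypothetical stand-ins. The model assumes the hardware-saved half is 8 words (R0-R3, R12, LR, PC, xPSR) and the software-saved half is R4-R11.

```c
#include <stdint.h>
#include <string.h>

enum { HW_WORDS = 8, SW_WORDS = 8 };

typedef struct { uint32_t *sp; } tcb_t;    /* hypothetical TCB: saved SP only */

typedef struct {
    uint32_t  r4_r11[SW_WORDS];   /* registers the hardware does not save */
    uint32_t *psp;                /* process stack pointer */
} cpu_model_t;

/* What the handler does, expressed on the model. */
static void pendsv_switch(cpu_model_t *cpu, tcb_t *old_task,
                          const tcb_t *new_task)
{
    /* Save other half (R4-R11) on the old process stack. */
    cpu->psp -= SW_WORDS;
    memcpy(cpu->psp, cpu->r4_r11, sizeof cpu->r4_r11);
    old_task->sp = cpu->psp;

    /* Retrieve the new process stack from the TCB and switch to it. */
    cpu->psp = new_task->sp;

    /* Load the software-saved half of the new context. */
    memcpy(cpu->r4_r11, cpu->psp, sizeof cpu->r4_r11);
    cpu->psp += SW_WORDS;
    /* Exception return would now pop the hardware half from cpu->psp. */
}

/* Demo: switch from task A to a previously saved task B.
 * Returns B's restored R4, or 0 if the bookkeeping went wrong. */
static uint32_t demo_switch(void)
{
    static uint32_t stack_a[32], stack_b[32];
    /* On entry the hardware has already pushed 8 words for task A. */
    cpu_model_t cpu = { .psp = &stack_a[32 - HW_WORDS] };
    tcb_t a = { 0 };
    tcb_t b = { .sp = &stack_b[32 - HW_WORDS - SW_WORDS] };
    for (int i = 0; i < SW_WORDS; i++) {
        cpu.r4_r11[i] = 100u + (uint32_t)i;   /* A's live registers */
        b.sp[i]       = 200u + (uint32_t)i;   /* B's saved registers */
    }
    pendsv_switch(&cpu, &a, &b);
    return (a.sp[0] == 100u && cpu.psp == &stack_b[32 - HW_WORDS])
               ? cpu.r4_r11[0] : 0u;
}
```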

RTOS using PendSV (and maybe SVC)
[Figures: thread/kernel timelines for threads T1, T2, T3. Key for all figures: App, System, Kernel, Tail chain.]
ƒ Threads Privileged (T1, T2):
ƒ Thread calls system for request, which then uses SVC when blocking/thread-change needed. SVCall uses PendSV to cause dispatch.
ƒ Alternatively, thread calls system for request, which then uses PendSV to cause dispatch.
ƒ Threads User, System Privileged (T1, T2):
ƒ Thread uses SVC to make system request. SVCall uses PendSV to cause dispatch if request causes blocking/thread-change.
ƒ Interrupt comes in, makes system call, changes next thread (T1, T3):
ƒ Interrupt calls system, side effect is rescheduling needed, so pends PendSV.
ƒ PendSV was re-pended and so tail-chains to itself – causing a possible rescheduling.

FreeRTOS.org Context Switching
                 ARM7 (SWI)                        Cortex-M3 (PendSV)
Context Save     Save R0                           Get PSP in R0
                 Get task SP in R0                 Push R4-R11 on task stack
                 Save return address               Push nesting depth
                 Restore R0                        Store new SP in TCB
                 Push all registers (task stack)
                 Push SPSR
                 Push nesting depth on stack
                 Store new task SP in TCB
                 (19 ARM instructions, many        (11 Thumb-2 instructions, far
                 cycles, ints blocked in Push)     fewer instructions, ints not
                                                   blocked)
Context Restore  Get task SP from TCB              Get SP from TCB
                 Pop nesting depth                 Pop nesting depth
                 Pop SPSR                          Pop R4-R11
                 Pop all registers                 Load PSP with new task stack
                 Pop return address                If non-zero nesting, mask ints
                 Return (new task)                 Return (new task)
                 (12 ARM instructions, …)          (14 Thumb-2 instructions,
                                                   12 or 13 executed)

Example FreeRTOS.org timing
                      Cortex-M3        ARM7
Time per switch       4 µs/switch†     6.9 µs/switch
(thread+kernel)
Switches/second       250K/sec         145K/sec
Image size            3504             4676 (thumb)

ƒ Simple 2-task application that switches between them


ƒ Cortex-M3 at 50MHz, ARM7 at 60MHz
ƒ GCC for both, Thumb mode for ARM7 threads
ƒ † - Cortex-M3 is even faster with newer FreeRTOS
version that has just been released.
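
Since the two parts run at different clocks, it can help to restate the table in core cycles. A small sketch of that arithmetic, using only the figures quoted above; the function names are hypothetical.

```c
#include <stdint.h>

/* Core cycles consumed by one switch, from time in ns and clock in MHz. */
static inline uint32_t switch_cycles(uint32_t ns_per_switch, uint32_t clock_mhz)
{
    return (ns_per_switch * clock_mhz) / 1000u;
}

/* Switches per second, from time per switch in ns. */
static inline uint32_t switches_per_second(uint32_t ns_per_switch)
{
    return 1000000000u / ns_per_switch;
}
```

With these, 4 µs at 50MHz is 200 cycles per switch and 6.9 µs at 60MHz is 414 cycles per switch, so the cycle-count gap is wider than the wall-clock gap suggests.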

Many excellent RTOS ports available
ƒ CMX Systems CMX-RTX and CMX-Tiny
ƒ Express Logic ThreadX
ƒ FreeRTOS.org FreeRTOS
ƒ IAR PowerPac
ƒ Interniche NicheTask
ƒ Keil/ARM RTX
ƒ Micrium μC/OS-II
ƒ Pumpkin Salvo
ƒ Segger embOS
ƒ Others…

Lower total BOM cost
ƒ Do more in SW
ƒ For example, motor control
ƒ Bit-bang vs. CPLD or FPGA
ƒ High speed serial to accomplish more
ƒ Use lower cost components when can offload work
ƒ Higher end peripherals
ƒ More supportable with Cortex-M3, so can do more
ƒ Can service higher rates
ƒ e.g. 100baseT, 1Mbps CAN, 1Msps ADC, 25MHz SPI, etc
ƒ Safety (e.g. IEC 61508)
ƒ Faults, MPU, lock-up, NMI, prioritized ISRs for deterministic response
ƒ What was two or three 8-bit MCUs can be done in one
ƒ Acts like virtual multi-processor (via ISRs)

Special instructions
ƒ Thumb-2 and Cortex-M3 have many special instructions
ƒ Many are directly used by compiler
ƒ e.g. SDIV/UDIV, MUL/MLA/MLS, UMULL/SMULL/SMLAL/UMLAL,
SBFX/UBFX, BFI/BFC, MOVT/MOVW, SXTH/UXTH/SXTB/UXTB
ƒ Some compilers may detect some cases and use:
ƒ e.g. REV/REV16/REVSH, CLZ
ƒ Else, access them via “instruction intrinsics” (e.g. ntohs/htons inlined)
ƒ Others available through “instruction intrinsics”
ƒ e.g. USAT, SSAT, RBIT, WFI, WFE, SEV, MSR, MRS, CPS, etc
ƒ System features available as memory mapped registers
ƒ NVIC controls, setup, management
ƒ Most system controls, systick, reset control, MPU, etc
ƒ MPU optimized to allow STM/LDM to handle multiple regions at once
ƒ Also allows sub-regions for better granularity
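
As an illustration of the "compiler may detect" case, the byte-swap idioms below are plain C with no intrinsics. Whether a given compiler and optimization level turns them into a single REV16 or REV is an assumption to verify in the disassembly; the function names are hypothetical.

```c
#include <stdint.h>

/* Swap the two bytes of a half word (the ntohs/htons pattern). */
static inline uint16_t swap16(uint16_t x)
{
    return (uint16_t)(((x & 0x00ffu) << 8) | ((x & 0xff00u) >> 8));
}

/* Reverse the four bytes of a word (the ntohl/htonl pattern). */
static inline uint32_t swap32(uint32_t x)
{
    return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
           ((x & 0x00ff0000u) >> 8)  | ((x & 0xff000000u) >> 24);
}
```

If the compiler does not recognize the pattern, the code is still correct, only slower, which is the point at which an intrinsic becomes worth using.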

Sleep primitives
ƒ Sleep vs. Deep-sleep – memory mapped register
ƒ Deep sleep allows chip vendor more cycles to wakeup
ƒ Sleep-on-exit control
ƒ When last ISR returns, sleep
ƒ Idle thread – skips pop/push for no purpose
ƒ WFI – wait for interrupt to wake up, sleep until
ƒ WFE – wait for event, sleep until
ƒ Trip-latch – remembers previous set (SEV, or event)
ƒ Wakes on interrupt pending if SEVONPEND
ƒ Used for intelligent polling
ƒ Makes for non-bus contending poll

Using SWV to get interrupt trace
ƒ Accurate to the cycle (e.g. 20ns at 50MHz)
ƒ Can see jitter, variability of execution time, periodicity, etc
ƒ Allows seeing nesting behavior (pre-emption)
ƒ Can also see it in relation to sleep time and main thread time
ƒ Can be intermixed with other traced info, to see real behavior
ƒ For example, RTOS trace, watch-trace, host strings, etc

Using SWV for extreme accuracy profiling
ƒ HW PC Sampling at speeds such as 48,828 samples/second
ƒ CPI calculations add detailed information on mix of instructions and overhead
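
The 48,828 samples/second figure is consistent with one PC sample every 1024 core cycles at 50MHz. That divider is an inference from the arithmetic, not something the slide states. A one-line check, with a hypothetical function name:

```c
#include <stdint.h>

/* PC samples per second for a given core clock and sampling interval. */
static inline uint32_t pc_samples_per_second(uint32_t core_hz,
                                             uint32_t cycles_per_sample)
{
    return core_hz / cycles_per_sample;
}
```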

Concerned about Cortex-M3 maturity?
ƒ Cortex-M3 has exceeded the high reliability and maturity
standard set by previous cores by a wide margin
ƒ The r1p0 core used in Stellaris Sandstorm parts, and the r1p1 core
used in Stellaris Fury parts have had no application affecting bugs
ƒ Additions/changes have been features and minor trace related fixes
ƒ This stability and lack of errors has shown the high quality of the
modern ARM validation and test model
ƒ It has also shown the value of the support that Luminary and other
lead partners have given ARM in ensuring the highest quality core
ƒ Moving forward
ƒ Shyam has covered
ƒ Goal oriented: focus on end users
ƒ ARM and its partners working together to get best benefit
ƒ Ultra low power, specific performance, specialized areas

Conclusion
ƒ You may move to all C/C++ and off-the-shelf code
ƒ Assembly should be unnecessary – you can use intrinsics if needed
ƒ If coming from 8-bit/16-bit, make sure you are using ints/unsigned
ƒ Optimizer is important – size and/or performance (can mix/match)
ƒ Do not be afraid to use interrupts
ƒ Use priority masking vs. interrupt disable for critical sections
ƒ Do not be afraid to use an RTOS if application suits
ƒ Reduce BOM cost by reducing parts on board, reducing
number of MCUs, and doing more in SW
ƒ Cortex-M3 based MCUs exceed quality and reliability
standards
