
The ARM Instruction Set
ARM - Advanced RISC Machines

Processor Modes

* The ARM has six operating modes:
• User (unprivileged mode under which most tasks run)
• FIQ (entered when a high priority (fast) interrupt is raised)
• IRQ (entered when a low priority (normal) interrupt is raised)
• Supervisor (entered on reset and when a Software Interrupt instruction is executed)
• Abort (used to handle memory access violations)
• Undef (used to handle undefined instructions)
* ARM Architecture Version 4 adds a seventh mode:
• System (privileged mode using the same registers as user mode)

The ARM Instruction Set - ARM University Program - V1.0

The Registers

* ARM has 37 registers in total, all of which are 32 bits long:
• 1 dedicated program counter
• 1 dedicated current program status register
• 5 dedicated saved program status registers
• 30 general purpose registers
* However, these are arranged into several banks, with the accessible bank being governed by the processor mode. Each mode can access:
• a particular set of r0-r12 registers
• a particular r13 (the stack pointer) and r14 (link register)
• r15 (the program counter)
• cpsr (the current program status register)
and the privileged exception modes (every privileged mode except System) can also access:
• a particular spsr (saved program status register)

Register Organisation

General registers and Program Counter

User32 / System  FIQ32      Supervisor32  Abort32    IRQ32      Undefined32
r0-r7            r0-r7      r0-r7         r0-r7      r0-r7      r0-r7
r8               r8_fiq     r8            r8         r8         r8
r9               r9_fiq     r9            r9         r9         r9
r10              r10_fiq    r10           r10        r10        r10
r11              r11_fiq    r11           r11        r11        r11
r12              r12_fiq    r12           r12        r12        r12
r13 (sp)         r13_fiq    r13_svc       r13_abt    r13_irq    r13_undef
r14 (lr)         r14_fiq    r14_svc       r14_abt    r14_irq    r14_undef
r15 (pc)         r15 (pc)   r15 (pc)      r15 (pc)   r15 (pc)   r15 (pc)

Program Status Registers

cpsr             cpsr       cpsr          cpsr       cpsr       cpsr
-                spsr_fiq   spsr_svc      spsr_abt   spsr_irq   spsr_undef
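The banking rules can be made concrete with code. The following Python sketch maps an architectural register number and a processor mode to the physical register it selects. It is a minimal model built only from the table above. The function names and the short mode labels (usr, sys, fiq, svc, abt, irq, und) are hypothetical.

```python
# Hypothetical model of ARM register banking, following the Register Organisation table.
MODES = ("usr", "sys", "fiq", "svc", "abt", "irq", "und")

# Suffix used for the banked r13/r14 (and SPSR) of each exception mode.
BANK_SUFFIX = {"fiq": "fiq", "svc": "svc", "abt": "abt", "irq": "irq", "und": "undef"}


def physical_register(n: int, mode: str) -> str:
    """Return the physical register that rN refers to in the given mode."""
    if not 0 <= n <= 15:
        raise ValueError("register number must be in the range 0-15")
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}")
    if n == 15:
        return "r15"                          # the PC is shared by every mode
    if mode == "fiq" and n >= 8:
        return f"r{n}_fiq"                    # FIQ banks r8-r14
    if n >= 13 and mode in BANK_SUFFIX:
        return f"r{n}_{BANK_SUFFIX[mode]}"    # other exception modes bank r13 and r14
    return f"r{n}"                            # User and System share one set


def spsr_name(mode: str):
    """User and System modes have no SPSR; each exception mode has its own."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}")
    return None if mode in ("usr", "sys") else f"spsr_{BANK_SUFFIX[mode]}"


# Counting the distinct physical r0-r14 across all modes gives the
# 30 general purpose registers quoted on the previous slide.
general = {physical_register(n, m) for m in MODES for n in range(15)}
print(len(general))  # 30
```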

Register Example: User to FIQ Mode

[Figure: registers in use before and after an FIQ exception. In User mode, r0-r15 and cpsr are in use. After the exception, FIQ mode uses r0-r7, r8_fiq-r14_fiq, r15 (pc), cpsr and spsr_fiq. The User mode r8-r14 are banked out.]
* The return address is calculated from the User mode PC value and stored in the FIQ mode LR (r14_fiq).
* The User mode CPSR is copied to the FIQ mode SPSR (spsr_fiq).

Accessing Registers using ARM Instructions

* All currently accessible registers are treated alike; there is no further breakdown:
• All instructions can access r0-r14 directly.
• Most instructions also allow use of the PC.
* Specific instructions allow access to the CPSR and SPSR.
* Note: When in a privileged mode, it is also possible to load / store the (banked out) user mode registers to or from memory.
• See later for details.
The Program Status Registers (CPSR and SPSRs)

Bits:  31 30 29 28   27 ... 8   7 6 5   4 ... 0
       N  Z  C  V               I F T   Mode

* Condition Code Flags: copies of the ALU status flags (latched if the instruction has the "S" bit set).
• N = Negative result from ALU flag.
• Z = Zero result from ALU flag.
• C = ALU operation Carried out
• V = ALU operation oVerflowed
* Interrupt Disable bits.
• I = 1, disables the IRQ.
• F = 1, disables the FIQ.
* T Bit (Architecture v4T only)
• T = 0, Processor in ARM state
• T = 1, Processor in Thumb state
* Mode Bits
• M[4:0] define the processor mode.

Condition Flags

Flag              Logical Instruction               Arithmetic Instruction
Negative (N=‘1’)  No meaning                        Bit 31 of the result has been set; indicates a negative number in signed operations
Zero (Z=‘1’)      Result is all zeroes              Result of operation was zero
Carry (C=‘1’)     After Shift operation, ‘1’ was    Result was greater than 32 bits (for a subtraction, C=‘1’ means no borrow occurred)
                  left in carry flag
oVerflow (V=‘1’)  No meaning                        Result was greater than 31 bits; indicates a possible corruption of the sign bit in signed numbers
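The arithmetic column of the table can be reproduced with a small model. This Python sketch uses hypothetical helper names and returns each flag as 0 or 1. It computes N, Z, C and V for a 32-bit ADDS, and it treats SUBS / CMP as adding the inverted second operand plus one. That is why C = 1 after a subtraction means that no borrow occurred.

```python
MASK32 = 0xFFFFFFFF


def add_flags(a: int, b: int, carry_in: int = 0):
    """Model of ADDS (or ADCS with carry_in): return (result, N, Z, C, V)."""
    a &= MASK32
    b &= MASK32
    full = a + b + carry_in
    result = full & MASK32
    n = result >> 31                      # bit 31 of the result
    z = int(result == 0)
    c = int(full > MASK32)                # carry out of bit 31 (unsigned overflow)
    # signed overflow: both operands have the same sign and the result's sign differs
    v = ((~(a ^ b) & (a ^ result)) >> 31) & 1
    return result, n, z, c, v


def sub_flags(a: int, b: int):
    """Model of SUBS / CMP: a - b is computed as a + NOT(b) + 1."""
    return add_flags(a, ~b & MASK32, 1)


def pack_nzcv(n: int, z: int, c: int, v: int) -> int:
    """Place the flags in CPSR bits 31 (N), 30 (Z), 29 (C) and 28 (V)."""
    return (n << 31) | (z << 30) | (c << 29) | (v << 28)
```

For example, 0x7FFFFFFF + 1 sets N and V (signed overflow) but not C. By contrast, 0xFFFFFFFF + 1 sets Z and C but not V.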

The Program Counter (R15)

* When the processor is executing in ARM state:
• All instructions are 32 bits in length.
• All instructions must be word aligned.
• Therefore the PC value is stored in bits [31:2], with bits [1:0] equal to zero (as instructions cannot be halfword or byte aligned).
* R14 is used as the subroutine link register (LR). It stores the return address, calculated from the PC, when Branch with Link operations are performed.
* Thus to return from a linked branch:
• MOV r15,r14
or
• MOV pc,lr

Exception Handling and the Vector Table

* When an exception occurs, the core:
• Copies CPSR into SPSR_<mode>
• Sets appropriate CPSR bits
– If the core implements ARM Architecture 4T and is currently in Thumb state, then ARM state is entered.
– Mode field bits
– Interrupt disable flags if appropriate.
• Maps in appropriate banked registers
• Stores the “return address” in LR_<mode>
• Sets PC to vector address
* To return, the exception handler needs to:
• Restore CPSR from SPSR_<mode>
• Restore PC from LR_<mode>

Vector table:
0x00000000  Reset
0x00000004  Undefined Instruction
0x00000008  Software Interrupt
0x0000000C  Prefetch Abort
0x00000010  Data Abort
0x00000014  Reserved
0x00000018  IRQ
0x0000001C  FIQ
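The entry and return steps listed above can be sketched as a Python model. This is an illustration under stated assumptions:

- The `Core` class and the function names are hypothetical.
- Banked registers are modelled as dictionaries keyed by mode.
- Only the mode, T, I and F fields of the CPSR are modelled.
- Reset is not modelled.
- The return address is passed in rather than computed. The value actually stored depends on the exception type, and real handlers often adjust LR before returning.

The M[4:0] values used are the standard ARM mode encodings.

```python
from dataclasses import dataclass, field

# Vector addresses from the table above (0x14 is reserved).
VECTORS = {"reset": 0x00, "undef": 0x04, "swi": 0x08, "prefetch_abort": 0x0C,
           "data_abort": 0x10, "irq": 0x18, "fiq": 0x1C}

# Mode entered for each (modelled) exception, and the M[4:0] value of each mode.
ENTRY_MODE = {"undef": "und", "swi": "svc", "prefetch_abort": "abt",
              "data_abort": "abt", "irq": "irq", "fiq": "fiq"}
MODE_BITS = {"usr": 0b10000, "fiq": 0b10001, "irq": 0b10010, "svc": 0b10011,
             "abt": 0b10111, "und": 0b11011, "sys": 0b11111}
I_BIT, F_BIT, T_BIT = 1 << 7, 1 << 6, 1 << 5


@dataclass
class Core:
    cpsr: int
    pc: int
    spsr: dict = field(default_factory=dict)   # banked SPSR_<mode>
    lr: dict = field(default_factory=dict)     # banked LR_<mode>


def take_exception(core: Core, kind: str, return_address: int) -> None:
    """Exception entry, following the steps listed above (reset not modelled)."""
    mode = ENTRY_MODE[kind]
    core.spsr[mode] = core.cpsr                        # copy CPSR into SPSR_<mode>
    cpsr = (core.cpsr & ~0x1F) | MODE_BITS[mode]       # set the mode field bits
    cpsr &= ~T_BIT                                     # enter ARM state
    cpsr |= I_BIT                                      # disable IRQ
    if kind == "fiq":
        cpsr |= F_BIT                                  # FIQ entry also disables FIQ
    core.cpsr = cpsr
    core.lr[mode] = return_address                     # "return address" in LR_<mode>
    core.pc = VECTORS[kind]                            # jump to the vector


def return_from_exception(core: Core, mode: str) -> None:
    """Restore CPSR from SPSR_<mode> and PC from LR_<mode>."""
    core.cpsr = core.spsr[mode]
    core.pc = core.lr[mode]
```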

The Instruction Pipeline

* The ARM uses a pipeline in order to increase the speed of the flow of instructions to the processor.
• Allows several operations to be undertaken simultaneously, rather than serially.

PC       FETCH     Instruction fetched from memory
PC - 4   DECODE    Decoding of registers used in instruction
PC - 8   EXECUTE   Register(s) read from Register Bank
                   Shift and ALU operation
                   Write register(s) back to Register Bank

* Rather than pointing to the instruction being executed, the PC points to the instruction being fetched.

Quiz #1 - Verbal

* What registers are used to store the program counter and link register?
* What is r13 often used to store?
* Which mode, or modes, have the fewest registers available? How many, and why?
ARM Instruction Set Format

Fields (bit 31 on the left, bit 0 on the right; bits 31-28 are always Cond)   Instruction type
Cond 0 0 I Opcode S Rn Rd Operand2                          Data processing / PSR Transfer
Cond 0 0 0 0 0 0 A S Rd Rn Rs 1 0 0 1 Rm                    Multiply
Cond 0 0 0 0 1 U A S RdHi RdLo Rs 1 0 0 1 Rm                Long Multiply (v3M / v4 only)
Cond 0 0 0 1 0 B 0 0 Rn Rd 0 0 0 0 1 0 0 1 Rm               Swap
Cond 0 1 I P U B W L Rn Rd Offset                           Load/Store Byte/Word
Cond 1 0 0 P U S W L Rn Register List                       Load/Store Multiple
Cond 0 0 0 P U 1 W L Rn Rd Offset1 1 S H 1 Offset2          Halfword transfer: Immediate offset (v4 only)
Cond 0 0 0 P U 0 W L Rn Rd 0 0 0 0 1 S H 1 Rm               Halfword transfer: Register offset (v4 only)
Cond 1 0 1 L Offset                                         Branch
Cond 0 0 0 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 Rn     Branch Exchange (v4T only)
Cond 1 1 0 P U N W L Rn CRd CPNum Offset                    Coprocessor data transfer
Cond 1 1 1 0 Op1 CRn CRd CPNum Op2 0 CRm                    Coprocessor data operation
Cond 1 1 1 0 Op1 L CRn Rd CPNum Op2 1 CRm                   Coprocessor register transfer
Cond 1 1 1 1 SWI Number                                     Software interrupt

Conditional Execution

* Most instruction sets only allow branches to be executed conditionally.
* However, by reusing the condition evaluation hardware, ARM effectively increases the number of instructions.
• All instructions contain a condition field which determines whether the CPU will execute them.
• Non-executed instructions soak up 1 cycle.
– They still have to complete the cycle so as to allow fetching and decoding of the following instructions.
* This removes the need for many branches, which stall the pipeline (3 cycles to refill).
• Allows very dense in-line code, without branches.
• The time penalty of not executing several conditional instructions is frequently less than the overhead of the branch or subroutine call that would otherwise be needed.

The Condition Field

Bits 31-28 of every instruction hold the condition field (Cond):

0000 = EQ - Z set (equal)
0001 = NE - Z clear (not equal)
0010 = HS / CS - C set (unsigned higher or same)
0011 = LO / CC - C clear (unsigned lower)
0100 = MI - N set (negative)
0101 = PL - N clear (positive or zero)
0110 = VS - V set (overflow)
0111 = VC - V clear (no overflow)
1000 = HI - C set and Z clear (unsigned higher)
1001 = LS - C clear or Z set (unsigned lower or same)
1010 = GE - N set and V set, or N clear and V clear (> or =)
1011 = LT - N set and V clear, or N clear and V set (<)
1100 = GT - Z clear, and either N set and V set, or N clear and V clear (>)
1101 = LE - Z set, or N set and V clear, or N clear and V set (<, or =)
1110 = AL - always
1111 = NV - reserved.

Using and updating the Condition Field

* To execute an instruction conditionally, simply postfix it with the appropriate condition:
• For example, an add instruction takes the form:
– ADD r0,r1,r2 ; r0 = r1 + r2 (ADDAL)
• To execute this only if the zero flag is set:
– ADDEQ r0,r1,r2 ; If zero flag set then…
                  ; ... r0 = r1 + r2
* By default, data processing operations do not affect the condition flags (apart from the comparisons, where this is the only effect). To cause the condition flags to be updated, the S bit of the instruction needs to be set by postfixing the instruction (and any condition code) with an “S”.
• For example, to add two numbers and set the condition flags:
– ADDS r0,r1,r2 ; r0 = r1 + r2
                ; ... and set flags
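The condition table can be read as a set of predicates on the N, Z, C and V flags. Below is a minimal Python sketch. It assumes the flags are passed as 0/1 integers, and the function name is hypothetical.

```python
def condition_passed(cond: int, n: int, z: int, c: int, v: int) -> bool:
    """Return True if an instruction with 4-bit condition field `cond` executes,
    given the current N, Z, C, V flags (each 0 or 1)."""
    if cond == 0b1111:
        raise ValueError("NV (0b1111) is reserved")
    checks = {
        0b0000: z == 1,               # EQ
        0b0001: z == 0,               # NE
        0b0010: c == 1,               # HS / CS
        0b0011: c == 0,               # LO / CC
        0b0100: n == 1,               # MI
        0b0101: n == 0,               # PL
        0b0110: v == 1,               # VS
        0b0111: v == 0,               # VC
        0b1000: c == 1 and z == 0,    # HI
        0b1001: c == 0 or z == 1,     # LS
        0b1010: n == v,               # GE
        0b1011: n != v,               # LT
        0b1100: z == 0 and n == v,    # GT
        0b1101: z == 1 or n != v,     # LE
        0b1110: True,                 # AL
    }
    return checks[cond]


# Flags left by CMP r0, r1 with r0 = 3 and r1 = 5: N=1, Z=0, C=0 (borrow), V=0.
flags = dict(n=1, z=0, c=0, v=0)
```

With these example flags, LT (signed less than), LO (unsigned lower) and NE all pass, while GT and EQ fail.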

Branch instructions (1)

* Branch : B{<cond>} label
* Branch with Link : BL{<cond>} sub_routine_label

Bits 31-28: Cond (condition field) | 27-25: 1 0 1 | 24: L (link bit: 0 = Branch, 1 = Branch with link) | 23-0: Offset

* The offset for branch instructions is calculated by the assembler:
• By taking the difference between the target address and the address of the branch instruction, minus 8 (to allow for the pipeline).
• This gives a 26 bit offset. It is right shifted 2 bits (the bottom two bits are always zero, as instructions are word aligned) and stored into the 24-bit offset field of the instruction encoding.
• This gives a range of ± 32 Mbytes.

Branch instructions (2)

* When executing the instruction, the processor:
• shifts the offset left two bits, sign extends it to 32 bits, and adds it to the PC.
* Execution then continues from the new PC, once the pipeline has been refilled.
* The "Branch with link" instruction implements a subroutine call by writing PC-4 into the LR of the current bank.
• i.e. the address of the next instruction following the branch with link (allowing for the pipeline).
* To return from a subroutine, simply restore the PC from the LR:
• MOV pc, lr
• Again, the pipeline has to refill before execution continues.
* The "Branch" instruction does not affect LR.
* Note: Architecture 4T offers a further ARM branch instruction, BX.
• See the Thumb Instruction Set Module for details.
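The assembler's offset calculation, and the core's decoding of it, can be sketched in Python as below. The helper names are hypothetical, and the branch address is assumed to be word aligned. The +8 accounts for the PC running two instructions ahead, as described above.

```python
MASK32 = 0xFFFFFFFF


def encode_branch_offset(branch_addr: int, target_addr: int) -> int:
    """24-bit offset field for a B/BL at branch_addr that jumps to target_addr."""
    delta = target_addr - (branch_addr + 8)          # the PC is 8 bytes ahead
    if delta % 4:
        raise ValueError("branch target must be word aligned")
    if not -(1 << 25) <= delta < (1 << 25):
        raise ValueError("branch target out of range (+/- 32 Mbytes)")
    return (delta >> 2) & 0xFFFFFF                   # drop the two zero bits, keep 24


def decode_branch_target(branch_addr: int, offset24: int) -> int:
    """What the core does: sign-extend, shift left two bits, add to the PC."""
    if offset24 & 0x800000:
        offset24 -= 1 << 24                          # sign extension
    return (branch_addr + 8 + (offset24 << 2)) & MASK32


def encode_branch(cond: int, link: bool, branch_addr: int, target_addr: int) -> int:
    """Full instruction word: Cond | 101 | L | offset."""
    return ((cond << 28) | (0b101 << 25) | (int(link) << 24)
            | encode_branch_offset(branch_addr, target_addr))
```

For example, a branch to itself encodes the offset 0xFFFFFE, i.e. -8 bytes. With the AL condition this gives the word 0xEAFFFFFE.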
Data processing Instructions

* Largest family of ARM instructions, all sharing the same instruction format.
* Contains:
• Arithmetic operations
• Comparisons (no results - just set condition codes)
• Logical operations
• Data movement between registers
* Remember, this is a load / store architecture:
• These instructions only work on registers, NOT memory.
* They each perform a specific operation on one or two operands.
• First operand always a register - Rn
• Second operand sent to the ALU via barrel shifter.
* We will examine the barrel shifter shortly.

Arithmetic Operations

* Operations are:
• ADD operand1 + operand2
• ADC operand1 + operand2 + carry
• SUB operand1 - operand2
• SBC operand1 - operand2 + carry - 1
• RSB operand2 - operand1
• RSC operand2 - operand1 + carry - 1
* Syntax:
• <Operation>{<cond>}{S} Rd, Rn, Operand2
* Examples:
• ADD r0, r1, r2
• SUBGT r3, r3, #1
• RSBLES r4, r5, #5
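The carry-using variants are easiest to see with concrete numbers. In this Python sketch the names are hypothetical, and the flags other than the incoming carry are not modelled. `add64` illustrates one common use of ADC: an ADDS on the low words followed by an ADC on the high words, which propagates the carry into a 64-bit sum.

```python
MASK32 = 0xFFFFFFFF


def alu_arith(op: str, op1: int, op2: int, carry: int = 0) -> int:
    """32-bit result of an ARM arithmetic operation (flags are not modelled).

    `carry` is the current C flag (0 or 1); only ADC, SBC and RSC use it.
    """
    results = {
        "ADD": op1 + op2,
        "ADC": op1 + op2 + carry,
        "SUB": op1 - op2,
        "SBC": op1 - op2 + carry - 1,
        "RSB": op2 - op1,
        "RSC": op2 - op1 + carry - 1,
    }
    return results[op] & MASK32


def add64(a: int, b: int) -> int:
    """Hypothetical 64-bit addition as an ADDS (low words) / ADC (high words) pair."""
    lo_a, lo_b = a & MASK32, b & MASK32
    lo = alu_arith("ADD", lo_a, lo_b)
    carry = int(lo_a + lo_b > MASK32)                     # C flag that ADDS would set
    hi = alu_arith("ADC", (a >> 32) & MASK32, (b >> 32) & MASK32, carry)
    return (hi << 32) | lo
```

Note that SBC with C = 1 behaves exactly like SUB. This matches the ARM convention that C = 1 means "no borrow".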

Comparisons

* The only effect of the comparisons is to
• UPDATE THE CONDITION FLAGS. Thus no need to set S bit.
* Operations are:
• CMP operand1 - operand2, but result not written
• CMN operand1 + operand2, but result not written
• TST operand1 AND operand2, but result not written
• TEQ operand1 EOR operand2, but result not written
* Syntax:
• <Operation>{<cond>} Rn, Operand2
* Examples:
• CMP r0, r1
• TSTEQ r2, #5

Logical Operations

* Operations are:
• AND operand1 AND operand2
• EOR operand1 EOR operand2
• ORR operand1 OR operand2
• BIC operand1 AND NOT operand2 [ie bit clear]
* Syntax:
• <Operation>{<cond>}{S} Rd, Rn, Operand2
* Examples:
• AND r0, r1, r2
• BICEQ r2, r3, #7
• EORS r1,r3,r0

Data Movement

* Operations are:
• MOV operand2
• MVN NOT operand2
Note that these make no use of operand1.
* Syntax:
• <Operation>{<cond>}{S} Rd, Operand2
* Examples:
• MOV r0, r1
• MOVS r2, #10
• MVNEQ r1,#0

Quiz #2

[Flowchart: Start → "r0 = r1?". Yes → Stop. No → "r0 > r1?": Yes → r0 = r0 - r1; No → r1 = r1 - r0. Both branches then return to the "r0 = r1?" test.]
* Convert the GCD algorithm given in this flowchart into:
1) “Normal” assembler, where only branches can be conditional.
2) ARM assembler, where all instructions are conditional, thus improving code density.
* The only instructions you need are CMP, B and SUB.
Quiz #2 - Sample Solutions

“Normal” Assembler

gcd     cmp r0, r1        ;reached the end?
        beq stop
        blt less          ;if r0 < r1, branch to less
        sub r0, r0, r1    ;otherwise r0 > r1: subtract r1 from r0
        bal gcd
less    sub r1, r1, r0    ;subtract r0 from r1
        bal gcd
stop

ARM Conditional Assembler

gcd     cmp r0, r1        ;compare r0 and r1 once per iteration
        subgt r0, r0, r1  ;if r0 > r1, subtract r1 from r0
        sublt r1, r1, r0  ;if r0 < r1, subtract r0 from r1
        bne gcd           ;flags still come from cmp: loop until r0 = r1

The Barrel Shifter

* The ARM doesn’t have actual shift instructions.
* Instead it has a barrel shifter, which provides a mechanism to carry out shifts as part of other instructions.
* So what operations does the barrel shifter support?
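A Python model of the conditional solution makes its key point explicit. The flags are set once by CMP, and neither SUBGT nor SUBLT sets the S bit, so BNE still tests the original comparison. The sketch below is hypothetical, and it assumes positive inputs. With a zero input, the assembly and this model both loop forever.

```python
def gcd_conditional(r0: int, r1: int) -> int:
    """Model of the conditional ARM GCD loop; r0 and r1 must be positive."""
    if r0 <= 0 or r1 <= 0:
        raise ValueError("inputs must be positive (the loop never ends otherwise)")
    while True:
        greater, less = r0 > r1, r0 < r1     # cmp r0, r1 (flags set once)
        if greater:
            r0 -= r1                         # subgt r0, r0, r1
        if less:
            r1 -= r0                         # sublt r1, r1, r0
        if not (greater or less):            # bne gcd falls through when equal
            return r0
```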

Barrel Shifter - Left Shift

* Shifts left by the specified amount (multiplies by powers of two), e.g. LSL #5 = multiply by 32

Logical Shift Left (LSL)
[Figure: CF ← Destination ← 0. Bits leave at the MSB into the carry flag; zeros enter at the LSB.]

Barrel Shifter - Right Shifts

Logical Shift Right
• Shifts right by the specified amount (divides by powers of two), e.g. LSR #5 = divide by 32
[Figure: 0 → Destination → CF. Zeros enter at the MSB; bits leave at the LSB into the carry flag.]

Arithmetic Shift Right
• Shifts right (divides by powers of two) and preserves the sign bit, for 2's complement operations, e.g. ASR #5 = divide by 32
[Figure: sign bit shifted in → Destination → CF.]

Barrel Shifter - Rotations

Rotate Right (ROR)
• Similar to an ASR, but the bits wrap around as they leave the LSB and appear as the MSB, e.g. ROR #5
• Note the last bit rotated is also used as the Carry Out.
[Figure: Destination rotated right; the bit leaving the LSB re-enters at the MSB and is copied to CF.]

Rotate Right Extended (RRX)
• This operation uses the CPSR C flag as a 33rd bit.
• Rotates right by 1 bit. Encoded as ROR #0.
[Figure: CF → MSB of Destination; LSB of Destination → CF.]

Using the Barrel Shifter: The Second Operand

[Figure: Operand 1 goes straight to the ALU; Operand 2 passes through the barrel shifter first; the ALU produces the Result.]
* Register, optionally with shift operation applied.
* Shift value can be either:
• 5 bit unsigned integer
• Specified in bottom byte of another register.
* Immediate value
• 8 bit number
• Can be rotated right through an even number of positions.
• Assembler will calculate rotate for you from constant.
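The shift and rotate operations above, including their carry out, can be sketched as one Python function. This is a minimal model with a hypothetical name. It covers immediate shift amounts only: LSL 0-31 and LSR/ASR/ROR 1-31, plus RRX. It does not model register-specified shifts or the special encodings for a shift of 32.

```python
MASK32 = 0xFFFFFFFF


def barrel_shift(value: int, shift: str, amount: int, carry_in: int):
    """Return (result, carry_out) for an immediate shift of a 32-bit value.

    Modelled amounts: LSL 0-31, LSR/ASR/ROR 1-31; RRX ignores `amount`.
    """
    value &= MASK32
    if shift == "RRX":
        # the C flag becomes the new bit 31; the old bit 0 becomes the carry out
        return (carry_in << 31) | (value >> 1), value & 1
    lowest = 0 if shift == "LSL" else 1
    if not lowest <= amount <= 31:
        raise ValueError(f"{shift} #{amount} is outside the modelled range")
    if shift == "LSL":
        if amount == 0:
            return value, carry_in            # LSL #0: value and carry unchanged
        return (value << amount) & MASK32, (value >> (32 - amount)) & 1
    if shift == "LSR":
        return value >> amount, (value >> (amount - 1)) & 1
    if shift == "ASR":
        signed = value - (1 << 32) if value & 0x80000000 else value
        return (signed >> amount) & MASK32, (value >> (amount - 1)) & 1
    if shift == "ROR":
        result = ((value >> amount) | (value << (32 - amount))) & MASK32
        return result, result >> 31           # last bit rotated out is now bit 31
    raise ValueError(f"unknown shift {shift!r}")
```

For example, `barrel_shift(3, "LSL", 5, 0)` gives (96, 0), matching "LSL #5 = multiply by 32".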
Second Operand : Shifted Register

* The amount by which the register is to be shifted is contained in either:
• the immediate 5-bit field in the instruction
– NO OVERHEAD
– Shift is done for free - executes in a single cycle.
• the bottom byte of a register (not PC)
– Then takes an extra cycle to execute
– ARM doesn’t have enough read ports to read 3 registers at once.
– Then same as on other processors where shift is a separate instruction.
* If no shift is specified then a default shift is applied: LSL #0
• i.e. barrel shifter has no effect on value in register.

Second Operand : Using a Shifted Register

* Using a multiplication instruction to multiply by a constant means first loading the constant into a register and then waiting a number of internal cycles for the instruction to complete.
* A faster solution can often be found by using some combination of MOVs, ADDs, SUBs and RSBs with shifts.
• Multiplications by a constant equal to a ((power of 2) ± 1) can be done in one cycle.
* Example: r0 = r1 * 5
Example: r0 = r1 + (r1 * 4)
→ ADD r0, r1, r1, LSL #2
* Example: r2 = r3 * 105
Example: r2 = r3 * 15 * 7
Example: r2 = r3 * (16 - 1) * (8 - 1)
→ RSB r2, r3, r3, LSL #4 ; r2 = r3 * 15
→ RSB r2, r2, r2, LSL #3 ; r2 = r2 * 7
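The shift-and-add sequences above can be checked against ordinary multiplication. In this Python sketch, `add_lsl` and `rsb_lsl` are hypothetical helpers that model the two instruction forms, including the wrap-around modulo 2^32 that a register-width result has.

```python
MASK32 = 0xFFFFFFFF


def add_lsl(rn: int, rm: int, shift: int) -> int:
    """ADD rd, rn, rm, LSL #shift  ->  rd = rn + (rm << shift), modulo 2**32."""
    return (rn + (rm << shift)) & MASK32


def rsb_lsl(rn: int, rm: int, shift: int) -> int:
    """RSB rd, rn, rm, LSL #shift  ->  rd = (rm << shift) - rn, modulo 2**32."""
    return ((rm << shift) - rn) & MASK32


def times5(r1: int) -> int:
    return add_lsl(r1, r1, 2)            # ADD r0, r1, r1, LSL #2


def times105(r3: int) -> int:
    r2 = rsb_lsl(r3, r3, 4)              # RSB r2, r3, r3, LSL #4 ; r2 = r3 * 15
    return rsb_lsl(r2, r2, 3)            # RSB r2, r2, r2, LSL #3 ; r2 = r2 * 7
```

Because the arithmetic is modular, the same two instructions also give the correct low 32 bits for negative (2's complement) inputs.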

Second Operand : Immediate Value (1)

* There is no single instruction which will load a 32 bit immediate constant into a register without performing a data load from memory.
• All ARM instructions are 32 bits long.
• ARM instructions do not use the instruction stream as data.
* The data processing instruction format has 12 bits available for operand2.
• If used directly this would only give a range of 4096.
* Instead it is used to store 8 bit constants, giving a range of 0 - 255.
* These 8 bits can then be rotated right through an even number of positions (ie RORs by 0, 2, 4,..30).
• This gives a much larger range of constants that can be directly loaded, though some constants will still need to be loaded from memory.

Second Operand : Immediate Value (2)

* This gives us:
• 0 - 255 [0 - 0xff]
• 256,260,264,..,1020 [0x100-0x3fc, step 4, 0x40-0xff ror 30]
• 1024,1040,1056,..,4080 [0x400-0xff0, step 16, 0x40-0xff ror 28]
• 4096,4160,4224,..,16320 [0x1000-0x3fc0, step 64, 0x40-0xff ror 26]
* These can be loaded using, for example:
• MOV r0, #0x40, 26 ; => MOV r0, #0x1000 (ie 4096)
* To make this easier, the assembler will convert to this form for us if simply given the required constant:
• MOV r0, #4096 ; => MOV r0, #0x1000 (ie 0x40 ror 26)
* The bitwise complements can also be formed using MVN:
• MOV r0, #0xFFFFFFFF ; assembles to MVN r0, #0
* If the required constant cannot be generated, an error will be reported.
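The assembler's search for an 8-bit value and an even rotation can be sketched in Python as below. The function names are hypothetical. `encode_immediate` returns the first encoding it finds, trying the smallest rotation first, so it may choose a different (equally valid) encoding from the one on the slide. For example, 4096 comes out as 0x01 ror 20 rather than 0x40 ror 26. `mov_or_mvn` falls back to MVN with the complemented constant. It returns None when neither works, which is the case that needs a load from memory.

```python
MASK32 = 0xFFFFFFFF


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def encode_immediate(value: int):
    """Find (imm8, rotate) with value == ror32(imm8, 2 * rotate), or return None.

    Tries rotations in increasing order and returns the first match.
    """
    value &= MASK32
    for rotate in range(16):
        imm8 = ror32(value, 32 - 2 * rotate)   # rotating left undoes the ROR
        if imm8 <= 0xFF:
            return imm8, rotate
    return None


def mov_or_mvn(value: int):
    """MOV if the constant encodes directly, else MVN of its complement, else None."""
    enc = encode_immediate(value)
    if enc is not None:
        return "MOV", enc
    enc = encode_immediate(~value & MASK32)
    if enc is not None:
        return "MVN", enc
    return None            # would need a load from memory (literal pool)
```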

Loading full 32 bit constants

* Although the MOV/MVN mechanism will load a large range of constants into a register, sometimes this mechanism will not generate the required constant.
* Therefore, the assembler also provides a method which will load ANY 32 bit constant:
• LDR rd,=numeric constant
* If the constant can be constructed using either a MOV or MVN then this will be the instruction actually generated.
* Otherwise, the assembler will produce an LDR instruction with a PC-relative address to read the constant from a literal pool.
• LDR r0,=0x42 ; generates MOV r0,#0x42
• LDR r0,=0x55555555 ; generates LDR r0,[pc, offset to lit pool]
* As this mechanism will always generate the best instruction for a given case, it is the recommended way of loading constants.

Multiplication Instructions

* The Basic ARM provides two multiplication instructions.
* Multiply
• MUL{<cond>}{S} Rd, Rm, Rs ; Rd = Rm * Rs
* Multiply Accumulate - does addition for free
• MLA{<cond>}{S} Rd, Rm, Rs,Rn ; Rd = (Rm * Rs) + Rn
* Restrictions on use:
• Rd and Rm cannot be the same register
– Can be avoided by swapping Rm and Rs around. This works because multiplication is commutative.
• Cannot use PC.
These will be picked up by the assembler if overlooked.
* Operands can be considered signed or unsigned
• Up to user to interpret correctly.
Multiplication Implementation

* The ARM makes use of Booth’s Algorithm to perform integer multiplication.
* On non-M ARMs this operates on 2 bits of Rs at a time.
• For each pair of bits this takes 1 cycle (plus 1 cycle to start with).
• However, when there are no more 1’s left in Rs, the multiplication will early-terminate.
* Example: Multiply 18 and -1 : Rd = Rm * Rs

18 = 0000 0000 0000 0000 0000 0000 0001 0010
-1 = 1111 1111 1111 1111 1111 1111 1111 1111

Rm = 18, Rs = -1: 17 cycles
Rm = -1, Rs = 18: 4 cycles

* Note: Compiler does not use early termination criteria to decide on which order to place operands.

Extended Multiply Instructions

* M variants of ARM cores contain extended multiplication hardware. This provides three enhancements:
• An 8 bit Booth’s Algorithm is used
– Multiplication is carried out faster (maximum for standard instructions is now 5 cycles).
• Early termination method improved so that now completes multiplication when all remaining bit sets contain
– all zeroes (as with non-M ARMs), or
– all ones.
Thus the previous example would early terminate in 2 cycles in both cases.
• 64 bit results can now be produced from two 32bit operands
– Higher accuracy.
– Pair of registers used to store result.
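The cycle counts quoted above follow from a simple counting rule. This Python sketch encodes the slides' description only; it is not a cycle-accurate timing model of any particular core, and the function names are hypothetical.

For the M variant, it assumes the same single start-up cycle plus at least one 8-bit step. That assumption reproduces both the "2 cycles in both cases" figure and the 5-cycle maximum.

```python
MASK32 = 0xFFFFFFFF


def booth_cycles_non_m(rs: int) -> int:
    """Non-M ARMs, per the slide: 1 start-up cycle plus 1 cycle per 2-bit pair
    of Rs, stopping early once the remaining upper bits of Rs are all zero."""
    rs &= MASK32
    cycles = 1
    while rs:
        rs >>= 2
        cycles += 1
    return cycles


def booth_cycles_m(rs: int) -> int:
    """M variants, per the slide: 8 bits of Rs per step, stopping once the
    remaining upper bits are all zeros or all ones, so at most 1 + 4 = 5 cycles."""
    rs &= MASK32
    signed = rs - (1 << 32) if rs & 0x80000000 else rs
    groups = 1
    while groups < 4 and (signed >> (8 * groups)) not in (0, -1):
        groups += 1
    return 1 + groups
```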

Multiply-Long and Multiply-Accumulate Long

* Instructions are
• MULL which gives RdHi,RdLo:=Rm*Rs
• MLAL which gives RdHi,RdLo:=(Rm*Rs)+RdHi,RdLo
* However, the full 64 bits of the result now matter (lower precision multiply instructions simply throw the top 32 bits away)
• Need to specify whether operands are signed or unsigned
* Therefore the syntax of the new instructions is:
• UMULL{<cond>}{S} RdLo,RdHi,Rm,Rs
• UMLAL{<cond>}{S} RdLo,RdHi,Rm,Rs
• SMULL{<cond>}{S} RdLo,RdHi,Rm,Rs
• SMLAL{<cond>}{S} RdLo,RdHi,Rm,Rs
* Not generated by the compiler.
Warning : Unpredictable on non-M ARMs.

Quiz #3

1. Specify instructions which will implement the following:
a) r0 = 16
b) r1 = r0 * 4
c) r0 = r1 / 16 (r1 signed 2's comp.)
d) r1 = r2 * 7
2. What will the following instructions do?
a) ADDS r0, r1, r1, LSL #2
b) RSB r2, r1, #0
3. What does the following instruction sequence do?
ADD r0, r1, r1, LSL #1
SUB r0, r0, r1, LSL #4
ADD r0, r0, r1, LSL #7
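The need to state signedness for the long multiplies is easy to demonstrate in code. This Python sketch models UMULL, SMULL and UMLAL, returning (RdLo, RdHi); the helper names are hypothetical. The same bit patterns 0xFFFFFFFF and 2 produce the same low word under both interpretations, which is all that MUL keeps, but a different high word.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a 2's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def umull(rm: int, rs: int):
    """UMULL RdLo, RdHi, Rm, Rs: unsigned 64-bit product, returned as (RdLo, RdHi)."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, product >> 32


def smull(rm: int, rs: int):
    """SMULL RdLo, RdHi, Rm, Rs: signed 64-bit product in 2's complement."""
    product = (to_signed32(rm) * to_signed32(rs)) & MASK64
    return product & MASK32, product >> 32


def umlal(rdlo: int, rdhi: int, rm: int, rs: int):
    """UMLAL: add the unsigned product to the 64-bit value held in RdHi:RdLo."""
    total = ((rdhi << 32) | rdlo) + (rm & MASK32) * (rs & MASK32)
    total &= MASK64
    return total & MASK32, total >> 32
```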

Load / Store Instructions

* The ARM is a Load / Store Architecture:
• Does not support memory to memory data processing operations.
• Must move data values into registers before using them.
* This might sound inefficient, but in practice isn’t:
• Load data values from memory into registers.
• Process data in registers using a number of data processing instructions which are not slowed down by memory access.
• Store results from registers out to memory.
* The ARM has three sets of instructions which interact with main memory. These are:
• Single register data transfer (LDR / STR).
• Block data transfer (LDM/STM).
• Single Data Swap (SWP).

Single register data transfer

* The basic load and store instructions are:
• Load and Store Word or Byte
– LDR / STR / LDRB / STRB
* ARM Architecture Version 4 also adds support for halfwords and signed data.
• Load and Store Halfword
– LDRH / STRH
• Load Signed Byte or Halfword - load value and sign extend it to 32 bits.
– LDRSB / LDRSH
* All of these instructions can be conditionally executed by inserting the appropriate condition code after STR / LDR.
• e.g. LDREQB
* Syntax:
• <LDR|STR>{<cond>}{<size>} Rd, <address>
Load and Store Word or Byte: Base Register

* The memory location to be accessed is held in a base register
• STR r0, [r1] ; Store contents of r0 to location pointed to
               ; by contents of r1.
• LDR r2, [r1] ; Load r2 with contents of memory location
               ; pointed to by contents of r1.
[Figure: r0 (source register for STR) holds 0x5 and r1 (base register) holds 0x200. STR writes 0x5 to memory location 0x200; LDR then loads 0x5 from 0x200 into r2 (destination register for LDR).]

Load and Store Word or Byte: Offsets from the Base Register

* As well as accessing the actual location contained in the base register, these instructions can access a location offset from the base register pointer.
* This offset can be
• An unsigned 12bit immediate value (ie 0 - 4095 bytes).
• A register, optionally shifted by an immediate value
* This can be either added or subtracted from the base register:
• Prefix the offset value or register with ‘+’ (default) or ‘-’.
* This offset can be applied:
• before the transfer is made: Pre-indexed addressing
– optionally auto-incrementing the base register, by postfixing the instruction with an ‘!’.
• after the transfer is made: Post-indexed addressing
– causing the base register to be auto-incremented.

Load and Store Word or Byte: Pre-indexed Addressing

* Example: STR r0, [r1,#12]
[Figure: r1 (base register) holds 0x200. With offset 12, the source register r0 (0x5) is stored to 0x20c; r1 is unchanged.]
* To store to location 0x1f4 instead use: STR r0, [r1,#-12]
* To auto-increment the base pointer to 0x20c use: STR r0, [r1, #12]!
* If r2 contains 3, access 0x20c by multiplying this by 4:
• STR r0, [r1, r2, LSL #2]

Load and Store Word or Byte: Post-indexed Addressing

* Example: STR r0, [r1], #12
[Figure: r0 (0x5) is stored to the original base address 0x200; r1 is then updated to 0x200 + 12 = 0x20c.]
* To update the base register to 0x1f4 instead (auto-decrement) use:
• STR r0, [r1], #-12
* If r2 contains 3, auto-increment the base register to 0x20c by multiplying this by 4:
• STR r0, [r1], r2, LSL #2
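The difference between pre- and post-indexing comes down to which address is accessed and what is written back. This Python sketch reproduces the examples above. The function name is hypothetical, and the offset is assumed to be already computed, including any register shift.

```python
MASK32 = 0xFFFFFFFF


def transfer_address(base: int, offset: int, pre_indexed: bool, writeback: bool = False):
    """Return (address accessed, new value of the base register) for LDR/STR.

    `offset` is the signed offset after any register shift has been applied.
    Pre-indexed: access base+offset, update the base only with writeback ('!').
    Post-indexed: access base, then always update the base to base+offset.
    """
    if pre_indexed:
        address = base + offset
        new_base = address if writeback else base
    else:
        address = base
        new_base = base + offset
    return address & MASK32, new_base & MASK32
```

For example, `transfer_address(0x200, 12, pre_indexed=False)` gives (0x200, 0x20c), matching STR r0, [r1], #12.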

Load and Stores with User Mode Privilege

* When using post-indexed addressing, there is a further form of Load/Store Word/Byte:
• <LDR|STR>{<cond>}{B}T Rd, <post_indexed_address>
* When used in a privileged mode, this does the load/store with user mode privilege.
• Normally used by an exception handler that is emulating a memory access instruction that would normally execute in user mode.

Example Usage of Addressing Modes

* Imagine an array, the first element of which is pointed to by the contents of r0.
[Figure: array elements 0, 1, 2, 3 stored at byte offsets 0, 4, 8, 12 from the pointer in r0.]
* If we want to access a particular element, then we can use pre-indexed addressing:
• r1 is the element we want.
• LDR r2, [r0, r1, LSL #2]
* If we want to step through every element of the array, for instance to produce the sum of the elements in the array, then we can use post-indexed addressing within a loop:
• r1 is the address of the current element (initially equal to r0).
• LDR r2, [r1], #4
Use a further register to store the address of the final element, so that the loop can be correctly terminated.
Offsets for Halfword and Signed Halfword / Byte Access

* The Load and Store Halfword and Load Signed Byte or Halfword instructions can make use of pre- and post-indexed addressing in much the same way as the basic load and store instructions.
* However, the actual offset formats are more constrained:
• The immediate value is limited to 8 bits (rather than 12 bits), giving an offset of 0-255 bytes.
• The register form cannot have a shift applied to it.

Effect of endianness

* The ARM can be set up to access its data in either little or big endian format.
* Little endian:
• Least significant byte of a word is stored in bits 0-7 of an addressed word.
* Big endian:
• Least significant byte of a word is stored in bits 24-31 of an addressed word.
* This has no real relevance unless data is stored as words and then accessed in smaller sized quantities (halfwords or bytes).
• Which byte / halfword is accessed will depend on the endianness of the system involved.

Endianness Example

r0 = 0x11223344
STR r0, [r1]     ; r1 = 0x100
LDRB r2, [r1]

                     Little-endian    Big-endian
Byte at 0x100        0x44             0x11
Byte at 0x101        0x33             0x22
Byte at 0x102        0x22             0x33
Byte at 0x103        0x11             0x44
r2 after LDRB        0x00000044       0x00000011

Quiz #4

* Write a segment of code that adds together elements x to x+(n-1) of an array, where the element x=0 is the first element of the array.
* Each element of the array is word sized (ie. 32 bits).
* The segment should use post-indexed addressing.
* At the start of your segment, you should assume that:
• r0 points to the start of the array.
• r1 = x
• r2 = n
[Figure: array starting at element 0 (pointed to by r0); the n elements x, x+1, ..., x+(n-1) are to be summed.]
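The endianness example above can be replayed with Python's standard `struct` module. Packing the word in each byte order gives the bytes that STR would leave at 0x100 to 0x103. The helper names here are hypothetical, and memory is modelled as a `bytes` object starting at the base address.

```python
import struct


def store_word(value: int, endian: str) -> bytes:
    """Bytes written to addresses base+0 .. base+3 by STR in the given byte order."""
    return struct.pack("<I" if endian == "little" else ">I", value)


def ldrb(memory: bytes, offset: int) -> int:
    """LDRB: read one byte and zero-extend it into a 32-bit register."""
    return memory[offset]


def ldrh(memory: bytes, offset: int, endian: str) -> int:
    """LDRH: read a halfword at base+offset in the given byte order."""
    return struct.unpack_from("<H" if endian == "little" else ">H", memory, offset)[0]


little = store_word(0x11223344, "little")
big = store_word(0x11223344, "big")
print(hex(ldrb(little, 0)), hex(ldrb(big, 0)))  # 0x44 0x11
```

The same model shows the halfword case. An LDRH from 0x100 reads 0x3344 on a little-endian system and 0x1122 on a big-endian one.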

Quiz #4 - Sample Solution

        ADD r0, r0, r1, LSL#2   ; Set r0 to address of element x
        ADD r2, r0, r2, LSL#2   ; Set r2 to address of element x+n (one past the last)
        MOV r1, #0              ; Initialise counter
loop
        LDR r3, [r0], #4        ; Access element and move to next
        ADD r1, r1, r3          ; Add contents to counter
        CMP r0, r2              ; Have we reached element x+n?
        BLO loop                ; If not - repeat for next element
                                ; (unsigned compare, as r0 and r2 are addresses)
; on exit sum contained in r1

Block Data Transfer (1)

* The Load and Store Multiple instructions (LDM / STM) allow between 1 and 16 registers to be transferred to or from memory.
* The transferred registers can be either:
• Any subset of the current bank of registers (default).
• Any subset of the user mode bank of registers when in a privileged mode (postfix instruction with a ‘^’).

Bits 31-28: Cond | 27-25: 1 0 0 | 24: P | 23: U | 22: S | 21: W | 20: L | 19-16: Rn | 15-0: Register list

• Cond: Condition field
• P: Pre/Post indexing bit. 0 = Post; add offset after transfer. 1 = Pre; add offset before transfer.
• U: Up/Down bit. 0 = Down; subtract offset from base. 1 = Up; add offset to base.
• S: PSR and force user bit. 0 = don’t load PSR or force user mode. 1 = load PSR or force user mode.
• W: Write-back bit. 0 = no write-back. 1 = write address into base.
• L: Load/Store bit. 0 = Store to memory. 1 = Load from memory.
• Rn: Base register
• Register list: Each bit corresponds to a particular register. For example:
– Bit 0 set causes r0 to be transferred.
– Bit 0 unset causes r0 not to be transferred.
At least one register must be transferred as the list cannot be empty.
Block Data Transfer (2)
* Base register used to determine where memory access should occur.
• 4 different addressing modes allow increment and decrement inclusive or
exclusive of the base register location.
• Base register can be optionally updated following the transfer (by
appending it with an ‘!’).
• Lowest register number is always transferred to/from lowest memory
location accessed.
* These instructions are very efficient for
• Saving and restoring context
– For this it is useful to view memory as a stack.
• Moving large blocks of data around memory
– For this it is useful to directly represent the functionality of the instructions.

Stacks
* A stack is an area of memory which grows as new data is “pushed” onto
the “top” of it, and shrinks as data is “popped” off the top.
* Two pointers define the current limits of the stack.
• A base pointer
– used to point to the “bottom” of the stack (the first location).
• A stack pointer
– used to point to the current “top” of the stack.
[Figure: PUSH {1,2,3} onto a stack starting at BASE, then POP; SP points at the top item, and the result of the pop is 3.]


Stack Operation
* Traditionally, a stack grows down in memory, with the last “pushed”
value at the lowest address. The ARM also supports ascending stacks,
where the stack structure grows up through memory.
* The value of the stack pointer can either:
• Point to the last occupied address (Full stack)
– and so needs pre-decrementing (ie before the push)
• Point to the next free address (Empty stack)
– and so needs post-decrementing (ie after the push)
(For an ascending stack, read “incrementing” for “decrementing”.)
* The stack type to be used is given by the postfix to the instruction:
• STMFD / LDMFD : Full Descending stack
• STMFA / LDMFA : Full Ascending stack
• STMED / LDMED : Empty Descending stack
• STMEA / LDMEA : Empty Ascending stack
* Note: ARM Compiler will always use a Full descending stack.

Stack Examples
[Figure: memory after each of STMFD sp!, {r0,r1,r3-r5}, STMED sp!, {r0,r1,r3-r5}, STMFA sp!, {r0,r1,r3-r5} and STMEA sp!, {r0,r1,r3-r5}, starting from Old SP = 0x400 (addresses 0x3e8 to 0x418 shown). In every case r0 is at the lowest address and r5 at the highest; the descending stacks grow below Old SP and the ascending stacks above it.]
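The four stack types differ only in which addresses receive the registers and where the stack pointer is left afterwards. The following C sketch models a store multiple with write-back for each type. It assumes memory is an array of 32-bit words and that `sp` is a word index rather than a byte address, so one step of `sp` is 4 bytes on the real processor. `stack_push` and the enum names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: mem[] is word-addressed, *sp is a word index,
   and vals[0] belongs to the lowest-numbered register in the list. */
enum stack_type { FULL_DESC, EMPTY_DESC, FULL_ASC, EMPTY_ASC };

static void stack_push(uint32_t *mem, int *sp, enum stack_type t,
                       const uint32_t *vals, int n)
{
    int lowest;                                /* word receiving vals[0] */
    switch (t) {
    case FULL_DESC:  lowest = *sp - n;     *sp -= n; break; /* STMDB */
    case EMPTY_DESC: lowest = *sp - n + 1; *sp -= n; break; /* STMDA */
    case FULL_ASC:   lowest = *sp + 1;     *sp += n; break; /* STMIB */
    default:         lowest = *sp;         *sp += n; break; /* STMIA */
    }
    /* Lowest-numbered register always goes to the lowest address. */
    for (int i = 0; i < n; i++)
        mem[lowest + i] = vals[i];
}
```

Take Old SP at word 32 and the five registers r0, r1, r3, r4, r5. The full descending push leaves SP at word 27, pointing at r0. The empty descending push also leaves SP at word 27, but there it points at the free word just below r0.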

Stacks and Subroutines
* One use of stacks is to create temporary register workspace for
subroutines. Any registers that are needed can be pushed onto the stack
at the start of the subroutine and popped off again at the end so as to
restore them before return to the caller:

        STMFD sp!,{r0-r12, lr}  ; stack all registers
        ........                ; and the return address
        ........
        LDMFD sp!,{r0-r12, pc}  ; load all the registers
                                ; and return automatically

* See the chapter on the ARM Procedure Call Standard in the SDT
Reference Manual for further details of register usage within
subroutines.
* If the pop instruction also had the ‘S’ bit set (using ‘^’) then the transfer
of the PC when in a privileged mode would also cause the SPSR to be
copied into the CPSR (see exception handling module).

Direct functionality of Block Data Transfer
* When LDM / STM are not being used to implement stacks, it is clearer to
specify exactly what the instruction does:
• i.e. specify whether to increment / decrement the base pointer, before or
after the memory access.
* In order to do this, LDM / STM support a further syntax in addition to
the stack one:
• STMIA / LDMIA : Increment After
• STMIB / LDMIB : Increment Before
• STMDA / LDMDA : Decrement After
• STMDB / LDMDB : Decrement Before

Example: Block Copy
• Copy a block of memory, which is an exact multiple of 12 words long,
from the location pointed to by r12 to the location pointed to by r13. r14
points to the end of the block to be copied.

        ; r12 points to the start of the source data
        ; r14 points to the end of the source data
        ; r13 points to the start of the destination data
loop    LDMIA r12!, {r0-r11}    ; load 48 bytes
        STMIA r13!, {r0-r11}    ; and store them
        CMP   r12, r14          ; check for the end
        BNE   loop              ; and loop until done

[Figure: source block from r12 up to r14 and destination block at r13, with memory addresses increasing upwards.]
• This loop transfers 48 bytes in 31 cycles
• Over 50 Mbytes/sec at 33 MHz (48 bytes × 33 × 10^6 cycles/sec / 31 cycles ≈ 51 × 10^6 bytes/sec)

Quiz #5
* The contents of registers r0 to r6 need to be swapped around thus:
• r0 moved into r3
• r1 moved into r4
• r2 moved into r6
• r3 moved into r5
• r4 moved into r0
• r5 moved into r1
• r6 moved into r2
* Write a segment of code that uses full descending stack operations to
carry this out, and hence requires no use of any other registers for
temporary storage.
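The block copy loop can be expressed in C to show what each instruction contributes. This sketch assumes, like the assembly, that the source length is a non-zero exact multiple of 12 words. `block_copy12` is a hypothetical name, and `regs` only stands in for r0-r11.

```c
#include <stdint.h>
#include <string.h>

/* Copy 12 words per iteration until src reaches src_end. */
static void block_copy12(const uint32_t *src, const uint32_t *src_end,
                         uint32_t *dst)
{
    do {
        uint32_t regs[12];                 /* stands in for r0-r11     */
        memcpy(regs, src, sizeof regs);    /* LDMIA r12!, {r0-r11}     */
        src += 12;
        memcpy(dst, regs, sizeof regs);    /* STMIA r13!, {r0-r11}     */
        dst += 12;
    } while (src != src_end);              /* CMP r12, r14 ; BNE loop  */
}
```

The `do`/`while` shape matches the assembly, which copies first and tests afterwards. That is why a zero-length block is not allowed.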


Quiz #5 - Sample Solution
        STMFD sp!, {r0-r6}
        LDMFD sp!, {r3,r4,r6}   ; r3 = r0, r4 = r1, r6 = r2
        LDMFD sp!, {r5}         ; r5 = r3
        LDMFD sp!, {r0-r2}      ; r0 = r4, r1 = r5, r2 = r6
[Figure: stack contents after each of the four instructions; r0-r6 are pushed below Old SP, and each pop moves SP back towards Old SP.]

Swap and Swap Byte Instructions
* Atomic operation of a memory read followed by a memory write
which moves byte or word quantities between registers and
memory.
* Syntax:
• SWP{<cond>}{B} Rd, Rm, [Rn]
[Figure: (1) the memory contents at [Rn] are copied to temp; (2) Rm is written to memory at [Rn]; (3) temp is copied to Rd.]
* Thus to implement an actual swap of contents make Rd = Rm.
* The compiler cannot produce this instruction.
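The three numbered steps of SWP can be modelled in C. This sketch is not atomic and only shows the data movement. `swp` is a hypothetical helper, `regs` stands in for the register bank, and the address held in Rn is treated as a word index into `mem`. Passing the same register as Rd and Rm gives the true swap described above.

```c
#include <stdint.h>

/* Data movement of SWP Rd, Rm, [Rn]; on hardware steps 1 and 2 form one
   atomic bus operation, which this plain C model does not reproduce. */
static void swp(uint32_t *regs, uint32_t *mem, int rd, int rm, int rn)
{
    uint32_t addr = regs[rn];    /* word index held in Rn (model only) */
    uint32_t temp = mem[addr];   /* (1) memory -> temp */
    mem[addr] = regs[rm];        /* (2) Rm -> memory   */
    regs[rd] = temp;             /* (3) temp -> Rd     */
}
```

In portable C11 code, the closest equivalent of an atomic swap is `atomic_exchange` from `<stdatomic.h>`.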

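The Quiz #5 sample solution can be checked with a small C model of full descending block transfers. This is a sketch: `stmfd`, `ldmfd` and `quiz5` are hypothetical helpers. Memory is an array of words, `sp` is a word index, and bit i of `mask` selects ri, mirroring the register-list field.

```c
#include <stdint.h>

/* STMFD sp!, {mask}: decrement before, lowest register at lowest address. */
static void stmfd(uint32_t *mem, int *sp, const uint32_t *regs, unsigned mask)
{
    int n = 0;
    for (int i = 0; i < 16; i++) n += (int)((mask >> i) & 1u);
    int addr = *sp - n;
    *sp = addr;                                  /* write-back */
    for (int i = 0; i < 16; i++)
        if (mask & (1u << i)) mem[addr++] = regs[i];
}

/* LDMFD sp!, {mask}: increment after, lowest register from lowest address. */
static void ldmfd(const uint32_t *mem, int *sp, uint32_t *regs, unsigned mask)
{
    int addr = *sp;
    for (int i = 0; i < 16; i++)
        if (mask & (1u << i)) regs[i] = mem[addr++];
    *sp = addr;                                  /* write-back */
}

/* The four instructions of the sample solution. */
static void quiz5(uint32_t *mem, int *sp, uint32_t *regs)
{
    stmfd(mem, sp, regs, 0x7Fu);                             /* {r0-r6}    */
    ldmfd(mem, sp, regs, (1u << 3) | (1u << 4) | (1u << 6)); /* {r3,r4,r6} */
    ldmfd(mem, sp, regs, 1u << 5);                           /* {r5}       */
    ldmfd(mem, sp, regs, 0x7u);                              /* {r0-r2}    */
}
```

Start with r0-r6 holding 100-106. The model ends with r3=100, r4=101, r6=102, r5=103, r0=104, r1=105 and r2=106, and SP is back at its original value. This matches the required permutation.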

Software Interrupt (SWI)
Encoding: bits 31-28 Cond (condition field); bits 27-24 1111; bits 23-0
comment field (ignored by processor).
* In effect, a SWI is a user-defined instruction.
* It causes an exception trap to the SWI hardware vector (thus causing a
change to supervisor mode, plus the associated state saving), thus causing
the SWI exception handler to be called.
* The handler can then examine the comment field of the instruction to
decide what operation has been requested.
* By making use of the SWI mechanism, an operating system can
implement a set of privileged operations which applications running in
user mode can request.
* See Exception Handling Module for further details.

PSR Transfer Instructions
* MRS and MSR allow contents of CPSR/SPSR to be transferred between the
appropriate status register and a general purpose register.
• All of status register, or just the flags, can be transferred.
* Syntax:
• MRS{<cond>} Rd,<psr>      ; Rd = <psr>
• MSR{<cond>} <psr>,Rm      ; <psr> = Rm
• MSR{<cond>} <psrf>,Rm     ; <psrf> = Rm
where
• <psr> = CPSR, CPSR_all, SPSR or SPSR_all
• <psrf> = CPSR_flg or SPSR_flg
* Also an immediate form
• MSR{<cond>} <psrf>,#Immediate
• This immediate must be a 32-bit immediate, of which the 4
most significant bits are written to the flag bits.
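The immediate form of MSR writes only the flag bits, and a one-line C model makes this explicit. The sketch ignores how the assembler encodes the immediate. `msr_flg_imm` and `PSR_FLAG_BITS` are hypothetical names.

```c
#include <stdint.h>

#define PSR_FLAG_BITS 0xF0000000u   /* N, Z, C, V live in bits 31-28 */

/* MSR <psrf>,#imm: only the top four bits of the immediate reach the
   PSR; the control bits (I, F, T, mode) are left unchanged. */
static uint32_t msr_flg_imm(uint32_t psr, uint32_t imm)
{
    return (psr & ~PSR_FLAG_BITS) | (imm & PSR_FLAG_BITS);
}
```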
Using MRS and MSR
* Currently reserved bits may be used in future, therefore:
• they must be preserved when altering a PSR
• the value they return must not be relied upon when testing other bits.
PSR layout: N Z C V in bits 31-28; I F T in bits 7-5; Mode in bits 4-0.
* Thus a read-modify-write strategy must be followed when modifying any
PSR:
• Transfer PSR to register using MRS
• Modify relevant bits
• Transfer updated value back to PSR using MSR
* Note:
• In User Mode, all bits can be read but only the flag bits can
be written to.

Coprocessors
* The ARM architecture supports 16 coprocessors
* Each coprocessor instruction set occupies part of the ARM instruction
set.
* There are three types of coprocessor instruction
• Coprocessor data processing
• Coprocessor (to/from ARM) register transfers
• Coprocessor memory transfers (load and store to/from memory)
* Assembler macros can be used to transform custom coprocessor
mnemonics into the generic mnemonics understood by the processor.
* A coprocessor may be implemented
• in hardware
• in software (via the undefined instruction exception)
• in both (common cases in hardware, the rest in software)
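The PSR layout used by MRS and MSR can be expressed as a C decoder. This sketch uses the bit positions given for the PSR: N, Z, C, V in bits 31-28; I, F, T in bits 7-5; mode in bits 4-0. `psr_view` and `psr_decode` are hypothetical names.

```c
#include <stdint.h>

struct psr_view { unsigned n, z, c, v, i, f, t, mode; };

static struct psr_view psr_decode(uint32_t psr)
{
    struct psr_view p;
    p.n = (psr >> 31) & 1u;   /* negative                 */
    p.z = (psr >> 30) & 1u;   /* zero                     */
    p.c = (psr >> 29) & 1u;   /* carry                    */
    p.v = (psr >> 28) & 1u;   /* overflow                 */
    p.i = (psr >> 7) & 1u;    /* 1 = IRQs disabled        */
    p.f = (psr >> 6) & 1u;    /* 1 = FIQs disabled        */
    p.t = (psr >> 5) & 1u;    /* 1 = Thumb state          */
    p.mode = psr & 0x1Fu;     /* processor mode, bits 4-0 */
    return p;
}
```

For example, 0x600000D3 decodes to Z=1 and C=1, with IRQs and FIQs disabled, ARM state, and mode 0x13. Mode 0x13 (0b10011) is supervisor mode.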

Coprocessor Data Processing
* This instruction initiates a coprocessor operation
* The operation is performed only on internal coprocessor state
• For example, a Floating point multiply, which multiplies the contents of
two registers and stores the result in a third register
* Syntax:
• CDP{<cond>} <cp_num>,<opc_1>,CRd,CRn,CRm,{<opc_2>}
Encoding:
  Bits 31-28: Cond (condition code specifier)
  Bits 27-24: 1110
  Bits 23-20: opc_1 (opcode)
  Bits 19-16: CRn (source register)
  Bits 15-12: CRd (destination register)
  Bits 11-8: cp_num (coprocessor number)
  Bits 7-5: opc_2 (opcode)
  Bit 4: 0
  Bits 3-0: CRm (source register)

Coprocessor Register Transfers
* These two instructions move data between ARM registers and
coprocessor registers
• MRC : Move to Register from Coprocessor
• MCR : Move to Coprocessor from Register
* An operation may also be performed on the data as it is transferred
• For example a Floating Point Convert to Integer instruction can be
implemented as a register transfer to ARM that also converts the data
from floating point format to integer format.
* Syntax
• <MRC|MCR>{<cond>} <cp_num>,<opc_1>,Rd,CRn,CRm,<opc_2>
Encoding:
  Bits 31-28: Cond (condition code specifier)
  Bits 27-24: 1110
  Bits 23-21: opc_1 (opcode)
  Bit 20: L, transfer to/from coprocessor (1 = MRC, 0 = MCR)
  Bits 19-16: CRn (coprocessor source/destination register)
  Bits 15-12: Rd (ARM source/destination register)
  Bits 11-8: cp_num (coprocessor number)
  Bits 7-5: opc_2 (opcode)
  Bit 4: 1
  Bits 3-0: CRm (coprocessor source/destination register)
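The MRC/MCR encoding can be assembled field by field in C. This is a sketch: `encode_mrc_mcr` is a hypothetical helper, and its fields are only masked, not range-checked. With `load` = 1 it builds an MRC, and with `load` = 0 an MCR.

```c
#include <stdint.h>

static uint32_t encode_mrc_mcr(unsigned cond, int load, unsigned cp_num,
                               unsigned opc_1, unsigned rd, unsigned crn,
                               unsigned crm, unsigned opc_2)
{
    return ((uint32_t)(cond & 0xFu) << 28)
         | ((uint32_t)0xEu << 24)               /* bits 27-24 = 1110      */
         | ((uint32_t)(opc_1 & 0x7u) << 21)
         | ((uint32_t)(load ? 1u : 0u) << 20)   /* L: 1 = MRC, 0 = MCR    */
         | ((uint32_t)(crn & 0xFu) << 16)
         | ((uint32_t)(rd & 0xFu) << 12)
         | ((uint32_t)(cp_num & 0xFu) << 8)
         | ((uint32_t)(opc_2 & 0x7u) << 5)
         | ((uint32_t)1u << 4)                  /* bit 4 = 1: reg transfer */
         | (uint32_t)(crm & 0xFu);
}
```

For instance, MRC p15, 0, r0, c1, c0, 0 (condition AL = 0xE) comes out as 0xEE110F10. The matching MCR comes out as 0xEE010F10.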


Coprocessor Memory Transfers (1)
* Load from memory to coprocessor registers
* Store to memory from coprocessor registers.
Encoding:
  Bits 31-28: Cond (condition code specifier)
  Bits 27-25: 110
  Bit 24: P (pre/post increment)
  Bit 23: U (add/subtract offset)
  Bit 22: N (transfer length)
  Bit 21: W (base register writeback)
  Bit 20: L (load/store)
  Bits 19-16: Rn (base register)
  Bits 15-12: CRd (source/destination register)
  Bits 11-8: cp_num (coprocessor number)
  Bits 7-0: Offset (address offset)

Coprocessor Memory Transfers (2)
* Syntax of these is similar to word transfers between ARM and memory:
• <LDC|STC>{<cond>}{<L>} <cp_num>,CRd,<address>
– PC relative offset generated if possible, else causes an error.
• <LDC|STC>{<cond>}{<L>} <cp_num>,CRd,<[Rn,offset]{!}>
– Pre-indexed form, with optional writeback of the base register
• <LDC|STC>{<cond>}{<L>} <cp_num>,CRd,<[Rn],offset>
– Post-indexed form
where
• <L> when present causes a “long” transfer to be performed (N=1) else
causes a “short” transfer to be performed (N=0).
– Effect of this is coprocessor dependent.

Quiz #6
* Write a short code segment that performs a mode change by modifying
the contents of the CPSR
• The mode you should change to is user mode which has the value 0x10.
• This assumes that the current mode is a privileged mode such as
supervisor mode.
• This would happen for instance when the processor is reset - reset code
would be run in supervisor mode which would then need to switch to
user mode before calling the main routine in your application.
• You will need to use MSR and MRS, plus 2 logical operations.
CPSR layout: N Z C V in bits 31-28; I F T in bits 7-5; Mode in bits 4-0.

Quiz #6 - Sample Solution
* Set up useful constants:
mmask   EQU 0x1f            ; mask to clear mode bits
userm   EQU 0x10            ; user mode value
* Start off here in supervisor mode.
        MRS r0, cpsr        ; take a copy of the CPSR
        BIC r0,r0,#mmask    ; clear the mode bits
        ORR r0,r0,#userm    ; select new mode
        MSR cpsr, r0        ; write back the modified CPSR
* End up here in user mode.
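The Quiz #6 sample solution is a read-modify-write of the CPSR, which maps directly onto C bit operations. This sketch works on a plain integer rather than the real CPSR. `change_mode`, `MODE_MASK` and `USER_MODE` are hypothetical names that mirror `mmask` and `userm`.

```c
#include <stdint.h>

#define MODE_MASK 0x1Fu   /* mmask: CPSR bits 4-0 */
#define USER_MODE 0x10u   /* userm                */

static uint32_t change_mode(uint32_t cpsr, uint32_t new_mode)
{
    uint32_t r0 = cpsr;          /* MRS r0, cpsr       */
    r0 &= ~MODE_MASK;            /* BIC r0,r0,#mmask   */
    r0 |= new_mode & MODE_MASK;  /* ORR r0,r0,#userm   */
    return r0;                   /* MSR cpsr, r0       */
}
```

All bits outside the mode field, including the flags and any reserved bits, pass through unchanged. That is the point of the read-modify-write strategy. Once the processor is in user mode, the same sequence cannot switch back, because in User Mode only the flag bits can be written.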


Main features of the ARM Instruction Set
* All instructions are 32 bits long.
* Most instructions execute in a single cycle.
* Every instruction can be conditionally executed.
* A load/store architecture
• Data processing instructions act only on registers
– Three operand format
– Combined ALU and shifter for high speed bit manipulation
• Specific memory access instructions with powerful auto-indexing
addressing modes.
– 32 bit and 8 bit data types
and also 16 bit data types on ARM Architecture v4.
– Flexible multiple register load and store instructions
* Instruction set extension via coprocessors



ARM Assembly Language Examples & Assembler
CS 160 Ward

ARM Assembly Language Examples

Example 1: C to ARM Assembler
• C:
x = (a + b) - c;
• ARM:
        ADR r4,a        ; get address for a
        LDR r0,[r4]     ; get value of a
        ADR r4,b        ; get address for b, reusing r4
        LDR r1,[r4]     ; get value of b
        ADD r3,r0,r1    ; compute a+b
        ADR r4,c        ; get address for c
        LDR r2,[r4]     ; get value of c
        SUB r3,r3,r2    ; complete computation of x
        ADR r4,x        ; get address for x
        STR r3,[r4]     ; store value of x

Example 2: C to ARM Assembler
• C:
y = a*(b+c);
• ARM:
        ADR r4,b        ; get address for b
        LDR r0,[r4]     ; get value of b
        ADR r4,c        ; get address for c
        LDR r1,[r4]     ; get value of c
        ADD r2,r0,r1    ; compute partial result
        ADR r4,a        ; get address for a
        LDR r0,[r4]     ; get value of a
        MUL r2,r0,r2    ; compute final value for y (Rd must differ from Rm before ARMv6)
        ADR r4,y        ; get address for y
        STR r2,[r4]     ; store y



Example 3: C to ARM Assembler
• C:
z = (a << 2) | (b & 15);
• ARM:
        ADR r4,a            ; get address for a
        LDR r0,[r4]         ; get value of a
        MOV r0,r0,LSL#2     ; perform shift
        ADR r4,b            ; get address for b
        LDR r1,[r4]         ; get value of b
        AND r1,r1,#15       ; perform AND
        ORR r1,r0,r1        ; perform OR
        ADR r4,z            ; get address for z
        STR r1,[r4]         ; store value for z

Example 4: Condition Codes
C:
if (i == 0)
{
    i = i + 10;
}
ARM: (assume i in R1)
        SUBS R1, R1, #0
        ADDEQ R1, R1, #10


Example 5: Condition Codes
C:
for (i = 0 ; i < 15 ; i++)
{
    j = j + j;
}
ARM: (i in R0; assume j in R1)
        SUB R0, R0, R0      ; i -> R0 and i = 0
start   CMP R0, #15         ; is i < 15?
        ADDLT R1, R1, R1    ; j = j + j
        ADDLT R0, R0, #1    ; i++
        BLT start

Example 6: if statement [1]
• C:
if (a < b) { x = 5; y = c + d; } else x = c - d;
• ARM:
        ; compute and test condition
        ADR r4,a            ; get address for a
        LDR r0,[r4]         ; get value of a
        ADR r4,b            ; get address for b
        LDR r1,[r4]         ; get value for b
        CMP r0,r1           ; compare a < b
        BGE fblock          ; if a >= b, branch to false block



Example 6: if statement [2]
        ; true block
        MOV r0,#5           ; generate value for x
        ADR r4,x            ; get address for x
        STR r0,[r4]         ; store x
        ADR r4,c            ; get address for c
        LDR r0,[r4]         ; get value of c
        ADR r4,d            ; get address for d
        LDR r1,[r4]         ; get value of d
        ADD r0,r0,r1        ; compute y
        ADR r4,y            ; get address for y
        STR r0,[r4]         ; store y
        B after             ; branch around false block

Example 6: if statement [3]
        ; false block
fblock  ADR r4,c            ; get address for c
        LDR r0,[r4]         ; get value of c
        ADR r4,d            ; get address for d
        LDR r1,[r4]         ; get value for d
        SUB r0,r0,r1        ; compute c-d
        ADR r4,x            ; get address for x
        STR r0,[r4]         ; store value of x
after   ...


Example 6: Heavy Conditional Instruction Use
Same C code; different ARM implementation
ARM:
        ; Compute and test the condition
        ADR r4,a            ; get address for a
        LDR r0,[r4]         ; get value of a
        ADR r4,b            ; get address for b
        LDR r1,[r4]         ; get value of b
        CMP r0,r1           ; compare a < b
        ; true block
        MOVLT r0,#5         ; generate value for x
        ADRLT r4,x          ; get address for x
        STRLT r0,[r4]       ; store x
        ADRLT r4,c          ; get address for c
        LDRLT r0,[r4]       ; get value of c
        ADRLT r4,d          ; get address for d
        LDRLT r1,[r4]       ; get value of d
        ADDLT r0,r0,r1      ; compute y
        ADRLT r4,y          ; get address for y
        STRLT r0,[r4]       ; store y
        ; false block
        ADRGE r4,c          ; get address for c
        LDRGE r0,[r4]       ; get value of c
        ADRGE r4,d          ; get address for d
        LDRGE r1,[r4]       ; get value for d
        SUBGE r0,r0,r1      ; compute c-d
        ADRGE r4,x          ; get address for x
        STRGE r0,[r4]       ; store value of x

ARM Assembler


Assembly Language Basics

General Layout

Simple Example Description

Assembly Directives

sum1.s: Compute 1+2+…+n
        AREA SUM, CODE, READONLY
        EXPORT sum1
; r0 = input variable n
; r0 = output variable sum

sum1
        MOV r1,#0           ; set sum = 0
sum_loop
        ADD r1,r1,r0        ; set sum = sum+n
        SUBS r0,r0,#1       ; set n = n-1
        BNE sum_loop
sum_rtn
        MOV r0,r1           ; set return value
        MOV pc,lr
        END

sum2.s: Compute 1+2+…+n
        AREA SUM, CODE, READONLY
        EXPORT sum
; r0 = input variable n
; r0 = output variable sum

sum
        MLA r1,r0,r0,r0     ; n*(n+1) = n*n + n
        MOV r0,r1,LSR#1     ; divide by 2
sum_rtn
        MOV pc,lr
        END
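The two routines compute the same value by different means, and a C version makes them easy to compare. This is a sketch: `sum1` and `sum2` mirror the assembly labels. Like `sum1.s`, the loop version assumes n >= 1. With n = 0, SUBS would wrap to 0xFFFFFFFF and the loop would run about 2^32 times. Both versions use 32-bit arithmetic.

```c
#include <stdint.h>

/* sum1.s: add n, n-1, ..., 1 (assumes n >= 1). */
static uint32_t sum1(uint32_t n)
{
    uint32_t sum = 0;            /* MOV r1,#0      */
    do {
        sum += n;                /* ADD r1,r1,r0   */
        n -= 1;                  /* SUBS r0,r0,#1  */
    } while (n != 0);            /* BNE sum_loop   */
    return sum;                  /* MOV r0,r1      */
}

/* sum2.s: closed form n*(n+1)/2 with one multiply-accumulate and a shift. */
static uint32_t sum2(uint32_t n)
{
    uint32_t t = n * n + n;      /* MLA r1,r0,r0,r0 */
    return t >> 1;               /* MOV r0,r1,LSR#1 */
}
```

`sum2` replaces a loop of n iterations with a constant number of instructions. The shift is an exact division because n*(n+1) is always even.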
log.s: Compute k (2^k <= n < 2^(k+1))
        AREA LOG, CODE, READONLY
        EXPORT log
; r0 = input variable n
; r0 = output variable m (1 if n is an exact power of two, else 0)
; r1 = output variable k (2^k <= n < 2^(k+1))

log
        MOV r2, #0          ; set m = 0 (count of set bits)
        MOV r1, #-1         ; set k = -1

log_loop
        TST r0, #1          ; test LSB(n) == 1
        ADDNE r2, r2, #1    ; set m = m+1 if true
        ADD r1, r1, #1      ; set k = k+1
        MOVS r0, r0, LSR #1 ; set n = n>>1
        BNE log_loop        ; continue if n != 0

        CMP r2, #1          ; test m == 1
        MOVEQ r0, #1        ; set m = 1 if true (r0 is already 0 here)

log_rtn
        MOV pc,lr

        END
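A C model of `log.s` shows what the two outputs mean. This is a sketch, and `log_model` is a hypothetical name. It returns the value left in r0, which is 1 when exactly one bit of n is set. It stores the final r1 in `*k`, which is the index of the highest set bit of n.

```c
#include <stdint.h>

static uint32_t log_model(uint32_t n, int32_t *k)
{
    uint32_t m = 0;              /* MOV r2,#0 : count of set bits */
    int32_t kk = -1;             /* MOV r1,#-1                    */
    do {
        if (n & 1u) m += 1;      /* TST r0,#1 ; ADDNE r2,r2,#1    */
        kk += 1;                 /* ADD r1,r1,#1                  */
        n >>= 1;                 /* MOVS r0,r0,LSR #1             */
    } while (n != 0);            /* BNE log_loop                  */
    *k = kk;                     /* r1                            */
    return (m == 1u) ? 1u : 0u;  /* CMP r2,#1 ; MOVEQ r0,#1       */
}
```

For n = 10 (binary 1010) the model gives k = 3 and r0 = 0. For n = 8 it gives k = 3 and r0 = 1. So k is the smallest value with n <= 2^k only when n is a power of two; otherwise that value is k + 1.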


ARM Exceptions
Hsung-Pin Chang
Department of Computer Science
National Chung Hsing University

Outline
o ARM Exceptions
o Entering and Leaving an Exception
o Installing an Exception Handler
o SWI Handlers
o Interrupt Handlers
o Reset Handlers
o Undefined Instruction Handlers
o Prefetch Abort Handler
o Data Abort Handler

ARM Exceptions
o ARM Exception Types
o ARM Exception Vector Table
o ARM Exception Priorities
o Use of Modes and Registers by Exceptions

ARM Exception Types
o Reset
o Undefined instruction
o Software Interrupt (SWI)
o Prefetch Abort
o Data Abort
o IRQ
o FIQ

ARM Exception Types (Cont.)
o Reset
n Occurs when the processor reset pin is asserted
o For signaling Power-up
o For resetting as if the processor has just powered up
n Software reset
o Can be done by branching to the reset vector (0x0000)
o Undefined instruction
n Occurs when neither the processor nor any coprocessor can
recognize the currently executing instruction

ARM Exception Types (Cont.)
o Software Interrupt (SWI)
n User-defined interrupt instruction
n Allows a program running in User mode to request
privileged operations that run in Supervisor mode
o For example, RTOS functions
o Prefetch Abort
n An instruction is fetched from an illegal address; the
instruction is flagged as invalid
n However, instructions already in the pipeline continue to
execute until the invalid instruction is reached and then a
Prefetch Abort is generated.
ARM Exception Types (Cont.)
o Data Abort
n A data transfer instruction attempts to load or store data at
an illegal address
o IRQ
n The processor external interrupt request pin is asserted
(LOW) and the I bit in the CPSR is clear (enabled)
o FIQ
n The processor external fast interrupt request pin is
asserted (LOW) and the F bit in the CPSR is clear (enabled)

Vector Table
o At the bottom of the memory map
o Each entry is only 32 bits wide (one instruction)
n Not enough to contain the full code for a handler
n Thus, it is usually a branch instruction or a load pc
instruction to the actual handler
o Example: armc_startup.s

ARM Exception
[Figure: exception vector table entries passing control to the SWI handler, IRQ handler, and so on.]

ARM Exception Vector Table
Address   Exception
0x00      Reset
0x04      Undefined Instruction
0x08      Software Interrupt
0x0C      Prefetch Abort
0x10      Data Abort
0x14      Reserved
0x18      IRQ
0x1C      FIQ

ARM Exception Events

ARM Exception Priorities

Use of Modes and Registers by Exceptions
o An exception changes the processor mode
o Thus, each exception handler has access to a
certain subset of banked registers
n Its own r13 or Stack Pointer (r13_mode or sp_mode)
n Its own r14 or Link Register (r14_mode or lr_mode)
n Its own Saved Program Status Register (SPSR_mode).

Register Organization in ARM States

Entering and Leaving an Exception
o The Process Response to an Exception
o Returning from an Exception Handler
o The Return Address and Return Instruction
n Discussed in the next few slides

The Process Response to an Exception
o Copies the CPSR into the SPSR for the mode in which the
exception is to be handled.
n Saves the current mode, interrupt mask, and condition flags.
o Changes the appropriate CPSR mode bits
n Change to the appropriate mode
o Map in the appropriate banked registers for that mode
o Disable interrupts
n IRQs are disabled when any exception occurs.
n FIQs are disabled when a FIQ occurs, and on reset
o Set lr_mode to the return address
o Set the program counter to the vector address for the
exception

The Process Response to an Exception (Cont.)
o For example, when reset, ARM
n Overwrites R14_svc and SPSR_svc by copying
the current values of the PC and CPSR into them
n Forces M[4:0] to 10011 (Supervisor mode), sets
the I and F bits in the CPSR, and clears the
CPSR's T bit
n Forces the PC to fetch the next instruction from
address 0x00.
n Execution resumes in ARM state.

The Process Response to an Exception (Cont.)
Reset
    R14_svc = unpredictable
    SPSR_svc = unpredictable
    CPSR[4:0] = 0b10011   // Supervisor Mode
    CPSR[5] = 0           // ARM state
    CPSR[6] = 1           // Disable FIQ
    CPSR[7] = 1           // Disable IRQ
    PC = 0x00000000
Undefined Instructions
    R14_und = PC + 4
    SPSR_und = CPSR
    CPSR[4:0] = 0b11011   // Undefined Mode
    CPSR[5] = 0           // ARM state
    CPSR[6] unchanged
    CPSR[7] = 1           // Disable IRQ
    PC = 0x00000004
The Process Response to an Exception (Cont.)
Software Interrupt
    R14_svc = PC + 4
    SPSR_svc = CPSR
    CPSR[4:0] = 0b10011   // Supervisor Mode
    CPSR[5] = 0           // ARM state
    CPSR[6] unchanged
    CPSR[7] = 1           // Disable IRQ
    PC = 0x00000008
Prefetch Abort
    R14_abt = PC + 4
    SPSR_abt = CPSR
    CPSR[4:0] = 0b10111   // Abort Mode
    CPSR[5] = 0           // ARM state
    CPSR[6] unchanged
    CPSR[7] = 1           // Disable IRQ
    PC = 0x0000000C

The Process Response to an Exception (Cont.)
Data Abort
    R14_abt = PC + 8
    SPSR_abt = CPSR
    CPSR[4:0] = 0b10111   // Abort Mode
    CPSR[5] = 0           // ARM state
    CPSR[6] unchanged
    CPSR[7] = 1           // Disable IRQ
    PC = 0x00000010
Interrupt Request
    R14_irq = PC + 4
    SPSR_irq = CPSR
    CPSR[4:0] = 0b10010   // IRQ Mode
    CPSR[5] = 0           // ARM state
    CPSR[6] unchanged
    CPSR[7] = 1           // Disable IRQ
    PC = 0x00000018

The Process Response to an Exception (Cont.)
Fast Interrupt Request
    R14_fiq = PC + 4
    SPSR_fiq = CPSR
    CPSR[4:0] = 0b10001   // FIQ Mode
    CPSR[5] = 0           // ARM state
    CPSR[6] = 1           // Disable FIQ
    CPSR[7] = 1           // Disable IRQ
    PC = 0x0000001C

Returning From an Exception Handler
o Returning from an exception handler
n Depends on whether the exception handler uses
the stack operations or not
o Generally, to return execution to the original
execution place
n Restore the CPSR from spsr_mode
n Restore the program counter using the return
address stored in lr_mode
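The entry sequences listed above follow one pattern, which the following C sketch captures in a table. It assumes the exception is taken from ARM state and that `pc` holds the address the pseudo-code calls PC. `struct cpu`, `exc_table` and `take_exception` are hypothetical, and only the lr and SPSR of the mode being entered are modelled.

```c
#include <stdint.h>

enum exc { EXC_RESET, EXC_UNDEF, EXC_SWI, EXC_PABT, EXC_DABT, EXC_IRQ, EXC_FIQ };

struct cpu { uint32_t pc, cpsr, lr, spsr; };   /* lr/spsr of the new mode */

struct exc_entry { uint32_t vector, mode, lr_offset; int disable_fiq; };

/* One row per exception, copied from the pseudo-code (ARM state only). */
static const struct exc_entry exc_table[] = {
    [EXC_RESET] = { 0x00u, 0x13u, 0u, 1 },  /* lr/spsr unpredictable on reset */
    [EXC_UNDEF] = { 0x04u, 0x1Bu, 4u, 0 },
    [EXC_SWI]   = { 0x08u, 0x13u, 4u, 0 },
    [EXC_PABT]  = { 0x0Cu, 0x17u, 4u, 0 },
    [EXC_DABT]  = { 0x10u, 0x17u, 8u, 0 },
    [EXC_IRQ]   = { 0x18u, 0x12u, 4u, 0 },
    [EXC_FIQ]   = { 0x1Cu, 0x11u, 4u, 1 },
};

static void take_exception(struct cpu *c, enum exc e)
{
    const struct exc_entry *x = &exc_table[e];
    c->spsr = c->cpsr;                  /* SPSR_mode = CPSR                 */
    c->lr   = c->pc + x->lr_offset;     /* R14_mode = PC + offset           */
    c->cpsr = (c->cpsr & ~0x3Fu)        /* CPSR[5] = 0 (ARM), clear mode    */
            | x->mode                   /* CPSR[4:0] = new mode             */
            | 0x80u;                    /* CPSR[7] = 1: disable IRQ         */
    if (x->disable_fiq)
        c->cpsr |= 0x40u;               /* CPSR[6] = 1: disable FIQ         */
    c->pc = x->vector;                  /* PC = vector address              */
}
```

Keeping the differences in a table makes it clear that only the vector, the mode, the lr offset and the FIQ mask vary between exceptions.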

Returning From an Exception Handler: Simple Return
o If the destination mode registers do not need to be restored
from the stack
n The above two operations can be carried out by a
data processing instruction with
o The S flag (bit 20) set
n With the PC as the destination, this copies the SPSR of the
current mode into the CPSR as the instruction executes
n SUBS, MOVS
o The program counter as the destination register
n Example: MOVS pc, lr //pc = lr

Returning From an Exception Handler: Complex Return
o If an exception handler entry code uses the stack
to store registers
n that must be preserved while handling the exception
o To return from such an exception handler, the
stored registers must be restored from the stack
n Return by a load multiple instruction with ^ qualifier
n For example: LDMFD sp!, {r0-r12,pc}^
Returning From an Exception Handler
o Note, there is no need to return from the reset handler
n The reset handler executes your main code directly
o The return address when an exception is taken
depends on the exception type
n The return address may not necessarily be the next
instruction pointed to by the pc

Returning from SWI and Undefined Instruction Handlers
o SWI and undefined instruction exceptions are
generated by the instruction itself
n lr_mode = pc + 4 //next instruction
o Restoring the program counter
n If not using stack: MOVS pc, lr //pc = lr
n If using stack to store the return address
STMFD sp!, {reglist, lr} //when entering the handler
…
LDMFD sp!, {reglist, pc}^ //when leaving the handler

Returning from FIQ and IRQ
o FIQ and IRQ are generated only after the
execution of an instruction
n The program counter has been updated
[Figure: FIQ or IRQ occurs; instruction addresses PC and PC+4.]
n lr_mode = PC + 4
o Points one instruction beyond the end of the
instruction in which the exception occurred

Returning from FIQ and IRQ (Cont.)
o Restoring the program counter
n If not using stack: SUBS pc, lr, #4 //pc = lr-4
n If using stack to store the return address
SUB lr, lr, #4 //when entering the handler
STMFD sp!, {reglist, lr}
…
LDMFD sp!, {reglist, pc}^ //when leaving the handler

Return from Prefetch Abort
o If the processor supports an MMU (Memory Management Unit)
n The exception handler loads the unmapped instruction into physical
memory
n Then, it uses the MMU to map the virtual memory location into the
physical one.
o After that, the handler must return to retry the instruction that
caused the exception.
o However, lr_ABT points to the instruction at the address
following the one that caused the abort exception

Return from Prefetch Abort (Cont.)
o So the address to be restored is lr_ABT - 4
o Thus, with simple return
SUBS pc,lr,#4
o In contrast, with complex return
SUB lr,lr,#4            ;handler entry code
STMFD sp!,{reglist,lr}
;...
LDMFD sp!,{reglist,pc}^ ; handler exit code
Return from Data Abort
o lr_ABT points two instructions beyond the
instruction that caused the abort
n This is because, by the time a load or store instruction tries to
access memory, the program counter has already been
updated.
n Thus, the instruction that caused the data abort
exception is at lr_ABT - 8
o So the address to be restored is lr_ABT - 8

Return from Data Abort (Cont.)
o Thus, with simple return
SUBS pc,lr,#8
o In contrast, with complex return
SUB lr,lr,#8            ;handler entry code
STMFD sp!,{reglist,lr}
;...
LDMFD sp!,{reglist,pc}^ ; handler exit code
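The return sequences differ only in how much is subtracted from lr_mode. This C sketch tabulates that adjustment per exception type, assuming ARM state. `return_adjust`, `resume_address` and the enum names are hypothetical.

```c
#include <stdint.h>

enum exc_kind { K_SWI, K_UNDEF, K_PABT, K_DABT, K_IRQ, K_FIQ };

/* Amount subtracted from lr_mode to get the resume address. */
static uint32_t return_adjust(enum exc_kind k)
{
    switch (k) {
    case K_SWI:
    case K_UNDEF: return 0u;  /* MOVS pc, lr      : next instruction      */
    case K_PABT:              /* SUBS pc, lr, #4  : retry the instruction */
    case K_IRQ:
    case K_FIQ:   return 4u;  /* SUBS pc, lr, #4  : instruction not run   */
    case K_DABT:  return 8u;  /* SUBS pc, lr, #8  : retry the load/store  */
    }
    return 0u;
}

static uint32_t resume_address(uint32_t lr, enum exc_kind k)
{
    return lr - return_adjust(k);
}
```

Put the interrupted or faulting instruction at 0x1000. An IRQ or prefetch abort leaves lr = 0x1004 and a data abort leaves lr = 0x1008, and all three resume at 0x1000. A SWI at 0x1000 leaves lr = 0x1004 and resumes at the next instruction.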

Summary
NOTES
1. PC is the address of the BL/SWI/Undefined Instruction fetch which had the prefetch
abort.
2. PC is the address of the instruction which did not get executed since the FIQ or IRQ
took priority.
3. PC is the address of the Load or Store instruction which generated the data abort.
4. The value saved in R14_svc upon reset is unpredictable.

Install an Exception Handler
o Any new exception handler must be installed in the
vector table
o Exception handlers can be installed in two ways
n Branch instruction: simple but has one limitation
o Branch instruction only has a range of ±32 MB relative to the pc
n Load pc instruction: set pc by
o Load instruction to load the handler address into the program
counter

Install an Exception Handler: Method 1
Vector_Init_Block
        b Reset_Addr
        b Undefined_Addr
        b SWI_Addr
        b Prefetch_Addr
        b Abort_Addr
        NOP                     ;Reserved vector
        b IRQ_Addr
        b FIQ_Addr

Reset_Addr      …
Undefined_Addr  …
SWI_Addr        …
Prefetch_Addr   …
Abort_Addr      …
IRQ_Addr        …
FIQ_Addr        …

Install an Exception Handler: Method 2
Vector_Init_Block
        LDR PC, Reset_Addr
        LDR PC, Undefined_Addr
        LDR PC, SWI_Addr
        LDR PC, Prefetch_Addr
        LDR PC, Abort_Addr
        NOP                     ;Reserved vector
        LDR PC, IRQ_Addr
        LDR PC, FIQ_Addr

Reset_Addr      DCD Start_Boot
Undefined_Addr  DCD Undefined_Handler
SWI_Addr        DCD SWI_Handler
Prefetch_Addr   DCD Prefetch_Handler
Abort_Addr      DCD Abort_Handler
                DCD 0           ;Reserved vector
IRQ_Addr        DCD IRQ_Handler
FIQ_Addr        DCD FIQ_Handler
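Method 1 needs each vector to hold a branch whose target is within range. The following C sketch builds such a branch word. `make_vector_branch` is a hypothetical helper and is not claimed to be the `Install_Handler` routine used later in this document. It relies on two facts: the pc reads 8 bytes ahead of the instruction, and the branch offset is a signed 24-bit word count. Together these give the ±32 MB limit.

```c
#include <stdint.h>

/* Word for an always-executed "B handler" placed at address vector.
   Returns 0 (not a valid branch) if the target is misaligned or out of range. */
static uint32_t make_vector_branch(uint32_t vector, uint32_t handler)
{
    int64_t off = (int64_t)handler - ((int64_t)vector + 8);  /* pc = vector + 8 */
    if (off % 4 != 0 || off < -(INT64_C(1) << 25) || off >= (INT64_C(1) << 25))
        return 0u;
    return 0xEA000000u | ((uint32_t)(off / 4) & 0x00FFFFFFu);  /* cond AL, B */
}
```

Any handler farther away than ±32 MB needs Method 2. The LDR PC form loads a full 32-bit address from memory, so it has no such limit.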
DCD
o Allocates one or more words of memory, aligned on 4-byte
boundaries, and defines the initial runtime contents of the memory
o Examples
data1   DCD 1,5,20      ; defines 3 words containing
                        ; decimal values 1, 5, and 20
data2   DCD mem06 + 4   ; defines 1 word containing 4 +
                        ; the address of the label mem06

SWI Handlers
o Top-Level SWI Handlers
o SWI Routine in Assembly Language
o SWI Routine in C
o How to Pass Values in and out of a SWI Routine
o Calling SWIs from an Application

SWI Handlers
o When the SWI handler is entered, it must know
which SWI is being called
n The SWI number is stored in bits 0-23 of the instruction
n Or passed in an integer register, usually one of r0-r3

Top-Level SWI Handlers
o Because SVC mode has only its own lr_svc and sp_svc
n Save all other r0~r12 to the stack
o To calculate the SWI number
n Calculate the address of the instruction causing the SWI
o Since lr_svc holds the address of the instruction that follows the
SWI instruction, the SWI instruction itself is at lr - 4:
o LDR r0, [lr, #-4] ; load the SWI instruction into r0
n The SWI number is extracted by clearing the top eight bits
of the opcode:
o BIC r0, r0, #0xFF000000
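The two steps above can be mirrored in C: load the instruction at lr - 4, then clear its top eight bits. This is a sketch in which `swi_number` and `is_swi` are hypothetical helpers. They work on an instruction word that has already been read from memory.

```c
#include <stdint.h>

/* BIC r0, r0, #0xFF000000: keep only the 24-bit comment field. */
static uint32_t swi_number(uint32_t instr)
{
    return instr & ~0xFF000000u;
}

/* Bits 27-24 of a SWI instruction are 1111 (the condition is bits 31-28). */
static int is_swi(uint32_t instr)
{
    return ((instr >> 24) & 0xFu) == 0xFu;
}
```

For example, an unconditional SWI 0x10 is the word 0xEF000010, and clearing the top byte leaves 0x10.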

Top-Level SWI Handlers (Cont.)
SWI_Handler                     ; top-level handler
        STMFD sp!,{r0-r12,lr}   ; Store registers.
        LDR r0,[lr,#-4]         ; Calculate address of SWI instruction
                                ; and load it into r0.
        BIC r0,r0,#0xff000000   ; Mask off top 8 bits of instruction
                                ; to give SWI number.
        ;
        ; Use value in r0 to determine which SWI routine to execute.
        ;
        LDMFD sp!, {r0-r12,pc}^ ; Restore registers and return.
        END                     ; Mark end of this file.

Top-Level SWI Handlers (Cont.)
o The above program is called the top-level handler
n It must always be written in ARM assembly language
o However, the routines to handle each SWI can
be written in either assembly language or in C
SWI Routine in Assembly Language
o If the routines to handle each SWI are written
in Assembly Language
n The easiest way is using a jump table
o In the top-level handler, r0 contains the
SWI number
o Thus, the following code can be inserted into
the top-level handler, i.e., SWI_Handler
n Following on from the BIC instruction

SWI Routine in Assembly Language (Cont.)
        CMP r0, #MaxSWI             ; Range check
        LDRLS pc, [pc,r0,LSL #2]    ; LDRLS: LS (cond. exec.) unsigned lower or same
                                    ; pc = word at (pc + r0 * 4), LSL: logical shift left;
                                    ; pc reads as this instruction + 8, i.e. SWIJumpTable
        B SWIOutOfRange
SWIJumpTable
        DCD SWInum0                 ; stores the address of a routine
        DCD SWInum1                 ; stores the address of a routine
        …                           ; DCD for each of other SWI routines
SWInum0                             ; SWI number 0 code
        …..
        B EndofSWI
SWInum1                             ; SWI number 1 code
        …..
        B EndofSWI
        ; Rest of SWI handling code
EndofSWI
        ; Return execution to top level SWI handler so as to restore
        ; registers and return to program.
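The same jump-table idea can be written in C with an array of function pointers. This is a sketch. `swi_num0`, `swi_num1`, `swi_jump_table` and `dispatch_swi` are hypothetical stand-ins for `SWInum0`, `SWInum1`, `SWIJumpTable` and the dispatch code. A return value of -1 stands in for branching to `SWIOutOfRange`.

```c
#include <stddef.h>

/* Hypothetical SWI routines standing in for SWInum0 and SWInum1. */
static int swi_num0(int arg) { return arg + 1; }
static int swi_num1(int arg) { return arg * 2; }

typedef int (*swi_routine)(int);

/* Like the DCD entries after SWIJumpTable: one routine address per SWI. */
static const swi_routine swi_jump_table[] = { swi_num0, swi_num1 };
#define MAX_SWI (sizeof swi_jump_table / sizeof swi_jump_table[0] - 1)

static int dispatch_swi(unsigned number, int arg)
{
    if (number <= MAX_SWI)                  /* CMP r0,#MaxSWI ; LDRLS ...  */
        return swi_jump_table[number](arg);
    return -1;                              /* B SWIOutOfRange             */
}
```

The range check is unsigned (LS). A number that would be negative as a signed value is therefore treated as very large and rejected, so a single comparison covers both ends of the range.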

SWI Routine in C
o If the routines to handle each SWI are written in C
o The top-level handler uses a BL (branch and link)
instruction to jump to the appropriate C function
n BL C_SWI_Handler ;call C routine to handle
o Then, we must invoke the C routine that handles
the respective SWI
n But, how to pass the SWI number, which is now stored in
r0, to the C function?

ARM Procedure Call Convention
o Use registers r0-r3 to pass parameter values into
routines
n Correspond to the first to fourth arguments in the C
routines
o Remaining parameters are allocated to the stack in
order
o A function can return
n A one-word integer value in r0
n A two to four-word integer value in r0-r1, r0-r2 or r0-r3.

SWI Routine in C (Cont.)
o Thus, the C handler is like the following
void C_SWI_Handler (unsigned number)
{
    switch (number)
    {
    case 0 : /* SWI number 0 code */
        break;
    case 1 : /* SWI number 1 code */
        break;
    /* : */
    /* : */
    default : /* Unknown SWI - report error */
        break;
    }
}

SWI Routine in C (Cont.)
o However, how to pass more parameters?
n Make use of the stack (supervisor stack)
o The top-level SWI handler can pass the stack
pointer value (i.e. r13) to the SWI C routine
as, for example, the second parameter, i.e., r1
n sp is pointing to the supervisor stack
        MOV r1, sp
        BL C_SWI_Handler
SWI Routine in C (Cont.)
o Then, the C_SWI_Handler can access it
void C_SWI_Handler (unsigned number, unsigned *reg)
{
    unsigned value_in_reg_0 = reg [0];   // can read from them:
    unsigned value_in_reg_1 = reg [1];
    unsigned value_in_reg_2 = reg [2];
    unsigned value_in_reg_3 = reg [3];

    reg [0] = updated_value_0;           // write back to them
    reg [1] = updated_value_1;           // (updated_value_N: results
    reg [2] = updated_value_2;           //  computed by the routine)
    reg [3] = updated_value_3;
}

How to Pass Values in and out of a SWI Routine
o How does the main program code pass values in
and out of a SWI routine?
o Note that
n The main program code is executing in User
mode
n The SWI handler and its routines are in
Supervisor mode
n However, both modes have the same r0~r12 registers
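The `reg` pointer works because, after STMFD sp!,{r0-r12,lr}, sp points at the saved copy of the caller's r0. So reg[0] to reg[12] are the caller's r0-r12. The C sketch below models that saved frame as an array. `model_swi_handler` and its two SWI operations are hypothetical examples, not part of the original handler. Whatever the routine writes into reg[0] becomes the caller's r0 when LDMFD sp!,{r0-r12,pc}^ reloads the frame.

```c
#include <stdint.h>

/* frame[0..12] = caller's r0-r12, frame[13] = lr (model of the stacked frame). */
static void model_swi_handler(unsigned number, uint32_t *reg)
{
    switch (number) {
    case 0:                        /* hypothetical SWI 0: r0 = r0 + r1 */
        reg[0] = reg[0] + reg[1];
        break;
    case 1: {                      /* hypothetical SWI 1: swap r0 and r1 */
        uint32_t t = reg[0];
        reg[0] = reg[1];
        reg[1] = t;
        break;
    }
    default:                       /* unknown SWI: leave the frame unchanged */
        break;
    }
}
```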

How to Pass Values in and out of a SWI Routine (Cont.)
o Thus, the application code and SWI routine
can communicate by r0~r12 registers

Calling SWIs from an Application
o The application code can call a SWI from
assembly language or C/C++
o In assembly language
n Set up any required register value
n Then issue the relevant SWI
n For example:
        MOV r0, #65     ; load r0 with the value 65
        SWI 0x0         ; Call SWI 0x0 with parameter value in r0

Calling SWIs from an Application (Cont.)
o From C/C++, declare the SWI as an __swi
function, and call it.
o Example:
__swi(0) void my_swi(int);
.
.
my_swi(65);

Calling SWIs from an Application (Cont.)
o __swi functions allow a SWI to be compiled
inline
n Without additional overhead
n However, they are subject to the restrictions that
o Any arguments are passed in r0-r3 only
o Any results are returned in r0-r3 only
Calling SWIs from an Application
Example (Cont.)
#include <stdio.h>
#include "swi.h"

int main( void )
{
    int result1, result2;
    struct four_results res_3;
    Install_Handler( (unsigned) SWI_Handler, swi_vec );
    printf("result1 = multiply_two(2,4) = %d\n", result1 =
           multiply_two(2,4));
    printf("result2 = add_two(3,6) = %d\n", result2 =
           add_two(3,6));
    printf("add_two( result1, result2 ) = %d\n", add_two( result1,
           result2 ));
    return 0;
}

o swi.h
__swi(0) int multiply_two(int, int);
__swi(1) int add_two(int, int);

Interrupt Handlers
* The ARM processor has two levels of external interrupt:
  • FIQ and IRQ
* FIQs have higher priority than IRQs because:
  • FIQs are serviced first when multiple interrupts occur
  • Servicing a FIQ causes IRQs to be disabled until after the FIQ handler has re-enabled them
    - by restoring the CPSR from the SPSR at the end of the handler

Interrupt Handlers (Cont.)
* Why a FIQ is serviced faster than an IRQ:
  • The FIQ vector is the last entry in the vector table
    - The FIQ handler can be placed directly at the vector location and run sequentially from that address
    - This removes the need for a branch and its associated delays
    - If the system has a cache, the vector table and FIQ handler may all be locked down in one block
  • FIQ mode has more banked registers than IRQ mode
    - r8_fiq-r12_fiq
    - so less time is spent saving and restoring registers
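The priority rules above can be checked with a small C model of the CPSR mask bits. It assumes the classic ARM layout: I (IRQ disable) is bit 7, F (FIQ disable) is bit 6, and the mode field is bits [4:0], with FIQ = 0x11 and IRQ = 0x12. On entry to FIQ mode the core sets both I and F; on entry to IRQ mode it sets only I. The function names are hypothetical, and the T bit and SPSR copy are not modeled.

```c
#include <stdint.h>

#define CPSR_F    (1u << 6)   /* FIQ disable */
#define CPSR_I    (1u << 7)   /* IRQ disable */
#define MODE_MASK 0x1Fu
#define MODE_FIQ  0x11u
#define MODE_IRQ  0x12u

typedef enum { TAKE_NONE, TAKE_FIQ, TAKE_IRQ } exception_t;

/* Which interrupt would be taken next? FIQ wins when both are pending and enabled. */
exception_t next_interrupt(int fiq_pending, int irq_pending, uint32_t cpsr)
{
    if (fiq_pending && !(cpsr & CPSR_F))
        return TAKE_FIQ;
    if (irq_pending && !(cpsr & CPSR_I))
        return TAKE_IRQ;
    return TAKE_NONE;
}

/* New CPSR on entry (the old one would be saved in that mode's SPSR).
 * Entering FIQ masks both IRQ and FIQ; entering IRQ masks only IRQ. */
uint32_t cpsr_on_entry(uint32_t cpsr, exception_t e)
{
    if (e == TAKE_FIQ)
        return (cpsr & ~MODE_MASK) | MODE_FIQ | CPSR_I | CPSR_F;
    if (e == TAKE_IRQ)
        return (cpsr & ~MODE_MASK) | MODE_IRQ | CPSR_I;
    return cpsr;
}
```

Starting from User mode (`0x10`) with both interrupts pending, the model takes the FIQ first. While in the FIQ handler (`0xD1`) the pending IRQ stays masked until the CPSR is restored, whereas a FIQ can still preempt an IRQ handler (`0x92`).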

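IRQ Handler
IRQ_Handler                          ; top-level handler
        SUB     lr, lr, #4           ; return address is lr_irq - 4
        STMFD   sp!, {r0-r12, lr}    ; store registers and the return address
        BL      ISR_IRQ              ; call the interrupt service routine
        LDMFD   sp!, {r0-r12, pc}^   ; restore registers and return; ^ also copies SPSR to CPSR
        END                          ; mark end of this file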
International Journal of Scientific and Research Publications, Volume 3, Issue 4, April 2013 1
ISSN 2250-3153

An Overview of Advance Microcontroller Bus Architecture Relate on APB Bridge
Ms. Radhika Koti, Ms. Divya Meshram

Student (M. Tech - VLSI), Priyadarshini College of Engineering, Nagpur - India


Lecturer, Priyadarshini College of Engineering, Nagpur – India

Abstract - The Advanced Microcontroller Bus Architecture (AMBA) is a widely used interconnection standard for System-on-Chip (SoC) design. An AMBA-based microcontroller typically consists of a high-performance system backbone bus (AMBA AHB or AMBA ASB), able to sustain the external memory bandwidth, on which the CPU, on-chip memory and other Direct Memory Access (DMA) devices reside. This bus provides a high-bandwidth interface between the elements that are involved in the majority of transfers. This paper presents three distinct buses and compares them. Considering the merits of APB, an AMBA design can be implemented using an HDL.

Index Terms - AMBA, AHB, ASB, APB, Difference of buses

I. INTRODUCTION

Today, in the era of modern technology, microelectronics plays a vital role in every aspect of an individual's life, and the increasing use of microelectronic equipment increases the demand for manufacturing its components and for their availability [4]. Embedded system designers have a choice of using a shared or point-to-point bus in their designs [2]. Typically, an embedded design will have a general-purpose processor, cache, SDRAM, DMA port, and a bridge port to a slower I/O bus, such as the Advanced Microcontroller Bus Architecture (AMBA) Advanced Peripheral Bus (APB). In addition, there might be a port to a DSP processor or hardware accelerator, which is common with the increased use of video in many applications. As chip-level device geometries become smaller and smaller, more and more functionality can be added without the concomitant increase in power and cost per die seen in prior generations [2].

The Advanced Microcontroller Bus Architecture (AMBA) was introduced by ARM Ltd in 1996 and is widely used as the on-chip bus in system-on-chip (SoC) designs. AMBA is a registered trademark of ARM Ltd. The first AMBA buses were the Advanced System Bus (ASB) and the Advanced Peripheral Bus (APB). In its second version, AMBA 2, ARM added the AMBA High-performance Bus (AHB), which is a single clock-edge protocol. In 2003, ARM introduced the third generation, AMBA 3 [2,4], including AXI to reach even higher interconnect performance and the Advanced Trace Bus (ATB) as part of the CoreSight on-chip debug and trace solution. These protocols are today the de-facto standard for 32-bit embedded processors because they are well documented and can be used without royalties. AMBA's target is to help designers of embedded systems meet challenges such as design for low power consumption: to facilitate the right-first-time development of embedded microcontroller products with one or more CPUs or signal processors, to be technology-independent, to encourage modular system design [4], and to minimize the silicon infrastructure required to support efficient on-chip and off-chip communication for both operation and manufacturing test [1].

This paper discusses the architecture of AMBA in Section II. Section III deals with the various buses, and their comparison is discussed in Section IV. Finally, Sections V and VI give the proposed work and conclude the paper.

II. ARCHITECTURE OF AMBA BASED SIMPLE MICROCONTROLLER

An AMBA-based microcontroller typically consists of a high-performance system backbone bus (AMBA AHB or AMBA ASB), able to sustain the external memory bandwidth, on which the CPU, on-chip memory and other Direct Memory Access (DMA) devices reside. This bus provides a high-bandwidth interface between the elements that are involved in the majority of transfers [3]. Fig. 1 shows an AMBA-based simple microcontroller. Also located on the high-performance bus is a bridge to the lower-bandwidth APB, where most of the peripheral devices in the system are located. AMBA APB provides the basic peripheral macrocell communications infrastructure as a secondary bus from the higher-bandwidth pipelined main system bus [1]. Such peripherals typically:
(i) Have interfaces which are memory-mapped registers
(ii) Have no high-bandwidth interfaces
(iii) Are accessed under programmed control.

The AMBA specification [2] has become a de-facto standard for the semiconductor industry. It has been adopted by more than 95% of ARM's partners and a number of IP providers. The specification has been successfully implemented in several ASIC designs. Since the AMBA interface is processor- and technology-independent, it enhances the reusability of peripheral and system components across a wide range of applications.
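The bridge to the APB mentioned above drives every transfer through a short state machine. The AMBA 2.0 specification [1] describes three states. In IDLE no transfer is in progress. In SETUP, PSELx is asserted for one cycle. In ENABLE, PENABLE is also asserted. Each APB transfer therefore takes two PCLK cycles. The following is a minimal C behavioral model of that state machine. It is a sketch of the kind of reference model an HDL implementation could be checked against, and the type and function names are hypothetical.

```c
#include <stdbool.h>

/* AMBA 2 APB bridge transfer states. */
typedef enum { APB_IDLE, APB_SETUP, APB_ENABLE } apb_state_t;

typedef struct {
    bool psel;     /* select for the addressed APB slave */
    bool penable;  /* strobe marking the second cycle of a transfer */
} apb_outputs_t;

/* Next state at the rising edge of PCLK. */
apb_state_t apb_next_state(apb_state_t s, bool transfer_requested)
{
    switch (s) {
    case APB_IDLE:   return transfer_requested ? APB_SETUP : APB_IDLE;
    case APB_SETUP:  return APB_ENABLE;            /* SETUP always lasts exactly one cycle */
    case APB_ENABLE: return transfer_requested ? APB_SETUP : APB_IDLE;
    }
    return APB_IDLE;
}

/* Outputs depend only on the state: PSEL in SETUP and ENABLE, PENABLE only in ENABLE. */
apb_outputs_t apb_outputs(apb_state_t s)
{
    apb_outputs_t o = { s != APB_IDLE, s == APB_ENABLE };
    return o;
}
```

Back-to-back transfers go ENABLE → SETUP → ENABLE, so PENABLE drops for one cycle between them, as the protocol requires.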
www.ijsrp.org

The AMBA specification [1,3] has been derived to satisfy the following four key requirements:

(i) To facilitate the right-first-time development of embedded microcontroller products with one or more CPUs or signal processors.
(ii) To be technology-independent and ensure that highly reusable peripheral and system macrocells can be migrated across a diverse range of IC processes and be appropriate for full-custom, standard cell and gate array technologies.
(iii) To encourage modular system design to improve processor independence, providing a development road-map for advanced cached CPU cores and the development of peripheral libraries.
(iv) To minimize the silicon infrastructure required to support efficient on-chip and off-chip communication for both operation and manufacturing test.

Fig. 1. AMBA based Simple Microcontroller

III. DIFFERENT AMBA BUSES
The Advanced Microcontroller Bus Architecture (AMBA) is ARM's no-cost, open specification [1,3,4], which defines an on-chip communications standard for designing high-performance embedded microcontrollers. Three distinct buses are defined within the AMBA specification:

(A) The Advanced High-performance Bus (AHB)
(B) The Advanced System Bus (ASB)
(C) The Advanced Peripheral Bus (APB).

(A) The Advanced High-performance Bus (AHB): AHB is a new generation of AMBA bus which is intended to address the requirements of high-performance synthesizable designs. It is a high-performance system bus that supports multiple bus masters and provides high-bandwidth operation. Bridging between this higher level of bus and the current ASB/APB can be done efficiently to ensure that any existing designs can be easily integrated. An AMBA AHB design may contain one or more bus masters; typically a system would contain at least the processor and test interface. However, it would also be common for a Direct Memory Access (DMA) or Digital Signal Processor (DSP) to be included as bus masters.

The external memory interface, APB bridge and any internal memory are the most common AHB slaves. Any other peripheral in the system could also be included as an AHB slave. However, low-bandwidth peripherals typically reside on the APB.

(B) The Advanced System Bus (ASB): ASB is the first generation of AMBA system bus. A typical AMBA ASB system may contain one or more bus masters, for example at least the processor and test interface. However, it would also be common for a Direct Memory Access (DMA) or Digital Signal Processor (DSP) to be included as bus masters.

The external memory interface, APB bridge and any internal memory are the most common ASB slaves. Any other peripheral in the system could also be included as an ASB slave. However, low-bandwidth peripherals typically reside on the APB.

(C) The Advanced Peripheral Bus (APB): The APB is part of the AMBA hierarchy of buses and is optimized for minimal power consumption and reduced interface complexity. The AMBA APB appears as a local secondary bus that is encapsulated as a single AHB or ASB slave device. APB provides a low-power extension to the system bus which builds on AHB or ASB signals directly. The APB bridge appears as a slave module which handles the bus handshake and control signal retiming on behalf of the local peripheral bus. By defining the APB interface from the starting point of the system bus, the benefits of the system diagnostics and test methodology can be exploited.

The AMBA APB should be used to interface to any peripherals which are low bandwidth and do not require the high performance of a pipelined bus interface. The latest revision of the APB [6,7] is specified so that all signal transitions are only related to the rising edge of the clock. This improvement ensures the APB peripherals can be integrated easily into any design flow. These changes to the APB also make it simpler to interface it to the new AHB. An AMBA APB implementation typically contains a single APB bridge, which is required to convert AHB or ASB transfers into a suitable format for the slave devices on the APB. The bridge provides latching of all address, data and control signals, as well as providing a second level of decoding to generate slave select signals for the APB peripherals.

IV. COMPARISON OF BUSES
The following table shows the comparison among the AMBA buses:


AHB                                           | ASB                                           | APB
High performance                              | High performance                              | Low power
Pipelined operation                           | Pipelined operation                           | Latched address and control
Multiple bus masters                          | Multiple bus masters                          | Simple interface
Consists of master, slave, arbiter, decoder   | Consists of master, slave, arbiter, decoder   | Consists of APB bridge and slave
Table 1: Comparison of buses

V. PROPOSED WORK

As discussed above, APB is a good choice for implementing the Advanced Microcontroller Bus Architecture using an HDL.

VI. CONCLUSION

The proposed work, an implementation of AMBA APB, provides the basic peripheral macrocell communications infrastructure as a secondary bus from the higher-bandwidth pipelined main system bus. Such peripherals typically:

(i) Have interfaces which are memory-mapped registers
(ii) Have no high-bandwidth interfaces
(iii) Are accessed under programmed control.

REFERENCES
[1] AMBA Specification, Version 2.0.
[2] Akhilesh Kumar and Richa Sinha, "Design and verification analysis of APB3 protocol with coverage," International Journal of Advances in Engineering and Technology, vol. 1, issue 5, pp. 310-317, Nov. 2011.
[3] Priyanka Gandhani and Charu Patel, "Moving from AMBA AHB to AXI Bus in SoC Designs: A Comparative Study," Int. J. Comp. Sci. Emerging Tech., vol. 2, no. 4, pp. 476-479, Aug. 2011.
[4] Vani R. M. and Roopa M., "Design of AHB2APB Bridge for different phase and Frequency," International Journal of Computer and Electrical Engineering, vol. 3, no. 2, April 2011.
[5] Wang Zhonghai, Ye Yizheng, Wang Jinxing, and Yu Mingyan, "Designing AHB/PCI Bridge," in Proceedings of 4th International Conference on ASIC, Oct. 2001, pp. 578-580.
[6] Jaehoon Song, Hyunbean Yi, Juhee Han, and Sungju Park, "An Efficient SOC Test Technique by Reusing On/Off-Chip Bus Bridge," IEEE Transactions on Circuits and Systems-I: Regular Papers, vol. 56, no. 3, March 2009.
[7] Sangik Choi and Shinwook Kang, Mobile Samsung Electronics Co., Ltd., "Implementation of an On-Chip Bus Bridge between Heterogeneous Buses with Different Clock Frequencies," IEEE IDEAS'05, 1098-8068/2005.

AUTHORS
1. First Author: Ms. Radhika Koti, Student M.Tech (VLSI), Priyadarshini College of Engineering, Nagpur – India. Email ID: kotiradhika10@rediffmail.com
2. Second Author: Ms. Divya Meshram, Lecturer, M.Tech, Priyadarshini College of Engineering, Nagpur – India. Email ID: divyameshram@gmail.com

Agenda
* Cortex-M3 Overview
* v7-M Architecture/Programmers Model
* Data Path and Pipelines
* ARM Cortex-M3 Tools and mbed Platform

Introduction
ARM University Relations

What's Happening in Microcontrollers?
* Microcontrollers are getting cheap
  • 32-bit ARM Cortex-M3 microcontrollers @ $1
  • Some microcontrollers sell for as little as $0.65
* Microcontrollers are getting powerful
  • Lots of processing, memory, I/O in one package
  • Floating-point is even available in some!
* Microcontrollers are getting interactive
  • Internet connectivity, new sensors and actuators
  • LCD and display controllers are common
* Creates new opportunities for microcontrollers

ARM Cortex-M3 Processor
[Figure: ARM Cortex-M3 processor block diagram. The Cortex-M3 core connects to an NVIC (1-240 interrupts, 8-256 priorities), MPU, ETM (instruction trace), TPIU (trace port, 5 pins), ITM (instrumentation trace) with Serial-Wire Viewer (1 pin), DWT (data trace), FPB (breakpoints), a DAP (JTAG/SWD), and a bus matrix with code buses to Flash and code SRAM and a system bus to stack SRAM and peripherals.]
ARM Cortex-M3 Microcontroller
* 18 x 32-bit registers
* Excellent compiler target
  • Designed to be fully programmed in C (even reset, interrupts and exceptions)
* Reduced pin count requirements
* Efficient interrupt handling
  • Interrupts automatically save/restore state
  • Exceptions programmed in C
* Power management
* Efficient debug and development support features
  • Breakpoints, Watchpoints, Flash Patch support, Instruction Trace
* Strong OS support
  • User/Supervisor model
  • OS support features

ARM Cortex-M3 Microcontroller
* ARMv7-M Architecture
* No Cache - No MMU
* Debug is optimized for microcontroller applications
* Vector table contains addresses, not instructions
* DIV instruction
* No Coprocessor 15 - All registers are memory-mapped
* Interrupt controller is part of Cortex-M3 macrocell
* Fixed memory map
* Bit-banding
* Non-Maskable Interrupt (NMI)
* Only one processor status reg
* Thumb-2 processing core
  • Mix of 16 and 32 bit instructions for very high code density
  • Gives complete Thumb compatibility

ARM and Thumb Performance
[Figure: bar chart of Dhrystone 2.1/sec at 20 MHz (axis 0-30000) for ARM and Thumb code, by memory width (zero wait state): 32-bit, 16-bit, and 16-bit with 32-bit stack.]

The Thumb-2 instruction set
* Variable-length instructions
  • ARM instructions are a fixed length of 32 bits
  • Thumb instructions are a fixed length of 16 bits
  • Thumb-2 instructions can be either 16-bit or 32-bit
* Thumb-2 gives approximately 26% improvement in code density over ARM
* Thumb-2 gives approximately 25% improvement in performance over Thumb
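The variable-length encoding can still be decoded with a single check on the first halfword. According to the ARMv7-M Architecture Reference Manual, a halfword whose bits [15:11] are 0b11101, 0b11110 or 0b11111 is the first half of a 32-bit instruction, and any other value is a complete 16-bit instruction. The following is a minimal C sketch that applies this rule to a halfword stream. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bits [15:11] of 0b11101, 0b11110 or 0b11111 mark a 32-bit Thumb-2 instruction. */
int thumb2_is_32bit(uint16_t first_halfword)
{
    unsigned top5 = first_halfword >> 11;
    return top5 == 0x1D || top5 == 0x1E || top5 == 0x1F;
}

/* Walk a stream of halfwords, counting 16-bit and 32-bit instructions.
 * Returns the number of complete instructions found. */
size_t thumb2_count(const uint16_t *code, size_t n_halfwords,
                    size_t *n16, size_t *n32)
{
    size_t i = 0, total = 0;
    *n16 = *n32 = 0;
    while (i < n_halfwords) {
        if (thumb2_is_32bit(code[i])) {
            if (i + 1 >= n_halfwords)
                break;              /* truncated 32-bit instruction at the end */
            (*n32)++;
            i += 2;
        } else {
            (*n16)++;
            i += 1;
        }
        total++;
    }
    return total;
}
```

For the stream `{0x4770, 0xF000, 0xF800, 0xE7FE}` (`BX LR`, a 32-bit `BL`, then `B .`), the sketch reports three instructions: two 16-bit and one 32-bit.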
Agenda
* Cortex-M3 Overview
* v7-M Architecture/Programmers Model
* Data Path and Pipelines
* Tools and mbed Platform

Cortex-M3 Register Set
* Very compiler friendly
* Load/Store Architecture
* 32-bit registers
* Flexible register scheme
* Linear 32-bit address space

Registers: r0-r12, sp (Main), sp (Process), lr, r15 (pc), xPSR

Program Status Register
Bits:   31  30  29  28 | 27 | 26:25 | 24 | 15:10  | 8:0
Field:   N   Z   C   V |  Q |  IT   |  T | IT/ICI | ISR Number
* One Status Register consisting of
  • APSR - Application Program Status Register - ALU flags
  • IPSR - Interrupt Program Status Register - Interrupt/Exception No.
  • EPSR - Execution Program Status Register
    - IT field - If/Then block information
    - ICI field - Interruptible-Continuable Instruction information
* xPSR
  • Composite of the 3 PSRs
  • Stored on the stack on exception entry

An Example AMBA System
[Figure: example AMBA system. The ARM processor, high-bandwidth on-chip RAM, external memory interface and a DMA bus master sit on the high-performance, high-bandwidth AHB; an APB bridge connects it to the APB with a UART, timer, keypad and PIO. AHB: high performance, pipelined, burst support, multiple bus masters. APB: low power, non-pipelined, simple interface.]
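Because the three PSRs share one 32-bit word, software often needs to split an xPSR value into its parts. One example is reading the copy stacked on exception entry in a fault handler. The following is a minimal C sketch. It assumes the ARMv7-M bit positions (N, Z, C, V, Q in bits 31 to 27, T in bit 24, and the IPSR exception number in bits [8:0]), and the type and function names are hypothetical.

```c
#include <stdint.h>

#define XPSR_N        (1u << 31)   /* APSR: negative */
#define XPSR_Z        (1u << 30)   /* APSR: zero */
#define XPSR_C        (1u << 29)   /* APSR: carry */
#define XPSR_V        (1u << 28)   /* APSR: overflow */
#define XPSR_Q        (1u << 27)   /* APSR: sticky saturation */
#define XPSR_T        (1u << 24)   /* EPSR: Thumb state, always 1 on Cortex-M3 */
#define XPSR_EXC_MASK 0x1FFu       /* IPSR: exception number, bits [8:0] */

typedef struct {
    int n, z, c, v, q, t;
    unsigned exception;   /* 0 in Thread mode, otherwise the active exception number */
} xpsr_fields_t;

/* Split one xPSR value into its APSR, EPSR and IPSR parts. */
xpsr_fields_t xpsr_decode(uint32_t xpsr)
{
    xpsr_fields_t f;
    f.n = (xpsr & XPSR_N) != 0;
    f.z = (xpsr & XPSR_Z) != 0;
    f.c = (xpsr & XPSR_C) != 0;
    f.v = (xpsr & XPSR_V) != 0;
    f.q = (xpsr & XPSR_Q) != 0;
    f.t = (xpsr & XPSR_T) != 0;
    f.exception = xpsr & XPSR_EXC_MASK;
    return f;
}
```

For instance, `0x61000000` decodes to Z = 1, C = 1, T = 1 in Thread mode, and `0x8100000F` has N set while exception 15 (SysTick) is active.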
Memory Map
* Very simple linear 4GB memory map
* The Bus Matrix partitions memory access via the AHB and PPB buses

Address range           Region                          Size
E0100000 - FFFFFFFF     System
E0040000 - E00FFFFF     APB Debug Components
E0000000 - E003FFFF     Internal PPB: SCS + NVIC
A0000000 - DFFFFFFF     External Peripheral             1 GB
60000000 - 9FFFFFFF     External RAM                    1 GB
40000000 - 5FFFFFFF     Peripheral                      ½ GB
20000000 - 3FFFFFFF     RAM                             ½ GB
00000000 - 1FFFFFFF     Code Space                      ½ GB

[Figure: the CM3 instruction and core data interfaces feed a bus matrix (with aligner, bit-bander, and debug and patch logic) that drives the ICODE AHB, DCODE AHB, SYSTEM AHB and internal PPB buses.]

NXP LPC1311/13/42/43 Block Diagram
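Because this map is fixed for every Cortex-M3 device, an address can be classified from its value alone. One use is a fault handler reporting where a bad access landed. The following is a minimal C sketch using the region boundaries from the map above. The enum and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    REGION_CODE, REGION_SRAM, REGION_PERIPHERAL, REGION_EXT_RAM,
    REGION_EXT_PERIPHERAL, REGION_PPB_SCS_NVIC, REGION_PPB_APB_DEBUG,
    REGION_SYSTEM
} cm3_region_t;

/* Classify a 32-bit address using the fixed Cortex-M3 region boundaries. */
cm3_region_t cm3_region(uint32_t addr)
{
    if (addr < 0x20000000u) return REGION_CODE;            /* 1/2 GB code space */
    if (addr < 0x40000000u) return REGION_SRAM;            /* 1/2 GB RAM */
    if (addr < 0x60000000u) return REGION_PERIPHERAL;      /* 1/2 GB on-chip peripherals */
    if (addr < 0xA0000000u) return REGION_EXT_RAM;         /* 1 GB external RAM */
    if (addr < 0xE0000000u) return REGION_EXT_PERIPHERAL;  /* 1 GB external peripherals */
    if (addr < 0xE0040000u) return REGION_PPB_SCS_NVIC;    /* internal PPB: SCS + NVIC */
    if (addr < 0xE0100000u) return REGION_PPB_APB_DEBUG;   /* APB debug components */
    return REGION_SYSTEM;                                   /* remaining system area */
}
```

For example, the NVIC registers around `0xE000E100` fall in the internal PPB, and an SRAM variable at `0x20001000` falls in the RAM region.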

NXP LPC1311/13/42/43 Memory Map

Processor Privilege
[Figure: ARM Cortex-M3 processor privilege. Privileged Handler mode (supervisor/OS code) is entered on reset, interrupts, aborts, system calls (SVCall) and undefined instructions; application code runs in Thread mode, shown as user (non-privileged). Both modes access instructions and data in memory.]
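On the Cortex-M3, the privilege and stack used in Thread mode are selected by the CONTROL register. Per the ARMv7-M architecture, bit 0 (nPRIV) makes Thread mode unprivileged when set, and bit 1 (SPSEL) makes Thread mode use the Process Stack Pointer. Handler mode always runs privileged on the Main Stack Pointer. The following is a minimal C sketch of that decision. The names are hypothetical, and `handler_mode` stands for "an exception is active (IPSR != 0)".

```c
#include <stdint.h>

#define CONTROL_NPRIV (1u << 0)   /* 1 = Thread mode runs unprivileged */
#define CONTROL_SPSEL (1u << 1)   /* 1 = Thread mode uses PSP instead of MSP */

typedef struct {
    int privileged;   /* 1 if the current code runs privileged */
    int uses_psp;     /* 1 if the active stack pointer is PSP, 0 for MSP */
} cm3_context_t;

cm3_context_t cm3_context(uint32_t control, int handler_mode)
{
    cm3_context_t c;
    if (handler_mode) {
        c.privileged = 1;   /* Handler mode is always privileged ... */
        c.uses_psp = 0;     /* ... and always uses MSP */
    } else {
        c.privileged = (control & CONTROL_NPRIV) == 0;
        c.uses_psp = (control & CONTROL_SPSEL) != 0;
    }
    return c;
}
```

After reset CONTROL is 0, so application code starts privileged on MSP. An OS typically switches tasks to CONTROL = 3 (unprivileged, PSP), while its exception handlers keep running privileged on MSP.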
Memory Protection Unit (MPU)
* MPU provides access control for various memory regions
* Zero Latency Memory Protection
* 8 register-stored regions
* Same regions used for instructions and data
* Minimum region size 32 Bytes (max 4GB)
* No address translation or page tables
* Configured via memory-mapped control registers

Cortex-M3 Bit Banding
Traditional Method of Atomic Manipulation
1. Read byte from SRAM (0x20000000):   0 0 0 0 0 0 0 0
2. Mask and modify bit element:        x x x x x 1 x x
3. Write byte to SRAM (0x20000000):    0 0 0 0 0 1 0 0
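The traditional sequence above needs three separate steps, so it is only atomic if interrupts are blocked around it. Bit banding replaces it with a single word write to an alias address. The Cortex-M3 Technical Reference Manual gives the mapping for the SRAM bit-band region (1 MB starting at 0x20000000, aliased at 0x22000000) as alias = alias_base + byte_offset × 32 + bit_number × 4. The following is a minimal C sketch of both methods. The helper names are hypothetical.

```c
#include <stdint.h>

#define SRAM_BB_BASE  0x20000000u   /* start of the 1 MB SRAM bit-band region */
#define SRAM_BB_ALIAS 0x22000000u   /* start of its 32 MB bit-band alias */

/* Traditional method: read, mask and modify a byte; the caller writes it back.
 * Not atomic unless interrupts are disabled around the read and the write. */
uint8_t set_bit_rmw(uint8_t byte, unsigned bit)
{
    return (uint8_t)(byte | (1u << bit));
}

/* Bit-band method: each bit of the region maps to one 32-bit alias word.
 * alias = alias_base + byte_offset * 32 + bit_number * 4 */
uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    return SRAM_BB_ALIAS + (addr - SRAM_BB_BASE) * 32u + bit * 4u;
}
```

On a Cortex-M3 target, setting bit 2 of the byte at 0x20000000 in one atomic bus operation would then be a write of 1 to the alias word, for example `*(volatile uint32_t *)bitband_alias(0x20000000u, 2) = 1;` (alias address 0x22000008).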

Cortex-M3 Bit Banding
* Writes to a word address in the bit band alias affect a single bit in the bit band region
* The write is translated to an atomic read-modify-write by the Cortex-M3 bus matrix
* Bit 0 of the stored register is written to the appropriate bit
[Figure: a 1MB bit band region and its 32MB bit band alias, which starts 32MB above the region's base (a 31MB gap lies between them); each word in the alias maps to one physical bit in the region.]

Conditional Execution
* If-Then (IT) instruction added (16 bit)
  • Up to 3 additional "then" or "else" conditions may be specified (T or E)
  • Makes up to 4 following instructions conditional
      ITTET   EQ
      MOVEQ           ; Inst 1
      ADDEQ           ; Inst 2
      SUBNE           ; Inst 3
      ORREQ           ; Inst 4
* Any normal ARM condition code can be used
* 16-bit instructions in the block do not affect condition code flags
  • Apart from comparison instructions
* 32-bit instructions may affect flags (normal rules apply)
* Current "if-then status" is stored in the xPSR (EPSR IT bits)
  • A conditional block may safely be interrupted and returned to
* Must NOT branch into or out of an 'if-then' block
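The IT mnemonic encodes which condition applies to each instruction in the block. The first instruction always uses the base condition. Each following T repeats it, and each E uses its inverse. That is how `ITTET EQ` yields EQ, EQ, NE, EQ in the example above. The following is a minimal C sketch that performs this expansion. The function names are hypothetical, and AL (which has no inverse) is deliberately rejected.

```c
#include <string.h>

/* Inverse of a condition mnemonic (EQ <-> NE, etc.), or NULL if unknown. */
static const char *invert_cond(const char *cond)
{
    static const char *pairs[][2] = {
        {"EQ", "NE"}, {"CS", "CC"}, {"MI", "PL"}, {"VS", "VC"},
        {"HI", "LS"}, {"GE", "LT"}, {"GT", "LE"}
    };
    for (size_t i = 0; i < sizeof pairs / sizeof pairs[0]; i++) {
        if (strcmp(cond, pairs[i][0]) == 0) return pairs[i][1];
        if (strcmp(cond, pairs[i][1]) == 0) return pairs[i][0];
    }
    return NULL;   /* AL and unknown codes are not handled here */
}

/* Expand e.g. "ITTET" with base condition "EQ" into per-instruction conditions.
 * Returns the block length (1-4), or 0 if the mnemonic or condition is invalid. */
int it_expand(const char *mnemonic, const char *cond, const char *out[4])
{
    size_t len = strlen(mnemonic);
    const char *inv = invert_cond(cond);
    if (len < 2 || len > 5 || mnemonic[0] != 'I' || mnemonic[1] != 'T' || inv == NULL)
        return 0;
    out[0] = cond;                          /* first instruction: base condition */
    for (size_t i = 2; i < len; i++) {
        if (mnemonic[i] == 'T')      out[i - 1] = cond;   /* "then": same condition */
        else if (mnemonic[i] == 'E') out[i - 1] = inv;    /* "else": inverse condition */
        else return 0;
    }
    return (int)(len - 1);
}
```

`it_expand("ITTET", "EQ", c)` returns 4 with `c` = EQ, EQ, NE, EQ, matching MOVEQ, ADDEQ, SUBNE and ORREQ on the slide.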
Interrupt Handling
* One Non-Maskable Interrupt (INTNMI) supported
* 1-240 prioritizable interrupts supported
  • Interrupts can be masked
  • Implementation option selects number of interrupts supported
* Nested Vectored Interrupt Controller (NVIC) is tightly coupled with processor core
* Interrupt inputs are active HIGH
[Figure: INTNMI and INTISR[239:0] (1-240 interrupts) enter the NVIC, which is tightly coupled to the Cortex-M3 processor core.]

Exception Handling
* Reset
* NMI
* Faults
  • Hard Fault
  • Memory Manage
  • Bus Fault
  • Usage Fault
* SVCall
* Debug Monitor
* PendSV
* SysTick Interrupt
* External Interrupt
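Each of these exceptions has a fixed exception number. This number is what the IPSR reports, and it indexes the vector table, where entry n sits at byte offset 4 × n and entry 0 holds the initial stack pointer. In the ARMv7-M numbering, Reset is 1, NMI 2, Hard Fault 3, Memory Manage 4, Bus Fault 5, Usage Fault 6, SVCall 11, Debug Monitor 12, PendSV 14 and SysTick 15, and external interrupt INTISR[k] is exception 16 + k. The following is a minimal C sketch of that mapping. The function names are hypothetical.

```c
/* Name of a Cortex-M3 exception number, as reported by IPSR. */
const char *cm3_exception_name(unsigned n)
{
    switch (n) {
    case 1:  return "Reset";
    case 2:  return "NMI";
    case 3:  return "Hard Fault";
    case 4:  return "Memory Manage";
    case 5:  return "Bus Fault";
    case 6:  return "Usage Fault";
    case 11: return "SVCall";
    case 12: return "Debug Monitor";
    case 14: return "PendSV";
    case 15: return "SysTick";
    default: return (n >= 16u && n < 16u + 240u) ? "External Interrupt" : "Reserved";
    }
}

/* Byte offset of the handler address in the vector table. */
unsigned cm3_vector_offset(unsigned exception_number)
{
    return exception_number * 4u;
}

/* External interrupt line k (INTISR[k]) for an exception number >= 16. */
int cm3_irq_number(unsigned exception_number)
{
    return (int)exception_number - 16;
}
```

So the SysTick handler address is read from offset 0x3C, and INTISR[0] is exception 16, found at offset 0x40.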
International Journal of Engineering and Technical Research (IJETR)
ISSN: 2321-0869 (O) 2454-4698 (P), Volume-5, Issue-3, July 2016

Learning Embedded System using advanced Microcontroller and Real Time Operating System
Nivesh Dwivedi, B.Tech/Electronics IV year, Hans Raj College, University of Delhi.

Abstract - This paper emphasizes learning embedded system programming and design through porting µCOS-II to the ARM Cortex M-3. It is intended to help engineering and embedded systems students start their projects and design real-time embedded systems. It deals with porting µCOS-II to an ARM-based microcontroller to implement multitasking and time scheduling. Here, the real-time operating system is the software that manages the time of a microcontroller to ensure that all time-critical events are processed as efficiently as possible. Different interface modules of the ARM Cortex M-3 microcontroller, such as LED, SysTick timer, buzzer, UART, LCD, ADC and SPI, are tested. This paper mainly concentrates on the porting of µCOS-II.

Index Terms - Embedded systems, ARM Cortex M-3, KEIL IDE and real time Kernel.

I. INTRODUCTION

Engineering students study many subjects, such as
• Microprocessors
• Microcontrollers
• Operating Systems
• Embedded Systems, etc.
But they are not able to make use of all these things in real-time applications, and in fact it is not possible, in the very hectic schedule of college life, to cover all the perspectives. This paper aims to enable them to learn embedded systems and design real-time application monitoring systems. After reading this paper, you should feel that you can confidently start to work on real-time systems and their design. I make this evident in the detailed work of the paper, using an advanced microcontroller and a real-time kernel.

The important trait of using a real-time kernel is MULTITASKING. Using a real-time operating system, we can design real-time systems that perform multiple tasks simultaneously, such as LED blinking/glowing, alarms, temperature sensing, LCD display and serial communication. Real-time systems are used intensively in critical areas such as space research and defense applications. The goal is to realize industrial real-time application monitoring systems. The heart of such a system is a real-time kernel that uses preemptive scheduling to achieve multitasking on any embedded platform.
Earlier systems used non-real-time operating systems, which are often quite non-deterministic and slow to respond. A real-time operating system simply overcomes the limitations of a non-real-time system based on a mono-task mechanism, which hardly satisfies current requirements (one task) and thus causes more power consumption.
Related Works: The porting of µCOS-II has already been done, but can it be approached in a different way? Yes; this paper tries a different way. Earlier, the porting of µCOS-II was given in Micrium's 'µCOS-II and ARM Cortex M-3', which used the IAR IDE and a different SoC. I really appreciate the book and its author, but this paper gives students a focused set of constraints and the whole idea needed to work on embedded systems and their design.
Detailed Works: To complete this project we need knowledge of the following:
1. ARM Cortex M-3 and its peripherals.
2. The Keil IDE.
3. µCOS-II, a Real Time Operating System.

1. ARM Cortex M-3
What is the ARM (Advanced RISC Machine) Cortex M-3?
The microcontroller market is very vast. A bewildering array of vendors, devices, and architectures is competing in this market. The requirement for higher-performance microcontrollers has been driven globally by the industry's changing needs; for example, microcontrollers are required to handle more work without increasing a product's frequency or power. In addition, microcontrollers are becoming increasingly connected, whether by Universal Serial Bus (USB), Ethernet, or wireless radio, and hence the processing needed to support these communication channels and advanced peripherals is growing. Similarly, general application complexity is on the increase, driven by more sophisticated user interfaces, multimedia requirements, system speed, and convergence of functionalities.
The Cortex-M3 is a 32-bit microprocessor. It has a 32-bit data path, a 32-bit register bank, and 32-bit memory interfaces. The processor has a Harvard architecture, which means that it has a separate instruction bus and data bus. This allows instruction and data accesses to take place at the same time, and as a result the performance of the processor increases because data accesses do not affect the instruction pipeline. This feature results in multiple bus interfaces on the Cortex-M3, each with optimized usage and the ability to be used simultaneously. However, the instruction and data buses share the same memory space (a unified memory system). In other words, you cannot get 8 GB of memory space just because you have separate bus interfaces. For complex applications that require more memory system features, the Cortex-M3 processor has an optional Memory Protection Unit (MPU), and it is possible to use an external cache if required. Both little-endian and big-endian memory systems are supported. The Cortex-M3 processor includes a number of fixed internal debugging components. These components provide debugging operation support and features, such as breakpoints and watchpoints. In addition, optional
91 www.erpublication.org

components provide debugging features, such as instruction trace, and various types of debugging interfaces.
ARM cores use a 32-bit load-store RISC architecture. This means that the core cannot directly manipulate the memory of the system. All data manipulation must be done by loading registers with information located in memory, performing the data operation and then storing the value back to memory.
The Cortex-M3 processor has registers R0 through R15. R0–R12 are 32-bit general-purpose registers for data operations. Some 16-bit Thumb instructions can only access a subset of these registers (low registers, R0–R7). The Cortex-M3 contains two stack pointers (R13). They are banked so that only one is visible at a time. The two stack pointers are as follows:
• Main Stack Pointer (MSP): The default stack pointer, used by the operating system (OS) kernel and exception handlers.
• Process Stack Pointer (PSP): Used by user application code.
R14 (the link register): When a subroutine is called, the return address is stored in the link register.
R15 (the program counter): The program counter is the current program address. This register can be written to control the program flow.
Special registers: The Cortex-M3 processor also has a number of special registers. They are as follows:
• Program Status Registers (PSRs)
• Interrupt Mask registers (PRIMASK, FAULTMASK, and BASEPRI)
• Control register (CONTROL)
These registers have special functions and can be accessed only by special instructions. They cannot be used for normal data processing.

Fig. 1: A Simplified View of ARM Cortex M-3

The Cortex-M3 addresses the requirements of the 32-bit embedded processor market in the following ways: greater performance efficiency, low power consumption, enhanced determinism, improved code density, ease of use, lower-cost solutions, and a wide choice of development tools. These merits make the ARM Cortex M-3 suitable for our porting purpose.
The Cortex-M3 processor is based on one profile of the v7 architecture, called ARMv7-M, an architecture specification for microcontroller products. The Cortex-M3 supports only the Thumb-2 (and traditional Thumb) instruction set. Instead of using ARM instructions for some operations, as in traditional ARM processors, it uses the Thumb-2 instruction set for all operations.
The details of the ARMv7-M architecture are documented in the ARMv7-M Architecture Application Level Reference Manual. This document can be obtained via the ARM web site through a simple registration process. The ARMv7-M architecture contains the following key areas:
• Programmer's model
• Instruction set
• Memory model
• Debug architecture
Processor-specific information, such as interface details and timing, is documented in the Cortex-M3 Technical Reference Manual (TRM). This manual can be accessed freely on the ARM website.

Cortex-M3 Processor Applications
With its high performance, high code density and small silicon footprint, the Cortex-M3 processor is ideal for a wide variety of applications, such as:
• Low-cost microcontrollers
• Automotive
• Data communications
• Industrial control
• Consumer products
There are already many Cortex-M3 processor-based products on the market, including low-end products priced as low as US$1, making the cost of ARM microcontrollers comparable to or lower than that of many 8-bit microcontrollers.

II. KEIL IDE SOFTWARE

Keil IDE is Windows software that runs on a PC to develop applications for ARM microcontrollers and digital signal controllers.
It is called an Integrated Development Environment (IDE) because it provides a single integrated "environment" to develop code for embedded microcontrollers.
The Keil compiler is the industry standard and supports more than 500 current 8051 device variants. Keil software now also offers development tools for ARM.

Keil Software, the world's leading developer of embedded systems software, makes ANSI C compilers, macro assemblers, real-time kernels, debuggers, linkers, library managers, simulators, integrated environments, and evaluation boards for the 8051, 251, ARM7, and C16x/ST10 microcontroller families. Keil Software implemented the first C compiler designed from the ground up specifically for the 8051 microcontroller.
Keil development tools offer a complete development environment for ARM Cortex-M and Cortex-R

processor-based devices. They are easy to learn and use, yet powerful enough for the most demanding embedded applications.
In this project I use the Keil µVision 5 IDE.

III. µCOS-II
Introduction: µCOS-II (pronounced "Micro C O S 2") stands for Micro-Controller Operating System Version 2. µCOS-II is upward compatible with µCOS (V1.11) but provides many improvements over µCOS, such as the addition of a fixed-size memory manager, user-definable callouts on task creation, task deletion, task switch and system tick, support for TCB extensions, stack checking and much more.
If you currently have an application (i.e. product) that runs with µCOS, your application should be able to run, virtually unchanged, with µCOS-II. All of the services (i.e. function calls) provided by µCOS have been preserved. You may, however, have to change include files and product build files to 'point' to the new file names. µCOS-II was developed and tested on a PC, but it is targeted at embedded systems and can easily be ported to many different processor architectures.
It is a very small real-time kernel: its memory footprint is about 20 KB for a kernel with full functionality, and its source code is about 5400 lines, mostly in ANSI C. The µCOS-II source code is free for non-commercial use; commercial use requires permission.

Selecting µCOS-II: The following features make µCOS-II suitable and convenient to port:
• Portable
• ROMable
• Scalable
• Preemptive
• Multi-tasking
• Deterministic
• Task stacks
• Services
• Interrupt management
• Robust and reliable

PORTING OF µCOS-II
Adapting a real-time kernel to a microprocessor or a microcontroller is called a port. Most of µCOS-II is written in C for portability; however, it is still necessary to write some processor-specific code in C and assembly language. Specifically, µCOS-II manipulates processor registers, which can only be done through assembly language.
Porting µCOS-II to different processors is not a difficult task, because µCOS-II was designed to be portable.
If you are going to port µCOS-II to your processor, you of course need to know how µCOS-II's processor-specific code works.
A processor can run µCOS-II if it satisfies the following requirements:
1. You must have a C compiler for the processor, and the C compiler must be able to produce reentrant code.
2. You must be able to disable and enable interrupts from C.
3. The processor must support interrupts, and you need to provide an interrupt that occurs at regular intervals (typically between 10 and 100 Hz).
4. The processor must support a hardware stack, and the processor must be able to store a fair amount of data on the stack (possibly many Kbytes).
5. The processor must have instructions to load and store the stack pointer and other CPU registers, either on the stack or in memory.
The ARM Cortex-M3 satisfies all of the above requirements, so µCOS-II can easily be ported to it.
Porting µCOS-II is actually quite straightforward once you understand the subtleties of the target processor and the C compiler you will be using.
If your processor and compiler satisfy µCOS-II's requirements, and you have all the necessary tools, porting µCOS-II consists of the following:
1. Setting the value of 1 #define constant (OS_CPU.H)
2. Declaring 10 data types (OS_CPU.H)
3. Declaring 3 #define macros (OS_CPU.H)
4. Writing 6 simple functions in C (OS_CPU_C.C)
5. Writing 4 assembly language functions (OS_CPU_A.ASM)
You do not need to write all of the source code yourself, but you should understand how it works. The processor-independent source files are readily available, so you can use them directly at the initial stage of porting; you need to work on the processor-dependent code and on your application code. You also have to add 'INCLUDES.H'. INCLUDES.H allows every .C file in your project to be written without concern about which header files will actually be needed.
Depending on the processor, a port can consist of writing or changing between 50 and 300 lines of code.

Starting and Initializing µCOS-II
a. Starting µCOS-II: µCOS-II starts as shown in Fig. 2. First we initialize both the hardware and the software. Here the hardware I have used is the ARM Cortex-M3 and the software is the real-time operating system µCOS-II. Resources are allocated for the tasks defined in the application, then the scheduler is started and schedules the tasks preemptively.
b. Initialization of µCOS-II: The steps to initialize µCOS-II are shown in Fig. 3. The steps we take to initialize µCOS-II in code are shown below:

void main (void)
{
    /* User initialization */
    OSInit();     /* kernel initialization */

    /* Create at least one application task here (e.g. with OSTaskCreate()) */

    /* Start OS */
    OSStart();    /* start multitasking */
}
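To make items 1, 2 and 4 of the porting list more concrete, here is a minimal sketch of part of a Cortex-M3 port: the ten OS_CPU.H data types (written with <stdint.h>, an assumption made for portability of the sketch), the OS_STK_GROWTH constant, and an OSTaskStkInit() that lays out a new task's initial stack frame. The register order follows the Cortex-M3 exception stack frame (xPSR, PC, LR, R12, R3 to R0 stacked by hardware, then R11 to R4 saved by the port's context-switch code) and is modelled loosely on Micrium's published port rather than copied from it; fill patterns such as 0x12121212 are only debugging markers. Because it just writes into an array, the sketch can be compiled and inspected on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* ---- OS_CPU.H (sketch): the 10 port data types ------------------------ */
typedef uint8_t   BOOLEAN;
typedef uint8_t   INT8U;     /* unsigned  8-bit quantity                  */
typedef int8_t    INT8S;     /* signed    8-bit quantity                  */
typedef uint16_t  INT16U;    /* unsigned 16-bit quantity                  */
typedef int16_t   INT16S;    /* signed   16-bit quantity                  */
typedef uint32_t  INT32U;    /* unsigned 32-bit quantity                  */
typedef int32_t   INT32S;    /* signed   32-bit quantity                  */
typedef float     FP32;      /* single-precision floating point           */
typedef double    FP64;      /* double-precision floating point           */
typedef uint32_t  OS_STK;    /* stack entries are 32 bits wide            */

/* ---- OS_CPU.H (sketch): the #define constant -------------------------- */
#define OS_STK_GROWTH  1     /* stack grows from high to low memory       */

/* ---- OS_CPU_C.C (sketch): OSTaskStkInit() ----------------------------- */
/* Lays out the frame that the first context switch to this task will
 * "return" through. Returns the new top of stack, which the kernel keeps
 * in the task's TCB.                                                      */
OS_STK *OSTaskStkInit(void (*task)(void *p_arg), void *p_arg,
                      OS_STK *ptos, INT16U opt)
{
    OS_STK *stk = ptos;

    assert(ptos != 0);
    (void)opt;                               /* no options used here       */

    /* Part stacked and unstacked by the Cortex-M3 hardware. */
    *stk     = 0x01000000u;                  /* xPSR: Thumb state bit set  */
    *(--stk) = (OS_STK)(uintptr_t)task;      /* PC: task entry point       */
    *(--stk) = 0xFFFFFFFEu;                  /* LR: invalid, faults if the task returns */
    *(--stk) = 0x12121212u;                  /* R12                        */
    *(--stk) = 0x03030303u;                  /* R3                         */
    *(--stk) = 0x02020202u;                  /* R2                         */
    *(--stk) = 0x01010101u;                  /* R1                         */
    *(--stk) = (OS_STK)(uintptr_t)p_arg;     /* R0: argument for the task  */

    /* Part saved and restored by the port's context-switch code. */
    *(--stk) = 0x11111111u;                  /* R11                        */
    *(--stk) = 0x10101010u;                  /* R10                        */
    *(--stk) = 0x09090909u;                  /* R9                         */
    *(--stk) = 0x08080808u;                  /* R8                         */
    *(--stk) = 0x07070707u;                  /* R7                         */
    *(--stk) = 0x06060606u;                  /* R6                         */
    *(--stk) = 0x05050505u;                  /* R5                         */
    *(--stk) = 0x04040404u;                  /* R4                         */

    return stk;
}
```

For example, calling OSTaskStkInit(AppStartTask, (void *)0, &AppStartTaskStk[APP_TASK_START_STK_SIZE - 1], 0) would fill the top 16 entries of that stack and return a pointer to the lowest of them (the saved R4), which µCOS-II then stores in the task's TCB.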

Learning Embedded System using advanced Microcontroller and Real Time Operating System

Fig. 2: Starting of µCOS-II

Creating Task: For multitasking, µCOS-II needs information about each task: its starting address, top-of-stack (TOS), priority, the argument passed to the task, etc.
You can create a task by calling the following service provided by µCOS-II:

INT8U OSTaskCreate (void    (*task)(void *p_arg),   /* address of the task                */
                    void     *p_arg,                /* argument passed to the task        */
                    OS_STK   *ptos,                 /* pointer to the task's top of stack */
                    INT8U     prio);                /* priority of the task (0 to 63)     */

You can create tasks before you start multitasking (at initialization time).

Fig. 3: Initializing µCOS-II

IV. ARCHITECTURE
Every embedded system has a board support package (BSP) for the given board. It is commonly built with a boot loader that contains the minimal device support needed to load the operating system, and device drivers for all the devices on the board. It can also provide a root file system, a tool chain for building programs to run on the embedded system (which would be part of the architecture support package), and configurations for the devices.

Hardware and Software Architecture: Fig. 4 shows a block diagram of the relationship between your application, µCOS-II, the µCOS-II port, the BSP (Board Support Package), the ARM Cortex-M3 CPU and the target hardware.
APP.C is a standard test file for µCOS-II. APP.C is where you would normally place main(), but of course you can place main() anywhere you want.
The two important functions are:
1. main() and
2. AppStartTask()

Function main():

void main (void)
{
#if OS_TASK_NAME_SIZE > 13
    INT8U  err;
#endif

    BSP_IntDisAll();                          /* disable all interrupts until the OS is running */
    OSInit();                                 /* initialize µCOS-II                              */

    OSTaskCreateExt(AppStartTask,                                             /* task code         */
                    (void *)0,                                                /* task argument     */
                    (OS_STK *)&AppStartTaskStk[APP_TASK_START_STK_SIZE - 1],  /* top of stack      */
                    APP_TASK_START_PRIO,                                      /* priority          */
                    APP_TASK_START_PRIO,                                      /* task ID           */
                    (OS_STK *)&AppStartTaskStk[0],                            /* bottom of stack   */
                    APP_TASK_START_STK_SIZE,                                  /* stack size        */
                    (void *)0,                                                /* no TCB extension  */
                    OS_TASK_OPT_STK_CHK | OS_TASK_OPT_STK_CLR);               /* stack check/clear */

#if OS_TASK_NAME_SIZE > 13
    OSTaskNameSet(APP_TASK_START_PRIO, "Start Task", &err);
#endif

    OSStart();                                /* start multitasking                              */
}

AppStartTask():

static void AppStartTask (void *p_arg)
{
    (void)p_arg;                              /* argument not used                   */

    BSP_Init();                               /* initialize the board                */
    OS_CPU_SysTickInit();                     /* start the µCOS-II tick (SysTick)    */

#if OS_TASK_STAT_EN > 0
    OSStatInit();                             /* measure CPU capacity for statistics */
#endif

    AppTaskCreate();                          /* create the other application tasks  */

    while (TRUE) {
        /* Do something 'useful' in this task */
        LED_Toggle(1);
        OSTimeDly(OS_TICKS_PER_SEC / 20);     /* wait about 1/20 s (50 ms)           */
    }
}

Once you have a port of µCOS-II for your processor, you will need to verify its operation. Testing a multitasking real-time kernel such as µCOS-II is not as complicated as you may

think. You should test your port without application code; in other words, test the operation of the kernel by itself. You can also check whether context switching is happening by watching the Register window in the Keil IDE.
There are two reasons to do this. First, you don't want to complicate things any more than they need to be. Second, if something doesn't work, you know that the problem lies in the port as opposed to your application. Start with a couple of simple tasks and only the tick interrupt service routine. Once you get multitasking going, it's quite simple to add your application tasks.

Figure 4: Relationship between modules.

V. IMPLEMENTATION
The real-time kernel is the most important part of any real-time system: it performs preemptive scheduling to achieve multitasking, which is the defining trait of an RTOS. This research paper emphasizes the implementation of hardware and software together.
µCOS-II can run a maximum of 64 tasks concurrently, but here we have six tasks, shown in Fig. 5 below.

Fig. 5: Implementation of hardware and software.

Depending on our requirements, we can vary the number of tasks running at a time. Here, to verify our port of µCOS-II, we can run several tasks such as LED blinking, the SysTick timer, a buzzer (to generate a desired tune or alarm), UART and an LCD display. We could also build larger projects, such as home automation using Bluetooth and UART together, or a notice-board display using Bluetooth, LCD and UART together. But two or three tasks are enough to test our port of µCOS-II. Thus we can port µCOS-II using the development tools, i.e. the ARM Cortex-M3 and the Keil IDE.

VI. CONCLUSION
This research paper presents the porting of the real-time operating system µCOS-II to the ARM Cortex-M3 using the Keil µVision 5 software. It mainly concentrates on the development of an embedded monitoring system using the ARM Cortex-M3 and a real-time kernel. All the steps taken while porting µCOS-II, and the implementation details, are provided in the paper. The paper gives a detailed overview that will help students develop and design an embedded monitoring system using the ARM Cortex-M3 and a real-time operating system.

ACKNOWLEDGEMENT
I would like to acknowledge the Centre for Development of Advanced Computing (C-DAC), Hyderabad, Govt. of India, and its faculty for training and inspiring this work. I would also like to place on record my sincere thanks and gratitude to Shri Sanjay Kr. Vyas, Scientist 'E' / Additional Director, HRD Division, Dept. of Electronics and Information Technology (DeitY), Ministry of Communication and IT, Govt. of India, for his continuous guidance and support for my project work.

REFERENCES
[1] Micrium, µC/OS-II for the ARM Cortex-M3 Processors, www.micrium.com
[2] www.arm.com
[3] µC/OS-II, The Real-Time Kernel, http://www.uCOS-II.com
[4] M. Venkateswara Rao, Design of µC/OS-II RTOS Based Scalable Cost Effective Monitoring System Using ARM Powered Microcontroller, Dept. of ECM, K L University, A.P., India.
[5] Jean J. Labrosse, MicroC/OS-II: The Real-Time Kernel, Second Edition, Beijing University of Aeronautics and Astronautics Press.
[6] Tianmiao Wang, The Design and Development of Embedded System Based on ARM Micro System and µC/OS-II Real-Time Operating System, Tsinghua University Press.
[7] Joseph Yiu, The Definitive Guide to the ARM Cortex-M3, Second Edition.

Nivesh Dwivedi is currently pursuing a B.Tech in Electronics (IV year) at Hans Raj College, University of Delhi. His current interests are embedded systems and digital image processing. He is also working on a project, "Image Registration using two stereo Cameras", with Helium Ink Company, Pune. He was awarded the 'Award of Excellence' by Mr Akhilesh Yadav, Chief Minister of U.P., for his excellent performance in class XII. He actively participates in technical seminars and workshops as well as extracurricular activities.

Chapter 11
11. Direct Memory Access (DMA)

Some devices in the MSP430 family support a multi-channel Direct Memory Access (DMA) controller that can move data from one location to another without CPU intervention. This increases the throughput of peripheral modules and also allows the CPU to remain in a low-power mode, without needing to wake up to perform the data transfer, which reduces power consumption.
Data transfers to/from peripherals can be initiated by external and internal events, using triggers.
This chapter covers DMA operation, supported addressing and
transfer modes, trigger selection, channel priorities and DMA
controller interrupts.

Topic                                                          Page
11.1 Direct Memory Access (DMA) capability                     11-2
11.2 DMA configuration and operation                           11-3
     System interrupts                                         11-8
     DMA controller interrupts                                 11-8
11.3 DMA registers                                             11-9
11.4 Laboratory 7: Direct Memory Access                        11-12
     11.4.1 Lab7A: Data Memory transfer triggered by software  11-12
     11.4.2 Lab7B: Sinusoidal signal generator                 11-14
11.5 Quiz                                                      11-17
11.6 FAQs                                                      11-18

www.msp430.ubi.pt Copyright © 2009 Texas Instruments, All Rights Reserved 11-1


Direct Memory Access (DMA)

11.1 Direct Memory Access (DMA) capability


The MSP430 is well suited to low-power applications, and DMA is a
very useful facility to have in order to achieve this. The following
devices in the MSP430 family support DMA: 5xxx, FG4xx(x), F261x,
F16x(x) and F15x. The Experimenter’s Board uses the
MSP430FG4618.

When a low-power application requires data handling, the direct memory access (DMA) capability handles the data automatically, without CPU intervention, lowering power consumption because the CPU remains asleep.

The objective of DMA is to move functionality from the CPU to peripherals (see Figure 11-1) because:
• Peripherals use less current than the CPU;
• Performing operations directly between peripherals allows the CPU to shut down, saving system power;
• "Intelligent" peripherals are the most capable, providing more opportunity for CPU shutoff;
• DMA can be enabled for repetitive data handling, increasing the throughput of peripheral modules;
• Minimal software requirements and CPU cycles.

Figure 11-1. DMA data handling example.

The TI webpage gives some application notes which explain the use of the DMA controller in different applications, with the objective of reducing power consumption:

• Streamlining the mixed-signal path with the signal-chain-on-chip MSP430F169 <slyt078.pdf>
  – An integrated signal chain contains a variable resistance that generates a voltage level sampled by the ADC. The conversion result is processed and used to determine the update rate of the DAC and, consequently, the analogue output signal frequency. The DAC output frequency adjustment is made by interrupting the DMA instead of the CPU, freeing up CPU resources for other tasks.


• Interfacing the MSP430 with MMC/SD Flash Memory Cards <slaa281b.pdf>
  – The MSP430F161x microcontroller is used to communicate with an MMC or SD flash memory card via a serial peripheral interface (SPI). The DMA module is used for data transmission between the MSP430 and the MMC card, resulting in higher communication speed and less CPU load.

• Digital FIR Filter Design Using the MSP430F16x <slaa228.pdf>
  – An FIR filter is implemented using the MSP430F16x family of devices. The complete filter algorithm is executed by the 3-channel DMA peripheral and the hardware multiplier peripheral. The 3-channel DMA peripheral moves the input data, the coefficients and the results between memory and the multiply-and-accumulate (MAC) unit. This dramatically improves the efficiency of the real-time FIR filter computation running on-chip, without CPU intervention.

• Using the USCI I2C Master <slaa382.pdf>
  – Describes the use of the I2C master function on MSP430 devices with the USCI module. These functions can be used by the MSP430 master device to ensure proper initialization of the USCI module and to provide I2C transmit and receive functionality. The DMA module loads the seven data bytes to be sent, so that the CPU can remain in Low Power Mode 0 during the transmission.

11.2 DMA configuration and operation


The direct memory access (DMA) controller (see block diagram in
Figure 11-2) allows movement of data from one memory address to
another, across the entire address range, without CPU intervention.

Three DMA channels are implemented on the MSP430FG4618 device on the Experimenter's board.


Figure 11-2. DMA block diagram.


DMA controller features:
• Three independent transfer channels;
• Configurable (with the ROUNDROBIN bit) DMA channel priorities (default: DMA0 − DMA1 − DMA2);
• DMA transfer cycle time:
  – Requires only two MCLK clock cycles per transfer;
  – Each byte/word transfer requires two MCLK cycles after synchronization, and one cycle of wait time after the transfer.
• Byte or word and mixed byte/word transfer capability:
  – Byte-to-byte;
  – Word-to-word;
  – Byte-to-word (upper byte of the destination word is cleared when the transfer occurs);
  – Word-to-byte (lower byte of the source word transfers).
• Block sizes up to 65535 bytes or words;
• Configurable selection of transfer trigger (see Table 11-1);

Table 11-1. DMA trigger modes.

DMAxTSELx  Transfer triggered
0000       When DMAREQ = 1 (DMAREQ = 0 automatically when the transfer starts).
0001       <Timer_A> when TACCR2 CCIFG = 1 (CCIFG = 0 automatically when the transfer starts).
           If CCIE = 1, CCIFG does not trigger a transfer.
0010       <Timer_B> when TBCCR2 CCIFG = 1 (CCIFG = 0 automatically when the transfer starts).
           If CCIE = 1, CCIFG does not trigger a transfer.
0011       <USART0> when URXIFG0 = 1 (URXIFG0 = 0 automatically when the transfer starts).
           If URXIE0 = 1, the URXIFG0 flag does not trigger a transfer.
           <USCI_A0> when UCA0RXIFG = 1 (UCA0RXIFG = 0 automatically when the transfer starts).
           If UCA0RXIE = 1, the UCA0RXIFG flag does not trigger a transfer.
0100       <USART0> when UTXIFG0 = 1 (UTXIFG0 = 0 automatically when the transfer starts).
           If UTXIE0 = 1, the UTXIFG0 flag does not trigger a transfer.
           <USCI_A0> when UCA0TXIFG = 1 (UCA0TXIFG = 0 automatically when the transfer starts).
           If UCA0TXIE = 1, the UCA0TXIFG flag does not trigger a transfer.
0101       <DAC12> when DAC12_0CTL DAC12IFG = 1 (DAC12IFG = 0 automatically when the transfer starts).
           If DAC12IE = 1, DAC12IFG does not trigger a transfer.
0110       <ADC12> when ADC12IFGx = 1 (the corresponding ADC12IFGx flag for single-channel conversions,
           and the ADC12IFGx flag of the last conversion for sequence conversions).
           All ADC12IFGx = 0 automatically when the associated ADC12MEMx register is accessed by the DMA controller.
0111       <Timer_A> when TACCR0 CCIFG = 1 (CCIFG = 0 automatically when the transfer starts).
           If CCIE = 1, the CCIFG flag does not trigger a transfer.
1000       <Timer_B> when TBCCR0 CCIFG = 1 (CCIFG = 0 automatically when the transfer starts).
           If CCIE = 1, CCIFG does not trigger a transfer.
1001       <USART1> when URXIFG1 = 1 (URXIFG1 = 0 automatically when the transfer starts).
           If URXIE1 = 1, the URXIFG1 flag does not trigger a transfer.
1010       <USART1> when UTXIFG1 = 1 (UTXIFG1 = 0 automatically when the transfer starts).
           If UTXIE1 = 1, the UTXIFG1 flag does not trigger a transfer.
1011       <Hardware Multiplier> when the hardware multiplier is ready for a new operand.
1100       <USCI_B0> when UCB0RXIFG = 1 (UCB0RXIFG = 0 automatically when the transfer starts).
           If UCB0RXIE = 1, the UCB0RXIFG flag does not trigger a transfer.
1101       <USCI_B0> when UCB0TXIFG = 1 (UCB0TXIFG = 0 automatically when the transfer starts).
           If UCB0TXIE = 1, the UCB0TXIFG flag does not trigger a transfer.
1110       When DMAxIFG = 1: DMA0IFG triggers channel 1, DMA1IFG triggers channel 2, DMA2IFG triggers channel 0
           (none of the DMAxIFG flags is reset automatically when the transfer starts).
1111       When the external trigger DMAE0 = 1.

• Selectable edge or level-triggered transfer (DMALEVEL bit);
• Four addressing modes (see Figure 11-3) for each DMA channel, independently configurable (DMASRCINCRx and DMADSTINCRx control bits):
  – Fixed address to fixed address;
  – Fixed address to block of addresses;
  – Block of addresses to fixed address;
  – Block of addresses to block of addresses.


Figure 11-3. DMA addressing modes.

• Six transfer modes. Each channel is individually configurable by the DMADTx bits (see Table 11-2).

Table 11-2. DMA transfer modes.

DMADTx    Transfer mode                   Description                                         DMAEN after transfer
000       Single transfer                 Each transfer requires a trigger                    0
001       Block transfer                  A complete block is transferred with one trigger    0
010, 011  Burst-block transfer            CPU activity is interleaved with a block transfer   0
100       Repeated single transfer        Each transfer requires a trigger                    1
101       Repeated block transfer         A complete block is transferred with one trigger    1
110, 111  Repeated burst-block transfer   CPU activity is interleaved with a block transfer   1
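The addressing modes and the byte/word rules above can be modelled on a PC to see exactly which bytes move where. The sketch below is a hypothetical host-side model, not driver code: a flat little-endian byte array stands in for the MSP430 address space, the function names are invented for illustration, and only block transfer mode (DMADTx = 001) is modelled, without timing or triggers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DMASRCINCRx / DMADSTINCRx codes (00 and 01 both leave the address unchanged). */
enum { ADDR_UNCHANGED = 0, ADDR_DECREMENT = 2, ADDR_INCREMENT = 3 };

/* Read one unit; words are little-endian, as on the MSP430. */
static unsigned read_unit(const uint8_t *mem, size_t addr, int is_byte)
{
    if (is_byte)
        return mem[addr];
    return (unsigned)mem[addr] | ((unsigned)mem[addr + 1] << 8);
}

/* Write one unit. Word-to-byte keeps only the low byte of the source word;
 * byte-to-word stores a value below 256, so the upper byte is cleared. */
static void write_unit(uint8_t *mem, size_t addr, int is_byte, unsigned value)
{
    mem[addr] = (uint8_t)(value & 0xFFu);
    if (!is_byte)
        mem[addr + 1] = (uint8_t)(value >> 8);
}

/* Apply the increment code: +/-1 for bytes, +/-2 for words. */
static size_t next_addr(size_t addr, unsigned incr, int is_byte)
{
    size_t delta = is_byte ? 1u : 2u;
    if (incr == ADDR_INCREMENT)
        return addr + delta;
    if (incr == ADDR_DECREMENT)
        return addr - delta;
    return addr;
}

/* Block transfer (DMADTx = 001): sz units are moved after a single trigger. */
void dma_block_transfer(uint8_t *mem, size_t sa, size_t da, unsigned sz,
                        unsigned src_incr, unsigned dst_incr,
                        int src_byte, int dst_byte)
{
    assert(mem != NULL);
    for (unsigned n = 0; n < sz; n++) {
        write_unit(mem, da, dst_byte, read_unit(mem, sa, src_byte));
        sa = next_addr(sa, src_incr, src_byte);
        da = next_addr(da, dst_incr, dst_byte);
    }
}
```

For example, a byte-to-word copy of the bytes 0xAA, 0xBB into a word buffer produces the words 0x00AA and 0x00BB, and a word-to-byte transfer into a fixed destination leaves only the low byte of the last source word there, as with a peripheral data register.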


System interrupts
DMA transfers are not interruptible by system interrupts, but system
interrupt service routines (ISRs) may be interrupted by DMA
transfers.
Only non-maskable interrupts (NMIs) can be configured to interrupt
the DMA controller, if the ENNMI bit is set. If it is not set, system
interrupts remain pending until the completion of the transfer.

DMA controller interrupts


Each DMA channel has its own DMAIFG flag, which is set when the
corresponding DMAxSZ register counts to zero (all modes). If the
corresponding DMAIE and GIE bits are set, an interrupt request is
generated.
The MSP430FG4618 device implements the interrupt vector register
DMAIV. In this case, all DMAIFG flags are prioritized and combined
to source a single interrupt vector. The interrupt vector register
DMAIV is used to determine which flag requested an interrupt.

• USCI_B I2C module with DMA
  – Two trigger sources for the DMA controller;
  – Triggers a transfer when new I2C data is received and when data is needed for transmit.
• ADC12 with DMA
  – Automatically moves data from any ADC12MEMx register to another location;
• DAC12 with DMA
  – Automatically moves data to the DAC12_xDAT register;
• Flash memory with DMA
  – Automatically moves data to the flash memory;
  – Supports word/byte data transfers to the flash memory;
  – The write timing control is performed by the flash controller;
  – Write transfers to the flash memory succeed only if the flash controller is set up before the DMA transfer and the flash is not busy.

All these DMA transfers occur without CPU intervention and independently of any low-power modes. This increases the throughput of the modules and enhances low-power applications by allowing the CPU to remain off while data transfers occur.

11.3 DMA registers

The DMA controller registers of the MSP430FG4618 are described below.

DMACTL0, DMA Control Register 0

Bits 15-12   Reserved
Bits 11-8    DMA2TSELx
Bits 7-4     DMA1TSELx
Bits 3-0     DMA0TSELx

Each DMAxTSELx field selects the trigger for its channel using the codes in Table 11-1; all three fields use the same encoding.
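Writing a trigger code into DMACTL0 is a read-modify-write of one 4-bit field, and it is easy to disturb a neighbouring channel's field by mistake. A hypothetical helper (not a TI API) that computes the new register value, assuming the field positions in the diagram above:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns DMACTL0 with the DMAxTSELx field of one
 * channel replaced. Field positions from the diagram above:
 * DMA0TSELx = bits 3-0, DMA1TSELx = bits 7-4, DMA2TSELx = bits 11-8. */
uint16_t dmactl0_with_trigger(uint16_t dmactl0, unsigned channel, unsigned tsel)
{
    assert(channel <= 2u);                 /* three channels: 0, 1, 2     */
    assert(tsel <= 0xFu);                  /* 4-bit code from Table 11-1  */

    unsigned shift = 4u * channel;
    unsigned v = dmactl0;
    v &= ~(0xFu << shift);                 /* clear the old trigger code  */
    v |= tsel << shift;                    /* insert the new trigger code */
    return (uint16_t)v;
}
```

For instance, starting from 0 and selecting the ADC12 trigger (0110) for channel 0 and then the DAC12 trigger (0101) for channel 2 gives 0x0506, with channel 1 left at 0000 (DMAREQ).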

DMACTL1, DMA Control Register 1

Bits 15-3    0
Bit 2        DMAONFETCH
Bit 1        ROUNDROBIN
Bit 0        ENNMI

Bit  Name        Description
2    DMAONFETCH  DMA on fetch:
                 DMAONFETCH = 0 → DMA transfer occurs immediately
                 DMAONFETCH = 1 → DMA transfer occurs on the next instruction fetch after the trigger
1    ROUNDROBIN  Round robin:
                 ROUNDROBIN = 0 → DMA channel priority is DMA0 − DMA1 − DMA2
                 ROUNDROBIN = 1 → DMA channel priority changes with each transfer
0    ENNMI       Enable NMI: when ENNMI = 1, an NMI can interrupt a DMA transfer

DMAxCTL, DMA Channel x Control Register

Bit 15       Reserved
Bits 14-12   DMADTx
Bits 11-10   DMADSTINCRx
Bits 9-8     DMASRCINCRx
Bit 7        DMADSTBYTE
Bit 6        DMASRCBYTE
Bit 5        DMALEVEL
Bit 4        DMAEN
Bit 3        DMAIFG
Bit 2        DMAIE
Bit 1        DMAABORT
Bit 0        DMAREQ


Bit     Name         Description
14-12   DMADTx       DMA transfer mode:
                     DMADT2 DMADT1 DMADT0 = 000 → Single transfer
                     DMADT2 DMADT1 DMADT0 = 001 → Block transfer
                     DMADT2 DMADT1 DMADT0 = 010 → Burst-block transfer
                     DMADT2 DMADT1 DMADT0 = 011 → Burst-block transfer
                     DMADT2 DMADT1 DMADT0 = 100 → Repeated single transfer
                     DMADT2 DMADT1 DMADT0 = 101 → Repeated block transfer
                     DMADT2 DMADT1 DMADT0 = 110 → Repeated burst-block transfer
                     DMADT2 DMADT1 DMADT0 = 111 → Repeated burst-block transfer
11-10   DMADSTINCRx  DMA destination address increment/decrement after each byte or word transfer.
                     When DMADSTBYTE = 1, the destination address increments/decrements by one;
                     when DMADSTBYTE = 0, it increments/decrements by two.
                     DMADSTINCR1 DMADSTINCR0 = 00 → Address unchanged
                     DMADSTINCR1 DMADSTINCR0 = 01 → Address unchanged
                     DMADSTINCR1 DMADSTINCR0 = 10 → Address decremented
                     DMADSTINCR1 DMADSTINCR0 = 11 → Address incremented
9-8     DMASRCINCRx  DMA source address increment/decrement after each byte or word transfer.
                     When DMASRCBYTE = 1, the source address increments/decrements by one;
                     when DMASRCBYTE = 0, it increments/decrements by two.
                     DMASRCINCR1 DMASRCINCR0 = 00 → Address unchanged
                     DMASRCINCR1 DMASRCINCR0 = 01 → Address unchanged
                     DMASRCINCR1 DMASRCINCR0 = 10 → Address decremented
                     DMASRCINCR1 DMASRCINCR0 = 11 → Address incremented
7       DMADSTBYTE   DMA destination length (byte or word):
                     DMADSTBYTE = 0 → Word
                     DMADSTBYTE = 1 → Byte
6       DMASRCBYTE   DMA source length (byte or word):
                     DMASRCBYTE = 0 → Word
                     DMASRCBYTE = 1 → Byte
5       DMALEVEL     DMA level:
                     DMALEVEL = 0 → Edge-sensitive trigger (rising edge)
                     DMALEVEL = 1 → Level-sensitive trigger (high level)
4       DMAEN        DMA enable: the channel is enabled when DMAEN = 1
3       DMAIFG       DMA interrupt flag: DMAIFG = 1 when an interrupt is pending
2       DMAIE        DMA interrupt enable: the interrupt is enabled when DMAIE = 1
1       DMAABORT     DMA abort: DMAABORT = 1 when a DMA transfer was interrupted by an NMI
0       DMAREQ       DMA request: setting DMAREQ = 1 starts a DMA transfer
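Because all these fields share one 16-bit register, it can help to see how a configuration value is assembled from the bit positions in the table above. The helper below is a hypothetical sketch (its name and parameters are not part of any TI library); it only computes the value that would be written to DMAxCTL.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions from the DMAxCTL description above. */
#define DMADT_SHIFT       12u        /* bits 14-12: transfer mode         */
#define DMADSTINCR_SHIFT  10u        /* bits 11-10: destination increment */
#define DMASRCINCR_SHIFT   8u        /* bits 9-8:   source increment      */
#define DMADSTBYTE_BIT   (1u << 7)
#define DMASRCBYTE_BIT   (1u << 6)
#define DMALEVEL_BIT     (1u << 5)
#define DMAEN_BIT        (1u << 4)
#define DMAIE_BIT        (1u << 2)

/* Hypothetical helper: builds a DMAxCTL value from individual fields.
 * mode = DMADTx (0-7); dst_incr/src_incr = DMADSTINCRx/DMASRCINCRx (0-3);
 * the flags are 0 or non-zero. DMAIFG, DMAABORT and DMAREQ are left at 0. */
uint16_t dma_ctl_value(unsigned mode, unsigned dst_incr, unsigned src_incr,
                       int dst_byte, int src_byte, int level,
                       int enable, int irq_enable)
{
    assert(mode <= 7u && dst_incr <= 3u && src_incr <= 3u);

    unsigned v = (mode << DMADT_SHIFT)
               | (dst_incr << DMADSTINCR_SHIFT)
               | (src_incr << DMASRCINCR_SHIFT);
    if (dst_byte)   v |= DMADSTBYTE_BIT;
    if (src_byte)   v |= DMASRCBYTE_BIT;
    if (level)      v |= DMALEVEL_BIT;
    if (enable)     v |= DMAEN_BIT;
    if (irq_enable) v |= DMAIE_BIT;
    return (uint16_t)v;
}
```

For example, a block transfer (DMADTx = 001) with both addresses incremented, word-sized source and destination, and the channel enabled gives 0x1F10. On the device, the MSP430 header files provide named constants for these fields (such as DMADT_1 and DMAEN), which would normally be used instead of raw shifts.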


DMAxSA, DMA Source Address Register

The 32-bit DMAxSA register points to the DMA source address for single transfers, or to the first source address for block transfers.
• Bits 31−20 are reserved and always read as zeros;
• Reading or writing bits 19-16 requires the use of extended instructions;
• When DMAxSA is written with word instructions, bits 19-16 are cleared.

DMAxDA, DMA Destination Address Register

The 32-bit DMAxDA register points to the DMA destination address for single transfers, or to the first destination address for block transfers.
• Bits 31−20 are reserved and always read as zeros;
• Reading or writing bits 19-16 requires the use of extended instructions;
• When DMAxDA is written with word instructions, bits 19-16 are cleared.

DMAxSZ, DMA Size Register

The 16-bit DMA size register defines the number of byte/word data values per block transfer.
• The DMAxSZ register decrements with each word or byte transfer;
• When DMAxSZ reaches 0, it is immediately and automatically reloaded with its previously initialized value.
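The interplay between DMAxSZ, DMAIFG and DMAEN in the single and repeated single transfer modes (Table 11-2) can be traced with a small host-side model. The structure and function names below are hypothetical, and the model tracks only the channel's bookkeeping, not the data movement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of one channel's bookkeeping. */
typedef struct {
    uint16_t sz_init;    /* value the program wrote to DMAxSZ           */
    uint16_t sz;         /* current DMAxSZ count                        */
    int      dmaen;      /* DMAEN bit                                   */
    int      dmaifg;     /* DMAIFG bit                                  */
    int      repeated;   /* 1 = repeated single transfer (DMADTx = 100) */
} dma_channel_model;

void dma_model_init(dma_channel_model *ch, uint16_t size, int repeated)
{
    assert(ch != 0 && size > 0);
    ch->sz_init  = size;
    ch->sz       = size;
    ch->dmaen    = 1;
    ch->dmaifg   = 0;
    ch->repeated = repeated;
}

/* One trigger in single-transfer mode moves one byte/word. */
void dma_model_trigger(dma_channel_model *ch)
{
    if (!ch->dmaen)
        return;                     /* disabled channel: trigger ignored      */
    ch->sz--;                       /* DMAxSZ decrements per transfer         */
    if (ch->sz == 0) {
        ch->dmaifg = 1;             /* count reached zero: flag set           */
        ch->sz = ch->sz_init;       /* DMAxSZ reloads automatically           */
        if (!ch->repeated)
            ch->dmaen = 0;          /* single transfer: DMAEN cleared (Table 11-2) */
    }
}
```

With DMAxSZ = 3 in single transfer mode, the third trigger sets DMAIFG, reloads the count to 3 and clears DMAEN, so a fourth trigger has no effect; in repeated single transfer mode DMAEN stays set and the channel keeps responding.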

DMAIV, DMA Interrupt Vector Register

The 16-bit DMA interrupt vector value uses only bits 3 to 1; the remaining bits always read as zero.
The content of DMAIV identifies the interrupt source, in priority order:
• DMAIV = 02h: DMA channel 0 (highest priority);
• DMAIV = 04h: DMA channel 1;
• DMAIV = 06h: DMA channel 2;
• DMAIV = 0Eh: Reserved (lowest priority).
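An interrupt service routine usually reads DMAIV once and branches on its value. The host-side helper below (a hypothetical name) performs that decoding. The 00h case, meaning no interrupt pending, follows the usual MSP430 interrupt-vector convention and is an assumption here, since it is not listed above; any other value is treated as reserved.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoder: DMAIV value -> channel number.
 * Returns 0, 1 or 2 for channels 0-2, -1 for 00h (no interrupt pending,
 * assumed from the usual MSP430 convention) and -2 for reserved values. */
int dmaiv_to_channel(uint16_t dmaiv)
{
    assert((dmaiv & ~0x000Eu) == 0u);   /* only bits 3 to 1 can be set */

    switch (dmaiv) {
    case 0x00: return -1;               /* nothing pending             */
    case 0x02: return 0;                /* channel 0, highest priority */
    case 0x04: return 1;                /* channel 1                   */
    case 0x06: return 2;                /* channel 2                   */
    default:   return -2;               /* reserved                    */
    }
}
```

On the device, the same switch would sit inside the DMA interrupt service routine, with each case handling the completion of one channel.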



Serial Communications
(Chapter 10)

RS232, SPI, I2C
Communications
• The simplest is parallel
  – One way: multiple (8 typically) data lines from the μC to the peripheral, plus "Latch" and "CS" lines
    • There may be a mechanism for the peripheral to get the attention of the μC (i.e., interrupt, or poll)
  – Two way: data lines plus "Latch", "CS" and "R/~W" lines between the μC and the peripheral
• This is resource expensive (pins, real estate…) in terms of hardware, but easy to implement
Serial Communications
• Many fewer lines are required to transmit data. This requires fewer pins, but adds complexity.
  [Diagram: μC connected to a peripheral by Data, Clock and "CS" lines]
• Synchronous communication requires a clock. Whoever controls the clock controls the communication speed.
• Asynchronous communication has no clock, but the speed must be agreed upon beforehand (baud rate).
Asynchronous Serial (RS-232)
• Commonly used for one-to-one communication.
• There are many variants; the simplest uses just two lines, TX (transmit) and RX (receive).
• Transmission process (9600 baud, 1 bit = 1/9600 s ≈ 0.104 ms)
  – Transmit idles high (when there is no communication).
  – It goes low for 1 bit (0.104 ms): the start bit.
  – It sends out the data, LSB first (7 or 8 bits).
  – There may be a parity bit (even or odd, for error detection).
  – It ends with a stop bit (or two), returning the line to high.
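The transmission process described above can be expressed as a small C function that lists the line level for each bit period of one frame. The function name and its parameter conventions (for example, the PARITY_* codes) are illustrative assumptions, not part of any MSP430 library.

```c
#include <assert.h>
#include <stdint.h>

enum { PARITY_NONE = 0, PARITY_ODD = 1, PARITY_EVEN = 2 };

/* Fills out[] with the line level (1 = high, 0 = low) for each bit period
 * of one asynchronous frame and returns the number of bit periods.
 * out[] needs room for at most 1 + 8 + 1 + 2 = 12 entries.              */
int uart_frame_bits(uint8_t data, int nbits, int parity, int nstop, uint8_t *out)
{
    assert(nbits == 7 || nbits == 8);
    assert(nstop == 1 || nstop == 2);
    assert(parity == PARITY_NONE || parity == PARITY_ODD || parity == PARITY_EVEN);

    int n = 0, ones = 0;
    out[n++] = 0;                            /* start bit: line goes low      */
    for (int i = 0; i < nbits; i++) {        /* data bits, LSB first          */
        uint8_t bit = (uint8_t)((data >> i) & 1u);
        ones += bit;
        out[n++] = bit;
    }
    if (parity == PARITY_EVEN)
        out[n++] = (uint8_t)(ones & 1);      /* makes the count of 1s even    */
    else if (parity == PARITY_ODD)
        out[n++] = (uint8_t)(~ones & 1);     /* makes the count of 1s odd     */
    for (int i = 0; i < nstop; i++)
        out[n++] = 1;                        /* stop bit(s): line back high   */
    return n;
}
```

For the character 'A' (0x41) sent as 8N1, the function produces 10 levels: 0 (start), 1 0 0 0 0 0 1 0 (data, LSB first), 1 (stop). At 9600 baud the whole frame lasts 10 × 0.104 ms, about 1.04 ms.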
RS232 Voltage levels
• From the processor side, 0 V = logic 0, 3.3 V = logic 1
• In a "serial" cable, +3 V to +12 V = logic 0, −3 V to −12 V = logic 1
• On the "Experimenter's board"
• Physical connector
RS232 – Handshaking
• Some RS232 connections use handshaking lines between the DCE (Data Communications Equipment) and the DTE (Data Terminal Equipment).
  – RTS (Request To Send)
    • Sent by the DTE to signal the DCE that it is ready to send.
  – CTS (Clear To Send)
    • Sent by the DCE to signal the DTE that it is ready to receive.
  – DTR (Data Terminal Ready)
    • Sent by the DTE to signal the DCE that it is ready to connect.
  – DSR (Data Set Ready)
    • Sent by the DCE to signal the DTE that it is ready to connect.
• In practice, if these handshaking lines are used it can be difficult to set up the serial communications, but it is quite robust once working.
• There is also software handshaking (XON/XOFF).
• DTE and DCE have different connector pinouts.
MSP430 USCI in UART mode
(also USART peripheral)

UART mode features include:
• 7- or 8-bit data; odd, even, or no parity
• Independent transmit and receive
• LSB-first or MSB-first data
• Receiver start-edge detection for auto-wake up from LPMx modes
• Independent interrupt capability for receive and transmit
• Status flags for error detection and suppression
• Built-in idle-line and address-bit communication protocols for multiprocessor systems
• Status flags for address detection
UART code

// Echo a received character, RX ISR used. Normal mode is LPM3,
// USCI_A0 RX interrupt triggers TX Echo.
// ACLK = BRCLK = LFXT1 = 32768 Hz, MCLK = SMCLK = DCO ~1048 kHz
// Baud divider: 32768 Hz XTAL @ 9600 = 32768/9600 = 3.41 (UCA0BR0 = 03h, UCA0BR1 = 00h)
//
//                MSP430xG461x
//             -----------------
//         /|\|              XIN|-
//          | |                 | 32kHz
//          --|RST          XOUT|-
//            |                 |
//            |     P4.7/UCA0RXD|------------>
//            |                 | 9600 - 8N1
//            |     P4.6/UCA0TXD|<------------

#include "msp430xG46x.h"

void main(void)
{
  volatile unsigned int i;

  WDTCTL = WDTPW + WDTHOLD;                 // Stop the watchdog timer

  do                                        // Wait for the 32 kHz crystal to start
  {
    IFG1 &= ~OFIFG;                         // Clear the oscillator fault flag
    for (i = 0x47FF; i > 0; i--);           // Give the flag time to set again
  }
  while (IFG1 & OFIFG);                     // Repeat while the fault persists

  P4SEL |= 0x0C0;                           // P4.7,6 = USCI_A0 RXD/TXD
  UCA0CTL1 |= UCSSEL_1;                     // CLK = ACLK
  UCA0BR0 = 0x03;                           // 32k/9600 = 3.41
  UCA0BR1 = 0x00;                           // User's guide has formulas for these
  UCA0MCTL = 0x06;                          // Modulation UCBRSx = 3
  UCA0CTL1 &= ~UCSWRST;                     // **Initialize USCI state machine**
  IE2 |= UCA0RXIE;                          // Enable USCI_A0 RX interrupt

  _BIS_SR(LPM3_bits + GIE);                 // Enter LPM3, interrupts enabled
}

// Echo back RXed character, confirm TX buffer is ready first
#pragma vector=USCIAB0RX_VECTOR
__interrupt void USCIA0RX_ISR (void)
{
  while (!(IFG2 & UCA0TXIFG));              // Make sure last character went out.
  UCA0TXBUF = UCA0RXBUF;                    // TX -> RXed character
}
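The register values in the listing follow from the low-frequency baud-rate calculation (UCOS16 = 0): N = BRCLK/baud = 32768/9600 ≈ 3.41, so UCBRx = INT(N) = 3 and UCBRSx = round(0.41 × 8) = 3, which lands in bits 3-1 of UCA0MCTL as 0x06. A hypothetical helper that performs the same arithmetic in integers is sketched below; the user's guide remains the authority, and its tabulated settings may differ from this simple rounding for some baud rates.

```c
#include <assert.h>
#include <stdint.h>

/* Result of the low-frequency (UCOS16 = 0) baud-rate calculation. */
typedef struct {
    uint16_t ucbr;      /* UCBRx: written to UCA0BR1 (high) and UCA0BR0 (low) */
    uint8_t  ucamctl;   /* UCA0MCTL value with UCBRSx in bits 3-1            */
} uart_baud_setting;

/* Hypothetical helper: N = BRCLK / baud, UCBRx = INT(N),
 * UCBRSx = round((N - INT(N)) * 8), all in integer arithmetic. */
uart_baud_setting uart_baud_lf(uint32_t brclk_hz, uint32_t baud)
{
    assert(baud > 0u && brclk_hz >= baud);         /* divider must be >= 1   */

    uint32_t n_int = brclk_hz / baud;
    uint32_t rem   = brclk_hz % baud;               /* fractional part * baud */
    uint32_t ucbrs = (rem * 8u + baud / 2u) / baud; /* rounded to nearest     */
    if (ucbrs > 7u) {                               /* fraction rounded up to */
        ucbrs = 0u;                                 /* a whole step: carry it */
        n_int++;                                    /* into UCBRx (a choice   */
    }                                               /* made for this sketch)  */

    uart_baud_setting s;
    s.ucbr    = (uint16_t)n_int;
    s.ucamctl = (uint8_t)(ucbrs << 1);
    return s;
}
```

For 32768 Hz and 9600 baud this gives UCBRx = 3 and UCA0MCTL = 0x06, matching the listing; for 4800 baud it gives UCBRx = 6 and UCBRSx = 7 (UCA0MCTL = 0x0E).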
CONTROLLER AREA NETWORK
Tutorial

CAN was designed by Bosch and is currently described by ISO 11898. In terms of the Open Systems Interconnection (OSI) model, CAN partially defines the services for layer 1 (physical) and layer 2 (data link). Other standards such as DeviceNet, Smart Distributed System, CAL, CAN Kingdom and CANopen (collectively called higher layer protocols) build upon the basic CAN specification and define additional services of the seven-layer OSI model. Since all of these protocols utilize CAN integrated circuits, they all comply with the data link layer defined by CAN.

Figure 1. The CAN Protocol Specification and the OSI model
  Application Layer
  ISO Data Link (Layer 2): Logical Link Control (LLC); Media Access Control (MAC)
  ISO Physical (Layer 1): Physical Layer Signaling (PLS); Medium Attachment Unit (MAU)
  ISO Media (Layer 0): Transmission Media
  The CAN Protocol Specification covers the MAC and PLS sublayers.

CAN specifies the medium access control (MAC) and physical layer signaling (PLS) as they apply to layers 1 and 2 of the OSI model. Medium access control is accomplished using a technique called non-destructive bit-wise arbitration. As stations apply their unique identifiers to the network, they observe whether their data are being faithfully reproduced on the bus. If they are not, the station assumes that a higher priority message is being sent and, therefore, halts transmission and reverts to receiving mode. The highest priority message gets through and the lower priority messages are resent at another time. The advantage of this approach is that collisions on the network do not destroy data and eventually all stations gain access to the network. The problem with this approach is that the arbitration is done on a bit-by-bit basis, requiring all stations to hear one another within a bit-time (actually less than a bit-time). At a 500 kbps bit-rate, a bit-time is 2000 ns, which does not allow much time for transceiver and cable delays. The result is that CAN networks are usually quite short, frequently less than 100 meters at higher speeds. To increase this distance, either the data rate must be decreased or additional equipment is required.

www.ccontrols.com
CAN DATA LINK LAYER
CAN transmissions operate using the producer/consumer model.
When data are transmitted by a CAN device, no other devices are
addressed. Instead, the content of the message is designated by an
identifier field. This identifier field, which must be unique within
the network, provides not only the content but also the priority of
the message. All other CAN devices listen to the sender and
accept only those messages of interest. This filtering of the data is
accomplished using an acceptance filter, which is an integral
component of the CAN controller chip. Data which fail the
acceptance criteria are rejected. Therefore, receiving devices
consume only the data of interest from the producer.
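The acceptance filtering described above can be sketched in C. This is a minimal sketch assuming a common mask-and-match scheme: an identifier is accepted when the bits selected by a mask equal the corresponding bits of a filter value. Real CAN controllers expose this through their own registers, so `can_accept`, `filter` and `mask` are hypothetical names, not a particular chip's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CAN_STD_ID_MASK 0x7FFu /* all 11 bits of a standard identifier */

/* Hypothetical mask-and-match acceptance filter.
 * A 1 bit in `mask` means "this identifier bit must equal the same bit of
 * `filter`"; a 0 bit means "don't care". Returns true if the frame is kept. */
bool can_accept(uint16_t id, uint16_t filter, uint16_t mask)
{
    assert(id <= CAN_STD_ID_MASK);
    return ((id ^ filter) & mask & CAN_STD_ID_MASK) == 0u;
}
```

With `filter = 0x120` and `mask = 0x7F0`, identifiers 0x120 through 0x12F are accepted and every other identifier is rejected, so one filter can admit a whole group of related messages.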

A CAN frame consists mainly of an identifier field, a control field
and a data field (Figure 2). The control field is six bits long, the
data field is zero to eight bytes long and the identifier field is 11
bits long for standard frames (CAN specification 2.0A) or 29 bits
long for extended frames (CAN specification 2.0B). Source and
destination node addresses have no meaning using the CAN data
link layer protocol.

Figure 2. An 11-bit identifier is used in standard format.
(Standard format, in transmission order: Start of Frame (SOF); Arbitration Field: 11-bit identifier, RTR; Control Field: IDE, r0, DLC; Data Field: 0-8 bytes; CRC Field: 15-bit CRC; Ack Field; End of Frame; Intermission; Bus Idle.)
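The standard-format fields can be mirrored by a C struct. This is a sketch of an application-side view only, assuming the CAN controller generates SOF, CRC, ACK and end-of-frame bits itself; the type and function names are hypothetical and do not belong to a particular driver API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of a CAN 2.0A (standard) data frame. */
typedef struct {
    uint16_t id;      /* 11-bit identifier: message content and priority */
    bool     rtr;     /* remote transmission request bit */
    uint8_t  dlc;     /* number of data bytes, 0-8 */
    uint8_t  data[8]; /* data field */
} can_std_frame;

/* Returns true if the frame fits the standard-format limits. */
bool can_std_frame_valid(const can_std_frame *f)
{
    assert(f != NULL);
    return f->id <= 0x7FFu && f->dlc <= 8u;
}

/* Fills a data frame; returns false (leaving *f untouched) if out of range. */
bool can_std_frame_init(can_std_frame *f, uint16_t id,
                        const uint8_t *payload, uint8_t len)
{
    if (f == NULL || id > 0x7FFu || len > 8u || (len > 0u && payload == NULL)) {
        return false;
    }
    memset(f, 0, sizeof *f);
    f->id = id;
    f->rtr = false;
    f->dlc = len;
    if (len > 0u) {
        memcpy(f->data, payload, len);
    }
    return true;
}
```

Note that the struct has no source or destination address field, matching the point above that node addresses have no meaning at the CAN data link layer.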

Bus arbitration is accomplished using a non-destructive bit-wise
arbitration scheme. It is possible that more than one device may
begin transmitting a message at the same time. Using a “wired
AND” mechanism, a dominant state (logic 0) overwrites the
recessive state (logic 1). As the various transmitters send their data
out on the bus, they simultaneously listen for the faithful
transmission of their data on a bit by bit basis until one of them
discovers that another device’s dominant bit has overwritten its
recessive bit. This indicates that a device with a higher priority
message, one with an identifier of lower binary value, is present, and
the loser of the arbitration immediately reverts to receiving mode and
completes the reception of the message. With this approach no data are
destroyed and, therefore, throughput is enhanced. The losers
simply try again during their next opportunity. The problem with
this scheme is that all devices must assert their data within the
same bit-time and before the sampling point; otherwise data will be
falsely received or even destroyed. Therefore, a timing constraint
has been introduced that impacts cabling distance.
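The wired-AND arbitration can be simulated in a few lines of C. This sketch assumes all nodes start on the same bit, ignores bit timing, and treats identifiers as unique, as the data link layer requires; `can_arbitrate` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAN_ID_BITS 11

/* Simulates non-destructive bit-wise arbitration among n nodes (n <= 16)
 * that start sending 11-bit identifiers at the same time. The bus is a
 * wired AND: a dominant 0 from any node overrides a recessive 1.
 * Returns the index of the node that wins arbitration. */
size_t can_arbitrate(const uint16_t *ids, size_t n)
{
    int active[16]; /* 1 while the node is still transmitting */
    assert(ids != NULL && n > 0 && n <= 16);
    for (size_t i = 0; i < n; i++) active[i] = 1;

    for (int bit = CAN_ID_BITS - 1; bit >= 0; bit--) { /* MSB is sent first */
        int bus = 1;                                  /* recessive unless driven low */
        for (size_t i = 0; i < n; i++)
            if (active[i]) bus &= (ids[i] >> bit) & 1;
        for (size_t i = 0; i < n; i++)
            /* Sent recessive (1) but read dominant (0): this node lost and
             * becomes a receiver for the rest of the frame. */
            if (active[i] && ((ids[i] >> bit) & 1) != bus) active[i] = 0;
    }
    for (size_t i = 0; i < n; i++)
        if (active[i]) return i;
    return 0; /* not reached when identifiers are unique */
}
```

Because a 0 is dominant, the node with the numerically lowest identifier never reads back a bit different from the one it sent, which is why a lower identifier means higher priority.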

PROPAGATION DELAY
In a Philips application note [2], the author does an in-depth study
on the maximum allowable propagation delay as a function of
various controller chip parameters. The propagation delay (Figure 3)
is due to the input/output delays of the CAN controller chip (tsd),
the transmission delay of the transceiver (ttx), the reception delay of
the transceiver (trx) and the signal delay due to the cable (tcbl). The
total propagation delay (tp) experienced is basically the round-trip
delay from a CAN node located at the end of a cable segment
communicating to the furthest node and is expressed as follows:

$$t_p = 2\,(t_{sd} + t_{tx} + t_{rx} + t_{cbl})$$

All delays are constant except the cable delay (tcbl), which depends
upon the length of the cable (L) and the propagation delay factor of
the cable (Pc): tcbl = L × Pc. The author provides a chart of maximum
allowable propagation delays (tpm) for various data rates and CAN chip
timing parameters. The actual propagation delay must not exceed the
maximum allowable propagation delay (tp < tpm). Substituting
tcbl = L × Pc into the expression for tp and solving for L gives the
maximum allowable cable length (L):
$$L < \frac{\tfrac{1}{2}\,t_{pm} - t_{sd} - t_{rx} - t_{tx}}{P_c}$$
Using appendix A.1 of the application note and the most favorable
parameters for long distance, tpm equals 1626 ns at 500 kbps.
Assuming transceiver delays of 100 ns each, a chip delay of 62.5 ns
and a cable propagation factor of 5.5 ns/m, the maximum cable
length is (813 - 62.5 - 100 - 100) ns / 5.5 ns/m, or about 100 meters,
which is the value used in the DeviceNet specification. Doing the same
calculation at 250 kbps yields 248 meters and at 100 kbps, 680 meters.
These values can be improved with better cable and faster transceivers.

Figure 3. Use the longest path when calculating propagation delay.

(Diagram: three CAN nodes, each a CAN controller (delay tsd) with a transceiver (delays ttx and trx), connected by cable segments with delay tcbl.)
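The cable-length bound is simple enough to check numerically. This sketch just evaluates the inequality above (delays in ns, Pc in ns/m, result in meters) and the bit time 1/bit rate; the function names are hypothetical, and tpm must still come from the application note's chart for the chosen bit rate.

```c
#include <assert.h>

/* Maximum cable length from L < (tpm/2 - tsd - trx - ttx) / Pc.
 * All delays in ns, pc_ns_per_m in ns/m; returns meters. */
double can_max_cable_length_m(double tpm_ns, double tsd_ns, double trx_ns,
                              double ttx_ns, double pc_ns_per_m)
{
    assert(pc_ns_per_m > 0.0);
    return (tpm_ns / 2.0 - tsd_ns - trx_ns - ttx_ns) / pc_ns_per_m;
}

/* Bit time in ns for a bit rate in bit/s (500 kbit/s gives 2000 ns). */
double can_bit_time_ns(double bit_rate_bps)
{
    assert(bit_rate_bps > 0.0);
    return 1e9 / bit_rate_bps;
}
```

With the 500 kbps figures quoted above (tpm = 1626 ns, tsd = 62.5 ns, trx = ttx = 100 ns, Pc = 5.5 ns/m), `can_max_cable_length_m` returns about 100.1 m, consistent with the 100 m limit.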
22.11.2017 I2C Tutorial

I2C Tutorial
Posted by Umang Gajera, April 05, 2017, in: Embedded

In this tutorial we will go through the I2C bus and protocol. I2C was originally invented by Philips (now NXP) in 1982 as a
bi-directional bus to communicate with multiple devices using just 2 wires/lines. I2C stands for Inter-Integrated Circuit.
I2C is sometimes also referred to as TWI, short for Two Wire Interface, since it uses only 2 wires for data transmission and
synchronization. I2C is pronounced and referred to as “I-Squared-C” [I²C], “I-Two-C” [I2C] and “I-I-C” [IIC]. The two wires of
the I2C bus are:
      1. Data line, called SDA, short for Serial Data
      2. Clock line, called SCL, short for Serial Clock

SDA is the bi-directional wire on which the actual data transfer between masters and slaves happens. SCL is the wire on
which the Master device generates the clock for the slave device(s).

I2C supports 7-bit and 10-bit addresses for each device connected to the bus; 10-bit addressing was introduced later.
With a 7-bit address it is possible to connect up to 128 I2C devices to the same bus; however, some addresses are
reserved, so in practice only 112 devices can be connected at the same time. With a 10-bit address a maximum of 1024
devices can be connected. To keep things simple we will go through 7-bit addressing in this tutorial. For 10-bit
addressing you can look up the official I2C specification by NXP, a link to which is given at the bottom of this tutorial.
Once you get familiar with the I2C protocol, 10-bit addressing will be a piece of cake.
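The figure of 112 follows from the reserved address groups. This sketch assumes the reservation in the NXP specification, where the 7-bit addresses 0000xxx and 1111xxx (0x00 to 0x07 and 0x78 to 0x7F) are set aside for special purposes such as the general call and 10-bit addressing; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True for the reserved 7-bit groups 0000xxx and 1111xxx. */
bool i2c_addr7_is_reserved(uint8_t addr)
{
    assert(addr <= 0x7F);
    uint8_t top4 = (uint8_t)(addr >> 3); /* upper four of the seven address bits */
    return top4 == 0x0 || top4 == 0xF;
}

/* Counts the 7-bit addresses left for ordinary slave devices. */
unsigned i2c_addr7_usable_count(void)
{
    unsigned count = 0;
    for (uint8_t a = 0; a <= 0x7F; a++) {
        if (!i2c_addr7_is_reserved(a)) count++;
    }
    return count;
}
```

The count is 128 - 8 - 8 = 112, which is where the practical limit quoted above comes from.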

As per the original specification of I2C/TWI, it supports a maximum frequency of 100 kHz. Over the years the
specification was updated many times, and now there are several different speed modes. The latest mode added was
Ultra-Fast Mode, which allows I2C bus transfer speeds of up to 5 MHz.

I2C Speed Mode               I2C Speed      Communication
Standard Mode (Sm)           100 kbit/s     Bidirectional
Fast Mode (Fm)               400 kbit/s     Bidirectional
Fast Mode Plus (Fm+)         1 Mbit/s       Bidirectional
High-speed mode (Hs-mode)    3.4 Mbit/s     Bidirectional
Ultra Fast-mode (UFm)        5 Mbit/s       Unidirectional

I2C has 4 operating modes:

1. Master Transmitter mode: Master writes data to Slave
2. Master Receiver mode: Master reads data from Slave
3. Slave Transmitter mode: Slave writes data to Master
4. Slave Receiver mode: Slave reads data from Master

Note: To achieve high transfer speeds, Ultra-Fast Mode uses push-pull drivers instead of open-drain, which eliminates the
need for pull-up resistors. Ultra-Fast Mode is unidirectional only and uses the same bus protocol, but it is not compatible
with bi-directional I2C devices.

Even though multiple masters may be present on the I2C bus, arbitration is handled in such a way that data on the bus is
not corrupted when two or more masters try to transmit at the same time. Since transmission, synchronization and
arbitration are all done using only 2 wires, the communication protocol might seem a bit hard for beginners, but it is
actually easy to understand – just stick with me 🙂

http://www.ocfreaks.com/i2c-tutorial/


(Figure: a general I2C/TWI bus topology with multiple masters and multiple slaves connected to the bus at the same time.)

Let us go through the I2C protocol basics first. The I2C bus is byte-oriented: data is transferred one byte at a time.
Communication (write to and read from) is always initiated by a Master. The Master first sends a START condition and then
writes the Slave Address (SLA) and the Direction bit (Read = 1 / Write = 0) on the bus, and the corresponding Slave
responds accordingly.

(Figure: format for I2C communication protocol.)

Depending on the Direction bit, 2 types of transfers are possible on the I2C bus:

Case 1 – Data transfer from “Master transmitter” to “Slave receiver”

1. After sending the START condition, the Master sends the first byte, which contains the Slave address + Write bit.
2. The corresponding Slave acknowledges it by sending back an Acknowledge (ACK) bit to the Master.
3. Next, the Master sends 1 or more bytes to the Slave. After each byte received, the Slave sends back an Acknowledge
(ACK) bit.
4. When the Master wants to stop writing, it sends a STOP condition.

Case 2 – Data transfer from “Slave transmitter” to “Master receiver”

1. After sending the START condition, the Master sends the first byte, which contains the Slave address + Read bit.
2. The corresponding Slave acknowledges it by sending back an Acknowledge (ACK) bit to the Master.
3. Next, the Slave sends 1 or more bytes, and the Master acknowledges each one by sending an Acknowledge (ACK) bit.
4. When the Master wants to stop reading, it sends a Not Acknowledge (NACK) bit after the last byte, followed by a STOP
condition.

Format for the First Byte after START

As soon as the START condition is transmitted on the bus, the first byte (or control byte) is transmitted. Bits 7 to 1
contain the Slave address and Bit 0 is the direction (Read/Write) bit.
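The first-byte layout and the two transfer cases can be summarized in a short C sketch. It only builds the byte and a text trace of the bus events (S = START, A = ACK, N = NACK, D = data byte, P = STOP) and does not drive any hardware; `i2c_first_byte` and `i2c_trace` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum { I2C_WRITE = 0, I2C_READ = 1 };

/* First byte after START: bits 7..1 = 7-bit slave address, bit 0 = direction. */
uint8_t i2c_first_byte(uint8_t addr7, int dir)
{
    assert(addr7 <= 0x7F && (dir == I2C_WRITE || dir == I2C_READ));
    return (uint8_t)((addr7 << 1) | dir);
}

/* Writes a trace of one transfer of n data bytes (n >= 1) into buf:
 *   write (Case 1): S <byte> A D A ... D A P   (slave ACKs every byte)
 *   read  (Case 2): S <byte> A D A ... D N P   (master NACKs the last byte) */
void i2c_trace(uint8_t addr7, int dir, unsigned n, char *buf, size_t len)
{
    size_t used = 0;
    assert(n >= 1 && buf != NULL && len > 0);
    used += (size_t)snprintf(buf, len, "S %02X A", i2c_first_byte(addr7, dir));
    for (unsigned i = 0; i < n && used < len; i++) {
        char ack = (dir == I2C_READ && i == n - 1) ? 'N' : 'A';
        used += (size_t)snprintf(buf + used, len - used, " D %c", ack);
    }
    if (used < len) {
        snprintf(buf + used, len - used, " P");
    }
}
```

For a slave at 0x50, the first byte is 0xA0 for a write and 0xA1 for a read, and a two-byte read produces the trace `S A1 A D A D N P`.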

Example timing diagram for a complete data transfer
(Figure: timing diagram for a complete transfer of 3 bytes, including the first byte. Image source: I2C Specification.)

Start & Stop Conditions


All I2C transactions begin with a START (S) and are terminated by a STOP (P).
      START condition : When a HIGH to LOW transition occurs on the SDA line while SCL is HIGH.
      STOP condition : When a LOW to HIGH transition occurs on the SDA line while SCL is HIGH.
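The two definitions reduce to a check on consecutive samples of SCL and SDA, which is roughly what a logic-analyzer decoder or a bit-banged slave does. A minimal sketch, assuming both lines are sampled often enough that no edge is missed; `i2c_classify` is a hypothetical name.

```c
#include <assert.h>

typedef enum { I2C_EV_NONE, I2C_EV_START, I2C_EV_STOP } i2c_event;

/* Classifies the step between two consecutive samples of the bus (levels 0/1).
 * START: SDA falls (1 -> 0) while SCL stays HIGH.
 * STOP:  SDA rises (0 -> 1) while SCL stays HIGH.
 * Anything else is treated as ordinary bus activity. */
i2c_event i2c_classify(int scl_prev, int sda_prev, int scl, int sda)
{
    assert((scl_prev == 0 || scl_prev == 1) && (sda_prev == 0 || sda_prev == 1));
    assert((scl == 0 || scl == 1) && (sda == 0 || sda == 1));
    if (scl_prev && scl) {
        if (sda_prev && !sda) return I2C_EV_START;
        if (!sda_prev && sda) return I2C_EV_STOP;
    }
    return I2C_EV_NONE;
}
```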

Repeated Start
A Repeated START condition is similar to a START condition, except that it is sent in place of a STOP when the master does
not want to lose control of the bus and wants to complete its transfers atomically when multiple masters are present. When
a master wants to switch from Master Transmitter mode to Master Receiver mode or vice versa, it sends a Repeated START at
the end of the current transfer so that it remains master when the next transfer starts.

Note: Generating the clock pulses and the START and STOP conditions is the responsibility of the Master. When the Master
wants to change the transfer direction (i.e. Read/Write), it sends a Repeated START condition instead of a STOP condition.
A transfer typically ends with a STOP or a Repeated START condition.

SDA & SCL Voltage Levels for Different-Voltage Devices on the Same Bus
In many cases (but not all!), I2C allows devices with different signal voltage levels to be connected to the same bus; for
example, interfacing a 5V I2C slave device with a 3.3V microcontroller like the LPC1768 or LPC2148, or interfacing a 3.3V I2C
slave device with a 5V microcontroller like the Arduino. In such cases we connect the pull-up resistors to the lower of the
two Vcc/Vdd supplies; in both of the examples above that is 3.3V. As per the I2C specification, input reference levels are
set at 30% and 70% of Vcc: VIL (LOW-level input voltage) is 0.3Vcc and VIH (HIGH-level input voltage) is 0.7Vcc. If these
input thresholds are met when using two or more devices with different voltages, you are good to go by connecting the
pull-ups to the lowest Vcc; otherwise you will need a line buffer/driver that provides level shifting between the different
voltage-level devices (CMOS, NMOS, TTL, etc.).
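The 30%/70% rule can be turned into a quick compatibility check. This sketch assumes the generic VIH = 0.7·VDD threshold and ignores VOL, noise margin and device-specific thresholds, so a datasheet value always takes precedence; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Generic I2C input thresholds as fractions of the device supply. */
double i2c_vil(double vdd) { return 0.3 * vdd; }
double i2c_vih(double vdd) { return 0.7 * vdd; }

/* True if a bus pulled up to v_pullup is read as HIGH by a device supplied
 * from vdd_device (simplified: only the VIH threshold is checked). */
bool i2c_high_level_ok(double v_pullup, double vdd_device)
{
    assert(v_pullup > 0.0 && vdd_device > 0.0);
    return v_pullup >= i2c_vih(vdd_device);
}
```

This shows why the rule holds only in many cases: a bus pulled up to 3.3 V satisfies a 3.3 V device (VIH = 2.31 V) but not a 5 V device that uses 0.7·VDD = 3.5 V, which then needs a level shifter unless its datasheet specifies a lower threshold.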


Open-drain SDA and SCL Lines

I2C uses open-drain / open-collector drivers for both SDA and SCL. (Figure: basic open-drain driver for I2C.)

Here the buffer is used to receive (input) data and the MOSFET is used to transmit (output) data. The drivers for SDA and SCL
are similar. When the MOSFET is activated, it sinks the current from the pull-up resistor, which forces the pin to a logic LOW.
Note that the driver cannot drive the line HIGH by itself. To provide a logic HIGH state when the output driver is not pulling
the line LOW, we use pull-up resistors. With pull-ups, the logic state of SDA and SCL on the I2C bus is always defined and
never floating (digitally). Hence, when no transfers are occurring and the bus is idle, SDA and SCL are continuously pulled to
logic HIGH.
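The open-drain behavior amounts to a wired AND: the line is HIGH only when no device pulls it LOW. A minimal model, with `open_drain_level` as a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Level of an open-drain line with a pull-up resistor. pulling[i] is true
 * when device i has its MOSFET switched on (pulling the line LOW). */
int open_drain_level(const bool *pulling, size_t n)
{
    assert(pulling != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (pulling[i]) return 0; /* any active MOSFET sinks the pull-up current */
    }
    return 1; /* all drivers released: the pull-up holds the line HIGH */
}
```

The same property lets several masters or a clock-stretching slave share SCL and SDA without bus contention, since no device ever drives the line HIGH against another.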

I2C Pull-Up Resistor Values

We will go into the intricacies of pull-up resistor value selection for a particular mode in another post, since it is a
function of bus capacitance and Vcc/Vdd along with sinking current. For beginners it is better to follow this rule of thumb:
you need lower resistor values as the speed increases, and vice versa. For simple general-purpose projects/applications you
can use a pull-up resistor value between 1kΩ and 10kΩ. For example, when interfacing I2C devices at 100kHz I use 10kΩ
pull-ups.

Note: The typical range for the pull-up resistor value in Standard mode (Sm), i.e. 100kHz, is between 5kΩ and 10kΩ, while
in Fast Mode (Fm), i.e. 400kHz, it is between 2kΩ and 5kΩ. For High-speed mode (Hs-mode), i.e. 3.4MHz, it is around 1kΩ.
Be sure to check your part manufacturer’s datasheet for more.
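For readers who want to go beyond the rule of thumb, the NXP I2C specification bounds the pull-up value from below by the sink current and from above by the allowed rise time. This sketch assumes those two formulas, Rp(min) = (VDD - VOL(max)) / IOL and Rp(max) = tr / (0.8473 × Cb); the function names and the example numbers are illustrative, and the limits for tr, VOL and IOL in each speed mode should be taken from the specification.

```c
#include <assert.h>

/* Lower bound: the driver must be able to pull the line down to VOL(max)
 * while sinking at most IOL. Volts and amperes in, ohms out. */
double i2c_rp_min_ohm(double vdd, double vol_max, double iol)
{
    assert(iol > 0.0 && vdd > vol_max);
    return (vdd - vol_max) / iol;
}

/* Upper bound: the RC rise time with bus capacitance cb (farads) must not
 * exceed tr (seconds). */
double i2c_rp_max_ohm(double tr, double cb)
{
    assert(tr > 0.0 && cb > 0.0);
    return tr / (0.8473 * cb);
}
```

With assumed example values VDD = 3.3 V, VOL(max) = 0.4 V and IOL = 3 mA, Rp(min) is about 967 Ω; with tr = 1000 ns and Cb = 200 pF, Rp(max) is about 5.9 kΩ, so any value in between works for that bus.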

Clock Stretching
Clock stretching is a mechanism that lets a slave device make the master wait until data is ready, or until the slave has
finished some internal operation (such as an ADC conversion or an initial internal write cycle), before proceeding further.
In clock stretching, the slave holds the SCL line LOW, which pauses the current transfer.
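On a bit-banged master, honoring clock stretching means reading SCL back after releasing it. This sketch uses a hypothetical `scl_reader` hook and includes a simulated slave for illustration; the polling limit is an added assumption so that a stuck bus cannot hang the master.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook returning the current SCL level (0 or 1); on real
 * hardware this would read a GPIO or a peripheral status bit. */
typedef int (*scl_reader)(void *ctx);

/* After releasing SCL, wait until it actually reads HIGH, because a slave
 * may be holding it LOW. Returns false if max_polls reads all saw LOW. */
bool i2c_wait_scl_released(scl_reader read_scl, void *ctx, unsigned max_polls)
{
    assert(read_scl != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        if (read_scl(ctx)) return true;
    }
    return false;
}

/* Simulated slave for illustration: holds SCL LOW for `hold` reads. */
typedef struct { unsigned hold; } sim_slave;

int sim_read_scl(void *ctx)
{
    sim_slave *s = ctx;
    if (s->hold > 0) { s->hold--; return 0; }
    return 1;
}
```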

Acknowledge Polling
In practice, many slave devices do not support clock stretching; consider the 24C16, AT24C32, 24LC256, etc. series of
EEPROMs. These devices do not support clock stretching even though they have to perform an internal byte write or page
write operation when the master does a write operation. In this case the master has to perform Acknowledge Polling (for
EEPROMs it is also called Write Polling), which checks whether the EEPROM has finished its internal operation. While the
EEPROM is in its internal write cycle it won’t respond to its address, but when the cycle completes, it responds with an ACK
to the master. So, in Acknowledge Polling we keep sending the slave address with the write bit and wait for an ACK from the
slave, which indicates that the slave is ready for the next operation.

Condition for Valid Data (Data Validity)

For any data bit to be valid, the SDA line must be stable while the clock (SCL) is HIGH. The state of the SDA line can only
change (from HIGH to LOW or vice versa) while the SCL line is LOW; START and STOP conditions are the deliberate
exceptions. One valid data bit is transferred for each clock pulse. (Figure: timing diagram illustrating data validity.)
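The validity rule can be demonstrated by generating the waveform a bit-banging master would produce for one byte and then checking it. This sketch records SCL/SDA levels instead of doing real I/O, sets SDA only while SCL is LOW, and adds a ninth clock for the ACK slot with SDA released; no slave is modeled, so the released line reads HIGH, and the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t scl, sda; } bus_sample;

/* Records one byte (MSB first) plus the ACK clock: 3 samples per bit,
 * 27 in total. Returns the number of samples written. */
size_t i2c_bitbang_byte(uint8_t byte, bus_sample *out, size_t cap)
{
    size_t n = 0;
    assert(out != NULL && cap >= 27);
    for (int i = 0; i < 9; i++) {
        /* i = 0..7: data bits, MSB first; i = 8: ACK slot, SDA released */
        uint8_t sda = (i < 8) ? (uint8_t)((byte >> (7 - i)) & 1u) : 1u;
        out[n++] = (bus_sample){0, sda}; /* SCL LOW: SDA may change here */
        out[n++] = (bus_sample){1, sda}; /* SCL HIGH: SDA must be stable */
        out[n++] = (bus_sample){0, sda}; /* SCL LOW again */
    }
    return n;
}

/* True if SDA never changes between two samples taken with SCL HIGH.
 * A single-byte trace contains no START/STOP, so any such change is invalid. */
bool i2c_data_valid(const bus_sample *s, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (s[i - 1].scl && s[i].scl && s[i - 1].sda != s[i].sda) return false;
    return true;
}
```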

I2C Master Modes when using Microcontrollers / Arduino

When interfacing microcontrollers like the LPC2148, LPC1768, LPC1114, ATmega8/16 and PICs, or MCU boards like the Arduino
Uno (ATmega328P) or the Raspberry Pi, we generally use Master Transmitter and Master Receiver modes, since we interface such
MCUs with slave-only I2C devices like EEPROMs, LCD panels, RTCs, and sensors such as digital gyroscopes, 3-axis
accelerometers and temperature sensors. Such devices generally use 7-bit addresses and commonly support Standard and Fast
mode, i.e. they operate at frequencies from 100kHz to 400kHz.

(Figure: summary of Master Transmitter mode.)

(Figure: summary of Master Receiver mode.)

Reference(s):
I2C Official Specification by NXP



Introduction to SPI Interface
By Piyu Dhaker


Serial peripheral interface (SPI) is one of the most widely used interfaces between microcontroller and peripheral ICs such as sensors, ADCs, DACs, shift registers, SRAM, and others. This article provides a brief description of the SPI interface followed by an introduction to Analog Devices’ SPI enabled switches and muxes, and how they help reduce the number of digital GPIOs in system board design.

SPI is a synchronous, full duplex master-slave-based interface. The data from the master or the slave is synchronized on the rising or falling clock edge. Both master and slave can transmit data at the same time. The SPI interface can be either 3-wire or 4-wire. This article focuses on the popular 4-wire SPI interface.

Interface

Figure 1. SPI configuration with master and a slave. (The master’s CS, SCLK, MOSI, and MISO connect to the slave’s CS, SCLK, SDI, and SDO, respectively.)

4-wire SPI devices have four signals:
• Clock (SPI CLK, SCLK)
• Chip select (CS)
• Master out, slave in (MOSI)
• Master in, slave out (MISO)

The device that generates the clock signal is called the master. Data transmitted between the master and the slave is synchronized to the clock generated by the master. SPI devices support much higher clock frequencies compared to I2C interfaces. Users should consult the product data sheet for the clock frequency specification of the SPI interface.

SPI interfaces can have only one master and can have one or multiple slaves. Figure 1 shows the SPI connection between the master and the slave.

The chip select signal from the master is used to select the slave. This is normally an active low signal and is pulled high to disconnect the slave from the SPI bus. When multiple slaves are used, an individual chip select signal for each slave is required from the master. In this article, the chip select signal is always an active low signal.

MOSI and MISO are the data lines. MOSI transmits data from the master to the slave and MISO transmits data from the slave to the master.

Data Transmission
To begin SPI communication, the master must send the clock signal and select the slave by enabling the CS signal. Usually chip select is an active low signal; hence, the master must send a logic 0 on this signal to select the slave. SPI is a full-duplex interface; both master and slave can send data at the same time via the MOSI and MISO lines respectively. During SPI communication, the data is simultaneously transmitted (shifted out serially onto the MOSI/SDO bus) and received (the data on the bus (MISO/SDI) is sampled or read in). The serial clock edge synchronizes the shifting and sampling of the data. The SPI interface provides the user with flexibility to select the rising or falling edge of the clock to sample and/or shift the data. Please refer to the device data sheet to determine the number of data bits transmitted using the SPI interface.

Clock Polarity and Clock Phase
In SPI, the master can select the clock polarity and clock phase. The CPOL bit sets the polarity of the clock signal during the idle state. The idle state is defined as the period when CS is high and transitioning to low at the start of the transmission and when CS is low and transitioning to high at the end of the transmission. The CPHA bit selects the clock phase: with CPHA = 0 the data is sampled on the leading (first) clock edge, and with CPHA = 1 on the trailing (second) edge. Depending on CPOL and CPHA, the rising or falling clock edge is therefore used to sample and/or shift the data. The master must select the clock polarity and clock phase, as per the requirement of the slave. Depending on the CPOL and CPHA bit selection, four SPI modes are available. Table 1 shows the four SPI modes.

Table 1. SPI Modes with CPOL and CPHA

SPI Mode | CPOL | CPHA | Clock Polarity in Idle State | Clock Phase Used to Sample and/or Shift the Data
0        | 0    | 0    | Logic low                    | Data sampled on the rising edge and shifted out on the falling edge
1        | 0    | 1    | Logic low                    | Data sampled on the falling edge and shifted out on the rising edge
2        | 1    | 0    | Logic high                   | Data sampled on the falling edge and shifted out on the rising edge
3        | 1    | 1    | Logic high                   | Data sampled on the rising edge and shifted out on the falling edge

Figure 2 through Figure 5 show an example of communication in four SPI modes. In these examples, the data is shown on the MOSI and MISO line. The start and end of transmission is indicated by the dotted green line, the sampling edge is indicated in orange, and the shifting edge is indicated in blue. Please note these figures are for illustration purpose only. For successful SPI communications, users must refer to the product data sheet and ensure that the timing specifications for the part are met.
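Table 1 follows a simple rule: the mode number is CPOL × 2 + CPHA, CPHA selects the leading or trailing clock edge for sampling, and CPOL decides whether the leading edge is rising or falling. The sketch below encodes that rule in C; it assumes the common numbering convention (the one used, for example, by the Linux kernel's `SPI_MODE_0` to `SPI_MODE_3` definitions), and the function names are hypothetical.

```c
#include <assert.h>

typedef enum { EDGE_RISING, EDGE_FALLING } clk_edge;

/* SPI mode number: mode = (CPOL << 1) | CPHA. */
int spi_mode(int cpol, int cpha)
{
    assert((cpol == 0 || cpol == 1) && (cpha == 0 || cpha == 1));
    return (cpol << 1) | cpha;
}

/* Sampling edge. CPHA = 0 samples on the leading edge, CPHA = 1 on the
 * trailing edge; the leading edge is rising for an idle-low clock
 * (CPOL = 0) and falling for an idle-high clock (CPOL = 1). */
clk_edge spi_sample_edge(int cpol, int cpha)
{
    int leading_is_rising = (cpol == 0);
    int sample_on_leading = (cpha == 0);
    return (leading_is_rising == sample_on_leading) ? EDGE_RISING : EDGE_FALLING;
}

/* Data is shifted out on the opposite edge. */
clk_edge spi_shift_edge(int cpol, int cpha)
{
    return spi_sample_edge(cpol, cpha) == EDGE_RISING ? EDGE_FALLING : EDGE_RISING;
}
```

Evaluating `spi_sample_edge` for all four CPOL/CPHA combinations reproduces the last column of Table 1.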

Analog Dialogue 52-09, September 2018 analogdialogue.com 1


(Timing diagrams of nCS, CLK, MOSI and MISO. In each figure, while nCS is low, MOSI carries 0xA5 (1 0 1 0 0 1 0 1) and MISO carries 0xBA (1 0 1 1 1 0 1 0), MSB first; outside the transfer MOSI is don’t-care (xxxx) and MISO is Hi-Z.)

Figure 2. SPI Mode 0, CPOL = 0, CPHA = 0: CLK idle state = low, data sampled on rising edge and shifted on falling edge.

Figure 3. SPI Mode 1, CPOL = 0, CPHA = 1: CLK idle state = low, data sampled on the falling edge and shifted on the rising edge.

Figure 4. SPI Mode 2, CPOL = 1, CPHA = 0: CLK idle state = high, data sampled on the falling edge and shifted on the rising edge.

Figure 5. SPI Mode 3, CPOL = 1, CPHA = 1: CLK idle state = high, data sampled on the rising edge and shifted on the falling edge.
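The full-duplex exchange in these figures (master sends 0xA5 while the slave sends 0xBA) can be modeled as two 8-bit shift registers connected in a ring. This is a behavioral sketch that ignores clock edges and timing, assuming 8-bit, MSB-first transfers; `spi_exchange` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 8-bit, MSB-first SPI transfer. On each clock the master's MSB goes out
 * on MOSI and the slave's MSB goes out on MISO; each side shifts left and
 * takes the other's bit into its LSB. After eight clocks the two registers
 * have swapped contents. */
void spi_exchange(uint8_t *master_reg, uint8_t *slave_reg)
{
    assert(master_reg != NULL && slave_reg != NULL);
    for (int clk = 0; clk < 8; clk++) {
        uint8_t mosi = (uint8_t)((*master_reg >> 7) & 1u);
        uint8_t miso = (uint8_t)((*slave_reg >> 7) & 1u);
        *master_reg = (uint8_t)((*master_reg << 1) | miso);
        *slave_reg = (uint8_t)((*slave_reg << 1) | mosi);
    }
}
```

Starting from 0xA5 in the master and 0xBA in the slave, the master ends up holding 0xBA and the slave 0xA5, which is why every SPI write is also a read.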

Figure 3 shows the timing diagram for SPI Mode 1. In this mode, clock polarity is 0, which indicates that the idle state of the clock signal is low. The clock phase in this mode is 1, which indicates that the data is sampled on the falling edge (shown by the orange dotted line) and the data is shifted on the rising edge (shown by the dotted blue line) of the clock signal.

Figure 4 shows the timing diagram for SPI Mode 2. In this mode, the clock polarity is 1, which indicates that the idle state of the clock signal is high. The clock phase in this mode is 0, which indicates that the data is sampled on the leading edge, which for an idle-high clock is the falling edge (shown by the orange dotted line), and the data is shifted on the rising edge (shown by the dotted blue line) of the clock signal.

Figure 5 shows the timing diagram for SPI Mode 3. In this mode, the clock polarity is 1, which indicates that the idle state of the clock signal is high. The clock phase in this mode is 1, which indicates that the data is sampled on the trailing edge, which for an idle-high clock is the rising edge (shown by the orange dotted line), and the data is shifted on the falling edge (shown by the dotted blue line) of the clock signal.

Multislave Configuration
Multiple slaves can be used with a single SPI master. The slaves can be connected in regular mode or daisy-chain mode.



Figure 6. Multislave SPI configuration. (An SPI master drives three ADGS1412 SPI slaves, each with its own chip select, CS1 to CS3; SCLK, MOSI (to each SDI) and MISO (from each SDO) are shared.)

Regular SPI Mode:
In regular mode, an individual chip select for each slave is required from the master. Once the chip select signal is enabled (pulled low) by the master, the clock and data on the MOSI/MISO lines are available for the selected slave. If multiple chip select signals are enabled, the data on the MISO line is corrupted, as there is no way for the master to identify which slave is transmitting the data.

As can be seen from Figure 6, as the number of slaves increases, the number of chip select lines from the master increases. This can quickly add to the number of inputs and outputs needed from the master and limit the number of slaves that can be used. There are different techniques that can be used to increase the number of slaves in regular mode; for example, using a mux to generate a chip select signal.

Daisy-Chain Method:
In daisy-chain mode, the slaves are configured such that the chip select signal for all slaves is tied together and data propagates from one slave to the next. In this configuration, all slaves receive the same SPI clock at the same time. The data from the master is directly connected to the first slave and that slave provides data to the next slave and so on.

In this method, as data is propagated from one slave to the next, the number of clock cycles required to transmit data is proportional to the slave position in the daisy chain. For example, in Figure 7, in an 8-bit system, 24 clock pulses are required for the data to be available on the 3rd slave, compared to only eight clock pulses in regular SPI mode. Figure 8 shows the clock cycles and data propagating through the daisy chain. Daisy-chain mode is not necessarily supported by all SPI devices. Please refer to the product data sheet to confirm if daisy chain is available.

Figure 8. Daisy-chain configuration: data propagation. (Over three groups of eight clocks, SDIN1 carries 0xA5, 0x5A, 0x0A; SDOUT1/SDIN2 carries X, 0xA5, 0x5A; SDOUT2/SDIN3 carries X, X, 0xA5.)
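The propagation in Figure 8 can be reproduced with a small simulation. This sketch assumes three 8-bit slaves whose shift registers are modeled as plain bytes, all clocked by the same edge, so each slave's outgoing MSB is captured before it shifts; `chain_clock` and `chain_send_byte` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHAIN_LEN 3

/* One clock: MOSI feeds slave 0, and each slave's MSB (its SDO) feeds the
 * next slave's SDI. Returns the bit leaving the last slave (toward MISO). */
uint8_t chain_clock(uint8_t regs[CHAIN_LEN], uint8_t mosi_bit)
{
    assert(regs != NULL);
    uint8_t in = mosi_bit & 1u;
    for (size_t i = 0; i < CHAIN_LEN; i++) {
        uint8_t out = (uint8_t)((regs[i] >> 7) & 1u); /* SDO of slave i */
        regs[i] = (uint8_t)((regs[i] << 1) | in);
        in = out;                                     /* SDI of slave i + 1 */
    }
    return in;
}

/* Clocks one byte into the chain, MSB first (eight clocks). */
void chain_send_byte(uint8_t regs[CHAIN_LEN], uint8_t byte)
{
    for (int bit = 7; bit >= 0; bit--)
        (void)chain_clock(regs, (uint8_t)((byte >> bit) & 1u));
}
```

After sending 0xA5, 0x5A and 0x0A (24 clocks), the third slave holds 0xA5, the second 0x5A and the first 0x0A, matching the last column of Figure 8.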
Figure 7. Multislave SPI daisy-chain configuration. (Three SPI slaves share CS and SCLK; the master’s MOSI drives the first slave’s SDI, and each slave’s SDO drives the next slave’s SDI.)

Analog Devices SPI Enabled Switches and Muxes
The newest generation of ADI SPI enabled switches offers significant space saving without compromising precision switch performance. This section of the article discusses a case study of how SPI enabled switches or muxes can significantly simplify the system-level design and reduce the number of GPIOs required.

The ADG1412 is a quad, single-pole, single-throw (SPST) switch, which requires four GPIOs connected to the control input of each switch. Figure 9 shows the connection between the microcontroller and one ADG1412.



Figure 9. Microcontroller GPIO as control signals for the switch. (Four microcontroller GPIOs drive the four control inputs of one ADG1412.)

As the number of switches on the board increases, the number of required GPIOs increases significantly. For example, a test instrumentation system may use a large number of switches to increase the number of channels in the system. In a 4 × 4 cross-point matrix configuration, four ADG1412s are used. This system would require 16 GPIOs, limiting the available GPIOs in a standard microcontroller. Figure 10 shows the connection of four ADG1412s using the 16 GPIOs of the microcontroller.

One approach to reduce the number of GPIOs is to use a serial-to-parallel converter, as shown in Figure 11. This device outputs parallel signals that can be connected to the switch control inputs, and the device can be configured over the SPI serial interface. The drawback of this method is an increase in the bill of materials from the additional component.

An alternative method is to use SPI controlled switches. This method provides the benefit of reducing the number of GPIOs required and also eliminates the overhead of the additional serial-to-parallel converter. As shown in Figure 12, instead of 16 microcontroller GPIOs, only seven microcontroller GPIOs are needed to provide the SPI signals to the four ADGS1412s.

The switches can be configured in daisy-chain configuration to further optimize the GPIO count. In daisy-chain configuration, irrespective of the number of switches used in the system, only four GPIOs are used from the master (microcontroller).
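The GPIO counts above generalize to n switches. This sketch assumes quad-SPST parts with four control inputs each, three shared SPI lines (SCLK, MOSI, MISO), and either one chip select per slave (regular mode) or one shared chip select (daisy chain); the function names are hypothetical.

```c
#include <assert.h>

/* Parallel control: four control inputs per ADG1412. */
unsigned gpio_parallel(unsigned n) { return 4u * n; }

/* Regular SPI: one CS per ADGS1412 plus shared SCLK, MOSI and MISO. */
unsigned gpio_spi_regular(unsigned n) { assert(n > 0); return n + 3u; }

/* Daisy chain: one shared CS plus SCLK, MOSI and MISO, independent of n. */
unsigned gpio_spi_daisy(unsigned n) { assert(n > 0); return 4u; }
```

For the four-switch example this gives 16, 7 and 4 GPIOs, matching the text; with eight switches the counts grow to 32 and 11 while the daisy chain still needs only 4.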

Figure 10. In a multislave configuration, the number of GPIOs needed increases tremendously. (A microcontroller drives four ADG1412 switches directly from its GPIOs.)

Figure 11. Multislave switches using a serial-to-parallel converter. (The microcontroller’s CS, CLK, MOSI and MISO drive a serial-to-parallel converter whose outputs control four ADG1412 switches.)



Figure 12. SPI enabled switches save microcontroller GPIOs. (The microcontroller, as SPI master, shares SCLK, MOSI and MISO with four ADGS1412 SPI slaves and drives a separate chip select, CS1 to CS4, for each.)

Figure 13. SPI enabled switches configured in a daisy chain to further optimize the GPIOs. (A microcontroller, as SPI master, drives four daisy-chained ADGS1412 SPI slaves.)

Figure 13 is for illustration purposes. The ADGS1412 data sheet recommends a pull-up resistor on the SDO pin. Please refer to the ADGS1412 data sheet for further details on daisy-chain mode. For the sake of simplicity, four switches have been used in this example. As the number of switches in a system increases, the benefits of board simplicity and space saving become significant. The ADI SPI enabled switches provide a 20% overall board space reduction in a 4 × 8 crosspoint configuration with eight quad SPST switches on a 6-layer board. The article “Precision SPI Switch Configuration Increases Channel Density” provides detail on how precision SPI switch configuration increases channel density.

Analog Devices offers several SPI enabled switches and multiplexers.

References
ADuCM3029 data sheet. Analog Devices, Inc., March 2017.
Nugent, Stephen. “Precision SPI Switch Configuration Increases Channel Density.” Analog Dialogue, May 2017.
Usach, Miguel. “AN-1248 Application Note: SPI Interface.” Analog Devices, Inc., September 2015.

Piyu Dhaker
Piyu Dhaker [piyu.dhaker@analog.com] is an applications engineer in the
North America Central Applications Group of Analog Devices. She graduated
from San Jose State University in 2007 with a master’s degree in electrical
engineering. Piyu joined the North America Central Applications Group in June
2017. She also previously worked in the Automotive Power Train Group and
Power Management Group within ADI.



DSP Data Path: Arithmetic
● DSPs dealing with numbers representing the real world
  => want “reals”/fractions
● DSPs dealing with numbers for addresses
  => want integers
● Support “fixed point” as well as integers:
  ● fraction, sign bit S with the radix point right after it: $-1 \le x < 1$
  ● integer, sign bit S with the radix point after the least significant bit: $-2^{N-1} \le x < 2^{N-1}$

Note of Caution on DSP Architectures
Successful DSP architectures have two aspects:
● Key architectural and micro-architectural features that enabled product success in key parameters
  ● Speed
  ● Code density
  ● Low power
● Architectural and micro-architectural features that are artifacts of the era in which they were designed
• We will focus on the former!

DSP vs. General Purpose MPU
The “MIPS/MFLOPS” of DSPs is the speed of Multiply-Accumulate (MAC).
● DSPs are judged by whether they can keep the multipliers busy 100% of the time.
The “SPEC” of DSPs is 4 algorithms:
● Infinite Impulse Response (IIR) filters
● Finite Impulse Response (FIR) filters
● FFT, and
● convolvers
In DSPs, algorithms are king!
● Binary compatibility is not an issue
Software is not (yet) king in DSPs.
● People still write in assembly language for a product to minimize the die area for ROM in the DSP chip.

Kurt Keutzer
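The two fixed-point interpretations on the Arithmetic slide can be made concrete in C. This is a minimal sketch assuming N = 16 and the Q15 format (sign bit followed by 15 fraction bits), a common choice on 16-bit fixed-point DSPs; the `q15` type and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Q15: the same 16 bits as an int16_t, but with the radix point right after
 * the sign bit, so the value is raw / 32768 and lies in [-1, 1). */
typedef int16_t q15;

/* Converts a real to Q15, rounding to nearest and saturating to
 * [-1, 1 - 2^-15]. */
q15 q15_from_double(double x)
{
    assert(x == x); /* reject NaN */
    double scaled = x * 32768.0;
    if (scaled >= 32767.0) return INT16_MAX;
    if (scaled <= -32768.0) return INT16_MIN;
    /* round half away from zero */
    return (q15)(scaled >= 0.0 ? (long)(scaled + 0.5) : -(long)(-scaled + 0.5));
}

/* Reads the bits back as a fraction. */
double q15_to_double(q15 v) { return v / 32768.0; }
```

Read as a plain integer, the same bits cover -32768 to 32767, that is $-2^{N-1} \le x < 2^{N-1}$ with N = 16; only the assumed position of the radix point differs.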

DSP Data Path: Precision
Word size affects precision of fixed point numbers
● DSPs have 16-bit, 20-bit, or 24-bit data words
Floating point DSPs cost 2X - 4X vs. fixed point, and are slower than fixed point
DSP programmers will scale values inside code
● SW libraries
● Separate explicit exponent
"Blocked Floating Point": single exponent for a group of fractions
Floating point support simplifies development

TYPES OF DSP PROCESSORS
DSP Multiprocessors on a die
● TMS320C80
● TMS320C6000
32-BIT FLOATING POINT
● TI TMS320C4X
● MOTOROLA 96000
● AT&T DSP32C
● ANALOG DEVICES ADSP21000
16-BIT FIXED POINT
● TI TMS320C2X
● MOTOROLA 56000
● AT&T DSP16
● ANALOG DEVICES ADSP2100

Architectural Features of DSPs
Data path configured for DSP
● Fixed-point arithmetic
● MAC (multiply-accumulate)
Multiple memory banks and buses
● Harvard architecture
● Multiple data memories
Specialized addressing modes
● Bit-reversed addressing
● Circular buffers
Specialized instruction set and execution control
● Zero-overhead loops
● Support for MAC
Specialized peripherals for DSP
THE ULTIMATE IN BENCHMARK DRIVEN ARCHITECTURE DESIGN!!!
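Blocked floating point can be sketched in C as below, assuming 32-bit intermediate results that must be stored as 16-bit mantissas sharing one exponent. `bfp_encode` is a hypothetical name; the mantissas are formed by arithmetic right shift, which truncates, and right-shifting a negative signed value is implementation-defined in C before C23 (arithmetic on mainstream compilers).

```c
#include <stdint.h>

/* Blocked floating point: choose one shared exponent e for the whole block so
   that every x[i] >> e fits in 16 bits; afterwards x[i] ~= mant[i] * 2^e. */
static int bfp_encode(const int32_t *x, int16_t *mant, int n) {
    int64_t maxabs = 0;
    for (int i = 0; i < n; i++) {
        int64_t a = x[i] < 0 ? -(int64_t)x[i] : (int64_t)x[i];
        if (a > maxabs) maxabs = a;
    }
    int e = 0;
    while ((maxabs >> e) > INT16_MAX) e++;   /* smallest shift that fits the largest value */
    for (int i = 0; i < n; i++)
        mant[i] = (int16_t)(x[i] >> e);      /* arithmetic shift: floor division by 2^e */
    return e;
}
```

For the block {1000, -70000, 30000} the shared exponent is 2 and the mantissas are {250, -17500, 7500}; small values lose low-order bits because the largest value in the block sets the scale.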
Data Path
DSP Processor:
● Specialized hardware performs all key arithmetic operations in 1 cycle
● Hardware support for managing numeric fidelity:
  ● Shifters
  ● Guard bits
  ● Saturation
General-Purpose Processor:
● Multiplies often take >1 cycle
● Shifts often take >1 cycle
● Other operations (e.g., saturation, rounding) typically take multiple cycles
[Figure: DSP data path with Multiplier, Shift, ALU and an Accumulator with guard bits (G), vs. general-purpose data path with Multiplier, ALU and Accumulator]

DSP Data Path: Accumulator
Don't want overflow or have to scale accumulator
Option 1: accumulator wider than product: "guard bits"
● Motorola DSP: 24b x 24b => 48b product, 56b accumulator
Option 2: shift right and round product before adder

DSP Data Path: Overflow?
DSPs are descended from analog: what should happen to the output when you "peg" an input? (e.g., turn up the volume control knob on a stereo)
● Modulo arithmetic???
Set to most positive ($2^{N-1}-1$) or most negative value ($-2^{N-1}$): "saturation"
Many algorithms were developed in this model
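The guard-bit and saturation options can be modeled in C, assuming Q15 operands, 16 x 16 -> 32-bit products, and a 40-bit accumulator (8 guard bits) emulated in an `int64_t`. The width and the names `saturate` and `mac_q15` are assumptions for this sketch, not any particular DSP's definition.

```c
#include <stdint.h>

#define ACC_BITS 40   /* assumed accumulator width: 32-bit product + 8 guard bits */

/* Saturation: clamp v to the signed range of a `bits`-wide register. */
static int64_t saturate(int64_t v, int bits) {
    const int64_t max = ((int64_t)1 << (bits - 1)) - 1;
    const int64_t min = -((int64_t)1 << (bits - 1));
    if (v > max) return max;
    if (v < min) return min;
    return v;
}

/* Q15 dot product: 16b x 16b -> 32b (Q30) products summed in a 40-bit accumulator.
   The guard bits absorb intermediate growth; saturation is applied when the
   accumulator is clamped to 40 bits and when the result is stored back as Q15. */
static int16_t mac_q15(const int16_t *x, const int16_t *h, int n) {
    int64_t acc = 0;                                   /* models the 40-bit accumulator */
    for (int i = 0; i < n; i++)
        acc = saturate(acc + (int32_t)x[i] * h[i], ACC_BITS);
    /* Q30 -> Q15: drop 15 fraction bits (truncation), then saturate to 16 bits. */
    return (int16_t)saturate(acc >> 15, 16);
}
```

Summing four products of (-1) x (-1) in Q15 gives 4 x 2^30 = 2^32, which overflows a 32-bit accumulator; the guard bits hold it, and saturation on store returns 32767 (just under 1.0) instead of a wrapped-around value.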

[Figure: 320C54x DSP functional block diagram]

DSP Data Path: Rounding
Even with guard bits, will need to round when storing the accumulator into memory
3 DSP standard options:
● Truncation: chop results
  => biases results down (in two's complement, chopping always rounds toward negative infinity)
● Round to nearest: < 1/2 round down, ≥ 1/2 round up (more positive)
  => smaller bias
● Convergent: < 1/2 round down, > 1/2 round up (more positive), = 1/2 round to make the lsb a zero (+1 if the lsb is 1, +0 if it is 0)
  => no bias
  IEEE 754 calls this round to nearest even

DSP Data Path: Multiplier
Specialized hardware performs all key arithmetic operations in 1 cycle
50% of instructions can involve multiplier
=> single cycle latency multiplier
Need to perform multiply-accumulate (MAC)
n-bit multiplier => 2n-bit product
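The three options differ only in how the discarded low-order bits are treated. A C sketch, assuming two's-complement integers, arithmetic right shift for negative values, k >= 1 discarded bits, and no overflow in `v + 2^(k-1)`; the function names are hypothetical.

```c
#include <stdint.h>

/* Truncation: arithmetic shift = floor(v / 2^k), i.e. toward negative infinity. */
static int32_t round_trunc(int32_t v, int k) { return v >> k; }

/* Round to nearest: add one half, then truncate; ties go up (more positive).
   Assumes v + 2^(k-1) does not overflow. */
static int32_t round_nearest(int32_t v, int k) { return (v + (1 << (k - 1))) >> k; }

/* Convergent (round half to even): like round to nearest, but an exact tie
   rounds to whichever neighbor has a zero lsb. */
static int32_t round_convergent(int32_t v, int k) {
    int32_t q = v >> k;                  /* floor */
    int32_t frac = v & ((1 << k) - 1);   /* discarded bits, 0 <= frac < 2^k */
    int32_t half = 1 << (k - 1);
    if (frac > half || (frac == half && (q & 1)))
        q += 1;
    return q;
}
```

Dropping one fraction bit from 1.5, 2.5, -1.5 and -2.5 (the integers 3, 5, -3, -5 with k = 1): truncation gives 1, 2, -2, -3, every result pulled toward negative infinity; round to nearest gives 2, 3, -1, -2, with ties pulled up; convergent gives 2, 2, -2, -2, with ties split evenly between up and down.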
MAC Eg. - 320C54x DSP Functional Block Diagram
[Figure: 320C54x functional block diagram, MAC unit]

Micro-architectural impact - MAC
Element of finite-impulse response filter computation:
$y(n) = \sum_{m=0}^{N-1} h(m)\,x(n-m)$
[Figure: operands X and Y feed MPY; the product goes to ADD/SUB, which updates ACC REG, and ACC REG feeds back into ADD/SUB]

FIR Filtering: A Motivating Problem
M most recent samples in the delay line ($x_i$)
New sample moves data down delay line
"Tap" is a multiply-add
Each tap (M+1 taps total) nominally requires:
● Two data fetches
● Multiply
● Accumulate
● Memory write-back to update delay line
Goal: 1 FIR tap / DSP instruction cycle
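A direct-form C sketch of the FIR equation, assuming integer samples and coefficients and a hypothetical 4-tap filter (`NTAPS`). `delay[m]` holds $x(n-m)$, and each pass of the inner loop is one tap: two data fetches, a multiply and an accumulate. Shifting the whole delay line with `memmove` is the delay-line write-back counted above; circular addressing is the usual way DSPs avoid it.

```c
#include <stdint.h>
#include <string.h>

#define NTAPS 4   /* hypothetical filter length */

/* Push one new sample and return y(n) = sum_m h[m] * x(n - m). */
static int64_t fir_step(int32_t delay[NTAPS], const int32_t h[NTAPS], int32_t sample) {
    /* New sample moves data down the delay line: delay[m] becomes x(n - m). */
    memmove(&delay[1], &delay[0], (NTAPS - 1) * sizeof delay[0]);
    delay[0] = sample;
    int64_t acc = 0;
    for (int m = 0; m < NTAPS; m++)
        acc += (int64_t)h[m] * delay[m];   /* one tap: fetch h and x, multiply, accumulate */
    return acc;
}
```

Feeding an impulse (1, 0, 0, 0, 0) through h = {1, 2, 3, 4} returns 1, 2, 3, 4, 0: the impulse response is the coefficient list.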

DSP Memory
FIR tap implies multiple memory accesses
DSPs want multiple data ports
Some DSPs have ad hoc techniques to reduce memory bandwidth demand
● Instruction repeat buffer: do 1 instruction 256 times
● Often disables interrupts, thereby increasing interrupt response time
Some recent DSPs have instruction caches
● Even then may allow programmer to "lock in" instructions into cache
● Option to turn cache into fast program memory
No DSPs have data caches
May have multiple data memories

Mapping of the filter onto a DSP execution unit
[Figure: FIR filter mapped onto a DSP execution unit, with numbered data flows]
The critical hardware unit in a DSP is the multiplier: much of the architecture is organized around allowing use of the multiplier on every cycle.
This means providing two operands on every cycle, through multiple data and address buses, multiple address units and local accumulator feedback.

BENCHMARKS - FIR FILTER
[Figure: finite-impulse response filter; input $X_n$ passes through a chain of $Z^{-1}$ delays, the taps are weighted by coefficients $C_1, C_2, \ldots, C_{N-1}, C_N$ and summed (Σ) to give output $Y_n$]
Eg. 320C62x/67x DSP Memory Architecture
[Figure: 320C62x/67x DSP memory architecture]

Conventional "Von Neumann" memory
[Figure: conventional Von Neumann memory organization]

DSP Processor:
● Harvard architecture
● 2-4 memory accesses/cycle
● No caches; on-chip SRAM
[Figure: Processor connected to separate Program Memory and Data Memory]
General-Purpose Processor:
● Von Neumann architecture
● Typically 1 access/cycle
● May use caches
[Figure: Processor connected to a single Memory]
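As a rough worked bound, assume each FIR tap issues one instruction fetch plus the two data fetches and one delay-line write-back listed for a tap, and that memory serves $p$ accesses per cycle (ignoring wait states, pipelining and bus conflicts). The symbols $a_{\text{tap}}$ and $p$ are introduced here for illustration.

```latex
a_{\text{tap}} = \underbrace{1}_{\text{instruction fetch}}
              + \underbrace{2}_{\text{data fetches}}
              + \underbrace{1}_{\text{delay-line write-back}} = 4,
\qquad
\text{cycles per tap} \;\ge\; \left\lceil \frac{a_{\text{tap}}}{p} \right\rceil

\text{Von Neumann } (p = 1):\ \lceil 4/1 \rceil = 4,
\qquad
\text{Harvard } (p = 4):\ \lceil 4/4 \rceil = 1
```

Under the same assumptions, an instruction repeat buffer removes the per-tap instruction fetch and circular addressing removes the per-tap write-back, leaving $a_{\text{tap}} = 2$, so two data memories ($p = 2$) already meet the goal of one tap per cycle.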

DSP Addressing
Have standard addressing modes: immediate, displacement, register indirect
Want to keep MAC datapath busy
Assumption: any extra instructions imply clock cycles of overhead in inner loop
=> complex addressing is good
=> don't use datapath to calculate fancy address
Autoincrement/autodecrement register indirect
● lw r1,0(r2)+ => r1 <- M[r2]; r2 <- r2+1
● Option to do it before addressing, positive or negative

Eg. TMS320C3x MEMORY BLOCK DIAGRAM - Harvard Architecture
[Figure: TMS320C3x memory block diagram]

HARVARD ARCHITECTURE in DSP
[Figure: PROGRAM MEMORY, X MEMORY and Y MEMORY; buses labeled GLOBAL, P DATA, X DATA and Y DATA]
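The autoincrement semantics can be sketched with C pointers, assuming word addressing (C pointer arithmetic counts elements, so `r2 + 1` is the next word). The names mirror the slide's registers and are otherwise hypothetical; on a general-purpose CPU the compiler may still emit a separate add unless the ISA has post-indexed loads (ARM's `LDR r1, [r2], #4` is one example).

```c
#include <stdint.h>

/* Post-increment register indirect, like `lw r1,0(r2)+`:
   r1 <- M[r2]; r2 <- r2 + 1 (one word). */
static int32_t load_post_inc(const int32_t **r2) { return *(*r2)++; }

/* The "before addressing" variant, here with a negative step:
   r2 <- r2 - 1; r1 <- M[r2]. */
static int32_t load_pre_dec(const int32_t **r2) { return *--(*r2); }

/* Walk a buffer with no separate address-update statement in the loop body. */
static int32_t sum_words(const int32_t *base, int n) {
    const int32_t *r2 = base;
    int32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += load_post_inc(&r2);
    return sum;
}
```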
Addressing
DSP Processor:
● Dedicated address generation units
● Specialized addressing modes; e.g.:
  ● Autoincrement
  ● Modulo (circular)
  ● Bit-reversed (for FFT)
● Good immediate data support
General-Purpose Processor:
● Often, no separate address generation unit
● General-purpose addressing modes

DSP Addressing: Buffers
DSPs dealing with continuous I/O
Often interact with an I/O buffer (delay lines)
To save memory, buffer often organized as circular buffer
What can be done to avoid the overhead of address-checking instructions for a circular buffer?
Option 1: Keep a start register and an end register per address register for use with autoincrement addressing; reset to start when reaching the end of the buffer
Option 2: Keep a buffer length register, assuming buffers start on an aligned address; reset to start when reaching the end
Every DSP has "modulo" or "circular" addressing

DSP Addressing: FFT
FFTs start or end with data in weird butterfly order:
0 (000) => 0 (000)
1 (001) => 4 (100)
2 (010) => 2 (010)
3 (011) => 6 (110)
4 (100) => 1 (001)
5 (101) => 5 (101)
6 (110) => 3 (011)
7 (111) => 7 (111)
What can be done to avoid the overhead of address-checking instructions?
Have an optional "bit reverse" addressing mode for use with autoincrement addressing
Many DSPs have "bit reverse" addressing for radix-2 FFT
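Both circular-buffer options can be sketched in C as address-update functions, assuming word addresses and a step of +1. The function names are hypothetical, and Option 2 additionally assumes a power-of-two length so that wrap-around is a mask; real address generators differ in detail.

```c
/* Option 1: start and end registers. `end` is the last valid address;
   stepping past it resets the address to `start`. */
static unsigned circ_next_bounds(unsigned addr, unsigned start, unsigned end) {
    return (addr == end) ? start : addr + 1u;
}

/* Option 2: length register only. Assumes a power-of-two length and a base
   address aligned to that length, so the base is recovered from the address
   itself and wrap-around is a mask. */
static unsigned circ_next_aligned(unsigned addr, unsigned len) {
    unsigned mask = len - 1u;
    return (addr & ~mask) | ((addr + 1u) & mask);
}
```

Option 1 costs two registers and a comparator per address register; Option 2 needs only the length, at the price of the alignment constraint on where buffers may be placed.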

Address calculation unit for DSP
● Supports modulo and bit-reversal arithmetic
● Often duplicated to calculate multiple addresses per cycle

CIRCULAR BUFFERS
Instructions accommodate three elements:
• buffer address
• buffer size
• increment
Allows for cycling through:
• delay elements
• coefficients in data memory

BIT REVERSED ADDRESSING
Input index   Input   Output
000           x(0)    F(0)
100           x(4)    F(1)
010           x(2)    F(2)
110           x(6)    F(3)
001           x(1)    F(4)
101           x(5)    F(5)
011           x(3)    F(6)
111           x(7)    F(7)
Stages: four 2-point DFTs → two 4-point DFTs → one 8-point DFT
Data flow in the radix-2 decimation-in-time FFT algorithm
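Bit reversal can be computed directly, or, as an address unit would, by a "reverse-carry" increment that steps through bit-reversed order without computing the forward index. A C sketch of both, assuming `bits` >= 1; the function names are hypothetical.

```c
/* Reverse the low `bits` bits of i: with bits = 3, 1 (001) -> 4 (100). */
static unsigned bit_reverse(unsigned i, unsigned bits) {
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);
        i >>= 1;
    }
    return r;
}

/* Reverse-carry increment: add 1 at the MSB and propagate the carry toward
   the LSB, so successive calls visit addresses in bit-reversed order.
   Wraps to 0 after the last address. */
static unsigned bit_reverse_next(unsigned a, unsigned bits) {
    unsigned bit = 1u << (bits - 1);
    while (bit != 0 && (a & bit)) {
        a ^= bit;       /* 1 + 1 = 0, carry moves one place to the right */
        bit >>= 1;
    }
    return a | bit;     /* bit is 0 here only if the carry ran off the end */
}
```

Starting from 0 with 3 bits, repeated calls to `bit_reverse_next` yield 0, 4, 2, 6, 1, 5, 3, 7, matching the input order in the table.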
Specialized peripherals

Instruction Set
DSP Processor:
● Specialized, complex instructions
● Multiple operations per instruction
    mac x0,y0,a x:(r0)+,x0 y:(r4)+,y0
General-Purpose Processor:
● General-purpose instructions
● Typically only one operation per instruction
    mov *r0,x0
    mov *r1,y0
    mpy x0,y0,a
    add a,b
    mov y0,*r2
    inc r0
    inc r1

DSP Instructions and Execution
May specify multiple operations in a single instruction
Must support Multiply-Accumulate (MAC)
Need parallel move support
Usually have special loop support to reduce branch overhead
● Loop an instruction or sequence
● 0 value in register usually means loop maximum number of times
● If the loop count is calculated, must be sure that a count of 0 is not misinterpreted as the maximum
May have saturating shift left arithmetic
May have conditional execution to reduce branches
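The loop-count hazard can be modeled in C, assuming a hypothetical 16-bit loop counter that is decremented after each pass and tested for zero (one way such counters behave; individual DSPs differ). `hw_loop_iterations` and `guarded_loop_iterations` are illustrative names.

```c
#include <stdint.h>

/* Hypothetical 16-bit hardware loop counter, decremented after each pass and
   tested for zero: loading 0 wraps to 0xFFFF on the first decrement, so the
   body runs the maximum number of times (65536). */
static uint32_t hw_loop_iterations(uint16_t count) {
    uint32_t iterations = 0;
    uint16_t lc = count;
    do {
        iterations++;   /* the loop body would execute here */
        lc--;
    } while (lc != 0);
    return iterations;
}

/* Guarded version for a computed count: skip the loop entirely when it is 0. */
static uint32_t guarded_loop_iterations(uint16_t count) {
    return (count == 0) ? 0 : hw_loop_iterations(count);
}
```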

[Figure: TMS320C203/LC203 block diagram]

DSP Core Approach - 1995
[Figure: DSP core approach, 1995]

Specialized Peripherals for DSPs
• Synchronous serial ports   • Host ports
• Parallel ports             • Bit I/O ports
• Timers                     • On-chip DMA controller
• On-chip A/D, D/A converters   • Clock generators
• On-chip peripherals often designed for "background" operation, even when core is powered down.

ADSP 2100: ZERO-OVERHEAD LOOP
DO <addr> UNTIL condition

    DO X ...
    ...
    X

Address Generation:
    On DO:              PCS = PC + 1
    Each instruction:   if (PC == X && !condition)
                            PC = PCS
                        else
                            PC = PC + 1

• Eliminates a few instructions in loops
• Important in loops with small bodies
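The sequencer logic can be simulated in C, assuming a hypothetical program in which address 0 holds `DO X UNTIL CE` (loop until a loop counter expires), addresses 1..X hold the body, and X+1 is the first instruction after the loop. The layout, the counter-expired condition and the name `zol_executed` are assumptions for illustration; the function counts executed instructions.

```c
/* Simulates PC sequencing for a zero-overhead loop.
   Preconditions: body_len >= 1, count >= 1.
   Returns the number of instructions executed, including DO and address X+1. */
static int zol_executed(int body_len, int count) {
    const int x = body_len;        /* address of the last body instruction */
    int pc = 0, pcs = 0, cntr = 0, executed = 0;
    for (;;) {
        executed++;
        if (pc == 0) {             /* DO X UNTIL CE */
            pcs = pc + 1;          /* PCS = PC + 1: remember the loop start */
            cntr = count;
            pc = pc + 1;
        } else if (pc == x) {      /* last body instruction: the sequencer decides */
            int condition = (--cntr == 0);   /* CE: counter expired */
            pc = !condition ? pcs : pc + 1;  /* jump back without a branch instruction */
        } else if (pc == x + 1) {
            return executed;       /* first instruction after the loop */
        } else {
            pc = pc + 1;           /* ordinary body instruction */
        }
    }
}
```

A 3-instruction body run 4 times executes 1 + 3 x 4 + 1 = 14 instructions: each iteration costs only its body, whereas a software loop typically adds a decrement and a conditional branch per iteration, which matters most for small bodies.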
