ARM
Advanced RISC Machines
• FIQ (entered when a high priority (fast) interrupt is raised)
• IRQ (entered when a low priority (normal) interrupt is raised)
• Supervisor (entered on reset and when a Software Interrupt instruction is
executed)
• Abort (used to handle memory access violations)
• Undef (used to handle undefined instructions)
* ARM Architecture Version 4 adds a seventh mode:
• System (privileged mode using the same registers as user mode)
The ARM Instruction Set - ARM University Program - V1.0
bank being governed by the processor mode. Each mode can access
• a particular r13 (the stack pointer) and r14 (link register)
• r15 (the program counter)

User/System  FIQ       Supervisor  Abort     IRQ       Undefined
r9           r9_fiq    r9          r9        r9        r9
r10          r10_fiq   r10         r10       r10       r10
r13 (sp)     r13_fiq   r13_svc     r13_abt   r13_irq   r13_undef
r14 (lr)     r14_fiq   r14_svc     r14_abt   r14_irq   r14_undef
r15 (pc)     r15 (pc)  r15 (pc)    r15 (pc)  r15 (pc)  r15 (pc)
The Program Status Registers (CPSR and SPSRs)
Bits 31..28: N Z C V    Bits 7..5: I F T    Bits 4..0: Mode
Copies of the ALU status flags (latched if the instruction has the "S" bit set).
* Condition Code Flags
N = Negative result from ALU flag.
Z = Zero result from ALU flag.
C = ALU operation Carried out
V = ALU operation oVerflowed
* Interrupt Disable bits.
I = 1, disables the IRQ.
F = 1, disables the FIQ.
* T Bit (Architecture v4T only)
T = 0, Processor in ARM state
T = 1, Processor in Thumb state
* Mode Bits
M[4:0] define the processor mode.

Condition Flags
Flag               Logical Instruction                     Arithmetic Instruction
Negative (N=‘1’)   No meaning                              Bit 31 of the result has been set.
                                                           Indicates a negative number in signed operations
Zero (Z=‘1’)       Result is all zeroes                    Result of operation was zero
Carry (C=‘1’)      After Shift operation                   Result was greater than 32 bits
                   ‘1’ was left in carry flag
oVerflow (V=‘1’)   No meaning                              Result was greater than 31 bits.
                                                           Indicates a possible corruption of the sign bit in signed numbers
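
The flag meanings for arithmetic instructions can be illustrated with a small model of a 32-bit addition that has the "S" bit set. This is only a sketch in Python; `adds_flags` is a hypothetical helper name, not part of any ARM tool.

```python
MASK32 = 0xFFFFFFFF

def adds_flags(a, b):
    """Model a 32-bit ADDS: return (result, flags) with N, Z, C, V as 0/1."""
    a &= MASK32
    b &= MASK32
    full = a + b                      # unbounded Python integer sum
    result = full & MASK32            # value a 32-bit register would hold
    n = (result >> 31) & 1            # bit 31 of the result
    z = int(result == 0)              # result is all zeroes
    c = int(full > MASK32)            # unsigned sum needed more than 32 bits
    # Signed overflow: both operands have a sign that the result does not.
    v = (((a ^ result) & (b ^ result)) >> 31) & 1
    return result, {"N": n, "Z": z, "C": c, "V": v}
```

For example, `adds_flags(0x7FFFFFFF, 1)` sets N and V (the sign bit was corrupted), while `adds_flags(0xFFFFFFFF, 1)` sets Z and C.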
The Program Counter (R15)
* When the processor is executing in ARM state:
• All instructions are 32 bits in length
• All instructions must be word aligned
• Therefore the PC value is stored in bits [31:2] with bits [1:0] equal to zero (as instruction cannot be halfword or byte aligned).
* R14 is used as the subroutine link register (LR) and stores the return address when Branch with Link operations are performed, calculated from the PC.
* Thus to return from a linked branch
• MOV r15,r14

Exception Handling and the Vector Table
* When an exception occurs, the core:
• Copies CPSR into SPSR_<mode>
• Sets appropriate CPSR bits
  If core implements ARM Architecture 4T and is currently in Thumb state, then ARM state is entered.
  Mode field bits
  Interrupt disable flags if appropriate.
• Maps in appropriate banked registers

0x00000000  Reset
0x00000004  Undefined Instruction
0x00000008  Software Interrupt
0x0000000C  Prefetch Abort
0x00000010  Data Abort
0x00000014  Reserved
0x00000018  IRQ
0x0000001C  FIQ
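
The exception entry steps listed above can be sketched as a small Python model. The mode encodings and the positions of the I, F and T bits (bits 7, 6 and 5) are taken from the ARM architecture rather than from these slides, and `enter_exception` is a hypothetical name. The link register update and the banked register switch are deliberately left out.

```python
# Vector addresses as listed in the vector table.
VECTORS = {"reset": 0x00, "undef": 0x04, "swi": 0x08,
           "prefetch_abort": 0x0C, "data_abort": 0x10,
           "irq": 0x18, "fiq": 0x1C}

# Mode field values (assumption: standard ARM M[4:0] encodings).
MODE = {"reset": 0x13, "undef": 0x1B, "swi": 0x13,
        "prefetch_abort": 0x17, "data_abort": 0x17,
        "irq": 0x12, "fiq": 0x11}

I_BIT, F_BIT, T_BIT, MODE_MASK = 1 << 7, 1 << 6, 1 << 5, 0x1F

def enter_exception(cpsr, kind):
    """Return (new_cpsr, spsr, vector_address) for the given exception."""
    spsr = cpsr                                  # copy CPSR into SPSR_<mode>
    new = (cpsr & ~MODE_MASK) | MODE[kind]       # set the mode field bits
    new &= ~T_BIT                                # always enter ARM state
    new |= I_BIT                                 # IRQ is disabled on entry
    if kind in ("reset", "fiq"):
        new |= F_BIT                             # FIQ disabled only for these
    return new & 0xFFFFFFFF, spsr, VECTORS[kind]
```

Calling `enter_exception(0x60000030, "irq")` models an IRQ taken in user mode while in Thumb state: the saved SPSR keeps the old value, and the new CPSR selects IRQ mode in ARM state with IRQs masked.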
ARM Instruction Set Format
(bit positions marked: 31, 28 27, 16 15, 8 7, 0)
Encoding                                                       Instruction type
Cond 0 0 I Opcode S Rn Rd Operand2                             Data processing / PSR Transfer
Cond 0 0 0 0 0 0 A S Rd Rn Rs 1 0 0 1 Rm                       Multiply
Cond 0 0 0 0 1 U A S RdHi RdLo Rs 1 0 0 1 Rm                   Long Multiply (v3M / v4 only)
Cond 0 0 0 1 0 B 0 0 Rn Rd 0 0 0 0 1 0 0 1 Rm                  Swap
Cond 0 1 I P U B W L Rn Rd Offset                              Load/Store Byte/Word
Cond 1 0 0 P U S W L Rn Register List                          Load/Store Multiple
Cond 0 0 0 P U 1 W L Rn Rd Offset1 1 S H 1 Offset2             Halfword transfer: Immediate offset (v4 only)
Cond 0 0 0 P U 0 W L Rn Rd 0 0 0 0 1 S H 1 Rm                  Halfword transfer: Register offset (v4 only)
Cond 1 0 1 L Offset                                            Branch
Cond 0 0 0 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 Rn        Branch Exchange (v4T only)
Cond 1 1 0 P U N W L Rn CRd CPNum Offset                       Coprocessor data transfer
Cond 1 1 1 0 Op1 CRn CRd CPNum Op2 0 CRm                       Coprocessor data operation
Cond 1 1 1 0 Op1 L CRn Rd CPNum Op2 1 CRm                      Coprocessor register transfer
Cond 1 1 1 1 SWI Number                                        Software interrupt

Conditional Execution
* Most instruction sets only allow branches to be executed conditionally.
* However by reusing the condition evaluation hardware, ARM effectively increases number of instructions.
• All instructions contain a condition field which determines whether the CPU will execute them.
• Non-executed instructions soak up 1 cycle.
– Still have to complete cycle so as to allow fetching and decoding of following instructions.
* This removes the need for many branches, which stall the pipeline (3 cycles to refill).
• Allows very dense in-line code, without branches.
• The Time penalty of not executing several conditional instructions is frequently less than overhead of the branch or subroutine call that would otherwise be needed.
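
How the condition field decides whether an instruction executes can be sketched as a lookup on the N, Z, C and V flags. The mnemonics and their flag tests below are the standard ARM condition codes, which are not listed in this excerpt; `condition_passes` is a hypothetical helper name.

```python
def condition_passes(cond, n, z, c, v):
    """Return True if an instruction with suffix `cond` would execute."""
    checks = {
        "EQ": z == 1,              "NE": z == 0,
        "CS": c == 1,              "CC": c == 0,
        "MI": n == 1,              "PL": n == 0,
        "VS": v == 1,              "VC": v == 0,
        "HI": c == 1 and z == 0,   "LS": c == 0 or z == 1,
        "GE": n == v,              "LT": n != v,
        "GT": z == 0 and n == v,   "LE": z == 1 or n != v,
        "AL": True,                # "always": the default when no suffix is given
    }
    return checks[cond]
```

A non-executed instruction (one for which this returns False) still occupies its one cycle, which is the cost the slide compares against a pipeline refill.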
Data processing Instructions
* Largest family of ARM instructions, all sharing the same instruction format.
* Contains:
• Arithmetic operations
• Comparisons (no results - just set condition codes)
• Logical operations
• Data movement between registers
* Remember, this is a load / store architecture
• These instructions only work on registers, NOT memory.
* They each perform a specific operation on one or two operands.
• First operand always a register - Rn
• Second operand sent to the ALU via barrel shifter.
* We will examine the barrel shifter shortly.

Arithmetic Operations
* Operations are:
• ADD operand1 + operand2
• ADC operand1 + operand2 + carry
• SUB operand1 - operand2
• SBC operand1 - operand2 + carry - 1
• RSB operand2 - operand1
• RSC operand2 - operand1 + carry - 1
* Syntax:
• <Operation>{<cond>}{S} Rd, Rn, Operand2
* Examples
• ADD r0, r1, r2
• SUBGT r3, r3, #1
• RSBLES r4, r5, #5
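
The six arithmetic operations can be written out directly from the formulas on the slide. This is a minimal Python sketch with results wrapped to 32 bits; `arm_arith` is a hypothetical name, and flag setting is not modelled.

```python
MASK32 = 0xFFFFFFFF

def arm_arith(op, operand1, operand2, carry=0):
    """Evaluate one arithmetic operation and wrap the result to 32 bits."""
    formulas = {
        "ADD": operand1 + operand2,
        "ADC": operand1 + operand2 + carry,
        "SUB": operand1 - operand2,
        "SBC": operand1 - operand2 + carry - 1,
        "RSB": operand2 - operand1,
        "RSC": operand2 - operand1 + carry - 1,
    }
    return formulas[op] & MASK32
```

With r5 holding 3, the slide's `RSB r4, r5, #5` corresponds to `arm_arith("RSB", 3, 5)`, which gives 2.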
Quiz #2 - Sample Solutions
“Normal” Assembler

The Barrel Shifter
* The ARM doesn’t have actual shift instructions.
Multiplication Implementation
* The ARM makes use of Booth’s Algorithm to perform integer multiplication.
* On non-M ARMs this operates on 2 bits of Rs at a time.
• For each pair of bits this takes 1 cycle (plus 1 cycle to start with).
• However when there are no more 1’s left in Rs, the multiplication will early-terminate.
* Example: Multiply 18 and -1 : Rd = Rm * Rs
  Rm = 18 (00000000 00000000 00000000 00010010), Rs = -1 (11111111 11111111 11111111 11111111): 17 cycles
  Rm = -1, Rs = 18: 4 cycles
* Note: Compiler does not use early termination criteria to decide on which order to place operands.

Extended Multiply Instructions
* M variants of ARM cores contain extended multiplication hardware. This provides three enhancements:
• An 8 bit Booth’s Algorithm is used
– Multiplication is carried out faster (maximum for standard instructions is now 5 cycles).
• Early termination method improved so that now completes multiplication when all remaining bit sets contain
– all zeroes (as with non-M ARMs), or
– all ones.
Thus the previous example would early terminate in 2 cycles in both cases.
• 64 bit results can now be produced from two 32bit operands
– Higher accuracy.
– Pair of registers used to store result.
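
The cycle counts quoted above can be reproduced with a rough Python model. It assumes the hardware always processes at least one group of bits before checking for early termination, which is what yields the 17, 4 and 2 cycle figures; real cores may differ in detail, and both function names are hypothetical.

```python
def mul_cycles_non_m(rs):
    """Non-M core: 2 bits of Rs per cycle, stop when no 1s are left."""
    rs &= 0xFFFFFFFF
    cycles = 1                        # the initial start-up cycle
    while True:
        cycles += 1                   # one cycle per pair of bits
        rs >>= 2
        if rs == 0:                   # no more 1s left: early-terminate
            return cycles

def mul_cycles_m(rs):
    """M core: 8 bits of Rs per cycle, stop on all zeroes or all ones."""
    rs &= 0xFFFFFFFF
    cycles = 1
    bits_left = 32
    while True:
        cycles += 1                   # one cycle per group of 8 bits
        rs >>= 8
        bits_left -= 8
        all_ones = (1 << bits_left) - 1
        if bits_left == 0 or rs == 0 or rs == all_ones:
            return cycles
```

With Rs = -1 the non-M model needs all 16 pairs plus the start-up cycle, while with Rs = 18 it stops after three pairs. The M model returns 2 for both operand orders and never exceeds 5.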
Multiply-Long and Multiply-Accumulate Long
* Instructions are
• MULL which gives RdHi,RdLo:=Rm*Rs
• MLAL which gives RdHi,RdLo:=(Rm*Rs)+RdHi,RdLo
* However the full 64 bits of the result now matter (lower precision multiply instructions simply throw the top 32 bits away)
• Need to specify whether operands are signed or unsigned
* Therefore the syntax of the new instructions is:
• UMULL{<cond>}{S} RdLo,RdHi,Rm,Rs
• UMLAL{<cond>}{S} RdLo,RdHi,Rm,Rs
• SMULL{<cond>}{S} RdLo, RdHi, Rm, Rs
• SMLAL{<cond>}{S} RdLo, RdHi, Rm, Rs
* Not generated by the compiler.
Warning : Unpredictable on non-M ARMs.

Quiz #3
1. Specify instructions which will implement the following:
a) r0 = 16
b) r1 = r0 * 4
c) r0 = r1 / 16 ( r1 signed 2's comp.)
d) r1 = r2 * 7
2. What will the following instructions do?
a) ADDS r0, r1, r1, LSL #2
b) RSB r2, r1, #0
3. What does the following instruction sequence do?
ADD r0, r1, r1, LSL #1
SUB r0, r0, r1, LSL #4
ADD r0, r0, r1, LSL #7
Load and Store Word or Byte: Pre-indexed Addressing
* Example: STR r0, [r1,#12]
  (Diagram: base register r1 = 0x200, offset 12; source register for STR r0 = 0x5 is written to memory at 0x20c.)
* To store to location 0x1f4 instead use: STR r0, [r1,#-12]
* To auto-increment base pointer to 0x20c use: STR r0, [r1, #12]!
* If r2 contains 3, access 0x20c by multiplying this by 4:
• STR r0, [r1, r2, LSL #2]

Load and Store Word or Byte: Post-indexed Addressing
* Example: STR r0, [r1], #12
  (Diagram: original base register r1 = 0x200; source register for STR r0 = 0x5 is written to memory at 0x200; offset 12 gives updated base register r1 = 0x20c.)
* To auto-increment the base register to location 0x1f4 instead use:
• STR r0, [r1], #-12
* If r2 contains 3, auto-increment base register to 0x20c by multiplying this by 4:
• STR r0, [r1], r2, LSL #2
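
The difference between the two addressing forms can be checked with a small Python model, using a dictionary as memory and the register values from the diagrams (r0 = 0x5, r1 = 0x200). The helper names `str_pre` and `str_post` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def str_pre(mem, regs, rd, rn, offset, writeback=False):
    """STR rd, [rn, #offset] and, with writeback, STR rd, [rn, #offset]!"""
    addr = (regs[rn] + offset) & MASK32   # offset applied before the access
    mem[addr] = regs[rd]
    if writeback:
        regs[rn] = addr                   # '!' updates the base register

def str_post(mem, regs, rd, rn, offset):
    """STR rd, [rn], #offset"""
    addr = regs[rn]                       # access uses the original base
    mem[addr] = regs[rd]
    regs[rn] = (addr + offset) & MASK32   # base is always updated afterwards

mem, regs = {}, {"r0": 0x5, "r1": 0x200}
str_pre(mem, regs, "r0", "r1", 12)        # stores at 0x20c, r1 stays 0x200
str_post(mem, regs, "r0", "r1", 12)       # stores at 0x200, r1 becomes 0x20c
```

A shifted register offset such as `r2, LSL #2` with r2 = 3 simply supplies `3 << 2`, that is 12, as the offset.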
privilege.
• Normally used by an exception handler that is emulating a memory access instruction that would normally execute in user mode.

* If we want to step through every element of the array, for instance to produce sum of elements in the array, then we can use post-indexed addressing within a loop:
• r1 is address of current element (initially equal to r0).
• LDR r2, [r1], #4
Use a further register to store the address of final element, so that the loop can be correctly terminated.
(Diagram: r0 is a pointer to the start of the array; elements 0, 1 and 2 sit at memory offsets 0, 4 and 8.)
Offsets for Halfword and Signed Halfword / Byte Access
* The Load and Store Halfword and Load Signed Byte or Halfword instructions can make use of pre- and post-indexed addressing in much the same way as the basic load and store instructions.
* However the actual offset formats are more constrained:
• The immediate value is limited to 8 bits (rather than 12 bits) giving an offset of 0-255 bytes.
• The register form cannot have a shift applied to it.

Effect of endianness
* The ARM can be set up to access its data in either little or big endian format.
* Little endian:
• Least significant byte of a word is stored in bits 0-7 of an addressed word.
* Big endian:
• Least significant byte of a word is stored in bits 24-31 of an addressed word.
* This has no real relevance unless data is stored as words and then accessed in smaller sized quantities (halfwords or bytes).
• Which byte / halfword is accessed will depend on the endianness of the system involved.
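r1 = 0x100, LDRB r2, [r1]
Little-endian: word at r1 (bits 31..0) = 11 22 33 44; r2 = 00 00 00 44, so r2 = 0x44
Big-endian: word at r1 (bits 31..0) = 44 33 22 11; r2 = 00 00 00 11, so r2 = 0x11
(Diagram residue from the adjacent slide: an array of n elements with entries x and x+1 marked; r0 points to element 0.)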
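
The byte load result above can be reproduced in Python by laying a word out in memory in each byte order and reading the byte at the lowest address. The word value 0x11223344 is an assumption inferred from the byte pattern in the diagram, and `ldrb_lowest_address` is a hypothetical name.

```python
def ldrb_lowest_address(word, byteorder):
    """Store a 32-bit word, then load the byte at its base address."""
    memory = word.to_bytes(4, byteorder)   # byte 0 is the lowest address
    return memory[0]

little = ldrb_lowest_address(0x11223344, "little")
big = ldrb_lowest_address(0x11223344, "big")
```

Here `little` is 0x44 and `big` is 0x11, matching the two values of r2 shown for the LDRB.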
; next element
; on exit sum contained in r1

Condition field
Base register
Register list
Each bit corresponds to a particular register. For example:
• Bit 0 set causes r0 to be transferred.
• Bit 0 unset causes r0 not to be transferred.
At least one register must be transferred as the list cannot be empty.
Up/Down bit
0 = Down; subtract offset from base
1 = Up ; add offset to base
Load/Store bit
0 = Store to memory
1 = Load from memory
Pre/Post indexing bit
0 = Post; add offset after transfer,
1 = Pre ; add offset before transfer
Write-back bit
0 = no write-back
1 = write address into base
PSR and force user bit
0 = don’t load PSR or force user mode
1 = load PSR or force user mode
Block Data Transfer (2)
* Base register used to determine where memory access should occur.
• 4 different addressing modes allow increment and decrement inclusive or exclusive of the base register location.
• Base register can be optionally updated following the transfer (by appending it with an ‘!’).
• Lowest register number is always transferred to/from lowest memory location accessed.
* These instructions are very efficient for
• Saving and restoring context
– For this useful to view memory as a stack.
• Moving large blocks of data around memory
– For this useful to directly represent functionality of the instructions.

Stacks
* A stack is an area of memory which grows as new data is “pushed” onto the “top” of it, and shrinks as data is “popped” off the top.
* Two pointers define the current limits of the stack.
• A base pointer
– used to point to the “bottom” of the stack (the first location).
• A stack pointer
– used to point to the current “top” of the stack.
(Diagram: PUSH {1,2,3} leaves 1, 2, 3 above BASE with SP at 3; a POP then gives a result of 3 and leaves SP at 2.)
Stacks and Subroutines
* One use of stacks is to create temporary register workspace for subroutines. Any registers that are needed can be pushed onto the stack at the start of the subroutine and popped off again at the end so as to restore them before return to the caller :
STMFD sp!,{r0-r12, lr} ; stack all registers
........               ; and the return address
........
LDMFD sp!,{r0-r12, pc} ; load all the registers
                       ; and return automatically
* See the chapter on the ARM Procedure Call Standard in the SDT Reference Manual for further details of register usage within subroutines.
* If the pop instruction also had the ‘S’ bit set (using ‘^’) then the transfer of the PC when in a privileged mode would also cause the SPSR to be copied into the CPSR (see exception handling module).

Direct functionality of Block Data Transfer
* When LDM / STM are not being used to implement stacks, it is clearer to specify exactly what functionality of the instruction is:
• i.e. specify whether to increment / decrement the base pointer, before or after the memory access.
* In order to do this, LDM / STM support a further syntax in addition to the stack one:
• STMIA / LDMIA : Increment After
• STMIB / LDMIB : Increment Before
• STMDA / LDMDA : Decrement After
• STMDB / LDMDB : Decrement Before
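
The four LDM / STM addressing modes differ only in which word addresses are touched and where the base ends up when write-back is requested. The following Python sketch makes that explicit; `block_transfer_addresses` is a hypothetical name, and addresses are returned lowest first because the lowest register number always goes to the lowest address.

```python
def block_transfer_addresses(base, count, mode):
    """Return (addresses, written_back_base) for `count` registers."""
    size = 4 * count
    if mode == "IA":                     # Increment After
        start, new_base = base, base + size
    elif mode == "IB":                   # Increment Before
        start, new_base = base + 4, base + size
    elif mode == "DA":                   # Decrement After
        start, new_base = base - size + 4, base - size
    elif mode == "DB":                   # Decrement Before
        start, new_base = base - size, base - size
    else:
        raise ValueError("mode must be IA, IB, DA or DB")
    return [start + 4 * i for i in range(count)], new_base
```

For a full descending stack, STMFD behaves as STMDB and LDMFD as LDMIA, so a push of three registers at sp = 0x1000 uses `block_transfer_addresses(0x1000, 3, "DB")` and the matching pop uses mode "IA" from the new stack pointer.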
Example: Block Copy
• Copy a block of memory, which is an exact multiple of 12 words long from the location pointed to by r12 to the location pointed to by r13. r14 points to the end of block to be copied.

; r12 points to the start of the source data
; r14 points to the end of the source data
; r13 points to the start of the destination data
loop    LDMIA r12!, {r0-r11}    ; load 48 bytes
        STMIA r13!, {r0-r11}    ; and store them
        CMP r12, r14            ; check for the end
        BNE loop                ; and loop until done

(Diagram: r12, r13 and r14 marked against increasing memory.)
• This loop transfers 48 bytes in 31 cycles
• Over 50 Mbytes/sec at 33 MHz

Quiz #5
* The contents of registers r0 to r6 need to be swapped around thus:
• r0 moved into r3
• r1 moved into r4
• r2 moved into r6
• r3 moved into r5
• r4 moved into r0
• r5 moved into r1
• r6 moved into r2
* Write a segment of code that uses full descending stack operations to carry this out, and hence requires no use of any other registers for temporary storage.
r3 = r0    r5 = r3    r0 = r4
r4 = r1    r1 = r5
r6 = r2    r2 = r6
(Diagram residue from the adjacent slide: steps 2 and 3; labels Memory, Rm, Rd.)
Coprocessor Data Processing
* This instruction initiates a coprocessor operation
* The operation is performed only on internal coprocessor state
• For example, a Floating point multiply, which multiplies the contents of two registers and stores the result in a third register
* Syntax:
• CDP{<cond>} <cp_num>,<opc_1>,CRd,CRn,CRm,{<opc_2>}

Bits 31-28  Cond     Condition Code Specifier
Bits 27-24  1 1 1 0
Bits 23-20  opc_1    Opcode
Bits 19-16  CRn      Source Register
Bits 15-12  CRd      Destination Register
Bits 11-8   cp_num
Bits 7-5    opc_2    Opcode
Bit  4      0
Bits 3-0    CRm      Source Register

Coprocessor Register Transfers
* These two instructions move data between ARM registers and coprocessor registers
• MRC : Move to Register from Coprocessor
• MCR : Move to Coprocessor from Register
* An operation may also be performed on the data as it is transferred
• For example a Floating Point Convert to Integer instruction can be implemented as a register transfer to ARM that also converts the data from floating point format to integer format.
* Syntax
• <MRC|MCR>{<cond>} <cp_num>,<opc_1>,Rd,CRn,CRm,<opc_2>

Bits 31-28  Cond     Condition Code Specifier
Bits 27-24  1 1 1 0
Bits 23-21  opc_1    Opcode
Bit  20     L        Transfer To/From Coprocessor
Bits 19-16  CRn      Coprocessor Source/Dest Register
Bits 15-12  Rd       ARM Source/Dest Register
Bits 11-8   cp_num
Bits 7-5    opc_2    Opcode
Bit  4      1
Bits 3-0    CRm      Coprocessor Source/Dest Register
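
The register transfer encoding can be assembled field by field from the bit layout above. This Python sketch is illustrative only: `encode_coproc_transfer` is a hypothetical name, the default condition value 0xE ("always") comes from the ARM architecture rather than these slides, and L = 1 is taken to mean MRC (transfer from the coprocessor to the ARM register).

```python
def encode_coproc_transfer(load, cp_num, opc_1, rd, crn, crm, opc_2=0, cond=0xE):
    """Pack an MRC (load=True) or MCR (load=False) instruction word."""
    return (
        (cond << 28)          # bits 31-28: condition code specifier
        | (0b1110 << 24)      # bits 27-24: fixed pattern 1 1 1 0
        | (opc_1 << 21)       # bits 23-21: opcode
        | (int(load) << 20)   # bit 20: L, transfer direction
        | (crn << 16)         # bits 19-16: coprocessor register
        | (rd << 12)          # bits 15-12: ARM register
        | (cp_num << 8)       # bits 11-8: coprocessor number
        | (opc_2 << 5)        # bits 7-5: second opcode
        | (1 << 4)            # bit 4: 1 marks a register transfer
        | crm                 # bits 3-0: coprocessor register
    )

mrc_word = encode_coproc_transfer(True, cp_num=15, opc_1=0, rd=0, crn=1, crm=0)
```

Here `mrc_word` corresponds to `MRC p15, 0, r0, c1, c0, 0`. Clearing bit 4 instead would give the coprocessor data operation (CDP) layout shown earlier.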
Quiz #6
* Write a short code segment that performs a mode change by modifying the contents of the CPSR
• The mode you should change to is user mode which has the value 0x10.
• This assumes that the current mode is a privileged mode such as supervisor mode.
• This would happen for instance when the processor is reset - reset code would be run in supervisor mode which would then need to switch to user mode before calling the main routine in your application.
• You will need to use MSR and MRS, plus 2 logical operations.
Bits 31..28: N Z C V    Bits 7..5: I F T    Bits 4..0: Mode

Quiz #6 - Sample Solution
* Set up useful constants:
mmask   EQU 0x1f        ; mask to clear mode bits
userm   EQU 0x10        ; user mode value
* Start off here in supervisor mode.
        MRS r0, cpsr            ; take a copy of the CPSR
        BIC r0,r0,#mmask        ; clear the mode bits
        ORR r0,r0,#userm        ; select new mode
        MSR cpsr, r0            ; write back the modified CPSR
* End up here in user mode.
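
The two logical operations in the sample solution can be checked on a plain integer. This Python sketch mirrors the BIC and ORR steps; the constants match the EQU values above, and `switch_to_user` is a hypothetical name.

```python
MMASK = 0x1F   # mask to clear mode bits
USERM = 0x10   # user mode value

def switch_to_user(cpsr):
    """Apply the BIC / ORR sequence to a copy of the CPSR."""
    cpsr &= ~MMASK & 0xFFFFFFFF   # BIC r0, r0, #mmask
    cpsr |= USERM                 # ORR r0, r0, #userm
    return cpsr
```

Starting from a supervisor-mode value such as 0x600000D3, only the low five bits change, so the condition flags and the I and F bits are preserved.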
Example 6: Heavy Conditional Instruction Use [1]
sum_rtn
        MOV r0,r1               ; set return value
        MOV pc,lr
        END
Example 6: Heavy Conditional Instruction Use [2]
        END
CS 160 Ward
log.s: Compute k (n <= 2^k)
AREA LOG, CODE, READONLY
EXPORT log
; r0 = input variable n
; r0 = output variable m (0 by default)
; r1 = output variable k (n <= 2^k)
log
MOV r2, #0 ; set m = 0
MOV r1, #-1 ; set k = -1
log_loop
TST r0, #1 ; test LSB(n) == 1
ADDNE r2, r2, #1 ; set m = m+1 if true
ADD r1, r1, #1 ; set k = k+1
MOVS r0, r0, LSR #1 ; set n = n>>1
BNE log_loop ; continue if n != 0
log_rtn
MOV pc,lr
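
To follow what the loop computes, the instructions shown can be mirrored line by line in Python. This sketch covers only the visible excerpt and treats n as an unsigned value; `log_loop` is a hypothetical name, and any later adjustment of k using m is not part of the listing above.

```python
def log_loop(n):
    """Mirror log_loop: return (m, k) as left in r2 and r1."""
    m = 0                 # MOV r2, #0
    k = -1                # MOV r1, #-1
    while True:
        if n & 1:         # TST r0, #1
            m += 1        # ADDNE r2, r2, #1
        k += 1            # ADD r1, r1, #1
        n >>= 1           # MOVS r0, r0, LSR #1
        if n == 0:        # BNE log_loop falls through when n == 0
            return m, k
```

As written, k ends up as the index of the highest set bit of n and m as the number of set bits, so for n = 5 the loop leaves k = 2 and m = 2.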
Returning from FIQ and IRQ
• FIQ and IRQ are generated only after the execution of an instruction
– The program counter has been updated
  (Diagram: FIQ or IRQ occurs during the instruction at PC; the next instruction is at PC+4.)
– lr_mode = PC + 4
• Point to one instruction beyond the end of the instruction in which the exception occurred

Returning from FIQ and IRQ (Cont.)
• Restoring the program counter
– If not using stack:
        SUBS pc, lr, #4                 //pc = lr-4
– If using stack to store the return address
        SUB lr, lr, #4                  //when entering the handler
        STMFD sp!, {reglist, lr}
        …
        LDMFD sp!, {reglist, pc}^       //when leaving the handler
IRQ Handler
IRQ_Handler: ; top-level handler
STMFD sp!,{r0-r12,lr} ; Store registers.
BL ISR_IRQ
Abstract — The Advanced Microcontroller Bus Architecture (AMBA) is a widely used interconnection standard for System on Chip (SoC) design. An AMBA-based microcontroller typically consists of a high-performance system backbone bus (AMBA AHB or AMBA ASB), able to sustain the external memory bandwidth, on which the CPU, on-chip memory and other Direct Memory Access (DMA) devices reside. This bus provides a high-bandwidth interface between the elements that are involved in the majority of transfers. This paper presents three distinct buses and their comparison. By considering the merits of APB, AMBA can be designed using an HDL.

Index Terms — AMBA, AHB, ASB, APB, Difference of buses

I. INTRODUCTION
Today, in the era of modern technology, micro-electronics plays a vital role in every aspect of an individual's life, and the increasing use of micro-electronic equipment increases the demand for manufacturing its components and for their availability [4]. Embedded system designers have a choice of using a shared or point-to-point bus in their designs [2]. Typically, an embedded design will have a general purpose processor, cache, SDRAM, DMA port, and bridge port to a slower I/O bus, such as the Advanced Microcontroller Bus Architecture (AMBA) Advanced Peripheral Bus (APB). In addition, there might be a port to a DSP processor, or hardware accelerator, common with the increased use of video in many applications. As chip-level device geometries become smaller and smaller, more and more functionality can be added without the concomitant [2] increase in power and cost per die as seen in prior generations.
The Advanced Microcontroller Bus Architecture (AMBA) was introduced by ARM Ltd in 1996 and is widely used as the on-chip bus in system on chip (SoC) designs. AMBA is a registered trademark of ARM Ltd. The first AMBA buses were the Advanced System Bus (ASB) and the Advanced Peripheral Bus (APB). In its 2nd version, AMBA 2, ARM added the AMBA High-performance Bus (AHB), which is a single clock-edge protocol. In 2003, ARM introduced [2,4] the 3rd generation, AMBA 3, including AXI to reach even higher performance interconnect and the Advanced Trace Bus (ATB) as part of the CoreSight on-chip debug and trace solution.
These protocols are today the de-facto standard for 32-bit embedded processors because they are well documented and can be used without royalties. AMBA's target is to help designers of embedded systems meet challenges such as design for low power consumption; to facilitate the right-first-time development of Embedded Microcontroller Products with one or more CPUs or signal processors; to be technology-independent; to encourage modular system design [4]; and to minimize the silicon infrastructure required to support efficient on-chip and off-chip communication for both operation and manufacturing test [1].
This paper discusses the architecture of AMBA in Section II. Section III deals with the various bus methods, and their comparison is discussed in Section IV. Finally, Sections V and VI give the proposed work and conclude the paper.

II. ARCHITECTURE OF AMBA BASED SIMPLE MICROCONTROLLER
An AMBA-based microcontroller typically consists of a high-performance system backbone bus (AMBA AHB or AMBA ASB), able to sustain the external memory bandwidth, on which the CPU, on-chip memory and other Direct Memory Access (DMA) devices reside. This bus provides a high-bandwidth interface between the elements that are involved in the majority of transfers [3]. Fig. 1 shows an AMBA-based simple microcontroller. Also located on the high-performance bus is a bridge to the lower bandwidth APB, where most of the peripheral devices in the system are located. AMBA APB provides the basic peripheral macro cell communications infrastructure as a secondary bus from the higher bandwidth pipelined main system bus [1]. Such peripherals typically:
(i) Have interfaces which are memory-mapped registers
(ii) Have no high-bandwidth interfaces
(iii) Are accessed under programmed control.

The AMBA specification [2] has become a de-facto standard for the semiconductor industry; it has been adopted by more than 95% of ARM’s partners and a number of IP providers. The specification has been successfully implemented in several ASIC designs. Since the AMBA interface is processor and technology independent, it enhances the reusability of peripheral and system components across a wide range of applications.
www.ijsrp.org
International Journal of Scientific and Research Publications, Volume 3, Issue 4, April 2013 2
ISSN 2250-3153
The AMBA specification [1,3] has been derived to satisfy the following four key requirements.
(i) To facilitate the right-first-time development of Embedded Microcontroller Products with one or more CPUs or signal processors.
(ii) To be technology-independent and ensure that highly reusable peripheral and system macro cells can be migrated across a diverse range of IC processes and be appropriate for full-custom, standard cell and gate array technologies.
(iii) To encourage modular system design to improve processor independence, providing a development road-map for advanced cached CPU cores and the development of peripheral libraries.
(iv) To minimize the silicon infrastructure required to support efficient on-chip and off-chip communication for both operation and manufacturing test.

Memory Access (DMA) or Digital Signal Processor (DSP) to be included as bus masters.
The external memory interface, APB bridge and any internal memory are the most common AHB slaves. Any other peripheral in the system could also be included as an AHB slave. However, low-bandwidth peripherals typically reside on the APB.

(B) The Advanced System Bus (ASB): ASB is the first generation of AMBA system bus. A typical AMBA ASB system may contain one or more bus masters, for example at least the processor and test interface. However, it would also be common for a Direct Memory Access (DMA) or Digital Signal Processor (DSP) to be included as bus masters.
The external memory interface, APB bridge and any internal memory are the most common ASB slaves. Any other peripheral in the system could also be included as an ASB slave. However, low-bandwidth peripherals typically reside on the APB.
VI. CONCLUSION
REFERENCES
[1] AMBA Specification, version 2.0.
[2] Akhilesh Kumar and Richa Sinha, “Design and verification analysis of APB3 protocol with coverage,” International Journal of Advances in Engineering and Technology, vol. 1, issue 5, pp. 310-317, Nov. 2011.
[3] Priyanka Gandhani and Charu Patel, “Moving from AMBA AHB to AXI Bus in SoC Designs: A Comparative Study,” Int. J. Comp. Sci. Emerging Tech., vol. 2, no. 4, pp. 476-479, August 2011.
ARM Cortex-M3 Introduction
ARM University Relations

Agenda
Cortex-M3 Overview
v7-M Architecture/Programmers Model
Data Path and Pipelines
Tools and mbed Platform
Microcontrollers are getting powerful
Lots of processing, memory, I/O in one package
Floating-point is even available in some!
(Block diagram labels: Trace Port, Serial-Wire Viewer (1-pin), DAP, JTAG/SWD, MPU, ITM Instrumentation Trace)
ARM Cortex-M3 Microcontroller
18 x 32-bit registers
Excellent compiler target
ARMv7M Architecture
No Cache - No MMU
Thumb-2 gives approximately 25% improvement in performance over Thumb
(Chart labels: memory width (zero wait state): 32-bit, 16-bit, 16-bit with 32-bit stack)
Agenda
Cortex-M3 Overview

Cortex-M3 Register Set
(Diagram labels: Main, xPSR)
Memory Map
Very simple linear 4GB memory map
The Bus Matrix partitions memory access via the AHB and PPB buses

FFFFFFFF
            System
E0100000
            APB Debug Components
E0040000
            SCS + NVIC
E0000000
            External Peripheral    1 GB
A0000000
            External RAM           1 GB
60000000
            Peripheral             ½GB
40000000
            RAM                    ½GB
20000000
            Code Space             ½GB
00000000

(Diagram labels: CM3 Core with Instruction and Data ports; Bus Matrix with Bit-Bander, Aligner and Patch; Debug; INTERNAL PPB, SYSTEM AHB, ICODE AHB, DCODE AHB)

NXP LPC1311/13/42/43 Block Diagram
(Diagram labels: Privileged / Supervisor: Handler Mode (Aborts, Interrupts, Reset), OS. Non-Privileged / User: Thread Mode, Application code. Memory.)
Memory Protection Unit (MPU)
8 register-stored regions
Same regions used for instructions and data
Minimum region size 32 Bytes (max 4GB)
No address translation or page tables

Cortex-M3 Bit Banding
(Diagram labels: Mask and Modify; Bit Element x x x x x 1 x x; 0x02000000)
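
Bit banding maps each bit of a small memory region to its own word in an alias region, so a single bit can be changed with one store instead of a mask-and-modify sequence. The following Python sketch computes the alias address. It assumes the standard Cortex-M3 SRAM mapping (a 1 MB bit-band region at 0x20000000 whose alias region starts 0x02000000 higher, at 0x22000000), which the slide only hints at; `bitband_alias` is a hypothetical name.

```python
SRAM_BITBAND_BASE = 0x20000000                  # start of the bit-band region
SRAM_ALIAS_BASE = SRAM_BITBAND_BASE + 0x02000000
BITBAND_REGION_SIZE = 0x100000                  # 1 MB (assumed region size)

def bitband_alias(byte_addr, bit):
    """Return the alias word address for one bit of one byte."""
    byte_offset = byte_addr - SRAM_BITBAND_BASE
    if not 0 <= byte_offset < BITBAND_REGION_SIZE:
        raise ValueError("address is outside the bit-band region")
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in the range 0..7")
    # Each byte expands to 8 alias words (32 bytes); each bit to one word.
    return SRAM_ALIAS_BASE + byte_offset * 32 + bit * 4
```

Writing 1 or 0 to the returned word address sets or clears just that bit, leaving the other seven bits of the byte untouched.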
Interrupt Handling
One Non-Maskable Interrupt (INTNMI) supported
1-240 prioritizable interrupts supported
Interrupts can be masked
Implementation option selects number of interrupts supported
Nested Vectored Interrupt Controller (NVIC) is tightly coupled with processor core
Interrupt inputs are active HIGH
(Diagram: INTNMI and 1-240 Interrupts … feed the NVIC, which connects to the Cortex-M3.)

Exception Handling
Reset
NMI
Faults
  Hard Fault
  Memory Manage
  Bus Fault
  Usage Fault
SVCall
Debug Monitor
PendSV
SysTick Interrupt
International Journal of Engineering and Technical Research (IJETR)
ISSN: 2321-0869 (O) 2454-4698 (P), Volume-5, Issue-3, July 2016
91 www.erpublication.org
Learning Embedded System using advanced Microcontroller and Real Time Operating System
components provide debugging features, such as instruction trace, and various types of debugging interfaces.
ARM cores use a 32-bit, load-store RISC architecture. This means that the core cannot directly manipulate the memory of the system. All data manipulation must be done by loading registers with information located in memory, performing the data operation and then storing the value back to memory.
The Cortex-M3 processor has registers R0 through R15. R0–R12 are 32-bit general-purpose registers for data operations. Some 16-bit Thumb instructions can only access a subset of these registers (low registers, R0–R7). The Cortex-M3 contains two stack pointers (R13). They are banked so that only one is visible at a time. The two stack pointers are as follows:

• Main Stack Pointer (MSP): The default stack pointer, used by the operating system (OS) kernel and exception handlers.
• Process Stack Pointer (PSP): Used by user application code.
R14 (the link register): When a subroutine is called, the return address is stored in the link register.
R15 (the program counter): The program counter is the current program address. This register can be written to control the program flow.
Special registers: The Cortex-M3 processor also has a number of special registers. They are as follows:
• Program Status Registers (PSRs)
• Interrupt Mask registers (PRIMASK, FAULTMASK, and BASEPRI)
• Control register (CONTROL)
These registers have special functions and can be accessed only by special instructions. They cannot be used for normal data processing.

enhanced determinism, improved code density, ease of use, lower cost solutions, wide choice of development tools.
These are the merits that make the ARM Cortex-M3 suitable for our porting purpose.
The Cortex-M3 processor is based on one profile of the v7 architecture, called ARMv7-M, an architecture specification for microcontroller products. Cortex-M3 supports only the Thumb-2 (and traditional Thumb) instruction set. Instead of using ARM instructions for some operations, as in traditional ARM processors, it uses the Thumb-2 instruction set for all operations.
The details of the ARMv7-M architecture are documented in The ARMv7-M Architecture Application Level Reference Manual. This document can be obtained via the ARM web site through a simple registration process. The ARMv7-M architecture contains the following key areas:
• Programmer’s model
• Instruction set
• Memory model
• Debug architecture
Processor-specific information, such as interface details and timing, is documented in the Cortex-M3 Technical Reference Manual (TRM). This manual can be accessed freely on the ARM website.

Cortex-M3 Processor Applications
With its high performance, high code density and small silicon footprint, the Cortex-M3 processor is ideal for a wide variety of applications, such as:

• Low-cost microcontrollers
• Automotive
• Data communications
• Industrial control
• Consumer products
There are already many Cortex-M3 processor-based products on the market, including low-end products priced as low as US$1, making the cost of ARM microcontrollers comparable to or lower than that of many 8-bit microcontrollers.
processor-based devices. They are easy to learn and use, yet powerful enough for the most demanding embedded applications.
In this project I will use the Keil IDE software µVision-5.

III. µCOS-II
Introduction: µCOS-II (pronounced "Micro C O S 2") stands for Micro-Controller Operating System Version 2. µCOS-II is upward compatible with µCOS (V1.11) but provides many improvements over µCOS, such as the addition of a fixed-sized memory manager, user definable callouts on task creation, task deletion, task switch and system tick, support for TCB extensions, stack checking and much more.
If you currently have an application (i.e. product) that runs with µCOS, your application should be able to run, virtually unchanged, with µCOS-II. All of the services (i.e. function calls) provided by µCOS have been preserved. You may, however, have to change include files and product build files to ‘point’ to the new file names. µCOS-II was developed and tested on a PC; µCOS-II was actually targeted for embedded systems and can easily be ported to many different processor architectures.
It is a very small real-time kernel: the memory footprint is about 20 KB for a kernel with full functions, and the source code is about 5400 lines, mostly in ANSI C. Source code for µCOS-II is free but not for commercial purposes. If you want to use it commercially, you have to obtain permission.

Selecting µCOS-II: The following features make µCOS-II suitable and convenient to port:
Portable
ROMable
Scalable
Preemptive
Multi-tasking
Deterministic
Task stacks
Services
Interrupt Management
Robust and reliable

PORTING OF µCOS-II
Adapting a real-time kernel to a microprocessor or a microcontroller is called a port. Most of µCOS-II is written in C for portability; however, it is still necessary to write some processor specific code in C and assembly language. Specifically, µCOS-II manipulates processor registers, which can only be done through assembly language.
Porting µCOS-II to different processors is not a very difficult task, because µCOS-II was designed to be portable.
If you are going to port µCOS-II to your processor, of course you need to know how µCOS-II’s processor specific code works.
A processor can run µCOS-II if it satisfies the following requirements:

1. You must have a C compiler for the processor and the C compiler must be able to produce reentrant code.
2. You must be able to disable and enable interrupts from C.
3. The processor must support interrupts and you need to provide an interrupt that occurs at regular intervals (typically between 10 and 100 Hz).
4. The processor must support a hardware stack, and the processor must be able to store a fair amount of data on the stack (possibly many Kbytes).
5. The processor must have instructions to load and store the stack pointer and other CPU registers either on the stack or in memory.
The ARM Cortex-M3 satisfies all the above requirements, so we can easily port µCOS-II to it.
Porting µCOS-II is actually quite straightforward once you understand the subtleties of the target processor and the C compiler you will be using.
If your processor and compiler satisfy µCOS-II’s requirements, and you have all the necessary tools, porting µCOS-II consists of the following:
1. Setting the value of 1 #define constant (OS_CPU.H)
2. Declaring 10 data types (OS_CPU.H)
3. Declaring 3 #define macros (OS_CPU.H)
4. Writing 6 simple functions in C (OS_CPU_C.C)
5. Writing 4 assembly language functions (OS_CPU_A.ASM)

You need not write all the source code yourself, but you should understand well how it works. The processor independent source code is easily available, so you can use it directly in the initial stage of porting. You need to work on the processor dependent code and your application code. You also have to add ‘INCLUDES.H’.
INCLUDES.H allows every .C file in your project to be written without concerns about which header file will actually be needed.
Depending on the processor, a port can consist of writing or changing between 50 and 300 lines of code.

Starting and Initializing µCOS-II

a. Starting µCOS-II: µCOS-II starts in the way shown in Fig. 2. First we initialize both the hardware and the software. Here the hardware I have used is the ARM Cortex-M3 and the software is the real time operating system µCOS-II. The resources are allocated for the tasks defined in the application, then the scheduler is started and it schedules the tasks in a pre-emptive manner.

b. Initialization of µCOS-II: The steps to initialize µCOS-II are shown in Fig. 3. We will follow the corresponding steps to initialize it.
The steps we will take to initialize µCOS-II through programming are shown below:

void main (void)
{

/* User initialization*/
OSInit ( ); /* kernel initialization */

/* Start OS*/
OSStart ( ); /* start multitasking */
}
OSStart ( );
}
AppStartTask ( ):-
static void AppStartTask (void *p_arg)
{
(void) p_arg;
BSP_Init ( );
OS_CPU_SysTickInit ( );
think. You should test your port without application code. In other words, test the operations of the kernel by itself. You can also test it by checking whether context switching is happening in the Register window of the KEIL IDE. There are two reasons to do this. First, you don’t want to complicate things any more than they need to be. Second, if something doesn’t work, you know that the problem lies in the port as opposed to your application. Start with a couple of simple tasks and only the ticker interrupt service routine. Once you get multitasking going, it’s quite simple to add your application tasks.

Buzzer (we can generate desired music or alarm), UART, Displaying LCD etc. We can also perform many projects, like home automation using Bluetooth and UART together, or a noticeboard display using Bluetooth, LCD and UART together. But it is enough to perform two or three tasks to test the porting of our µCOS-II. Thus we can port µCOS-II using the development tools, i.e. ARM Cortex-M3 and KEIL IDE.

VI. CONCLUSION
In this research paper the porting of a real time operating system, µCOS-II, on ARM Cortex-M3 using the software Keil µVision-5 is presented. It mainly concentrates on the development of an embedded monitoring system using ARM Cortex-M3 and a real time kernel. All the steps taken while porting µCOS-II and its implementation are provided in the paper. The paper gives a detailed overview that will help students to develop and design an embedded monitoring system using ARM Cortex-M3 and a real time operating system.
ACKNOWLEDGEMENT
I would like to acknowledge the Centre for Development and
Advance Computing (C-DAC), Hyderabad, Govt. of India
and their faculties to train and inspire for the work. I would
also like to place on record my sincere thanks and gratitude to
Shri Sanjay Kr. Vyas Scientist 'E' / Additional Director, HRD
Division, Dept. of Electronics and Information Technology
(DeitY), Ministry of Communication and IT, govt. of India for
his continuous guidance and support for my project work.
Chapter 11
11. Direct Memory Access (DMA)
The TI webpage gives some application notes, which explain the use
of the DMA controller for different applications, with the objective of
reducing power consumption:
System interrupts
DMA transfers are not interruptible by system interrupts, but system
interrupt service routines (ISRs) may be interrupted by DMA
transfers.
Only non-maskable interrupts (NMIs) can be configured to interrupt
the DMA controller, if the ENNMI bit is set. If it is not set, system
interrupts remain pending until the completion of the transfer.
Bit    15-12     11-8       7-4        3-0
Field  Reserved  DMA2TSELx  DMA1TSELx  DMA0TSELx

Bit    15-3  2           1           0
Field  0     DMAONFETCH  ROUNDROBIN  ENNMI
Bit Description
2 DMAONFETCH DMA on fetch:
DMAONFETCH = 0 DMA transfer occurs immediately
DMAONFETCH = 1 DMA transfer occurs on next instruction fetch after
the trigger
1 ROUNDROBIN Round robin:
ROUNDROBIN = 0 DMA channel priority is DMA0 − DMA1 − DMA2
ROUNDROBIN = 1 DMA channel priority changes with each transfer
0 ENNMI Enable NMI when ENNMI = 1, allowing a NMI interrupt to interrupt a DMA
transfer
Bit    15        14-12   11-10        9-8
Field  Reserved  DMADTx  DMADSTINCRx  DMASRCINCRx
Bit    7           6           5         4      3       2      1         0
Field  DMADSTBYTE  DMASRCBYTE  DMALEVEL  DMAEN  DMAIFG  DMAIE  DMAABORT  DMAREQ
Bit Description
14-12 DMADTx DMA transfer mode:
DMADT2 DMADT1 DMADT0 = 000 Single transfer
DMADT2 DMADT1 DMADT0 = 001 Block transfer
DMADT2 DMADT1 DMADT0 = 010 Burst-block transfer
DMADT2 DMADT1 DMADT0 = 011 Burst-block transfer
DMADT2 DMADT1 DMADT0 = 100 Repeated single transfer
DMADT2 DMADT1 DMADT0 = 101 Repeated block transfer
DMADT2 DMADT1 DMADT0 = 110 Repeated burst-block
transfer
DMADT2 DMADT1 DMADT0 = 111 Repeated burst-block
transfer
11-10 DMADSTINCRx DMA destination address increment/decrement after each byte
or word transfer:
When DMADSTBYTE = 1, the destination address increments /
decrements by one
When DMADSTBYTE = 0, the destination address increments /
decrements by two.
DMADSTINCR1 DMADSTINCR0 = 00 Address unchanged
DMADSTINCR1 DMADSTINCR0 = 01 Address unchanged
DMADSTINCR1 DMADSTINCR0 = 10 Address decremented
DMADSTINCR1 DMADSTINCR0 = 11 Address incremented
9-8 DMASRCINCRx DMA source address increment/decrement after each byte or
word transfer:
When DMASRCBYTE = 1, the source address
increments/decrements by one
When DMASRCBYTE = 0, the source address
increments/decrements by two.
DMASRCINCR1 DMASRCINCR0 = 00 Address unchanged
DMASRCINCR1 DMASRCINCR0 = 01 Address unchanged
DMASRCINCR1 DMASRCINCR0 = 10 Address decremented
DMASRCINCR1 DMASRCINCR0 = 11 Address incremented
7 DMADSTBYTE DMA destination length (byte or word):
DMADSTBYTE = 0 Word
DMADSTBYTE = 1 Byte
6 DMASRCBYTE DMA source length (byte or word):
DMASRCBYTE = 0 Word
DMASRCBYTE = 1 Byte
5 DMALEVEL DMA level:
DMALEVEL = 0 Edge sensitive trigger (rising edge)
DMALEVEL = 1 Level sensitive trigger (high level)
4 DMAEN DMA enable when DMAEN = 1
3 DMAIFG DMA interrupt flag DMAIFG = 1 when interrupt pending
2 DMAIE DMA interrupt enable when DMAIE = 1
1 DMAABORT DMA Abort DMAABORT = 1 when a DMA transfer is interrupted
by NMI
0 DMAREQ DMA request DMAREQ = 1 starts DMA
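The bit descriptions above can be turned into a small packing routine. The following is a minimal sketch in C, assuming only the bit positions listed in the table; the macro names with the `_SHIFT` and `_BIT` suffixes and the two helper functions are hypothetical and are not taken from any TI header.

```c
#include <stdint.h>

/* Field positions taken from the channel control register table above.
   All names here are illustrative, not TI header definitions. */
#define DMADT_SHIFT       12u        /* bits 14-12: transfer mode */
#define DMADSTINCR_SHIFT  10u        /* bits 11-10: destination increment */
#define DMASRCINCR_SHIFT   8u        /* bits 9-8: source increment */
#define DMADSTBYTE_BIT  (1u << 7)    /* 1 = byte destination */
#define DMASRCBYTE_BIT  (1u << 6)    /* 1 = byte source */
#define DMALEVEL_BIT    (1u << 5)    /* 1 = level sensitive trigger */
#define DMAEN_BIT       (1u << 4)    /* 1 = channel enabled */
#define DMAIE_BIT       (1u << 2)    /* 1 = interrupt enabled */

/* Hypothetical helper: pack the multi-bit fields and single-bit flags. */
static uint16_t dma_ctl_encode(unsigned dt, unsigned dstincr,
                               unsigned srcincr, uint16_t flags)
{
    return (uint16_t)(((dt & 0x7u) << DMADT_SHIFT) |
                      ((dstincr & 0x3u) << DMADSTINCR_SHIFT) |
                      ((srcincr & 0x3u) << DMASRCINCR_SHIFT) |
                      flags);
}

/* Hypothetical helper: read the transfer mode back out of a control word. */
static unsigned dma_ctl_mode(uint16_t ctl)
{
    return (ctl >> DMADT_SHIFT) & 0x7u;
}
```

For example, a block transfer (mode 001) of bytes with both addresses incremented, the channel enabled and its interrupt enabled would be built as `dma_ctl_encode(1, 3, 3, DMADSTBYTE_BIT | DMASRCBYTE_BIT | DMAEN_BIT | DMAIE_BIT)`.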
RS232, SPI, I2C
Communications
• The simplest is parallel
– One way
[Diagram: μC drives multiple (typically 8) data lines to the peripheral, with “Latch” and “CS” signals]
• There may be a mechanism for the peripheral to get the attention of the μC (i.e., interrupt, or poll)
– Two way
[Diagram: data lines between μC and peripheral, with “Latch”, “CS” and “R/~W” signals]
• This is resource expensive (pins, real-estate…) in terms of hardware, but easy to implement
Serial Communications
• Many fewer lines are required to transmit data. This requires fewer pins, but adds complexity.
[Diagram: μC connected to the peripheral by Data, Clock and “CS” lines]
• Synchronous communications requires a clock. Whoever controls the clock controls communication speed.
• Asynchronous has no clock, but speed must be agreed upon beforehand (baud rate).
Asynchronous Serial (RS-232)
• Commonly used for one-to-one communication.
• There are many variants; the simplest uses just two lines, TX (transmit) and RX (receive).
• Transmission process (9600 baud, 1 bit = 1/9600 s = 0.104 ms)
– Transmit idles high (when no communication).
– It goes low for 1 bit (0.104 ms); this is the start bit.
– It sends out data, LSB first (7 or 8 bits)
– There may be a parity bit (even or odd – error detection)
– There is a stop bit (or two), during which the line returns high
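The transmission process listed above can be sketched as a routine that produces the sequence of line levels for one character. This is only an illustration in C: the function names, the `uart_parity` enum and the 12-entry buffer convention are assumptions made for the example, not part of any UART driver.

```c
#include <stdint.h>
#include <stddef.h>

enum uart_parity { PARITY_NONE, PARITY_EVEN, PARITY_ODD };

/* Writes the line levels (0 or 1) of one frame into bits[] and returns
   how many were written. bits[] must hold at least 12 entries
   (1 start + 8 data + 1 parity + 2 stop). */
static size_t uart_frame(uint8_t data, unsigned data_bits,
                         enum uart_parity parity, unsigned stop_bits,
                         uint8_t bits[])
{
    size_t n = 0;
    unsigned ones = 0;
    unsigned i;

    bits[n++] = 0;                                  /* start bit: line goes low */
    for (i = 0; i < data_bits; i++) {
        uint8_t b = (uint8_t)((data >> i) & 1u);    /* LSB first */
        ones += b;
        bits[n++] = b;
    }
    if (parity != PARITY_NONE) {
        uint8_t p = (uint8_t)(ones & 1u);           /* even: total count of 1s becomes even */
        if (parity == PARITY_ODD)
            p = (uint8_t)(p ^ 1u);
        bits[n++] = p;
    }
    for (i = 0; i < stop_bits; i++)
        bits[n++] = 1;                              /* stop bit(s): back to idle high */
    return n;
}

/* Duration of one bit in microseconds, rounded to the nearest integer. */
static unsigned long uart_bit_time_us(unsigned long baud)
{
    return (1000000UL + baud / 2) / baud;
}
```

With 8 data bits, no parity and one stop bit, the character 0x41 becomes ten line levels: a low start bit, the data bits 1 0 0 0 0 0 1 0 (LSB first) and a high stop bit. At 9600 baud each level lasts about 104 µs.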
RS232 Voltage levels
• From processor side, 0 V = logic 0, 3.3 V = logic 1
• In a “serial” cable, +12 → +3 V = logic 0, −3 → −12 V = logic 1
• On “Experimenter’s board”
• Physical connector
RS232 – Handshaking
• Some RS232 connections use handshaking lines between DCE (Data Communications Equipment) and DTE (Data Terminal Equipment).
– RTS (Ready To Send)
• Sent by the DTE to signal the DCE that it is Ready To Send.
– CTS (Clear To Send)
• Sent by the DCE to signal the DTE that it is Ready to Receive.
– DTR (Data Terminal Ready)
• Sent by the DTE to signal the DCE that it is ready to connect.
– DSR (Data Set Ready)
• Sent by the DCE to signal the DTE that it is ready to connect.
• In practice, if these handshaking lines are used it can be difficult to set up the serial communications, but it is quite robust once working.
• There is also software handshaking (XON/XOFF).
• DTE and DCE have different connector pinouts.
MSP430 USCI in UART mode
(also USART peripheral)
UART mode features include:
• 7- or 8-bit data; odd, even, or no parity
• Independent transmit and receive
• LSB-first or MSB-first data
• Receiver start-edge detection for auto-wake up from LPMx modes
• Independent interrupt capability for receive and transmit
• Status flags for error detection and suppression
• Built-in idle-line and address-bit communication protocols for multiprocessor systems
• Status flags for address detection
UART code
// Echo a received character, RX ISR used. Normal mode is LPM3,
// USCI_A0 RX interrupt triggers TX Echo.
// ACLK = BRCLK = LFXT1 = 32768, MCLK = SMCLK = DCO~1048k
// Baud divider, 32768hz XTAL @9600 = 32768/9600 = 3.41 (0003h 03h)
//
//             MSP430xG461x
//          -----------------
//      /|\|              XIN|-
//       | |                 | 32kHz
//       --|RST          XOUT|-
//         |                 |
//         |     P4.7/UCA0RXD|------------>
//         |                 | 9600 - 8N1
//         |     P4.6/UCA0TXD|<------------

#include "msp430xG46x.h"

void main(void)
{
volatile unsigned int i;
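
The baud divider arithmetic in the header comment of this listing (32768/9600 = 3.41, noted as 0003h 03h) can be reproduced on a PC. The sketch below is an assumption about how those two numbers arise: the integer part of the ratio becomes the divider, and the fractional part, multiplied by 8 and rounded, becomes the modulation setting. The structure and function names are hypothetical and no MSP430 register is touched.

```c
#include <stdint.h>

struct uart_divider {
    uint16_t br;    /* integer part of clock / baud */
    uint8_t  brs;   /* modulation setting, round(fraction * 8) */
};

/* Assumed rule: divider = floor(clk / baud), modulation = round(frac * 8). */
static struct uart_divider uart_divider_for(uint32_t clk_hz, uint32_t baud)
{
    struct uart_divider d;
    uint32_t whole = clk_hz / baud;
    uint32_t rem   = clk_hz % baud;
    uint32_t brs   = (rem * 8u + baud / 2u) / baud;   /* nearest eighth */

    if (brs == 8u) {        /* fraction rounded up to a whole divider step */
        whole += 1u;
        brs = 0u;
    }
    d.br  = (uint16_t)whole;
    d.brs = (uint8_t)brs;
    return d;
}
```

For the 32768 Hz crystal at 9600 baud this gives a divider of 3 and a modulation value of 3, which agrees with the values in the comment.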
Figure 1. The CAN Protocol Specification and the OSI model.
[Layers shown: Application Layer; ISO Data Link (Layer 2): Logical Link Control (LLC), Media Access Control (MAC); ISO Physical (Layer 1): Physical Layer Signaling (PLS), Medium Attachment Unit (MAU). The CAN Protocol Specification bracket spans part of this stack.]
CAN specifies the medium access control (MAC) and physical layer
signaling (PLS) as it applies to layers 1 and 2 of the OSI model.
Medium access control is accomplished using a technique called
non-destructive bit-wise arbitration. As stations apply their unique
identifier to the network, they observe if their data are being
faithfully produced. If it is not, the station assumes that a higher
priority message is being sent and, therefore, halts transmission and
reverts to receiving mode. The highest priority message gets
through and the lower priority messages are resent at another time.
The advantage of this approach is that collisions on the network do
not destroy data and eventually all stations gain access to the
network. The problem with this approach is that the arbitration is
done on a bit by bit basis requiring all stations to hear one another
within a bit-time (actually less than a bit-time). At a 500 kbps bit-
rate, a bit-time is 2000 ns which does not allow much time for
transceiver and cable delays. The result is that CAN networks are
usually quite short and frequently less than 100 meters at higher
speeds. To increase this distance either the data rate is decreased
or additional equipment is required.
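
Non-destructive bit-wise arbitration can be illustrated with a short simulation. The sketch below is in C and rests on the usual CAN conventions, which the text above does not spell out: identifiers are sent most significant bit first, and a 0 bit is dominant, so the bus reads 0 whenever any station drives 0. The function name and the 32-station limit are choices made for this example only.

```c
#include <stdint.h>
#include <stddef.h>

#define CAN_ID_BITS 11

/* Returns the index of the station that wins arbitration.
   Assumes 1 <= count <= 32 and unique 11-bit identifiers. */
static size_t can_arbitrate(const uint16_t ids[], size_t count)
{
    uint8_t active[32];                  /* 1 while a station is still transmitting */
    size_t i, winner = 0;
    int bit;

    for (i = 0; i < count; i++)
        active[i] = 1;

    for (bit = CAN_ID_BITS - 1; bit >= 0; bit--) {        /* MSB first */
        unsigned bus = 1;                /* recessive unless someone drives 0 */
        for (i = 0; i < count; i++)
            if (active[i])
                bus &= (unsigned)(ids[i] >> bit) & 1u;
        for (i = 0; i < count; i++)      /* sent 1 but read back 0: stop and listen */
            if (active[i] && (((unsigned)(ids[i] >> bit) & 1u) != bus))
                active[i] = 0;
    }
    for (i = 0; i < count; i++)
        if (active[i]) { winner = i; break; }
    return winner;
}
```

With the identifiers 0x65, 0x64 and 0x123 competing, the station sending 0x64 keeps the bus: no bit it sends is ever overwritten, so its frame is not damaged, while the other two drop out and retry later. The lowest identifier therefore has the highest priority.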
www.ccontrols.com
CAN DATA LINK LAYER
CAN transmissions operate using the producer/consumer model. When data are transmitted by a CAN device, no other devices are addressed. Instead, the content of the message is designated by an identifier field. This identifier field, which must be unique within the network, not only provides content but the priority of the message as well. All other CAN devices listen to the sender and accept only those messages of interest. This filtering of the data is accomplished using an acceptance filter which is an integral component of the CAN controller chip. Data which fail the acceptance criteria are rejected. Therefore, receiving devices consume only that data of interest from the producer.
SOF | 11 bit IDENTIFIER | RTR | IDE | r0 | DLC | 0-8 Bytes | 15 bit CRC
PROPAGATION DELAY
In a Philips’ application note [2], the author does an in-depth study on the maximum allowable propagation delay as a function of various controller chip parameters. The propagation delay (Figure 3) is given by

$$t_p = 2\,(t_{sd} + t_{tx} + t_{rx} + t_{cbl})$$

All delays are constant except the cable delay ($t_{cbl}$), which depends upon the length of the cable and the propagation delay factor of the cable ($P_c$), so that $t_{cbl} = L \cdot P_c$. The author provides a chart of maximum allowable propagation delays ($t_{pm}$) for various data rates and CAN chip timing parameters. The actual propagation delay must not exceed the maximum allowable propagation delay. By making the appropriate substitutions, we can determine the maximum allowable cable length ($L$).

$$L < \frac{\tfrac{1}{2}\,t_{pm} - t_{sd} - t_{tx} - t_{rx}}{P_c}$$

Using appendix A.1 of the application note and the most favorable parameters for long distance, at 500 kbps, $t_{pm}$ equals 1626 ns. Assuming transceiver delays of 100 ns each, chip delay of 62.5 ns and a cable propagation factor of 5.5 ns/m, the maximum cable length is 100 meters, which is the value used in the DeviceNet specification. Doing the same calculation at 250 kbps yields 248 meters and at 100 kbps, 680 meters. These values can be improved with better cable and faster transceivers.

Figure 3. Use the longest path when calculating propagation delay. [Diagram labels: tsd, tcbl]
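
The 500 kbps cable length quoted in this section can be checked numerically. The sketch below is in C; the function name is hypothetical, and the mapping of the quoted numbers onto the symbols (62.5 ns chip delay as tsd, 100 ns each for ttx and trx) is an assumption, since the text does not define each symbol individually.

```c
/* Maximum cable length in meters:
   L < (tpm / 2 - tsd - ttx - trx) / Pc
   All times in ns, Pc in ns per meter. */
static double can_max_cable_m(double tpm_ns, double tsd_ns,
                              double ttx_ns, double trx_ns,
                              double pc_ns_per_m)
{
    return (tpm_ns / 2.0 - tsd_ns - ttx_ns - trx_ns) / pc_ns_per_m;
}
```

Calling `can_max_cable_m(1626.0, 62.5, 100.0, 100.0, 5.5)` evaluates (813 − 262.5) / 5.5, which is just over 100 m and matches the figure given for 500 kbps. The 250 kbps and 100 kbps results depend on chart values of tpm that are not reproduced here.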
I2C Tutorial
Posted By Umang Gajera. Posted date: April 05, 2017 in: Embedded
In this tutorial we will go through the I2C bus and protocol. I2C was originally invented by Philips (now NXP) in 1982 as a bi-directional bus to communicate with multiple devices using just 2 wires/lines. I2C stands for Inter-Integrated Circuit. I2C is sometimes also referred to as TWI, which is short for Two Wire Interface, since it uses only 2 wires for data transmission and synchronization. I2C is pronounced and referred to as “I-Squared-C” [I2C], “I-Two-C” [I2C] and “I-I-C” [IIC]. The two wires of the I2C bus are:
1. Data Line called SDA which is short for Serial Data
2. Clock Line called SCL which is short for Serial Clock
SDA is the wire on which the actual data transfer happens, which is bi-directional, between different
masters and slaves. SCL is the wire on which the Master device generates a clock for slave device(s).
I2C supports 7 bit and 10 bit addresses for each device connected to the bus. 10 bit addressing was introduced later. With a 7 bit address it's possible to connect up to 128 I2C devices to the same bus; however, some addresses are reserved, so practically only 112 devices can be connected at the same time. With a 10 bit address a maximum of 1024 devices can be connected. To keep things simple we will be going through 7 bit addressing in this tutorial. For 10 bit addressing you can look up the official I2C specification by NXP, a link to which is given at the bottom of this tutorial. Once you get familiar with the I2C protocol, 10 bit addressing will be a piece of cake.
As per the original specification of I2C/TWI, it supports a maximum frequency of 100 kHz. Over the years the specification was updated many times, and now we have a bunch of different speed modes. The latest mode added was Ultra-Fast Mode, which allows I2C bus transfer speeds of up to 5 MHz.
To achieve high transfer speeds, Ultra-Fast Mode uses push-pull drivers instead of open-drain drivers, which eliminates the need for pull-up resistors. Ultra-Fast Mode is unidirectional only and uses the same bus protocol, but is not compatible with bi-directional I2C devices.
Even though multiple masters may be present on the I2C bus, arbitration is handled in such a way that there is no corruption of data on the bus when 2 or more masters try to transmit data at the same time. Since transmission, synchronization and arbitration are all done using only 2 wires, the communication protocol might be a bit hard to understand for beginners .. but it's actually easy to understand – just stick with me 🙂
Let us go through I2C protocol basics first. The I2C bus is a byte-oriented bus: only one byte can be transferred at a time. Communication (write to and read from) is always initiated by a Master. The Master first sends a START condition and then writes the Slave Address (SLA) and the Direction bit (Read=1/Write=0) on the bus, and the corresponding Slave responds accordingly.
Depending on the Direction bit, 2 types of transfers are possible on the I2C bus:
Master writes to Slave (Direction bit = 0):
1. In this case, after sending the START condition, the Master sends the First Byte which contains the Slave address + Write bit.
2. The corresponding Slave acknowledges it by sending back an Acknowledge (ACK) bit to the Master.
3. Next, the Master sends 1 or more bytes to the Slave. After each byte received, the Slave sends back an Acknowledge bit (ACK).
4. When the Master wants to stop writing, it sends a STOP condition.
Master reads from Slave (Direction bit = 1):
1. Here the Master sends the First Byte which contains the Slave address + Read bit.
2. The corresponding Slave acknowledges it by sending back an Acknowledge (ACK) bit to the Master.
3. Next, the Slave sends 1 or more bytes and the Master acknowledges each one by sending an Acknowledge bit (ACK).
4. When the Master wants to stop reading, it sends a Not Acknowledge bit (NACK) followed by a STOP condition.
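The First Byte used in both transfer types packs the 7 bit Slave address together with the Direction bit. A minimal sketch in C follows, assuming the standard layout with the address in bits 7..1 and the Direction bit in bit 0; the enum and function names are made up for this example.

```c
#include <stdint.h>

enum i2c_dir { I2C_WRITE = 0, I2C_READ = 1 };

/* First byte after START: 7 bit slave address in bits 7..1, direction in bit 0. */
static uint8_t i2c_first_byte(uint8_t addr7, enum i2c_dir dir)
{
    return (uint8_t)(((addr7 & 0x7Fu) << 1) | (unsigned)dir);
}

/* Recover the 7 bit address from a first byte. */
static uint8_t i2c_addr_of(uint8_t first_byte)
{
    return (uint8_t)(first_byte >> 1);
}

/* Recover the direction from a first byte. */
static enum i2c_dir i2c_dir_of(uint8_t first_byte)
{
    return (first_byte & 1u) ? I2C_READ : I2C_WRITE;
}
```

For a Slave at address 0x50, the Master sends 0xA0 to start a write and 0xA1 to start a read.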
Repeated Start
A Repeated Start condition is similar to a Start condition, except it is sent in place of a Stop when the master does not want to lose control over the bus and wants to complete its transfers in an atomic manner when multiple masters are present. When a master wants to switch from Master Transmitter mode to Master Receiver mode or vice-versa, it sends a Repeated Start at the end of the current transfer so it remains master when the next transfer starts.
Generating the clock pulses, STOP and START is the responsibility of the Master. When the Master wants to change the transfer mode (i.e. Read/Write) it sends a Repeated START condition instead of a STOP condition. A transfer typically ends with a STOP or Repeated START condition.
SDA & SCL Voltage levels for different Voltage devices on same bus
In many cases (but not all!), I2C allows devices with different signal voltage levels to be connected to the same bus. Examples are interfacing a 5V I2C Slave device with a 3.3V microcontroller like lpc1768 or lpc2148, or interfacing a 3.3V I2C Slave device with a 5V microcontroller like Arduino. In such cases we connect the pull-up resistors to the lower of the Vcc/Vdd supplies. In the mentioned examples it would be 3.3V in both cases since it's the lower one. As per the I2C specification, input reference levels are set as 30 % and 70 % of Vcc. Hence, VIL (LOW-level input voltage) is 0.3Vcc and VIH (HIGH-level input voltage) is 0.7Vcc. If these thresholds are met when using two or more devices with different voltages, you are good to go by connecting the pull-ups to the lowest Vcc; otherwise you will need a line buffer/driver which provides level-shifting between the different voltage level devices based on CMOS, NMOS, TTL, etc.
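
The threshold test described above is simple arithmetic. The sketch below, in C, applies only the generic 0.7 × Vcc rule from this section; the function name is hypothetical, and a real part's datasheet may specify different input thresholds, in which case those values should be used instead.

```c
#include <stddef.h>

/* Returns 1 if a bus pulled up to the lowest supply still reaches the
   0.7 * Vcc HIGH threshold of every device, else 0.
   Assumes count >= 1 and supplies given in volts. */
static int i2c_pullup_to_lowest_ok(const double vcc[], size_t count)
{
    double lowest = vcc[0];
    size_t i;

    for (i = 1; i < count; i++)
        if (vcc[i] < lowest)
            lowest = vcc[i];

    for (i = 0; i < count; i++)
        if (lowest < 0.7 * vcc[i])
            return 0;           /* HIGH level would not be recognised reliably */
    return 1;
}
```

With supplies of 3.3 V and 3.6 V the check passes, because 0.7 × 3.6 V = 2.52 V is below the 3.3 V pull-up level. With 3.3 V and 5 V the strict rule gives 0.7 × 5 V = 3.5 V, which is above 3.3 V, so under this rule alone the 5 V device's actual VIH has to be confirmed in its datasheet or a level shifter used.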
Here the buffer is used to receive (input) data and the MOSFET is used to transmit (output) data. Drivers for both SDA and SCL are similar. When the MOSFET is activated it sinks the current from the pull-up resistors, which forces the pin to a logic LOW. Note that it cannot drive the line HIGH by itself. To provide a logic HIGH state when the output driver is not trying to pull the line LOW, we use pull-up resistors. With pull-ups, the logic state of the SDA and SCL signals on the I2C bus is always defined and never floating (digitally). Hence, when no transfers are occurring and the bus is idle, SDA and SCL are continuously pulled to logic HIGH.
The typical range for the pull-up resistor value in Standard mode (Sm), i.e. 100 kHz, is between 5 kΩ and 10 kΩ, while in Fast Mode (Fm), i.e. 400 kHz, it is between 2 kΩ and 5 kΩ. For High Speed mode (Hs-mode), i.e. 3.4 MHz, it's around 1 kΩ. Be sure to check your part manufacturer’s datasheet for more.
Clock Stretching
Clock Stretching is a mechanism for slave devices to make the master wait until data is ready or slave
device has to finish some internal operations (like: ADC conversion, Initial internal Write cycle, etc..) before
proceeding further. In Clock Stretching the SCL line is held low by the slave which pauses the current
transfer.
Acknowledge Polling
In practice, many Slave devices do not support clock stretching. Consider the 24c16, at24c32, 24lc256, etc. series of EEPROMs. These devices do not support clock stretching even though they have to perform an internal byte write or page write operation when the master does a write operation. In this case the master has to initiate an Acknowledge Polling (for EEPROMs it's also called Write Polling) which checks if the
Reference(s):
I2C Official Specification by NXP
Serial peripheral interface (SPI) is one of the most widely used interfaces between microcontroller and peripheral ICs such as sensors, ADCs, DACs, shift registers, SRAM, and others. This article provides a brief description of the SPI interface followed by an introduction to Analog Devices’ SPI enabled switches and muxes, and how they help reduce the number of digital GPIOs in system board design.

SPI is a synchronous, full duplex master-slave-based interface. The data from the master or the slave is synchronized on the rising or falling clock edge. Both master and slave can transmit data at the same time. The SPI interface can be either 3-wire or 4-wire. This article focuses on the popular 4-wire SPI interface.

Interface
The device that generates the clock signal is called the master. Data transmitted between the master and the slave is synchronized to the clock generated by the master. SPI devices support much higher clock frequencies compared to I2C interfaces. Users should consult the product data sheet for the clock frequency specification of the SPI interface.

SPI interfaces can have only one master and can have one or multiple slaves. Figure 1 shows the SPI connection between the master and the slave.

The chip select signal from the master is used to select the slave. This is normally an active low signal and is pulled high to disconnect the slave from the SPI bus. When multiple slaves are used, an individual chip select signal for each slave is required from the master. In this article, the chip select signal is always an active low signal.

MOSI and MISO are the data lines. MOSI transmits data from the master to the slave and MISO transmits data from the slave to the master.

Data Transmission
To begin SPI communication, the master must send the clock signal and select the slave by enabling the CS signal. Usually chip select is an active low signal; hence, the master must send a logic 0 on this signal to select the slave. SPI is a full-duplex interface; both master and slave can send data at the same time via the MOSI and MISO lines respectively. During SPI communication, the data is simultaneously transmitted (shifted out serially onto the MOSI/SDO bus) and received (the data on the bus (MISO/SDI) is sampled or read in). The serial clock edge synchronizes the shifting and sampling of the data. The SPI interface provides the user with flexibility to select the rising or falling edge of the clock to sample and/or shift the data. Please refer to the device data sheet to determine the number of data bits transmitted using the SPI interface.

SPI Mode | CPOL | CPHA | CLK idle state | Sampling and shifting edges
0 | 0 | 0 | Logic low | Data sampled on the rising edge and shifted out on the falling edge
1 | 0 | 1 | Logic low | Data sampled on the falling edge and shifted out on the rising edge
2 | 1 | 0 | Logic high | Data sampled on the falling edge and shifted out on the rising edge
3 | 1 | 1 | Logic high | Data sampled on the rising edge and shifted out on the falling edge

Figure 2 through Figure 5 show an example of communication in four SPI modes. In these examples, the data is shown on the MOSI and MISO line. The start and end of transmission is indicated by the dotted green line, the sampling edge is indicated in orange, and the shifting edge is indicated in blue. Please note these figures are for illustration purpose only. For successful SPI communications, users must refer to the product data sheet and ensure that the timing specifications for the part are met.

Figure 2. SPI Mode 0, CPOL = 0, CPHA = 0: CLK idle state = low, data sampled on rising edge and shifted on falling edge.
Figure 3. SPI Mode 1, CPOL = 0, CPHA = 1: CLK idle state = low, data sampled on the falling edge and shifted on the rising edge.
Figure 4. SPI Mode 2, CPOL = 1, CPHA = 0: CLK idle state = high, data sampled on the falling edge and shifted on the rising edge.
Figure 5. SPI Mode 3, CPOL = 1, CPHA = 1: CLK idle state = high, data sampled on the rising edge and shifted on the falling edge.
[Timing diagrams: each figure shows nCS, CLK, MOSI carrying 0xA5 (1 0 1 0 0 1 0 1) and MISO carrying 0xBA (1 0 1 1 1 0 1 0); MOSI is don't-care (xxxx) and MISO is Hi-Z outside the transfer.]

Figure 3 shows the timing diagram for SPI Mode 1. In this mode, clock polarity is 0, which indicates that the idle state of the clock signal is low. The clock phase in this mode is 1, which indicates that the data is sampled on the falling edge (shown by the orange dotted line) and the data is shifted on the rising edge (shown by the dotted blue line) of the clock signal.

Figure 4 shows the timing diagram for SPI Mode 2. In this mode, the clock polarity is 1, which indicates that the idle state of the clock signal is high. The clock phase in this mode is 0, which indicates that the data is sampled on the first clock edge after idle, here the falling edge (shown by the orange dotted line), and the data is shifted on the rising edge (shown by the dotted blue line) of the clock signal.

Figure 5 shows the timing diagram for SPI Mode 3. In this mode, the clock polarity is 1, which indicates that the idle state of the clock signal is high. The clock phase in this mode is 1, which indicates that the data is sampled on the second clock edge after idle, here the rising edge (shown by the orange dotted line), and the data is shifted on the falling edge (shown by the dotted blue line) of the clock signal.

Multislave Configuration

Multiple slaves can be used with a single SPI master. The slaves can be connected in regular mode or daisy-chain mode.
[Figure 6, diagram not reproduced: the master drives a separate chip select (CS1, ...) to each slave; SCLK, MOSI, and MISO are shared.]

Regular SPI Mode:
In regular mode, an individual chip select for each slave is required from the master. Once the chip select signal is enabled (pulled low) by the master, the clock and data on the MOSI/MISO lines are available for the selected slave. If multiple chip select signals are enabled, the data on the MISO line is corrupted, as there is no way for the master to identify which slave is transmitting the data.

As can be seen from Figure 6, as the number of slaves increases, the number of chip select lines from the master increases. This can quickly add to the number of inputs and outputs needed from the master and limit the number of slaves that can be used. There are different techniques that can be used to increase the number of slaves in regular mode; for example, using a mux to generate a chip select signal.

In daisy-chain mode, the slaves are configured such that the chip select signal for all slaves is tied together and data propagates from one slave to the next. In this configuration, all slaves receive the same SPI clock at the same time. The data from the master is directly connected to the first slave and that slave provides data to the next slave and so on.

In this method, as data is propagated from one slave to the next, the number of clock cycles required to transmit data is proportional to the slave position in the daisy chain. For example, in Figure 7, in an 8-bit system, 24 clock pulses are required for the data to be available on the 3rd slave, compared to only eight clock pulses in regular SPI mode. Figure 8 shows the clock cycles and data propagating through the daisy chain. Daisy-chain mode is not necessarily supported by all SPI devices. Please refer to the product data sheet to confirm if daisy chain is available.
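To make the 24-clock figure concrete, the daisy chain can be modeled as a series of 8-bit shift registers, where the bit leaving one slave's SDO enters the next slave's SDI on the same clock. This is only a behavioral sketch under that assumption; the function name `daisy_chain_shift` and the third data byte (0x3C) are illustrative and not taken from the article.

```python
def daisy_chain_shift(tx_bytes, n_slaves, width=8):
    """Clock tx_bytes (MSB first) through a chain of shift registers.

    Returns (register contents per slave, number of clock pulses).
    """
    regs = [0] * n_slaves
    mask = (1 << width) - 1
    clocks = 0
    for byte in tx_bytes:
        for bit_pos in range(width - 1, -1, -1):
            carry = (byte >> bit_pos) & 1          # bit driven on MOSI
            for i in range(n_slaves):
                out = (regs[i] >> (width - 1)) & 1  # MSB leaves on this slave's SDO
                regs[i] = ((regs[i] << 1) | carry) & mask
                carry = out                         # becomes the next slave's SDI
            clocks += 1
    return regs, clocks


if __name__ == "__main__":
    regs, clocks = daisy_chain_shift([0xA5, 0x5A, 0x3C], n_slaves=3)
    print([hex(r) for r in regs], clocks)
```

In this model the first byte sent ends up in the last slave, so the master must send the data for the farthest slave first.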
Figure 7. Multislave SPI daisy-chain configuration.

Figure 8. Daisy-chain configuration: data propagation.

Analog Devices SPI Enabled Switches and Muxes

The newest generation of ADI SPI enabled switches offers significant space saving without compromising precision switch performance. This section of the article discusses a case study of how SPI enabled switches or muxes can significantly simplify the system-level design and reduce the number of GPIOs required.

The ADG1412 is a quad, single-pole, single-throw (SPST) switch, which requires four GPIOs connected to the control input of each switch. Figure 9 shows the connection between the microcontroller and one ADG1412.

Figure 10. In a multislave configuration, the number of GPIOs needed increases tremendously.

[Diagrams not reproduced; surviving labels: Microcontroller, GPIOs, SPI Master, Serial to Parallel Converter, ADGS1412.]
Piyu Dhaker
Piyu Dhaker [piyu.dhaker@analog.com] is an applications engineer in the
North America Central Applications Group of Analog Devices. She graduated
from San Jose State University in 2007 with a master’s degree in electrical
engineering. Piyu joined the North America Central Applications Group in June
2017. She also previously worked in the Automotive Power Train Group and
Power Management Group within ADI.
The “MIPS/MFLOPS” of DSPs is the speed of Multiply-Accumulate (MAC).
● DSPs are judged by whether they can keep the multipliers busy 100% of the time.
The "SPEC" of DSPs is 4 algorithms:
● Infinite Impulse Response (IIR) filters
● Finite Impulse Response (FIR) filters
● FFT, and
● convolvers
In DSPs, algorithms are king!
● Binary compatibility not an issue
Software is not (yet) king in DSPs.
● People still write in assembly language for a product to minimize the die area for ROM in the DSP chip.

Successful DSP architectures have two aspects:
● Key architectural and micro-architectural features that enabled product success in key parameters
  ● Speed
  ● Code density
  ● Low power
● Architectural and micro-architectural features that are artifacts of the era in which they were designed
• We will focus on the former!

DSPs dealing with numbers representing real world
=> Want “reals”/ fractions
DSPs dealing with numbers for addresses
=> Want integers
Support “fixed point” as well as integers
● Fixed point: sign bit S followed by the radix point, so –1 ≤ x < 1
● Integer: sign bit S with the radix point at the right end, so –2^(N–1) ≤ x < 2^(N–1)
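The fractional fixed-point format above (sign bit followed by the radix point) can be tried out in a few lines of Python. This is a sketch that assumes a 16-bit word with 15 fraction bits, often called Q15; the word size and the helper names `to_q15`, `from_q15`, and `q15_mul` are choices made for this example and do not come from the slides.

```python
Q = 15                    # fraction bits (assumed 16-bit word: 1 sign + 15 fraction)
SCALE = 1 << Q            # 32768


def to_q15(x: float) -> int:
    """Convert a real number in [-1, 1) to a 16-bit signed fraction."""
    if not -1.0 <= x < 1.0:
        raise ValueError("value outside the fractional range -1 <= x < 1")
    # Clamp so that values just below 1.0 do not round up to 32768.
    return max(-SCALE, min(SCALE - 1, round(x * SCALE)))


def from_q15(n: int) -> float:
    """Convert a 16-bit signed fraction back to a real number."""
    return n / SCALE


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 fractions; the Q30 product is shifted back to Q15."""
    return (a * b) >> Q


if __name__ == "__main__":
    half = to_q15(0.5)
    print(half, from_q15(q15_mul(half, half)))
```

Note that the sketch does not handle the one overflow case of a fractional multiply, (–1) × (–1), which is where saturation hardware comes in.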
Kurt Keutzer
Don’t want overflow or have to scale accumulator
Option 1: accumulator wider than product: “guard bits”
● Motorola DSP: 24b x 24b => 48b product, 56b Accumulator
Option 2: shift right and round product before adder
[Diagrams not reproduced: Multiplier, ALU, and Accumulator with guard bits (G); Multiplier, Shift, ALU, and Accumulator.]

DSPs are descended from analog: what should happen to output when “peg” an input?
(e.g., turn up volume control knob on stereo)
● Modulo Arithmetic???
Set to most positive (2^(N–1) – 1) or most negative value (–2^(N–1)): “saturation”
Many algorithms were developed in this model

DSP Processor
Specialized hardware performs all key arithmetic operations in 1 cycle.
Hardware support for managing numeric fidelity:
● Shifters
● Guard bits
● Saturation
General-Purpose Processor
Multiplies often take >1 cycle
Shifts often take >1 cycle
Other operations (e.g., saturation, rounding) typically take multiple cycles.
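The difference between modulo arithmetic and saturation, and the headroom that guard bits provide, can be checked with a short Python sketch. The 48-bit product and 56-bit accumulator widths come from the Motorola example above; the helper names `saturate`, `wrap`, and `fits` are hypothetical, and the code models only the arithmetic, not any specific DSP.

```python
def saturate(value: int, bits: int) -> int:
    """Clamp value to the N-bit two's-complement range [-2^(N-1), 2^(N-1) - 1]."""
    most_positive = (1 << (bits - 1)) - 1
    most_negative = -(1 << (bits - 1))
    return max(most_negative, min(most_positive, value))


def wrap(value: int, bits: int) -> int:
    """Modulo (wrap-around) arithmetic: keep only the low N bits as a signed number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def fits(value: int, bits: int) -> bool:
    """True if value is representable in N-bit two's complement."""
    return saturate(value, bits) == value


PRODUCT_BITS = 48                              # 24b x 24b product
ACC_BITS = 56                                  # accumulator width
guard_bits = ACC_BITS - PRODUCT_BITS
full_scale = (1 << (PRODUCT_BITS - 1)) - 1     # largest positive 48-bit value

if __name__ == "__main__":
    print(saturate(40000, 16), wrap(40000, 16))
    print(guard_bits, fits(256 * full_scale, ACC_BITS), fits(257 * full_scale, ACC_BITS))
```

With modulo arithmetic a too-large positive result turns negative, whereas saturation pegs it at the most positive value, which is closer to how an analog system clips. The eight guard bits let 2^8 = 256 full-scale products accumulate before the accumulator itself can overflow.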
[Slide: DSP Data Path: Multiplier (diagram not reproduced)]
[Slide: DSP Data Path: Rounding (diagram not reproduced)]
[Slide: 320C54x DSP Functional Block Diagram (diagram not reproduced)]
BENCHMARKS - FIR FILTER
FINITE-IMPULSE RESPONSE FILTER
[Diagram not reproduced: chain of Z^-1 delay elements with coefficients C1, C2, ..., C(N-1), CN.]
The critical hardware unit in a DSP is the multiplier - much of the architecture is organized around allowing use of the multiplier on every cycle
This means providing two operands on every cycle, through multiple data and address busses, multiple address units and local accumulator feedback

Mapping of the filter onto a DSP execution unit
[Diagram not reproduced: filter signal flow from Xn to Yn with numbered steps 1 to 6.]

DSP Memory
FIR Tap implies multiple memory accesses
DSPs want multiple data ports
Some DSPs have ad hoc techniques to reduce memory bandwidth demand
● Instruction repeat buffer: do 1 instruction 256 times
● Often disables interrupts, thereby increasing interrupt response time
Some recent DSPs have instruction caches
● Even then may allow programmer to “lock in” instructions into cache
● Option to turn cache into fast program memory
No DSPs have data caches
May have multiple data memories
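Why a FIR tap implies multiple memory accesses can be seen by writing the filter as a multiply-accumulate loop: each tap reads one coefficient and one delayed sample before the multiplier can do its work. The Python sketch below counts those reads; the function names `fir_output` and `fir_filter` and the example coefficients are illustrative assumptions, and the code models only the data traffic, not any particular DSP.

```python
def fir_output(coeffs, delay_line):
    """One output sample: sum of coeffs[k] * delay_line[k] via multiply-accumulate.

    Returns (accumulated result, number of data-memory reads).
    """
    acc = 0
    reads = 0
    for c, x in zip(coeffs, delay_line):
        acc += c * x      # one MAC per tap
        reads += 2        # one coefficient read plus one sample read
    return acc, reads


def fir_filter(coeffs, samples):
    """Run the filter over a sample stream, newest sample at delay_line[0]."""
    delay_line = [0] * len(coeffs)
    outputs = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]   # shift the delay line by one
        y, _ = fir_output(coeffs, delay_line)
        outputs.append(y)
    return outputs


if __name__ == "__main__":
    print(fir_filter([1, 2, 3], [1, 0, 0, 0]))   # impulse response
    print(fir_output([1, 2, 3], [1, 0, 0]))
```

To sustain one MAC per cycle, both operand reads must happen in the same cycle as the multiply, which is the motivation for multiple data ports and multiple data memories.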
[Slide: Conventional “Von Neumann” memory]
[Slide: Memory Architecture]
[Slide: Eg. 320C62x/67x DSP]
[Diagram labels, diagrams not reproduced: a Processor with a single Memory; a Processor with separate Program Memory and Data Memory.]
[Slide: HARVARD ARCHITECTURE in DSP]
[Slide: Eg. TMS320C3x MEMORY BLOCK DIAGRAM - Harvard Architecture]
[Slide: DSP Addressing]
DSP Addressing: FFT
FFTs start or end with data in weird butterfly order
0 (000) => 0 (000)
1 (001) => 4 (100)
2 (010) => 2 (010)
3 (011) => 6 (110)
4 (100) => 1 (001)
5 (101) => 5 (101)
6 (110) => 3 (011)
7 (111) => 7 (111)
What can do to avoid overhead of address checking instructions?
Have an optional “bit reverse” addressing mode for use with autoincrement addressing
Many DSPs have “bit reverse” addressing for radix-2 FFT

DSP Addressing: Buffers
DSPs dealing with continuous I/O
Often interact with an I/O buffer (delay lines)
To save memory, buffer often organized as circular buffer
What can do to avoid overhead of address checking instructions for circular buffer?
Option 1: Keep start register and end register per address register for use with autoincrement addressing, reset to start when reach end of buffer
Option 2: Keep a buffer length register, assuming buffer starts on aligned address, reset to start when reach end
Every DSP has “modulo” or “circular” addressing

Addressing
DSP Processor
•Dedicated address generation units
•Specialized addressing modes; e.g.:
● Autoincrement
● Modulo (circular)
● Bit-reversed (for FFT)
•Good immediate data support
General-Purpose Processor
•Often, no separate address generation unit
•General-purpose addressing modes
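The two circular-buffer options can be sketched as address-update functions in Python. These are illustrative models only: the function names are hypothetical, the step is fixed at one, and Option 2 additionally assumes a power-of-two buffer length so that the wrap reduces to a bit mask, which is a simplification not stated on the slide.

```python
def next_addr_start_end(addr: int, start: int, end: int) -> int:
    """Option 1: autoincrement, resetting to start after the end register is reached.

    'end' is taken to be the last valid address of the buffer.
    """
    return start if addr >= end else addr + 1


def next_addr_length(addr: int, length: int) -> int:
    """Option 2: autoincrement with a length register.

    Assumes length is a power of two and the buffer base is aligned to it,
    so the base is the address with its low bits cleared.
    """
    base = addr & ~(length - 1)
    return base | ((addr + 1) & (length - 1))


if __name__ == "__main__":
    a = b = 0x100
    for _ in range(6):
        print(hex(a), hex(b))
        a = next_addr_start_end(a, start=0x100, end=0x103)
        b = next_addr_length(b, length=4)
```

In both cases the wrap test is folded into the address update, so the inner loop needs no separate compare-and-branch instruction.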
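The index mapping in the FFT table is a reversal of the address bits, which can be reproduced with a small Python function. The name `bit_reverse` is chosen for this sketch; a DSP with bit-reversed addressing does the equivalent in its address generation unit rather than in software.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Reverse the low 'bits' bits of index (e.g. 001 -> 100 for bits = 3)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)   # move the lowest bit to the top
        index >>= 1
    return result


if __name__ == "__main__":
    for i in range(8):
        r = bit_reverse(i, 3)
        print(f"{i} ({i:03b}) => {r} ({r:03b})")
```

For an 8-point radix-2 FFT this reproduces the order listed in the table: 0, 4, 2, 6, 1, 5, 3, 7.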
[Slide: BIT REVERSED ADDRESSING (diagram not reproduced)]
[Slide: CIRCULAR BUFFERS]
[Slide: Address calculation unit for DSP]
[Slide: ADSP 2100: ZERO-OVERHEAD LOOP]
[Slide: Specialized Peripherals for DSPs]
[Slide: DSP Core Approach - 1995]
[Slide: TMS320C203/LC203 BLOCK DIAGRAM]