Hierarchical processor integrating core and advanced system peripherals
Cortex-M3 core
Harvard architecture
3-stage pipeline with branch speculation
Thumb®-2 and traditional Thumb
ALU with hardware divide and single-cycle multiply
Cortex-M3 Processor
Cortex-M3 core
Configurable interrupt controller
Bus matrix
Advanced debug components
Optional MPU & ETM (Not available in STM32F10x)
Cortex-M3 Processor Overview (1/2)
ARM v7M Architecture
Thumb-2 Instruction Set Architecture
Harvard architecture
Separate I & D buses allow parallel instruction fetching & data storage
Integrated Nested Vectored Interrupt Controller (NVIC) for low latency interrupt processing
Vector Table contains addresses, not instructions
Designed to be fully programmed in C
Even reset, interrupts and exceptions
Bus Arbiter
Bit Banding – Atomic Bit Manipulation
Write Buffer
Memory Interface (I&D) Plus System Interface & Private Peripheral Bus
Integrated System Timer (SysTick) for Real Time OS or other scheduled tasks
Cortex-M3 Processor Overview (2/2)
3-Stage Pipeline
Single Cycle Multiply
Operands     Result   Cycles
16b x 16b    32b      1
32b x 16b    32b      1
32b x 32b    64b      3-7*
*UMULL, SMULL, UMLAL, and SMLAL are interruptible and can also complete early depending on source values
Hardware Division
UDIV & SDIV (unsigned or signed divide)
Instruction takes between 2 and 12 cycles depending on dividend and divisor
The closer the dividend and divisor are in magnitude, the faster the instruction completes
Instruction is interruptible (abandoned/restarted)
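The multiply and divide instructions above are normally reached from plain C rather than assembler. The sketch below is illustrative only: the function names are hypothetical, and whether a given compiler actually emits UMULL or SDIV for these expressions depends on the toolchain and target options.

```c
#include <stdint.h>

/* Full 32b x 32b -> 64b unsigned product. Widening one operand before the
 * multiply keeps the upper 32 bits; a Cortex-M3 compiler would typically
 * map this onto a long multiply such as UMULL (assumption, not guaranteed). */
uint64_t mul_u32_wide(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Signed 32-bit divide, guarded against the two cases that are undefined
 * in C: division by zero and INT32_MIN / -1. `fallback` is returned then. */
int32_t div_s32_checked(int32_t num, int32_t den, int32_t fallback)
{
    if (den == 0 || (num == INT32_MIN && den == -1)) {
        return fallback;
    }
    return num / den;
}
```

The guard in `div_s32_checked` is a C-level precaution; it does not describe how the hardware divider itself treats those inputs.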
Cortex-M3 & ARM7: Comparison
                     ARM7TDMI-S                         Cortex-M3
Architecture         v4T                                v7M
DMIPS/MHz            0.74 Thumb / 0.93 ARM              1.25 Thumb-2
Pipeline             3-Stage                            3-Stage + Branch Speculation
Interrupts           -                                  Integrated NVIC Interrupt Controller, 1-255 Priorities
Interrupt Latency    24-42 Cycles (Depending on LSM)    12 Cycles (6 when Tail Chaining)
Memory Map           Undefined                          Architecture Defined
System Status        PSR. 6 modes. 20 Banked regs       xPSR. 2 modes. Stacked regs (1 bank)
Sleep Modes          No                                 Three
Additional Features of the Cortex-M3
Reduced pin debug & trace interfaces reduce pin overhead from 9 pins to 2 or 3 pins
Hardware Interrupt Handling removes need for assembler code in interrupts
Integrated atomic bit manipulation for improved data storage
Extended Data Watchpoints & Flash Patch technology
Embedded sleep control and power-down modes
Optional very small Memory Protection Unit (MPU) & Embedded Trace Macrocell (ETM)
High Performance CPU and Buses
ARM v7M Architecture: Harvard benefits with Von Neumann single memory space
Von Neumann "bottleneck"
♦ code execution
♦ data transfer (core/DMA)
♦ peripheral control
Three 32-bit buses for parallel
♦ code execution
♦ data transfer (core/DMA)
♦ peripheral control
[Figure: bus diagram; labels include Flash, Cortex-M3 and peripherals (PERIPH)]
Outstanding efficiency of 1.25 DMIPS/MHz and 1.2 CPI
[Figure: DMIPS versus f; labels include ARM966 (ARM) and ARM7TDMI (Thumb)]
Thumb-2 instruction set provides 32-bit performance with 16-bit code density
Thumb 16-bit Instruction Set
♦ Full Thumb compatibility
ARM 32-bit Instruction Subset
♦ Complete ARM instruction set for better performance
New 16/32-bit Instructions
♦ 1 cycle MAC and Hardware Divide
Thumb-2: single powerful instruction set
♦ No more mode switching
♦ Two 16-bit instruction fetches per Flash access
Cortex-M3 Memory Map
Vendor Specific (0.5GB): Set aside to enable vendors to implement peripheral compatibility with previous systems
Private Peripheral Bus (1MB): CoreSight, NVIC etc.
External Device (1GB): Intended for external devices and/or shared memory that needs ordering/non-buffered access
External RAM (1GB): Intended for off-chip memory
Peripheral (0.5GB): Intended for normal peripherals. The bottom 1MB of the peripheral address space (0x40000000 – 0x400FFFFF) is reserved for bit-band accesses. Accesses to the peripheral 32MB bit-band alias region (0x42000000 – 0x43FFFFFF) are remapped to this 1MB address space.
SRAM (0.5GB): Intended for on-chip SRAM. The bottom 1MB of the SRAM address space (0x20000000 – 0x200FFFFF) is reserved for bit-band accesses. Accesses to the SRAM 32MB bit-band alias region (0x22000000 – 0x23FFFFFF) are remapped to this 1MB address space.
Code (0.5GB): Reserved for code memory (flash, SRAM). This region is accessed via the Cortex-M3 ICode and DCode buses.
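The remapping between a bit-band region and its alias region can be sketched as an address calculation: each bit of the 1MB region gets its own 32-bit word in the 32MB alias region. The macro and function names below are illustrative, not taken from any vendor header; the base addresses are the ones listed in the memory map above.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BB_BASE     0x20000000u  /* start of SRAM bit-band region       */
#define SRAM_BB_ALIAS    0x22000000u  /* start of SRAM bit-band alias        */
#define PERIPH_BB_BASE   0x40000000u  /* start of peripheral bit-band region */
#define PERIPH_BB_ALIAS  0x42000000u  /* start of peripheral bit-band alias  */
#define BB_REGION_SIZE   0x00100000u  /* 1MB bit-band region                 */

/* Alias word address for bit `bit` (0..7) of the byte at `byte_addr`.
 * Each byte of the region occupies 32 alias bytes (8 bits x 4 bytes),
 * and each bit within the byte occupies one 4-byte alias word. */
uint32_t bitband_alias(uint32_t region_base, uint32_t alias_base,
                       uint32_t byte_addr, unsigned bit)
{
    assert(bit < 8u);
    assert(byte_addr >= region_base);
    assert(byte_addr - region_base < BB_REGION_SIZE);
    return alias_base + (byte_addr - region_base) * 32u + bit * 4u;
}
```

On a target, writing 1 or 0 through a `volatile uint32_t *` that points at the returned address sets or clears that single bit without a read-modify-write sequence in software.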
Optimized use of the RAM
Unaligned data access supported to improve data constant and RAM utilization
[Figure: structure management example. The same mix of long (32), int (16) and char (8) members is laid out twice: with aligned data, on a 32-bit machine which does not support unaligned data, leaving unused (wasted) space; and with unaligned data, leaving free space for the rest of the application]
Reduces SRAM memory requirements by over 25%
Less memory, lower-cost devices
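The effect of packing a structure can be sketched in C. This is a minimal illustration, assuming a GCC- or Clang-compatible compiler (the `packed` attribute is a compiler extension, not standard C); the structure and member names are hypothetical, and the saving shown here is specific to this layout rather than the slide's 25% figure.

```c
#include <stddef.h>
#include <stdint.h>

/* Natural alignment: the compiler inserts padding so that each member
 * starts on a multiple of its own size. */
struct record_aligned {
    char    tag;    /* char (8)  */
    int32_t total;  /* long (32) */
    char    flag;   /* char (8)  */
    int16_t count;  /* int (16)  */
};

/* Packed layout: no padding, so `total` and `count` may be unaligned.
 * This relies on the core supporting unaligned data access. */
struct __attribute__((packed)) record_packed {
    char    tag;
    int32_t total;
    char    flag;
    int16_t count;
};

/* Bytes of padding removed per record by packing. */
size_t bytes_saved_per_record(void)
{
    return sizeof(struct record_aligned) - sizeof(struct record_packed);
}
```

The packed structure occupies 8 bytes; with a typical 32-bit ABI the aligned one occupies 12, so an array of such records shrinks by a third.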
Debug Capabilities
Serial Wire Debugging (SWD) for optimized device pin-out
♦ More pins available for the application
[Figure: pin usage of SWD compared with JTAG]
Embedded break/watch capabilities for easy flashed application debugging
ARM7
• Load Multiple is uninterruptible, and hence the core must complete the POP and the full stack PUSH
Cortex-M3
• POP may be abandoned early if another interrupt arrives
• If POP is interrupted it only takes 6 cycles to enter ISR 2 (equivalent to tail-chaining)
Interrupt Response – Late Arriving
[Figure: timing diagram. IRQ1 (highest priority) arrives just after IRQ2; the PUSH, ISR 1, ISR 2 and POP phases are compared for ARM7 and Cortex-M3]
ARM7
• 26 cycles until ISR 2 is entered
• Immediately pre-empted by IRQ1 and takes a further 26 cycles to enter ISR 1
• ISR 1 completes and then takes 16 cycles to return to ISR 2
Cortex-M3
• Stack push to ISR 2 is interrupted
• Stacking continues but new vector address is fetched in parallel
• 6 cycles from late arrival to ISR 1 entry
• Tail-chain into ISR 2
Interrupt Prioritization
Each interrupt source has a 4-bit interrupt priority value
The 4 bits are divided into pre-empting priority levels and non-pre-empting "sub-priority" levels
The software-programmable PRIGROUP register field of the NVIC chooses how many of the 4 bits are used for "group priority" and how many are used for "sub-priority"
Sub-priority levels only have an effect if the pre-empting priority levels are the same
Group priority is the pre-empting priority
Lower numbers are higher priority
Hardware interrupt number is the lowest level of prioritization
IRQ3 is higher priority than IRQ4 if the priority registers are programmed the same
PRIGROUP    Binary Point    Bit Layout    Preempting Priority (Group Priority)    Sub-Priority
(3 Bits)    (group.sub)                   Bits    Levels                          Bits    Levels
011         4.0             gggg          4       16                              0       0
100         3.1             gggs          3       8                               1       2
101         2.2             ggss          2       4                               2       4
110         1.3             gsss          1       2                               3       8
111         0.4             ssss          0       0                               4       16
In STM32F10x 16 levels (4-bit) of priority are implemented
Interrupt Priority Settings Examples
PRIGROUP       Groups    Sub-Groups
011 "gggg"     0-15      0             16 groups, all with preemption over lower groups
101 "ggss"     0-3       0-3           4 groups with 4 sub-groups each. Preemption only across groups
111 "ssss"     0         0-15          16 sub-groups without pre-emption over lower sub-groups
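The group/sub-priority split can be sketched as bit arithmetic on the 4-bit priority value. This is an illustration under the assumption that only the PRIGROUP values 011 to 111 from the table are used; the type and function names are hypothetical and it models the rule only, without touching any NVIC register.

```c
#include <assert.h>
#include <stdint.h>

struct prio_split {
    unsigned group;  /* pre-empting (group) priority */
    unsigned sub;    /* sub-priority                 */
};

/* Split a 4-bit priority according to PRIGROUP (3..7, i.e. 011..111).
 * PRIGROUP 011 gives 4.0 (gggg), each step up moves one bit to "sub". */
struct prio_split split_priority(unsigned prigroup, unsigned prio4)
{
    struct prio_split s;
    unsigned sub_bits;

    assert(prigroup >= 3u && prigroup <= 7u);
    assert(prio4 <= 15u);

    sub_bits = prigroup - 3u;
    s.group = prio4 >> sub_bits;
    s.sub = prio4 & ((1u << sub_bits) - 1u);
    return s;
}

/* Non-zero if a pending interrupt may pre-empt the active one:
 * only a numerically lower group priority pre-empts. */
int can_preempt(unsigned prigroup, unsigned pending_prio4, unsigned active_prio4)
{
    return split_priority(prigroup, pending_prio4).group <
           split_priority(prigroup, active_prio4).group;
}
```

With PRIGROUP = 101 ("ggss"), priorities 0x4 and 0x7 share group 1, so neither pre-empts the other; sub-priority then only decides which is served first when both are pending.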
Cortex-M3 Exception Types
No.     Exception Type     Priority        Type of Priority    Description
1       Reset              -3 (Highest)    fixed               Reset
2       NMI                -2              fixed               Non-Maskable Interrupt
3       Hard Fault         -1              fixed               Default fault if other handler not implemented
4       MemManage Fault    0               settable            MPU violation or access to illegal locations
5       Bus Fault          1               settable            Fault if AHB interface receives error
6       Usage Fault        2               settable            Exceptions due to program errors
7-10    Reserved           N.A.            N.A.
11      SVCall             3               settable            System Service call
12      Debug Monitor      4               settable            Break points, watch points, external debug
13      Reserved           N.A.            N.A.
14      PendSV             5               settable            Pendable request for system service
15      SysTick            6               settable            System Tick Timer
16      Interrupt #0       7               settable            External Interrupt #0
…       …                  …               settable            …
256     Interrupt #240     247             settable            External Interrupt #240
In STM32F10x, 43 interrupts are implemented (59 vector table entries in total, including the 16 system entries)
Vector Table
Vector Table starts at location 0
Vector Table contains addresses (vectors) of exception handlers and ISRs, not instructions as on other ARM processors
Table size (in words) = number of IRQ inputs + 16
Minimum size (case of 1 IRQ): 17 words
Maximum size (case of 240 IRQs): 256 words
Main stack pointer initial value in location 0, set up by hardware during Reset
Vector Table can be relocated (to SRAM)
Software configurable through the relocate register in the SCB
Address       Vector
0x00          Initial Main SP
0x04          Reset
0x08          NMI
0x0C          Hard Fault
0x10          Memory Manage
0x14          Bus Fault
0x18          Usage Fault
0x1C-0x28     Reserved
0x2C          SVCall
0x30          Debug Monitor
0x34          Reserved
0x38          PendSV
0x3C          SysTick
0x40          IRQ0
…             More IRQs
In STM32F10x the Vector Table size is 236 bytes (59 * 4 bytes)
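The table-size rule and the address column can be checked with a few lines of arithmetic. The function names below are illustrative; the only inputs are the figures given above (16 system entries, 4 bytes per vector, IRQ0 at offset 0x40).

```c
#include <assert.h>
#include <stdint.h>

#define SYSTEM_VECTORS  16u  /* initial SP plus exceptions 1..15 */
#define BYTES_PER_WORD  4u

/* Table size in words = number of IRQ inputs + 16. */
uint32_t vector_table_words(unsigned irq_count)
{
    assert(irq_count >= 1u && irq_count <= 240u);
    return irq_count + SYSTEM_VECTORS;
}

/* Table size in bytes. */
uint32_t vector_table_bytes(unsigned irq_count)
{
    return vector_table_words(irq_count) * BYTES_PER_WORD;
}

/* Byte offset of the vector for IRQ number `irq` (IRQ0 is at 0x40). */
uint32_t irq_vector_offset(unsigned irq)
{
    assert(irq < 240u);
    return (SYSTEM_VECTORS + irq) * BYTES_PER_WORD;
}
```

For the 43 interrupts of the STM32F10x this gives 59 words, or 236 bytes, matching the figure above.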
Power Management
"8-bit microcontroller like" power mode management
SLEEP NOW
♦ "Wait for Interrupt" instruction to enter low power mode
No more dedicated control register settings sequence
♦ "Wait for Event" instruction to enter low power mode
No need of interrupt to wake up from sleep
Rapid resume from sleep
SLEEP on EXIT
♦ Sleep request done in interrupt routine
♦ Low power mode entered on interrupt return
Very fast wakeup time without context saving (6 cycles)
DEEP SLEEP
♦ Long duration sleep
From product side: PLL can be stopped or power to digital parts of the system can be shut down
Enables low power consumption
Optimized RUN mode
CORE power consumption 3 times less than ARM7TDMI
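The three sleep modes can be sketched as settings of the System Control Register. The bit positions below (SLEEPONEXIT at bit 1, SLEEPDEEP at bit 2) come from the ARMv7-M architecture, not from these slides, and the enum and function names are hypothetical. The function only computes the register value, so it can be tried on a host.

```c
#include <stdint.h>

#define SCR_SLEEPONEXIT  (1u << 1)  /* sleep again on return from ISR */
#define SCR_SLEEPDEEP    (1u << 2)  /* select deep sleep              */

enum sleep_mode {
    SLEEP_NOW,      /* plain sleep, entered by WFI or WFE */
    SLEEP_ON_EXIT,  /* sleep entered on interrupt return  */
    DEEP_SLEEP      /* long-duration sleep                */
};

/* Return the SCR value for `mode`, leaving unrelated bits untouched. */
uint32_t scr_for_mode(uint32_t scr, enum sleep_mode mode)
{
    scr &= ~(SCR_SLEEPONEXIT | SCR_SLEEPDEEP);
    switch (mode) {
    case SLEEP_ON_EXIT:
        scr |= SCR_SLEEPONEXIT;
        break;
    case DEEP_SLEEP:
        scr |= SCR_SLEEPDEEP;
        break;
    case SLEEP_NOW:
    default:
        break;
    }
    return scr;
}
```

On a target the result would be written back to the SCR before executing a Wait for Interrupt or Wait for Event instruction; what deep sleep actually powers down is product specific.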
System Timer (SysTick)
Flexible system timer
2 configurable clock sources
Suitable for Real Time OS or other scheduled tasks
In STM32F10x the SysTick clock can be: CPU clock or CPU clock/8
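Choosing between the two clock sources usually comes down to whether the wanted tick period fits in the counter. The sketch below assumes the SysTick counter is 24 bits wide (an architectural fact not stated on this slide) and that the counter counts from the reload value down to zero; the function name and the 72 MHz figure in the usage note are illustrative.

```c
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu  /* 24-bit counter (assumption) */

/* Compute the reload value for a tick rate of `tick_hz`.
 * use_div8 selects CPU clock/8 instead of the CPU clock.
 * Returns 0 and stores the value in *reload on success,
 * or -1 if the period cannot be represented. */
int systick_reload(uint32_t cpu_hz, int use_div8, uint32_t tick_hz,
                   uint32_t *reload)
{
    uint32_t clk = use_div8 ? cpu_hz / 8u : cpu_hz;
    uint32_t ticks;

    if (tick_hz == 0u || clk < tick_hz) {
        return -1;
    }
    ticks = clk / tick_hz;  /* counter clocks per tick period */
    if (ticks - 1u > SYSTICK_MAX_RELOAD) {
        return -1;
    }
    *reload = ticks - 1u;   /* counting N..0 takes N + 1 clocks */
    return 0;
}
```

With a hypothetical 72 MHz CPU clock, a 1 kHz tick needs a reload of 71999 on the undivided clock, while a 1 Hz tick only fits when CPU clock/8 is selected.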