☞ http://ya-n-ds.tistory.com/3103 : Embedded with ARM (1) ARM Register, Compile, ELF
☞ http://ya-n-ds.tistory.com/3137 : Embedded with ARM (2) Scatter file, Map file, Makefile
# Reference : Embedded Recipes (히언, 코너북)
< Assembly Structure & Syntax >
- Assembly code : Directives + Label + Instruction + Comment
- Directives ;
. AREA : the smallest unit handled by the Linker ( similar to a 'Section' in the ELF file format )
Attributes of the assembly block can be specified : ALIGN, CODE, DATA,
COMDEF, COMMON, NOINIT, READONLY, READWRITE
. ENTRY : the first location to be executed in the AREA ( program entry point )
. CODE32 / CODE16 : 32bit ARM code / 16bit Thumb code -> ARM and Thumb code can coexist in one Assembly file
. END : end of the Assembly code
- Labels ; BEGIN / THUMB / LOOP / TEXT - freely chosen names -> each becomes a Symbol holding the address where the Label is placed
- Other Directives
. EQU : same as #define
. PROC : Start of Function, ENDP : End of Function
e.g. Func_Name PROC ... ENDP
. ALIGN
< Assembly Instruction >
- ARM Assembly concept : base instruction + condition or suffix
- OP Code + Operand e.g. Instruction Rd,... ( Rd : Destination; only STR is the other way round )
- [Rn] = *Rn ( a kind of pointer )
1. Branch : puts an address into pc and changes the program execution address
B / BL ( Jump, saving the return address in R14(LR) ) / BX ( Jump with mode change ) / BLX ( BL + BX )
BLNE print_ch ; if ( ZF == 0 ), then branch to 'print_ch' and then come back
BNE LOOP ; if ( ZF == 0 ), then branch to 'LOOP'
2. Data Processing
☞ http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/BABJAJIB.html
MOV / MVN ( Move NOT : bitwise-inverted value )
MOV r0, sp ; Move 'sp' value to r0
MOV r1, #0xC ; Move 0xC value to r1
ADD Rd, Rn, Var ; ( Rd = Rn+Var )
SUB Rd, Rn, Var ; ( Rd = Rn-Var )
RSB Rd, Rn, Var ; ( Rd = Var-Rn )
AND Rd, Rn, Var ; ( Rd = Rn & Var ) bit-wise operation
BIC Rd, Rn, Var ; ( Rd = Rn & ~Var ) Bit Clear
CMP Rn, Var ; If (Rn-Var)=='0', then ZF(Zero Flag) is set
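The data-processing semantics listed above map directly onto C operators. The following is a minimal sketch, assuming 32-bit registers modelled as uint32_t; the function names are hypothetical and exist only to illustrate what each instruction computes.

```c
#include <stdint.h>

/* RSB Rd, Rn, Var : reverse subtract, Rd = Var - Rn */
uint32_t arm_rsb(uint32_t rn, uint32_t var) { return var - rn; }

/* BIC Rd, Rn, Var : bit clear, Rd = Rn & ~Var */
uint32_t arm_bic(uint32_t rn, uint32_t var) { return rn & ~var; }

/* MVN Rd, Var : move NOT, Rd = ~Var */
uint32_t arm_mvn(uint32_t var) { return ~var; }

/* CMP Rn, Var : computes Rn - Var, discards it, and returns the Z flag */
int arm_cmp_z(uint32_t rn, uint32_t var) { return (uint32_t)(rn - var) == 0u; }
```

For example, arm_bic(0xFF, 0x0F) clears the low nibble and yields 0xF0, and arm_cmp_z(5, 5) returns 1, which is the case in which a following BEQ would be taken.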
3. Load/Store : 1st argument=Register(data), 2nd argument=Address(~Pointer), (3rd argument=Offset, optional)
cf. Pointer-like syntax : [Rn] = *Rn
cf. ! : Write back operator ( Update )
LDR r1, [r2] ; Load the data that r2 points to into r1
LDR Rd, 0x100 ; Rd = *0x100 : load the value at address 0x100 into Rd
LDR Rd, =0x100 ; Rd = 0x100 ( Immediate value )
LDR Rd, [Rn, offset] ; Rd = *(Rn + offset), Rn remains the same
LDR Rd, [Rn, offset]! ; Rd = *(Rn + offset) -> Rn=Rn+offset ; Pre-index
LDR Rd, [Rn], offset ; Rd = *Rn -> Rn=Rn+offset ; Post-index ( updated w/o '!' )
LDRB r0, [r1], #1 ; Load the byte at the address held in r1 into r0
; Then increase r1 by 1
STR Rd, [Rn, offset] ; *(Rn + offset) = Rd, Rn remains the same
STR r1, [r2], #4 ; Store r1 into the area addressed by r2, then increase r2 by 4
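The three addressing forms above differ only in when the base register is updated. The sketch below models them in C under some assumptions: memory is a host byte array, words are read in host byte order, and all names (read_word, ldr_offset, ldr_pre, ldr_post) are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit word from the modelled memory (host byte order). */
static uint32_t read_word(const uint8_t *mem, uint32_t addr) {
    uint32_t v;
    memcpy(&v, mem + addr, sizeof v);
    return v;
}

/* LDR Rd, [Rn, #offset] : Rn remains the same */
uint32_t ldr_offset(const uint8_t *mem, uint32_t *rn, int32_t offset) {
    return read_word(mem, *rn + (uint32_t)offset);
}

/* LDR Rd, [Rn, #offset]! : pre-index, Rn is updated before the load */
uint32_t ldr_pre(const uint8_t *mem, uint32_t *rn, int32_t offset) {
    *rn += (uint32_t)offset;
    return read_word(mem, *rn);
}

/* LDR Rd, [Rn], #offset : post-index, Rn is updated after the load */
uint32_t ldr_post(const uint8_t *mem, uint32_t *rn, int32_t offset) {
    uint32_t v = read_word(mem, *rn);
    *rn += (uint32_t)offset;
    return v;
}
```

The post-index form is what a C compiler typically produces for `*p++`, which is why LDRB r0, [r1], #1 appears so often in string loops.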
3.1. Load/Store Multiple
STMFD sp!, {r1-r3, lr} ; Push {r1-r3, lr} onto the Stack -> 'sp' decreases by 16 bytes ( 4 registers x 4 bytes )
LDMFD sp!, {r1-r3, pc} ; Pop from the Stack into {r1-r3, pc} -> 'sp' increases by 16 bytes
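A Full Descending stack can be modelled with a word pointer that is decremented before a push and incremented after a pop. This is a minimal sketch; stmfd and ldmfd are hypothetical C helpers named after the instructions, and the register list is passed as an array ordered from the lowest-numbered register.

```c
#include <stdint.h>

/* STMFD sp!, {regs} : sp moves down by 'count' words, then the registers
 * are stored so that the lowest-numbered one sits at the lowest address. */
uint32_t *stmfd(uint32_t *sp, const uint32_t *regs, int count) {
    sp -= count;
    for (int i = 0; i < count; i++)
        sp[i] = regs[i];
    return sp;              /* '!' : the new sp is written back */
}

/* LDMFD sp!, {regs} : registers are loaded in the same order,
 * then sp moves up by 'count' words. */
uint32_t *ldmfd(uint32_t *sp, uint32_t *regs, int count) {
    for (int i = 0; i < count; i++)
        regs[i] = sp[i];
    return sp + count;
}
```

Pushing {r1-r3, lr} and popping into {r1-r3, pc} uses the same slot for lr and pc, which is how the pop also performs the function return.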
4. Pseudo instructions
LDR Rd, Label ; Label is a PC-relative address
=> Label is replaced with [PC, #Offset] form -> LDR Rd, [PC, #Offset]
=> Rd = *(PC + Offset)
LDR r0, =0xFFFFFFFF --Assembler--> LDR r0, [pc, #offset] + DCD 0xFFFFFFFF placed at 'pc+offset' ( literal pool )
cf. If the constant fits MOV or MVN, the assembler emits that instead ( e.g. 0xFFFFFFFF -> MVN r0, #0 )
cf. LDR r0, MY_DATA ... MY_DATA DCD 0xFFFF0000
start
MOV r0, #10
ADR r4, start ; ADR - Assembler pseudo-instruction that loads a PC-relative address
; The assembler attempts to produce a single ADD or SUB instruction
; => SUB r4, pc, #0xC
ADR r0, THUMB+1 ; Odd branch target address ( bit[0]=1 ) indicates Thumb mode
BX r0 ; After the branch, Thumb code runs from 'specified target address - 1'
; Thumb bit in CPSR is set
5. SWAP instruction : exchanges a value in Memory with a register value
SWP R0, R1, [R2] ; R0=*(R2), then *(R2)=R1
6. PSR(Program Status Register) instructions - CPSR(Current Program Status Register), SPSR(Saved Program Status Register)
MRS r5, CPSR ; r5=CPSR ( Move to Register from PSR )
MSR CPSR, r5 ; CPSR=r5 ( Move to PSR from register )
cf. CPSR : CPSR_f bit[31:24], CPSR_c bit[7:0]
7. SWI(SoftWare Interrupt) instruction : used when SW switches from User mode to another mode
e.g. kernel(SVC mode), application(user mode) : used when an application requests a job from the kernel
SWI 0x11 ; calls case 0x11 of the Software Interrupt Handler
8. DCD(Define Constant Double-words) directives : Memory allocation for Data
DCB (1Byte data) / DCW (2Byte) / DCD (4Byte) / DCQ (8Byte)
cf. DCB == '=' / DCD == '&'
SPACE 8 ; reserves an 8-Byte area filled with 0 ( used for static local variables )
9. EXPORT, IMPORT directives : make Symbols used in Assembly available across files
* EXPORT : lets the Symbol be used from outside = Global
* IMPORT : uses a Symbol that was EXPORTed elsewhere = Extern
10. FIELD, MAP directive ~ similar to a struct declaration in C
MAP 0x1000, r5 ; The map starts at address 'r5 + 0x1000'
member FIELD 4 ; reserves 4 Bytes named member ( address : 0x1000 + r5 )
count FIELD 4 ; reserves 4 Bytes named count ( address : 0x1004 + r5 )
ADR r5, Data ; loads the address of Data into r5
LDR r0, member ; loads the data stored in member into r0
11. Suffixes ( ! ^ S )
! : Update of base address
S : Update the CPSR flags ( NZCV ) with the result; when Rd is PC, restore CPSR from SPSR
^ : Restore CPSR from SPSR when PC is included in the LDM register list
12. MOV vs. LDR(STR)
☞ https://www.raspberrypi.org/forums/viewtopic.php?t=16528
a. '#' for MOV, '=' for LDR ;; To define an immediate value
b. MOV can move only an 8-bit value ( 0~255 ) rotated by an even amount; LDR can load a 32-bit value
c. LDR can move data from memory to a register
MOV can only move a register or an immediate value into a register
d. MOV is a real instruction (32-bit)
LDR Rd, =value is a pseudo instruction ( a 32-bit load instruction plus a 32-bit literal in memory )
=> MOV is faster than LDR
e.g. LDR r0, =0x55555555
The assembler places the constant 0x55555555 in a nearby table ( literal pool ) in the code area
Then it emits a load, addressed by the program counter plus an offset, that fills r0 from that table.
cf. There is no way to fit 32-bit data into a 32-bit instruction ( Instruction = instruction-code part + data part )
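Whether a constant fits the MOV immediate form (rule b above) can be tested by undoing every even rotation and checking for an 8-bit result. A minimal sketch; arm_imm_encodable and rol32 are hypothetical helper names, and the check covers only the classic ARM (not Thumb) immediate encoding.

```c
#include <stdint.h>

/* Rotate left by n bits, n in 0..31. */
static uint32_t rol32(uint32_t v, unsigned n) {
    return n ? (v << n) | (v >> (32u - n)) : v;
}

/* Returns 1 if 'value' is an 8-bit constant rotated right by an even
 * amount, i.e. usable as "MOV Rd, #value"; otherwise 0. */
int arm_imm_encodable(uint32_t value) {
    for (unsigned rot = 0; rot < 32u; rot += 2u) {
        /* Rotating left by 'rot' undoes a rotate-right by 'rot'. */
        if (rol32(value, rot) <= 0xFFu)
            return 1;
    }
    return 0;
}
```

With this check, 0xC and 0xFF000000 are encodable, while 0x55555555 and 0x1FE (0xFF shifted by an odd amount) are not, so the last two need LDR Rd, =value.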
# Thumb vs. ARM Instruction format
- Thumb : [15:13]=major op-code, [12:11]=minor op-code, [10:8]=Rd(Destination and Source register), [7:0]=Immediate value
- ARM : [31:28]=Condition(checked against NZCV), [27:25]=major op-code, [24:20]=minor op-code, [19:16]=0-Rd(Source), [15:12]=0-Rd(Destination), [11:0]=0000-Immediate value
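To illustrate field extraction from a 32-bit ARM instruction word, the sketch below splits a data-processing instruction into the fields of the standard ARM encoding (condition, I bit, 4-bit opcode, S bit, Rn, Rd, 12-bit operand). The struct and function names are hypothetical; bits [27:25] and [24:20] of the list above are broken down further into the I bit, the opcode and the S bit.

```c
#include <stdint.h>

typedef struct {
    unsigned cond;      /* bit[31:28] condition code              */
    unsigned imm_flag;  /* bit[25]    1 = operand is an immediate */
    unsigned opcode;    /* bit[24:21] e.g. 0xD = MOV              */
    unsigned set_flags; /* bit[20]    'S' suffix                  */
    unsigned rn;        /* bit[19:16] first source register       */
    unsigned rd;        /* bit[15:12] destination register        */
    unsigned operand2;  /* bit[11:0]  rotation + 8-bit immediate  */
} arm_dp_fields;

/* Split an ARM data-processing instruction word into its fields. */
arm_dp_fields arm_dp_decode(uint32_t instr) {
    arm_dp_fields f;
    f.cond      = (instr >> 28) & 0xFu;
    f.imm_flag  = (instr >> 25) & 0x1u;
    f.opcode    = (instr >> 21) & 0xFu;
    f.set_flags = (instr >> 20) & 0x1u;
    f.rn        = (instr >> 16) & 0xFu;
    f.rd        = (instr >> 12) & 0xFu;
    f.operand2  = instr & 0xFFFu;
    return f;
}
```

Decoding 0xE3A0100C, the encoding of MOV r1, #0xC, gives cond=0xE (always), opcode=0xD, rd=1 and operand2=0x00C.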
< ARM mode vs. Thumb mode >
# Compile
- armcc -o asm_File_Name.s -S c_File_Name.c ; For ARM
- armcc --thumb -o asm_File_Name.s -S c_File_Name.c ; For Thumb
( tcc -o asm_File_Name.s -S c_File_Name.c - Old version )
# Conditional Branch
- Thumb : CMP -> conditional Branch ( only branches can be conditional )
- ARM : Conditional execution is possible in each Instruction; the condition is checked against CPSR bit[31:28] NZCV
cf. Suffix 'S' ( e.g. SUBS ) means NZCV is updated after the operation; CMP always updates NZCV
# Stack Instruction
- Thumb : PUSH {r3,r4,r5, lr}, POP {r3,r4,r5, pc}
- ARM : STMFD sp!, {r3-r5, lr}, LDMFD sp!, {r3-r5, pc}
# ARM-Thumb mode change : Use 'BX' which is based on ATPCS(ARM Thumb Procedure Call Standard)
# Example - Thumb vs. ARM : Conditional operation
int gcd ( int a, int b )
{
while ( a!=b )
{
if(a>b) a= a-b;
else b= b-a;
}
return a;
}
// Thumb
CODE16
AREA example, CODE, READONLY
gcd PROC
CMP r0,r1 ; r0=a, r1=b
BEQ end ; if a==b
BLT less ; if a<b
SUB r0,r0,r1 ; if a>b
B gcd
less
SUB r1,r1,r0
B gcd
end
BX lr ; return a in r0
ENDP
// ARM
CODE32
AREA example, CODE, READONLY
gcd PROC
CMP r0,r1 ; r0=a, r1=b
SUBGT r0,r0,r1
SUBLT r1,r1,r0
BNE gcd
BX lr ; return a in r0
ENDP
cf. Leaf function : There is no stack-related code, because the function does not call other sub-function
< Method of ARM-Thumb Mode Change >
☞ http://recipes.egloos.com/5032032
a. CPSR bit[5] T-bit : 0=ARM, 1=Thumb
b. Jump address of BX : Odd->Thumb, Even->ARM
c. Veneer
armcc -c -apcs /interwork c_arm_File_Name.c ; For ARM
armcc --thumb -c -apcs /interwork c_thumb_File_Name.c ; For Thumb
armlink -elf -o elf_File_Name.elf c_arm_File_Name.o c_thumb_File_Name.o
cf. -apcs : AAPCS, /interwork : lets the Linker generate ARM/Thumb interworking veneers
fromelf -c elf_File_Name.elf
--->
main
.text
0x000080a8: ea00017b {... B $Ven$AT$L$$thumbveneer ; 0x869c
$Ven$AT$L$$thumbveneer
0x0000869c: e59fc000 .... LDR r12,0x86a4
0x000086a0: e12fff1c ../. BX r12
...
0x000086a4: 000080ad .... DCD 32941 ;; Odd address for Thumb mode change
.text
0x000080ac: 0100 .. LSL r0,r0,#4
0x000080ae: 4770 pG BX r14
<---
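Rule b above (odd target selects Thumb, even target selects ARM) can be expressed as a small C model of what BX does with its operand. A minimal sketch; bx_state and bx_branch are hypothetical names, and the alignment of ARM targets is not checked.

```c
#include <stdint.h>

typedef struct {
    uint32_t pc;    /* address execution continues from */
    int      thumb; /* new value of the CPSR T-bit      */
} bx_state;

/* BX Rm : bit[0] of the target goes to the T-bit,
 * the remaining bits become the new pc. */
bx_state bx_branch(uint32_t target) {
    bx_state s;
    s.thumb = (int)(target & 1u);
    s.pc    = target & ~1u;
    return s;
}
```

Applied to the veneer in the listing, the literal 0x000080ad gives pc=0x80ac with the T-bit set, which is exactly where the Thumb code starts.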
# CPSR, SPSR
CPSR : In User mode only the flag bits can be changed ( control bits are protected )
SPSR : Not available in User/System mode
CPSR Format : bit[31:28] NZCV, bit[27:8] Reserved, bit[7:5] IFT, bit[4:0] Mode
cf. R15 : PC, R14 : Link Register
CPSR_c : bit[07:00] Control field ( IRQ, FIQ, ARM/Thumb, Mode )
CPSR_x : bit[15:08] Extension field ( Reserved )
CPSR_s : bit[23:16] Status field ( Reserved )
CPSR_f : bit[31:24] Flag field ( NZCV, Reserved )
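The CPSR layout above can be unpacked with shifts and masks. A minimal sketch in C; the struct and function names are hypothetical, and the reserved bits [27:8] are ignored.

```c
#include <stdint.h>

typedef struct {
    unsigned n, z, c, v; /* bit[31:28] condition flags        */
    unsigned irq_off;    /* bit[7] I : 1 = IRQ disabled       */
    unsigned fiq_off;    /* bit[6] F : 1 = FIQ disabled       */
    unsigned thumb;      /* bit[5] T : 0 = ARM, 1 = Thumb     */
    unsigned mode;       /* bit[4:0] processor mode           */
} cpsr_fields;

/* Split a CPSR value into its flag and control fields. */
cpsr_fields cpsr_decode(uint32_t cpsr) {
    cpsr_fields f;
    f.n       = (cpsr >> 31) & 1u;
    f.z       = (cpsr >> 30) & 1u;
    f.c       = (cpsr >> 29) & 1u;
    f.v       = (cpsr >> 28) & 1u;
    f.irq_off = (cpsr >> 7) & 1u;
    f.fiq_off = (cpsr >> 6) & 1u;
    f.thumb   = (cpsr >> 5) & 1u;
    f.mode    = cpsr & 0x1Fu;
    return f;
}

/* Setting I and F (0xC0) in the control field masks both interrupts. */
uint32_t cpsr_lock_interrupts(uint32_t cpsr) { return cpsr | 0xC0u; }
```

For instance, 0x600000D3 decodes to Z=1, C=1, IRQ and FIQ disabled, ARM state, mode 0x13 (Supervisor).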
< Inline Assembly >
☞ http://recipes.egloos.com/v/5033184
# When is Assembly required?
1. Direct control of Low Level : e.g. Co-processor
2. Direct control of ARM : e.g. PSR, Interrupt lock
3. Direct access to Register : e.g. R0, R1
# Syntax : __asm { ... }
A. Example 1
typedef unsigned int UINT32;
void *get_StackPointer(int *stackPointer)
{
__asm
{
mov stackPointer, sp; // local variable can be accessed
}
return stackPointer; // returned in r0
}
>> tcc -o asm_File_Name.s -S c_File_Name.c
( Same as 'armcc --thumb -o asm_File_Name.s -S c_File_Name.c' )
; commandline
CODE16
AREA ||.text||, CODE, READONLY
get_StackPointer PROC
MOV r0,sp ; return value = r0
BX lr ; Return to where the Link Register points
ENDP
EXPORT get_StackPointer ; this symbol will be used outside
B. Example 2
#define PSR_IRQ_Mask 0x80
#define PSR_FIQ_Mask 0x40
#define PSR_INTR_Mask 0xC0
void Interrupt_lock(void)
{
__asm
{
mrs a1, CPSR // mrs : Move to General register from PSR
orr a2, a1, #PSR_INTR_Mask // Set I,F bit field
msr CPSR_c, a2 // msr : Move to CPSR from General register
}
}
>> armcc -o asm_File_Name.s -S c_File_Name.c
; command line
CODE32
AREA ||.text||, CODE, READONLY
Interrupt_lock PROC
MRS r0,CPSR
ORR r1,r0,#0xC0
MSR CPSR_c,r1 ; Move r1 bit[7:0] to CPSR bit[7:0]
MOV pc,lr ; Return to the previous pc
ENDP
EXPORT Interrupt_lock
cf. Before entering an inline function, used arguments are saved and the function is executed with arm registers
cf. bx lr (w/ Return values) / mov pc,lr (w/o Return values) ?
< Pipeline, Exception, ... >
- Exception : switches to 'ARM execution mode' -> PC branches to the Exception vector -> CPSR is saved in SPSR
cf. ARM Pipeline operation : the return address must be adjusted (Thumb mode: 2-byte unit) cf. Fetch -> Decode -> Execute
cf. PC(R15) : 'Fetch' instruction address
cf. SP(R13), LR(R14)
1. Reset Handler : Hardware handling during Power-on
2. Undefined Handler : during Decode -> PC(Program Counter) points to 'Fetch' instruction ( +1 cycle : 4-Byte(ARM), 2-Byte(Thumb) )
-> For Exception : LR<=PC, SPSR<=CPSR
-> For Return : PC<=LR, CPSR<=SPSR
3. Prefetch Handler : during Fetch -> PC points to 'Fetch' instruction
-> For Exception : LR<='PC + 1-cycle'(by ARM), SPSR<=CPSR
-> For Return ( if Aborted Instruction cannot be executed ) : PC<=LR, CPSR<=SPSR
-> For Return ( if Aborted Instruction can be executed ) : PC<='LR - 1-cycle', CPSR<=SPSR
4. Data Abort Handler : during Execute -> PC points to 'Fetch' instruction ( +2 cycle )
-> For Exception : LR<='PC', SPSR<=CPSR
-> For Return ( if Aborted Instruction cannot be executed ) : PC<='LR - 1-cycle', CPSR<=SPSR
-> For Return ( if Aborted Instruction can be executed ) : PC<='LR - 2-cycle', CPSR<=SPSR
5. SWI ( Software Interrupt for SVC(Supervisor) mode ) : during Decode -> PC points to 'Fetch' instruction ( +1 cycle, similar to Undefined case )
-> For Exception : LR<=PC, SPSR<=CPSR
-> For Return : PC<=LR, CPSR<=SPSR
6. IRQ/FIQ : during Execute -> PC points to 'Fetch' instruction ( +2 cycle )
-> For Exception : LR<='Next_op + 4'(by ARM), SPSR<=CPSR ... The current instruction is completed before the interrupt is taken
-> For Return : PC<='LR - 4', CPSR<=SPSR
cf. 'SUBS PC, LR, #4' works like PC<='LR-4', CPSR<=SPSR // Suffix 'S' makes CPSR restoration
cf. ldmfd sp!, {r0-r12, pc}^ works like {r0-r12,pc}<=*(sp), CPSR<=SPSR, 'sp' is updated // Suffix '^' makes CPSR restoration
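The return-address adjustments above can be collected into one lookup. The sketch below assumes ARM state (4-byte instructions) and the "retry the aborted instruction" case for the two aborts; the enum and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    EXC_UNDEFINED,
    EXC_SWI,
    EXC_PREFETCH_ABORT,
    EXC_DATA_ABORT,
    EXC_IRQ,
    EXC_FIQ
} exc_type;

/* Bytes to subtract from LR when returning from each exception. */
uint32_t exc_return_offset(exc_type t) {
    switch (t) {
    case EXC_UNDEFINED:
    case EXC_SWI:
        return 0u;  /* MOVS pc, lr : continue after the instruction  */
    case EXC_PREFETCH_ABORT:
    case EXC_IRQ:
    case EXC_FIQ:
        return 4u;  /* SUBS pc, lr, #4                               */
    case EXC_DATA_ABORT:
        return 8u;  /* SUBS pc, lr, #8 : re-execute the load/store   */
    }
    return 0u;
}

/* PC value restored by the handler's return instruction. */
uint32_t exc_return_pc(exc_type t, uint32_t lr) {
    return lr - exc_return_offset(t);
}
```

A Data Abort handler entered with lr=0x8010 therefore returns to 0x8008, the instruction that caused the abort.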
< Exception Vector Table(EVT), Handler >
- Branch operation is used for Exception Handler
AREA INIT_VECTOR, CODE, READONLY
CODE32 ; ARM Mode during Exception
ENTRY
B Reset_Handler ; 0x0
B Undefined_Handler ; 0x4
B SWI_Handler ; 0x8
...
BOOT_ROM 0x0
{
BOOT_RAM 0x0 0x4000
{
vectors.o (INIT_VECTOR, +FIRST) ; +FIRST -> placed at the very beginning ( +LAST -> at the very end )
...
}
}
BOOT_ROM 0x0
{
BOOT_RAM 0xFFFF0000 0x4000 ; for High vector
{
vectors.o (INIT_VECTOR, +FIRST) ; +FIRST -> placed at the very beginning ( +LAST -> at the very end )
...
}
}
// Another implementation
0x00 B Reset_Handler
0x04 ldr pc, 0x30
0x08 ldr pc, 0x34
...
0x30 dcd 0x1000 ; Undefined Handler
0x34 dcd 0x2000 ; SWI Handler
...
// Handler Usage
- Undefined : cowork with Co-processor
stmfd sp!, {r0-r12, r14} ; r14==LR, r13==sp, FD(Full Descending)
...
ldmfd sp!, {r0-r12, pc}^ ; r15==pc, !==update, ^==Restore CPSR with SPSR after calculation when PC is included
- Prefetch Abort : Debug Information logging
- Data Abort : e.g. Unaligned Data
sub lr, lr, #8 ; to return to the instruction 2 cycles ( 8 bytes ) earlier after handling
...
- SWI : Similar to Undefined
cf. Not 'real' interrupt, but for mode change to Supervisor(SVC) mode
cf. Usage : System Call(Kernel), Semi-hosting
cf. SWI {condition} -> jump to 0x8 SWI Handler
cf. SWI Instruction : bit[31:28] condition, bit[27:24]=0xF, bit[23:0] 24-bit immediate value
-> Get 24-bit value :
LDR r0, [lr, #-4]
BIC r0, r0, #0xFF000000 ;; r0 = r0 & (~0xFF000000)
- FIQ, IRQ
sub lr, lr, #4 ; to return to the instruction 1 cycle ( 4 bytes ) earlier after handling
...
stmfd sp!, {r0-r12, r14} ; for IRQ
stmfd sp!, {r0-r7, r14} ; for FIQ
cf. The FIQ vector is the last entry of the Vector Table -> the Handler can be implemented there without a 'branch' operation
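The SWI number extraction shown in the SWI item above (load the instruction at lr-4, then clear the top 8 bits) has a direct C equivalent. A minimal sketch with hypothetical helper names, operating on an instruction word that has already been fetched.

```c
#include <stdint.h>

/* Returns 1 if bit[27:24] == 0xF, i.e. the word is an ARM SWI instruction. */
int is_swi(uint32_t instr) {
    return ((instr >> 24) & 0xFu) == 0xFu;
}

/* Same result as "BIC r0, r0, #0xFF000000": keeps the 24-bit immediate. */
uint32_t swi_number(uint32_t instr) {
    return instr & 0x00FFFFFFu;
}
```

SWI 0x11 with the 'always' condition is encoded as 0xEF000011, so swi_number() returns 0x11, the value a handler would switch on.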
< Co-processor Assembly >
# Coprocessor Data Instruction
1. Co-processor register <-> Co-processor register
CDP {cond} coproc, #opcode1, CRd, CRn, CRm{, #opcode2}
2. Co-processor register <-> ARM register
MRC {cond} coproc, #opcode1, Rt, CRn, CRm{, #opcode2} ;; Move to ARM Register from Coprocessor
MCR {cond} coproc, #opcode1, Rt, CRn, CRm{, #opcode2} ;; Move to Coprocessor from ARM Register
cf. e.g. MCR p15, 0, r4, c2, c0, 0 ;; Move r4 to c2 of p15
cf. Rt must not be 'pc'
3. Co-processor register <-> Memory
LDC {L} {cond} coproc, CRd, [Rn] ;; Transfer Data from memory to Coprocessor
cf. Rn : is the register on which the memory address is based
STC {L} {cond} coproc, CRd, [Rn] ;; Transfer Data from Coprocessor to memory
< Bootloader, Memory budget(Map file) >
- ROM : Code, Const Data(RO), RW Data(initial values)
- RAM : RW Data, ZI
. NOR + PSRAM -> RO in NOR can be XIP
. NAND + SDRAM -> RO in NAND cannot be XIP -> shall be loaded into SDRAM
- Linker Symbols for each section
. RO : Image$$RO$$Base, Image$$RO$$Limit
. RW : Image$$RW$$Base, Image$$RW$$Limit
. ZI : Image$$ZI$$Base, Image$$ZI$$Limit
* Assembly Example : Write '0' in ZI section
LDR r0, =|Image$$ZI$$Base|
LDR r1, =|Image$$ZI$$Limit|
MOV r2, #0
begin
str r2, [r0], #4
cmp r0, r1
bne begin
* Scatter Loading description vs. Linker Symbol
ROM_1 0x0000 ; Load$$ROM_1$$Base 0x0000
{
ROM_1 0x0000 ; Image$$ROM_1$$Base 0x0000
{ ; Image$$ROM_1$$Length 0x400
object1.o(+RO)
} ; Load$$DRAM$$Base 0x400
DRAM 0x18000 ; Image$$DRAM$$Base 0x18000
{
object1.o(+RW) ; Image$$DRAM$$Length 0x400
object2.o(+ZI) ; Image$$DRAM$$ZI$$Base 0x18400
} ; Image$$DRAM$$ZI$$Length 0x400
}
ROM_2 0x4000 ; Load$$ROM_2$$Base 0x4000
{
ROM_2 0x4000 ; Image$$ROM_2$$Base 0x4000
{ ; Image$$ROM_2$$Length 0x400
object2.o(+RO)
} ; Load$$SRAM$$Base 0x4400
SRAM 0x8000 ; Image$$SRAM$$Base 0x8000
{
object1.o(+RW) ; Image$$SRAM$$Length 0x400
object2.o(+ZI) ; Image$$SRAM$$ZI$$Base 0x8400
} ; Image$$SRAM$$ZI$$Length 0x400
}
* Symbol access in C-file
Step 1. Assign Symbol to Label in assembly file
Load__SRAM__Base DCD |Load$$SRAM$$Base|
Image__SRAM__Base DCD |Image$$SRAM$$Base|
Image__SRAM__Length DCD |Image$$SRAM$$Length|
Step 2. Define 'extern' in C-file
extern byte *Load__SRAM__Base;
extern byte *Image__SRAM__Base;
extern byte *Image__SRAM__Length;
Step 3. Use the Label as a pointer
end_point = (dword *)((dword)Image__SRAM__Base + (dword)Image__SRAM__Length);
for( src=(dword *)Load__SRAM__Base, dst=(dword *)Image__SRAM__Base; dst < end_point; src++, dst++)
{ *dst = *src; }
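The two start-up jobs in this section, copying initialised RW data from its load address to its execution address and writing '0' over the ZI section, can be sketched as plain C functions. This is a host-runnable illustration only: copy_rw and clear_zi are hypothetical names, and on a target the pointer arguments would come from the Load$$ / Image$$ linker symbols rather than from arrays.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy 'length_bytes' of RW data, word by word, from the load region
 * (ROM) to the execution region (RAM). */
void copy_rw(uint32_t *dst, const uint32_t *src, size_t length_bytes) {
    uint32_t *end = dst + length_bytes / sizeof(uint32_t);
    while (dst < end)
        *dst++ = *src++;
}

/* Write 0 to every word in [base, limit), like the ZI loop in assembly. */
void clear_zi(uint32_t *base, const uint32_t *limit) {
    while (base < limit)
        *base++ = 0u;
}
```

Unlike the str/cmp/bne loop, clear_zi tests the bound before the first store, so an empty ZI section is left untouched.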
< From Reset Handler to main(Entry Point) >
ARM Embedded System : Power on -> Reset handler ( SVC mode, pc=0x0 )
cf. ENTRY // B Reset_Handler ; 0x0
- HW handling for Reset
. IRQ/FIQ disable
. Clock setting for HW blocks
. MCU pin setting
. Disable HW blocks not related to boot up
. Memory setting : Memory size, etc.
- SW handling for Reset ( __main in C Library, Entry point of Kernel or Processor )
. Stack setting for each mode
. Copy RO, RW regions to RAM and zero-initialize the ZI region
. Call 'main()' function by '__rt_entry'
- __rt_entry ( rt=Run Time, in C Library )
. Set up application stack and heap
. Initialize library functions
. Call top level constructors(C++)
. Jump to main() ( Entry point of User code )
. Exit from application
* To Use Reset Vector as Entry Point instead of __main of C Library
EXPORT __main // Reset Handler is placed in '__main' location
EXPORT _main
AREA INIT_VECTOR, CODE, READONLY
CODE32
__main
_main
ENTRY
B Reset_Handler
...
cf. '__rt_entry' is normally called at the end of Reset Handler to make C-library available
cf. If the C library is not used, main() is called directly at the end of the Reset Handler
cf. If __main is called at the end of Reset Handler, __main and __rt_entry are executed
< Scatter Loading & Bootup - '__user_initial_stackheap' >
By default, the Linker automatically generates Symbols (e.g. Image$$ZI$$Base, Image$$ZI$$Limit), which are used to assign the Stack and Heap regions
* __value_in_regs : Qualifier which instructs the compiler to return a structure of up to four integer words in integer registers
* __user_initial_stackheap() returns the location of the initial stack and heap
. Default : the value of the symbol Image$$ZI$$Limit
. If the linker uses a scatter-loading description file (specified with the --scatter command-line option), __user_initial_stackheap() must be re-implemented
. Syntax
__value_in_regs struct __initial_stackheap __user_initial_stackheap(unsigned R0, unsigned SP, unsigned R2);
cf. __user_initial_stackheap() returns the:
heap base in r0
stack base in r1, that is, the highest address in the stack region
heap limit in r2
stack limit in r3, that is, the lowest address in the stack region.
cf. struct __initial_stackheap // Definition in rt_misc.h
{
unsigned heap_base, stack_base, heap_limit, stack_limit;
};
* Example
char _initial_heap[0x2000];
char _initial_stack[0x2000];
__value_in_regs struct __initial_stackheap __user_initial_stackheap(unsigned r0, unsigned SP, unsigned R2)
{
struct __initial_stackheap config;
config.heap_base = (unsigned)_initial_heap;
config.stack_base = (unsigned)_initial_stack + sizeof(_initial_stack); // top of the region; a Full Descending sp starts here and stays aligned
config.heap_limit = (unsigned)_initial_heap + sizeof(_initial_heap) -1;
config.stack_limit = (unsigned)_initial_stack;
return config;
}
cf. PC(r15): Program counter; LR(r14): Link register; SP(r13): Stack pointer; SL(r10): Stack limit; IP(r12): Intra-procedure scratch