ARM assembler overview
The latest ARM version of uLisp allows you to generate machine-code functions, integrated with Lisp, written in ARM Thumb code. It has the following features:
- You can create multiple named machine-code functions, limited only by the amount of code memory available.
- Machine-code functions are created with a defcode special form, which has a similar syntax to defun.
- You can include labels in your assembler listing simply by including them as symbols in the body of the defcode form. The defcode form creates these as local variables.
- The defcode form automatically does a two-pass assembly to resolve forward references, used in branches and memory references.
- The defcode form generates an assembler listing, showing the mnemonics and the machine-code generated from them.
- The machine-code functions are saved with save-image, and restored with load-image.
The assembler itself is written in Lisp to make it easy to extend it or add new instructions. For example, you could write assembler macros in Lisp. It will fit on most ARM boards, including SAMD21 and SAMD51 boards. The assembler supports only Thumb-1 instructions, and so is compatible with M0 ARM processors or higher.
Get the latest version of the assembler here: ARM assembler in uLisp.
To add it to uLisp, select all of the assembler source and copy it, paste it into the field at the top of the Arduino IDE Serial Monitor window, and press Return. Alternatively, you could load it from an SD card.
References
For a summary of the ARM assembler instructions see ARM assembler instructions.
For some more complex examples see ARM assembler examples.
For an explanation of how the ARM version of the assembler works see ARM assembler written in Lisp.
For the RISC-V Instruction Set Manual see The RISC-V Instruction Set Manual on riscv.org.
The defcode form
The assembler uses a special defcode form to generate machine-code functions.
defcode special form
Syntax: (defcode name (parameters) form*)
The defcode form is similar in syntax to defun. It creates a named machine-code function from a series of 16-bit integers given in the body of the form. These are written into RAM, and can be executed by calling the function in the same way as a normal Lisp function.
For example:
(defcode mul13 (x) #x210d #x4348 #x4770)
creates a machine-code routine called mul13, with one parameter, consisting of three instructions that multiply its single integer argument by 13. For example:
> (mul13 10)
130
If you specify the machine code instructions as constants, as in the above example, you don't need to load the ARM assembler.
Calling convention
Functions defined with defcode can take up to four parameters. These are passed to the machine-code routine in the registers r0 to r3 respectively. The symbols used for the four parameters can be used as synonyms for the corresponding registers r0 to r3 in the body of the defcode form.
If a parameter is an integer its value is passed in the corresponding register; otherwise the address of the parameter is passed in the corresponding register. For examples showing how to access a list in a machine-code routine see ARM assembler examples - List examples.
The machine-code function should return the result back to uLisp in r0. This is returned as an integer.
Call-clobbered registers
The best registers to use in assembler functions are r0 to r3, r12, and r14 (lr) if you are not calling another subroutine. These are call clobbered; a function may use them without restoring the contents.
Call-saved registers
If you use r4 to r11 you must restore their original contents.
Assembler
Although you can supply machine-code instructions as hexadecimal op-codes, the assembler is more convenient as it allows you to write machine-code functions in ARM Thumb mnemonics. It is written in uLisp.
Assembler syntax
Where possible the syntax is very similar to ARM assembler syntax, with the following differences:
- The mnemonics are prefixed by '$' (because some mnemonics such as push and pop are already in use as Lisp functions).
- For simplicity the mnemonics don't include the 'S' suffix, sometimes used to indicate whether an ARM Thumb instruction affects the condition codes.
- Registers are represented as symbols, prefixed with a quote. Constants are just numbers.
- Lists of registers, as used in the $push and $pop mnemonics, are represented as a Lisp list.
Assembler instructions are just Lisp functions, so you can see the code they generate:
> ($mov 'r1 13)
8461
The assembler includes a function x16 to print a 16-bit value in hexadecimal, so you can see the result in hexadecimal by writing:
> (x16 ($mov 'r1 13))
#x210d
The following table shows typical ARM assembler formats, and the equivalent in this Lisp assembler:
Examples | ARM assembler | uLisp assembler |
Push and pop | push {r4, r5, r6, lr} | ($push '(r4 r5 r6 lr)) |
Registers | subs r1, r2, r3 | ($sub 'r1 'r2 'r3) |
Immediate | mov r2, #3 | ($mov 'r2 3) |
Load relative | ldr r0, [r3, #0] | ($ldr 'r0 '(r3 0)) |
Load in-line constant | ldr r0, label | ($ldr 'r0 label) |
Branch | bne label | ($bne label) |
Constant | .word 0x0f0f0f0f | ($word #x0f0f0f0f) |
Note that the order of the registers in the list supplied to $push and $pop is irrelevant; the registers are always pushed in the order highest number first to lowest last, and popped in the order lowest number first to highest last.
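As an illustration of the call-saved register rule, here is a minimal sketch of a routine that uses r4 as a scratch register, saving it with $push on entry and restoring it with $pop before returning. The name times169 is hypothetical, chosen for this sketch; it uses only instruction forms that appear elsewhere on this page:

```lisp
; Multiplies x by 169, using r4 as a scratch register
(defcode times169 (x)
  ($push '(r4))
  ($mov 'r4 13)
  ($mul 'r0 'r4)
  ($mul 'r0 'r4)
  ($pop '(r4))
  ($bx 'lr))
```

Because the routine does not call another subroutine, lr is left untouched and does not need to be saved. With the assembler loaded, (times169 2) should return 338.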
Simple example
Here's a simple example consisting of three ARM Thumb instructions that multiplies its parameter by 13 and returns the result:
(defcode mul13 (x)
  ($mov 'r1 13)
  ($mul 'r0 'r1)
  ($bx 'lr))
Evaluating this generates an assembler listing as follows:
0000 210d ($mov 'r1 13)
0002 4348 ($mul 'r0 'r1)
0004 4770 ($bx 'lr)
> (mul13 11)
143
The result is the number returned in the r0 register.
Note that functions written using defcode can't be relied upon to have a fixed position in memory and so should be position independent, and use only relative branches and memory references within the machine-code function.
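To illustrate the calling convention with more than one parameter, here is a minimal sketch of a two-parameter routine. The name diff is hypothetical; it relies only on the three-register $sub form shown in the table above, with the first argument arriving in r0 and the second in r1:

```lisp
; Returns x minus y
(defcode diff (x y)
  ($sub 'r0 'r0 'r1)
  ($bx 'lr))
```

The result is left in r0, so with the assembler loaded (diff 10 3) should return 7.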
Labels
You can include symbols in the body of the defcode form to create labels. The defcode assembler automatically creates these as local variables, and then does a two-pass assembly to resolve forward references. The assembler can then access these variables to calculate the offsets in branches and pc-relative addressing.
Note that because uLisp requires a comment starting with a semicolon to be terminated by an open parenthesis, you can't put such a comment immediately before a label. This limitation arises because the Arduino Serial Monitor removes all line break characters. You can use bracketing comments instead:
#| This is a comment |#
For example, here's a simple routine to calculate the Greatest Common Divisor of its two arguments, which uses two labels:
; Greatest Common Divisor
(defcode gcd (x y)
  swap
  ($mov 'r2 'r1)
  ($mov 'r1 'r0)
  again
  ($mov 'r0 'r2)
  ($sub 'r2 'r2 'r1)
  ($blt swap)
  ($bne again)
  ($bx 'lr))
Evaluating this form generates the following assembler listing:
0000      swap
0000 000a ($mov 'r2 'r1)
0002 0001 ($mov 'r1 'r0)
0004      again
0004 0010 ($mov 'r0 'r2)
0006 1a52 ($sub 'r2 'r2 'r1)
0008 dbfa ($blt swap)
000a d1fb ($bne again)
000c 4770 ($bx 'lr)
For example, to find the GCD of 3287 and 3460:
> (gcd 3287 3460)
173
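The gcd routine only branches backwards to labels that have already been seen. As a sketch of a forward reference, which is what the two-pass assembly exists to resolve, the following hypothetical routine larger returns the greater of its two signed integer arguments. The label less is used by $blt before it is defined. A bracketing comment is used before the label, in line with the note above about semicolon comments:

```lisp
; Returns the larger of x and y
(defcode larger (x y)
  ($sub 'r2 'r0 'r1)
  ($blt less)
  ($bx 'lr)
  #| x is less than y, so return y instead |#
  less
  ($mov 'r0 'r1)
  ($bx 'lr))
```

The $sub leaves x minus y in r2 and sets the condition codes, so $blt is taken only when x is less than y. With the assembler loaded, both (larger 3 10) and (larger 10 3) should return 10.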
In-line constants
You can insert an in-line 32-bit constant with the $word function. This is often used in conjunction with the $ldr mnemonic to load a 32-bit constant into a register. The assembler automatically inserts a $nop mnemonic, if necessary, to align the constant on a four-byte boundary as required by the ARM processor.
The following example loads 1234567890 into r0 and returns it:
(defcode constant ()
  ($ldr 'r0 const)
  ($bx 'lr)
  const
  ($word 1234567890))
The result:
> (constant)
1234567890
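An in-line constant is also useful as an operand that is too large for an immediate $mov. The following is a minimal sketch under that assumption; the routine name area-mm2 and the label million are hypothetical. It takes the width and height of a rectangle in whole metres and returns its area in square millimetres:

```lisp
; Area in square millimetres of a rectangle measured in whole metres
(defcode area-mm2 (w h)
  ($ldr 'r2 million)
  ($mul 'r0 'r1)
  ($mul 'r0 'r2)
  ($bx 'lr)
  million
  ($word 1000000))
```

Four 16-bit instructions precede the constant, so it already falls on a four-byte boundary and no padding should be needed. With the assembler loaded, (area-mm2 2 3) should return 6000000.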
For more examples see ARM assembler examples.