asm6809.pl - a 6809 assembler written in Perl.
by Ciaran Anscomb, 2010

A 3+ pass assembler.  Kind of.

  Pass 1: Read in text, store macros while reading.
  Pass 2: Divide text into sections.
  Pass 3: Assemble each section in turn, expanding any macro calls.
          Repeat until addresses are stable.

Possibly not fit for any purpose.  You're free to use, copy, modify and
redistribute so long as I don't get the blame for anything.


SUMMARY

 + Most 'a09' features, including local labels and macros.
 + Always tries to use the most efficient indexed mode.
 + Outputs to raw binary, Intel hex record, DragonDOS binary or CoCo binary.
 + Supports multiple sections.
 - Slower and probably more memory intensive than 'a09', as it's in Perl.
 - No support for fancy relocatable 'modules'.
 - Check bytes in Intel hex record output not correct yet (XRoar ignores
   these anyway).


RUNNING

Usage: ./asm6809.pl [OPTION]... SOURCE-FILE...
Assembles 6809 source code.

  -I PATH               add to include path
  -B, --bin             output to binary file (default)
  -H, --hex             output to (currently malformed) Intel hex record file
  -D, --dragondos       output to DragonDOS binary file
  -C, --coco            output to CoCo segmented binary file
  -e, --exec=ADDR       EXEC address (for output formats that support one)
  -o, --output=FILE     set output filename
  -l, --listing=FILE    create listing file
  -s, --symbols=FILE    create symbol table
  -v, --verbose         show what assembler is doing at each stage
  -q, --quiet           suppress warnings
      --help            show this help and exit

If more than one SOURCE-FILE is specified, they are assembled as though
they were all in one file.


DIFFERENCES FROM A09

The syntax accepted is mostly the same as that accepted by the free 'a09'
assembler (written in C).  Differences are:

 - Command line switches are completely different.
 - Labels should resolve whichever order they're specified in (if
   possible).
 - Indexed offsets should always be assembled to the fastest possible form
   (a09 gets confused by symbols used as offsets).
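
A note on "repeat until addresses are stable" and the last two differences
above: the size of an indexed instruction depends on the value of its
offset, and that value may be a label defined later in the source.  The
Python sketch below is a toy model of that loop, not code taken from
asm6809.pl.  The statement kinds ("label", "rmb", "lda_indexed"), the
function names and the worst-case guess for unknown symbols are all
assumptions made for illustration; the byte counts follow the listing in
INDEXED ADDRESSING.

```python
def indexed_size(offset):
    """Bytes for 'lda offset,x', per the INDEXED ADDRESSING listing."""
    if -16 <= offset <= 15:
        return 2          # opcode + postbyte (zero or 5-bit offset)
    if -128 <= offset <= 127:
        return 3          # opcode + postbyte + 8-bit offset
    return 4              # opcode + postbyte + 16-bit offset


def assemble_until_stable(program, org=0, max_passes=10):
    """Re-run a sizing pass until no label address changes.

    program is a list of (kind, arg) tuples (hypothetical format):
      ("label", name), ("rmb", count), ("lda_indexed", symbol)
    Returns (symbol_table, number_of_passes).
    """
    symbols = {}
    for passes in range(1, max_passes + 1):
        pc = org
        new_symbols = {}
        for kind, arg in program:
            if kind == "label":
                new_symbols[arg] = pc
            elif kind == "rmb":
                pc += arg
            elif kind == "lda_indexed":
                value = symbols.get(arg)
                # Symbol not known yet: assume the largest encoding.
                pc += 4 if value is None else indexed_size(value)
            else:
                raise ValueError(f"unknown statement kind: {kind}")
        if new_symbols == symbols:
            return symbols, passes
        symbols = new_symbols
    raise RuntimeError("addresses did not stabilise")
```

With two forward references to a label 'buf' followed by its definition,
[("lda_indexed", "buf"), ("lda_indexed", "buf"), ("label", "buf"),
("rmb", 1)], this model places 'buf' at 8, then at 4, and confirms 4 on
the third pass.
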


LOCAL LABELS

If a label consists only of digits, it is considered a local label.  These
can be specified multiple times, and are referenced by their name followed
by a 'B' (search backwards from current line) or 'F' (search forwards)
character.  For example:

0000  8E0400    scroll  ldx #$0400
0003  EC8820    1       ldd 32,x
0006  ED81              std ,x++
0008  8C05E0            cmpx #$05e0
000B  25F6              blo 1B
000D  CC6060            ldd #$6060
0010  ED81      1       std ,x++
0012  8C0600            cmpx #$0600
0015  25F9              blo 1B
0017  39                rts

The '1' label occurs twice, but each reference to '1B' refers to the
closest one searching backwards.

A syntax error will be generated if a non-local label is duplicated.


MACROS

Here's how to define a simple shortcut macro:

lsld    macro
        lslb
        rola
        endm

Once this macro is defined, issuing 'lsld' as an opcode inserts the
appropriate code.  The first line instructs the assembler to start creating
a macro called 'lsld'.  Subsequent lines are added to the macro until
'endm' is encountered.  When a macro's name is encountered as an opcode
later on, the macro lines are substituted.

Arguments to macros are allowed.  When expanding the macro, &n is replaced
with the nth argument.  For example:

move_sprite     macro
                ldb #&1
                ldy #32 * &2
                jsr do_move_sprite
                endm

Can then be used like this:

                move_sprite 2,3

Which might expand to:

                        move_sprite 2,3
5000  C602              ldb #2
5002  108E0060          ldy #32 * 3
5006  BD4000            jsr do_move_sprite

Macro arguments can be quoted, e.g. for passing a string like "1,y".


SECTIONS

Code can be placed into named sections with the 'section' opcode.  This can
make breaking source into multiple input files more comfortable.  Example:

        section zeropage
        org $0000
tmp1    rmb 1

        section code
        org $1000
        clr tmp1

        ; following could appear in an included file
        section zeropage
tmp2    rmb 1

        section code
        clr tmp2

Will assemble to:

0000            tmp1    rmb 1
0001            tmp2    rmb 1
1000  7F0000            clr tmp1
1003  7F0001            clr tmp2


EXPRESSIONS

Almost any number or address is actually parsed as an expression.
The simplest expressions are a number or a label, but arithmetic can be
included.  The parser checks for valid values and operators and constructs
an expression to be evaluated by Perl.

Note that when expanding a macro, simple textual substitutions are
performed, and expressions will be evaluated later.

Allowed values:

  +, -          Unary plus or minus
  [0-9]+[FB]    Address of local label
  label         Value of label (equate or address)
  [0-9]+        Decimal number
  $[0-9a-f]+    Hexadecimal number
  %[01]+        Binary number
  @[0-7]+       Octal number
  *             Address of current instruction

Allowed operators:

  +  -  *  /  &  |  ^  <<  >>


DIRECT AND EXTENDED ADDRESSING

If a 'setdp' directive is included, subsequent addresses are checked to see
if they fall within that page and if so, direct addressing is used.  If an
address falls outside the page, or if no 'setdp' directive has been given,
extended addressing is used.

Direct addressing can be forced by prefixing the address with a '<'
character.  Similarly, extended addressing can be forced by prefixing a '>'
character.  Examples:

                        org $4000
4000            value   rmb 1
4001  B64000            lda value
4004  9600              lda <value
                        setdp $40
4006  9600              lda value
4008  B64000            lda >value
400B  B64100            lda $4100


INDEXED ADDRESSING

By default, the fastest instruction is used to encode indexed instructions.
Consider the instruction "lda offset,x".  The following code would be
generated, depending on the value of 'offset':

0000            offset  equ 0
0000  A684              lda offset,x

000C            offset  equ 12
0000  A60C              lda offset,x

0064            offset  equ 100
0000  A68864            lda offset,x

00C8            offset  equ 200
0000  A68900C8          lda offset,x

8-bit offsets can be forced with a '>' character before them:

0000            offset  equ 0
0000  A68800            lda >offset,x

0064            offset  equ 100
0000  A68864            lda >offset,x

16-bit offsets can be forced with two '>' characters:

0064            offset  equ 100
0000  A6890064          lda >>offset,x

Currently there is no way to force a 5-bit offset when the offset is zero.


TFR AND EXG

The assembler allows mismatched register sizes, and plain numbers (not
expressions) in place of register names, but generates a warning in each
case.
Example:

0000  1F01              tfr 0,x
0002  1F19              tfr x,b
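
The second byte in each line above is the TFR/EXG postbyte: source
register code in the high nibble, destination in the low nibble.  The
Python sketch below shows how such a postbyte could be built while only
warning about the two cases described.  It is an illustration, not the
assembler's own code: the function names and warning texts are made up, and
the register codes are the standard 6809 ones (D=0, X=1, Y=2, U=3, S=4,
PC=5, A=8, B=9, CC=10, DP=11).

```python
# Standard 6809 register codes for the TFR/EXG postbyte.
REG_CODES = {"d": 0x0, "x": 0x1, "y": 0x2, "u": 0x3, "s": 0x4, "pc": 0x5,
             "a": 0x8, "b": 0x9, "cc": 0xA, "dp": 0xB}


def parse_register(operand):
    """Return (code, was_numeric) for a register name or a plain number."""
    text = operand.strip().lower()
    if text.isdigit():
        code = int(text)
        if not 0 <= code <= 15:
            raise ValueError(f"register number out of range: {operand}")
        return code, True
    if text not in REG_CODES:
        raise ValueError(f"unknown register: {operand}")
    return REG_CODES[text], False


def tfr_postbyte(src, dst):
    """Build the postbyte, collecting warnings rather than failing."""
    warnings = []
    codes = []
    for operand in (src, dst):
        code, was_numeric = parse_register(operand)
        if was_numeric:
            # Hypothetical warning text.
            warnings.append(f"number used instead of register name: {operand}")
        codes.append(code)
    # Codes 0-7 belong to the 16-bit group, 8-15 to the 8-bit group.
    if (codes[0] < 8) != (codes[1] < 8):
        warnings.append(f"mismatched register sizes: {src},{dst}")
    return (codes[0] << 4) | codes[1], warnings
```

For the two lines in the example, tfr_postbyte("0", "x") gives $01 with a
warning about the number, and tfr_postbyte("x", "b") gives $19 with a
warning about the mismatched sizes.
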