asm6809.pl - a 6800/6801/6803/6809/6309 assembler written in Perl.
by Ciaran Anscomb, 2010-2013

A 3+ pass assembler.  Kind of.

Pass 1: Read in text, store macros while reading.
Pass 2: Divide text into sections.
Pass 3: Assemble each section in turn, expanding any macro calls.
        Repeat until addresses are stable.

Possibly not fit for any purpose.  You're free to use, copy, modify and
redistribute so long as I don't get the blame for anything.


SUMMARY

 + Most 'a09' features, including local labels and macros.
 + Always tries to use most efficient indexed mode.
 + Outputs to raw binary, Intel hex record, DragonDOS binary or CoCo
   binary.
 + Support multiple sections.

 - Slower and probably more memory intensive than 'a09', as it's in Perl.
 - No support for fancy relocatable 'modules'.
 - Check bytes in Intel hex record output not correct yet (XRoar ignores
   these anyway).


RUNNING

Usage: ./asm6809.pl [OPTION]... SOURCE-FILE...
Assembles 6809 source code.

  -I PATH                 add to include path
  -B, --bin               output to binary file (default)
  -H, --hex               output to (currently malformed) Intel hex
                          record file
  -D, --dragondos         output to DragonDOS binary file
  -C, --coco              output to CoCo segmented binary file
  -e, --exec=ADDR         EXEC address (for output formats that support
                          one)
  -0, --6800              use 6800 ISA
  -1, --6801, --6803      use 6801 ISA (6803 is identical)
  -8, -9, --6809          use 6809 ISA (default)
  -3, --6309              use 6309 ISA (6809 with extensions)
  -x, --extended          permit extended syntax
  -o, --output=FILE       set output filename
  -l, --listing=FILE      create listing file
  -s, --symbols=FILE      create symbol table
  -v, --verbose           show what assembler is doing at each stage
  -q, --quiet             suppress warnings
      --help              show this help and exit

If more than one SOURCE-FILE is specified, they are assembled as though
they were all in one file.


DIFFERENCES FROM OTHER ASSEMBLERS

The main difference from other assemblers is that this one is
horrifically slow, because it's written in Perl.
It also doesn't support any of the things that allow it to be used with
OS-9 like special link formats, etc.

Comments MUST be introduced with a ';' character: the "official" Motorola
syntax allows comments to immediately follow any operands, but this does
not easily permit whitespace to occur within expressions, a feature I
personally appreciate.  For similar reasons, the '*' character cannot be
used to introduce comments, except full-line comments, as it might be
mistaken for "current PC" or "multiply".

So long as it can be resolved in up to 10 passes, a label may be forward
referenced.

In 6809 indexed addressing, the offsets should default to being assembled
to the fastest possible form.

If you want a fast, capable assembler that follows the official syntax,
you'd be better off giving this one a miss.  Check out William Astle's
"LWTOOLS" here, instead:

    http://lwtools.projects.l-w.ca/


LOCAL LABELS

If a label consists only of digits, it is considered a local label.
These can be specified multiple times, and are referenced by their name
followed by a 'B' (search backwards from current line) or 'F' (search
forwards) character.  For example:

0000  8E0400    scroll  ldx #$0400
0003  EC8820    1       ldd 32,x
0006  ED81              std ,x++
0008  8C05E0            cmpx #$05e0
000B  25F6              blo 1B
000D  CC6060            ldd #$6060
0010  ED81      1       std ,x++
0012  8C0600            cmpx #$0600
0015  25F9              blo 1B
0017  39                rts

The '1' label occurs twice, but each reference to '1B' refers to the
closest one searching backwards.

A syntax error will be generated if a non-local label is duplicated.


MACROS

Here's how to define a simple shortcut macro:

lsld    macro
        lslb
        rola
        endm

Once this macro is defined, issuing 'lsld' as an opcode inserts the
appropriate code.

The first line instructs the assembler to start creating a macro called
'lsld'.  Subsequent lines are added to the macro until 'endm' is
encountered.  When a macro's name is encountered as an opcode later on,
the macro lines are substituted.

Arguments to macros are allowed.
When expanding the macro, &<n> is replaced with the <n>th argument.
For example:

move_sprite     macro
                ldb #&1
                ldy #32 * &2
                jsr do_move_sprite
                endm

Can then be used like this:

                move_sprite 2,3

Which might expand to:

                        move_sprite 2,3
5000  C602              ldb #2
5002  108E0060          ldy #32 * 3
5006  BD4000            jsr do_move_sprite

Macro arguments can be quoted, e.g. for passing a string like "1,y".


SECTIONS

Code can be placed into named sections with the 'section' opcode.  This
can make breaking source into multiple input files more comfortable.
Example:

        section zeropage
        org $0000
tmp1    rmb 1

        section code
        org $1000
        clr tmp1

; following could appear in an included file
        section zeropage
tmp2    rmb 1

        section code
        clr tmp2

Will assemble to:

0000            tmp1    rmb 1
0001            tmp2    rmb 1
1000  7F0000            clr tmp1
1003  7F0001            clr tmp2


EXPRESSIONS

Almost any number or address is actually parsed as an expression.  The
simplest expressions are a number or a label, but arithmetic can be
included.  The parser checks for valid values and operators and
constructs an expression to be evaluated by Perl.  Note that when
expanding a macro, simple textual substitutions are performed, and
expressions will be evaluated later.

Allowed values:

  +, -            Unary plus or minus
  [0-9]+[FB]      Address of local label
  label           Value of label (equate or address)
  [0-9]+          Decimal number
  $[0-9a-f]+      Hexadecimal number
  %[01]+          Binary number
  @[0-7]+         Octal number
  *               Address of current instruction

Allowed operators:

  +  -  *  /  &  |  ^  <<  >>


DIRECT AND EXTENDED ADDRESSING

If a 'setdp' pseudo-op is included, subsequent addresses are checked to
see if they fall within that page and if so, direct addressing is used.
If an address falls outside the page, or if no 'setdp' pseudo-op has
been given, extended addressing is used.

Direct addressing can be forced by prefixing the address with a '<'
character.  Similarly, extended addressing can be forced by prefixing a
'>' character.

Examples:

                        org $4000
4000            value   rmb 1
4001  B64000            lda value
4004  9600              lda <value
                        setdp $40
4006  9600              lda value
4008  B64000            lda >value
400B  B64100            lda $4100

In the 6800 ISA family, "setdp" is not recognised, but the "direct page"
is assumed by default to be zero rather than undefined.


INDEXED ADDRESSING

By default, the fastest instruction is used to encode indexed
instructions.  Consider the instruction "lda offset,x".  The following
code would be generated, depending on the value of 'offset':

0000            offset  equ 0
0000  A684              lda offset,x

000C            offset  equ 12
0000  A60C              lda offset,x

0064            offset  equ 100
0000  A68864            lda offset,x

00C8            offset  equ 200
0000  A68900C8          lda offset,x

8-bit offsets can be forced with a '>' character before them:

0000            offset  equ 0
0000  A68800            lda >offset,x

0064            offset  equ 100
0000  A68864            lda >offset,x

16-bit offsets can be forced with two '>' characters:

0064            offset  equ 100
0000  A6890064          lda >>offset,x

A 5-bit zero offset can be coerced with a leading '<':

0000  3084              leax 0,x
0002  3000              leax <0,x


TFR AND EXG

Mismatched register sizes, and numbers (not expressions) in place of
register names, are allowed by the assembler, but warnings will be
generated.  Example:

0000  1F01              tfr 0,x
0002  1F19              tfr x,b

This behaviour is changed when using the 6309 ISA, as the behaviour of
mixed-size operations is assumed to be defined.


DATA PSEUDO-OPS

The pseudo-ops "rmb", "fcb", "fdb" and "fcc" should all work as expected.
Two others are included:

  "rzb <count>" inserts <count> bytes of $00.
  "fill <value>,<count>" inserts <count> bytes of <value>.

"fill" is borrowed from lwasm.


EXTENDED SYNTAX

The "-x" option extends the syntax a little.  The following pseudo-ops
are added that allow generation of postbyte values:

  fa      format address
  fr      format inter-register postbyte
  frbm    format register bit with memory postbytes

Examples:

0000  40                fa <$0040
0001  0040              fa >$0040
0003  86                fa a,x
0004  89                fr a,b
0005  5640              frbm a,2,6,$40

The 'fa' pseudo-op is probably only useful in the indexed form, but the
other addressing modes are considered anyway.
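
The offset-size choice described under INDEXED ADDRESSING can be sketched
in a few lines of Python.  This is only an illustration, not code from
asm6809.pl: the function name encode_lda_offset_x and its 'force'
parameter are made up for the example, and the selection rule (shortest
form that fits) is an assumption based on the listings above.  The
postbyte values are the standard 6809 ones for the X register.

```python
# Hypothetical helper, not part of asm6809.pl: encodes "lda offset,x"
# in the way the INDEXED ADDRESSING listings suggest.

LDA_INDEXED = 0xA6  # 6809 opcode for LDA, indexed addressing


def encode_lda_offset_x(offset, force=None):
    """Return the bytes of 'lda offset,x' as an uppercase hex string.

    force is None (pick the shortest form that fits), or 5, 8 or 16 to
    force that offset width.
    """
    if force is None:
        if offset == 0:
            width = 0                   # plain ",x", no offset at all
        elif -16 <= offset <= 15:
            width = 5
        elif -128 <= offset <= 127:
            width = 8
        else:
            width = 16
    else:
        width = force

    if width == 0:
        body = bytes([0x84])
    elif width == 5:
        if not -16 <= offset <= 15:
            raise ValueError("offset does not fit in 5 bits")
        body = bytes([offset & 0x1F])   # offset is held in the postbyte
    elif width == 8:
        if not -128 <= offset <= 127:
            raise ValueError("offset does not fit in 8 bits")
        body = bytes([0x88, offset & 0xFF])
    elif width == 16:
        body = bytes([0x89, (offset >> 8) & 0xFF, offset & 0xFF])
    else:
        raise ValueError("width must be 5, 8 or 16")

    return (bytes([LDA_INDEXED]) + body).hex().upper()


for value in (0, 12, 100, 200):
    print(value, encode_lda_offset_x(value))
```

Calling encode_lda_offset_x(0, force=8) or encode_lda_offset_x(100,
force=16) mirrors the forced 8-bit and 16-bit listings above.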