As85

From Kodewerx
Revision as of 01:51, 28 February 2009 by Gayswee (talk | contribs) (Development)

as85 is a simple assembler for the Sharp sm8521, the microcontroller used in the Tiger Game.com. The Game.com was released in 1997 and had only a few games made for it; no homebrew games have been made for it either. as85 is an attempt to build an assembler that will help hackers write homebrew code that runs on Game.com hardware.

Documentation on the Game.com hardware is available at Game.commies.

Download

The source code is available at http://hg.kodewerx.org/as85/

Current Progress

The current state of as85 is "almost usable, but not quite there yet." A number of bugs exist which need to be fixed before it can be used as a development tool:

  • Bug 2: Add support for jump/call/branch instructions
  • Bug 3: Output object code
  • Bug 4: Write a linker

I've also filed a bug about giving the project a better name [1].

Usage

The program takes one argument: the file name of an sm8521 assembly file. The files in the /examples directory are a good place to start.

  • test.asm is an example of what the beginning of a Game.com program might look like; it contains a header and some [random] instructions to give you an idea.
  • test2.asm is for testing the integrity of the parser with complex strings.
  • test3.asm lists all possible sm8521 instructions, for verifying that the output binary is correct.

Example Output

The following command:

 $ ./as85 ../examples/test3.asm

Produces the following output:

 00 01           clr     R1
 01 01           neg     R1
 02 01           com     R1
 03 01           rr      R1
 04 01           rl      R1
 05 01           rrc     R1
 06 01           rlc     R1
 07 01           srl     R1
 08 01           inc     R1
 09 01           dec     R1
 0A 01           sra     R1
 0B 01           sll     R1
 0C 01           da      R1
 0D 01           swap    R1
 0E 01           push    R1
 0F 01           pop     R1
 10 0A           cmp     r1, r2
 11 0A           add     r1, r2
 12 0A           sub     r1, r2
 13 0A           adc     r1, r2
 14 0A           sbc     r1, r2
 15 0A           and     r1, r2
 16 0A           or      r1, r2
 17 0A           xor     r1, r2
 18 02           incw    RR2
 19 02           decw    RR2
 1A 08           clr     @r1
 1A 09           neg     @r1
 1A 0A           com     @r1
 1A 0B           rr      @r1
 1A 0C           rl      @r1
 1A 0D           rrc     @r1
 1A 0E           rlc     @r1
 1A 0F           srl     @r1
 1B 08           inc     @r1
 1B 09           dec     @r1
 1B 0A           sra     @r1
 1B 0B           sll     @r1
 1B 0C           da      @r1
 1B 0D           swap    @r1
 1B 0E           push    @r1
 1B 0F           pop     @r1
 1C 07 24        bclr    0xFF24, #7
 1C 0F 94        bclr    0x94(r1), #7
 1D 07 24        bset    0xFF24, #7
 1D 0F 94        bset    0x94(r1), #7
 1E 02           pushw   RR2
 1F 02           popw    RR2
 20 0A           cmp     r1, @r2
 20 4A           cmp     r1, (r2)+
 20 88 94        cmp     r1, @0x94
 20 8A 94        cmp     r1, 0x94(r2)
 20 CA           cmp     r1, -(r2)
 21 0A           add     r1, @r2
 21 4A           add     r1, (r2)+
 21 88 94        add     r1, @0x94
 21 8A 94        add     r1, 0x94(r2)
 21 CA           add     r1, -(r2)
 22 0A           sub     r1, @r2
 22 4A           sub     r1, (r2)+
 22 88 94        sub     r1, @0x94
 22 8A 94        sub     r1, 0x94(r2)
 22 CA           sub     r1, -(r2)
 23 0A           adc     r1, @r2
 23 4A           adc     r1, (r2)+
 23 88 94        adc     r1, @0x94
 23 8A 94        adc     r1, 0x94(r2)
 23 CA           adc     r1, -(r2)
 24 0A           sbc     r1, @r2
 24 4A           sbc     r1, (r2)+
 24 88 94        sbc     r1, @0x94
 24 8A 94        sbc     r1, 0x94(r2)
 24 CA           sbc     r1, -(r2)
 25 0A           and     r1, @r2
 25 4A           and     r1, (r2)+
 25 88 94        and     r1, @0x94
 25 8A 94        and     r1, 0x94(r2)
 25 CA           and     r1, -(r2)
 26 0A           or      r1, @r2
 26 4A           or      r1, (r2)+
 26 88 94        or      r1, @0x94
 26 8A 94        or      r1, 0x94(r2)
 26 CA           or      r1, -(r2)
 27 0A           xor     r1, @r2
 27 4A           xor     r1, (r2)+
 27 88 94        xor     r1, @0x94
 27 8A 94        xor     r1, 0x94(r2)
 27 CA           xor     r1, -(r2)
 28 0A           mov     r1, @r2
 28 4A           mov     r1, (r2)+
 28 88 94        mov     r1, @0x94
 28 8A 94        mov     r1, 0x94(r2)
 28 CA           mov     r1, -(r2)
 29 11           mov     @r1, r2
 29 51           mov     (r1)+, r2
 29 90 94        mov     @0x94, r2
 29 91 94        mov     0x94(r1), r2
 29 D1           mov     -(r1), r2
 2C 02           exts    RR2
 2E 94           mov     ps0, #0x94
 2F 94 01        btst    R1, #0x94
 30 09           cmp     r1, @rr2
 30 49           cmp     r1, (rr2)+
 30 88 24 94     cmp     r1, @0x9424
 30 89 24 94     cmp     r1, 0x9424(rr2)
 30 C9           cmp     r1, -(rr2)
 31 09           add     r1, @rr2
 31 49           add     r1, (rr2)+
 31 88 24 94     add     r1, @0x9424
 31 89 24 94     add     r1, 0x9424(rr2)
 31 C9           add     r1, -(rr2)
 32 09           sub     r1, @rr2
 32 49           sub     r1, (rr2)+
 32 88 24 94     sub     r1, @0x9424
 32 89 24 94     sub     r1, 0x9424(rr2)
 32 C9           sub     r1, -(rr2)
 33 09           adc     r1, @rr2
 33 49           adc     r1, (rr2)+
 33 88 24 94     adc     r1, @0x9424
 33 89 24 94     adc     r1, 0x9424(rr2)
 33 C9           adc     r1, -(rr2)
 34 09           sbc     r1, @rr2
 34 49           sbc     r1, (rr2)+
 34 88 24 94     sbc     r1, @0x9424
 34 89 24 94     sbc     r1, 0x9424(rr2)
 34 C9           sbc     r1, -(rr2)
 35 09           and     r1, @rr2
 35 49           and     r1, (rr2)+
 35 88 24 94     and     r1, @0x9424
 35 89 24 94     and     r1, 0x9424(rr2)
 35 C9           and     r1, -(rr2)
 36 09           or      r1, @rr2
 36 49           or      r1, (rr2)+
 36 88 24 94     or      r1, @0x9424
 36 89 24 94     or      r1, 0x9424(rr2)
 36 C9           or      r1, -(rr2)
 37 09           xor     r1, @rr2
 37 49           xor     r1, (rr2)+
 37 88 24 94     xor     r1, @0x9424
 37 89 24 94     xor     r1, 0x9424(rr2)
 37 C9           xor     r1, -(rr2)
 38 09           mov     r1, @rr2
 38 49           mov     r1, (rr2)+
 38 88 24 94     mov     r1, @0x9424
 38 89 24 94     mov     r1, 0x9424(rr2)
 38 C9           mov     r1, -(rr2)
 39 21           mov     @rr2, r4
 39 61           mov     (rr2)+, r4
 39 A0 24 94     mov     @0x9424, r4
 39 A1 24 94     mov     0x9424(rr2), r4
 39 E1           mov     -(rr2), r4
 3A 14           movw    rr2, @rr4
 3A 54           movw    rr2, (rr4)+
 3A 90 24 94     movw    rr2, @0x9424
 3A 94 24 94     movw    rr2, 0x9424(rr4)
 3A D4           movw    rr2, -(rr4)
 3B 22           movw    @rr2, rr4
 3B 62           movw    (rr2)+, rr4
 3B A0 24 94     movw    @0x9424, rr4
 3B A2 24 94     movw    0x9424(rr2), rr4
 3B E2           movw    -(rr2), rr4
 3C 14           movw    rr2, rr4
 40 02 01        cmp     R1, R2
 41 02 01        add     R1, R2
 42 02 01        sub     R1, R2
 43 02 01        adc     R1, R2
 44 02 01        sbc     R1, R2
 45 02 01        and     R1, R2
 46 02 01        or      R1, R2
 47 02 01        xor     R1, R2
 48 02 01        mov     R1, R2
 4A 04 02        movw    RR2, RR4
 4B 02 24 94     movw    RR2, #0x9424
 4C 04 02        mult    RR2, R4
 4D 94 02        mult    RR2, #0x94
 4E 07 01        bmov    bf, R1, #7
 4E 47 01        bmov    R1, #7, bf
 4F 07 01        bcmp    bf, R1, #7
 4F 47 01        band    bf, R1, #7
 4F 87 01        bor     bf, R1, #7
 4F C7 01        bxor    bf, R1, #7
 50 94 01        cmp     R1, #0x94
 51 94 01        add     R1, #0x94
 52 94 01        sub     R1, #0x94
 53 94 01        adc     R1, #0x94
 54 94 01        sbc     R1, #0x94
 55 94 01        and     R1, #0x94
 56 94 01        or      R1, #0x94
 57 94 01        xor     R1, #0x94
 58 94 01        mov     R1, #0x94
 5C 04 02        div     RR2, RR4
 5D 94 02        div     RR2, #0x94
 5E 01 94 02     movm    R1, #0x94, R2
 5F 01 94 24     movm    R1, #0x94, #0x24
 60 04 02        cmpw    RR2, RR4
 61 04 02        addw    RR2, RR4
 62 04 02        subw    RR2, RR4
 63 04 02        adcw    RR2, RR4
 64 04 02        sbcw    RR2, RR4
 65 04 02        andw    RR2, RR4
 66 04 02        orw     RR2, RR4
 67 04 02        xorw    RR2, RR4
 68 02 24 94     cmpw    RR2, #0x9424
 69 02 24 94     addw    RR2, #0x9424
 6A 02 24 94     subw    RR2, #0x9424
 6B 02 24 94     adcw    RR2, #0x9424
 6C 02 24 94     sbcw    RR2, #0x9424
 6D 02 24 94     andw    RR2, #0x9424
 6E 02 24 94     orw     RR2, #0x9424
 6F 02 24 94     xorw    RR2, #0x9424
 78 24 94        movw    rr0, #0x9424
 79 24 94        movw    rr8, #0x9424
 7A 24 94        movw    rr2, #0x9424
 7B 24 94        movw    rr10, #0x9424
 7C 24 94        movw    rr4, #0x9424
 7D 24 94        movw    rr12, #0x9424
 7E 24 94        movw    rr6, #0x9424
 7F 24 94        movw    rr14, #0x9424
 A0 01           bclr    R1, #0
 A1 01           bclr    R1, #1
 A2 01           bclr    R1, #2
 A3 01           bclr    R1, #3
 A4 01           bclr    R1, #4
 A5 01           bclr    R1, #5
 A6 01           bclr    R1, #6
 A7 01           bclr    R1, #7
 A8 01           bset    R1, #0
 A9 01           bset    R1, #1
 AA 01           bset    R1, #2
 AB 01           bset    R1, #3
 AC 01           bset    R1, #4
 AD 01           bset    R1, #5
 AE 01           bset    R1, #6
 AF 01           bset    R1, #7
 B0 01           mov     r0, R1
 B1 01           mov     r1, R1
 B2 01           mov     r2, R1
 B3 01           mov     r3, R1
 B4 01           mov     r4, R1
 B5 01           mov     r5, R1
 B6 01           mov     r6, R1
 B7 01           mov     r7, R1
 B8 01           mov     R1, r0
 B9 01           mov     R1, r1
 BA 01           mov     R1, r2
 BB 01           mov     R1, r3
 BC 01           mov     R1, r4
 BD 01           mov     R1, r5
 BE 01           mov     R1, r6
 BF 01           mov     R1, r7
 C0 94           mov     r0, #0x94
 C1 94           mov     r1, #0x94
 C2 94           mov     r2, #0x94
 C3 94           mov     r3, #0x94
 C4 94           mov     r4, #0x94
 C5 94           mov     r5, #0x94
 C6 94           mov     r6, #0x94
 C7 94           mov     r7, #0x94
 C8 94           mov     ie0, #0x94
 C9 94           mov     ie1, #0x94
 CA 94           mov     ir0, #0x94
 CB 94           mov     ir1, #0x94
 CC 94           mov     p0, #0x94
 CD 94           mov     p1, #0x94
 CE 94           mov     p2, #0x94
 CF 94           mov     p3, #0x94
 F0              stop
 F1              halt
 F8              ret
 F9              iret
 FA              clrc
 FB              comc
 FC              setc
 FD              ei
 FE              di
 FF              nop
 assemble() returned 0: OK
 Clean up...

Note: The output binary has not been verified to be accurate.

Development

As85 is a fairly simple assembler. It doesn't use any sort of 'compiler-compiler' for lexical analysis. In fact, its lexical analysis is very specific to the sm8521 CPU.

The main loop (the assemble() function, defined in asm.c) parses the input text inline. (This should probably be moved out to a new source file.) Each line is split into two pieces: op[0], containing the instruction, and op[1], containing its operands. op[0] is then compared as a string against every supported instruction. If a match is found, the operands string is passed to a dynamically chosen function (from a function pointer table, indexed by the matched instruction). This function performs the lexical analysis required to decide which instruction we are trying to assemble.
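As a rough illustration of the split-and-dispatch step described above, here is a minimal sketch. It is not the code from asm.c: the names assemble_line, inst_handler, do_nop and do_ret, the handler signature, and the two-entry tables are all assumptions made for the example. Only the opcode bytes for nop and ret are taken from the test3.asm listing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handler signature: takes the operand string, writes the
 * encoded bytes to out, and returns the byte count (or -1 on error). */
typedef int (*inst_handler)(const char *operands, unsigned char *out);

/* Stub handlers standing in for the real lexical analyzers in inst.c. */
static int do_nop(const char *operands, unsigned char *out)
{
    (void)operands;
    out[0] = 0xFF;          /* "FF  nop" in the test3.asm listing */
    return 1;
}

static int do_ret(const char *operands, unsigned char *out)
{
    (void)operands;
    out[0] = 0xF8;          /* "F8  ret" in the test3.asm listing */
    return 1;
}

/* Parallel tables: mnemonics[i] is handled by handlers[i]. */
static const char *const mnemonics[] = { "nop", "ret" };
static const inst_handler handlers[] = { do_nop, do_ret };

/* Split a line into op[0] (mnemonic) and op[1] (operands), then dispatch.
 * The line buffer is modified in place. */
int assemble_line(char *line, unsigned char *out)
{
    char *op[2];
    size_t i;

    assert(line != NULL && out != NULL);

    op[0] = line + strspn(line, " \t");         /* skip leading blanks */
    op[1] = op[0] + strcspn(op[0], " \t");      /* end of the mnemonic */
    if (*op[1] != '\0') {
        *op[1]++ = '\0';                        /* terminate op[0] */
        op[1] += strspn(op[1], " \t");          /* skip to the operands */
    }

    /* Linear search over all supported mnemonics. */
    for (i = 0; i < sizeof mnemonics / sizeof mnemonics[0]; i++) {
        if (strcmp(op[0], mnemonics[i]) == 0)
            return handlers[i](op[1], out);
    }
    return -1;                                  /* unknown instruction */
}
```

Calling assemble_line() on a writable buffer holding "nop" stores the single byte FF in out and returns 1; an unknown mnemonic returns -1.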

Since the sm8521 is a CISC machine, its instruction set contains a number of different ways to assemble the same instruction mnemonic. For example, several different addressing modes for the mov instruction are shown above in the test3.asm output. The lexical analysis is the voodoo which picks the proper addressing mode and byte codes.
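To make the addressing-mode point concrete, the five "mov r1, ..." lines with opcode 28 in the listing suggest that the second byte packs a 2-bit mode, the destination register, and the base register. The sketch below encodes that layout. The layout is inferred from the listing alone, not from the sm8521 datasheet or the as85 source, and the names encode_mov_load and addr_mode are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Mode numbers inferred from the top two bits of the operand byte. */
enum addr_mode {
    MODE_INDIRECT = 0,   /* @r2      -> 28 0A    */
    MODE_POSTINC  = 1,   /* (r2)+    -> 28 4A    */
    MODE_INDEXED  = 2,   /* 0x94(r2) -> 28 8A 94 */
    MODE_PREDEC   = 3    /* -(r2)    -> 28 CA    */
};

/* Encode "mov rD, <mem>" (opcode 0x28). Returns the number of bytes written.
 * MODE_INDEXED with base == 0 matches the "@0x94" line in the listing. */
size_t encode_mov_load(unsigned char *out, unsigned dst, enum addr_mode mode,
                       unsigned base, unsigned char disp)
{
    size_t n = 0;

    assert(out != NULL && dst < 8 && base < 8);

    out[n++] = 0x28;
    out[n++] = (unsigned char)(((unsigned)mode << 6) | (dst << 3) | base);
    if (mode == MODE_INDEXED)
        out[n++] = disp;            /* 8-bit displacement or address */
    return n;
}
```

With dst = 1 and base = 2 this reproduces the bytes shown in the listing for all five forms. If the inference holds, an indexed access through r0 cannot be expressed in this encoding, since base 0 is taken by the absolute form.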

The lexical analyzing functions are defined in inst.c (following the function pointer table mentioned previously). The function handling the current instruction will test the operands string against a series of lexical patterns with the chk_pattern() function (defined in asm.c, although this should probably be moved).

chk_pattern() uses a scanf-like formatting string rather than the regular expressions more common in lexical analysis. Documentation for the formatting string can be found in inst.h. With just a few pattern primitives, any of the sm8521's addressing modes can be matched, with the matched values returned in an array. It makes good use of the format scanners defined in scan.c.
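The idea of a scanf-like pattern matcher can be sketched as follows. This is not chk_pattern() itself: the function name match_pattern and the primitives %r and %n are invented for the example, and the real primitives are the ones documented in inst.h.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical primitives (NOT the ones documented in inst.h):
 *   %r  a register r0..r7, stores its number
 *   %n  a number accepted by strtol() with base 0, stores its value
 *   ' ' zero or more blanks
 * Any other pattern character must match the input literally.
 * Returns the number of values stored in vals, or -1 if there is no match. */
int match_pattern(const char *s, const char *pat, long *vals, int max)
{
    int n = 0;

    assert(s != NULL && pat != NULL && vals != NULL);

    for (; *pat != '\0'; pat++) {
        if (*pat == ' ') {
            while (*s == ' ' || *s == '\t')
                s++;
        } else if (*pat == '%' && pat[1] == 'r') {
            if (*s != 'r' || s[1] < '0' || s[1] > '7' || n >= max)
                return -1;
            vals[n++] = s[1] - '0';
            s += 2;
            pat++;                  /* consume the 'r' of "%r" */
        } else if (*pat == '%' && pat[1] == 'n') {
            char *end;
            if (!isdigit((unsigned char)*s) || n >= max)
                return -1;
            vals[n++] = strtol(s, &end, 0);
            s = end;
            pat++;                  /* consume the 'n' of "%n" */
        } else if (*s++ != *pat) {
            return -1;              /* literal mismatch */
        }
    }
    return *s == '\0' ? n : -1;     /* reject trailing garbage */
}
```

Under these assumptions, matching "r1, 0x94(r2)" against the pattern "%r, %n(%r)" stores 1, 0x94 and 2, while the same input fails against "%r, @%r". A handler can therefore try one pattern per addressing mode until one succeeds.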

If chk_pattern() manages to find a match, the matching data may be further checked for validity on a per-context basis. Finally, the full instruction byte codes will be put together and returned to the assemble() loop. This is where the object code would be built. Currently, the only thing that happens is that the assembled instruction is dumped to stdout in a debug build. [2]
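For reference, a debug dump in the same column layout as the example output (hex bytes padded to 16 columns, mnemonic padded to 8) could look like the sketch below. The function name format_listing and its signature are assumptions; the column widths were measured from the listing, not taken from the as85 source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one listing line: hex bytes padded to 16 columns, the mnemonic
 * padded to 8 columns, then the operands. Returns 0 on success, or -1 if
 * buf is too small. */
int format_listing(char *buf, size_t size, const unsigned char *bytes,
                   size_t count, const char *mnemonic, const char *operands)
{
    char hex[16 + 1] = "";
    size_t i, len = 0;
    int n;

    assert(buf != NULL && bytes != NULL && mnemonic != NULL && operands != NULL);
    assert(count >= 1 && count <= 4);   /* longest encoding in the listing */

    for (i = 0; i < count; i++)
        len += (size_t)snprintf(hex + len, sizeof hex - len,
                                i ? " %02X" : "%02X", (unsigned)bytes[i]);

    if (*operands != '\0')
        n = snprintf(buf, size, "%-16s%-8s%s", hex, mnemonic, operands);
    else
        n = snprintf(buf, size, "%-16s%s", hex, mnemonic);

    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

The caller would pass the bytes returned by the instruction handler together with op[0] and op[1], then print the resulting line with puts().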

Optimization Concerns

Some optimization could be done within this lexical analysis process. The first improvement would be replacing the linear string comparison with a binary tree. The second would be replacing the string comparison itself with a hash comparison. The hash algorithm would have to be small and fast for this to make much of a difference.
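One simple way to get logarithmic lookup without building a tree is a binary search over a sorted mnemonic table, using the standard bsearch() function. The sketch below swaps that technique in for the binary tree mentioned above. The table is a small sample of mnemonics from the listing, and find_mnemonic is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Must stay sorted in strcmp() order for bsearch() to work. */
static const char *const sorted_mnemonics[] = {
    "adc", "add", "and", "clr", "cmp", "mov", "nop", "or", "sub", "xor"
};

/* bsearch() passes the key first and a pointer to a table element second. */
static int cmp_mnemonic(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Returns the table index of name, or -1 if it is not a known mnemonic. */
int find_mnemonic(const char *name)
{
    const char *const *hit;

    assert(name != NULL);

    hit = bsearch(name, sorted_mnemonics,
                  sizeof sorted_mnemonics / sizeof sorted_mnemonics[0],
                  sizeof sorted_mnemonics[0], cmp_mnemonic);
    return hit ? (int)(hit - sorted_mnemonics) : -1;
}
```

The returned index can be used directly with a function pointer table kept in the same order, so the dispatch step stays unchanged.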

The next big optimization would be rearranging the chk_pattern() calls within each lexical analyzer to check the most likely patterns first. The best way to choose that order is static analysis of existing sm8521 source code, which is in very short supply. Accurate disassemblies of commercial Game.com games would help here.