As85

Revision as of 00:08, 30 December 2012

as85 is a simple assembler for the Sharp sm8521, the microcontroller used in the Tiger Game.com. Game.com was released in 1997 and had only a few games ever made for it, and no homebrew games have been made for it so far. as85 is an attempt to build an assembler that will help hackers write homebrew code that runs on Game.com hardware.

Documentation on the Game.com hardware is available at Game.commies.

Download

The source code is available at http://git.kodewerx.org/as85/src/

Current Progress

The current state of as85 is "almost usable, but not quite there yet." A number of bugs exist which need to be fixed before it can be used as a development tool:

  • Bug 2: Add support for jump/call/branch instructions
  • Bug 3: Output object code
  • Bug 4: Write a linker

I've also filed a bug about giving the project a better name [1].

Usage

The program takes one argument: the file name of an sm8521 assembly file. The files in the /examples directory are a good place to start.

  • test.asm is an example of what the beginning of a Game.com program might look like; it contains a header, and some [random] instructions to give you an idea.
  • test2.asm is for testing the integrity of the parser with complex strings.
  • test3.asm lists all possible sm8521 instructions, for verifying that the output binary is correct.

Example Output

The following command:

$ ./as85 ../examples/test3.asm

Produces the following output:

00 01           clr     R1
01 01           neg     R1
02 01           com     R1
03 01           rr      R1
04 01           rl      R1
05 01           rrc     R1
06 01           rlc     R1
07 01           srl     R1
08 01           inc     R1
09 01           dec     R1
0A 01           sra     R1
0B 01           sll     R1
0C 01           da      R1
0D 01           swap    R1
0E 01           push    R1
0F 01           pop     R1
10 0A           cmp     r1, r2
11 0A           add     r1, r2
12 0A           sub     r1, r2
13 0A           adc     r1, r2
14 0A           sbc     r1, r2
15 0A           and     r1, r2
16 0A           or      r1, r2
17 0A           xor     r1, r2
18 02           incw    RR2
19 02           decw    RR2
1A 08           clr     @r1
1A 09           neg     @r1
1A 0A           com     @r1
1A 0B           rr      @r1
1A 0C           rl      @r1
1A 0D           rrc     @r1
1A 0E           rlc     @r1
1A 0F           srl     @r1
1B 08           inc     @r1
1B 09           dec     @r1
1B 0A           sra     @r1
1B 0B           sll     @r1
1B 0C           da      @r1
1B 0D           swap    @r1
1B 0E           push    @r1
1B 0F           pop     @r1
1C 07 24        bclr    0xFF24, #7
1C 0F 94        bclr    0x94(r1), #7
1D 07 24        bset    0xFF24, #7
1D 0F 94        bset    0x94(r1), #7
1E 02           pushw   RR2
1F 02           popw    RR2
20 0A           cmp     r1, @r2
20 4A           cmp     r1, (r2)+
20 88 94        cmp     r1, @0x94
20 8A 94        cmp     r1, 0x94(r2)
20 CA           cmp     r1, -(r2)
21 0A           add     r1, @r2
21 4A           add     r1, (r2)+
21 88 94        add     r1, @0x94
21 8A 94        add     r1, 0x94(r2)
21 CA           add     r1, -(r2)
22 0A           sub     r1, @r2
22 4A           sub     r1, (r2)+
22 88 94        sub     r1, @0x94
22 8A 94        sub     r1, 0x94(r2)
22 CA           sub     r1, -(r2)
23 0A           adc     r1, @r2
23 4A           adc     r1, (r2)+
23 88 94        adc     r1, @0x94
23 8A 94        adc     r1, 0x94(r2)
23 CA           adc     r1, -(r2)
24 0A           sbc     r1, @r2
24 4A           sbc     r1, (r2)+
24 88 94        sbc     r1, @0x94
24 8A 94        sbc     r1, 0x94(r2)
24 CA           sbc     r1, -(r2)
25 0A           and     r1, @r2
25 4A           and     r1, (r2)+
25 88 94        and     r1, @0x94
25 8A 94        and     r1, 0x94(r2)
25 CA           and     r1, -(r2)
26 0A           or      r1, @r2
26 4A           or      r1, (r2)+
26 88 94        or      r1, @0x94
26 8A 94        or      r1, 0x94(r2)
26 CA           or      r1, -(r2)
27 0A           xor     r1, @r2
27 4A           xor     r1, (r2)+
27 88 94        xor     r1, @0x94
27 8A 94        xor     r1, 0x94(r2)
27 CA           xor     r1, -(r2)
28 0A           mov     r1, @r2
28 4A           mov     r1, (r2)+
28 88 94        mov     r1, @0x94
28 8A 94        mov     r1, 0x94(r2)
28 CA           mov     r1, -(r2)
29 11           mov     @r1, r2
29 51           mov     (r1)+, r2
29 90 94        mov     @0x94, r2
29 91 94        mov     0x94(r1), r2
29 D1           mov     -(r1), r2
2C 02           exts    RR2
2E 94           mov     ps0, #0x94
2F 94 01        btst    R1, #0x94
30 09           cmp     r1, @rr2
30 49           cmp     r1, (rr2)+
30 88 24 94     cmp     r1, @0x9424
30 89 24 94     cmp     r1, 0x9424(rr2)
30 C9           cmp     r1, -(rr2)
31 09           add     r1, @rr2
31 49           add     r1, (rr2)+
31 88 24 94     add     r1, @0x9424
31 89 24 94     add     r1, 0x9424(rr2)
31 C9           add     r1, -(rr2)
32 09           sub     r1, @rr2
32 49           sub     r1, (rr2)+
32 88 24 94     sub     r1, @0x9424
32 89 24 94     sub     r1, 0x9424(rr2)
32 C9           sub     r1, -(rr2)
33 09           adc     r1, @rr2
33 49           adc     r1, (rr2)+
33 88 24 94     adc     r1, @0x9424
33 89 24 94     adc     r1, 0x9424(rr2)
33 C9           adc     r1, -(rr2)
34 09           sbc     r1, @rr2
34 49           sbc     r1, (rr2)+
34 88 24 94     sbc     r1, @0x9424
34 89 24 94     sbc     r1, 0x9424(rr2)
34 C9           sbc     r1, -(rr2)
35 09           and     r1, @rr2
35 49           and     r1, (rr2)+
35 88 24 94     and     r1, @0x9424
35 89 24 94     and     r1, 0x9424(rr2)
35 C9           and     r1, -(rr2)
36 09           or      r1, @rr2
36 49           or      r1, (rr2)+
36 88 24 94     or      r1, @0x9424
36 89 24 94     or      r1, 0x9424(rr2)
36 C9           or      r1, -(rr2)
37 09           xor     r1, @rr2
37 49           xor     r1, (rr2)+
37 88 24 94     xor     r1, @0x9424
37 89 24 94     xor     r1, 0x9424(rr2)
37 C9           xor     r1, -(rr2)
38 09           mov     r1, @rr2
38 49           mov     r1, (rr2)+
38 88 24 94     mov     r1, @0x9424
38 89 24 94     mov     r1, 0x9424(rr2)
38 C9           mov     r1, -(rr2)
39 21           mov     @rr2, r4
39 61           mov     (rr2)+, r4
39 A0 24 94     mov     @0x9424, r4
39 A1 24 94     mov     0x9424(rr2), r4
39 E1           mov     -(rr2), r4
3A 14           movw    rr2, @rr4
3A 54           movw    rr2, (rr4)+
3A 90 24 94     movw    rr2, @0x9424
3A 94 24 94     movw    rr2, 0x9424(rr4)
3A D4           movw    rr2, -(rr4)
3B 22           movw    @rr2, rr4
3B 62           movw    (rr2)+, rr4
3B A0 24 94     movw    @0x9424, rr4
3B A2 24 94     movw    0x9424(rr2), rr4
3B E2           movw    -(rr2), rr4
3C 14           movw    rr2, rr4
40 02 01        cmp     R1, R2
41 02 01        add     R1, R2
42 02 01        sub     R1, R2
43 02 01        adc     R1, R2
44 02 01        sbc     R1, R2
45 02 01        and     R1, R2
46 02 01        or      R1, R2
47 02 01        xor     R1, R2
48 02 01        mov     R1, R2
4A 04 02        movw    RR2, RR4
4B 02 24 94     movw    RR2, #0x9424
4C 04 02        mult    RR2, R4
4D 94 02        mult    RR2, #0x94
4E 07 01        bmov    bf, R1, #7
4E 47 01        bmov    R1, #7, bf
4F 07 01        bcmp    bf, R1, #7
4F 47 01        band    bf, R1, #7
4F 87 01        bor     bf, R1, #7
4F C7 01        bxor    bf, R1, #7
50 94 01        cmp     R1, #0x94
51 94 01        add     R1, #0x94
52 94 01        sub     R1, #0x94
53 94 01        adc     R1, #0x94
54 94 01        sbc     R1, #0x94
55 94 01        and     R1, #0x94
56 94 01        or      R1, #0x94
57 94 01        xor     R1, #0x94
58 94 01        mov     R1, #0x94
5C 04 02        div     RR2, RR4
5D 94 02        div     RR2, #0x94
5E 01 94 02     movm    R1, #0x94, R2
5F 01 94 24     movm    R1, #0x94, #0x24
60 04 02        cmpw    RR2, RR4
61 04 02        addw    RR2, RR4
62 04 02        subw    RR2, RR4
63 04 02        adcw    RR2, RR4
64 04 02        sbcw    RR2, RR4
65 04 02        andw    RR2, RR4
66 04 02        orw     RR2, RR4
67 04 02        xorw    RR2, RR4
68 02 24 94     cmpw    RR2, #0x9424
69 02 24 94     addw    RR2, #0x9424
6A 02 24 94     subw    RR2, #0x9424
6B 02 24 94     adcw    RR2, #0x9424
6C 02 24 94     sbcw    RR2, #0x9424
6D 02 24 94     andw    RR2, #0x9424
6E 02 24 94     orw     RR2, #0x9424
6F 02 24 94     xorw    RR2, #0x9424
78 24 94        movw    rr0, #0x9424
79 24 94        movw    rr8, #0x9424
7A 24 94        movw    rr2, #0x9424
7B 24 94        movw    rr10, #0x9424
7C 24 94        movw    rr4, #0x9424
7D 24 94        movw    rr12, #0x9424
7E 24 94        movw    rr6, #0x9424
7F 24 94        movw    rr14, #0x9424
A0 01           bclr    R1, #0
A1 01           bclr    R1, #1
A2 01           bclr    R1, #2
A3 01           bclr    R1, #3
A4 01           bclr    R1, #4
A5 01           bclr    R1, #5
A6 01           bclr    R1, #6
A7 01           bclr    R1, #7
A8 01           bset    R1, #0
A9 01           bset    R1, #1
AA 01           bset    R1, #2
AB 01           bset    R1, #3
AC 01           bset    R1, #4
AD 01           bset    R1, #5
AE 01           bset    R1, #6
AF 01           bset    R1, #7
B0 01           mov     r0, R1
B1 01           mov     r1, R1
B2 01           mov     r2, R1
B3 01           mov     r3, R1
B4 01           mov     r4, R1
B5 01           mov     r5, R1
B6 01           mov     r6, R1
B7 01           mov     r7, R1
B8 01           mov     R1, r0
B9 01           mov     R1, r1
BA 01           mov     R1, r2
BB 01           mov     R1, r3
BC 01           mov     R1, r4
BD 01           mov     R1, r5
BE 01           mov     R1, r6
BF 01           mov     R1, r7
C0 94           mov     r0, #0x94
C1 94           mov     r1, #0x94
C2 94           mov     r2, #0x94
C3 94           mov     r3, #0x94
C4 94           mov     r4, #0x94
C5 94           mov     r5, #0x94
C6 94           mov     r6, #0x94
C7 94           mov     r7, #0x94
C8 94           mov     ie0, #0x94
C9 94           mov     ie1, #0x94
CA 94           mov     ir0, #0x94
CB 94           mov     ir1, #0x94
CC 94           mov     p0, #0x94
CD 94           mov     p1, #0x94
CE 94           mov     p2, #0x94
CF 94           mov     p3, #0x94
F0              stop
F1              halt
F8              ret
F9              iret
FA              clrc
FB              comc
FC              setc
FD              ei
FE              di
FF              nop
assemble() returned 0: OK
Clean up...

Note: The output binary has not been verified for accuracy.

Development

As85 is a fairly simple assembler. It doesn't use any sort of 'compiler-compiler' for lexical analysis. In fact, its lexical analysis is very specific to the sm8521 MCU.

The main loop (the assemble() function, defined in asm.c) parses the input text inline. (This should probably be moved out to a new source file.) Each line is split into two pieces: op[0], containing the instruction, and op[1], containing its operands. op[0] is then compared as a string against every supported instruction. If a match is found, the operands string is passed to a function chosen from a function pointer table, indexed by the matched instruction. This function performs the lexical analysis required to decide which form of the instruction we are trying to assemble.
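The split-and-dispatch step described above can be sketched roughly as follows. This is not the code from asm.c. The names assemble_line, inst_handler, inst_table, do_nop and do_clr are made up for illustration, and the table holds only two instructions. The byte values 0xFF (nop) and 00 01 (clr R1) are taken from the test3.asm listing above.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical handler type: receives the operands string (op[1]),
 * writes the encoded bytes to out, and returns the byte count or -1. */
typedef int (*inst_handler)(const char *operands, unsigned char *out);

/* nop takes no operands; 0xFF is its byte in the listing. */
static int do_nop(const char *operands, unsigned char *out)
{
    (void)operands;
    out[0] = 0xFF;
    return 1;
}

/* Only the "clr R<n>" form is handled here (listing: 00 01 for clr R1).
 * A real handler would also have to recognise forms such as "clr @r1". */
static int do_clr(const char *operands, unsigned char *out)
{
    unsigned r;

    if (sscanf(operands, "R%u", &r) != 1 || r > 0xFF)
        return -1;
    out[0] = 0x00;
    out[1] = (unsigned char)r;
    return 2;
}

struct inst_entry {
    const char  *mnemonic;
    inst_handler handler;
};

static const struct inst_entry inst_table[] = {
    { "clr", do_clr },
    { "nop", do_nop },
};

/* Split a line into mnemonic and operands, then search the table
 * linearly and call the matching handler through its pointer. */
int assemble_line(const char *line, unsigned char *out)
{
    char op0[16];
    const char *op1;
    size_t n = strcspn(line, " \t");   /* length of the mnemonic */
    size_t i;

    if (n == 0 || n >= sizeof op0)
        return -1;
    memcpy(op0, line, n);
    op0[n] = '\0';
    op1 = line + n + strspn(line + n, " \t");   /* skip the separator */

    for (i = 0; i < sizeof inst_table / sizeof inst_table[0]; i++) {
        if (strcmp(op0, inst_table[i].mnemonic) == 0)
            return inst_table[i].handler(op1, out);
    }
    return -1;   /* unknown mnemonic */
}
```

With this sketch, assemble_line("clr R1", buf) returns 2 and leaves 00 01 in buf, and an unknown mnemonic returns -1.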

Since the sm8521 is a CISC machine, its instruction set contains a number of different ways to assemble the same instruction mnemonic. For example, several different addressing modes for the mov instruction are shown above in the test3.asm output. The lexical analysis is the voodoo which picks the proper addressing mode and byte codes by analyzing the operands.
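As a rough illustration of what "picking the byte codes" means, here is a sketch that encodes the opcode 0x28 forms of mov. The bit layout (mode in the top two bits, destination in bits 5-3, pointer register in bits 2-0) is only inferred from the test3.asm listing above, which has not been verified, so treat it as an assumption. The names addr_mode and encode_mov_load are hypothetical.

```c
#include <stddef.h>

/* Mode values as they appear in the second byte of the listing. */
enum addr_mode {
    MODE_INDIRECT = 0,   /* mov r1, @r2       -> 28 0A    */
    MODE_POSTINC  = 1,   /* mov r1, (r2)+     -> 28 4A    */
    MODE_INDEXED  = 2,   /* mov r1, 0x94(r2)  -> 28 8A 94 */
    MODE_PREDEC   = 3    /* mov r1, -(r2)     -> 28 CA    */
};

/* Encode "mov r<dst>, <mode>(r<src>)". Returns the number of bytes
 * written to out (2, or 3 when a displacement byte follows). */
size_t encode_mov_load(unsigned dst, enum addr_mode mode, unsigned src,
                       unsigned char disp, unsigned char *out)
{
    size_t n = 0;

    out[n++] = 0x28;
    out[n++] = (unsigned char)(((unsigned)mode << 6) |
                               ((dst & 7u) << 3) |
                               (src & 7u));
    if (mode == MODE_INDEXED)
        out[n++] = disp;   /* only the indexed form carries a displacement */
    return n;
}
```

In the listing, mov r1, @0x94 assembles to 28 88 94, which is the indexed form with a pointer register field of 0. The sketch reproduces that when called with src set to 0.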

The lexical analyzing functions are defined in inst.c (following the function pointer table mentioned previously). The function handling the current instruction will test the operands string against a series of lexical patterns with the chk_pattern() function (defined in asm.c, although this should probably be moved).

chk_pattern() uses a scanf-like formatting string rather than the regular expressions more commonly used in lexical analysis. Documentation for the formatting string can be found in inst.h. With just a few pattern primitives, any of the sm8521's addressing modes can be matched, and the matched primitives are written to an output array. It makes good use of the format scanners defined in scan.c.
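To give a feel for the approach, here is a sketch that uses the C library's own sscanf() as a stand-in. It does not use the chk_pattern() format language, which is documented in inst.h and not reproduced here. The struct operands type and the match_operands() function are hypothetical, and only four of the mov operand forms from the listing are covered.

```c
#include <stdio.h>

struct operands {
    unsigned dst;    /* destination register number */
    unsigned src;    /* pointer register number */
    int      disp;   /* displacement, indexed form only */
};

/* A pattern only counts as a match if every field converted and the
 * trailing %n shows that the whole string was consumed. */
static int full_match(const char *s, int got, int want, int end)
{
    return got == want && end > 0 && s[end] == '\0';
}

/* Try each pattern in turn. Returns 0..3 for the matched form, or -1.
 * On -1 the contents of *o are unspecified. */
int match_operands(const char *s, struct operands *o)
{
    int got, end;

    o->disp = 0;

    end = 0;   /* r1, @r2 */
    got = sscanf(s, "r%u , @r%u%n", &o->dst, &o->src, &end);
    if (full_match(s, got, 2, end))
        return 0;

    end = 0;   /* r1, (r2)+ */
    got = sscanf(s, "r%u , (r%u)+%n", &o->dst, &o->src, &end);
    if (full_match(s, got, 2, end))
        return 1;

    end = 0;   /* r1, 0x94(r2); %i accepts the 0x prefix */
    got = sscanf(s, "r%u , %i(r%u)%n", &o->dst, &o->disp, &o->src, &end);
    if (full_match(s, got, 3, end))
        return 2;

    end = 0;   /* r1, -(r2) */
    got = sscanf(s, "r%u , -(r%u)%n", &o->dst, &o->src, &end);
    if (full_match(s, got, 2, end))
        return 3;

    return -1;
}
```

The end variable is reset before every attempt because sscanf() leaves it untouched when matching stops before the %n directive.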

If chk_pattern() manages to find a match, the matching data may be further checked for validity on a per-context basis. Finally, the full instruction byte codes are put together and returned to the assemble() loop. This is where the object code would be built. Currently, the only thing that happens is that the assembled instruction is dumped to stdout in a debug build. [2]
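The debug dump could look something like the sketch below. The column widths (16 for the hex bytes, 8 for the mnemonic) are read off the test3.asm listing above. The function name format_listing and its signature are made up and are not taken from the as85 source.

```c
#include <stddef.h>
#include <stdio.h>

/* Format one listing line, e.g. "28 8A 94        mov     r1, 0x94(r2)".
 * Returns what snprintf() returns, or -1 if nbytes exceeds 4
 * (the longest encoding shown in the listing). */
int format_listing(char *dst, size_t dstsz,
                   const unsigned char *bytes, size_t nbytes,
                   const char *mnemonic, const char *operands)
{
    char hex[16] = "";   /* "XX XX XX XX" needs 12 bytes including NUL */
    size_t i, pos = 0;

    if (nbytes > 4)
        return -1;
    for (i = 0; i < nbytes; i++)
        pos += (size_t)sprintf(hex + pos, i ? " %02X" : "%02X",
                               (unsigned)bytes[i]);

    /* Left-justify the hex bytes and the mnemonic in fixed-width columns. */
    return snprintf(dst, dstsz, "%-16s%-8s%s", hex, mnemonic, operands);
}
```

The resulting string can then be passed to puts(), with the call wrapped in whatever macro guards the debug build.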

Optimization Concerns

Some optimization could be done within this lexical analysis process. The first improvement would be replacing the linear string comparison with a binary search tree. The second would be replacing the string comparison itself with a hash comparison. The hash algorithm would have to be small and fast for this to make much of a difference.
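As a sketch of the first idea, the example below uses the standard bsearch() over a sorted array instead of an actual binary search tree. For a fixed instruction set this gives the same logarithmic lookup with less code. The table contents, the id values and the name lookup_mnemonic are illustrative only.

```c
#include <stdlib.h>
#include <string.h>

struct mnemonic {
    const char *name;
    int id;   /* hypothetical index into the function pointer table */
};

/* Must stay sorted by name in strcmp() order for bsearch() to work. */
static const struct mnemonic mnemonics[] = {
    { "adc", 0 }, { "add", 1 }, { "and", 2 }, { "clr", 3 }, { "cmp", 4 },
    { "mov", 5 }, { "nop", 6 }, { "or",  7 }, { "sub", 8 }, { "xor", 9 },
};

/* bsearch() passes the key first and a table element second. */
static int cmp_mnemonic(const void *key, const void *elem)
{
    return strcmp((const char *)key,
                  ((const struct mnemonic *)elem)->name);
}

/* Returns the id for a mnemonic, or -1 if it is not in the table. */
int lookup_mnemonic(const char *name)
{
    const struct mnemonic *m =
        bsearch(name, mnemonics,
                sizeof mnemonics / sizeof mnemonics[0],
                sizeof mnemonics[0], cmp_mnemonic);

    return m ? m->id : -1;
}
```

With roughly a hundred mnemonics this takes about seven string comparisons per lookup, against about fifty on average for a linear scan.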

The next big optimization would be rearranging the chk_pattern() calls within each lexical analyzer to check the most likely patterns first. The best way to choose the order is static analysis of existing sm8521 source code, which is obviously in very short supply. Accurate disassemblies of commercial Game.com games would help here, however.