As85
as85 is a simple assembler for the Sharp sm8521, the microcontroller used in the Tiger Game.com. The Game.com was released in 1997 and had only a few games ever made for it. No homebrew games have been made for it either. as85 is an attempt to build an assembler that will help hackers write homebrew code that runs on Game.com hardware.
Documentation on the Game.com hardware is available at Game.commies.
Download
The source code is available at http://git.kodewerx.org/as85/src/
Current Progress
The current state of as85 is "almost usable, but not quite there yet." A number of bugs exist which need to be fixed before it can be used as a development tool:
- Bug 2: Add support for jump/call/branch instructions
- Bug 3: Output object code
- Bug 4: Write a linker
I've also filed a bug about giving the project a better name [1].
Usage
The program takes one argument: the file name of an sm8521 assembly file. The files in the /examples directory are a good place to start.
- test.asm is an example of what the beginning of a Game.com program might look like; it contains a header and some [random] instructions to give you an idea.
- test2.asm tests the integrity of the parser with complex strings.
- test3.asm lists all possible sm8521 instructions, for verifying that the output binary is correct.
Example Output
The following command:
$ ./as85 ../examples/test3.asm
Produces the following output:
00 01  clr R1
01 01  neg R1
02 01  com R1
03 01  rr R1
04 01  rl R1
05 01  rrc R1
06 01  rlc R1
07 01  srl R1
08 01  inc R1
09 01  dec R1
0A 01  sra R1
0B 01  sll R1
0C 01  da R1
0D 01  swap R1
0E 01  push R1
0F 01  pop R1
10 0A  cmp r1, r2
11 0A  add r1, r2
12 0A  sub r1, r2
13 0A  adc r1, r2
14 0A  sbc r1, r2
15 0A  and r1, r2
16 0A  or r1, r2
17 0A  xor r1, r2
18 02  incw RR2
19 02  decw RR2
1A 08  clr @r1
1A 09  neg @r1
1A 0A  com @r1
1A 0B  rr @r1
1A 0C  rl @r1
1A 0D  rrc @r1
1A 0E  rlc @r1
1A 0F  srl @r1
1B 08  inc @r1
1B 09  dec @r1
1B 0A  sra @r1
1B 0B  sll @r1
1B 0C  da @r1
1B 0D  swap @r1
1B 0E  push @r1
1B 0F  pop @r1
1C 07 24  bclr 0xFF24, #7
1C 0F 94  bclr 0x94(r1), #7
1D 07 24  bset 0xFF24, #7
1D 0F 94  bset 0x94(r1), #7
1E 02  pushw RR2
1F 02  popw RR2
20 0A  cmp r1, @r2
20 4A  cmp r1, (r2)+
20 88 94  cmp r1, @0x94
20 8A 94  cmp r1, 0x94(r2)
20 CA  cmp r1, -(r2)
21 0A  add r1, @r2
21 4A  add r1, (r2)+
21 88 94  add r1, @0x94
21 8A 94  add r1, 0x94(r2)
21 CA  add r1, -(r2)
22 0A  sub r1, @r2
22 4A  sub r1, (r2)+
22 88 94  sub r1, @0x94
22 8A 94  sub r1, 0x94(r2)
22 CA  sub r1, -(r2)
23 0A  adc r1, @r2
23 4A  adc r1, (r2)+
23 88 94  adc r1, @0x94
23 8A 94  adc r1, 0x94(r2)
23 CA  adc r1, -(r2)
24 0A  sbc r1, @r2
24 4A  sbc r1, (r2)+
24 88 94  sbc r1, @0x94
24 8A 94  sbc r1, 0x94(r2)
24 CA  sbc r1, -(r2)
25 0A  and r1, @r2
25 4A  and r1, (r2)+
25 88 94  and r1, @0x94
25 8A 94  and r1, 0x94(r2)
25 CA  and r1, -(r2)
26 0A  or r1, @r2
26 4A  or r1, (r2)+
26 88 94  or r1, @0x94
26 8A 94  or r1, 0x94(r2)
26 CA  or r1, -(r2)
27 0A  xor r1, @r2
27 4A  xor r1, (r2)+
27 88 94  xor r1, @0x94
27 8A 94  xor r1, 0x94(r2)
27 CA  xor r1, -(r2)
28 0A  mov r1, @r2
28 4A  mov r1, (r2)+
28 88 94  mov r1, @0x94
28 8A 94  mov r1, 0x94(r2)
28 CA  mov r1, -(r2)
29 11  mov @r1, r2
29 51  mov (r1)+, r2
29 90 94  mov @0x94, r2
29 91 94  mov 0x94(r1), r2
29 D1  mov -(r1), r2
2C 02  exts RR2
2E 94  mov ps0, #0x94
2F 94 01  btst R1, #0x94
30 09  cmp r1, @rr2
30 49  cmp r1, (rr2)+
30 88 24 94  cmp r1, @0x9424
30 89 24 94  cmp r1, 0x9424(rr2)
30 C9  cmp r1, -(rr2)
31 09  add r1, @rr2
31 49  add r1, (rr2)+
31 88 24 94  add r1, @0x9424
31 89 24 94  add r1, 0x9424(rr2)
31 C9  add r1, -(rr2)
32 09  sub r1, @rr2
32 49  sub r1, (rr2)+
32 88 24 94  sub r1, @0x9424
32 89 24 94  sub r1, 0x9424(rr2)
32 C9  sub r1, -(rr2)
33 09  adc r1, @rr2
33 49  adc r1, (rr2)+
33 88 24 94  adc r1, @0x9424
33 89 24 94  adc r1, 0x9424(rr2)
33 C9  adc r1, -(rr2)
34 09  sbc r1, @rr2
34 49  sbc r1, (rr2)+
34 88 24 94  sbc r1, @0x9424
34 89 24 94  sbc r1, 0x9424(rr2)
34 C9  sbc r1, -(rr2)
35 09  and r1, @rr2
35 49  and r1, (rr2)+
35 88 24 94  and r1, @0x9424
35 89 24 94  and r1, 0x9424(rr2)
35 C9  and r1, -(rr2)
36 09  or r1, @rr2
36 49  or r1, (rr2)+
36 88 24 94  or r1, @0x9424
36 89 24 94  or r1, 0x9424(rr2)
36 C9  or r1, -(rr2)
37 09  xor r1, @rr2
37 49  xor r1, (rr2)+
37 88 24 94  xor r1, @0x9424
37 89 24 94  xor r1, 0x9424(rr2)
37 C9  xor r1, -(rr2)
38 09  mov r1, @rr2
38 49  mov r1, (rr2)+
38 88 24 94  mov r1, @0x9424
38 89 24 94  mov r1, 0x9424(rr2)
38 C9  mov r1, -(rr2)
39 21  mov @rr2, r4
39 61  mov (rr2)+, r4
39 A0 24 94  mov @0x9424, r4
39 A1 24 94  mov 0x9424(rr2), r4
39 E1  mov -(rr2), r4
3A 14  movw rr2, @rr4
3A 54  movw rr2, (rr4)+
3A 90 24 94  movw rr2, @0x9424
3A 94 24 94  movw rr2, 0x9424(rr4)
3A D4  movw rr2, -(rr4)
3B 22  movw @rr2, rr4
3B 62  movw (rr2)+, rr4
3B A0 24 94  movw @0x9424, rr4
3B A2 24 94  movw 0x9424(rr2), rr4
3B E2  movw -(rr2), rr4
3C 14  movw rr2, rr4
40 02 01  cmp R1, R2
41 02 01  add R1, R2
42 02 01  sub R1, R2
43 02 01  adc R1, R2
44 02 01  sbc R1, R2
45 02 01  and R1, R2
46 02 01  or R1, R2
47 02 01  xor R1, R2
48 02 01  mov R1, R2
4A 04 02  movw RR2, RR4
4B 02 24 94  movw RR2, #0x9424
4C 04 02  mult RR2, R4
4D 94 02  mult RR2, #0x94
4E 07 01  bmov bf, R1, #7
4E 47 01  bmov R1, #7, bf
4F 07 01  bcmp bf, R1, #7
4F 47 01  band bf, R1, #7
4F 87 01  bor bf, R1, #7
4F C7 01  bxor bf, R1, #7
50 94 01  cmp R1, #0x94
51 94 01  add R1, #0x94
52 94 01  sub R1, #0x94
53 94 01  adc R1, #0x94
54 94 01  sbc R1, #0x94
55 94 01  and R1, #0x94
56 94 01  or R1, #0x94
57 94 01  xor R1, #0x94
58 94 01  mov R1, #0x94
5C 04 02  div RR2, RR4
5D 94 02  div RR2, #0x94
5E 01 94 02  movm R1, #0x94, R2
5F 01 94 24  movm R1, #0x94, #0x24
60 04 02  cmpw RR2, RR4
61 04 02  addw RR2, RR4
62 04 02  subw RR2, RR4
63 04 02  adcw RR2, RR4
64 04 02  sbcw RR2, RR4
65 04 02  andw RR2, RR4
66 04 02  orw RR2, RR4
67 04 02  xorw RR2, RR4
68 02 24 94  cmpw RR2, #0x9424
69 02 24 94  addw RR2, #0x9424
6A 02 24 94  subw RR2, #0x9424
6B 02 24 94  adcw RR2, #0x9424
6C 02 24 94  sbcw RR2, #0x9424
6D 02 24 94  andw RR2, #0x9424
6E 02 24 94  orw RR2, #0x9424
6F 02 24 94  xorw RR2, #0x9424
78 24 94  movw rr0, #0x9424
79 24 94  movw rr8, #0x9424
7A 24 94  movw rr2, #0x9424
7B 24 94  movw rr10, #0x9424
7C 24 94  movw rr4, #0x9424
7D 24 94  movw rr12, #0x9424
7E 24 94  movw rr6, #0x9424
7F 24 94  movw rr14, #0x9424
A0 01  bclr R1, #0
A1 01  bclr R1, #1
A2 01  bclr R1, #2
A3 01  bclr R1, #3
A4 01  bclr R1, #4
A5 01  bclr R1, #5
A6 01  bclr R1, #6
A7 01  bclr R1, #7
A8 01  bset R1, #0
A9 01  bset R1, #1
AA 01  bset R1, #2
AB 01  bset R1, #3
AC 01  bset R1, #4
AD 01  bset R1, #5
AE 01  bset R1, #6
AF 01  bset R1, #7
B0 01  mov r0, R1
B1 01  mov r1, R1
B2 01  mov r2, R1
B3 01  mov r3, R1
B4 01  mov r4, R1
B5 01  mov r5, R1
B6 01  mov r6, R1
B7 01  mov r7, R1
B8 01  mov R1, r0
B9 01  mov R1, r1
BA 01  mov R1, r2
BB 01  mov R1, r3
BC 01  mov R1, r4
BD 01  mov R1, r5
BE 01  mov R1, r6
BF 01  mov R1, r7
C0 94  mov r0, #0x94
C1 94  mov r1, #0x94
C2 94  mov r2, #0x94
C3 94  mov r3, #0x94
C4 94  mov r4, #0x94
C5 94  mov r5, #0x94
C6 94  mov r6, #0x94
C7 94  mov r7, #0x94
C8 94  mov ie0, #0x94
C9 94  mov ie1, #0x94
CA 94  mov ir0, #0x94
CB 94  mov ir1, #0x94
CC 94  mov p0, #0x94
CD 94  mov p1, #0x94
CE 94  mov p2, #0x94
CF 94  mov p3, #0x94
F0  stop
F1  halt
F8  ret
F9  iret
FA  clrc
FB  comc
FC  setc
FD  ei
FE  di
FF  nop
assemble() returned 0: OK
Clean up...
Note: The output binary has not been verified for accuracy.
Development
As85 is a fairly simple assembler. It doesn't use any sort of 'compiler-compiler' for lexical analysis. In fact, its lexical analysis is very specific to the sm8521 MCU.
The main loop (the assemble() function, defined in asm.c) parses the input text inline. (This should probably be moved out to a new source file.) Each line is split into two pieces: op[0], containing the instruction, and op[1], containing its operands. op[0] is then compared as a string against every supported instruction. If a match is found, the operands string is passed to a dynamically chosen function (from a function pointer table, indexed by the matched instruction). This function performs the lexical analysis required to decide which form of the instruction we are trying to assemble.
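The split-and-dispatch step described above can be sketched roughly as follows. This is not the code from asm.c: the names inst_handler, inst_table, assemble_line, do_clr and do_nop are hypothetical, only two instructions are covered, and the opcode bytes are copied from the (unverified) test3.asm listing.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical handler type: receives the operands string, writes the
 * encoded bytes to out, and returns the byte count (or -1 on error). */
typedef int (*inst_handler)(const char *operands, unsigned char *out);

/* Handles only the register-file form "clr Rn" (00 nn in the listing). */
static int do_clr(const char *operands, unsigned char *out) {
    unsigned reg;
    if (sscanf(operands, "R%u", &reg) != 1 || reg > 0xFF)
        return -1;
    out[0] = 0x00;
    out[1] = (unsigned char)reg;
    return 2;
}

/* "nop" takes no operands and assembles to FF in the listing. */
static int do_nop(const char *operands, unsigned char *out) {
    (void)operands;
    out[0] = 0xFF;
    return 1;
}

/* Hypothetical table pairing each mnemonic with its handler. */
struct inst_entry { const char *name; inst_handler fn; };

static const struct inst_entry inst_table[] = {
    { "clr", do_clr },
    { "nop", do_nop },
};

/* Split a line into op[0] (mnemonic) and op[1] (operands), then dispatch.
 * Returns the number of bytes written, 0 for a blank line, -1 on error. */
int assemble_line(const char *line, unsigned char *out) {
    char op[2][64] = { "", "" };
    size_t i;

    /* %63s reads the mnemonic; %63[^\n] reads the rest of the line. */
    if (sscanf(line, "%63s %63[^\n]", op[0], op[1]) < 1)
        return 0;

    /* Linear string comparison over all supported instructions. */
    for (i = 0; i < sizeof inst_table / sizeof inst_table[0]; i++) {
        if (strcmp(op[0], inst_table[i].name) == 0)
            return inst_table[i].fn(op[1], out);
    }
    return -1; /* unknown mnemonic */
}
```

Calling assemble_line("clr R1", buf) fills buf with the two bytes shown for clr R1 in the listing; an unknown mnemonic returns -1.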
Since the sm8521 is a CISC machine, its instruction set contains a number of different ways to assemble the same instruction mnemonic. For example, several different addressing modes for the mov instruction are shown above in the test3.asm output. The lexical analysis is the voodoo which picks the proper addressing mode and byte codes by analyzing the operands.
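As an illustration of what the analysis has to produce, the listing above suggests that for opcodes 0x20 through 0x29 the second byte packs a two-bit addressing mode together with two three-bit register numbers. The sketch below is inferred only from that listing (which is itself unverified), not from a datasheet; the names addr_mode and pack_operand are hypothetical.

```c
#include <stdint.h>

/* Mode values inferred from the test3.asm listing, not from a datasheet. */
enum addr_mode {
    MODE_INDIRECT = 0, /* @r2      -> 20 0A    */
    MODE_POSTINC  = 1, /* (r2)+    -> 20 4A    */
    MODE_INDEXED  = 2, /* 0x94(r2) -> 20 8A 94 */
    MODE_PREDEC   = 3  /* -(r2)    -> 20 CA    */
};

/* Pack the mode into bits 7-6, the data register into bits 5-3, and the
 * pointer register into bits 2-0. In the listing, direct addressing
 * (@0x94) appears as MODE_INDEXED with a pointer field of 0. */
uint8_t pack_operand(enum addr_mode mode, unsigned data_reg, unsigned ptr_reg) {
    return (uint8_t)(((mode & 3u) << 6) | ((data_reg & 7u) << 3) | (ptr_reg & 7u));
}
```

For example, pack_operand(MODE_POSTINC, 1, 2) gives 0x4A, matching the line "20 4A cmp r1, (r2)+", and pack_operand(MODE_INDIRECT, 2, 1) gives 0x11, matching "29 11 mov @r1, r2". The 16-bit pointer forms (opcodes 0x30 and up) do not all follow this layout in the listing, so they are left out here.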
The lexical analyzing functions are defined in inst.c (following the function pointer table mentioned previously). The function handling the current instruction will test the operands string against a series of lexical patterns with the chk_pattern() function (defined in asm.c, although this should probably be moved).
chk_pattern() uses a scanf-like format string rather than the regular expressions more commonly used in lexical analysis. Documentation for the format string can be found in inst.h. With just a few pattern primitives, any of the sm8521's addressing modes can be matched, and the values captured by the primitives are returned in an array. It makes good use of the format scanners defined in scan.c.
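To give a feel for the approach, here is a minimal sketch of a scanf-like pattern matcher. It is not chk_pattern() itself: the function name match_pattern and the two primitives %r and %n are assumptions made for illustration, and the real format syntax is whatever inst.h documents.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical primitives (NOT the real syntax from inst.h):
 *   %r   general register r0..r7; stores the register number
 *   %n   integer literal as parsed by strtol() with base 0
 *        (decimal, 0x hex, leading-0 octal); stores the value
 *   ' '  optional whitespace
 *   any other character must match literally
 * Returns the number of values stored in vals, or -1 on mismatch. */
int match_pattern(const char *pat, const char *s, long *vals, int max_vals) {
    int n = 0;

    for (; *pat; pat++) {
        if (*pat == ' ') {
            while (isspace((unsigned char)*s))
                s++;
        } else if (*pat == '%' && pat[1] == 'r') {
            pat++;
            if (*s != 'r' || s[1] < '0' || s[1] > '7' || n >= max_vals)
                return -1;
            if (isdigit((unsigned char)s[2]))
                return -1; /* e.g. "r10" is not a register in r0..r7 */
            vals[n++] = s[1] - '0';
            s += 2;
        } else if (*pat == '%' && pat[1] == 'n') {
            char *end;
            long v = strtol(s, &end, 0);
            pat++;
            if (end == s || n >= max_vals)
                return -1; /* no digits consumed, or output array full */
            vals[n++] = v;
            s = end;
        } else {
            if (*s != *pat)
                return -1;
            s++;
        }
    }

    /* Allow trailing whitespace, but require the whole input to be used. */
    while (isspace((unsigned char)*s))
        s++;
    return *s == '\0' ? n : -1;
}
```

A handler for an instruction such as cmp could then try "%r, @%r", "%r, (%r)+", "%r, %n(%r)" and so on in turn, using the first pattern that returns a non-negative count to select the addressing mode.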
If chk_pattern() finds a match, the matched data may be further checked for validity on a per-context basis. Finally, the full instruction byte codes are put together and returned to the assemble() loop. This is where the object code would be built. Currently, the only thing that happens is that the assembled instruction is dumped to stdout in a debug build. [2]
Optimization Concerns
Some optimization could be done within this lexical analysis process. The first improvement would be replacing the linear string comparison with a binary search tree. A second improvement in the same area would be replacing the string comparison itself with a hash comparison. The hash algorithm would have to be small and fast enough to make a real difference.
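As a rough sketch of the first idea, a sorted array searched with the standard bsearch() can stand in for the binary search tree; it gives the same logarithmic lookup without any tree code. The table contents, the id values, and the names mnemonics, cmp_mnemonic and lookup_mnemonic are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry: the id would index the function pointer table. */
struct mnemonic { const char *name; int id; };

/* Must stay sorted by name in strcmp() order for bsearch() to work. */
static const struct mnemonic mnemonics[] = {
    { "adc", 0 }, { "add", 1 }, { "and", 2 }, { "clr", 3 },
    { "cmp", 4 }, { "mov", 5 }, { "nop", 6 }, { "sub", 7 },
};

/* bsearch() passes the search key first and a table element second. */
static int cmp_mnemonic(const void *key, const void *elem) {
    return strcmp((const char *)key, ((const struct mnemonic *)elem)->name);
}

/* Returns the id for a mnemonic, or -1 if it is not in the table. */
int lookup_mnemonic(const char *name) {
    const struct mnemonic *m = bsearch(name, mnemonics,
                                       sizeof mnemonics / sizeof mnemonics[0],
                                       sizeof mnemonics[0], cmp_mnemonic);
    return m ? m->id : -1;
}
```

With a table of this kind, lookup_mnemonic("mov") needs at most a handful of strcmp() calls instead of one per supported instruction.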
The next big optimization would be rearranging the chk_pattern() calls within each lexical analyzer to check the most likely patterns first. The best way to choose the order is static analysis of existing sm8521 source code, which is obviously in very short supply. Accurate disassemblies of commercial Game.com games would help here, however.