Writing a Goboy: the bugs' awakening

December 15, 2019

Goboy is nearly completion – well, as complete as a Gameboy emulator can ever be. My wife was volunteering in Phnom Penh for a few weeks and I had intended to use the time to implement the audio unit. As things turned out I ended up spending a few weeks chasing down various bugs. There is still some errant flickering sprites that I haven’t had time to debug yet and the audio remains unimplemented. Those two aside, though, Goboy is in much better shape.

In this post I will recount a few bugs I encountered, how they exposed deficiencies in my development process and tips for preventing bugs in emulators.

The repo is here.

Making Mario invincible

The first bug on my list was that the invincibility star in Mario didn’t actually work. It should make Mario flash and be invincible for a short time. This was odd as it had worked before. When the star touched Mario he receives the points bonus, so it was clearly registering but not actually doing any invincibility. I thought this was due to the timer since the invincibility ran on a countdown and the blargg CPU timer test failed too. The timer and interrupt code all looked okay and they’re two of the simpler pieces of the Goboy so fairly quickly I was running out of ideas for where the bug could be.

When this happens I go and read through the documentation. Usually it’ll say something that doesn’t quite fit with how I know the Goboy works and that gives me a clue. I discovered the “Cycle Accurate Game Boy Docs” shockingly late in this process and they’ve been a godsend. So I checked through everything and realised that the timer was way more complicated than I had thought but my implementation resembled the Gameboy Programmer’s Manual specification as well as other emulators I’d seen so I was fairly sure it should at least be vaguely accurate. Looking at the interrupts I checked all the flags and registers and found my error:

--- a/pkg/cpu/interrupts.go
+++ b/pkg/cpu/interrupts.go
@@ -9,7 +9,7 @@ type Interrupt byte
 const (
        VBlank        Interrupt = 0
        LCDCStatus              = 1
-       TimerOverflow           = 3
+       TimerOverflow           = 2
        Input                   = 4
 )

While doing some refactoring I had erroneously assigned TimerOverflow to the serial interrupt’s value (which I’m not bothering to implement), meaning no timer interrupts were firing and so the invincibility countdown didn’t work. I do kind of enjoy it when a few hours of work result in a single line fix!

I think the way to catch this kind of bug is unit testing. If I’m honest I was quite pleased to get my Goboy to a state where it could run test ROMs and my attitude was a little too much “now that I’m a real boy running test ROMs I don’t need to bother with unit tests any more!”. Test ROMs are good at catching incorrect behaviour but they’re pretty poor at indicating where the failure is coming from. I wasn’t sure if my timer was just a bit off or completely not working.

Since the value I had mistakenly used here is in fact valid, albeit for a different interrupt, I don’t see an easy way to catch the error at compile time. The risk of a unit test is that you just end up writing the test based on an incorrect interpretation of the documentation, so the error is replicated in both places. At least in this case a unit test would have caught my regression.

Overall I regret that I mostly only unit tested the CPU instruction implementations. Yes, test ROMs are needed for testing the interaction between subsystems. But even painfully simple unit tests can lock down expected behaviour.

Getting Link going

Once Mario could be invincible again I decided to try running Link’s Awakening. All I got was a blank screen at boot. How could it fail so badly even though Mario worked perfectly? Emulation is a tough master.

My preferred debugging technique is to add a log somewhere useful and see what it says. Logging to stdout, rather than having a separate debug screen, means I can add the same log statement to a known good emulator and diff the output (vim -d became my friend). Doing this, I saw that Goboy was trying to read from different memory addresses than the working emulator. The addresses were in the range of the memory banking controller (MBC) address space – a smoking gun! Further logging showed that Goboy was reading from correct addresses but the MBC code was sending it to an incorrect offset. The fix was similarly simple:

--- a/pkg/cpu/memory.go
+++ b/pkg/cpu/memory.go
        case address >= ROMBank00Limit && address < ROMBankLimit:
                if !m.bankingEnabled {
                        return m.rom[address]
                }
-               offset := uint16(address - ROMBank00limit)
-               return m.rom[offset+uint16(m.cpu.currentROMBank*ROMBankSize)]
+               offset := address - ROMBank00Limit
+               return m.rom[offset+(m.cpu.currentROMBank*ROMBankSize)]

The casting to uint16s turned out to truncate the offset, resulting in an incorrect value, and wasn’t needed anyway.

This error made me realise I had no confidence in the correctness of the MBC code. It’s quite a standalone subsection and so was a perfect target for some unit testing. I found numerous other mistakes that could have caused all kinds of subtle bugs. In doing so I hit against a limitation of the unit testing approach, which was that I did’t necessarily know the expected behaviour. I tried to imagine how the hardware would work and went from there.

For example, the MBC treats the bit pattern in a particular address differently depending on which mode its. Changing mode doesn’t actually change the bit pattern, just how it’s interpreted. Therefore, if I switch from one mode to another and then back again the original value should be there, etc. etc. Simple checks that behaviour is reversible work well. So, to recap: unit test!

With MBC working Zelda loaded but had weird visual errors. This was a juicy bug that took a few hours of investigating. The visual errors obviously pointed to the GPU implementation. Dumping the status of the screen on each cycle showed that it was weirdly switching between modes. The Gameboy GPU is meant to cycle between modes in a defined way and mine was jumping all over the place.

I looked through the mode selection code, which at the time was implemented as a neat little state machine. I couldn’t see any way how the screen could move through the modes in the pattern I was observing. There were no possible transitions so what the hell was going on? Confused, I added logging to the code that updated the status register (which holds the mode) to see what values were being written. That exposed the bug. I was directly writing the value to the register and neglecting the fact that the mode bits were read only. I think the Gameboy Programming Manual didn’t state that they were read only but the Cycle Accurate Docs spelled it out. I needed to mask writes to the register:

+++ b/pkg/cpu/memory.go
@@ -133,9 +133,11 @@ func (m *Memory) set(address uint16, value byte) {
+               } else if address == STATAddress {
+                       readOnlyBits := m.ioram[localAddress] & 7
+                       m.ioram[localAddress] = (value & 0xF8) | readOnlyBits

I went through the Cycle Accurate Docs and wrote more unit tests to check that I was handling the read only bits in every register correctly. I found a few other errors and also learned that unused bits in a register were always 1 and not 0! That could cause a game to fail if it does a comparison on a register value expecting the unused bits to be 1.

An idea I’d like to explore would be to encode the read/write status of a register’s bits in the register variable itself, rather than just treating it as a plain byte alias. Writes would go through a method that handled the masking. The downside is that there would need to be an escape hatch for code that did need to update the read only registers.

This points to the more general problem, however, was a poor separation of interfaces. I recommend making sure that you really understand what kind of access different parts of the Gameboy have to each other.

What helped massively in my case was to move the memory implementation into a separate package. I’d wanted to do this for a while so that I could just specify a small interface in the CPU package and provide a simple backing array for the CPU tests, which would make CPU testing easier. It also clarified in my mind that memory-mapped registers have two interfaces: the “external” one that writes via memory go through – that is, from the perspective of the ROM – and the “internal” one that the CPU uses to directly set all of the bits. The mode bits of the status register are read-only to the ROM but obviously need to be writable for the CPU itself.

This distinction hadn’t been clear in my code and sometimes CPU methods were going through the memory route, sometimes not. Some places applied bit masking in helper methods in the CPU package, some in the memory implementation. Creating a separate memory package enforced this distinction. All of the bit masking should live in the memory package as it’s part of the external, memory-facing interface.

Moving the memory out of the CPU package also broke a few places that were already code smells, such as doing this in the GPU in order to access VRAM directly and avoid the restrictions that the memory.get method imposed on CPU acceses to VRAM:

data := gpu.cpu.memory.vram[address - VRAMOffset]

What would Liskov say?! Since the GPU has unmediated access to VRAM and the CPU does not, VRAM should be owned by the GPU struct, not the CPU struct. I moved the VRAM and sprite RAM to live in the GPU, which the memory package communicates with via an interface. I’m much happier with this approach.

The big bug

After this Zelda ran! Joy! But it crashed just as Link woke up from his dream. Kirby’s Dreamland also froze within a few seconds of starting the first level. This turned out to be a much trickier bug than the others and took a few days to track down.

What happened was that the emulator would land on a RESET instruction that caused it to jump to a memory address that also contained a RESET instruction, causing an infinite loop. Weird! Looking at the ROM code that contained the original RESET, I noticed that there were loads of sequential RESET instructions. Why would that be? I noticed that the opcode for RESET is 0xFF. It seemed the emulator was executing data as code. Since the addresses were in the MBC range that implied another mistake in the MBC code. After double-checking my MBC implementation (all fine) I looked at where my emulator diverged from the known good emulator. The good one followed a conditional jump. Mine didn’t and then executed loads of seemingly nonsensical instructions.

My interrupt timing was also way off and so I suspected that the interrupts were meant to build up a state that the conditional jump relied on. I spent quite some time looking through the GPU’s interrupt handling and found a few bugs but not the one causing the freezing.

Feeling stuck, I decided to see what the stack pointer was doing and found my smoking gun. Something was causing a stack overflow that caused the stack pointer to run past work RAM and down through the whole address range. That included writes to the MBC address range, which caused the MBC to swap ROM banks and send the program counter to all kinds of weird places. That caused the craziness I saw. More logging showed that the emulator was executing a CALL instruction in a loop, which added a stack frame, but without a corresponding RETURN to pop the frame off. So that was causing the stack overflow but what was causing the loop?

Looking more closely at the code around the conditional jump, I noticed that Goboy performed a slightly different loop to the correct emulator. What turned out to be the root issue was quite silly:

--- a/pkg/cpu/cpu.go
+++ b/pkg/cpu/cpu.go
@@ -442,10 +442,10 @@ func (cpu *CPU) Execute(instr in.Instruction) {
        case in.JumpRelativeConditional:
                if cpu.conditionMet(i.Condition) {
-                       cpu.setPC(cpu.GetPC() - 2 + uint16(i.Immediate))
+                       cpu.setPC(cpu.GetPC() + uint16(i.Immediate))
                }
--- a/pkg/decoder/decoder.go
+++ b/pkg/decoder/decoder.go
@@ -198,11 +198,11 @@ func Decode(il Iterator) in.Instruction {
        case op&in.JumpConditionalMask == in.JumpRelativeConditionalPattern:
                // JR cc n. 0b001c c000, n
-               Immediate := int8(il.Next()) + 2
+               Immediate := int8(il.Next())
                Condition := in.GetCondition(op)
                instruction = in.JumpRelativeConditional{Immediate, Condition}

I had misread the Gameboy Programming Manual and thought that I needed to add two to the immediate value in the conditional jump instruction. At some point I had realised that the extra two represented the length of the instruction itself. Since I incremented the program counter each time I called il.Next() this length was already accounted for. I subtracted the superfluous two in the cpu.Execute() method, clearly having forgotten that I’d added them in the decoder in the first place. By itself this was fine, if a little pointless. However, if the Immediate value was right at the limit of a signed byte range the addition would cause it to wrap round. The subtraction didn’t undo that because at that point I was treating Immediate as a uint16. As a result my conditional jump was jumping with an incorrect offset, by chance hitting a CALL instruction and causing the stack overflow. Whew! By the end of this bug hunt I was exhausted and couldn’t look at the code for days.

You can see from these bugs that I’ve been stung a few times by incorrect casting. They are tricky to catch if the entire range of the output data type is potentially valid. Better unit testing of edge cases, rather than relying on the blargg test ROMs, would have helped here.

Another big time saver would be a stack overflow detector. The emulator could maintain a stack of instructions modifying the actual stack and dump that if the stack overflows work RAM. What put me off adding this was not knowing if it’s actually invalid behaviour (from the hardware/emulator perspective) for the stack pointer to move out of work RAM. What if some games intentionally do that?

In retrospect that was a premature worry. At this stage of development it’s far more likely that the stack pointer going out of work RAM is a bug and not intentional. If I were to do another emulator (the SNES is tempting me, though I hear it’s a beast) I would start with a debug mode that ran by default and locked down as much as possible through extensive use of assert statements. That would catch lots of bugs at an earlier point. If a game accidentally triggered the assert that could be either a hint that my understanding of how the Gameboy works isn’t quite correct, or maybe just odd behaviour that would be permitted in the release build. I have seen other people implement debug modes but they seemed to focus more on having another screen dump register data, which seems less useful than a greppable/diffable text dump.

Summary

What strikes me on looking through this is that all of these bugs were tiny. Nothing more than a one line fix. It’s very, very tempting when writing an emulator to focus on getting something on the screen and start running ROMs. That comes at the cost of bugs manifesting themselves far from their origin.

So it seems that this is just another cautionary tale for why we should unit test and separate out responsibilities. But though that sounds easy and sensible on paper it misses the important point that in many cases I simply didn’t know what correct behaviour to expect or exactly what was a separate sub-unit. There is plenty of documentation out there but unless you’ve written something similar before it’s a total information overload. In the absence of full information perhaps the best you can do is to codify your assumptions in some way – unit tests, assertions, types – and at least then you get quick feedback when one of your assumptions turns out to be incorrect.