Writing a Goboy: LOAD complete

July 10, 2019

The first tranch of instructions to implement are the various load and store operations. These are interesting to start off with because they obviously require some kind of memory model. The instructions themselves are quite variable. Some include immediate values, entailing variable-length instructions, some do complex addressing and so on. I spent quite a bit of time implementing these and refactoring what I had because I wanted to establish a solid pattern. Implementing the memory instructions was probably a bit more involved than, say, the arithmetic ones but I thought that if I could get the memory instructions right the others would follow easily.

The code is at this point in the repo.

Load instruction types

The Gameboy has a range of load instructions covering operations like:

  • move register to register
  • move immediate to register
  • load/store value using address provided as immediate
  • load/store using immediate address and offset in register

They range from one to three bytes in length. To handle this I switched from directly passing opcodes to Run and started loading a series of instructions into memory and then starting a proper execution loop:

func (cpu *CPU) Run() {
	for opcode := cpu.fetchAndIncrement(); opcode != 0; opcode = cpu.fetchAndIncrement() {
		instr := Decode(opcode)
		switch i := instr.(type) {
		case Move:
			cpu.Set(i.dest, cpu.Get(i.source))
		case MoveImmediate:
			i.immediate = cpu.fetchAndIncrement()
			cpu.Set(i.dest, i.immediate)
        // ...

func (cpu *CPU) LoadProgram(program []byte) {
	cpu.memory.Load(ProgramStartAddress, program)

Separately, the Instruction interface is now specified by an Opcode method that returns the instruction’s opcode as a byte array:

type Instruction interface {
	Opcode() []byte

type Move struct {
	source, dest Register

func (i Move) Opcode() []byte {
	return []byte{byte(MovePattern | i.source | i.dest<<DestRegisterShift)}

This makes it so much easier to write tests because I can write the test instructions using the Instruction structs:

func TestSetAndGetRegister(t *testing.T) {
	cpu := Init()

	var expected byte = 3
		MoveImmediate{dest: A, immediate: expected},
		Move{source: A, dest: B},
		Move{source: B, dest: C},
		Move{source: C, dest: D},
		Move{source: D, dest: E},

	if regValue := cpu.Get(E); regValue != 3 {
		t.Errorf("Expected %X, got %X", expected, regValue)

I found that a lot of Gameboy emulators I’ve seen online have quite a lot of hex values interspersed through the code and it’s often difficult to know what they’re meant to represent. Using the structs as an intermediate format helps hugely with legibility. The majority of the bitmasking and matching is kept in the decoder and the execution loop only deals with the Instruction structs.

Memory model

I have deliberately started with a very simple memory model:

type Memory []byte

const ProgramStartAddress = 0x150
const StackStartAddress = 0xFF80

func (m Memory) Load(start int, data []byte) {
	for i := 0; i < len(data); i++ {
		m[start+i] = data[i]

func (m Memory) Set(address uint16, value byte) byte {
	m[address] = value
	return value

func (m Memory) Get(address uint16) byte {
	return m[address]

func InitMemory() Memory {
	return make(Memory, 0xFFFF)

This will fail horribly when I try to run anything real because according to the Gameboy programmer’s manual all kinds of memory regions should be ready-only or map to various interfaces. None of the instructions I’ve implemented yet touch on that so I am waiting for it to become a problem before implementing the correct protections. For now the memory is just an open array of bytes but every access is behind a function so all being well I can just add the checks in there as the need arises.

Decoding instructions

One of the trickiest things has been decoding the instructions in an elegant way. One extreme would be just have a separate case for every individual opcode bit pattern. This is verbose and ignores that many instructions vary only by the registers used. On the other hand it’s tempting to try and be really clever and find all of the patterns and subpatterns in the encoding. For example, often the loads and stores come in pairs that only differ by a single bit. At first I was tempted to match both in the same case statement and even try reusing the same Instruction struct, just with the source and destination fields reversed. For a few this worked but the decoding often ended up quite fiddly and then the execution of the Instruction would need to change depending on the type of source etc. Where I’ve ended up is to try and do things the obvious way and just deal with having a few similar instructions. I feel fairly confident that I’ve got a good balance because there are no examples of different Instructions having identical execution steps.

One big help I found online was to treat instructions using the HL register pair as a pointer as operating on a register M with its own identifier. This fits neatly into the register definitions:

const (
	A Register = 0x7
	B          = 0x0
	C          = 0x1
	D          = 0x2
	E          = 0x3
	H          = 0x4
	L          = 0x5
	M          = 0x6 // memory reference through H:L

The programmer’s manual had treated such instructions as completely different to the ones operating on the actual registers and so I had things like Move, MoveHL and so on. Treating M as just another register meant I could get rid of the HL-specialised instruction.

Going instruction by instruction I had come up with quite inconsistent ways of getting and setting values. Sometimes I wanted to treat a memory pair as a pointer, sometimes I wanted the actual value and so on. Once all the instructions were done I realised I could have a clean, consistent interface if created a separate type for register pairs since only they were used as pointers:

func (cpu *CPU) Get(r Register) byte
func (cpu *CPU) GetMem(r RegisterPair) byte
func (cpu *CPU) GetPair(r RegisterPair) (byte, byte)

Each function just switches on the Register/RegisterPair and retrieves the corresponding values. The only inconsistency is that Get will read from memory if passed M, but as explained above I feel that’s worth it for the simplicity gains elsewhere.

Clock cycles

One thing that held me up a bit was emulating clock cycles correctly. The programmer’s manual specifies how many clock cycles each instruction takes. Instructions dealing only with registers use a single cycle, immediates and memory accesses require two. An instruction that reads an immediate value and writes it to memory takes three.

The naive approach is to just increment by the requisite amount after executing the instruction. This felt a little inaccurate to me. Perhaps it could lead to timing problems with interrupts that happen in the middle of a three-cycle instruction? This might be overthinking it but I preferred the idea of incrementing the counter at each step of the instruction execution. I would like the execution of the instruction to naturally use the correct number of cycles rather than having to hardcode the values for each instruction.

I spent a few hours experimenting with breaking each instruction down into micro-ops. The problem with storing them in an Execute() method on the Instruction struct is that each micro-op needs access to the cpu struct in order to read and write values. I want to keep decoding as separate as possible from execution and will probably split them into separate packages soon. I experimented with things like using closures to carry around the parameters known at decode type (destination register etc) and then return a function that took cpu as an argument and got the runtime parameters (value in source register etc). It didn’t look very good and it seemed to me that the CPU should have knowledge of how to execute an instruction, not the instruction itself. An alternative would be to pass a series of micro-ops to the Run method but then you lose information about what instruction generated that set of micro-ops.

In the end I decided that I was massively overthinking everything. My solution (for now at least) is to increment the cycle counter when fetching the next byte opcode and when performing memory operations. For the majority of the instructions this means that they take the correct number of cycles. For some I’ve had to manually add another increment to get the specified amount. This is annoying but /seems/ to be because for some instructions the clock cycles don’t match what their “micro-ops” take. Perhaps I’ll discover a way of avoiding these manual bumps.

After the memory instructions come the arithmetic instructions, which I’ll cover next.