Hello and welcome to Part 11 of my Beginning Logic Design series. In this episode, I will continue implementing the CPU I planned and started in the previous post.

The first CPU operations I’d like to have working are my LOAD and STORE type instructions, as these provide the basic reading and writing operations to interact with my system bus. This design will be neither very efficient nor especially clever, but it will work!

The LOAD Instruction

I want each of my registers to have the same LOAD capabilities, in contrast to the 6502 instruction set, which has 8 types of LOAD instructions for A and 5 each for X and Y.

I have 5 types of load instruction in mind:

Load register with next byte in program code (Immediate Load)
Load register using next two bytes of program code as a memory address (Memory Load)
Load register using next byte as the upper 8 bits of a memory address, and the A register as the lower 8 bits (A indexed load)
Load register using next byte as the upper 8 bits of a memory address, and the B register as the lower 8 bits (B indexed load)
Load register using next byte as the upper 8 bits of a memory address, and the C register as the lower 8 bits (C indexed load)

With my 3 registers and these 5 different types of load instructions, this consumes 15 of the 16 possible LOAD instructions.

0 - Immediate Load A
1 - Immediate Load B
2 - Immediate Load C
3 - Memory Load A
4 - Memory Load B
5 - Memory Load C
6 - A Index Load A
7 - A Index Load B
8 - A Index Load C
9 - B Index Load A
a - B Index Load B
b - B Index Load C
c - C Index Load A
d - C Index Load B
e - C Index Load C
f - undefined
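
Because the table is laid out in groups of three, the low nibble can be split arithmetically into an addressing mode and a destination register. The sketch below is one possible way to express that in SystemVerilog; the package, type and function names are hypothetical and not part of the CPU in this series.

```systemverilog
// Hypothetical decode helpers for the LOAD table above (names are assumptions).
package load_decode_pkg;
  typedef enum logic [1:0] {REG_A, REG_B, REG_C} reg_sel_t;
  typedef enum logic [2:0] {IMMEDIATE, MEMORY, INDEX_A, INDEX_B, INDEX_C, UNDEFINED} mode_t;

  // Destination register: A, B, C repeat within each group of three.
  // The result is meaningless for the undefined nibble f.
  function automatic reg_sel_t dest_reg(input logic [3:0] nibble);
    case (nibble % 3)
      0:       return REG_A;
      1:       return REG_B;
      default: return REG_C;
    endcase
  endfunction

  // Addressing mode: each group of three nibbles shares one mode.
  function automatic mode_t load_mode(input logic [3:0] nibble);
    if (nibble == 4'hf) return UNDEFINED;
    case (nibble / 3)
      0:       return IMMEDIATE;
      1:       return MEMORY;
      2:       return INDEX_A;
      3:       return INDEX_B;
      default: return INDEX_C;
    endcase
  endfunction
endpackage

// Small testbench that prints the decoded table and spot-checks two rows.
module load_decode_tb;
  import load_decode_pkg::*;
  mode_t    m;
  reg_sel_t r;

  initial begin
    for (int n = 0; n < 16; n++) begin
      m = load_mode(n[3:0]);
      r = dest_reg(n[3:0]);
      $display("%h: mode=%s dest=%s", n[3:0], m.name(), r.name());
    end
    assert (load_mode(4'h7) == INDEX_A && dest_reg(4'h7) == REG_B);
    assert (load_mode(4'h5) == MEMORY  && dest_reg(4'h5) == REG_C);
  end
endmodule
```

A decode like this could replace one case branch per opcode with one branch per addressing mode, at the cost of a little extra combinational logic.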

Immediate Load

With a rough plan, I’m ready to start implementing! My first goal is to just get the Immediate Load A instruction to work. I’ll write a small program that should load A with 00 then load it with 42.

c0 00
c0 42

Now I’ll start building out what will end up being a huge tree of case statements implementing my various operations. This isn’t the most elegant way to organize the code, but it’s simple and it will work for a start.

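PERFORM: begin
  case (op_type)
    LOAD: begin
      case (instruction[3:0])
        // Immediate load A
        0: begin
          if (!read) begin
            // First cycle: request the byte that follows the opcode
            read <= 1;
            address_bus <= program_counter + 1;
          end else begin
            // Second cycle: latch the byte into A and step past both bytes
            read <= 0;
            a <= data_bus;
            program_counter += 2;
            state <= FETCH;
          end
        end
      endcase
    end
  endcase
end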

Similar to the fetch state, this will be a two-cycle operation. First the CPU starts a memory read request for the next byte in program code. On the next cycle the result is stored into the A register, and the program_counter is advanced to the address just past the opcode and its parameter.

Testing this bit of code in simulation verifies it works as intended!

From here we can use our copy-pasta skills to do the same for the immediate load operations for the B and C registers.

LOAD: begin
  case (instruction[3:0])
    0: begin
      if (!read) begin
        read <= 1;
        address_bus <= program_counter + 1;
      end else begin
        read <= 0;
        a <= data_bus;
        program_counter += 2;
        state <= FETCH;
      end
    end
    1: begin
      if (!read) begin
        read <= 1;
        address_bus <= program_counter + 1;
      end else begin
        read <= 0;
        b <= data_bus;
        program_counter += 2;
        state <= FETCH;
      end
    end
    2: begin
      if (!read) begin
        read <= 1;
        address_bus <= program_counter + 1;
      end else begin
        read <= 0;
        c <= data_bus;
        program_counter += 2;
        state <= FETCH;
      end
    end
  endcase
end

I’ll write a new program to test this out:

c0 aa
c1 bb
c2 cc

Stepping through these opcodes, I should end up with the A register set to aa, B to bb and C to cc.

Woohoo! These load commands work and were not too difficult to implement. At this point I’m feeling pretty excited about my first CPU design.

Memory Load

For my next trick, I will implement my memory load operations. These will fetch a memory address after the current instruction and set the register to the number at that location. This operation is going to take more than two CPU cycles. With this in mind, I’m going to add a new internal variable logic [1:0] cycle; to track each CPU cycle. In my FETCH state, I will set cycle to 0 as I transition to the PERFORM state so that all instructions can use this same variable. After the main case statement within PERFORM, I’ll add cycle++; to increment cycle on every clock cycle.
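
To make the counter mechanics concrete, here is a minimal stand-alone sketch of just the FETCH/PERFORM handshake. It is a simplification under stated assumptions: FETCH is collapsed to a single cycle, the instruction length is a hypothetical constant LAST_CYCLE, and the counter uses a non-blocking assignment rather than cycle++.

```systemverilog
// Simulation-only sketch of the cycle counter; the module name and
// LAST_CYCLE are assumptions made for this example.
module cycle_counter_sketch;
  typedef enum logic {FETCH, PERFORM} state_t;

  // Hypothetical instruction that needs four PERFORM cycles (0 through 3).
  localparam logic [1:0] LAST_CYCLE = 2'd3;

  logic       clk   = 0;
  state_t     state = FETCH;
  logic [1:0] cycle = 0;

  always #5 clk = ~clk;

  always_ff @(posedge clk) begin
    case (state)
      FETCH: begin
        cycle <= 0;          // every instruction starts counting from 0
        state <= PERFORM;
      end
      PERFORM: begin
        if (cycle == LAST_CYCLE)
          state <= FETCH;    // last step of the instruction
        cycle <= cycle + 1;  // advance once per clock while in PERFORM
      end
    endcase
  end

  // Print the state and counter between clock edges.
  initial begin
    repeat (10) begin
      @(negedge clk);
      $display("state=%s cycle=%0d", state.name(), cycle);
    end
    $finish;
  end
endmodule
```

A two-bit counter allows up to four PERFORM cycles per instruction, which is enough for the longest operation described in this post.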

Next, before I implement my memory load operations, I’ll modify the immediate load implementation to follow this model for consistency.

LOAD: begin
  case (instruction[3:0])
    0: begin
      case (cycle)
        0: begin
          read <= 1;
          address_bus <= program_counter + 1;
        end
        1: begin
          read <= 0;
          a <= data_bus;
          program_counter += 2;
          state <= FETCH;
        end
      endcase
    end
  ...

Now for the memory load! It will start off identical to the immediate load by reading the next byte in code.

// Memory load A
3: begin
  case (cycle)
    0: begin
      read <= 1;
      address_bus <= program_counter + 1;
    end
  endcase
end

On the next cycle I’ll have the most significant address byte returned via the data bus and I’ll need another register to store it. I’ll add logic [7:0] x; near my other CPU internal registers and request the next byte.

1: begin
  x <= data_bus;
  address_bus <= program_counter + 2;
end

On the 3rd cycle, I’ll have the lower address byte and can concatenate it with the x register to read that memory address.

2: begin
  address_bus <= {x,data_bus};
end

Finally, on the last cycle of the operation, I will have the value from the specified memory location on the data_bus. I can store that value, increment the program_counter by the total length of the instruction, clear the read signal and transition back into FETCH.

3: begin
  program_counter += 3;
  read <= 0;
  a <= data_bus;
  state <= FETCH;
end

Now to test it! I’ll extend my previous program to include this new operation. It’ll load the first byte of the program into A.

Alright! It does successfully pull the memory address and uses it to load the value at that address into the register. With some more copy paste I can replicate this for the B and C registers.
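
For readers who want to try the four cycles end to end, the sketch below assembles them into one self-contained simulation. Only the opcodes (c0, c3) and the per-cycle steps come from this post. The FETCH logic, the tiny 16-byte memory with a combinational read port, and the module name are assumptions made so the example can run on its own, and it uses non-blocking assignments throughout instead of += and ++.

```systemverilog
// Simulation-only sketch. Opcodes c0 and c3 follow the post; the FETCH logic,
// the 16-byte memory and the module name are assumptions made for this example.
module memory_load_sketch;
  typedef enum logic {FETCH, PERFORM} state_t;
  localparam logic [3:0] LOAD = 4'hc;

  logic clk = 0;
  always #5 clk = ~clk;

  // Hypothetical memory with a combinational read port: data_bus is valid
  // on the clock edge after address_bus and read are set.
  logic [7:0]  mem [0:15];
  logic [15:0] address_bus = 0;
  logic        read = 0;
  wire  [7:0]  data_bus = read ? mem[address_bus[3:0]] : 8'h00;

  state_t      state = FETCH;
  logic [15:0] program_counter = 0;
  logic [7:0]  instruction = 0, a = 0, x = 0;
  logic [1:0]  cycle = 0;
  wire  [3:0]  op_type = instruction[7:4];

  always_ff @(posedge clk) begin
    case (state)
      FETCH: begin
        if (!read) begin
          read <= 1;
          address_bus <= program_counter;
        end else begin
          read <= 0;
          instruction <= data_bus;
          cycle <= 0;               // every instruction starts at cycle 0
          state <= PERFORM;
        end
      end
      PERFORM: begin
        case (op_type)
          LOAD: begin
            case (instruction[3:0])
              0: begin // Immediate load A
                case (cycle)
                  0: begin read <= 1; address_bus <= program_counter + 1; end
                  1: begin
                    read <= 0;
                    a <= data_bus;
                    program_counter <= program_counter + 2;
                    state <= FETCH;
                  end
                  default: ;
                endcase
              end
              3: begin // Memory load A
                case (cycle)
                  0: begin read <= 1; address_bus <= program_counter + 1; end
                  1: begin x <= data_bus; address_bus <= program_counter + 2; end
                  2: address_bus <= {x, data_bus};
                  3: begin
                    read <= 0;
                    a <= data_bus;
                    program_counter <= program_counter + 3;
                    state <= FETCH;
                  end
                endcase
              end
              default: ;
            endcase
          end
          default: ; // other opcodes are ignored in this sketch
        endcase
        cycle <= cycle + 1;
      end
    endcase
  end

  initial begin
    foreach (mem[i]) mem[i] = 8'h00;
    mem[0] = 8'hc0; mem[1] = 8'h42;                   // c0 42
    mem[2] = 8'hc3; mem[3] = 8'h00; mem[4] = 8'h00;   // c3 00 00
    #200;
    $display("a=%h program_counter=%h", a, program_counter);
    assert (a == 8'hc0 && program_counter == 16'h0005)
      else $error("memory load did not behave as expected");
    $finish;
  end
endmodule
```

The test program first loads 42 into A, then uses c3 00 00 to overwrite A with the byte at address 0000, which is the program's own first opcode.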

Offset Memory Load

With the basic memory load operation figured out, the offset memory load is a small modification. I only need to read the most significant address byte then I can concatenate that with the appropriate register to read the desired offset address.

// A offset load A
6: begin
  case (cycle)
    0: begin
      read <= 1;
      address_bus <= program_counter + 1;
    end
    1: begin
      address_bus <= {data_bus, a};
    end
    2: begin
      program_counter += 2;
      read <= 0;
      a <= data_bus;
      state <= FETCH;
    end
  endcase
end

As before I can duplicate this for the various permutations of the load command. I validated this in the simulator as well and it looks to work just as intended.
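
The only new piece in the offset variants is the address concatenation, so it may help to see it in isolation. This is a small sketch with a hypothetical function name, assuming 8-bit registers and a 16-bit address bus as in the code above.

```systemverilog
// Hypothetical helper showing how the offset address is formed.
module indexed_address_sketch;
  // The operand byte becomes the high byte and the index register the low byte.
  function automatic logic [15:0] indexed_address(
      input logic [7:0] high_byte,
      input logic [7:0] index_reg);
    return {high_byte, index_reg};
  endfunction

  initial begin
    // Operand 80 with an index register holding 02 addresses location 8002.
    $display("%h", indexed_address(8'h80, 8'h02));
    assert (indexed_address(8'h80, 8'h02) == 16'h8002);
  end
endmodule
```

Since the index register supplies the whole low byte, each offset instruction can reach any of the 256 bytes in the page selected by its operand.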

Store Operations

The STORE operations are nearly identical to the load operations, though there are no Immediate Store instructions. Because the operations are so similar, I will even use the same numbers for the lower 4 operation bits to indicate the types of operation.

0 - undefined
1 - undefined
2 - undefined
3 - Memory Store A
4 - Memory Store B
5 - Memory Store C
6 - A Index Store A
7 - A Index Store B
8 - A Index Store C
9 - B Index Store A
a - B Index Store B
b - B Index Store C
c - C Index Store A
d - C Index Store B
e - C Index Store C
f - undefined

I’ll first implement the Memory Store A operation. It starts off pretty similar to the load, as it needs to begin by reading the memory address from the program code.

0: begin
  read <= 1;
  address_bus <= program_counter + 1;
end
1: begin
  x <= data_bus;
  address_bus <= program_counter + 2;
end

On the next cycle, I’ll have the full address, so I can stop reading and start writing A out through write_data. On the last cycle I’ll increment the program counter and return to FETCH to grab the next bit of code.

2: begin
  address_bus <= {x,data_bus};
  read <= 0;
  write <= 1;
  write_data <= a;
end
3: begin
  program_counter += 3;
  write <= 0;
  state <= FETCH;
end

Easy enough! I’ll extend my last program to end with an operation to write A to the first byte of my RAM.

c0 02
c6 80
d3 00 00

Via simulation I can confirm it’s stashing the A register into the first byte of RAM.

As before we can use this as the basis for the memory store calls for the B and C registers.
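
The store sequence can also be exercised in a self-contained simulation. As a sketch under assumptions: the opcodes (c0, d3) and the four store cycles follow this post, while the FETCH logic, the 16-byte memory with a clocked write port, the test program and the module name are hypothetical. For brevity it matches the full opcode byte instead of nesting op_type and the low nibble, and it uses non-blocking assignments throughout.

```systemverilog
// Simulation-only sketch. Opcodes c0 and d3 follow the post; the FETCH logic,
// the 16-byte memory and the module name are assumptions made for this example.
module memory_store_sketch;
  typedef enum logic {FETCH, PERFORM} state_t;

  logic clk = 0;
  always #5 clk = ~clk;

  logic [7:0]  mem [0:15];
  logic [15:0] address_bus = 0;
  logic        read = 0, write = 0;
  logic [7:0]  write_data = 0;
  wire  [7:0]  data_bus = read ? mem[address_bus[3:0]] : 8'h00;

  // Hypothetical write port: commits on the clock edge after write is raised.
  always @(posedge clk)
    if (write) mem[address_bus[3:0]] <= write_data;

  state_t      state = FETCH;
  logic [15:0] program_counter = 0;
  logic [7:0]  instruction = 0, a = 0, x = 0;
  logic [1:0]  cycle = 0;

  always_ff @(posedge clk) begin
    case (state)
      FETCH: begin
        if (!read) begin
          read <= 1;
          address_bus <= program_counter;
        end else begin
          read <= 0;
          instruction <= data_bus;
          cycle <= 0;
          state <= PERFORM;
        end
      end
      PERFORM: begin
        case (instruction)
          8'hc0: begin // Immediate load A
            case (cycle)
              0: begin read <= 1; address_bus <= program_counter + 1; end
              1: begin
                read <= 0;
                a <= data_bus;
                program_counter <= program_counter + 2;
                state <= FETCH;
              end
              default: ;
            endcase
          end
          8'hd3: begin // Memory store A
            case (cycle)
              0: begin read <= 1; address_bus <= program_counter + 1; end
              1: begin x <= data_bus; address_bus <= program_counter + 2; end
              2: begin
                address_bus <= {x, data_bus};
                read <= 0;
                write <= 1;
                write_data <= a;
              end
              3: begin
                program_counter <= program_counter + 3;
                write <= 0;
                state <= FETCH;
              end
            endcase
          end
          default: ; // other opcodes are ignored in this sketch
        endcase
        cycle <= cycle + 1;
      end
    endcase
  end

  initial begin
    foreach (mem[i]) mem[i] = 8'h00;
    mem[0] = 8'hc0; mem[1] = 8'h42;                   // c0 42
    mem[2] = 8'hd3; mem[3] = 8'h00; mem[4] = 8'h08;   // d3 00 08
    #200;
    $display("mem[8]=%h", mem[8]);
    assert (mem[8] == 8'h42) else $error("store did not reach memory");
    $finish;
  end
endmodule
```

Raising write on cycle 2 and clearing it on cycle 3 keeps the write strobe high for exactly one clock edge, so the memory commits the value once.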

Offset Store Operations

As the offset load was a small variation on memory load, the same will be true for offset store. With some small modifications to the regular memory store call, the offset store is easily implemented.

// A offset store A
6: begin
  case (cycle)
    0: begin
      read <= 1;
      address_bus <= program_counter + 1;
    end
    1: begin
      address_bus <= {data_bus, a};
      read <= 0;
      write <= 1;
      write_data <= a;
    end
    2: begin
      program_counter += 2;
      write <= 0;
      state <= FETCH;
    end
  endcase
end

With the STORE and LOAD operations implemented I will call it a wrap for this post. As always I would love any feedback or questions you may have. Keep tinkering!

This is a cached copy of a post I have not migrated to the new design. Some links may no longer work.

Beginning Logic Design – Part 11