Chapter 8 -- MAL and registers
REGISTERS and MAL
-----------------
An introduction to the subject of registers -- from a motivational
point of view.
This lecture is an attempt to explain a bit about why computers
are designed (currently) the way they are. Try to remember that
speed of program execution is an important goal. Desire for increased
speed drives the design of computer hardware.
The impediment to speed (currently): transfering data to and from
memory.
look at a SAL instruction:
add x, y, z
-x, y, and z must all be addresses of data in memory.
-each address is 32 bits.
-so, this instruction requires more than 96 bits.
if each read from memory delivers 32 bits of data,
then it takes a lot of reads before this instruction can
be completed.
at least 3 for instruction fetch
1 to load x
1 to load y
1 to store z
that's 6 transactions with memory for 1 instruction!
How bad is the problem?
Assume that a 32-bit 2's complement addition takes 1 time unit.
A read/write from/to memory takes about 10 time units.
So we get
fetch instruction: 30 time units
decode 1 time unit
load x 10 time units
load y 10 time units
add 1 time unit
store z 10 time units
---------------------------------
total time: 62 time units
60/62 = 96.7 % of the time is spent doing memory operations.
what do we do to reduce this number?
1. transfer more data at one time
if we transfer 2 words at one time, then it only takes 2 reads
to get the instruction. There is no savings in loading/storing
the operands. And, an extra word worth of data is transferred
for each load, a waste of resources.
So, this idea would give a saving of 1 memory transaction.
2. modify instructions such that they are smaller.
This was common on machines from more than a decade ago.
Here's how it works:
SAL implies what is called a 3-address machine. Each
arithmetic type instruction contains 3 operands, 2 for sources
and 1 for the destination of the result.
To reduce the number of operands (and thereby reduce the number
of reads for the instruction fetch), develop an instruction set
that uses 2 operands for arithemtic type instructions.
(Called a 2-address machine.)
Now, instead of add x, y, z
we will have load x, z (puts the value of z into x)
add x, y ( x <- x + y )
so, arithmetic type instructions always use one of the operands
as both a source and a destination.
There's a couple of problems with this approach:
- where 1 instruction was executed before, 2 are now executed.
It actually takes more memory transactions to execute this sequence!
at least 2 to fetch each instruction
1 for each of the load/storing of the operands themselves.
that is 8 reads/writes for the same sequence.
So, allow only 1 operand -- called a 1-address format.
now, the instruction add x, y, z will be accomplished
by something like
load z
add y
store x
to facilitate this, there is an implied word of storage
associated with the ALU. All results of instructions
are placed into this word -- called an ACCUMULATOR.
the operation of the sequence:
load z -- place the contents of address z into the accumulator
(sort of like if you did move accumulator, z in SAL)
add y -- implied operation is to add the contents of the
accumulator with the operand, and place the result
back into the accumulator.
store x-- place the contents of the accumulator into the location
specified by the operand.
Notice that this 1-address instruction format implies the use
of a variable (the accumulator).
How many memory transactions does it take?
2 -- (load) at least 1 for instruction fetch, 1 for read of z
2 -- (add) at least 1 for instruction fetch, 1 for read of y
2 -- (store) at least 1 for instruction fetch, 1 for write of x
---
6 the same as for the 3 address machine -- no savings.
BUT, what if the operation following the add was something like
div x, x, 3
then, the value for x is already in the accumulator, and the
code on the 1 address machine could be
load z
add y
div 3
store x
there is only 1 extra instruction (2 memory transactions) for this
whole sequence!
On the 3-address machine: 12 transactions (6 for each instr.)
On the 1-address machine: 8 transactions (2 for each instr.)
REMEMBER this: the 1 address machine uses an extra word of storage
that is located in the CPU.
the example shows a savings in memory transactions
when a value is re-used.
3. shorten addresses. This restricts where variables can be placed.
First, make each address be 16 bits (instead of 32). Then
add x, y, z
requires 2 words for instruction fetch.
Shorten addresses even more . . . make them each 5 bits long.
Problem: that leaves only 32 words of data for operand storage.
So, use extra move instructions that allow moving data from
a 32 bit address to one of these special 32 words.
Then, the add can fit into 1 instruction.
NOW, put a couple of these ideas together.
Use of storage in CPU (accumulator) allowed re-use of data.
Its easy to design -- put a bunch of storage in the CPU --
call them REGISTERS. How about 32 of them? Then, restrict
arithmetic instructions to only use registers as operands.
add x, y, z
becomes something more like
load reg10, y
load reg11, z
add reg12, reg11, reg10
store x, reg12
presuming that the values for x, y, and z can/will be used again,
the load operations take relatively less time.
The MIPS R2000 architecture does this. It has
1. 32 32-bit registers.
2. Arithmetic/logical instructions use register values as operands.
A set up like this where arith/logical instr. use only registers
for operands is called a LOAD/STORE architecture.
A computer that allows operands to come from main memory is often
called a MEMORY TO MEMORY architecture, although that term is not
universal.
Load/store architectures are common today. They have the advantages
1. instructions can be fixed length (and short)
2. their design allows (easily permits) pipelining, making load/store
architectures faster
(More about pipelining at the end of the semester)
MAL
---
discussing some of the details of the MIPS architecture, and how
to write assembly language.
MIPS assembly language (or at least MAL) looks a lot like SAL,
except that operands are now in registers.
To reference a register as an operand, use the syntax
$x, where x is the number of the register you want.
Some limitations on the use of registers.
Due to conventions set by the simulator, certain registers are used
for special purposes. It is wise to avoid the use of those registers.
$0 is 0 (use as needed)
$1 is used by the assembler (the simulator in our case)
-- don't use it.
$2-7 are used by the simulator -- don't use them until
you know what they are for and how they are used.
$26-27 Used to implement the mechanism for calling special
procedures that do I/O and take care of other
error conditions (like overflow)
$29 is a stack pointer -- you are automatically allocated
a stack (of words), and the $sp is initialized to
contain the address of the empty word at the top of
the stack at the start of any program.
On to some MAL instructions. Here are descriptions and samples
of only some instructions. There are far too many to be able to
go over each one in detail.
Some sample info for all the examples:
hex address hex contents (opt) assembly lang.
00002000 0011aaee c1: .word 0x0011aaee
00002004 ???????? c2: .space 12
00002008 ????????
0000200c ????????
00002010 00000016 c4: .word 22
00002014 000000f3 c5: .word 0x000000f3
Load/Store
----------
la rt, label # load address
place the address assigned to label into the register rt.
example: la $9, c1
$9 gets the value 0x00002000
lw rt, label # load word
place the word at address label into the register rt.
lw rt, (rb) # load word
place the word at address (rb) into the register rt.
lw rt, x(rb) # load word
place the word at address X + (rb) into the register rt.
example: lw $10, c1
$10 gets the value 0x0011aaee
lb rt, label # load byte
place the byte at address label into the least
significant byte of register rt, and sign extend the value
to the rest of the register.
lb rt, (rb) # load byte
place the byte at address (rb) into the least
significant byte of register rt, and sign extend the value
to the rest of the register.
lb rt, x(rb) # load byte
place the byte at address X + (rb) into the least
significant byte of register rt, and sign extend the value
to the rest of the register.
example: lb $10, c1
on a little endian machine:
presuming $9 contains the value 0x00002000,
$10 gets the value 0xffffffee
sw rt, label # store word
write the contents of register rt to address label
sw rt, (rb) # store word
write the contents of register rt to address (rb)
sw rt, x(rb) # store word
write the contents of register rt to address X + (rb)
example: sw $10, c2
the value 0xffffffee is placed into the word of memory
at address 0x00002004
Branch
------
all the branch instructions for MAL look just like the ones from
SAL! (on purpose). Just be sure that you use one that exists!
The only difference worth mentioning is that the operands are
required to be in registers.
example: beq $20, $23, branchtarget
Compare the values in registers 20 and 23. If the values are
the same, then load the PC with the address branchtarget
If not, then do nothing and fetch the next instruction.
j target # jump target
identical in effect to b target, but the implementation
and execution are different (wrt the machine code).
A branch specifies an offset to be added to the current value
of the PC. A jump gives as many bits of address as possible,
and the remaining ones come from the PC (no addition).
Arithmetic/Logical
------------------
Very much like their SAL equivalents, except that all the operands
are in registers. (No exceptions!)
add rd, rs, rt # rd <- rs + rt (2's complement)
addi rd, rs, immediate # rd <- rs + immediate
example: addi $13, $5, 8
if $5 contains the value 14, then the result in
$13 after execution will be the value 22.
16 bits are available for storing the immediate value.
It is sign extended to 32 bits and then a 2's comp.
add is done.
To think about:
what does the instruction add $8, $12, $0 do?
Answer: copies the value in $12 into $8.
I/O instructions
----------------
There are 3: getc, putc and puts
getc is just get on .byte quantities
putc is just put on .byte quantities
AND, the operand is in a register.
examples:
putc $18 # prints out the character contained in the
# least significant byte of the register
getc $9 # gets one character that the user typed, and
# places it in the least significant byte of
# the register specified.
puts can be used one of 2 ways:
puts str1 # prints the null terminated string labelled str1
OR
puts $13 # prints the null terminated string that begins at
# the address contained in the register specified.
Here's a sample MAL program.
# this simple MAL program reads in 2 characters, figures
# out which one is alphabetically first, and prints it out.
# register assignments
# 8 -- the first character typed by the user
# 9 -- the second character typed by the user
# 10 -- temporary
# 11 -- holds the value of the larger character
# 13 -- the address of the newline character constant
# 14 -- newline character (a constant)
.data
newline: .byte '\n'
.text
__start: getc $8 # get 2 characters
getc $9
la $13, newline # print out newline
lb $14, ($13)
putc $14
sub $10, $9, $8 # figure out which is larger
bgez $10, secondlarger
add $11, $8, $0
b printresult
secondlarger: add $11, $9, $0
printresult: putc $11
end:
done
What has been ignored so far:
how to fit both an opcode and an address in a 32 bit instruction.
first. . .how many bits are "needed" for the opcode?
the number of unique patterns given by n bits is 2 ** n.
So, the problem boils down to deciding how many instructions
are necessary (and desired) for a computer.
arithmetic ( + - * / ) (how many representations?)
logicals (up to 16)
shifting
branches
loads/stores
there are possibly 64 enumerated here, so 6 bits should be enough.
That leaves 26 left for a 32 bit address specification.
Oops! For a load/store instruction, we need a register specification
also (where the data is to come from/go to). That leaves only 21 bits
for an address specification.
a discussion of addressing modes:
The original goal of this discussion was to figure out a way
to fit 32 bit addresses into less than 32 bits.
The discussion is going to be expanded a bit to talk about the
different ways that an instruction could specify where its operands
are.
But first, some way to specify a 32 bit address:
1. A BAD WAY. Use more than 1 word to specify an instruction.
2 words: the first contains the opcode and other operands
the second contains a 32 bit address
This method defeats the whole purpose.
2. Keep the address needed in a register. Then use a register
specification to tell where the address is. The operand
is reached by using the address within the register.
Other methods are variations on this one:
2a. specify 2 registers. The address is obtained by adding
the contents of the 2 registers.
2b. specify 1 register plus a small constant. The address
is obtained by adding the contents of the register plus
the constant.
3. (Not mentioned in the text.) Specify only a constant (offset).
The address is calculated by adding the constant to the
value of the current PC.
4. Use whatever bits are available to specify the least significant
portion of an address. The missing most significant bits
can be taking from the PC.
This implies that the operand (address of) is located in the
same portion of memory as the instruction being executed.
The MIPS architecture uses #2b (exclusively) for address
specification within a load/store instruction. Address specification
for branch instructions uses a variation of #3.
Many computers offer more ways of getting at operands. These
methods are called addressing modes.
load/store architectures usually have a VERY limited set
of addressing modes available
memory to memory architectures often offer LOTS of modes.
This flexibility often forces these machines to have
variable length instructions.
Here are some addressing modes. These names have come under
common usage. REMEMBER, an addressing mode really gives the
information of where an operand is (its address). An instruction
decides how to use the address.
Register. The operand is in the register.
Immediate. The operand is contained within the instruction itself.
Direct. The address of the operand is contained within the
instruction. (This means that extra bits will need
to specify a complete address.)
Register Direct. The address of the operand is contained within
a register given. This is #2.
Base Displacement. Also called indexed or relative.
The address is the sum of the contents of a register
plus a small constant. This is #2b.
Indirect. Adds a level of indirection to direct mode. An address
is specified within the instruction. The contents
of the address are the address of the operand.
A variation might be Register Indirect. The initial
address is located in a register (instead of in the
instruction).
A second look at some MAL load and store instructions:
Load/Store
----------
lw rt, x(rb) # load word
place the word at address X + (rb) into the register rt.
example: lw $10, 0($9)
presuming $9 contains the value 0x00002000,
$10 gets the value 0x0011aaee
lb rt, x(rb) # load byte
place the byte at address X + (rb) into the least
significant byte of register rt, and sign extend the value
to the rest of the register.
example: lb $10, 0($9)
on a little endian machine:
presuming $9 contains the value 0x00002000,
$10 gets the value 0xffffffee
sw rt, x(rb) # store word
write the contents of register rt to address X + (rb)
example: la $11, c2
sw $10, 4($11)
$11 gets the value 0x00002004, then
the value 0xffffffee is placed into the word of memory
at address 0x00002008
MAL programming example (SIMULATED)
-----------------------
# MAL program to print out the alphabet
.data
str1: .asciiz "The alphabet:\n"
# register assignments
# $8 -- the ASCII character code to be printed
# $9 -- the ASCII code for 'z', the ending character
.text
__start: la $10, str1
puts $10
add $8, $0, 97 # $8 gets ASCII code for 'a'
add $9, $0, 122 # $9 gets ASCII code for 'z'
while: bgt $8, $9, all_done
putc $8
add $8, $8, 1
b while
all_done: putc '\n'
done
Another MAL programming example (SIMULATED)
-------------------------------
# a MAL program to print out the ? of a user-entered integer.
.data
# prompts
str1: .asciiz "Enter an integer: "
str2: .asciiz "The result is "
str_error: .asciiz "\nInput error detected. Quitting.\n"
newline: .byte '\n'
# variables
int_array: .word 0:20 # array to hold integer for printing
.text
__start: la $8, str1 # print prompt
puts $8
lb $10, newline # read characters and calculate
li $11, 57 # the integer represented
li $12, 48
getc $9
get_chars: beq $9, $10, got_int # newline char terminates loop
bgt $9, $11, int_error
blt $9, $12, int_error
sub $13, $9, 48 # convert char to digit
mul $14, $14, 10 # int = int * 10 + digit
add $14, $14, $13
getc $9
b get_chars
int_error: la $8, str_error
puts $8
j end_program
got_int:
# $14 -- the integer to be printed
# $15 -- base address of array holding the integer
# $16 -- running address of array element
# $17 -- single digit of the integer
# $18 -- single character of the integer
print_int: la $8, str2
puts $8
la $15, int_array
move $16, $15
more_digits: rem $17, $14, 10
sw $17, ($16)
add $16, $16, 4
div $14, $14, 10
bgtz $14, more_digits
sub $16, $16, 4
bge $16, $15 more_chars # test for result = 0
putc '0'
putc $10 # print newline
more_chars: lw $18, ($16)
add $18, $18, 48
putc $18
sub $16, $16, 4
bge $16, $15, more_chars
end_program: putc $10
done