Chapter 10 -- assembly
MIPS floating point hardware
-----------------------------
Floating point arithmetic can be done in hardware or in software.
Hardware is fast, but takes up chip real estate.
Software is slow, but takes up no chip space (only memory for the
software -- an insignificant amount).
An assembly language programmer cannot tell which is being used,
unless the calculations are lengthy enough to show a noticeable
time difference. Software could be 100 to 1000 (or more) times slower.
The MIPS specifies and offers a HW approach.
All the control HW and integer arithmetic HW is located on 1 VLSI
chip. That packs it full. So, the MIPS architecture is designed
so that other chips can accept instructions and execute them. These
other chips are called coprocessors. The integer one is called
C0 (coprocessor 0). The one that does fl. pt. arithmetic is called
C1 (coprocessor 1).
Alternative names:  C0 is the R2000
                    C1 is the R2010

     --------          --------
     |      |          |      |
     |  C0  |          |  C1  |
     |      |          |      |
     --------          --------
         |                 |
         |-----------------|
                  |
              --------
              |      |
              | MEM  |
              |      |
              --------
C1 "listens" to the instruction sequence. It partially decodes
each instruction. When it gets one that is meant for it to execute,
it executes it. At the same time, C0 ignores the instruction
meant for C1 (for the correct amount of time) and then fetches
another instruction.
Just as there are registers meant for integers, there are registers
meant for floating pt. values.
C1 has 32 registers, each 32 bits wide.
Integer instructions have no access to these registers, just as
fl. pt. instructions have no access to the C0 registers.
The fl. pt. registers must be used in restricted ways. An explanation:
to comply with the IEEE standard for fl. pt. arithmetic, the HW
must support 2 fl. pt. types, single precision and double precision.
We have only discussed (and will only use) single precision.
A single precision number is 32 bits long, so 1 fl. pt. number fits
into 1 fl. pt. register. A double precision fl. pt. number requires
2 fl. pt. registers, since double precision numbers are 64 bits long.
So, if a sgl. prec. number is to be stored, it is always placed
in the least significant word of a pair of registers.
          bit 31 . . . 0
          --------------
     f0   |            |
          +------------+
     f1   |            |
          +------------+
                .
                .
                .
          +------------+
     f29  |            |
          +------------+
     f30  |            |
          +------------+
     f31  |            |
          --------------
This means that for the purposes of storing fl. pt. values in registers,
there are really only 16 registers: the even numbered ones. An
instruction names a register by its number among the 32, but only
even numbers should be used.
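
To see why a double precision value needs a register pair, here is a small Python sketch (not MIPS code) that prints the bit patterns of 1.0 in both precisions and splits the 64-bit pattern into two 32-bit words. The name split_double is made up for illustration, and the sketch does not show which register of the pair receives which word.

```python
import struct

def split_double(x):
    """Return (most significant word, least significant word) of x as an IEEE double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]  # the 64-bit pattern
    return bits >> 32, bits & 0xFFFFFFFF

single_bits = struct.unpack(">I", struct.pack(">f", 1.0))[0]  # the 32-bit pattern
ms_word, ls_word = split_double(1.0)

print(f"single 1.0: 0x{single_bits:08x}  (fits one 32-bit register)")
print(f"double 1.0: 0x{ms_word:08x} 0x{ls_word:08x}  (needs two 32-bit registers)")
```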
Instructions that the coprocessor has:
   load/store
   move
   fl. pt. operations
load/store instructions
-----------------------
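lwc1 ft, x(rb)
   Address of data is x + (rb) -- note that rb is an R2000 register.
   Read the data, and place it into fl. pt. register ft.
   The address calculation is the same as for lw. Where the data
   goes is different.
swc1 ft, x(rb)
   Write the contents of fl. pt. register ft to memory at
   address x + (rb).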
move instructions
-----------------
mtc1 rt, fs
Move contents of R2000 register rt into fl. pt. register fs.
This is really a copy operation. No translation is done.
It is a bit copy.
mfc1 rt, fs
Move contents of fl. pt. register fs into R2000 register rt.
This is really a copy operation. No translation is done.
It is a bit copy.
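
The difference between a bit copy and a conversion can be checked with a short Python sketch. It models what would happen if the integer 3 were copied bit for bit and then read as a single precision value; the helper names are invented for this illustration.

```python
import struct

def bits_as_float(word):
    """Reinterpret a 32-bit pattern as an IEEE single (what a bit copy gives)."""
    return struct.unpack(">f", struct.pack(">I", word & 0xFFFFFFFF))[0]

def float_as_bits(value):
    """Return the 32-bit pattern of value stored as an IEEE single."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

copied = bits_as_float(3)   # integer 3 copied bit for bit, then read as fl. pt.
converted = float(3)        # what a convert operation is meant to produce

print(copied)                          # a tiny denormalized number, not 3.0
print(converted)                       # 3.0
print(hex(float_as_bits(converted)))   # 0x40400000
```

This is why a value moved from an R2000 register still needs a convert before it can be used in fl. pt. arithmetic.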
floating point arithmetic instructions
--------------------------------------
add, subtract, multiply, divide -- each specifies 3 fl. pt. registers.
convert -- single precision to double precision
double precision to single precision
2's comp. (called fixed point format) to single precision
etc.
These operations convert and move data within the fl. pt.
registers.
To do a conversion like the ones given in SAL, either convert
then move (fl. pt. to integer), or move (from R2000) then
convert (integer to fl. pt.).
comparison operation -- set a bit, or a set of bits based on a comparison
such that a branch instruction can use the information.
THE ASSEMBLY PROCESS
--------------------
-- a computer understands machine code
-- people (and compilers) write assembly language
      assembly         -----------------         machine
      source     -->   |   assembler   |   -->   code
      code             -----------------
an assembler is a program -- a very deterministic program --
it translates each instruction to its machine code.
in the past, there was a one-to-one correspondence between
assembly language instructions and machine language instructions.
this is no longer the case. Assemblers are now more powerful,
and can "rework" code.
assembler starts at the top of the source code program,
and SCANS. It looks for
-- directives (.data .text .space .word .byte .float )
-- instructions
IMPORTANT:
there are separate memory spaces for data and instructions.
the assembler allocates them IN SEQUENTIAL ORDER as it scans
through the source code program.
the starting addresses are fixed -- ANY program will be assembled
to have data and instructions that start at the same address.
EXAMPLE

        .data
   a1:  .word   3
   a2:  .byte   '\n'
   a3:  .space  5

      address       contents
     0x00001000    0x00000003
     0x00001004    0x??????0a
     0x00001008    0x????????
     0x0000100c    0x????????   (the 3 MSbytes are not part of the declaration)

the assembler will align data to word addresses unless you specify
otherwise!
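
The allocation above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in these notes (allocate in order, align each item to a word address); the function name and the (label, size) input format are assumptions, and the start address is the one used in the example.

```python
def allocate(declarations, start=0x00001000):
    """Assign word-aligned addresses to (label, size_in_bytes) pairs, in order."""
    symbols = {}
    address = start
    for label, size in declarations:
        address = (address + 3) & ~3   # round up to the next multiple of 4
        symbols[label] = address
        address += size                # the next item starts after this one
    return symbols

# a1: .word 3     a2: .byte '\n'     a3: .space 5
table = allocate([("a1", 4), ("a2", 1), ("a3", 5)])
for label, address in table.items():
    print(f"{label}  0x{address:08x}")
```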
simple example of machine code generation for simple instruction:

   assembly language:    addi  $8, $20, 15
                          ^     ^   ^   ^
                          |     |   |   |
                       opcode  rt  rs  immediate

   machine code format

    31                      15                0
    -----------------------------------------
    | opcode |  rs  |  rt  |   immediate    |
    -----------------------------------------

   opcode is 6 bits -- it is defined to be 001000
   rs is 5 bits, encoding of 20, 10100
   rt is 5 bits, encoding of 8, 01000
   immediate is 16 bits, encoding of 15, 0000000000001111

   so, the 32-bit instruction for addi $8, $20, 15 is

       001000 10100 01000 0000000000001111

   re-spaced:

       0010 0010 1000 1000 0000 0000 0000 1111
   OR
       0x 2 2 8 8 0 0 0 f
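
The same field packing can be written as a short Python sketch. The function name encode_i_type is made up for illustration; the field widths and the addi opcode are the ones given above.

```python
def encode_i_type(opcode, rs, rt, immediate):
    """Pack the four fields of an immediate-format instruction into 32 bits."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)

ADDI = 0b001000   # opcode as given in the notes

word = encode_i_type(ADDI, rs=20, rt=8, immediate=15)   # addi $8, $20, 15
print(f"{word:032b}")
print(f"0x{word:08x}")
```

Note that rt comes before rs in the assembly language, but after it in the machine code.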
MAL --> TAL
-----------
What we've discussed so far is really a simplified version
of what an assembler does. What complicates matters is
the computations that need to be done if several files (modules)
of assembly language code are all part of the same program,
and we want to assemble them separately.
partial review:
the assembler's job is to
1. assign addresses
2. generate machine code
a simple assembler will make 2 complete passes over the source
code to complete this task.
pass 1: create complete SYMBOL TABLE
generate machine code for instructions other than
branches, jumps, jal, la, etc. (those instructions
that rely on an address for their machine code).
pass 2: complete machine code for instructions that didn't get
finished in pass 1.
now, for some details about MAL/MIPS and the assembler.
MAL -- the instructions accepted by the assembler
TAL -- a subset of MAL. These are instructions that
can be directly turned into machine code.
There are lots of MAL instructions that have no direct TAL
equivalent.
How to determine whether an instruction is a TAL instruction or not:
look in appendix C. If the instruction is there, then
it is a TAL instruction.
The assembler takes (non MIPS) MAL instructions and synthesizes
them with 1 or more MIPS instructions.
Some examples:
mul $8, $17, $20
becomes
mult $17, $20
mflo $8
why? because the MIPS architecture has 2 registers that
hold results for integer multiplication and division.
They are called HI and LO. Each is a 32 bit register.
mult places the least significant 32 bits of its result
into LO, and the most significant into HI.
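operation of mflo, mtlo, mfhi, mthi:
   mflo rd    copies LO into register rd
   mfhi rd    copies HI into register rd
   mtlo rs    copies register rs into LO
   mthi rs    copies register rs into HI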
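
A small Python model of how a 64-bit product is split between HI and LO may help. It is a sketch only: the function name mult mirrors the instruction, operands are ordinary signed Python integers, and the results are shown as unsigned 32-bit patterns.

```python
def mult(rs, rt):
    """Signed 32 x 32 bit multiply; return (HI, LO) as 32-bit patterns."""
    product = (rs * rt) & 0xFFFFFFFFFFFFFFFF   # 64-bit 2's complement result
    return product >> 32, product & 0xFFFFFFFF

hi, lo = mult(0x10000, 0x10000)   # 2**16 * 2**16 = 2**32, too big for LO alone
print(f"HI=0x{hi:08x} LO=0x{lo:08x}")

hi, lo = mult(-1, 2)              # -2 as a 64-bit 2's complement value
print(f"HI=0x{hi:08x} LO=0x{lo:08x}")
```

When the product fits in 32 bits, LO alone holds the answer, which is why mul can be synthesized as mult followed by mflo.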
addressing modes do not exist in TAL!
lw $8, label
becomes
la $8, label
lw $8, 0($8)
which becomes
lui $8, 0xMSpart of label
ori $8, $8, 0xLSpart of label
lw $8, 0($8)
or
lui $8, 0xMSpart of label
lw $8, 0xLSpart of label($8)
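
Splitting a label's address into its MS part and LS part is simple bit arithmetic. The Python sketch below shows it for the lui/ori form; the function names are invented and 0x00400004 is just a sample address.

```python
def split_address(address):
    """Return (MS 16 bits, LS 16 bits) of a 32-bit address."""
    return (address >> 16) & 0xFFFF, address & 0xFFFF

def lui_ori(ms_part, ls_part):
    """Model lui followed by ori: rebuild the 32-bit value from its halves."""
    register = (ms_part << 16) & 0xFFFFFFFF   # lui: MS half, LS half zeroed
    return register | ls_part                 # ori: fill in the LS half

ms_part, ls_part = split_address(0x00400004)
print(f"lui $8, 0x{ms_part:04x}")
print(f"ori $8, $8, 0x{ls_part:04x}")
print(f"register now holds 0x{lui_ori(ms_part, ls_part):08x}")
```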
instructions with immediates are synthesized with other
instructions
add $sp, $sp, 4
becomes
addi $sp, $sp, 4
because an add instruction requires 3 operands in registers.
addi has one operand that is an immediate.
AN EXAMPLE:

          .data
 a1:      .word  3
 a2:      .word  16:4
 a3:      .word  5

          .text
 __start: la     $6, a2
 loop:    lw     $7, 4($6)
          mult   $9, $10
          b      loop
          done
SOLUTION:
Symbol table
symbol address
---------------------
a1 0040 0000
a2 0040 0004
a3 0040 0014
__start 0080 0000
loop 0080 0008
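
The text-section half of this symbol table can be reproduced with a rough Python sketch of pass 1. The expansion counts (how many TAL instructions each MAL instruction becomes) are assumptions taken from the TAL translation of this example, and the (label, mnemonic) input format is invented for illustration.

```python
# Number of TAL instructions each MAL instruction in this example becomes.
EXPANSION = {"la": 2, "lw": 1, "mult": 1, "b": 1, "done": 2}

def pass_one(lines, start=0x00800000):
    """Walk (label, mnemonic) pairs in order and record each label's address."""
    symbols = {}
    address = start
    for label, mnemonic in lines:
        if label is not None:
            symbols[label] = address
        address += 4 * EXPANSION[mnemonic]   # every TAL instruction is 4 bytes
    return symbols

program = [
    ("__start", "la"),
    ("loop", "lw"),
    (None, "mult"),
    (None, "b"),
    (None, "done"),
]
for label, address in pass_one(program).items():
    print(f"{label:8s} {address:08x}")
```

loop lands at 0080 0008 rather than 0080 0004 because la takes up two instructions.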
memory map of data section
address contents
hex binary
0040 0000 0000 0003 0000 0000 0000 0000 0000 0000 0000 0011
0040 0004 0000 0010 0000 0000 0000 0000 0000 0000 0001 0000
0040 0008 0000 0010 0000 0000 0000 0000 0000 0000 0001 0000
0040 000c 0000 0010 0000 0000 0000 0000 0000 0000 0001 0000
0040 0010 0000 0010 0000 0000 0000 0000 0000 0000 0001 0000
0040 0014 0000 0005 0000 0000 0000 0000 0000 0000 0000 0101
translation to TAL code

          .text
 __start: lui   $6, 0x0040        # la $6, a2
          ori   $6, $6, 0x0004
 loop:    lw    $7, 4($6)
          mult  $9, $10
          beq   $0, $0, loop      # b loop
          ori   $2, $0, 10        # done
          syscall
memory map of text section
address contents
hex binary
0080 0000 3c06 0040 0011 1100 0000 0110 0000 0000 0100 0000 (lui)
0080 0004 34c6 0004 0011 0100 1100 0110 0000 0000 0000 0100 (ori)
0080 0008 8cc7 0004 1000 1100 1100 0111 0000 0000 0000 0100 (lw)
0080 000c 012a 0018 0000 0001 0010 1010 0000 0000 0001 1000 (mult)
0080 0010 1000 fffd 0001 0000 0000 0000 1111 1111 1111 1101 (beq)
0080 0014 3402 000a 0011 0100 0000 0010 0000 0000 0000 1010 (ori)
0080 0018 0000 000c 0000 0000 0000 0000 0000 0000 0000 1100 (syscall)
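
One way to check a memory map like this by hand is to pull the fields back out of each word. The Python sketch below does that for the immediate-format words in the table (lui, ori, lw, beq, ori); the function name decode_fields is an assumption made for illustration.

```python
def decode_fields(word):
    """Split a 32-bit instruction word into opcode, rs, rt, 16-bit immediate."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "immediate": word & 0xFFFF,
    }

for word in (0x3C060040, 0x34C60004, 0x8CC70004, 0x1000FFFD, 0x3402000A):
    f = decode_fields(word)
    print(f"0x{word:08x}: opcode={f['opcode']:06b} rs={f['rs']} rt={f['rt']} "
          f"immediate=0x{f['immediate']:04x}")
```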
EXPLANATION:
The assembler starts at the beginning of the ASCII source
code. It scans for tokens, and takes action based on those
tokens.
--- .data
A directive that tells the assembler that what comes next
is to be placed in the data portion of memory.
--- a1:
A label. Put it in the symbol table. Assign an address.
Assume that the program data starts at address 0x0040 0000.
branch offset computation.

  at execution time (for taken branch):

     contents of PC + sign extended (offset field || 00) --> PC

     PC points to the instruction after the beq when the offset is added.

  at assembly time:

     byte offset = target addr - ( 4 + beq addr )
                 = 00800008 - ( 00000004 + 00800010 )   (hex)
                 = 00800008 - 00800014

     the result is negative, so subtract in the order that gives
     a POSITIVE result, then take the 2's complement:

          0000 0000 1000 0000 0000 0000 0001 0100
        - 0000 0000 1000 0000 0000 0000 0000 1000
        ------------------------------------------
          0000 0000 0000 0000 0000 0000 0000 1100   (magnitude, 12)

          1111 1111 1111 1111 1111 1111 1111 0011
        +                                       1
        ------------------------------------------
          1111 1111 1111 1111 1111 1111 1111 0100   (byte offset, -12)

     we have a 16 bit offset field.
     throw away the least significant 2 bits
     (they should always be 0, and they are added
     back at execution time)

          1111 1111 1111 1111 1111 1111 1111 0100   (byte offset)
     becomes
                           11 1111 1111 1111 01     (offset field, 0xfffd)
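
The branch offset arithmetic can be sketched in Python as a check on the hand computation. The function names are made up for illustration, and the sketch does not test whether the offset actually fits in 16 bits.

```python
def branch_offset_field(branch_address, target_address):
    """Return the 16-bit offset field for a branch located at branch_address."""
    byte_offset = target_address - (branch_address + 4)   # relative to PC + 4
    assert byte_offset % 4 == 0, "instructions are word aligned"
    return (byte_offset >> 2) & 0xFFFF    # drop 2 LSBs, keep 16 bits

def branch_target(branch_address, field):
    """Undo the computation, as is done at execution time for a taken branch."""
    if field & 0x8000:        # sign extend the 16-bit field
        field -= 0x10000
    return branch_address + 4 + (field << 2)

field = branch_offset_field(0x00800010, 0x00800008)   # beq at 0x00800010, loop at 0x00800008
print(f"0x{field:04x}")
print(f"0x{branch_target(0x00800010, field):08x}")
```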
jump target computation.
at execution time:
most significant 4 bits of PC || target field || 00 --> PC
(26 bits)
at assembly time, to get the target field:
take 32 bit target address,
eliminate least significant 2 bits (word address!)
eliminate most significant 4 bits
what remains is 26 bits, and it goes in the target field
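
The jump target steps can also be sketched in Python. The function names are invented, and the jump to address 0x00800008 is hypothetical (the example program above uses a branch, not a jump).

```python
def jump_target_field(target_address):
    """Return the 26-bit target field for a jump to target_address."""
    return (target_address >> 2) & 0x03FFFFFF   # drop 2 LSBs, then 4 MSBs

def jump_destination(pc, field):
    """Rebuild the 32-bit address: MS 4 bits of PC || target field || 00."""
    return (pc & 0xF0000000) | (field << 2)

field = jump_target_field(0x00800008)
print(f"0x{field:07x}")
print(f"0x{jump_destination(0x00800014, field):08x}")
```

Because only 26 bits are kept, the 4 most significant bits of the destination always come from the PC.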