The instructions, once again, are: [1]

Opcode Number | Opcode Mnemonic | Instruction | Comment |
---|---|---|---|
0x0 | MACS | R = A + (X * Y >> 31) | saturation |
0x1 | MACS1 | R = A + (-X * Y >> 31) | saturation |
0x2 | MACW | R = A + (X * Y >> 31) | wraparound |
0x3 | MACW1 | R = A + (-X * Y >> 31) | wraparound |
0x4 | MACINTS | R = A + X * Y | saturation |
0x5 | MACINTW | R = A + X * Y | wraparound (31-bit) |
0x6 | ACC3 | R = A + X + Y | saturation |
0x7 | MACMV | R = A, acc += X * Y | 67-bit accumulator; you can grab the MS 32 bits or the LS 32 bits |
0x8 | ANDXOR | R = (A & X) ^ Y | |
0x9 | TSTNEG | R = (A >= Y) ? X : ~X | |
0xa | LIMIT | R = (A >= Y) ? X : Y | |
0xb | LIMIT1 | R = (A < Y) ? X : Y | |
0xc | LOG | ... | |
0xd | EXP | ... | |
0xe | INTERP | R = A + (X * (Y - A) >> 31) | saturation |
0xf | SKIP | R,CCR,CC_TEST,COUNT | |

Operand Formats:

opcode R,A,X,Y

Important: An operand always represents the address of a register. The Emu10k1 architecture does not implement indirect, immediate or any other type of addressing mode.

In this section I will try to give a detailed description of each instruction. Since Creative/Emu have never released any official documents, some of the instructions are still not fully understood. In addition, in some instances special quirks arise (such as the ACCUM quirk). More quirks may also be lurking, so beware.

If you discover any inaccuracies or would like to add something, don't hesitate to email me.

MACS:

*Formula:* R = A + (X * Y >> 31)

MACS1:

*Formula:* R = A + (-X * Y >> 31)

*Behaviour:*

These instructions are frequently referred to as "fractional" macs. They multiply operands X (or -X) and Y. Of the resulting 63 bit value, the 32 most significant bits (the "High" value) are taken and added to operand A. The result of the operation is then stored in the R operand.

They are called fractional macs because the operation of using the 32 MSB is equivalent to dividing by 2^31. Thus one could say that the formula is also given by: R = A+X*Y/(2^31)

When defining constants to be stored into general purpose registers (GPRs), a special operand can be used for the macs instruction. The "#" indicates to the assembler that the value of the operand should be multiplied by (2^31-1).

For example, #0.5 would be multiplied by (2^31-1) to give a value (in hex) of $3fffffff. Fractional macs allow us to implement non-integer values (such as filter coefficients) using integer mathematics. The real numbers (between 1 and -1) are essentially quantized to 2^31 different values, giving us a delta (smallest increment between two values) of 2^(-31)=4.65661287307739e-10, which is more than accurate enough for most calculations.

For more info on storing values in GPRs see the DC directive.

*Overflow:*

These instructions saturate on overflow.
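As a rough illustration of the formulas above, the following Python sketch models the fractional macs on signed 32-bit values. The helper names (`saturate32`, `frac`, `macs`, `macs1`) are invented for this example, and the rounding is assumed to be a plain arithmetic shift; the behaviour of the real hardware in the least significant bit has not been verified.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -2**31

def saturate32(value):
    """Clamp a Python integer to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, value))

def frac(value):
    """Model of the '#' operand: scale a real number by (2^31 - 1)."""
    return int(value * INT32_MAX)

def macs(a, x, y):
    """R = A + (X * Y >> 31), saturating on overflow."""
    return saturate32(a + ((x * y) >> 31))

def macs1(a, x, y):
    """R = A + (-X * Y >> 31), saturating on overflow."""
    return saturate32(a + ((-x * y) >> 31))

print(hex(frac(0.5)))             # 0x3fffffff
print(macs(0, 1 << 30, 1000))     # 500, i.e. 0.5 * 1000
print(macs1(0, 1 << 30, 1000))    # -500
# A result beyond the 32-bit range is clamped rather than wrapped:
print(macs(INT32_MAX, INT32_MAX, INT32_MAX) == INT32_MAX)  # True
```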

MACW:

*Formula:* R = A + (X * Y >> 31)

MACW1:

*Formula:* R = A + (-X * Y >> 31)

*Behaviour:*

These instructions behave in exactly the same way as the MACS and MACS1 instructions, except that they handle overflow differently.

*Overflow Behaviour*

These instructions wrap around upon overflow. As a simple illustration, imagine a number system with possible values between -9 and 9.

With saturation 8+2=9

With wraparound 8+4=-7 (counting past 9 continues at -9, so 10, 11, 12 become -9, -8, -7)

The saturate method results in smaller increases in error with overflow (errors result in noise), and thus at first glance offers better handling of overflow. However, the saturate method prevents us from using an important property of two's complement arithmetic:

*"If several 2's-complement numbers whose sum would not overflow are added, then the result of 2's-complement accumulation of these numbers is correct, even though intermediate sums might overflow"[4].*

This property can be used in FIR filtering where one has taken care to design the system such that the total output does not overflow. Note, however, that this property does not hold through a multiplication (thus you cannot blindly use it in IIR filtering).
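The quoted two's complement property can be checked with a small Python sketch. The helper names (`wrap32`, `saturate32`, `accumulate`) are made up for this example; it models only the 32-bit addition, not a real MACW instruction.

```python
def saturate32(value):
    """Clamp to the signed 32-bit range."""
    return max(-2**31, min(2**31 - 1, value))

def wrap32(value):
    """Wrap around into the signed 32-bit range (two's complement)."""
    return (value + 2**31) % 2**32 - 2**31

def accumulate(values, overflow):
    """Add the values one by one, applying the overflow rule after each add."""
    total = 0
    for v in values:
        total = overflow(total + v)
    return total

# The true sum (2^31 - 2) fits in 32 bits, but the first intermediate
# sum (2^31) does not.
values = [2**31 - 1, 1, -2]

print(accumulate(values, wrap32) == sum(values))      # True
print(accumulate(values, saturate32) == sum(values))  # False
```

With wraparound the intermediate overflow cancels out; with saturation the clamped intermediate sum leaves a permanent error of one in the final result.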

MACINTS, MACINTW:

*Formula:* R = A + X * Y

*Behaviour:*

Both of these instructions perform an integer mac operation.

*Overflow:*

MACINTS saturates upon overflow.

With MACINTW, the result is wrapped around but the sign bit (bit 31) is zeroed. Essentially the wrap around occurs around bit 30 instead of bit 31 (I have no idea why this would be useful).
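A literal reading of the two overflow rules can be sketched in Python as follows. The function names are invented for this example, and the MACINTW model (wrap, then clear bit 31) is only an assumption based on the description above, not verified hardware behaviour.

```python
def macints(a, x, y):
    """R = A + X * Y, clamped to the signed 32-bit range."""
    return max(-2**31, min(2**31 - 1, a + x * y))

def macintw(a, x, y):
    """R = A + X * Y, keeping bits 0..30 and zeroing the sign bit (bit 31)."""
    return (a + x * y) & 0x7FFFFFFF

print(macints(3, 4, 5))               # 23
print(macintw(3, 4, 5))               # 23
print(hex(macints(2**31 - 1, 1, 1)))  # 0x7fffffff (saturated)
print(hex(macintw(2**31 - 1, 1, 1)))  # 0x0 (wrapped around bit 30)
```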

ACC3:

*Formula:* R = A + X + Y

*Behaviour:*

ACC3 is pretty straightforward. It simply sums the three operands, placing the result in R. The result is saturated upon overflow.

It should be noted that the accumulation occurs in the High Accumulator.

MACMV:

*Formula:* R = A, acc += X * Y

The MACMV instruction combines a multiply-accumulate and a parallel move into one instruction. The result of the X * Y multiplication is added to the accumulator. The result must be fetched via a MAC, MACINT, or ACC3 instruction following the series of MACMV instructions. The accumulator register address (0x56) can only be specified as the A operand; if it is used in X or Y, the emu10k1 will use 0 instead.

The accumulator is 67 bits wide and will wrap around on overflow. The ACC3 and MACS instructions will fetch the HIGH accumulator, whereas the MACINT instructions will fetch the LOW accumulator. When fetched, if the accumulator contains a value more than 63 bits long, the accumulator will be saturated.

The MACMV instruction is most useful for FIR filters, as it can process each delay unit in one instruction, including shifting the delays. Hence an Nth-order FIR filter uses just a little over N+1 instructions (plus a few overhead instructions).
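To make the FIR use case concrete, here is a minimal Python model of one output sample computed with a chain of MACMV-style steps. Everything in it is an assumption made for illustration: the function name `fir_sample`, the layout of the delay line, and the use of an unbounded Python integer in place of the 67-bit accumulator. The final fetch is modelled on a fractional mac reading the high part of the accumulator.

```python
def saturate32(value):
    """Clamp to the signed 32-bit range."""
    return max(-2**31, min(2**31 - 1, value))

def fir_sample(delay, coeffs, sample):
    """Shift `sample` into `delay` (in place) and return the filter output."""
    acc = 0
    # One MACMV-like step per tap: move (R = A) and accumulate (acc += X * Y).
    for i in range(len(delay) - 1, 0, -1):
        delay[i] = delay[i - 1]            # parallel move shifts the delay line
        acc += delay[i] * coeffs[i]        # multiply-accumulate on the moved value
    delay[0] = sample
    acc += sample * coeffs[0]
    # Fetch the high accumulator, as a fractional mac would.
    return saturate32(acc >> 31)

half = 1 << 30                  # 0.5 in fractional format
delay = [0, 0]
coeffs = [half, half]           # two-tap averaging filter
print(fir_sample(delay, coeffs, 1000))  # 500
print(fir_sample(delay, coeffs, 2000))  # 1500
print(delay)                            # [2000, 1000]
```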

ANDXOR:

*Formula:* R = (A & X) ^ Y

*Behaviour:*

The ANDXOR instruction can be used to synthesize standard logical instructions. The table below shows some of the logical operations that can be synthesized.[5]

A | X | Y | Result |
---|---|---|---|
A | X | Y | (A AND X) XOR Y |
A | X | 0 | A AND X |
A | 0xFFFFFFFF | Y | A XOR Y |
A | 0xFFFFFFFF | 0xFFFFFFFF | NOT A |
A | ~Y | Y | A OR Y |
A | X | 0xFFFFFFFF | A NAND X |

The as10k1 assembler is distributed with a file called "emu_constants.asm". This file contains macros with these logical operations already synthesized.
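The identities behind these synthesized operations can be checked with a short Python sketch. The function names are made up for this example and are not the macros from "emu_constants.asm". The OR case is written here with X = ~Y, which is the operand choice that makes (A & X) ^ Y equal to A OR Y.

```python
MASK = 0xFFFFFFFF  # all registers are modelled as 32-bit values

def andxor(a, x, y):
    """R = (A & X) ^ Y on 32-bit values."""
    return ((a & x) ^ y) & MASK

def not32(v):
    """32-bit bitwise complement."""
    return ~v & MASK

a, b = 0x0F0F1234, 0x00FFABCD

assert andxor(a, b, 0) == a & b               # AND
assert andxor(a, MASK, b) == a ^ b            # XOR
assert andxor(a, MASK, MASK) == not32(a)      # NOT
assert andxor(a, not32(b), b) == a | b        # OR
assert andxor(a, b, MASK) == not32(a & b)     # NAND
print("all identities hold")
```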

**TSTNEG**

*Formula:* R = (A >= Y) ? X : ~X

**LIMIT**

*Formula:* R = (A >= Y) ? X : Y

**LIMIT1**

*Formula:* R = (A < Y) ? X : Y

*Behaviour:*

The results of these operations are conditional upon the values of A and Y. The result of the TSTNEG instruction will be complemented if A<Y. The LIMIT and LIMIT1 instructions function in a similar manner, but the result can be X or Y depending on the condition.
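The three formulas translate directly into Python. This sketch uses signed Python integers, so `~x` is the one's complement (-x - 1). The function names and the clamping usage are illustrative assumptions rather than anything taken from the as10k1 distribution.

```python
def tstneg(a, x, y):
    """R = (A >= Y) ? X : ~X"""
    return x if a >= y else ~x

def limit(a, x, y):
    """R = (A >= Y) ? X : Y"""
    return x if a >= y else y

def limit1(a, x, y):
    """R = (A < Y) ? X : Y"""
    return x if a < y else y

# With A and X both set to the input, LIMIT acts as a lower bound
# and LIMIT1 as an upper bound.
print(limit(1, 1, 3))     # 3  (input below the bound, bound returned)
print(limit(5, 5, 3))     # 5
print(limit1(5, 5, 3))    # 3  (input above the bound, bound returned)
# TSTNEG against zero gives a one's complement "absolute value".
print(tstneg(-4, -4, 0))  # 3
print(tstneg(4, 4, 0))    # 4
```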

**LOG**

*Behaviour:*

The LOG instruction converts linear data into a sign-exponent-mantissa form. The size of the exponent (i.e. the number of bits it occupies) is variable between 2 and 5 bits. The sign occupies one bit and the mantissa occupies the rest. The LOG instruction can also perform absolute value, negative absolute value, and negation on the result.[2]

The result is stored in the following exponential form:

Sign | Exponent | Mantissa |
---|---|---|
1 bit | 2-5 bits | 29-26 bits |

*Instruction Format:*

LOG R,Lin_data,Max_exponent,Sign

Where:

2 <= Max_exponent <= 31 (=0x1f);

And "Sign" is a register containing a two-bit number that has the following properties:

Value in Sign Operand | Description |
---|---|
0 0 | Normal |
0 1 | absolute value* |
1 0 | negative of absolute value* |
1 1 | negative* |

The way it works [?]:

1. The sign bit is stored.
2. The absolute value of the data is taken.
3. The data is shifted left ( << ) towards the binary point (the MSB).
4. Exp = Max_Exp_Size - Num_Of_Shifts
5. If MSB=1 and Exp <= Max_Exp_Size, then the implicit MSB is removed by shifting << one more bit and Exp is incremented.
6. The resulting mantissa is right-shifted by sizeof(Max_Exp_Size)+1.
7. The sign bit and sign operand are compared and the proper action is taken according to the table shown above.

Example of conversions (max_exp=7 (3 bits), sign=00). The last four columns break down the LOG value:

Linear Value | LOG value | Sign | Exponent | Mantissa | Implicit MSB? |
---|---|---|---|---|---|
0x40004000 | 0x70001000 | 0 | 7 | 0x0001000 | Yes |
0x20002000 | 0x60001000 | 0 | 6 | 0x0001000 | Yes |
0x10001000 | 0x50001000 | 0 | 5 | 0x0001000 | Yes |
0x08000800 | 0x40001000 | 0 | 4 | 0x0001000 | Yes |
0x04000400 | 0x30001000 | 0 | 3 | 0x0001000 | Yes |
0x02000200 | 0x20001000 | 0 | 2 | 0x0001000 | Yes |
0x01000100 | 0x10001000 | 0 | 1 | 0x0001000 | Yes |
0x00800080 | 0x08000800 | 0 | 0 | 0x8000800 | No |
0xff008000 | 0xf008000f | 1 | 0 | 0x008000f | No |

**EXP**

The EXP instruction performs the opposite of the LOG instruction.

**INTERP**

*Formula:* R = A + (X * (Y - A) >> 31)

*Behaviour:*

Used for linear interpolation between two points. "X" should be positive and represents a fractional value between 0 and 1. "X" is the fraction of the interval between A and Y where the desired value is located.

The INTERP instruction is not only useful for linear interpolation; it can also be used for rescaling values. In such a case, the input must be bounded by [0,1], and the output will be bounded by [A,Y]. Thus the instruction can be thought of as:

interp R,MIN,X,MAX

where MIN and MAX are the bounds of your output.
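The formula can be modelled in a few lines of Python. The function name `interp`, the saturation helper and the arithmetic-shift rounding are assumptions made for this sketch; X is given in the same fractional format as for the macs instructions (1 << 30 stands for 0.5).

```python
def saturate32(value):
    """Clamp to the signed 32-bit range."""
    return max(-2**31, min(2**31 - 1, value))

def interp(a, x, y):
    """R = A + (X * (Y - A) >> 31), saturating."""
    return saturate32(a + ((x * (y - a)) >> 31))

half = 1 << 30     # fractional 0.5
quarter = 1 << 29  # fractional 0.25

print(interp(0, half, 1000))       # 500, halfway between 0 and 1000
print(interp(100, half, 300))      # 200, halfway between 100 and 300
print(interp(100, quarter, 300))   # 150, a quarter of the way
print(interp(-100, 0, 100))        # -100, X = 0 returns MIN
```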

**SKIP**

The SKIP instruction provides some flow control in DSP programs. It has the following format:

SKIP R,CCR,CC_TEST,COUNT

COUNT is a register containing the number of instructions to skip.

CC_TEST is a register containing a value that indicates the conditions under which to skip.

CCR is the address of the CCR register. This can be any register, so a previously saved CCR can be reused, or a constant can be used to implement an always-skip (a NOP).

R is the address to which the value of the CCR is copied for future use.

The CC_TEST operand uses one of 4 equations.

Form | Equation |
---|---|
Form 1 | ( S * Z * M * B * N * S'* Z' * M' * B' * N') + ( S * Z * M * B * N * S'* Z' * M' * B' * N') + ( S * Z * M * B * N * S'* Z' * M' * B' * N') |
Form 2 | ( S + Z + M + B + N + S'+ Z' + M' + B' + N') * ( S + Z + M + B + N + S'+ Z' + M' + B' + N') * ( S + Z + M + B + N + S'+ Z' + M' + B' + N') |
Form 3 | ( S * Z * M * B * N * S'* Z' * M' * B' * N') + ( S * Z * M * B * N * S'* Z' * M' * B' * N') + ( S + Z + M + B + N + S'+ Z' + M' + B' + N') |
Form 4 | ( S + Z + M + B + N + S'+ Z' + M' + B' + N') * ( S + Z + M + B + N + S'+ Z' + M' + B' + N') + ( S * Z * M * B * N * S'* Z' * M' * B' * N') |

Presumably, the two most significant bits of CC_TEST select one of the above equations. The remaining 30 bits act as a mask that controls whether an element is active.

Through informed trial and error, the following CC_TEST values have been discovered:

Name | Branch on | Formula | CC_TEST |
---|---|---|---|
beq | Equal-to (zero) | ==0 | 0x00000008 |
bne | Not Equal-to | !=0 | 0x00000100 |
blt | less-than | <0 | 0x00000004 |
ble | less-than or equal-to | <=0 | TBD |
bgt | greater-than | >0 | 0x00000180 |
bge | greater-than or equal-to | >=0 | 0x00000180 |
bsa | On saturation | -- | 0x00000010 |

The file "emu_constants.asm" contains macros with these branch-on-condition tests already synthesized.

The file "emu_constants.asm" contains the following macros:

Name | Format | Formula |
---|---|---|
and | AND R,src1,src2 | R = src1 & src2 |
or | OR R,src1,src2 | R = src1 \| src2 |
xor | XOR R,src1,src2 | R = src1 ^ src2 |
nand | NAND R,src1,src2 | R = (src1&src2)' |
nor | NOR R,src1,src2 | R = (src1\|src2)' |
not | NOT R,src1 | R = src1' |
neg | NEG R,src1 | R = -src1 (==src1'+1) |
move | MOVE R,src1 | R = src1 |
test | TEST src1 | null=src1 (sets CCR for skip) |
cmp | CMP src1,src2 | null=src1-src2 (sets CCR for skip) |