
06. Mathematical coprocessor

Base lecture in Russian

Real numbers

  • There's no such object IRL
    • ⇒ Always model

    • ⇒ No float1 == float2 (only by coincidence)
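A minimal MARS sketch of why floats should not be compared for equality: 0.1 + 0.2 rounds to a different double than the literal 0.3, so the comparison fails. The labels tenth, fifth and sum are made up for this illustration; c.eq.d and movt are described later in these notes.

```mips
.data
tenth:  .double 0.1
fifth:  .double 0.2
sum:    .double 0.3
.text
        l.d     $f0 tenth
        l.d     $f2 fifth
        l.d     $f4 sum
        add.d   $f6 $f0 $f2     # 0.1 + 0.2, rounded to the nearest double
        c.eq.d  $f6 $f4         # C1 flag 0 = (0.1 + 0.2 == 0.3)
        li      $a0 0
        li      $t0 1
        movt    $a0 $t0         # $a0 = 1 only if the flag is set
        li      $v0 1           # print integer
        syscall                 # prints 0: the two values differ
        li      $v0 10          # exit
        syscall
```

In practice the usual replacement is a check like |float1 - float2| < ε, as in the ε-based loop in the examples further down.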

Representation

  1. Fixed point: 1231234.21341234

    • Trailing/leading zeros: 1230000000.0/0.0000000123

  2. Floating point: 1.213123·10²³ = 1.213123E23 = 1.213123e+23

    • Normalization $$1<=mantissa<10$$

    • Accidental zeros: 712E+8 + 3e-5 = 71200000000.00003

Real numbers modelling

  • Fixed point: very small range
  • Lexical (string/remainder based): too slow/complex, although perfect

⇒ floating point.

  • Binary fixed point: use 2⁻¹, 2⁻², 2⁻³ etc.

  • 155.625 =
    • = 1·2⁷ + 0·2⁶ + 0·2⁵ + 1·2⁴ + 1·2³ + 0·2² + 1·2¹ + 1·2⁰ + 1·2⁻¹ + 0·2⁻² + 1·2⁻³ =

    • = 128 + 0 + 0 + 16 + 8 + 0 + 2 + 1 + 0.5 + 0 + 0.125 = 155.625₁₀ =

    • = 10011011.101₂

  • 155.625 = 1.55625·10^(+2) = 10011011.101₂·2^(0) = 1.0011011101₂·2^(+111₂) (111₂ = 7₁₀)

IEEE_754

But no, they thought they're all smartasses.

IEEE_754

  • S[  E   ][         M           ]
    01110101111100010110101110101111
  • S - sign bit
  • E - biased exponent; 8 bits for 32-bit float

    • E = exponent +127 for 32-bit float
  • M - remainder of mantissa (23 bits for 32-bit float)

    • 2²³ = 8388608: a mantissa that needs more than 23 stored bits (24 with the implicit leading 1) will lose its lower digits

    • 2-normalized float: $$1<= mantissa <2$$

      • ⇒ mantissa always starts from 1, do not store it

    • 2-denormalized float (E = 0, $$|n| < 2^(-126)$$): $$0 < mantissa < 1$$

      • ⇒ mantissa always starts from 0, do not store it
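A small MARS sketch of the "lost lower digits" effect, assuming default round-to-nearest arithmetic: a single float has 24 significant bits, so 2²⁴ + 1 cannot be stored and the sum rounds back to 2²⁴. The labels big and one are illustrative only.

```mips
.data
big:    .float  16777216.0      # 2^24
one:    .float  1.0
.text
        l.s     $f0 big
        l.s     $f1 one
        add.s   $f2 $f0 $f1     # 16777217 needs 25 significant bits
        c.eq.s  $f2 $f0         # flag 0 = (sum == 2^24)
        li      $a0 0
        li      $t0 1
        movt    $a0 $t0         # $a0 = 1 if the addition changed nothing
        li      $v0 1           # print integer
        syscall                 # prints 1: the added 1.0 was lost
        li      $v0 10          # exit
        syscall
```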

| Number                   | Bit 31 (Sign) | Bits 30-23 (Biased exponent) | Bits 22-0 (Mantissa)    | Hexadecimal |
| 155.625 (normalized)     | 0             | 10000110                     | 00110111010000000000000 | 431BA000    |
| -5.23E-39 (denormalized) | 1             | 00000000                     | 01110001111001100011101 | 8038F31D    |

  • Signed like integer
  • Zero like integer (hence exponent bias)
  • Double float: 1-bit Sign, 11-bit Exponent, 52-bit Mantissa
  • MARS: «Tools → Floating Point Representation»
  • IEEE 754 is mathematically and practically awful! longread in Russian

    • ⇒ NaN, Inf etc.
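The 155.625 row of the table above can be checked in MARS with a sketch like the following. It copies the raw bits of the float into a common register and prints them with syscall 34 (print integer in hexadecimal, a MARS-specific service); the label num is made up for the example.

```mips
.data
num:    .float  155.625
.text
        l.s     $f0 num
        mfc1    $a0 $f0         # copy the raw 32 bits, no conversion happens
        li      $v0 34          # MARS: print integer in hexadecimal
        syscall                 # expected: 0x431ba000
        li      $v0 10          # exit
        syscall
```

Here 0x431BA000 splits into sign 0, biased exponent 10000110₂ = 134 = 7 + 127, and the mantissa bits 0011011101 followed by zeros.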

FPU / C1

  • The concept of coprocessor: orthogonal task, data formats, performance, data flow

  • C0 — control coprocessor (later)
  • FPU MIPS:
    1. IEEE 754 /32 /64
    2. 32 dedicated C1 f-registers

    3. = 16 d-registers: pairs $f0~$f1, $f2~$f3 etc., so only the even ones ($f0, $f2, $f4 ...) can be used for doubles
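A hedged sketch of how a double occupies a register pair. It assumes MARS behaviour, where the even register of the pair holds the lower 32 bits and the odd one the upper 32 bits; the label dnum is illustrative.

```mips
.data
dnum:   .double 155.625         # 0x4063740000000000 as a 64-bit pattern
.text
        l.d     $f0 dnum        # fills the pair $f0~$f1
        mfc1    $a0 $f1         # odd register: upper half on MARS
        li      $v0 34          # MARS: print integer in hexadecimal
        syscall                 # expected: 0x40637400
        li      $a0 10          # newline character
        li      $v0 11          # print character
        syscall
        mfc1    $a0 $f0         # even register: lower half on MARS
        li      $v0 34
        syscall                 # expected: 0x00000000
        li      $v0 10          # exit
        syscall
```

This is also why writing a single float to $f1 silently corrupts a double kept in $f0.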

Instruction set

  • Memory:

    | op    | cop   | ft    | fs    | fd    | funct |
    | 6bits | 5bits | 5bits | 5bits | 5bits | 6bits |

    • op = 17

    • cop = 16 for 32-bit and 17 for 64-bit

    • fTarget, fSource, fDestination — f-registers
    • funct — extension
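As a worked illustration of the field layout above, here is add.s $f2 $f4 $f6 assembled by hand. The value funct = 0 for add is taken from the MIPS32 instruction set reference rather than from these notes, so treat it as an outside assumption; the result can be compared with the "Code" column of the MARS execute pane.

```mips
# add.s $f2 $f4 $f6  =>  fd = 2, fs = 4, ft = 6
#
#   op      cop     ft      fs      fd      funct
#   17      16      6       4       2       0
#   010001  10000   00110   00100   00010   000000
#
# regrouped by 4 bits:
#   0100 0110 0000 0110 0010 0000 1000 0000 = 0x46062080
.text
        add.s   $f2 $f4 $f6     # should assemble to 0x46062080
```

Note that the target register comes first in the machine word, while the assembler syntax lists the destination first.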

  • Assembler:
    • command.type $fdestination $fsource $ftarget

      • command: add, sub, div, mul

      • type: s or d

           mul.s $f1 $f2 $f8
           add.d $f0 $f0 $f2
    • command.type $fdestination $fsource

      • command: neg, abs, mov, sqrt, movf, movt

        • movf/movt — conditional move

           mov.s $f4 $f7
           sqrt.d $f0 $f4
    • memory: command.type f-register offset(common-register)

      • l/s (load/store), s/d

           l.s $f1 40($t4)
           s.d $f6 ($t5)
    • registers: command.type common-register f-register

      • mfc1/mtc1 (move from/to C1); add .d to move a double

      • double uses 2 common registers (e. g. $t0~$t1)

           mtc1 $t1 $f3
           mfc1.d $t2 $f4
    • float/int conversion: command.type.type f-destination f-source

      • cvt/floor/trunc/round, s/d/w (word, i. e. integer)

      • use f-register only (why ?)
           cvt.w.s $f1 $f1
           floor.w.d $f2 $f4
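A short MARS sketch comparing the conversions above on one negative value. The result of each conversion is an integer bit pattern that still sits in an f-register, so it has to be fetched with mfc1 before printing. The label val and the test value are arbitrary choices for the example.

```mips
.data
val:    .float  -2.7
.text
        l.s     $f0 val
        floor.w.s $f1 $f0       # toward minus infinity: -3
        trunc.w.s $f2 $f0       # toward zero: -2
        round.w.s $f3 $f0       # to nearest: -3

        mfc1    $a0 $f1         # already an integer, just move the bits
        li      $v0 1           # print integer
        syscall
        li      $a0 32          # space
        li      $v0 11          # print character
        syscall
        mfc1    $a0 $f2
        li      $v0 1
        syscall
        li      $a0 32
        li      $v0 11
        syscall
        mfc1    $a0 $f3
        li      $v0 1
        syscall                 # whole output: -3 -2 -3
        li      $v0 10          # exit
        syscall
```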

More complex instructions

  • Non-atomic conditional jumps
    • comparison: c.le/lt/eq.s/d $fsource $ftarget

      • store 1/0 into C1 flag (#0, but there are others, like c.le.s 1 $f0 $f1)

      • ge/gt is reversed lt/le :)

    • jump: bc1t/bc1f label

      • jump if C1 flag 0 is 1/0 (similarly bc1t 1 label for C1 flag 1)

           c.le.s $f0 $f1
           bc1t less
    • Conditional moves:
      • movt/movf rdestination rsource: conditionally move a common register if C1 flag 0 is True/False (also movt $t0 $t1 2 for flag 2)

      • movt/movf.type fdestination fsource (+ optional flag number): the same for f-registers

           c.le.s $f0 $f1
           movt $t4 $t3
           movt.s $f1 $f0
    • Also, common register conditional commands!:
      • slt rdest rsource rtarget (set rdest to 1 if rsource is less than rtarget, to 0 otherwise); used in pseudoinstructions like blt $t0 $t1 label

      • movz/movn rdest rsource rtarget (set rdest to rsource if rtarget is zero/nonzero)

      • movz/movn.s/d fdest fsource rtarget (set fdest to fsource if rtarget is zero/nonzero)
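The comparison and conditional move pair can replace a branch entirely. Below is a minimal MARS sketch that picks the larger of two floats without any jump; the labels first and second are hypothetical names for this illustration.

```mips
.data
first:  .float  -1.25
second: .float  3.5
.text
        l.s     $f0 first
        l.s     $f1 second
        mov.s   $f12 $f0        # assume the first one is the maximum
        c.lt.s  $f0 $f1         # C1 flag 0 = (first < second)
        movt.s  $f12 $f1        # if so, take the second one instead
        li      $v0 2           # print float from $f12
        syscall                 # prints 3.5
        li      $v0 10          # exit
        syscall
```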

Examples

Calculate the square root of an integer

  •  .data
     src:    .word   100
     dst:    .float  0
     idst:   .word   0
     .text
             lw      $t0 src         # source integer
             mtc1    $t0 $f2         # store to FPU
             cvt.s.w $f2 $f2         # convert to single-sized float
             mtc1    $zero $f0       # zero in $f0 (no need to convert)
             c.lt.s  $f2 $f0         # check if <0
             bc1t    nosqrt          # no root then
             sqrt.s  $f2 $f2
     nosqrt: s.s     $f2 dst         # store float result
             cvt.w.s $f2 $f2         # convert to integer
             mfc1    $t0 $f2         # get from FPU
             sw      $t0 idst        # store integer result
    

Caution: lt (as in c.lt.s) vs. 1t (as in bc1t) sucks :(

Calculate $$e$$ as the infinite sum $$sum_(n=0)^infty 1/(n!)$$

   .data
   one:    .double 1
   ten:    .double 10
   .text
           l.d     $f2 one         # 1
           sub.d   $f4 $f4 $f4     # n = 0
           mov.d   $f6 $f2         # n! = 1
           mov.d   $f8 $f2         # here will be e (starts with the n = 0 summand)
           l.d     $f10 ten        # 10 for now, here will be ε
           mov.d   $f0 $f2         # here will be 10**K
           li      $v0 5           # input decimal length K
           syscall

   enext:  blez    $v0 edone       # multiply by 10 K times: 10**K
           mul.d   $f0 $f0 $f10
           subi    $v0 $v0 1
           b       enext
   edone:  div.d   $f10 $f2 $f0    # ε = 1/10**K

   loop:   add.d   $f4 $f4 $f2     # n=n+1
           mul.d   $f6 $f6 $f4     # n!=(n-1)!*n
           div.d   $f0 $f2 $f6     # next summand
           add.d   $f8 $f8 $f0
           c.lt.d  $f0 $f10        # next summand < ε
           bc1f    loop

           li      $v0 3           # output a double
           mov.d   $f12 $f8        # $f12 by syscall standard
           syscall

H/W

  1. EJudge: CubicRoot 'Cubical root'

    Input a double float A (positive or negative) with $$1<=|A|<=1000000$$ and $$0.00001<=varepsilon<=0.01$$. Calculate the cube root of A with accuracy $$<=varepsilon$$ (you do not need to round the result). HINT: you can always calculate the cube of something!

    Input:

    1000
    0.0001
    Output:

    9.99995
  2. EJudge: FractionTruncate 'Inexact fraction'

    Input three natural numbers: A, B and n. Output a double float F that has exactly n decimal places of A/B. You need to write a subroutine that accepts double f=A/B in $f12 and integer n in $a0 and returns the rounded double F in $f0. Hint: $$10^n*A/B < 2^31$$

    Input:

    123
    456
    7
    Output:

    0.2697368
  3. EJudge: LeibPi 'Calculating Pi'

    Calculate the value of π using Leibniz_formula_for_π, accurate to N decimal places. Input N, output the result. Use the function defined in ../Homework_FractionTruncate to truncate the remaining digits. Keep in mind that the exact formula calculates π/4, so you should probably start with 4 instead of 1 to get the required accuracy. Warning: the algorithm is slow, do not panic, but keep the code as simple as possible.

    Input:

    4
    Output:

    3.1416

HSE/ArchitectureASM/06_MathCoprocessor (last edited by FrBrGeorge, 2019-11-30 23:34:06)