Software Floating Point








Introduction

Although most embedded applications only require integer arithmetic, some do require floating-point. Therefore software floating-point is supplied with the cross-compiler and the target Forth. The target floating point wordset is not fully ANS compliant, but satisfies the needs of embedded systems without undue complexity. The Forth data stack and the floating point stack are the same. The floating point data storage format is not IEEE format, but is optimised for performance on small controllers. If you need a separate floating point stack or IEEE format storage, please contact MPE. Any variations in the implementation will be documented in the target specific section of the manual.

The cross-compiler has more limited floating-point support than the target. This means that some words are available within colon definitions, but not outside them.




Source code

The source code is in two sets of files, one for 32 bit Forth targets, the other for 16 bit targets. The files are:


  COMMON\SFP32HI    32 bit primitives
  COMMON\SFP32COM   32 bit high level code
  COMMON\SFP16HI    16 bit primitives
  COMMON\SFP16COM   16 bit high level code

These files use no assembler definitions. Some targets have code versions of the primitives, and these will be found in the CPU specific code directory. A significant increase in performance can be obtained by using the code files.




Entering floating-point numbers

Floating-point numbers can be entered in two forms, 1.234 and 0.1234e1. Floating-point numbers are compiled as literal numbers when in a colon definition and placed on the cross-compiler's stack when outside a definition.
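
As a minimal sketch of both cases, the fragment below uses one number outside a definition and one inside. The names GAIN and APPLY-GAIN are hypothetical and chosen only for illustration.

  2.5 FCONSTANT GAIN          \ outside a definition: 2.5 is placed on the cross-compiler's stack

  : APPLY-GAIN  \ r -- r'
    GAIN F*                   \ scale by the constant
    0.1e1 F+ ;                \ inside a definition: 0.1e1 is compiled as a literal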




The form of floating-point numbers

A floating-point number is placed on the Forth data stack. In the Forth literature, this is referred to as a combined floating point and data stack. For 32 bit targets, a floating point number consists of two 32-bit numbers, one for the mantissa and one for the exponent. For 16 bit targets, it consists of a 32-bit double mantissa and a single 16-bit exponent. The mantissa is normalised. The exponent is on the top of the stack. Note that for 16 bit targets, number conversion is affected by the cross-compiler directives HOST-MATH and TARGET-MATH. HOST-MATH leaves double numbers and floats in 32-bit form, whereas TARGET-MATH leaves them in 16-bit form.
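
Because floats live on the data stack, ordinary stack words can take them apart. The sketch below assumes a 32 bit target, where a float is two cells with the exponent on top; the names FMANTISSA and FEXPONENT are hypothetical and not part of the supplied code. It does not apply to 16 bit targets, where the mantissa is a double.

  : FMANTISSA  \ mant exp -- mant
    DROP ;                    \ discard the exponent on top of the stack

  : FEXPONENT  \ mant exp -- exp
    NIP ;                     \ discard the mantissa underneath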




Creating variables

To create a variable, use FVARIABLE. FVARIABLE works in the same way as VARIABLE. For example, to create a floating-point variable called VAR1 you code:

  FVARIABLE VAR1

When VAR1 is used, it returns the address of the floating-point number.




Accessing variables

Two words are used to access floating-point variables, F@ and F!. These are analogous to @ and !.
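
A short sketch of storing and fetching, using the VAR1 variable created above. SET-VAR1 and GET-VAR1 are hypothetical names used only for this example.

  FVARIABLE VAR1

  : SET-VAR1  \ --
    1.234 VAR1 F! ;           \ store the literal 1.234 in VAR1

  : GET-VAR1  \ -- r
    VAR1 F@ ;                 \ fetch the current value of VAR1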




Creating constants

To create a floating-point constant, use FCONSTANT. FCONSTANT is analogous to CONSTANT. For example, to generate a floating-point constant called CON1 with a value of 1.234, you enter:

  1.234 FCONSTANT CON1

When CON1 is executed, it returns 1.234 on the Forth stack.
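
A constant can then be used like any other float inside a definition. In this sketch SCALE is a hypothetical word that multiplies its argument by CON1.

  1.234 FCONSTANT CON1

  : SCALE  \ r -- r'
    CON1 F* ;                 \ r' = r * 1.234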




Using the supplied words

The supplied words split into several groups:

  • sines, cosines and tangents
  • arc sines, cosines and tangents
  • arithmetic functions
  • logarithms
  • powers
  • displaying floating-point numbers
  • inputting floating-point numbers

    The following functions only exist as target words so you cannot use them in calculations in your source code when outside a colon definition.

    Calculating sines, cosines and tangents

    To calculate sine, cosine and tangent, use FSIN, FCOS and FTAN respectively. Angles are expressed in radians.

    Calculating arc sines, cosines and tangents

    To calculate arc sine, cosine and tangent, use FASIN, FACOS and FATAN respectively. They return an angle in radians.

    Calculating logarithms

    Two words are supplied to calculate logarithms, FLOG and FLN. FLOG calculates a logarithm to base 10 (decimal). FLN calculates a logarithm to base e. Both take a floating-point number in the range from 0 to Einf.
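
    Logarithms to other bases can be built from FLN by the change of base rule, log2(x) = ln(x) / ln(2). The word FLOG2 below is a hypothetical example, not part of the supplied code, and it recomputes ln(2) on every call for simplicity.

      : FLOG2  \ r -- r'
        FLN                   \ ln(x)
        2.0 FLN               \ ln(2)
        F/ ;                  \ ln(x) / ln(2)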

    Calculating powers

    Three power functions are supplied:

      FE^X F10^X FX^Y
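
    As an illustration, a cube root can be sketched with the general power word, which appears in the glossary as FX^Y. FCBRT is a hypothetical name, and the sketch assumes a positive argument.

      : FCBRT  \ r -- r'
        1.0 3.0 F/            \ exponent 1/3, computed at run time
        FX^Y ;                \ r' = r^(1/3)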



    Degrees or radians

    The trigonometric functions measure angles in radians. To convert between degrees and radians use RAD>DEG or DEG>RAD. RAD>DEG converts an angle from radians to degrees. DEG>RAD converts an angle from degrees to radians.
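
    A degree-based sine can be sketched by converting first. This assumes that DEG>RAD and RAD>DEG take and return floating-point numbers; FSIN-DEG and FASIN-DEG are hypothetical names.

      : FSIN-DEG  \ r-degrees -- r
        DEG>RAD FSIN ;        \ convert to radians, then take the sine

      : FASIN-DEG  \ r -- r-degrees
        FASIN RAD>DEG ;       \ arc sine in radians, converted to degrees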




    Displaying floating-point numbers

    Two words are available for displaying floating-point numbers, F. and E.. The word F. takes a floating-point number from the stack and displays it in the form xxxx.xxxxx or x.xxxxxEyy depending on the size of the number. The word E. displays the number in the latter form.
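
    The sketch below prints the same number in both formats, one per line. SHOW-BOTH is a hypothetical name; FDUP is needed because F. and E. each consume the number they print.

      : SHOW-BOTH  \ r --
        FDUP F.               \ free format where possible
        CR  E. ;              \ exponential format on the next line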




    Changes from v6.0 to v6.1

    Renamed DINT to F>D for consistency. F>D is the ANS word. The original F>D was just a synonym. Similarly SINT was renamed to F>S.

    The word FLOATS that enabled floating point number conversion has been renamed to REALS to avoid a name conflict with the ANS word of the same name.

    The F-PACK vocabulary has been removed as no one liked it, and it could be considered contrary to the ANS Forth specification. If you wish to retain the F-PACK vocabulary, add the following lines before and after the compilation of the floating point code:

    
      only forth definitions         \ *** added ***
      vocabulary f-pack              \ *** added ***
      also f-pack definitions        \ *** added ***
      include %CommonDir%\Sfp32Hi    \ primitives
      include %CommonDir%\Sfp32Com   \ common high level code
      previous definitions           \ *** added ***

    The code enabling floating point to work in degrees or radians has been commented out for ANS compatibility. All trig functions now operate in radians. The commented out code may be uncommented if you need backward compatibility.

    32 bit targets: software floating point

    Overhauled 32 bit software floating point and incorporated improvements contributed by Hiden Analytical. These include more complete special case detection, faster high level code, and more accurate number input and output.

    Removed all use of global variables except PLACES to make the floating point code usable in interrupt routines and in multitasked systems. If the output routines are to be multitasked, change the definition of PLACES from:

      VARIABLE PLACES  8 PLACES !

    to:

      CELL +USER PLACES

    and remember to initialise PLACES before using the floating point output routines.
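
    With PLACES as a USER variable, each task that uses the output words needs to set it before its first F. or E. call. A minimal sketch, in which INIT-FP-OUTPUT is a hypothetical name and 8 simply repeats the default from the original VARIABLE definition:

      : INIT-FP-OUTPUT  \ --
        8 PLACES ! ;          \ digits after the decimal point for this task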

    Many words that are only useful as factors have been made headerless to save target memory space.

    16 bit targets: software floating point

    Note that the 16 bit floating point pack is not re-entrant. If you need to use the floating point pack in a multitasking system, you should convert the global variables to USER variables. The word +USER can be used in the form

      <size> +USER <name>

    to define a USER variable of a given size (normally a CELL) at the next free offset in the USER area. Only PLACES will need initialisation.




    Glossary

    Basic stack and memory operators

    : F!            \ r addr --
    Stores r at addr

    : F@            \ addr -- r
    Fetches r from addr.

    : F,            \ r --
    Lays a real number into the dictionary, reserving 8 bytes.

    : FDUP          \ r -- r r
    Floating point equivalent of DUP.

    : FOVER         \ r1 r2 -- r1 r2 r1
    Floating point equivalent of OVER.

    : FROT          \ r1 r2 r3 -- r2 r3 r1
    Floating point equivalent of ROT.

    : FPICK         \ fu..f0 u -- fu..f0 fu
    Floating point equivalent of PICK.

    : FROLL         \ f1 f2 f3 --  f2 f3 f1
    Floating point equivalent of ROLL.

    : FSWAP         \ r1 r2 -- r2 r1
    Floating point equivalent of SWAP.

    : FDROP         \ r --
    Floating point equivalent of DROP.

    : FNIP          \ r1 r2 -- r2
    Floating point equivalent of NIP.

    Floating point defining words

    : FVARIABLE     \ "<spaces>name" -- ; Run: -- f-addr
    Use in the form: FVARIABLE <name> to create a variable that will hold a floating point number.

    : FCONSTANT     \ r "<spaces>name" -- ; Run: -- r
    Use in the form: <float> FCONSTANT <name> to create a constant that will return a floating point number.

    : FARRAY        \ "<spaces>name" fn-1..f0 n -- ; Run: n -- rn
    Use in the form: n FARRAY <name> to create an array of n floating point numbers. When the array name is executed, the index i is used to return the address of the i'th (zero-based) element in the array. For example, 5 FARRAY TEST will set up 5 array elements each containing 0, and then f n TEST F! will store f in the nth element, and n TEST F@ will fetch it.
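
    Extending the TEST example, the following sketch fills the array with its own indices and then sums it. FILL-TEST and SUM-TEST are hypothetical names.

      5 FARRAY TEST

      : FILL-TEST  \ --
        5 0 DO
          I S>F  I TEST F!    \ store the float value of i in element i
        LOOP ;

      : SUM-TEST  \ -- r
        0 S>F                 \ running total starts at 0.0
        5 0 DO
          I TEST F@ F+        \ add element i
        LOOP ;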

    Type conversions

    : NORM          \ n exp -- f
    Normalise a single integer and a single exponent to produce a floating point number. INTERNAL.

    : DNORM         \ d exp -- fn ; normalise a 64 bit double
    Normalise a double integer and a single exponent to produce a floating point number. INTERNAL.

    : FSIGN         \ fn -- |fn| flag ; true if negative
    Return the absolute value of fn and a flag which is true if fn is negative.

    : F>S           \ fn -- n
    Converts a float to a single integer. Note that F>S truncates the number towards zero according to the ANS specification. If |fn| is greater than maxint, +/-maxint is returned.

    : F>D           \ fn -- d
    Converts a float to a double integer. Note that F>D truncates the number towards zero according to the ANS specification. If |fn| is greater than dmaxint, +/-dmaxint is returned.

    : FINT          \ f1 -- f2
    Chop the number towards zero to produce a floating point representation of an integer.

    : S>F           \ n -- fn
    Converts a single integer to a float.

    : D>F           \ d -- fn
    Converts a double integer to a float.
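
    The round trip through floating point can be sketched as follows. HALVE is a hypothetical name. Because F>S truncates towards zero, 7 HALVE should leave 3 and -7 HALVE should leave -3.

      : HALVE  \ n -- n'
        S>F                   \ integer to float
        2 S>F F/              \ divide by 2.0
        F>S ;                 \ back to an integer, truncating towards zero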

    Arithmetic

    : FNEGATE       \ r1 -- r2
    Floating point negate.

    : ?FNEGATE      \ fn n -- fn|-fn 
    If n is negative, negate fn.

    : FABS          \ fn -- |fn|
    Floating point absolute.

    : F*            \ r1 r2 -- r3
    Floating point multiply.

    : F/            \ r1 r2 -- r3
    Floating point divide.

    : F+            \ r1 r2 -- r3
    Floating point addition.

    : F-            \ r1 r2 -- r3
    Floating point subtraction.

    : FSEPARATE     \ f1 f2 -- f3 f4
    Leave the signed integer quotient f4 and remainder f3 when f1 is divided by f2. The remainder has the same sign as the dividend.

    : FFRAC         \ f1 f2 -- f3
    Leave the fractional remainder from the division f1/f2. The remainder takes the sign of the dividend.

    Relational operators

    : F0<           \ f1 -- flag
    Floating point 0<.

    : F0>           \ f1 -- flag
    Floating point 0>.

    : F0=           \ f1 -- flag 
    Floating point 0=.

    : F0<>          \ f1 -- flag 
    Floating point 0<>.

    : F=            \ f1  f2 -- flag 
    Floating point =.

    : F<            \ r1  r2 -- flag 
    Floating point <.

    : F>            \ f1  f2 -- flag 
    Floating point >.

    : FMAX          \ r1 r2 -- r1|r2 
    Floating point MAX.

    : FMIN          \ r1 r2 -- r1|r2 
    Floating point MIN.
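
    FMIN and FMAX combine naturally to limit a value to a range. FCLAMP is a hypothetical word, shown only as a sketch, and assumes lo is not greater than hi.

      : FCLAMP  \ r lo hi -- r'
        FROT                  \ lo hi r
        FMIN                  \ lo min(hi,r)
        FMAX ;                \ max(lo,min(hi,r))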

    Rounding

    f# 1.0 fconstant %ONE
    Floating point 1.0.

    : FLOOR         \ r1 -- r2 
    Floored round towards -infinity.

    : FROUND        \ r1 --  r2 
    Round the number to nearest or even.
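
    The two rounding words can be combined with F>S to choose how a float becomes an integer. Both names below are hypothetical. Going by the descriptions above, for -2.5 plain F>S should give -2, F>S-FLOOR should give -3, and F>S-NEAREST should give -2 (round to even).

      : F>S-FLOOR  \ r -- n
        FLOOR F>S ;           \ round towards -infinity first

      : F>S-NEAREST  \ r -- n
        FROUND F>S ;          \ round to nearest or even first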

    Miscellaneous

    : FALIGNED      \ addr -- f-addr 
    Aligns the address to accept an 8-byte float.

    : FALIGN        \ --
    Aligns the dictionary to accept an 8-byte float.

    : FDEPTH        \ -- +n
    Returns the number of floats on the stack.

    : FLOAT+        \ f-addr1 -- f-addr2 
    Increments addr by 8, the size of a float.

    : FLOATS        \ n1 -- n2 
    Returns n2, the size of n1 floats.

    Floating point output

    1 s>f 10 s>f f/ fconstant %.1
    Floating point 0.1.

              1 s>f fconstant %1
    Floating point 1.0.

             10 s>f fconstant %10
    Floating point 10.0.

      1250000000 34 fconstant %10^10
    Floating point 10^10.

     1844674407 -33 fconstant %10^-10
    Floating point 10^-10.

    F# 1.0E256 FCONSTANT %10^256
    Floating point 10^256.

    F# 1.0E-1 FCONSTANT %10E-1
    Floating point 10^-1.

    F# 1.0E-10 FCONSTANT %10E-10
    Floating point 10^-10.

    F# 1.0E-256 FCONSTANT %10^-256
    Floating point 10^-256.

    16 FARRAY POWERS-OF-10E1
    An array of 16 powers of ten starting at 10^0 in steps of 1.

    17 FARRAY POWERS-OF-10E16
    An array of 17 powers of ten starting at 10^0 in steps of 16.

    16 FARRAY POWERS-OF-10E-1
    An array of 16 powers of ten starting at 10^0 in steps of -1.

    17 FARRAY POWERS-OF-10E-16                                                              
    An array of 17 powers of ten starting at 10^0 in steps of -16.

    : RAISE_POWER   \ mant exp -- mant' exp'
    Raise the power in preparation for number formatting.

    : SINK_FRACTION \ mant exp -- mant' exp'
    Reduce the power in preparation for number formatting.

    variable places  8 places !     \ -- addr
    Number of digits output after the decimal point.

    : ROUND         \ f1 -- f2 
    Rounds least significant eight bits to 0 if higher 2 bits are all 0s or all 1s.

    : ?10PWR        \ exp[2] -- exp[2] exp[10] 
    Generate the power of ten corresponding to the power of two. INTERNAL.

    : SIGFIGS       \ fn n -- d dec_exponent
    From fn, generate a double number corresponding to n significant digits and a decimal exponent. INTERNAL.

    : op-prepare    \ fn -- d exp sign
    From fn, generate a double number corresponding to n significant digits, a decimal exponent and a sign indicator (nz=negative). INTERNAL.

    : .EXP          \ exp --
    Display the exponent. INTERNAL.

    : N#            \ d n -- d'
    Convert n digits. INTERNAL.

    : E.            \ n exp --
    Print the f.p. number on the stack in exponential form, x.xxxxxEyy.

    : REPRESENT     \ r c-addr u -- n flag1 flag2 
    Assume that the floating number is of the form +/-0.xxxxEyy. Place the significand xxxx at c-addr with a maximum of u digits. Return n the signed integer version of yy. Return flag1 true if r is negative, and return flag2 true if the results are valid. In this implementation all errors are handled by exceptions, and so flag2 is always true.

    : F.            \ f --
    Print the f.p. number in free format, xxxx.yyyy, if possible. Otherwise display using the x.xxxxEyy format.

    Floating point input

    : FLITERAL      \ Comp: r -- ; Run: -- r 
    Compiles a float as a literal into the current definition. At execution time, a float is returned. For example, [ %PI F2* ] FLITERAL will compile 2PI as a floating point literal. Note that FLITERAL is immediate.

    : CONVERT-EXP   \ c-addr --
    If the character at c-addr is 'D' convert it to 'E'. INTERNAL.

    : CONVERT-FPCHAR        \ c-addr --
    Convert the f.p. char '.' to the double char ',' for conversion. INTERNAL.

    : ALL-BLANKS?   \ c-addr len -- flag
    Return true if string is all blanks (spaces). INTERNAL.

    : FCHECK        \ -- am lm ae le e-flag .-flag 
    Check the input string at PAD, returning the separated mantissa and exponent flags. The e-flag is returned true if the string contained an exponent indicator 'E' and the .-flag is returned true if a '.' was found. INTERNAL.

    : MNUM          \ c-addr u -- d 2 | 0
    Convert the mantissa string to a double number and 2. If conversion fails, just return 0. INTERNAL.

    : ENUM          \ c-addr u -- n 1 | 0 ; str as above 
    Convert the exponent string to a single number and 1. If conversion fails, just return 0. INTERNAL.

    : *10^X         \ float dec_exponent -- float'
    Generate float' = float *10^dec_exp. INTERNAL.

    : FIXEXP     \ dmant exp -- mant' exp'
    Convert a double integer mantissa and a single integer exponent into a floating point number. INTERNAL.

    : FNUMBER?      \ addr --  0/.../mant exp 2
    Behaves like the integer version of NUMBER? except that if the number is in F.P. format and BASE is decimal, a floating point conversion is attempted. If conversion is successful, the floating point number is left on the float stack and the result code is 2.

    : >FLOAT        \ c-addr u -- r true|false 
    Try to convert the string at c-addr/u to a floating point number. If conversion is successful, flag is returned true, and a floating number is returned on the float stack, otherwise just flag=0 is returned.
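
    A typical use is to convert a string and stop with a message when it is not a valid number. PARSE-FLOAT is a hypothetical name, and the sketch assumes the standard word ABORT" is available on the target.

      : PARSE-FLOAT  \ c-addr u -- r
        >FLOAT 0=             \ true if the conversion failed
        ABORT" not a floating-point number" ;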

    : (F#)          \ addr -- fn 2 | 0
    The primitive for F# and F#IN below.

    : F#IN          \ -- fn 2 | 0
    Attempts to convert a token from the input stream to a floating-point number. Numbers in integer format will be converted to floating-point. An indicator (0 or 2/3) is returned in the same way as an indicator is returned by FNUMBER?.

    : F#            \ -- [f] ; or compiles it [ state smart ]
    If interpreting, takes text from the input stream and, if possible converts it to a f.p. number on the stack. Numbers in integer format will be converted to floating-point. If compiling, the converted number is compiled.

    : REALS         \ -- ; allow f.p input 
    Switch NUMBER? to permit floating point input using FNUMBER?. This action can be reversed by INTEGERS. Both REALS and INTEGERS are in the FORTH vocabulary.

    : INTEGERS      \ -- ; no f.p input
    Switch NUMBER? to restore integer only input.

    Trigonometric functions

    N.B. All angles are in radians.

    : DEG>RAD       \ n1 -- n2 
    Convert degrees to radians.

    : RAD>DEG       \ n1 -- n2  
    Convert radians to degrees.

    : FSIN          \ f1 -- f2 
    f2=sin(f1).

    : FCOS          \ f1 -- f2 
    f2=cos(f1).

    : FTAN          \ f1 -- f2 
    f2=tan(f1).

    : FASIN         \ f1 -- f2 
    f2=arcsin(f1).

    : FACOS         \ f1 -- f2 
    f2=arccos(f1).

    : FATAN         \ f1 -- f2 
    f2=arctan(f1).

    Power and logarithmic functions

    : FLN           \ f1 -- f2 
    Take the logarithm of f1 to base e and return the result.

    : FLOG          \ f1 -- f2 
    Take the logarithm of f1 to base 10 and return the result.

    : FE^X          \ f1 -- f2
    f2=e^f1.

    : F10^X         \ f1 -- f2 
    f2=10^f1.

    : FX^N          \ x-real n-integer -- fx^n
    fx^n=x^n where x is a float and n is an integer.

    : FX^Y          \ x-real y-real -- fn 
    fn=X^Y where X and Y are both floats.

    : FSQR          \ f1 -- f2 ; FSQR by Heron's formula 
    f2=sqrt(f1) by Heron's formula.
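
    As an example of combining the arithmetic and root words, the length of a hypotenuse can be sketched as below. FHYPOT is a hypothetical name and makes no attempt to avoid overflow for very large arguments.

      : FHYPOT  \ a b -- c
        FDUP F*               \ a b^2
        FSWAP FDUP F*         \ b^2 a^2
        F+ FSQR ;             \ c = sqrt(a^2 + b^2)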




    High Level primitives

    The software floating point pack requires several support primitives. High level versions are provided in SFP16HI.FTH and SFP32HI.FTH for 16 and 32 bit targets. Some targets have coded versions in the CPU directory and these will provide much better performance. The support file should be compiled before the common file.

    : <<1           \ n -- n<<1
    A compiler synonym for 2* or "1 LSHIFT".

    : >>1           \ n -- n>>1
    A compiler synonym for 2/ or "1 RSHIFT".

    : S->           \ n1 carry-in-flag --- n2 carry-out-flag
    Perform a right shift, applying the carry in to the m.s. bit and returning the carry out as 1 or 0.

    : <-S           \ n1 carry-in-flag --- n2 carry-out-flag
    Perform a left shift, applying the carry in to the l.s. bit and returning the carry out as 1 or 0.

    : d<<1          \ xd -- xd<<1
    One bit double left shift.

    : d>>1          \ xd -- xd>>1
    One bit double right shift.

    : D>>N          \ d m -- d>>m
    M bit double right shift.
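
    To show how the carry words chain together, a one bit double left shift could be sketched from <-S as below. DSHL-SKETCH is a hypothetical name and is not how the supplied d<<1 is necessarily written. The sketch assumes the usual double number layout with the high cell on top, and that <-S accepts its own 1 or 0 carry out as a carry in.

      : DSHL-SKETCH  \ lo hi -- lo' hi'
        SWAP 0 <-S            \ hi lo' carry   shift the low cell, carry in = 0
        ROT SWAP              \ lo' hi carry
        <-S                   \ lo' hi' carry-out
        DROP ;                \ discard the final carry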