SSE Floating Point

WARNING

As of August 2020, the SSE code is functional, but is slow because there is no optimisation. An SSE optimiser will be provided in due course. If you need FP performance, use the NDP float pack in Lib/x64/Ndpx64.fth.

Introduction

The Forth data stack and the floating point stack are separate. As with the data and return stacks, the floating point stack grows down. The floating point data storage format is IEEE 64 bit (double precision) format. The source code is in the file Lib/x64/FPSSE64S.fth. The Extern: call mechanism requires Lib/x64/FPSSE64S.fth when float or double arguments are used.

There are occasions when the 64 bit float format causes problems. In these cases you can use the 80 bit floats provided in Lib/x64/Ndpx64.fth. However, you will have to convert them to SSE form for use with the Extern: mechanism, unless you use VFX Forth 64 v5.4 or later which features automatic conversion of floats in the Extern: system.

Entering floating-point numbers

Floating point number entry is enabled by REALS and disabled by INTEGERS.

Floating-point numbers of the form 0.1234e1 are required (see FNUMBER?) during interpretation and compilation of source code. Floating-point numbers are compiled as literal numbers when in a colon definition (compiling) and placed on the stack when outside a definition (interpreting).

The more flexible word >FLOAT accepts numbers in two forms, 1.234 and 0.1234e1. Both words are documented later in this chapter. See also the section on Gotchas later in this chapter.

Note also that MPE Forths use ',' by default (it can be changed) as the double number indicator - it makes life much easier for Europeans.
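
As a minimal sketch of the entry rules above, the lines below enable float entry, use float literals while interpreting, and compile one into a colon definition. The word CIRCLE-AREA is a hypothetical name used only for illustration.

```forth
reals                          \ enable floating point number entry
1.5e0 2.5e0 f+ f.              \ interpreting: both floats go to the float stack
: circle-area  ( F: r -- a )   \ hypothetical helper: area = r * r * pi
  fdup f*  3.14159265e0 f* ;   \ compiling: the float is laid down as a literal
2.0e0 circle-area f.           \ area of a circle of radius 2
```

Note that every float is written with an exponent (1.5e0, not 1.5), which is the form the text interpreter requires.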

The form of floating-point numbers

A floating-point number is placed on a separate floating point stack. In the Forth literature, this is referred to as separated floating point and data stacks. As with the data and return stacks, the floating point stack grows down. Items on the float stack are in IEEE 64-bit format.

Creating and using variables

To create a variable, use FVARIABLE. FVARIABLE works in the same way as VARIABLE. For example, to create a floating-point variable called VAR1 you code:

  FVARIABLE VAR1

When VAR1 is used, it returns the address of the floating-point number.

Two words are used to access floating-point variables, F@ and F!. These are analogous to @ and !.
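
A short sketch of storing to and fetching from a floating-point variable, assuming float entry has been enabled with REALS. It also uses F+!, which is documented later in this chapter.

```forth
fvariable var1        \ reserve storage for one float
3.25e0 var1 f!        \ store 3.25 in VAR1
1.0e0 var1 f+!        \ add 1.0 in place, so VAR1 now holds 4.25
var1 f@ f.            \ fetch the value and display it
```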

Creating constants

To create a floating-point constant, use FCONSTANT, which is analogous to CONSTANT. For example, to generate a floating-point constant called FCON1 with a value of 1.234, you enter:

  1.234e0 FCONSTANT FCON1

When FCON1 is executed, it returns 1.234 on the floating point stack.
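
A floating-point constant can be used inside a definition like any other word. The sketch below assumes REALS is active; FCON1-SCALE is a hypothetical name.

```forth
1.234e0 fconstant fcon1
: fcon1-scale  ( F: r1 -- r2 )  fcon1 f* ;   \ hypothetical helper: multiply by FCON1
2.0e0 fcon1-scale f.                         \ 2.0 * 1.234
```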

Using the supplied words

The supplied words split into several groups:

The following functions only exist as target words so you cannot use them in calculations in your source code when outside a colon definition.

Calculating sines, cosines and tangents

To calculate sine, cosine and tangent, use FSIN, FCOS and FTAN respectively. Angles are expressed in radians.

Calculating arc sines, cosines and tangents

To calculate arc sine, arc cosine and arc tangent, use FASIN, FACOS and FATAN respectively. They return an angle in radians.

Calculating logarithms

Two words are supplied to calculate logarithms, FLOG and FLN. FLOG calculates a logarithm to base 10 (decimal). FLN calculates a logarithm to base e. Both take a floating-point number in the range from 0 to Einf.
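
Logarithms to other bases can be built from these two words. As an illustrative sketch, the hypothetical word FLOG2 below uses the change of base identity log2(x) = ln(x) / ln(2).

```forth
: flog2  ( F: r1 -- r2 )   \ hypothetical name: logarithm to base 2
  fln  2.0e0 fln  f/ ;
8.0e0 flog2 f.             \ log2(8), mathematically 3
```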

Calculating powers

Three power functions are supplied:

  FEXP F10^X X^Y

Degrees or radians

The angles used by the trigonometric functions are in radians. To convert between degrees and radians use RAD>DEG or DEG>RAD. RAD>DEG converts an angle from radians to degrees. DEG>RAD converts an angle from degrees to radians.
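
If an application works in degrees, the conversion words can be wrapped around the trigonometric words. This is only a sketch; FSIN-DEG and FATAN-DEG are hypothetical names.

```forth
: fsin-deg   ( F: degrees -- r )  deg>rad fsin ;   \ hypothetical: sine of an angle in degrees
: fatan-deg  ( F: r -- degrees )  fatan rad>deg ;  \ hypothetical: arc tangent in degrees
30.0e0 fsin-deg f.     \ sin(30 degrees), mathematically 0.5
1.0e0 fatan-deg f.     \ arctan(1), mathematically 45 degrees
```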

Displaying floating-point numbers

Two words are available for displaying floating-point numbers, F. and E.. The word F. takes a floating-point number from the stack and displays it in the form xxxx.xxxxx or x.xxxxxEyy depending on the size of the number. The word E. displays the number in the latter form.

Number formats, ANS and Forth200x

The ANS Forth standard specifies that floating point numbers must be entered in the form 1.234e5 and must contain a point '.' and 'e' or 'E', and that double integers are terminated by a point '.'.

This situation prevents the use of the standard conversion words in international applications because of the interchangeable use of the '.' and ',' characters in numbers. Because of this, VFX Forth uses two four-byte arrays, FP-CHAR and DP-CHAR, to hold the characters used as the floating point and double integer indicator characters. The FP-CHAR and DP-CHAR arrays (in the kernel) each hold up to four characters to be treated as indicators. Set both to '.' for ANS compatibility. Note that they should be accessed as one to four byte arrays, terminated by a zero byte. The first character of FP-CHAR is used as the point character for output.

By default, FP-CHAR is initialised to '.' and DP-CHAR is initialised to ',' and '.'. For strict ANS compliance, you should set them as follows.


\ ANS standard setting
  char . dp-char !
  char . fp-char !
: ans-floats    \ -- ; for strict ANS compliance
  [char] . dp-char !
  [char] . fp-char !
;
\ MPE defaults
  char , dp-char !
  char . dp-char 1+ c!
  char . fp-char !
: mpe-floats    \ -- ; for existing and most legacy code
  [char] , dp-char !
  [char] . dp-char 1+ c!
  [char] . fp-char !
;

You can of course set these arrays to hold any values which suit your application's language and locale. Note that integer conversion is always attempted before floating point conversion. This means that if the FP-CHAR and DP-CHAR arrays contain the same character, floating point numbers must contain 'e' or 'E'. If the arrays are all different, a number containing the FP-CHAR will be successfully converted as a floating point number, even if it does not contain 'e' or 'E'.
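
As a sketch of a locale-specific setting, the hypothetical word COMMA-FLOATS below swaps the roles of the two characters, following the same pattern as ANS-FLOATS and MPE-FLOATS. It assumes, as those words do, that storing a cell with ! leaves the character followed by zero bytes.

```forth
: comma-floats  \ -- ; hypothetical: ',' marks floats, '.' marks doubles
  [char] . dp-char !
  [char] , fp-char !
;
```

Because the two arrays now differ, a number containing ',' should convert as a float even without an exponent. Source code written for the default setting would need its float literals changed to match.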

Only one FP package

Only one float pack can be installed. This is checked at compile time. To replace the floating point pack use:


integers
remove-FP-pack
include <sourcefile>

Configuration

create FP-PACK  \ -- addr
Marks that a float pack is being compiled.

The value FPSYSTEM defines which floating point pack is installed and active. See the Floating Point chapters for further details. Each floating point pack defines its own type as follows:

When FPSystem changes, the following files that use FPSystem are affected:

  Extern*.fth  kernel64.fth  Tokeniser.fth
  Lib/x64/Ndpx64.fth  Lib/Hfpx64.fth  Lib/x64/FPSSE64.fth

At present, only 0, 1, 2 and 4 are valid values of FPSystem in x64 systems.

#8 constant FPCELL      \ -- n
Defines the size of literals and floating point numbers in memory and on floating point stacks in memory

#8 constant /NDPSLOT    \ -- n
Size of aligned memory buffer used to hold an FP number.

/NDPSLOT negate constant -/NDPSLOT
Negative of /NDPSLOT

FP primitives

defer f.s       \ F: f --
Non-destructive display of the floating point stack.

: finit         \ F: i*f -- ; resets FPU and FP stack
Reset the floating point stack.

: fdepth        \ -- #f
Floating point equivalent of DEPTH. The result is returned on the Forth data stack.

code CLZ        \ x -- u
Return the number of leading zeros in x.

: DCLZ          \ dx -- u
Return the number of leading zeros in the double dx.

code >fs        \ f64 -- ; F: -- f64
Move a float from the data stack to the floating point stack.

code fs>        \ F: f64 -- ; -- f64
Move a float from the float stack to the data stack.

code fs@        \ F: f64 -- f64 ; -- f64
Copy a float from the float stack to the data stack.

code fps@       \ -- fps
Read the MXCSR floating point status/control register.

code fps!       \ fps --
Set the MXCSR floating point status/control register.

code exp@       \ F: f -- f ; -- exp(2)
Copy the exponent of the top float to the data stack. The IEEE exponent offset is removed. The floating point number has a mantissa in the range 0.5 <= mantissa < 1.0, such that the number is in the form:

  sign * mantissa * 2^exp

The exponent returned by exp@ and consumed by exp! is not the offset 1023 exponent of the IEEE 754 standard - it is one greater than that. IEEE views the number as being in the form:

  sign * 1.fraction * 2^(exp-1023)
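
A worked example may help here. In the mantissa form used by exp@, 8.0 is 0.5 * 2^4 and 1.0 is 0.5 * 2^1, whereas IEEE views them as 1.0 * 2^3 and 1.0 * 2^0. If the description above is followed, the lines below should therefore display 4 and 1. This is a sketch derived from that description, not a tested result.

```forth
8.0e0 exp@ .  fdrop     \ 8.0 = 0.5 * 2^4
1.0e0 exp@ .  fdrop     \ 1.0 = 0.5 * 2^1
```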

code exp!               \ exp(2) -- ; F: f -- f'
Change/Set the exponent of the top float. The IEEE exponent offset is added.

code F!         \ F: r -- ; addr --
Stores r at addr.

code F@         \ addr -- ; F: -- r
Fetches r from addr.

code f+!        \ F: f -- ; addr -- ; add f to data at addr
Add F to the data at ADDR.

code f-!        \ F: f -- ; addr -- ; sub f from data at addr
Subtract F from the data at ADDR.

synonym DF! F!          \ F: r -- ; addr --
Stores r at addr in IEEE 64 bit format.

synonym DF@ F@          \ addr -- ; F: -- r
Fetches r from addr, which contains a float in IEEE 64 bit format.

code SF!                \ F: r -- ; addr --
Stores r at addr.

code SF@                \ addr -- ; F: -- r
Fetches r from addr.

: F,            \ F: r --
Lays a real number into the dictionary, reserving FPCELL bytes.

synonym DF, F,
Lays a real number into the dictionary as an IEEE 64 bit number.

: SF,           \ F: r --
Lays a real number into the dictionary as an IEEE 32 bit number.

code FDUP       \ F: r -- r r
Floating point equivalent of DUP.

code FOVER      \ F: r1 r2 -- r1 r2 r1
Floating point equivalent of OVER.

code FSWAP      \ F: r1 r2 -- r2 r1
Floating point equivalent of SWAP.

code FPICK      \ u -- ; F: fu..f0 -- fu..f0 fu
Floating point equivalent of PICK.

code FROT       \ F: r1 r2 r3 -- r2 r3 r1
Floating point equivalent of ROT.

code F-ROT              \ F: r1 r2 r3 -- r3 r1 r2
Floating point equivalent of -ROT.

code FDROP      \ F: r --
Floating point equivalent of DROP.

code FNIP               \ F: r1 r2 -- r2
Floating point equivalent of NIP.

code f>r        \ F: f -- ; R: -- f
Put float onto return stack.

code fr>        \ R: f -- ; F: -- f
Pull float from the return stack.

code flit       \ F: -- f ; inline literal
Run-time routine for a floating point literal.

Floating point defining words

: FVARIABLE     \ "<spaces>name" -- ; Run: -- addr
Use in the form: FVARIABLE <name> to create a variable that will hold a floating point number.

: FCONSTANT     \ F: r -- ; "<spaces>name" -- ; Run: -- r
Use in the form: <float> FCONSTANT <name> to create a constant that returns a floating point number.

: FARRAY        \ "<spaces>name" fn-1..f0 n -- ; Run: i -- ; F: -- ri
Create an initialised array of floating point numbers. Use in the form:

  fn-1 .. f1 f0 n FARRAY <name>

to create an array of n floating point numbers. When the array name is executed, the index i is used to return the address of the i'th (zero-based) element in the array. For example:

  4e0 3e0 2e0 1e0 0e0 5 FARRAY TEST

will set up an array of five elements. Note that the rightmost float (0e0) is element 0. Then i TEST will return the i'th element.

: FBUFF         \ u "name" -- ; i -- addr
Creates a buffer for u floats in the current memory section. The child action is to return the address of the ith element (zero-based).

  10 fbuff foo

Creates a buffer for ten float elements.

  3 foo

Returns the address of element 3 in the buffer.
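
The returned address is used with F! and F@. As a sketch, the hypothetical words below fill a ten-element buffer and then sum it. REALS is assumed to be active when the definitions are compiled.

```forth
10 fbuff samples
: fill-samples  \ -- ; hypothetical: store i * 0.5 in element i
  10 0 do  i s>f 0.5e0 f*  i samples f!  loop ;
: sum-samples   \ -- ; F: -- r ; hypothetical: add up all ten elements
  0.0e0  10 0 do  i samples f@ f+  loop ;
fill-samples  sum-samples f.    \ 0.5 * (0 + 1 + ... + 9), mathematically 22.5
```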

: fvalue        \ F: f -- ; ??? -- ???
Use in the form: <float> FVALUE <name> to create a floating point version of VALUE that will return a floating point number by default, and that can accept the operators TO, ADDR, ADD, SUB, and SIZEOF.

Type conversions

code FSIGN      \ F: fn -- |fn| ; -- flag ; true if negative
Return the absolute value of fn and a flag which is true if fn is negative.

: D>F           \ d -- ; F: -- fn
Converts a double integer to a float.

: f>d           \ F: f -- ; -- dint(f)
Converts a float to a double integer. Note that F>D truncates the number towards zero according to the ANS specification.

: S>F   \ n -- ; F: -- fn
Converts a single signed integer to a float.

: f>s           \ F: f -- ; -- int(f)
Converts a float to a single integer. Note that F>S truncates the number towards zero according to the ANS specification.

: FINT          \ F: f1 -- f2
Chop the number towards zero to produce a floating point representation of an integer.
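
The following sketch illustrates the truncating behaviour of F>S and one way to round to nearest instead, using FROUND from the Rounding section. FNEAREST>S is a hypothetical name.

```forth
-2.7e0 f>s .                    \ truncates towards zero, giving -2
7 s>f 2.0e0 f/ f.               \ integer 7 as a float, then 7/2
: fnearest>s  ( F: r -- ; -- n )  fround f>s ;   \ hypothetical: round to nearest first
2.7e0 fnearest>s .              \ rounds to 3 before converting
```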

Arithmetic

code FNEGATE    \ F: r1 -- r2
Floating point negate.

: ?FNEGATE      \ n -- ; F: fn -- fn|-fn
If n is negative, negate fn.

: FABS  \ F: fn -- |fn|
Floating point absolute.

code F+         \ F: r1 r2 -- r3
Floating point addition.

code F-         \ F: r1 r2 -- r3
Floating point subtraction; r3 := r1-r2

code F*         \ F: r1 r2 -- r3
Floating point multiply.

code F/         \ F: r1 r2 -- r3
Floating point divide; r3 := r1/r2

code 1/f        \ F: r1 -- 1/r1
Floating point reciprocal; r2 := 1/r1

code fsqrt      \ F: f1 -- f2
F2=sqrt(f1).

: FSEPARATE     \ F: f1 f2 -- f3 f4
Leave the signed integer quotient f4 and remainder f3 when f1 is divided by f2. The remainder has the same sign as the dividend.

: FFRAC         \ F: f1 f2 -- f3
Leave the fractional remainder from the division f1/f2. The remainder takes the sign of the dividend.

Relational operators

code F0<        \ F: f1 -- ; -- flag
Floating point 0<.

code F0>        \ F: f1 -- ; -- flag
Floating point 0>.

code F0=        \ F: f1 -- ; -- flag
Floating point 0=.

code F0<>       \ F: f1 -- ; -- flag
Floating point 0<>.

: F=            \ F: f1 f2 -- ; -- flag
Floating point =.

: F<            \ F: r1 r2 -- ; -- flag
Floating point <.

: F>            \ F: f1 f2 -- ; -- flag
Floating point >.

: FMAX          \ F: r1 r2 -- r1|r2
Floating point MAX.

: FMIN          \ F: r1 r2 -- r1|r2
Floating point MIN.

: f~            \ F: f1 f2 f3 -- ; -- flag
Approximation function. If f3 is positive, flag is true if abs[f1-f2] is less than f3. If f3 is zero, flag is true if the encodings of f1 and f2 are the same. If f3 is negative, flag is true if abs[f1-f2] is less than abs[f3*[abs[f1]+abs[f2]]].
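
Because rounding errors make exact equality tests unreliable, F~ is usually wrapped with a fixed tolerance. The sketch below shows an absolute and a relative comparison. FNEAR? and FRELNEAR? are hypothetical names and the tolerance of 1.0e-9 is an arbitrary choice.

```forth
: fnear?     ( F: r1 r2 -- ; -- flag )  1.0e-9 f~ ;           \ absolute: |r1-r2| < 1e-9
: frelnear?  ( F: r1 r2 -- ; -- flag )  1.0e-9 fnegate f~ ;   \ relative: scaled by |r1|+|r2|
0.1e0 0.2e0 f+  0.3e0  fnear? .          \ the sum differs from 0.3 only by rounding error
1000.0e0 1000.0000001e0 frelnear? .      \ difference 1e-7 is small relative to 2000
```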

Miscellaneous

: FALIGNED      \ addr -- f-addr
Aligns the address to accept an 8-byte float.

: FALIGN        \ --
Aligns the dictionary to accept an 8-byte float.

synonym DFALIGNED FALIGNED      \ addr -- f-addr
Aligns the address to accept an 8-byte float.

synonym DFALIGN FALIGN          \ --
Aligns the dictionary to accept an 8-byte float.

Synonym SFALIGNED ALIGNED       \ addr -- f-addr
Aligns the address to accept a 4-byte float.

Synonym SFALIGN ALIGN           \ --
Aligns the dictionary to accept a 4-byte float.

: FLOAT+        \ f-addr1 -- f-addr2
Increments addr by 8, the size of a float.

: FLOATS        \ n1 -- n2
Returns n2, the size of n1 floats.

Synonym DFLOAT+ FLOAT+  \ f-addr1 -- f-addr2
Increments addr by 8, the size of a D-float.

Synonym DFLOATS FLOATS  \ n1 -- n2
Returns n2, the size of n1 D-floats.

Synonym SFLOAT+ 4+      \ f-addr1 -- f-addr2
Increments addr by 4, the size of an S-float.

Synonym SFLOATS 4*      \ n1 -- n2
Returns n2, the size of n1 S-floats.
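
These words are typically used to size and index raw memory that holds floats. A minimal sketch, in which READINGS and READING are hypothetical names:

```forth
create readings  4 floats allot                 \ room for four 64 bit floats
: reading  ( i -- addr )  floats readings + ;   \ hypothetical: address of element i
1.5e0 0 reading f!
2.5e0 1 reading f!
0 reading f@  1 reading f@  f+ f.               \ 1.5 + 2.5
readings float+  1 reading = .                  \ FLOAT+ steps to the next element
```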

Powers of ten operations

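Floating point IEEE numbers have the following approximate ranges (magnitudes of normalised values):

  32 bit (single)   1.2e-38 to 3.4e38
  64 bit (double)   2.2e-308 to 1.8e308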

As a result, the input code is different for 32 bit and 64 bit floats.

$0000:0000:0000:0000 >fs fconstant F%0
Floating point 0.0.

$0AC8:0628:64AC:6F43 >fs fconstant F%10^-256
Floating point 1.0e-256.

$3949:F623:D5A8:A733 >fs fconstant F%10^-32
Floating point 1.0e-32.

$3C9C:D2B2:97D8:89BC >fs fconstant F%10^-16
Floating point 1.0e-16.

$3FB9:9999:9999:999A >fs fconstant F%.1
Floating point 0.1.

$3FF0:0000:0000:0000 >fs fconstant F%1
Floating point 1.0.

$4000:0000:0000:0000 >fs fconstant F%2
Floating point 2.0.

$4024:0000:0000:0000 >fs fconstant F%10
Floating point 10.0.

$4341:C379:37E0:8000 >fs fconstant F%10^16
Floating point 1.0e16.

$4693:B8B5:B505:6E17 >fs fconstant F%10^32
Floating point 1.0e32.

$7515:4FDD:7F73:BF3C >fs fconstant F%10^256
Floating point 1.0e256.

16 FARRAY POWERS-OF-10E1
An array of 16 powers of ten starting at 10^0 in steps of 1.

17 FARRAY POWERS-OF-10E16
An array of 17 powers of ten starting at 10^0 in steps of 16.

16 FARRAY POWERS-OF-10E-1
An array of 16 powers of ten starting at 10^0 in steps of -1.

17 FARRAY POWERS-OF-10E-16
An array of 17 powers of ten starting at 10^0 in steps of -16.

: RAISE_POWER   \ exp(10) -- ; F: f -- f'
Raise the power in preparation for number formatting.

: SINK_FRACTION \ exp(10) -- ; F: f -- f'
Reduce the power in preparation for number formatting.

: *10^X         \  exp(10) -- ; F: f -- f'
Generate float' = float *10^dec_exp.

: f2/           \ F: f1 -- f2
Divide by 2.0; f2=f1/2.0.

Floating point input

Note that number conversion takes place in PAD.

: CONVERT-EXP   \ c-addr --
If the character at c-addr is 'D' convert it to 'E'.

: CONVERT-FPCHAR        \ c-addr --
Convert the f.p. char '.' to the double char ',' for conversion.

: ALL-BLANKS?   \ c-addr len -- flag
Return true if string is all blanks (spaces). A null string (len=0) returns false.

: FCHECK        \ -- am lm ae le e-flag .-flag
Check the input string at PAD, returning the separated mantissa and exponent flags. The e-flag is returned true if the string contained an exponent indicator 'E' and the .-flag is returned true if a '.' was found.

: doMNUM        \ c-addr u -- d 2 | 0
Convert the mantissa string to a double number and 2. If conversion fails, just return 0.

: doENUM        \ c-addr u -- n 1 | 0 ; str as above
Convert the exponent string to a single number and 1. If conversion fails, just return 0.

: FIXEXP     \ dmant exp(10) -- ; F: -- f
Convert a double integer mantissa and a single integer exponent into a floating point number.

: isFnumber?    \ caddr len -- 0 | n 1 | d 2 | -1 ; F: -- [f]
Behaves like the integer version of isNumber? except that if the number is in F.P. format and BASE is decimal, a floating point conversion is attempted. If conversion is successful, the floating point number is left on the float stack and the result code is 2. This word only accepts text with an 'E' as a floating point indicator, e.g. 1.2345e0. If BASE is not decimal all numbers are treated as integers. The integer prefixes '#','$','0x' etc. are recognised and cause integer conversion to be used.

: FNUMBER?      \ addr -- 0 | n 1 | d 2 | -1 ; F: -- [f]
Behaves like the integer version of Number? except that if the number is in F.P. format and BASE is decimal, a floating point conversion is attempted. See isFnumber? above for more details.

: isFnumber?    \ caddr len -- 0 | n 1 | d 2 | -2 ; F: -- r
Behaves like the integer version of isNumber? except that if integer conversion fails, and BASE is decimal, a floating point conversion is attempted. If conversion is successful, the floating point number is left on the float stack and the result code is -2.

: Fnumber?      \ caddr -- 0 | n 1 | d 2 | -2 ; F: -- r
As isFnumber? above, but takes a counted string.

: >FLOAT        \ c-addr u -- true|false ; F: -- [f]
Try to convert the string at c-addr/u to a floating point number. If conversion is successful, flag is returned true, and a floating number is returned on the float stack, otherwise just flag=0 is returned. This word accepts several forms, e.g. 1.2345e0, 1.2345, 12345 and converts them to a float. Note that double numbers (containing a ',') cannot be converted. Number conversion is decimal only, regardless of the current BASE.
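
A sketch of using >FLOAT to convert text at run time, for example text typed by a user. SHOW-FLOAT is a hypothetical name.

```forth
: show-float  \ c-addr u -- ; hypothetical: display the float or report failure
  >float if  f.  else  ." not a float"  then ;
s" 1.2345" show-float       \ accepted without an exponent
s" 0.12345e1" show-float    \ accepted with an exponent
s" 1,5" show-float          \ contains ',' so conversion should fail
```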

defer FLITERAL  \ Comp: F: r -- ; Run: F: -- r
Compiles a float as a literal into the current definition. At execution time, a float is returned. For example, [ %PI F2* ] FLITERAL will compile 2PI as a floating point literal. Note that FLITERAL is immediate.
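
As a further sketch using only float literals, the hypothetical word TWO-PI below computes a product once at compile time and compiles the result as a literal.

```forth
: two-pi  ( F: -- r )                     \ hypothetical name
  [ 3.14159265e0 2.0e0 f* ] fliteral ;    \ product is calculated while compiling
two-pi f.
```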

: (F#)          \ addr -- -1|0 ; F: -- [f]
The primitive for F# below.

: F#            \ F: -- [f] ; or compiles it (state smart)
If interpreting, takes text from the input stream and, if possible converts it to a f.p. number on the stack. Numbers in integer format will be converted to floating-point. If compiling, the converted number is compiled.

Floating point output

8 value precision       \ -- u
Number of significant digits output.

: set-precision         \ u --
Set the number of significant digits used for output.

: exp(10)       \ F: f -- f ; -- exp[10]
Generate the power of ten corresponding to the float's power of two.

64 +user fopbuff        \ -- addr
Buffer in which output string is built.

32 +user frepbuff       \ -- addr
Buffer for use as the output of REPRESENT.

: roundfp       \ F: +f -- +f'
Add 0.5e(exp-precision-1).

: REPRESENT     \ F: r -- ; c-addr u -- n flag1 flag2
Assume that the floating number is of the form +/-0.xxxxEyy. Place the significand xxxxx at c-addr with a maximum of u digits. Return n the signed integer version of yy. Return flag1 true if f is negative, and return flag2 true if the results are valid. In this implementation all errors are handled by exceptions, and so flag2 is always true.

: (.sign)       \ flag $out --
Add '-' or nothing to the output string.

: (.mant)       \ binp $out n --
Add the mantissa string at binp, produced by REPRESENT, to a counted string at $out with n digits before the decimal point.

: (.exp)        \ exp(10) $out --
Add the exponent to the output string.

: (.initfop)    \ f -- ; -- exp(10)
Initialise output conversion.

: (fs.)         \ F: f -- ; -- caddr len
Produce a string containing the number in scientific notation.

: (fe.)         \ F: f -- ; -- caddr len
Produce a string containing the number in engineering notation.

: ff?           \ f: f -- f ; -- flag
Return true if the number can be represented in free format.

: (ff.)         \ F: f -- ; -- caddr len
Produce a string containing the number in free notation. If the number cannot be displayed in free notation, scientific notation is used.

: fs.           \ F: f --
Display f in scientific notation:

  x.xxxxxE[-]yy

: fe.           \ F: f --
Display f in engineering notation:

  x.xxxxxE[-]yy

where the mantissa is 1 <= mantissa < 1000 and the exponent is a multiple of three.

: ff.           \ F: f --
Display f in free notation:

  xxx.xxxxx

: F.            \ F: f --
Print the f.p. number in free format, xxxx.yyyy, if possible. Otherwise display using the x.xxxxEyy format.
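
To compare the notations, the sketch below prints one value with each display word at a reduced precision and then restores the previous setting. SHOW3 is a hypothetical name, and the exact digits shown depend on the implementation.

```forth
: show3  ( F: r -- )           \ hypothetical: same value in three notations
  fdup ff. cr  fdup fs. cr  fe. cr ;
precision                      \ leave the current setting on the data stack
4 set-precision                \ four significant digits
12345.678e0 show3
set-precision                  \ restore the saved setting
```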

Rounding

Rounding modes are specified in the range 0..3 and are converted when used.

code rmode>     \ -- oldmode
Get the current rounding mode.

code >rmode     \ newmode --
Set the current rounding mode.

code >rmode>    \ newmode -- oldmode
Set the current rounding mode and get the previous one.

code (fround)   \ F: f1 -- f1'
Round the number to an integer value according to the current rounding mode.

: fround        \ F: f1 -- f1'
Round to nearest.

: floor         \ F: f1 -- f1'
Round to -infinity.

: ceil          \ F: f1 -- f1'
Round towards +infinity.

: roundup       \ F: f1 -- f1'
Round towards +infinity.

: ftrunc        \ F: f1 -- f1'
Round the number on the FP stack towards zero.

: rounded       \ -- ; set SSE to round to nearest
Set SSE to round to nearest for all operations other than FINT, FLOOR and CEIL.

: floored       \ -- ; set SSE to floor
Set SSE to round to floor for all operations other than FROUND, FINT, FTRUNC and ROUNDUP.

: roundedup     \ -- ; set SSE to round up
Set SSE to round up for all operations other than FROUND, FINT and FLOOR.

: truncated     \ -- ; set SSE to chop to 0
Set SSE to chop to 0 for all operations other than FROUND, FLOOR and ROUNDUP.
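
The four explicit rounding words can be compared on a single value. In this sketch SHOW-ROUNDING is a hypothetical name.

```forth
: show-rounding  ( F: r -- )   \ hypothetical: one value, four rounding words
  fdup fround f.               \ to nearest
  fdup floor  f.               \ towards -infinity
  fdup ceil   f.               \ towards +infinity
  ftrunc f. ;                  \ towards zero
-2.7e0 show-rounding           \ mathematically -3, -3, -2 and -2
```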

Trigonometric functions

N.B. All angles are in radians.

: DEG>RAD       \ F: n1 -- n2
Convert degrees to radians.

: RAD>DEG       \ F: n1 -- n2
Convert radians to degrees.

: FSIN          \ F: f1 -- f2
f2=sin(f1).

: FCOS          \ F: f1 -- f2
f2=cos(f1).

: FTAN          \ F: f1 -- f2
f2=tan(f1).

: FASIN         \ F: f1 -- f2
f2=arcsin(f1).

: FACOS         \ F: f1 -- f2
f2=arccos(f1).

: FATAN         \ F: f1 -- f2
f2=arctan(f1).

Logarithms and Powers

: FLN           \ F: f1 -- f2
Take the logarithm of f1 to base e and return the result.

: FLOG          \ F: f1 -- f2
Take the logarithm of f1 to base 10 and return the result.

: FEXP          \ F: f1 -- f2
f2=e^f1.

Synonym FE^X FEXP       \ F: f1 -- f2
Compatibility word.

: fexpm1        \ r1 -- r2
Raise e to the power r1 and subtract one, giving r2.

: F10^X         \ F: f1 -- f2
f2=10^f1

: FX^N          \ n -- ; F: fx -- fx^n
fx^n=x^n where x is a float and n is an integer.

: F**   \ F: fx fy -- fx^fy
fn=X^Y where X and Y are both floats. If fx<=0e0, 0e0 is returned. This behaviour is required by the Forth Scientific Library. If fy=0e0, 1e0 is returned.

Synonym FX^Y F**        \ --
Compatibility word for old code.
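
A sketch showing the integer and float power words together. COMPOUND is a hypothetical word that applies a rate over n periods using FX^N, which takes its exponent from the data stack.

```forth
: compound  ( n -- ; F: principal rate -- amount )   \ hypothetical helper
  1.0e0 f+  fx^n  f* ;               \ principal * (1 + rate)^n
100.0e0 0.05e0 10 compound f.        \ 100 * 1.05^10
2.0e0 0.5e0 f** f.                   \ 2^0.5, the square root of 2
```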

COSEC SEC COTAN and hyperbolics

: fcosec        \ F: f -- cosec(f)
Floating point cosecant.

: fsec          \ F: f -- sec(f)
Floating point secant.

: fcotan        \ f: f -- cot(f)
Floating point cotangent.

: fsinh         \ F: f -- sinh(f) ; (e^x - 1/e^x)/2
Floating point hyperbolic sine.

: fcosh         \ F: f -- cosh(f) ; (e^x + 1/e^x)/2
Floating point hyperbolic cosine.

: ftanh         \ F: f -- tanh(f) ; (e^x - 1/e^x)/(e^x + 1/e^x)
Floating point hyperbolic tangent.

: fasinh        \ F: f -- asinh(f) ; ln(f+sqrt(1+f*f))
Floating point hyperbolic arcsine.

: facosh        \ F: f -- acosh(f) ; ln(f+sqrt(f*f-1))
Floating point hyperbolic arccosine.

: fatanh        \ F: f -- atanh(f) ; ln((1+f)/(1-f))/2
Floating point hyperbolic arctangent.

Debugging tools

defer f.s       \ F: f --
Non-destructive display of the floating point stack.

: (f.s)         \ F: f --
Non-destructive display of the floating point stack. Default action of F.S.

$22 value ignSSEmask    \ --
When the prompt checks for an error, it ignores the bits in the MXCSR register that are set in ignSSEmask. By default these are the Precision Flag, which is set when a floating point result is inexact, and the Denormal Flag.

: .FSysPrompt   \ --
Replacement system prompt that adds floating point stack depth display. Used in the form:

  ' .FSysPrompt is .prompt

Plugging floats into the system

: (rliteral)    \ F: f -- ; F: -- f
Compiles a float as a literal into the current definition. At execution time, a float is returned. For example, [ %PI F2* ] FLITERAL will compile 2PI as a floating point literal. The default action of FLITERAL. Note that FLITERAL is immediate, whereas (RLITERAL) is not.

' noop  ' (rliteral)  ' (rliteral)  RecType: r:SSE64    \ -- struct
Contains the three recogniser actions for floating point literals.

: rec-SSEfloats \ caddr u -- r:SSE64 | r:fail ; F: -- [f]
The parser part of the floating point recogniser.

: reals         \ -- ; turn FP system on
Switch the system to permit floating point number input.

: integers      \ -- ; turn FP system off
Switch the system not to recognise floating point input.

Installation code

The value FPSYSTEM defines which floating point pack is installed and active. See the Floating Point chapters for further details. Each floating point pack defines its own type as follows:

When FPSystem changes, the following files that use FPSystem are affected:

  Extern*.fth  kernel64.fth  Tokeniser.fth
  Lib/x64/Ndpx64.fth  Lib/x64/FPSSE64.fth

At present, only 0, 1, 2 and 4 are valid values of FPSystem in x64 systems.

: SSE64setup    \ --
Set up the Forth system for 64 bit SSE floats. Performed at start up.

Gotchas

The ANS and Forth-2012 specifications define the format of floating point numbers during text interpretation as:


Convertible string := <significand><exponent>

<significand> := [<sign>]<digits>[.<digits0>]
<exponent>    := E[<sign>]<digits0>
<sign>        := { + | - }
<digits>      := <digit><digits0>
<digits0>     := <digit>*
<digit>       := { 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 }

The format above is handled by the word FNUMBER?. The word >FLOAT accepts the more relaxed format below.


Convertible string := <significand>[<exponent>]

<significand> := [<sign>]{<digits>[.<digits0>] | .<digits> }
<exponent>    := <marker><digits0>
<marker>      := {<e-form> | <sign-form>}
<e-form>      := <e-char>[<sign-form>]
<sign-form>   := { + | - }
<e-char>      := { D | d | E | e }

The restriction imposed by the first, stricter format makes it difficult to use the text interpreter during program execution, because it requires floating point numbers to contain a 'D' or 'E' indicator, which is not common practice. A quick kluge to fix this is to change isFnumber? as below.


Replace:
  fcheck drop if                       \ valid f.p. number?
with:
  fcheck or if                         \ valid f.p. number?

Note that this change can/will cause problems if number base is not DECIMAL.