VFX Forth has a built-in assembler. It lets you write time-critical definitions, and do things that are awkward in high-level Forth, such as interrupt service routines. The assembler supports the Intel/AMD64 instruction set and the stack-based (x87) floating point coprocessor. The supported instructions are mainly for the benefit of the code generator. In normal use, the assembler is very rarely needed.
Definitions written in assembler may use all the variables, constants, etc. used by the Forth system, and may be called from the keyboard or from other words just like any Forth high-level word. When writing a code definition it is important to remember which machine registers are used by the Forth system itself. These registers are documented later in this chapter. All other registers may be used freely. The reserved registers may also be used, but their contents must be saved before use and restored afterwards.
The assembler mnemonics used in the Forth assembler are just the same as those documented in the Intel literature. The operand order is also the same. The only difference is the need for a space between each portion of the instruction. This is a requirement of the Forth interpreter.
The assembler has certain defaults. These cover the order of the operands, the default addressing modes and the segment size. These are described later in this chapter.
The assembler source code is in Kernel/x64Com/hasmx64.fth. Do not treat it as an example of good Forth style. The original 8086 assembler was written in the early 1980s, and has survived several upgrades to come to its present form.
Normally the assembler will be used to create new Forth words written in assembler. Such words use CODE and END-CODE in place of : and ; or CREATE and ;CODE in place of CREATE and DOES> .
The word CODE creates a new dictionary header and enables the assembler.
As an example, study the definition of 0< in assembly language. The word 0< takes one operand from the stack and returns a true value, -1, if the operand was less than zero, or a false value, 0, if the operand was greater than or equal to zero.
CODE 0<             \ n -- t/f ; define the word 0<
  OR    RBX, RBX    \ use OR to set flags
  L, IF,            \ less than zero ? y:
    MOV RBX, # -1   \ -1 is true flag
  ELSE,             \ n:
    XOR RBX, RBX    \ dirty set to 0
  ENDIF,
  NEXT,             \ return to Forth
END-CODE
Notice how the word NEXT, is used. NEXT, is a macro that assembles a return to the Forth inner interpreter. All code words must end with a return to the inner interpreter. The example also demonstrates the use of structuring words within the assembler. These words are pre-defined macros which implement the necessary branching instructions. The next example shows the same word, but implemented using local labels instead of assembler structures for the control structures.
CODE 0<             \ n -- t/f ; define the word 0<
  OR    RBX, RBX    \ use OR to set flags
  JGE   L$1         \ skip if RBX>=0
  MOV   RBX, # -1   \ -1 is true flag
  JMP   L$2         \ this part done
L$1:                \ do following otherwise
  SUB   RBX, RBX    \ dirty set to 0
L$2:
  NEXT,             \ return to Forth
END-CODE
There are several useful words provided within VFX Forth to control the use of the assembler.
;code \ --
Used in the form:
: <namex> CREATE .... ;CODE ... END-CODE
Stops compilation, and enables the assembler. This word is used with CREATE to produce defining words whose run-time portion is written in code, in the same way that CREATE ... DOES> is used to create high level defining words.
The data structure is defined between CREATE and ;CODE and the run-time action is defined between ;CODE and END-CODE . The current value of the data stack pointer is saved by ;CODE for later use by END-CODE for error checking.
When <namex> executes the address of the data area will be the top item of the CPU call stack. You can get the address of the data area by POPing it into a register.
A definition of VARIABLE might be as follows:
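: VARIABLE
  CREATE 0 ,
;CODE
  sub   rbp, # cell     \ make room on the data stack (cells are 8 bytes)
  mov   0 [rbp], rbx    \ save old TOS
  pop   rbx             \ new TOS is the address of the data area
  next,
END-CODE
VARIABLE TEST-VAR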
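The same ;CODE pattern can give a CONSTANT-like defining word. The following is a minimal sketch, not taken from the VFX sources. The names MY-CONSTANT and ANSWER are hypothetical, and the code assumes the register usage documented in this chapter (RBX caches the top of the data stack, RBP points to the second item) and 8-byte cells.

```forth
: MY-CONSTANT   \ x "name" -- ; child: -- x   (hypothetical name)
  CREATE ,                \ data area holds the value x
;CODE
  sub   rbp, # cell       \ make room on the data stack
  mov   0 [rbp], rbx      \ save old TOS
  pop   rax               \ RAX = address of the data area
  mov   rbx, 0 [rax]      \ new TOS = stored value
  next,
END-CODE

42 MY-CONSTANT ANSWER     \ ANSWER is intended to push the stored value
```

The only difference from the VARIABLE example is the extra fetch through RAX, a scratch register that need not be preserved.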
CODE \ --
A defining word used in the form:
CODE <name> ... END-CODE
Creates a dictionary entry for <name> to be defined by a following sequence of assembly language words. Words defined in this way are called code definitions. At compile-time, CODE saves the data stack pointer for later error checking by END-CODE .
END-CODE \ --
Terminates a code definition and checks the data stack pointer against the value stored when ;CODE or CODE was executed. The assembler is disabled. See: CODE and ;CODE .
LBL: \ --
A defining word that creates an assembler routine that can be called from other code routines as a subroutine. Use in the form:
LBL: <name>
...code...
END-CODE
When <name> executes it returns the address of the first byte of executable code. Later on another code definition can call <name> or jump to it.
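As an illustration of LBL: in use, here is a small sketch of a subroutine and a code word that calls it. The names DOUBLE-RAX and QUADRUPLE are hypothetical. The sketch assumes the subroutine returns with a plain RET and that RAX is free scratch, as described in the register usage information in this chapter.

```forth
\ Hypothetical subroutine: doubles RAX. It is not a Forth word,
\ so it ends with RET rather than NEXT, .
LBL: DOUBLE-RAX
  add   rax, rax
  ret
END-CODE

CODE QUADRUPLE          \ n -- 4n ; hypothetical word
  mov   rax, rbx        \ copy TOS to a scratch register
  call  DOUBLE-RAX      \ RAX = 2n
  call  DOUBLE-RAX      \ RAX = 4n
  mov   rbx, rax        \ result back to TOS
  next,
END-CODE
```

Because DOUBLE-RAX only touches RAX, none of the Forth VM registers need saving around the calls.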
The Forth virtual machine is held within the processor register set. Register usage is as follows:
Forth VM registers
  RBP       Data stack pointer - points to NOS
  RSP       Return stack pointer
  R13       Float stack pointer
  RIP       Instruction pointer
  RSI       User area pointer
  RDI       Local variable pointer
Simulated stack and scratch
  RBX       Cached top of data stack
  RAX
  RCX
  RDX
  R8..R12
Special purpose registers
  R14       Index for DO/LOOP   non-volatile across ABIs
  R15       Index~Limit         non-volatile across ABIs
  XMM8      FTOS                volatile Linux, non-vol Windows
  XMM9      FTEMP               temp float
All unused registers may be freely used by assembler routines, but they may be altered by the operating system or wrapper calls. Before calling the operating system, all of the Forth registers should be preserved. Before using a register that the Forth system uses, it should be preserved and then restored on exit from the assembler routine. Be aware, in particular, that callbacks will generally modify the RAX register since this is used to hold the value returned from them.
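The register model above can be sketched with two small code words. These are illustrations only, with hypothetical names MY-DUP and MY+ . They assume 8-byte cells, RBX as the cached top of stack and RBP pointing to the second stack item, as listed in the register usage information.

```forth
CODE MY-DUP             \ x -- x x ; hypothetical word
  sub   rbp, # cell     \ make room for one cell below NOS
  mov   0 [rbp], rbx    \ copy TOS into the new NOS slot
  next,                 \ RBX still holds x as TOS
END-CODE

CODE MY+                \ n1 n2 -- n3 ; hypothetical word
  add   rbx, 0 [rbp]    \ TOS = n2 + n1
  add   rbp, # cell     \ discard NOS
  next,
END-CODE
```

Pushing moves RBP down by one cell and spills RBX to memory; popping reads from 0 [RBP] and moves RBP up. Neither word touches a reserved register other than RBP and RBX, which is what the Forth system expects.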
The assembler is designed to be very closely compatible with MASM and other assemblers. To this end the assembler assembles code written in the conventional prefix notation. However, because code may be converted from other MPE Forth systems, the postfix notation is also supported. The default mode is prefix. The directives to switch mode are as follows:
PREFIX
POSTFIX
These switch the assembler from then onwards into the new mode. The directives should be used outside a code definition, not within one. Their use within a code definition will lead to unpredictable results. MPE always uses the assembler in PREFIX mode.
The assembler syntax follows very closely that of other AMD64 assemblers. The major difference is that the VFX Forth assembler needs white space around everything. For example, where in MASM one might write:
MOV RAX,10[RBX]
we must write:
MOV RAX, 10 [RBX]
This distinction must be borne in mind when reading the following addressing mode information.
Many instructions have a register to register form. Both operands are registers. Such an instruction is of the form:
MOV RAX , RBX
This moves the contents of RBX into RAX. For compatibility with older MPE assemblers the first operand may be merged with the comma thus:
MOV RAX, RBX
This use of a register name with a 'built-in' comma also applies to other addressing modes.
The assembler is set for immediate-as-default. Immediate data can also be defined explicitly (recommended). This is done by the use of a hash (#) character:
MOV RAX, # 23
This example places the number 23 in RAX.
The rules for instruction format and range of literals that can be assigned to registers are arcane. The assembler does its best to generate the shortest opcode.
Direct addresses have to be specified explicitly, using the PTR or [] directives:
variable foobar
...
MOV RAX, PTR 23
MOV RAX , [] 23
mov rcx, [] foobar
The first two instructions both place the contents of address 23 in RAX. The third loads RCX with the contents of the variable foobar.
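Direct addressing of a Forth variable can be sketched as a complete code word. The names COUNTER and COUNTER@ are hypothetical, and the example uses only the [] form of direct addressing described here, together with the documented register usage (RBX is TOS, RBP points to NOS).

```forth
VARIABLE COUNTER        \ hypothetical variable

CODE COUNTER@           \ -- x ; hypothetical word: fetch COUNTER
  sub   rbp, # cell     \ push: make room on the data stack
  mov   0 [rbp], rbx    \ save old TOS
  mov   rbx, [] COUNTER \ TOS = contents of COUNTER (direct address)
  next,
END-CODE
```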
Intel define an addressing mode using a base and a displacement. In this mode, the effective address is calculated by adding the displacement to the contents of the base register. An example:
MOV RBX , # foobar
MOV RAX , 10 [RBX]
In this example, RAX is filled with the contents of address foobar+10.
The assembler lays down different modes for displacements of 8-bit or 32-bit size, but this is internal to the assembler. The following registers may be used as base registers with a displacement:
[RAX] [RCX] [RDX] [RBX] [RBP] [RSI] [RDI]
[R8] [R9] [R10] [R11] [R12] [R13] [R14] [R15]
If the displacement is zero then the assembler internally defines the mode as Base only. However, the displacement of zero must be supplied to the assembler:
MOV RBX , # 0100
MOV RAX , 0 [RBX]
This places in RAX the contents of address 100 (pointed to by RBX).
The following registers may be used as a base with no displacement:
[RAX] [RCX] [RDX] [RBX] [RSI] [RDI]
[R8] [R9] [R10] [R11] [R12] [R13] [R14] [R15]
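Base addressing, with and without a displacement, can be sketched with two fetch words. Both names are hypothetical, and the second assumes a data structure whose second cell lies 8 bytes past its start.

```forth
CODE MY-@               \ addr -- x ; hypothetical word
  mov   rbx, 0 [rbx]    \ base only: the zero displacement must be given
  next,
END-CODE

CODE 2ND-CELL@          \ addr -- x ; hypothetical: fetch cell at addr+8
  mov   rbx, 8 [rbx]    \ base plus displacement
  next,
END-CODE
```

In both cases RBX is used as the base register and is then overwritten with the fetched value, so no other register is needed.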
The 80386 also allows two registers to be used to indirectly address memory. These are known as the base and the index. Such instructions are of the form:
MOV RAX , # 100
MOV RBX , # 200
MOV RDX , 10 [RAX] [RBX]
This will place in RDX the contents of address 100+200+10, or address 310. RAX is the base and RBX is the index. Again, the displacement may be 8-bits, 32-bits or have a value of zero. The assembler distinguishes between these three cases. The base and index registers may be any of the following:
[RAX] [RBX] [RCX] [RDX] [RSI] [RDI]
[R8] [R9] [R10] [R11] [R14] [R15]
In addition, [RBP] may be used as the index register, and [RSP] may be used as the base register.
The 80386 further supports an addressing mode where the index register is automatically scaled by a fixed amount - either 2, 4 or 8. This is designed for indexing into two-dimensional arrays of elements of size greater than byte-size. One register may be used as the first index, another for the second index, and the word size becomes implicit in the instruction. The form of this addressing mode is very similar to that outlined above, with the exception that the index operand includes the number which is the scale:
MOV RBX , # 100
MOV RCX , # 2
MOV RAX , 10 [RBX] [RCX*4]
This loads RAX with the contents of address 100+(2*4)+10, or address 118. The list of registers which may be used as the base is the same as above. The list of scaled indexes is as follows:
[RAX*2] [RCX*2] [RDX*2] [RBX*2] [RBP*2] [RSI*2] [RDI*2]
[RAX*4] [RCX*4] [RDX*4] [RBX*4] [RBP*4] [RSI*4] [RDI*4]
[RAX*8] [RCX*8] [RDX*8] [RBX*8] [RBP*8] [RSI*8] [RDI*8]
[R8*2] [R9*2] [R10*2] [R11*2] [R13*2] [R14*2] [R15*2]
[R8*4] [R9*4] [R10*4] [R11*4] [R13*4] [R14*4] [R15*4]
[R8*8] [R9*8] [R10*8] [R11*8] [R13*8] [R14*8] [R15*8]
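A typical use of a scaled index is fetching the nth cell of an array. The following sketch uses the hypothetical name CELL-FETCH and assumes 8-byte cells, so the index is scaled by 8.

```forth
CODE CELL-FETCH             \ addr index -- x ; hypothetical word
  mov   rax, 0 [rbp]        \ RAX = addr (second stack item)
  add   rbp, # cell         \ discard NOS
  mov   rbx, 0 [rax] [rbx*8] \ TOS = contents of addr + index*8
  next,
END-CODE
```

RAX is the base and RBX the scaled index. As with the base-only mode, the zero displacement is written explicitly.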
Some instructions may be prefixed with a segment override. These force data addresses to refer to a segment other than the data segment. The override must precede the instruction to which it relates:
MOV RBX , # 100
FS: MOV RAX , 10 [RBX]
This will set RAX to the value contained at offset 110 in the FS segment. The list of segment overrides is:
FS: GS:
The assembler uses default data sizes. If the data is of a different size, a data size override must be used. The following size specifiers define the size of the data:
BYTE or B.
WORD or W.
DWORD or D.
QWORD
TBYTE
FLOAT
DOUBLE
EXTENDED
It is only necessary to specify size when ambiguity would otherwise arise. For example:
MOV 0 [RDX], # 10 \ can't tell
MOV 0 [RDX], RAX \ RAX specifies 64 bit
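The ambiguous case can be resolved by putting a size specifier in front of the memory operand. The sketch below uses the hypothetical name CLEAR-BYTE and assumes the specifier is written before the memory operand, in the same position as in the register-load examples in this chapter.

```forth
CODE CLEAR-BYTE             \ addr -- ; hypothetical: store 0 in byte at addr
  mov   byte 0 [rbx], # 0   \ BYTE needed: an immediate has no inherent size
  mov   rbx, 0 [rbp]        \ pop: new TOS from NOS
  add   rbp, # cell
  next,
END-CODE
```

Had the source operand been a register such as AL, the register itself would have fixed the size and no specifier would be required.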
The BYTE size defines that a byte operation is required:
MOVZX RAX , BYTE 10 [RBX]
The abbreviation B. may also be used in place of BYTE to define a byte operation. The WORD specifier defines that 16-bits are required:
MOV AX , WORD 10 [RBX]
The abbreviation W. may also be used to define a word operation. DWORD is the default for a USE32 segment, and indicates that 32-bit data is to be used:
MOV EAX , DWORD 10 [RBX]
FSTP DWORD 10 [RBX]
The abbreviation D. may also be used to specify a DWORD operation. The remaining size specifiers define data sizes for the floating point unit.
QWORD defines a 64-bit operation:
FSTP QWORD 10 [RBX]
TBYTE defines a 10-byte (80-bit) operation, such as:
FSTP TBYTE 10 [RBX]
FLOAT , DOUBLE and EXTENDED are synonyms for DWORD , QWORD and TBYTE respectively.
The segment type defines the default data size and address size for the code in the segment. If needed, it is possible to force the data size or the address size laid down to be the other. There is a set of data and address size overrides which work for one instruction only. These are:
D16:
D16: MOV RAX , # 23
In a USE32 or USE64 segment, this would lay down 16-bit data to be loaded into AX. D16: is almost never needed for 64 bit programming.
The default for a JMP or a CALL is within the current code segment, whilst the default for a conditional branch is a short branch with a -128..+127 byte range. The directives supporting short/long and near/far are:
SHORT LONG
These would be used as follows:
2 CONSTANT THAT \ the segment number
LBL: THIS \ the address
CALL THIS
JMP THIS
JCC THIS
JCC SHORT THIS
JCC LONG THIS
For compatibility with older MPE assemblers the mnemonics CALL/F , RET/F and JMP/F are also provided.
The assembler in VFX Forth follows both the syntax and the mnemonics defined in the Intel Programmers Reference books. However, there are certain exceptions. These are listed below.
The zero operand forms of certain stack register instructions for the 80387 have been omitted. Their functionality is supported however. Such instructions are listed below, with a form of the syntax which will support the function:
FADD FADDP ST(1) , ST
FCOM FCOM ST(1)
FCOMP FCOMP ST(1)
FDIV FDIVP ST(1) , ST
FDIVR FDIVRP ST(1) , ST
FMUL FMULP ST(1) , ST
FSUB FSUBP ST(1) , ST
FSUBR FSUBRP ST(1) , ST
Certain 80386 instructions have either one operand or two operands, of which only one is variable. These instructions are:
MUL DIV IDIV NEG NOT
These instructions take only one operand in the VFX Forth assembler.
If you need to use labels within a code definition, you may use the local labels provided. These are used just like labels in a normal assembler, but some restrictions are applied.
Ten labels are pre-defined, and their names are fixed. Additional labels can be defined up to a maximum of 32. There is a limit of 128 forward references.
A reference to a label is valid until the next occurrence of LBL: , CODE or ;CODE , whereupon all the labels are reset.
A reference to a label in a definition must be satisfied in that definition. You cannot define a label in one code definition and refer to it from another.
The local labels have the names L$1 L$2 ... L$10 and these names should be used when referring to them e.g.
JNE L$5
A local label is defined by words of the same names, but with a colon as a suffix:
L$1: L$2: ... L$10:
Additional labels (up to a maximum of 32 altogether) may be referred to by:
n L$
where n is in the range 11..32 (decimal), and they may be defined by:
n L$:
where n is again in the range 11..32 (decimal).
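A forward reference to a pre-defined local label can be sketched as follows. The name MY-MAX is hypothetical, and the code assumes the documented register usage (RBX is TOS, RBP points to NOS).

```forth
CODE MY-MAX             \ n1 n2 -- n3 ; hypothetical word: signed maximum
  mov   rax, 0 [rbp]    \ RAX = n1
  add   rbp, # cell     \ discard NOS (done before CMP, as ADD changes flags)
  cmp   rax, rbx        \ compare n1 with n2
  jle   L$1             \ n1 <= n2: TOS already holds the maximum
  mov   rbx, rax        \ otherwise n1 is the maximum
L$1:
  next,
END-CODE
```

The label L$1 is referenced before it is defined, and both the reference and the definition lie inside the same code definition, as the rules above require.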
This assembler is designed to cope with CPUs from 80386 upwards. Some instructions are only available on later CPUs. Note that CPU selection affects the assembler and the VFX code generator, not the run time of your application. If you select a higher CPU level than the application runs on, incorrect operation will occur.
CPU=x64 \ -- ; select base AMD64 instruction set
The best place to look for code examples is in the source code. The file Kernel/x64Com/reqdcode.fth contains the code definitions required by the VFX64 kernel.
code Nrev       \ XN..X1 count -- x1..XN
\ *G Reverse the order of the top N data stack items.
  cmp   rbx, # 1          \ ignore count <= 1
  g, if,
    dec   rbx             \ item n at offset n-1
    mov   rcx, rbp        \ data stack pointer
    shl   rbx, # 3        \ convert cells to bytes
    add   rbx, rcx        \ RBX points to XN, RCX to X1
    begin,
      mov   rdx, 0 [rbx]  \ perform exchange
      mov   rax, 0 [rcx]
      mov   0 [rcx], rdx
      add   rcx, # cell
      mov   0 [rbx], rax
      sub   rbx, # cell
      cmp   rcx, rbx
    a, until,             \ stop once the pointers have crossed
  endif,
  mov   rbx, 0 [rbp]      \ drop count: new TOS from NOS
  add   rbp, # cell
  next,
end-code
The assembler includes structure words modelled after the usual Forth structures.
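A one-armed IF, ... ENDIF, structure can be sketched as follows. The name MY-ABS is hypothetical, and only structure words and condition codes that already appear in the examples in this chapter are used.

```forth
CODE MY-ABS             \ n -- u ; hypothetical word: absolute value
  or    rbx, rbx        \ set flags from TOS
  l, if,                \ negative ?
    neg   rbx           \ y: negate in place
  endif,
  next,
END-CODE
```

The condition code ( L, ) comes before IF, and names the case in which the enclosed code is executed.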
Some processors, especially those used for embedded applications, have processor-specific instructions that are extensions to the AMD64 instruction set. In order to use these, a few facilities are available.
: dxb \ b -- ; lay byte
Lay a byte into the instruction stream. Use in the form:
dxb $55
: dxw \ w -- ; lay 16 bits
Lay a 16-bit word into the instruction stream. Use in the form:
dxw $55AA
: dxl \ l -- ; lay 32 bit long
Lay a 32-bit dword into the instruction stream. Use in the form:
dxl $11223344
: dxx \ l -- ; lay 64 bit xword
Lay a 64-bit qword into the instruction stream. Use in the form:
dxx $1122334455667788
: $ \ -- chere
Return the PC value of the start of the instruction.
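As an illustration of laying down an instruction by hand with dxb , the sketch below assembles the PAUSE instruction from its two opcode bytes, F3 90. The name SPIN-HINT is hypothetical, and the example is only meant to show the mechanism; the assembler may well provide a mnemonic for this instruction.

```forth
CODE SPIN-HINT          \ -- ; hypothetical word
  dxb $F3               \ PAUSE is encoded as the two bytes F3 90
  dxb $90
  next,
END-CODE
```

Two separate dxb lines are used rather than one dxw so that the byte order in the instruction stream is explicit.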