Intel/AMD x64 Assembler

VFX Forth has a built-in assembler. It lets you write time-critical definitions, and do things that would be more difficult in Forth, such as interrupt service routines. The assembler supports the Intel/AMD64 instruction set and the stack-based x87 floating-point coprocessor. The supported instructions are chosen mainly for the benefit of the code generator. In normal use, the assembler is very rarely needed.

Definitions written in assembler may use all the variables, constants, etc. used by the Forth system, and may be called from the keyboard or from other words just like any high-level Forth word. When writing a code definition, it is important to remember which machine registers are used by the Forth system itself. These registers are documented later in this chapter. All other registers may be used freely. The reserved registers may also be used, but their contents must be saved before use and restored afterwards.

The mnemonics used in the Forth assembler are the same as those documented in the Intel literature, and the operand order is also the same. The only difference is that each part of an instruction must be separated by white space, because the Forth interpreter reads its input as space-delimited words.

The assembler has certain defaults. These cover the order of the operands, the default addressing modes and the segment size. These are described later in this chapter.

The assembler source code is in Kernel/x64Com/hasmx64.fth. Do not treat it as an example of good Forth style. The original 8086 assembler was written in the early 1980s, and has survived several upgrades to reach its present form.

Using the assembler

Normally the assembler is used to create new Forth words written in assembler. Such words use CODE ... END-CODE in place of : ... ;, or CREATE ... ;CODE in place of CREATE ... DOES>.

The word CODE creates a new dictionary header and enables the assembler.

As an example, study the definition of 0< in assembly language. The word 0< takes one operand from the stack and returns a true value, -1, if the operand was less than zero, or a false value, 0, if the operand was greater than or equal to zero.


CODE 0< \ n -- t/f ; define the word 0<
  OR RBX, RBX      \ use OR to set flags
  L, IF,           \ less than zero ?
    MOV RBX, # -1  \   y: -1 is true flag
  ELSE,            \ n:
    XOR RBX, RBX   \   quick way to set 0
  ENDIF,
  NEXT,            \ return to Forth
END-CODE

Notice how the word NEXT, is used. NEXT, is a macro that assembles a return to the Forth inner interpreter, and all code words must end with one. The example also demonstrates the structuring words within the assembler. These are predefined macros that assemble the necessary branching instructions. The next example shows the same word implemented with local labels and explicit jumps instead of assembler structures.


CODE 0<         \ n -- t/f ; define the word 0<
  OR RBX, RBX   \ use OR to set flags
  JGE L$1       \ skip if RBX>=0
  MOV RBX, # -1 \ -1 is true flag
  JMP L$2       \ this part done
L$1:            \ do following otherwise
  SUB RBX, RBX  \ quick way to set 0
L$2:
  NEXT,         \ return to Forth
END-CODE
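Both versions leave the same result. As a cross-check of the intended semantics, the Python sketch below models a 64-bit cell held in RBX as an unsigned bit pattern and reproduces what 0< leaves on the stack. The helper names are hypothetical, and the sketch models only the result, not the generated code.

```python
CELL_BITS = 64
MASK = (1 << CELL_BITS) - 1  # a cell is stored as an unsigned 64-bit pattern


def to_signed(cell: int) -> int:
    """Interpret a 64-bit cell as a two's-complement signed number."""
    cell &= MASK
    return cell - (1 << CELL_BITS) if cell >> (CELL_BITS - 1) else cell


def zero_less(cell: int) -> int:
    """Model of 0<: all bits set (-1) if the cell is negative, else 0."""
    return MASK if to_signed(cell) < 0 else 0


if __name__ == "__main__":
    print(hex(zero_less(-5 & MASK)))  # true flag: every bit set
    print(zero_less(7))               # false flag
```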

Assembler extension words

There are several useful words provided within VFX Forth to control the use of the assembler.

;code      \ --

Used in the form:

  : <namex> CREATE .... ;CODE ... END-CODE

Stops compilation, and enables the assembler. This word is used with CREATE to produce defining words whose run-time portion is written in code, in the same way that CREATE ... DOES> is used to create high level defining words.

The data structure is defined between CREATE and ;CODE and the run-time action is defined between ;CODE and END-CODE. The current value of the data stack pointer is saved by ;CODE for later use by END-CODE for error checking.

When a word defined by <namex> executes, the address of its data area is the top item on the CPU call stack. You can get the address of the data area by POPping it into a register.

A definition of VARIABLE might be as follows:


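: VARIABLE
  CREATE 0 ,            \ compile time: reserve one cell
;CODE                   \ run time: data address is on the call stack
  sub     rbp, # cell   \ make room for the old TOS (one 8-byte cell)
  mov     0 [rbp], rbx  \ save the old TOS as NOS
  pop     rbx           \ the new TOS is the data address
  next,                 \ return to the caller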
END-CODE

VARIABLE TEST-VAR
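
The run-time part of VARIABLE depends on the register conventions described under Dedicated Forth registers: RBX caches the top of the data stack, RBP points to the next item in memory, and the stack grows downwards in 8-byte cells. The Python sketch below is a simplified model of that behaviour, not VFX Forth code. The class, method and function names are hypothetical, and the CPU call stack is modelled as a list whose last element is the top.

```python
CELL = 8  # bytes per cell on x64


class DataStack:
    """Toy model: RBX caches TOS, RBP points at NOS, the stack grows down."""

    def __init__(self, top_address: int):
        self.mem = {}            # address -> cell value
        self.rbp = top_address   # initial data stack pointer
        self.rbx = 0             # cached top of stack

    def push(self, value: int) -> None:
        # sub rbp, # cell / mov 0 [rbp], rbx / load the new TOS into rbx
        self.rbp -= CELL
        self.mem[self.rbp] = self.rbx
        self.rbx = value

    def pop(self) -> int:
        # take TOS, then mov rbx, 0 [rbp] / add rbp, # cell
        value = self.rbx
        self.rbx = self.mem[self.rbp]
        self.rbp += CELL
        return value


def variable_runtime(stack: DataStack, call_stack: list) -> None:
    """Model of the ;CODE part of VARIABLE: pop the data address into RBX."""
    stack.push(call_stack.pop())
```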

CODE       \ --

A defining word used in the form:

CODE <name> ... END-CODE

Creates a dictionary entry for <name> to be defined by a following sequence of assembly language words. Words defined in this way are called code definitions. At compile-time, CODE saves the data stack pointer for later error checking by END-CODE.

END-CODE   \ --

Terminates a code definition and checks the data stack pointer against the value stored when ;CODE or CODE was executed. The assembler is disabled. See: CODE and ;CODE.

LBL:       \ --

A defining word that creates an assembler routine that can be called from other code routines as a subroutine. Use in the form:

LBL: <name>
  ...code...
END-CODE

When <name> executes, it returns the address of the first byte of executable code. Another code definition can then call <name> or jump to it.

Dedicated Forth registers

The Forth virtual machine is held within the processor register set. Register usage is as follows:


Forth VM registers
RBP     Data stack pointer, points to NOS
RSP     Return stack pointer
R13     Float stack pointer
RIP     Instruction pointer
RSI     User area pointer
RDI     Local variable pointer

Simulated stack and scratch
RBX     Cached top of data stack
RAX
RCX
RDX
R8..R12

Special purpose registers
R14     Index for DO/LOOP       non-volatile across ABIs
R15     Index~Limit             non-volatile across ABIs
XMM8    FTOS                    volatile Linux, non-volatile Windows
XMM9    FTEMP, temporary float

All unused registers may be freely used by assembler routines, but they may be altered by the operating system or wrapper calls. Before calling the operating system, all of the Forth registers should be preserved. Before using a register that the Forth system uses, it should be preserved and then restored on exit from the assembler routine. Be aware, in particular, that callbacks will generally modify the RAX register since this is used to hold the value returned from them.

Default segment size

Assembler syntax

Default assembler notation

The assembler is designed to be very closely compatible with MASM and other assemblers. To this end the assembler assembles code written in the conventional prefix notation. However, because code may be converted from other MPE Forth systems, the postfix notation is also supported. The default mode is prefix. The directives to switch mode are as follows:

PREFIX
POSTFIX

These switch the assembler from then onwards into the new mode. The directives should be used outside a code definition, not within one. Their use within a code definition will lead to unpredictable results. MPE always uses the assembler in PREFIX mode.

The assembler syntax follows that of other AMD64 assemblers very closely. The major difference is that the VFX Forth assembler needs white space around everything. For example, where in MASM one might write:


MOV RAX,10[RBX]

we must write:


MOV RAX, 10 [RBX]

This distinction must be borne in mind when reading the following addressing mode information.

Register to register

Many instructions have a register to register form. Both operands are registers. Such an instruction is of the form:


  MOV RAX , RBX

This moves the contents of RBX into RAX. For compatibility with older MPE assemblers the first operand may be merged with the comma thus:

  MOV RAX, RBX

This use of a register name with a 'built-in' comma also applies to other addressing modes.

Immediate mode

The assembler treats a bare number operand as immediate data by default. Immediate data can also be marked explicitly, which is recommended, by using a hash (#) character:

MOV RAX, # 23

This example places the number 23 in RAX.

The rules for instruction format and range of literals that can be assigned to registers are arcane. The assembler does its best to generate the shortest opcode.
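As one concrete case of those rules, x86-64 arithmetic instructions such as ADD r/m64 accept either a sign-extended 8-bit immediate (opcode 83) or a sign-extended 32-bit immediate (opcode 81); there is no ADD form with a 64-bit immediate. The Python helper below is a hypothetical sketch of how an assembler can pick the shorter form. It does not reproduce the VFX assembler's own selection logic.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if value fits a signed two's-complement field of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def add_immediate_form(value: int) -> str:
    """Choose the encoding of ADD r/m64, imm (both forms use REX.W)."""
    if fits_signed(value, 8):
        return "83 /0 ib"   # 1 immediate byte, sign-extended to 64 bits
    if fits_signed(value, 32):
        return "81 /0 id"   # 4 immediate bytes, sign-extended to 64 bits
    raise ValueError("no 64-bit immediate form: load the value with MOV first")
```

For example, ADD RAX, # 127 can use the 1-byte immediate, while ADD RAX, # 128 needs four bytes.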

Direct mode

In direct mode, the operand is the contents of a memory address rather than a constant. Direct addresses must be marked explicitly, using the PTR or [] directives:


  variable foobar
  ...
  MOV RAX, PTR 23
  MOV RAX , [] 23
  mov rcx, [] foobar

The first two instructions both place the contents of address 23 in RAX. The third loads the contents of the variable foobar into RCX.

Base + displacement

Intel define an addressing mode using a base and a displacement. In this mode, the effective address is calculated by adding the displacement to the contents of the base register. An example:

  MOV RBX , # foobar
  MOV RAX , 10 [RBX]

In this example, RAX is filled with the contents of address foobar+10.

The assembler lays down different modes for displacements of 8-bit or 32-bit size, but this is internal to the assembler. The following registers may be used as base registers with a displacement:

[RAX] [RCX] [RDX] [RBX] [RBP] [RSI] [RDI]
[R8] [R9] [R10] [R11] [R12] [R13] [R14] [R15]
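For a non-zero displacement, the choice is governed by the x86-64 encoding: a displacement in the range -128..127 fits in one signed byte, and anything else up to the signed 32-bit range needs four bytes. The Python sketch below is a hypothetical illustration of that rule only; it ignores the base-only case and every other encoding detail the assembler handles.

```python
def displacement_size(disp: int) -> int:
    """Bytes of displacement laid down for an operand such as 10 [RBX]."""
    if -128 <= disp <= 127:
        return 1   # disp8, sign-extended to 64 bits by the CPU
    if -(1 << 31) <= disp < (1 << 31):
        return 4   # disp32, sign-extended to 64 bits by the CPU
    raise ValueError("x86-64 displacements are limited to signed 32 bits")
```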

If the displacement is zero, the assembler internally uses the base-only mode. However, the displacement of zero must still be supplied to the assembler:

MOV RBX , # 0100
MOV RAX , 0 [RBX]

This places in RAX the contents of address 100 (pointed to by RBX).

The following registers may be used as a base with no displacement:

[RAX] [RCX] [RDX] [RBX] [RSI] [RDI]
[R8] [R9] [R10] [R11] [R12] [R13] [R14] [R15]

Base + index + displacement

The 80386 and later processors also allow two registers to be used to address memory indirectly. These are known as the base and the index. Such instructions are of the form:

MOV RAX , # 100
MOV RBX , # 200
MOV RDX , 10 [RAX] [RBX]

This will place in RDX the contents of address 100+200+10, or address 310. RAX is the base and RBX is the index. Again, the displacement may be 8-bits, 32-bits or have a value of zero. The assembler distinguishes between these three cases. The base and index registers may be any of the following:

[RAX] [RBX] [RCX] [RDX] [RSI] [RDI]
[R8] [R9] [R10] [R11] [R14] [R15]

In addition, [RBP] may be used as the index register, and [RSP] may be used as the base register.

Base + index*scale + displacement

The 80386 and later processors further support an addressing mode in which the index register is automatically scaled by a fixed amount, either 2, 4 or 8. This is designed for indexing into two-dimensional arrays whose elements are larger than a byte. One register may be used as the first index, another as the second index, and the element size becomes implicit in the instruction. The form of this addressing mode is very similar to that outlined above, except that the index operand includes the scale factor:

  MOV RBX , # 100
  MOV RCX , # 2
  MOV RAX , 10 [RBX] [RCX*4]

This stores into RAX the contents of address 100+(4*2)+10, or address 118. The list of registers which may be used as base is the same as above. The list of scaled indexes is as follows:

  [RAX*2] [RCX*2] [RDX*2] [RBX*2] [RBP*2] [RSI*2] [RDI*2]
  [RAX*4] [RCX*4] [RDX*4] [RBX*4] [RBP*4] [RSI*4] [RDI*4]
  [RAX*8] [RCX*8] [RDX*8] [RBX*8] [RBP*8] [RSI*8] [RDI*8]
  [R8*2] [R9*2] [R10*2] [R11*2] [R14*2] [R15*2]
  [R8*4] [R9*4] [R10*4] [R11*4] [R14*4] [R15*4]
  [R8*8] [R9*8] [R10*8] [R11*8] [R14*8] [R15*8]
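All the register-indirect modes above compute the effective address in the same way. The Python function below is a hypothetical model of that arithmetic: an unscaled index corresponds to a scale of 1, and addresses wrap at 64 bits. It reproduces the worked examples from this section, but says nothing about which register combinations are encodable.

```python
MASK64 = (1 << 64) - 1


def effective_address(base: int, index: int = 0, scale: int = 1,
                      disp: int = 0) -> int:
    """disp [base] [index*scale] -> base + index*scale + disp, modulo 2**64."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & MASK64


print(effective_address(100))                       # 100: 0 [RBX], RBX = 100
print(effective_address(100, 200, disp=10))         # 310: 10 [RAX] [RBX]
print(effective_address(100, 2, scale=4, disp=10))  # 118: 10 [RBX] [RCX*4]
```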

Segment overrides

Some instructions may be prefixed with a segment override. In 64-bit mode, an override adds the base address of the FS or GS segment to the effective address. The override must precede the instruction to which it relates:

  MOV RBX , # 100
  GS: MOV RAX , 10 [RBX]

This sets RAX to the value at offset 110 from the base of the GS segment. The list of segment overrides is:

  FS: GS:
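In 64-bit mode the processor treats the CS, DS, ES and SS segment bases as zero, so only the FS: and GS: overrides change the address that is accessed. The minimal Python model below shows this. Its names are hypothetical, and the segment base values are supplied by the caller; on a real system they are set up by the operating system.

```python
from typing import Optional

MASK64 = (1 << 64) - 1


def linear_address(ea: int, override: Optional[str] = None,
                   fs_base: int = 0, gs_base: int = 0) -> int:
    """Address actually accessed for effective address `ea`."""
    if override is None:
        return ea & MASK64                  # flat model: segment base is 0
    if override == "FS:":
        return (fs_base + ea) & MASK64
    if override == "GS:":
        return (gs_base + ea) & MASK64
    raise ValueError("only FS: and GS: are listed for this assembler")
```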

Data size overrides

Each instruction has a default data size, normally implied by its register operands. If the data is of a different size, or the size cannot be deduced, a data size override must be used. The following size specifiers define the size of the data:

  BYTE or B.
  WORD or W.
  DWORD or D.
  QWORD
  TBYTE
  FLOAT
  DOUBLE
  EXTENDED

It is only necessary to specify size when ambiguity would otherwise arise. For example:


   MOV   0 [RDX], # 10  \ can't tell
   MOV   0 [RDX], RAX   \ RAX specifies 64 bit

The BYTE size defines that a byte operation is required:

  MOVZX RAX , BYTE 10 [RBX]

The abbreviation B. may also be used in place of BYTE to define a byte operation. The WORD specifier defines that 16-bits are required:

  MOV AX , WORD 10 [RBX]

The abbreviation W. may also be used to define a word operation. DWORD is the default for a USE32 segment, and indicates that 32-bit data is to be used:

MOV EAX , DWORD 10 [RBX]
FSTP DWORD 10 [RBX]

The abbreviation D. may also be used to specify a DWORD operation. The remaining size specifiers define data sizes for the floating point unit.

QWORD defines a 64-bit operation:

FSTP QWORD 10 [RBX]

TBYTE defines a 10-byte (80-bit) operation, such as:

FSTP TBYTE 10 [RBX]

FLOAT, DOUBLE and EXTENDED are synonyms for DWORD, QWORD and TBYTE respectively.
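The rule that a size specifier is needed only when the size cannot otherwise be deduced can be summarised as a lookup. The Python sketch below is a hypothetical model: it takes the size from an explicit specifier if one is given, otherwise from a register operand, and reports ambiguity for a case such as MOV 0 [RDX], # 10. The register widths are standard x86-64; the function itself is not part of VFX Forth.

```python
# Sizes in bytes for the specifiers listed above.
SIZE_BYTES = {"BYTE": 1, "B.": 1, "WORD": 2, "W.": 2, "DWORD": 4, "D.": 4,
              "QWORD": 8, "TBYTE": 10}
# FLOAT, DOUBLE and EXTENDED are synonyms for DWORD, QWORD and TBYTE.
SIZE_BYTES.update(FLOAT=4, DOUBLE=8, EXTENDED=10)

# A few general-purpose register names and their widths in bytes.
REGISTER_BYTES = {"RAX": 8, "RDX": 8, "EAX": 4, "AX": 2, "AL": 1}


def operand_size(register=None, specifier=None):
    """Size of a memory operand: explicit specifier first, then register."""
    if specifier is not None:
        return SIZE_BYTES[specifier]
    if register is not None:
        return REGISTER_BYTES[register]
    raise ValueError("ambiguous size: add BYTE, WORD, DWORD or QWORD")
```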

The segment type defines the default data size and address size for the code in the segment. If needed, the data size or address size laid down can be forced to the other size. There is a set of data and address size overrides which work for one instruction only. These are:

  D16:

For example:

  D16: MOV EAX , # 23

In a USE32 or USE64 segment, this would lay down 16-bit data to be loaded into AX. D16: is almost never needed for 64-bit programming.

Near and far, long and short

The default for a JMP or a CALL is a near branch within the current code segment, whilst the default for a conditional branch is a short branch with a -128..+127 byte range. The directives for selecting short or long branches are:

  SHORT  LONG

These would be used as follows:


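LBL: THIS               \ THIS returns the address of its code
  ...
END-CODE

CALL THIS               \ near call
JMP THIS                \ near jump

JCC THIS                \ JCC stands for any conditional jump, e.g. JNE
JCC SHORT THIS          \ force a short (8-bit) displacement
JCC LONG THIS           \ force a long (32-bit) displacement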

For compatibility with older MPE assemblers the mnemonics CALL/F, RET/F and JMP/F are also provided.
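Whether a branch can be short depends on the distance to the target, measured from the end of the branch instruction. The Python sketch below uses the standard x86-64 encodings: a short Jcc or JMP is 2 bytes, a long Jcc is 6 bytes and a long JMP is 5 bytes. SHORT and LONG force one form; this hypothetical function only shows which form fits, and is not the assembler's own code.

```python
def branch_form(at: int, target: int, conditional: bool = True):
    """Return (form, length, displacement) for a branch assembled at `at`.

    Displacements are relative to the address after the instruction.
    """
    short_disp = target - (at + 2)            # 7x cb or EB cb: 2 bytes
    if -128 <= short_disp <= 127:
        return ("SHORT", 2, short_disp)
    length = 6 if conditional else 5          # 0F 8x cd or E9 cd
    disp = target - (at + length)
    if not -(1 << 31) <= disp < (1 << 31):
        raise ValueError("target out of rel32 range")
    return ("LONG", length, disp)
```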

Syntax exceptions

The assembler in VFX Forth follows both the syntax and the mnemonics defined in the Intel Programmer's Reference manuals. However, there are certain exceptions, listed below.

The zero operand forms of certain stack register instructions for the 80387 have been omitted. Their functionality is supported however. Such instructions are listed below, with a form of the syntax which will support the function:


Omitted  Equivalent form
FADD     FADDP ST(1) , ST
FCOM     FCOM ST(1)
FCOMP    FCOMP ST(1)
FDIV     FDIVP ST(1) , ST
FDIVR    FDIVRP ST(1) , ST
FMUL     FMULP ST(1) , ST
FSUB     FSUBP ST(1) , ST
FSUBR    FSUBRP ST(1) , ST

Certain 80386 instructions have either one operand, or two operands of which only one is variable. These instructions are:
MUL DIV IDIV NEG NOT

These instructions take only one operand in the VFX Forth assembler.
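These instructions need only one operand because the other operand and the result use fixed registers. In the 64-bit form, MUL multiplies RAX by the operand and leaves the 128-bit product in RDX:RAX, and DIV divides RDX:RAX by the operand, leaving the quotient in RAX and the remainder in RDX. The Python model of the unsigned forms below uses hypothetical function names and returns the new (RDX, RAX) pair.

```python
MASK64 = (1 << 64) - 1


def mul64(rax: int, operand: int):
    """MUL r/m64: RDX:RAX = RAX * operand (unsigned). Returns (rdx, rax)."""
    product = (rax & MASK64) * (operand & MASK64)
    return product >> 64, product & MASK64


def div64(rdx: int, rax: int, operand: int):
    """DIV r/m64: RAX = RDX:RAX // operand, RDX = remainder (unsigned)."""
    if operand == 0:
        raise ZeroDivisionError("#DE: divide by zero")
    dividend = ((rdx & MASK64) << 64) | (rax & MASK64)
    quotient, remainder = divmod(dividend, operand & MASK64)
    if quotient > MASK64:
        raise OverflowError("#DE: quotient does not fit in RAX")
    return remainder, quotient
```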

Local labels

If you need to use labels within a code definition, you may use the local labels provided. These are used just like labels in a normal assembler, but some restrictions are applied.

Ten labels are predefined, and their names are fixed. Additional labels can be defined, up to a maximum of 32 labels in total. There is a limit of 128 forward references. A reference to a label is valid until the next occurrence of LBL:, CODE or ;CODE, whereupon all the labels are reset.

A reference to a label in a definition must be satisfied in that definition. You cannot define a label in one code definition and refer to it from another.

The local labels have the names L$1 L$2 ... L$10 and these names should be used when referring to them e.g.

  JNE L$5

A local label is defined by words of the same names, but with a colon as a suffix:

  L$1: L$2: ... L$10:

Additional labels (up to a maximum of 32 altogether) may be referred to by:

  n L$

where n is in the range 11..32 (decimal), and they may be defined by:

n L$:

where n is again in the range 11..32 (decimal).
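Forward references such as JNE L$5 before L$5: is defined are normally handled by backpatching: the assembler records where the unresolved displacement lives and fills it in when the label is defined. The Python class below is a hypothetical model of that bookkeeping using the limits stated above (32 labels, 128 forward references, everything reset by CODE, LBL: or ;CODE). It is not the VFX implementation, and it handles 32-bit displacements only.

```python
MAX_LABELS = 32          # L$1..L$10 plus 11..32 L$
MAX_FORWARD_REFS = 128


class LocalLabels:
    """Hypothetical model of local-label bookkeeping with backpatching."""

    def __init__(self):
        self.reset()

    def reset(self):
        """Forget all labels, as CODE, LBL: and ;CODE do."""
        self.defined = {}    # label number -> address
        self.pending = []    # (label number, offset of an unfilled rel32)

    @staticmethod
    def _patch(code: bytearray, field: int, target: int) -> None:
        # rel32 is relative to the end of the 4-byte field, which is also
        # the end of a JMP rel32 or Jcc rel32 instruction.
        rel = target - (field + 4)
        code[field:field + 4] = rel.to_bytes(4, "little", signed=True)

    def reference(self, n: int, field: int, code: bytearray) -> None:
        """A branch to label n whose rel32 field starts at offset `field`."""
        if not 1 <= n <= MAX_LABELS:
            raise ValueError("label number must be 1..32")
        if n in self.defined:                      # backward reference
            self._patch(code, field, self.defined[n])
        elif len(self.pending) >= MAX_FORWARD_REFS:
            raise OverflowError("too many forward references")
        else:                                      # forward reference
            self.pending.append((n, field))

    def define(self, n: int, address: int, code: bytearray) -> None:
        """n L$: defines label n here and patches any waiting branches."""
        if not 1 <= n <= MAX_LABELS:
            raise ValueError("label number must be 1..32")
        self.defined[n] = address
        for lab, field in self.pending:
            if lab == n:
                self._patch(code, field, address)
        self.pending = [(lab, f) for lab, f in self.pending if lab != n]

    def unresolved(self) -> list:
        """Labels referenced but not yet defined; should end up empty."""
        return sorted({lab for lab, _ in self.pending})
```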

CPU selection

This assembler is designed to cope with CPUs from the 80386 upwards. Some instructions are only available on later CPUs. Note that CPU selection affects the assembler and the VFX code generator, not the run time of your application. If you select a higher CPU level than the one the application runs on, incorrect operation will occur.

 CPU=x64   \ -- ; select base AMD64 instruction set

Code examples

The best place to look for code examples is in the source code. The file Kernel/x64Com/reqdcode.fth contains the code definitions required by the VFX64 kernel.


code Nrev     \ xN..x1 count -- x1..xN
\ *G Reverse the order of the top N data stack items.
  cmp     rbx, # 1              \ nothing to do unless count > 1
  g, if,
    dec     rbx                 \ item n at offset n-1
    mov     rcx, rbp            \ data stack pointer
    shl     rbx, # 3            \ in cells
    add     rbx, rcx            \ RBX points to XN, RCX to X1
    begin,
      mov     rdx, 0 [rbx]      \ perform exchange
      mov     rax, 0 [rcx]
      mov     0 [rcx], rdx
      add     rcx, # cell
      mov     0 [rbx], rax
      sub     rbx, # cell
      cmp     rcx, rbx
    a, until,
  endif,
  mov     rbx, 0 [rbp]
  add     rbp, # cell
  next,
end-code
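The exchange loop in NREV is a two-pointer reversal. One pointer starts at X1 and moves deeper into the stack, the other starts at XN and moves towards the top, and the items they address are swapped until the pointers meet or cross. The Python model below implements the same algorithm. The stack is held as a list whose last element is the top, the count has already been removed, and the function name is hypothetical.

```python
def nrev(stack: list, n: int) -> None:
    """Reverse the top n items of `stack` in place (last element = top)."""
    if n <= 1:
        return                  # as in the code: nothing to do unless n > 1
    low = len(stack) - 1        # X1, the item just below the count (like RCX)
    high = len(stack) - n       # XN, the n-th item (like RBX)
    while low > high:
        stack[low], stack[high] = stack[high], stack[low]
        low -= 1                # move deeper, as RCX moves up in memory
        high += 1               # move towards the top, as RBX moves down
```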

Assembler structures

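The assembler includes structure words modelled on the usual Forth control structures. The examples in this chapter use the forms

  cc IF, ... ELSE, ... ENDIF,
  BEGIN, ... cc UNTIL,

where cc is a condition code such as L, (less), G, (greater) or A, (above) that selects the flag condition to be tested.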
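Forth assemblers typically compile such structures by backpatching at assembly time: IF, lays a conditional jump on the opposite condition, so that the IF part is skipped when the condition is false, and remembers where it is; ELSE, lays an unconditional jump and resolves the IF, jump; ENDIF, resolves whichever jump is still open. The Python sketch below models that idea with a list of symbolic instructions. The names and representation are hypothetical, and it is not taken from the VFX source.

```python
# Condition codes used before IF, and their opposites (Intel names).
INVERT = {"L": "GE", "GE": "L", "G": "LE", "LE": "G",
          "A": "BE", "BE": "A", "E": "NE", "NE": "E"}


class StructureModel:
    """Hypothetical model of cc IF, ... ELSE, ... ENDIF, compilation."""

    def __init__(self):
        self.code = []   # [mnemonic, operand] pairs; jump operands are indexes
        self.open = []   # compile-time stack of jumps awaiting a target

    def emit(self, mnemonic, operand=None):
        self.code.append([mnemonic, operand])

    def if_(self, cc):
        self.emit("J" + INVERT[cc])          # skip the IF part when cc is false
        self.open.append(len(self.code) - 1)

    def else_(self):
        self.emit("JMP")                     # end of IF part: jump over ELSE part
        pending_if = self.open.pop()
        self.code[pending_if][1] = len(self.code)   # false case starts here
        self.open.append(len(self.code) - 1)

    def endif(self):
        self.code[self.open.pop()][1] = len(self.code)
```

Modelling the first 0< example: OR, then if_("L"), MOV, else_(), XOR, endif(), then NEXT, produces a JGE to the XOR and a JMP to NEXT,.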

Generating new instructions

Some processors, especially those used for embedded applications, have processor-specific instructions that are extensions to the AMD64 instruction set. In order to use these, a few facilities are available.

: dxb            \ b -- ; lay byte
Lay a byte into the instruction stream. Use in the form:

  dxb $55

: dxw            \ w -- ; lay 16 bits
Lay a 16-bit word into the instruction stream. Use in the form:

  dxw $55AA

: dxl            \ l -- ; lay 32 bit long
Lay a 32-bit dword into the instruction stream. Use in the form:

  dxl $11223344

: dxx            \ l -- ; lay 64 bit xword
Lay a 64-bit qword into the instruction stream. Use in the form:

  dxx $1122334455667788

: $             \ -- chere
Return the PC value of the start of the instruction.
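The x86-64 is little-endian, so a multi-byte value stored in memory appears lowest byte first. Assuming that DXW, DXL and DXX lay their values in the processor's native byte order (an assumption; check with a disassembly), the Python sketch below shows the bytes each of the examples above would produce. The helper name is hypothetical.

```python
import struct

# struct formats: '<' = little-endian, B/H/I/Q = 8/16/32/64-bit unsigned.
FORMATS = {"dxb": "<B", "dxw": "<H", "dxl": "<I", "dxx": "<Q"}


def laid_bytes(word: str, value: int) -> bytes:
    """Bytes a dx* word is assumed to append, lowest address first."""
    return struct.pack(FORMATS[word], value)


for word, value in [("dxb", 0x55), ("dxw", 0x55AA),
                    ("dxl", 0x11223344), ("dxx", 0x1122334455667788)]:
    print(word, laid_bytes(word, value).hex(" "))
```

Under that assumption, dxw $55AA lays AA 55 and dxl $11223344 lays 44 33 22 11.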