Library files

The Lib folder/directory contains tools maintained and periodically updated by MPE. The contents of Lib differ between the Windows, Linux, OS X and DOS versions because some of the tools are operating system specific.

Building cross references

Introduction

Cross reference information helps you to manage your source code. When LIB\XREF.FTH is loaded you can use XREF <name> to find out in which other words <name> is used. You can also find out which words you defined but did not use. XREF is precompiled in the Studio version of VFX Forth but not in the base version.

The compiler generates cross references by building a chain of fields including LOCATE format (link:32, xt:32, line#:32) in a separate area of memory. Links and pointers are relative to the start of the XREF memory area.

Two chains are maintained. The first produces a chain of where a word is used, so that the user can find out where (say) DUP is used. The second produces a chain of which words and literals are called in order. This is the basis of decompilation and debugging.

Initialisation

XREF is initialised by the switch +XREFS and is terminated by -XREFS. You must use +XREFS to turn on the production of cross reference information.

By default 1Mb of cross reference memory is allocated from the heap. If you need more than this for a very large application, use the phrase <n> XREF-KB to set the size of the cross reference memory, where <n> is in kilobytes.
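As a rough illustration of the sequence described above, the sketch below sizes the cross reference memory, turns cross referencing on, compiles two definitions and then queries the results. It assumes that LIB\XREF.FTH is already loaded or precompiled, and that XREF-KB has to be used before +XREFS first allocates the memory. That ordering, and the words SQUARED and CUBED, are assumptions made for illustration only.

```forth
\ Sketch only: assumes LIB\XREF.FTH is loaded or precompiled.
#2048 XREF-KB            \ request 2Mb; assumed to be needed before +XREFS
+XREFS                   \ start producing cross reference information

: SQUARED  \ n -- n^2 ; hypothetical example word
  dup * ;
: CUBED    \ n -- n^3 ; hypothetical example word
  dup squared * ;

XREF SQUARED             \ display where SQUARED is used
XREF-UNUSED              \ list words that were defined but never used
-XREFS                   \ stop producing cross reference information
```

The queries are made before -XREFS so that the example does not depend on how the tools behave once production has been switched off.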

Decompilation and SHOW

Because the VFX code generator optimises so heavily, there is no direct relationship between the binary code and the source code. Consequently DIS and DASM use disassembly and special cases, but cannot produce a good approximation to the original source code.

The cross reference information includes a decompilation chain. When you use SHOW <name> the cross reference information is used to produce a machine decompilation. This includes none of the comments from the original source code, and is machine formatted.
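A minimal sketch of using SHOW follows. The word AREA is hypothetical, and the exact layout of the decompilation is not reproduced here because it depends on the VFX Forth version.

```forth
\ Sketch only: AREA is a hypothetical word used for illustration.
+XREFS                   \ cross reference information must be produced
: AREA  \ w h -- n
  * ;                    \ this comment will not appear in the SHOW output
SHOW AREA                \ machine decompilation from the decompilation chain
```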

Extending SHOW

The decompilation produced by SHOW is mostly default and automatic. However, some words, such as string handling words, take in-line data which would not be displayed by SHOW without special handling.

SHOW can be extended by adding items to the DCC-SWITCH chain. The stack effect of the action is: addrx -- addr ; where addrx is the offset of the cross reference packet in the cross reference information memory. See the /REF[X] structure in LIB\XREF.FTH for details of the structure of this data packet. The example below is for a word X" which takes an in-line string like S".


[+switch dcc-switch
  ' X"       run:  ." X"  [char] " emit  dup .$inline  ;
switch]

Note that unlike previous VFX Forth decompilers, SHOW is based on cross reference information which references the source word without knowledge of what it compiles. The only reasons for special cases are control of the decompilation layout and display of associated data to reconstruct source code.

Glossary

: dump(x)       \ offset len --
Displays the specified contents of the XREF table. Note that the given address is an offset from the start of the XREF table.

: init-xref     \ --
Initialise XREF memory and information if not already set up.

: term-xref     \ --
Free up XREF memory.

: save-xref     \ -- ; save XREF memory to file
Save the cross reference memory to disc. Unless the file name has been changed by XREF: <filename> the file will be called XREF.XRF.

: load-xref     \ -- ; reload XREF file from disc
Load the cross reference memory from disc. Unless the file name has been changed by XREF: <filename> the file used will be XREF.XRF.

: xref:         \ "filename" -- ; enable XREFs
Use in the form XREF: <filename> to define the file that SAVE-XREF and LOAD-XREF will use.

: xref-kb       \ n --
Specifies the size of the cross reference memory in kilobytes. By default this is 1024 kb, or 1Mb.

: +xrefs        \ -- ; enable XREF
Initialises the cross reference system if it has not already been initialised, and enables production of cross reference information.

: -xrefs        \ -- ; disable XREF
Stops production of cross reference information, which can be restarted by +XREFS. Cross reference memory is not erased or released. Thus, restarting with +XREFS will retain information. To release all previous information use TERM-XREF before +XREFS.

: xref-report   \ -- ; display XREF information
Displays some statistics about cross reference memory usage.

: WalkXref      \ xt1 xt2 -- ; XREF of XT1 using XT2 to display.
Used by application tools to walk the XREF chain for XT1. The structure offset for each step in the chain is handled by XT2 ( offset -- ). Because writing XT2 requires use of the internal XREF structure, you must expose the XREFFER module: EXPOSE-MODULE XREFFER to get access to the words in Lib\XREF.FTH.

: (show)        \ xt -- ; show/decompile words used by this XT
Given an XT, produces a machine decompilation of the word using the cross reference information. If cross referencing is not enabled, no action is taken.

: $show         \ $addr --
Given a counted string, it is looked up as a Forth word name and (SHOW) produces a machine decompilation of the word using the cross reference information. If cross referencing is not enabled, no action is taken.

: show          \ -- ; SHOW <name>
The following name is looked up as a Forth word name and (SHOW) produces a machine decompilation of the word using the cross reference information. If cross referencing is not enabled, no action is taken.

: hasXref?      \ xt -- flag ; true if word has XREF info
Produces TRUE if xt has XREF information; otherwise FALSE is returned.

: hasXDecomp?   \ xt -- flag ; true if word has XREF decompilation info
Produces TRUE if xt has XREF decompilation information; otherwise FALSE is returned.
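The two predicates can be used to guard calls into the cross reference tools. The helper below is a hypothetical sketch, not part of LIB\XREF.FTH; it only uses the stack effects given in this glossary.

```forth
\ Hypothetical helper: decompile a word only if decompilation data exists.
: ShowIfAble  \ xt --
  dup hasXDecomp?
  if  (show)  else  drop ." no decompilation information"  then
;

\ Hypothetical helper: report which chains exist for a word.
: .XrefStatus  \ xt --
  dup hasXref?
  if  ." usage chain present"  else  ." no usage chain"  then  cr
  hasXDecomp?
  if  ." decompilation chain present"  else  ." no decompilation chain"  then  cr
;

' dup .XrefStatus
```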

: WalkDecomp    \ xt1 xt2 -- ; DECOMP of XT1 using XT2 to display.
Used by application tools to walk the decompilation chain for XT1. The structure offset for each step in the chain is handled by XT2 ( offset -- ). Because writing XT2 requires use of the internal XREF structure, you must expose the XREFFER module: EXPOSE-MODULE XREFFER to get access to the words in Lib\XREF.FTH.

: FindXrefInfo  \ pc xt -- info | 0 ; finds xref packet corresponding to PC
Given the current PC and the XT of the word the PC is in, FindXrefInfo returns a pointer to an XREF packet if the PC is at an exact compilation boundary, otherwise it returns zero.

: FindXrefNearest       \ pc xt -- info|0
Given the current PC and the XT of the word the PC is in, FindXrefNearest returns a pointer to the Xref packet for the address at or less than the PC. If no Xref information is available for the word, zero is returned.

: GetXrefPos    \ info -- startpos len line addr
Given a pointer to an XREF packet, GetXrefPos returns the position, name length, line number of the source text in the source file, and the value of HERE at the time of compilation.

: NextXref      \ info1 -- info2
Steps to the next info packet, given the offset of the previous.

: xref          \ -- ; XREF <name>
Use in the form XREF <name> to display where <name> is used.

: uses          \ -- ; synonym for XREF
A synonym for XREF above.

: xref-all      \ -- ; cross reference all words
Produces a cross reference listing of all the words with cross reference information. This information is often too long to be directly useful, but can be pasted from the console to an editor for sorting, printing, and other post-processing.

: xref-unused   \ -- ; cross reference all unused words
Produces a cross reference listing of all the unused words with cross reference information. This information is often too long to be directly useful, but can be pasted from the console to an editor for sorting, printing, and other post-processing.

: ttx-set       \ xt -- ; xt TTX-SET "<text>"
The quoted string is saved as the tooltip text for the word whose xt is given, e.g.

  ' dup ttx-set "x -- x x ; duplicate top item on stack"

: ttx-get       \ xt -- caddr len
Given an xt, return the tooltip text for the word.

: ttx?          \ xt -- flag
Return true if the word whose xt is given has a tooltip.
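The tooltip words can be combined as in the sketch below. The helper .TOOLTIP is hypothetical; TTX-SET is used exactly as in the example given for it above.

```forth
\ Hypothetical helper: display a word's tooltip if it has one.
: .tooltip  \ xt --
  dup ttx?
  if  ttx-get type  else  drop ." (no tooltip)"  then
;

' swap ttx-set "x1 x2 -- x2 x1 ; exchange top two items"
' swap .tooltip
```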

Extended String Package

This optional wordset found in /Lib/StringPk.fth contains the following definitions to aid in the manipulation of counted strings.

: $variable     \ #chars "name" --
Create a string buffer with space reserved for #chars characters.

: $constant     \ "name" "text" --
Create a string constant called "name" and parse up to the closing quote for the content.

: ($+)          \ c-addr u $dest --
Add the string described by C-ADDR U to the counted string at $DEST. This word is now in the kernel.

: $+            \ $addr1 $addr2 --
Add the counted string $ADDR1 to the counted buffer at $ADDR2. This word is now in the kernel.

: $left         \ $addr1 n $addr2 --
Add the leftmost N characters of the counted string at $ADDR1 to the counted buffer at $ADDR2.

: $mid          \ $addr1 s n $addr2 --
Add N characters starting at offset S from the counted string at $ADDR1 to the counted buffer at $ADDR2.

: $right        \ $addr1 n $addr2 --
Add the rightmost N characters of the counted string at $ADDR1 to the counted buffer at $ADDR2.

: $val          \ $addr -- n1..nn n
Attempt to convert the counted string at $ADDR into a number. The top-most return item indicates the number of CELLS used on the stack to store the result: 0 indicates that the string was not a number, 1 indicates a single and 2 a double. $VAL obeys the same rules as NUMBER?.
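Because $VAL returns a variable number of cells, the count on top of the stack has to be examined before the result is used. The sketch below shows one way to do this; the words .VAL and VAL-DEMO are hypothetical, and which strings convert depends on the current BASE and the NUMBER? rules.

```forth
\ Hypothetical helper: convert a counted string and report the result.
: .val  \ $addr --
  $val case
    0 of  ." not a number"   endof   \ nothing else was returned
    1 of  ." single: " .     endof   \ one cell below the count
    2 of  ." double: " d.    endof   \ two cells below the count
  endcase
;

: val-demo  \ --
  c" 123" .val cr
  c" xyz" .val cr
;
```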

: $len          \ $addr -- len
Return the length of a counted string. Actually performs C@ and is the same as COUNT NIP.

: $clr          \ $addr --
Clear the contents of a counted string. Actually sets its length to zero. Primarily used to reset buffers declared with $VARIABLE.

: $upc          \ $addr --
Convert the counted string at $ADDR to uppercase. This acts in place.

: $compare      \ $addr1 $addr2 -- -1/0/+1
Compare two counted strings. Performs the same action as the ANS kernel definition COMPARE except that it uses counted strings as input parameters.

: $<            \ $1 $2 -- flag
A counted string equivalent to the numeric < operator. Uses $COMPARE then generates a well-formed flag.

: $=            \ $1 $2 -- flag
A counted string equivalent to the numeric = operator. Uses $COMPARE then generates a well-formed flag.

: $>            \ $1 $2 -- flag
A counted string equivalent to the numeric > operator. Uses $COMPARE then generates a well-formed flag.

: $<>           \ $1 $2 -- flag
A counted string equivalent to the numeric <> operator. Uses $COMPARE then generates a well-formed flag.

: $instr        \ $1 $2 -- false | index true
Look for an occurrence of the counted string $2 within the string $1. If found then the start offset within $1 is returned along with a TRUE flag, otherwise FALSE is returned.
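The following sketch builds a string in a buffer and then searches it, using only the stack effects listed in this section. It assumes that the word created by $VARIABLE returns the address of its counted buffer; the names MSG$, BUILD-MSG and FIND-DEMO are hypothetical.

```forth
#80 $variable msg$              \ buffer for up to 80 characters (assumed to return $addr)

: build-msg  \ --
  msg$ $clr                     \ reset the length to zero
  c" Hello, "  msg$ $+          \ append a counted string
  s" world"    msg$ ($+)        \ append an addr/len string
  c" !!!" 1    msg$ $left       \ append the leftmost character of "!!!"
  msg$ count type cr            \ display the assembled string
  ." length: " msg$ $len . cr
;

: find-demo  \ --
  msg$ c" world" $instr         \ -- false | index true
  if  ." found at offset " .  else  ." not found"  then  cr
;

build-msg find-demo
```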

Extensible CASE Mechanism

A CHAIN is an extensible version of the CASE..OF..ENDOF..ENDCASE mechanism. It is very similar to the SWITCH mechanism described in the Tools and Utilities chapter.

: case-chain    \ -- addr ; -- addr                              MPE.0000
Begin initial definition of a chain.

: item:         \ addr n -- addr ;                               MPE.0000
Begin definition of a conditional code block.

: end-chain     \ addr --                                        MPE.0000
Flag the end of the current block of additions to a chain.

: in-chain?     \ n addr -- flag ;                               MPE.0000
Return TRUE if N is in the chain beginning at ADDR.

: exec-chain?   \ i*x n addr -- j*x true | n FALSE               MPE.0000
Run through a given chain using TOS as a selector. If a match is made, execute the relevant code block and return TRUE; otherwise the initial selector and a FALSE flag are returned.

Using the chain mechanism


CASE-CHAIN <foo>
  <n> ITEM: <words> ;
  <m> ITEM: <words> ;
  <k> ITEM: <words> ;
END-CHAIN

More items can be added later:


<foo>
  <x> ITEM: <words> ;
  ...
END-CHAIN

The data structures are as follows:

CASE-CHAIN <foo> generates a variable that points to the last item added to the list.


ITEM: generates two cells and a headerless word:
  selector
  link
  headerless word .... exit
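A concrete version of the templates above might look like the following sketch. The chain name KEYACTIONS, the selectors and the word HANDLE-KEY are hypothetical. The sketch assumes that the selector is consumed when a match is found, as the stack comment for EXEC-CHAIN? suggests.

```forth
\ Hypothetical chain keyed on characters.
case-chain KeyActions
  char a item:  ." add"     ;
  char d item:  ." delete"  ;
end-chain

\ More items added later, e.g. by another source file.
KeyActions
  char q item:  ." quit"    ;
end-chain

: handle-key  \ char --
  KeyActions exec-chain?          \ -- true | char false
  0= if  ." unknown key: " emit  then
  cr
;

char d handle-key
char z handle-key
```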

XML support

The code in Lib\XML.fth contains support for parsing XML input and outputting XML using TYPE and friends. The parser is derived from Jenny Brien's JenX parser published at EuroForth and in the magazine ForthWrite. Additional code was taken from a modified JenX parser by Leo Wong. The generic XML description is by permission of Willem Botha of Construction Computer Software (http://www.ccssa.com).

Additional tools required for XML handling are contained in this file. These may be moved to Lib\Win32\Helpers.fth in the future.

Why XML

Since XML is non-proprietary and easy to read and write, it’s an excellent format for the interchange of data among different applications.

XML is a non-proprietary format, not encumbered by copyright, patent, trade secret, or any other sort of intellectual property restriction. It has been designed to be extremely powerful, while at the same time being easy for both human beings and computer programs to read and write. Thus it’s an obvious choice for exchange languages.

By using XML instead of a proprietary data format, you can use any tool that understands XML to work with your data.

XML is ideal for large and complex documents because the data is structured. It not only lets you specify a vocabulary that defines the elements in the document; it also lets you specify the relations between elements.

XML also provides a client-side include mechanism that integrates data from multiple sources and displays it as a single document.

XML doesn’t operate in a vacuum. Using XML as more than a data format requires interaction with a number of related technologies. These technologies include HTML for backward compatibility with legacy browsers, the CSS and XSL stylesheet languages, URLs and URIs, the XLL linking language, and the Unicode character set.

Cascading Style Sheets

Since XML allows arbitrary tags to be included in a document, there isn’t any way for the browser to know in advance how each element should be displayed. When you send a document to a user you also need to send along a style sheet that tells the browser how to format individual elements. One kind of style sheet you can use is a Cascading Style Sheet (CSS).

CSS, initially designed for HTML, defines formatting properties like font size, font family, font weight, paragraph indentation, paragraph alignment, and other styles that can be applied to particular elements.

It’s easy to apply CSS rules to XML documents. You simply change the names of the tags you’re applying the rules to.

Extensible Style Language

The Extensible Style Language (XSL) is a more advanced style-sheet language specifically designed for use with XML documents. XSL documents are themselves well-formed XML documents.

XSL documents contain a series of rules that apply to particular patterns of XML elements. An XSL processor reads an XML document and compares what it sees to the patterns in a style sheet. When a pattern from the XSL style sheet is recognized in the XML document, the rule outputs some combination of text.

XSL style sheets can rearrange and reorder elements. They can hide some elements and display others. Furthermore, they can choose the style to use not just based on the tag, but also on the contents and attributes of the tag, on the position of the tag in the document relative to other elements, and on a variety of other criteria.

URLs and URIs

XML documents can live on the Web, just like HTML and other documents. When they do, they are referred to by Uniform Resource Locators (URLs), just like HTML files.

Although URLs are well understood and well supported, the XML specification uses the more general Uniform Resource Identifier (URI). URIs are a more general architecture for locating resources on the Internet, that focus a little more on the resource and a little less on the location. In theory, a URI can find the closest copy of a mirrored document or locate a document that has been moved from one site to another.

XLinks and XPointers

As long as XML documents are posted on the Internet, you’re going to want to be able to address them and hot link between them. Standard HTML link tags can be used in XML documents, and HTML documents can link to XML documents.

XML lets you go further with XLinks for linking to documents and XPointers for addressing individual parts of a document.

XLinks enable any element to become a link, not just an A element. Furthermore, links can be bi-directional, multidirectional, or even point to multiple mirror sites from which the nearest is selected. XLinks use normal URLs to identify the site they’re linking to.

XPointers enable links to point not just to a particular document at a particular location, but to a particular part of a particular document. An XPointer can refer to a particular element of a document, to the first, the second, or the 17th such element, to the first element that’s a child of a given element, and so on. XPointers provide extremely powerful connections between documents that do not require the targeted document to contain additional markup just so its individual pieces can be linked to it. XPointers don’t just refer to a point in a document. They can point to ranges or spans.

How the Technologies Fit Together

XML defines a grammar for tags you can use to mark up a document. An XML document is marked up with XML tags. The default encoding for XML documents is Unicode.

Among other things, an XML document may contain hypertext links to other documents and resources. These links are created according to the XLink specification. XLinks identify the documents they’re linking to with URIs (in theory) or URLs (in practice). An XLink may further specify the individual part of a document it’s linking to. These parts are addressed via XPointers.

If an XML document is intended to be read by human beings—and not all XML documents are—then a style sheet provides instructions about how individual elements are formatted. The style sheet may be written in any of several style-sheet languages. CSS and XSL are the two most popular style-sheet languages, though there are others including DSSSL—the Document Style Semantics and Specification Language—on which XSL is based.

Using the XML Parser

All parsing is processed using the input stream. This allows XML files to be parsed by INCLUDE, and strings from sockets to be processed by EVALUATE.

The XML parser parses tags "<...>" and the text between them, called the contents. Inside a tag the text is separated into the tag name and the attribute name/value pairs 'name="value"'. Everything is held as text. Nested tags are supported. Three DEFERred words, doTags ( -- ), doContents ( -- ) and doAttribute ( val vlen name nlen -- ) must be supplied by the application to handle the data. These words are documented later. Their default action is to display the data so that you can see what has been processed.

The parser just generates and isolates the text. It is up to your application how the data is processed by the three words above. When a tag is processed, the tag handling routine can find the current tag name, the tag type, any attributes and the preceding contents. The most common way to process tags and data is to ignore the contents before an opening tag, but to handle attributes. At the closing tag, the contents represent the data to be processed. Closing tag names include the leading '/' character so that opening and closing tags can be distinguished by name as well as status.
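As a hedged sketch of supplying application handlers, the code below replaces two of the deferred words. The handler names are hypothetical. The sketch assumes that IS is available for setting DEFERred words and that, when doContents runs, the current content is the top string of the Contents stackpad described later in this chapter.

```forth
\ Hypothetical attribute handler: display as name -> value.
: MyAttribute  \ val vlen name nlen --
  type ."  -> " type cr
;

\ Hypothetical content handler: assumes the content is on top of Contents.
: MyContents  \ --
  Contents stos type cr         \ inspect only; the string is not popped
;

' MyAttribute is doAttribute
' MyContents  is doContents
```

DefXML can be used afterwards to restore the default display handlers.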

Generating XML output

Simple facilities are provided for generating XML text and tags from various types of data. These are designed to allow other scripting tools to generate XML output.

Tools

This section contains general-purpose tools which may be useful in other applications.

1 value .UnknownXML?    \ -- flag
If non-zero (default), show unknown XML tags and attributes.

Strings

: movex         \ src dest len --
An optimised version of MOVE.

: csplit        \ addr len char -- raddr rlen laddr llen
Extract a substring at the start of addr/len, returning the string raddr/rlen which includes char (if found) and the string laddr/llen which contains the text to left of char. If the string does not contain the character, raddr is addr+len and rlen=0.

: #>c           \ caddr u -- char
Converts a decimal or hexadecimal number to a single integer.

In XML white space is defined by tab and CR. Under some circumstances LF may also be treated as white space.

: skip-white    \ caddr u -- caddr' u'
Remove leading white space.

: scan-black    \ caddr u -- caddr' u'
Remove leading spaces and control characters.

: scan-quote    \ caddr u -- caddr' u'
Step forward until either a single or a double quote character is found. The returned string includes the quote character.

: scan-white    \ caddr u -- caddr' u'
Step to next white space character.

: -trailing-white       \ caddr u -- caddr' u'
Remove trailing white space.

: -leading-white        \ caddr u -- caddr' u'
Remove leading white space. A synonym for skip-white.

: -white        \ caddr u -- caddr' u'
Remove leading and trailing white space.

: >bl           \ addr u -- addr u
Convert control characters to spaces.
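The string tools above can be combined to split a "key = value" string, as in the sketch below. The word .KEY=VALUE is hypothetical; the behaviour of CSPLIT and -WHITE is taken from their glossary entries, and /STRING is the standard Forth word.

```forth
\ Hypothetical helper: split at '=' and trim both halves.
: .key=value  \ caddr u --
  [char] = csplit                 \ -- raddr rlen laddr llen
  -white ." key:   [" type ." ]" cr
  dup if  1 /string  then         \ step over the '=' when it was found
  -white ." value: [" type ." ]" cr
;

s"   width = 100  " .key=value
```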

Gregorian calendar

The output formats are:

: date>         \ day month year -- ud ; see month codes
Convert a day/month/year into a Gregorian day number.

1 1 1980 date> 2constant date0  \ -- ud
Defines day 0 as 1 Jan 1980 for dates.

: sdate>        \ day month year -- u
Convert a day/month/year to a single day integer based as above.

: >sdate        \ u -- day month year
Convert a single day integer to day/month/year.
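Because SDATE> produces a single day number, date arithmetic reduces to integer arithmetic. The sketch below is hypothetical code, not part of the library, and assumes that months are given as 1..12, as in the definition of DATE0 above.

```forth
\ Hypothetical helper: number of days from the first date to the second.
: days-between  \ d1 m1 y1 d2 m2 y2 -- n
  sdate> >r                       \ day number of the second date
  sdate> r>                       \ -- u1 u2
  swap -                          \ u2 - u1
;

\ Hypothetical helper: the date n days after a given date.
: date+  \ day month year n -- day' month' year'
  >r sdate> r> + >sdate
;

1 1 2024  1 3 2024 days-between .
```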

Day time

Time of day may be stored as a single integer count of seconds. These routines provide conversion into secs/mins/hours format.

#24 #60 * #60 * constant secs/day       \ -- 86400
Seconds per day.

#60 #60 * constant secs/hr      \ -- 3600
Seconds per hour.

#60 constant secs/min           \ -- 60
Seconds per minute.

#60 constant mins/hr            \ -- 60
Minutes per hour.

#24 constant hrs/day            \ -- 24
Hours per day.

: tod>          \ ss mm hh -- secs
Convert a time of day in ss/mm/hh form to a single integer.

: >tod          \ secs -- ss mm hh
Convert a seconds integer to ss/mm/hh form.
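A short sketch of the conversions follows. The display words .2D and .TOD are hypothetical; only TOD> and >TOD come from the library.

```forth
\ Hypothetical helper: display u as two digits with a leading zero.
: .2d  \ u --
  0 <# # # #> type
;

\ Hypothetical helper: display a seconds count as hh:mm:ss.
: .tod  \ secs --
  >tod                            \ -- ss mm hh
  .2d ." :" .2d ." :" .2d
;

#30 #15 #9 tod> .tod              \ 30 seconds, 15 minutes, 9 hours
```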

Stackpads

Stackpads are effectively string stacks. String lengths are kept as cells. Stackpads can be in static (ALLOTed) or dynamic (ALLOCATEd) memory. A stackpad must be initialised by SINIT before use and terminated by STERM after use. In this implementation, defined stackpads are initialised at COLD and terminated at BYE.

Strings on a stackpad are held in the following format, where u is the length of the string in bytes:


len   contents
 u    string text
 ?    padding to cell boundary
 cell u

The stackpad's top of stack pointer points to the length cell of the top item. To provide a valid cell, a zero length item is always created when the stackpad is initialised. Because the length cell is after the text, it is easy to manipulate the end of a string, to find the start address and to discard a string.

The requirement to align the length cell adds a little complexity, but permits portability to processors which require data alignment, e.g. ARM, and improves speed on PCs. Stackpads are controlled using the /stackpad structure below. The sp.ptos field contains the stack pointer. The sp.buff field permits underflow checks. The sp.len field permits overflow checks. The other fields allow for automatic instantiation and termination of dynamically allocated stackpads. Implementations without error checking only need the stack pointer and could use the first cell of the buffer as the stack pointer.

struct /stackpad        \ -- len
Structure defining a stackpad.

variable spChain        \ -- addr
Anchors the linked list of defined stackpads.

: sSpad:        \ len -- ; -- spad
Create a static stackpad with ALLOTed control area and data buffer.

: mSpad:        \ len -- ; -- spad
Create a mixed stackpad with an ALLOTed control area and an ALLOCATEd buffer.

: newSpad       \ len -- spad
Create a dynamic stackpad with ALLOCATEd control area and data buffer. A THROW occurs if the memory cannot be allocated.

: sinit         \ spad --
Initialise a stackpad. A THROW occurs on error.

: sterm         \ spad --
Release dynamic memory if the given stackpad has it.

: initSpads     \ --
Initialise all defined stackpads. Performed at COLD.

: termSpads     \ --
Clean up all defined stackpads, releasing any dynamically allocated memory. Performed at BYE.

: -align        \ caddr -- addr'
Align a byte address to the previous cell boundary. N.B. This word assumes a byte addressed 32 bit Forth.

: >spstr        \ lp -- caddr u
Given a pointer to a length cell, return the string.

: >sps          \ lp -- caddr
Given a pointer to a length cell, find the start of the string.

: >spe          \ lp -- caddr
Given a pointer to the length cell, find the address of the character after the string.

: spush         \ caddr u spad --
Push a string onto the stackpad.

: stos          \ spad -- caddr u
Return the address and length of the top string. The string is not popped.

: sdrop         \ spad --
Discard top string from stackpad.

: spop          \ spad -- caddr u
Return the address and length of the top string. The string is popped. Note that the stackpad cannot safely be used until all further processing of the string has been performed.

: snew          \ spad --
Add a zero-length string.

: sappend       \ caddr u spad --
Add the given string to the top stackpad string.

: s+char        \ char spad --
Add the given character to the top stackpad string.

: .spad         \ spad --
Display the strings on a stackpad.
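The sketch below exercises the main stackpad operations on a static stackpad. The names DEMOPAD and PAD-DEMO are hypothetical. SINIT is called explicitly on the assumption that a stackpad defined after COLD has not yet been initialised.

```forth
#1024 sSpad: DemoPad              \ hypothetical static stackpad

: pad-demo  \ --
  DemoPad sinit                   \ initialise before first use
  s" Hello"    DemoPad spush      \ push a new string
  s" , world"  DemoPad sappend    \ extend the top string
  [char] !     DemoPad s+char     \ add one character to the top string
  DemoPad stos type cr            \ display the top string without popping it
  DemoPad sdrop                   \ discard it
  DemoPad sterm                   \ nothing to release for a static stackpad
;

pad-demo
```

STOS followed by SDROP is used here rather than SPOP, so that the string is fully processed before the stackpad changes.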

Servants

Servants are a solution to CASE statements involving strings. A wordlist is defined to hold the actions required when a string is matched, the word names forming the strings to be matched. A default action must be specified. Note that in MPE Forths, the name search is case insensitive. Note also that without extensions to the word creation mechanism, the Because the strings are isolated in wordlists, calls may be nested.

: (Servant)     \ i*x caddr u wid xt -- j*x
Looks up caddr/u in the wid wordlist. If the word is found, it is executed. If the word is not found, the caddr/u string is passed to the default action xt which is executed.

: servant       \ wid xt -- ; i*x caddr u -- j*x
Servant creates a word that looks up caddr/u in a given wordlist and executes the matching word if found or a default word if not found. Servant is supplied with the wid of the wordlist and the xt of the default action.

: creation      \ wid --
Perform CREATE, but define the word in the specified wordlist.

: def:          \ wid --
Perform :, but define the word in the specified wordlist.
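As an illustration, the sketch below uses a servant in place of a CASE statement on colour names. The wordlist COLOURS and all the words defined here are hypothetical; SERVANT and DEF: are used according to their stack comments above.

```forth
wordlist constant colours         \ hypothetical wordlist holding the actions

: unknown-colour  \ caddr u -- ; default action receives the unmatched string
  ." unknown colour: " type
;

colours def: red    ." #FF0000" ;
colours def: green  ." #00FF00" ;
colours def: blue   ." #0000FF" ;

colours ' unknown-colour servant >rgb   \ caddr u --

: rgb-demo  \ --
  s" green" >rgb cr
  s" mauve" >rgb cr
;

rgb-demo
```

Because the action words live in their own wordlist, names such as RED do not clash with words in the normal search order.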

XML input parser

Required data and structures

cell +user CurrSpad     \ -- addr
Holds the address of the stackpad being used for output.

cell +user RefillStatus \ -- addr
Holds non-zero when REFILL has failed.

#32 kb mSpad: TagText   \ -- spad
Stackpad for tag text <tag ....>.

#32 kb mSpad: Contents  \ -- spad
Stackpad for everything not in a tag.

#32 kb mSpad: Attribs   \ -- spad
Stackpad for attribute handling in tags.

XML entities

In XML code the special characters and numbers are encoded in the form:

  &xxx;

This code allows substitution of the original character.

: UnknownEntity \ caddr u --
The default action is to check for a number, and if that fails just to pass the string to the output buffer. Note that the string includes the leading '&' but not the trailing ';'.

wordlist constant entity?       \ -- wid
The private wordlist used to contain action words for known entities.

: centity       \ char -- ; --
Children of this defining word add a character to the current stackpad. The words are used by the servant DENT below.

The following standard XML entities are predefined:

char < centity &LT
char > centity &GT
char ' centity &APOS
char " centity &QUOT
char & centity &AMP

entity? ' UnknownEntity servant dent    \ caddr u --
A servant which converts known entities and XML numbers of the form &#xxx; to characters or just copies the string to the current stackpad.

: dents+        \ caddr u --
Add the string to the top of the current stackpad, decoding and translating any entities.
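The sketch below decodes a string containing an entity. It relies on withText and doneText, which are described in the XML Parser core section, to make Contents the current stackpad. It assumes the stackpad has already been initialised; the word DECODE-DEMO is hypothetical.

```forth
\ Hypothetical demonstration of entity decoding.
: decode-demo  \ --
  Contents withText               \ -- oldspad ; new string on Contents
  s" a &lt; b &amp; c" dents+     \ add text, translating the entities
  Contents stos type cr           \ display the decoded text
  doneText                        \ discard the string, restore old stackpad
;

decode-demo
```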

Tag input

: .Tag          \ --
Default action of doTags below.

: .Contents     \ --
Default action of doContents below.

: .Attribute    \ val vlen name nlen --
Display the attribute name and value strings.

defer doTags            \ --
User defined action (default .Tag) that handles tag strings. The tag handlers are responsible for all handling of the contents stackpad. The top string on the TagText stackpad is discarded after processing the tag text.

defer doContents        \ --
User defined action (default .Contents) that handles content strings. The contents stackpad is not discarded by doContents.

defer doAttribute       \ val vlen name nlen --
Process an attribute given strings for the value and name. The default action is to display the attribute.

: DefXML        \ --
Set the default XML handlers.

vocabulary inputTags    \ --
Vocabulary containing tag actions on input.

' inputTags voc>wid constant widInputs  \ -- wid
Wordlist containing tag actions on input.

vocabulary outputTags   \ --
Vocabulary containing tag actions on output.

' outputTags voc>wid constant widOutputs        \ -- wid
Wordlist containing tag actions on output.

#256 buffer: CurrName   \ -- addr
Buffer for the current tag name. Held as a counted string. For multi-threaded use this should be redefined as thread-local storage.

#256 buffer: LastName
Buffer for the previous tag name. Held as a counted string. For multi-threaded use this should be redefined as thread-local storage.

variable TagStatus
Status indicator for the current tag. For multi-threaded use this should be redefined as thread-local storage. The tag status is a bit mask in the bottom 16 bits of a cell. The upper 16 bits are reserved for application use.

  $0000 equ OPENING_TAG
  $0001 equ CLOSING_TAG
  $0002 equ EMPTY_TAG
  $0100 equ PI_TAG
  $0200 equ SPECIAL_TAG

variable LastStatus
Status indicator for the previous tag. For multi-threaded use this should be redefined as thread-local storage.

: defInputTag   \ caddr u --
The default action for an unknown tag is to display the content and tag strings.

widInputs ' defInputTag servant doInputTag      \ caddr u --
Processes input tags given a tag name string.

: getTagName    \ caddr u -- caddr' u' name nlen
From the given string, return the remaining string and the tag name, which is the first whitespace delimited token. Note that tag names include leading '?' and '!' characters.

: getAttribName \ caddr u -- caddr' u' name nlen
From the given string, return the remaining string and the attribute name, which is the first whitespace delimited token before an '=' character.

: getAttribValue        \ caddr u -- caddr' u' value vlen
From the given string, return the remaining string and the attribute value string, which is enclosed by quotation marks ' or ".

: getAttribute  \ caddr u -- caddr' u'
From the given string extract an attribute name/value pair, pass it to the deferred word doAttribute and return the remaining string. Attributes are of the form:

  name = "value"

: SetTagStatus  \ --
Set the tag status for opening/closing/empty, and for processing instruction and specials (the !xxx tags).

: doTagText     \ caddr u --
Parse the tag text <text...> excluding the brackets, extracting the tag name and the attributes.

: RunInputTag   \ --
The tag handler action of doTags for active processing of XML tags.

: ActiveXML     \ --
Set the active XML handlers, so that known tags are processed.

XML Parser core

: AsFarAs          \ char -- flag caddr u
Parse input stream up to char, returning the extracted string.

: withText      \ newspad -- oldspad
Start a new string on the given stackpad for a block of processing and make it the current stackpad. Return the previous current stackpad.

: doneText      \ oldspad --
Discard the current stackpad string and restore the previous stackpad.

: doXMLblock    \ char --
Collect input text up to the terminating character into the current stackpad, and expand entities.

: skipPast    \ c-addr u --
Step through the input stream for a string (not space delimited), REFILLing as necessary until the string is found or input is exhausted.

: doTagBlock    \ x --
Process a tag block "<name ... >" starting immediately after the leading '<' character. The tag text is discarded after the tag has been processed. If x is non-zero, the tag is initialised to "?xml".

: doContentBlock        \ --
Process a content block up to but not including the trailing '<' character.

: ReadXML       \ --
Read XML from the current input stream.

: <?xml         \ --
After <?xml has been executed, all further input is treated as XML source and handled by the XML parser.

Data content input and output

These words are factors that can be used when constructing systems that extract and produce data in XML files. When producing an XML file, data is output by primitives that take the address of the data. When reading an XML file, data is set by primitives that take a string and the address of the data.

XML text output

XML text output of tag or content data must not contain the special characters themselves. These must be converted to the standard entity format "&xxx;".

: XMLemit       \ char --
Output a character translating the special characters.

: XMLtype       \ caddr len --
Output a string translating the special characters.
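
The translation can be pictured with the following minimal sketch in standard Forth. It is not the library source: the names escEmit and escType are hypothetical, and it assumes that the special characters are the five predefined XML entities.

```forth
\ Sketch only: hypothetical stand-ins for XMLemit and XMLtype.
: escEmit       \ char --
  case
    [char] < of  ." &lt;"    endof
    [char] > of  ." &gt;"    endof
    [char] & of  ." &amp;"   endof
    [char] " of  ." &quot;"  endof
    [char] ' of  ." &apos;"  endof
    dup emit                 \ any other character is output unchanged
  endcase ;

: escType       \ caddr len --
  over + swap ?do  i c@ escEmit  loop ;
```

For example, s" a<b & c" escType displays a&lt;b &amp; c.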

Single and double integers

: ud#>cl        \ ud -- caddr len
Convert an unsigned double to a decimal text string.

: d#>cl         \ d -- caddr len
Convert a signed double to a decimal text string.

: cl>d#         \ caddr len -- d
Convert the string to a double number.

: cl>ud#        \ caddr len -- ud
Convert the string to an unsigned double number.
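
For the output direction, conversions of this kind are commonly built on pictured numeric output. The sketch below uses only standard Forth words. The names ud>text and d>text are hypothetical, and the library words may be implemented differently.

```forth
\ Sketch only: hypothetical equivalents of ud#>cl and d#>cl.
: ud>text       \ ud -- caddr len
  base @ >r  decimal
  <# #s #>
  r> base ! ;

: d>text        \ d -- caddr len
  base @ >r  decimal
  tuck dabs <# #s rot sign #>   \ keep the high cell for SIGN
  r> base ! ;
```

For example, -42. d>text type displays -42. The returned string lives in the transient pictured numeric output buffer, so it should be used or copied before further number conversion.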

: ?i            \ addr --
Display the contents of a signed 32 bit integer.

: !i            \ caddr len dest --
Set the contents of a signed 32 bit integer.

: ?ui           \ addr --
Display the contents of an unsigned 32 bit integer.

: !ui           \ caddr len dest --
Set the contents of an unsigned 32 bit integer.

: ?d            \ addr --
Display the contents of a signed 64 bit integer in Forth format (high cell at low address).

: !d            \ caddr len dest --
Set the contents of a signed 64 bit integer in Forth format (high cell at low address).

: ?ud           \ addr --
Display the contents of an unsigned 64 bit integer in Forth format (high cell at low address).

: !ud           \ caddr len dest --
Set the contents of an unsigned 64 bit integer in Forth format (high cell at low address).

: ?dI           \ addr --
Display the contents of a signed 64 bit integer in Intel format (low cell at low address).

: !dI           \ caddr len dest --
Set the contents of a signed 64 bit integer in Intel format (low cell at low address).

: ?udI          \ addr --
Display the contents of an unsigned 64 bit integer in Intel format (low cell at low address).

: !udI          \ caddr len dest --
Set the contents of an unsigned 64 bit integer in Intel format (low cell at low address).
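
The difference between the two layouts can be sketched in standard Forth. The standard words 2@ and 2! use the Forth layout, with the high cell of a double at the low address. The names d@-intel and d!-intel below are hypothetical helpers for the Intel layout, and a double is 64 bits only on a system with 32 bit cells.

```forth
\ Sketch only: hypothetical helpers for the Intel layout.
: d@-intel      \ addr -- d ; low cell at addr, high cell at addr+cell
  dup @ swap cell+ @ ;

: d!-intel      \ d addr -- ; low cell at addr, high cell at addr+cell
  tuck cell+ ! ! ;

create dbuf  2 cells allot
$1122 $3344 dbuf d!-intel    \ low cell $1122 is stored at dbuf
$1122 $3344 dbuf 2!          \ Forth layout: high cell $3344 is stored at dbuf
```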

Floating point numbers

: cl>f#         \ caddr u -- ; F: -- f
Convert a string to a floating point number. If a conversion fault occurs, f is set to zero.
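
A minimal sketch of this behaviour can be written with the standard word >FLOAT, assuming the floating point word set is loaded. The name text>f is hypothetical and the library word may differ in detail.

```forth
\ Sketch only: hypothetical equivalent of cl>f#.
: text>f        \ caddr u -- ; F: -- f
  >float 0= if  0E  then ;   \ on a conversion fault, return zero
```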

: ?fs           \ addr --
Display the contents of a 32 bit float.

: !fs           \ caddr u dest --
Set the contents of a 32 bit float.

: ?fd           \ addr --
Display the contents of a 64 bit float.

: !fd           \ caddr u dest --
Set the contents of a 64 bit float.

: ?ft           \ addr --
Display the contents of an 80 bit float.

: !ft           \ caddr u dest --
Set the contents of an 80 bit float.

Strings

: .string       \ caddr len --
Output the given string in XML format.

: ?cstring      \ caddr --
Output a Forth counted string.

: !cstring      \ caddr len dest --
Set a Forth counted string.

: ?wstring      \ caddr --
Output a word (16 bits) counted string.

: !wstring      \ caddr len dest --
Set a word (16 bits) counted string.

: ?lstring      \ caddr --
Output a cell (32 bits) counted string.

: !lstring      \ caddr len dest --
Set a cell (32 bits) counted string.

Time and date

: .xuw          \ u w --
Display the unsigned number u as w digits.

: .xdate        \ day month year --
Output a date in XML format "CCYY-MM-DD".

: .xtime        \ secs mins hours --
Output a time in XML format "HH:MM:SS".

: .xdateTime    \ secs mins hours day month year --
Output a date/time in XML format. No time zone is output.

: .tz           \ mins --
Output a time zone indicator as an offset from UTC in minutes.

: xdt-utc       \  secs mins hours day month year --
Output a date/time in XML format. UTC is indicated.

: xdt-zone      \  secs mins hours day month year zmins --
Output a date/time in XML format. The time zone is indicated by a signed offset in minutes.
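
Fixed width fields such as those used by .xuw and .xdate can be produced with pictured numeric output. The sketch below uses only standard Forth words and assumes that BASE is decimal. The names .nw and .ymd are hypothetical.

```forth
\ Sketch only: hypothetical equivalents of .xuw and .xdate.
: .nw           \ u w -- ; display u as exactly w digits
  >r 0 <#  r> 0 ?do  #  loop  #> type ;

: .ymd          \ day month year -- ; display as CCYY-MM-DD
  4 .nw  [char] - emit
  2 .nw  [char] - emit
  2 .nw ;
```

For example, 9 3 2024 .ymd displays 2024-03-09. If u needs more than w digits, the high order digits are lost.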

Tag output

: .GenTag       \ caddr len --
Display the text as a tag "<...>". Standard entities are encoded.

: .GenTag+      \ attr alen name nlen --
Display attribute and tag name text as "<name attr>". Standard entities are encoded.

: .ClosingTag   \ caddr len --
Display the text as a closing tag "</...>". Standard entities are encoded.

: .EmptyTag     \ caddr len --
Display the text as an empty tag "<.../>". Standard entities are encoded.
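
Going by the stack comments above, a complete element might be output as in the following sketch, which assumes that the XML library is loaded. The name .element is hypothetical.

```forth
\ Sketch only: a hypothetical helper built on the words above.
: .element      \ content clen name nlen --
  2dup .GenTag          \ opening tag <name>
  2swap XMLtype         \ content, special characters translated
  .ClosingTag ;         \ closing tag </name>
```

With this definition, s" 42" s" count" .element is intended to output the element count with content 42.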

Test code

initSpads
ActiveXML

Configuration files

Application configuration can be done in a number of ways, especially under Windows.

  • Registry: a user nightmare to copy from one machine to another.
  • INI files: very slow for large configurations (before mpeparser.dll).
  • Binary: usually incompatible between versions.
  • Database: big and often similar to binary.
  • Forth: already there, needs changes to the interpreter. Independent of operating system.

A solution to this problem is available in Lib/ConfigTools.fth. Before compiling the file, ensure that the file GenIO device from Lib/Genio/FILE.FTH has been compiled.

The Forth interpreter is already available, but we have to consider how to handle incompatibilities between configuration files and issue versions of applications. The two basic solutions are:

  • Abort on error
  • Ignore on error

The abort on error solution is already available. It just requires the caller of INCLUDED to provide some additional clean up code.

    
    : CfgIncluded   \ caddr len --
      -source-files            \ don't add source file names
      ['] included catch
      if  2drop  endif         \ clean stack on error
      +source-files            \ restore source action
    ;
    

    In VFX Forth, INTERPRET is used to process lines of input. INTERPRET is DEFERred and the default action is (INTERPRET). The maximum line size (including CR/LF) is FILETIBSZ, which is currently 512 bytes. If we restrict each configuration unit to one line of source code, we can protect the system by ignoring the line if an error occurs. We also have to introduce the convention in configuration files that actions are performed by the last word on the line (except for any parsing). This action has to be installed and removed, leading to the following code.

    
    : CfgInterp     \ --
    \ Interprets a line, discarding it on error.
      ['] (interpret) catch
      if  postpone \  endif
    ;
    
    : CfgIncluded   \ caddr len --
    \ Interprets a file, discarding lines with errors.
      -source-files                 \ don't add source file names
      behavior interpret >r
      ['] CfgInterp is interpret
      ['] included catch
      if  2drop  endif              \ clean stack on error
      r> is interpret
      +source-files                 \ restore source action
    ;
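
A configuration file that follows the one line convention might look like the sketch below. The names MyVar, MyDelay, MyName and ObsoleteVar are hypothetical. The first three are assumed to be defined by the application before the file is loaded.

```forth
\ app.cfg - one configuration unit per line, action word last
$1234 MyVar !
$03E8 to MyDelay
s\" Fred" MyName place
$55 ObsoleteVar !       \ unknown name in this version: the line is discarded
```

Because the action is the last word on each line, a line that fails has changed nothing by the time the error occurs, and loading the file with CfgIncluded carries on with the next line.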
    

    Loading and saving configuration files

    : CfgInterp     \ --
    A protected version of (INTERPRET) which discards any line that causes an error.

    : CfgIncluded   \ caddr len --
    A protected version of INCLUDED which discards any line that causes an error, and carries on through the source file.

    : [SaveConfig   \ caddr len -- struct|0
    Starts saving a configuration file. Creates a configuration file and allocates required resources, returning a structure on success or zero on error. On success, the returned struct contains the sid for the file at the start of struct.

    : SaveConfig]   \ struct --
    Ends saving a file device by closing the file, releasing resources and restoring the previous output device.

    : SaveConfig    \ caddr len xt --
    Save the configuration file, using xt to generate the text using TYPE and friends. The word defined by xt must have no stack effect.
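
A minimal use of SaveConfig might look like the following sketch, which assumes that Lib/ConfigTools.fth is loaded. The names MyVar, GenCfg and SaveMyCfg and the file name app.cfg are hypothetical.

```forth
\ Sketch only: hypothetical application code.
variable MyVar

: GenCfg        \ -- ; no stack effect, as SaveConfig requires
  ." \ generated configuration" cr
  MyVar @ .sint  ." MyVar !" cr ;    \ one unit per line, action last

: SaveMyCfg     \ --
  s" app.cfg" ['] GenCfg SaveConfig ;
```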

    Loading and saving data

    We chose to support five types of configuration data:

  • Single integers at given addresses. This copes with variables directly and values with addr.
  • Double integers at given addresses.
  • Counted strings
  • Zero terminated strings
  • Memory blocks.

    All numeric output is done in hexadecimal to save space, and to avoid problems with BASE overrides. All words which generate configuration information must be used in colon definitions.

    : \Emit         \ char --
    Output a printable character in its escaped form.

    : \Type         \ caddr len --
    Output a printable string in its escaped form.

    : .cfg$         \ caddr len --
    Output a string in its escaped form, characters in the escape table being converted to their escaped form. The string is output as Forth source text, e.g.

      s\" escaped text\n\n"

    : .sint         \ x --
    Output x as a hex number with a leading '$' and a trailing space, e.g.

      $1234:ABCD

    Single Integers

    Single integers are saved by .SIntVar and .SIntVal.

    ' (SintVar) SimpleCfg: .SIntVar \ "<name>" --
    Saves a single integer as a string. <name> must be a Forth word that returns a valid address. Generates

     $abcd <name> !

    Use in the form:

     .SIntVar MyVar

    ' (SintVal) SimpleCfg: .SIntVal \ "<name>" --
    Saves a VALUE called <name>. Generates

     $abcd to <name>

    Use in the form:

     .SIntVal MyVal

    Double Integers

    Double integers are saved by .DIntVar.

    ' (DintVar) SimpleCfg: .DIntVar \ "<name>" --
    Saves a double integer as a string. <name> must be a Forth word that returns a valid address. Generates

     $01234 $abcd <name> 2!

    Use in the form:

     .DIntVar MyVar

    Counted strings

    Counted strings are saved by .C$var.

    ' (c$cfg) SimpleCfg: .C$var     \ "<name>" --
    Saves a counted string. <name> must be a Forth word that returns a valid address. Generates

     s\" <text>" <name> place

    Use in the form:

     .C$Var MyCstring

    Zero terminated strings

    Zero terminated strings are saved by .Z$var.

    ' (z$cfg) SimpleCfg: .Z$var     \ "<name>" --
    Saves a zero terminated string at <name> which must be a Forth word that returns a valid address. The output consists of one or more lines of source code, following lines being appended to the first.

     s\" <text>" <name> zplace
     s\" <more text>" <name> zAppend
     ...

    Use in the form:

     .Z$var MyZstring

    Memory blocks

    Memory blocks are output by

      .Mem <name> len

    <Name> must be a Forth word that returns a valid address. Len must be a constant or a number. The output takes one of three forms, depending on len.

      bmem <name> num  $ab $cd ...
      wmem <name> num  $abcd $1234 ...
      lmem <name> num  $1234:5678 $90ab:cdef ...
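
Putting the saving words together, a generator for a small application might look like the following sketch. It assumes that Lib/ConfigTools.fth is loaded. All the names defined here are hypothetical, as is the file name app.cfg.

```forth
\ Sketch only: hypothetical application data and generator.
variable MyVar
0 value MyVal
create MyName   32 allot   s" none" MyName place   \ counted string
create MyTable  16 allot   MyTable 16 erase        \ memory block

: GenAppCfg     \ -- ; the saving words are used inside a colon definition
  .SIntVar MyVar
  .SIntVal MyVal
  .C$var MyName
  .Mem MyTable 16 ;

: SaveAppCfg    \ --
  s" app.cfg" ['] GenAppCfg SaveConfig ;
```

The saved file can later be reloaded with s" app.cfg" CfgIncluded, provided that the same names are defined.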

    : BMEM          \ "<name>" "len" --
    Imports a memory block output in byte units by .Mem.

    : WMEM          \ "<name>" "len" --
    Imports a memory block output in word (2 byte) units by .Mem.

    : LMEM          \ "<name>" "len" --
    Imports a memory block output in cell (4 byte) units by .Mem.