David Smith 2022 firstname.lastname@example.org
SmithForth is an implementation of the Forth programming language for x86-64 desktop computers. SmithForth is a text interpreter that runs in a Linux text console.
You can use SmithForth as you would use any other standard Forth system. SmithForth follows the the Forth standard of 2012. The Forth standard describes some optional word sets concerning features like floating-point arithmetic and dynamic memory allocation that most programmers today would regard as standard, but the Forth community considers optional, perhaps because Forth runs on modest machines like microcontrollers. SmithForth does not yet have floating-point arithmetic, local variables, or file access.
When I consider using a programming environment, I expect it to work on my hardware, and I expect it to be concrete and simple as possible. Good luck finding tools that meet this standard. So I wrote SmithForth. Please take a look at the source code and the comments. SmithForth is implemented in the subroutine-threaded style, which I think is the most straightforward way. The Intel manual is our source of information on the x86 architecture.
We use none of the usual tools from the world of C, not even an assembler. SmithForth is implemented in two source files:
Machine code is converted from a human-readable format "DMP" to binary by xxd (hex dump) -r (reverse).
$ cut -d'#' -f1 SForth.dmp | xxd -p -r > SForth $ cat system.fs >> SForthThe Forth source text is grafted onto the binary.
Grant permission to execute the file on the Linux system.
$ chmod +x SForthYou should now have a working executable (if your system is similar enough to mine).
(Optional:) Advanced users might edit the kernel or system.fs. You must ensure that the ELF segment header entry p_filesz contains the number of bytes of the segment that appear in binary file SForth. In my design, that's the whole file, including ELF header, machine code, and system Forth. We prefer not to count those bytes by hand. An easy solution is:
My script make.sh does these steps automatically:
$ ./make.sh # if and only if you modified SForth.dmp or system.fs
You can use SmithForth interactively, or you can feed your program into the standard input stream.
$ ./SForth $ ./SForth < YourProgram.fs $ cat YourProgram.fs - | ./SForthThe last command allows you to use SmithForth interactively after your programs.
0, 1, 2, ..., 9, A, B, C, D, E, F, 10, 11, ..., 1F, 20, ..., FF, 100, 101, ...
00, 01, 02, ..., FE, FF.
3E hexadecimal = 0011 1110 binary,as 3 = 0011 and E = 1110. You may see bits of a binary number grouped in various ways. For example, x86 instruction "sub r/m32, r32" has a ModR/M byte written:
11 111 000.Bits 11 select an access mode for argument r/m32, and bits 000 select register EAX for that argument. The middle three bits 111 select register EDI. To convert this into hexadecimal, we regroup the eight bits:
1111 1000 binary = F8 hexadecimal (as 1111 = F and 1000 = 8).
The x86 architecture is little-endian in bytes (that is, little-endian in base 28) ...
When an integer is moved from a CPU register into memory, for example, the least significant byte appears first in memory.... but each byte is written as a big-endian pair of hexadecimal numerals. This custom is observed:
There are infinitely many integers, but digital machines have finitely many states. Integers processed by the Arithmetic Logic Unit (ALU) of the machine are more aptly called congruence classes, as the ALU performs modular arithmetic for addition, subtraction, and multiplication, where the modulus of the congruence relation is a power of 2, like 28. However, some operations involve a particular representative of a congruence class.
Two rules are commonly used to select a representative. They are:
0, 1, 2, ..., modulus - 1, and
0, 1, ..., modulus/2 - 1, (nonnegative) and
-1, -2, ..., modulus/2 (negative);but the machine has no "minus sign," so instead we use
modulus - 1, modulus - 2, ..., modulus/2 (negative).
For example, if the modulus is 28, the signed representatives are
00, 01, ..., 7F, (nonnegative) FF, FE, ..., 80 (negative).We can tell negative from nonnegative by the most significant bit.
Compiling in Forth is simple. The compiler sees a word W, looks it up in the dictionary, and appends to the body of the current dictionary entry a CALL instruction. The target of the CALL is the address of the body of the dictionary entry of word W.
CALL facilitates the use of subroutines, which normally return control to the point in the instruction stream where CALL occurred. There are other forms of flow control, mainly conditionals and loops.
: X ... flag IF ... ( do this if flag is nonzero ) ... THEN ... ; \ / -->-- ( skip to THEN if flag is zero ) ---->---
Forth has words IF and THEN for conditional execution. They are used in compilation mode. Normally IF is paired with an instance of THEN later in a word definition. When the interpreter reaches IF at run time, it examines the number at the top of the stack. If the number is nonzero, execution continues past IF. If the number is zero, execution jumps past THEN.
The microprocessor has flow-control features including jumps and conditional jumps. The Forth compiler, when it sees IF, might emit a conditional jump. A likely instruction of this kind in x86 is JZ, jump if zero. JZ has a parameter that indicates where to jump to, past THEN in this case. However, when the compiler first sees IF, it has not yet seen any THEN. So the plan is:
Forth implements this plan by using the data stack during compilation:
Forth offers several ways to write loops. One is BEGIN-WHILE-REPEAT.
--------------------------------<--------------------------- / \ : X ... BEGIN ... flag WHILE ... ( do this if flag is nonzero ) ... REPEAT ... ; \ / --->-- ( skip to REPEAT if flag is zero ) ---->----
Here are my opinions and motivations for this project. Forth is a simple programming language that achieved some fame back when personal computers were young. What else would run on such modest computers? Nowadays computers and languages are fancy and complex. I say Forth is still a good choice. Forth is simple enough that we may reason about it and understand it. Forth provides:
A good system helps us to solve our problems quickly. Some systems try to be comprehensive. Others keep it simple. SmithForth is small and flat enough that a programmer can understand it all. SmithForth has only a couple files. We use none of the usual tools from the world of C. You might argue that C makes programming simpler by insulating us from the workings of the machine. I might agree if:
I can imagine a system that contains its own documentation and C source code and tailors these to the computer on which it is installed in such a way that the user can study them and be unconcerned with other computers. No one does this, as far as I know.
Forth has been implemented many times. Some Forths have a small kernel so that the system can be ported easily to other architectures (by implementing only a small number of kernel words in machine code). But if you want the system to run well, you have to write more machine code anyway, don't you? My goal with SmithForth is not to stop writing machine code early, but to start writing Forth early. In SmithForth, our early Forth contains a lot of machine code.
I believe SmithForth is a good first Forth to learn and a good first programming environment. The shortened presentation may appeal to mathematicians.
If you want to know my ideas for the direction of this project, I would like to write extensions so that SmithForth becomes fast and suitable for scientific computing; and to add enough features of Lisp, Python, Awk, and Tcl that I wouldn't think to use those languages. I would like to adapt it to other machines and make it usable without an underlying operating system like Linux.