I did this for the TMS34010 graphics processor many years ago, but my notes are long gone. I was inspired by Mach 2 Forth, an amazing compiler for the early Mac. My compiler had local variables and tail-call detection to take advantage of a special "one-deep" stack register on the chip.
The key point is that Forth can easily incorporate an RPN assembler which you can write in Forth, and with which you can then build a better Forth. I'm not sure if I even started with the official TI assembler for the chip; I probably first wrote a cross assembler in Mach 2 by transcribing the ISA manual and bootstrapped from that.
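To illustrate why an RPN assembler is so easy to bolt onto a stack language, here is a rough sketch in Python (not Forth, and not what I actually wrote). Operands are pushed onto a stack first, and each mnemonic is a "word" that pops its operands and emits one instruction. The `two_register_op` helper plays the role of a Forth defining word that stamps out mnemonic words. Everything here is hypothetical: the register names `R0`..`R15`, the mnemonics, and the 16-bit encoding are a made-up toy ISA, not the real TMS34010 instruction format.

```python
# Hypothetical toy encoding (NOT the real TMS34010 format):
# opcode in bits 15..8, source register in bits 7..4, destination register in bits 3..0.

def two_register_op(opcode):
    """Analogue of a Forth defining word: returns a word that assembles one instruction."""
    def word(stack, code):
        if len(stack) < 2:
            raise ValueError("stack underflow: instruction needs two operands")
        dst = stack.pop()  # last operand pushed is the destination
        src = stack.pop()
        code.append((opcode << 8) | (src << 4) | dst)
    return word

# The "dictionary" of assembler words, built with the defining word above.
WORDS = {
    "MOVE": two_register_op(0x01),
    "ADD": two_register_op(0x02),
    "SUB": two_register_op(0x03),
}

def assemble(source):
    """Assemble RPN-style source: operands first, then the mnemonic that consumes them."""
    stack = []  # operand stack, standing in for Forth's data stack
    code = []   # emitted 16-bit instruction words
    for token in source.split():
        if token in WORDS:
            WORDS[token](stack, code)
        elif token[0] == "R" and token[1:].isdigit() and int(token[1:]) < 16:
            stack.append(int(token[1:]))  # register operand: push its number
        else:
            raise ValueError(f"unknown token: {token}")
    if stack:
        raise ValueError("leftover operands on the stack")
    return code

print([hex(w) for w in assemble("R1 R2 MOVE R3 R2 ADD")])  # ['0x112', '0x232']
```

In real Forth the same idea is even shorter, because the outer interpreter already does the tokenizing, number parsing, and dictionary lookup that `assemble` has to do by hand here.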