I wrote several Forth compilers when learning to code as a teenager: one using the A86 assembler, and two fully bootstrapped from assembler, for the 8086 and the 386. The 8086 source code is here (I didn't write the assembler myself): https://github.com/benhoyt/third (KERNEL.F contains the assembler parts).
I did this for the TMS34010 graphics processor many years ago, but my notes are long gone. I was inspired by Mach 2 Forth, an amazing compiler for the early Mac. My compiler had local variables and tail-call detection to take advantage of a special "one-deep" stack register on the chip.
The key point is that Forth can easily incorporate an RPN assembler which you can write in Forth, and with which you can then build a better Forth. I'm not sure if I even started with the official TI assembler for the chip; I probably first wrote a cross assembler in Mach 2 by transcribing the ISA manual and bootstrapped from that.
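As a rough illustration of what an RPN assembler looks like, here is a toy sketch, with Python standing in for Forth. The operand-first syntax, the word names (`mov#,`, `push,`, `inc,`, `ret,`) and the class are hypothetical, and only four 8086 register-form instructions are covered. In a real Forth each mnemonic would be an ordinary colon definition that pops its operands off the data stack and compiles bytes with `C,`; the trailing comma in the names echoes that convention.

```python
import struct

# Register numbers as encoded in 16-bit 8086 opcodes (AX=0 ... DI=7).
REGS = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}


class RPNAssembler:
    """Toy postfix (Forth-style) assembler for a few 8086 instructions.

    Hypothetical illustration: operands are pushed first, then a mnemonic
    word pops them and appends machine code, as a Forth assembler would.
    """

    def __init__(self):
        self.stack = []          # operand stack, like Forth's data stack
        self.code = bytearray()  # assembled machine code
        # Mnemonic "words": each pops its operands and emits bytes.
        self.words = {
            "mov#,": self.mov_imm,
            "push,": self.push,
            "inc,": self.inc,
            "ret,": self.ret,
        }

    def mov_imm(self):
        # MOV r16, imm16 -> opcode B8+r, then imm16 little-endian.
        reg = self.stack.pop()
        imm = self.stack.pop()
        self.code.append(0xB8 + reg)
        self.code += struct.pack("<H", imm & 0xFFFF)

    def push(self):
        # PUSH r16 -> opcode 50+r.
        self.code.append(0x50 + self.stack.pop())

    def inc(self):
        # INC r16 -> opcode 40+r (16-bit mode).
        self.code.append(0x40 + self.stack.pop())

    def ret(self):
        # Near RET -> opcode C3.
        self.code.append(0xC3)

    def assemble(self, source):
        # Outer interpreter: execute mnemonics, push registers and numbers.
        for token in source.split():
            if token in self.words:
                self.words[token]()
            elif token in REGS:
                self.stack.append(REGS[token])
            else:
                self.stack.append(int(token, 0))  # e.g. 42 or 0x1234
        return bytes(self.code)
```

For example, `RPNAssembler().assemble("0x1234 ax mov#, ax inc, ax push, ret,")` returns the bytes B8 34 12 40 50 C3. Because the mnemonics are just words in the dictionary, macros and control structures can be written in the same language, which is what makes it natural to build a better Forth with such an assembler.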
Do you have any resources handy on how one could do that from scratch?