Writing My Own Assembler

Just completed a little project. Well, little by the standard of some development projects, but big enough to consume something like six months of my time.

I’ve just finished writing an assembler in JavaScript that runs in the browser.

Backstory as follows: Zilog, way back in the 1970s, developed a CPU chip called the Z80, which became immensely popular for several reasons:

[1] It was equipped with what was considered at the time to be a beast of an instruction set;

[2] It was relatively affordable compared to some competitors;

[3] It was fully object-code compatible with the earlier Intel 8080, so it could run CP/M, an operating system that had been written for that chip;

[4] It could be integrated into a wide range of systems with relative ease.

The Z80 thus found its way into a number of popular 8-bit home computers in the 1980s - UK computer enthusiasts from the time will remember, for example, the Sinclair Spectrum and the Amstrad CPC series, while a number of Japanese companies brought out the various MSX computers based on the CPU. The chip also became popular in the world of embedded systems, and there are still quite a few legacy embedded systems running Z80 code today.

We now move forward to 2024. Zilog has decided to pension off the Z80 after a long life - the chip has been in circulation for no less than 46 years. Tens of millions of lines of code have been written for it over that period.

So, is the Z80 now dead?

Not exactly.

Zilog knew that there was a huge base of legacy code out there that still needed support. So, before retiring the venerable Z80 in its original form, they decided to keep the architecture alive by bringing out a modern version - the eZ80.

Now it so happens that the eZ80 is very definitely aimed more at the embedded systems market than at any home computing market, which in any case is pretty much monopolised by the x86 architecture and its 64-bit extension. Likewise, the ARM series of CPUs have pretty much sewn up the mobile phone sector. As a consequence, there’s a range of eZ80 CPU packages, which include not only a beefed-up version of the original Z80 CPU, but also various add-ons of the sort that the embedded market clamours for, such as dedicated high-speed I/O devices on chip, timers, and various other useful bits and bobs.

So, I thought to myself, that since I’ve done some Z80 coding in the past, and Zilog have now brought out this new shiny eZ80 version, I’d have fun writing an assembler for it.

“Fun” turned out to be, well, an epithet with a chequered degree of application to this project. One hurdle that took time to overcome was the development of an expression evaluator, a necessary step to make the assembler genuinely useful, so that it can handle operands consisting of mathematical expressions instead of simple numeric values or register names. JavaScript actually has a built-in function for this purpose, namely eval(), but use of this is very strongly discouraged in development circles, because it’s a massive security hole if used in a project: it executes whatever string it is handed as code, so it can be hijacked by malware with almost embarrassing ease.

So, building an expression evaluator that wasn’t a massive security hole was a priority. That chewed up over a month on its own.

Then came the fun of writing the assembler proper, making it compatible not only with legacy Z80 code, but also with the new, shiny eZ80 and its extensions. For those familiar with the old Z80, these include extending the address space to 16 megabytes, extending the register set to include optional 24-bit register sets, adding new instructions, and allowing the CPU to switch back and forth between legacy Z80 mode and new, shiny eZ80 mode at will.
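
To give a flavour of what that mode switching means for an assembler, here is a minimal sketch of one instruction encoded both ways. It is not taken from the project described above. The function name is hypothetical, and the encoding shown (opcode 0x21 for LD HL with a little-endian immediate, two bytes in Z80 mode and three bytes in the eZ80's 24-bit mode) is stated from memory, so check it against the Zilog manual before relying on it.

```javascript
// Hypothetical helper, for illustration only: encode "LD HL, nn".
// Assumption: opcode 0x21 followed by the immediate value, low byte first.
// Z80 mode uses a 16-bit immediate; the eZ80 24-bit mode uses a 24-bit one.
function encodeLdHlImmediate(value, longMode) {
  const width = longMode ? 3 : 2;        // immediate size in bytes
  const limit = 2 ** (8 * width);        // 65536 or 16777216 (16 megabytes)
  if (!Number.isInteger(value) || value < 0 || value >= limit) {
    throw new RangeError(`Operand does not fit in ${8 * width} bits`);
  }
  const bytes = [0x21];
  for (let i = 0; i < width; i++) {
    bytes.push((value >>> (8 * i)) & 0xff); // emit little-endian bytes
  }
  return bytes;
}

const legacy = encodeLdHlImmediate(0x1234, false);   // [0x21, 0x34, 0x12]
const extended = encodeLdHlImmediate(0x123456, true); // [0x21, 0x56, 0x34, 0x12]
```

The same mnemonic produces a different byte count depending on the current mode, so an assembler has to track the mode to keep addresses and label values correct.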

Zilog, bless their little cotton socks, provide a full manual for the instruction set, allowing anyone wading through it to write their own assembler. But, er, the manual is a bit on the large side. It’s also terse and dense, written with seasoned system developers in mind. Not for the faint-hearted.

But, after various struggles, the final project works. Nearly 20,000 lines of JavaScript code, if you include all the custom support libraries I wrote for other projects, which were also useful here, but the BIG file is the actual assembler itself - a whopping 16,248 lines. Debugging this has taken some time, as you can imagine. But now, it’s finished!

Oh, if you want to download the shiny new eZ80 manual from Zilog, you can find it as a downloadable PDF file here. All 411 pages of it. Then you can have fun imagining the hilarity I was involved in, wading through this to build my assembler.

More on this topic to follow after I’ve taken a break!

Recursive descent parsers are a simple way to deal with such expressions, and they should be quite fast to hack together.
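
For anyone who hasn't met the technique, here is a minimal sketch in JavaScript of a recursive descent evaluator that never touches eval(). It is not the code from the assembler above. The grammar, the function names and the symbol table argument are all assumptions made for illustration, and a real assembler would need more operators and number formats.

```javascript
// Illustrative grammar (an assumption, not the original project's):
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | SYMBOL | '-' factor | '(' expr ')'
function tokenize(source) {
  // Sticky regex: hex (0x..), decimal, identifiers, and single-char operators.
  const pattern = /\s*(0x[0-9a-fA-F]+|\d+|[A-Za-z_]\w*|[-+*\/()])/y;
  const tokens = [];
  let position = 0;
  while (position < source.length) {
    pattern.lastIndex = position;
    const match = pattern.exec(source);
    if (match === null) {
      if (source.slice(position).trim() === "") break; // only whitespace left
      throw new SyntaxError(`Unexpected character at offset ${position}`);
    }
    tokens.push(match[1]);
    position = pattern.lastIndex;
  }
  return tokens;
}

function evaluateExpression(source, symbols = {}) {
  const tokens = tokenize(source);
  let index = 0;
  const peek = () => tokens[index];
  const next = () => tokens[index++];

  function parseExpr() {
    let value = parseTerm();
    while (peek() === "+" || peek() === "-") {
      const op = next();
      const right = parseTerm();
      value = op === "+" ? value + right : value - right;
    }
    return value;
  }

  function parseTerm() {
    let value = parseFactor();
    while (peek() === "*" || peek() === "/") {
      const op = next();
      const right = parseFactor();
      if (op === "/" && right === 0) throw new RangeError("Division by zero");
      // Integer division, truncating toward zero.
      value = op === "*" ? value * right : Math.trunc(value / right);
    }
    return value;
  }

  function parseFactor() {
    const token = next();
    if (token === undefined) throw new SyntaxError("Unexpected end of expression");
    if (token === "-") return -parseFactor();
    if (token === "(") {
      const value = parseExpr();
      if (next() !== ")") throw new SyntaxError("Expected ')'");
      return value;
    }
    if (/^(0x[0-9a-fA-F]+|\d+)$/.test(token)) return Number(token);
    if (/^[A-Za-z_]/.test(token)) {
      // Symbols are looked up in a plain table, never executed as code.
      if (!Object.prototype.hasOwnProperty.call(symbols, token)) {
        throw new ReferenceError(`Undefined symbol: ${token}`);
      }
      return symbols[token];
    }
    throw new SyntaxError(`Unexpected token: ${token}`);
  }

  const result = parseExpr();
  if (index < tokens.length) throw new SyntaxError(`Unexpected token: ${tokens[index]}`);
  return result;
}
```

With this sketch, evaluateExpression("start + 0x10 * 2", { start: 0x8000 }) gives 0x8020, while something like "alert(1)" is rejected as an undefined symbol instead of being run. Each grammar rule maps to one function, which is why these parsers are quick to put together.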

Actually, that kind of parser is quite fun. As an undergrad, I took on the self-imposed challenge to program a compiler over summer vacation. It compiled a subset of Basic into assembly code for the Motorola 68000 processor, and I let the system assembler take it from there, to make a working executable binary. Due to time constraints, I only implemented integer arithmetic, leaving out floating point. I also left out an optimiser stage, as that would have made stuff much more complicated. I based the design on a basic skeleton of a recursive descent parser that I stitched together from selected chapters in the Dragon book. The resulting compiler worked, but the generated code was not very efficient and the supported language was quite primitive. But it worked, it was mine, and I wrote it. Yay.

Circa 1980 I wrote an assembler for the Z80. In FORTRAN.

GADS! I’m impressed. I did a lot of Z80 assembly programming and wrote a BASIC interpreter for it back in the late ’70s. The Z80 was SO much better than the 8080 series, but it largely fell by the wayside when Gates wrote DOS for the 8086 series. CPM was also way better than DOS, but when IBM promoted the 8086 and DOS, it was pretty much all over for CPM and the Z80.

Bill Gates did not write MS-DOS. It was written by Tim Paterson at Seattle Computer Products and bought by Microsoft (for $25,000).

In what ways was CP/M better than MS-DOS? MS-DOS was a close clone of CP/M so I fail to see how CP/M was “way better” than MS-DOS.

Of course you are right. Gates was smart enough to sell IBM on it.

I have forgotten WHY I am left with the impression that CPM was better than DOS. Perhaps I just had a preference for it. It may just have been an impression left by the CPM/Z80 as opposed to the DOS/8086. I haven’t done any of that for many decades and I’m old and my memory is not what it used to be**. That’s my story and I’m sticking with it.

**To be fair, my memory never was what it used to be.

And wasn’t it originally called Dr. DOS?

No, it wasn’t. DR DOS was Digital Research’s (hence the “DR” in its name) version of DOS that they released in the late 1980s. It was a rename of CP/M-86.

When the IBM PC first came out in 1981, you could get PC-DOS or CP/M-86 for it, but PC-DOS (IBM’s branded version of MS-DOS) cost only about 1/4 what Digital Research charged for CP/M-86. As you can guess, customers opted for the cheaper PC-DOS in overwhelming numbers, and the rest is history.

Cool beans. I started my coding career in 1983 on the TRS-80 Model 4, which ran a Z-80 at 4 mHz. As such I had to write some Z-80 assembler (printer drivers and the like).

More recently I fired up a Model 4 emulator complete with LS-DOS (usually marketed as TRS-DOS – but a pretty forward looking OS for its time). I thought it might be fun to write some stuff for it, knowing what I know today, but found the compilers to be pretty buggy. Serious work would require the C compiler, which I didn’t have the patience to set up. But it was a nice walk down memory lane. As a bonus, the emulator runs at the equivalent of about 50 mHz, such is the speed of modern hardware.

I’m sure you mean 50 MHz as mHz is millihertz, which is very slow. :laughing:

Sure, but 20x faster than the original.

In those days the OS key read loop interrupt was such that it ran faster along with the effective processor clock speed, so in the emulator the cursor was blinking so fast it was just sort of stuttering. I suspect the emulator could have run faster but there would be key repeats often enough to make typing impractical.

It was a curious mix of crudities like that, and Unix-style device independence and redirection.

Max file size was 16 megabytes. The largest hard drive it ever supported was 40 megabytes. Yet it actually supported 1.5 megabytes of (bank-switched) RAM; although no physical hardware was ever built with more than 128K, the emulator supports drivers for 3rd party boards that were sold back in the day, so you could write apps that could address all the RAM, without paying the high price per chip of the 1980s.

You could also, with a bit of work, boot CP/M or CP/M+ on the emulator.

Yes, 50 MHz is 12.5x faster than the original, but 50 mHz is 80 million times slower than the original. Sorry to be pedantic.

NP, It’s what we do. We’re developers.

I write line-of-business software for a living, so MHz vs mHz is just not a thing I ever needed to notice. The 8 bit platform was as close to the metal as I ever got, and truthfully back in those days I was using things like BASCOM for the most part. My first app I was paid to write was a program that took text from the user and generated code that a typesetting machine understood, to generate yellow pages ads. I’d probably faint if I saw the code after all these years.

I’m in embedded firmware, so the difference between MHz and mHz is very important to me. I live down among the bits and bytes and use C and assembly.

I’m in fintech, currently building a commercial credit DB (headless API, mostly C#, .NET + SQL Server) for a client. I leave the bits and bytes in your capable hands for the most part. If I ever manage to retire, my dream is to build a relational DB for fun so that I can learn a little more bit-slinging myself.

With a background in physics, mHz vs MHz is a huge difference. For the uninitiated, the difference is nine orders of magnitude, the difference between variations on the scale of minutes vs. variations on the scale of microseconds and nanoseconds. Or in physical size (volume), the size difference between a Matchbox car and an aircraft carrier.
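
For anyone who wants to check the arithmetic, here is a small sketch in JavaScript. The constant and function names are made up for illustration. Frequencies are held as whole numbers of millihertz so that the ratios come out exact.

```javascript
// All frequencies are expressed in millihertz (mHz) to keep the numbers integral.
const ONE_MILLIHERTZ = 1;
const ONE_MEGAHERTZ = 1e9; // 1 MHz = 10^6 Hz = 10^9 mHz

// How many powers of ten separate two frequencies.
function ordersOfMagnitude(higher, lower) {
  return Math.round(Math.log10(higher / lower));
}

// Duration of one cycle in seconds: 1 / f, with f converted from mHz to Hz.
function periodInSeconds(frequencyInMillihertz) {
  return 1000 / frequencyInMillihertz;
}

const prefixGap = ordersOfMagnitude(ONE_MEGAHERTZ, ONE_MILLIHERTZ); // 9
const slowPeriod = periodInSeconds(ONE_MILLIHERTZ); // 1000 s, about 17 minutes
const fastPeriod = periodInSeconds(ONE_MEGAHERTZ);  // 0.000001 s, one microsecond

// The emulator figures from earlier in the thread: 50 MHz against 4 MHz.
const speedup = (50 * ONE_MEGAHERTZ) / (4 * ONE_MEGAHERTZ); // 12.5
```

One cycle at 1 mHz takes about a quarter of an hour, while one cycle at 1 MHz takes a microsecond, which is where the "minutes vs. microseconds" comparison comes from.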

I cut my teeth on Basic and assembly, moving on to C (a.k.a. high-level assembler :wink: ). Today, I mostly use Python. Doing simulations, numerical calculations, proof-of-concept stuff, etc., a lot of my code is once-off, so I need something that is high-level, fast to code with, yet can deliver tolerably high speed for numerical stuff. Thus Python.