Portable PDP-8 and DG Nova Cross-assembler
This is intended as a basic example of an assembler for two simple architectures, written with the aid of lex and yacc (or flex and GNU bison). Being small, it should lend itself to learning, extension and modification. The source code is released under the GPL and has been built and tested on NetBSD, Linux and OS X.
Building and Running
See the PDP-8 FAQ. If you don't own a PDP-8 (sadly, nor do I), I can highly recommend Bernhard Baehr's slick PDP-8/E Simulator for Macintosh, which, apart from being attractive and usable, has the impressive virtue of running on all system software versions from 2 through OS X. Also available is Bob Supnik's outstanding simh emulator for these and many other machines (including software kits).
The assembler also targets the DG Nova, a related architecture with a 16-bit word length.
Why lex and yacc?
It has been said that writing an assembler using lex and yacc is "cheating". I can clearly show that it is, in fact, a very rational choice of tools.
The huge payoff of using lex and yacc is that they free the assembler writer from the very tedious and error-prone work of designing and building the lexical analyser and parser, and allow the author to concentrate on the relevant parts of the project: those that pertain to an assembler in general, and those that pertain to the specific architecture at hand.
(For simplicity I will refer to flex and GNU bison below rather than the original UNIX lex and yacc. Also note that all line counts run roughshod over comments, whitespace, etc.)
The magnitude of this payoff can be shown by a quick measurement of the source code:
It is clear that the effort is being expended where it should be, on the higher-level task: nearly half on architecture-independent code (the stuff that any assembler does), and most of the remainder on architecture specifics. The intricacies of lexing and parsing, at only about one tenth of the overall product, do not weigh the project down unnecessarily. The benefit to maintainability is hard to overstate.
In the case of the excellent
Because flex and bison are mature, high-performance, well-debugged products, the resulting assembler (for any non-trivial input syntax) is far easier to maintain, more reliable, and very likely faster than one using a hand-built lexer and parser. One important reason for this is that lexical analysers and parsers are particularly tedious machines to design, build, test and make fully correct; they are an entire field of study in themselves. An assembler, by comparison, is a relatively simple thing.
In the Larry Wall sense, I am sometimes a "lazy" programmer who would rather solve a problem than reinvent a very intricate set of wheels. I often judge a toolset by how closely the size of a solution reflects the size of a problem. If the solution is disproportionate to the difficulty of the problem, then the wrong tool is likely being used, or the tool is badly designed. (Examples are legion.)
In short, if I sit down to write an assembler, I should be thinking about the stuff an assembler does, and the architecture for which I am assembling. If I can leverage the drudgework put in by the geniuses who designed lex and yacc and their descendants, I will.
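To give a rough idea of what "the architecture for which I am assembling" means in practice, here is a minimal C sketch of encoding a PDP-8 memory-reference instruction. It is not taken from this assembler's source: the names pdp8_encode_mri, enum pdp8_op and the PDP8_* macros are hypothetical, and rejecting an off-page reference with an error is a simplifying assumption (some assemblers instead generate an indirect link automatically).

```c
#include <stdint.h>

/* PDP-8 memory-reference opcodes: the top three bits of the 12-bit word. */
enum pdp8_op {
    PDP8_AND = 0, PDP8_TAD = 1, PDP8_ISZ = 2,
    PDP8_DCA = 3, PDP8_JMS = 4, PDP8_JMP = 5
};

#define PDP8_INDIRECT 00400u  /* indirect-addressing bit                 */
#define PDP8_CURPAGE  00200u  /* set: current page, clear: page zero     */
#define PDP8_OFFSET   00177u  /* 7-bit offset within a 128-word page     */

/*
 * Encode a memory-reference instruction assembled at `location` that
 * refers to `target`. Stores the 12-bit word in *word and returns 0 on
 * success; returns -1 if an address is out of range or if the target is
 * neither on page zero nor on the page containing `location`.
 */
int pdp8_encode_mri(enum pdp8_op op, unsigned target, unsigned location,
                    int indirect, uint16_t *word)
{
    unsigned w;

    if (target > 07777u || location > 07777u)
        return -1;                      /* outside the 4K-word field */

    w = (unsigned)op << 9;
    if (indirect)
        w |= PDP8_INDIRECT;

    if ((target >> 7) == 0) {
        /* Page zero is reachable from anywhere: page bit stays clear. */
    } else if ((target >> 7) == (location >> 7)) {
        w |= PDP8_CURPAGE;              /* same page as the instruction */
    } else {
        return -1;                      /* off-page: not directly addressable */
    }

    w |= target & PDP8_OFFSET;
    *word = (uint16_t)w;
    return 0;
}
```

Under these assumptions, encoding TAD 0250 at location 0200 gives the octal word 1250: opcode 1 in the top three bits, the current-page bit set, and offset 050. This is the kind of code I would rather be writing than a hand-rolled tokenizer.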