Day "4": SML/NJ

Nov 08, 2011 04:01

Welp, I think I jinxed it when I declared that it should be quick, since this one took about 3.5 days.


I decided that, SML being a good language I already knew, that I ought to change something up, and try to prepare for future, harder languages. So I decided to do a small-step semantics interpreter. For those not familiar with the term, the idea of small-step semantics is to define clearly what it means to perform a single "step" of computation, which you then iterate. This is as opposed to big-step semantics, wherein you just directly define how evaluation proceeds from start to finish. Small-step has some advantages; having well-defined 'steps' in the computation makes things like debugging easier, and makes it easy to analyze the process of computation, because the individual steps are simple. For example, I suspect that in a typical small-step semantics, the individual steps should all be provably-terminating, even though the evaluation function as a whole may not do so.

The simplicity and provable termination of evaluation steps is what attracted me to this idea -- because I have some languages on the list, like sed and possibly Agda, for which I'm probably going to need to use small-step semantics because they are not Turing-complete by themselves. Also because it was a challenge, and the whole point of this exercise is challenge -- I wouldn't learn anything from directly translating the Haskell version.

I also continued the tradition from the last couple versions of trying to avoid using underlying language features to implement the corresponding interpreter features. So I didn't use SML closures to implement Lisp closures (this doesn't complicate things much), and I didn't use SML mutation anywhere, including the implementation of Lisp mutation (this complicates things A LOT.) After some discussion on how to do Lisp in small steps (more on this in a minute), it occurred to me that I could even get Lisp continuations without using SML/NJ continuations if I defined my semantics right. I chose not to pursue this yet, but I may in the future.

As for semantics: There are various choices one can make, with big-step definitely being the easiest for Lisp. Small-step semantics for Lisp are tricky, in particular, because the state of the world in between steps must _completely_ capture the state of the computation. Since Lisp code and Lisp data are represented with the same structures, doing this unambiguously is very tricky. I _think_ it's not possible to use Lisp expressions for the between-steps state, because you can't tell whether a given symbol is already-evaluated (and should be preserved), or is not-yet-evaluated (and should be looked up in the symbol table to get a value.) I worked around this by marking every subexpression with either EXP or VAL; I started out with every expression marked by EXP, and marked things with VAL as I was done evaluating them.

Here's an example (in a handwavy syntax) of how small-step semantics works, with asterisks indicating an already-evaluated value. Note that, for each application, first we evaluate the function to a value, then each argument in turn; then once everything is evaluated to a value, we perform the application and substitute in the value of the result.

(+ (+ 2 3) 1)
(*+* (+ 2 3) 1)
(*+* (*+* 2 3) 1)
(*+* (*+* *2* 3) 1)
(*+* (*+* *2* *3*) 1)
(*+* *5* 1)
(*+* *5* *1*)
*6*

Note that at each level of the recursion, we have several subexpressions, and we need to step each one until it yields a value before moving on to the next one. (For example, we have to step (+ 2 3) four times before we finally get the value *5* and can move on.) The EXP and VAL wrappers are what track for us what we have already evaluated to a value (and if you imagine wrapping everything above with EXP, except starred things with VAL, you have basically the internal structure I used, called sdata.)

I discovered a couple more bugs in my version of lisp. I remembered that "cond" is supposed to have what's called an "implicit progn", which means that each condition/expression pair is supposed to take an arbitrary number of expressions, but I only take one. Pretty minor, and I haven't bothered fixing it.

I did change my implementation of scoping to be more Lisp-like; bindings made with define now go into the global scope, which is magically-dynamic, whereas all function parameter scoping (and by extension let scoping, if I had let) is lexical. This is done because it allows for things like free mutual recursion, and redefinition of existing functions in the interactive environment.

I had one very interesting bug in my implementation: I decided to use SML lists for Lisp lists, which raises some adequacy issues. In particular, because Lisp cons is not typed, Lisp allows the tail of a cons to be a nonlist, which is of course impossible in my representation. I consider this an okay tradeoff for compactness of code. But, when I made this switch, I took out CONS but left in NIL, so I had both NIL and (LIST []) representing the same thing. So all my cases had NIL in them, but code that recursively emptied lists would bottom out at (LIST []) instead. Oops.

SML's errors are really insufferable. Honestly? They're worse than C++'s. And that's terrible. The random and inconsistent expansion of type aliases in unification failure errors means I end up having to copy-paste the errors into an editor to figure out where the problem is.

Overall experience: This one really dragged, for some reason. It ended up being twice the length of the Haskell one, I guess because of the small-step semantics (I haven't compared them.) I may be less masochistic with the next one, just to get it done.
Source code: here.

Future plans: I am considering dropping the rate to one interpreter every two days, which appears to be more realistic, and is about where I am anyway at this point. Opinions welcome in the comments.
Tomorrow: Maybe Pike for real this time. Or maybe something weirder. Forth?
Previous post Next post
Up