As a member of the faculty, it is often hard to justify spending time learning new technologies. This, to me, is a rather ugly Catch-22: to learn a new technology, I must justify it either with a publication or a clear, immediate application to my classrooms. If you’re driving towards a clear, concrete deliverable, it really isn’t exploratory learning… it’s work, under pressure, with a deadline. That’s why I’m excited about a project I’m undertaking this semester with one of our seniors.
The student was looking for a 2-credit independent study (this is roughly a 1/2 course load). We brainstormed a great deal, and in the end, the student thought they’d like to dig into some more Scheme programming, so we decided a compiler might be a good way to focus our efforts. However, I didn’t expect that it would end up being a project with so much potential.
At Indiana, I saw how we can write compilers front-to-back in the nanopass style. However, several grad students in the languages group insisted the right way to write a compiler was back-to-front. What does this mean?
First, you take your assembly language, and generate an executable.
In our case, we will be working with LLVM Compiler Infrastructure project. Specifically, we’ll begin by writing programs in LLVM assembly, and using the LLVM toolchain to turn that into a native executable.
Although our input language is not C, we might imagine starting with a simple program in our language LNOT (L0) that is nothing more than the number 8. A single expression. When compiled and executed, it should return the number 8. Expressed in C, this would look something like:
1 2 3 4 5 6 7 | int main(int argc, char **argv) { int retval; /* Assign the result of our entire program to retval */ /* In this case, the entire program is the constant 8. */ retval = 8; printf("%d\n", retval); } |
The disassembled LLVM looks something like:
1 2 3 4 5 6 7 | define i32 @main(i32 %argc, i8** nocapture %argv) nounwind { entry: %0 = tail call i32 (i8*, ...)* @printf(i8* noalias getelementptr inbounds ([4 x i8]* @.str, i64 0, i64 0), i32 8) nounwind ; [#uses=0] ret i32 undef } declare i32 @printf(i8* nocapture, ...) nounwind |
As we are writing the compiler “back-to-front,” we now need to take one step backwards towards our input language. Because we are working in Scheme, we’ll represent our language in S-expressions (to minimize parsing effort). The first pass (or, the last pass, in front-to-back thinking) will be one that transforms an S-expression based syntax for LLVM assembly into actual LLVM assembly. Therefore, the input language to the penultimate pass might look something like:
1 2 3 4 5 6 7 8 | (define (main i32) (args [i32 argc] [(i8** nocapture) argv]) (entry (tail-call (printf (i32 (i8* ...)) ...)) (ret (i32 undef)))) (declare (printf (i32 (i8* ...) ...))) |
This elides many things, but my point is that we might take the ASM of LLVM and “lift” it into a syntax that can be easily pattern-matched in Scheme and manipulated. As we work from the back-end of the compiler to the front, our next step might be something trivial: for example, we might get rid of the entry form, because (with a very simple input language), we know we will only ever have one function, and therefore we can generate this automatically. Or, we might be more aggressive, and acknowledge that we don’t need to generate any of the wrapper code… so, we should only need to develop a language for statements that will proceed and/or live on the right-hand side of the retval assignment statement.
For me, it is exciting on a number of levels. I get to learn with a student, which happens less often than one would like. I don’t know LLVM, and am glad to be spending time getting to know it. Also, I haven’t ever written a compiler back-to-front, and evaluating this (both technically and pedagogically) has value — I’ll probably be teaching our compilers course next spring. And, I get to do some hacking, which I simply don’t get much time to do.
When we have a repository up, I’ll post a link to it, although it won’t be an “open” project, per se. We’re not doing this to engage in a new compiler authoring project so much as learn, but if you want to, you can watch us as we work.




