As a PhD student in PL who came from four years of industrial software development, lately I’ve been trying to figure out what working programmers should know from the study of programming languages. As part of that investigation I’ve been looking over some of the better books on programming languages to see what their take is. In the introduction to Shriram Krishnamurthi’s Programming Languages: Application and Interpretation, Shriram forgoes a description of the book’s philosophy in favor of a link to a video of the first lecture for the course. What follows is my own interpretation of the philosophy from that video, as an attempt to both put some of that material down into written, searchable form and discover my own thoughts on the matter.
(Disclaimer: this is my interpretation of what was said in the video, not necessarily what was meant, so any mistakes in the description below are my own.)
Inspiration from Science
To the extent that computer science is a science, we should look to science for ways to approach our own field. One of the primary tasks of science is to come up with categories to better understand similarities and differences between objects or living things. Think of taxonomies in biology, for example, that might classify an organism as a plant, animal, or a member of some other kingdom.
We programmers often do this with programming languages at a high level. We label them “functional”, “object-oriented”, “imperative”, “scripting”, and so on. However, this categorization breaks down upon further inspection: the categories are not well-defined, and many languages fit into more than one category (for example, Scala proponents often emphasize that it is well-suited for both object-oriented and functional programming).
Perhaps this kind of categorization would be useful at a different level. Another use of science is to study the smallest building blocks of objects, such as atoms or DNA base pairs, and determine how they combine to gain a better understanding of the larger structure. This approach is helpful in studying programming languages. By understanding the individual building blocks of the language (e.g. values, expression forms, types) and the rules for putting them together (i.e. the semantics), we gain a much deeper understanding of the language as a whole. Furthermore, at this level it can be useful to categorize the small building blocks or semantic rules in different ways, as the similarities and differences are more restricted and well-defined than for programming languages as a whole. The divide between call-by-name and call-by-value semantics is one such categorization familiar to PL researchers.
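To make that last distinction concrete, here is a small sketch of my own (it is not from the lecture). Python itself evaluates arguments before a call, so call-by-name is only simulated here by passing a zero-argument lambda; the helper names (noisy, twice_by_value, twice_by_name, ignore_by_name) are made up for illustration.

```python
# Records every time the argument expression is actually evaluated.
log = []

def noisy(n):
    """Return n, noting in `log` that the expression was evaluated."""
    log.append(n)
    return n

def twice_by_value(x):
    # Call-by-value: x is already a value, computed once before the call.
    return x + x

def twice_by_name(x_thunk):
    # Simulated call-by-name: the argument is an unevaluated expression,
    # and every use of the parameter evaluates it again.
    return x_thunk() + x_thunk()

def ignore_by_name(x_thunk):
    # The parameter is never used, so the argument is never evaluated.
    return 0

by_value_result = twice_by_value(noisy(21))
evals_by_value = len(log)

log.clear()
by_name_result = twice_by_name(lambda: noisy(21))
evals_by_name = len(log)

log.clear()
ignore_by_name(lambda: noisy(1))
evals_unused_by_name = len(log)
```

Both calls compute the same sum, but the number of times the argument expression runs differs, which is exactly the kind of small, well-defined difference that is easy to classify at this level.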
Programming Language = Core + Syntactic Sugar
The philosophy of the course is that every language has a small “essence” to it, made up of these small building blocks, and that the rest of the language is just syntactic sugar. That is, the non-core features of the language can be expressed in terms of the core features (if perhaps in a verbose manner). Finding and understanding that core provides a framework for understanding the rest of the language, and helps language designers understand the effect new features will have on their language.
The course has students write a desugaring process for the language under study, then write an interpreter for the remaining core language to better understand its semantics. The goal is to end up finding the “essence” of the language. One difficulty is that the implementer of both the desugaring process and the interpreter must also prove that their language implementation has the same semantics as the official language (whether that’s a formal specification, a reference implementation, or what have you). In other words, when one runs a program under the new implementation and gets an answer, that answer must be the same as the one predicted by the reference language.
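As a rough illustration of that workflow (my own toy example, not the course's code), consider a tiny arithmetic language whose core has only numbers, addition, and multiplication, with subtraction and negation treated as sugar. The tuple-based syntax trees and the names desugar, interp, and reference_eval are all assumptions made for this sketch, and reference_eval merely stands in for an "official" semantics.

```python
# Core forms:    ("num", n), ("add", l, r), ("mul", l, r)
# Surface sugar: ("neg", e), ("sub", l, r)

def desugar(expr):
    """Rewrite a surface expression so that only core forms remain."""
    tag = expr[0]
    if tag == "num":
        return expr
    if tag in ("add", "mul"):
        return (tag, desugar(expr[1]), desugar(expr[2]))
    if tag == "neg":
        # -e  becomes  -1 * e
        return ("mul", ("num", -1), desugar(expr[1]))
    if tag == "sub":
        # l - r  becomes  l + (-r)
        return ("add", desugar(expr[1]), desugar(("neg", expr[2])))
    raise ValueError(f"unknown surface form: {tag}")

def interp(core):
    """Interpreter for the core language only."""
    tag = core[0]
    if tag == "num":
        return core[1]
    if tag == "add":
        return interp(core[1]) + interp(core[2])
    if tag == "mul":
        return interp(core[1]) * interp(core[2])
    raise ValueError(f"not a core form: {tag}")

def reference_eval(expr):
    """Stand-in for the reference semantics of the full surface language."""
    tag = expr[0]
    if tag == "num":
        return expr[1]
    if tag == "neg":
        return -reference_eval(expr[1])
    left, right = reference_eval(expr[1]), reference_eval(expr[2])
    if tag == "add":
        return left + right
    if tag == "mul":
        return left * right
    if tag == "sub":
        return left - right
    raise ValueError(f"unknown surface form: {tag}")

programs = [
    ("sub", ("num", 10), ("neg", ("num", 3))),
    ("mul", ("sub", ("num", 2), ("num", 5)), ("num", 4)),
]
answers = [interp(desugar(p)) for p in programs]
expected = [reference_eval(p) for p in programs]
```

Comparing answers against expected on a handful of programs is only testing, of course; it raises confidence that the two implementations agree but is not a proof that they do.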
Later on in the video, Shriram calls this a “recipe for [language] design”: start with the right core, think about the next necessary features, and add them in a way that works well with the core.
One might ask, “Why study programming languages at all?” After all, only a very select few work on the commonly-used programming languages.
The problem is this: no one sets out to build a programming language. They start by writing something simple, such as a configuration file format. Then a colleague asks them for a way to save repeated computations, so they introduce variables. Then another colleague asks for a conditional construct, so they introduce “if”. Then another colleague asks for another feature, and so on and so on until the tiny configuration language has morphed into a full-blown programming language.
Every programmer will run into this situation at some point. In other words, every programmer will eventually end up writing a programming language, whether they wanted to or not. The key difference is that those who don’t understand programming languages will write it badly.
Or as Shriram put it, “we have a social obligation” to make sure the programming languages of the future are well-constructed, by teaching programmers the fundamentals of PL.
When considering what the take-aways are for most working developers, I think the most important part is the focus on understanding the core of a language and how it works. Even disregarding any issues around language engineering, if a programmer understands how their language works at that level, they will have a much better grasp on how all the pieces fit together and will write code that’s less prone to errors caused by weird corner cases of their language.
Certainly I think the core-first recipe for language design is the right way to go, and although I still have my doubts on how useful this is for all programmers, the discussion around configuration languages that grow out of control is well-taken. Also, the idea of a social obligation to ensure better semantics for “the next big thing” resonated with me—even if we can’t find a reliable way to become the designer of the next hot language, we can do our best to make sure the person who gets there knows what they’re doing.
One issue I noted is that it’s not always clear what the core of a given programming language is. There might be many opinions on what features make up the core, which was discussed briefly in the lecture when mentioning that there are usually multiple ways to desugar some feature. Finding the right core is probably not as important as finding a core, though, and my hypothesis is that most desugarings will give some insight into the workings of the language.
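A toy example of my own shows how one feature can desugar in more than one way. Assume a hypothetical surface form ("double", e) over a core of numbers, addition, and multiplication; it can be expanded either as e + e or as 2 * e. All names here are invented for the sketch.

```python
def interp(core):
    """Interpreter for the core: ("num", n), ("add", l, r), ("mul", l, r)."""
    tag = core[0]
    if tag == "num":
        return core[1]
    if tag == "add":
        return interp(core[1]) + interp(core[2])
    if tag == "mul":
        return interp(core[1]) * interp(core[2])
    raise ValueError(f"not a core form: {tag}")

def desugar(expr, expand_double):
    """Remove ("double", e), using the given rule to expand it."""
    tag = expr[0]
    if tag == "num":
        return expr
    if tag == "double":
        return expand_double(desugar(expr[1], expand_double))
    return (tag, desugar(expr[1], expand_double), desugar(expr[2], expand_double))

def double_as_add(e):
    # double e  becomes  e + e  (duplicates the subexpression)
    return ("add", e, e)

def double_as_mul(e):
    # double e  becomes  2 * e  (mentions the subexpression once)
    return ("mul", ("num", 2), e)

def size(core):
    """Number of nodes in a core expression."""
    return 1 if core[0] == "num" else 1 + size(core[1]) + size(core[2])

program = ("double", ("double", ("num", 5)))
via_add = desugar(program, double_as_add)
via_mul = desugar(program, double_as_mul)
```

Both expansions evaluate to the same number in this pure language, but the first copies the subexpression, so the desugared program grows faster, and the two would stop being interchangeable if expressions could have side effects. Even a throwaway comparison like this says something about what "double" means.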
All in all, I think this is a great philosophy for those interested in knowing a bit more about programming languages to improve their own programming, and all programmers would be well-served by at least taking this approach to understanding their own language for day-to-day use.