fphistory

History of the Application and Evaluation of Functional Programming

What existed when functional programming did not?

What did not exist?

Functional programming is an engineering branch of constructive mathematics.
Oleg Nizhnikov.

We do not aim to define functional programming. However, we are compelled to establish practical boundaries for our study - to choose the history of what exactly we are writing about and to focus on something manageable.
Many definitions of functional programming are impractical from this perspective. The history of languages with first-class functions today is virtually synonymous with the history of programming languages. Even the few languages that still lack them - such as C++ or Rust - are historically connected in various ways to languages that do have first-class functions.
Similarly unworkable is the less vast but still unwieldy group of languages historically regarded as functional, which is often the focus of those writing about the history of functional programming.
The main reason this definition of functional languages does not suit us is that this group includes Lisp. And we do not want to write a history of Lisp. Primarily because Lisp’s history is too monumental a topic to allow room for the histories of other languages in this traditional group. Moreover, there is a modern complication that did not exist when the conventional list of functional languages was formed: Lisp is now a typical example of a broad category of languages that were not functional at their inception but became so over time. Just like Java, Lisp had first-class functions added to it. Writing a history of Lisp’s functionalization would require comparing it with other languages that underwent the same process - that is, nearly all modern programming languages.
We would like to set ourselves a realistic goal: to focus on a narrower definition of functional languages and explore their history in greater depth than would be possible if we covered all languages.
To narrow down our selection, we will add several additional features to first-class functions. Parametric polymorphism and type inference (reconstruction) significantly reduce the scope of our study, though not enough. Therefore, we will also include algebraic data types and pattern matching. This last feature, finally, gives us a reasonably sized family of languages. Unfortunately, this combination of features appears rather arbitrary. Furthermore, if algebraic data types and pattern matching ever become as ubiquitous as first-class functions (we wouldn’t bet on it), the problem of an all-encompassing language history will return to haunt our successors - if we have any. This set of language features may seem arbitrary, but we did not invent the idea of grouping languages by these traits. Languages with this combination are sometimes called “ML-like.” However, there is little consensus on which languages qualify as ML-like. Not all MLs are particularly similar to other MLs, let alone non-MLs. Worse, not all MLs even have the features listed - a fact perhaps less widely known.
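The combination is easier to show than to enumerate. A hedged sketch in Haskell notation (the names here are ours, purely illustrative): an algebraic data type, pattern matching on its constructors, a first-class function argument, and a parametrically polymorphic type that the compiler infers without annotations:

```haskell
-- An algebraic data type: two constructors, one of them recursive.
data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving (Show, Eq)

-- Pattern matching takes values apart by constructor; f is a
-- first-class function argument. No annotation is needed: the
-- parametrically polymorphic type (a -> b) -> Tree a -> Tree b
-- is inferred (reconstructed) by the compiler.
mapTree _ Leaf         = Leaf
mapTree f (Node l x r) = Node (mapTree f l) (f x) (mapTree f r)
```

Each of these features appears separately in many languages; it is the whole combination at once that delimits our family.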
Let us try to identify the first language with these features and define the family around it. The good news: such a language exists - Hope. As far as we know, no one has independently reinvented this exact combination of features. Hope emerged as a “fusion” of languages, each of which had an incomplete set of the features we are interested in. The bad news: not all languages whose history we wish to write descend from Hope. Thus, our hopes for “Hope-like languages” are dashed, just as they were for “ML-like” ones - and for the same reasons.
“Fusion” is probably not the best perspective here. Relationships between languages do not neatly form convenient trees or graphs. It is more accurate to speak of a group of languages that, through their evolution, borrowed missing features from one another to complete our defining set. Hope was simply the first complete result of this process, which we will refer to as “Hopification.” ML did not undergo this process as a single language. At least three separate languages with “ML” in their names did.
At the time, interactions between languages, projects, and their creators would have been difficult if the participants were geographically distant. But they were not. The research programme whose history we are writing began in Edinburgh and the nearby town of St Andrews. Finally, we have found suitable boundaries - albeit geographic ones, which are not ideal, but they are what we have.
Nearly all languages descended from the original group of the Edinburgh research programme retained the discussed features. Certainly, the relatively popular and widely used languages that descended from them still possess these characteristics. If we step further back to examine their ancestors, such uniformity disappears.
We are writing the history of the “Edinburgh Research Programme.” But that name is a bit too long, so going forward, we will usually refer to the group of languages we have chosen to study as simply “functional languages” - just as our esteemed predecessors did when writing about the history of functional programming. In this sense, our work is no different from other historical accounts of functional programming. But in what ways is it different?

What is the purpose of this work, and how does it differ from others

The development of the theory of types that eventually led to the Standard ML type system goes back more than a hundred years.
David MacQueen, Robert Harper, and John Reppy. “The history of Standard ML.” 1

It’s not that there’s a lack of literature on the history of functional programming. There are already general overviews of functional programming 2 [^Turn12], as well as histories of specific languages or language families 3 1 4 [^Stee96], and biographies of researchers 5 6. So why do we need yet another one?

Cryptolambdian

When, yet again, gigabytes of memory are insufficient for compiling Haskell code, the question naturally arises: in what sense did functional programming exist in, say, 1973? Unfortunately, materials on the history of functional programming usually don’t pay much attention to this. For histories of functional programming, they contain remarkably little history of programming.
We don’t aim to define “programming,” but in this work, we assume it’s the process of writing programs. And our great predecessors, in their works on the history of functional programming, don’t particularly like to write about what programs resulted from this process.
Worse, programming historians often venture so far back that it’s questionable whether “programming” existed at all in those periods.
For example, MacQueen, Harper, and Reppy start the prehistory of Standard ML in 1874 1. Obviously, functional programming didn’t exist in 1874. It’s not so obvious whether it existed in 1974. What programs had been written in functional languages by that year? What implementations were available, and for whom? Until what year could functional programming languages be said to exist only in the same sense as in 1874, as notations in books, notebooks, on chalkboards, and so on?
For example, it’s often claimed that ML appeared in 1973, but what exactly happened that year? That year, Robin Milner conceived the desire to write an interactive theorem prover in something other than Lisp. Did Milner succeed, and in what year? What part of the interactive theorem prover was written in ML? How many lines of code did it have? Who wrote an interactive theorem prover entirely in ML, and how many years later? And who wrote an interactive theorem prover using an implementation of ML that was itself written in ML, and when? And how many lines of code were in that implementation? Our work will answer these and other questions.
There’s a vast difference - and a significant time gap - between dreaming of functional programming and actually writing a program in a functional language. We believe this perspective is useful for those who don’t think history begins and ends with an idea, and that implementing it is trivial and uninteresting.
It’s not that the question of functional language implementation usability is never raised, but our great predecessors weren’t particularly interested in it. Some surveys list implementations considered “non-toy,” “efficient” 2, or “fast” [^SPJ87], but their criteria are vague, and you might get a different impression if you knew more about those implementations. We certainly did. In such lists, compilers that compiled themselves and other large projects are mentioned alongside ones that didn’t.
Therefore, we will try to establish what size programs were written using functional languages and what the performance of these programs was.
All this is mostly a consequence of our great predecessors writing primarily histories of ideas. And they didn’t do that because it’s easy - tracing the history of ideas is very hard. That’s one reason we’re not writing a history of ideas. We’re writing a history of implementations first, and only then a history of ideas.
For every idea, one can trace a predecessor, and each one pulls further into the depths of centuries. It’s ideas all the way down. That’s how a historian of functional programming ends up in 1874. A history of implementations solves that problem neatly - perhaps too neatly.
Ideas leave fewer traces, and the ones they leave are less convenient for a historian. An implementation (even a failed one) leaves behind different versions of code, notes in it, bug reports, release announcements - evidence of what worked and when.
Ideas leave behind papers, which might be published years later and “cleaned up” in ways that erase history. The related work section documents the relationships between ideas, but not their history. Influences are often declared, but the “influencing” idea might only be discovered after the “influenced” one has already been reinvented independently.
Sometimes, an author explicitly states this. For example, Xavier Leroy writes that he reinvented some ideas from Krivine’s machine without knowing about it and only added citations after other people pointed out the similarity. The creator of Chez Scheme independently rediscovered some ideas found in VAX ML, while others were directly borrowed. But not everyone feels the need to disclose this.
Attempts to trace the history of ideas are also complicated by the fact that the opinions of language authors about which languages they influenced may not coincide with the opinions of the authors of those languages about which languages influenced them.
This shouldn’t be surprising. Ideas for how to design programming languages often arise independently. Even the Hindley-Milner algorithm - a nontrivial and precisely defined concept - was independently reinvented. But no two identical compilers have ever been written independently.
It’s easier to cite a paper than to reuse compiler code. So, when we write histories of ideas, the graph ends up with too many edges.
To group implementations into families, we’ll use rarer but more reliable connections: shared code, common authors, and bootstrapping one implementation with another.
Historians of functional programming often face the same challenge we’ve outlined - how to define the scope of their study. If you’re writing a language’s history, where do you draw the line? One approach is to define a language by its syntax and semantics - trees or sequences of symbols that can be compared. This enables building trees and graphs that show how programming languages evolved and diverged. Of course, that level of analysis is very demanding. We’ll analyze not the syntax/semantics descriptions (the most suitable targets for such analysis), but the next best thing - actual implementations in functional languages.
Unfortunately, our great predecessors often chose simpler criteria. Names. If anything can prevent a historian from endlessly digging into the past, it’s names. But using names to decide where one language ends and another begins may not be a great solution.
Some authors love a name so much that they keep using it for ever-new and increasingly unrelated things. Syntaxes of many FP languages in the ’80s resembled each other more than they resembled their own earlier versions from the ’70s, despite sharing names. Fortunately, something was first called “ML” a very long time ago, and there are no obstacles to starting the history of ML from its beginning, or even long before its beginning.
Unfortunately, many authors use different names for the same thing. The first Haskell compiler was created in just a few months from a non-Haskell compiler. This clearly indicates that a great deal of work had already been done - work that can justifiably be considered work towards creating Haskell. Three of the first five Haskell compilers had been in development for years before the idea of designing this language even emerged. Alas, non-Haskell wasn’t called “Haskell,” so nothing could be done - in the history of Haskell 3, no more than a few lines could be devoted to this.
The authors of Haskell chose a different non-Haskell as the basis for the first Haskell Report. And this non-Haskell resembles Haskell 1.0 more than ML in ‘82 resembled ML in ‘83. But sorry - it wasn’t called “Haskell,” so our great predecessors can’t write its history.
We can.

Phanerolambdian

What is our history of implementations? Primarily, it’s a history of compilers for general-purpose machines, and to a lesser extent of interpreters and specialized hardware implementations. We chose compilers for conventional machines because they’re the most successful implementation path for functional languages: they’ve survived to this day (unlike specialized hardware), they’ve been used for substantial programs (unlike many interpreters and hardware implementations), and more is simply known about them - there is something to write about. Compilers are complex enough that there aren’t too many of them - unlike interpreters. So we end up with a manageable set of fairly detailed histories, rather than endless lists with terse descriptions.
Of course, we’ll touch on interpreter and hardware history where it’s needed to understand compiler history, but we don’t aim for completeness there.
Still, the history of implementations has its problems. The history of programming is not very eventful during times when functional programmers weren’t writing many non-trivial programs. What are non-trivial programs? We’ll consider such programs to be the code of a compiler for a functional language or the code of a theorem prover.
This may not seem like a high bar, but for functional programming it definitely is. Frankly, it’s not like we had much choice in setting this bar. Outside of micro-benchmark speeds and how fast a compiler written in a functional language compiles, there isn’t much information.
About some implementations, one can read that Fibonacci numbers are computed not much slower than in C, and that a compiler written in the language compiles something in an acceptable time. About others, one can read that no compiler is written in them, and performance is not that important anyway.
Since the history of programming and working implementations is heavily skewed towards the end of the history of ideas, it doesn’t fit well with the typical “progress” and “formation” structure of idea histories, where everything starts with Lisp, continues with strict functional languages, and concludes with lazy ones. Instead, we have a period where nothing worked, followed by a period where everything did - all at once.
Our history is structured by the lifecycle of implementations, or more precisely, their families. We will begin with implementations whose developers planned or even attempted to create a functional language compiler from a non-functional language compiler, or to turn a non-compiler functional system into a compiler. For a long time, such plans were either not implemented at all, or attempts were made but were unsuccessful. We will discuss this period in Part Zero of our history.
When several such attempts finally succeeded, the first part of the history began. In it, we will discuss the first compilers for functional languages that do not yet resemble those we defined above as the functional languages whose history we are writing.
In the second part, we will describe how these implementations were transformed into implementations of precisely such languages - the languages of the Edinburgh research programme, whose history we are writing. And about new implementations that were implementations of such languages from the very beginning.
In the third part, we will discuss the next period, in which these implementations were used to write the first serious programs: provers and a new generation of functional language compilers. This is when functional programming finally arrives.
The history of implementations poses a question for us whose answer requires considering the history of conventional machines and operating systems as well - which, of course, we will also describe without any claim to completeness. If the idea of functional programming, as our great predecessors discovered, existed for as long as programming itself, or even longer, then why wasn’t there functional programming? When could it have already appeared? And what existed when functional programming did not?

Lambda the Ultimate Misunderstanding

Why was there no functional programming? Because there were no usable implementations and no machines suitable for running such implementations.

And when the past, in which functional programming did not exist, was replaced by a present in which functional programming exists (for now), this present was unevenly distributed.

By choosing a narrower definition of a functional language, we also limited the resources of the resulting research program.

The Edinburgh research program brought together a relatively small number of researchers, so it is quite natural to expect that they used many of the developments of other research programs—developments that, as it happened, were also useful for implementing the functional languages of the Edinburgh program.

And there was a suitable candidate for such a foundation, already laid by researchers and implementers - an earlier and richer programme whose successes could be built upon: LISP.

John McCarthy

the FORTRAN idea of writing programs algebraically was attractive.
McCarthy J. History of LISP 4

John McCarthy received a bachelor’s degree in mathematics from the California Institute of Technology in 1948, and in 1949 he accidentally walked into a symposium on cybernetics held at that university. There he heard a talk by von Neumann and became interested in what he would later call “artificial intelligence” [^Stoy79].

McCarthy received his PhD in mathematics from Princeton in 1951 and taught mathematics first at Princeton, then at Stanford, and finally at Dartmouth.

McCarthy was a mathematician, but when, no later than 1955, he began thinking about a programming language for artificial intelligence, he described it not as a mathematical notation like the lambda calculus with extensions. He described it by drawing parallels with natural language. This is a fairly common view of programming languages among their authors—but probably not among the authors of the languages whose history we are writing.

However, in 1955 no one knew how to make programming languages that looked like natural languages (whatever that meant), nor languages based on mathematical notation. But soon, some ideas appeared.

In 1956 McCarthy organized the famous Dartmouth Summer Research Project on Artificial Intelligence. At this seminar McCarthy encountered the first building block of the language of his dreams—lists. List processing existed in the language of Allen Newell and Herbert A. Simon, IPL (Information Processing Language).

But IPL itself did not make a very good impression on McCarthy; it seemed too low-level. And what language did he consider high-level enough? FORTRAN [^Stoy84].

McCarthy was impressed by the fact that Fortran let one write expressions: arithmetic operations and nested function calls.

He believed that to get a high-level language for working with lists, one needed to add list-processing functions to Fortran. And already in 1957, McCarthy had the opportunity to add the missing piece—lists—to the cutting-edge high-level Fortran. In that year, N. Rochester, who was then a department head at one of IBM’s research centers, decided to realize Marvin Minsky’s idea: a program for proving geometric theorems.

McCarthy became a consultant on this project. IPL was originally intended to be used for the implementation, but McCarthy managed to prevent that. Instead of writing their own IPL implementation for the IBM 704 machines, McCarthy advised extending FORTRAN with list-processing facilities. IBM followed this advice, creating what later became FLPL (FORTRAN List Processing Language).

At first, it was difficult for the programmers implementing the extensions—and even for McCarthy himself—to think of list constructors as functions. Functions were things that took and returned numbers, not some sort of constructions in memory. And in general it was hard to see lists as an abstraction behind operations on addresses. As a result, procedures for constructing lists initially did not compose into nested expressions natural for functional languages.

These difficulties were overcome first not by McCarthy, but by programmers Herbert Gelernter and Carl L. Gerberich, who decided that functions could return not only mathematical objects but also “lists,” for which one could perhaps invent strict definitions and laws—but programmers managed without them. McCarthy would later return to those definitions and laws.
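In modern terms, the shift Gelernter and Gerberich made can be sketched like this (the Haskell type and names are ours, not FLPL’s): once the list constructor is an ordinary function returning a list value, construction calls nest like any other expressions:

```haskell
-- A constructor that returns a value, rather than a procedure that
-- patches addresses in memory.
data Lisp = Atom String | Cons Lisp Lisp
  deriving (Show, Eq)

-- Constructor calls compose into nested expressions: the list (A B C).
abc :: Lisp
abc = Cons (Atom "A") (Cons (Atom "B") (Cons (Atom "C") (Atom "NIL")))
```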

This seems to have been the end of these programmers’ contribution to the creation of functional programming. But in that same year, 1957, McCarthy met a programmer whose contribution was still to come.

Steven Russell studied mathematics at Dartmouth, but decided he did not particularly like mathematics. Programming interested him much more, so he abandoned mathematics and began working at the university as a programmer. But in 1957 he wrote nothing for McCarthy other than an assembler manual.

Thanks to contacts established during the Dartmouth seminar, between September 1957 and April 1958 McCarthy worked at MIT on a chess program. He wrote it in FORTRAN and decided that although the expressions available in FORTRAN were good, they were not enough. What FORTRAN lacked was a conditional expression.

There were no plans to add such an expression to FORTRAN at that time, but McCarthy was soon presented with the opportunity to work on the creation of a new high-level language from scratch.

At the end of 1957, J. Carr III formed the ACM committee on programming languages. The organizers were looking for participants, and McCarthy became one of them. The committee’s first meeting took place in January 1958, and in April of the same year a subcommittee was formed to work on an international algorithmic language. The members of this subcommittee were J. Backus, J. McCarthy, A. Perlis, and W. Turanski. During April and May of 1958 they wrote “Proposal for a Programming Language.”

McCarthy’s conditional expressions made it into this document—albeit in an unfinished and overcomplicated form. They were not expressions returning values, but producers of labels for GOTO. This, of course, made them far less attractive than modern conditionals. Why they ended up this way is understandable: in the international algorithmic language, returning anything nontrivial was not handled particularly well. But more on that later.

Conditional expressions were not McCarthy’s only proposal for the international algorithmic language, but the others were rejected already by the subcommittee.

In June, a meeting of the subcommittee with their European colleagues took place in Zurich. Conditional expressions were rejected there as well.

McCarthy recalls that the conditional expressions were too unusual and that he explained them poorly. This seems like a harbinger of many problems in the history of functional languages—because all their other components are even harder to explain.

In May 1958 McCarthy gave a guest lecture for Minsky’s course. James R. Slagle, who attended McCarthy’s lecture, took notes on how McCarthy described his dream language: “FORTRAN plus variable functions, composite functions (Church lambda), multiple functions, several valued functions of several variables, direct sum of functions, label portion of program with facility to make symbolic substitutions therein…”.

Conceptualizing the language as “FORTRAN with lambdas,” rather than “lambda calculus with practical extensions,” again looks more like the reasoning of typical programming language authors than that of the authors of the languages whose history we are writing.

In the summer of 1958 McCarthy worked at IBM on a program for symbolic differentiation. McCarthy did not consider recursion important until he wrote the differentiation program—it was obviously recursive. Working on this program strengthened his conviction that a programming language must have higher-order functions. He came up with the idea of such a function: maplist.

That same summer, in early June 1958, McCarthy also tried once more to challenge the decisions of the Zurich meeting. He proposed adding functional variables, lambdas, function composition, arithmetic operations on functions from numbers to numbers, and even standard higher-order functions for differentiation and integration to IAL (International Algorithmic Language). To no avail.

It now became obvious to McCarthy that he would not be able to add all the features needed to turn FORTRAN or IAL into a language for programming artificial intelligence. McCarthy needed to create his own language. And soon he had such an opportunity.

Not a lot of silly parentheses (for now)

Without any doubt it is hard to imagine that this program could ever run correctly. However, McCarthy himself had this feeling, too.
Herbert Stoyan. Early LISP history [^Stoy84]

In the summer of 1958 McCarthy met a student from the University of Michigan who was looking for a programming job—Klim Maling. McCarthy hired Russell and Maling for his new joint project with Minsky in September 1958. As part of this project, the first LISP was designed and implemented.

What did McCarthy end up with after combining all the components of his language for programming artificial intelligence? In the very first report 7 of the MIT AI Laboratory, we see code with lists, conditional expressions, recursion, FORTRAN with its goto and mutability, as well as something resembling higher-order functions—but looking rather strange. The function maplist is applied like this:

maplist(cdr(J),K,diff(K))

and like this:

maplist (cdr(J),L,(L = K → diff(K), L ≠ K → copy (L)))

which means roughly:

maplist (tail j) (\l -> if l == k then diff l else copy l)

but the “lambda” is somehow passed to maplist via two parameters: the bound variable separately and the body of the lambda separately. How does that even work? You need to know which variables are bound and which are free at the point of function construction. But such oddities did not last long. In later reports, maplist looks like this:

maplist(L,f)=(L=0→0,1→consls(f(L),maplist(cdr(L),f)))

and is applied as:

maplist (cdr(J),λ(L,(L = K → diff(L), L ≠ K → copy (L))))

which is perfectly normal.

In McCarthy’s article “Recursive functions of symbolic expressions and their computation by machine” 8, Lisp looks like this:

maplist[x; f] = [null[x] → NIL;
                 T → cons[f[x]; maplist[cdr[x]; f]]]

maplist[(1,2,3); λ[[x]; x + y]]

This maplist differs from the more familiar map function in the standard libraries of today’s functional languages in that the function passed to it is applied not to the elements of the list but to its tails. But this is not a fundamental difference.
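The difference is easy to state precisely. A hedged sketch in Haskell (ours, not a translation of McCarthy’s code), keeping the historical argument order: maplist applies its function to each successive non-empty tail, and the modern map over elements is recovered by composing with head:

```haskell
-- McCarthy-style maplist: f sees the whole remaining list at each step,
-- not the individual element.
maplist :: [a] -> ([a] -> b) -> [b]
maplist []         _ = []
maplist xs@(_ : t) f = f xs : maplist t f

-- The familiar map over elements, recovered from maplist:
mapFromMaplist :: (a -> b) -> [a] -> [b]
mapFromMaplist f xs = maplist xs (f . head)
```

Here `maplist [1,2,3] length` yields `[3,2,1]`: the function is applied to the tails `[1,2,3]`, `[2,3]`, and `[3]` in turn.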

Naturally, this article made an indelible impression on future authors and implementers of functional languages.

Does this Lisp look unlike modern Lisp? It does not resemble the Lisp that actually existed at the time either. This is pseudocode for articles about Lisp—more or less resembling M-expressions, which looked like this 9:

MAPCON(L,F)=
    (L=0 YIELDS 0,
     1 YIELDS NCONC(F(L),MAPCON(CDR(L),F)))

M-expressions were intended to be implemented, but nobody knew how. No one in McCarthy’s group knew how to write a compiler. The example of FORTRAN was not particularly inspiring—it took 30 man-years to write its compiler, which was quite a lot for that time, and especially for McCarthy’s group. Clearly, creating your own language has disadvantages compared to adding features to existing ones.

So M-expressions remained only pseudocode—not in articles, but in comments to real code written in assembler and Lisp. And the real Lisp code—S-expressions—looked 10 much the same as they do today:

(MAPLIST (LAMBDA (X FN) (COND
      ((NULL X) NIL)
      (T (CONS (FN X) (MAPLIST (CDR X) FN))))))

But for those who wanted to use Lisp implementations to create implementations of functional languages in the narrower sense, this was not a problem. Translating code into S-expressions is even easier. The main thing was that such implementations existed—and they did.

In 1958, to gain experience with translation, programmers Russell and Maling began rewriting simple programs from proto-Lisp into IBM assembler. By early April of the following year, they had rewritten 20 functions. McCarthy wrote programs in the M-expression language, and the two programmers wrote the corresponding assembler code.

The naive Lisp interpreter was initially a specification in M-expressions, and Russell, on his own initiative, made this specification executable. Naturally, the first specification executed incorrectly, but after several iterations of writing new specifications and manually translating them, the interpreter began to work.
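What “making the specification executable” amounts to can be suggested with a sketch. The following minimal evaluator is ours - an assumption-laden illustration in Haskell, not a reconstruction of Russell’s code - covering a tiny subset: quote, atom, eq, car, cdr, cons, cond, and lambda. Note that, like the early Lisp interpreter, it resolves a lambda body’s free variables in the environment at call time - the dynamic scoping whose consequences we will meet shortly.

```haskell
-- S-expressions: atoms and lists (dotted pairs are approximated by lists).
data SX = A String | L [SX] deriving (Eq, Show)

type Env = [(String, SX)]

eval :: Env -> SX -> SX
eval env (A v)                = maybe (A v) id (lookup v env)
eval _   x@(L (A "lambda":_)) = x                               -- self-quoting
eval _   (L [A "quote", x])   = x
eval env (L [A "atom",  x])   = case eval env x of A _ -> A "t"; _ -> A "nil"
eval env (L [A "eq", x, y])   = if eval env x == eval env y then A "t" else A "nil"
eval env (L [A "car",  x])    = case eval env x of L (h:_) -> h;   _ -> A "nil"
eval env (L [A "cdr",  x])    = case eval env x of L (_:t) -> L t; _ -> A "nil"
eval env (L [A "cons", x, y]) = case eval env y of
                                  L t -> L (eval env x : t)
                                  _   -> L [eval env x]         -- no dotted pairs
eval env (L (A "cond" : cs))  = evcon env cs
eval env (L (f : args))       = apply env (eval env f) (map (eval env) args)
eval _   x                    = x

-- Evaluate cond clauses until a predicate yields t.
evcon :: Env -> [SX] -> SX
evcon env (L [p, e] : rest)
  | eval env p == A "t" = eval env e
  | otherwise           = evcon env rest
evcon _ _ = A "nil"

-- Apply a lambda by prepending bindings to the CALLER's environment:
-- free variables are looked up at call time, i.e. dynamically scoped.
apply :: Env -> SX -> [SX] -> SX
apply env (L [A "lambda", L ps, body]) args =
  eval ([(p, a) | (A p, a) <- zip ps args] ++ env) body
apply _ f _ = f
```

For instance, `eval [] (L [A "car", L [A "quote", L [A "x", A "y"]]])` yields `A "x"`.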

For writing real programs, a naive interpreter is not enough, but more serious implementations were not long in coming.

No later than April 1959, Klim Maling began writing a Lisp compiler in Lisp, but the project was abandoned. The Lisp programmers decided that a compiler written in assembly would obviously be more practical.

The LISP 1 compiler was written in assembly by Robert Brayton, assisted by David Park. This compiler became operational in January 1960. The code it generated ran 50 times faster than the interpreter [^Stoy79].

But in 1960 Brayton left MIT, and without him the Lisp programmers could no longer understand or develop the compiler 11. McCarthy briefly described this experience as “unsuccessful” 4.

So the Lisp programmers decided to try writing a Lisp compiler in Lisp after all. This experiment ended in success, which became clear no later than August 1962. Timothy P. Hart and Michael Levin, better known for their other work, wrote the LISP 1.5 compiler, which is considered the first compiler to compile itself 12 10. The Lisp programmers were able to develop this Lisp-in-Lisp compiler.

The Hart and Levin compiler produced code that ran 40 times faster than the interpreter 12, or from 10 to 100 times faster depending on the program 10. Depending on which properties of the program? A good question.

Bootstrapping took only 5 minutes, even though for most of that time most of the compiler was itself still being interpreted.

McCarthy writes that compiled Lisp code runs 10–100 times slower than FORTRAN code on numerical tasks. But there is no comparison with list processing written in a more traditional programming language, such as FORTRAN with appropriate extensions. Such a comparison would have been more interesting.

However, a compiler is a real application, so it can be concluded that Lisp programming existed in 1962.

And if already in 1962—or even in 1960, if you were brave enough—it was possible to compile a language with first-class functions, then functional programming in the broad sense existed from the early 1960s. And for functional programming in the narrow sense—in the languages of the Edinburgh research program—it only remained to invent and implement pattern matching and parametric polymorphism. After all, Lisp is a functional language with first-class functions, right?

That’s what David Turner, an important hero of our story, thought when he began studying Lisp in 1972. But he quickly noticed that the code was not doing what he would expect from a language that has lambda [^Turn12]. Turner read the documentation 10 and realized that…

LAMBDA does not construct a function

We suspect that McCarthy simply had forgotten the functional arguments – a mistake of lasting importance. Herbert Stoyan. Early LISP history [^Stoy84]

The problem was first discovered in 1959 by James Slagle—the same one who took notes on “Fortran with lambdas.” Slagle worked with the first Lisp implementation. In his history of Lisp 4, McCarthy even gives pseudocode corresponding to Slagle’s code, the debugging of which first made the Lisp programmers realize that something was not working as expected:

testr[x, p, f, u] <- if p[x] then f[x] else if atom[x] then u[] else
        testr[cdr[x], p, f, λ:testr[car[x], p, f, u]].

Yes, as was customary in the first couple of decades of Lisp’s existence, code examples were usually pseudocode. After all, you wouldn’t publish all those parentheses in a paper.

Slagle thought he had written something like this:

data Lisp = Atom String | Cons Lisp Lisp deriving (Show, Eq)

testr x p f u | p x = f x
testr (Atom _) p f u = u ()
testr (Cons h t) p f u = testr t p f $ \() -> testr h p f u

but in reality he had written something more like this:

testr x p f u | p x = f x
testr h@(Atom _) p f u = u h p f u -- <==
testr (Cons h t) p f u = testr t p f testr -- <==

But why? Slagle tried to find out from the programmers, but they did not know why. So Slagle and the programmers turned to McCarthy. But that did not help either—at least not immediately. Let us give the floor to McCarthy himself [^Stoy79]:

“…naturally he complained. And I never understood the complaint very well, namely, I said: ‘oh, there must be a bug somewhere, fix it!’ And Steve Russell and Dan Edwards said, there is a fundamental difficulty and I said, there can’t be, fix it, and eventually they fixed it and there was something they called the funarg device. He tried to explain it to me but I was distracted or something and I didn’t pay much attention so I didn’t really learn what the funarg thing did until really quite a few years later.”

What can one say? This is rather unexpected.

In his talks and articles on the history of Lisp in the 1970s, McCarthy said that he had not fully read Church’s work on the lambda calculus, and that what he had read he had not fully understood. Lisp historian Herbert Stoyan initially thought this was a joke [^Stoy79], but later reconstructed McCarthy’s path to mastering lambda [^Stoy84].

Remember the strange higher-order functions from the first report? When that report came out in the fall of 1958, McCarthy had only a vague idea of what lambda was. After unsuccessful attempts to write such a maplist in assembly, McCarthy decided to read about lambdas and entered the state he himself described as “not fully read, not fully understood.” But that was already enough for the appearance of HOFs in later reports to stop looking suspicious.

Stoyan finds it strange that McCarthy had only a vague understanding of lambda, yet months earlier spoke of dreaming of FORTRAN with lambdas and recommended adding them to the international algorithmic language. Stoyan suggests something might be wrong with the dates. But if the code from the first report was written earlier, it is still strange that no one noticed the suspicious HOFs.

Stoyan discussed this with McCarthy, and McCarthy himself was surprised, certain that only an unsuccessful attempt to implement such a maplist led him to using lambda. Why, then, did he recommend adding lambda to other languages long before that moment?

We are not inclined to treat this as a mystery and see no reason why one should not recommend that other programming language designers pay attention to lambdas, even if one only knows of their existence and does not yet properly understand them before attempting implementation.

So, the obviously incorrect lambda first became non-obviously incorrect. What was wrong with this one?

A not-so-obviously wrong lambda is a metaprogramming tool. It creates a description of function code that the interpreter can execute using the environment at the call site. The function code description has a free variable named x; the interpreter goes through a list of pairs and looks for the first key "x". And which pair will be found first when executing Slagle’s code? The one added during the function call, corresponding to one of its parameters. This is so-called “dynamic scope.”

Functional programming, however, requires lexical scope. Lambda must construct a closure over the environment at the point where the lambda expression is evaluated. The Haskell code above works because closures form a stack for traversing the tree. But the Lisp interpreter did not construct closures. Closures as an implementation technique for functional languages had not yet been invented. The interpreter was looking in the wrong list of pairs—and it did not even have the right one.

Today one might assume that a metaprogramming “lambda” makes sense in Lisp, known for its use of metaprogramming. But “lambda” as a code fragment and an object of metaprogramming, rather than as a function, was not the result of careful design. Stoyan came to this conclusion by studying the process of its emergence. McCarthy wrote an interpreter for S-expressions of proto-Lisp in M-pseudocode. Russell or another programmer tried to rewrite it in assembly, discovering errors. The process repeated. Stoyan believes that during these iterations, at some point McCarthy simply forgot about functional arguments. And one of the programmers—most likely Russell—invented the metaprogramming “lambda” when he could not find a corresponding implementation in the pseudocode specification written by McCarthy. It seems that not all components of the language for programming artificial intelligence were equally important. Conditional expressions occupied McCarthy more than first-class functions.

It turned out that Russell most likely created the problem with the non-obviously incorrect lambda. But he also fixed it in the same year, 1959—with some help from Patrick Fischer. Russell added closures to Lisp: a pair of code and the very list in which linear search by variable name finds the correct variable from the environment at the point where the closure is constructed. At the point of lambda application, this list is added to the front of the environment list at the application site. After that, the interpreter interprets the lambda code as Slagle—or anyone else familiar with lambda calculus—would expect.
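Both behaviors can be sketched with a toy evaluator in the Haskell of our earlier examples. This is our own illustration, not Lisp’s actual implementation: `Expr`, `Val`, and the evaluator names are invented here. `evalDyn` models the metaprogramming “lambda” (free variables resolved at the call site), while `evalLex` models Russell’s closures (the environment list captured at construction and prepended at application):

```haskell
import Data.Maybe (fromJust)

-- A toy expression language, just enough to show the two scoping rules.
data Expr = Var String | Num Int | Lam String Expr | App Expr Expr
  deriving (Show, Eq)

data Val
  = VNum Int
  | VCode String Expr      -- "lambda" as a bare code fragment, no environment
  | VClos String Expr Env  -- the FUNARG device: code paired with an environment
  deriving (Show, Eq)

type Env = [(String, Val)] -- linear list of pairs, searched front to back

-- Dynamic scope: a lambda captures nothing; its free variables are
-- looked up in whatever environment is current at the call site.
evalDyn :: Env -> Expr -> Val
evalDyn env (Var x)   = fromJust (lookup x env)
evalDyn _   (Num n)   = VNum n
evalDyn _   (Lam x b) = VCode x b
evalDyn env (App f a) = applyDyn env (evalDyn env f) (evalDyn env a)

applyDyn :: Env -> Val -> Val -> Val
applyDyn env (VCode x b) arg = evalDyn ((x, arg) : env) b
applyDyn _   _           _   = error "not applicable"

-- Lexical scope: a lambda closes over the environment where it is
-- evaluated; at application the captured list is searched first.
evalLex :: Env -> Expr -> Val
evalLex env (Var x)   = fromJust (lookup x env)
evalLex _   (Num n)   = VNum n
evalLex env (Lam x b) = VClos x b env
evalLex env (App f a) = applyLex (evalLex env f) (evalLex env a)

applyLex :: Val -> Val -> Val
applyLex (VClos x b cenv) arg = evalLex ((x, arg) : cenv) b
applyLex _                _   = error "not applicable"

-- k = \x -> \y -> x, applied to 1; then the result is called from an
-- environment that happens to bind "x" to 2.
k :: Expr
k = Lam "x" (Lam "y" (Var "x"))

callerEnv :: Env
callerEnv = [("x", VNum 2)]

dynResult, lexResult :: Val
dynResult = applyDyn callerEnv (evalDyn callerEnv (App k (Num 1))) (VNum 0)
-- dynamic scope: the inner x resolves at the call site, giving VNum 2
lexResult = applyLex (evalLex callerEnv (App k (Num 1))) (VNum 0)
-- lexical scope: the closure remembers x = 1, giving VNum 1
```

Applying `\x -> \y -> x` to 1 and then calling the result from an environment that happens to bind `x` to 2 yields 2 under dynamic scope but 1 under lexical scope—exactly the kind of surprise Slagle ran into.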

Technically, the error was not fixed. The error was classified as a feature, and the fix as another feature that one could choose not to use. The default scoping did not change, and closure construction was made explicit.

It is not entirely clear why the default behavior was not changed. It was not as if enough Lisp code had been written by then to worry about backward compatibility. Perhaps performance was the issue. Perhaps the fix was not considered important. Russell could not explain either the solution or the problem even to McCarthy, according to McCarthy himself.

Fortunately, for someone translating a functional language into Lisp, this does not matter: one can explicitly specify that a closure should be constructed for a lambda. And so—is the problem solved? Alas, no.

Unfortunately, this explicit construction of closures is only one manifestation of the indifference of the Lisp programmers of that time to functional programming. Another issue, more important for compiling functional languages by translation into Lisp, was that advances in Lisp compilation and performance optimization were usually not applied to these closures.

Remember the caveat that the compiler compiles different Lisp code with varying degrees of success? Lambdas are precisely such code. In the case of selected functions like map and a known lambda, an advanced compiler can inline both, solving both the scoping and performance problems. But in the case of an unknown function, as in Slagle’s example, compilation will not occur at all. The lambda code will be interpreted, and the interpreter will traverse a singly linked list of pairs in search of a string with the desired name. This is not merely an explanation of semantics—it is, for the time being, the actual implementation.

Instead of working on an implementation that would execute Slagle’s code better, Lisp programmers abandoned functions that handled errors by calling a passed-in handler, in favor of functions that reported errors by returning the empty list.

All right—perhaps Lisp programmers did not yet want to create a practical implementation of first-class functions themselves. Still, Lisp implementations could be useful to a functional language implementer, who can implement functions on their own, provided there is garbage collection. And Lisp was the first language with garbage collection.

THE GARBAGE COLLECTOR HAS BEEN CALLED

Once we decided on garbage collection, its actual implementation could be postponed, because only toy examples were being done. John McCarthy. History of LISP 4

Early Lisp programmers did not want to manage memory manually as in IPL, and chose between reference counting and garbage collection. They chose garbage collection because they wanted a cons cell to fit into a single word. On the IBM 704, this was 36 bits. An address was 15 bits. Two groups of three bits were too small for a reference count 4.
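The arithmetic behind that choice can be checked directly—a sketch in the Haskell of our examples; the names are ours, the field widths are from the text above:

```haskell
-- IBM 704 word layout as described above: a 36-bit word holds two
-- 15-bit pointer fields, leaving two 3-bit groups (prefix and tag).
wordBits, addrBits, leftoverBits :: Int
wordBits     = 36
addrBits     = 15
leftoverBits = wordBits - 2 * addrBits  -- 6 bits, in two groups of 3

-- The largest count a single 3-bit group could hold:
maxRefCount :: Int
maxRefCount = 2 ^ 3 - 1  -- only 7, far too small for a reference count
```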

This was a rather fortunate coincidence for functional programming, which often creates cyclic references. But this fortunate coincidence was probably not critical for the emergence of functional languages. For example, the authors of SIMULA eventually abandoned reference counting in favor of garbage collection after experiments comparing reference counting (combined with a garbage collector for heap cycles) against a compacting garbage collector. At least, that is how they recall it 13. But perhaps it was still beneficial that Lisp programmers began working on garbage collection earlier.

Having chosen garbage collection over reference counting, Lisp programmers postponed its implementation because they were writing only toy programs for the time being. But they did not postpone it for long. Dan Edwards wrote a garbage collector in June–July 1959 [^Stoy84].

The first public demonstration of Lisp, in 1960 or 1961, also became the first public demonstration of garbage collection—although this was not planned. The collector did not trigger during the rehearsal, but the rehearsal used up enough memory that it started working during the demonstration and ran for the rest of the allotted time, printing statistics 4. It is hard to say why the Lisp programmers did not want to show off what was, in Turner’s opinion [^Turn12], their greatest achievement. But fortunately, everything ended well and they did show it.

In the printed statistics, the collector is already proudly called a garbage collector, but in the first Lisp paper 8 McCarthy is still shy about calling it that—he is too serious there. Other Lisp programmers compensate for this in other papers.

McCarthy writes that the “reclamation” process takes “a few seconds.” This is true when memory is 8192 words. But by the time of the collector demonstration, it was already 32768 words, and the process took at least four times “a few” seconds. Not great.

The main problem [^Wilk92] with McCarthy’s collector: even if almost no live objects remain at the time of collection, the entire heap must still be traversed; the collector’s run time depends on the size of the heap.

Proponents of reference counting criticized McCarthy’s method almost from the very beginning 14 for the fact that the collector’s running time does not depend on how much memory is freed. But that is actually a good thing. What is bad is that the running time depends not only on the number of live objects but also on the total size of the heap.
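The asymmetry can be modeled in a few lines of Haskell—our own toy model, not McCarthy’s code: marking visits only cells reachable from the roots, while sweeping must inspect every cell in the heap.

```haskell
import qualified Data.Set as Set

-- The heap as a list of cells; cell i holds the indices it points to.
type Heap = [[Int]]

-- Mark phase: cost proportional to the number of LIVE cells.
mark :: Heap -> [Int] -> Set.Set Int
mark heap = go Set.empty
  where
    go live []            = live
    go live (i : rest)
      | i `Set.member` live = go live rest
      | otherwise           = go (Set.insert i live) (heap !! i ++ rest)

-- Sweep phase: cost proportional to the SIZE of the heap, because every
-- cell must be inspected to rebuild the free list.
sweep :: Heap -> Set.Set Int -> [Int]
sweep heap live = [i | i <- [0 .. length heap - 1], i `Set.notMember` live]
```

With a five-cell heap `[[1], [], [3], [2], []]` and a single root 0, marking touches only cells 0 and 1, yet the sweep still inspects all five cells before returning the free list `[2, 3, 4]`.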

It turns out that memory growth brings not only new opportunities but also new challenges. More time passes before the next collection—time that can accommodate not just a demonstration (but not along with its rehearsal), but some more interesting work. But the break in work also becomes increasingly painful.

Of course, such a view of the problem is entirely unsuitable for functional programming. Functional code allocates enough to quickly fill memory. Functional programming can benefit from memory growth only if collection time does not grow as fast as memory size.

McCarthy’s collector has another problem that is very important for functional languages. Since the work of a functional program boils down to allocation to such a significant extent, allocation must be fast. And searching for memory in a free-list is not the fastest way to allocate.

And Lisp programmers thought less about how to make garbage collection more efficient, and more about how to reduce heap allocations, reducing the load on the collector.

Abandoning garbage collection while keeping Lisp as it was would have been practically impossible. But if one allocates less on the heap and reuses data structures by modifying them in place, one can preserve the illusion of a solution—pushing collection into the future, where perhaps the computation will already be finished and no collection will be needed at all.

Thus, for functional programming and for use as a basis for implementing functional languages, Lisp compilers were not yet suitable.

In the summer of 1962 McCarthy left MIT and went to work at Stanford [^Stoy79]. And however indifferent he may have been to functional programming, the Lisp programmers remaining at MIT seem to have been even more indifferent. This marked the end of the first unsuccessful attempt to implement a functional language.

Core War

The next page in the history of Lisp began with the arrival of a new computer at MIT. IBM machines were replaced by DEC computers. This decision had serious consequences and requires explanation.

How did it come about that MIT began using PDP-6/10 mainframes instead of the much more popular IBM mainframes, especially considering that Lisp development began on IBM machines? One reason was a delay in the release of IBM’s new line 15. The main reason was that relations between MIT and IBM deteriorated due to a patent dispute [^Stee96] 15.

MIT claimed the invention of core memory and demanded that IBM pay two cents for every bit. Meanwhile, in 1965, producing core memory cost IBM 1–3 cents per bit, depending on its speed. IBM was so reluctant to pay that it developed several new types of memory to replace core. None of them proved practical, or at least not quickly enough. But Jay Wright Forrester, to whom IBM demonstrated its developments, did not know this.

And MIT, mistakenly believing that IBM was about to render their patent worthless, agreed in February 1964 to a one-time payment of $13 million ($134 million in 2025). At the time this was a record patent settlement, but had MIT not been frightened by non-working inventions, it could have obtained much more 16.

Feeling cheated, MIT decided not to buy new IBM machines.

Why not IBM is clear. But why choose the DEC PDP-10? DEC was founded by MIT alumni, who had also been members of the Tech Model Railroad Club. The trains were controlled by a computer, so McCarthy and Minsky used the club to recruit programmers for the AI Laboratory.

It was there that they found Richard D. Greenblatt, the implementer of Lisp on the new DEC machine.

MIT was not yet offended by DEC’s founders. Not yet.

Another reason was probably the same as before with IBM. IBM donated an IBM 704 to MIT in 1957, and DEC donated a PDP-1 in 1960. And just as MIT later bought newer IBM machines, the AI Laboratory acquired the PDP-6, and then its newer, more reliable and more popular version, the PDP-10 15.

Yet another reason was that the PDP-6/10, in the opinion of Lisp programmers, was well suited for implementing Lisp—because a cons cell fit into a single word, and because of instructions needed to implement some new ideas for improving memory management in Lisp.

Lisp programmers even claim that the PDP-6 designers listened to their wishes [^Stee96]. This is not so obvious. For example, the IBM 704 also fits two pointers into a single word, and no one intended to make it a “Lisp machine.” And when Lisp programmers actually designed a computer for Lisp, it had little in common with the PDP-6/10.

Lisp programmers participated in pre-release testing of the PDP-6 in 1964, and Greenblatt’s new Lisp implementation was one of the first programs to run on that computer [^Whit77]. MIT Lisp programmers received the PDP-6 in December 1964 [^Stoy79], and the PDP-10 in 1968.

Since Lisp programmers were not yet particularly interested in functional programming, they could concentrate on improving the non-functional subset, reducing the load on the garbage collector and increasing allocation speed. And in 1966 Greenblatt decided to abandon searching for variable names in a singly linked list and instead use techniques developed by implementers of another language—or rather, McCarthy’s first language.

And although McCarthy’s colleagues did not immediately accept his ideas about adding lambdas to the language, that struggle was not yet over. That language’s implementers had developed things worth borrowing—and not only for anti-functional Lisp programmers. In fact, it was the second likely candidate to become the basis for the first practical functional language.

To be continued…

REFERENCES

[^O’Do84]: Christoph M. Hoffmann and Michael J. O’Donnell. 1984. Implementation of an interpreter for abstract equations. In Proceedings of the 11th ACM SIGACT-SIGPLAN symposium on Principles of programming languages (POPL ‘84). Association for Computing Machinery, New York, NY, USA, 111–121. doi:10.1145/800017.800522
[^O’Do87]: O’Donnell, M.J. (1987). Term-rewriting implementation of equational logic programming. In: Lescanne, P. (eds) Rewriting Techniques and Applications. RTA 1987. Lecture Notes in Computer Science, vol 256. Springer, Berlin, Heidelberg. doi:10.1007/3-540-17220-3_1
[^O’Ke83]: R. A. O’Keefe. 1983. Prolog compared with LISP? SIGPLAN Not. 18, 5 (May 1983), 46–56. doi:10.1145/948249.948255
[^Padg88]: Padget, Julian. “Three Uncommon Lisps.” In First International Workshop on Lisp Evolution and Standardization. 1988.
[^PAL360]: PAL https://www.softwarepreservation.org/projects/lang/PAL/
[^Part84]: Partsch, H. (1984). The CIP Transformation System. In: Pepper, P. (eds) Program Transformation and Programming Environments. NATO ASI Series, vol 8. Springer, Berlin, Heidelberg. doi:10.1007/978-3-642-46490-4_27
[^Pate76]: M. S. Paterson and M. N. Wegman. 1976. Linear unification. In Proceedings of the eighth annual ACM symposium on Theory of computing (STOC ‘76). Association for Computing Machinery, New York, NY, USA, 181–186. doi:10.1145/800113.803646
[^Perl78]: Alan J. Perlis. 1978. The American side of the development of ALGOL. History of programming languages. Association for Computing Machinery, New York, NY, USA, 75–91. doi:10.1145/800025.1198352
[^Pett78]: Pettorossi, A. (1978). Improving memory utilization in transforming recursive programs. In: Winkowski, J. (eds) Mathematical Foundations of Computer Science 1978. MFCS 1978. Lecture Notes in Computer Science, vol 64. Springer, Berlin, Heidelberg. doi:10.1007/3-540-08921-7_89
[^Pigo95]: Diarmuid Pigott, HOPL: an interactive Roster of Programming Languages, HOPE https://hopl.info/showlanguage.prx?exp=810
[^Plot2000]: Plotkin, Gordon D., Colin Stirling, and Mads Tofte. “A brief scientific biography of Robin Milner.” In Proof, Language, and Interaction, pp. 1-18. 2000. DOI:10.7551/mitpress/5641.003.0004
[^Plot10]: Gordon Plotkin - Robin Milner: A Craftsman of Tools for the Mind https://www.youtube.com/watch?v=Jg5VCLb2cMo
[^Popp2002]: Robin Popplestone. 2002. POP, A Broad-Spectrum Programming Language, 1967–2002. Form. Asp. Comput. 13, 3–5 (Jul 2002), 196–213. doi:10.1007/s001650200009
[^Popplestone]: Popplestone, R. J. The Early Development of POP https://www-robotics.cs.umass.edu/Popplestone/pop_development.html
[^Prat73]: Vaughan R. Pratt. 1973. Top down operator precedence. In Proceedings of the 1st annual ACM SIGACT-SIGPLAN symposium on Principles of programming languages (POPL ‘73). Association for Computing Machinery, New York, NY, USA, 41–51. doi:10.1145/512927.512931
[^Prolog]: Prolog and Logic Programming Historical Sources Archive https://www.softwarepreservation.org/projects/prolog/index.html#Edinburgh
[^Prolog75]: PROLOG to DEC 10 Machine Code Compiler, Version 13 Sep 1975 https://www.softwarepreservation.org/projects/prolog/edinburgh/src/Warren-Prolog_Compiler-1975_09_13.pdf
[^Prolog81]: DEC10 Prolog version 3 https://saildart.org/[PRO,SYS]/
[^Quar86]: John S. Quarterman, Abraham Silberschatz, and James L. Peterson. 4.2BSD and 4.3BSD as Examples of the UNIX System
[^RABBIT]: RABBIT Scheme compiler http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/scheme/impl/rabbit/rabbit.lsp
[^Rand64]: Randell, Brian. & Russell, L. J. (1964). ALGOL 60 implementation : the translation and use of ALGOL 60 programs on a computer. London ; New York : Published for the Automatic Programming Information Centre, Brighton College of Technology, England, by Academic Press
[^Rees2010]: Jonathan Rees, The T Project http://mumble.net/~jar/tproject/
[^Reev86]: Peter G. Harrison and Mike Reeve. 1986. The parallel graph reduction machine, Alice. In Proceedings of the Workshop on Graph Reduction. Springer-Verlag, Berlin, Heidelberg, 181–202.
[^Reyn69]: Reynolds, John C.. “GEDANKEN: A SIMPLE TYPELESS LANGUAGE WHICH PERMITS FUNCTIONAL DATA STRUCTURES AND COROUTINES.” (1969).
[^Reyn70]: Reynolds, John C.. “GEDANKEN—a simple typeless language based on the principle of completeness and the reference concept.” Communications of the ACM 13 (1970): 308 - 319.
[^Reyn72]: John C. Reynolds. 1972. Definitional interpreters for higher-order programming languages. In Proceedings of the ACM annual conference - Volume 2 (ACM ‘72). Association for Computing Machinery, New York, NY, USA, 717–740. doi:10.1145/800194.805852
[^Reyn74]: Reynolds, John C. “Towards a theory of type structure.” In Programming Symposium, pp. 408-425. Springer, Berlin, Heidelberg, 1974.
[^Reyn93]: Reynolds, John C. “The discoveries of continuations.” Lisp and symbolic computation 6, no. 3 (1993): 233-247. doi:10.1007/bf01019459
[^Reyn98]: Reynolds, John C. “Definitional interpreters revisited.” Higher-Order and Symbolic Computation 11, no. 4 (1998): 355-361. doi:10.1023/A:1010075320153
[^Reyn2012]: John C. Reynolds https://www.softwarepreservation.org/projects/lang/GEDANKEN
[^Rich13]: Richards, Martin. “How BCPL Evolved from CPL.” Comput. J. 56 (2013): 664-670.
[^Rich2000]: Richards, M. Christopher Strachey and the Cambridge CPL Compiler (2000). Higher-Order and Symbolic Computation, 13(1/2), 85–88. doi:10.1023/a:1010014110806
[^Rich69]: Richards, M. (1969, May). BCPL: A tool for compiler writing and system programming. In Proceedings of the May 14-16, 1969, spring joint computer conference (pp. 557-566). doi:10.1145/1476793.1476880
[^Rich74]: RICHARDS, Martin; EVANS JR, Arthur; MABEE, Robert F. The BCPL reference manual. MASSACHUSETTS INST OF TECH CAMBRIDGE PROJECT MAC, 1974.
[^Ritc93]: Ritchie, Dennis M. “The development of the C language.” ACM Sigplan Notices 28.3 (1993): 201-208.
[^Robi76]: Robinson, Lawrence, and Oliver Roubine. Special: A specification and assertion language. Menlo Park, Ca.: Stanford Research Institute, 1976.
[^Roch71]: Arnold Rochfeld. 1971. New LISP techniques for a paging environment. Commun. ACM 14, 12 (Dec. 1971), 791–795. doi:10.1145/362919.362937
[^Rosetta1]: Man or boy test in Pascal https://rosettacode.org/wiki/Man_or_boy_test#Pascal
[^Ross61]: ROSS, DOUGLAS T., and STEVEN A. COONS. INVESTIGATIONS IN COMPUTER-AIDED DESIGN. MASSACHUSETTS INST OF TECH CAMBRIDGE ELECTRONIC SYSTEMS LAB, 1961.
[^Ruti67]: Rutishauser, Heinz. “Description of ALGOL 60, volume 1, edited by Bauer, FL.” (1967).
[^Ryde82]: Rydeheard, David Eric. “Applications of category theory to programming and program specification.” (1982).
[^Ryde2002]: RYDEHEARD, D. E., & SANNELLA, D. T. (2002). A Collection of Papers and Memoirs Celebrating the Contribution of Rod Burstall to Advances in Computer Science. Formal Aspects of Computing, 13(3-5), 187–193. doi:10.1007/s001650200006
[^Salu94]: Salus PH. A quarter century of UNIX. ACM Press/Addison-Wesley Publishing Co.; 1994 Dec.
[^Same65]: K. Samelson, AB20.3.3. Functionals and functional transformations, Algol Bulletin No. 20, July 1965 https://archive.computerhistory.org/resources/text/algol/algol_bulletin/A20/P33.HTM
[^Sann82]: Sannella, Donald. “Semantics, implementation and pragmatics of Clear, a program specification language.” (1982).
[^Sann94]: Sannella, Donald and Martin Wirsing. “Specification Languages” (1994).
[^Sann14]: D. Sannella, CV https://homepages.inf.ed.ac.uk/dts/cv.html
[^Scho67]: H. Schorr and W. M. Waite. 1967. An efficient machine-independent procedure for garbage collection in various list structures. Commun. ACM 10, 8 (Aug. 1967), 501–506. doi:10.1145/363534.363554
[^Schu74]: S.A. Schuman, AB37.4.1: Toward Modular Programming in High-Level Languages, Algol Bulletin No. 37, July 1974 http://archive.computerhistory.org/resources/text/algol/algol_bulletin/A37/P41.HTM
[^Schw71]: Jacob T. Schwartz. Abstract algorithms and a set theoretic language for their expression. Computer Science Department, Courant Institute of Mathematical Sciences, New York University. Preliminary draft, first part. 1970-1971, 16+289 pages. This copy scanned from NYU Library, courtesy of Alex Kennedy-Grant. http://www.softwarepreservation.net/projects/SETL/setl/doc/Schwartz-Abstract_Algorithms-1971.pdf
[^Schw82]: Schwarz, J. (1982). Using Annotations to Make Recursion Equations Behave. IEEE Transactions on Software Engineering, SE-8(1), 21–33. doi:10.1109/tse.1982.234771
[^Somm77]: Sommerville JF. An experiment in high-level microprogramming. University of St. Andrews (United Kingdom); 1977.
[^SPJ82]: Simon L Peyton Jones. 1982. An investigation of the relative efficiencies of combinators and lambda expressions. In Proceedings of the 1982 ACM symposium on LISP and functional programming (LFP ‘82). Association for Computing Machinery, New York, NY, USA, 150–158. doi:10.1145/800068.802145
[^SPJ85]: Jones, S. L. P. (1985). Yacc in sasl — an exercise in functional programming. Software: Practice and Experience, 15(8), 807–820. doi:10.1002/spe.4380150807
[^SPJ87]: Peyton Jones, Simon L. The implementation of functional programming languages (prentice-hall international series in computer science). Prentice-Hall, Inc., 1987.
[^Scot93]: Dana S. Scott, A type-theoretical alternative to ISWIM, CUCH, OWHY, Theoretical Computer Science, Volume 121, Issues 1–2, 1993, Pages 411-440, ISSN 0304-3975, doi:10.1016/0304-3975(93)90095-B
[^Shivers]: Olin Shivers, History of T http://www.paulgraham.com/thist.html
[^Slom89]: Sloman, Aaron. “The Evolution of Poplog and Pop-11 at Sussex University.” POP-11 Comes of Age: The Advancement of an AI Programming Language (1989): 30-54.
[^Smit73]: Smith, David Canfield and Enea, Horace J. (1973) MLISP 2 http://i.stanford.edu/TR/CS-TR-73-356.html
[^Stee75]: Sussman, Gerald J. and Guy L. Steele. “An Interpreter for Extended Lambda Calculus: SCHEME,.” (1975).
[^Stee75b]: Guy L. Steele. 1975. Multiprocessing compactifying garbage collection. Commun. ACM 18, 9 (Sept. 1975), 495–508. doi:10.1145/361002.361005
[^Stee76]: Steele Jr, Guy Lewis. “LAMBDA: The ultimate declarative.” (1976).
[^Stee76b]: Steele, Guy L. and Gerald J. Sussman. “Lambda: The Ultimate Imperative.” (1976).
[^Stee77]: Guy Lewis Steele. 1977. Debunking the “expensive procedure call” myth or, procedure call implementations considered harmful or, LAMBDA: The Ultimate GOTO. In Proceedings of the 1977 annual conference (ACM ‘77). Association for Computing Machinery, New York, NY, USA, 153–162. doi:10.1145/800179.810196
[^Stee77m]: Guy Lewis Steele. 1977. Macaroni is better than spaghetti. In Proceedings of the 1977 symposium on Artificial intelligence and programming languages. Association for Computing Machinery, New York, NY, USA, 60–66. https://doi.org/10.1145/800228.806933
[^Stee77b]: Steele Jr GL. Fast Arithmetic in MacLISP. MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB; 1977 Sep 1.
[^Stee78]: Steele, Jr. Guy L. “Rabbit: A Compiler for Scheme.” (1978).
[^Stee82]: Guy L. Steele. 1982. An overview of COMMON LISP. In Proceedings of the 1982 ACM symposium on LISP and functional programming (LFP ‘82). Association for Computing Machinery, New York, NY, USA, 98–107. doi:10.1145/800068.802140
[^Stee96]: Guy L. Steele and Richard P. Gabriel. 1996. The evolution of Lisp. History of programming languages—II. Association for Computing Machinery, New York, NY, USA, 233–330. doi:10.1145/234286.1057818
[^Stee98]: Sussman, Gerald Jay, and Guy L. Steele Jr. “The first report on Scheme revisited.” Higher-Order and Symbolic Computation 11, no. 4 (1998): 399-404.
[^Stoy79]: Herbert Stoyan. 1979. LISP history. Lisp Bull., 3 (December 1979), 42–53.
[^Stoy84]: Herbert Stoyan. 1984. Early LISP history (1956–1959). In Proceedings of the 1984 ACM Symposium on LISP and Functional Programming (LFP ’84), 299–310.
[^Stra61]: Strachey, C., & Wilkes, M. V. (1961). Some proposals for improving the efficiency of ALGOL 60. Communications of the ACM, 4(11), 488-491.
[^Stra66]: BARRON D. W. and STRACHEY C. Programming in Advances in programming and non-numerical computation. 1966
[^Stra66b]: CPL Elementary Programming Manual https://web.archive.org/web/20190813130006/http://www.ancientgeek.org.uk/CPL/CPL_Elementary_Programming_Manual.pdf
[^Stra67]: Strachey, Christopher S.. “Fundamental Concepts in Programming Languages.” Higher-Order and Symbolic Computation 13 (2000): 11-49. DOI:10.1023/A:1010000313106
[^Strachey]: Catalogue of the papers and correspondence of Christopher Strachey https://archives.bodleian.ox.ac.uk/repositories/2/resources/2561
[^Stro70]: H. R. Strong. 1970. Translating recursion equations into flow charts. In Proceedings of the second annual ACM symposium on Theory of computing (STOC ‘70). Association for Computing Machinery, New York, NY, USA, 184–197. doi:10.1145/800161.805164
[^Stro93]: Bjarne Stroustrup. 1993. A history of C++: 1979–1991. In The second ACM SIGPLAN conference on History of programming languages (HOPL-II). Association for Computing Machinery, New York, NY, USA, 271–297. doi:10.1145/154766.155375
[^Thom90]: Lins, Rafael Dueire, and Simon J. Thompson. “Implementing SASL using categorical multi-combinators.” Software - Practice and Experience 20, no. 11 (1990): 1137-1165.
[^TITAN]: Cambridge Atlas http://www.chilton-computing.org.uk/acl/technology/atlas/p011.htm
[^Teit74]: Teitelman, Warren. “The interlisp reference manual.” (1974). https://www.softwarepreservation.org/projects/LISP/interlisp/Interlisp-Oct_1974.pdf
[^Teit78]: Warren Teitelman with contributions by J. W. Goodwin, A. K. Hartley, D. C. Lewis, J. J. Vittal, M. D. Yonke, D. G. Bobrow, R. M. Kaplan, L. M. Masinter, and B. A. Sheil. Interlisp Reference Manual. Bolt, Beranek & Newman and Xerox Corporation, October 1978. https://www.softwarepreservation.org/projects/LISP/interlisp/Interlisp-Oct_1978.pdf
[^Teit2008]: Teitelman, W. (2008). History of Interlisp. Celebrating the 50th Anniversary of Lisp on - LISP50. doi:10.1145/1529966.1529971
[^Tenn77]: Tennent, R. D. (1977). On a new approach to representation independent data classes. Acta Informatica, 8(4), 315–324. doi:10.1007/bf00271340
[^Turn79]: Turner, D. A. (1979). A new implementation technique for applicative languages. Software: Practice and Experience, 9(1), 31–49. doi:10.1002/spe.4380090105
[^Turn79b]: Turner, D. A. (1979). Another algorithm for bracket abstraction. The Journal of Symbolic Logic, 44(02), 267–270. doi:10.2307/2273733
[^Turn81]: D. A. Turner. 1981. The semantic elegance of applicative languages. In Proceedings of the 1981 conference on Functional programming languages and computer architecture (FPCA ‘81). Association for Computing Machinery, New York, NY, USA, 85–92. doi:10.1145/800223.806766
[^Turn82]: Turner, D. A. (1982). Recursion Equations as a Programming Language. In Darlington, J., Henderson, P., and Turner, D. A. (eds.), Functional Programming and its Applications: An Advanced Course. Cambridge University Press, 1982.
[^Turn83]: Turner, D. A. “SASL language manual (revised version).” University of Kent (1983).
[^Turn12]: Turner, D. A. Some history of functional programming languages. In International Symposium on Trends in Functional Programming, June 2012 (pp. 1-20). Springer, Berlin, Heidelberg.
[^Turn16]: Kent Recursive Calculator https://www.cs.kent.ac.uk/people/staff/dat/krc/
[^Turn19]: David Turner. 2019 Peter Landin Semantics Seminar https://www.youtube.com/watch?v=ezFZIPuSQU8
[^Vase85]: Vasey, Philip Edgar. First-order logic applied to the description and derivation of programs. (1985).
[^Vuil73]: Jean Vuillemin. 1973. Correct and optimal implementations of recursion in a simple programming language. In Proceedings of the fifth annual ACM symposium on Theory of computing (STOC ‘73). Association for Computing Machinery, New York, NY, USA, 224–239. doi:10.1145/800125.804054
[^Vuil74]: B. Courcelle and J. Vuillemin. 1974. Semantics and axiomatics of a simple recursive language. In Proceedings of the sixth annual ACM symposium on Theory of computing (STOC ‘74). Association for Computing Machinery, New York, NY, USA, 13–26. doi:10.1145/800119.803880
[^Wads71]: Wadsworth, C. P. "Semantics and pragmatics of the lambda calculus." PhD thesis, Oxford Univ. (1971).
[^Wads2000]: Wadsworth, C.P. Continuations Revisited. Higher-Order and Symbolic Computation 13, 131–133 (2000). doi:10.1023/a:1010074329461
[^Walt75]: Waltz, David L. “Understanding line drawings of scenes with shadows.” The psychology of computer vision (1975): 19-91. http://www1.cs.columbia.edu/~waltz/Papers/Understanding%20Line%20Drawing%20of%20Scenes%20with%20Shadows%20PH%20Winston%201975.pdf
[^Wand77]: Wand, M. Algebraic theories and tree rewriting systems. Indiana University. Computer Science Department Technical Report 66 (1977).
[^Wand80]: Wand, M. First-order identities as a defining language. Acta Informatica 14, 337–357 (1980). doi:10.1007/BF00286491
[^Warr75]: David H. D. Warren. Example 1: Quicksort. Circa 1975. Example of generated code from PROLOG to DEC 10 Machine Code Compiler. https://www.softwarepreservation.org/projects/prolog/edinburgh/src/Warren-Prolog_Compiler_Example_1.pdf
[^Warr77]: David H D Warren, Luis M. Pereira, and Fernando Pereira. 1977. Prolog - the language and its implementation compared with Lisp. In Proceedings of the 1977 symposium on Artificial intelligence and programming languages. Association for Computing Machinery, New York, NY, USA, 109–115. doi:10.1145/800228.806939
[^Warr77b]: David H D Warren. 1977. Prolog - the language and its implementation compared with Lisp. Slides https://www.softwarepreservation.org/projects/prolog/edinburgh/doc/slides-ACM1977.pdf
[^Warr78]: Warren, David HD. “Applied logic: its use and implementation as a programming tool.” (1978).
[^Warr81]: Warren, David HD. Higher-order extensions to Prolog - are they needed? Department of Artificial Intelligence, University of Edinburgh, 1981.
[^Weinreb]: Dan Weinreb on NIL http://www.paulgraham.com/weinreb.html
[^Weiz68]: Weizenbaum, Joseph. "The FUNARG Problem Explained." Unpublished memorandum (1968).
[^Well76]: Welliver, Leon. “IDEA: a symbolic integration program.” PhD diss., University of St Andrews, 1976.
[^Whit70]: White, John L. "An Interim LISP User's Guide." (1970).
[^Whit77]: White, Jon L. “Lisp: Program is Data: A historical perspective on MACLISP.” In Proceedings of the 1977 MACSYMA Users’ Conference, MIT Laboratory for Computer Science, Cambridge, Mass, pp. 181-189. 1977.
[^Whit78]: Jon L White. 1978. LISP/370: a short technical description of the implementation. SIGSAM Bull. 12, 4 (November 1978), 23–27. doi:10.1145/1088276.1088280
[^Whit79]: Jon L. White. NIL: A perspective. Proceedings of 1979 MACSYMA Users’ Conference, Washington, D.C., June 1979. https://www.softwarepreservation.org/projects/LISP/MIT/White-NIL_A_Perspective-1979.pdf
[^Whit80]: White, J. L. Address/memory management for a gigantic Lisp environment or, GC considered harmful. In Proceedings of the 1980 ACM Conference on LISP and Functional Programming, August 1980 (pp. 119-127).
[^Wich76]: Wichmann, B.A. Ackermann’s function: A study in the efficiency of calling procedures. BIT 16, 103–110 (1976). doi:10.1007/BF01940783
[^Wijn66]: van Wijngaarden, Adriaan. Recursive Definition of Syntax and Semantics (proceedings IFIP Working Conference on Formal Language Description Languages, Vienna 1966, pp. 13-24). Stichting Mathematisch Centrum, Rekenafdeling. Stichting Mathematisch Centrum, 1966.
[^Wijn69]: A. van Wijngaarden (Ed.), Mailloux, B. J., Peck, J. E. L., & Koster, C. H. A. (1969). Report on the Algorithmic Language ALGOL 68. Numerische Mathematik, 14(2), 79–218. doi:10.1007/bf02163002
[^Wijn77]: A. van Wijngaarden, B. J. Mailloux, J. E. L. Peck, C. H. A. Koster, M. Sintzoff, C. H. Lindsey, L. G. L. T. Meertens, and R. G. Fisker. 1977. Revised Report on the Algorithmic Language ALGOL 68. SIGPLAN Not. 12, 5 (May 1977), 1–70. doi:10.1145/954652.1781176
[^Wilk92]: Wilkes, M. V. (1992). EDSAC 2. IEEE Annals of the History of Computing, 14(4), 49–56. doi:10.1109/85.194055
[^Wils92]: Wilson, P.R. (1992). Uniprocessor garbage collection techniques. In: Bekkers, Y., Cohen, J. (eds) Memory Management. IWMM 1992. Lecture Notes in Computer Science, vol 637. Springer, Berlin, Heidelberg. doi:10.1007/BFb0017182
[^Wirs95]: Wirsing, M. (1995). Algebraic specification languages: An overview. Lecture Notes in Computer Science, 81–115. doi:10.1007/bfb0014423
[^Wirt66]: Niklaus Wirth and C. A. R. Hoare. 1966. A contribution to the development of ALGOL. Commun. ACM 9, 6 (June 1966), 413–432. doi:10.1145/365696.365702
[^Wirt76]: Wirth, Niklaus. “MODULA: a language for modular multiprogramming.” Berichte des Instituts für Informatik 18 (1976). doi:10.3929/ethz-a-000199440
[^Wood66]: Woodward, Philip M. "List Programming." In Advances in Programming and Non-numerical Computation. 1966
[^Wood72]: Woodward, Philip M.. “Practical experience with algol 68.” Software: Practice and Experience 2 (1972) doi:10.1002/spe.4380020103
[^Woze71]: J. M. Wozencraft and A. Evans. Notes on Programming Linguistics. M.I.T. Department of Electrical Engineering, February 1971
[^Wray86]: Wray, Stuart Charles. Implementation and programming techniques for functional languages. No. UCAM-CL-TR-92. University of Cambridge, Computer Laboratory, 1986. https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-92.pdf
[^Wulf73]: Wulf, William Allan et al. “The design of an optimizing compiler.” (1973).
[^Zill70]: Stephen N. Zilles. An Expansion of the Data Structuring Capabilities of PAL. Report MIT-LCS-TM-015, Laboratory for Computer Science, Massachusetts Institute of Technology, October 1, 1970.
[^Zill74]: Zilles, Stephen N. “Algebraic specification of data types.” Project MAC Progress Report 11 (1974): 28-52.
[^Zill75]: Barbara Liskov and Stephen Zilles. 1975. Specification techniques for data abstractions. In Proceedings of the international conference on Reliable software. Association for Computing Machinery, New York, NY, USA, 72–87. doi:10.1145/800027.808426

  1. MacQueen, David B., Robert Harper and John H. Reppy. "The history of Standard ML." Proceedings of the ACM on Programming Languages 4 (2020): 1-100. doi:10.1145/3386336

  2. Paul Hudak. 1989. Conception, evolution, and application of functional programming languages. ACM Comput. Surv. 21, 3 (Sep. 1989), 359–411. doi:10.1145/72551.72554

  3. Paul Hudak, John Hughes, Simon Peyton Jones, and Philip Wadler. 2007. A history of Haskell: being lazy with class. In Proceedings of the third ACM SIGPLAN conference on History of programming languages (HOPL III). Association for Computing Machinery, New York, NY, USA, 12-1–12-55. doi:10.1145/1238844.1238856

  4. McCarthy, J. History of LISP. In History of Programming Languages, June 1978 (pp. 173-185).

  5. Campbell-Kelly, Martin. “Christopher Strachey, 1916-1975: A Biographical Note.” Annals of the History of Computing 7 (1985): 19-42. 

  6. Luca Cardelli and the Early Evolution of ML, by David MacQueen. A paper presented at the Luca Cardelli Fest at Microsoft Research Cambridge on Sept. 8, 2014. 

  7. J. McCarthy: An Algebraic Language for the Manipulation of Symbolic Expressions. MIT AI Lab., AI Memo No. 1, Cambridge, Sept. 1958. 

  8. John McCarthy. 1960. Recursive functions of symbolic expressions and their computation by machine, Part I. Commun. ACM 3, 4 (April 1960), 184–195. doi:10.1145/367177.367199

  9. LISP system assembly listing. “FIELD TEST ASSEMBLY OF LISP 1.5 SEPTEMBER 1961”, labeled “Bonnie’s Birthday Assembly”. Science and Technology Collection, M.I.T. Museum, Cambridge, Massachusetts, catalog number 1993.053, donated by Timothy P. Hart. 

  10. McCarthy, J., Abrahams, P. W., Edwards, D. J., Hart, T. P., & Levin, M. I. (1962). LISP 1.5 programmer's manual. MIT press.

  11. Fred W. Blair. Structure of the Lisp Compiler. IBM Research, Yorktown Heights, circa 1970. https://www.softwarepreservation.org/projects/LISP/ibm/Blair-StructureOfLispCompiler.pdf 

  12. T. Hart and M. Levin. The New Compiler. Memo 39, Artificial Intelligence Project, RLE and MIT Computation Center, no date (circa 1962?) http://www.bitsavers.org/pdf/mit/ai/aim/AIM-039.pdf

  13. Kristen Nygaard and Ole-Johan Dahl. 1978. The development of the SIMULA languages. History of programming languages. Association for Computing Machinery, New York, NY, USA, 439–480. doi:10.1145/800025.1198392 

  14. George E. Collins. 1960. A method for overlapping and erasure of lists. Commun. ACM 3, 12 (Dec. 1960), 655–657. doi:10.1145/367487.367501 

  15. S. Chiou et al., "A Marriage of Convenience: The Founding of the MIT Artificial Intelligence Laboratory"

  16. Emerson W. Pugh, Lyle R. Johnson, and John H. Palmer. IBM’s 360 and Early 370 Systems. Cambridge, Mass.: MIT Press, 1991