Announcing the Pre-Scheme Restoration

by Andrew Whatson — Thu 20 June 2024

Pre-Scheme Restoration (R7RS)

I'm thrilled to announce that the Pre-Scheme Restoration project is now underway, thanks to a generous grant from the NLnet foundation under the NGI Zero Core program. This project is primarily an exercise in software archaeology, bringing an obscure yet important compiler to a wider audience, and using it as the basis for a modern, statically-typed, low-level functional programming language. In this article I'll provide some background on Scheme and Pre-Scheme, and discuss the objectives of this project over the coming months.

What is Scheme?

Scheme is a programming language in the Lisp family, born out of the research efforts of Gerald Jay Sussman and Guy L. Steele Jr. at MIT in the mid-to-late 1970s. Their findings were published in a series of AI memos collectively known as the "Lambda Papers", which have influenced the design and implementation of programming languages for nearly 50 years. During that time, Scheme has been revised and standardized, inspired many famous books, served as a foundation for a huge body of research, and spawned many excellent implementations. Perhaps most importantly, Scheme has been used to introduce computer science principles to multiple generations of students and programmers around the world.

Outside of academia, there are a number of active Scheme communities, mostly organized around particular implementations. A fun way that Scheme and Lisp communities come together is through the regular Lisp Game Jam, which saw 48 complete entries in the Spring 2024 competition, half of which were implemented in Scheme. The team at the Spritely Institute built an impressive entry called Cirkoban, which serves as a tech demo for their work in porting the Goblins distributed programming environment to Hoot, a Scheme to WebAssembly compiler. On another front, the Guix project is a major force bringing new users to Scheme, providing an unparalleled foundation for free and reproducible computing. In 2023 they achieved a historic full-source bootstrap, and in 2024 are making excellent progress towards the same for RISC-V. For newcomers to the language, the Scheme Primer offers an accessible hands-on introduction.

What is Pre-Scheme?

One influential Scheme implementation is Scheme 48, written by Richard Kelsey and Jonathan Rees in the late 80s. Frustrated with the complexity of existing Lisp implementations at the time, they set out to write something simpler. To bootstrap the virtual machine and garbage collector, they wrote in a restricted subset of Scheme which could be easily ported between host environments. This subset was dubbed Pre-Scheme, and a few years later Richard Kelsey developed a compiler for Pre-Scheme based on the techniques described in his dissertation and earlier work on a compiler for the T Project.

As described in the Pre-Scheme paper, the language offers a unique combination of features:

  • Scheme syntax, with full support for macros, and a compatibility library to run Pre-Scheme code in a Scheme interpreter. The compiler is also implemented in Scheme, enabling both interactive development and compile-time evaluation.
  • A static type system based on Hindley/Milner type reconstruction, as used in the ML family of languages (eg. Standard ML, OCaml, Haskell). Pre-Scheme supports parametric polymorphism, and has nascent support for algebraic data types and pattern matching, which are recently gaining popularity in mainstream languages.
  • An optimizing compiler targeting C, allowing for efficient native code generation and portable low-level machine access. C remains the common interface language for operating system facilities, and compatibility at this level is essential for modern systems languages.

Due to the restrictions of static typing and the C runtime model, Pre-Scheme does not (currently) support many of Scheme's high-level features, such as garbage collection, universal tail-call optimization, heap-allocated runtime closures, first-class continuations, runtime type checks, heterogenous lists, and the full numeric tower. Even with these limitations, Pre-Scheme enables a programming style that is familiar to Scheme programmers and more expressive than writing directly in C.

Pre-Scheme remains the bootstrapping mechanism for Scheme 48, but didn't see much adoption outside of this use-case. One reason for this is that the Pre-Scheme compiler wasn't exposed as part of an installed Scheme 48 distribution, it needs to be loaded from the source tree, making it an awkward dependency for other projects. Another reason is that the language and compiler interface weren't fully documented until the mid 2000s, when Taylor Campbell wrote "The Nearly Complete Scheme48 Reference Manual" with a detailed description of Pre-Scheme tucked away in Chapter 9. In the time since, most development in Scheme has been focused on other implementations, and the innovations of Scheme 48 are mostly viewed as artifacts of history.

Pre-Scheme Restoration

Given the rise of modern low-level systems languages like Rust and Zig, increased recognition of statically-typed functional programming techniques thanks to the influence of Haskell, and the steady growth of a Scheme community interested in systems-level development driven by the Guix project, it seems clear that there is latent demand for a language like Pre-Scheme. Scheme offers an expressive and flexible language with good performance for general-purpose programming tasks, but there remain use-cases where dropping down to a lower-level language is beneficial, such as when interfacing with external libraries or writing performance-critical components. There are also applications where Scheme is not generally considered suitable, such as in writing virtual machines, garbage collectors, operating systems, and embedded systems. While there is precedent for using Scheme at this level, the use of C or other low-level languages remains common practice.

The primary objective of the Pre-Scheme Restoration project is to make Pre-Scheme available as a practical alternative to C for the wider Scheme community. This means bringing the Pre-Scheme compiler to as many Scheme implementations as possible, improving tooling so that it integrates smoothly with existing development workflows, revising the Pre-Scheme language to improve compatibility with both Scheme and C, and investing in documentation and examples so that it's easy to adopt. Due to the age and lack of exposure of the existing implementation, there are also open questions about compiler performance and correctness which will need to be addressed.

Thanks to the grant from NLnet, resources are now available to make significant progress towards meeting these objectives. A detailed plan can be found on the roadmap, but the main points are as follows:

  1. Port the Pre-Scheme compiler to a selection of R7RS-compatible Scheme implementations, replacing the Scheme 48 dependency with a portable reader & macro expander.
  2. Document the compiler architecture and develop an initial test suite to aid in further porting and regression testing.
  3. Extend the language and type system to support the set of sized numeric types commonly found in systems languages.
  4. Complete support for sum types (tagged unions), and implement syntax for algebraic data-types and pattern matching.
  5. Replace the default string type with a length-prefixed UTF-8 representation.
  6. Revise other parts of the core language to improve R7RS compatibility, and implement a standard library covering as much of R7RS and relevant SRFIs as possible.
  7. Implement a conventional command-line compiler interface, and an Emacs plugin to support interactive development.
  8. Document the language, write tutorials, and develop some example projects.

If you are interested in following this project, you can follow me or #prescheme on the fediverse, subscribe to the Atom feed or RSS feed, or join us in the #guile-steel channel on IRC. Repositories for the port, this website, and related projects can be found on Codeberg.

Acknowledgements

This project is the continuation of my earlier efforts in porting the Pre-Scheme compiler to Guile, which were inspired by Christine Lemmer-Webber's post "Guile Steel: a proposal for a systems lisp". Christine has been a driving force behind this project from the beginning, cheering on those efforts, encouraging me to present at FOSDEM, and pushing me to apply for an NLnet grant. Her role in the current Scheme renaissance cannot be overstated. I'm grateful to everyone in the Scheme community who has expressed interest and support for this project, it means a lot, and I'm looking forward to future conversations and projects together. I would also like to thank Luis Filipe for the beautiful artwork, which is made available under CC BY-SA 4.0, and can be found in the website repository.