First report on the Pre-Scheme Restoration

by Andrew Whatson — Thu 10 October 2024

It's been over 3 months since kicking off the Pre-Scheme Restoration project, so it's well and truly time for a progress update! I'm pleased to report that the bulk of the port to R7RS has been completed, with approximately 75% of the codebase successfully loading in 3 different R7RS-compatible Scheme implementations (Chibi, Sagittarius, and Guile) and 100% of the codebase running via a new R7RS compatibility layer for Scheme 48. The libraries which haven't yet been ported all directly interface with the Scheme expander front-end, and replacing that with a portable expander is the next major focus of the project. In the rest of this article I'll discuss the work that's been done to get to this point, and briefly outline the upcoming work.

Retaining compatibility with Scheme 48

One challenge with porting software is ensuring that errors aren't introduced during the porting process. The original Pre-Scheme compiler doesn't have an established test suite, so the only real test (aside from just loading the code) is to check that it continues to translate the Scheme 48 VM to identical C code. Being an end-to-end test, this requires the entire compiler to remain functioning throughout the porting process to verify that errors haven't been introduced. Any change breaking compatibility with the original platform would prevent the test from being run, and leave us in the painful situation of only being able to detect and debug errors after everything has been ported.

To avoid this situation, I've developed "The Incomplete Scheme48 R7RS Compatibility Library" (s48-r7rs). While not a fully compliant R7RS implementation, it's compatible enough to allow Scheme 48 to load R7RS library definitions from the filesystem, and use cond-expand to paper over implementation differences. This has allowed me to keep the end-to-end test passing throughout the porting process, and debug any errors as they're introduced. The compatibility layer also includes an implementation of SRFI 64, forming the basis of a portable test suite which will continue to be expanded as the project progresses.
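
As a rough illustration of the approach, the sketch below uses cond-expand to select a host-specific value and SRFI 64 to record a couple of portable checks. It is not taken from s48-r7rs or the Pre-Scheme test suite: the host-name definition and the test names are made up for the example, and it assumes the host provides (srfi 64) and advertises the feature identifiers shown.

```scheme
(import (scheme base) (srfi 64))

;; Hypothetical definition: cond-expand picks one branch at expansion
;; time, based on the feature identifiers the host advertises.
(define host-name
  (cond-expand
    (guile "guile")
    (chibi "chibi")
    (sagittarius "sagittarius")
    (else "unknown")))

;; SRFI 64 groups checks between test-begin and test-end.
(test-begin "portability-sketch")
(test-assert "host name is a string" (string? host-name))
(test-equal "string ports behave the same on every host"
  "abc"
  (let ((port (open-output-string)))
    (write-string "abc" port)
    (get-output-string port)))
(test-end "portability-sketch")
```

Because the host-specific part is confined to the cond-expand form, the tests themselves stay identical across implementations.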

Using Scsh as a tooling platform

While Scheme 48 includes a lot of functionality, it lacks some conveniences for "scripting" work, so I adopted Scsh as the platform for the tooling used during the port (ps-workbench). Scsh is built on top of Scheme 48, so it is capable of loading the original Pre-Scheme compiler and performing introspection, while also providing better support for dealing with the filesystem and external processes. The R7RS compatibility layer also works with Scsh, and might be useful for any future efforts porting Scsh functionality to other Scheme implementations. I don't intend for Scsh (or Scheme 48) to be a hard dependency of Pre-Scheme, so much of this tooling is temporary, but it might still be of some interest.

Implementing R7RS-small for Scheme 48

Scheme 48 is a complete R5RS implementation, so already includes the majority of functionality needed for R7RS-small. Implementing the initial compatibility layer was mainly a matter of implementing a loader for R7RS library definitions, and defining the core libraries as Scheme 48 modules which re-export these existing procedures. The Scheme 48 module system was in fact an inspiration for the design of R7RS libraries; the systems are similar enough that R7RS libraries can be implemented as a syntactic layer over the procedural module interface. To aid in generating the core library definitions I implemented a parser for the Scheme Index data-set, allowing the lists of exported identifiers to be generated at an Scsh REPL.

Porting the Pre-Scheme compiler to s48-r7rs

The bulk of the Pre-Scheme compiler depends on Scheme 48's big-scheme structure, which provides an extended Scheme environment with useful functionality not covered by the standard, such as hash-tables, list-queues, and a format routine. Porting to R7RS was a matter of replacing big-scheme imports with r7rs-base (a.k.a. (scheme base)), and adding missing dependencies as indicated by compiler diagnostics and test failures. Non-standard dependencies have been factored out as separate libraries in the (ps-compiler util) namespace, where they can be implemented using whatever equivalents are available in the target Scheme implementations. For example, the (ps-compiler util queues) library exports the Scheme 48 queues interface, using (ice-9 q) on Guile and SRFI 117 on Chibi and Sagittarius. These utility libraries separate the core of the Pre-Scheme compiler from the differences of the target Scheme implementations, and provide a convenient point for test coverage.
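
The shape of such a utility library might look like the following sketch. It is not the project's actual (ps-compiler util queues) source: the (example util queues) name and the reduced four-procedure interface are assumptions made for illustration, and only the choice of (ice-9 q) on Guile and SRFI 117 elsewhere is taken from the description above.

```scheme
;; example/util/queues.sld (hypothetical path and library name)
(define-library (example util queues)
  ;; A small subset of a Scheme 48 style queue interface.
  (export make-queue queue-empty? enqueue! dequeue!)
  (import (scheme base))
  (cond-expand
    (guile
     ;; Guile ships its own queue module.
     (import (ice-9 q))
     (begin
       (define (make-queue) (make-q))
       (define (queue-empty? q) (q-empty? q))
       (define (enqueue! q x) (enq! q x))
       (define (dequeue! q) (deq! q))))
    (else
     ;; Assumed fallback: hosts providing SRFI 117 list queues.
     (import (srfi 117))
     (begin
       (define (make-queue) (list-queue))
       (define (queue-empty? q) (list-queue-empty? q))
       (define (enqueue! q x) (list-queue-add-back! q x))
       (define (dequeue! q) (list-queue-remove-front! q))))))
```

Code that imports (example util queues) can then call make-queue, enqueue!, and dequeue! without knowing which backend is in use.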

Porting Pre-Scheme compiler macros

Scheme 48 provides an "explicit renaming" procedural macro system with the ability to break hygiene, and the Pre-Scheme compiler makes use of this for some of its internal macros. This presents a portability issue because R7RS-small (like R5RS) only standardizes hygienic syntax-rules macros. In practice, most R7RS implementations offer procedural macros via either er-macro-transformer (i.e. explicit renaming macros) or syntax-case (standardized in R6RS). My solution has been to re-implement all of these internal macros with syntax-case, and ship both versions along with SRFI 211 stubs which can be used to select the appropriate implementation for the target Scheme. An example of this is the (ps-compiler util enums) library, which provides a define-enumeration macro used internally in the compiler, and also exposed as part of the Pre-Scheme language.
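
To show what breaking hygiene with syntax-case looks like, here is a minimal sketch written for Guile, where syntax-case is available by default. The define-count macro is hypothetical and far simpler than the real define-enumeration: it only defines a variable whose name is derived from the enumeration name, which is exactly the kind of thing syntax-rules cannot express.

```scheme
;; Assumes Guile, which provides syntax-case, with-syntax, datum->syntax
;; and syntax->datum in its default environment; other hosts would need
;; an R6RS-style import instead.

;; Hypothetical macro: (define-count color (red green blue)) defines the
;; variable color-count, a name the macro call never spells out.
(define-syntax define-count
  (lambda (stx)
    (syntax-case stx ()
      ((_ name (component ...))
       (let* ((base (symbol->string (syntax->datum #'name)))
              (count-name (string->symbol (string-append base "-count"))))
         ;; datum->syntax gives the new symbol the lexical context of
         ;; `name`, deliberately making it visible to the macro's caller.
         (with-syntax ((count-id (datum->syntax #'name count-name))
                       (count (length (syntax->datum #'(component ...)))))
           #'(define count-id count)))))))

(define-count color (red green blue))
(display color-count)   ; prints 3
(newline)
```

An explicit-renaming version would build the same definition as a plain list, renaming define and leaving the generated symbol unrenamed so that the caller can see it.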

Porting Pre-Scheme library definitions

Scheme 48 structures and R7RS libraries are similar enough that we can generate the equivalent library definition for a structure using the procedural module interface. This is a matter of loading the Pre-Scheme codebase into Scsh, iterating through the loaded modules to identify the Pre-Scheme modules, using introspection to build export/import/include lists, and pretty-printing a library definition file. A subtle difference between Scheme 48 structures and R7RS libraries is that a structure is a view into a package (an environment), and multiple structures can be backed by the same package. I've simulated this architecture by making "view" libraries which simply re-export a subset of an underlying "impl" library, as can be seen with the parameters and set-parameters interfaces to the parameters-impl library, replicating the original Scheme 48 structure definitions.
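
The "view" and "impl" arrangement can be sketched as three small library files. The (example ...) library names, the file paths, and the two procedures are invented for this illustration and do not reflect the real contents of parameters-impl; the point is only that an R7RS library may export identifiers it has imported.

```scheme
;; example/parameters-impl.sld (hypothetical): the "impl" library plays
;; the role of the Scheme 48 package and owns every definition.
(define-library (example parameters-impl)
  (export lookup-parameter set-parameter!)
  (import (scheme base))
  (begin
    ;; Association list of (name . value) pairs, newest first.
    (define table '())
    (define (lookup-parameter name)
      (let ((entry (assq name table)))
        (and entry (cdr entry))))
    (define (set-parameter! name value)
      (set! table (cons (cons name value) table)))))

;; example/parameters.sld (hypothetical): read-only view.
(define-library (example parameters)
  (export lookup-parameter)
  (import (example parameters-impl)))

;; example/set-parameters.sld (hypothetical): mutating view.
(define-library (example set-parameters)
  (export set-parameter!)
  (import (example parameters-impl)))
```

Both views share the state held by the impl library, so a value stored through (example set-parameters) is visible through (example parameters), much as two structures over one package would behave.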

Initial target implementations

The current target implementations for this port (aside from s48-r7rs) are Chibi Scheme, Sagittarius Scheme, and Guile. These implementations all support the de-facto standard filesystem layout for R7RS libraries, with directories matching namespaces and library definitions in .sld files, which makes them easy to support from the same source tree. They also offer a mix of er-macro-transformer (Chibi & Sagittarius) and syntax-case (Guile & Sagittarius) macro systems, and some variety in the set of supported SRFIs and implementation-specific libraries. I believe this selection gives "just enough" difference to ensure that the project architecture is flexible enough to support a variety of implementations, without getting overwhelmed by platform-specific concerns. Support for more implementations will be added as the project matures and a test-suite coalesces.

Next steps: Portable expander, tests, and documentation

With the bulk of the libraries converted, my immediate focus is now on integrating Unsyntax as the portable expander for Pre-Scheme, replacing the current integration with Scheme 48's expander as the front-end for the Pre-Scheme compiler. Unsyntax is particularly interesting as it's a modern and compliant Scheme implementation which is itself implemented in Scheme, and designed to run on top of other Scheme implementations. It works by expanding/compiling R7RS Scheme with syntax-case macros to a simpler Scheme subset that can run on a less sophisticated host implementation. This is exactly what's required for the front-end of the Pre-Scheme compiler, which will take that subset, translate it into the compiler's AST, perform type inference and static type checking, and ultimately compile the resulting program to C.
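
To give a feel for what "a simpler Scheme subset" means, the sketch below pairs a procedure written with the standard derived forms let and cond with a hand-written equivalent that uses only lambda, if, and procedure application. The second definition was written by hand for illustration and is not Unsyntax's actual output; the procedure names are hypothetical.

```scheme
(import (scheme base))

;; Source-level version, using the derived forms let and cond.
(define (clamp n)
  (let ((limit 10))
    (cond ((< n limit) n)
          (else limit))))

;; Hand-expanded version: let becomes an immediately applied lambda,
;; and the two-clause cond becomes a single if.
(define clamp-core
  (lambda (n)
    ((lambda (limit)
       (if (< n limit) n limit))
     10)))

;; Both versions agree on every input.
(clamp 3)        ; => 3
(clamp-core 3)   ; => 3
(clamp 42)       ; => 10
(clamp-core 42)  ; => 10
```

A front-end that only ever sees the second style has far fewer forms to translate into an AST.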

Unsyntax currently only supports Chibi Scheme as a host implementation; however, an initial analysis suggests that it's only reliant on a handful of SRFIs (boxes, hash-tables, comparators, and generators) and a single non-portable feature (Chibi's type-printers, known as "disclosers" in Scheme 48). This should be fairly easy to port to s48-r7rs, where I can experiment with how best to wrap up its expander for use with the Pre-Scheme compiler. There is community interest in using Unsyntax with other Scheme implementations outside of this project, so I'll be documenting my efforts and contributing upstream as appropriate.

Aside from the expander, I'll continue to expand the Pre-Scheme compiler's test coverage and begin work on documenting its internal architecture. Having a decent test-suite and documentation in place will be essential for the later stages of this project, which involve modifications to the type system and extensions to the core language.

If you'd like to keep up with this project, you can follow me or #prescheme on the fediverse, subscribe to the Atom feed or RSS feed, or join us in the #guile-steel channel on IRC. Repositories for the port, this website, and related projects can be found on Codeberg. Please feel free to get in touch via any of these channels with questions about this project or related projects; I'll be happy to help.