First report on the Pre-Scheme Restoration
by Andrew Whatson, Thu 10 October 2024
It's been over 3 months since kicking off the Pre-Scheme Restoration project, so it's well and truly time for a progress update! I'm pleased to report that the bulk of the port to R7RS has been completed, with approximately 75% of the codebase successfully loading in 3 different R7RS-compatible Scheme implementations (Chibi, Sagittarius, and Guile) and 100% of the codebase running via a new R7RS compatibility layer for Scheme 48. The libraries which haven't yet been ported all directly interface with the Scheme expander front-end, and replacing that with a portable expander is the next major focus of the project. In the rest of this article I'll discuss the work that's been done to get to this point, and briefly outline the upcoming work.
Retaining compatibility with Scheme 48
One challenge with porting software is ensuring that errors aren't introduced during the porting process. The original Pre-Scheme compiler doesn't have an established test suite, so the only real test (aside from just loading the code) is to check that it continues to translate the Scheme 48 VM to identical C code. Being an end-to-end test, this requires the entire compiler to remain functioning throughout the porting process to verify that errors haven't been introduced. Any change breaking compatibility with the original platform would prevent the test from being run, and leave us in the painful situation of only being able to detect and debug errors after everything has been ported.
To avoid this situation, I've developed "The Incomplete Scheme48 R7RS
Compatibility Library" (s48-r7rs). While not a fully
compliant R7RS implementation, it's compatible enough to allow Scheme 48
to load R7RS library definitions from the filesystem, and use
cond-expand to paper over implementation differences. This has
allowed me to keep the end-to-end test passing
throughout the porting process, and debug any errors as they're
introduced. The compatibility layer also includes an implementation of
SRFI 64, forming the basis of a portable test suite which
will continue to be expanded as the project progresses.
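As a rough illustration of the two mechanisms mentioned above, the sketch below uses cond-expand to pick a value based on the features the host advertises, then checks it with SRFI 64. It is not taken from the Pre-Scheme repositories: the suite name and the host-name variable are hypothetical, and it assumes the host provides (srfi 64) and advertises one of the feature identifiers guile, chibi, or sagittarius (anything else falls through to the else clause).

```scheme
;; Hypothetical test file, not part of the Pre-Scheme sources.
(import (scheme base)
        (srfi 64))

;; cond-expand keeps only the first clause whose feature requirement
;; is satisfied by the host implementation.
(define host-name
  (cond-expand
    (guile "guile")
    (chibi "chibi")
    (sagittarius "sagittarius")
    (else "unknown")))

;; SRFI 64 groups assertions into a named test suite.
(test-begin "cond-expand-sketch")
(test-assert "host name is a string" (string? host-name))
(test-equal "basic arithmetic" 4 (+ 2 2))
(test-end "cond-expand-sketch")
```

The same cond-expand form can also appear as a declaration inside a library definition, where it can select imports as well as code.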
Using Scsh as a tooling platform
While Scheme 48 includes a lot of functionality, it lacks some conveniences for "scripting" work, so I adopted Scsh as the platform for the tooling used during the port (ps-workbench). Scsh is built on top of Scheme 48, so is capable of loading the original Pre-Scheme compiler and performing introspection, while also providing better support for dealing with the filesystem and external processes. The R7RS compatibility layer also works with Scsh, and might be useful for any future efforts porting Scsh functionality to other Scheme implementations. I don't intend for Scsh (or Scheme 48) to be a hard dependency of Pre-Scheme, so much of this tooling is temporary, but might still be of some interest.
Implementing R7RS-small for Scheme 48
Scheme 48 is a complete R5RS implementation, so already includes the majority of functionality needed for R7RS-small. Implementing the initial compatibility layer was mainly a matter of implementing a loader for R7RS library definitions, and defining the core libraries as Scheme 48 modules which re-export these existing procedures. The Scheme 48 module system was in fact an inspiration for the design of R7RS libraries; the systems are similar enough that R7RS libraries can be implemented as a syntactic layer over the procedural module interface. To aid in generating the core library definitions I implemented a parser for the Scheme Index data-set, allowing the lists of exported identifiers to be generated at an Scsh REPL.
Porting the Pre-Scheme compiler to s48-r7rs
The bulk of the Pre-Scheme compiler depends on Scheme 48's big-scheme structure, which provides an extended Scheme environment with useful functionality not covered by the standard, such as hash-tables, list-queues, and a format routine. Porting to R7RS was a matter of replacing big-scheme imports with r7rs-base (aka (scheme base)), and adding missing dependencies as indicated by compiler diagnostics and test failures. Non-standard dependencies have been factored out as separate libraries in the (ps-compiler util) namespace where they can be implemented using whatever equivalents are available in the target Scheme implementations. For example, the (ps-compiler util queues) library exports the Scheme 48 queues interface, using (ice-9 q) on Guile and SRFI 117 on Chibi and Sagittarius. These utility libraries separate the core of the Pre-Scheme compiler from the differences of the target Scheme implementations, and provide a convenient point for test coverage.
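To make the queues example concrete, a utility library of this kind might look roughly like the sketch below. This is not the actual library definition: the exported names are an assumption based on the core of the Scheme 48 queues interface, only a handful of procedures are covered, and the real library may be organised quite differently.

```scheme
;; Sketch of a (ps-compiler util queues)-style wrapper. The export list
;; is an assumption and covers only the core queue operations.
(define-library (ps-compiler util queues)
  (export make-queue queue? queue-empty? enqueue! dequeue!)
  (import (scheme base))
  (cond-expand
    (guile
     ;; Guile: wrap the (ice-9 q) procedures.
     (import (ice-9 q))
     (begin
       (define make-queue make-q)
       (define queue? q?)
       (define queue-empty? q-empty?)
       (define enqueue! enq!)
       (define dequeue! deq!)))
    (else
     ;; Chibi, Sagittarius, and others: wrap SRFI 117 list queues.
     (import (srfi 117))
     (begin
       (define (make-queue) (list-queue))
       (define queue? list-queue?)
       (define queue-empty? list-queue-empty?)
       (define enqueue! list-queue-add-back!)
       (define dequeue! list-queue-remove-front!)))))
```

With a wrapper like this, compiler code only ever imports (ps-compiler util queues), so a call such as (enqueue! q 1) reads the same on every host.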
Porting Pre-Scheme compiler macros
Scheme 48 provides an "explicit renaming" procedural macro system with
the ability to break hygiene, and the Pre-Scheme compiler makes use of
this for some of its internal macros. This presents a portability issue
because R7RS-small (as with R5RS) only standardizes hygienic
syntax-rules macros. In practice, most R7RS implementations offer
procedural macros via either er-macro-transformer (i.e. explicit
renaming macros) or syntax-case (standardized in R6RS). My solution
has been to re-implement all of these internal macros with
syntax-case, and ship both versions along with SRFI 211 stubs which
can be used to select the appropriate implementation for the target
Scheme. An example of this is the (ps-compiler util enums) library,
which provides a define-enumeration macro used internally in the
compiler, and also exposed as part of the Pre-Scheme language.
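For readers unfamiliar with why such macros need a procedural system, here is a minimal sketch of a syntax-case macro that deliberately breaks hygiene by deriving a new identifier from one supplied by the user. The define-getter macro is hypothetical and far smaller than define-enumeration; the example assumes a host such as Guile where syntax-case is available without further imports (other hosts would need to import it, for instance from an R6RS or SRFI 211 library).

```scheme
;; Hypothetical macro: (define-getter NAME VALUE) defines a procedure
;; called get-NAME. syntax-rules cannot express this, because the new
;; name is computed from NAME at expansion time.
(define-syntax define-getter
  (lambda (stx)
    (syntax-case stx ()
      ((_ name value)
       (identifier? #'name)
       (let* ((base (symbol->string (syntax->datum #'name)))
              (getter (string->symbol (string-append "get-" base))))
         ;; datum->syntax gives the new symbol the lexical context of
         ;; NAME, so the definition is visible to the macro's caller.
         (with-syntax ((getter-id (datum->syntax #'name getter)))
           #'(define (getter-id) value)))))))

(define-getter answer 42)
(display (get-answer))   ; displays 42
(newline)
```

An explicit renaming version would build the same definition as a plain list, calling the rename procedure on define and inserting the computed symbol unrenamed.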
Porting Pre-Scheme library definitions
Scheme 48 structures and R7RS libraries are similar enough that we can generate the equivalent library definition for a structure using the procedural module interface. This is a matter of loading the Pre-Scheme codebase into Scsh, iterating through the loaded modules to identify the Pre-Scheme modules, using introspection to build export/import/include lists, and pretty-printing a library definition file. A subtle difference between Scheme 48 structures and R7RS libraries is that a structure is a view into a package (an environment), and multiple structures can be backed by the same package. I've simulated this architecture by making "view" libraries which simply re-export a subset of an underlying "impl" library, as can be seen with the parameters and set-parameters interfaces to the parameters-impl library, replicating the structure definitions here.
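The "view" arrangement can be sketched as three library definitions, each of which would live in its own .sld file. The (example ...) namespace, the exported procedure names, and the association-list storage are all hypothetical; only the overall shape (two views re-exporting subsets of one impl library) follows the description above.

```scheme
;; "impl" library: owns the definitions and the shared state.
(define-library (example parameters-impl)
  (export lookup-parameter set-parameter!)
  (import (scheme base))
  (begin
    (define table '())
    (define (lookup-parameter name)
      (cond ((assq name table) => cdr)
            (else #f)))
    (define (set-parameter! name value)
      (set! table (cons (cons name value) table)))))

;; Read-only "view": re-exports a subset of the impl library.
(define-library (example parameters)
  (export lookup-parameter)
  (import (example parameters-impl)))

;; Mutating "view": re-exports the setter only.
(define-library (example set-parameters)
  (export set-parameter!)
  (import (example parameters-impl)))
```

Because both views import the same impl library, a value stored through (example set-parameters) is visible through (example parameters), which mirrors two structures sharing one package.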
Initial target implementations
The current target implementations for this port (aside from s48-r7rs)
are Chibi Scheme, Sagittarius Scheme, and
Guile. These implementations all support the de-facto standard
filesystem layout for R7RS libraries, with directories matching
namespaces and library definitions in .sld files, which makes them
easy to support from the same source tree. They also offer a mix of
er-macro-transformer (Chibi & Sagittarius) and syntax-case (Guile &
Sagittarius) macro systems, and some variety in the set of supported
SRFIs and implementation-specific libraries. I believe this selection
gives "just enough" difference to ensure that the project architecture
is flexible enough to support a variety of implementations, without
getting overwhelmed by platform-specific concerns. Support for more
implementations will be added as the project matures and a test-suite
coalesces.
Next steps: Portable expander, tests, and documentation
With the bulk of the libraries converted, my immediate focus is now on
integrating Unsyntax as the portable expander for
Pre-Scheme, replacing the current integration with Scheme 48's expander
as the front-end for the Pre-Scheme compiler. Unsyntax is particularly
interesting as it's a modern and compliant Scheme implementation which
is itself implemented in Scheme, and designed to run on top of other
Scheme implementations. It works by expanding/compiling R7RS Scheme
with syntax-case macros to a simpler Scheme subset that can run on a
less sophisticated host implementation. This is exactly what's required
for the front-end of the Pre-Scheme compiler, which will take that
subset, translate it into the compiler's AST, perform type inference and
static type checking, and ultimately compile the resulting program to C.
Unsyntax currently only supports Chibi Scheme as a host implementation; however, an initial analysis suggests that it's only reliant on a handful of SRFIs (boxes, hash-tables, comparators, and generators) and a single non-portable feature (Chibi's type-printers, known as "disclosers" in Scheme 48). This should be fairly easy to port to s48-r7rs, where I can experiment with how best to wrap up its expander for use with the Pre-Scheme compiler. There is community interest in using Unsyntax with other Scheme implementations outside of this project, so I'll be documenting my efforts and contributing upstream as appropriate.
Aside from the expander, I'll continue to expand the Pre-Scheme compiler's test coverage and begin work on documenting its internal architecture. Having a decent test-suite and documentation in-place will be essential for the later stages of this project involving modifications to the type system and extensions to the core language.
If you are interested in following this project, you can follow me or #prescheme on the fediverse, subscribe to the Atom feed or RSS feed, or join us in the #guile-steel channel on IRC. Repositories for the port, this website, and related projects can be found on Codeberg. Please feel free to get in touch via any of these channels with any questions about this project or related projects; I'll be happy to help.