Roadmap
Thanks to an NGI Zero grant facilitated by the NLnet Foundation, the Pre-Scheme Restoration project is now underway! A high-level overview of the project is available in the announcement post, and the latest progress is detailed in the first progress report.
Portable Compiler
A major objective of this modern Pre-Scheme implementation is to have the compiler run on a wide variety of Scheme implementations. To achieve this, the codebase will be updated to target R7RS (the most recent Scheme standard).
The main portability challenge is the Pre-Scheme compiler's dependence on the Scheme 48 reader & macro expander, which is tightly integrated with the Scheme 48 virtual machine. Thankfully, a portable expander targeting R7RS is available in the Unsyntax project. This will be adapted as a new front-end for the Pre-Scheme compiler, bringing improvements such as support for syntax-case macros.
During this initial port, the compiler architecture will be documented, and an initial test suite will be written to serve as a baseline for subsequent changes to the language and compiler.
Language Modernization
The original Pre-Scheme dialect offers a minimal set of functionality which was sufficient for bootstrapping the Scheme 48 virtual machine, but is lacking some features expected of modern low-level programming languages.
Sized Numeric Types: Pre-Scheme currently implements only "long"-sized fixnums and "float"-sized flonums. This will be extended to cover the full set of standard numeric types offered by other modern low-level languages (i.e. 8/16/32/64-bit integers and 32/64-bit floating-point numbers), improving the ability to write efficient numerical code and directly interface with foreign functions and data structures.
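To make the target concrete, here is a minimal sketch in C of the kind of fixed-width code and foreign data layout that sized numeric types are meant to match. It is an illustration only, not Pre-Scheme syntax or the project's design; the names sample_header and checksum8 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical foreign record: every field has an explicit width,
   which "long" and "float" alone cannot describe. */
typedef struct {
    uint8_t  version;
    uint8_t  flags;
    uint16_t length;
    uint32_t sequence;
    float    gain;       /* 32-bit float on common platforms */
    double   timestamp;  /* 64-bit float on common platforms */
} sample_header;

/* Sum the bytes of a buffer with 8-bit wrap-around arithmetic. */
uint8_t checksum8(const uint8_t *data, uint32_t n) {
    assert(data != NULL || n == 0);
    uint8_t sum = 0;
    for (uint32_t i = 0; i < n; i++) {
        sum = (uint8_t)(sum + data[i]);  /* truncate back to 8 bits */
    }
    return sum;
}
```

With the bytes 200, 100, 1 the running sum wraps past 255 once, so checksum8 returns 45 rather than 301.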
Polymorphic Arithmetic: Pre-Scheme currently uses separate procedures for signed integer, unsigned integer, and floating-point arithmetic. Continuing this approach for sized numeric types would lead to a significant increase in the number of arithmetic procedures. The solution is to introduce generic arithmetic operators as found in most programming languages, including both Scheme and C. Pre-Scheme already supports polymorphic primitive operations (e.g. deallocate), but some care will be needed to design reasonable conversion semantics.
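The idea of one generic operator standing in for a family of type-specific procedures can be sketched in C11 with _Generic. This is only an analogy under stated assumptions: generic_add, the add_* helpers, and i64_to_i32 are hypothetical names, and Pre-Scheme would resolve the operator through its own type reconstruction rather than a macro.

```c
#include <assert.h>
#include <stdint.h>

/* Type-specific procedures, one per numeric type (hypothetical names). */
static inline int32_t  add_i32(int32_t a, int32_t b)   { return a + b; }
static inline uint32_t add_u32(uint32_t a, uint32_t b) { return a + b; }
static inline double   add_f64(double a, double b)     { return a + b; }

/* One generic operator: the procedure is chosen at compile time
   from the static type of the first argument. */
#define generic_add(a, b) _Generic((a), \
    int32_t:  add_i32,                  \
    uint32_t: add_u32,                  \
    double:   add_f64)((a), (b))

/* One possible conversion rule: narrowing is explicit and checked,
   never silent. Other designs are equally plausible. */
static inline int32_t i64_to_i32(int64_t x) {
    assert(x >= INT32_MIN && x <= INT32_MAX);
    return (int32_t)x;
}
```

A call such as generic_add(1.5, 2.25) selects add_f64, while generic_add((int32_t)2, (int32_t)3) selects add_i32. The second argument is still converted by C's usual rules, which is exactly the kind of implicit behaviour the conversion semantics would need to pin down.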
Algebraic Data Types: Pre-Scheme supports record types (i.e. product types), and has nascent support for C-style tagged unions (i.e. sum types), but that functionality was never completed. Finishing this feature will enable full support for Algebraic Data Types (ADTs), including data-type declaration and destructuring syntax. ADTs and pattern matching are becoming more widely adopted in mainstream programming languages, and this feature will help to support modern functional programming practices in Pre-Scheme.
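For readers unfamiliar with the term, a C-style tagged union looks roughly like the sketch below: a tag field selects which variant of the union is live, and a switch on the tag plays the role of pattern matching. The shape type and its helpers are hypothetical and say nothing about the syntax Pre-Scheme will adopt.

```c
#include <assert.h>

/* The tag identifies which variant is stored. */
typedef enum { SHAPE_CIRCLE, SHAPE_RECT } shape_tag;

/* A sum type: exactly one member of the union is valid at a time. */
typedef struct {
    shape_tag tag;
    union {
        struct { double radius; } circle;
        struct { double width, height; } rect;
    } as;
} shape;

shape make_circle(double radius) {
    shape s;
    s.tag = SHAPE_CIRCLE;
    s.as.circle.radius = radius;
    return s;
}

shape make_rect(double width, double height) {
    shape s;
    s.tag = SHAPE_RECT;
    s.as.rect.width = width;
    s.as.rect.height = height;
    return s;
}

/* "Destructuring" by hand: dispatch on the tag, then read the fields. */
double shape_area(shape s) {
    switch (s.tag) {
    case SHAPE_CIRCLE:
        return 3.141592653589793 * s.as.circle.radius * s.as.circle.radius;
    case SHAPE_RECT:
        return s.as.rect.width * s.as.rect.height;
    }
    assert(0 && "unknown shape tag");
    return 0.0;
}
```

In C nothing stops code from reading the wrong variant or forgetting a case; declaration and destructuring syntax at the language level lets the compiler check both.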
UTF-8 Strings: Pre-Scheme offers C-style null-terminated byte strings, but the latest Scheme standards expect UTF-8 support, and this seems to be a consensus among modern languages. C-style strings are still useful for interfacing with legacy APIs, but in Scheme this is best handled by the bytevector type. Pre-Scheme will adopt length-prefixed, null-terminated UTF-8 strings as the default string representation, and provide a library of string routines covering as much of R7RS and SRFI-152 as possible.
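One way to picture a length-prefixed, null-terminated string is the C sketch below. The layout and the names ps_string, ps_string_from_bytes, and ps_string_code_points are assumptions for illustration, not the representation the project has committed to.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length prefix followed by the bytes and a trailing '\0'. */
typedef struct {
    size_t length;  /* size in bytes, excluding the terminator */
    char   data[];  /* UTF-8 bytes, always null-terminated */
} ps_string;

/* Copy n bytes into a freshly allocated string; returns NULL on failure.
   The caller releases the result with free(). */
ps_string *ps_string_from_bytes(const char *bytes, size_t n) {
    ps_string *s = malloc(sizeof *s + n + 1);
    if (s == NULL) {
        return NULL;
    }
    s->length = n;
    memcpy(s->data, bytes, n);
    s->data[n] = '\0';  /* keeps data usable as a C string */
    return s;
}

/* Count code points by skipping continuation bytes (10xxxxxx).
   Assumes the contents are already valid UTF-8. */
size_t ps_string_code_points(const ps_string *s) {
    assert(s != NULL);
    size_t count = 0;
    for (size_t i = 0; i < s->length; i++) {
        if (((unsigned char)s->data[i] & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

The prefix gives the byte length in constant time and allows embedded zero bytes, while the terminator means data can still be handed to legacy C APIs without copying.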
Bytevectors: R6RS standardized support for bytevectors, and R7RS standardizes a subset of that functionality (with some minor incompatibilities). Bytevectors are analogous to C character/byte arrays, and will be implemented for Pre-Scheme along with a library of routines covering R7RS and as much of R6RS as possible.
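As a rough sketch of what such routines compile down to, the C below mirrors four R7RS procedures (make-bytevector, bytevector-length, bytevector-u8-ref, bytevector-u8-set!) over a length-carrying byte array. The struct layout and the C names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A byte array that carries its own length. */
typedef struct {
    size_t   length;
    uint8_t *bytes;
} bytevector;

/* Like (make-bytevector k fill); bytes is NULL if allocation fails. */
bytevector make_bytevector(size_t k, uint8_t fill) {
    bytevector bv;
    bv.length = k;
    bv.bytes = malloc(k > 0 ? k : 1);
    if (bv.bytes != NULL) {
        memset(bv.bytes, fill, k);
    }
    return bv;
}

/* Like (bytevector-length bv). */
size_t bytevector_length(const bytevector *bv) {
    return bv->length;
}

/* Like (bytevector-u8-ref bv k), with a bounds check. */
uint8_t bytevector_u8_ref(const bytevector *bv, size_t k) {
    assert(k < bv->length);
    return bv->bytes[k];
}

/* Like (bytevector-u8-set! bv k byte), with a bounds check. */
void bytevector_u8_set(bytevector *bv, size_t k, uint8_t byte) {
    assert(k < bv->length);
    bv->bytes[k] = byte;
}
```

Because Pre-Scheme uses manual memory management, the caller in this sketch is responsible for releasing bv.bytes with free().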
Ports: Pre-Scheme offers minimal support for input and output ports. This will be extended to cover as much of R7RS as possible. Full support for a polymorphic port interface with string, bytevector, and SRFI-181 custom ports would be ideal, but further investigation will be needed to determine the best implementation approach.
R7RS Compatibility: All other R7RS procedures that can be reasonably implemented in Pre-Scheme (i.e. those that don't require intermediate allocation, lists, or vectors) will be implemented. A more detailed compatibility analysis will be published as the project progresses.
Tooling Improvements
The original Pre-Scheme compiler includes a minimal Scheme interface for invoking the compiler, and little in the way of user documentation. Better documentation now exists, but more attention is needed to meet the needs of the present-day developer audience.
Command-line Interface: Scheme implementations that compile to C (e.g. Chicken, Bigloo, Gambit) usually provide ergonomic command-line interfaces for compilation and linking, simplifying integration with build systems like GNU autotools and CMake. A command-line interface which respects established conventions will be developed for Pre-Scheme.
Editor Integration: The Pre-Scheme compiler runs in a Scheme interpreter, which provides opportunities for interactive development workflows. The Scheme interface to the compiler will be extended to better support interactive development, and an Emacs plugin will be developed as an initial example of editor integration.
Documentation and Examples: Good documentation and examples are essential for any programming language, both in onboarding new users and supporting established developers. The Pre-Scheme language and tooling will be documented thoroughly, and introductory material and example projects will be developed to help newcomers get up to speed quickly.
Future Work
These improvements will enable Pre-Scheme to fulfill its objective of providing a practical alternative to C for Scheme programmers, but there are many interesting possibilities for the language and compiler beyond this point. A few of these possibilities are:
- Re-purposing the compiler for other languages, as demonstrated with the TTCN-3 compiler and Scheme 48 bytecode optimizer.
- Re-purposing the type reconstruction pass as a static analysis tool for Scheme code.
- Developing new backends for the Pre-Scheme language, such as LLVM or WebAssembly.
- Extending the language to support advanced features like user-defined effects, ownership analysis, optional automatic memory management, etc.