灯下 · SICP · 5.5 Compilation

内容 5.5 Compilation · 449

底本文段

The explicit-control evaluator of 5.4 is a register machine whose
controller interprets Scheme programs. In this section we will see how to run
Scheme programs on a register machine whose controller is not a Scheme
interpreter.
5.4 的显式控制求值器是一台寄存器机器，其控制器用于解释 Scheme 程序。在本节中，我们将探讨如何在控制器不是 Scheme 解释器的寄存器机器上运行 Scheme 程序。
The explicit-control evaluator machine is universal—it can carry out any
computational process that can be described in Scheme. The evaluator’s
controller orchestrates the use of its data paths to perform the desired
computation. Thus, the evaluator’s data paths are universal: They are
sufficient to perform any computation we desire, given an appropriate
controller.
显式控制求值器机器是通用的——它能执行任何可以用 Scheme 描述的计算过程。求值器的控制器统筹使用其数据通路来执行所期望的计算。因此，求值器的数据通路是通用的：在给定适当控制器的情况下，它们足以执行任何我们期望的计算。
Commercial general-purpose computers are register machines organized around a
collection of registers and operations that constitute an efficient and
convenient universal set of data paths. The controller for a general-purpose
machine is an interpreter for a register-machine language like the one we have
been using. This language is called the
native language of the
machine, or simply
machine language. Programs written in machine
language are sequences of instructions that use the machine’s data paths. For
example, the explicit-control evaluator’s instruction sequence can be thought
of as a machine-language program for a general-purpose computer rather than as
the controller for a specialized interpreter machine.
商用通用计算机是围绕一组寄存器和操作组织的寄存器机器，这些寄存器和操作构成了高效且便捷的通用数据通路集合。通用机器的控制器是类似于我们一直使用的寄存器机器语言的解释器。这种语言称为机器的原生语言 (native language)，或简称机器语言 (machine language)。用机器语言编写的程序是使用机器数据通路的指令序列。例如，显式控制求值器的指令序列可以被视为通用计算机的机器语言程序，而非专用解释器机器的控制器。
There are two common strategies for bridging the gap between higher-level
languages and register-machine languages. The explicit-control evaluator
illustrates the strategy of interpretation. An interpreter written in the
native language of a machine configures the machine to execute programs written
in a language (called the
source language) that may differ from the
native language of the machine performing the evaluation. The primitive
procedures of the source language are implemented as a library of subroutines
written in the native language of the given machine. A program to be
interpreted (called the
source program) is represented as a data
structure. The interpreter traverses this data structure, analyzing the source
program. As it does so, it simulates the intended behavior of the source
program by calling appropriate primitive subroutines from the library.
弥合高级语言与寄存器机器语言之间差距有两种常见策略。显式控制求值器体现了解释 (interpretation) 策略：用机器原生语言编写的解释器将机器配置为执行以某种语言（称为源语言 (source language)）编写的程序，该语言可能与执行求值的机器的原生语言不同。源语言的基本过程被实现为用给定机器原生语言编写的子程序库。待解释的程序（称为源程序 (source program)）被表示为一种数据结构，解释器遍历该数据结构以分析源程序，并通过调用库中适当的基本子程序来模拟源程序的预期行为。

底本文段 2

In this section, we explore the alternative strategy of
compilation.
A compiler for a given source language and machine translates a source program
into an equivalent program (called the
object program) written in the
machine’s native language. The compiler that we implement in this section
translates programs written in Scheme into sequences of instructions to be
executed using the explicit-control evaluator machine’s data
paths.
在本节中，我们探讨另一种策略——编译 (compilation)。针对给定源语言和机器的编译器将源程序翻译为等价程序（称为目标程序 (object program)），目标程序用机器的原生语言编写。我们在本节实现的编译器将 Scheme 程序翻译为指令序列，以便在显式控制求值器机器的数据通路上执行。
Compared with interpretation, compilation can provide a great increase in the
efficiency of program execution, as we will explain below in the overview of
the compiler. On the other hand, an interpreter provides a more powerful
environment for interactive program development and debugging, because the
source program being executed is available at run time to be examined and
modified. In addition, because the entire library of primitives is present,
new programs can be constructed and added to the system during debugging.
与解释相比，编译能大幅提高程序执行效率，我们将在下面的编译器概述中加以解释。另一方面，解释器为交互式程序开发和调试提供了更强大的环境，因为被执行的源程序在运行时可供检查和修改。此外，由于整个基本过程库都在场，调试期间可以构造新程序并添加到系统中。
In view of the complementary advantages of compilation and interpretation,
modern program-development environments pursue a mixed strategy. Lisp
interpreters are generally organized so that interpreted procedures and
compiled procedures can call each other. This enables a programmer to compile
those parts of a program that are assumed to be debugged, thus gaining the
efficiency advantage of compilation, while retaining the interpretive mode of
execution for those parts of the program that are in the flux of interactive
development and debugging. In 5.5.7, after we have implemented
the compiler, we will show how to interface it with our interpreter to produce
an integrated interpreter-compiler development system.
Subheading: An overview of the compiler
鉴于编译与解释各有互补的优势，现代程序开发环境采用混合策略。Lisp 解释器通常组织为使解释型过程和编译型过程能够相互调用。这使程序员能够对假定已调试好的程序部分进行编译，从而获得编译带来的效率优势，同时对处于交互开发和调试中的程序部分保留解释执行模式。在 5.5.7 中，实现编译器之后，我们将展示如何将它与解释器对接，以产生一个集成的解释器——编译器开发系统。子标题：编译器概述

底本文段 3

Our compiler is much like our interpreter, both in its structure and in the
function it performs. Accordingly, the mechanisms used by the compiler for
analyzing expressions will be similar to those used by the interpreter.
Moreover, to make it easy to interface compiled and interpreted code, we will
design the compiler to generate code that obeys the same conventions of
register usage as the interpreter: The environment will be kept in the
env register, argument lists will be accumulated in argl, a
procedure to be applied will be in proc, procedures will return their
answers in val, and the location to which a procedure should return will
be kept in continue. In general, the compiler translates a source
program into an object program that performs essentially the same register
operations as would the interpreter in evaluating the same source program.
我们的编译器与解释器在结构和执行功能上都非常相似。因此，编译器用于分析表达式的机制将与解释器所用的机制类似。此外，为了便于将编译代码与解释代码对接，我们将把编译器设计为生成遵循与解释器相同寄存器使用约定的代码：环境保存在 env 寄存器中，参数表在 argl 中积累，待应用的过程在 proc 中，过程在 val 中返回其结果，过程应返回的位置保存在 continue 中。总体而言，编译器将源程序翻译为一个目标程序，该目标程序执行的寄存器操作与解释器在求值同一源程序时所执行的操作本质上相同。
This description suggests a strategy for implementing a rudimentary compiler:
We traverse the expression in the same way the interpreter does. When we
encounter a register instruction that the interpreter would perform in
evaluating the expression, we do not execute the instruction but instead
accumulate it into a sequence. The resulting sequence of instructions will be
the object code. Observe the efficiency advantage of compilation over
interpretation. Each time the interpreter evaluates an expression—for
example, (f 84 96)—it performs the work of classifying the expression
(discovering that this is a procedure application) and testing for the end of
the operand list (discovering that there are two operands). With a compiler,
the expression is analyzed only once, when the instruction sequence is
generated at compile time. The object code produced by the compiler contains
only the instructions that evaluate the operator and the two operands, assemble
the argument list, and apply the procedure (in proc) to the arguments
(in argl).
这一描述提示了一种实现简单编译器的策略：以与解释器相同的方式遍历表达式。当遇到解释器在求值该表达式时会执行的寄存器指令时，我们不执行该指令，而是将其积累到一个序列中。由此得到的指令序列即为目标代码。注意编译相对于解释的效率优势：每次解释器对一个表达式求值时——例如 (f 84 96)——它都要执行对表达式分类（发现这是一个过程应用）以及测试运算对象表是否结束（发现有两个运算对象）的工作。而使用编译器时，表达式只在编译时生成指令序列的那一次被分析。编译器产生的目标代码仅包含对运算符和两个运算对象求值、组装参数表，以及将过程（在 proc 中）应用于参数（在 argl 中）的指令。

底本文段 4

This is the same kind of optimization we implemented in the analyzing evaluator
of 4.1.7. But there are further opportunities to gain efficiency
in compiled code. As the interpreter runs, it follows a process that must be
applicable to any expression in the language. In contrast, a given segment of
compiled code is meant to execute some particular expression. This can make a
big difference, for example in the use of the stack to save registers. When
the interpreter evaluates an expression, it must be prepared for any
contingency. Before evaluating a subexpression, the interpreter saves all
registers that will be needed later, because the subexpression might require an
arbitrary evaluation. A compiler, on the other hand, can exploit the structure
of the particular expression it is processing to generate code that avoids
unnecessary stack operations.
这与我们在 4.1.7 的分析式求值器中实现的优化属于同一类。但编译代码还有进一步提升效率的机会。解释器运行时遵循的计算过程必须适用于语言中的任意表达式。相比之下，给定的一段编译代码只用于执行某一特定表达式，这在使用栈来保存寄存器方面可以产生很大差异。解释器在对一个表达式求值时，必须为任何意外情况做好准备：在对子表达式求值之前，解释器保存所有后续将要用到的寄存器，因为子表达式可能需要任意形式的求值。而编译器可以利用它所处理的特定表达式的结构来生成代码，避免不必要的栈操作。
As a case in point, consider the combination (f 84 96). Before the
interpreter evaluates the operator of the combination, it prepares for this
evaluation by saving the registers containing the operands and the environment,
whose values will be needed later. The interpreter then evaluates the operator
to obtain the result in val, restores the saved registers, and finally
moves the result from val to proc. However, in the particular
expression we are dealing with, the operator is the symbol f, whose
evaluation is accomplished by the machine operation
lookup-variable-value, which does not alter any registers. The compiler
that we implement in this section will take advantage of this fact and generate
code that evaluates the operator using the instruction
以组合式 (f 84 96) 为例。解释器在对该组合式的运算符求值之前，会通过保存包含运算对象和环境的寄存器（这些值后续将要用到）来为此求值做准备。然后解释器对运算符求值以在 val 中得到结果，恢复已保存的寄存器，最后将结果从 val 移至 proc。然而，在我们所处理的这个特定表达式中，运算符是符号 f，对它的求值由机器操作 lookup-variable-value 完成，该操作不会改变任何寄存器。我们在本节实现的编译器将利用这一事实，生成用下列指令对运算符求值的代码：
This code not only avoids the unnecessary saves and restores but also assigns
the value of the lookup directly to proc, whereas the interpreter would
obtain the result in val and then move this to proc.
这段代码不仅避免了不必要的保存和恢复操作，还将查找所得的值直接赋给 proc，而解释器则会先将结果存入 val，再将其移至 proc。

底本文段 5

A compiler can also optimize access to the environment. Having analyzed the
code, the compiler can in many cases know in which frame a particular variable
will be located and access that frame directly, rather than performing the
lookup-variable-value search. We will discuss how to implement such
variable access in 5.5.6. Until then, however, we will focus on
the kind of register and stack optimizations described above. There are many
other optimizations that can be performed by a compiler, such as coding
primitive operations “in line” instead of using a general apply
mechanism (see Exercise 5.38); but we will not emphasize these here. Our
main goal in this section is to illustrate the compilation process in a
simplified (but still interesting) context.
编译器还能优化对环境的访问。通过分析代码，编译器在许多情况下能够确定某个变量位于哪个框架中，并直接访问该框架，而无需执行 lookup-variable-value 的逐层查找。我们将在 5.5.6 节讨论如何实现这种变量访问方式。然而在此之前，我们将专注于上面描述的那类寄存器与栈的优化。编译器还能执行许多其他优化，例如将基本操作"内联"编码，而不使用通用的 apply 机制（见练习 5.38）；但这里不会对这些加以强调。本节的主要目标是在一个简化（但仍然有趣）的语境中阐明编译过程。

SICP source code scheme

Racket #lang sicp

(assign proc
 (op lookup-variable-value)
 (const f)
 (reg env))

我的笔记自动保存