灯下 · SICP · 5.5.1 Structure of the Compiler

内容 5.5 Compilation · 450

底本文段

In 4.1.7 we modified our original metacircular interpreter to
separate analysis from execution. We analyzed each expression to produce an
execution procedure that took an environment as argument and performed the
required operations. In our compiler, we will do essentially the same
analysis. Instead of producing execution procedures, however, we will generate
sequences of instructions to be run by our register machine.
在 4.1.7 节中，我们修改了原来的元循环解释器，将分析与执行分离。我们分析每个表达式，以产生一个以环境为参数并执行所需操作的执行过程。在我们的编译器中，我们将做本质上相同的分析。然而，我们不是产生执行过程，而是生成供寄存器机器运行的指令序列。
The procedure compile is the top-level dispatch in the compiler. It
corresponds to the eval procedure of 4.1.1, the
analyze procedure of 4.1.7, and the eval-dispatch
entry point of the explicit-control-evaluator in 5.4.1. The
compiler, like the interpreters, uses the expression-syntax procedures defined
in 4.1.2. Compile performs a
case analysis on the syntactic type of the expression to be compiled. For each
type of expression, it dispatches to a specialized
code generator:
过程 compile 是编译器中的顶层分派。它对应于 4.1.1 节的 eval 过程、4.1.7 节的 analyze 过程，以及 5.4.1 节显式控制求值器中的 eval-dispatch 入口点。与解释器一样，编译器使用 4.1.2 节中定义的表达式语法过程。compile 对待编译表达式的语法类型进行分情况分析，并对每种类型的表达式分派给专门的代码生成器：
Subheading: Targets and linkages
副标题：目标与链接
Compile and the code generators that it calls take two arguments in
addition to the expression to compile. There is a
target, which
specifies the register in which the compiled code is to return the value of the
expression. There is also a
linkage descriptor, which describes how
the code resulting from the compilation of the expression should proceed when
it has finished its execution. The linkage descriptor can require that the
code do one of the following three things:
compile 及其所调用的代码生成器除待编译的表达式外，还接受两个参数。其一是目标 (target)，它指定编译后的代码将表达式的值返回到哪个寄存器中。其二是链接描述符 (linkage descriptor)，它描述了由该表达式编译所产生的代码在完成执行后应如何继续。链接描述符可以要求代码执行以下三件事之一：

底本文段 2

continue at the next instruction in sequence (this is specified by the linkage
descriptor next),
继续执行序列中的下一条指令（由链接描述符 next 指定），
return from the procedure being compiled (this is specified by the linkage
descriptor return), or
从正在编译的过程中返回（由链接描述符 return 指定），或者
jump to a named entry point (this is specified by using the designated label as
the linkage descriptor).
跳转到一个具名入口点（由使用指定标号作为链接描述符来指定）。
For example, compiling the expression 5 (which is self-evaluating) with
a target of the val register and a linkage of next should produce
the instruction
例如，以 val 寄存器为目标、以 next 为链接描述符来编译自求值表达式 5，应产生如下指令：
Compiling the same expression with a linkage of return should produce
the instructions
以 return 为链接描述符来编译同一表达式，应产生如下指令：
In the first case, execution will continue with the next instruction in the
sequence. In the second case, we will return from a procedure call. In both
cases, the value of the expression will be placed into the target val
register.
Subheading: Instruction sequences and stack usage
在第一种情况下，执行将继续到序列中的下一条指令。在第二种情况下，我们将从一个过程调用中返回。在两种情况下，表达式的值都将被放入目标寄存器 val 中。
副标题：指令序列与栈的使用

底本文段 3

Each code generator returns an
instruction sequence containing the
object code it has generated for the expression. Code generation for a
compound expression is accomplished by combining the output from simpler code
generators for component expressions, just as evaluation of a compound
expression is accomplished by evaluating the component expressions.
每个代码生成器都返回一个指令序列 (instruction sequence)，其中包含它为该表达式生成的目标代码。复合表达式的代码生成，是通过组合各组成表达式的较简单代码生成器的输出来完成的，正如复合表达式的求值是通过对各组成表达式求值来完成的一样。
The simplest method for combining instruction sequences is a procedure called
append-instruction-sequences. It takes as arguments any number of
instruction sequences that are to be executed sequentially; it appends them and
returns the combined sequence. That is, if ⟨
组合指令序列的最简单方法是一个名为 append-instruction-sequences 的过程。它接受任意数量的待顺序执行的指令序列作为参数，将它们拼接起来并返回组合后的序列。也就是说，若 ⟨
s
 e
q
 1
⟩
and ⟨
⟩
与 ⟨
s
 e
q
 2
⟩ are sequences of instructions, then
evaluating
⟩ 是两个指令序列，那么对
produces the sequence
求值将产生如下序列：
Whenever registers might need to be saved, the compiler’s code generators use
preserving, which is a more subtle method for combining instruction
sequences. Preserving takes three arguments: a set of registers and two
instruction sequences that are to be executed sequentially. It appends the
sequences in such a way that the contents of each register in the set is
preserved over the execution of the first sequence, if this is needed for the
execution of the second sequence. That is, if the first sequence modifies the
register and the second sequence actually needs the register’s original
contents, then preserving wraps a save and a restore of
the register around the first sequence before appending the sequences.
Otherwise, preserving simply returns the appended instruction sequences.
Thus, for example,
(preserving (list ⟨reg₁⟩ ⟨reg₂⟩) ⟨seg₁⟩ ⟨seg₂⟩)
produces one of the following four sequences of instructions, depending on how
⟨
每当寄存器可能需要被保存时，编译器的代码生成器就会使用 preserving，这是一种组合指令序列的更为精妙的方法。preserving 接受三个参数：一组寄存器，以及两个待顺序执行的指令序列。它以这样一种方式拼接两个序列：若第二个序列的执行需要用到某寄存器（属于给定寄存器集合）在第一个序列执行之前的原始内容，则 preserving 会在拼接两个序列之前，在第一个序列的外面包裹一个对该寄存器的 save 与 restore 操作。否则，preserving 只是简单地返回拼接后的指令序列。因此，举例来说，
(preserving (list ⟨reg₁⟩ ⟨reg₂⟩) ⟨seg₁⟩ ⟨seg₂⟩)
将根据 ⟨
s
 e
q
 1
⟩ and ⟨
⟩ 与 ⟨
s
 e
q
 2
⟩
use ⟨
⟩
如何使用 ⟨
r
 e
g
 1
⟩ and ⟨
⟩ 与 ⟨
r
 e
g
 2
⟩:
⟨
s
 e
q
 1
⟩
(save
(save
(save
⟨
r
 e
g
 2
⟩
 )
⟨
s
 e
q
 2
⟩
⟨
r
 e
g
 1
⟩
 )
⟨
r
 e
g
 2
⟩
 )
(save
⟨
r
 e
g
 1
⟩
 )
⟨
s
 e
q
 1
⟩
⟨
s
 e
q
 1
⟩
⟨
s
 e
q
 1
⟩
(restore
(restore
(restore
⟨
r
 e
g
 1
⟩
 )
⟨
r
 e
g
 1
⟩
 )
⟨
r
 e
g
 2
⟩
 )
(restore
⟨
r
 e
g
 2
⟩
 )
⟨
s
 e
q
 2
⟩
⟨
s
 e
q
 2
⟩
⟨
s
 e
q
 2
⟩

底本文段 4

By using preserving to combine instruction sequences the compiler avoids
unnecessary stack operations. This also isolates the details of whether or not
to generate save and restore instructions within the
preserving procedure, separating them from the concerns that arise in
writing each of the individual code generators. In fact no save or
restore instructions are explicitly produced by the code generators.
通过使用 `preserving` 来组合指令序列，编译器得以避免不必要的栈操作。这样做还将是否生成 `save` 和 `restore` 指令的细节隔离在 `preserving` 过程内部，使其与编写各个代码生成器时所关注的问题相分离。事实上，各代码生成器并不直接产生任何 `save` 或 `restore` 指令。
In principle, we could represent an instruction sequence simply as a list of
instructions. Append-instruction-sequences could then combine
instruction sequences by performing an ordinary list append. However,
preserving would then be a complex operation, because it would have to
analyze each instruction sequence to determine how the sequence uses its
registers. Preserving would be inefficient as well as complex, because
it would have to analyze each of its instruction sequence arguments, even
though these sequences might themselves have been constructed by calls to
preserving, in which case their parts would have already been analyzed.
To avoid such repetitious analysis we will associate with each instruction
sequence some information about its register use. When we construct a basic
instruction sequence we will provide this information explicitly, and the
procedures that combine instruction sequences will derive register-use
information for the combined sequence from the information associated with the
component sequences.
An instruction sequence will contain three pieces of information:
原则上，我们可以将指令序列简单地表示为一个指令的表。`append-instruction-sequences` 可以通过执行普通的表追加操作来组合指令序列。然而，`preserving` 将会是一个复杂的操作，因为它必须分析每个指令序列，以确定该序列如何使用其寄存器。`preserving` 不仅复杂，而且低效，因为它必须对每个指令序列参数进行分析，即便这些序列本身可能是通过调用 `preserving` 构造的——在那种情况下，其各个组成部分早已分析过了。为了避免这种重复分析，我们将为每个指令序列关联一些关于其寄存器使用情况的信息。在构造基本指令序列时，我们将显式提供这些信息；而组合指令序列的那些过程，则将根据各组成序列所携带的信息，推导出组合后序列的寄存器使用信息。一个指令序列将包含三部分信息：
the set of registers that must be initialized before the instructions in the
sequence are executed (these registers are said to be
needed by the
sequence),
在执行序列中的指令之前必须已完成初始化的寄存器集合（这些寄存器称为该序列所“需要”的寄存器），
the set of registers whose values are modified by the instructions in the
sequence, and
其值会被序列中的指令所修改的寄存器集合，以及
the actual instructions (also called
statements) in the sequence.
序列中的实际指令（也称为语句）。

底本文段 5

We will represent an instruction sequence as a list of its three parts. The
constructor for instruction sequences is thus
我们将把指令序列表示为其三个部分组成的表。指令序列的构造函数因此为
For example, the two-instruction sequence that looks up the value of the
variable x in the current environment, assigns the result to val,
and then returns, requires registers env and continue to have
been initialized, and modifies register val. This sequence would
therefore be constructed as
例如，这条两指令序列在当前环境中查找变量 x 的值，将结果赋给 val，然后返回——它需要寄存器 env 和 continue 已经完成初始化，并会修改寄存器 val。因此，该序列应构造为
We sometimes need to construct an instruction sequence with no statements:
有时我们需要构造一条不含任何语句的指令序列：
The procedures for combining instruction sequences are shown in
5.5.4.
用于组合指令序列的各个过程将在 5.5.4 中展示。

SICP source code scheme

Racket #lang sicp

(define (compile exp target linkage)
 (cond ((self-evaluating? exp)
 (compile-self-evaluating
 exp target linkage))
 ((quoted? exp)
 (compile-quoted exp target linkage))
 ((variable? exp)
 (compile-variable
 exp target linkage))
 ((assignment? exp)
 (compile-assignment
 exp target linkage))
 ((definition? exp)
 (compile-definition
 exp target linkage))
 ((if? exp)
 (compile-if exp target linkage))
 ((lambda? exp)
 (compile-lambda exp target linkage))
 ((begin? exp)
 (compile-sequence
 (begin-actions exp) target linkage))
 ((cond? exp)
 (compile
 (cond->if exp) target linkage))
 ((application? exp)
 (compile-application
 exp target linkage))
 (else
 (error "Unknown expression type:
 COMPILE"
 exp))))

SICP source code scheme

Racket #lang sicp

(assign val (const 5))

SICP source code scheme

Racket #lang sicp

(assign val (const 5))
(goto (reg continue))

SICP source code scheme

Racket #lang sicp

(append-instruction-sequences ⟨seq₁⟩ ⟨seq₂⟩)

SICP source code scheme

Racket #lang sicp

⟨seq₁⟩
⟨seq₂⟩

SICP source code scheme

Racket #lang sicp

(define (make-instruction-sequence
 needs modifies statements)
 (list needs modifies statements))

SICP source code scheme

Racket #lang sicp

(make-instruction-sequence
 '(env continue)
 '(val)
 '((assign val
 (op lookup-variable-value)
 (const x)
 (reg env))
 (goto (reg continue))))

SICP source code scheme

Racket #lang sicp

(define (empty-instruction-sequence)
 (make-instruction-sequence '() '() '()))

我的笔记自动保存