灯下 登录
计算机科学 / SICP / subsection 中英对照

5.5.1 Structure of the Compiler

In 4.1.7 we modified our original metacircular interpreter to

separate analysis from execution. We analyzed each expression to produce an

execution procedure that took an environment as argument and performed the

required operations. In our compiler, we will do essentially the same

analysis. Instead of producing execution procedures, however, we will generate

sequences of instructions to be run by our register machine.

在 4.1.7 节中,我们修改了原来的元循环解释器,将分析与执行分离。我们分析每个表达式,以产生一个以环境为参数并执行所需操作的执行过程。在我们的编译器中,我们将做本质上相同的分析。然而,我们不是产生执行过程,而是生成供寄存器机器运行的指令序列。

The procedure compile is the top-level dispatch in the compiler. It

corresponds to the eval procedure of 4.1.1, the

analyze procedure of 4.1.7, and the eval-dispatch

entry point of the explicit-control-evaluator in 5.4.1. The

compiler, like the interpreters, uses the expression-syntax procedures defined

in 4.1.2. Compile performs a

case analysis on the syntactic type of the expression to be compiled. For each

type of expression, it dispatches to a specialized

code generator:

过程 compile 是编译器中的顶层分派。它对应于 4.1.1 节的 eval 过程、4.1.7 节的 analyze 过程,以及 5.4.1 节显式控制求值器中的 eval-dispatch 入口点。与解释器一样,编译器使用 4.1.2 节中定义的表达式语法过程。compile 对待编译表达式的语法类型进行分情况分析,并对每种类型的表达式分派给专门的代码生成器:

Subheading: Targets and linkages

副标题:目标与链接

Compile and the code generators that it calls take two arguments in

addition to the expression to compile. There is a

target, which

specifies the register in which the compiled code is to return the value of the

expression. There is also a

linkage descriptor, which describes how

the code resulting from the compilation of the expression should proceed when

it has finished its execution. The linkage descriptor can require that the

code do one of the following three things:

compile 及其所调用的代码生成器除待编译的表达式外,还接受两个参数。其一是目标 (target),它指定编译后的代码将表达式的值返回到哪个寄存器中。其二是链接描述符 (linkage descriptor),它描述了由该表达式编译所产生的代码在完成执行后应如何继续。链接描述符可以要求代码执行以下三件事之一:

continue at the next instruction in sequence (this is specified by the linkage

descriptor next),

继续执行序列中的下一条指令(由链接描述符 next 指定),

return from the procedure being compiled (this is specified by the linkage

descriptor return), or

从正在编译的过程中返回(由链接描述符 return 指定),或者

jump to a named entry point (this is specified by using the designated label as

the linkage descriptor).

跳转到一个具名入口点(由使用指定标号作为链接描述符来指定)。

For example, compiling the expression 5 (which is self-evaluating) with

a target of the val register and a linkage of next should produce

the instruction

例如,以 val 寄存器为目标、以 next 为链接描述符来编译自求值表达式 5,应产生如下指令:

Compiling the same expression with a linkage of return should produce

the instructions

以 return 为链接描述符来编译同一表达式,应产生如下指令:

In the first case, execution will continue with the next instruction in the

sequence. In the second case, we will return from a procedure call. In both

cases, the value of the expression will be placed into the target val

register.

Subheading: Instruction sequences and stack usage

在第一种情况下,执行将继续到序列中的下一条指令。在第二种情况下,我们将从一个过程调用中返回。在两种情况下,表达式的值都将被放入目标寄存器 val 中。

副标题:指令序列与栈的使用

Each code generator returns an

instruction sequence containing the

object code it has generated for the expression. Code generation for a

compound expression is accomplished by combining the output from simpler code

generators for component expressions, just as evaluation of a compound

expression is accomplished by evaluating the component expressions.

每个代码生成器都返回一个指令序列 (instruction sequence),其中包含它为该表达式生成的目标代码。复合表达式的代码生成,是通过组合各组成表达式的较简单代码生成器的输出来完成的,正如复合表达式的求值是通过对各组成表达式求值来完成的一样。

The simplest method for combining instruction sequences is a procedure called

append-instruction-sequences. It takes as arguments any number of

instruction sequences that are to be executed sequentially; it appends them and

returns the combined sequence. That is, if ⟨

组合指令序列的最简单方法是一个名为 append-instruction-sequences 的过程。它接受任意数量的待顺序执行的指令序列作为参数,将它们拼接起来并返回组合后的序列。也就是说,若 ⟨

s
e

q
1

and ⟨

与 ⟨

s
e

q
2

⟩ are sequences of instructions, then

evaluating

⟩ 是两个指令序列,那么对

produces the sequence

求值将产生如下序列:

Whenever registers might need to be saved, the compiler’s code generators use

preserving, which is a more subtle method for combining instruction

sequences. Preserving takes three arguments: a set of registers and two

instruction sequences that are to be executed sequentially. It appends the

sequences in such a way that the contents of each register in the set is

preserved over the execution of the first sequence, if this is needed for the

execution of the second sequence. That is, if the first sequence modifies the

register and the second sequence actually needs the register’s original

contents, then preserving wraps a save and a restore of

the register around the first sequence before appending the sequences.

Otherwise, preserving simply returns the appended instruction sequences.

Thus, for example,

(preserving (list ⟨reg₁⟩ ⟨reg₂⟩) ⟨seg₁⟩ ⟨seg₂⟩)

produces one of the following four sequences of instructions, depending on how

每当寄存器可能需要被保存时,编译器的代码生成器就会使用 preserving,这是一种组合指令序列的更为精妙的方法。preserving 接受三个参数:一组寄存器,以及两个待顺序执行的指令序列。它以这样一种方式拼接两个序列:若第二个序列的执行需要用到某寄存器(属于给定寄存器集合)在第一个序列执行之前的原始内容,则 preserving 会在拼接两个序列之前,在第一个序列的外面包裹一个对该寄存器的 save 与 restore 操作。否则,preserving 只是简单地返回拼接后的指令序列。因此,举例来说,

(preserving (list ⟨reg₁⟩ ⟨reg₂⟩) ⟨seg₁⟩ ⟨seg₂⟩)

将根据 ⟨

s
e

q
1

⟩ and ⟨

⟩ 与 ⟨

s
e

q
2

use ⟨

如何使用 ⟨

r
e

g
1

⟩ and ⟨

⟩ 与 ⟨

r
e

g
2

⟩:

s
e

q
1

(save

(save

(save

r
e

g
2


)

s
e

q
2

r
e

g
1


)

r
e

g
2


)

(save

r
e

g
1


)

s
e

q
1

s
e

q
1

s
e

q
1

(restore

(restore

(restore

r
e

g
1


)

r
e

g
1


)

r
e

g
2


)

(restore

r
e

g
2


)

s
e

q
2

s
e

q
2

s
e

q
2

By using preserving to combine instruction sequences the compiler avoids

unnecessary stack operations. This also isolates the details of whether or not

to generate save and restore instructions within the

preserving procedure, separating them from the concerns that arise in

writing each of the individual code generators. In fact no save or

restore instructions are explicitly produced by the code generators.

通过使用 `preserving` 来组合指令序列,编译器得以避免不必要的栈操作。这样做还将是否生成 `save` 和 `restore` 指令的细节隔离在 `preserving` 过程内部,使其与编写各个代码生成器时所关注的问题相分离。事实上,各代码生成器并不直接产生任何 `save` 或 `restore` 指令。

In principle, we could represent an instruction sequence simply as a list of

instructions. Append-instruction-sequences could then combine

instruction sequences by performing an ordinary list append. However,

preserving would then be a complex operation, because it would have to

analyze each instruction sequence to determine how the sequence uses its

registers. Preserving would be inefficient as well as complex, because

it would have to analyze each of its instruction sequence arguments, even

though these sequences might themselves have been constructed by calls to

preserving, in which case their parts would have already been analyzed.

To avoid such repetitious analysis we will associate with each instruction

sequence some information about its register use. When we construct a basic

instruction sequence we will provide this information explicitly, and the

procedures that combine instruction sequences will derive register-use

information for the combined sequence from the information associated with the

component sequences.

An instruction sequence will contain three pieces of information:

原则上,我们可以将指令序列简单地表示为一个指令的表。`append-instruction-sequences` 可以通过执行普通的表追加操作来组合指令序列。然而,`preserving` 将会是一个复杂的操作,因为它必须分析每个指令序列,以确定该序列如何使用其寄存器。`preserving` 不仅复杂,而且低效,因为它必须对每个指令序列参数进行分析,即便这些序列本身可能是通过调用 `preserving` 构造的——在那种情况下,其各个组成部分早已分析过了。为了避免这种重复分析,我们将为每个指令序列关联一些关于其寄存器使用情况的信息。在构造基本指令序列时,我们将显式提供这些信息;而组合指令序列的那些过程,则将根据各组成序列所携带的信息,推导出组合后序列的寄存器使用信息。一个指令序列将包含三部分信息:

the set of registers that must be initialized before the instructions in the

sequence are executed (these registers are said to be

needed by the

sequence),

在执行序列中的指令之前必须已完成初始化的寄存器集合(这些寄存器称为该序列所“需要”的寄存器),

the set of registers whose values are modified by the instructions in the

sequence, and

其值会被序列中的指令所修改的寄存器集合,以及

the actual instructions (also called

statements) in the sequence.

序列中的实际指令(也称为语句)。

We will represent an instruction sequence as a list of its three parts. The

constructor for instruction sequences is thus

我们将把指令序列表示为其三个部分组成的表。指令序列的构造函数因此为

For example, the two-instruction sequence that looks up the value of the

variable x in the current environment, assigns the result to val,

and then returns, requires registers env and continue to have

been initialized, and modifies register val. This sequence would

therefore be constructed as

例如,这条两指令序列在当前环境中查找变量 x 的值,将结果赋给 val,然后返回——它需要寄存器 env 和 continue 已经完成初始化,并会修改寄存器 val。因此,该序列应构造为

We sometimes need to construct an instruction sequence with no statements:

有时我们需要构造一条不含任何语句的指令序列:

The procedures for combining instruction sequences are shown in

5.5.4.

用于组合指令序列的各个过程将在 5.5.4 中展示。

Racket #lang sicp
(define (compile exp target linkage)
 (cond ((self-evaluating? exp)
 (compile-self-evaluating
 exp target linkage))
 ((quoted? exp)
 (compile-quoted exp target linkage))
 ((variable? exp)
 (compile-variable
 exp target linkage))
 ((assignment? exp)
 (compile-assignment
 exp target linkage))
 ((definition? exp)
 (compile-definition
 exp target linkage))
 ((if? exp)
 (compile-if exp target linkage))
 ((lambda? exp)
 (compile-lambda exp target linkage))
 ((begin? exp)
 (compile-sequence
 (begin-actions exp) target linkage))
 ((cond? exp)
 (compile
 (cond->if exp) target linkage))
 ((application? exp)
 (compile-application
 exp target linkage))
 (else
 (error "Unknown expression type:
 COMPILE"
 exp))))
Racket #lang sicp
(assign val (const 5))
Racket #lang sicp
(assign val (const 5))
(goto (reg continue))
Racket #lang sicp
(append-instruction-sequences ⟨seq₁⟩ ⟨seq₂⟩)
Racket #lang sicp
⟨seq₁⟩
⟨seq₂⟩
Racket #lang sicp
(define (make-instruction-sequence
 needs modifies statements)
 (list needs modifies statements))
Racket #lang sicp
(make-instruction-sequence
 '(env continue)
 '(val)
 '((assign val
 (op lookup-variable-value)
 (const x)
 (reg env))
 (goto (reg continue))))
Racket #lang sicp
(define (empty-instruction-sequence)
 (make-instruction-sequence '() '() '()))