灯下 · SICP · 5.3.1 Memory as Vectors

内容 5.3 Storage Allocation and Garbage Collection · 431

底本文段

A conventional computer memory can be thought of as an array of cubbyholes,
each of which can contain a piece of information. Each cubbyhole has a unique
name, called its
address or
location. Typical memory
systems provide two primitive operations: one that fetches the data stored in a
specified location and one that assigns new data to a specified location.
Memory addresses can be incremented to support sequential access to some set of
the cubbyholes. More generally, many important data operations require that
memory addresses be treated as data, which can be stored in memory locations
and manipulated in machine registers. The representation of list structure is
one application of such
address arithmetic.
常规计算机内存可以看作一个格子数组，每个格子可以存放一条信息。每个格子都有唯一的名字，称为其地址 (address) 或位置 (location)。典型的内存系统提供两种基本操作：一种是取出存放在指定位置的数据，另一种是将新数据写入指定位置。内存地址可以递增，以便对一组格子进行顺序访问。更一般地，许多重要的数据操作要求将内存地址当作数据来处理——它既可以存放在内存位置中，也可以在机器寄存器中操作。表结构的表示便是这种地址运算 (address arithmetic) 的一种应用。
To model computer memory, we use a new kind of data structure called a
为了对计算机内存建立模型，我们引入一种新的数据结构，称为
vector. Abstractly, a vector is a compound data object whose
individual elements can be accessed by means of an integer index in an amount
of time that is independent of the index. In order to describe memory operations, we use two
primitive Scheme procedures for manipulating vectors:
向量 (vector)。抽象地说，向量是一种复合数据对象，其中的每个元素都可以通过一个整数索引在与索引无关的固定时间内访问。为了描述内存操作，我们使用两种操作向量的基本 Scheme 过程：
(vector-ref ⟨vector⟩ ⟨n⟩) returns the n
`(vector-ref ⟨vector⟩ ⟨n⟩)` 返回向量的第 n
th
element of the vector.
个元素。
(vector-set! ⟨vector⟩ ⟨n⟩ ⟨value⟩)
sets the n
`(vector-set! ⟨vector⟩ ⟨n⟩ ⟨value⟩)` 将向量的第 n
th element of the vector to the designated value.
个元素设置为指定的值。
For example, if v is a vector, then (vector-ref v 5) gets the
fifth entry in the vector v and (vector-set! v 5 7) changes the
value of the fifth entry of the vector v to 7. For computer memory, this access can
be implemented through the use of address arithmetic to combine a
base address
that specifies the beginning location of a vector in memory with an
例如，若 `v` 是一个向量，则 `(vector-ref v 5)` 取出向量 `v` 的第五个元素，而 `(vector-set! v 5 7)` 将向量 `v` 的第五个元素的值改为 7。对于计算机内存而言，这种访问可以通过地址运算来实现——将指定向量在内存中起始位置的基地址 (base address) 与
index that specifies the offset of a particular element of the
vector.
Subheading: Representing Lisp data
指定向量中特定元素偏移量的索引 (index) 相结合。子标题：Lisp 数据的表示

底本文段 2

We can use vectors to implement the basic pair structures required for a
list-structured memory. Let us imagine that computer memory is divided into
two vectors: the-cars and the-cdrs. We will represent list
structure as follows: A pointer to a pair is an index into the two vectors.
The car of the pair is the entry in the-cars with the designated
index, and the cdr of the pair is the entry in the-cdrs with the
designated index. We also need a representation for objects other than pairs
(such as numbers and symbols) and a way to distinguish one kind of data from
another. There are many methods of accomplishing this, but they all reduce to
using
typed pointers, that is, to extending the notion of “pointer”
to include information on data type. The data type
enables the system to distinguish a pointer to a pair (which consists of the
“pair” data type and an index into the memory vectors) from pointers to other
kinds of data (which consist of some other data type and whatever is being used
to represent data of that type). Two data objects are considered to be the
same (eq?) if their pointers are identical.
Figure 5.14 illustrates the use of this method to
represent the list ((1 2) 3 4), whose box-and-pointer diagram is also
shown. We use letter prefixes to denote the data-type information. Thus, a
pointer to the pair with index 5 is denoted p5, the empty list is
denoted by the pointer e0, and a pointer to the number 4 is denoted
n4. In the box-and-pointer diagram, we have indicated at the lower left
of each pair the vector index that specifies where the car and
cdr of the pair are stored. The blank locations in the-cars and
the-cdrs may contain parts of other list structures (not of interest
here).
我们可以用向量来实现表结构内存所需的基本序对结构。设想计算机内存被划分为两个向量：`the-cars` 和 `the-cdrs`。我们按如下方式表示表结构：指向序对的指针是这两个向量的一个索引。序对的 `car` 是 `the-cars` 中具有该索引的元素，序对的 `cdr` 是 `the-cdrs` 中具有该索引的元素。我们还需要一种表示非序对对象（如数和符号）的方式，以及区分不同类型数据的方法。实现这一点有很多种方法，但它们都归结为使用带类型的指针 (typed pointers)，即将“指针”的概念扩展为包含数据类型信息。数据类型使系统能够区分指向序对的指针（由“序对”数据类型与内存向量中的一个索引构成）和指向其他类型数据的指针（由其他某种数据类型以及用于表示该类型数据的任何内容构成）。如果两个数据对象的指针相同，就认为它们是相同的（`eq?`）。图 5.14 展示了用这种方法表示表 `((1 2) 3 4)` 的情形，图中也给出了相应的盒子指针图。我们用字母前缀来表示数据类型信息：指向索引为 5 的序对的指针记为 `p5`，空表用指针 `e0` 表示，而指向数 4 的指针记为 `n4`。在盒子指针图中，我们在每个序对的左下角标注了向量索引，指明该序对的 `car` 和 `cdr` 存放的位置。`the-cars` 和 `the-cdrs` 中的空白位置可能存放着其他表结构的部分内容（此处不予关注）。

底本文段 3

Figure 5.14: Box-and-pointer and memory-vector representations of the list ((1 2) 3 4).
图 5.14：表 `((1 2) 3 4)` 的盒子指针表示与内存向量表示。
A pointer to a number, such as n4, might consist of a type indicating
numeric data together with the actual representation of the number
4. To deal with numbers that are too large to be represented in the
fixed amount of space allocated for a single pointer, we could use a distinct
一个指向数的指针，例如 `n4`，可能由一个表示数值数据的类型标记与数 4 的实际表示共同构成。对于无法在单个指针所分配的固定空间内表示的大数，我们可以使用一种专门的
bignum data type, for which the pointer designates a list in which
the parts of the number are stored.
大数 (bignum) 数据类型，其中指针指向一个表，该数的各个组成部分存放在这个表中。
A symbol might be represented as a typed pointer that designates a sequence of
the characters that form the symbol’s printed representation. This sequence is
constructed by the Lisp reader when the character string is initially
encountered in input. Since we want two instances of a symbol to be recognized
as the “same” symbol by eq? and we want eq? to be a simple test
for equality of pointers, we must ensure that if the reader sees the same
character string twice, it will use the same pointer (to the same sequence of
characters) to represent both occurrences. To accomplish this, the reader
maintains a table, traditionally called the
obarray, of all the
symbols it has ever encountered. When the reader encounters a character string
and is about to construct a symbol, it checks the obarray to see if it has ever
before seen the same character string. If it has not, it uses the characters
to construct a new symbol (a typed pointer to a new character sequence) and
enters this pointer in the obarray. If the reader has seen the string before,
it returns the symbol pointer stored in the obarray. This process of replacing
character strings by unique pointers is called
interning symbols.
Subheading: Implementing the primitive list operations
符号可以表示为一个带类型的指针，指向构成该符号打印表示的字符序列。这个序列由 Lisp 读入器 (reader) 在初次遇到该字符串时构造。由于我们希望同一符号的两次出现能被 `eq?` 识别为“相同”的符号，并且希望 `eq?` 只是一个简单的指针相等性测试，因此必须保证：当读入器两次遇到相同的字符串时，它会使用同一个指针（指向同一字符序列）来表示这两次出现。为此，读入器维护一张表，传统上称为对象数组 (obarray)，记录它曾经遇到过的所有符号。当读入器遇到一个字符串并准备构造一个符号时，它先检查对象数组，看是否曾经见过相同的字符串。若未见过，则用这些字符构造一个新符号（一个指向新字符序列的带类型指针），并将此指针写入对象数组。若已见过，则返回对象数组中存储的那个符号指针。这种将字符串替换为唯一指针的过程称为符号驻留 (interning symbols)。子标题：实现基本表操作

底本文段 4

Given the above representation scheme, we can replace each “primitive” list
operation of a register machine with one or more primitive vector operations.
We will use two registers, the-cars and the-cdrs, to identify the
memory vectors, and will assume that vector-ref and vector-set!
are available as primitive operations. We also assume that numeric operations
on pointers (such as incrementing a pointer, using a pair pointer to index a
vector, or adding two numbers) use only the index portion of the typed pointer.
For example, we can make a register machine support the instructions
有了上述表示方案，我们就可以用一个或多个基本向量操作来替换寄存器机器中每一个“基本”表操作。我们将使用两个寄存器 `the-cars` 和 `the-cdrs` 来标识内存向量，并假设 `vector-ref` 和 `vector-set!` 可作为基本操作使用。我们还假设对指针的数值操作（如递增指针、用序对指针作为向量索引，或将两个数相加）只使用带类型指针的索引部分。例如，我们可以让寄存器机器支持以下指令
if we implement these, respectively, as
如果我们分别将它们实现为
The instructions
指令
are implemented as
则实现为

底本文段 5

Cons is performed by allocating an unused index and storing the
arguments to cons in the-cars and the-cdrs at that indexed
vector position. We presume that there is a special register, free,
that always holds a pair pointer containing the next available index, and that
we can increment the index part of that pointer to find the next free
location. For example, the instruction
`cons` 通过分配一个未使用的索引，并将 `cons` 的两个参数分别存入 `the-cars` 和 `the-cdrs` 中该索引对应的位置来执行。我们假设有一个特殊寄存器 `free`，它始终保存一个序对指针，其中包含下一个可用的索引；递增该指针的索引部分即可找到下一个空闲位置。例如，指令
is implemented as the following sequence of vector operations:
被实现为以下向量操作序列：
The eq? operation
`eq?` 操作
simply tests the equality of all fields in the registers, and predicates such
as pair?, null?, symbol?, and number? need only
check the type field.
Subheading: Implementing stacks
只需检验两个寄存器中所有字段是否相等，而 `pair?`、`null?`、`symbol?`、`number?` 等谓词只需检查类型字段即可。子标题：实现栈

底本文段 6

Although our register machines use stacks, we need do nothing special here,
since stacks can be modeled in terms of lists. The stack can be a list of the
saved values, pointed to by a special register the-stack. Thus,
(save ⟨reg⟩) can be implemented as
虽然我们的寄存器机器使用了栈，但这里无需做任何特殊处理，因为栈可以用表来模拟。栈可以是一个保存过的值的表，由一个特殊寄存器 `the-stack` 指向。因此，`(save ⟨reg⟩)` 可以实现为
Similarly, (restore ⟨reg⟩) can be implemented as
类似地，`(restore ⟨reg⟩)` 可以实现为
and (perform (op initialize-stack)) can be implemented as
而 `(perform (op initialize-stack))` 可以实现为
These operations can be further expanded in terms of the vector operations
given above. In conventional computer architectures, however, it is usually
advantageous to allocate the stack as a separate vector. Then pushing and
popping the stack can be accomplished by incrementing or decrementing an index
into that vector.
这些操作还可以进一步用上面给出的向量操作来展开。然而，在常规计算机体系结构中，通常更有利的做法是将栈分配为一个独立的向量。这样，压栈和出栈操作就可以通过递增或递减该向量的一个索引来完成。

SICP source code scheme

Racket #lang sicp

(assign ⟨reg₁⟩ (op car) (reg ⟨reg₂⟩))
(assign ⟨reg₁⟩ (op cdr) (reg ⟨reg₂⟩))

SICP source code scheme

Racket #lang sicp

(assign ⟨reg₁⟩
 (op vector-ref)
 (reg the-cars)
 (reg ⟨reg₂⟩))
(assign ⟨reg₁⟩
 (op vector-ref)
 (reg the-cdrs)
 (reg ⟨reg₂⟩))

SICP source code scheme

Racket #lang sicp

(perform (op set-car!) (reg ⟨reg₁⟩) (reg ⟨reg₂⟩))
(perform (op set-cdr!) (reg ⟨reg₁⟩) (reg ⟨reg₂⟩))

SICP source code scheme

Racket #lang sicp

(perform (op vector-set!)
 (reg the-cars)
 (reg ⟨reg₁⟩)
 (reg ⟨reg₂⟩))
(perform (op vector-set!)
 (reg the-cdrs)
 (reg ⟨reg₁⟩)
 (reg ⟨reg₂⟩))

SICP source code scheme

Racket #lang sicp

(assign ⟨reg₁⟩
 (op cons)
 (reg ⟨reg₂⟩)
 (reg ⟨reg₃⟩))

SICP source code scheme

Racket #lang sicp

(perform (op vector-set!)
 (reg the-cars)
 (reg free)
 (reg ⟨reg₂⟩))
(perform (op vector-set!)
 (reg the-cdrs)
 (reg free)
 (reg ⟨reg₃⟩))
(assign ⟨reg₁⟩ (reg free))
(assign free (op +) (reg free) (const 1))

SICP source code scheme

Racket #lang sicp

(op eq?) (reg ⟨reg₁⟩) (reg ⟨reg₂⟩)

SICP source code scheme

Racket #lang sicp

(assign the-stack
 (op cons)
 (reg ⟨reg⟩)
 (reg the-stack))

SICP source code scheme

Racket #lang sicp

(assign ⟨reg⟩ (op car) (reg the-stack))
(assign the-stack (op cdr) (reg the-stack))

SICP source code scheme

Racket #lang sicp

(assign the-stack (const ()))

我的笔记自动保存