灯下 登录
计算机科学 / SICP / subsection 中英对照

3.5.1 Streams Are Delayed Lists

As we saw in 2.2.3, sequences can serve as standard interfaces

for combining program modules. We formulated powerful abstractions for

manipulating sequences, such as map, filter, and

accumulate, that capture a wide variety of operations in a manner that

is both succinct and elegant.

如我们在 2.2.3 节中所见,序列可以作为组合程序模块的标准接口。我们已建立了操纵序列的强大抽象,如 map、filter 和 accumulate,它们以简洁而优雅的方式捕获了各种各样的操作。

Unfortunately, if we represent sequences as lists, this elegance is bought at

the price of severe inefficiency with respect to both the time and space

required by our computations. When we represent manipulations on sequences as

transformations of lists, our programs must construct and copy data structures

(which may be huge) at every step of a process.

遗憾的是,如果我们用表来表示序列,这种优雅是以计算所需时间和空间上严重的低效为代价的。当我们将对序列的操作表示为对表的变换时,程序必须在计算过程的每一步都构造并复制数据结构(这些结构可能非常庞大)。

To see why this is true, let us compare two programs for computing the sum of

all the prime numbers in an interval. The first program is written in standard

iterative style:

为了说明这一点,让我们比较两个计算某区间内所有素数之和的程序。第一个程序以标准迭代风格编写:

The second program performs the same computation using the sequence operations

of 2.2.3:

第二个程序使用 2.2.3 节的序列操作完成同样的计算:

In carrying out the computation, the first program needs to store only the sum

being accumulated. In contrast, the filter in the second program cannot do any

testing until enumerate-interval has constructed a complete list of the

numbers in the interval. The filter generates another list, which in turn is

passed to accumulate before being collapsed to form a sum. Such large

intermediate storage is not needed by the first program, which we can think of

as enumerating the interval incrementally, adding each prime to the sum as it

is generated.

在执行计算的过程中,第一个程序只需存储正在积累的那个和。相比之下,第二个程序中的 filter 在 enumerate-interval 构造完区间内所有数的完整表之前无法进行任何检验。filter 再生成另一个表,该表又被传递给 accumulate,然后折叠成一个和。第一个程序不需要这样庞大的中间存储——我们可以把它看作逐步枚举区间,每找到一个素数就将其加入和中。

The inefficiency in using lists becomes painfully apparent if we use the

sequence paradigm to compute the second prime in the interval from 10,000 to

1,000,000 by evaluating the expression

如果用序列范式来计算 10,000 到 1,000,000 区间内的第二个素数,通过求值以下表达式,用表带来的低效就会显得格外触目惊心:

This expression does find the second prime, but the computational overhead is

outrageous. We construct a list of almost a million integers, filter this list

by testing each element for primality, and then ignore almost all of the

result. In a more traditional programming style, we would interleave the

enumeration and the filtering, and stop when we reached the second prime.

这个表达式确实能找到第二个素数,但其计算开销之大令人咋舌。我们构造了一个近百万个整数的表,用素性检验过滤该表的每个元素,然后几乎丢弃所有结果。在更传统的编程风格中,我们会将枚举与过滤交错进行,一旦找到第二个素数便停止。

Streams are a clever idea that allows one to use sequence manipulations without

incurring the costs of manipulating sequences as lists. With streams we can

achieve the best of both worlds: We can formulate programs elegantly as

sequence manipulations, while attaining the efficiency of incremental

computation. The basic idea is to arrange to construct a stream only

partially, and to pass the partial construction to the program that consumes

the stream. If the consumer attempts to access a part of the stream that has

not yet been constructed, the stream will automatically construct just enough

more of itself to produce the required part, thus preserving the illusion that

the entire stream exists. In other words, although we will write programs as

if we were processing complete sequences, we design our stream implementation

to automatically and transparently interleave the construction of the stream

with its use.

流 (streams) 是一种巧妙的构想,它允许人们使用序列操作而无需承担将序列作为表来操纵的代价。借助流,我们可以兼得两全:既能以序列操作的形式优雅地表达程序,又能实现增量式计算的效率。其基本思想是:只部分地构造一个流,并将部分构造的结果传递给消费该流的程序。如果消费者试图访问流中尚未构造的部分,流将自动构造恰好足够多的自身内容,以产生所需部分,从而维持整个流都已存在的假象。换言之,尽管我们将像处理完整序列那样编写程序,但我们设计的流实现会自动且透明地将流的构造与其使用交错进行。

On the surface, streams are just lists with different names for the procedures

that manipulate them. There is a constructor, cons-stream, and two

selectors, stream-car and stream-cdr, which satisfy the

constraints

从表面上看,流不过是换了操纵过程名称的表。它有一个构造函数 cons-stream,以及两个选择函数 stream-car 和 stream-cdr,满足如下约束:

There is a distinguishable object, the-empty-stream, which cannot be the

result of any cons-stream operation, and which can be identified with

the predicate stream-null?. Thus we can

make and use streams, in just the same way as we can make and use lists, to

represent aggregate data arranged in a sequence. In particular, we can build

stream analogs of the list operations from Chapter 2, such as

list-ref, map, and for-each:

存在一个特殊的对象 the-empty-stream,它不可能是任何 cons-stream 操作的结果,并可由谓词 stream-null? 来识别。因此,我们可以像使用表那样创建和使用流,以表示按序列排列的聚合数据。特别地,我们可以构造第 2 章表操作的流版本,例如 list-ref、map 和 for-each:

Stream-for-each is useful for viewing streams:

stream-for-each 对于查看流的内容很有用:

To make the stream implementation automatically and transparently interleave

the construction of a stream with its use, we will arrange for the cdr

of a stream to be evaluated when it is accessed by the stream-cdr

procedure rather than when the stream is constructed by cons-stream.

This implementation choice is reminiscent of our discussion of rational numbers

in 2.1.2, where we saw that we can choose to implement rational

numbers so that the reduction of numerator and denominator to lowest terms is

performed either at construction time or at selection time. The two

rational-number implementations produce the same data abstraction, but the

choice has an effect on efficiency. There is a similar relationship between

streams and ordinary lists. As a data abstraction, streams are the same as

lists. The difference is the time at which the elements are evaluated. With

ordinary lists, both the car and the cdr are evaluated at

construction time. With streams, the cdr is evaluated at selection

time.

为使流的实现自动且透明地将流的构造与其使用交错进行,我们将安排在通过 stream-cdr 过程访问流的 cdr 时才对其求值,而不是在由 cons-stream 构造流时就求值。这一实现选择令人联想到我们在 2.1.2 节对有理数的讨论——我们看到,可以选择在构造时或选取时对分子和分母进行约分,两种实现产生相同的数据抽象,但选择会影响效率。流与普通表之间存在类似的关系:作为数据抽象,流与表是相同的;区别在于元素求值的时机。对于普通表,car 和 cdr 在构造时均被求值;对于流,cdr 在选取时才被求值。

Our implementation of streams will be based on a special form called

delay. Evaluating (delay ⟨exp⟩) does not evaluate the

expression ⟨exp⟩, but rather returns a so-called

我们对流的实现将基于一个称为 delay 的特殊形式。对 (delay ⟨exp⟩) 求值时,并不对表达式 ⟨exp⟩ 进行求值,而是返回一个所谓的

delayed object, which we can think of as a “promise” to evaluate

⟨exp⟩ at some

future time. As a companion to delay, there is a procedure called

force that takes a delayed object as argument and performs the

evaluation—in effect, forcing the delay to fulfill its promise. We

will see below how delay and force can be implemented, but first

let us use these to construct streams.

Cons-stream is a special form defined so that

延迟对象 (delayed object),我们可以将其理解为在将来某个时刻对 ⟨exp⟩ 求值的“承诺”。与 delay 配套,有一个名为 force 的过程,它以延迟对象为参数并执行求值——实际上是强制 delay 履行其“承诺”。我们将在下文看到 delay 和 force 的实现,但首先让我们用它们来构造流。cons-stream 是一个特殊形式,其定义使得:

is equivalent to

等价于

What this means is that we will construct streams using pairs. However, rather

than placing the value of the rest of the stream into the cdr of the

pair we will put there a promise to compute the rest if it is ever requested.

Stream-car and stream-cdr can now be defined as procedures:

这意味着我们将用序对来构造流。然而,我们不会把流其余部分的值放入序对的 cdr,而是在那里放一个承诺 (promise):若将来有需要,才去计算流的其余部分。stream-car 和 stream-cdr 现在可以定义为过程:

Stream-car selects the car of the pair; stream-cdr selects

the cdr of the pair and evaluates the delayed expression found there to

obtain the rest of the stream.

Subheading: The stream implementation in action

stream-car 取序对的 car;stream-cdr 取序对的 cdr,并对那里找到的延迟表达式求值,从而得到流的其余部分。

小节标题:流实现的运作过程

To see how this implementation behaves, let us analyze the “outrageous” prime

computation we saw above, reformulated in terms of streams:

为了了解这一实现的运作方式,让我们分析前面那个"骇人听闻"的素数计算,将其用流的形式重新表述:

We will see that it does indeed work efficiently.

我们将会看到,它确实能高效地运行。

We begin by calling stream-enumerate-interval with the arguments 10,000

and 1,000,000. Stream-enumerate-interval is the stream analog of

enumerate-interval (2.2.3):

我们先以 10,000 和 1,000,000 为参数调用 stream-enumerate-interval。stream-enumerate-interval 是 enumerate-interval(2.2.3 节)的流版本:

and thus the result returned by stream-enumerate-interval, formed by the

cons-stream, is

于是,stream-enumerate-interval 由 cons-stream 构造返回的结果是

That is, stream-enumerate-interval returns a stream represented as a

pair whose car is 10,000 and whose cdr is a promise to enumerate

more of the interval if so requested. This stream is now filtered for primes,

using the stream analog of the filter procedure (2.2.3):

也就是说,stream-enumerate-interval 返回一个流,该流表示为一个序对,其 car 为 10,000,其 cdr 是一个承诺:若有需要则枚举区间中的更多元素。这个流随后被用素数进行过滤,使用的是 filter 过程(2.2.3 节)的流版本:

Stream-filter tests the stream-car of the stream (the car

of the pair, which is 10,000). Since this is not prime, stream-filter

examines the stream-cdr of its input stream. The call to

stream-cdr forces evaluation of the delayed

stream-enumerate-interval, which now returns

stream-filter 检验该流的 stream-car(即序对的 car,其值为 10,000)。由于 10,000 不是素数,stream-filter 继续检查其输入流的 stream-cdr。对 stream-cdr 的调用强制对延迟的 stream-enumerate-interval 求值,后者此时返回

Stream-filter now looks at the stream-car of this stream, 10,001,

sees that this is not prime either, forces another stream-cdr, and so

on, until stream-enumerate-interval yields the prime 10,007, whereupon

stream-filter, according to its definition, returns

stream-filter 接着查看这个流的 stream-car,即 10,001,发现它也不是素数,便再次强制求 stream-cdr,如此继续,直到 stream-enumerate-interval 产出素数 10,007,此时 stream-filter 按其定义返回

which in this case is

在这种情况下,结果是

This result is now passed to stream-cdr in our original expression.

This forces the delayed stream-filter, which in turn keeps forcing the

delayed stream-enumerate-interval until it finds the next prime, which

is 10,009. Finally, the result passed to stream-car in our original

expression is

这个结果随后被传给原始表达式中的 stream-cdr。这将强制对延迟的 stream-filter 求值,后者反过来持续强制延迟的 stream-enumerate-interval,直到找到下一个素数 10,009。最终,传给原始表达式中 stream-car 的结果是

Stream-car returns 10,009, and the computation is complete. Only as

many integers were tested for primality as were necessary to find the second

prime, and the interval was enumerated only as far as was necessary to feed the

prime filter.

stream-car 返回 10,009,计算完成。在整个过程中,被检验素性的整数恰好是为找到第二个素数所必需的那些,而区间的枚举也只进行到向素数过滤器提供所需数据为止。

In general, we can think of delayed evaluation as “demand-driven”

programming, whereby each stage in the stream process is activated only enough

to satisfy the next stage. What we have done is to decouple the actual order

of events in the computation from the apparent structure of our procedures. We

write procedures as if the streams existed “all at once” when, in reality,

the computation is performed incrementally, as in traditional programming

styles.

Subheading: Implementing delay and force

一般而言,我们可以把延迟求值视为"按需驱动"的程序设计方式——流计算过程的每个阶段只被激活到足以满足下一阶段的程度。我们所做的,是将计算中事件的实际发生顺序与过程的表观结构解耦。我们编写过程时仿佛流"同时全部存在",而实际上计算是像传统程序设计风格那样增量进行的。

小节标题:delay 和 force 的实现

Although delay and force may seem like mysterious operations,

their implementation is really quite straightforward. Delay must

package an expression so that it can be evaluated later on demand, and we can

accomplish this simply by treating the expression as the body of a procedure.

Delay can be a special form such that

尽管 delay 和 force 看起来像是神秘的操作,它们的实现其实相当直接。delay 必须将一个表达式打包,使其能够按需在日后求值,而我们可以简单地把该表达式视为一个过程的体来实现这一点。delay 可以是一种特殊形式,使得

is syntactic sugar for

是以下形式的语法糖

Force simply calls the procedure (of no arguments) produced by

delay, so we can implement force as a procedure:

force 只是调用由 delay 产生的那个(无参数的)过程,因此可以将 force 实现为一个过程:

This implementation suffices for delay and force to work as

advertised, but there is an important optimization that we can include. In

many applications, we end up forcing the same delayed object many times. This

can lead to serious inefficiency in recursive programs involving streams. (See

Exercise 3.57.) The solution is to build delayed objects so that the

first time they are forced, they store the value that is computed. Subsequent

forcings will simply return the stored value without repeating the computation.

In other words, we implement delay as a special-purpose memoized

procedure similar to the one described in Exercise 3.27. One way to

accomplish this is to use the following procedure, which takes as argument a

procedure (of no arguments) and returns a memoized version of the procedure.

The first time the memoized procedure is run, it saves the computed result. On

subsequent evaluations, it simply returns the result.

这一实现已足以使 delay 和 force 按预期工作,但我们还可以加入一个重要的优化。在许多应用中,我们会多次强制求同一个延迟对象的值。在涉及流的递归程序中,这可能导致严重的低效。(见练习 3.57。)解决办法是将延迟对象构造成这样:第一次被强制求值时,它存储计算得到的值;之后的每次强制求值只需返回已存储的值,而无需重复计算。换言之,我们将 delay 实现为一种专用的记忆化过程,类似于练习 3.27 中描述的那种。实现这一点的方法之一,是使用下面这个过程——它以一个(无参数的)过程作为参数,返回该过程的记忆化版本。记忆化过程第一次运行时,它保存计算结果;此后求值时,直接返回该结果。

Delay is then defined so that (delay ⟨exp⟩) is equivalent

to

然后将 delay 定义为使 `(delay ⟨exp⟩)` 等价于

and force is as defined previously.

而 force 则与前面的定义相同。

Racket #lang sicp
(define (sum-primes a b)
 (define (iter count accum)
 (cond ((> count b) accum)
 ((prime? count)
 (iter (+ count 1)
 (+ count accum)))
 (else (iter (+ count 1) accum))))
 (iter a 0))
Racket #lang sicp
(define (sum-primes a b)
 (accumulate
 +
 0
 (filter prime? (enumerate-interval a b))))
Racket #lang sicp
(car (cdr
 (filter
 prime?
 (enumerate-interval 10000 1000000))))
Racket #lang sicp
(stream-car (cons-stream x y)) = x
(stream-cdr (cons-stream x y)) = y
Racket #lang sicp
(define (stream-ref s n)
 (if (= n 0)
 (stream-car s)
 (stream-ref (stream-cdr s) (- n 1))))

(define (stream-map proc s)
 (if (stream-null? s)
 the-empty-stream
 (cons-stream
 (proc (stream-car s))
 (stream-map proc (stream-cdr s)))))

(define (stream-for-each proc s)
 (if (stream-null? s)
 'done
 (begin
 (proc (stream-car s))
 (stream-for-each proc
 (stream-cdr s)))))
Racket #lang sicp
(define (display-stream s)
 (stream-for-each display-line s))

(define (display-line x)
 (newline)
 (display x))
Racket #lang sicp
(cons-stream ⟨a⟩ ⟨b⟩)
Racket #lang sicp
(cons ⟨a⟩ (delay ⟨b⟩))
Racket #lang sicp
(define (stream-car stream)
 (car stream))

(define (stream-cdr stream)
 (force (cdr stream)))
Racket #lang sicp
(stream-car
 (stream-cdr
 (stream-filter
 prime? (stream-enumerate-interval
 10000 1000000))))
Racket #lang sicp
(define (stream-enumerate-interval low high)
 (if (> low high)
 the-empty-stream
 (cons-stream
 low
 (stream-enumerate-interval (+ low 1)
 high))))
Racket #lang sicp
(cons 10000
 (delay
 (stream-enumerate-interval
 10001
 1000000)))
Racket #lang sicp
(define (stream-filter pred stream)
 (cond ((stream-null? stream)
 the-empty-stream)
 ((pred (stream-car stream))
 (cons-stream
 (stream-car stream)
 (stream-filter
 pred
 (stream-cdr stream))))
 (else (stream-filter
 pred
 (stream-cdr stream)))))
Racket #lang sicp
(cons 10001
 (delay
 (stream-enumerate-interval
 10002
 1000000)))
Racket #lang sicp
(cons-stream
 (stream-car stream)
 (stream-filter pred (stream-cdr stream)))
Racket #lang sicp
(cons 10007
 (delay
 (stream-filter
 prime?
 (cons 10008
 (delay
 (stream-enumerate-interval
 10009 1000000))))))
Racket #lang sicp
(cons 10009
 (delay
 (stream-filter
 prime?
 (cons 10010
 (delay
 (stream-enumerate-interval
 10011 1000000))))))
Racket #lang sicp
(delay ⟨exp⟩)
Racket #lang sicp
(lambda () ⟨exp⟩)
Racket #lang sicp
(define (force delayed-object)
 (delayed-object))
Racket #lang sicp
(define (memo-proc proc)
 (let ((already-run? false) (result false))
 (lambda ()
 (if (not already-run?)
 (begin (set! result (proc))
 (set! already-run? true)
 result)
 result))))
Racket #lang sicp
(memo-proc (lambda () ⟨exp⟩))