灯下 登录
计算机科学 / SICP / 2.3.3 Example: Representing Sets

Exercise 2.65 · 习题

Exercise 2.65: Use the results of Exercise 2.63
and Exercise 2.64 to give Θ
(
n
) implementations of
union-set and intersection-set for sets implemented as (balanced)
binary trees.

Sets and information retrieval

We have examined options for using lists to represent sets and have seen how
the choice of representation for a data object can have a large impact on the
performance of the programs that use the data. Another reason for
concentrating on sets is that the techniques discussed here appear again and
again in applications involving information retrieval.

Consider a data base containing a large number of individual records, such as
the personnel files for a company or the transactions in an accounting system.
A typical data-management system spends a large amount of time accessing or
modifying the data in the records and therefore requires an efficient method
for accessing records. This is done by identifying a part of each record to
serve as an identifying
key. A key can be anything that uniquely
identifies the record. For a personnel file, it might be an employee’s ID
number. For an accounting system, it might be a transaction number. Whatever
the key is, when we define the record as a data structure we should include a
key selector procedure that retrieves the key associated with a given
record.

Now we represent the data base as a set of records. To locate the record with a
given key we use a procedure lookup, which takes as arguments a key and
a data base and which returns the record that has that key, or false if there
is no such record. Lookup is implemented in almost the same way as
element-of-set?. For example, if the set of records is implemented as
an unordered list, we could use

(define (lookup given-key set-of-records)
(cond ((null? set-of-records) false)
((equal? given-key
(key (car set-of-records)))
(car set-of-records))
(else
(lookup given-key
(cdr set-of-records)))))

Of course, there are better ways to represent large sets than as unordered

lists. Information-retrieval systems in which records have to be “randomly

accessed” are typically implemented by a tree-based method, such as the

binary-tree representation discussed previously. In designing such a system

the methodology of data abstraction can be a great help. The designer can

create an initial implementation using a simple, straightforward representation

such as unordered lists. This will be unsuitable for the eventual system, but

it can be useful in providing a “quick and dirty” data base with which to

test the rest of the system. Later on, the data representation can be modified

to be more sophisticated. If the data base is accessed in terms of abstract

selectors and constructors, this change in representation will not require any

changes to the rest of the system.

练习 2.65:利用练习 2.63 和练习 2.64 的结果,给出以(平衡)二叉树实现的集合的 union-set 和 intersection-set 的 Θ(n) 实现。

集合与信息检索

我们已经考察了用表表示集合的各种方案,并看到了数据对象表示方式的选择对使用该数据的程序的性能有多大影响。关注集合的另一个原因是,这里讨论的技术在涉及信息检索的应用中反复出现。

考虑一个包含大量独立记录的数据库,例如某公司的人事档案或某会计系统的交易记录。典型的数据管理系统花费大量时间访问或修改记录中的数据,因此需要一种高效的记录访问方法。为此,要将每条记录的某一部分指定为标识键 (key)。键可以是任何能唯一标识该记录的内容。对于人事档案,可以是员工的工号;对于会计系统,可以是交易号。无论键是什么,当我们将记录定义为数据结构时,都应当包含一个键选择函数,用于提取给定记录对应的键。

现在将数据库表示为记录的集合。要根据给定键定位记录,使用过程 lookup,它以键和数据库为参数,返回具有该键的记录,若不存在则返回 false。lookup 的实现方式与 element-of-set? 几乎相同。例如,若记录集合以无序表实现,可以使用:

(define (lookup given-key set-of-records)
(cond ((null? set-of-records) false)
((equal? given-key
(key (car set-of-records)))
(car set-of-records))
(else
(lookup given-key
(cdr set-of-records)))))

当然,对于大型集合,无序表并不是最好的表示方式。需要“随机访问”记录的信息检索系统通常采用基于树的方法,例如前面讨论过的二叉树表示。在设计这样的系统时,数据抽象的方法论大有裨益。设计者可以先用简单直接的表示(如无序表)给出初始实现。这对于最终系统而言可能不够理想,但可以用来构建一个“粗糙但可用”的数据库,以便测试系统的其余部分。之后再将数据表示修改得更加复杂。如果数据库是通过抽象的选择函数和构造函数来访问的,那么这种表示上的变化就不需要对系统其余部分作任何改动。

Racket #lang sicp
(define (lookup given-key set-of-records)
 (cond ((null? set-of-records) false)
 ((equal? given-key
 (key (car set-of-records)))
 (car set-of-records))
 (else
 (lookup given-key
 (cdr set-of-records)))))