Exercise 2.60: We specified that a set would be
represented as a list with no duplicates. Now suppose we allow duplicates.
For instance, the set {
1
,
2
,
3
} could be represented as the list (2 3 2 1
3 2 2). Design procedures element-of-set?, adjoin-set,
union-set, and intersection-set that operate on this
representation. How does the efficiency of each compare with the corresponding
procedure for the non-duplicate representation? Are there applications for
which you would use this representation in preference to the non-duplicate one?
Sets as ordered lists
One way to speed up our set operations is to change the representation so that
the set elements are listed in increasing order. To do this, we need some way
to compare two objects so that we can say which is bigger. For example, we
could compare symbols lexicographically, or we could agree on some method for
assigning a unique number to an object and then compare the elements by
comparing the corresponding numbers. To keep our discussion simple, we will
consider only the case where the set elements are numbers, so that we can
compare elements using > and . We will represent a set of
numbers by listing its elements in increasing order. Whereas our first
representation above allowed us to represent the set {
1
,
3
,
6
,
10
} by listing
the elements in any order, our new representation allows only the list (1
3 6 10).
One advantage of ordering shows up in element-of-set?: In checking for
the presence of an item, we no longer have to scan the entire set. If we reach
a set element that is larger than the item we are looking for, then we know
that the item is not in the set:
(define (element-of-set? x set)
(cond ((null? set) false)
((= x (car set)) true)
(( x (car set)) false)
(else (element-of-set? x (cdr set)))))
How many steps does this save? In the worst case, the item we are looking for
may be the largest one in the set, so the number of steps is the same as for
the unordered representation. On the other hand, if we search for items of
many different sizes we can expect that sometimes we will be able to stop
searching at a point near the beginning of the list and that other times we
will still need to examine most of the list. On the average we should expect
to have to examine about half of the items in the set. Thus, the average
number of steps required will be about n
/
2. This is still
Θ
(
n
) growth, but it does save us, on the average, a factor of 2
in number of steps over the previous implementation.
We obtain a more impressive speedup with intersection-set. In the
unordered representation this operation required Θ
(
n
2
) steps,
because we performed a complete scan of set2 for each element of
set1. But with the ordered representation, we can use a more clever
method. Begin by comparing the initial elements, x1 and x2, of
the two sets. If x1 equals x2, then that gives an element of the
intersection, and the rest of the intersection is the intersection of the
cdr-s of the two sets. Suppose, however, that x1 is less than
x2. Since x2 is the smallest element in set2, we can
immediately conclude that x1 cannot appear anywhere in set2 and
hence is not in the intersection. Hence, the intersection is equal to the
intersection of set2 with the cdr of set1. Similarly, if
x2 is less than x1, then the intersection is given by the
intersection of set1 with the cdr of set2. Here is the
procedure:
(define (intersection-set set1 set2)
(if (or (null? set1) (null? set2))
'()
(let ((x1 (car set1)) (x2 (car set2)))
(cond ((= x1 x2)
(cons x1 (intersection-set
(cdr set1)
(cdr set2))))
(( x1 x2) (intersection-set
(cdr set1)
set2))
(( x2 x1) (intersection-set
set1
(cdr set2)))))))
To estimate the number of steps required by this process, observe that at each
step we reduce the intersection problem to computing intersections of smaller
sets—removing the first element from set1 or set2 or both.
Thus, the number of steps required is at most the sum of the sizes of
set1 and set2, rather than the product of the sizes as with the
unordered representation. This is Θ
(
n
) growth rather than
Θ
(
n
2
)—a considerable speedup, even for sets of moderate size.
练习 2.60:我们规定集合用没有重复元素的表来表示。现在假设允许重复元素。例如,集合 {1, 2, 3} 可以表示为表 (2 3 2 1 3 2 2)。设计在这种表示上操作的过程 element-of-set?、adjoin-set、union-set 和 intersection-set。与对应的无重复表示的过程相比,各过程的效率如何?是否存在某些应用场景,使你倾向于使用这种表示而非无重复的表示?
集合作为有序表
加速集合操作的一种方法是改变表示,使集合元素按递增顺序排列。为此,我们需要某种比较两个对象大小的方法。例如,我们可以按字典序比较符号,也可以约定某种为每个对象分配唯一数字的方法,然后通过比较对应数字来比较元素。为了简化讨论,我们只考虑集合元素为数的情况,从而可以用 > 和 < 来比较元素。我们将通过按递增顺序列出元素来表示数的集合。前面的第一种表示允许我们以任意顺序列出元素来表示集合 {1, 3, 6, 10},而新的表示只允许使用表 (1 3 6 10)。
有序表示的一个优点体现在 element-of-set? 中:在检查一个元素是否存在时,我们不再需要扫描整个集合。如果遇到一个大于目标元素的集合元素,就知道目标不在集合中:
(define (element-of-set? x set)
(cond ((null? set) false)
((= x (car set)) true)
((< x (car set)) false)
(else (element-of-set? x (cdr set)))))
这能节省多少步骤?在最坏情况下,目标元素可能是集合中最大的,此时步骤数与无序表示相同。另一方面,若我们搜索许多不同大小的元素,可以预期有时可以在表的较前位置停止搜索,有时仍需检查表的大部分。平均而言,我们应当期望只需检查集合中约一半的元素。因此,平均所需步骤数约为 n/2。这仍然是 Θ(n) 的增长,但与前一实现相比,平均节省了约一半的步骤数。
对于 intersection-set,我们可以得到更显著的加速。在无序表示中,该操作需要 Θ(n²) 步,因为我们对 set1 的每个元素都要完整扫描 set2。但有了有序表示,我们可以使用更巧妙的方法。先比较两个集合的首元素 x1 和 x2。若 x1 等于 x2,则它是交集的一个元素,其余交集是两个集合各自的 cdr 的交集。然而,若 x1 小于 x2,由于 x2 是 set2 中最小的元素,我们可以立即断定 x1 不可能出现在 set2 中,因而不在交集中。故交集等于 set2 与 set1 的 cdr 的交集。类似地,若 x2 小于 x1,则交集由 set1 与 set2 的 cdr 的交集给出。以下是相应过程:
(define (intersection-set set1 set2)
(if (or (null? set1) (null? set2))
'()
(let ((x1 (car set1)) (x2 (car set2)))
(cond ((= x1 x2)
(cons x1 (intersection-set
(cdr set1)
(cdr set2))))
((< x1 x2) (intersection-set
(cdr set1)
set2))
((< x2 x1) (intersection-set
set1
(cdr set2)))))))
估计此计算过程所需的步骤数,注意每一步我们都将交集问题化归为更小集合的交集——从 set1 或 set2 或两者同时移去首元素。因此,所需步骤数至多是 set1 和 set2 大小之和,而非如无序表示那样是大小之积。这是 Θ(n) 的增长,而非 Θ(n²)——即便对于中等规模的集合,这也是相当可观的加速。
(define (element-of-set? x set)
(cond ((null? set) false)
((= x (car set)) true)
(( x (car set)) false)
(else (element-of-set? x (cdr set))))) (define (intersection-set set1 set2)
(if (or (null? set1) (null? set2))
'()
(let ((x1 (car set1)) (x2 (car set2)))
(cond ((= x1 x2)
(cons x1 (intersection-set
(cdr set1)
(cdr set2))))
(( x1 x2) (intersection-set
(cdr set1)
set2))
(( x2 x1) (intersection-set
set1
(cdr set2)))))))