Exercise 2.62: Give a Θ
(
n
)
implementation of union-set for sets represented as ordered lists.
Sets as binary trees
We can do better than the ordered-list representation by arranging the set
elements in the form of a tree. Each node of the tree holds one element of the
set, called the “entry” at that node, and a link to each of two other
(possibly empty) nodes. The “left” link points to elements smaller than the
one at the node, and the “right” link to elements greater than the one at the
node. Figure 2.16 shows some trees that represent the set
{
1
,
3
,
5
,
7
,
9
,
11
}. The same set may be represented by a tree in a number of
different ways. The only thing we require for a valid representation is that
all elements in the left subtree be smaller than the node entry and that all
elements in the right subtree be larger.
SVG
Figure 2.16: Various binary trees that represent the set {
1
,
3
,
5
,
7
,
9
,
11
}.
The advantage of the tree representation is this: Suppose we want to check
whether a number x is contained in a set. We begin by comparing x with
the entry in the top node. If x is less than this, we know that we need
only search the left subtree; if x is greater, we need only search the
right subtree. Now, if the tree is “balanced,” each of these subtrees will
be about half the size of the original. Thus, in one step we have reduced the
problem of searching a tree of size n to searching a tree of size n
/
2.
Since the size of the tree is halved at each step, we should expect that the
number of steps needed to search a tree of size n grows as
Θ
(
log
n
). For large sets, this will be a
significant speedup over the previous representations.
We can represent trees by using lists. Each node will be a list of three
items: the entry at the node, the left subtree, and the right subtree. A left
or a right subtree of the empty list will indicate that there is no subtree
connected there. We can describe this representation by the following
procedures:
(define (entry tree) (car tree))
(define (left-branch tree) (cadr tree))
(define (right-branch tree) (caddr tree))
(define (make-tree entry left right)
(list entry left right))
Now we can write the element-of-set? procedure using the strategy
described above:
(define (element-of-set? x set)
(cond ((null? set) false)
((= x (entry set)) true)
(( x (entry set))
(element-of-set?
x
(left-branch set)))
((> x (entry set))
(element-of-set?
x
(right-branch set)))))
Adjoining an item to a set is implemented similarly and also requires
Θ
(
log
n
) steps. To adjoin an item x, we compare
x with the node entry to determine whether x should be added to
the right or to the left branch, and having adjoined x to the
appropriate branch we piece this newly constructed branch together with the
original entry and the other branch. If x is equal to the entry, we
just return the node. If we are asked to adjoin x to an empty tree, we
generate a tree that has x as the entry and empty right and left
branches. Here is the procedure:
(define (adjoin-set x set)
(cond ((null? set) (make-tree x '() '()))
((= x (entry set)) set)
(( x (entry set))
(make-tree
(entry set)
(adjoin-set x (left-branch set))
(right-branch set)))
((> x (entry set))
(make-tree
(entry set)
(left-branch set)
(adjoin-set x (right-branch set))))))
The above claim that searching the tree can be performed in a logarithmic
number of steps rests on the assumption that the tree is “balanced,” i.e.,
that the left and the right subtree of every tree have approximately the same
number of elements, so that each subtree contains about half the elements of
its parent. But how can we be certain that the trees we construct will be
balanced? Even if we start with a balanced tree, adding elements with
adjoin-set may produce an unbalanced result. Since the position of a
newly adjoined element depends on how the element compares with the items
already in the set, we can expect that if we add elements “randomly” the tree
will tend to be balanced on the average. But this is not a guarantee. For
example, if we start with an empty set and adjoin the numbers 1 through 7 in
sequence we end up with the highly unbalanced tree shown in Figure 2.17.
In this tree all the left subtrees are empty, so it has no advantage over a
simple ordered list. One way to solve this problem is to define an operation
that transforms an arbitrary tree into a balanced tree with the same elements.
Then we can perform this transformation after every few adjoin-set
operations to keep our set in balance. There are also other ways to solve this
problem, most of which involve designing new data structures for which
searching and insertion both can be done in Θ
(
log
n
)
steps.
SVG
Figure 2.17: Unbalanced tree produced by adjoining 1 through 7 in sequence.
练习 2.62:给出以有序表表示的集合的 union-set 的 Θ(n) 实现。
集合作为二叉树
通过将集合元素组织成树的形式,我们可以比有序表表示做得更好。树的每个节点存放集合的一个元素,称为该节点的“条目”(entry),以及分别指向另外两个(可能为空的)节点的链接。“左”链接指向比该节点元素小的元素,“右”链接指向比该节点元素大的元素。图 2.16 展示了表示集合 {1, 3, 5, 7, 9, 11} 的一些树。同一集合可以用多种不同的树来表示。有效表示的唯一要求是:左子树中的所有元素均小于节点条目,右子树中的所有元素均大于节点条目。
SVG
图 2.16:表示集合 {1, 3, 5, 7, 9, 11} 的各种二叉树。
树表示的优势在于:假设我们要检查数 x 是否在集合中。我们先将 x 与顶层节点的条目比较。若 x 较小,只需搜索左子树;若 x 较大,只需搜索右子树。现在,若树是“平衡”的,这两棵子树各约为原树大小的一半。因此,每一步都将搜索大小为 n 的树的问题化归为搜索大小为 n/2 的树。由于每步树的大小减半,我们可以预期搜索大小为 n 的树所需的步骤数以 Θ(log n) 增长。对于大型集合,这将比前面的表示有显著的加速。
我们可以用表来表示树。每个节点是一个包含三个元素的表:节点的条目、左子树和右子树。空表的左或右子树表示该处没有连接的子树。我们可以用以下过程描述这种表示:
(define (entry tree) (car tree))
(define (left-branch tree) (cadr tree))
(define (right-branch tree) (caddr tree))
(define (make-tree entry left right)
(list entry left right))
现在我们可以按照上述策略写出 element-of-set? 过程:
(define (element-of-set? x set)
(cond ((null? set) false)
((= x (entry set)) true)
((< x (entry set))
(element-of-set?
x
(left-branch set)))
((> x (entry set))
(element-of-set?
x
(right-branch set)))))
将元素并入集合的实现与此类似,同样需要 Θ(log n) 步。要并入元素 x,将 x 与节点条目比较,以确定 x 应加入右子树还是左子树,将 x 并入相应子树后,再将新构造的子树与原来的条目和另一子树拼合。若 x 等于条目,则直接返回该节点。若要求将 x 并入空树,则生成一棵以 x 为条目、左右子树均为空的树。以下是相应过程:
(define (adjoin-set x set)
(cond ((null? set) (make-tree x '() '()))
((= x (entry set)) set)
((< x (entry set))
(make-tree
(entry set)
(adjoin-set x (left-branch set))
(right-branch set)))
((> x (entry set))
(make-tree
(entry set)
(left-branch set)
(adjoin-set x (right-branch set))))))
上述关于树搜索可在对数步骤数内完成的论断,依赖于树是“平衡”的假设——即每棵树的左右子树所含元素数大致相同,使每棵子树约包含其父树元素的一半。但我们如何确保构造出的树是平衡的?即便从平衡树出发,用 adjoin-set 添加元素也可能产生不平衡的结果。由于新并入元素的位置取决于该元素与集合中已有元素的比较结果,我们可以预期,若“随机”添加元素,树平均而言会趋于平衡。但这并不是保证。例如,若从空集出发,依次并入 1 到 7,最终得到图 2.17 所示的极度不平衡的树。该树的所有左子树均为空,因而相比简单的有序表毫无优势。解决这一问题的一种方法是定义一个操作,将任意树变换为含相同元素的平衡树。然后可以在每隔数次 adjoin-set 操作后执行此变换,以保持集合的平衡。此外还有其他解决方案,其中大多数涉及设计新的数据结构,使搜索和插入均可在 Θ(log n) 步内完成。
SVG
图 2.17:依次并入 1 到 7 所产生的不平衡树。
(define (entry tree) (car tree))
(define (left-branch tree) (cadr tree))
(define (right-branch tree) (caddr tree))
(define (make-tree entry left right)
(list entry left right)) (define (element-of-set? x set)
(cond ((null? set) false)
((= x (entry set)) true)
(( x (entry set))
(element-of-set?
x
(left-branch set)))
((> x (entry set))
(element-of-set?
x
(right-branch set))))) (define (adjoin-set x set)
(cond ((null? set) (make-tree x '() '()))
((= x (entry set)) set)
(( x (entry set))
(make-tree
(entry set)
(adjoin-set x (left-branch set))
(right-branch set)))
((> x (entry set))
(make-tree
(entry set)
(left-branch set)
(adjoin-set x (right-branch set))))))