Survey of

the Self-Adjusting Data Structures

Daniyal Khowaja

December 21, 2017

Abstract

Over the past

decades, various types of optimality founded based on real-world applications

such as key-independent optimality19. Since dynamic optimality was

introduced, it remains an open area of research. We will be focusing on dynamic

binary search trees BSTs which are said to be dynamically optimal. What is

dynamic optimality? A BST is said to be dynamic optimal if it executes all sequences

in amount of time. Although all types

of balancing binary search trees used optimal cost of where n is the number of elements accessed. We know that online

BST is c-competitive and it usually executes long sequences in at most of time. is the best known ratio for dynamic binary

search trees. Splay trees and Tango trees are well-known binary search trees

and conjectured to be dynamically optimal but despite decades of research, the

conjecture still remains an open problem. We have also discussed this problem

in offline manner. Here we survey these dynamic binary search tree BSTs and

presenting each in detail since the dynamic optimality conjecture was first

introduced.

1 Introduction

Self-adjusting data

structures can rearrange themselves after operations are committed to it. Here

we will talk more about Binary Search Trees BSTs which are more popular and

famous data structure. They are frequently used to maintain an ordered set of

data and perform operations including insertion, deletion and searching.

1.1 BST Model

To understand the

problem in detail, we have defined binary search tree BST as model of

computation. This model has been developed by Wilber 5 in which we required

data to be stored as static set of keys of BST. The input would be a sequence X

of keys .The algorithm serve at a given access and has a single pointer to a node. At the

beginning of an access to a given key this pointer is initialized to

the root and can perform rotations to the left child, right child,

parent or rotation with parent. To execute the searching, it’s mandatory that it

should be touched. Whenever the pointer moves to or is

initialized to a node, we say that the node is touched. The time taken

by a BST to execute a sequence is at least the number of nodes it touches,

which in turn is at least m.

2 Previous Work

According to Knuth

2 in 1971, when he proved a binary search tree is supposed to be run in

optimal amount of time. He described that whenever there is independent access

at random, a BST is assumed to run in optimum amount of time up to a constant

factor with fixed distribution.

In 1985, Sleator and Tarjan 3 introduced new

concepts including amortized complexity, self-adjustment and came with

competiveness. As a result, he invented Splay-trees which are quite popular in

data structures. He explored properties like static optimality or entropy bound

and upper bounds including working set property and static finger bound which

is the most significant in all other properties. Finally, he conjectured that

splay-trees are dynamically optimal or O(1)-competitive and

proved to be competitive compared to other offline searching binary search

trees. Sleator and Tarjan 3 also observed that it performs well for many access patterns 9. In the same year, Tarjan 10 invented

sequential access bound and after that, Cole11,12 introduced dynamic finger

bound. These upper bounds showed that splay trees executes access sequences in time where ?(m)

time taken to execute on splay trees. No BST is supposed to be better than and also its no

better than against the

offline optimal BST data structure.

Demaine et al 14 improved the which was the best known

competitive ratio as far as the balanced trees are concerned. They were the first to develop

Tango trees and achieved non-trivial competitive factor of . This was the major

achievement but worse than Tango-trees works better in searching for

alternative BST algorithms that have small but non-constant competitive

factors.

After the publication of the tango tree paper, two other

O(log log n)- competitive BSTs have been introduced by Derryberry et al. 4,11

and Georgakopoulos 5. The multi-splay trees 4 are based on tango trees, but

instead of using red-black trees as auxiliary trees, they use splay trees 10.

As a consequence, multi-splay trees can be shown 4,11 to satisfy additional

properties, including the scanning and working-set bounds of splay trees, while

maintaining O(log log n)-competitiveness. Georgakopoulos uses the interleave

lower bound of Demaine et al. to develop a variation of splay trees called

chain-splay trees that achieves O(log log n)-competitiveness while not

maintaining any balance condition explicitly. However, neither of these two

structures achieves a worst-case single access time of O(log n). A data

structure achieving the same running time as tango trees alongside O(log n)

worst-case single access time was developed by Kujala and Elomaa 8, but this

data structure does not adhere to the BST model (in which the lower bounds on

OPT(X) are proved).

Being O(log log n)-competitive, tango trees are always at

most a factor O(log log n) worse than OPT(X). On the other hand, they may

actually pay this multiplicative overhead at each access, implying that they

have ?(log log n log n) worst-case access time, and use ?(m log log n log n)

time on some access sequences of length m. In comparison, any balanced BST

(even static) has O(log n) worst-case access time and spends O(m log n) on

every access sequence.

Bose et al15

presented zipper trees,

achieving the optimal worst-case access time while maintaining the O(log log

n)-competitiveness property. This shows that for binary search trees, optimal

worst-case access time and near-optimal amortized access time can be guaranteed

simultaneously.

George16 discussing whether splay trees or splaylike

trees can be as good as tango trees. He showed that splay-like class of trees

(chain-splay trees) are proved to be c

log logN-competitive to any off-line algorithm that maintains a binary search

tree with rotations.

There are three lower bounds that are very important to

mention here including independent rectangles, Wilber 1

and 2 and signed greedy. The independent rectangles have no corner strictly

inside another as in the below figure.

Wilber 1 and 2 are two known lower bounds on OPT(X) in

the BST model, both invented by Wilber5. Neither bound is simply stated; both are complex

functions of X, computable by efficient algorithms. In Wilber 2, assuming we have point

p and points below p as downward visible points. We will consider P as a line

and we need to count number of alterations both left and right of p. At last,

we need summation over p. We conjecture that its proportional to optimal

solution. In Wilber 1, we are fixing lower-bound tree p and for each node y of

tree p, we have left-subtree and right-subtree. So we need to count number of

alterations in x1, x2,…, xn between left and right sub-trees of y.

In Signed greedy, there

are two types of greedy + and -. In this case, we will only consider plus

greedy and need to avoid – rectangles. Plus greedy always gives independent set

of plus rectangles. Its lower bound and not satisfies the point set.

3 Splay Trees

Splay tree is one of

the self-adjusting BST has the capability to move recently requested closer to

the root. Splay trees are quite efficient when it comes non-random operations.

Splaying is the most common operation used so that the element can be placed at

the root or bring at the top. Splay trees has been introduced by Sleator and

Tarjan 3. They used a simple restructuring heuristic, to move the searched

item to the root. Splay trees have been proven to have a number of

amortized bounds on their performance, including such basic facts as amortized runtime

per search. However, it’s not yet proved that the splay trees are dynamically

optimal.

3.1 Splaying Steps

Case 1 (zig). This step will be completed if y is the

root as in the below figure(Section A) by rotating the edge between x and y.

Case 2 (zig-zig). This step will be completed if y is not

at root and x & y are either left or right children by rotating the edge

joining y with its parent z, then rotating edge joining x with y as shown in

below figure(Section B)

Case 3 (zig-zag). This step will be completed if y is not

at root and x is a right child and y is the left child or vice versa by

rotating the edge between y and x and then rotating on the resulting edge

between x and z as shown in below figure(Section C)

3.2 Variants of Splay Trees:

Splaying is the primary operation that can be performed either

during a second or bottom-up pass. This can be done over the access path of a

node. It is possible to record the access path during the first pass for use

during the second, but that requires extra space during the access operation. If

space is expensive, we can obtain the effect of having parent pointers without

using extra space, by storing in each node a pointer to its leftmost child and

a pointer to its right sibling, or to its parent if it has no right sibling as

in below figure. This takes only two pointers per node, the same as the

standard leftChild-rightChild representation, and allows accessing the left

child, right child, or parent of any node in at most two steps. Loss of time

efficiency can be one weakness of this representation.

A bottom-up

splaying method is appropriate if we have direct access to the node at which

the splaying is to occur. In top-down, splaying only occurs

immediately after an access, and it is more efficient to splay on the way down

the tree during the access.

This top-down

splaying routine uses three sets of nodes – left tree, right tree and middle

tree. The first two contain all items of original tree known to be less than or

greater than current item respectively. The middle tree consists of the

sub-tree rooted at the current node. These three sets are updated down the

access path while keeping the splay operations in check.

4. TANGO TREES

Demaine et al 14 proposed tango trees in order to find

dynamic binary search tree which is with the best

offline tree. They were successful to create a BST within a factor. To understand

the idea, there is a need of visualising a reference tree comprises of static

keys. Each non-leaf node has a preferred

child. The preferred child could be its left or right child depending on the

last accessed node, in other words, the most recently searched one. In this

way, we will have a preferred path in shape of chain of preferred childrens shown

in the below figure.

Figure for the Tango

Trees with preferred nodes

The main concept of tango trees has been taken from

auxiliary trees because it works in a way by splitting BST in set of

preferred paths. For each non-leaf node n in a preferred

path P, it has a non-preferred child c, which is the

root of a new auxiliary tree

in terms of preferred paths characterizing nodes of

balanced tree height . The leaves are directly connected with roots

The ingenious idea of tango trees is to represent the

nodes on a preferred path as a balanced tree of called an auxiliary tree.

The tango tree can be seen as a collection of auxiliary

trees linked together. The leaves of an auxiliary tree representing a preferred

path p link to the roots of auxiliary trees representing the paths immediately

below p in P, with the links uniquely determined by the inorder ordering. The

auxiliary tree containing the root of P constitutes the top-part of the tango

tree.

Whenever an access is performed, the preferred paths of P

may change. This change is actually a combination of several cut and

concatenation operations involving subpaths.

As the matter of fact, the auxiliary trees are applied

same as red-black trees17 and demaine

et al 14 presented these two important operations to maintain tango trees.

These cut & concatenation operations are mainly used based on split and

join procedures in red-black tree:

CUT-TANGO – This operation used to divide red-black tree

in to two, where one red-black tree store nodes in preferred paths having d where d is depth.

CONCATENATE-TANGO – This operation used to merge two

red-black trees that store two disjoint paths.

These operations take O(log k) amount of time for trees,

if adding extra information in nodes: Each node also stores the minimum and

maximum depth over the nodes in its subtree within its auxiliary tree. This

additional data is easily maintained in red-black trees. As the trees store

paths, we have . Hence, if an access passes i different preferred paths

in P, the necessary change in the tango tree will be cut and

concatenation operations, which is performed in time. Over any

access sequence, the total number of cut and concatenation operations performed

corresponds to the interleave bound , thus tango trees serve X in time, makes them as in theorem

where “For any access sequence , is a lower bound

on “.

Derryberry and Sleator4 suggested some improvements

that tango trees to have some additional desirable properties such as worst-case time per search, as they improved

to .

5. Offline Optimality

Here the question arises: Is there an offline

algorithm? If yes, Is it possible to compute?

It is possible to compute offline BST algorithm but

the only condition is the sufficient amount of time. The concern is regarding

running time that either its has the potential to figure it out in polynomial

time. To compute is same as

NP-complete. This problem has been addressed by demaine et al 18. He presented this problem by saying that a set

of items must be provided to be searched. The results were shown to be

NP-complete.

5.1 Greedy

If we are talking

about offline solution then we can say that the greedy algorithm or method is

the best possible offline method. The algorithm works in a way that it search

for the current item, it can rearrange the nodes in search tree. The heuristic

place node to be next searched one at the root (if not) then it places to

subtree that contains it or close to root. And the process continues. This has

been introduced by Lucas et al8 in 1988. The points are added one at a time

from bottom to top on every row so that the point set is satisfied from that

row down as in the below figure. The idea is pretty much same as in the

independent rectangle lower bound were it can be computed by sweeping twice and

satisfying the + and – unsatisfied rectangles separately but here greedy method

satisfying both + and – rectangle at the same time through single sweep. In

previous research studies, experts tried to prove that the point sets obtained

through these two methods are within a constant factor of each other would be

enough to show the greedy method is dynamically optimal binary search tree but

still no success yet. The greedy is the offline solution but it looks like

online if we can see in the geometric view. So, there is potential to make it

online if we have online satisfying point set then we can get actual online

binary search tree as Fox et al7 suggested in 2011 where greedy showed

equivalent and said to be the best online has same runtime as in online

algorithm.

6. Conclusion

The main goal in this

survey is to get in-depth knowledge and get existing results of dynamic BSTs. Moreover

to find best known competitive ratio of binary search tree or known to be dynamically optimal. We have

presented splay trees and variants followed by tango trees. We have also discussed

offline solution with respect to dynamic optimality.