Survey pointer moves to or is initialized

Survey of
the Self-Adjusting Data Structures

Daniyal Khowaja

We Will Write a Custom Essay Specifically
For You For Only $13.90/page!


order now

 

December 21, 2017

 

 

Abstract

Over the past
decades, various types of optimality founded based on real-world applications
such as key-independent optimality19. Since dynamic optimality was
introduced, it remains an open area of research. We will be focusing on dynamic
binary search trees BSTs which are said to be dynamically optimal. What is
dynamic optimality? A BST is said to be dynamic optimal if it executes all sequences
 in  amount of time. Although all types
of balancing binary search trees used optimal cost of  where n is the number of elements accessed. We know that online
BST is c-competitive and it usually executes long sequences in at most  of time.  is the best known ratio for dynamic binary
search trees. Splay trees and Tango trees are well-known binary search trees
and conjectured to be dynamically optimal but despite decades of research, the
conjecture still remains an open problem. We have also discussed this problem
in offline manner. Here we survey these dynamic binary search tree BSTs and
presenting each in detail since the dynamic optimality conjecture was first
introduced.

 

1      Introduction

 

Self-adjusting data
structures can rearrange themselves after operations are committed to it. Here
we will talk more about Binary Search Trees BSTs which are more popular and
famous data structure. They are frequently used to maintain an ordered set of
data and perform operations including insertion, deletion and searching.

 

1.1     BST Model

 

To understand the
problem in detail, we have defined binary search tree BST as model of
computation. This model has been developed by Wilber 5 in which we required
data to be stored as static set of keys of BST. The input would be a sequence X
of keys .The algorithm serve at a given access  and has a single pointer to a node. At the
beginning of an access to a given key  this pointer is initialized to
the root and can perform rotations to the left child, right child,
parent or rotation with parent. To execute the searching, it’s mandatory that it
should be touched. Whenever the pointer moves to or is
initialized to a node, we say that the node is touched. The time taken
by a BST to execute a sequence is at least the number of nodes it touches,
which in turn is at least m.

 

2      Previous Work

 

According to Knuth
2 in 1971, when he proved a binary search tree is supposed to be run in
optimal amount of time. He described that whenever there is independent access
at random, a BST is assumed to run in optimum amount of time up to a constant
factor with fixed distribution.  

 

In 1985, Sleator and Tarjan 3 introduced new
concepts including amortized complexity, self-adjustment and came with
competiveness. As a result, he invented Splay-trees which are quite popular in
data structures. He explored properties like static optimality or entropy bound
and upper bounds including working set property and static finger bound which
is the most significant in all other properties. Finally, he conjectured that
splay-trees are dynamically optimal or O(1)-competitive and
proved to be competitive compared to other offline searching binary search
trees. Sleator and Tarjan 3  also observed that it performs well for many access patterns 9. In the same year, Tarjan 10 invented
sequential access bound and after that, Cole11,12 introduced dynamic finger
bound. These upper bounds showed that splay trees executes access sequences in  time where ?(m)
time taken to execute on splay trees. No BST is supposed to be better than  and also its no
better than  against the
offline optimal BST data structure.

 

Demaine et al 14 improved the  which was the best known
competitive ratio as far as the balanced trees are concerned. They were the first to develop
Tango trees and achieved non-trivial competitive factor of . This was the major
achievement but worse than  Tango-trees works better in searching for
alternative BST algorithms that have small but non-constant competitive
factors.

 

After the publication of the tango tree paper, two other
O(log log n)- competitive BSTs have been introduced by Derryberry et al. 4,11
and Georgakopoulos 5. The multi-splay trees 4 are based on tango trees, but
instead of using red-black trees as auxiliary trees, they use splay trees 10.
As a consequence, multi-splay trees can be shown 4,11 to satisfy additional
properties, including the scanning and working-set bounds of splay trees, while
maintaining O(log log n)-competitiveness. Georgakopoulos uses the interleave
lower bound of Demaine et al. to develop a variation of splay trees called
chain-splay trees that achieves O(log log n)-competitiveness while not
maintaining any balance condition explicitly. However, neither of these two
structures achieves a worst-case single access time of O(log n). A data
structure achieving the same running time as tango trees alongside O(log n)
worst-case single access time was developed by Kujala and Elomaa 8, but this
data structure does not adhere to the BST model (in which the lower bounds on
OPT(X) are proved).

 

Being O(log log n)-competitive, tango trees are always at
most a factor O(log log n) worse than OPT(X). On the other hand, they may
actually pay this multiplicative overhead at each access, implying that they
have ?(log log n log n) worst-case access time, and use ?(m log log n log n)
time on some access sequences of length m. In comparison, any balanced BST
(even static) has O(log n) worst-case access time and spends O(m log n) on
every access sequence.

 

Bose et al15
presented zipper trees,
achieving the optimal worst-case access time while maintaining the O(log log
n)-competitiveness property. This shows that for binary search trees, optimal
worst-case access time and near-optimal amortized access time can be guaranteed
simultaneously.

 

George16 discussing whether splay trees or splaylike
trees can be as good as tango trees. He showed that splay-like class of trees
(chain-splay trees) are proved to be  c
log logN-competitive to any off-line algorithm that maintains a binary search
tree with rotations.

 

There are three lower bounds that are very important to
mention here including independent rectangles, Wilber 1
and 2 and signed greedy. The independent rectangles have no corner strictly
inside another as in the below figure.

 

 

Wilber 1 and 2 are two known lower bounds on OPT(X) in
the BST model, both invented by Wilber5. Neither bound is simply stated; both are complex
functions of X, computable by efficient algorithms. In Wilber 2, assuming we have point
p and points below p as downward visible points. We will consider P as a line
and we need to count number of alterations both left and right of p. At last,
we need summation over p. We conjecture that its proportional to optimal
solution. In Wilber 1, we are fixing lower-bound tree p and for each node y of
tree p, we have left-subtree and right-subtree. So we need to count number of
alterations in x1, x2,…, xn between left and right sub-trees of y.

 

In Signed greedy, there
are two types of greedy + and -. In this case, we will only consider plus
greedy and need to avoid – rectangles. Plus greedy always gives independent set
of plus rectangles. Its lower bound and not satisfies the point set.

 

 

3      Splay Trees

 

Splay tree is one of
the self-adjusting BST has the capability to move recently requested closer to
the root. Splay trees are quite efficient when it comes non-random operations.
Splaying is the most common operation used so that the element can be placed at
the root or bring at the top. Splay trees has been introduced by Sleator and
Tarjan 3. They used a simple restructuring heuristic, to move the searched
item to the root. Splay trees have been proven to have a number of
amortized bounds on their performance, including such basic facts as  amortized runtime
per search. However, it’s not yet proved that the splay trees are dynamically
optimal.

 

3.1    Splaying Steps

 

Case 1 (zig). This step will be completed if y is the
root as in the below figure(Section A) by rotating the edge between x and y.

 

Case 2 (zig-zig). This step will be completed if y is not
at root and x & y are either left or right children by rotating the edge
joining y with its parent z, then rotating edge joining x with y as shown in
below figure(Section B)

 

Case 3 (zig-zag). This step will be completed if y is not
at root and x is a right child and y is the left child or vice versa by
rotating the edge between y and x and then rotating on the resulting edge
between x and z as shown in below figure(Section C)

 

 

 

 

3.2   Variants of Splay Trees:

 

Splaying is the primary operation that can be performed either
during a second or bottom-up pass. This can be done over the access path of a
node. It is possible to record the access path during the first pass for use
during the second, but that requires extra space during the access operation. If
space is expensive, we can obtain the effect of having parent pointers without
using extra space, by storing in each node a pointer to its leftmost child and
a pointer to its right sibling, or to its parent if it has no right sibling as
in below figure. This takes only two pointers per node, the same as the
standard leftChild-rightChild representation, and allows accessing the left
child, right child, or parent of any node in at most two steps. Loss of time
efficiency can be one weakness of this representation.

A bottom-up
splaying method is appropriate if we have direct access to the node at which
the splaying is to occur. In top-down, splaying only occurs
immediately after an access, and it is more efficient to splay on the way down
the tree during the access.

 

 

This top-down
splaying routine uses three sets of nodes – left tree, right tree and middle
tree. The first two contain all items of original tree known to be less than or
greater than current item respectively. The middle tree consists of the
sub-tree rooted at the current node. These three sets are updated down the
access path while keeping the splay operations in check.

 

 

4. TANGO TREES

 

Demaine et al 14 proposed tango trees in order to find
dynamic binary search tree which is  with the best
offline tree. They were successful to create a BST within a  factor. To understand
the idea, there is a need of visualising a reference tree comprises of static
keys.  Each non-leaf node has a preferred
child. The preferred child could be its left or right child depending on the
last accessed node, in other words, the most recently searched one. In this
way, we will have a preferred path in shape of chain of preferred childrens shown
in the below figure.

 

Figure for the Tango
Trees with preferred nodes

 

 

The main concept of tango trees has been taken from
auxiliary trees because it works in a way by splitting BST in set of
preferred paths. For each non-leaf node n in a preferred
path P, it has a non-preferred child c, which is the
root of a new auxiliary tree

 

in terms of preferred paths characterizing nodes of
balanced tree height . The leaves are directly connected with roots

 

The ingenious idea of tango trees is to represent the
nodes on a preferred path as a balanced tree of called an auxiliary tree.

 

The tango tree can be seen as a collection of auxiliary
trees linked together. The leaves of an auxiliary tree representing a preferred
path p link to the roots of auxiliary trees representing the paths immediately
below p in P, with the links uniquely determined by the inorder ordering. The
auxiliary tree containing the root of P constitutes the top-part of the tango
tree.

 

Whenever an access is performed, the preferred paths of P
may change. This change is actually a combination of several cut and
concatenation operations involving subpaths.

 

As the matter of fact, the auxiliary trees are applied
same as red-black trees17 and demaine
et al 14 presented these two important operations to maintain tango trees.
These cut & concatenation operations are mainly used based on split and
join procedures in red-black tree:

 

CUT-TANGO – This operation used to divide red-black tree
in to two, where one red-black tree store nodes in preferred paths having d where d is depth.

 

CONCATENATE-TANGO – This operation used to merge two
red-black trees that store two disjoint paths.

 

These operations take O(log k) amount of time for trees,
if adding extra information in nodes: Each node also stores the minimum and
maximum depth over the nodes in its subtree within its auxiliary tree. This
additional data is easily maintained in red-black trees. As the trees store
paths, we have . Hence, if an access passes i different preferred paths
in P, the necessary change in the tango tree will be  cut and
concatenation operations, which is performed in  time. Over any
access sequence, the total number of cut and concatenation operations performed
corresponds to the interleave bound , thus tango trees serve X in  time, makes them  as in theorem
where “For any access sequence ,  is a lower bound
on “.

 

Derryberry and Sleator4 suggested some improvements
that tango trees to have some additional desirable properties such as  worst-case time per search, as they improved
to .

 

5.   Offline Optimality

 

Here the question arises: Is there an offline
algorithm? If yes, Is it possible to compute?

It is possible to compute offline BST algorithm but
the only condition is the sufficient amount of time. The concern is regarding
running time that either its has the potential to figure it out in polynomial
time. To compute  is same as
NP-complete. This problem has been addressed by demaine et al 18. He presented this problem by saying that a set
of items must be provided to be searched. The results were shown to be
NP-complete.

 

 

5.1   Greedy

 

 

If we are talking
about offline solution then we can say that the greedy algorithm or method is
the best possible offline method. The algorithm works in a way that it search
for the current item, it can rearrange the nodes in search tree. The heuristic
place node to be next searched one at the root (if not) then it places to
subtree that contains it or close to root. And the process continues. This has
been introduced by Lucas et al8 in 1988. The points are added one at a time
from bottom to top on every row so that the point set is satisfied from that
row down as in the below figure. The idea is pretty much same as in the
independent rectangle lower bound were it can be computed by sweeping twice and
satisfying the + and – unsatisfied rectangles separately but here greedy method
satisfying both + and – rectangle at the same time through single sweep. In
previous research studies, experts tried to prove that the point sets obtained
through these two methods are within a constant factor of each other would be
enough to show the greedy method is dynamically optimal binary search tree but
still no success yet. The greedy is the offline solution but it looks like
online if we can see in the geometric view. So, there is potential to make it
online if we have online satisfying point set then we can get actual online
binary search tree as Fox et al7 suggested in 2011 where greedy showed
equivalent and said to be the best online has same runtime as in online
algorithm.

 

 

6.     Conclusion

 

The main goal in this
survey is to get in-depth knowledge and get existing results of dynamic BSTs. Moreover
to find best known competitive ratio of binary search tree or  known to be dynamically optimal. We have
presented splay trees and variants followed by tango trees. We have also discussed
offline solution with respect to dynamic optimality.