Ropes and AVL trees

Last week I've been working on a new UPDATE command implementation in Tarantool.

Tarantool UPDATE can modify, add and remove individual fields in a tuple.

A typical UPDATE looks like a sequence of operations, e.g. 'set field 1 to "abc", insert value "foo" after field 5, delete field 10, push value "tail" at the end of the tuple'.

The difficulty of an efficient implementation is caused by the tuple format: essentially, it's a blob storing a sequence of length-prefixed strings.

Due to this format, accessing an invividual field in a tuple has linear cost (O(tuple size)). In consequence of this, tuples are never modified in place.

When UPDATE contains many operations, they are all first "compiled", and then a new tuple is created based on the old tuple data and "modification instructions", produced from the operation list.

This all works fine, with one exception: the algorithm is written in such way that all indexes of fields in a tuple are "frozen" at start of UPDATE. E.g. if an UPDATE deletes field 5, and then inserts a new field after field 5, the new field is inserted in place of the deleted one, not at the relative position in the "intermediate" tuple.

At first I personally didn't consider this a serious defect, but then pointed out that this semantics makes it impossible to batch multiple UPDATEs into a single one.

The other flaw of the existing implementation was that, even though it was efficient enough (it took O(K * log(K) ) + O(N) time to do an update, where K is the number of operations, and N is the size of the tuple), it was rather complicated - 300 lines of difficult to read C code.

UPDATE operations were ordered by field number, then put into an intrusive linked list, etc. We needed an axuiliary container to store UPDATE ops, to delegate the semantics of operation ordering.

A rope looked like a viable solution.

Unfortunately, after quite a bit of digging, it turned out that there is no good standalone implementation of ropes. It seems ropes did not go mainstream, are still only employed in a few specific cases or as an advanced topic in a technical interview.

We found that Hans Boehm's cords are tightly coupled with his garbage collector, STL ropes, again, are quite bloated and integrated with STL allocators too tightly. Besides, both implementations use a very simple tree balancing procedure, which doesn't produce well balanced trees and doesn't care about memory consumption very much.

Popular edge cases are not optimized either: for example, deletion of a substring from a rope is typically done by means of two tree splits and one merge (cut the tail, cut the tail of the tail, throw away the middle, merge the beginning and the end), whereas in Tarantool one doesn't delete ranges of fields, and deletion of an individual field can easily be done by a usual binary tree deletion.

Tree balancing was not perfect either: implemented the same way as in Fibonacci heap, which doesn't produce a very balanced tree.

Tarantool needed something with low enough overhead to justify use of a complex data structure even when the number of operations in an UPDATE is small (typically an UPDATE in Tarantool contains less than half a dozen of ops), but scalable to large commands.

Thus, an idea to use AVL tree balancing algorithm emerged, and all the time I had for coding last week I devoted to writing a rope with AVL balancing procedures. I chose AVL, as opposed to, say, RB or Andersson trees since I knew it from the University.

An intermediate result you can see here. I borrowed a lot of ideas from an excellent AVL tree tutorial at http://eternallyconfuzzled.com.

The main idea is that all nodes are used to store substrings, not just leaf nodes, and typical operations, such as tree insert or delete, are done not by means of tree split and merge, typical for ropes, but by means of insertion and deletion from AVL tree.

AVL balance algorithm had to be modified, since insertion, when dealing with a rope, can insert not one node but two.

I'm still not sure if writing such a basic data structure almost from scratch was a good idea, or if it was worth the time spent. The good news is that the random tree generator I wrote at the dawn of Friday found no tree consistency bugs yet.

If someone, who knows binary trees and computer science better than me (I'm a nub, being in database technology means being a jack of all trades), is reading this blog post, I'd be grateful to all comments pointing out anything I might have missed.