B+HTree, now powering IntelliJ IDEA 11 indices

Dec 23, 2011 17:30


Last summer I implemented mine own flavor of B+Tree aka B+HTree, H for Hash. Usually O characteristics for B trees are calculated on height of the tree, however insertion of the data to leaf block can be costly. E.g. if key / value pairs are stored as array, one needs to shift at least half of the pairs at insertion. To avoid the problem, the data ( Read more... )

b+htree

Leave a comment

Comments 6

elizarov December 24 2011, 09:30:15 UTC
Can you, please, elaborate what exactly makes it more compact vs hash only. Naively thinking, hash-only solution should occupy as much space as all your leaf nodes.

Reply

nicity December 25 2011, 20:01:41 UTC
In short:B+HTree is more compact comparing with our previous hash based solution ( ... )

Reply

elizarov December 26 2011, 20:28:59 UTC
Thank you for description of your original hash based solution. It explains a lot.

Btw, how do you avoid costly mapping/unmapping with B+HTree?

Reply

nicity December 27 2011, 00:02:42 UTC
Due to index compactness one needs 2 times less byte buffer segments and this means one can open e.g. IntelliJ Idea code base (20000 files) without remapping byte buffers at all. Once number of keys in index becomes very large (e.g. when encountering 5M of unique source code names, this means 5M keys for source code index) we will still start to get indexing performance deterioration due to limited set of byte buffers available (200M on 32bit jdk).

Reply


xeroxman23 July 25 2012, 20:41:27 UTC
Hi, is there any chance to see C/C++ plugin open-sourced? So people like me could hack on it and fix annoying issues the plugin has.

Reply

nicity July 29 2012, 21:20:04 UTC
Plugin is facade over external C++ executable, so plugin source code is not of very much use.

Reply


Leave a comment

Up