Comments | nicity: B+HTree, now powering IntelliJ IDEA 11 indices

nicity

B+HTree, now powering IntelliJ IDEA 11 indices

Dec 23, 2011 17:30

Last summer I implemented mine own flavor of B+Tree aka B+HTree, H for Hash. Usually O characteristics for B trees are calculated on height of the tree, however insertion of the data to leaf block can be costly. E.g. if key / value pairs are stored as array, one needs to shift at least half of the pairs at insertion. To avoid the problem, the data ( Read more... )

b+htree

Comments 6

elizarov December 24 2011, 09:30:15 UTC

Can you, please, elaborate what exactly makes it more compact vs hash only. Naively thinking, hash-only solution should occupy as much space as all your leaf nodes.

nicity December 25 2011, 20:01:41 UTC

In short:B+HTree is more compact comparing with our previous hash based solution ( ... )

elizarov December 26 2011, 20:28:59 UTC

Thank you for description of your original hash based solution. It explains a lot.

Btw, how do you avoid costly mapping/unmapping with B+HTree?

nicity December 27 2011, 00:02:42 UTC

Due to index compactness one needs 2 times less byte buffer segments and this means one can open e.g. IntelliJ Idea code base (20000 files) without remapping byte buffers at all. Once number of keys in index becomes very large (e.g. when encountering 5M of unique source code names, this means 5M keys for source code index) we will still start to get indexing performance deterioration due to limited set of byte buffers available (200M on 32bit jdk).

xeroxman23 July 25 2012, 20:41:27 UTC

Hi, is there any chance to see C/C++ plugin open-sourced? So people like me could hack on it and fix annoying issues the plugin has.

nicity July 29 2012, 21:20:04 UTC

Plugin is facade over external C++ executable, so plugin source code is not of very much use.