A B tree is a type of self-balancing search tree that generalizes the self-balancing binary search tree to an arbitrary -arity search tree. This way, the B-tree effectively has many potential key-value pairs per node, and reduces the height of the tree. This ensures that all standard operations:
- search
- sequential access
- insertion
- deletion are in logarithmic time.
Fun Fact
The B in B trees doesn’t have an exact meaning. It’s a combination of the the name of the guy who created it (Bayer), the company he and his colleagues worked for (Boeing) and the B of balanced. When asked the, creators don’t specify which one it is.
Purpose
If a BST is stored on disk, because the underlying data structure of a tree is non-contiguous, at every step of going down a tree, you would need to fetch from disk again. This causes a lot of disk latency, as fetching from a disk can be very slow. A B tree solves this by fetching in 4kb pages, or whatever the page size is in order to reduce loads from disk. To implement this, B trees will have an arbitrary amount of key-value pairs and children, aligning to whatever the page size is.
Extra info
B trees (technically B+ trees) are very commonly used in databases as a way to store entries in a table.
Definition
Extra Info
There are two conventions to define B trees. Order (arity) or degree. They’re closely related as degree is just half the order. The order of a B tree defines the upper bound of children, while the degree defines the lower bound. You can just use one convention and stick with it, and in this class its order, which is by convention written .
Let be the order of the tree, be the height, and the amount of nodes.
A B-Tree of order should satisfy the following properties:
- Non-root nodes have in between and children (inclusive)
- The root node has either 0 or at least 2 children
- All leaves appear at the same level
- A non-leaf node with children contains keys
A B tree is a special type of tree that defines an m-ary tree where each node has up to m children. The nodes are still fully ordered, and the node itself usually has an additional pointer, to the parent, to a left sibling and to a right sibling. Ideally, this would maximise the size of a B tree node to a fill a disk block. In practice, branching factors over a thousand are commonly used.
Properties
A non-root node has a range of number of keys:
Worst height:
Maximum number of keys in a B-Tree based on order and height:
Operations
Insertion
Like in a binary search tree, we first perform a search and find the point at which the previous value is smaller than the current value and the next is larger. If the current node has space, insert.
If the node doesn’t have space, then split the node and send the median value upwards. This is in order to satisfy the property of B trees that every leaf has to be at the same level.
Removal
Non-testable
The removal of B-Trees is non-tesable material, at least when I took the class in W2 2025.
Removal is a bit more complex as it contains more edge cases. In the best case, removal doesn’t impact any of the key properties of the b tree.
In the medium case of an underflow, where if you have a node that would be underflowing if you remove it, then you can borrow from a parent or sibling to replace that value instead.
The hard case is if you need to remove a node but there is no node to borrow from children/siblings. In this case, we effectively do the reverse of an overflow split, where we merge and hoist the values upwards.
Search
Searching a B tree is very similar to searching a BST, except at each node, you perform a binary search within the node’s key array, since it’s a sorted array you can find it that way easily.
The total time complexity for finding a value in a B tree is .