A B tree is a type of self-balancing search tree that generalizes the self-balancing binary search tree to an arbitrary -arity search tree. This way, the B-tree effectively has many potential key-value pairs per node, and reduces the height of the tree. This ensures that all standard operations:

  • search
  • sequential access
  • insertion
  • deletion are in logarithmic time.

Purpose

If a BST is stored on disk, because the underlying data structure of a tree is non-contiguous, at every step of going down a tree, you would need to fetch from disk again. This causes a lot of disk latency, as fetching from a disk can be very slow. A B tree solves this by fetching in 4kb pages, or whatever the page size is in order to reduce loads from disk. To implement this, B trees will have an arbitrary amount of key-value pairs and children, aligning to whatever the page size is.

Definition

Let be the order of the tree, be the height, and the amount of nodes.

A B-Tree of order should satisfy the following properties:

  • Non-root nodes have in between and children (inclusive)
  • The root node has either 0 or at least 2 children
  • All leaves appear at the same level
  • A non-leaf node with children contains keys

A B tree is a special type of tree that defines an m-ary tree where each node has up to m children. The nodes are still fully ordered, and the node itself usually has an additional pointer, to the parent, to a left sibling and to a right sibling. Ideally, this would maximise the size of a B tree node to a fill a disk block. In practice, branching factors over a thousand are commonly used.

Properties

A non-root node has a range of number of keys:

Worst height:

Maximum number of keys in a B-Tree based on order and height:

Operations

Insertion

Like in a binary search tree, we first perform a search and find the point at which the previous value is smaller than the current value and the next is larger. If the current node has space, insert.

If the node doesn’t have space, then split the node and send the median value upwards. This is in order to satisfy the property of B trees that every leaf has to be at the same level.

Removal

Searching a B tree is very similar to searching a BST, except at each node, you perform a binary search within the node’s key array, since it’s a sorted array you can find it that way easily.

The total time complexity for finding a value in a B tree is .