A B tree is a type of self-balancing search tree that generalizes the self-balancing binary search tree to an arbitrary amount of nodes. It maintains order no matter the insertion, and ensures operations:

  • search
  • sequential access
  • insertion
  • deletion In logarithmic time.

Purpose

If a BST is stored on disk, because the underlying data structure of a tree is non-contiguous, at every step of going down a tree, you would need to fetch from disk again. This causes a lot of disk latency, as fetching from a disk can be very slow. A B tree solves this by fetching in 4kb pages, or whatever the page size is in order to reduce loads from disk. To implement this, B trees will have an arbitrary amount of key-value pairs and children, aligning to whatever the page size is.

Definition

Where m is the order of the tree, h is the height of the tree, t is the degree of the tree n the amount of nodes

A B-Tree of order m should satisfy the following properties:

  • Every node has at most m children
  • Every node has at least ciel(m/2) children
  • The root node has at least two children unless it is a leaf node
  • All leaves appear at the same level
  • A non-leaf node with k children contains k -1 keys

A B tree is a special type of tree that defines an m-ary tree where each node has up to m children. The nodes are still fully ordered, and the node itself usually has an additional pointer, to the parent, to a left sibling and to a right sibling. Ideally, this would maximise the size of a B tree node to a fill a disk block. In practice, branching factors over a thousand are commonly used.

Properties

  • Internal node has a number of keys = number of children - 1
  • all leaves are at the same depth
  • all nodes hold no more than m - 1 keys
  • all non=root internal nodes have between m/2 and m children
  • root can be a leaf
  • In an m-ary tree the number of minimum keys a non-root node can have is m/2 - 1

Worst height:

Degree/Order relation:

Number of nodes minimum is

Number of keys minimum:

Maximum number of keys in a B-Tree of order m and height h

Operations

Insertion

Like in a binary search tree, we first perform a search and find the point at which the previous value is smaller than the current value and the next is larger. If the current node has space, insert.

If the node doesn’t have space, then split the node and send the median value upwards. This is in order to satisfy the property of B trees that every leaf has to be at the same level.

Removal

Removal is a bit more complex as it contains more edge cases.

In the best case, removal doesn’t impact any of the key properties of the b tree.

In the medium case of an underflow, where if you have a node that would be underflowing if you remove it, then you can borrow from a parent or sibling to replace that value instead.

The hard case is if you need to remove a node but there is no node to borrow from children/siblings. In this case, we effectively do the reverse of an overflow split, where we merge and hoist the values upwards.