Learning Binary Search Tree

A Binary Search Tree (BST) stores keys in sorted order so you can search, insert, and delete efficiently—typically in O(log n) time—if the tree stays balanced. It’s great when you need ordered data + fast lookups, but you must guard against becoming skewed (which degrades to O(n)).

What is a Binary Search Tree?

What is a Binary Search Tree?

A Binary Search Tree (BST) is a binary tree where each node holds a key (and usually a value) and satisfies the BST property:

  • All keys in the left subtree are less than the node’s key.
  • All keys in the right subtree are greater than the node’s key.
  • This property holds recursively for every node.

Because the keys are kept in sorted order, BSTs support ordered operations such as range queries and in-order traversal (which yields sorted output).

When Do We Need a BST?

Use a BST when you need both:

  1. Fast lookups/updates (ideally ~O(log n)), and
  2. Ordering-aware queries, like:
    • “Give me all keys between A and M.”
    • “What’s the next larger (successor) or next smaller (predecessor) key?”
    • “Iterate results in sorted order.”

Common cases:

  • In-memory indexes (e.g., sorted maps/dictionaries).
  • Implementing priority features with order-aware operations (though heaps are better for pure min/max).
  • Autocomplete / prefix features (often with tries for strings, but BSTs work when comparing whole keys).
  • Scheduling or event timelines where you frequently need “next event after time T.”

If you only need existence lookups without ordering, a hash table might be simpler and faster on average.

Real-World Example

E-commerce product catalog: Keep products keyed by productId or price.

  • Search a product quickly by ID.
  • List products in a price range (e.g., $25–$50) with an in-order traversal constrained to that range.
  • Find the next higher-priced product (successor) for upsell suggestions.

Balanced BSTs (like AVL or Red-Black Trees) power the ordered collections in many standard libraries (e.g., TreeMap in Java, std::map in C++).

Main Operations

  • Search(key): Compare key at node, go left if smaller, right if larger, stop when found or null.
  • Insert(key, value): Standard BST insert followed by optional rebalancing (in self-balancing variants).
  • Delete(key): Three cases
    1. Node has no child → remove it.
    2. Node has one child → replace node with its child.
    3. Node has two children → replace with in-order successor (smallest in right subtree) or predecessor, then delete that successor/predecessor node.
  • Traversal:
    • In-order → sorted order.
    • Pre-order/Post-order → useful for copying/deletion tasks.
  • Min/Max: Follow left/right pointers to extremes.
  • Successor/Predecessor: Use right/left subtree, or walk up via parent pointers if available.

Time & Space Complexity

OperationAverage (Balanced)Worst Case (Skewed)
SearchO(log n)O(n)
InsertO(log n)O(n)
DeleteO(log n)O(n)
Find min/max, successor/predecessorO(log n)O(n)
Space (n nodes)O(n)O(n)

In practice, self-balancing BSTs (AVL, Red-Black) keep height ≈ O(log n), ensuring predictable performance.

Advantages

  • Maintains order: Easy to output or traverse in sorted order.
  • Efficient range queries: Retrieve keys within [L, R] without scanning everything.
  • Deterministic memory: Pointer-based structure with O(n) space.
  • Extensible: Augment nodes with extra data (e.g., subtree size for order statistics, sums for range aggregates).

Disadvantages

  • Can degrade to O(n) if unbalanced (e.g., inserting already-sorted keys into a naive BST).
  • More complex deletions compared to hash tables or arrays.
  • Higher constant factors than hash tables for pure key→value lookups (when ordering isn’t needed).
  • Implementation complexity increases with self-balancing (AVL/Red-Black rotations, color/height tracking).

BST vs. Alternatives (Quick Compare)

  • BST vs. Hash Table
    • BST: Ordered, range queries, successor/predecessor → Yes.
    • Hash: Average O(1) lookups, no order → great for pure key lookups.
  • BST vs. Array
    • BST: Fast inserts/deletes (O(log n) balanced) and maintains order.
    • Sorted array: Fast binary search (O(log n)), but inserts/deletes O(n).
  • BST vs. Heap
    • BST: Get any key, do range queries, get successor/predecessor.
    • Heap: Fastest min/max access, but no ordered iteration.

Practical Tips & Pitfalls

  • Prefer self-balancing variants (e.g., AVL for tighter balance, Red-Black for simpler updates) in production.
  • To avoid skew, shuffle inputs or insert in mixed order if you must use a basic BST.
  • For heavy range queries, consider augmented BSTs (store subtree counts/sums) or B-trees for disk-based indices.
  • Use in-order traversal to stream results in sorted order without extra sorting cost.

Summary

A Binary Search Tree is a powerful, order-preserving data structure. Choose it when you need fast queries and ordering semantics—especially with self-balancing variants to maintain consistent O(log n) operations. If you don’t need order, a hash table is often simpler and faster; if you need just min/max, a heap may suffice.