Amol Budhiraja

SQL Query

Query Processing

Parser

→

Logical Plan

→

Cost-Based Optimizer

DP over join orderings

Execution Engine

Nested Loop Join

→

Sort-Merge Join

→

Hash Join

→

B+ Tree Scan

Concurrency + Storage

Lock Manager

IS·IX·S·SIX·X

→

Buffer Pool

LRU eviction

→

Deadlock Detection

waits-for graph

↓

WAL Logging

→

ARIES Recovery

analysis · redo · undo

Full relational database — from SQL to disk

RookieDB is the database project from Berkeley's CS 186. You start with a basic query execution engine and progressively add the features that make a real database system: indexes, joins, transactions, locking, and query optimization.

What I implemented

B+ Tree indexes: Full B+ tree implementation with support for bulk loading, range scans, and iterator-based traversal. Getting the split/merge logic right with proper key redistribution was the first major debugging marathon of the semester.

Join algorithms: Implemented multiple join strategies — nested loop join, sort-merge join, and hash join — along with the query planner that selects between them based on estimated I/O cost.

Query optimization: A cost-based optimizer using dynamic programming over join orderings. The optimizer estimates selectivity for predicates and chooses the plan with the lowest estimated page I/Os.

Concurrency control: Multi-granularity locking with lock escalation. Implemented the full lock type hierarchy (IS, IX, S, SIX, X) and a deadlock detection algorithm using a waits-for graph.

Recovery: ARIES-based logging and recovery. Write-ahead logging, checkpointing, and crash recovery via analysis, redo, and undo phases.

What I learned

Database systems are where theory meets brutal engineering reality. Serializability is a beautiful concept until you have to implement it with page-level locks and track 200 concurrent transactions. The course taught me to think about correctness and performance simultaneously — you can't optimize your way out of a correctness bug, and you can't correctness your way out of a 10x performance regression.