← Projects

RookieDB

2023

Java-based relational database with concurrency control and query optimization

JavaSystems
SQL Query
Query Processing
Parser
Logical Plan
Cost-Based Optimizer
DP over join orderings
Execution Engine
Nested Loop Join
Sort-Merge Join
Hash Join
B+ Tree Scan
Concurrency + Storage
Lock Manager
IS·IX·S·SIX·X
Buffer Pool
LRU eviction
Deadlock Detection
waits-for graph
WAL Logging
ARIES Recovery
analysis · redo · undo

Full relational database — from SQL to disk

RookieDB is the database project from Berkeley's CS 186. You start with a basic query execution engine and progressively add the features that make a real database system: indexes, joins, transactions, locking, and query optimization.

What I implemented

B+ Tree indexes: Full B+ tree implementation with support for bulk loading, range scans, and iterator-based traversal. Getting the split/merge logic right with proper key redistribution was the first major debugging marathon of the semester.

Join algorithms: Implemented multiple join strategies — nested loop join, sort-merge join, and hash join — along with the query planner that selects between them based on estimated I/O cost.

Query optimization: A cost-based optimizer using dynamic programming over join orderings. The optimizer estimates selectivity for predicates and chooses the plan with the lowest estimated page I/Os.

Concurrency control: Multi-granularity locking with lock escalation. Implemented the full lock type hierarchy (IS, IX, S, SIX, X) and a deadlock detection algorithm using a waits-for graph.

Recovery: ARIES-based logging and recovery. Write-ahead logging, checkpointing, and crash recovery via analysis, redo, and undo phases.

What I learned

Database systems are where theory meets brutal engineering reality. Serializability is a beautiful concept until you have to implement it with page-level locks and track 200 concurrent transactions. The course taught me to think about correctness and performance simultaneously — you can't optimize your way out of a correctness bug, and you can't correctness your way out of a 10x performance regression.