Roadmap

OverclockDB is being developed in phases, starting with core functionality and progressively adding advanced features. Track our progress and see what's coming next.

Phase 1: Foundation + Faceting (MVP)

Complete

Working database with search, faceting, and in-memory storage

Project setup with Cargo and configuration system
Schema and field types (string, int, float, bool, array)
Unicode-aware tokenizer with inverted index
Roaring bitmap facet index
Numeric index for range queries (BTreeMap-based)
Query parsing, term matching, BM25 scoring
Filter execution with bitmaps
REST API with Axum (CRUD, search, health checks)
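
As an illustration of the BM25 scoring item above, here is a minimal sketch of the per-term score. The `Bm25` struct, the `k1 = 1.2` and `b = 0.75` defaults, and the smoothed IDF variant are assumptions for illustration; the roadmap does not state which variant or parameters OverclockDB uses.

```rust
/// BM25 tuning parameters. 1.2 and 0.75 are common defaults, assumed here.
pub struct Bm25 {
    pub k1: f64,
    pub b: f64,
}

impl Bm25 {
    /// Smoothed inverse document frequency; always positive.
    pub fn idf(&self, total_docs: u64, docs_with_term: u64) -> f64 {
        let n = total_docs as f64;
        let df = docs_with_term as f64;
        (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
    }

    /// Score contribution of one query term for one document.
    /// `tf` is the term frequency in the document; `doc_len` and
    /// `avg_doc_len` are measured in tokens.
    pub fn term_score(&self, tf: f64, doc_len: f64, avg_doc_len: f64, idf: f64) -> f64 {
        // Length normalization: longer-than-average documents are penalized.
        let norm = 1.0 - self.b + self.b * doc_len / avg_doc_len;
        idf * tf * (self.k1 + 1.0) / (tf + self.k1 * norm)
    }
}
```

A document's score for a multi-term query is the sum of `term_score` over the query terms it contains.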

Phase 2: Persistence & Durability

Complete

Data survives restarts with fast recovery

Write-Ahead Log (WAL) with CRC32 checksums
Configurable sync policy (Immediate, EveryN, OsManaged)
WAL segment rotation (64MB segments)
Full snapshot serialization (bincode + LZ4 compression)
Background-safe snapshot creation
Recovery manager for coordinating persistence
Graceful shutdown with final snapshot
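
To show how a CRC32 checksum can protect WAL records, here is a small sketch using only the standard library. The record layout (`[length][crc][payload]`, little-endian) and the function names are hypothetical; the actual on-disk format is not described in this roadmap.

```rust
/// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical record layout: [len: u32 LE][crc: u32 LE][payload].
pub fn encode_record(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&crc32(payload).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Returns the payload only if the record is complete and its checksum matches.
pub fn decode_record(buf: &[u8]) -> Option<&[u8]> {
    if buf.len() < 8 {
        return None;
    }
    let len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let expected = u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]);
    let end = 8usize.checked_add(len)?;
    let payload = buf.get(8..end)?; // None for a truncated (torn) write
    if crc32(payload) == expected {
        Some(payload)
    } else {
        None
    }
}
```

During recovery, a reader built this way can stop replaying at the first record that fails to decode, which is how a torn final write is typically detected.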

Phase 3: Advanced Features

Complete

Production-grade search features

Context-aware overlays for customer-specific pricing
Collection Merge: join separate collections at query time (similar to a SQL JOIN)
Merge-enabled fields with merge: true for optimized indexes
Multi-field sorting (price:asc,rating:desc)
Enhanced filtering (equality, range, AND combinations)
Per-context sorted BTreeMap index with O(limit) iteration
Parallel merge queries across collections with rayon
MergeStats (min, max, count) for query planning
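
The multi-field sort syntax (`price:asc,rating:desc`) could be parsed and applied roughly as follows. This is a sketch, not the server's parser: the `SortKey` and `Doc` types, treating a bare field name as ascending, and sorting documents that lack a field last are all assumptions.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Direction {
    Asc,
    Desc,
}

#[derive(Debug, Clone, PartialEq)]
pub struct SortKey {
    pub field: String,
    pub direction: Direction,
}

/// Hypothetical document shape: numeric fields only, keyed by name.
pub type Doc = HashMap<String, f64>;

/// Parses a spec such as "price:asc,rating:desc".
pub fn parse_sort(spec: &str) -> Result<Vec<SortKey>, String> {
    spec.split(',')
        .map(str::trim)
        .filter(|part| !part.is_empty())
        .map(|part| {
            // Assumption: a field without a direction sorts ascending.
            let (field, dir) = part.split_once(':').unwrap_or((part, "asc"));
            let direction = match dir.trim() {
                "asc" => Direction::Asc,
                "desc" => Direction::Desc,
                other => return Err(format!("unknown sort direction: {}", other)),
            };
            Ok(SortKey { field: field.trim().to_string(), direction })
        })
        .collect()
}

/// Compares two documents key by key; later keys only break ties.
pub fn compare(a: &Doc, b: &Doc, keys: &[SortKey]) -> Ordering {
    for key in keys {
        let ord = match (a.get(&key.field), b.get(&key.field)) {
            (Some(x), Some(y)) => {
                let ord = x.total_cmp(y);
                if key.direction == Direction::Desc { ord.reverse() } else { ord }
            }
            // Assumption: documents missing the field sort last.
            (Some(_), None) => Ordering::Less,
            (None, Some(_)) => Ordering::Greater,
            (None, None) => Ordering::Equal,
        };
        if ord != Ordering::Equal {
            return ord;
        }
    }
    Ordering::Equal
}
```

With the parsed keys, `docs.sort_by(|a, b| compare(a, b, &keys))` orders results by price first and uses rating only to break ties.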

Phase 4: Search Quality

Complete

Typo tolerance, stemming, and enhanced relevance

Stemming support for 19 languages (Snowball stemmers, plus BulStem for Bulgarian)
Stop words filtering (19 languages)
Typo tolerance with SymSpell algorithm (edit distance 1-2)
Semantic/vector search with HNSW index
Hybrid BM25+vector ranking
Hierarchical categories with automatic ancestor indexing
Internationalization (i18n) for facet labels
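
The typo-tolerance item relies on SymSpell's central idea: index every dictionary term under all of its character deletions, then look up the deletions of the query. The sketch below illustrates only that candidate-generation step; `DeleteIndex` and its methods are hypothetical names, and a full implementation would also verify each candidate with a true edit-distance check.

```rust
use std::collections::{HashMap, HashSet};

/// All strings obtained from `word` by deleting up to `max_distance`
/// characters, including `word` itself.
pub fn deletes(word: &str, max_distance: usize) -> HashSet<String> {
    let mut seen = HashSet::new();
    seen.insert(word.to_string());
    let mut frontier = vec![word.to_string()];
    for _ in 0..max_distance {
        let mut next = Vec::new();
        for w in &frontier {
            let chars: Vec<char> = w.chars().collect();
            for skip in 0..chars.len() {
                let candidate: String = chars
                    .iter()
                    .enumerate()
                    .filter(|(i, _)| *i != skip)
                    .map(|(_, c)| *c)
                    .collect();
                if seen.insert(candidate.clone()) {
                    next.push(candidate);
                }
            }
        }
        frontier = next;
    }
    seen
}

/// Maps each delete variant to the dictionary terms that produce it.
#[derive(Default)]
pub struct DeleteIndex {
    by_delete: HashMap<String, HashSet<String>>,
}

impl DeleteIndex {
    pub fn add(&mut self, term: &str, max_distance: usize) {
        for d in deletes(term, max_distance) {
            self.by_delete.entry(d).or_default().insert(term.to_string());
        }
    }

    /// Terms sharing at least one delete variant with the query.
    /// Candidates are unverified: no final edit-distance filter is applied.
    pub fn candidates(&self, query: &str, max_distance: usize) -> HashSet<String> {
        deletes(query, max_distance)
            .iter()
            .filter_map(|d| self.by_delete.get(d))
            .flatten()
            .cloned()
            .collect()
    }
}
```

Because only deletions are generated, insertions, substitutions, and transpositions in the query are all caught by set intersection rather than by enumerating every possible edit.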

Phase 5: Performance & Scale

In Progress

Handle 10M+ documents efficiently

Hash-based document sharding (complete)
Parallel search across shards with rayon (complete)
K-way heap merge for result aggregation (complete)
Global BM25 statistics with lazy refresh (complete)
Sharding persistence and recovery (complete)
Batched write operations (planned)
Background index maintenance (planned)
Memory-mapped index files (planned)
Memory budget management (planned)
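
The completed sharding items combine two techniques: routing each document to a shard by hash, and merging per-shard result lists with a heap. The following is a standard-library sketch of both, with hypothetical names (`shard_for`, `Hit`, `merge_top_k`). It assumes each shard returns hits already sorted by descending score, and it leaves out the rayon-based parallel search step.

```rust
use std::cmp::Ordering;
use std::collections::hash_map::DefaultHasher;
use std::collections::BinaryHeap;
use std::hash::{Hash, Hasher};

/// Picks a shard for a document ID. `DefaultHasher` is used for brevity;
/// its output is not guaranteed stable across Rust releases, so persisted
/// shards would need a fixed hash function instead.
pub fn shard_for(doc_id: &str, num_shards: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    doc_id.hash(&mut hasher);
    (hasher.finish() % num_shards as u64) as usize
}

#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub doc_id: u32,
    pub score: f32,
}

/// Heap entry remembering which shard list the hit came from.
struct HeapEntry {
    hit: Hit,
    shard: usize,
    pos: usize,
}

impl Ord for HeapEntry {
    fn cmp(&self, other: &Self) -> Ordering {
        // Highest score first; the lower doc_id wins ties.
        self.hit
            .score
            .total_cmp(&other.hit.score)
            .then_with(|| other.hit.doc_id.cmp(&self.hit.doc_id))
    }
}
impl PartialOrd for HeapEntry {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for HeapEntry {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for HeapEntry {}

/// K-way merge: the heap holds at most one pending hit per shard.
pub fn merge_top_k(shards: &[Vec<Hit>], k: usize) -> Vec<Hit> {
    let mut heap = BinaryHeap::new();
    for (shard, hits) in shards.iter().enumerate() {
        if let Some(first) = hits.first() {
            heap.push(HeapEntry { hit: first.clone(), shard, pos: 0 });
        }
    }
    let mut out = Vec::new();
    while out.len() < k {
        let top = match heap.pop() {
            Some(entry) => entry,
            None => break,
        };
        // Refill from the same shard the popped hit came from.
        if let Some(next) = shards[top.shard].get(top.pos + 1) {
            heap.push(HeapEntry { hit: next.clone(), shard: top.shard, pos: top.pos + 1 });
        }
        out.push(top.hit);
    }
    out
}
```

Merging this way costs O(k log s) for s shards, instead of concatenating and re-sorting every shard's full result list.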

Phase 6: Production Readiness

Planned

Production-deployable system

API key authentication
Rate limiting
Prometheus metrics endpoint
Request tracing with correlation IDs
Docker container & docker-compose
Configuration hot-reload
Admin API for operations
Integration test suite

Future: Distributed System

Future

Distributed architecture for horizontal scaling

Raft consensus for leader election
Data replication across nodes
Distributed search coordination
Automatic failover
Cluster membership management
Horizontal scaling

Performance Targets

Metric	Current	Target	Status
Single-term search	<5ms	<5ms P99	Achieved
Multi-term search	<10ms	<20ms P99	Achieved
Faceted search	<15ms	<30ms P99	Achieved
Merge search (50K docs)	~6ms	<10ms P99	Achieved
Indexed merge sort	170x-11,700x faster	O(offset+limit)	Achieved
Sharded search (100K, 4 shards)	386 µs	<1ms P99	Achieved
Sharded speedup	3x	2-4x	Achieved
Indexing throughput	~50K/s	>10K docs/sec	Achieved

Changelog

v0.9.0 (Current)

Collection Merge: Join separate collections at query time (like SQL JOIN)
New merge: true field option for merge-enabled indexes
New merge search parameter (supersedes the overlay API, which remains supported)
Parallel merge lookups across collections using rayon
Flat response format: all fields at root level (no doc wrapper)
Indexed sorting: O(offset + limit) for sorted merge queries
MergeIndex with sorted BTreeMap per field for efficient iteration
170x-11,700x faster indexed vs manual sort (benchmark verified)
Backward compatible: old overlay API still supported
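
To illustrate why a sorted index gives O(offset + limit) paging, here is a sketch that keeps `(value, doc_id)` pairs in an ordered set and walks only the entries it needs. `SortedFieldIndex` is a hypothetical stand-in, not the actual `MergeIndex`. Values are stored as integers (for example, prices in cents) because `f64` does not implement `Ord`; that choice is an assumption.

```rust
use std::collections::BTreeSet;

/// Entries are kept ordered by (value, doc_id), so no sort is needed at query time.
#[derive(Default)]
pub struct SortedFieldIndex {
    entries: BTreeSet<(i64, u32)>,
}

impl SortedFieldIndex {
    pub fn insert(&mut self, value: i64, doc_id: u32) {
        self.entries.insert((value, doc_id));
    }

    /// Returns one page of doc IDs. The iterator visits only
    /// `offset + limit` entries, regardless of index size.
    pub fn page(&self, offset: usize, limit: usize, descending: bool) -> Vec<u32> {
        let ids = self.entries.iter().map(|&(_, id)| id);
        if descending {
            ids.rev().skip(offset).take(limit).collect()
        } else {
            ids.skip(offset).take(limit).collect()
        }
    }
}
```

By contrast, sorting at query time touches every matching document, so the cost grows with the size of the result set rather than with the page requested.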

v0.8.0

Per-Collection Overlays: Overlay indexes now isolated per target collection
Naming pattern: overlays_{collection} (e.g., overlays_products)
Prevents ID collisions between different collections
Optimized memory usage: each collection loads only its own overlays
Memory optimizations: 65% RAM reduction (8.5GB → 3GB for 1M docs)
WAL cleanup: Automatic deletion of old WAL segments after snapshots

v0.7.0

Multi-Language Support: Expanded stemming and stop words to 19 languages
Added: Arabic, Bulgarian, Danish, Dutch, Finnish, French, German, Greek, Hungarian, Italian, Norwegian, Portuguese, Romanian, Spanish, Swedish, Tamil, Turkish
Bulgarian uses custom BulStem algorithm with ~128,000 stemming rules (lazy-loaded)
All other languages use Snowball stemmers from rust-stemmers
Stop word lists sourced from Alir3z4/stop-words (CC-BY-4.0)

v0.6.0

Internationalization (i18n): Translated facet labels for multilingual UIs
Translation Registry for per-collection, per-field translations
REST API: PUT/GET /api/v1/collections/:name/translations
Search parameter: language for requesting translated labels
Fallback chain: requested language → English → raw value
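
The fallback chain (requested language → English → raw value) can be expressed as a short lookup. This sketch uses a hypothetical `Translations` type for a single field and assumes `"en"` is the language code for English; the real Translation Registry's structure is not shown in this changelog.

```rust
use std::collections::HashMap;

/// Labels for one facet field: language code -> (raw value -> label).
#[derive(Default)]
pub struct Translations {
    by_language: HashMap<String, HashMap<String, String>>,
}

impl Translations {
    pub fn set(&mut self, language: &str, raw: &str, label: &str) {
        self.by_language
            .entry(language.to_string())
            .or_default()
            .insert(raw.to_string(), label.to_string());
    }

    /// Tries the requested language, then English, then returns the raw value.
    pub fn label<'a>(&'a self, language: &str, raw: &'a str) -> &'a str {
        [language, "en"]
            .iter()
            .find_map(|lang| self.by_language.get(*lang)?.get(raw))
            .map(String::as_str)
            .unwrap_or(raw)
    }
}
```

Returning the raw value as the last resort means a facet with no translations still renders, just untranslated.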

v0.5.0

Hierarchical Categories: Native tree-structured category support
New hierarchy field type with automatic ancestor indexing
Filter syntax: category:^Electronics (includes all descendants)
Sharding Persistence: Sharded collections survive restarts
Sorted Overlay Index: Per-context BTreeMap for efficient sorted queries
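
Automatic ancestor indexing is what makes a descendant filter such as `category:^Electronics` a single lookup: each document is indexed under its own category and under every ancestor. The sketch below assumes `/` as the path separator and uses hypothetical names (`ancestor_paths`, `HierarchyIndex`); the changelog specifies neither.

```rust
use std::collections::{BTreeSet, HashMap};

/// Expands "Electronics/Phones/Android" into
/// ["Electronics", "Electronics/Phones", "Electronics/Phones/Android"].
pub fn ancestor_paths(path: &str, sep: char) -> Vec<String> {
    let mut out = Vec::new();
    let mut current = String::new();
    for segment in path.split(sep).map(str::trim).filter(|s| !s.is_empty()) {
        if !current.is_empty() {
            current.push(sep);
        }
        current.push_str(segment);
        out.push(current.clone());
    }
    out
}

/// Posting lists keyed by category path, including ancestors.
#[derive(Default)]
pub struct HierarchyIndex {
    postings: HashMap<String, BTreeSet<u32>>,
}

impl HierarchyIndex {
    pub fn insert(&mut self, doc_id: u32, path: &str) {
        // Assumption: '/' separates levels in the stored path.
        for ancestor in ancestor_paths(path, '/') {
            self.postings.entry(ancestor).or_default().insert(doc_id);
        }
    }

    /// Documents in `node` or any of its descendants, in ascending ID order.
    pub fn descendants_of(&self, node: &str) -> Vec<u32> {
        self.postings
            .get(node)
            .map(|ids| ids.iter().copied().collect())
            .unwrap_or_default()
    }
}
```

The trade-off is extra index entries per document (one per tree level) in exchange for not walking the category tree at query time.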

v0.4.0

Sharding: Hash-based document sharding for parallel search
Configurable num_shards per collection
Parallel search using rayon thread pool
K-way heap merge for efficient result aggregation
2-3x search speedup at 100K+ documents

v0.3.0

Semantic/vector search with HNSW index
Hybrid BM25+vector ranking with configurable alpha
Collection-level vector embedding configuration
Individual text_score and vector_score in search results
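
A configurable alpha usually means a linear blend of the two scores. The sketch below assumes that alpha weights the vector score (`alpha = 1.0` is pure vector, `alpha = 0.0` is pure BM25) and that both scores are min-max normalized to [0, 1] first. Neither the direction of alpha nor the normalization is stated in the changelog, and the function names are hypothetical.

```rust
/// Rescales scores to [0, 1]. BM25 scores are unbounded, so some
/// normalization is needed before blending with similarity scores.
pub fn min_max_normalize(scores: &[f32]) -> Vec<f32> {
    let min = scores.iter().copied().fold(f32::INFINITY, f32::min);
    let max = scores.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let range = max - min;
    scores
        .iter()
        .map(|s| if range > 0.0 { (s - min) / range } else { 0.0 })
        .collect()
}

/// Linear blend of normalized scores.
/// Assumption: alpha is the weight given to the vector score.
pub fn hybrid_score(text_score: f32, vector_score: f32, alpha: f32) -> f32 {
    let alpha = alpha.clamp(0.0, 1.0);
    alpha * vector_score + (1.0 - alpha) * text_score
}
```

Returning `text_score` and `vector_score` alongside the blended score, as this release does, makes it possible to tune alpha by inspecting how each component ranked a result.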

v0.2.0

Stemming support (English, Russian via Snowball)
Stop words filtering
Typo tolerance with SymSpell algorithm

v0.1.0

Initial release with core functionality
Full-text search with BM25 scoring
Faceted filtering with roaring bitmaps
Numeric indexes for range queries
Context-aware numeric overlays
WAL + snapshot persistence
REST API with Axum