Roadmap
OverclockDB is being developed in phases, starting with core functionality and progressively adding advanced features. Track our progress and see what's coming next.
Phase 1: Foundation + Faceting (MVP)
Working database with search, faceting, and in-memory storage
- Project setup with Cargo and configuration system
- Schema and field types (string, int, float, bool, array)
- Unicode-aware tokenizer with inverted index
- Roaring bitmap facet index
- Numeric index for range queries (BTreeMap-based)
- Query parsing, term matching, BM25 scoring
- Filter execution with bitmaps
- REST API with Axum (CRUD, search, health checks)
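As an illustration of the BTreeMap-based numeric index listed above, here is a minimal sketch using only the standard library. The `NumericIndex` type, its method names, and the use of `i64` values with `u32` document IDs are assumptions made for this example, not OverclockDB's actual types.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical numeric index: maps a field value to the IDs of documents holding it.
#[derive(Default)]
pub struct NumericIndex {
    by_value: BTreeMap<i64, BTreeSet<u32>>,
}

impl NumericIndex {
    /// Records that `doc_id` has `value` for the indexed field.
    pub fn insert(&mut self, doc_id: u32, value: i64) {
        self.by_value.entry(value).or_default().insert(doc_id);
    }

    /// Returns the IDs of all documents whose value lies in `min..=max`,
    /// in ascending ID order. An inverted range yields no results.
    pub fn range(&self, min: i64, max: i64) -> Vec<u32> {
        if min > max {
            // BTreeMap::range panics when start > end, so guard explicitly.
            return Vec::new();
        }
        let mut ids = BTreeSet::new();
        for (_, docs) in self.by_value.range(min..=max) {
            ids.extend(docs.iter().copied());
        }
        ids.into_iter().collect()
    }
}
```

Because the map is ordered by value, a range query only walks the keys inside the requested interval rather than scanning every document.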
Phase 2: Persistence & Durability
Data survives restarts with fast recovery
- Write-Ahead Log (WAL) with CRC32 checksums
- Configurable sync policy (Immediate, EveryN, OsManaged)
- WAL segment rotation (64MB segments)
- Full snapshot serialization (bincode + LZ4 compression)
- Background-safe snapshot creation
- Recovery manager for coordinating persistence
- Graceful shutdown with final snapshot
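To make the idea of checksummed WAL records concrete, the sketch below frames a payload with its length and a CRC-32 (IEEE polynomial) and rejects frames that are truncated or corrupted. The frame layout and the function names `encode_record` and `decode_record` are assumptions for illustration; the actual on-disk format is not described in this roadmap.

```rust
/// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All ones when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical frame layout: [len: u32 LE][crc32: u32 LE][payload].
pub fn encode_record(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&crc32(payload).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Returns the payload if the frame is complete and its checksum matches.
pub fn decode_record(frame: &[u8]) -> Option<&[u8]> {
    if frame.len() < 8 {
        return None;
    }
    let len = u32::from_le_bytes([frame[0], frame[1], frame[2], frame[3]]) as usize;
    let stored = u32::from_le_bytes([frame[4], frame[5], frame[6], frame[7]]);
    let end = 8usize.checked_add(len)?;
    // `get` returns None when the frame is shorter than the declared length.
    let payload = frame.get(8..end)?;
    if crc32(payload) == stored {
        Some(payload)
    } else {
        None
    }
}
```

During recovery, a reader in this style would stop replaying at the first frame for which `decode_record` returns `None`, treating it as a torn or corrupted tail.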
Phase 3: Advanced Features
Production-grade search features
- Context-aware overlays for customer-specific pricing
- Multi-field sorting (price:asc,rating:desc)
- Enhanced filtering (equality, range, AND combinations)
- Per-context sorted BTreeMap index
- OverlayStats (min, max, count) for query planning
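The multi-field sort syntax `price:asc,rating:desc` can be sketched as a small parser plus a comparator that applies the keys in order. Everything here is hypothetical: the `SortKey` and `Direction` types, the `Doc` alias (a map of numeric fields), the default of ascending when no direction is given, and the choice to sort documents missing a field last.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Direction {
    Asc,
    Desc,
}

#[derive(Debug, Clone, PartialEq)]
pub struct SortKey {
    pub field: String,
    pub direction: Direction,
}

/// Parses a spec such as "price:asc,rating:desc".
/// Assumption: a key without a direction sorts ascending.
pub fn parse_sort(spec: &str) -> Result<Vec<SortKey>, String> {
    spec.split(',')
        .map(|part| {
            let part = part.trim();
            let (field, dir) = match part.split_once(':') {
                Some((f, d)) => (f, d),
                None => (part, "asc"),
            };
            if field.is_empty() {
                return Err(format!("empty field in sort spec: {:?}", spec));
            }
            let direction = match dir {
                "asc" => Direction::Asc,
                "desc" => Direction::Desc,
                other => return Err(format!("unknown sort direction: {}", other)),
            };
            Ok(SortKey { field: field.to_string(), direction })
        })
        .collect()
}

/// Hypothetical document representation: numeric fields only.
pub type Doc = HashMap<String, f64>;

/// Sorts documents by each key in turn; later keys only break ties.
pub fn sort_docs(docs: &mut [Doc], keys: &[SortKey]) {
    docs.sort_by(|a, b| {
        for key in keys {
            let ord = match (a.get(&key.field), b.get(&key.field)) {
                (Some(x), Some(y)) => {
                    let o = x.total_cmp(y);
                    if key.direction == Direction::Desc { o.reverse() } else { o }
                }
                // Documents missing the field go last in either direction.
                (Some(_), None) => Ordering::Less,
                (None, Some(_)) => Ordering::Greater,
                (None, None) => Ordering::Equal,
            };
            if ord != Ordering::Equal {
                return ord;
            }
        }
        Ordering::Equal
    });
}
```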
Phase 4: Search Quality
Typo tolerance, stemming, semantic search, and enhanced relevance
- Stemming support (English, Russian via Snowball)
- Stop words filtering
- Typo tolerance with SymSpell algorithm (edit distance 1-2)
- PostgreSQL native synchronization
- Semantic/vector search with HNSW index
- Hybrid BM25+vector ranking
- Hierarchical categories with automatic ancestor indexing
- Internationalization (i18n) for facet labels
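Automatic ancestor indexing can be pictured as follows: when a document is stored under a category path, it is also posted under every ancestor of that path, so a "this node and all descendants" filter becomes a single lookup. The `/` separator, the `HierarchyIndex` type, and its method names are assumptions for this sketch.

```rust
use std::collections::{BTreeSet, HashMap};

/// Expands "Electronics/Phones/Smartphones" into every ancestor path, root first.
/// Assumption: path segments are separated by '/'.
pub fn ancestor_paths(path: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut current = String::new();
    for segment in path.split('/').map(str::trim).filter(|s| !s.is_empty()) {
        if !current.is_empty() {
            current.push('/');
        }
        current.push_str(segment);
        out.push(current.clone());
    }
    out
}

/// Hypothetical hierarchy index: category path -> IDs of documents at or below it.
#[derive(Default)]
pub struct HierarchyIndex {
    postings: HashMap<String, BTreeSet<u32>>,
}

impl HierarchyIndex {
    /// Posts the document under its own path and under every ancestor.
    pub fn insert(&mut self, doc_id: u32, path: &str) {
        for p in ancestor_paths(path) {
            self.postings.entry(p).or_default().insert(doc_id);
        }
    }

    /// Returns documents in `node` or any of its descendants, in ascending ID order.
    pub fn descendants_of(&self, node: &str) -> Vec<u32> {
        self.postings
            .get(node)
            .map(|ids| ids.iter().copied().collect())
            .unwrap_or_default()
    }
}
```

The trade-off in this design is extra index entries at write time in exchange for constant-time subtree filters at query time.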
Phase 5: Performance & Scale
Handle 10M+ documents efficiently
- Hash-based document sharding (complete)
- Parallel search across shards with rayon (complete)
- K-way heap merge for result aggregation (complete)
- Global BM25 statistics with lazy refresh (complete)
- Sharding persistence and recovery (complete)
- Batched write operations (planned)
- Background index maintenance (planned)
- Memory-mapped index files (planned)
- Memory budget management (planned)
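The k-way heap merge used for result aggregation can be sketched with `std::collections::BinaryHeap`: each shard returns its hits sorted by descending score, and the heap holds only the current head of each shard's list. The `Hit` and `Head` types and the tie-breaking rule (lower shard number first) are assumptions for illustration.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub doc_id: u32,
    pub score: f32,
}

/// Heap entry: the current head of one shard's result list.
struct Head {
    score: f32,
    shard: usize,
    pos: usize,
}

impl PartialEq for Head {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Head {}
impl PartialOrd for Head {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for Head {
    fn cmp(&self, other: &Self) -> Ordering {
        // Highest score wins; ties go to the lower shard number for determinism.
        self.score
            .total_cmp(&other.score)
            .then_with(|| other.shard.cmp(&self.shard))
    }
}

/// Merges per-shard hit lists (each already sorted by descending score)
/// into the global top `k`.
pub fn merge_top_k(shards: &[Vec<Hit>], k: usize) -> Vec<Hit> {
    let mut heap = BinaryHeap::new();
    for (shard, hits) in shards.iter().enumerate() {
        if let Some(first) = hits.first() {
            heap.push(Head { score: first.score, shard, pos: 0 });
        }
    }
    let total = shards.iter().map(Vec::len).sum::<usize>();
    let mut out = Vec::with_capacity(k.min(total));
    while out.len() < k {
        let head = match heap.pop() {
            Some(h) => h,
            None => break,
        };
        out.push(shards[head.shard][head.pos].clone());
        // Advance within the shard that just produced a result.
        let next = head.pos + 1;
        if let Some(hit) = shards[head.shard].get(next) {
            heap.push(Head { score: hit.score, shard: head.shard, pos: next });
        }
    }
    out
}
```

With S shards, the heap never holds more than S entries, so producing k results costs O(k log S) comparisons regardless of how many hits each shard returned.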
Phase 6: Production Readiness
Production-deployable system
- API key authentication
- Rate limiting
- Prometheus metrics endpoint
- Request tracing with correlation IDs
- Docker container & docker-compose
- Configuration hot-reload
- Admin API for operations
- Integration test suite
Future: Distributed System
Distributed architecture for horizontal scaling
- Raft consensus for leader election
- Data replication across nodes
- Distributed search coordination
- Automatic failover
- Cluster membership management
- Horizontal scaling
Performance Targets
| Metric | Current | Target | Status |
|---|---|---|---|
| Single-term search | <5ms | <5ms P99 | Achieved |
| Multi-term search | <10ms | <20ms P99 | Achieved |
| Faceted search | <15ms | <30ms P99 | Achieved |
| Overlay search (50K docs) | ~6ms | <10ms P99 | Achieved |
| Sharded search (100K, 4 shards) | 386 µs | <1ms P99 | Achieved |
| Sharded speedup | 3x | 2-4x | Achieved |
| Indexing throughput | ~50K docs/sec | >10K docs/sec | Achieved |
Changelog
v0.6.0 (Current)
- Internationalization (i18n): Translated facet labels for multilingual UIs
- Translation Registry for per-collection, per-field translations
- REST API: PUT/GET /api/v1/collections/:name/translations
- Search parameter: language for requesting translated labels
- Fallback chain: requested language → English → raw value
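The fallback chain (requested language, then English, then the raw value) could look roughly like the sketch below. The `TranslationRegistry` shape shown here, a flat map keyed by field, language, and raw value, is a guess for illustration and not the actual data structure; the `"en"` language code is likewise an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical registry keyed by (field, language, raw value).
#[derive(Default)]
pub struct TranslationRegistry {
    labels: HashMap<(String, String, String), String>,
}

impl TranslationRegistry {
    /// Registers a translated label for one raw facet value.
    pub fn set(&mut self, field: &str, lang: &str, raw: &str, label: &str) {
        self.labels.insert(
            (field.to_string(), lang.to_string(), raw.to_string()),
            label.to_string(),
        );
    }

    /// Resolves a label: requested language, then English, then the raw value.
    pub fn label(&self, field: &str, lang: &str, raw: &str) -> String {
        for candidate in [lang, "en"] {
            let key = (field.to_string(), candidate.to_string(), raw.to_string());
            if let Some(label) = self.labels.get(&key) {
                return label.clone();
            }
        }
        raw.to_string()
    }
}
```

A lookup in this style always returns something displayable, so a missing translation degrades the label rather than failing the search request.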
v0.5.0
- Hierarchical Categories: Native tree-structured category support
- New hierarchy field type with automatic ancestor indexing
- Filter syntax: category:^Electronics (includes all descendants)
- Sharding Persistence: Sharded collections survive restarts
- Atomic PostgreSQL Sync: Zero-downtime resync with shadow collection swap
- Sorted Overlay Index: Per-context BTreeMap for efficient sorted queries
v0.4.0
- Sharding: Hash-based document sharding for parallel search
- Configurable num_shards per collection
- Parallel search using rayon thread pool
- K-way heap merge for efficient result aggregation
- 2-3x search speedup at 100K+ documents
v0.3.0
- Semantic/vector search with HNSW index
- Hybrid BM25+vector ranking with configurable alpha
- Collection-level vector embedding configuration
- Individual text_score and vector_score in search results
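One common way to combine the two scores is a linear blend controlled by alpha. The sketch below assumes that alpha = 1.0 means pure vector ranking and alpha = 0.0 means pure BM25, and that both scores have already been normalized to a comparable range; neither convention is stated in this changelog, and the `ScoredHit` type and function names are hypothetical.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ScoredHit {
    pub doc_id: u32,
    pub text_score: f32,
    pub vector_score: f32,
}

/// Linear blend of the two scores.
/// Assumption: alpha weights the vector score and (1 - alpha) the text score.
pub fn hybrid_score(hit: &ScoredHit, alpha: f32) -> f32 {
    let alpha = alpha.clamp(0.0, 1.0);
    alpha * hit.vector_score + (1.0 - alpha) * hit.text_score
}

/// Orders hits by descending blended score.
pub fn rank(mut hits: Vec<ScoredHit>, alpha: f32) -> Vec<ScoredHit> {
    hits.sort_by(|a, b| hybrid_score(b, alpha).total_cmp(&hybrid_score(a, alpha)));
    hits
}
```

Returning `text_score` and `vector_score` alongside the blended result lets a client see which signal drove a given ranking.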
v0.2.0
- Stemming support (English, Russian via Snowball)
- Stop words filtering
- Typo tolerance with SymSpell algorithm
- PostgreSQL native synchronization
- Polling-based CDC for real-time updates
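The core idea behind SymSpell is to index every dictionary word under the strings produced by deleting characters from it, so that typo lookup needs only deletions of the query term. The sketch below covers a single deletion only and is not a full SymSpell implementation: the `DeleteIndex` type is hypothetical, and a real implementation would verify each candidate with an actual edit-distance check, because two words can share a delete variant while being two edits apart.

```rust
use std::collections::{BTreeSet, HashMap};

/// All strings obtained by deleting exactly one character from `word`.
fn deletes1(word: &str) -> Vec<String> {
    let chars: Vec<char> = word.chars().collect();
    (0..chars.len())
        .map(|skip| {
            chars
                .iter()
                .enumerate()
                .filter(|(i, _)| *i != skip)
                .map(|(_, c)| *c)
                .collect::<String>()
        })
        .collect()
}

/// Hypothetical index: delete variant (or the word itself) -> dictionary words.
#[derive(Default)]
pub struct DeleteIndex {
    by_delete: HashMap<String, BTreeSet<String>>,
}

impl DeleteIndex {
    /// Adds a dictionary word under itself and each single-deletion variant.
    pub fn add(&mut self, word: &str) {
        let mut keys = deletes1(word);
        keys.push(word.to_string());
        for key in keys {
            self.by_delete.entry(key).or_default().insert(word.to_string());
        }
    }

    /// Unverified correction candidates: words sharing a variant with `input`.
    pub fn candidates(&self, input: &str) -> BTreeSet<String> {
        let mut probes = deletes1(input);
        probes.push(input.to_string());
        let mut out = BTreeSet::new();
        for probe in probes {
            if let Some(words) = self.by_delete.get(&probe) {
                out.extend(words.iter().cloned());
            }
        }
        out
    }
}
```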
v0.1.0
- Initial release with core functionality
- Full-text search with BM25 scoring
- Faceted filtering with roaring bitmaps
- Numeric indexes for range queries
- Context-aware numeric overlays
- WAL + snapshot persistence
- REST API with Axum
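For reference, the per-term BM25 score can be written as a short function. This sketch uses the widely used formulation with a non-negative (Lucene-style) IDF; the function name and parameter names are illustrative, and the exact variant and default parameters OverclockDB uses are not stated here. Typical values are `k1 = 1.2` and `b = 0.75`.

```rust
/// BM25 contribution of one query term to one document.
///
/// * `tf`: occurrences of the term in the document
/// * `doc_len` / `avg_doc_len`: document length and collection average, in tokens
/// * `doc_count`: number of documents in the collection
/// * `doc_freq`: number of documents containing the term
pub fn bm25_term_score(
    tf: f64,
    doc_len: f64,
    avg_doc_len: f64,
    doc_count: f64,
    doc_freq: f64,
    k1: f64,
    b: f64,
) -> f64 {
    // Rare terms get a higher inverse document frequency; the "1 +" keeps it non-negative.
    let idf = (1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5)).ln();
    // Length normalization: longer-than-average documents are penalized.
    let norm = k1 * (1.0 - b + b * doc_len / avg_doc_len);
    idf * tf * (k1 + 1.0) / (tf + norm)
}
```

A document's score for a multi-term query is the sum of this value over the query terms it contains.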