Roadmap

OverclockDB is being developed in phases, starting with core functionality and progressively adding advanced features. Track our progress and see what's coming next.

Phase 1: Foundation + Faceting (MVP)

Complete

Working database with search, faceting, and in-memory storage

  • Project setup with Cargo and configuration system
  • Schema and field types (string, int, float, bool, array)
  • Unicode-aware tokenizer with inverted index
  • Roaring bitmap facet index
  • Numeric index for range queries (BTreeMap-based)
  • Query parsing, term matching, BM25 scoring
  • Filter execution with bitmaps
  • REST API with Axum (CRUD, search, health checks)
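The BM25 scoring listed above can be illustrated with a short sketch. It computes the standard Okapi BM25 contribution of each query term and sums the results. The constants k1 = 1.2 and b = 0.75 and the non-negative IDF variant are common textbook choices assumed here. This roadmap does not state the parameters OverclockDB actually uses.

```rust
/// Okapi BM25 contribution of a single query term to one document's score.
/// K1 and B are common textbook defaults, assumed for this sketch.
pub fn bm25_term_score(
    tf: f64,          // occurrences of the term in this document
    doc_len: f64,     // number of tokens in this document
    avg_doc_len: f64, // average document length in the collection
    num_docs: f64,    // total documents in the collection (N)
    doc_freq: f64,    // documents containing the term (n)
) -> f64 {
    const K1: f64 = 1.2;
    const B: f64 = 0.75;
    // Non-negative IDF variant: ln(1 + (N - n + 0.5) / (n + 0.5)).
    let idf = (1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5)).ln();
    // Term-frequency saturation with document-length normalization.
    let norm = K1 * (1.0 - B + B * doc_len / avg_doc_len);
    idf * (tf * (K1 + 1.0)) / (tf + norm)
}

/// Sum per-term scores for a multi-term query; each tuple is (tf, doc_freq).
pub fn bm25_score(terms: &[(f64, f64)], doc_len: f64, avg_doc_len: f64, num_docs: f64) -> f64 {
    terms
        .iter()
        .map(|&(tf, df)| bm25_term_score(tf, doc_len, avg_doc_len, num_docs, df))
        .sum()
}
```

Rare terms, repeated terms and short documents all raise the score. The tf + norm denominator caps how much repetition a single term can contribute.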

Phase 2: Persistence & Durability

Complete

Data survives restarts with fast recovery

  • Write-Ahead Log (WAL) with CRC32 checksums
  • Configurable sync policy (Immediate, EveryN, OsManaged)
  • WAL segment rotation (64MB segments)
  • Full snapshot serialization (bincode + LZ4 compression)
  • Background-safe snapshot creation
  • Recovery manager for coordinating persistence
  • Graceful shutdown with final snapshot
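Here is a minimal sketch of how a WAL record can carry a CRC32 checksum. It assumes a hypothetical [length][CRC32][payload] framing, because the roadmap does not specify OverclockDB's on-disk format. The SyncPolicy enum reuses the three policy names listed above, but the fsync rule attached to each one is an assumption.

```rust
/// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
/// Real WALs usually use a table-driven or SIMD crate; this loop stays dependency-free.
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg(); // all ones if the low bit is set, else 0
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical layout: [payload length: u32 LE][CRC32: u32 LE][payload].
pub fn encode_record(payload: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(8 + payload.len());
    buf.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    buf.extend_from_slice(&crc32(payload).to_le_bytes());
    buf.extend_from_slice(payload);
    buf
}

/// Returns the payload and bytes consumed, or None if the record is
/// truncated or its checksum does not match (e.g. a torn write).
pub fn decode_record(buf: &[u8]) -> Option<(&[u8], usize)> {
    if buf.len() < 8 {
        return None;
    }
    let len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let stored = u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]);
    let payload = buf.get(8..8 + len)?;
    if crc32(payload) == stored {
        Some((payload, 8 + len))
    } else {
        None
    }
}

/// Policy names from the roadmap; the semantics below are assumed.
pub enum SyncPolicy {
    Immediate,
    EveryN(u64),
    OsManaged,
}

impl SyncPolicy {
    /// Whether to fsync given the number of appends since the last sync.
    pub fn should_sync(&self, pending: u64) -> bool {
        match self {
            SyncPolicy::Immediate => true,
            SyncPolicy::EveryN(n) => pending >= *n,
            SyncPolicy::OsManaged => false, // leave flushing to the OS page cache
        }
    }
}
```

A recovery pass built on this framing could replay records until the first one that fails decode_record and treat everything after it as an incomplete tail write.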

Phase 3: Advanced Features

Complete

Production-grade search features

  • Context-aware overlays for customer-specific pricing
  • Multi-field sorting (price:asc,rating:desc)
  • Enhanced filtering (equality, range, AND combinations)
  • Per-context sorted BTreeMap index
  • OverlayStats (min, max, count) for query planning
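The multi-field sort syntax above (price:asc,rating:desc) can be sketched as a parser plus a lexicographic comparator. In this sketch, documents are plain field-to-f64 maps. Defaulting a missing direction to ascending and sorting documents without the field last are illustrative assumptions, not documented OverclockDB behavior.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Direction {
    Asc,
    Desc,
}

#[derive(Debug, Clone, PartialEq)]
pub struct SortKey {
    pub field: String,
    pub dir: Direction,
}

/// Parse "price:asc,rating:desc"; a bare field name defaults to asc (assumed).
pub fn parse_sort(spec: &str) -> Result<Vec<SortKey>, String> {
    spec.split(',')
        .filter(|s| !s.trim().is_empty())
        .map(|part| {
            let mut it = part.trim().splitn(2, ':');
            let field = it.next().unwrap_or("").to_string();
            let dir = match it.next() {
                None | Some("asc") => Direction::Asc,
                Some("desc") => Direction::Desc,
                Some(other) => return Err(format!("bad direction: {}", other)),
            };
            Ok(SortKey { field, dir })
        })
        .collect()
}

pub type Doc = HashMap<String, f64>;

/// Compare key by key; later keys only break ties. Missing fields sort last.
pub fn compare_docs(a: &Doc, b: &Doc, keys: &[SortKey]) -> Ordering {
    for key in keys {
        let ord = match (a.get(&key.field), b.get(&key.field)) {
            (Some(x), Some(y)) => {
                let o = x.total_cmp(y);
                if key.dir == Direction::Desc { o.reverse() } else { o }
            }
            (Some(_), None) => Ordering::Less,
            (None, Some(_)) => Ordering::Greater,
            (None, None) => Ordering::Equal,
        };
        if ord != Ordering::Equal {
            return ord;
        }
    }
    Ordering::Equal
}
```

A result list can then be ordered with docs.sort_by(|a, b| compare_docs(a, b, &keys)). An index-backed implementation (such as the per-context sorted BTreeMap above) can avoid sorting the full candidate set.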

Phase 4: Search Quality

Complete

Typo tolerance, stemming, and enhanced relevance

  • Stemming support (English, Russian via Snowball)
  • Stop words filtering
  • Typo tolerance with SymSpell algorithm (edit distance 1-2)
  • PostgreSQL native synchronization
  • Semantic/vector search with HNSW index
  • Hybrid BM25+vector ranking
  • Hierarchical categories with automatic ancestor indexing
  • Internationalization (i18n) for facet labels
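SymSpell uses the symmetric-delete idea. It precomputes every string reachable from a dictionary word by deleting up to N characters, does the same for the query at lookup time, and verifies the shared candidates with a real edit distance. The sketch below assumes plain Levenshtein verification and an unweighted dictionary. Production SymSpell variants typically also rank by term frequency and may use a Damerau-style distance. The struct and method names are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Every string reachable from `word` by deleting up to `max_dist` chars, plus `word`.
fn deletes(word: &str, max_dist: usize) -> HashSet<String> {
    let mut out = HashSet::new();
    out.insert(word.to_string());
    let mut frontier = vec![word.to_string()];
    for _ in 0..max_dist {
        let mut next = Vec::new();
        for w in &frontier {
            let chars: Vec<char> = w.chars().collect(); // char-based, so non-ASCII is safe
            for i in 0..chars.len() {
                let d: String = chars[..i].iter().chain(&chars[i + 1..]).collect();
                if out.insert(d.clone()) {
                    next.push(d);
                }
            }
        }
        frontier = next;
    }
    out
}

/// Plain Levenshtein distance, used to verify candidates.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        let mut cur = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            cur[j] = (prev[j] + 1).min(cur[j - 1] + 1).min(prev[j - 1] + cost);
        }
        prev = cur;
    }
    prev[b.len()]
}

pub struct SymSpell {
    max_dist: usize,
    index: HashMap<String, HashSet<String>>, // delete variant -> dictionary words
}

impl SymSpell {
    pub fn new(words: &[&str], max_dist: usize) -> Self {
        let mut index: HashMap<String, HashSet<String>> = HashMap::new();
        for w in words {
            for d in deletes(w, max_dist) {
                index.entry(d).or_default().insert(w.to_string());
            }
        }
        SymSpell { max_dist, index }
    }

    /// Dictionary words within `max_dist` edits of `query`, closest first.
    pub fn lookup(&self, query: &str) -> Vec<(String, usize)> {
        let mut seen = HashSet::new();
        let mut hits = Vec::new();
        for d in deletes(query, self.max_dist) {
            if let Some(words) = self.index.get(&d) {
                for w in words {
                    if seen.insert(w.clone()) {
                        let dist = levenshtein(query, w);
                        if dist <= self.max_dist {
                            hits.push((w.clone(), dist));
                        }
                    }
                }
            }
        }
        hits.sort_by(|a, b| a.1.cmp(&b.1).then_with(|| a.0.cmp(&b.0)));
        hits
    }
}
```

Only deletions are precomputed, which keeps the index far smaller than enumerating all inserts and substitutions. The price is a larger index as max_dist grows, which is one reason to cap the distance at 2.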

Phase 5: Performance & Scale

In Progress

Handle 10M+ documents efficiently

  • Hash-based document sharding (complete)
  • Parallel search across shards with rayon (complete)
  • K-way heap merge for result aggregation (complete)
  • Global BM25 statistics with lazy refresh (complete)
  • Sharding persistence and recovery (complete)
  • Batched write operations (planned)
  • Background index maintenance (planned)
  • Memory-mapped index files (planned)
  • Memory budget management (planned)
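Hash-based sharding and the k-way heap merge fit together roughly as follows. Each document ID is routed to a shard by hash, and each shard returns its own hits sorted by descending score. A heap holding one cursor per shard then yields the global top k. This is a sequential, dependency-free sketch with hypothetical names. It omits the rayon parallelism and the global BM25 statistics mentioned above.

```rust
use std::cmp::Ordering;
use std::collections::hash_map::DefaultHasher;
use std::collections::BinaryHeap;
use std::hash::{Hash, Hasher};

/// Route a document ID to a shard. DefaultHasher::new() is deterministic for a
/// given Rust release but may change between releases, so a persisted shard
/// layout would need a fixed hash function instead.
pub fn shard_for(doc_id: &str, num_shards: usize) -> usize {
    let mut h = DefaultHasher::new();
    doc_id.hash(&mut h);
    (h.finish() % num_shards as u64) as usize
}

#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub doc_id: String,
    pub score: f64,
}

/// Heap entry: the current head of one shard's result list.
struct Head {
    score: f64,
    shard: usize,
    pos: usize,
}

impl PartialEq for Head {
    fn eq(&self, o: &Self) -> bool { self.cmp(o) == Ordering::Equal }
}
impl Eq for Head {}
impl PartialOrd for Head {
    fn partial_cmp(&self, o: &Self) -> Option<Ordering> { Some(self.cmp(o)) }
}
impl Ord for Head {
    // Max-heap on score; ties go to the lower shard index for determinism.
    fn cmp(&self, o: &Self) -> Ordering {
        self.score.total_cmp(&o.score).then_with(|| o.shard.cmp(&self.shard))
    }
}

/// Merge per-shard lists (each sorted by descending score) into the global
/// top `k`. The heap never holds more than one entry per shard: O(k log S).
pub fn merge_top_k(shards: &[Vec<Hit>], k: usize) -> Vec<Hit> {
    let mut heap = BinaryHeap::new();
    for (shard, hits) in shards.iter().enumerate() {
        if let Some(h) = hits.first() {
            heap.push(Head { score: h.score, shard, pos: 0 });
        }
    }
    let mut out = Vec::with_capacity(k);
    while out.len() < k {
        let top = match heap.pop() {
            Some(t) => t,
            None => break, // every shard is exhausted
        };
        out.push(shards[top.shard][top.pos].clone());
        if let Some(next) = shards[top.shard].get(top.pos + 1) {
            heap.push(Head { score: next.score, shard: top.shard, pos: top.pos + 1 });
        }
    }
    out
}
```

Each shard only needs to return its own top k, because no document outside a shard's top k can reach the global top k.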

Phase 6: Production Readiness

Planned

Production-deployable system

  • API key authentication
  • Rate limiting
  • Prometheus metrics endpoint
  • Request tracing with correlation IDs
  • Docker container & docker-compose
  • Configuration hot-reload
  • Admin API for operations
  • Integration test suite

Future: Distributed System

Future

Distributed architecture for horizontal scaling

  • Raft consensus for leader election
  • Data replication across nodes
  • Distributed search coordination
  • Automatic failover
  • Cluster membership management
  • Horizontal scaling

Performance Targets

Metric | Current | Target | Status
Single-term search | <5 ms | <5 ms P99 | Achieved
Multi-term search | <10 ms | <20 ms P99 | Achieved
Faceted search | <15 ms | <30 ms P99 | Achieved
Overlay search (50K docs) | ~6 ms | <10 ms P99 | Achieved
Sharded search (100K docs, 4 shards) | 386 µs | <1 ms P99 | Achieved
Sharded speedup | 3x | 2-4x | Achieved
Indexing throughput | ~50K docs/sec | >10K docs/sec | Achieved

Changelog

v0.6.0 (Current)

  • Internationalization (i18n): Translated facet labels for multilingual UIs
  • Translation Registry for per-collection, per-field translations
  • REST API: PUT/GET /api/v1/collections/:name/translations
  • New search parameter, language, for requesting translated labels
  • Fallback chain: requested language → English → raw value
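The fallback chain can be expressed as a short lookup. The nested-map registry layout (field, then raw value, then language) and the "en" language code are assumptions for this sketch. They do not describe the actual Translation Registry structure.

```rust
use std::collections::HashMap;

/// Hypothetical registry shape: field -> raw facet value -> language -> label.
pub type Translations = HashMap<String, HashMap<String, HashMap<String, String>>>;

/// Resolve a facet label: requested language, then English ("en"), then the raw value.
pub fn facet_label<'a>(reg: &'a Translations, field: &str, raw: &'a str, lang: &str) -> &'a str {
    reg.get(field)
        .and_then(|values| values.get(raw))
        .and_then(|by_lang| by_lang.get(lang).or_else(|| by_lang.get("en")))
        .map(|s| s.as_str())
        .unwrap_or(raw)
}
```

Because the raw value is the last resort, a facet with no translations still renders, just untranslated.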

v0.5.0

  • Hierarchical Categories: Native tree-structured category support
  • New hierarchy field type with automatic ancestor indexing
  • Filter syntax: category:^Electronics (includes all descendants)
  • Sharding Persistence: Sharded collections survive restarts
  • Atomic PostgreSQL Sync: Zero-downtime resync with shadow collection swap
  • Sorted Overlay Index: Per-context BTreeMap for efficient sorted queries
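With automatic ancestor indexing, a document filed under a leaf category is also posted under every ancestor. A descendant filter such as category:^Electronics then becomes a single posting-list lookup rather than a tree walk. The sketch assumes a " > " path separator and uses a BTreeSet where the real index would likely use roaring bitmaps. All names are hypothetical.

```rust
use std::collections::{BTreeSet, HashMap};

/// "Electronics > Phones > Smartphones" ->
/// ["Electronics", "Electronics > Phones", "Electronics > Phones > Smartphones"].
pub fn ancestor_paths(path: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut prefix = String::new();
    for part in path.split('>').map(str::trim).filter(|p| !p.is_empty()) {
        if !prefix.is_empty() {
            prefix.push_str(" > ");
        }
        prefix.push_str(part);
        out.push(prefix.clone());
    }
    out
}

/// Posting lists keyed by category path (BTreeSet stands in for a bitmap).
#[derive(Default)]
pub struct HierarchyIndex {
    postings: HashMap<String, BTreeSet<u32>>,
}

impl HierarchyIndex {
    /// Post the document under its own category and every ancestor.
    pub fn insert(&mut self, doc: u32, path: &str) {
        for p in ancestor_paths(path) {
            self.postings.entry(p).or_default().insert(doc);
        }
    }

    /// `^path` semantics: the category itself plus all descendants.
    pub fn descendants_of(&self, path: &str) -> Vec<u32> {
        let key = ancestor_paths(path).pop().unwrap_or_default(); // normalized path
        self.postings
            .get(&key)
            .map(|s| s.iter().copied().collect())
            .unwrap_or_default()
    }
}
```

The trade-off is write amplification. A document at depth d appears in d posting lists, which is usually cheap because category trees are shallow.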

v0.4.0

  • Sharding: Hash-based document sharding for parallel search
  • Configurable num_shards per collection
  • Parallel search using rayon thread pool
  • K-way heap merge for efficient result aggregation
  • 2-3x search speedup at 100K+ documents

v0.3.0

  • Semantic/vector search with HNSW index
  • Hybrid BM25+vector ranking with configurable alpha
  • Collection-level vector embedding configuration
  • Individual text_score and vector_score in search results
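Hybrid ranking with a configurable alpha is commonly implemented as a linear blend of the two scores. The sketch assumes that form, with alpha weighting the vector score. Because BM25 has no fixed upper bound, the sketch min-max normalizes it over the candidate set first. The roadmap does not document OverclockDB's exact formula or normalization, so treat both as assumptions.

```rust
/// One candidate with its two component scores (names follow the
/// text_score / vector_score fields above).
#[derive(Debug, Clone)]
pub struct Candidate {
    pub id: u32,
    pub text_score: f64,
    pub vector_score: f64,
}

/// hybrid = alpha * vector_score + (1 - alpha) * normalized_text_score,
/// returned as (id, hybrid) pairs, highest first.
pub fn hybrid_rank(cands: &[Candidate], alpha: f64) -> Vec<(u32, f64)> {
    let alpha = alpha.clamp(0.0, 1.0);
    // Min and max BM25 over the candidates, for min-max normalization.
    let (lo, hi) = cands
        .iter()
        .fold((f64::INFINITY, f64::NEG_INFINITY), |(lo, hi), c| {
            (lo.min(c.text_score), hi.max(c.text_score))
        });
    let range = hi - lo;
    let mut ranked: Vec<(u32, f64)> = cands
        .iter()
        .map(|c| {
            // If all text scores are equal, treat them as fully relevant.
            let text = if range > 0.0 { (c.text_score - lo) / range } else { 1.0 };
            (c.id, alpha * c.vector_score + (1.0 - alpha) * text)
        })
        .collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    ranked
}
```

With alpha = 0 this reduces to pure BM25 ordering and with alpha = 1 to pure vector ordering. Keeping both component scores in the response lets clients see which signal drove a result.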

v0.2.0

  • Stemming support (English, Russian via Snowball)
  • Stop words filtering
  • Typo tolerance with SymSpell algorithm
  • PostgreSQL native synchronization
  • Polling-based CDC for real-time updates
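Polling-based CDC generally works by storing a cursor and repeatedly asking PostgreSQL for rows changed after it. The roadmap does not say how changes are detected, so this sketch assumes each synced table has an updated_at column. It uses an (updated_at, id) cursor so that rows sharing a timestamp are not skipped. The products table in the SQL comment is hypothetical, and an in-memory slice stands in for the query.

```rust
/// A changed row as a poller might read it.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub id: i64,
    pub updated_at: i64,
    pub payload: String,
}

/// Cursor persisted between polls.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct Watermark {
    pub updated_at: i64,
    pub id: i64,
}

/// One polling step, mirroring SQL such as:
///   SELECT id, updated_at, payload FROM products
///   WHERE (updated_at, id) > ($1, $2) ORDER BY updated_at, id LIMIT $3;
pub fn poll_changes(table: &[Row], wm: Watermark, limit: usize) -> (Vec<Row>, Watermark) {
    let mut changed: Vec<Row> = table
        .iter()
        .filter(|r| (r.updated_at, r.id) > (wm.updated_at, wm.id))
        .cloned()
        .collect();
    changed.sort_by_key(|r| (r.updated_at, r.id));
    changed.truncate(limit);
    // Advance the cursor to the last row applied; keep it if nothing changed.
    let next = changed
        .last()
        .map(|r| Watermark { updated_at: r.updated_at, id: r.id })
        .unwrap_or(wm);
    (changed, next)
}
```

Polling of this kind does not see hard deletes by itself. Soft-delete flags or periodic full resyncs are the usual ways to cover them.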

v0.1.0

  • Initial release with core functionality
  • Full-text search with BM25 scoring
  • Faceted filtering with roaring bitmaps
  • Numeric indexes for range queries
  • Context-aware numeric overlays
  • WAL + snapshot persistence
  • REST API with Axum
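A context-aware numeric overlay can be pictured as a base value per document plus optional per-context overrides. An example is a default price with negotiated prices for some customers. Filters are evaluated against whichever value applies in the current context. The struct and method names are hypothetical. The range filter is a linear scan for brevity, where the per-context sorted BTreeMap index from Phase 3 would avoid scanning.

```rust
use std::collections::HashMap;

#[derive(Default)]
pub struct NumericOverlay {
    base: HashMap<u32, f64>,                      // doc_id -> default value
    overlays: HashMap<String, HashMap<u32, f64>>, // context -> doc_id -> override
}

impl NumericOverlay {
    pub fn set_base(&mut self, doc: u32, v: f64) {
        self.base.insert(doc, v);
    }

    pub fn set_override(&mut self, ctx: &str, doc: u32, v: f64) {
        self.overlays.entry(ctx.to_string()).or_default().insert(doc, v);
    }

    /// Value a query in `ctx` should see: the override if present, else the base value.
    pub fn effective(&self, ctx: Option<&str>, doc: u32) -> Option<f64> {
        ctx.and_then(|c| self.overlays.get(c))
            .and_then(|m| m.get(&doc))
            .or_else(|| self.base.get(&doc))
            .copied()
    }

    /// Inclusive range filter on effective values over documents that have a
    /// base value; returns sorted doc IDs.
    pub fn range(&self, ctx: Option<&str>, min: f64, max: f64) -> Vec<u32> {
        let mut docs: Vec<u32> = self
            .base
            .keys()
            .copied()
            .filter(|&d| matches!(self.effective(ctx, d), Some(v) if v >= min && v <= max))
            .collect();
        docs.sort_unstable();
        docs
    }
}
```

The same filter can therefore return different documents for different contexts without duplicating the collection.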