- Fix stale-while-revalidate condition (was inverted for stale vs fresh) - Add unmarshalErr tracking for cache corruption edge case - Rename CachedResponse to CachedEngineResponse throughout - Fix override tier name (engineName not engineName_override) - Add EngineCache.Logger() method - Add tiers_test.go to file map Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
25 KiB
Per-Engine TTL Cache — Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (
- [ ]) syntax for tracking.
Goal: Replace the merged-response cache with per-engine response caching, enabling tier-based TTLs and stale-while-revalidate semantics.
Architecture: Each engine's raw response is cached independently with its tier-based TTL. On stale hits, return cached data immediately and refresh in background. Query hash is computed from shared params (query, pageno, safesearch, language, time_range) and prefixed with engine name for the cache key.
Tech Stack: Go 1.24, Valkey/Redis (go-redis/v9), existing samsa contracts
File Map
| Action | File | Responsibility |
|---|---|---|
| Create | internal/cache/tiers.go |
Tier definitions, EngineTier() function |
| Create | internal/cache/tiers_test.go |
Tests for EngineTier |
| Create | internal/cache/engine_cache.go |
EngineCache struct with tier-aware Get/Set |
| Create | internal/cache/engine_cache_test.go |
Tests for EngineCache |
| Modify | internal/cache/cache.go |
Add QueryHash(), add CachedEngineResponse type |
| Modify | internal/cache/cache_test.go |
Add tests for QueryHash() |
| Modify | internal/config/config.go |
Add TTLOverrides to CacheConfig |
| Modify | internal/search/service.go |
Use EngineCache, parallel lookups, background refresh |
Task 1: Add QueryHash and CachedEngineResponse to cache.go
Files:
-
Modify:
internal/cache/cache.go -
Modify:
internal/cache/cache_test.go -
Step 1: Write failing test for QueryHash()
// In cache_test.go, add:
func TestQueryHash(t *testing.T) {
// Same params should produce same hash
hash1 := QueryHash("golang", 1, 0, "en", "")
hash2 := QueryHash("golang", 1, 0, "en", "")
if hash1 != hash2 {
t.Errorf("QueryHash: same params should produce same hash, got %s != %s", hash1, hash2)
}
// Different query should produce different hash
hash3 := QueryHash("rust", 1, 0, "en", "")
if hash1 == hash3 {
t.Errorf("QueryHash: different queries should produce different hash")
}
// Different pageno should produce different hash
hash4 := QueryHash("golang", 2, 0, "en", "")
if hash1 == hash4 {
t.Errorf("QueryHash: different pageno should produce different hash")
}
// time_range should affect hash
hash5 := QueryHash("golang", 1, 0, "en", "day")
if hash1 == hash5 {
t.Errorf("QueryHash: different time_range should produce different hash")
}
// Hash should be 16 characters (truncated SHA-256)
if len(hash1) != 16 {
t.Errorf("QueryHash: expected 16 char hash, got %d", len(hash1))
}
}
- Step 2: Run test to verify it fails
Run: nix develop --command bash -c "go test -run TestQueryHash ./internal/cache/ -v"
Expected: FAIL — "QueryHash not defined"
- Step 3: Implement QueryHash() and CachedEngineResponse in cache.go
Add to cache.go (the imports crypto/sha256 and encoding/hex are already present in cache.go from the existing Key() function):
// QueryHash computes a deterministic hash from shared request parameters
// (query, pageno, safesearch, language, time_range) for use as a cache key suffix.
// The hash is a truncated SHA-256 (16 hex chars).
func QueryHash(query string, pageno int, safesearch int, language, timeRange string) string {
h := sha256.New()
fmt.Fprintf(h, "q=%s|", query)
fmt.Fprintf(h, "pageno=%d|", pageno)
fmt.Fprintf(h, "safesearch=%d|", safesearch)
fmt.Fprintf(h, "lang=%s|", language)
if timeRange != "" {
fmt.Fprintf(h, "tr=%s|", timeRange)
}
return hex.EncodeToString(h.Sum(nil))[:16]
}
// CachedEngineResponse wraps an engine's cached response with metadata.
type CachedEngineResponse struct {
Engine string
Response []byte
StoredAt time.Time
}
- Step 4: Run test to verify it passes
Run: nix develop --command bash -c "go test -run TestQueryHash ./internal/cache/ -v"
Expected: PASS
- Step 5: Commit
git add internal/cache/cache.go internal/cache/cache_test.go
git commit -m "cache: add QueryHash and CachedEngineResponse type"
Task 2: Create tiers.go with tier definitions
Files:
-
Create:
internal/cache/tiers.go -
Step 1: Create tiers.go with tier definitions and EngineTier function
package cache
import "time"
// TTLTier represents a cache TTL tier with a name and duration.
type TTLTier struct {
Name string
Duration time.Duration
}
// defaultTiers maps engine names to their default TTL tiers.
var defaultTiers = map[string]TTLTier{
// Static knowledge engines — rarely change
"wikipedia": {Name: "static", Duration: 24 * time.Hour},
"wikidata": {Name: "static", Duration: 24 * time.Hour},
"arxiv": {Name: "static", Duration: 24 * time.Hour},
"crossref": {Name: "static", Duration: 24 * time.Hour},
"stackoverflow": {Name: "static", Duration: 24 * time.Hour},
"github": {Name: "static", Duration: 24 * time.Hour},
// API-based general search — fresher data
"braveapi": {Name: "api_general", Duration: 1 * time.Hour},
"youtube": {Name: "api_general", Duration: 1 * time.Hour},
// Scraped general search — moderately stable
"google": {Name: "scraped_general", Duration: 2 * time.Hour},
"bing": {Name: "scraped_general", Duration: 2 * time.Hour},
"duckduckgo": {Name: "scraped_general", Duration: 2 * time.Hour},
"qwant": {Name: "scraped_general", Duration: 2 * time.Hour},
"brave": {Name: "scraped_general", Duration: 2 * time.Hour},
// News/social — changes frequently
"reddit": {Name: "news_social", Duration: 30 * time.Minute},
// Image search
"bing_images": {Name: "images", Duration: 1 * time.Hour},
"ddg_images": {Name: "images", Duration: 1 * time.Hour},
"qwant_images": {Name: "images", Duration: 1 * time.Hour},
}
// EngineTier returns the TTL tier for an engine, applying overrides if provided.
// If the engine has no defined tier, returns a default of 1 hour.
func EngineTier(engineName string, overrides map[string]time.Duration) TTLTier {
// Check override first — override tier name is just the engine name
if override, ok := overrides[engineName]; ok && override > 0 {
return TTLTier{Name: engineName, Duration: override}
}
// Fall back to default tier
if tier, ok := defaultTiers[engineName]; ok {
return tier
}
// Unknown engines get a sensible default
return TTLTier{Name: "unknown", Duration: 1 * time.Hour}
}
- Step 2: Run go vet to verify it compiles
Run: nix develop --command bash -c "go vet ./internal/cache/tiers.go"
Expected: no output (success)
- Step 3: Write a basic test for EngineTier
// In internal/cache/tiers_test.go:
package cache
import "testing"
func TestEngineTier(t *testing.T) {
// Test default static tier
tier := EngineTier("wikipedia", nil)
if tier.Name != "static" || tier.Duration != 24*time.Hour {
t.Errorf("wikipedia: expected static/24h, got %s/%v", tier.Name, tier.Duration)
}
// Test default api_general tier
tier = EngineTier("braveapi", nil)
if tier.Name != "api_general" || tier.Duration != 1*time.Hour {
t.Errorf("braveapi: expected api_general/1h, got %s/%v", tier.Name, tier.Duration)
}
// Test override takes precedence — override tier name is just the engine name
override := 48 * time.Hour
tier = EngineTier("wikipedia", map[string]time.Duration{"wikipedia": override})
if tier.Name != "wikipedia" || tier.Duration != 48*time.Hour {
t.Errorf("wikipedia override: expected wikipedia/48h, got %s/%v", tier.Name, tier.Duration)
}
// Test unknown engine gets default
tier = EngineTier("unknown_engine", nil)
if tier.Name != "unknown" || tier.Duration != 1*time.Hour {
t.Errorf("unknown engine: expected unknown/1h, got %s/%v", tier.Name, tier.Duration)
}
}
- Step 4: Run test to verify it passes
Run: nix develop --command bash -c "go test -run TestEngineTier ./internal/cache/ -v"
Expected: PASS
- Step 5: Commit
git add internal/cache/tiers.go internal/cache/tiers_test.go
git commit -m "cache: add tier definitions and EngineTier function"
Task 3: Create EngineCache in engine_cache.go
Files:
- Create:
internal/cache/engine_cache.go - Create:
internal/cache/engine_cache_test.go
Note: The existing Key() function in cache.go is still used for favicon caching. The new QueryHash() and EngineCache are separate and only for per-engine search response caching.
- Step 1: Write failing test for EngineCache.Get/Set
package cache
import (
"context"
"testing"
"github.com/metamorphosis-dev/samsa/internal/contracts"
)
func TestEngineCacheGetSet(t *testing.T) {
// Create a disabled cache for unit testing (nil client)
c := &Cache{logger: slog.Default()}
ec := NewEngineCache(c, nil)
ctx := context.Background()
cached, ok := ec.Get(ctx, "wikipedia", "abc123")
if ok {
t.Errorf("Get on disabled cache: expected false, got %v", ok)
}
_ = cached // unused when ok=false
}
func TestEngineCacheKeyFormat(t *testing.T) {
key := engineCacheKey("wikipedia", "abc123")
if key != "samsa:resp:wikipedia:abc123" {
t.Errorf("engineCacheKey: expected samsa:resp:wikipedia:abc123, got %s", key)
}
}
func TestEngineCacheIsStale(t *testing.T) {
c := &Cache{logger: slog.Default()}
ec := NewEngineCache(c, nil)
// Fresh response (stored 1 minute ago, wikipedia has 24h TTL)
fresh := CachedEngineResponse{
Engine: "wikipedia",
Response: []byte(`{}`),
StoredAt: time.Now().Add(-1 * time.Minute),
}
if ec.IsStale(fresh, "wikipedia") {
t.Errorf("IsStale: 1-minute-old wikipedia should NOT be stale")
}
// Stale response (stored 25 hours ago)
stale := CachedEngineResponse{
Engine: "wikipedia",
Response: []byte(`{}`),
StoredAt: time.Now().Add(-25 * time.Hour),
}
if !ec.IsStale(stale, "wikipedia") {
t.Errorf("IsStale: 25-hour-old wikipedia SHOULD be stale (24h TTL)")
}
// Override: 30 minute TTL for reddit
overrides := map[string]time.Duration{"reddit": 30 * time.Minute}
ec2 := NewEngineCache(c, overrides)
// 20 minutes old with 30m override should NOT be stale
redditFresh := CachedEngineResponse{
Engine: "reddit",
Response: []byte(`{}`),
StoredAt: time.Now().Add(-20 * time.Minute),
}
if ec2.IsStale(redditFresh, "reddit") {
t.Errorf("IsStale: 20-min reddit with 30m override should NOT be stale")
}
// 45 minutes old with 30m override SHOULD be stale
redditStale := CachedEngineResponse{
Engine: "reddit",
Response: []byte(`{}`),
StoredAt: time.Now().Add(-45 * time.Minute),
}
if !ec2.IsStale(redditStale, "reddit") {
t.Errorf("IsStale: 45-min reddit with 30m override SHOULD be stale")
}
}
- Step 2: Run test to verify it fails
Run: nix develop --command bash -c "go test -run TestEngineCache ./internal/cache/ -v"
Expected: FAIL — "EngineCache not defined" or "CachedEngineResponse not defined"
- Step 3: Implement EngineCache using GetBytes/SetBytes
The EngineCache uses the existing GetBytes/SetBytes public methods on Cache (the client field is unexported so we must use those methods).
package cache
import (
"context"
"encoding/json"
"log/slog"
"time"
"github.com/metamorphosis-dev/samsa/internal/contracts"
)
// EngineCache wraps Cache with per-engine tier-aware Get/Set operations.
type EngineCache struct {
cache *Cache
overrides map[string]time.Duration
}
// NewEngineCache creates a new EngineCache with optional TTL overrides.
// If overrides is nil, default tier durations are used.
func NewEngineCache(cache *Cache, overrides map[string]time.Duration) *EngineCache {
return &EngineCache{
cache: cache,
overrides: overrides,
}
}
// Get retrieves a cached engine response. Returns (zero value, false) if not
// found or if cache is disabled.
func (ec *EngineCache) Get(ctx context.Context, engine, queryHash string) (CachedEngineResponse, bool) {
key := engineCacheKey(engine, queryHash)
data, ok := ec.cache.GetBytes(ctx, key)
if !ok {
return CachedEngineResponse{}, false
}
var cached CachedEngineResponse
if err := json.Unmarshal(data, &cached); err != nil {
ec.cache.logger.Warn("engine cache hit but unmarshal failed", "key", key, "error", err)
return CachedEngineResponse{}, false
}
ec.cache.logger.Debug("engine cache hit", "key", key, "engine", engine)
return cached, true
}
// Set stores an engine response in the cache with the engine's tier TTL.
func (ec *EngineCache) Set(ctx context.Context, engine, queryHash string, resp contracts.SearchResponse) {
if !ec.cache.Enabled() {
return
}
data, err := json.Marshal(resp)
if err != nil {
ec.cache.logger.Warn("engine cache set: marshal failed", "engine", engine, "error", err)
return
}
tier := EngineTier(engine, ec.overrides)
key := engineCacheKey(engine, queryHash)
cached := CachedEngineResponse{
Engine: engine,
Response: data,
StoredAt: time.Now(),
}
cachedData, err := json.Marshal(cached)
if err != nil {
ec.cache.logger.Warn("engine cache set: wrap marshal failed", "key", key, "error", err)
return
}
ec.cache.SetBytes(ctx, key, cachedData, tier.Duration)
}
// IsStale returns true if the cached response is older than the tier's TTL.
func (ec *EngineCache) IsStale(cached CachedEngineResponse, engine string) bool {
tier := EngineTier(engine, ec.overrides)
return time.Since(cached.StoredAt) > tier.Duration
}
// Logger returns the logger for background refresh logging.
func (ec *EngineCache) Logger() *slog.Logger {
return ec.cache.logger
}
// engineCacheKey builds the cache key for an engine+query combination.
func engineCacheKey(engine, queryHash string) string {
return "samsa:resp:" + engine + ":" + queryHash
}
- Step 4: Run tests to verify they pass
Run: nix develop --command bash -c "go test -run TestEngineCache ./internal/cache/ -v"
Expected: PASS
- Step 5: Commit
git add internal/cache/engine_cache.go internal/cache/engine_cache_test.go
git commit -m "cache: add EngineCache with tier-aware Get/Set"
Task 4: Add TTLOverrides to config
Files:
-
Modify:
internal/config/config.go -
Step 1: Add TTLOverrides to CacheConfig
In CacheConfig struct, add:
type CacheConfig struct {
Address string `toml:"address"`
Password string `toml:"password"`
DB int `toml:"db"`
DefaultTTL string `toml:"default_ttl"`
TTLOverrides map[string]string `toml:"ttl_overrides"` // engine -> duration string
}
- Step 2: Add TTLOverridesParsed() method to Config
Add after CacheTTL():
// CacheTTLOverrides returns parsed TTL overrides from config.
func (c *Config) CacheTTLOverrides() map[string]time.Duration {
if len(c.Cache.TTLOverrides) == 0 {
return nil
}
out := make(map[string]time.Duration, len(c.Cache.TTLOverrides))
for engine, durStr := range c.Cache.TTLOverrides {
if d, err := time.ParseDuration(durStr); err == nil && d > 0 {
out[engine] = d
}
}
return out
}
- Step 3: Run tests to verify nothing breaks
Run: nix develop --command bash -c "go test ./internal/config/ -v"
Expected: PASS
- Step 4: Commit
git add internal/config/config.go
git commit -m "config: add TTLOverrides to CacheConfig"
Task 5: Wire EngineCache into search service
Files:
-
Modify:
internal/search/service.go -
Step 1: Read the current service.go to understand wiring
The service currently takes *Cache in ServiceConfig. We need to change it to take *EngineCache or change the field type.
- Step 2: Modify Service struct and NewService to use EngineCache
Change Service:
type Service struct {
upstreamClient *upstream.Client
planner *engines.Planner
localEngines map[string]engines.Engine
engineCache *cache.EngineCache
}
Change NewService:
func NewService(cfg ServiceConfig) *Service {
timeout := cfg.HTTPTimeout
if timeout <= 0 {
timeout = 10 * time.Second
}
httpClient := httpclient.NewClient(timeout)
var up *upstream.Client
if cfg.UpstreamURL != "" {
c, err := upstream.NewClient(cfg.UpstreamURL, timeout)
if err == nil {
up = c
}
}
var engineCache *cache.EngineCache
if cfg.Cache != nil {
engineCache = cache.NewEngineCache(cfg.Cache, cfg.CacheTTLOverrides)
}
return &Service{
upstreamClient: up,
planner: engines.NewPlannerFromEnv(),
localEngines: engines.NewDefaultPortedEngines(httpClient, cfg.EnginesConfig),
engineCache: engineCache,
}
}
Add CacheTTLOverrides to ServiceConfig:
type ServiceConfig struct {
UpstreamURL string
HTTPTimeout time.Duration
Cache *cache.Cache
CacheTTLOverrides map[string]time.Duration
EnginesConfig *config.Config
}
- Step 3: Rewrite Search() with correct stale-while-revalidate logic
The stale-while-revalidate flow:
-
Cache lookup (Phase 1): Check cache for each engine in parallel. Classify each as:
- Fresh hit: cache has data AND not stale → deserialize, mark as
fresh - Stale hit: cache has data AND stale → keep in
cached, nofreshyet - Miss: cache has no data →
hit=false, nocachedorfresh
- Fresh hit: cache has data AND not stale → deserialize, mark as
-
Fetch (Phase 2): For each engine:
- Fresh hit: return immediately, no fetch needed
- Stale hit: return stale data immediately, fetch fresh in background
- Miss: fetch fresh synchronously, cache result
-
Collect (Phase 3): Collect all responses for merge.
// Search executes the request against local engines (in parallel) and
// optionally the upstream instance for unported engines.
func (s *Service) Search(ctx context.Context, req SearchRequest) (SearchResponse, error) {
queryHash := cache.QueryHash(
req.Query,
int(req.Pageno),
int(req.Safesearch),
req.Language,
derefString(req.TimeRange),
)
localEngineNames, upstreamEngineNames, _ := s.planner.Plan(req)
// Phase 1: Parallel cache lookups — classify each engine as fresh/stale/miss
type cacheResult struct {
engine string
cached cache.CachedEngineResponse
hit bool
fresh contracts.SearchResponse
fetchErr error
unmarshalErr bool // true if hit but unmarshal failed (treat as miss)
}
cacheResults := make([]cacheResult, len(localEngineNames))
var lookupWg sync.WaitGroup
for i, name := range localEngineNames {
lookupWg.Add(1)
go func(i int, name string) {
defer lookupWg.Done()
result := cacheResult{engine: name}
if s.engineCache != nil {
cached, ok := s.engineCache.Get(ctx, name, queryHash)
if ok {
result.hit = true
result.cached = cached
if !s.engineCache.IsStale(cached, name) {
// Fresh cache hit — deserialize and use directly
var resp contracts.SearchResponse
if err := json.Unmarshal(cached.Response, &resp); err == nil {
result.fresh = resp
} else {
// Unmarshal failed — treat as cache miss (will fetch fresh synchronously)
result.unmarshalErr = true
result.hit = false // treat as miss
}
}
// If stale: result.fresh stays zero, result.cached has stale data
}
}
cacheResults[i] = result
}(i, name)
}
lookupWg.Wait()
// Phase 2: Fetch fresh for misses and stale entries
var fetchWg sync.WaitGroup
for i, name := range localEngineNames {
cr := cacheResults[i]
// Fresh hit — nothing to do in phase 2
if cr.hit && cr.fresh.Response != nil {
continue
}
// Stale hit — return stale immediately, refresh in background
if cr.hit && cr.cached.Response != nil && s.engineCache != nil && s.engineCache.IsStale(cr.cached, name) {
fetchWg.Add(1)
go func(name string) {
defer fetchWg.Done()
eng, ok := s.localEngines[name]
if !ok {
return
}
freshResp, err := eng.Search(ctx, req)
if err != nil {
s.engineCache.Logger().Debug("background refresh failed", "engine", name, "error", err)
return
}
s.engineCache.Set(ctx, name, queryHash, freshResp)
}(name)
continue
}
// Cache miss — fetch fresh synchronously
if !cr.hit {
fetchWg.Add(1)
go func(i int, name string) {
defer fetchWg.Done()
eng, ok := s.localEngines[name]
if !ok {
cacheResults[i] = cacheResult{
engine: name,
fetchErr: fmt.Errorf("engine not registered: %s", name),
}
return
}
freshResp, err := eng.Search(ctx, req)
if err != nil {
cacheResults[i] = cacheResult{
engine: name,
fetchErr: err,
}
return
}
// Cache the fresh response
if s.engineCache != nil {
s.engineCache.Set(ctx, name, queryHash, freshResp)
}
cacheResults[i] = cacheResult{
engine: name,
fresh: freshResp,
hit: false,
}
}(i, name)
}
}
fetchWg.Wait()
// Phase 3: Collect responses for merge
responses := make([]contracts.SearchResponse, 0, len(cacheResults))
for _, cr := range cacheResults {
if cr.fetchErr != nil {
responses = append(responses, unresponsiveResponse(req.Query, cr.engine, cr.fetchErr.Error()))
continue
}
// Use fresh data if available (fresh hit or freshly fetched), otherwise use stale cached
if cr.fresh.Response != nil {
responses = append(responses, cr.fresh)
} else if cr.hit && cr.cached.Response != nil {
var resp contracts.SearchResponse
if err := json.Unmarshal(cr.cached.Response, &resp); err == nil {
responses = append(responses, resp)
}
}
}
// ... rest of upstream proxy and merge logic (unchanged) ...
}
Note: The imports need encoding/json and fmt added. The existing imports in service.go already include sync and time.
- Step 4: Run tests to verify compilation
Run: nix develop --command bash -c "go build ./internal/search/"
Expected: no output (success)
- Step 5: Run full test suite
Run: nix develop --command bash -c "go test ./..."
Expected: All pass
- Step 6: Commit
git add internal/search/service.go
git commit -m "search: wire per-engine cache with tier-aware TTLs"
Task 6: Update config.example.toml
Files:
-
Modify:
config.example.toml -
Step 1: Add TTL overrides section to config.example.toml
Add after the [cache] section:
[cache.ttl_overrides]
# Per-engine TTL overrides (uncomment to use):
# wikipedia = "48h"
# reddit = "15m"
# braveapi = "2h"
- Step 2: Commit
git add config.example.toml
git commit -m "config: add cache.ttl_overrides example"
Verification
After all tasks complete, run:
nix develop --command bash -c "go test ./... -v 2>&1 | tail -50"
All tests should pass. The search service should now cache each engine's response independently with tier-based TTLs.