diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..bba67e1
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,74 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+kafka is a privacy-respecting metasearch engine written in Go. It provides a SearXNG-compatible `/search` API and an HTML frontend (HTMX + Go templates). 9 engines are implemented natively in Go; unlisted engines can be proxied to an upstream SearXNG instance. Responses from multiple engines are merged into a single JSON/CSV/RSS/HTML response.
+
+## Build & Run Commands
+
+```bash
+# Enter Nix dev shell (provides Go 1.24 toolchain + curl)
+nix develop
+
+# Run all tests
+go test ./...
+
+# Run a single test
+go test -run TestWikipedia ./internal/engines/
+
+# Run tests in a specific package with verbose output
+go test -v ./internal/engines/
+
+# Run the server (requires config.toml)
+go run ./cmd/searxng-go -config config.toml
+```
+
+There is no Makefile. There is no linter configured.
+
+## Architecture
+
+**Request flow:** HTTP request -> middleware chain (global rate limit -> burst rate limit -> per-IP rate limit -> CORS) -> HTTP handler -> `search.Service` (cache check) -> `engines.Planner` (splits into local vs upstream) -> parallel local engine execution + upstream proxy -> `MergeResponses` -> cache write -> serialize (JSON/CSV/RSS/HTML).
+
+**Key packages:**
+
+- `internal/contracts` — Shared types: `SearchRequest`, `SearchResponse`, `MainResult`, `OutputFormat`. `MainResult` preserves unknown JSON keys from upstream via a `raw map[string]any` field and round-trips them faithfully.
+- `internal/config` — TOML-based configuration with env var fallbacks. `Load(path)` reads `config.toml`; env vars override zero-value fields. See `config.example.toml` for all settings.
+- `internal/engines` — `Engine` interface and all 9 Go-native implementations. `factory.go` registers engines via `NewDefaultPortedEngines()`. `planner.go` routes engines to local or upstream based on `LOCAL_PORTED_ENGINES` env var.
+- `internal/search` — `Service` orchestrates the pipeline: cache check, planning, parallel engine execution via goroutines/WaitGroup, upstream proxying, response merging. Individual engine failures are reported as `unresponsive_engines` rather than aborting the search. Qwant has fallback logic to upstream on empty results.
+- `internal/httpapi` — HTTP handlers for `/`, `/search`, `/healthz`, `/opensearch.xml`. Detects HTMX requests via `HX-Request` header to return fragments instead of full pages.
+- `internal/upstream` — Client that proxies requests to an upstream SearXNG instance via POST.
+- `internal/cache` — Valkey/Redis-backed cache with SHA-256 cache keys. No-op if unconfigured.
+- `internal/middleware` — Three rate limiters (per-IP sliding window, burst+sustained, global) and CORS. All disabled by default.
+- `internal/views` — HTML templates and static files embedded via `//go:embed`. Renders full pages or HTMX fragments. Templates: `base.html`, `index.html`, `results.html`, `results_inner.html`, `result_item.html`.
+- `cmd/searxng-go` — Entry point. Loads TOML config, seeds env vars for engine code, wires up middleware chain, starts HTTP server.
+
+**Engine interface** (`internal/engines/engine.go`):
+```go
+type Engine interface {
+	Name() string
+	Search(ctx context.Context, req contracts.SearchRequest) (contracts.SearchResponse, error)
+}
+```
+
+**Adding a new engine:**
+1. Create a new struct implementing the `Engine` interface in `internal/engines/` (single file, e.g., `newengine.go`)
+2. Add a test file alongside it (use `roundTripperFunc` and `httpResponse` helpers in `http_mock_test.go` for mocking HTTP)
+3. Register it in `NewDefaultPortedEngines()` in `factory.go`
+4. Add its name to `defaultPortedEngines` in `planner.go`
+5. Add category mappings in `inferFromCategories()` if applicable
+
+## Configuration
+
+Config is loaded from `config.toml` (see `config.example.toml`). All fields can be overridden via environment variables (env vars take precedence over zero-value TOML fields). Key sections: `[server]`, `[upstream]`, `[engines]`, `[cache]`, `[cors]`, `[rate_limit]`, `[global_rate_limit]`, `[burst_rate_limit]`.
+
+## Conventions
+
+- Module path: `github.com/metamorphosis-dev/kafka`
+- Tests use shared mock helpers in `internal/engines/http_mock_test.go` (`roundTripperFunc`, `httpResponse`)
+- Engine implementations are single files under `internal/engines/` (e.g., `wikipedia.go`, `duckduckgo.go`)
+- Response merging de-duplicates by `engine|title|url` key; suggestions/corrections are merged as sets
+- `MainResult` uses custom `UnmarshalJSON`/`MarshalJSON` to preserve unknown upstream JSON keys
+- HTML templates and static files are embedded at build time via `//go:embed` in `internal/views/`
+- Structured logging via `log/slog` with JSON handler
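
The merging convention listed under Conventions (de-duplicate by `engine|title|url`, merge suggestions/corrections as sets) can be illustrated with a small standalone sketch. This is not the repository's `MergeResponses`: `Result`, `dedupKey`, `mergeResults`, and `mergeSet` are hypothetical names introduced here, `Result` is a simplified stand-in for `contracts.MainResult`, and keeping the first occurrence in input order is an assumption.

```go
package main

import "fmt"

// Result is a hypothetical, simplified stand-in for contracts.MainResult.
// Only the fields that feed the de-duplication key are modeled.
type Result struct {
	Engine string
	Title  string
	URL    string
}

// dedupKey builds the "engine|title|url" key named in the conventions.
func dedupKey(r Result) string {
	return r.Engine + "|" + r.Title + "|" + r.URL
}

// mergeResults walks the batches in order and keeps the first result seen
// for each key (first-wins is an assumption, not confirmed by the source).
func mergeResults(batches ...[]Result) []Result {
	seen := make(map[string]struct{})
	var merged []Result
	for _, batch := range batches {
		for _, r := range batch {
			k := dedupKey(r)
			if _, dup := seen[k]; dup {
				continue
			}
			seen[k] = struct{}{}
			merged = append(merged, r)
		}
	}
	return merged
}

// mergeSet merges string lists (e.g. suggestions or corrections) as a set,
// preserving the order in which values were first seen.
func mergeSet(lists ...[]string) []string {
	seen := make(map[string]struct{})
	var out []string
	for _, list := range lists {
		for _, s := range list {
			if _, dup := seen[s]; dup {
				continue
			}
			seen[s] = struct{}{}
			out = append(out, s)
		}
	}
	return out
}

func main() {
	local := []Result{
		{Engine: "wikipedia", Title: "Go", URL: "https://go.dev"},
	}
	upstream := []Result{
		{Engine: "wikipedia", Title: "Go", URL: "https://go.dev"},  // duplicate key, dropped
		{Engine: "duckduckgo", Title: "Go", URL: "https://go.dev"}, // different engine, kept
	}
	for _, r := range mergeResults(local, upstream) {
		fmt.Println(dedupKey(r))
	}
	fmt.Println(mergeSet([]string{"golang"}, []string{"golang", "go language"}))
}
```

Because the engine name is part of the key, the same URL returned by two different engines stays in the merged output as two entries; only exact repeats from the same engine are collapsed.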