# samsa

*samsa — named for Gregor Samsa, who woke to find himself transformed. You wanted results; you got a metasearch engine.*

A privacy-respecting, open metasearch engine written in Go. SearXNG-compatible API with an HTML frontend, designed to be fast, lightweight, and deployable anywhere.

**11 engines. No JavaScript required. No tracking. One binary.**

## Features

- **SearXNG-compatible API** — drop-in replacement for existing integrations
- **11 search engines** — Wikipedia, arXiv, Crossref, Brave Search API, Brave (scraping), Qwant, DuckDuckGo, GitHub, Reddit, Bing, Google, YouTube
- **Stack Overflow** — bonus engine, not enabled by default
- **HTML frontend** — Go templates + HTMX with instant search, dark mode, responsive design
- **Valkey cache** — optional Redis-compatible caching with configurable TTL
- **Rate limiting** — three layers: per-IP, burst, and global (all disabled by default)
- **CORS** — configurable origins for browser-based clients
- **OpenSearch** — browsers can add samsa as a search engine from the address bar
- **Graceful degradation** — individual engine failures don't kill the whole search
- **Docker** — multi-stage build, static binary, ~20MB runtime image
- **NixOS** — native NixOS module with systemd service

## Quick Start

### Binary

```bash
git clone https://git.ashisgreat.xyz/penal-colony/samsa.git
cd samsa
go build ./cmd/samsa
./samsa -config config.toml
```

### Docker Compose

```bash
cp config.example.toml config.toml
# Edit config.toml — set your Brave API key, YouTube API key, etc.
docker compose up -d
```

### NixOS

Add to your flake inputs:

```nix
inputs.samsa.url = "git+https://git.ashisgreat.xyz/penal-colony/samsa.git";
```

Enable in your configuration:

```nix
imports = [ inputs.samsa.nixosModules.default ];

services.samsa = {
  enable = true;
  openFirewall = true;
  baseUrl = "https://search.example.com";
  # config = "/etc/samsa/config.toml";  # default
};
```

Write your config:

```bash
sudo mkdir -p /etc/samsa
sudo cp config.example.toml /etc/samsa/config.toml
sudo $EDITOR /etc/samsa/config.toml
```

Deploy:

```bash
sudo nixos-rebuild switch --flake .#
```

### Nix Development Shell

```bash
nix develop
go test ./...
go run ./cmd/samsa -config config.toml
```

## Endpoints

| Endpoint | Description |
|---|---|
| `GET /` | HTML search page |
| `GET /search?q=…&format=html` | HTML results (full page or HTMX fragment) |
| `GET/POST /search` | JSON/CSV/RSS results |
| `GET /opensearch.xml` | OpenSearch description XML |
| `GET /healthz` | Health check |
| `GET /static/*` | Embedded CSS, images, favicon |

## Search API

### Parameters

| Parameter | Default | Description |
|---|---|---|
| `q` | — | Search query (required) |
| `format` | `json` | `json`, `csv`, `rss`, `html` |
| `pageno` | `1` | Page number |
| `safesearch` | `0` | Safe search level (0–2) |
| `time_range` | — | `day`, `week`, `month`, `year` |
| `language` | `auto` | BCP-47 language code |
| `engines` | all | Comma-separated engine names |

### Example

```bash
curl "http://localhost:8080/search?q=golang&format=json&engines=github,duckduckgo"
```

### Response (JSON)

```json
{
  "query": "golang",
  "number_of_results": 14768,
  "results": [
    {
      "title": "The Go Programming Language",
      "url": "https://go.dev/",
      "content": "Go is an open source programming language...",
      "engine": "duckduckgo",
      "score": 1.0,
      "type": "result"
    }
  ],
  "suggestions": ["golang tutorial", "golang vs rust"],
  "unresponsive_engines": []
}
```

## Configuration

Copy `config.example.toml` to `config.toml` and edit.
All settings can also be overridden via environment variables (listed in the example file).

### Key Sections

- **`[server]`** — port, timeout, public base URL for OpenSearch
- **`[upstream]`** — optional upstream metasearch proxy for unported engines
- **`[engines]`** — which engines run locally, engine-specific settings
- **`[engines.brave]`** — Brave Search API key
- **`[engines.youtube]`** — YouTube Data API v3 key
- **`[cache]`** — Valkey/Redis address, password, TTL
- **`[cors]`** — allowed origins and methods
- **`[rate_limit]`** — per-IP sliding window (30 req/min default)
- **`[global_rate_limit]`** — server-wide limit (disabled by default)
- **`[burst_rate_limit]`** — per-IP burst + sustained windows (disabled by default)

### Environment Variables

| Variable | Description |
|---|---|
| `PORT` | Listen port (default: 8080) |
| `BASE_URL` | Public URL for OpenSearch XML |
| `UPSTREAM_SEARXNG_URL` | Upstream instance URL |
| `LOCAL_PORTED_ENGINES` | Comma-separated local engine list |
| `HTTP_TIMEOUT` | Upstream request timeout |
| `BRAVE_API_KEY` | Brave Search API key |
| `BRAVE_ACCESS_TOKEN` | Gate requests with token |
| `YOUTUBE_API_KEY` | YouTube Data API v3 key |
| `VALKEY_ADDRESS` | Valkey/Redis address |
| `VALKEY_PASSWORD` | Valkey/Redis password |
| `VALKEY_CACHE_TTL` | Cache TTL |

See `config.example.toml` for the full list including rate limiting and CORS variables.
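
As a rough illustration of how an environment variable can take precedence over a file value, here is a minimal Go sketch. The `Config` struct and `applyEnvOverrides` function are hypothetical names invented for this example and are not taken from samsa's source. Only the `PORT` variable and its default of 8080 come from the table above, and the validation rules are an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config is a hypothetical stand-in holding only the one setting used in
// this sketch; samsa's real configuration struct is not shown in this README.
type Config struct {
	Port int
}

// applyEnvOverrides replaces file-derived values with environment values
// when they are set and valid. The lookup function is injected so the logic
// can be exercised without touching the process environment.
func applyEnvOverrides(cfg Config, lookup func(string) (string, bool)) (Config, error) {
	if raw, ok := lookup("PORT"); ok && raw != "" {
		port, err := strconv.Atoi(raw)
		if err != nil || port < 1 || port > 65535 {
			return cfg, fmt.Errorf("invalid PORT %q", raw)
		}
		cfg.Port = port
	}
	return cfg, nil
}

func main() {
	// Pretend this value was loaded from config.toml.
	cfg := Config{Port: 8080}

	cfg, err := applyEnvOverrides(cfg, os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("listening on :%d\n", cfg.Port)
}
```

Running the sketch with `PORT=9090` set would report port 9090, while an unset or empty `PORT` leaves the file value in place. Rejecting malformed values instead of silently falling back is a design choice of this sketch, not a documented samsa behavior.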
## Engines

| Engine | Source | Notes |
|---|---|---|
| Wikipedia | MediaWiki API | General knowledge |
| arXiv | arXiv API | Academic papers |
| Crossref | Crossref API | Academic metadata |
| Brave Search API | Brave API | General web (requires API key) |
| Brave | Brave Lite HTML | General web (no key needed) |
| Qwant | Qwant Lite HTML | General web |
| DuckDuckGo | DDG Lite HTML | General web |
| GitHub | GitHub Search API v3 | Code and repositories |
| Reddit | Reddit JSON API | Discussions |
| Bing | Bing RSS | General web |
| Google | GSA User-Agent scraping | General web (no API key) |
| YouTube | YouTube Data API v3 | Videos (requires API key) |
| Stack Overflow | Stack Exchange API | Q&A (registered, not enabled by default) |

Engines not listed in `engines.local_ported` are proxied to an upstream metasearch instance if `upstream.url` is configured.

### API Keys

Brave Search API and YouTube Data API require keys. If omitted, those engines are silently skipped. Brave Lite (scraping) and Google (GSA UA scraping) work without keys.

## Architecture

```
┌───────────────────────────────────────┐
│             HTTP Handler              │
│       /search / /opensearch.xml       │
├───────────────────────────────────────┤
│           Middleware Chain            │
│    Global → Burst → Per-IP → CORS     │
├───────────────────────────────────────┤
│            Search Service             │
│       Parallel engine execution       │
│   WaitGroup + graceful degradation    │
├───────────────────────────────────────┤
│              Cache Layer              │
│   Valkey/Redis (optional; no-op if    │
│             unconfigured)             │
├───────────────────────────────────────┤
│         Engines (×11 default)         │
│    Each runs in its own goroutine     │
│    Failures → unresponsive_engines    │
└───────────────────────────────────────┘
```

## Docker

The Dockerfile uses a multi-stage build with a static Go binary on Alpine Linux:

```bash
# Build:   golang:1.24-alpine
# Runtime: alpine:3.21 (~20MB)
# CGO_ENABLED=0 — fully static
docker compose up -d
```

Includes Valkey 8 with health checks out of the box.
## Contributing

See [docs/CONTRIBUTING.md](docs/CONTRIBUTING.md) for a walkthrough of adding a new engine. The interface is two methods: `Name()` and `Search(context, request)`.

## License

[AGPLv3](https://www.gnu.org/licenses/agpl-3.0.html)