samsa/README.md
Franz Kafka 71b96598ed
docs: refresh README — 11 engines, accurate clone URLs, API key clarity
2026-03-23 14:07:49 +00:00

# samsa
*samsa — named for Gregor Samsa, who woke to find himself transformed. You wanted results; you got a metasearch engine.*
A privacy-respecting, open metasearch engine written in Go. SearXNG-compatible API with an HTML frontend, designed to be fast, lightweight, and deployable anywhere.
**11 engines. No JavaScript required. No tracking. One binary.**
## Features
- **SearXNG-compatible API** — drop-in replacement for existing integrations
- **11 search engines** — Wikipedia, arXiv, Crossref, Brave Search API, Brave (scraping), Qwant, DuckDuckGo, GitHub, Reddit, Bing, Google, YouTube
- **Stack Overflow** — bonus engine, not enabled by default
- **HTML frontend** — Go templates + HTMX with instant search, dark mode, responsive design
- **Valkey cache** — optional Redis-compatible caching with configurable TTL
- **Rate limiting** — three layers: per-IP, burst, and global (all disabled by default)
- **CORS** — configurable origins for browser-based clients
- **OpenSearch** — browsers can add samsa as a search engine from the address bar
- **Graceful degradation** — individual engine failures don't kill the whole search
- **Docker** — multi-stage build, static binary, ~20MB runtime image
- **NixOS** — native NixOS module with systemd service
## Quick Start
### Binary
```bash
git clone https://git.ashisgreat.xyz/penal-colony/samsa.git
cd samsa
go build ./cmd/samsa
./samsa -config config.toml
```
### Docker Compose
```bash
cp config.example.toml config.toml
# Edit config.toml — set your Brave API key, YouTube API key, etc.
docker compose up -d
```
### NixOS
Add to your flake inputs:
```nix
inputs.samsa.url = "git+https://git.ashisgreat.xyz/penal-colony/samsa.git";
```
Enable in your configuration:
```nix
imports = [ inputs.samsa.nixosModules.default ];
services.samsa = {
  enable = true;
  openFirewall = true;
  baseUrl = "https://search.example.com";
  # config = "/etc/samsa/config.toml";  # default
};
```
Write your config:
```bash
sudo mkdir -p /etc/samsa
sudo cp config.example.toml /etc/samsa/config.toml
sudo $EDITOR /etc/samsa/config.toml
```
Deploy:
```bash
sudo nixos-rebuild switch --flake .#
```
### Nix Development Shell
```bash
nix develop
go test ./...
go run ./cmd/samsa -config config.toml
```
## Endpoints
| Endpoint | Description |
|---|---|
| `GET /` | HTML search page |
| `GET /search?q=…&format=html` | HTML results (full page or HTMX fragment) |
| `GET/POST /search` | JSON/CSV/RSS results |
| `GET /opensearch.xml` | OpenSearch description XML |
| `GET /healthz` | Health check |
| `GET /static/*` | Embedded CSS, images, favicon |
## Search API
### Parameters
| Parameter | Default | Description |
|---|---|---|
| `q` | — | Search query (required) |
| `format` | `json` | `json`, `csv`, `rss`, `html` |
| `pageno` | `1` | Page number |
| `safesearch` | `0` | Safe search level (0–2) |
| `time_range` | — | `day`, `week`, `month`, `year` |
| `language` | `auto` | BCP-47 language code |
| `engines` | all | Comma-separated engine names |
### Example
```bash
curl "http://localhost:8080/search?q=golang&format=json&engines=github,duckduckgo"
```
### Response (JSON)
```json
{
  "query": "golang",
  "number_of_results": 14768,
  "results": [
    {
      "title": "The Go Programming Language",
      "url": "https://go.dev/",
      "content": "Go is an open source programming language...",
      "engine": "duckduckgo",
      "score": 1.0,
      "type": "result"
    }
  ],
  "suggestions": ["golang tutorial", "golang vs rust"],
  "unresponsive_engines": []
}
```
## Configuration
Copy `config.example.toml` to `config.toml` and edit. All settings can also be overridden via environment variables (listed in the example file).
### Key Sections
- **`[server]`** — port, timeout, public base URL for OpenSearch
- **`[upstream]`** — optional upstream metasearch proxy for unported engines
- **`[engines]`** — which engines run locally, engine-specific settings
- **`[engines.brave]`** — Brave Search API key
- **`[engines.youtube]`** — YouTube Data API v3 key
- **`[cache]`** — Valkey/Redis address, password, TTL
- **`[cors]`** — allowed origins and methods
- **`[rate_limit]`** — per-IP sliding window (30 req/min default)
- **`[global_rate_limit]`** — server-wide limit (disabled by default)
- **`[burst_rate_limit]`** — per-IP burst + sustained windows (disabled by default)
### Environment Variables
| Variable | Description |
|---|---|
| `PORT` | Listen port (default: 8080) |
| `BASE_URL` | Public URL for OpenSearch XML |
| `UPSTREAM_SEARXNG_URL` | Upstream instance URL |
| `LOCAL_PORTED_ENGINES` | Comma-separated local engine list |
| `HTTP_TIMEOUT` | Upstream request timeout |
| `BRAVE_API_KEY` | Brave Search API key |
| `BRAVE_ACCESS_TOKEN` | Gate requests with token |
| `YOUTUBE_API_KEY` | YouTube Data API v3 key |
| `VALKEY_ADDRESS` | Valkey/Redis address |
| `VALKEY_PASSWORD` | Valkey/Redis password |
| `VALKEY_CACHE_TTL` | Cache TTL |
See `config.example.toml` for the full list including rate limiting and CORS variables.
## Engines
| Engine | Source | Notes |
|---|---|---|
| Wikipedia | MediaWiki API | General knowledge |
| arXiv | arXiv API | Academic papers |
| Crossref | Crossref API | Academic metadata |
| Brave Search API | Brave API | General web (requires API key) |
| Brave | Brave Lite HTML | General web (no key needed) |
| Qwant | Qwant Lite HTML | General web |
| DuckDuckGo | DDG Lite HTML | General web |
| GitHub | GitHub Search API v3 | Code and repositories |
| Reddit | Reddit JSON API | Discussions |
| Bing | Bing RSS | General web |
| Google | GSA User-Agent scraping | General web (no API key) |
| YouTube | YouTube Data API v3 | Videos (requires API key) |
| Stack Overflow | Stack Exchange API | Q&A (registered, not enabled by default) |
Engines not listed in `engines.local_ported` are proxied to an upstream metasearch instance if `upstream.url` is configured.
### API Keys
Brave Search API and YouTube Data API require keys. If omitted, those engines are silently skipped. Brave Lite (scraping) and Google (GSA UA scraping) work without keys.
## Architecture
```
┌─────────────────────────────────────┐
│  HTTP Handler                       │
│  /search / /opensearch.xml          │
├─────────────────────────────────────┤
│  Middleware Chain                   │
│  Global → Burst → Per-IP → CORS     │
├─────────────────────────────────────┤
│  Search Service                     │
│  Parallel engine execution          │
│  WaitGroup + graceful degradation   │
├─────────────────────────────────────┤
│  Cache Layer                        │
│  Valkey/Redis (optional; no-op if   │
│  unconfigured)                      │
├─────────────────────────────────────┤
│  Engines (×11 default)              │
│  Each runs in its own goroutine     │
│  Failures → unresponsive_engines    │
└─────────────────────────────────────┘
```
## Docker
The Dockerfile uses a multi-stage build that produces a static Go binary on Alpine Linux:
```bash
# Build: golang:1.24-alpine
# Runtime: alpine:3.21 (~20MB)
# CGO_ENABLED=0 — fully static
docker compose up -d
```
Includes Valkey 8 with health checks out of the box.
## Contributing
See [docs/CONTRIBUTING.md](docs/CONTRIBUTING.md) for a walkthrough of adding a new engine. The interface is two methods: `Name()` and `Search(context, request)`.
## License
[AGPLv3](https://www.gnu.org/licenses/agpl-3.0.html)