# kafka

A privacy-respecting, open metasearch engine written in Go. SearXNG-compatible API with an HTML frontend, designed to be fast, lightweight, and deployable anywhere.

**9 engines. No JavaScript. No tracking. One binary.**

## Features

- **SearXNG-compatible API** — drop-in replacement for existing integrations
- **9 search engines** — Wikipedia, arXiv, Crossref, Brave, Qwant, DuckDuckGo, GitHub, Reddit, Bing
- **HTML frontend** — HTMX + Go templates with instant search, dark mode, responsive design
- **Valkey cache** — optional Redis-compatible caching with configurable TTL
- **Rate limiting** — three layers: per-IP, burst, and global (all disabled by default)
- **CORS** — configurable origins for browser-based clients
- **OpenSearch** — browsers can add kafka as a search engine from the address bar
- **Graceful degradation** — individual engine failures don't kill the whole search
- **Docker** — multi-stage build, ~20MB runtime image
- **NixOS** — native NixOS module with systemd service
## Quick Start

### Binary

```bash
# Clone into a directory named kafka (the repository itself is called gosearch)
git clone https://git.ashisgreat.xyz/penal-colony/gosearch.git kafka
cd kafka
go build ./cmd/kafka
cp config.example.toml config.toml
./kafka -config config.toml
```
### Docker Compose

```bash
cp config.example.toml config.toml
# Edit config.toml — set your Brave API key, etc.
docker compose up -d
```
### NixOS

Add to your flake inputs:

```nix
inputs.kafka.url = "git+https://git.ashisgreat.xyz/penal-colony/gosearch.git";
```

Enable in your configuration:

```nix
imports = [ inputs.kafka.nixosModules.default ];

services.kafka = {
  enable = true;
  openFirewall = true;
  baseUrl = "https://search.example.com";
  # config = "/etc/kafka/config.toml";  # default
};
```

Write your config:

```bash
sudo mkdir -p /etc/kafka
sudo cp config.example.toml /etc/kafka/config.toml
sudo $EDITOR /etc/kafka/config.toml
```

Deploy:

```bash
sudo nixos-rebuild switch --flake .#
```

### Nix Development Shell

```bash
nix develop
go test ./...
go run ./cmd/kafka -config config.toml
```
## Endpoints

| Endpoint | Description |
|---|---|
| `GET /` | HTML search page |
| `GET /search?q=…&format=html` | HTML results (full page or HTMX fragment) |
| `GET/POST /search` | JSON/CSV/RSS results |
| `GET /opensearch.xml` | OpenSearch description XML |
| `GET /healthz` | Health check |
| `GET /static/*` | Embedded CSS, images, favicon |
## Search API

### Parameters

| Parameter | Default | Description |
|---|---|---|
| `q` | — | Search query (required) |
| `format` | `json` | `json`, `csv`, `rss`, `html` |
| `pageno` | `1` | Page number |
| `safesearch` | `0` | Safe search level (0–2) |
| `time_range` | — | `day`, `week`, `month`, `year` |
| `language` | `auto` | BCP-47 language code |
| `engines` | all | Comma-separated engine names |
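
A client only needs to encode these parameters into a query string. As a minimal sketch, assuming a kafka instance at `http://localhost:8080`, the Go helper below (the names `SearchParams` and `BuildSearchURL` are hypothetical, not part of kafka) omits zero-valued fields so the server-side defaults in the table apply, and joins `engines` with commas.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// SearchParams mirrors the documented /search query parameters.
// Zero values are left out of the URL so server defaults apply.
type SearchParams struct {
	Q          string   // required
	Format     string   // json (default), csv, rss, html
	PageNo     int      // only sent for pages after the first
	SafeSearch int      // 0-2; 0 is the default
	TimeRange  string   // day, week, month, year
	Language   string   // BCP-47 code; "auto" by default
	Engines    []string // empty means all engines
}

// BuildSearchURL assembles a GET /search URL for the instance at base.
func BuildSearchURL(base string, p SearchParams) (string, error) {
	if strings.TrimSpace(p.Q) == "" {
		return "", fmt.Errorf("q is required")
	}
	if p.SafeSearch < 0 || p.SafeSearch > 2 {
		return "", fmt.Errorf("safesearch must be 0-2, got %d", p.SafeSearch)
	}
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = strings.TrimRight(u.Path, "/") + "/search"

	v := url.Values{}
	v.Set("q", p.Q)
	if p.Format != "" {
		v.Set("format", p.Format)
	}
	if p.PageNo > 1 {
		v.Set("pageno", strconv.Itoa(p.PageNo))
	}
	if p.SafeSearch != 0 {
		v.Set("safesearch", strconv.Itoa(p.SafeSearch))
	}
	if p.TimeRange != "" {
		v.Set("time_range", p.TimeRange)
	}
	if p.Language != "" {
		v.Set("language", p.Language)
	}
	if len(p.Engines) > 0 {
		v.Set("engines", strings.Join(p.Engines, ","))
	}
	u.RawQuery = v.Encode() // sorts keys and percent-encodes values
	return u.String(), nil
}

func main() {
	u, err := BuildSearchURL("http://localhost:8080", SearchParams{
		Q:       "golang",
		Format:  "json",
		Engines: []string{"github", "duckduckgo"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```

`url.Values.Encode` sorts the keys and encodes the comma as `%2C`; standard query parsing decodes it back to `github,duckduckgo`, so the request is equivalent to the `curl` example.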
### Example

```bash
curl "http://localhost:8080/search?q=golang&format=json&engines=github,duckduckgo"
```

### Response (JSON)

```json
{
  "query": "golang",
  "number_of_results": 14768,
  "results": [
    {
      "title": "The Go Programming Language",
      "url": "https://go.dev/",
      "content": "Go is an open source programming language...",
      "engine": "duckduckgo",
      "score": 1.0,
      "type": "result"
    }
  ],
  "suggestions": ["golang tutorial", "golang vs rust"],
  "unresponsive_engines": []
}
```
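
To consume this response from Go, structs mirroring the fields above are enough. The sketch below is an assumption-based client, not an official SDK: the `Response`/`Result` type names are hypothetical, and `unresponsive_engines` is kept as raw JSON because the example only shows an empty array and does not reveal the shape of its elements.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Result is one entry of the "results" array.
type Result struct {
	Title   string  `json:"title"`
	URL     string  `json:"url"`
	Content string  `json:"content"`
	Engine  string  `json:"engine"`
	Score   float64 `json:"score"`
	Type    string  `json:"type"`
}

// Response covers the top-level fields shown in the example.
type Response struct {
	Query               string            `json:"query"`
	NumberOfResults     int               `json:"number_of_results"`
	Results             []Result          `json:"results"`
	Suggestions         []string          `json:"suggestions"`
	UnresponsiveEngines []json.RawMessage `json:"unresponsive_engines"` // element shape not documented
}

// sampleResponse is the example body from this README.
const sampleResponse = `{
  "query": "golang",
  "number_of_results": 14768,
  "results": [
    {
      "title": "The Go Programming Language",
      "url": "https://go.dev/",
      "content": "Go is an open source programming language...",
      "engine": "duckduckgo",
      "score": 1.0,
      "type": "result"
    }
  ],
  "suggestions": ["golang tutorial", "golang vs rust"],
  "unresponsive_engines": []
}`

// decodeResponse parses a body returned by GET /search?format=json.
func decodeResponse(body []byte) (Response, error) {
	var r Response
	if err := json.Unmarshal(body, &r); err != nil {
		return Response{}, err
	}
	return r, nil
}

func main() {
	r, err := decodeResponse([]byte(sampleResponse))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q: %d results reported\n", r.Query, r.NumberOfResults)
	for _, res := range r.Results {
		fmt.Printf("[%s] %s <%s>\n", res.Engine, res.Title, res.URL)
	}
}
```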
## Configuration

Copy `config.example.toml` to `config.toml` and edit. All settings can also be overridden via environment variables (listed in the example file).

### Key Sections

- **`[server]`** — port, timeout, public base URL for OpenSearch
- **`[upstream]`** — optional upstream metasearch proxy for unported engines
- **`[engines]`** — which engines run locally, engine-specific settings
- **`[cache]`** — Valkey/Redis address, password, TTL
- **`[cors]`** — allowed origins and methods
- **`[rate_limit]`** — per-IP sliding window (30 req/min default)
- **`[global_rate_limit]`** — server-wide limit (disabled by default)
- **`[burst_rate_limit]`** — per-IP burst + sustained windows (disabled by default)
### Environment Variables

| Variable | Description |
|---|---|
| `PORT` | Listen port (default: 8080) |
| `BASE_URL` | Public URL for OpenSearch XML |
| `UPSTREAM_SEARXNG_URL` | Upstream instance URL |
| `LOCAL_PORTED_ENGINES` | Comma-separated local engine list |
| `HTTP_TIMEOUT` | Upstream request timeout |
| `BRAVE_API_KEY` | Brave Search API key |
| `BRAVE_ACCESS_TOKEN` | Gate requests with token |
| `VALKEY_ADDRESS` | Valkey/Redis address |
| `VALKEY_PASSWORD` | Valkey/Redis password |
| `VALKEY_CACHE_TTL` | Cache TTL |

See `config.example.toml` for the full list including rate limiting and CORS variables.
## Engines

| Engine | Source | Notes |
|---|---|---|
| Wikipedia | MediaWiki API | General knowledge |
| arXiv | arXiv API | Academic papers |
| Crossref | Crossref API | Academic metadata |
| Brave | Brave Search API | General web (requires API key) |
| Qwant | Qwant Lite HTML | General web |
| DuckDuckGo | DDG Lite HTML | General web |
| GitHub | GitHub Search API v3 | Code and repositories |
| Reddit | Reddit JSON API | Discussions |
| Bing | Bing RSS | General web |

Engines not listed in `engines.local_ported` are proxied to an upstream metasearch instance if `upstream.url` is configured.
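
The routing rule above amounts to partitioning the requested engines. Below is a minimal Go sketch, assuming engine names are matched case-insensitively and that unported engines are simply reported as skipped when no `upstream.url` is set (the README does not specify that case). `partitionEngines` and the engine name `google` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// partitionEngines (hypothetical) splits requested engine names into those
// served by local Go implementations and those proxied upstream.
// Without an upstream URL, unported engines are returned as skipped.
func partitionEngines(requested, localPorted []string, upstreamURL string) (local, upstream, skipped []string) {
	ported := make(map[string]bool, len(localPorted))
	for _, e := range localPorted {
		ported[strings.ToLower(strings.TrimSpace(e))] = true
	}
	for _, e := range requested {
		name := strings.ToLower(strings.TrimSpace(e))
		switch {
		case name == "":
			// ignore empty entries such as a trailing comma
		case ported[name]:
			local = append(local, name)
		case upstreamURL != "":
			upstream = append(upstream, name)
		default:
			skipped = append(skipped, name)
		}
	}
	return local, upstream, skipped
}

func main() {
	localPorted := []string{"wikipedia", "arxiv", "github", "bing"}
	requested := strings.Split("github,Bing,google", ",")
	local, upstream, skipped := partitionEngines(requested, localPorted, "http://upstream.example:8888")
	fmt.Println("local:", local)
	fmt.Println("upstream:", upstream)
	fmt.Println("skipped:", skipped)
}
```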
## Architecture

```
┌─────────────────────────────────────┐
│            HTTP Handler             │
│     /search, /, /opensearch.xml     │
├─────────────────────────────────────┤
│          Middleware Chain           │
│   Global → Burst → Per-IP → CORS    │
├─────────────────────────────────────┤
│           Search Service            │
│      Parallel engine execution      │
│  WaitGroup + graceful degradation   │
├─────────────────────────────────────┤
│             Cache Layer             │
│  Valkey/Redis (optional, no-op if   │
│            unconfigured)            │
├─────────────────────────────────────┤
│            Engines (×9)             │
│   Each runs in its own goroutine    │
│   Failures → unresponsive_engines   │
└─────────────────────────────────────┘
```
## Docker

The Dockerfile uses a multi-stage build:

- Build stage: `golang:1.24-alpine`
- Runtime stage: `alpine:3.21` (~20MB image)
- `CGO_ENABLED=0`, producing a static binary

```bash
docker compose up -d
```

The Compose file includes Valkey 8 with health checks out of the box.

## License

[AGPLv3](https://www.gnu.org/licenses/agpl-3.0.html)