Commit graph

6 commits

Author SHA1 Message Date
b3e3123612 security: fix build errors, add honest Google UA, sanitize error msgs
- Fix config validation: upstream URLs allow private IPs (self-hosted)
- Fix util.SafeURLScheme to return parsed URL
- Replace spoofed GSA User-Agent with honest Kafka UA
- Sanitize all engine error messages (strip response bodies)
- Replace unused body reads with io.Copy(io.Discard, ...) for reuse
- Fix pre-existing braveapi_test using wrong struct type
- Fix ratelimit test reference to limiter variable
- Update ratelimit tests for new trusted proxy behavior
2026-03-22 16:27:49 +00:00
da367a1bfd security: harden against SAST findings (criticals through mediums)
Critical:
- Validate baseURL/sourceURL/upstreamURL at config load time
  (prevents XML injection, XSS, SSRF via config/env manipulation)
- Use xml.Escape for OpenSearch XML template interpolation

High:
- Add security headers middleware (CSP, X-Frame-Options, HSTS, etc.)
- Sanitize result URLs to reject javascript:/data: schemes
- Sanitize infobox img_src against dangerous URL schemes
- Default CORS to deny-all (was wildcard *)

Medium:
- Rate limiter: X-Forwarded-For only trusted from configured proxies
- Validate engine names against known registry allowlist
- Add 1024-char max query length
- Sanitize upstream error messages (strip raw response bodies)
- Upstream client validates URL scheme (http/https only)

Test updates:
- Update extractIP tests for new trusted proxy behavior
2026-03-22 16:22:27 +00:00
7be03b4017 license: change from MIT to AGPLv3
Update LICENSE file and add AGPL header to all source files.

AGPLv3 ensures that if someone runs Kafka as a network service and
modifies it, they must release their source code under the same license.
2026-03-22 08:27:23 +00:00
6346fb7155 chore: update Go module path to github.com/metamorphosis-dev/kafka
Module path now matches the GitHub mirror location.
All internal imports updated across 35+ files.
2026-03-21 19:42:01 +00:00
e5295fa69d chore: rename project from gosearch to kafka
A search engine named after a man who proved answers don't exist.

Renamed everywhere user-facing:
- Brand name, UI titles, OpenSearch description, CSS filename
- Docker service name, NixOS module (services.kafka)
- Cache key prefix (kafka:), User-Agent strings (kafka/0.1)
- README, config.example.toml, flake.nix descriptions

Kept unchanged (internal):
- Go module path: github.com/ashie/gosearch
- Git repository URL: git.ashisgreat.xyz/penal-colony/gosearch
- Binary entrypoint: cmd/searxng-go
2026-03-21 19:20:47 +00:00
df8fe9474b feat: add DuckDuckGo, GitHub, Reddit, and Bing engines
- DuckDuckGo: scrapes Lite HTML endpoint for results
  - Language-aware region mapping (de→de-de, ja→jp-jp, etc.)
  - HTML parser extracts result links and snippets from DDG Lite markup
  - Shared html_helpers.go with extractAttr, stripHTML, htmlUnescape

- GitHub: uses public Search API (repos, sorted by stars)
  - No auth required (10 req/min unauthenticated)
  - Shows stars, language, topics, last updated date
  - Paginated via GitHub's page parameter

- Reddit: uses public JSON search API
  - Respects safesearch (skips over_18 posts)
  - Shows subreddit, score, comment count
  - Links self-posts to the thread URL

- Bing: scrapes web search HTML (b_algo containers)
  - Extracts titles, URLs, and snippets from Bing's result markup
  - Handles Bing's tracking URL encoding

- Updated factory, config defaults, and config.example.toml
- Full test suite: unit tests for all engines, HTML parsing tests,
  region mapping tests, live request tests (skipped in short mode)

9 engines total: wikipedia, arxiv, crossref, braveapi, qwant,
duckduckgo, github, reddit, bing
2026-03-21 16:52:11 +00:00