Commit graph

8 commits

Author SHA1 Message Date
8e9aae062b rename: kafka → samsa
Some checks failed
Build and Push Docker Image / build-and-push (push) Failing after 11s
Mirror to GitHub / mirror (push) Failing after 5s
Tests / test (push) Successful in 42s
Full project rename from kafka to samsa (after Gregor Samsa, who
woke one morning from uneasy dreams to find himself transformed).

- Module: github.com/metamorphosis-dev/kafka → samsa
- Binary: cmd/kafka/ → cmd/samsa/
- CSS: kafka.css → samsa.css
- UI: all 'kafka' product names, titles, localStorage keys → samsa
- localStorage keys: kafka-theme → samsa-theme, kafka-engines → samsa-engines
- OpenSearch: ShortName, LongName, description, URLs updated
- AGPL headers: 'kafka' → 'samsa'
- Docs, configs, examples updated
- Cache key prefix: kafka: → samsa:
2026-03-22 23:44:55 +00:00
b3e3123612 security: fix build errors, add honest Google UA, sanitize error msgs
- Fix config validation: upstream URLs allow private IPs (self-hosted)
- Fix util.SafeURLScheme to return parsed URL
- Replace spoofed GSA User-Agent with honest Kafka UA
- Sanitize all engine error messages (strip response bodies)
- Replace unused body reads with io.Copy(io.Discard, ...) for reuse
- Fix pre-existing braveapi_test using wrong struct type
- Fix ratelimit test reference to limiter variable
- Update ratelimit tests for new trusted proxy behavior
2026-03-22 16:27:49 +00:00
da367a1bfd security: harden against SAST findings (criticals through mediums)
Critical:
- Validate baseURL/sourceURL/upstreamURL at config load time
  (prevents XML injection, XSS, SSRF via config/env manipulation)
- Use xml.Escape for OpenSearch XML template interpolation

High:
- Add security headers middleware (CSP, X-Frame-Options, HSTS, etc.)
- Sanitize result URLs to reject javascript:/data: schemes
- Sanitize infobox img_src against dangerous URL schemes
- Default CORS to deny-all (was wildcard *)

Medium:
- Rate limiter: X-Forwarded-For only trusted from configured proxies
- Validate engine names against known registry allowlist
- Add 1024-char max query length
- Sanitize upstream error messages (strip raw response bodies)
- Upstream client validates URL scheme (http/https only)

Test updates:
- Update extractIP tests for new trusted proxy behavior
2026-03-22 16:22:27 +00:00
7969b724de fix(engines): remove unsupported lookahead from Google regex
Some checks failed
Build and Push Docker Image / build-and-push (push) Failing after 6s
Mirror to GitHub / mirror (push) Failing after 3s
Tests / test (push) Successful in 41s
Go's regexp package doesn't support Perl lookahead (?=...). Removing
the unnecessary lookahead since each MjjYud div is self-contained.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 14:16:04 +01:00
5b942a5fd6 refactor: clean up verbose and redundant comments
Some checks failed
Build and Push Docker Image / build-and-push (push) Failing after 7s
Mirror to GitHub / mirror (push) Failing after 3s
Tests / test (push) Successful in 25s
Trim or remove comments that:
- State the obvious (function names already convey purpose)
- Repeat what the code clearly shows
- Are excessively long without adding value

Keep comments that explain *why*, not *what*.
2026-03-22 11:10:50 +00:00
7be03b4017 license: change from MIT to AGPLv3
Update LICENSE file and add AGPL header to all source files.

AGPLv3 ensures that if someone runs Kafka as a network service and
modifies it, they must release their source code under the same license.
2026-03-22 08:27:23 +00:00
f1436310eb fix: regexp.DotAll flag in google engine and Metadata field removal
Some checks failed
Build and Push Docker Image / build-and-push (push) Failing after 7s
Mirror to GitHub / mirror (push) Failing after 3s
Tests / test (push) Successful in 21s
- google.go: use inline (?s) flag instead of regexp.DotAll second arg
- youtube.go: remove Metadata field (not in MainResult contract)
- config_test.go: fix expected engine count from 9 to 11 (google+youtube)
2026-03-22 02:54:12 +00:00
4be9cf2725 feat: add Google engine using GSA User-Agent scraping
SearXNG approach: use Google Search Appliance (GSA) User-Agent
pool — these are whitelisted enterprise identifiers Google trusts.

Key techniques:
- GSA User-Agent (iPhone OS + GSA/ version) instead of Chrome desktop
- CONSENT=YES+ cookie to bypass EU consent wall
- Parse /url?q= redirector URLs (unquote + strip &sa= params)
- div.MjjYud class for result containers (SearXNG selector)
- data-sncf divs for snippets
- detect sorry.google.com blocks
- Suggestions from ouy7Mc class cards
2026-03-22 01:29:46 +00:00