samsa

Author	SHA1	Message	Date
Franz Kafka	9e95ce7b53	perf: shared http.Transport with tuned connection pooling Add internal/httpclient package as a singleton RoundTripper used by all outbound engine requests (search, engines, autocomplete, upstream). Key Transport settings: - MaxIdleConnsPerHost = 20 (up from Go default of 2) - MaxIdleConns = 100 - IdleConnTimeout = 90s - DialContext timeout = 5s Previously, the default transport limited each host to 2 idle connections, forcing a new TCP+TLS handshake on every search for each engine. With 12 engines hitting the same upstream hosts in parallel, connections were constantly recycled. Now warm connections are reused across all goroutines and requests.	2026-03-23 14:26:26 +00:00
Franz Kafka	8e9aae062b	rename: kafka → samsa Full project rename from kafka to samsa (after Gregor Samsa, who woke one morning from uneasy dreams to find himself transformed). - Module: github.com/metamorphosis-dev/kafka → samsa - Binary: cmd/kafka/ → cmd/samsa/ - CSS: kafka.css → samsa.css - UI: all 'kafka' product names, titles, localStorage keys → samsa - localStorage keys: kafka-theme → samsa-theme, kafka-engines → samsa-engines - OpenSearch: ShortName, LongName, description, URLs updated - AGPL headers: 'kafka' → 'samsa' - Docs, configs, examples updated - Cache key prefix: kafka: → samsa:	2026-03-22 23:44:55 +00:00
Franz Kafka	df67492602	feat: add Stack Overflow search engine Uses the Stack Exchange API v3 (/search/advanced) to find questions sorted by relevance. No API key required (300 req/day); optionally configure via STACKOVERFLOW_KEY env var or [engines.stackoverflow]. Results include score, answer count, view count, and tags in the snippet. Assigned to the 'it' category, triggered by the IT category tab or explicit engine selection. 6 tests covering parsing, edge cases, and helpers.	2026-03-22 22:29:34 +00:00
Franz Kafka	2b072e4de3	feat: add image search with Bing, DuckDuckGo, and Qwant engines Three new image search engines: - bing_images: Bing Images via RSS endpoint - ddg_images: DuckDuckGo Images via VQD API - qwant_images: Qwant Images via v3 search API Frontend: - Image grid layout with responsive columns - image_item template with thumbnail, title, and source metadata - Hover animations and lazy loading - Grid activates automatically when category=images Backend: - category=images routes to image engines via planner - Image engines registered in factory and engine allowlist - extractImgSrc helper for parsing thumbnail URLs from HTML - IsImageSearch flag on PageData for template layout switching	2026-03-22 16:49:24 +00:00
Franz Kafka	b3e3123612	security: fix build errors, add honest Google UA, sanitize error msgs - Fix config validation: upstream URLs allow private IPs (self-hosted) - Fix util.SafeURLScheme to return parsed URL - Replace spoofed GSA User-Agent with honest Kafka UA - Sanitize all engine error messages (strip response bodies) - Replace unused body reads with io.Copy(io.Discard, ...) for reuse - Fix pre-existing braveapi_test using wrong struct type - Fix ratelimit test reference to limiter variable - Update ratelimit tests for new trusted proxy behavior	2026-03-22 16:27:49 +00:00
Franz Kafka	da367a1bfd	security: harden against SAST findings (criticals through mediums) Critical: - Validate baseURL/sourceURL/upstreamURL at config load time (prevents XML injection, XSS, SSRF via config/env manipulation) - Use xml.Escape for OpenSearch XML template interpolation High: - Add security headers middleware (CSP, X-Frame-Options, HSTS, etc.) - Sanitize result URLs to reject javascript:/data: schemes - Sanitize infobox img_src against dangerous URL schemes - Default CORS to deny-all (was wildcard *) Medium: - Rate limiter: X-Forwarded-For only trusted from configured proxies - Validate engine names against known registry allowlist - Add 1024-char max query length - Sanitize upstream error messages (strip raw response bodies) - Upstream client validates URL scheme (http/https only) Test updates: - Update extractIP tests for new trusted proxy behavior	2026-03-22 16:22:27 +00:00
Franz Kafka	2d22a8cdbb	feat: add Brave web search scraper engine New brave.go: scrapes https://search.brave.com directly. Extracts title, URL, snippet, and favicon from Brave's HTML. No API key required. Rename existing BraveAPIEngine (was BraveEngine) to avoid collision with the new scraper. API engine stays as 'braveapi', scraper as 'brave'.	2026-03-22 16:01:49 +00:00
ashisgreat22	7969b724de	fix(engines): remove unsupported lookahead from Google regex Go's regexp package doesn't support Perl lookahead (?=...). Removing the unnecessary lookahead since each MjjYud div is self-contained. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 14:16:04 +01:00
ashisgreat22	d21e9189b8	fix(engines): validate Wikipedia language codes to prevent SSRF Wikipedia language subdomain was derived from user input without validation, allowing attackers to redirect requests via malicious language values like "evil.com.attacker.com". Added a whitelist of valid Wikipedia language codes to prevent this. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 13:22:52 +01:00
ashisgreat22	f172da33ef	fix(engines): cap Brave API offset to 9 to avoid 422 error Brave API only supports offset values 0-9. When pageno > 1 with resultsPerPage=20, offset exceeded this limit causing 422 errors. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 12:01:25 +00:00
Franz Kafka	5b942a5fd6	refactor: clean up verbose and redundant comments Trim or remove comments that: - State the obvious (function names already convey purpose) - Repeat what the code clearly shows - Are excessively long without adding value Keep comments that explain why, not what.	2026-03-22 11:10:50 +00:00
Franz Kafka	7be03b4017	license: change from MIT to AGPLv3 Update LICENSE file and add AGPL header to all source files. AGPLv3 ensures that if someone runs Kafka as a network service and modifies it, they must release their source code under the same license.	2026-03-22 08:27:23 +00:00
Franz Kafka	f1436310eb	fix: regexp.DotAll flag in google engine and Metadata field removal - google.go: use inline (?s) flag instead of regexp.DotAll second arg - youtube.go: remove Metadata field (not in MainResult contract) - config_test.go: fix expected engine count from 9 to 11 (google+youtube)	2026-03-22 02:54:12 +00:00
Franz Kafka	a7f594b7fa	feat: add YouTube engine with config file and env support YouTube Data API v3 engine: - Add YouTubeConfig to EnginesConfig with api_key field - Add YOUTUBE_API_KEY env override - Thread *config.Config through search service to factory - Factory falls back to env vars if config fields are empty - Update config.example.toml with youtube section Also update default local_ported to include google and youtube.	2026-03-22 01:57:13 +00:00
Franz Kafka	1689cab9bd	feat: add YouTube engine via Data API v3 Uses the official YouTube Data API v3. Requires YOUTUBE_API_KEY environment variable (free from Google Cloud Console). Returns video results with title, description, channel, publish date, and thumbnail URL. Falls back gracefully if no API key.	2026-03-22 01:53:19 +00:00
Franz Kafka	31fdd5e06f	Merge branch 'feat/google-engine', remote-tracking branch 'origin/main'	2026-03-22 01:35:20 +00:00
Franz Kafka	4be9cf2725	feat: add Google engine using GSA User-Agent scraping SearXNG approach: use Google Search Appliance (GSA) User-Agent pool — these are whitelisted enterprise identifiers Google trusts. Key techniques: - GSA User-Agent (iPhone OS + GSA/ version) instead of Chrome desktop - CONSENT=YES+ cookie to bypass EU consent wall - Parse /url?q= redirector URLs (unquote + strip &sa= params) - div.MjjYud class for result containers (SearXNG selector) - data-sncf divs for snippets - detect sorry.google.com blocks - Suggestions from ouy7Mc class cards	2026-03-22 01:29:46 +00:00
ashisgreat22	fcd9be16df	refactor: remove SearXNG references and rename binary to kafka - Rename cmd/searxng-go to cmd/kafka - Remove all SearXNG references from source comments while keeping "SearXNG-compatible API" in user-facing docs - Update binary paths in README, CLAUDE.md, and Dockerfile - Update log message to "kafka starting" Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 01:47:03 +01:00
ashisgreat22	0d3f3c19d7	fix: add missing engines to defaultPortedEngines duckduckgo, github, reddit, and bing were registered in factory.go and config.go but missing from planner.go, so they were silently skipped when LOCAL_PORTED_ENGINES was not set. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 00:13:57 +01:00
Franz Kafka	6346fb7155	chore: update Go module path to github.com/metamorphosis-dev/kafka Module path now matches the GitHub mirror location. All internal imports updated across 35+ files.	2026-03-21 19:42:01 +00:00
Franz Kafka	e5295fa69d	chore: rename project from gosearch to kafka A search engine named after a man who proved answers don't exist. Renamed everywhere user-facing: - Brand name, UI titles, OpenSearch description, CSS filename - Docker service name, NixOS module (services.kafka) - Cache key prefix (kafka:), User-Agent strings (kafka/0.1) - README, config.example.toml, flake.nix descriptions Kept unchanged (internal): - Go module path: github.com/ashie/gosearch - Git repository URL: git.ashisgreat.xyz/penal-colony/gosearch - Binary entrypoint: cmd/searxng-go	2026-03-21 19:20:47 +00:00
Franz Kafka	a8ab29b23a	fix: fix DDG and Bing parsers — verified with live tests DuckDuckGo: - Fixed parser to handle single-quoted class attributes (class='result-link') - Decode DDG tracking URLs (uddg= parameter) to extract real URLs - Match snippet extraction to actual DDG Lite HTML structure (</td> terminator) Bing: - Switched from HTML scraping (blocked by JS detection) to RSS endpoint (?format=rss) which returns parseable XML - Added JSON API response parsing as fallback - Returns graceful unresponsive_engines entry when blocked Live test results: - DuckDuckGo: 9 results ✅ - GitHub: 10 results (14,768 total) ✅ - Bing: 10 results via RSS ✅ - Reddit: skipped (403 from sandbox, needs browser-like context)	2026-03-21 16:57:02 +00:00
Franz Kafka	df8fe9474b	feat: add DuckDuckGo, GitHub, Reddit, and Bing engines - DuckDuckGo: scrapes Lite HTML endpoint for results - Language-aware region mapping (de→de-de, ja→jp-jp, etc.) - HTML parser extracts result links and snippets from DDG Lite markup - Shared html_helpers.go with extractAttr, stripHTML, htmlUnescape - GitHub: uses public Search API (repos, sorted by stars) - No auth required (10 req/min unauthenticated) - Shows stars, language, topics, last updated date - Paginated via GitHub's page parameter - Reddit: uses public JSON search API - Respects safesearch (skips over_18 posts) - Shows subreddit, score, comment count - Links self-posts to the thread URL - Bing: scrapes web search HTML (b_algo containers) - Extracts titles, URLs, and snippets from Bing's result markup - Handles Bing's tracking URL encoding - Updated factory, config defaults, and config.example.toml - Full test suite: unit tests for all engines, HTML parsing tests, region mapping tests, live request tests (skipped in short mode) 9 engines total: wikipedia, arxiv, crossref, braveapi, qwant, duckduckgo, github, reddit, bing	2026-03-21 16:52:11 +00:00
Franz Kafka	dc44837219	feat: build Go-based SearXNG-compatible search service Implement an API-first Go rewrite with local engine adapters, upstream fallback, and Nix-based tooling so searches can run without matching the original UI while preserving response compatibility. Made-with: Cursor	2026-03-20 20:34:08 +01:00

24 commits