Add internal/httpclient package as a singleton RoundTripper used by
all outbound engine requests (search, engines, autocomplete, upstream).
Key Transport settings:
- MaxIdleConnsPerHost = 20 (up from Go default of 2)
- MaxIdleConns = 100
- IdleConnTimeout = 90s
- DialContext timeout = 5s
Previously, the default transport limited each host to 2 idle connections,
forcing a new TCP+TLS handshake on every search for each engine. With
12 engines hitting the same upstream hosts in parallel, connections
were constantly recycled. Now warm connections are reused across all
goroutines and requests.
Uses the Stack Exchange API v3 (/search/advanced) to find questions
sorted by relevance. No API key required (300 req/day); optionally
configure via STACKOVERFLOW_KEY env var or [engines.stackoverflow].
Results include score, answer count, view count, and tags in the
snippet. Assigned to the 'it' category, triggered by the IT category
tab or explicit engine selection.
6 tests covering parsing, edge cases, and helpers.
Three new image search engines:
- bing_images: Bing Images via RSS endpoint
- ddg_images: DuckDuckGo Images via VQD API
- qwant_images: Qwant Images via v3 search API
Frontend:
- Image grid layout with responsive columns
- image_item template with thumbnail, title, and source metadata
- Hover animations and lazy loading
- Grid activates automatically when category=images
Backend:
- category=images routes to image engines via planner
- Image engines registered in factory and engine allowlist
- extractImgSrc helper for parsing thumbnail URLs from HTML
- IsImageSearch flag on PageData for template layout switching
New brave.go: scrapes https://search.brave.com directly.
Extracts title, URL, snippet, and favicon from Brave's HTML.
No API key required.
Rename existing BraveAPIEngine (was BraveEngine) to avoid collision
with the new scraper. API engine stays as 'braveapi', scraper as 'brave'.
Go's regexp package doesn't support Perl lookahead (?=...). Removing
the unnecessary lookahead since each MjjYud div is self-contained.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wikipedia language subdomain was derived from user input without
validation, allowing attackers to redirect requests via malicious
language values like "evil.com.attacker.com". Added a whitelist of
valid Wikipedia language codes to prevent this.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Brave API only supports offset values 0-9. When pageno > 1 with
resultsPerPage=20, offset exceeded this limit causing 422 errors.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Trim or remove comments that:
- State the obvious (function names already convey purpose)
- Repeat what the code clearly shows
- Are excessively long without adding value
Keep comments that explain *why*, not *what*.
Update LICENSE file and add AGPL header to all source files.
AGPLv3 ensures that if someone runs Kafka as a network service and
modifies it, they must release their source code under the same license.
- google.go: use inline (?s) flag instead of regexp.DotAll second arg
- youtube.go: remove Metadata field (not in MainResult contract)
- config_test.go: fix expected engine count from 9 to 11 (google+youtube)
YouTube Data API v3 engine:
- Add YouTubeConfig to EnginesConfig with api_key field
- Add YOUTUBE_API_KEY env override
- Thread *config.Config through search service to factory
- Factory falls back to env vars if config fields are empty
- Update config.example.toml with youtube section
Also update default local_ported to include google and youtube.
Uses the official YouTube Data API v3. Requires YOUTUBE_API_KEY
environment variable (free from Google Cloud Console).
Returns video results with title, description, channel, publish
date, and thumbnail URL. Falls back gracefully if no API key.
SearXNG approach: use Google Search Appliance (GSA) User-Agent
pool — these are whitelisted enterprise identifiers Google trusts.
Key techniques:
- GSA User-Agent (iPhone OS + GSA/ version) instead of Chrome desktop
- CONSENT=YES+ cookie to bypass EU consent wall
- Parse /url?q= redirector URLs (unquote + strip &sa= params)
- div.MjjYud class for result containers (SearXNG selector)
- data-sncf divs for snippets
- detect sorry.google.com blocks
- Suggestions from ouy7Mc class cards
- Rename cmd/searxng-go to cmd/kafka
- Remove all SearXNG references from source comments while keeping
"SearXNG-compatible API" in user-facing docs
- Update binary paths in README, CLAUDE.md, and Dockerfile
- Update log message to "kafka starting"
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
duckduckgo, github, reddit, and bing were registered in factory.go
and config.go but missing from planner.go, so they were silently
skipped when LOCAL_PORTED_ENGINES was not set.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DuckDuckGo:
- Fixed parser to handle single-quoted class attributes (class='result-link')
- Decode DDG tracking URLs (uddg= parameter) to extract real URLs
- Match snippet extraction to actual DDG Lite HTML structure (</td> terminator)
Bing:
- Switched from HTML scraping (blocked by JS detection) to RSS endpoint
(?format=rss) which returns parseable XML
- Added JSON API response parsing as fallback
- Returns graceful unresponsive_engines entry when blocked
Live test results:
- DuckDuckGo: 9 results ✅
- GitHub: 10 results (14,768 total) ✅
- Bing: 10 results via RSS ✅
- Reddit: skipped (403 from sandbox, needs browser-like context)
- DuckDuckGo: scrapes Lite HTML endpoint for results
- Language-aware region mapping (de→de-de, ja→jp-jp, etc.)
- HTML parser extracts result links and snippets from DDG Lite markup
- Shared html_helpers.go with extractAttr, stripHTML, htmlUnescape
- GitHub: uses public Search API (repos, sorted by stars)
- No auth required (10 req/min unauthenticated)
- Shows stars, language, topics, last updated date
- Paginated via GitHub's page parameter
- Reddit: uses public JSON search API
- Respects safesearch (skips over_18 posts)
- Shows subreddit, score, comment count
- Links self-posts to the thread URL
- Bing: scrapes web search HTML (b_algo containers)
- Extracts titles, URLs, and snippets from Bing's result markup
- Handles Bing's tracking URL encoding
- Updated factory, config defaults, and config.example.toml
- Full test suite: unit tests for all engines, HTML parsing tests,
region mapping tests, live request tests (skipped in short mode)
9 engines total: wikipedia, arxiv, crossref, braveapi, qwant,
duckduckgo, github, reddit, bing
Implement an API-first Go rewrite with local engine adapters, upstream fallback, and Nix-based tooling so searches can run without matching the original UI while preserving response compatibility.
Made-with: Cursor