Go's regexp package doesn't support Perl lookahead (?=...). Removing
the unnecessary lookahead since each MjjYud div is self-contained.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wrap sidebar time/type filters in a form with HTMX attributes so
filter changes trigger partial page updates instead of full reload.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wikipedia language subdomain was derived from user input without
validation, allowing attackers to redirect requests via malicious
language values like "evil.com.attacker.com". Added a whitelist of
valid Wikipedia language codes to prevent this.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Brave API only supports offset values 0-9. When pageno > 1 with
resultsPerPage=20, offset exceeded this limit causing 422 errors.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test GET /healthz, /, /search, /autocompleter endpoints.
Verify response codes, content types, JSON decoding, empty-query
redirect, and source URL presence in footer.
Also fix dead code in Search handler: the redirect for empty q
was unreachable because ParseSearchRequest errors on empty q first.
Move the q/format check before ParseSearchRequest to fix the redirect.
Trim or remove comments that:
- State the obvious (function names already convey purpose)
- Repeat what the code clearly shows
- Are excessively long without adding value
Keep comments that explain *why*, not *what*.
Thread source_url through: config.ServerConfig → Handler.sourceURL
→ PageData.SourceURL → template footer. Footer only shows Source
link when source_url is set.
Update LICENSE file and add AGPL header to all source files.
AGPLv3 ensures that if someone runs Kafka as a network service and
modifies it, they must release their source code under the same license.
- New CSS: complete design system with CSS variables, modern color palette
- Homepage: full-viewport hero with centered search, logo, tagline
- Result cards: rounded, shadowed, with favicons via Google Favicon API
- Layout: sidebar + results grid, responsive
- Typography: proper font stack, variable weights
- Settings panel: polished popover with animations
- Autocomplete: modern dropdown with keyboard nav
- Dark mode: full color palette via data-theme attribute
- Favicon: clean search icon SVG
- google.go: use inline (?s) flag instead of regexp.DotAll second arg
- youtube.go: remove Metadata field (not in MainResult contract)
- config_test.go: fix expected engine count from 9 to 11 (google+youtube)
html/template requires template names to be string literals, not field
accesses. Use {{if eq .Template "videos"}} to branch and call the
appropriate template by literal name.
Go html/template doesn't support function calls as template names in
{{template (func .Arg) .}}. Instead, precompute TemplateName in
FromResponse and use {{template .TemplateName .}} in the template.
MainResult: add Thumbnail field (used by YouTube, images, etc.)
video_item.html: new partial for video results with thumbnail display
views.go: add templateForResult func + video_item.html to template parse
results_inner.html: dispatch to video_item when Template="videos"
kafka.css: add .video-result flex layout with thumbnail styling
- Replace document.body.innerHTML with panel.querySelector('.settings-popover-body').innerHTML
- Use theme buttons (.theme-btn) with icons instead of radio buttons
- Use .engine-toggle class for engine checkboxes in 2-column grid
- Include settings-notice paragraph for engine changes
- Use dropdowns for safe search and format with proper ids
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
YouTube Data API v3 engine:
- Add YouTubeConfig to EnginesConfig with api_key field
- Add YOUTUBE_API_KEY env override
- Thread *config.Config through search service to factory
- Factory falls back to env vars if config fields are empty
- Update config.example.toml with youtube section
Also update default local_ported to include google and youtube.
Uses the official YouTube Data API v3. Requires YOUTUBE_API_KEY
environment variable (free from Google Cloud Console).
Returns video results with title, description, channel, publish
date, and thumbnail URL. Falls back gracefully if no API key.
SearXNG approach: use Google Search Appliance (GSA) User-Agent
pool — these are whitelisted enterprise identifiers Google trusts.
Key techniques:
- GSA User-Agent (iPhone OS + GSA/ version) instead of Chrome desktop
- CONSENT=YES+ cookie to bypass EU consent wall
- Parse /url?q= redirector URLs (unquote + strip &sa= params)
- div.MjjYud class for result containers (SearXNG selector)
- data-sncf divs for snippets
- detect sorry.google.com blocks
- Suggestions from ouy7Mc class cards
- Rename cmd/searxng-go to cmd/kafka
- Remove all SearXNG references from source comments while keeping
"SearXNG-compatible API" in user-facing docs
- Update binary paths in README, CLAUDE.md, and Dockerfile
- Update log message to "kafka starting"
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Inline JS in base.html: debounced fetch from /autocompleter on keyup
- Keyboard nav: arrows to navigate, Enter to select, Esc to close
- Highlight matching prefix in suggestions
- Click to select and submit
- Dropdown positioned absolutely below search input
- Dark mode compatible via existing CSS variables
Proxies to upstream SearXNG /autocompleter if configured, otherwise
falls back to Wikipedia OpenSearch API. Returns a JSON array of
suggestion strings.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
duckduckgo, github, reddit, and bing were registered in factory.go
and config.go but missing from planner.go, so they were silently
skipped when LOCAL_PORTED_ENGINES was not set.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Serve /opensearch.xml with configurable base URL
- Browsers can now add gosearch as a search engine from the address bar
- Configurable via [server] base_url or BASE_URL env var
- XML template embedded in the binary via go:embed
- Added base_url to config.example.toml
DuckDuckGo:
- Fixed parser to handle single-quoted class attributes (class='result-link')
- Decode DDG tracking URLs (uddg= parameter) to extract real URLs
- Match snippet extraction to actual DDG Lite HTML structure (</td> terminator)
Bing:
- Switched from HTML scraping (blocked by JS detection) to RSS endpoint
(?format=rss) which returns parseable XML
- Added JSON API response parsing as fallback
- Returns graceful unresponsive_engines entry when blocked
Live test results:
- DuckDuckGo: 9 results ✅
- GitHub: 10 results (14,768 total) ✅
- Bing: 10 results via RSS ✅
- Reddit: skipped (403 from sandbox, needs browser-like context)
- DuckDuckGo: scrapes Lite HTML endpoint for results
- Language-aware region mapping (de→de-de, ja→jp-jp, etc.)
- HTML parser extracts result links and snippets from DDG Lite markup
- Shared html_helpers.go with extractAttr, stripHTML, htmlUnescape
- GitHub: uses public Search API (repos, sorted by stars)
- No auth required (10 req/min unauthenticated)
- Shows stars, language, topics, last updated date
- Paginated via GitHub's page parameter
- Reddit: uses public JSON search API
- Respects safesearch (skips over_18 posts)
- Shows subreddit, score, comment count
- Links self-posts to the thread URL
- Bing: scrapes web search HTML (b_algo containers)
- Extracts titles, URLs, and snippets from Bing's result markup
- Handles Bing's tracking URL encoding
- Updated factory, config defaults, and config.example.toml
- Full test suite: unit tests for all engines, HTML parsing tests,
region mapping tests, live request tests (skipped in short mode)
9 engines total: wikipedia, arxiv, crossref, braveapi, qwant,
duckduckgo, github, reddit, bing
- Add internal/views/ package with embedded templates and static files
- Go html/template with SearXNG-compatible CSS class names
- Dark mode via prefers-color-scheme, responsive layout, print styles
- HTMX integration:
- Debounced instant search (500ms) on the search input
- Form submission targets #results via hx-post
- Pagination buttons are HTMX-powered (swap results div only)
- HX-Request header detection for fragment vs full page rendering
- Template structure:
- base.html: full page layout with HTMX script, favicon, CSS
- index.html: homepage with centered search box
- results.html: full results page (wraps base + results_inner)
- results_inner.html: results fragment (HTMX partial + sidebar + pagination)
- result_item.html: reusable result article partial
- Smart format detection: browser requests (Accept: text/html) default to HTML,
API clients default to JSON
- Static files served at /static/ from embedded FS (CSS, favicon SVG)
- Index route at GET /
- Empty query on HTML format redirects to homepage
- Custom CSS (gosearch.css): clean, minimal, privacy-respecting aesthetic
with light/dark mode, responsive breakpoints, print stylesheet
- Add views package tests
CORS:
- Configurable allowed origins (wildcard "*" or specific domains)
- Handles OPTIONS preflight with configurable methods, headers, max-age
- Exposed headers support for browser API access
- Env override: CORS_ALLOWED_ORIGINS
Rate Limiting:
- In-memory per-IP sliding window counter
- Configurable request limit and time window
- Background goroutine cleans up stale IP entries
- HTTP 429 with Retry-After header when exceeded
- Extracts real IP from X-Forwarded-For and X-Real-IP (proxy-aware)
- Env overrides: RATE_LIMIT_REQUESTS, RATE_LIMIT_WINDOW, RATE_LIMIT_CLEANUP_INTERVAL
- Set requests=0 in config to disable
Both wired into main.go as middleware chain: rate_limit → cors → handler.
Config example updated with [cors] and [rate_limit] sections.
Full test coverage for both middleware packages.
- Run all local engines in parallel using goroutines + sync.WaitGroup
- Individual engine failures are captured as unresponsive_engines entries
instead of aborting the entire search request
- Context cancellation is respected: cancelled engines report as unresponsive
- Upstream proxy failure is also gracefully handled (single unresponsive entry)
- Extract unresponsiveResponse() and emptyResponse() helpers for consistency
- Add comprehensive tests:
- ConcurrentEngines: verifies parallelism (2x100ms engines complete in ~100ms)
- GracefulDegradation: one engine fails, one succeeds, both represented
- AllEnginesFail: no error returned, all engines in unresponsive_engines
- ContextCancellation: engine respects context timeout, reports unresponsive