New brave.go: scrapes https://search.brave.com directly.
Extracts title, URL, snippet, and favicon from Brave's HTML.
No API key required.
Rename existing BraveEngine to BraveAPIEngine to avoid collision
with the new scraper. API engine stays as 'braveapi', scraper as 'brave'.
Go's regexp package doesn't support Perl lookahead (?=...). Removing
the unnecessary lookahead since each MjjYud div is self-contained.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wrap sidebar time/type filters in a form with HTMX attributes so
filter changes trigger partial page updates instead of full reload.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wikipedia language subdomain was derived from user input without
validation, allowing attackers to redirect requests via malicious
language values like "evil.com.attacker.com". Added a whitelist of
valid Wikipedia language codes to prevent this.
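Here is a minimal sketch of the whitelist check. It assumes a hypothetical `wikipediaHost` helper that falls back to English for unknown codes. The map is deliberately abbreviated and is not the project's actual list.

```go
package main

import "fmt"

// allowedWikiLangs is a hypothetical, abbreviated whitelist; a real one would
// list every Wikipedia language edition.
var allowedWikiLangs = map[string]bool{
	"en": true, "de": true, "fr": true, "es": true, "ja": true, "zh": true,
}

// wikipediaHost builds the API host for lang. Anything not on the whitelist
// (including values containing dots or slashes) falls back to English.
func wikipediaHost(lang string) string {
	if !allowedWikiLangs[lang] {
		lang = "en"
	}
	return lang + ".wikipedia.org"
}

func main() {
	fmt.Println(wikipediaHost("de"))
	fmt.Println(wikipediaHost("evil.com.attacker.com"))
}
```

The language string is only used as a map key. User input therefore reaches the hostname only when it exactly matches a known code.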
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Brave API only supports offset values 0-9. When pageno > 1 with
resultsPerPage=20, offset exceeded this limit causing 422 errors.
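The following sketch keeps the offset in range. It assumes that Brave's `offset` counts pages of results rather than individual results, and `braveOffset` is a hypothetical helper, not the engine's code.

```go
package main

import "fmt"

// maxBraveOffset is the largest offset the Brave API accepts (per the 0-9 limit above).
const maxBraveOffset = 9

// braveOffset maps a 1-based page number to a zero-based page offset,
// clamped to [0, maxBraveOffset].
func braveOffset(pageno int) int {
	offset := pageno - 1
	if offset < 0 {
		return 0
	}
	if offset > maxBraveOffset {
		return maxBraveOffset
	}
	return offset
}

func main() {
	for _, p := range []int{1, 2, 12} {
		fmt.Printf("pageno=%d offset=%d\n", p, braveOffset(p))
	}
}
```

A result-based offset such as (pageno-1)*resultsPerPage would already be 20 on page 2, which is outside the accepted range.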
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test GET /healthz, /, /search, /autocompleter endpoints.
Verify response codes, content types, JSON decoding, empty-query
redirect, and source URL presence in footer.
Also fix dead code in Search handler: the redirect for empty q
was unreachable because ParseSearchRequest errors on empty q first.
Move the q/format check before ParseSearchRequest to fix the redirect.
Trim or remove comments that:
- State the obvious (function names already convey purpose)
- Repeat what the code clearly shows
- Are excessively long without adding value
Keep comments that explain *why*, not *what*.
Thread source_url through: config.ServerConfig → Handler.sourceURL
→ PageData.SourceURL → template footer. Footer only shows Source
link when source_url is set.
Update LICENSE file and add AGPL header to all source files.
AGPLv3 ensures that if someone modifies Kafka and runs it as a network
service, they must offer the modified source code to that service's
users under the same license.
- New CSS: complete design system with CSS variables, modern color palette
- Homepage: full-viewport hero with centered search, logo, tagline
- Result cards: rounded, shadowed, with favicons via Google Favicon API
- Layout: sidebar + results grid, responsive
- Typography: proper font stack, variable weights
- Settings panel: polished popover with animations
- Autocomplete: modern dropdown with keyboard nav
- Dark mode: full color palette via data-theme attribute
- Favicon: clean search icon SVG
- google.go: use inline (?s) flag instead of a regexp.DotAll second arg (Go's regexp has no such option)
- youtube.go: remove Metadata field (not in MainResult contract)
- config_test.go: fix expected engine count from 9 to 11 (google+youtube)
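For reference, this is how the inline flag behaves. Without `(?s)`, `.` stops at a newline. With it, a lazy `.*?` can span lines. The patterns are illustrative and are not the ones in google.go.

```go
package main

import (
	"fmt"
	"regexp"
)

// Without flags, '.' does not match '\n'.
var plainSpan = regexp.MustCompile(`<span>(.*?)</span>`)

// The inline (?s) flag lets '.' match '\n'. MustCompile takes only the
// pattern string, so flags have to live inside the pattern.
var dotAllSpan = regexp.MustCompile(`(?s)<span>(.*?)</span>`)

func main() {
	in := "<span>line one\nline two</span>"
	fmt.Println(plainSpan.MatchString(in))
	fmt.Printf("%q\n", dotAllSpan.FindStringSubmatch(in)[1])
}
```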
html/template requires template names to be string literals, not field
accesses. Use {{if eq .Template "videos"}} to branch and call the
appropriate template by literal name.
Go html/template doesn't support function calls as template names in
{{template (func .Arg) .}}. Instead, precompute TemplateName in
FromResponse and use {{template .TemplateName .}} in the template.
MainResult: add Thumbnail field (used by YouTube, images, etc.)
video_item.html: new partial for video results with thumbnail display
views.go: add templateForResult func + video_item.html to template parse
results_inner.html: dispatch to video_item when Template="videos"
kafka.css: add .video-result flex layout with thumbnail styling
- Replace document.body.innerHTML with panel.querySelector('.settings-popover-body').innerHTML
- Use theme buttons (.theme-btn) with icons instead of radio buttons
- Use .engine-toggle class for engine checkboxes in 2-column grid
- Include settings-notice paragraph for engine changes
- Use dropdowns for safe search and format with proper ids
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
YouTube Data API v3 engine:
- Add YouTubeConfig to EnginesConfig with api_key field
- Add YOUTUBE_API_KEY env override
- Thread *config.Config through search service to factory
- Factory falls back to env vars if config fields are empty
- Update config.example.toml with youtube section
Also update default local_ported to include google and youtube.
Uses the official YouTube Data API v3. Requires YOUTUBE_API_KEY
environment variable (free from Google Cloud Console).
Returns video results with title, description, channel, publish
date, and thumbnail URL. Falls back gracefully if no API key.
SearXNG approach: rotate through a pool of GSA (Google Search App)
User-Agent strings, the kind sent by Google's iOS app.
Key techniques:
- GSA User-Agent (iPhone OS + GSA/ version) instead of Chrome desktop
- CONSENT=YES+ cookie to bypass EU consent wall
- Parse /url?q= redirector URLs (unquote + strip &sa= params)
- div.MjjYud class for result containers (SearXNG selector)
- data-sncf divs for snippets
- detect sorry.google.com blocks
- Suggestions from ouy7Mc class cards
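The `/url?q=` step can be done with `net/url` alone. `Query()` splits the query on `&`, so `sa=`, `ved=` and similar parameters become separate keys, and it percent-decodes each value. The helper name is hypothetical, and the sample href is handcrafted rather than captured from Google.

```go
package main

import (
	"fmt"
	"html"
	"net/url"
)

// unwrapGoogleURL turns a scraped "/url?q=<target>&sa=..." redirector link into
// the target URL. Other hrefs are returned unchanged apart from HTML unescaping.
func unwrapGoogleURL(href string) string {
	raw := html.UnescapeString(href) // scraped hrefs still contain &amp;
	u, err := url.Parse(raw)
	if err != nil || u.Path != "/url" {
		return raw
	}
	if target := u.Query().Get("q"); target != "" {
		return target // already percent-decoded; sa=, ved= are separate keys
	}
	return raw
}

func main() {
	fmt.Println(unwrapGoogleURL("/url?q=https://go.dev/doc/%3Flang%3Den&amp;sa=U&amp;ved=xyz"))
}
```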
- Rename cmd/searxng-go to cmd/kafka
- Remove all SearXNG references from source comments while keeping
"SearXNG-compatible API" in user-facing docs
- Update binary paths in README, CLAUDE.md, and Dockerfile
- Update log message to "kafka starting"
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Inline JS in base.html: debounced fetch from /autocompleter on keyup
- Keyboard nav: arrows to navigate, Enter to select, Esc to close
- Highlight matching prefix in suggestions
- Click to select and submit
- Dropdown positioned absolutely below search input
- Dark mode compatible via existing CSS variables
Proxies to upstream SearXNG /autocompleter if configured, otherwise
falls back to Wikipedia OpenSearch API. Returns a JSON array of
suggestion strings.
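The Wikipedia fallback can use the OpenSearch endpoint (`https://en.wikipedia.org/w/api.php?action=opensearch&search=...&format=json`). Its reply is a four-element array: the query, the matching titles, their descriptions, and their URLs. The following sketch extracts the titles as suggestions. `parseOpenSearch` is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseOpenSearch extracts the title list from an OpenSearch reply shaped
// like [query, [titles], [descriptions], [urls]].
func parseOpenSearch(body []byte) ([]string, error) {
	var parts []json.RawMessage
	if err := json.Unmarshal(body, &parts); err != nil {
		return nil, err
	}
	if len(parts) < 2 {
		return nil, fmt.Errorf("opensearch: expected at least 2 elements, got %d", len(parts))
	}
	var titles []string
	if err := json.Unmarshal(parts[1], &titles); err != nil {
		return nil, err
	}
	return titles, nil
}

func main() {
	body := []byte(`["go",["Go (programming language)","Gopher"],["",""],` +
		`["https://en.wikipedia.org/wiki/Go_(programming_language)","https://en.wikipedia.org/wiki/Gopher"]]`)
	titles, err := parseOpenSearch(body)
	if err != nil {
		panic(err)
	}
	out, _ := json.Marshal(titles) // JSON array of suggestion strings
	fmt.Println(string(out))
}
```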
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
duckduckgo, github, reddit, and bing were registered in factory.go
and config.go but missing from planner.go, so they were silently
skipped when LOCAL_PORTED_ENGINES was not set.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Serve /opensearch.xml with configurable base URL
- Browsers can now add gosearch as a search engine from the address bar
- Configurable via [server] base_url or BASE_URL env var
- XML template embedded in the binary via go:embed
- Added base_url to config.example.toml
DuckDuckGo:
- Fixed parser to handle single-quoted class attributes (class='result-link')
- Decode DDG tracking URLs (uddg= parameter) to extract real URLs
- Match snippet extraction to actual DDG Lite HTML structure (</td> terminator)
Bing:
- Switched from HTML scraping (blocked by JS detection) to RSS endpoint
  (?format=rss) which returns parseable XML
- Added JSON API response parsing as fallback
- Returns graceful unresponsive_engines entry when blocked
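The following sketch decodes an RSS 2.0 feed with `encoding/xml`. It assumes Bing's feed uses the standard `channel/item` layout with `title`, `link`, and `description`. The sample feed is handcrafted and was not captured from Bing.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

type rssItem struct {
	Title       string `xml:"title"`
	Link        string `xml:"link"`
	Description string `xml:"description"`
}

// rssFeed matches <rss><channel><item>...</item></channel></rss>; the root
// element name is not checked because the struct has no XMLName field.
type rssFeed struct {
	Channel struct {
		Items []rssItem `xml:"item"`
	} `xml:"channel"`
}

func parseBingRSS(data []byte) ([]rssItem, error) {
	var feed rssFeed
	if err := xml.Unmarshal(data, &feed); err != nil {
		return nil, err
	}
	return feed.Channel.Items, nil
}

func main() {
	sample := []byte(`<rss version="2.0"><channel><title>Bing: golang</title>
<item><title>The Go Programming Language</title><link>https://go.dev/</link>
<description>Go is an open source programming language.</description></item>
</channel></rss>`)
	items, err := parseBingRSS(sample)
	if err != nil {
		panic(err)
	}
	for _, it := range items {
		fmt.Println(it.Title, it.Link)
	}
}
```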
Live test results:
- DuckDuckGo: 9 results ✅
- GitHub: 10 results (14,768 total) ✅
- Bing: 10 results via RSS ✅
- Reddit: skipped (403 from sandbox, needs browser-like context)
- DuckDuckGo: scrapes Lite HTML endpoint for results
  - Language-aware region mapping (de→de-de, ja→jp-jp, etc.)
  - HTML parser extracts result links and snippets from DDG Lite markup
- Shared html_helpers.go with extractAttr, stripHTML, htmlUnescape
- GitHub: uses public Search API (repos, sorted by stars)
  - No auth required (10 req/min unauthenticated)
  - Shows stars, language, topics, last updated date
  - Paginated via GitHub's page parameter
- Reddit: uses public JSON search API
  - Respects safesearch (skips over_18 posts)
  - Shows subreddit, score, comment count
  - Links self-posts to the thread URL
- Bing: scrapes web search HTML (b_algo containers)
  - Extracts titles, URLs, and snippets from Bing's result markup
  - Handles Bing's tracking URL encoding
- Updated factory, config defaults, and config.example.toml
- Full test suite: unit tests for all engines, HTML parsing tests,
  region mapping tests, live request tests (skipped in short mode)
9 engines total: wikipedia, arxiv, crossref, braveapi, qwant,
duckduckgo, github, reddit, bing
- Add internal/views/ package with embedded templates and static files
- Go html/template with SearXNG-compatible CSS class names
- Dark mode via prefers-color-scheme, responsive layout, print styles
- HTMX integration:
  - Debounced instant search (500ms) on the search input
  - Form submission targets #results via hx-post
  - Pagination buttons are HTMX-powered (swap results div only)
  - HX-Request header detection for fragment vs full page rendering
- Template structure:
  - base.html: full page layout with HTMX script, favicon, CSS
  - index.html: homepage with centered search box
  - results.html: full results page (wraps base + results_inner)
  - results_inner.html: results fragment (HTMX partial + sidebar + pagination)
  - result_item.html: reusable result article partial
- Smart format detection: browser requests (Accept: text/html) default to HTML,
  API clients default to JSON
- Static files served at /static/ from embedded FS (CSS, favicon SVG)
- Index route at GET /
- Empty query on HTML format redirects to homepage
- Custom CSS (gosearch.css): clean, minimal, privacy-respecting aesthetic
  with light/dark mode, responsive breakpoints, print stylesheet
- Add views package tests
CORS:
- Configurable allowed origins (wildcard "*" or specific domains)
- Handles OPTIONS preflight with configurable methods, headers, max-age
- Exposed headers support for browser API access
- Env override: CORS_ALLOWED_ORIGINS
Rate Limiting:
- In-memory per-IP sliding window counter
- Configurable request limit and time window
- Background goroutine cleans up stale IP entries
- HTTP 429 with Retry-After header when exceeded
- Extracts real IP from X-Forwarded-For and X-Real-IP (proxy-aware)
- Env overrides: RATE_LIMIT_REQUESTS, RATE_LIMIT_WINDOW, RATE_LIMIT_CLEANUP_INTERVAL
- Set requests=0 in config to disable
Both wired into main.go as middleware chain: rate_limit → cors → handler.
Config example updated with [cors] and [rate_limit] sections.
Full test coverage for both middleware packages.
- Run all local engines in parallel using goroutines + sync.WaitGroup
- Individual engine failures are captured as unresponsive_engines entries
  instead of aborting the entire search request
- Context cancellation is respected: cancelled engines report as unresponsive
- Upstream proxy failure is also gracefully handled (single unresponsive entry)
- Extract unresponsiveResponse() and emptyResponse() helpers for consistency
- Add comprehensive tests:
  - ConcurrentEngines: verifies parallelism (2x100ms engines complete in ~100ms)
  - GracefulDegradation: one engine fails, one succeeds, both represented
  - AllEnginesFail: no error returned, all engines in unresponsive_engines
  - ContextCancellation: engine respects context timeout, reports unresponsive