Commit graph

53 commits

Author SHA1 Message Date
2d22a8cdbb feat: add Brave web search scraper engine
New brave.go: scrapes https://search.brave.com directly.
Extracts title, URL, snippet, and favicon from Brave's HTML.
No API key required.

Rename the existing BraveEngine to BraveAPIEngine to avoid a collision
with the new scraper. The API engine stays as 'braveapi'; the scraper is 'brave'.
2026-03-22 16:01:49 +00:00
7969b724de fix(engines): remove unsupported lookahead from Google regex
Go's regexp package doesn't support Perl lookahead (?=...). Removing
the unnecessary lookahead since each MjjYud div is self-contained.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 14:16:04 +01:00
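
The failure mode behind this fix can be reproduced in isolation. A minimal sketch, using hypothetical patterns (not the project's actual Google regex) and a hypothetical helper named compiles:

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's RE2-based regexp package accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// Illustrative patterns only; the class name is taken from the commit message.
	withLookahead := `<div class="MjjYud">.*?(?=<div class="MjjYud">)`
	withoutLookahead := `<div class="MjjYud">.*?</div>`

	fmt.Println(compiles(withLookahead))    // false: (?= is rejected at compile time
	fmt.Println(compiles(withoutLookahead)) // true
}
```

Because the error surfaces at compile time, a pattern built with regexp.MustCompile at package level would panic on startup rather than fail quietly.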
e18a54a41a fix(frontend): add HTMX filter submission for sidebar radio buttons
Wrap sidebar time/type filters in a form with HTMX attributes so
filter changes trigger partial page updates instead of full reload.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 14:05:26 +01:00
6d7e68ada1 feat(frontend): reduce popover to theme+engines, add preferences page JS 2026-03-22 14:00:53 +01:00
0afcf509c3 fix: use single Preferences handler with method check instead of dead POST route 2026-03-22 13:57:32 +01:00
70818558cd feat: add GET and POST /preferences route
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 13:53:23 +01:00
b4053b7f98 feat(frontend): add preferences page template and styles
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 13:47:30 +01:00
3dbde9fbfd feat(frontend): add category tiles to homepage 2026-03-22 13:42:24 +01:00
bfcbd45c57 fix(frontend): update FromResponse tests and fix disabled categories rendering
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 13:40:16 +01:00
0e79b729fe feat(frontend): add three-column results layout with left sidebar navigation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 13:36:09 +01:00
2e7075adf1 fix(frontend): merge duplicate sidebar sticky rules 2026-03-22 13:33:24 +01:00
0af49f91b7 feat(frontend): add CSS layout framework for three-column results and preferences page
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 13:29:39 +01:00
d21e9189b8 fix(engines): validate Wikipedia language codes to prevent SSRF
Wikipedia language subdomain was derived from user input without
validation, allowing attackers to redirect requests via malicious
language values like "evil.com.attacker.com". Added a whitelist of
valid Wikipedia language codes to prevent this.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 13:22:52 +01:00
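
An allowlist check of this kind might look roughly like the sketch below. The names allowedLangs, errBadLang, and wikipediaHost are hypothetical, and the allowlist is deliberately tiny; the project's real list and function names are not shown in the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// allowedLangs is an illustrative subset, not the project's actual list.
var allowedLangs = map[string]bool{"en": true, "de": true, "fr": true, "ja": true}

var errBadLang = errors.New("unsupported Wikipedia language code")

// wikipediaHost builds the API host for lang and refuses any value that is
// not on the allowlist, so user input never reaches the hostname unchecked.
func wikipediaHost(lang string) (string, error) {
	if !allowedLangs[lang] {
		return "", errBadLang
	}
	return lang + ".wikipedia.org", nil
}

func main() {
	for _, lang := range []string{"de", "evil.com.attacker.com"} {
		host, err := wikipediaHost(lang)
		fmt.Printf("%q -> host=%q err=%v\n", lang, host, err)
	}
}
```

An exact-match allowlist is stricter than pattern validation: anything containing dots, slashes, or other URL syntax is rejected without needing to enumerate bad inputs.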
f172da33ef fix(engines): cap Brave API offset to 9 to avoid 422 error
Brave API only supports offset values 0-9. When pageno > 1 with
resultsPerPage=20, offset exceeded this limit causing 422 errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 12:01:25 +00:00
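
The cap described above amounts to a clamp. A minimal sketch, assuming the offset is derived from a 1-based page number (the function name braveOffset and that mapping are assumptions, not taken from the source):

```go
package main

import "fmt"

// maxBraveOffset is the upper bound stated in the commit message.
const maxBraveOffset = 9

// braveOffset maps a 1-based page number to a zero-based offset and clamps
// the result to the range 0..maxBraveOffset.
func braveOffset(pageno int) int {
	offset := pageno - 1
	if offset < 0 {
		return 0
	}
	if offset > maxBraveOffset {
		return maxBraveOffset
	}
	return offset
}

func main() {
	for _, p := range []int{1, 2, 10, 25} {
		fmt.Println(p, braveOffset(p))
	}
}
```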
f1cf23745e test: add HTTP API integration tests
Test GET /healthz, /, /search, /autocompleter endpoints.
Verify response codes, content types, JSON decoding, empty-query
redirect, and source URL presence in footer.

Also fix dead code in Search handler: the redirect for empty q
was unreachable because ParseSearchRequest errors on empty q first.
Move the q/format check before ParseSearchRequest to fix the redirect.
2026-03-22 11:44:48 +00:00
5b942a5fd6 refactor: clean up verbose and redundant comments
Trim or remove comments that:
- State the obvious (function names already convey purpose)
- Repeat what the code clearly shows
- Are excessively long without adding value

Keep comments that explain *why*, not *what*.
2026-03-22 11:10:50 +00:00
805e7ffdc2 feat: add source_url config option for footer source link
Thread source_url through: config.ServerConfig → Handler.sourceURL
→ PageData.SourceURL → template footer. Footer only shows Source
link when source_url is set.
2026-03-22 08:34:20 +00:00
bb0b97820b ui: add source and AGPL license links to footer 2026-03-22 08:29:04 +00:00
7be03b4017 license: change from MIT to AGPLv3
Update LICENSE file and add AGPL header to all source files.

AGPLv3 ensures that if someone runs Kafka as a network service and
modifies it, they must release their source code under the same license.
2026-03-22 08:27:23 +00:00
f7cece9648 feat: complete UI redesign — modern, clean search interface
- New CSS: complete design system with CSS variables, modern color palette
- Homepage: full-viewport hero with centered search, logo, tagline
- Result cards: rounded, shadowed, with favicons via Google Favicon API
- Layout: sidebar + results grid, responsive
- Typography: proper font stack, variable weights
- Settings panel: polished popover with animations
- Autocomplete: modern dropdown with keyboard nav
- Dark mode: full color palette via data-theme attribute
- Favicon: clean search icon SVG
2026-03-22 08:06:31 +00:00
f1436310eb fix: regexp.DotAll flag in google engine and Metadata field removal
- google.go: use inline (?s) flag instead of regexp.DotAll second arg
- youtube.go: remove Metadata field (not in MainResult contract)
- config_test.go: fix expected engine count from 9 to 11 (google+youtube)
2026-03-22 02:54:12 +00:00
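
Go's regexp.Compile and regexp.MustCompile take a single pattern argument, so flags such as dot-matches-newline are written inline. A small sketch with a hypothetical pattern and class name (not the one in google.go):

```go
package main

import (
	"fmt"
	"regexp"
)

// snippetRe uses the inline (?s) flag so that "." also matches newlines.
// The "snippet" class is illustrative only.
var snippetRe = regexp.MustCompile(`(?s)<div class="snippet">(.*?)</div>`)

// firstSnippet returns the inner text of the first matching div, or "".
func firstSnippet(html string) string {
	m := snippetRe.FindStringSubmatch(html)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	html := "<div class=\"snippet\">line one\nline two</div>"
	fmt.Printf("%q\n", firstSnippet(html)) // "line one\nline two"
}
```

Without (?s), the same pattern would not match this input, because "." stops at the newline.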
b499db68f7 fix: use explicit if/else template dispatch instead of dynamic name
html/template requires template names to be string literals, not field
accesses. Use {{if eq .Template "videos"}} to branch and call the
appropriate template by literal name.
2026-03-22 02:46:28 +00:00
f0a65e2b8c fix: compute TemplateName in ResultView instead of using dynamic template function
Go html/template doesn't support function calls as template names in
{{template (func .Arg) .}}. Instead, precompute TemplateName in
FromResponse and use {{template .TemplateName .}} in the template.
2026-03-22 02:44:50 +00:00
4a6559be62 fix: add Thumbnail field and video result template
MainResult: add Thumbnail field (used by YouTube, images, etc.)
video_item.html: new partial for video results with thumbnail display
views.go: add templateForResult func + video_item.html to template parse
results_inner.html: dispatch to video_item when Template="videos"
kafka.css: add .video-result flex layout with thumbnail styling
2026-03-22 02:06:41 +00:00
a9ea99c104 Merge branch 'feat/youtube-engine' 2026-03-22 02:02:32 +00:00
277db9463e feat(settings): add hidden engines input to search forms 2026-03-22 03:00:12 +01:00
84777211f8 feat(settings): add gear trigger and panel markup to base template
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 03:00:12 +01:00
8e53a8b11d fix(settings): rewrite renderPanel to use panel body innerHTML, not document.body
- Replace document.body.innerHTML with panel.querySelector('.settings-popover-body').innerHTML
- Use theme buttons (.theme-btn) with icons instead of radio buttons
- Use .engine-toggle class for engine checkboxes in 2-column grid
- Include settings-notice paragraph for engine changes
- Use dropdowns for safe search and format with proper ids

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 03:00:12 +01:00
1906723859 fix(settings): re-render panel when last engine unchecked to enforce minimum
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 03:00:12 +01:00
2785b84939 feat(settings): add JS module for localStorage preferences and panel
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 03:00:12 +01:00
4fe78c69ce fix(settings): add html[data-theme=light] explicit light mode reset 2026-03-22 03:00:12 +01:00
11480dacdf feat(settings): add popover, toggle, and bottom-sheet CSS 2026-03-22 03:00:12 +01:00
a7f594b7fa feat: add YouTube engine with config file and env support
YouTube Data API v3 engine:
- Add YouTubeConfig to EnginesConfig with api_key field
- Add YOUTUBE_API_KEY env override
- Thread *config.Config through search service to factory
- Factory falls back to env vars if config fields are empty
- Update config.example.toml with youtube section

Also update default local_ported to include google and youtube.
2026-03-22 01:57:13 +00:00
1689cab9bd feat: add YouTube engine via Data API v3
Uses the official YouTube Data API v3. Requires YOUTUBE_API_KEY
environment variable (free from Google Cloud Console).

Returns video results with title, description, channel, publish
date, and thumbnail URL. Falls back gracefully if no API key.
2026-03-22 01:53:19 +00:00
31fdd5e06f Merge branch 'feat/google-engine', remote-tracking branch 'origin/main' 2026-03-22 01:35:20 +00:00
4be9cf2725 feat: add Google engine using GSA User-Agent scraping
SearXNG approach: use a pool of GSA User-Agent strings. GSA here is the
Google Search App for iOS, not the Google Search Appliance.

Key techniques:
- GSA User-Agent (iPhone OS + GSA/ version) instead of Chrome desktop
- CONSENT=YES+ cookie to bypass EU consent wall
- Parse /url?q= redirector URLs (unquote + strip &sa= params)
- div.MjjYud class for result containers (SearXNG selector)
- data-sncf divs for snippets
- detect sorry.google.com blocks
- Suggestions from ouy7Mc class cards
2026-03-22 01:29:46 +00:00
fcd9be16df refactor: remove SearXNG references and rename binary to kafka
- Rename cmd/searxng-go to cmd/kafka
- Remove all SearXNG references from source comments while keeping
  "SearXNG-compatible API" in user-facing docs
- Update binary paths in README, CLAUDE.md, and Dockerfile
- Update log message to "kafka starting"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 01:47:03 +01:00
a2f8077669 feat: add autocomplete dropdown UI with keyboard nav
- Inline JS in base.html: debounced fetch from /autocompleter on keyup
- Keyboard nav: arrows to navigate, Enter to select, Esc to close
- Highlight matching prefix in suggestions
- Click to select and submit
- Dropdown positioned absolutely below search input
- Dark mode compatible via existing CSS variables
2026-03-22 00:20:43 +00:00
9b280ad606 feat: add /autocompleter endpoint for search suggestions
Proxies to upstream SearXNG /autocompleter if configured, otherwise
falls back to Wikipedia OpenSearch API. Returns a JSON array of
suggestion strings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 01:06:25 +01:00
0d3f3c19d7 fix: add missing engines to defaultPortedEngines
duckduckgo, github, reddit, and bing were registered in factory.go
and config.go but missing from planner.go, so they were silently
skipped when LOCAL_PORTED_ENGINES was not set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 00:13:57 +01:00
6346fb7155 chore: update Go module path to github.com/metamorphosis-dev/kafka
Module path now matches the GitHub mirror location.
All internal imports updated across 35+ files.
2026-03-21 19:42:01 +00:00
e5295fa69d chore: rename project from gosearch to kafka
A search engine named after a man who proved answers don't exist.

Renamed everywhere user-facing:
- Brand name, UI titles, OpenSearch description, CSS filename
- Docker service name, NixOS module (services.kafka)
- Cache key prefix (kafka:), User-Agent strings (kafka/0.1)
- README, config.example.toml, flake.nix descriptions

Kept unchanged (internal):
- Go module path: github.com/ashie/gosearch
- Git repository URL: git.ashisgreat.xyz/penal-colony/gosearch
- Binary entrypoint: cmd/searxng-go
2026-03-21 19:20:47 +00:00
13040268d6 feat: add global and burst rate limiters
Three layers of rate limiting, all disabled by default, opt-in via config:

1. Per-IP (existing): 30 req/min per IP
2. Global: server-wide limit across all IPs
   - Lock-free atomic counter for minimal overhead
   - Returns 503 when exceeded
   - Prevents pool exhaustion from distributed attacks
3. Burst: per-IP burst + sustained windows
   - Blocks rapid-fire abuse within seconds
   - Returns 429 with X-RateLimit-Reason header
   - Example: 5 req/5s burst, 60 req/min sustained

Config:
[global_rate_limit]
requests = 0  # disabled by default
window = "1m"

[burst_rate_limit]
burst = 0  # disabled by default
burst_window = "5s"
sustained = 0
sustained_window = "1m"

Env overrides: GLOBAL_RATE_LIMIT_REQUESTS, GLOBAL_RATE_LIMIT_WINDOW,
BURST_RATE_LIMIT_BURST, BURST_RATE_LIMIT_BURST_WINDOW,
BURST_RATE_LIMIT_SUSTAINED, BURST_RATE_LIMIT_SUSTAINED_WINDOW

Full test coverage: concurrent lock-free test, window expiry, disabled states,
IP isolation, burst vs sustained distinction.
2026-03-21 18:35:31 +00:00
4ec600f6c0 feat: add OpenSearch XML endpoint
- Serve /opensearch.xml with configurable base URL
- Browsers can now add gosearch as a search engine from the address bar
- Configurable via [server] base_url or BASE_URL env var
- XML template embedded in the binary via go:embed
- Added base_url to config.example.toml
2026-03-21 17:40:05 +00:00
a8ab29b23a fix: fix DDG and Bing parsers — verified with live tests
DuckDuckGo:
- Fixed parser to handle single-quoted class attributes (class='result-link')
- Decode DDG tracking URLs (uddg= parameter) to extract real URLs
- Match snippet extraction to actual DDG Lite HTML structure (</td> terminator)

Bing:
- Switched from HTML scraping (blocked by JS detection) to RSS endpoint
  (?format=rss) which returns parseable XML
- Added JSON API response parsing as fallback
- Returns graceful unresponsive_engines entry when blocked

Live test results:
- DuckDuckGo: 9 results
- GitHub: 10 results (14,768 total)
- Bing: 10 results via RSS
- Reddit: skipped (403 from sandbox, needs browser-like context)
2026-03-21 16:57:02 +00:00
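
Decoding a uddg= tracking link, as mentioned in the DuckDuckGo notes above, can be sketched with net/url alone. The function name and the sample link are hypothetical; only the uddg parameter name comes from the commit message.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodeDDGLink returns the target carried in the uddg= parameter of a
// DuckDuckGo redirect link, or the input unchanged if there is none.
func decodeDDGLink(href string) string {
	u, err := url.Parse(href)
	if err != nil {
		return href
	}
	if target := u.Query().Get("uddg"); target != "" {
		return target // Query() has already percent-decoded the value
	}
	return href
}

func main() {
	// Illustrative redirect link with a percent-encoded target.
	link := "//duckduckgo.com/l/?uddg=https%3A%2F%2Fexample.com%2Fa%3Fb%3D1&rut=abc"
	fmt.Println(decodeDDGLink(link)) // https://example.com/a?b=1
}
```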
df8fe9474b feat: add DuckDuckGo, GitHub, Reddit, and Bing engines
- DuckDuckGo: scrapes Lite HTML endpoint for results
  - Language-aware region mapping (de→de-de, ja→jp-jp, etc.)
  - HTML parser extracts result links and snippets from DDG Lite markup
  - Shared html_helpers.go with extractAttr, stripHTML, htmlUnescape

- GitHub: uses public Search API (repos, sorted by stars)
  - No auth required (10 req/min unauthenticated)
  - Shows stars, language, topics, last updated date
  - Paginated via GitHub's page parameter

- Reddit: uses public JSON search API
  - Respects safesearch (skips over_18 posts)
  - Shows subreddit, score, comment count
  - Links self-posts to the thread URL

- Bing: scrapes web search HTML (b_algo containers)
  - Extracts titles, URLs, and snippets from Bing's result markup
  - Handles Bing's tracking URL encoding

- Updated factory, config defaults, and config.example.toml
- Full test suite: unit tests for all engines, HTML parsing tests,
  region mapping tests, live request tests (skipped in short mode)

9 engines total: wikipedia, arxiv, crossref, braveapi, qwant,
duckduckgo, github, reddit, bing
2026-03-21 16:52:11 +00:00
28b61ff251 feat: HTMX + Go Templates HTML frontend
- Add internal/views/ package with embedded templates and static files
- Go html/template with SearXNG-compatible CSS class names
- Dark mode via prefers-color-scheme, responsive layout, print styles
- HTMX integration:
  - Debounced instant search (500ms) on the search input
  - Form submission targets #results via hx-post
  - Pagination buttons are HTMX-powered (swap results div only)
  - HX-Request header detection for fragment vs full page rendering
- Template structure:
  - base.html: full page layout with HTMX script, favicon, CSS
  - index.html: homepage with centered search box
  - results.html: full results page (wraps base + results_inner)
  - results_inner.html: results fragment (HTMX partial + sidebar + pagination)
  - result_item.html: reusable result article partial
- Smart format detection: browser requests (Accept: text/html) default to HTML,
  API clients default to JSON
- Static files served at /static/ from embedded FS (CSS, favicon SVG)
- Index route at GET /
- Empty query on HTML format redirects to homepage
- Custom CSS (gosearch.css): clean, minimal, privacy-respecting aesthetic
  with light/dark mode, responsive breakpoints, print stylesheet
- Add views package tests
2026-03-21 16:10:42 +00:00
ebeaeeef21 feat: add CORS and rate limiting middleware
CORS:
- Configurable allowed origins (wildcard "*" or specific domains)
- Handles OPTIONS preflight with configurable methods, headers, max-age
- Exposed headers support for browser API access
- Env override: CORS_ALLOWED_ORIGINS

Rate Limiting:
- In-memory per-IP sliding window counter
- Configurable request limit and time window
- Background goroutine cleans up stale IP entries
- HTTP 429 with Retry-After header when exceeded
- Extracts real IP from X-Forwarded-For and X-Real-IP (proxy-aware)
- Env overrides: RATE_LIMIT_REQUESTS, RATE_LIMIT_WINDOW, RATE_LIMIT_CLEANUP_INTERVAL
- Set requests=0 in config to disable

Both wired into main.go as middleware chain: rate_limit → cors → handler.
Config example updated with [cors] and [rate_limit] sections.
Full test coverage for both middleware packages.
2026-03-21 15:54:52 +00:00
94322ceff4 feat: Valkey cache for search results
- Add internal/cache package using go-redis/v9 (Valkey-compatible)
- Cache keys are deterministic SHA-256 hashes of search parameters
- Cache wraps the Search() method: check cache → miss → execute → store
- Gracefully disabled if Valkey is unreachable or unconfigured
- Configurable TTL (default 5m), address, password, and DB index
- Environment variable overrides: VALKEY_ADDRESS, VALKEY_PASSWORD,
  VALKEY_DB, VALKEY_CACHE_TTL
- Structured JSON logging via slog throughout cache layer
- Refactored service.go: extract executeSearch() from Search() for clarity
- Update config.example.toml with [cache] section
- Add cache package tests (key generation, nop behavior)
2026-03-21 15:43:47 +00:00
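
A deterministic SHA-256 cache key can be derived by serializing the parameters in a fixed order before hashing. The field set, the searchParams type, and the cacheKey function are hypothetical; the "kafka:" prefix is the one introduced in the later rename commit.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// searchParams is an illustrative subset of what a search request carries.
type searchParams struct {
	Query    string
	Pageno   int
	Language string
	Engines  []string
}

// cacheKey hashes the parameters in a fixed order. Engines are sorted on a
// copy so that the same set in a different order yields the same key.
func cacheKey(p searchParams) string {
	engines := append([]string(nil), p.Engines...)
	sort.Strings(engines)
	h := sha256.New()
	fmt.Fprintf(h, "q=%q\npage=%d\nlang=%q\nengines=%q", p.Query, p.Pageno, p.Language, engines)
	return "kafka:" + hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := cacheKey(searchParams{Query: "go", Pageno: 1, Language: "en", Engines: []string{"bing", "arxiv"}})
	b := cacheKey(searchParams{Query: "go", Pageno: 1, Language: "en", Engines: []string{"arxiv", "bing"}})
	fmt.Println(a == b) // true
}
```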
385a7acab7 feat: concurrent engine execution with graceful degradation
- Run all local engines in parallel using goroutines + sync.WaitGroup
- Individual engine failures are captured as unresponsive_engines entries
  instead of aborting the entire search request
- Context cancellation is respected: cancelled engines report as unresponsive
- Upstream proxy failure is also gracefully handled (single unresponsive entry)
- Extract unresponsiveResponse() and emptyResponse() helpers for consistency
- Add comprehensive tests:
  - ConcurrentEngines: verifies parallelism (2x100ms engines complete in ~100ms)
  - GracefulDegradation: one engine fails, one succeeds, both represented
  - AllEnginesFail: no error returned, all engines in unresponsive_engines
  - ContextCancellation: engine respects context timeout, reports unresponsive
2026-03-21 15:39:00 +00:00