Compare commits

106 commits

Author SHA1 Message Date
6f3e11277a config: add cache.ttl_overrides example
Some checks failed
Build and Push Docker Image / build-and-push (push) Failing after 16s
Mirror to GitHub / mirror (push) Failing after 17s
Tests / test (push) Failing after 14s
2026-03-24 01:15:14 +01:00
26f8e4855b search: wire per-engine cache with tier-aware TTLs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 01:14:31 +01:00
b710aec798 config: add TTLOverrides to CacheConfig 2026-03-24 01:10:53 +01:00
e9625441cc cache: add EngineCache with tier-aware Get/Set
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 01:09:58 +01:00
ff4149ecbd cache: add tier definitions and EngineTier function
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 01:08:09 +01:00
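The commits above only name the pieces (tier definitions, an `EngineTier` lookup); a minimal sketch of how such a tier map might look — tier names, TTL values, and the engine-to-tier entries here are all assumptions, not the repository's actual values:

```go
package main

import "fmt"

// Tier controls how long an engine's cached responses stay fresh.
// Names and TTLs below are illustrative assumptions.
type Tier struct {
	Name string
	TTL  int // seconds
}

var (
	TierFast = Tier{"fast", 300}
	TierSlow = Tier{"slow", 3600}
)

// engineTiers maps engine names (hypothetical entries) to a tier.
var engineTiers = map[string]Tier{
	"google":    TierFast,
	"wikipedia": TierSlow,
}

// EngineTier returns the tier for an engine; the fallback-to-fast
// behavior is an assumption.
func EngineTier(name string) Tier {
	if t, ok := engineTiers[name]; ok {
		return t
	}
	return TierFast
}

func main() {
	fmt.Println(EngineTier("wikipedia").Name)
}
```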
baf98ca80e cache: add QueryHash and CachedEngineResponse type
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 01:06:49 +01:00
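Only the name `QueryHash` comes from the commit; the normalization scheme (pipe-joined fields, SHA-256, hex) in this sketch is an assumed implementation:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// QueryHash derives a stable cache key from a query and its
// parameters. The field set and encoding are assumptions.
func QueryHash(query, category string, page int) string {
	h := sha256.Sum256([]byte(fmt.Sprintf("%s|%s|%d", query, category, page)))
	return hex.EncodeToString(h[:])
}

func main() {
	fmt.Println(QueryHash("gregor samsa", "general", 1))
}
```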
bfc4f8d657 docs: fix per-engine TTL cache plan bugs
- Fix stale-while-revalidate condition (was inverted for stale vs fresh)
- Add unmarshalErr tracking for cache corruption edge case
- Rename CachedResponse to CachedEngineResponse throughout
- Fix override tier name (engineName not engineName_override)
- Add EngineCache.Logger() method
- Add tiers_test.go to file map

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 01:04:32 +01:00
59f1c85fc5 docs: add per-engine TTL cache design spec
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 00:52:16 +01:00
685088d3b0 ui: show errors for unresponsive engines
Made-with: Cursor
2026-03-24 00:09:59 +01:00
24577b27be feat: Wikidata engine and Wikipedia knowledge infobox
- Add wikidata engine (wbsearchentities), tests, factory/planner/config
- Wikipedia REST summary: infobox from extract, thumbnail, article URL
- InfoboxView URL; render infobox list in results_inner + base styles
- Preferences Wikidata toggle; engine badge color for wikidata

Made-with: Cursor
2026-03-24 00:07:12 +01:00
6e45abb150 feat: add screenshot
2026-03-23 23:53:03 +01:00
90ea4c9f56 fix(prefs): persist favicon choice and apply to HTML results
- Save favicon cookie on POST /preferences; reflect selection in template
- Add getFaviconService helper; pass favicon service into FromResponse
- Compute ResultView.FaviconIconURL (none/google/duckduckgo/self proxy)
- Update result_item and video_item templates; add httpapi/views tests

Made-with: Cursor
2026-03-23 23:08:21 +01:00
518215f62e feat(ui): dark theme redesign, fix image search and defaults
- Inline CSS in base.html (Inter, dark mode, sticky search, tabs, results)
- Remove HTMX/JS from templates; pagination via GET links
- Atmospheric side gradients + grid; wider column on large viewports
- Parse ?category= for HTML tabs (fixes Images category routing)
- Include bing_images, ddg_images, qwant_images in local_ported defaults
- Default listen port 5355; update Docker, compose, flake, README
- Favicon img uses /favicon/ proxy; preferences without inline JS

Made-with: Cursor
2026-03-23 22:49:41 +01:00
Claude
bdc3dae4f5 fix: add dark mode for search result classes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 20:31:28 +00:00
Claude
77f939016f fix: expand dark mode CSS coverage for all page elements
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 20:27:42 +00:00
Claude
29cda763eb fix: add class-based dark mode fallback styling
Add .dark class on html element with direct element styling as
fallback for when CSS custom properties don't work.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 20:05:17 +00:00
Claude
0852707df0 fix: add :root[data-theme="dark"] for even higher CSS specificity
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 19:59:34 +00:00
Claude
2aa5f00192 fix: use !important on dark theme CSS variables
Force dark theme variables to override :root values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 19:47:27 +00:00
Claude
39c1c1b9ea fix: use html[data-theme="dark"] for higher specificity
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 19:25:00 +00:00
Claude
bc4c2d468f fix: add [data-theme="dark"] CSS selector back
The server-side theme cookie sets data-theme attribute, but CSS was
only using @media (prefers-color-scheme: dark). Need both selectors
so theme works via cookie AND via system preference.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 18:57:47 +00:00
Claude
fe0c7e8dc8 feat: add server-side theme cookie with dropdown selector (no JS)
- Add theme POST handler that sets HttpOnly cookie
- Update preferences page to use <select> dropdown instead of JS buttons
- Theme cookie set on POST /preferences with theme parameter
- Theme read from cookie on all page renders
- No JavaScript required for theme selection

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 18:47:06 +00:00
Claude
056d2d1175 feat: use CSS prefers-color-scheme for dark mode (no JS)
- Remove inline JS that sets data-theme from localStorage
- Use @media (prefers-color-scheme: dark) in CSS for automatic dark mode
- Remove JS-dependent theme toggle from preferences
- Theme now follows system preference automatically

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 18:34:18 +00:00
8f2fd671f1 fix: extract only hostname for favicon data-domain
data-domain was set to the full result URL (https://en.wikipedia.org/...).
This caused /favicon/ to receive malformed domain strings.

Now extracts u.Hostname() in FromResponse and passes it as Domain
to result_item.html.
2026-03-23 14:56:51 +00:00
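The fix as described maps directly onto the standard library: parse the result URL and keep only `u.Hostname()`. A self-contained sketch (the helper name is mine, not the repository's):

```go
package main

import (
	"fmt"
	"net/url"
)

// hostnameForFavicon extracts just the hostname from a full result
// URL, as the commit describes (u.Hostname() in FromResponse).
func hostnameForFavicon(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return u.Hostname(), nil
}

func main() {
	h, _ := hostnameForFavicon("https://en.wikipedia.org/wiki/The_Metamorphosis")
	fmt.Println(h) // en.wikipedia.org
}
```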
b57a041b6a perf: use Redis for favicon cache with 24h TTL
Favicons are now cached in Valkey/Redis instead of an in-memory map:
- TTL: 24 hours (up from 1 hour in-memory)
- ETag derived from body SHA256 (no extra storage needed)
- Falls back to in-memory on cache miss when Valkey is unavailable
- GetBytes/SetBytes added to cache package for raw byte storage

In-memory faviconCache map, sync.RWMutex, and time-based expiry
logic removed from handlers.go.
2026-03-23 14:38:32 +00:00
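"ETag derived from body SHA256 (no extra storage needed)" can be sketched in a few lines; the quoted-hex format below is an assumption, the commit only says the ETag comes from the body hash:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// etagFor derives a strong ETag from the cached favicon bytes, so
// no separate ETag value has to be stored next to the body.
func etagFor(body []byte) string {
	sum := sha256.Sum256(body)
	return fmt.Sprintf(`"%x"`, sum)
}

func main() {
	fmt.Println(etagFor([]byte("favicon-bytes")))
}
```

Because the tag is a pure function of the body, any node that holds the bytes can answer a conditional request with 304 Not Modified without extra bookkeeping.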
352264509c feat: self-hosted favicon resolver via /favicon/<domain>
Adds a Kafka-hosted favicon proxy at /favicon/<domain>:
- Fetches favicon.ico from the target domain
- In-memory cache with 1-hour TTL and ETag support (304 Not Modified)
- Max 64KB per favicon to prevent memory abuse
- Privacy: user browser talks to Kafka, not Google/DuckDuckGo

New "Self (Kafka)" option in the favicon service selector.
Defaults to None. No third-party requests when self is chosen.
2026-03-23 14:35:19 +00:00
d0efcb0309 Merge branch 'main' of https://github.com/metamorphosis-dev/samsa
2026-03-23 14:32:04 +00:00
665494304d perf(htmx): reduce swap payload via OOB swaps and hx-select
Before: every HTMX response returned the full results_inner template
(~400 lines of corrections, meta, pagination, back-to-top) even though
only the result list (#urls) changed between searches.

After:
- Corrections and results-meta use hx-swap-oob="true" — they update
  in-place in the DOM without duplication, no extra payload
- #urls div carries hx-select="#urls" hx-target="#urls" hx-swap="innerHTML"
  — only the result rows are extracted from the response and swapped
- Pagination forms replaced with paginate-btn buttons — JS calls
  htmx.ajax() directly with select:#urls so only result rows reload
- Header search form gains hx-get/hx-target/hx-select for partial updates
  on subsequent searches

Payload reduction per HTMX swap: ~60-70% (no more nav, meta, pagination,
back-to-top, htmx-indicator in the swap response body)
2026-03-23 14:31:21 +00:00
9e95ce7b53 perf: shared http.Transport with tuned connection pooling
Add internal/httpclient package as a singleton RoundTripper used by
all outbound engine requests (search, engines, autocomplete, upstream).

Key Transport settings:
- MaxIdleConnsPerHost = 20  (up from Go default of 2)
- MaxIdleConns = 100
- IdleConnTimeout = 90s
- DialContext timeout = 5s

Previously, the default transport limited each host to 2 idle connections,
forcing a new TCP+TLS handshake on every search for each engine. With
12 engines hitting the same upstream hosts in parallel, connections
were constantly recycled. Now warm connections are reused across all
goroutines and requests.
2026-03-23 14:26:26 +00:00
7ea50d3123 feat(ui): make favicons user-configurable, off by default
- Add favicon service preference: None (default), Google, DuckDuckGo
- result_item.html: remove hardcoded Google favicon src, defer to JS
- applyFavicon() reads data-domain attr and sets src or display:none
- Privacy-by-default: users must explicitly opt in to any favicon service
- Add favicon selector to both the settings panel and preferences page
2026-03-23 14:22:24 +00:00
540a127f7b Update README.md
2026-03-23 15:19:29 +01:00
aaac1f8f4b docs: fix ASCII architecture diagram alignment
2026-03-23 14:17:41 +00:00
71b96598ed docs: refresh README — 11 engines, accurate clone URLs, API key clarity
2026-03-23 14:07:49 +00:00
ba06582218 docs: add CONTRIBUTING guide for adding new engines
Covers the full lifecycle: interface, SearchRequest/SearchResponse,
result building, graceful degradation, factory wiring, planner
registration, testing, and RSS parsing example.
2026-03-23 08:19:28 +00:00
1e81eea28e chore: remove stale design and implementation plan docs
2026-03-23 08:10:15 +00:00
015f8b357a fix: rename remaining kafka references to samsa
- OpenSearch description: 'Search results for "..." - kafka' → samsa
- Test error message: 'kafka trial' → samsa trial
2026-03-23 07:25:30 +00:00
8e9aae062b rename: kafka → samsa
Full project rename from kafka to samsa (after Gregor Samsa, who
woke one morning from uneasy dreams to find himself transformed).

- Module: github.com/metamorphosis-dev/kafka → samsa
- Binary: cmd/kafka/ → cmd/samsa/
- CSS: kafka.css → samsa.css
- UI: all 'kafka' product names, titles, localStorage keys → samsa
- localStorage keys: kafka-theme → samsa-theme, kafka-engines → samsa-engines
- OpenSearch: ShortName, LongName, description, URLs updated
- AGPL headers: 'kafka' → 'samsa'
- Docs, configs, examples updated
- Cache key prefix: kafka: → samsa:
2026-03-22 23:44:55 +00:00
c91908a427 Merge commit 'df67492' 2026-03-22 23:41:36 +00:00
0030cf97ad feat: per-engine accent colors in search results
Each engine now has a distinctive color accent applied to its result
card (left border) and engine badge (colored left strip + text).

16 engines mapped to brand-appropriate colors:
Google blue, Bing teal, DDG orange-red, Brave red, Qwant blue,
Wikipedia dark, GitHub purple, Reddit orange-red, YouTube red,
Stack Overflow amber, arXiv crimson, Crossref navy blue.

Pure CSS via data-engine attribute — no JavaScript.
2026-03-22 22:59:32 +00:00
df67492602 feat: add Stack Overflow search engine
Uses the Stack Exchange API v3 (/search/advanced) to find questions
sorted by relevance. No API key required (300 req/day); optionally
configure via STACKOVERFLOW_KEY env var or [engines.stackoverflow].

Results include score, answer count, view count, and tags in the
snippet. Assigned to the 'it' category, triggered by the IT category
tab or explicit engine selection.

6 tests covering parsing, edge cases, and helpers.
2026-03-22 22:29:34 +00:00
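A request to `/search/advanced` sorted by relevance might be built as below; the parameter names follow the public Stack Exchange API conventions, and the exact version path and key handling are assumptions, not taken from the repository:

```go
package main

import (
	"fmt"
	"net/url"
)

// searchURL builds a Stack Exchange /search/advanced request sorted
// by relevance, as the commit describes. An optional key raises the
// daily request quota beyond the keyless allowance.
func searchURL(query, key string) string {
	q := url.Values{}
	q.Set("order", "desc")
	q.Set("sort", "relevance")
	q.Set("q", query)
	q.Set("site", "stackoverflow")
	if key != "" {
		q.Set("key", key)
	}
	return "https://api.stackexchange.com/2.3/search/advanced?" + q.Encode()
}

func main() {
	fmt.Println(searchURL("goroutine leak", ""))
}
```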
e96040ef35 chore: remove React frontend, SPA server, and compiled binary
The project uses pure Go HTML templates + CSS. The React frontend
(frontend/), SPA handler (internal/spa/), and prebuilt binary (kafka)
were dead weight.

Also removes the frontend replacement plan/spec docs.
2026-03-22 22:18:19 +00:00
c97e6a6182 fix(frontend): improve pagination CSS for better centering and sizing
- Match .page-current padding to button padding (0 0.75rem)
- Add box-sizing: border-box to buttons
- Add margin/padding reset to pagination forms
- Simplify page-current flex layout

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 23:14:48 +01:00
f92ec02dba fix: center page numbers in pagination
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 22:36:06 +01:00
a61a3a9c70 fix: improve pagination styling for active page and next button
Match padding and sizing for active page number and next button
to match inactive page buttons.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 22:31:53 +01:00
d7ec0217c4 feat: add settings link to header
Add gear icon link to preferences page in the header.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 22:29:53 +01:00
77a41834a5 feat: add preferences page with theme toggle
- Simple preferences page with engine selection
- Light/dark theme toggle with localStorage persistence
- Clean form layout without complex JS dependencies
- Add dark theme CSS variables

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 22:27:48 +01:00
030b4a8508 feat: make search bar sticky on results page
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 22:24:02 +01:00
23dcdef26f fix: unescape HTML entities in result titles
Wikipedia returns HTML entities like &lt;span&gt; which were being
double-escaped by Go templates. Now using html.UnescapeString and
template.HTML to render properly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 22:19:07 +01:00
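The core of the fix — unescape entity-encoded titles once before the template escapes them once — is one standard-library call; the wrapper name here is mine (the commit also mentions `template.HTML`, which this sketch omits since unescaping plus the template's own escaping is sufficient for plain-text titles):

```go
package main

import (
	"fmt"
	"html"
)

// cleanTitle reverses the double-escaping the commit describes:
// Wikipedia returns entity-encoded titles (&lt;span&gt;), which a
// Go template would escape a second time if passed through as-is.
func cleanTitle(raw string) string {
	return html.UnescapeString(raw)
}

func main() {
	fmt.Println(cleanTitle("Caf&eacute; &amp; Bar"))
}
```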
37420ae5a8 refactor(frontend): port search-zen-50 style to Go templates
Replace React SPA with simple Go templates using search-zen-50
visual style. No JavaScript required - pure HTML/CSS with clean
teal accent color scheme, monospace logo, and minimal design.

- Simplified base.html without HTMX or autocomplete JS
- Clean homepage with centered search box
- Results page with sticky header and category tabs
- Simplified CSS matching search-zen-50 aesthetics

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 22:05:56 +01:00
168cb78fab feat: add frontend source code
Add search-zen-50 React SPA source code to frontend/ directory.
Build artifacts (dist, node_modules, lock files) are gitignored.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 21:27:45 +01:00
6b418057ef feat(frontend): replace Go templates with React SPA
- Add internal/spa package for embedding React build
- Wire SPA handler in main.go for non-API routes
- Add gitignore entry for internal/spa/dist
- Add implementation plan

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 21:12:01 +01:00
5d14d291ca feat(main): wire SPA handler in main.go
Replace template-based handlers (h.Index, h.Preferences) with the new spa
handler. API routes (healthz, search, autocompleter, opensearch.xml) are
registered first as exact matches, followed by the SPA catchall handler
for all other routes. Remove unused views and io/fs imports.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 19:50:03 +01:00
8651183540 feat(spa): add SPA Go package with embedded dist FS
Creates internal/spa package that:
- Embeds React build output from cmd/kafka/dist/
- Provides HTTP handler for static file serving
- Falls back to index.html for SPA client-side routing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 19:40:34 +01:00
1543b16605 docs: add frontend replacement design spec
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 18:58:50 +01:00
00b2be9e79 fix(css): restore original layout, re-add only image grid styles
Reverted CSS to the known-working state at 4b0cde9, then re-applied
only the image grid styles. The duplicate .results-layout block is
intentional — it was present in the working version too.
2026-03-22 17:35:35 +00:00
2f10f4e1e5 fix(css): remove duplicate .results-layout that broke 3-column grid
The old 3-column layout block (referencing .left-sidebar/.right-sidebar
classes that don't exist in the HTML) was overriding the correct layout
defined earlier. Removed the stale duplicate.
2026-03-22 17:31:06 +00:00
a9ae69cad5 fix(security): allow HTMX CDN and inline scripts in CSP
script-src now permits 'unsafe-inline' and https://unpkg.com so the
autocomplete script and HTMX library load correctly.
2026-03-22 17:22:31 +00:00
2b072e4de3 feat: add image search with Bing, DuckDuckGo, and Qwant engines
Three new image search engines:
- bing_images: Bing Images via RSS endpoint
- ddg_images: DuckDuckGo Images via VQD API
- qwant_images: Qwant Images via v3 search API

Frontend:
- Image grid layout with responsive columns
- image_item template with thumbnail, title, and source metadata
- Hover animations and lazy loading
- Grid activates automatically when category=images

Backend:
- category=images routes to image engines via planner
- Image engines registered in factory and engine allowlist
- extractImgSrc helper for parsing thumbnail URLs from HTML
- IsImageSearch flag on PageData for template layout switching
2026-03-22 16:49:24 +00:00
a316763aca fix(test): update CORS preflight test for deny-all default
Empty CORSConfig now means no CORS headers, matching the security fix.
Test explicitly configures an origin to test preflight behavior.
2026-03-22 16:38:03 +00:00
5884c080fd Merge branch 'security/hardening-sast-fixes'
2026-03-22 16:31:57 +00:00
b3e3123612 security: fix build errors, add honest Google UA, sanitize error msgs
- Fix config validation: upstream URLs allow private IPs (self-hosted)
- Fix util.SafeURLScheme to return parsed URL
- Replace spoofed GSA User-Agent with honest Kafka UA
- Sanitize all engine error messages (strip response bodies)
- Replace unused body reads with io.Copy(io.Discard, ...) for reuse
- Fix pre-existing braveapi_test using wrong struct type
- Fix ratelimit test reference to limiter variable
- Update ratelimit tests for new trusted proxy behavior
2026-03-22 16:27:49 +00:00
da367a1bfd security: harden against SAST findings (criticals through mediums)
Critical:
- Validate baseURL/sourceURL/upstreamURL at config load time
  (prevents XML injection, XSS, SSRF via config/env manipulation)
- Use xml.Escape for OpenSearch XML template interpolation

High:
- Add security headers middleware (CSP, X-Frame-Options, HSTS, etc.)
- Sanitize result URLs to reject javascript:/data: schemes
- Sanitize infobox img_src against dangerous URL schemes
- Default CORS to deny-all (was wildcard *)

Medium:
- Rate limiter: X-Forwarded-For only trusted from configured proxies
- Validate engine names against known registry allowlist
- Add 1024-char max query length
- Sanitize upstream error messages (strip raw response bodies)
- Upstream client validates URL scheme (http/https only)

Test updates:
- Update extractIP tests for new trusted proxy behavior
2026-03-22 16:22:27 +00:00
4b0cde91ed feat: 3-column layout with centered results and right column
- results-layout: 3-column grid (1fr | min(768px,100%) | 300px) max-width 1400px, centered
- Widen center results column to 768px max
- Right column (formerly sidebar): sticky, contains knowledge panel + related searches
- Knowledge panel: Wikipedia/infobox summary with optional thumbnail
- Related searches: clickable links to refine the query
- Empty left buffer creates balanced whitespace on large screens
- Responsive: 2-col at 1000px, 1-col at 700px
2026-03-22 16:01:49 +00:00
2d22a8cdbb feat: add Brave web search scraper engine
New brave.go: scrapes https://search.brave.com directly.
Extracts title, URL, snippet, and favicon from Brave's HTML.
No API key required.

Rename existing BraveAPIEngine (was BraveEngine) to avoid collision
with the new scraper. API engine stays as 'braveapi', scraper as 'brave'.
2026-03-22 16:01:49 +00:00
994d27ff7f fix(flake): set correct vendorHash
The correct vendorHash for current go.mod is:
sha256-8wlKD+33s97oorCJTfHKAgE2Xp1HKXV+bSr6z29KrKM=

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 15:17:03 +00:00
e2ff822847 fix(flake): set vendorHash to auto-compute
The go.mod was updated with new replace directive for golang.org/x/net.
Need to recompute vendorHash.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 14:05:28 +00:00
0b381c001f fix(flake): simplify preConfigure
2026-03-22 13:28:32 +00:00
7969b724de fix(engines): remove unsupported lookahead from Google regex
Go's regexp package doesn't support Perl lookahead (?=...). Removing
the unnecessary lookahead since each MjjYud div is self-contained.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 14:16:04 +01:00
cdfdb16c26 Merge branch 'worktree-brave-search-redesign' 2026-03-22 14:09:29 +01:00
0402d249b8 Merge branch 'main' of https://github.com/metamorphosis-dev/kafka 2026-03-22 14:09:29 +01:00
e18a54a41a fix(frontend): add HTMX filter submission for sidebar radio buttons
Wrap sidebar time/type filters in a form with HTMX attributes so
filter changes trigger partial page updates instead of full reload.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 14:05:26 +01:00
6d7e68ada1 feat(frontend): reduce popover to theme+engines, add preferences page JS 2026-03-22 14:00:53 +01:00
0afcf509c3 fix: use single Preferences handler with method check instead of dead POST route 2026-03-22 13:57:32 +01:00
70818558cd feat: add GET and POST /preferences route
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 13:53:23 +01:00
b4053b7f98 feat(frontend): add preferences page template and styles
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 13:47:30 +01:00
3dbde9fbfd feat(frontend): add category tiles to homepage 2026-03-22 13:42:24 +01:00
bfcbd45c57 fix(frontend): update FromResponse tests and fix disabled categories rendering
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 13:40:16 +01:00
0e79b729fe feat(frontend): add three-column results layout with left sidebar navigation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 13:36:09 +01:00
2e7075adf1 fix(frontend): merge duplicate sidebar sticky rules 2026-03-22 13:33:24 +01:00
0af49f91b7 feat(frontend): add CSS layout framework for three-column results and preferences page
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 13:29:39 +01:00
d071921329 docs: add missing template registration step to plan
- Add tmplPreferences variable to views.go var block
- Initialize tmplPreferences in init() function
- Add RenderPreferences function to views.go
- Fix step numbering for Task 4

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 13:26:22 +01:00
ce92a692f8 docs: fix Go syntax errors in implementation plan
- Move if statement outside struct literal in FromResponse
- Define FilterOption at package level (not inside function)
- Add DisabledCategories to PageData struct
- Add defaults handling before struct literal
- Update Search handler call with filter params

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 13:24:11 +01:00
7bc68db70c chore(deps): update go.sum after go mod tidy
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 13:23:37 +01:00
2fae98a336 fix(go): remove stray parenthesis from go.mod
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 13:23:10 +01:00
d21e9189b8 fix(engines): validate Wikipedia language codes to prevent SSRF
Wikipedia language subdomain was derived from user input without
validation, allowing attackers to redirect requests via malicious
language values like "evil.com.attacker.com". Added a whitelist of
valid Wikipedia language codes to prevent this.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 13:22:52 +01:00
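The whitelist pattern from the commit looks roughly like this; the language subset and the fall-back-to-English behavior are assumptions (the real list would cover all Wikipedia language codes):

```go
package main

import "fmt"

// validLangs keeps user-controlled language values out of the
// subdomain position, closing the SSRF the commit describes
// ("evil.com.attacker.com" as a language). Subset is illustrative.
var validLangs = map[string]bool{
	"en": true, "de": true, "fr": true, "es": true, "ja": true,
}

// wikipediaHost returns the API host for a language, defaulting to
// English for anything not on the whitelist.
func wikipediaHost(lang string) string {
	if !validLangs[lang] {
		lang = "en"
	}
	return lang + ".wikipedia.org"
}

func main() {
	fmt.Println(wikipediaHost("de"))
	fmt.Println(wikipediaHost("evil.com.attacker.com"))
}
```

Validating against a closed set is safer than sanitizing the input string, since there is no escaping trick to get wrong.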
8909654c8f docs: fix implementation plan issues from review
- Move template registration from Phase 2 to Phase 4 (was causing build failure)
- Add filter params (activeCategory, activeTime, activeType) to FromResponse
- Add DisabledCategories to PageData for backend-unsupported categories
- Add disabled class to sidebar for future categories
- Clarify POST handler is a no-op for localStorage-only preferences
- Note CSS must be tested manually in browser

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 13:21:31 +01:00
19f5c89053 fix: upgrade x/net to v0.38.0 (resolves Dependabot XSS alert)
2026-03-22 12:18:43 +00:00
b005e2140e docs: add Brave Search frontend redesign implementation plan
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 13:17:46 +01:00
cb05ac5b8c docs: update Brave Search frontend redesign spec with clarifications
- Clarify localStorage-only preferences (no server persistence)
- Expand category tiles including future ones (weather, sports, crypto)
- Define filter UI options with query params (time range, result type)
- Add mobile breakpoints and collapse behavior
- Reduce quick popover to theme + engines only
- Rename Preferences Sidebar to Preferences Nav
- Add results count format specification
- Add sticky positioning CSS for left sidebar

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 13:12:06 +01:00
e9b5fa1f0b docs: update license to AGPLv3
Some checks failed
Build and Push Docker Image / build-and-push (push) Failing after 6s
Mirror to GitHub / mirror (push) Failing after 3s
Tests / test (push) Successful in 21s
2026-03-22 12:11:39 +00:00
79c37a086b ci: update actions/checkout to v5 (uses Node 24)
Some checks failed
Build and Push Docker Image / build-and-push (push) Failing after 6s
Mirror to GitHub / mirror (push) Failing after 3s
Tests / test (push) Successful in 21s
2026-03-22 12:05:21 +00:00
6bbde20f23 docs: add Brave Search frontend redesign specification
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 13:04:34 +01:00
f172da33ef fix(engines): cap Brave API offset to 9 to avoid 422 error
Some checks failed
Build and Push Docker Image / build-and-push (push) Failing after 6s
Mirror to GitHub / mirror (push) Failing after 3s
Tests / test (push) Successful in 24s
Brave API only supports offset values 0-9. When pageno > 1 with
resultsPerPage=20, offset exceeded this limit causing 422 errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 12:01:25 +00:00
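The cap this commit describes amounts to clamping the computed offset before it reaches the API. A minimal sketch (illustrative only; the real computation lives in the braveapi engine):

```go
package main

import "fmt"

// braveMaxOffset is the highest offset the Brave API accepts (valid range 0-9).
const braveMaxOffset = 9

// clampOffset caps a computed page offset at Brave's maximum so that
// pageno > 1 with a large resultsPerPage can no longer trigger a 422.
func clampOffset(offset int) int {
	if offset < 0 {
		return 0
	}
	if offset > braveMaxOffset {
		return braveMaxOffset
	}
	return offset
}

func main() {
	fmt.Println(clampOffset(3))  // within range, unchanged
	fmt.Println(clampOffset(20)) // clamped to 9
}
```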
3bc1fad6b5 fix(flake): force remove vendor in preConfigure
Some checks failed
Build and Push Docker Image / build-and-push (push) Failing after 7s
Mirror to GitHub / mirror (push) Failing after 3s
Tests / test (push) Successful in 21s
The nix store may have stale vendor directories with incorrect
permissions. Force chmod before removing to ensure clean build.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 12:01:18 +00:00
5e125646a7 fix(flake): set correct vendorHash
Some checks failed
Mirror to GitHub / mirror (push) Failing after 3s
Build and Push Docker Image / build-and-push (push) Failing after 6s
Tests / test (push) Successful in 40s
The auto-computed vendorHash for the go modules is:
sha256-PTD4eEEkLGBCZbot6W4U+sMOpIbH2tcFSztQel7hyXI=

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 11:48:08 +00:00
bf5f36e383 chore(deps): add go.sum from go mod tidy
Some checks failed
Build and Push Docker Image / build-and-push (push) Failing after 6s
Mirror to GitHub / mirror (push) Failing after 3s
Tests / test (push) Has been cancelled
2026-03-22 11:47:32 +00:00
e821470c4d fix(go): run go mod tidy to sync dependencies
Some checks failed
Build and Push Docker Image / build-and-push (push) Failing after 6s
Mirror to GitHub / mirror (push) Failing after 3s
Tests / test (push) Failing after 24s
This fixes the build by properly synchronizing go.mod and go.sum
using the official Go toolchain.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 11:45:04 +00:00
f1cf23745e test: add HTTP API integration tests
Some checks failed
Mirror to GitHub / mirror (push) Waiting to run
Tests / test (push) Waiting to run
Build and Push Docker Image / build-and-push (push) Has been cancelled
Test GET /healthz, /, /search, /autocompleter endpoints.
Verify response codes, content types, JSON decoding, empty-query
redirect, and source URL presence in footer.

Also fix dead code in Search handler: the redirect for empty q
was unreachable because ParseSearchRequest errors on empty q first.
Move the q/format check before ParseSearchRequest to fix the redirect.
2026-03-22 11:44:48 +00:00
f6128689f1 fix(go.sum): remove stale go.sum to allow rebuild from proxy
Some checks failed
Build and Push Docker Image / build-and-push (push) Failing after 6s
Mirror to GitHub / mirror (push) Failing after 3s
Tests / test (push) Failing after 25s
The go.sum is out of sync with go.mod causing build failures.
Removing it allows Go to rebuild it from the module proxy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 11:42:15 +00:00
8c6d056f52 fix(engines): cap Brave API offset to 9 to avoid 422 error
Brave API only supports offset values 0-9. When pageno > 1 with
resultsPerPage=20, offset exceeded this limit causing 422 errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 12:41:42 +01:00
2883ac95e7 fix(go.mod): remove unused golang.org/x/net indirect dep
Some checks failed
Build and Push Docker Image / build-and-push (push) Failing after 6s
Mirror to GitHub / mirror (push) Failing after 3s
Tests / test (push) Failing after 14s
The golang.org/x/net v0.52.0 was listed as an indirect dependency but
nothing in the codebase imports it, causing go mod tidy to fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 11:41:15 +00:00
16266e143e fix(go.mod): add missing x/net v0.52.0 hash to go.sum
Some checks failed
Build and Push Docker Image / build-and-push (push) Failing after 6s
Mirror to GitHub / mirror (push) Failing after 3s
Tests / test (push) Failing after 14s
The replace directive was removed but go.sum wasn't updated
with the correct hash for golang.org/x/net v0.52.0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 11:39:41 +00:00
3943539e8a Merge remote-tracking branch 'origin/main' into fix/replace-directive
Some checks failed
Build and Push Docker Image / build-and-push (push) Failing after 6s
Mirror to GitHub / mirror (push) Failing after 3s
Tests / test (push) Failing after 15s
2026-03-22 11:33:46 +00:00
7d0e2017cd fix(go.mod): remove stale replace directive
The replace directive for golang.org/x/net was causing build
failures when using vendorHash = "" with the Go module proxy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 11:33:29 +00:00
25757fdb99 ci: add GitHub Actions workflow for pull requests
Some checks failed
Build and Push Docker Image / build-and-push (push) Failing after 6s
Mirror to GitHub / mirror (push) Failing after 3s
Tests / test (push) Has been cancelled
Runs tests on PRs and pushes to main.
2026-03-22 11:33:22 +00:00
a85d8033c7 fix(flake): remove stale vendorHash; auto-compute on next build
Some checks failed
Build and Push Docker Image / build-and-push (push) Failing after 6s
Mirror to GitHub / mirror (push) Failing after 3s
Tests / test (push) Successful in 21s
The go.mod changes (goquery downgrade, x/net replace) invalidate the
old vendorHash. Set to empty to auto-recompute, then replace with the
actual hash from the build error.
2026-03-22 11:28:31 +00:00
b2cca0a346 ci: remove stale vendor directory before build
Some checks failed
Build and Push Docker Image / build-and-push (push) Failing after 6s
Mirror to GitHub / mirror (push) Failing after 3s
Tests / test (push) Successful in 22s
2026-03-22 11:24:43 +00:00
96 changed files with 6728 additions and 2174 deletions

@@ -11,12 +11,15 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout
-        uses: https://github.com/actions/checkout@v4
+        uses: https://github.com/actions/checkout@v5
       - name: Set up Go
         uses: https://github.com/actions/setup-go@v5
         with:
           go-version-file: go.mod
+      - name: Clean vendor
+        run: rm -rf vendor
       - name: Test
         run: go test -race -v ./...

.github/workflows/test.yml vendored Normal file

@@ -0,0 +1,25 @@
name: Tests
on:
  pull_request:
    branches: [main]
  push:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v5
      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
      - name: Clean vendor
        run: rm -rf vendor
      - name: Test
        run: go test -race -v ./...

.gitignore vendored

@@ -1,5 +1,11 @@
 node_modules/
 .agent/
+internal/spa/dist/
+frontend/node_modules/
+frontend/dist/
+frontend/bun.lock
+frontend/bun.lockb
+frontend/package-lock.json
 *.exe
 *.exe~
 *.dll

@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 ## Project Overview
-kafka is a privacy-respecting metasearch engine written in Go. It provides a SearXNG-compatible `/search` API and an HTML frontend (HTMX + Go templates). 9 engines are implemented natively in Go; unlisted engines can be proxied to an upstream metasearch instance. Responses from multiple engines are merged into a single JSON/CSV/RSS/HTML response.
+samsa is a privacy-respecting metasearch engine written in Go. It provides a SearXNG-compatible `/search` API and an HTML frontend (HTMX + Go templates). 9 engines are implemented natively in Go; unlisted engines can be proxied to an upstream metasearch instance. Responses from multiple engines are merged into a single JSON/CSV/RSS/HTML response.
 ## Build & Run Commands
@@ -22,7 +22,7 @@ go test -run TestWikipedia ./internal/engines/
 go test -v ./internal/engines/
 # Run the server (requires config.toml)
-go run ./cmd/kafka -config config.toml
+go run ./cmd/samsa -config config.toml
 ```
 There is no Makefile. There is no linter configured.
@@ -43,7 +43,7 @@ There is no Makefile. There is no linter configured.
 - `internal/cache` — Valkey/Redis-backed cache with SHA-256 cache keys. No-op if unconfigured.
 - `internal/middleware` — Three rate limiters (per-IP sliding window, burst+sustained, global) and CORS. All disabled by default.
 - `internal/views` — HTML templates and static files embedded via `//go:embed`. Renders full pages or HTMX fragments. Templates: `base.html`, `index.html`, `results.html`, `results_inner.html`, `result_item.html`.
-- `cmd/kafka` — Entry point. Loads TOML config, seeds env vars for engine code, wires up middleware chain, starts HTTP server.
+- `cmd/samsa` — Entry point. Loads TOML config, seeds env vars for engine code, wires up middleware chain, starts HTTP server.
 **Engine interface** (`internal/engines/engine.go`):
 ```go
@@ -66,7 +66,7 @@ Config is loaded from `config.toml` (see `config.example.toml`). All fields can
 ## Conventions
-- Module path: `github.com/metamorphosis-dev/kafka`
+- Module path: `github.com/metamorphosis-dev/samsa`
 - Tests use shared mock helpers in `internal/engines/http_mock_test.go` (`roundTripperFunc`, `httpResponse`)
 - Engine implementations are single files under `internal/engines/` (e.g., `wikipedia.go`, `duckduckgo.go`)
 - Response merging de-duplicates by `engine|title|url` key; suggestions/corrections are merged as sets

@@ -21,7 +21,7 @@ RUN apk add --no-cache ca-certificates tzdata
 COPY --from=builder /kafka /usr/local/bin/kafka
 COPY config.example.toml /etc/kafka/config.example.toml
-EXPOSE 8080
+EXPOSE 5355
 ENTRYPOINT ["kafka"]
 CMD ["-config", "/etc/kafka/config.toml"]

README.md

@@ -1,20 +1,23 @@
-# kafka
+# samsa
+*samsa — named for Gregor Samsa, who woke to find himself transformed. You wanted results; you got a metasearch engine.*
 A privacy-respecting, open metasearch engine written in Go. SearXNG-compatible API with an HTML frontend, designed to be fast, lightweight, and deployable anywhere.
-**9 engines. No JavaScript. No tracking. One binary.**
+**11 engines. No JavaScript required. No tracking. One binary.**
 ## Features
 - **SearXNG-compatible API** — drop-in replacement for existing integrations
-- **9 search engines** — Wikipedia, arXiv, Crossref, Brave, Qwant, DuckDuckGo, GitHub, Reddit, Bing
+- **11 search engines** — Wikipedia, arXiv, Crossref, Brave Search API, Brave (scraping), Qwant, DuckDuckGo, GitHub, Reddit, Bing, Google, YouTube
+- **Stack Overflow** — bonus engine, not enabled by default
-- **HTML frontend** — HTMX + Go templates with instant search, dark mode, responsive design
+- **HTML frontend** — Go templates + HTMX with instant search, dark mode, responsive design
 - **Valkey cache** — optional Redis-compatible caching with configurable TTL
 - **Rate limiting** — three layers: per-IP, burst, and global (all disabled by default)
 - **CORS** — configurable origins for browser-based clients
-- **OpenSearch** — browsers can add kafka as a search engine from the address bar
+- **OpenSearch** — browsers can add samsa as a search engine from the address bar
 - **Graceful degradation** — individual engine failures don't kill the whole search
-- **Docker** — multi-stage build, ~20MB runtime image
+- **Docker** — multi-stage build, static binary, ~20MB runtime image
 - **NixOS** — native NixOS module with systemd service
 ## Quick Start
@@ -22,17 +25,17 @@ A privacy-respecting, open metasearch engine written in Go. SearXNG-compatible A
 ### Binary
 ```bash
-git clone https://git.ashisgreat.xyz/penal-colony/gosearch.git
+git clone https://git.ashisgreat.xyz/penal-colony/samsa.git
-cd kafka
+cd samsa
-go build ./cmd/kafka
+go build ./cmd/samsa
-./kafka -config config.toml
+./samsa -config config.toml
 ```
 ### Docker Compose
 ```bash
 cp config.example.toml config.toml
-# Edit config.toml — set your Brave API key, etc.
+# Edit config.toml — set your Brave API key, YouTube API key, etc.
 docker compose up -d
 ```
@@ -41,28 +44,28 @@ docker compose up -d
 Add to your flake inputs:
 ```nix
-inputs.kafka.url = "git+https://git.ashisgreat.xyz/penal-colony/gosearch.git";
+inputs.samsa.url = "git+https://git.ashisgreat.xyz/penal-colony/samsa.git";
 ```
 Enable in your configuration:
 ```nix
-imports = [ inputs.kafka.nixosModules.default ];
+imports = [ inputs.samsa.nixosModules.default ];
-services.kafka = {
+services.samsa = {
   enable = true;
   openFirewall = true;
   baseUrl = "https://search.example.com";
-  # config = "/etc/kafka/config.toml"; # default
+  # config = "/etc/samsa/config.toml"; # default
 };
 ```
 Write your config:
 ```bash
-sudo mkdir -p /etc/kafka
+sudo mkdir -p /etc/samsa
-sudo cp config.example.toml /etc/kafka/config.toml
+sudo cp config.example.toml /etc/samsa/config.toml
-sudo $EDITOR /etc/kafka/config.toml
+sudo $EDITOR /etc/samsa/config.toml
 ```
 Deploy:
@@ -76,7 +79,7 @@ sudo nixos-rebuild switch --flake .#
 ```bash
 nix develop
 go test ./...
-go run ./cmd/kafka -config config.toml
+go run ./cmd/samsa -config config.toml
 ```
 ## Endpoints
@@ -107,7 +110,7 @@ go run ./cmd/kafka -config config.toml
 ### Example
 ```bash
-curl "http://localhost:8080/search?q=golang&format=json&engines=github,duckduckgo"
+curl "http://localhost:5355/search?q=golang&format=json&engines=github,duckduckgo"
 ```
 ### Response (JSON)
@@ -140,6 +143,8 @@ Copy `config.example.toml` to `config.toml` and edit. All settings can also be o
 - **`[server]`** — port, timeout, public base URL for OpenSearch
 - **`[upstream]`** — optional upstream metasearch proxy for unported engines
 - **`[engines]`** — which engines run locally, engine-specific settings
+- **`[engines.brave]`** — Brave Search API key
+- **`[engines.youtube]`** — YouTube Data API v3 key
 - **`[cache]`** — Valkey/Redis address, password, TTL
 - **`[cors]`** — allowed origins and methods
 - **`[rate_limit]`** — per-IP sliding window (30 req/min default)
@@ -150,13 +155,14 @@ Copy `config.example.toml` to `config.toml` and edit. All settings can also be o
 | Variable | Description |
 |---|---|
-| `PORT` | Listen port (default: 8080) |
+| `PORT` | Listen port (default: 5355) |
 | `BASE_URL` | Public URL for OpenSearch XML |
 | `UPSTREAM_SEARXNG_URL` | Upstream instance URL |
 | `LOCAL_PORTED_ENGINES` | Comma-separated local engine list |
 | `HTTP_TIMEOUT` | Upstream request timeout |
 | `BRAVE_API_KEY` | Brave Search API key |
 | `BRAVE_ACCESS_TOKEN` | Gate requests with token |
+| `YOUTUBE_API_KEY` | YouTube Data API v3 key |
 | `VALKEY_ADDRESS` | Valkey/Redis address |
 | `VALKEY_PASSWORD` | Valkey/Redis password |
 | `VALKEY_CACHE_TTL` | Cache TTL |
@@ -170,55 +176,64 @@ See `config.example.toml` for the full list including rate limiting and CORS var
 | Wikipedia | MediaWiki API | General knowledge |
 | arXiv | arXiv API | Academic papers |
 | Crossref | Crossref API | Academic metadata |
-| Brave | Brave Search API | General web (requires API key) |
+| Brave Search API | Brave API | General web (requires API key) |
+| Brave | Brave Lite HTML | General web (no key needed) |
 | Qwant | Qwant Lite HTML | General web |
 | DuckDuckGo | DDG Lite HTML | General web |
 | GitHub | GitHub Search API v3 | Code and repositories |
 | Reddit | Reddit JSON API | Discussions |
 | Bing | Bing RSS | General web |
+| Google | GSA User-Agent scraping | General web (no API key) |
+| YouTube | YouTube Data API v3 | Videos (requires API key) |
+| Stack Overflow | Stack Exchange API | Q&A (registered, not enabled by default) |
 Engines not listed in `engines.local_ported` are proxied to an upstream metasearch instance if `upstream.url` is configured.
+### API Keys
+Brave Search API and YouTube Data API require keys. If omitted, those engines are silently skipped. Brave Lite (scraping) and Google (GSA UA scraping) work without keys.
 ## Architecture
 ```
 ┌─────────────────────────────────────┐
 │ HTTP Handler                        │
 │ /search / /opensearch.xml           │
 ├─────────────────────────────────────┤
 │ Middleware Chain                    │
 │ Global → Burst → Per-IP → CORS      │
 ├─────────────────────────────────────┤
 │ Search Service                      │
 │ Parallel engine execution           │
 │ WaitGroup + graceful degradation    │
 ├─────────────────────────────────────┤
 │ Cache Layer                         │
-│ Valkey/Redis (optional, no-op if    │
+│ Valkey/Redis (optional; no-op if    │
 │ unconfigured)                       │
 ├─────────────────────────────────────┤
-│ Engines (×9)                        │
+│ Engines (×11 default)               │
 │ Each runs in its own goroutine      │
 │ Failures → unresponsive_engines     │
 └─────────────────────────────────────┘
 ```
 ## Docker
-The Dockerfile uses a multi-stage build:
+The Dockerfile uses a multi-stage build with a static Go binary on alpine Linux:
-```dockerfile
-# Build stage: golang:1.24-alpine
-# Runtime stage: alpine:3.21 (~20MB)
-# CGO_ENABLED=0 — static binary
-```
 ```bash
+# Build: golang:1.24-alpine
+# Runtime: alpine:3.21 (~20MB)
+# CGO_ENABLED=0 — fully static
 docker compose up -d
 ```
 Includes Valkey 8 with health checks out of the box.
+## Contributing
+See [docs/CONTRIBUTING.md](docs/CONTRIBUTING.md) for a walkthrough of adding a new engine. The interface is two methods: `Name()` and `Search(context, request)`.
 ## License
-MIT
+[AGPLv3](https://www.gnu.org/licenses/agpl-3.0.html)

@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify
@@ -25,13 +25,13 @@ import (
 	"net/http"
 	"os"
-	"github.com/metamorphosis-dev/kafka/internal/autocomplete"
-	"github.com/metamorphosis-dev/kafka/internal/cache"
-	"github.com/metamorphosis-dev/kafka/internal/config"
-	"github.com/metamorphosis-dev/kafka/internal/httpapi"
-	"github.com/metamorphosis-dev/kafka/internal/middleware"
-	"github.com/metamorphosis-dev/kafka/internal/search"
-	"github.com/metamorphosis-dev/kafka/internal/views"
+	"github.com/metamorphosis-dev/samsa/internal/autocomplete"
+	"github.com/metamorphosis-dev/samsa/internal/cache"
+	"github.com/metamorphosis-dev/samsa/internal/config"
+	"github.com/metamorphosis-dev/samsa/internal/httpapi"
+	"github.com/metamorphosis-dev/samsa/internal/middleware"
+	"github.com/metamorphosis-dev/samsa/internal/search"
+	"github.com/metamorphosis-dev/samsa/internal/views"
 )
 func main() {
@@ -77,14 +77,20 @@ func main() {
 	acSvc := autocomplete.NewService(cfg.Upstream.URL, cfg.HTTPTimeout())
-	h := httpapi.NewHandler(svc, acSvc.Suggestions, cfg.Server.SourceURL)
+	h := httpapi.NewHandler(svc, acSvc.Suggestions, cfg.Server.SourceURL, searchCache)
 	mux := http.NewServeMux()
+	// HTML template routes
 	mux.HandleFunc("/", h.Index)
-	mux.HandleFunc("/healthz", h.Healthz)
 	mux.HandleFunc("/search", h.Search)
+	mux.HandleFunc("/preferences", h.Preferences)
+	// API routes
+	mux.HandleFunc("/healthz", h.Healthz)
 	mux.HandleFunc("/autocompleter", h.Autocompleter)
 	mux.HandleFunc("/opensearch.xml", h.OpenSearch(cfg.Server.BaseURL))
+	mux.HandleFunc("/favicon/", h.Favicon)
 	// Serve embedded static files (CSS, JS, images).
 	staticFS, err := views.StaticFS()
@@ -94,8 +100,9 @@ func main() {
 	var subFS fs.FS = staticFS
 	mux.Handle("/static/", http.StripPrefix("/static/", http.FileServer(http.FS(subFS))))
-	// Apply middleware: global rate limit → burst rate limit → per-IP rate limit → CORS → handler.
+	// Apply middleware: global rate limit → burst rate limit → per-IP rate limit → CORS → security headers → handler.
 	var handler http.Handler = mux
+	handler = middleware.SecurityHeaders(middleware.SecurityHeadersConfig{})(handler)
 	handler = middleware.CORS(middleware.CORSConfig{
 		AllowedOrigins: cfg.CORS.AllowedOrigins,
 		AllowedMethods: cfg.CORS.AllowedMethods,
@@ -107,6 +114,7 @@ func main() {
 		Requests:        cfg.RateLimit.Requests,
 		Window:          cfg.RateLimitWindow(),
 		CleanupInterval: cfg.RateLimitCleanupInterval(),
+		TrustedProxies:  cfg.RateLimit.TrustedProxies,
 	}, logger)(handler)
 	handler = middleware.GlobalRateLimit(middleware.GlobalRateLimitConfig{
 		Requests: cfg.GlobalRateLimit.Requests,
@@ -120,7 +128,7 @@ func main() {
 	}, logger)(handler)
 	addr := fmt.Sprintf(":%d", cfg.Server.Port)
-	logger.Info("kafka starting",
+	logger.Info("samsa starting",
 		"addr", addr,
 		"cache", searchCache.Enabled(),
 		"rate_limit", cfg.RateLimit.Requests > 0,

@@ -1,22 +1,22 @@
-# kafka configuration
+# samsa configuration
 # Copy to config.toml and adjust as needed.
 # Environment variables are used as fallbacks when a config field is empty/unset.
 [server]
 # Listen port (env: PORT)
-port = 8080
+port = 5355
 # HTTP timeout for engine and upstream calls (env: HTTP_TIMEOUT)
 http_timeout = "10s"
 # Public base URL for OpenSearch XML (env: BASE_URL)
-# Set this so browsers can add kafka as a search engine.
+# Set this so browsers can add samsa as a search engine.
 # Example: "https://search.example.com"
 base_url = ""
 # Link to the source code (shown in footer as "Source" link)
-# Defaults to the upstream kafka repo if not set.
-# Example: "https://git.example.com/my-kafka-fork"
+# Defaults to the upstream samsa repo if not set.
+# Example: "https://git.example.com/my-samsa-fork"
 source_url = ""
 [upstream]
@@ -27,7 +27,8 @@ url = ""
 [engines]
 # Comma-separated list of engines to execute locally in Go (env: LOCAL_PORTED_ENGINES)
 # Engines not listed here will be proxied to the upstream instance.
-local_ported = ["wikipedia", "arxiv", "crossref", "braveapi", "qwant", "duckduckgo", "github", "reddit", "bing", "google", "youtube"]
+# Include bing_images, ddg_images, qwant_images for image search when [upstream].url is empty.
+local_ported = ["wikipedia", "wikidata", "arxiv", "crossref", "braveapi", "qwant", "duckduckgo", "github", "reddit", "bing", "google", "youtube", "bing_images", "ddg_images", "qwant_images"]
 [engines.brave]
 # Brave Search API key (env: BRAVE_API_KEY)
@@ -56,6 +57,12 @@ db = 0
 # Cache TTL for search results (env: VALKEY_CACHE_TTL)
 default_ttl = "5m"
+[cache.ttl_overrides]
+# Per-engine TTL overrides (uncomment to use):
+# wikipedia = "48h"
+# reddit = "15m"
+# braveapi = "2h"
 [cors]
 # CORS configuration for browser-based clients.
 # Allowed origins: use "*" for all, or specific domains (env: CORS_ALLOWED_ORIGINS)

@@ -8,7 +8,7 @@ services:
   kafka:
     build: .
     ports:
-      - "8080:8080"
+      - "5355:5355"
     volumes:
       - ./config.toml:/etc/kafka/config.toml:ro
     depends_on:

docs/CONTRIBUTING.md Normal file

@ -0,0 +1,218 @@
# Contributing — Adding a New Engine
This guide walks through adding a new search engine to samsa. The minimal engine needs only an HTTP client, a query, and a result parser.
---
## 1. Create the engine file
Place it in `internal/engines/`:
```
internal/engines/
myengine.go ← your engine
myengine_test.go ← tests (required)
```
Name the struct after the engine, e.g. `WolframEngine` for "wolfram". The `Name()` method returns the engine key used throughout samsa.
## 2. Implement the Engine interface
```go
package engines
import (
	"context"
	"net/http"

	"github.com/metamorphosis-dev/samsa/internal/contracts"
)
type MyEngine struct {
client *http.Client
}
func (e *MyEngine) Name() string { return "myengine" }
func (e *MyEngine) Search(ctx context.Context, req contracts.SearchRequest) (contracts.SearchResponse, error) {
// ...
}
```
### The SearchRequest fields you'll use most:
| Field | Type | Description |
|-------|------|-------------|
| `Query` | `string` | The search query |
| `Pageno` | `int` | Current page number (1-based) |
| `Safesearch` | `int` | 0=off, 1=moderate, 2=strict |
| `Language` | `string` | ISO language code (e.g. `"en"`) |
### The SearchResponse to return:
```go
contracts.SearchResponse{
Query: req.Query,
NumberOfResults: len(results),
Results: results, // []MainResult
Answers: []map[string]any{},
Corrections: []string{},
Infoboxes: []map[string]any{},
Suggestions: []string{},
UnresponsiveEngines: [][2]string{},
}
```
### Empty query — return early:
```go
if strings.TrimSpace(req.Query) == "" {
return contracts.SearchResponse{Query: req.Query}, nil
}
```
### Engine unavailable / error — graceful degradation:
```go
// Rate limited or blocked
return contracts.SearchResponse{
Query: req.Query,
UnresponsiveEngines: [][2]string{{"myengine", "reason"}},
Results: []contracts.MainResult{},
// ... empty other fields
}, nil
// Hard error — return it
return contracts.SearchResponse{}, fmt.Errorf("myengine upstream error: status %d", resp.StatusCode)
```
## 3. Build the result
```go
urlPtr := "https://example.com/result"
result := contracts.MainResult{
Title: "Result Title",
Content: "Snippet or description text",
URL: &urlPtr, // pointer to string, required
Engine: "myengine",
Category: "general", // or "it", "science", "videos", "images", "social media"
Score: 0, // used for relevance ranking during merge
Engines: []string{"myengine"},
}
```
### Template field
The template system checks for `"videos"` and `"images"`. Everything else renders via `result_item.html`. Set `Template` only if you have a custom template; omit it for the default result card.
### Category field
Controls which category tab the result appears under and which engines are triggered:
| Category | Engines used |
|----------|-------------|
| `general` | google, bing, ddg, brave, braveapi, qwant, wikipedia |
| `it` | github, stackoverflow |
| `science` | arxiv, crossref |
| `videos` | youtube |
| `images` | bing_images, ddg_images, qwant_images |
| `social media` | reddit |
## 4. Wire it into the factory
In `internal/engines/factory.go`, add your engine to the map returned by `NewDefaultPortedEngines`:
```go
"myengine": &MyEngine{client: client},
```
If your engine needs an API key, read it from config or the environment (see `braveapi` or `youtube` in factory.go for the pattern).
## 5. Register defaults
In `internal/engines/planner.go`:
**Add to `defaultPortedEngines`:**
```go
var defaultPortedEngines = []string{
// ... existing ...
"myengine",
}
```
**Add to category mapping in `inferFromCategories`** (if applicable):
```go
case "general":
set["myengine"] = true
```
**Update the sort order map** so results maintain consistent ordering:
```go
order := map[string]int{
// ... existing ...
"myengine": N, // pick a slot
}
```
## 6. Add tests
At minimum, test:
- `Name()` returns the correct string
- Nil engine returns an error
- Empty query returns zero results
- Successful API response parses correctly
- Rate limit / error cases return `UnresponsiveEngines` with a reason
Use `httptest.NewServer` to mock the upstream API. See `arxiv_test.go` or `reddit_test.go` for examples.
## 7. Build and test
```bash
go build ./...
go test ./internal/engines/ -run MyEngine -v
go test ./...
```
## Example: Adding an RSS-based engine
If the engine provides an RSS feed, the parsing is straightforward:
```go
type rssItem struct {
Title string `xml:"title"`
Link string `xml:"link"`
Description string `xml:"description"`
}
type rssFeed struct {
Channel struct {
Items []rssItem `xml:"item"`
} `xml:"channel"`
}
dec := xml.NewDecoder(resp.Body)
var feed rssFeed
if err := dec.Decode(&feed); err != nil {
	return contracts.SearchResponse{}, fmt.Errorf("myengine: decode RSS feed: %w", err)
}
for _, item := range feed.Channel.Items {
urlPtr := item.Link
results = append(results, contracts.MainResult{
Title: item.Title,
Content: stripHTML(item.Description),
URL: &urlPtr,
Engine: "myengine",
// ...
})
}
```
## Checklist
- [ ] Engine file created in `internal/engines/`
- [ ] `Engine` interface implemented (`Name()` + `Search()`)
- [ ] Empty query handled (return early, no error)
- [ ] Graceful degradation for errors and rate limits
- [ ] Results use `Category` to group with related engines
- [ ] Factory updated with new engine
- [ ] Planner updated (defaults + category mapping + sort order)
- [ ] Tests written covering main paths
- [ ] `go build ./...` succeeds
- [ ] `go test ./...` passes

# Settings UI Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** A preferences popover panel (top-right on desktop, bottom sheet on mobile) that lets users set theme, enabled engines, safe search, and default format. All changes auto-save to `localStorage` and apply immediately to the DOM.
**Architecture:** Pure client-side JS + CSS added alongside existing templates. No Go changes. Settings persist via `localStorage` key `kafka_prefs`. Theme applies via `data-theme` attribute on `<html>`.
**Tech Stack:** Vanilla JS (no framework), existing `kafka.css` custom properties, HTMX for search.
---
## File Map
| Action | File |
|--------|------|
| Create | `internal/views/static/js/settings.js` |
| Modify | `internal/views/static/css/kafka.css` |
| Modify | `internal/views/templates/base.html` |
| Modify | `internal/views/templates/index.html` |
| Modify | `internal/views/templates/results.html` |
| Modify | `internal/views/views.go` |
**Key insight on engine preferences:** `ParseSearchRequest` reads `engines` as a CSV form value (`r.FormValue("engines")`). The search forms in `index.html` and `results.html` will get a hidden `#engines-input` field that is kept in sync with localStorage. On submit, the engines preference is sent as a normal form field. HTMX `hx-include="this"` already includes the form element, so the hidden input is automatically included in the request.
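Server-side, the handler only has to split that CSV; a minimal sketch (the real `ParseSearchRequest` may normalize differently):

```go
package main

import (
	"fmt"
	"strings"
)

// parseEngines mirrors what a handler reading r.FormValue("engines") would
// do with the CSV submitted by the hidden input. Sketch only.
func parseEngines(formValue string) []string {
	var engines []string
	for _, name := range strings.Split(formValue, ",") {
		if name = strings.TrimSpace(name); name != "" {
			engines = append(engines, name)
		}
	}
	return engines
}

func main() {
	fmt.Println(parseEngines("arxiv,crossref, wikipedia")) // prints: [arxiv crossref wikipedia]
}
```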
---
## Task 1: CSS — Popover, toggles, bottom sheet
**Files:**
- Modify: `internal/views/static/css/kafka.css`
- [ ] **Step 1: Add CSS for popover, triggers, toggles, bottom sheet**
Append the following to `kafka.css`:
```css
/* ============================================
Settings Panel
============================================ */
/* Header */
.site-header {
display: flex;
align-items: center;
justify-content: space-between;
padding: 0.6rem 1rem;
background: var(--color-header-background);
border-bottom: 1px solid var(--color-header-border);
}
.site-title {
font-size: 1rem;
font-weight: 600;
color: var(--color-base-font);
}
/* Gear trigger button */
.settings-trigger {
background: none;
border: none;
font-size: 1.1rem;
cursor: pointer;
padding: 0.3rem 0.5rem;
border-radius: var(--radius);
color: var(--color-base-font);
opacity: 0.7;
transition: opacity 0.2s, background 0.2s;
line-height: 1;
}
.settings-trigger:hover,
.settings-trigger[aria-expanded="true"] {
opacity: 1;
background: var(--color-sidebar-background);
}
/* Popover panel */
.settings-popover {
position: absolute;
top: 100%;
right: 0;
width: 280px;
max-height: 420px;
overflow-y: auto;
background: var(--color-base-background);
border: 1px solid var(--color-sidebar-border);
border-radius: var(--radius);
box-shadow: 0 8px 24px rgba(0, 0, 0, 0.12);
z-index: 200;
display: none;
flex-direction: column;
}
.settings-popover[data-open="true"] {
display: flex;
animation: settings-slide-in 0.2s ease;
}
@keyframes settings-slide-in {
from { opacity: 0; transform: translateY(-8px); }
to { opacity: 1; transform: translateY(0); }
}
.settings-popover-header {
display: flex;
justify-content: space-between;
align-items: center;
padding: 0.75rem 1rem;
border-bottom: 1px solid var(--color-sidebar-border);
font-weight: 600;
font-size: 0.9rem;
flex-shrink: 0;
}
.settings-popover-close {
background: none;
border: none;
font-size: 1.2rem;
cursor: pointer;
color: var(--color-base-font);
opacity: 0.6;
padding: 0 0.25rem;
line-height: 1;
}
.settings-popover-close:hover { opacity: 1; }
.settings-popover-body {
padding: 0.8rem;
display: flex;
flex-direction: column;
gap: 1rem;
}
.settings-section-title {
font-size: 0.7rem;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.05em;
color: var(--color-suggestion);
margin-bottom: 0.5rem;
}
/* Theme buttons */
.theme-buttons {
display: flex;
gap: 0.4rem;
}
.theme-btn {
flex: 1;
padding: 0.35rem 0.5rem;
border: 1px solid var(--color-sidebar-border);
border-radius: var(--radius);
background: var(--color-btn-background);
color: var(--color-base-font);
cursor: pointer;
font-size: 0.75rem;
text-align: center;
transition: background 0.15s, border-color 0.15s;
}
.theme-btn:hover { background: var(--color-btn-hover); }
.theme-btn.active {
background: var(--color-link);
color: #fff;
border-color: var(--color-link);
}
/* Engine toggles — 2-column grid */
.engine-grid {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 0.4rem;
}
.engine-toggle {
display: flex;
align-items: center;
gap: 0.4rem;
padding: 0.3rem 0.5rem;
border-radius: var(--radius);
background: var(--color-sidebar-background);
font-size: 0.78rem;
cursor: pointer;
}
.engine-toggle input[type="checkbox"] {
width: 15px;
height: 15px;
margin: 0;
cursor: pointer;
accent-color: var(--color-link);
}
.engine-toggle span {
flex: 1;
min-width: 0;
overflow: hidden;
text-overflow: ellipsis;
white-space: nowrap;
}
/* Search defaults */
.setting-row {
display: flex;
align-items: center;
justify-content: space-between;
gap: 0.5rem;
margin-top: 0.4rem;
}
.setting-row label {
font-size: 0.85rem;
flex: 1;
}
.setting-row select {
width: 110px;
padding: 0.3rem 0.4rem;
font-size: 0.8rem;
border: 1px solid var(--color-sidebar-border);
border-radius: var(--radius);
background: var(--color-base-background);
color: var(--color-base-font);
cursor: pointer;
}
/* Mid-search notice */
.settings-notice {
font-size: 0.72rem;
color: var(--color-suggestion);
margin-top: 0.3rem;
font-style: italic;
}
/* Dark theme via data-theme attribute */
html[data-theme="dark"] {
--color-base: #222;
--color-base-font: #dcdcdc;
--color-base-background: #2b2b2b;
--color-header-background: #333;
--color-header-border: #444;
--color-search-border: #555;
--color-search-focus: #5dade2;
--color-result-url: #8ab4f8;
--color-result-url-visited: #b39ddb;
--color-result-content: #b0b0b0;
--color-result-title: #8ab4f8;
--color-result-title-visited: #b39ddb;
--color-result-engine: #999;
--color-result-border: #3a3a3a;
--color-link: #5dade2;
--color-link-visited: #b39ddb;
--color-sidebar-background: #333;
--color-sidebar-border: #444;
--color-infobox-background: #333;
--color-infobox-border: #444;
--color-pagination-current: #5dade2;
--color-pagination-border: #444;
--color-error: #e74c3c;
--color-error-background: #3b1a1a;
--color-suggestion: #999;
--color-footer: #666;
--color-btn-background: #333;
--color-btn-border: #555;
--color-btn-hover: #444;
}
/* Mobile: Bottom sheet + FAB trigger */
@media (max-width: 768px) {
  /* Hide desktop trigger; the FAB rule below shows the mobile trigger */
  .settings-trigger-desktop {
    display: none;
  }
.settings-popover {
position: fixed;
top: auto;
bottom: 0;
left: 0;
right: 0;
width: 100%;
max-height: 70vh;
border-radius: var(--radius) var(--radius) 0 0;
border-bottom: none;
}
/* FAB: fixed bottom-right button visible only on mobile */
.settings-trigger-mobile {
    display: block !important; /* wins over the inline display:none that hides the FAB on desktop */
position: fixed;
bottom: 1.5rem;
right: 1.5rem;
width: 48px;
height: 48px;
border-radius: 50%;
background: var(--color-link);
color: #fff;
border: none;
box-shadow: 0 4px 12px rgba(0,0,0,0.2);
font-size: 1.2rem;
z-index: 199;
opacity: 1;
}
}
```
Note: The existing `:root` and `@media (prefers-color-scheme: dark)` blocks provide the "system" theme. `html[data-theme="dark"]` overrides only apply when the user explicitly picks dark mode. When `theme === 'system'`, the `data-theme` attribute is removed and the browser's `prefers-color-scheme` media query kicks in via the existing CSS.
- [ ] **Step 2: Verify existing tests still pass**
Run: `go test ./...`
Expected: all pass
- [ ] **Step 3: Commit**
```bash
git add internal/views/static/css/kafka.css
git commit -m "feat(settings): add popover, toggle, and bottom-sheet CSS"
```
---
## Task 2: JS — Settings logic
**Files:**
- Create: `internal/views/static/js/settings.js`
- [ ] **Step 1: Write the settings JS module**
Create `internal/views/static/js/settings.js`:
```javascript
'use strict';
var ALL_ENGINES = [
'wikipedia', 'arxiv', 'crossref', 'braveapi',
'qwant', 'duckduckgo', 'github', 'reddit', 'bing'
];
var DEFAULT_PREFS = {
theme: 'system',
engines: ALL_ENGINES.slice(),
safeSearch: 'moderate',
format: 'html'
};
var STORAGE_KEY = 'kafka_prefs';
// ── Persistence ──────────────────────────────────────────────────────────────
function loadPrefs() {
try {
var raw = localStorage.getItem(STORAGE_KEY);
if (!raw) return { theme: DEFAULT_PREFS.theme, engines: DEFAULT_PREFS.engines.slice(), safeSearch: DEFAULT_PREFS.safeSearch, format: DEFAULT_PREFS.format };
var saved = JSON.parse(raw);
return { theme: saved.theme || DEFAULT_PREFS.theme, engines: saved.engines || DEFAULT_PREFS.engines.slice(), safeSearch: saved.safeSearch || DEFAULT_PREFS.safeSearch, format: saved.format || DEFAULT_PREFS.format };
} catch (e) {
return { theme: DEFAULT_PREFS.theme, engines: DEFAULT_PREFS.engines.slice(), safeSearch: DEFAULT_PREFS.safeSearch, format: DEFAULT_PREFS.format };
}
}
function savePrefs(prefs) {
try {
localStorage.setItem(STORAGE_KEY, JSON.stringify({ theme: prefs.theme, engines: prefs.engines, safeSearch: prefs.safeSearch, format: prefs.format }));
} catch (e) { /* quota or private mode */ }
}
// ── Theme application ────────────────────────────────────────────────────────
function applyTheme(theme) {
if (theme === 'system') {
document.documentElement.removeAttribute('data-theme');
} else {
document.documentElement.setAttribute('data-theme', theme);
}
}
// ── Engine input sync ─────────────────────────────────────────────────────────
function syncEngineInput(prefs) {
var input = document.getElementById('engines-input');
if (input) input.value = prefs.engines.join(',');
}
// ── Panel open / close ────────────────────────────────────────────────────────
function closePanel() {
var panel = document.getElementById('settings-popover');
var trigger = document.getElementById('settings-trigger');
if (!panel) return;
panel.setAttribute('data-open', 'false');
if (trigger) trigger.setAttribute('aria-expanded', 'false');
if (trigger) trigger.focus();
}
function openPanel() {
var panel = document.getElementById('settings-popover');
var trigger = document.getElementById('settings-trigger');
if (!panel) return;
panel.setAttribute('data-open', 'true');
if (trigger) trigger.setAttribute('aria-expanded', 'true');
var focusable = panel.querySelector('button, input, select');
if (focusable) focusable.focus();
}
// ── Escape key ───────────────────────────────────────────────────────────────
document.addEventListener('keydown', function(e) {
if (e.key !== 'Escape') return;
var panel = document.getElementById('settings-popover');
if (!panel || panel.getAttribute('data-open') !== 'true') return;
closePanel();
});
// ── Click outside ─────────────────────────────────────────────────────────────
document.addEventListener('click', function(e) {
var panel = document.getElementById('settings-popover');
var trigger = document.getElementById('settings-trigger');
if (!panel || panel.getAttribute('data-open') !== 'true') return;
if (!panel.contains(e.target) && (!trigger || !trigger.contains(e.target))) {
closePanel();
}
});
// ── Focus trap ────────────────────────────────────────────────────────────────
document.addEventListener('keydown', function(e) {
if (e.key !== 'Tab') return;
var panel = document.getElementById('settings-popover');
if (!panel || panel.getAttribute('data-open') !== 'true') return;
var focusable = Array.prototype.slice.call(panel.querySelectorAll('button, input, select, [tabindex]:not([tabindex="-1"])'));
if (!focusable.length) return;
var first = focusable[0];
var last = focusable[focusable.length - 1];
if (e.shiftKey) {
if (document.activeElement === first) { e.preventDefault(); last.focus(); }
} else {
if (document.activeElement === last) { e.preventDefault(); first.focus(); }
}
});
// ── Render ────────────────────────────────────────────────────────────────────
function escapeHtml(str) {
  // Escape quotes too, since escaped values are interpolated into attribute values below.
  return String(str).replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}
function renderPanel(prefs) {
var panel = document.getElementById('settings-popover');
if (!panel) return;
var body = panel.querySelector('.settings-popover-body');
if (!body) return;
var themeBtns = '';
['light', 'dark', 'system'].forEach(function(t) {
var icons = { light: '\u2600', dark: '\u263D', system: '\u2318' };
var labels = { light: 'Light', dark: 'Dark', system: 'System' };
var active = prefs.theme === t ? ' active' : '';
themeBtns += '<button class="theme-btn' + active + '" data-theme="' + t + '">' + icons[t] + ' ' + labels[t] + '</button>';
});
var engineToggles = '';
ALL_ENGINES.forEach(function(name) {
var checked = prefs.engines.indexOf(name) !== -1 ? ' checked' : '';
engineToggles += '<label class="engine-toggle"><input type="checkbox" value="' + escapeHtml(name) + '"' + checked + '><span>' + escapeHtml(name) + '</span></label>';
});
var ssOptions = [
{ val: 'moderate', label: 'Moderate' },
{ val: 'strict', label: 'Strict' },
{ val: 'off', label: 'Off' }
];
var fmtOptions = [
{ val: 'html', label: 'HTML' },
{ val: 'json', label: 'JSON' },
{ val: 'csv', label: 'CSV' },
{ val: 'rss', label: 'RSS' }
];
var ssOptionsHtml = '';
var fmtOptionsHtml = '';
ssOptions.forEach(function(o) {
var sel = prefs.safeSearch === o.val ? ' selected' : '';
ssOptionsHtml += '<option value="' + o.val + '"' + sel + '>' + o.label + '</option>';
});
fmtOptions.forEach(function(o) {
var sel = prefs.format === o.val ? ' selected' : '';
fmtOptionsHtml += '<option value="' + o.val + '"' + sel + '>' + o.label + '</option>';
});
body.innerHTML =
'<div class="settings-section">' +
'<div class="settings-section-title">Appearance</div>' +
'<div class="theme-buttons">' + themeBtns + '</div>' +
'</div>' +
'<div class="settings-section">' +
'<div class="settings-section-title">Engines</div>' +
'<div class="engine-grid">' + engineToggles + '</div>' +
'<p class="settings-notice">Engine changes apply to your next search.</p>' +
'</div>' +
'<div class="settings-section">' +
'<div class="settings-section-title">Search Defaults</div>' +
'<div class="setting-row">' +
'<label for="pref-safesearch">Safe search</label>' +
'<select id="pref-safesearch">' + ssOptionsHtml + '</select>' +
'</div>' +
'<div class="setting-row">' +
'<label for="pref-format">Default format</label>' +
'<select id="pref-format">' + fmtOptionsHtml + '</select>' +
'</div>' +
'</div>';
// Theme buttons
var themeBtnEls = panel.querySelectorAll('.theme-btn');
for (var i = 0; i < themeBtnEls.length; i++) {
themeBtnEls[i].addEventListener('click', (function(btn) {
return function() {
prefs.theme = btn.getAttribute('data-theme');
savePrefs(prefs);
applyTheme(prefs.theme);
syncEngineInput(prefs);
renderPanel(prefs);
};
})(themeBtnEls[i]));
}
// Engine checkboxes
var checkboxes = panel.querySelectorAll('.engine-toggle input[type="checkbox"]');
for (var j = 0; j < checkboxes.length; j++) {
checkboxes[j].addEventListener('change', (function(cb) {
return function() {
var checked = Array.prototype.slice.call(panel.querySelectorAll('.engine-toggle input[type="checkbox"]:checked')).map(function(el) { return el.value; });
if (checked.length === 0) { cb.checked = true; return; }
prefs.engines = checked;
savePrefs(prefs);
syncEngineInput(prefs);
};
})(checkboxes[j]));
}
// Safe search
var ssEl = panel.querySelector('#pref-safesearch');
if (ssEl) {
ssEl.addEventListener('change', function() {
prefs.safeSearch = ssEl.value;
savePrefs(prefs);
});
}
// Format
var fmtEl = panel.querySelector('#pref-format');
if (fmtEl) {
fmtEl.addEventListener('change', function() {
prefs.format = fmtEl.value;
savePrefs(prefs);
});
}
// Close button
var closeBtn = panel.querySelector('.settings-popover-close');
if (closeBtn) closeBtn.addEventListener('click', closePanel);
}
// ── Init ─────────────────────────────────────────────────────────────────────
function initSettings() {
var prefs = loadPrefs();
applyTheme(prefs.theme);
syncEngineInput(prefs);
var panel = document.getElementById('settings-popover');
var trigger = document.getElementById('settings-trigger');
var mobileTrigger = document.getElementById('settings-trigger-mobile');
if (panel) {
renderPanel(prefs);
function togglePanel() {
var isOpen = panel.getAttribute('data-open') === 'true';
if (isOpen) closePanel(); else openPanel();
}
if (trigger) trigger.addEventListener('click', togglePanel);
if (mobileTrigger) mobileTrigger.addEventListener('click', togglePanel);
}
}
if (document.readyState === 'loading') {
document.addEventListener('DOMContentLoaded', initSettings);
} else {
initSettings();
}
```
- [ ] **Step 2: Verify JS syntax**
Run: `node --check internal/views/static/js/settings.js`
Expected: no output (exit 0)
- [ ] **Step 3: Commit**
```bash
git add internal/views/static/js/settings.js
git commit -m "feat(settings): add JS module for localStorage preferences and panel"
```
---
## Task 3: HTML — Gear trigger, panel markup, header in base
**Files:**
- Modify: `internal/views/templates/base.html`
- Modify: `internal/views/views.go`
- [ ] **Step 1: Add ShowHeader to PageData**
In `views.go`, add `ShowHeader bool` to `PageData` struct.
- [ ] **Step 2: Set ShowHeader in render functions**
In `RenderIndex` and `RenderSearch`, set `PageData.ShowHeader = true`.
- [ ] **Step 3: Update base.html — add header and settings markup**
In `base.html`, update the `<body>` to:
```html
<body class="{{if .Query}}search_on_results{{end}}">
{{if .ShowHeader}}
<header class="site-header">
<span class="site-title">kafka</span>
<!-- Desktop trigger (hidden on mobile) -->
<button id="settings-trigger" class="settings-trigger settings-trigger-desktop"
aria-label="Preferences" aria-expanded="false" aria-controls="settings-popover">&#9881;</button>
</header>
<!-- Mobile FAB trigger (hidden on desktop, shown via CSS on mobile) -->
<button id="settings-trigger-mobile" class="settings-trigger settings-trigger-mobile"
aria-label="Preferences" aria-expanded="false" aria-controls="settings-popover"
style="display:none;">&#9881;</button>
{{end}}
<main>
{{template "content" .}}
</main>
<footer>
<p>Powered by <a href="https://git.ashisgreat.xyz/penal-colony/kafka">kafka</a> — a privacy-respecting, open metasearch engine</p>
</footer>
<script src="/static/js/settings.js"></script>
<div id="settings-popover" data-open="false" role="dialog" aria-label="Preferences" aria-modal="true">
<div class="settings-popover-header">
Preferences
<button class="settings-popover-close" aria-label="Close">&#215;</button>
</div>
<div class="settings-popover-body"></div>
</div>
<script>
(function () {
'use strict';
var input = document.getElementById('q');
var dropdown = document.getElementById('autocomplete-dropdown');
var form = document.getElementById('search-form');
var debounceTimer = null;
var suggestions = [];
var activeIndex = -1;
var fetchController = null;
// ... existing autocomplete JS stays unchanged ...
}());
</script>
</body>
```
**Note:** The existing autocomplete `<script>` block is preserved as-is. Only the body wrapper and settings elements are added.
- [ ] **Step 4: Run tests**
Run: `go test ./...`
Expected: all pass
- [ ] **Step 5: Commit**
```bash
git add internal/views/templates/base.html internal/views/views.go
git commit -m "feat(settings): add gear trigger and panel markup to base template"
```
---
## Task 4: Search form — Inject engine preferences
**Files:**
- Modify: `internal/views/templates/index.html`
- Modify: `internal/views/templates/results.html`
- [ ] **Step 1: Add hidden engines input to both search forms**
In `index.html`, add inside the `<form>`:
```html
<input type="hidden" name="engines" id="engines-input" value="">
```
In `results.html`, add inside the `<form>`:
```html
<input type="hidden" name="engines" id="engines-input" value="">
```
The `value` is populated by `syncEngineInput(prefs)` on page load. When the form submits (regular GET or HTMX), the `engines` parameter is included as a CSV string, which `ParseSearchRequest` reads correctly via `r.FormValue("engines")`.
- [ ] **Step 2: Verify existing search works**
Run: `go run ./cmd/kafka -config config.toml`
Open: `http://localhost:8080`
Search for "golang" — results should appear as normal.
- [ ] **Step 3: Commit**
```bash
git add internal/views/templates/index.html internal/views/templates/results.html
git commit -m "feat(settings): add hidden engines input to search forms"
```
---
## Task 5: End-to-end verification
- [ ] **Step 1: Start server**
Run: `go run ./cmd/kafka -config config.toml`
Open: `http://localhost:8080`
- [ ] **Step 2: Verify gear icon and panel**
Click the gear icon in the header — panel drops down from top-right with Appearance, Engines, and Search Defaults sections.
- [ ] **Step 3: Verify theme persistence**
Click Dark → page colors change immediately. Refresh → dark theme persists.
- [ ] **Step 4: Verify engine toggle persistence**
Uncheck "wikipedia", refresh → "wikipedia" stays unchecked.
- [ ] **Step 5: Verify engines appear in search query**
With wikipedia unchecked, open DevTools → Network tab, search "golang". Verify request URL includes `&engines=arxiv,crossref,...` (no wikipedia).
- [ ] **Step 6: Verify mobile bottom sheet**
Resize to <768px or use mobile device emulation. Click the gear: the full-width sheet slides up from the bottom.
- [ ] **Step 7: Final commit**
```bash
git add -A
git commit -m "feat: complete settings UI — popover, auto-save, theme, engines, mobile bottom-sheet"
```

# Per-Engine TTL Cache — Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Replace the merged-response cache with per-engine response caching, enabling tier-based TTLs and stale-while-revalidate semantics.
**Architecture:** Each engine's raw response is cached independently with its tier-based TTL. On stale hits, return cached data immediately and refresh in background. Query hash is computed from shared params (query, pageno, safesearch, language, time_range) and prefixed with engine name for the cache key.
**Tech Stack:** Go 1.24, Valkey/Redis (go-redis/v9), existing samsa contracts
---
## File Map
| Action | File | Responsibility |
|--------|------|----------------|
| Create | `internal/cache/tiers.go` | Tier definitions, `EngineTier()` function |
| Create | `internal/cache/tiers_test.go` | Tests for EngineTier |
| Create | `internal/cache/engine_cache.go` | `EngineCache` struct with tier-aware Get/Set |
| Create | `internal/cache/engine_cache_test.go` | Tests for EngineCache |
| Modify | `internal/cache/cache.go` | Add `QueryHash()`, add `CachedEngineResponse` type |
| Modify | `internal/cache/cache_test.go` | Add tests for `QueryHash()` |
| Modify | `internal/config/config.go` | Add `TTLOverrides` to `CacheConfig` |
| Modify | `internal/search/service.go` | Use `EngineCache`, parallel lookups, background refresh |
---
## Task 1: Add QueryHash and CachedEngineResponse to cache.go
**Files:**
- Modify: `internal/cache/cache.go`
- Modify: `internal/cache/cache_test.go`
- [ ] **Step 1: Write failing test for QueryHash()**
```go
// In cache_test.go, add:
func TestQueryHash(t *testing.T) {
// Same params should produce same hash
hash1 := QueryHash("golang", 1, 0, "en", "")
hash2 := QueryHash("golang", 1, 0, "en", "")
if hash1 != hash2 {
t.Errorf("QueryHash: same params should produce same hash, got %s != %s", hash1, hash2)
}
// Different query should produce different hash
hash3 := QueryHash("rust", 1, 0, "en", "")
if hash1 == hash3 {
t.Errorf("QueryHash: different queries should produce different hash")
}
// Different pageno should produce different hash
hash4 := QueryHash("golang", 2, 0, "en", "")
if hash1 == hash4 {
t.Errorf("QueryHash: different pageno should produce different hash")
}
// time_range should affect hash
hash5 := QueryHash("golang", 1, 0, "en", "day")
if hash1 == hash5 {
t.Errorf("QueryHash: different time_range should produce different hash")
}
// Hash should be 16 characters (truncated SHA-256)
if len(hash1) != 16 {
t.Errorf("QueryHash: expected 16 char hash, got %d", len(hash1))
}
}
```
- [ ] **Step 2: Run test to verify it fails**
Run: `nix develop --command bash -c "go test -run TestQueryHash ./internal/cache/ -v"`
Expected: FAIL (compile error: "undefined: QueryHash")
- [ ] **Step 3: Implement QueryHash() and CachedEngineResponse in cache.go**
Add to `cache.go` (the imports `crypto/sha256` and `encoding/hex` are already present in cache.go from the existing `Key()` function):
```go
// QueryHash computes a deterministic hash from shared request parameters
// (query, pageno, safesearch, language, time_range) for use as a cache key suffix.
// The hash is a truncated SHA-256 (16 hex chars).
func QueryHash(query string, pageno int, safesearch int, language, timeRange string) string {
h := sha256.New()
fmt.Fprintf(h, "q=%s|", query)
fmt.Fprintf(h, "pageno=%d|", pageno)
fmt.Fprintf(h, "safesearch=%d|", safesearch)
fmt.Fprintf(h, "lang=%s|", language)
if timeRange != "" {
fmt.Fprintf(h, "tr=%s|", timeRange)
}
return hex.EncodeToString(h.Sum(nil))[:16]
}
// CachedEngineResponse wraps an engine's cached response with metadata.
type CachedEngineResponse struct {
Engine string
Response []byte
StoredAt time.Time
}
```
- [ ] **Step 4: Run test to verify it passes**
Run: `nix develop --command bash -c "go test -run TestQueryHash ./internal/cache/ -v"`
Expected: PASS
- [ ] **Step 5: Commit**
```bash
git add internal/cache/cache.go internal/cache/cache_test.go
git commit -m "cache: add QueryHash and CachedEngineResponse type"
```
---
## Task 2: Create tiers.go with tier definitions
**Files:**
- Create: `internal/cache/tiers.go`
- [ ] **Step 1: Create tiers.go with tier definitions and EngineTier function**
```go
package cache
import "time"
// TTLTier represents a cache TTL tier with a name and duration.
type TTLTier struct {
Name string
Duration time.Duration
}
// defaultTiers maps engine names to their default TTL tiers.
var defaultTiers = map[string]TTLTier{
// Static knowledge engines — rarely change
"wikipedia": {Name: "static", Duration: 24 * time.Hour},
"wikidata": {Name: "static", Duration: 24 * time.Hour},
"arxiv": {Name: "static", Duration: 24 * time.Hour},
"crossref": {Name: "static", Duration: 24 * time.Hour},
"stackoverflow": {Name: "static", Duration: 24 * time.Hour},
"github": {Name: "static", Duration: 24 * time.Hour},
// API-based general search — fresher data
"braveapi": {Name: "api_general", Duration: 1 * time.Hour},
"youtube": {Name: "api_general", Duration: 1 * time.Hour},
// Scraped general search — moderately stable
"google": {Name: "scraped_general", Duration: 2 * time.Hour},
"bing": {Name: "scraped_general", Duration: 2 * time.Hour},
"duckduckgo": {Name: "scraped_general", Duration: 2 * time.Hour},
"qwant": {Name: "scraped_general", Duration: 2 * time.Hour},
"brave": {Name: "scraped_general", Duration: 2 * time.Hour},
// News/social — changes frequently
"reddit": {Name: "news_social", Duration: 30 * time.Minute},
// Image search
"bing_images": {Name: "images", Duration: 1 * time.Hour},
"ddg_images": {Name: "images", Duration: 1 * time.Hour},
"qwant_images": {Name: "images", Duration: 1 * time.Hour},
}
// EngineTier returns the TTL tier for an engine, applying overrides if provided.
// If the engine has no defined tier, returns a default of 1 hour.
func EngineTier(engineName string, overrides map[string]time.Duration) TTLTier {
// Check override first — override tier name is just the engine name
if override, ok := overrides[engineName]; ok && override > 0 {
return TTLTier{Name: engineName, Duration: override}
}
// Fall back to default tier
if tier, ok := defaultTiers[engineName]; ok {
return tier
}
// Unknown engines get a sensible default
return TTLTier{Name: "unknown", Duration: 1 * time.Hour}
}
```
- [ ] **Step 2: Run go vet to verify it compiles**
Run: `nix develop --command bash -c "go vet ./internal/cache/tiers.go"`
Expected: no output (success)
- [ ] **Step 3: Write a basic test for EngineTier**
```go
// In internal/cache/tiers_test.go:
package cache
import (
	"testing"
	"time"
)
func TestEngineTier(t *testing.T) {
// Test default static tier
tier := EngineTier("wikipedia", nil)
if tier.Name != "static" || tier.Duration != 24*time.Hour {
t.Errorf("wikipedia: expected static/24h, got %s/%v", tier.Name, tier.Duration)
}
// Test default api_general tier
tier = EngineTier("braveapi", nil)
if tier.Name != "api_general" || tier.Duration != 1*time.Hour {
t.Errorf("braveapi: expected api_general/1h, got %s/%v", tier.Name, tier.Duration)
}
// Test override takes precedence — override tier name is just the engine name
override := 48 * time.Hour
tier = EngineTier("wikipedia", map[string]time.Duration{"wikipedia": override})
if tier.Name != "wikipedia" || tier.Duration != 48*time.Hour {
t.Errorf("wikipedia override: expected wikipedia/48h, got %s/%v", tier.Name, tier.Duration)
}
// Test unknown engine gets default
tier = EngineTier("unknown_engine", nil)
if tier.Name != "unknown" || tier.Duration != 1*time.Hour {
t.Errorf("unknown engine: expected unknown/1h, got %s/%v", tier.Name, tier.Duration)
}
}
```
- [ ] **Step 4: Run test to verify it passes**
Run: `nix develop --command bash -c "go test -run TestEngineTier ./internal/cache/ -v"`
Expected: PASS
- [ ] **Step 5: Commit**
```bash
git add internal/cache/tiers.go internal/cache/tiers_test.go
git commit -m "cache: add tier definitions and EngineTier function"
```
---
## Task 3: Create EngineCache in engine_cache.go
**Files:**
- Create: `internal/cache/engine_cache.go`
- Create: `internal/cache/engine_cache_test.go`
**Note:** The existing `Key()` function in `cache.go` is still used for favicon caching. The new `QueryHash()` and `EngineCache` are separate and only for per-engine search response caching.
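The `QueryHash()` referenced here was added in an earlier task; per the design spec it is a SHA-256 over the shared request parameters truncated to 16 hex characters, with the engine name deliberately excluded. A minimal sketch matching the spec:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// QueryHash hashes the shared request parameters. The engine name is
// deliberately excluded: each engine gets its own cache-key prefix instead.
func QueryHash(query string, pageno, safesearch int, language, timeRange string) string {
	h := sha256.New()
	fmt.Fprintf(h, "q=%s|", query)
	fmt.Fprintf(h, "pageno=%d|", pageno)
	fmt.Fprintf(h, "safesearch=%d|", safesearch)
	fmt.Fprintf(h, "lang=%s|", language)
	if timeRange != "" {
		fmt.Fprintf(h, "tr=%s|", timeRange)
	}
	return hex.EncodeToString(h.Sum(nil))[:16]
}

func main() {
	a := QueryHash("go caching", 1, 0, "en", "")
	b := QueryHash("go caching", 1, 0, "en", "")
	fmt.Println(len(a), a == b) // 16 true: same params, same hash
}
```

Because the hash covers only shared parameters, the same query fanned out to several engines computes one hash and differs only in the key prefix.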
- [ ] **Step 1: Write failing test for EngineCache.Get/Set**
```go
package cache
import (
"context"
"log/slog"
"testing"
"time"
)
func TestEngineCacheGetSet(t *testing.T) {
// Create a disabled cache for unit testing (nil client)
c := &Cache{logger: slog.Default()}
ec := NewEngineCache(c, nil)
ctx := context.Background()
cached, ok := ec.Get(ctx, "wikipedia", "abc123")
if ok {
t.Errorf("Get on disabled cache: expected false, got %v", ok)
}
_ = cached // unused when ok=false
}
func TestEngineCacheKeyFormat(t *testing.T) {
key := engineCacheKey("wikipedia", "abc123")
if key != "samsa:resp:wikipedia:abc123" {
t.Errorf("engineCacheKey: expected samsa:resp:wikipedia:abc123, got %s", key)
}
}
func TestEngineCacheIsStale(t *testing.T) {
c := &Cache{logger: slog.Default()}
ec := NewEngineCache(c, nil)
// Fresh response (stored 1 minute ago, wikipedia has 24h TTL)
fresh := CachedEngineResponse{
Engine: "wikipedia",
Response: []byte(`{}`),
StoredAt: time.Now().Add(-1 * time.Minute),
}
if ec.IsStale(fresh, "wikipedia") {
t.Errorf("IsStale: 1-minute-old wikipedia should NOT be stale")
}
// Stale response (stored 25 hours ago)
stale := CachedEngineResponse{
Engine: "wikipedia",
Response: []byte(`{}`),
StoredAt: time.Now().Add(-25 * time.Hour),
}
if !ec.IsStale(stale, "wikipedia") {
t.Errorf("IsStale: 25-hour-old wikipedia SHOULD be stale (24h TTL)")
}
// Override: 30 minute TTL for reddit
overrides := map[string]time.Duration{"reddit": 30 * time.Minute}
ec2 := NewEngineCache(c, overrides)
// 20 minutes old with 30m override should NOT be stale
redditFresh := CachedEngineResponse{
Engine: "reddit",
Response: []byte(`{}`),
StoredAt: time.Now().Add(-20 * time.Minute),
}
if ec2.IsStale(redditFresh, "reddit") {
t.Errorf("IsStale: 20-min reddit with 30m override should NOT be stale")
}
// 45 minutes old with 30m override SHOULD be stale
redditStale := CachedEngineResponse{
Engine: "reddit",
Response: []byte(`{}`),
StoredAt: time.Now().Add(-45 * time.Minute),
}
if !ec2.IsStale(redditStale, "reddit") {
t.Errorf("IsStale: 45-min reddit with 30m override SHOULD be stale")
}
}
```
- [ ] **Step 2: Run test to verify it fails**
Run: `nix develop --command bash -c "go test -run TestEngineCache ./internal/cache/ -v"`
Expected: FAIL — "EngineCache not defined" or "CachedEngineResponse not defined"
- [ ] **Step 3: Implement EngineCache using GetBytes/SetBytes**
The `EngineCache` uses the existing `GetBytes`/`SetBytes` methods on `Cache`, so the enabled check and Valkey error handling stay centralized in `Cache` rather than being duplicated here.
```go
package cache
import (
"context"
"encoding/json"
"log/slog"
"time"
"github.com/metamorphosis-dev/samsa/internal/contracts"
)
// EngineCache wraps Cache with per-engine tier-aware Get/Set operations.
type EngineCache struct {
cache *Cache
overrides map[string]time.Duration
}
// NewEngineCache creates a new EngineCache with optional TTL overrides.
// If overrides is nil, default tier durations are used.
func NewEngineCache(cache *Cache, overrides map[string]time.Duration) *EngineCache {
return &EngineCache{
cache: cache,
overrides: overrides,
}
}
// Get retrieves a cached engine response. Returns (zero value, false) if not
// found or if cache is disabled.
func (ec *EngineCache) Get(ctx context.Context, engine, queryHash string) (CachedEngineResponse, bool) {
key := engineCacheKey(engine, queryHash)
data, ok := ec.cache.GetBytes(ctx, key)
if !ok {
return CachedEngineResponse{}, false
}
var cached CachedEngineResponse
if err := json.Unmarshal(data, &cached); err != nil {
ec.cache.logger.Warn("engine cache hit but unmarshal failed", "key", key, "error", err)
return CachedEngineResponse{}, false
}
ec.cache.logger.Debug("engine cache hit", "key", key, "engine", engine)
return cached, true
}
// Set stores an engine response in the cache with the engine's tier TTL.
func (ec *EngineCache) Set(ctx context.Context, engine, queryHash string, resp contracts.SearchResponse) {
if !ec.cache.Enabled() {
return
}
data, err := json.Marshal(resp)
if err != nil {
ec.cache.logger.Warn("engine cache set: marshal failed", "engine", engine, "error", err)
return
}
tier := EngineTier(engine, ec.overrides)
key := engineCacheKey(engine, queryHash)
cached := CachedEngineResponse{
Engine: engine,
Response: data,
StoredAt: time.Now(),
}
cachedData, err := json.Marshal(cached)
if err != nil {
ec.cache.logger.Warn("engine cache set: wrap marshal failed", "key", key, "error", err)
return
}
ec.cache.SetBytes(ctx, key, cachedData, tier.Duration)
}
// IsStale returns true if the cached response is older than the tier's TTL.
func (ec *EngineCache) IsStale(cached CachedEngineResponse, engine string) bool {
tier := EngineTier(engine, ec.overrides)
return time.Since(cached.StoredAt) > tier.Duration
}
// Logger returns the logger for background refresh logging.
func (ec *EngineCache) Logger() *slog.Logger {
return ec.cache.logger
}
// engineCacheKey builds the cache key for an engine+query combination.
func engineCacheKey(engine, queryHash string) string {
return "samsa:resp:" + engine + ":" + queryHash
}
```
- [ ] **Step 4: Run tests to verify they pass**
Run: `nix develop --command bash -c "go test -run TestEngineCache ./internal/cache/ -v"`
Expected: PASS
- [ ] **Step 5: Commit**
```bash
git add internal/cache/engine_cache.go internal/cache/engine_cache_test.go
git commit -m "cache: add EngineCache with tier-aware Get/Set"
```
---
## Task 4: Add TTLOverrides to config
**Files:**
- Modify: `internal/config/config.go`
- [ ] **Step 1: Add TTLOverrides to CacheConfig**
In `CacheConfig` struct, add:
```go
type CacheConfig struct {
Address string `toml:"address"`
Password string `toml:"password"`
DB int `toml:"db"`
DefaultTTL string `toml:"default_ttl"`
TTLOverrides map[string]string `toml:"ttl_overrides"` // engine -> duration string
}
```
- [ ] **Step 2: Add CacheTTLOverrides() method to Config**
Add after `CacheTTL()`:
```go
// CacheTTLOverrides returns parsed TTL overrides from config.
func (c *Config) CacheTTLOverrides() map[string]time.Duration {
if len(c.Cache.TTLOverrides) == 0 {
return nil
}
out := make(map[string]time.Duration, len(c.Cache.TTLOverrides))
for engine, durStr := range c.Cache.TTLOverrides {
if d, err := time.ParseDuration(durStr); err == nil && d > 0 {
out[engine] = d
}
}
return out
}
```
- [ ] **Step 3: Run tests to verify nothing breaks**
Run: `nix develop --command bash -c "go test ./internal/config/ -v"`
Expected: PASS
- [ ] **Step 4: Commit**
```bash
git add internal/config/config.go
git commit -m "config: add TTLOverrides to CacheConfig"
```
---
## Task 5: Wire EngineCache into search service
**Files:**
- Modify: `internal/search/service.go`
- [ ] **Step 1: Read the current service.go to understand wiring**
The service currently takes `*cache.Cache` in `ServiceConfig`. We keep that field and construct a `*cache.EngineCache` from it (plus the TTL overrides) inside `NewService`.
- [ ] **Step 2: Modify Service struct and NewService to use EngineCache**
Change `Service`:
```go
type Service struct {
upstreamClient *upstream.Client
planner *engines.Planner
localEngines map[string]engines.Engine
engineCache *cache.EngineCache
}
```
Change `NewService`:
```go
func NewService(cfg ServiceConfig) *Service {
timeout := cfg.HTTPTimeout
if timeout <= 0 {
timeout = 10 * time.Second
}
httpClient := httpclient.NewClient(timeout)
var up *upstream.Client
if cfg.UpstreamURL != "" {
c, err := upstream.NewClient(cfg.UpstreamURL, timeout)
if err == nil {
up = c
}
}
var engineCache *cache.EngineCache
if cfg.Cache != nil {
engineCache = cache.NewEngineCache(cfg.Cache, cfg.CacheTTLOverrides)
}
return &Service{
upstreamClient: up,
planner: engines.NewPlannerFromEnv(),
localEngines: engines.NewDefaultPortedEngines(httpClient, cfg.EnginesConfig),
engineCache: engineCache,
}
}
```
Add `CacheTTLOverrides` to `ServiceConfig`:
```go
type ServiceConfig struct {
UpstreamURL string
HTTPTimeout time.Duration
Cache *cache.Cache
CacheTTLOverrides map[string]time.Duration
EnginesConfig *config.Config
}
```
- [ ] **Step 3: Rewrite Search() with correct stale-while-revalidate logic**
The stale-while-revalidate flow:
1. **Cache lookup (Phase 1)**: Check cache for each engine in parallel. Classify each as:
- Fresh hit: cache has data AND not stale → deserialize, mark as `fresh`
- Stale hit: cache has data AND stale → keep in `cached`, no `fresh` yet
- Miss: cache has no data → `hit=false`, no `cached` or `fresh`
2. **Fetch (Phase 2)**: For each engine:
- Fresh hit: return immediately, no fetch needed
- Stale hit: return stale data immediately, fetch fresh in background
- Miss: fetch fresh synchronously, cache result
3. **Collect (Phase 3)**: Collect all responses for merge.
```go
// Search executes the request against local engines (in parallel) and
// optionally the upstream instance for unported engines.
func (s *Service) Search(ctx context.Context, req SearchRequest) (SearchResponse, error) {
queryHash := cache.QueryHash(
req.Query,
int(req.Pageno),
int(req.Safesearch),
req.Language,
derefString(req.TimeRange),
)
localEngineNames, upstreamEngineNames, _ := s.planner.Plan(req)
// Phase 1: Parallel cache lookups — classify each engine as fresh/stale/miss
type cacheResult struct {
engine string
cached cache.CachedEngineResponse
hit bool
fresh contracts.SearchResponse
fetchErr error
unmarshalErr bool // true if hit but unmarshal failed (treat as miss)
}
cacheResults := make([]cacheResult, len(localEngineNames))
var lookupWg sync.WaitGroup
for i, name := range localEngineNames {
lookupWg.Add(1)
go func(i int, name string) {
defer lookupWg.Done()
result := cacheResult{engine: name}
if s.engineCache != nil {
cached, ok := s.engineCache.Get(ctx, name, queryHash)
if ok {
result.hit = true
result.cached = cached
if !s.engineCache.IsStale(cached, name) {
// Fresh cache hit — deserialize and use directly
var resp contracts.SearchResponse
if err := json.Unmarshal(cached.Response, &resp); err == nil {
result.fresh = resp
} else {
// Unmarshal failed — treat as cache miss (will fetch fresh synchronously)
result.unmarshalErr = true
result.hit = false // treat as miss
}
}
// If stale: result.fresh stays zero, result.cached has stale data
}
}
cacheResults[i] = result
}(i, name)
}
lookupWg.Wait()
// Phase 2: Fetch fresh for misses and stale entries
var fetchWg sync.WaitGroup
for i, name := range localEngineNames {
cr := cacheResults[i]
// Fresh hit — nothing to do in phase 2
if cr.hit && cr.fresh.Response != nil {
continue
}
// Stale hit: return stale immediately, refresh in background.
// The refresh is deliberately NOT added to fetchWg; waiting on it would
// block the response and defeat stale-while-revalidate. It also detaches
// from the request context (context.WithoutCancel, Go 1.21+) so
// cancellation after the response is sent does not abort the refresh.
if cr.hit && cr.cached.Response != nil && s.engineCache != nil && s.engineCache.IsStale(cr.cached, name) {
bgCtx := context.WithoutCancel(ctx)
go func(name string) {
eng, ok := s.localEngines[name]
if !ok {
return
}
freshResp, err := eng.Search(bgCtx, req)
if err != nil {
s.engineCache.Logger().Debug("background refresh failed", "engine", name, "error", err)
return
}
s.engineCache.Set(bgCtx, name, queryHash, freshResp)
}(name)
continue
}
// Cache miss — fetch fresh synchronously
if !cr.hit {
fetchWg.Add(1)
go func(i int, name string) {
defer fetchWg.Done()
eng, ok := s.localEngines[name]
if !ok {
cacheResults[i] = cacheResult{
engine: name,
fetchErr: fmt.Errorf("engine not registered: %s", name),
}
return
}
freshResp, err := eng.Search(ctx, req)
if err != nil {
cacheResults[i] = cacheResult{
engine: name,
fetchErr: err,
}
return
}
// Cache the fresh response
if s.engineCache != nil {
s.engineCache.Set(ctx, name, queryHash, freshResp)
}
cacheResults[i] = cacheResult{
engine: name,
fresh: freshResp,
hit: false,
}
}(i, name)
}
}
fetchWg.Wait()
// Phase 3: Collect responses for merge
responses := make([]contracts.SearchResponse, 0, len(cacheResults))
for _, cr := range cacheResults {
if cr.fetchErr != nil {
responses = append(responses, unresponsiveResponse(req.Query, cr.engine, cr.fetchErr.Error()))
continue
}
// Use fresh data if available (fresh hit or freshly fetched), otherwise use stale cached
if cr.fresh.Response != nil {
responses = append(responses, cr.fresh)
} else if cr.hit && cr.cached.Response != nil {
var resp contracts.SearchResponse
if err := json.Unmarshal(cr.cached.Response, &resp); err == nil {
responses = append(responses, resp)
}
}
}
// ... rest of upstream proxy and merge logic (unchanged) ...
}
```
Note: The imports need `encoding/json` and `fmt` added. The existing imports in service.go already include `sync` and `time`.
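The `derefString` helper used when hashing `req.TimeRange` is not defined anywhere in the plan; assuming `TimeRange` is a `*string`, a minimal version:

```go
package main

import "fmt"

// derefString returns the pointed-to string, or "" for a nil pointer, so an
// absent time_range hashes the same as an empty one.
func derefString(s *string) string {
	if s == nil {
		return ""
	}
	return *s
}

func main() {
	tr := "week"
	fmt.Println(derefString(&tr), derefString(nil) == "") // week true
}
```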
- [ ] **Step 4: Run tests to verify compilation**
Run: `nix develop --command bash -c "go build ./internal/search/"`
Expected: no output (success)
- [ ] **Step 5: Run full test suite**
Run: `nix develop --command bash -c "go test ./..."`
Expected: All pass
- [ ] **Step 6: Commit**
```bash
git add internal/search/service.go
git commit -m "search: wire per-engine cache with tier-aware TTLs"
```
---
## Task 6: Update config.example.toml
**Files:**
- Modify: `config.example.toml`
- [ ] **Step 1: Add TTL overrides section to config.example.toml**
Add after the `[cache]` section:
```toml
[cache.ttl_overrides]
# Per-engine TTL overrides (uncomment to use):
# wikipedia = "48h"
# reddit = "15m"
# braveapi = "2h"
```
- [ ] **Step 2: Commit**
```bash
git add config.example.toml
git commit -m "config: add cache.ttl_overrides example"
```
---
## Verification
After all tasks complete, run:
```bash
nix develop --command bash -c "go test ./... -v 2>&1 | tail -50"
```
All tests should pass. The search service should now cache each engine's response independently with tier-based TTLs.


@ -1,80 +0,0 @@
# Settings UI Design — kafka
**Date:** 2026-03-22
**Status:** Approved
## Overview
A lightweight preferences popover anchored to the top-right, just below the header. Triggered by a gear icon, it lets users adjust theme, enabled engines, and search defaults without leaving their current page. All changes auto-save to `localStorage` on every interaction.
## Layout & Structure
- **Trigger**: Gear icon (⚙️) in the top-right header, aligned with the header's right edge
- **Panel**: 280px wide, max-height 420px, scrollable internally, rounded corners, subtle shadow, anchored top-right (drops down from trigger, like a dropdown)
- **Close**: × button in panel header, click outside the panel, or pressing Escape
- **No Save button** — every interaction immediately writes to `localStorage`
## Interaction Flow
1. User clicks ⚙️ → panel drops down from top-right (200ms ease)
2. User toggles/clicks → changes apply instantly to DOM + write to `localStorage`
3. User clicks × or outside or Escape → panel closes, settings persist
4. **Accessibility**: Focus is trapped within the panel while open. Trigger button uses `aria-expanded` and `aria-controls`. Escape key closes the panel.
## Mid-Search Changes
When opened during an active search on `results.html`:
- Engine toggles update `localStorage` immediately, but **current results remain unchanged**
- A subtle inline note below the engines section: *"Engine changes apply to your next search"*
## Sections
### Appearance
- Three theme buttons: ☀️ Light / 🌙 Dark / 💻 System
- Clicking immediately applies via `document.body.classList` + writes to localStorage
- "System" reads `prefers-color-scheme` and updates on change
### Engines
- 2-column grid of toggle switches for all 9 engines
- Each row: engine name + toggle switch
- Enabled = filled accent color; Disabled = gray outline
### Search Defaults
- Safe search: dropdown (Moderate / Strict / Off)
- Default format: dropdown (HTML / JSON / CSV)
## Default State
```js
const DEFAULT_PREFS = {
theme: "system",
engines: ["wikipedia", "arxiv", "crossref", "braveapi", "qwant", "duckduckgo", "github", "reddit", "bing"],
safeSearch: "moderate",
format: "html"
};
```
## Persistence
```js
// Written on every interaction
localStorage.setItem('kafka_prefs', JSON.stringify({ ... }));
// Read on page load — merge with DEFAULT_PREFS
const saved = JSON.parse(localStorage.getItem('kafka_prefs') || '{}');
const prefs = { ...DEFAULT_PREFS, ...saved };
```
## Responsive Behavior
- **Mobile (<768px)**: Panel becomes a **bottom sheet** — 100% width, slides up from the bottom, top corners rounded, max-height 70vh. Trigger moves to a fixed bottom-right FAB button.
- Panel never covers the search input
## Applied to Existing Code
- `base.html` — add gear button in header, panel markup at end of `<body>`
- `kafka.css` — popover styles, toggle switch styles, bottom sheet styles for mobile
- `settings.js` — localStorage read/write, theme application, panel toggle, aria attributes, focus trap


@ -0,0 +1,219 @@
# Per-Engine TTL Cache — Design
## Overview
Replace the current merged-response cache with a per-engine response cache. Each engine's raw response is cached independently with a tier-based TTL, enabling stale-while-revalidate semantics and more granular freshness control.
## Cache Key Structure
```
samsa:resp:{engine}:{query_hash}
```
Where `query_hash` = SHA-256 of shared request params (query, pageno, safesearch, language, time_range), truncated to 16 hex chars.
Example:
- `samsa:resp:wikipedia:a3f1b2c3d4e5f678`
- `samsa:resp:duckduckgo:a3f1b2c3d4e5f678`
The same query sent to Wikipedia and to DuckDuckGo produces different cache keys, enabling independent TTLs per engine.
## Query Hash
Compute from shared request parameters:
```go
func QueryHash(query string, pageno int, safesearch int, language, timeRange string) string {
h := sha256.New()
fmt.Fprintf(h, "q=%s|", query)
fmt.Fprintf(h, "pageno=%d|", pageno)
fmt.Fprintf(h, "safesearch=%d|", safesearch)
fmt.Fprintf(h, "lang=%s|", language)
if timeRange != "" {
fmt.Fprintf(h, "tr=%s|", timeRange)
}
return hex.EncodeToString(h.Sum(nil))[:16]
}
```
Note: `engines` is NOT included because each engine has its own cache key prefix.
## Cached Data Format
Each cache entry stores:
```go
type CachedEngineResponse struct {
Engine string // engine name
Response []byte // JSON-marshaled contracts.SearchResponse
StoredAt time.Time // when cached (for staleness check)
}
```
## TTL Tiers
### Default Tier Assignments
| Tier | Engines | Default TTL |
|------|---------|-------------|
| `static` | wikipedia, wikidata, arxiv, crossref, stackoverflow, github | 24h |
| `api_general` | braveapi, youtube | 1h |
| `scraped_general` | google, bing, duckduckgo, qwant, brave | 2h |
| `news_social` | reddit | 30m |
| `images` | bing_images, ddg_images, qwant_images | 1h |
### TOML Override Format
```toml
[cache.ttl_overrides]
wikipedia = "48h" # override default 24h
reddit = "15m" # override default 30m
```
## Search Flow
### 1. Parse Request
Extract engine list from planner, compute shared `queryHash`.
### 2. Parallel Cache Lookups
For each engine, spawn a goroutine to check cache:
```go
type engineCacheResult struct {
engine string
resp contracts.SearchResponse
fromCache bool
err error
}
// For each engine, concurrently:
cached, hit := engineCache.Get(ctx, engine, queryHash)
if hit && !isStale(cached) {
return cached.Response, nil // fresh cache hit
}
if hit && isStale(cached) {
go refreshInBackground(engine, queryHash) // stale-while-revalidate
return cached.Response, nil // return stale immediately
}
// cache miss
fresh, err := engine.Search(ctx, req)
engineCache.Set(ctx, engine, queryHash, fresh)
return fresh, err
```
### 3. Classify Each Engine
- **Cache miss** → fetch fresh immediately
- **Cache hit, fresh** → use cached
- **Cache hit, stale** → use cached, fetch fresh in background (stale-while-revalidate)
### 4. Background Refresh
When a stale cache hit occurs:
1. Return stale data immediately
2. Spawn goroutine to fetch fresh data
3. On success, overwrite cache with fresh data
4. On failure, log and discard (stale data already returned)
### 5. Merge
Collect all engine responses (cached + fresh), merge via existing `MergeResponses`.
### 6. Write Fresh to Cache
For engines that were fetched fresh, write to cache with their tier TTL.
## Staleness Check
```go
func isStale(cached CachedEngineResponse, tier TTLTier) bool {
return time.Since(cached.StoredAt) > tier.Duration
}
```
## Tier Resolution
```go
type TTLTier struct {
Name string
Duration time.Duration
}
func EngineTier(engineName string) TTLTier {
if override := ttlOverrides[engineName]; override > 0 {
return TTLTier{Name: engineName, Duration: override}
}
return defaultTiers[engineName] // from hardcoded map above
}
```
## New Files
### `internal/cache/engine_cache.go`
`EngineCache` struct wrapping `*Cache` with tier-aware `Get/Set` methods:
```go
type EngineCache struct {
cache *Cache
overrides map[string]time.Duration
tiers map[string]TTLTier
}
func (ec *EngineCache) Get(ctx context.Context, engine, queryHash string) (CachedEngineResponse, bool)
func (ec *EngineCache) Set(ctx context.Context, engine, queryHash string, resp contracts.SearchResponse)
```
### `internal/cache/tiers.go`
Tier definitions and `EngineTier(engineName string)` function.
## Modified Files
### `internal/cache/cache.go`
- Add `QueryHash()`; the existing `Key()` stays, and the engine prefix is applied externally when building cache keys
- `Get/Set` remain for favicon caching (unchanged)
### `internal/search/service.go`
- Replace `*Cache` with `*EngineCache`
- Parallel cache lookups with goroutines
- Stale-while-revalidate background refresh
- Merge collected responses
### `internal/config/config.go`
Add `TTLOverrides` field:
```go
type CacheConfig struct {
// ... existing fields ...
TTLOverrides map[string]string `toml:"ttl_overrides"` // duration strings, parsed via time.ParseDuration
}
```
## Config Example
```toml
[cache]
enabled = true
url = "valkey://localhost:6379/0"
default_ttl = "5m"
[cache.ttl_overrides]
wikipedia = "48h"
reddit = "15m"
braveapi = "2h"
```
## Error Handling
- **Cache read failure**: Treat as cache miss, fetch fresh
- **Cache write failure**: Log warning, continue without caching for that engine
- **Background refresh failure**: Log error, discard (stale data already returned)
- **Engine failure**: Continue with other engines, report in `unresponsive_engines`
## Testing
1. **Unit tests** for `QueryHash()` consistency
2. **Unit tests** for `EngineTier()` with overrides
3. **Unit tests** for `isStale()` boundary conditions
4. **Integration tests** for cache hit/miss/stale scenarios using mock Valkey
## Out of Scope
- Cache invalidation API (future work)
- Dog-pile prevention (future work)
- Per-engine cache size limits (future work)


flake.nix
```diff
@@ -21,13 +21,16 @@
   version = "0.1.0";
   src = ./.;
-  vendorHash = "sha256-NbAa4QM/TI3BTuZs4glx9k3ZjSl2/2LQfKlQ7izR8Ho=";
+  vendorHash = "sha256-8wlKD+33s97oorCJTfHKAgE2Xp1HKXV+bSr6z29KrKM=";
   # Run: nix build .#packages.x86_64-linux.default
-  # It will fail with the correct hash. Replace it here.
+  # It will fail with the correct hash. Replace vendorHash with it.
   # Embed the templates and static files at build time.
   ldflags = [ "-s" "-w" ];
+  # Remove stale vendor directory before buildGoModule deletes it.
+  preConfigure = "rm -rf vendor || true";
   nativeCheckInputs = with pkgs; [ ];
   # Tests require network; they run in CI instead.
@@ -58,7 +61,7 @@
   port = lib.mkOption {
     type = lib.types.port;
-    default = 8080;
+    default = 5355;
     description = "Port to listen on.";
   };
```

go.mod
```diff
@@ -1,4 +1,4 @@
-module github.com/metamorphosis-dev/kafka
+module github.com/metamorphosis-dev/samsa
 go 1.24
@@ -13,7 +13,7 @@ require (
 	github.com/cespare/xxhash/v2 v2.3.0 // indirect
 	github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f // indirect
 	go.uber.org/atomic v1.11.0 // indirect
-	golang.org/x/net v0.52.0 // indirect
+	golang.org/x/net v0.33.0 // indirect
 )
-replace golang.org/x/net v0.52.0 => golang.org/x/net v0.33.0
+replace golang.org/x/net => golang.org/x/net v0.38.0
```

go.sum

Lock-file churn matching the `golang.org/x/net` pin to v0.38.0: the v0.52.0 and v0.33.0 entries are dropped, and the transitive `golang.org/x/crypto`, `x/sync`, `x/sys`, `x/term`, and `x/text` sums move to the corresponding newer versions.
golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
golang.org/x/tools v0.1.12/go.mod h1:hNGJHUnrk76NpqgfD5Aqm5Crs+Hm0VOH/i9J2+nxYbc= golang.org/x/tools v0.1.12/go.mod h1:hNGJHUnrk76NpqgfD5Aqm5Crs+Hm0VOH/i9J2+nxYbc=
golang.org/x/tools v0.6.0/go.mod h1:Xwgl3UAJ/d3gWutnCtw505GrjyAbvKui8lOU390QaIU=
golang.org/x/tools v0.13.0/go.mod h1:HvlwmtVNQAhOuCjW7xxvovg8wbNq7LwfXh/k7wXUl58= golang.org/x/tools v0.13.0/go.mod h1:HvlwmtVNQAhOuCjW7xxvovg8wbNq7LwfXh/k7wXUl58=
golang.org/x/tools v0.21.1-0.20240508182429-e35e4ccd0d2d/go.mod h1:aiJjzUbINMkxbQROHiO6hDPo2LHcIPhhQsa9DLh0yGk= golang.org/x/tools v0.21.1-0.20240508182429-e35e4ccd0d2d/go.mod h1:aiJjzUbINMkxbQROHiO6hDPo2LHcIPhhQsa9DLh0yGk=
golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=

img/screenshot1.png (new binary file, 52 KiB, not shown)

@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify
@@ -25,6 +25,8 @@ import (
 	"net/url"
 	"strings"
 	"time"
+
+	"github.com/metamorphosis-dev/samsa/internal/httpclient"
 )

 // Service fetches search suggestions from upstream or Wikipedia OpenSearch.
@@ -39,7 +41,7 @@ func NewService(upstreamURL string, timeout time.Duration) *Service {
 	}
 	return &Service{
 		upstreamURL: strings.TrimRight(upstreamURL, "/"),
-		http:        &http.Client{Timeout: timeout},
+		http:        httpclient.NewClient(timeout),
 	}
 }
@@ -102,7 +104,7 @@ func (s *Service) wikipediaSuggestions(ctx context.Context, query string) ([]str
 	}
 	req.Header.Set(
 		"User-Agent",
-		"gosearch-go/0.1 (compatible; +https://github.com/metamorphosis-dev/kafka)",
+		"gosearch-go/0.1 (compatible; +https://github.com/metamorphosis-dev/samsa)",
 	)
 	resp, err := s.http.Do(req)


@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify
@@ -25,7 +25,7 @@ import (
 	"log/slog"
 	"time"

-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
 	"github.com/redis/go-redis/v9"
 )
@@ -97,7 +97,7 @@ func (c *Cache) Get(ctx context.Context, key string) (contracts.SearchResponse,
 		return contracts.SearchResponse{}, false
 	}
-	fullKey := "kafka:" + key
+	fullKey := "samsa:" + key
 	data, err := c.client.Get(ctx, fullKey).Bytes()
 	if err != nil {
@@ -129,7 +129,7 @@ func (c *Cache) Set(ctx context.Context, key string, resp contracts.SearchRespon
 		return
 	}
-	fullKey := "kafka:" + key
+	fullKey := "samsa:" + key
 	if err := c.client.Set(ctx, fullKey, data, c.ttl).Err(); err != nil {
 		c.logger.Warn("cache set failed", "key", fullKey, "error", err)
 	}
@@ -140,10 +140,42 @@ func (c *Cache) Invalidate(ctx context.Context, key string) {
 	if !c.Enabled() {
 		return
 	}
-	fullKey := "kafka:" + key
+	fullKey := "samsa:" + key
 	c.client.Del(ctx, fullKey)
 }

+// GetBytes retrieves a raw byte slice from the cache. Returns (data, true) on hit,
+// (nil, false) on miss or error.
+func (c *Cache) GetBytes(ctx context.Context, key string) ([]byte, bool) {
+	if !c.Enabled() {
+		return nil, false
+	}
+	fullKey := "samsa:" + key
+	data, err := c.client.Get(ctx, fullKey).Bytes()
+	if err != nil {
+		if err != redis.Nil {
+			c.logger.Debug("cache bytes miss (error)", "key", fullKey, "error", err)
+		}
+		return nil, false
+	}
+	return data, true
+}
+
+// SetBytes stores a raw byte slice with a custom TTL.
+// If ttl <= 0, the cache's default TTL is used.
+func (c *Cache) SetBytes(ctx context.Context, key string, data []byte, ttl time.Duration) {
+	if !c.Enabled() {
+		return
+	}
+	if ttl <= 0 {
+		ttl = c.ttl
+	}
+	fullKey := "samsa:" + key
+	if err := c.client.Set(ctx, fullKey, data, ttl).Err(); err != nil {
+		c.logger.Warn("cache set bytes failed", "key", fullKey, "error", err)
+	}
+}
+
 // Close closes the Valkey connection.
 func (c *Cache) Close() error {
 	if c.client == nil {
@@ -176,3 +208,25 @@ func Key(req contracts.SearchRequest) string {
 	return hex.EncodeToString(h.Sum(nil))[:32]
 }

+// QueryHash computes a deterministic hash from shared request parameters
+// (query, pageno, safesearch, language, time_range) for use as a cache key suffix.
+// The hash is a truncated SHA-256 (16 hex chars).
+func QueryHash(query string, pageno int, safesearch int, language, timeRange string) string {
+	h := sha256.New()
+	fmt.Fprintf(h, "q=%s|", query)
+	fmt.Fprintf(h, "pageno=%d|", pageno)
+	fmt.Fprintf(h, "safesearch=%d|", safesearch)
+	fmt.Fprintf(h, "lang=%s|", language)
+	if timeRange != "" {
+		fmt.Fprintf(h, "tr=%s|", timeRange)
+	}
+	return hex.EncodeToString(h.Sum(nil))[:16]
+}
+
+// CachedEngineResponse wraps an engine's cached response with metadata.
+type CachedEngineResponse struct {
+	Engine   string
+	Response []byte
+	StoredAt time.Time
+}


@@ -3,13 +3,13 @@ package cache
 import (
 	"testing"

-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
 )

 func TestKey_Deterministic(t *testing.T) {
 	req := contracts.SearchRequest{
 		Format:     contracts.FormatJSON,
-		Query:      "kafka metamorphosis",
+		Query:      "samsa metamorphosis",
 		Pageno:     1,
 		Safesearch: 0,
 		Language:   "auto",
@@ -29,7 +29,7 @@ func TestKey_Deterministic(t *testing.T) {
 }

 func TestKey_DifferentQueries(t *testing.T) {
-	reqA := contracts.SearchRequest{Query: "kafka", Format: contracts.FormatJSON}
+	reqA := contracts.SearchRequest{Query: "samsa", Format: contracts.FormatJSON}
 	reqB := contracts.SearchRequest{Query: "orwell", Format: contracts.FormatJSON}

 	if Key(reqA) == Key(reqB) {
@@ -75,3 +75,35 @@ func TestNew_NopWithoutAddress(t *testing.T) {
 }

 func strPtr(s string) *string { return &s }

+func TestQueryHash(t *testing.T) {
+	// Same params should produce same hash
+	hash1 := QueryHash("golang", 1, 0, "en", "")
+	hash2 := QueryHash("golang", 1, 0, "en", "")
+	if hash1 != hash2 {
+		t.Errorf("QueryHash: same params should produce same hash, got %s != %s", hash1, hash2)
+	}
+
+	// Different query should produce different hash
+	hash3 := QueryHash("rust", 1, 0, "en", "")
+	if hash1 == hash3 {
+		t.Errorf("QueryHash: different queries should produce different hash")
+	}
+
+	// Different pageno should produce different hash
+	hash4 := QueryHash("golang", 2, 0, "en", "")
+	if hash1 == hash4 {
+		t.Errorf("QueryHash: different pageno should produce different hash")
+	}
+
+	// time_range should affect hash
+	hash5 := QueryHash("golang", 1, 0, "en", "day")
+	if hash1 == hash5 {
+		t.Errorf("QueryHash: different time_range should produce different hash")
+	}
+
+	// Hash should be 16 characters (truncated SHA-256)
+	if len(hash1) != 16 {
+		t.Errorf("QueryHash: expected 16 char hash, got %d", len(hash1))
+	}
+}

internal/cache/engine_cache.go (new file, 91 lines)

@@ -0,0 +1,91 @@
package cache

import (
	"context"
	"encoding/json"
	"log/slog"
	"time"

	"github.com/metamorphosis-dev/samsa/internal/contracts"
)

// EngineCache wraps Cache with per-engine tier-aware Get/Set operations.
type EngineCache struct {
	cache     *Cache
	overrides map[string]time.Duration
}

// NewEngineCache creates a new EngineCache with optional TTL overrides.
// If overrides is nil, default tier durations are used.
func NewEngineCache(cache *Cache, overrides map[string]time.Duration) *EngineCache {
	return &EngineCache{
		cache:     cache,
		overrides: overrides,
	}
}

// Get retrieves a cached engine response. Returns (zero value, false) if not
// found or if cache is disabled.
func (ec *EngineCache) Get(ctx context.Context, engine, queryHash string) (CachedEngineResponse, bool) {
	key := engineCacheKey(engine, queryHash)
	data, ok := ec.cache.GetBytes(ctx, key)
	if !ok {
		return CachedEngineResponse{}, false
	}
	var cached CachedEngineResponse
	if err := json.Unmarshal(data, &cached); err != nil {
		ec.cache.logger.Warn("engine cache hit but unmarshal failed", "key", key, "error", err)
		return CachedEngineResponse{}, false
	}
	ec.cache.logger.Debug("engine cache hit", "key", key, "engine", engine)
	return cached, true
}

// Set stores an engine response in the cache with the engine's tier TTL.
func (ec *EngineCache) Set(ctx context.Context, engine, queryHash string, resp contracts.SearchResponse) {
	if !ec.cache.Enabled() {
		return
	}
	data, err := json.Marshal(resp)
	if err != nil {
		ec.cache.logger.Warn("engine cache set: marshal failed", "engine", engine, "error", err)
		return
	}
	tier := EngineTier(engine, ec.overrides)
	key := engineCacheKey(engine, queryHash)
	cached := CachedEngineResponse{
		Engine:   engine,
		Response: data,
		StoredAt: time.Now(),
	}
	cachedData, err := json.Marshal(cached)
	if err != nil {
		ec.cache.logger.Warn("engine cache set: wrap marshal failed", "key", key, "error", err)
		return
	}
	ec.cache.SetBytes(ctx, key, cachedData, tier.Duration)
}

// IsStale returns true if the cached response is older than the tier's TTL.
func (ec *EngineCache) IsStale(cached CachedEngineResponse, engine string) bool {
	tier := EngineTier(engine, ec.overrides)
	return time.Since(cached.StoredAt) > tier.Duration
}

// Logger returns the logger for background refresh logging.
func (ec *EngineCache) Logger() *slog.Logger {
	return ec.cache.logger
}

// engineCacheKey builds the cache key for an engine+query combination.
func engineCacheKey(engine, queryHash string) string {
	return "samsa:resp:" + engine + ":" + queryHash
}

internal/cache/engine_cache_test.go (new file, 95 lines)

@@ -0,0 +1,95 @@
package cache

import (
	"context"
	"log/slog"
	"testing"
	"time"

	"github.com/metamorphosis-dev/samsa/internal/contracts"
)

func TestEngineCacheGetSet(t *testing.T) {
	// Create a disabled cache for unit testing (nil client)
	c := &Cache{logger: slog.Default()}
	ec := NewEngineCache(c, nil)
	ctx := context.Background()

	cached, ok := ec.Get(ctx, "wikipedia", "abc123")
	if ok {
		t.Errorf("Get on disabled cache: expected false, got %v", ok)
	}
	_ = cached // unused when ok=false
}

func TestEngineCacheKeyFormat(t *testing.T) {
	key := engineCacheKey("wikipedia", "abc123")
	if key != "samsa:resp:wikipedia:abc123" {
		t.Errorf("engineCacheKey: expected samsa:resp:wikipedia:abc123, got %s", key)
	}
}

func TestEngineCacheIsStale(t *testing.T) {
	c := &Cache{logger: slog.Default()}
	ec := NewEngineCache(c, nil)

	// Fresh response (stored 1 minute ago, wikipedia has 24h TTL)
	fresh := CachedEngineResponse{
		Engine:   "wikipedia",
		Response: []byte(`{}`),
		StoredAt: time.Now().Add(-1 * time.Minute),
	}
	if ec.IsStale(fresh, "wikipedia") {
		t.Errorf("IsStale: 1-minute-old wikipedia should NOT be stale")
	}

	// Stale response (stored 25 hours ago)
	stale := CachedEngineResponse{
		Engine:   "wikipedia",
		Response: []byte(`{}`),
		StoredAt: time.Now().Add(-25 * time.Hour),
	}
	if !ec.IsStale(stale, "wikipedia") {
		t.Errorf("IsStale: 25-hour-old wikipedia SHOULD be stale (24h TTL)")
	}

	// Override: 30 minute TTL for reddit
	overrides := map[string]time.Duration{"reddit": 30 * time.Minute}
	ec2 := NewEngineCache(c, overrides)

	// 20 minutes old with 30m override should NOT be stale
	redditFresh := CachedEngineResponse{
		Engine:   "reddit",
		Response: []byte(`{}`),
		StoredAt: time.Now().Add(-20 * time.Minute),
	}
	if ec2.IsStale(redditFresh, "reddit") {
		t.Errorf("IsStale: 20-min reddit with 30m override should NOT be stale")
	}

	// 45 minutes old with 30m override SHOULD be stale
	redditStale := CachedEngineResponse{
		Engine:   "reddit",
		Response: []byte(`{}`),
		StoredAt: time.Now().Add(-45 * time.Minute),
	}
	if !ec2.IsStale(redditStale, "reddit") {
		t.Errorf("IsStale: 45-min reddit with 30m override SHOULD be stale")
	}
}

func TestEngineCacheSetResponseType(t *testing.T) {
	c := &Cache{logger: slog.Default()}
	ec := NewEngineCache(c, nil)
	ctx := context.Background()

	urlStr := "https://example.com"
	resp := contracts.SearchResponse{
		Results: []contracts.MainResult{
			{Title: "Test", URL: &urlStr},
		},
	}

	// Should not panic on disabled cache
	ec.Set(ctx, "wikipedia", "abc123", resp)
}

internal/cache/tiers.go (new file, 56 lines)

@@ -0,0 +1,56 @@
package cache

import "time"

// TTLTier represents a cache TTL tier with a name and duration.
type TTLTier struct {
	Name     string
	Duration time.Duration
}

// defaultTiers maps engine names to their default TTL tiers.
var defaultTiers = map[string]TTLTier{
	// Static knowledge engines — rarely change
	"wikipedia":     {Name: "static", Duration: 24 * time.Hour},
	"wikidata":      {Name: "static", Duration: 24 * time.Hour},
	"arxiv":         {Name: "static", Duration: 24 * time.Hour},
	"crossref":      {Name: "static", Duration: 24 * time.Hour},
	"stackoverflow": {Name: "static", Duration: 24 * time.Hour},
	"github":        {Name: "static", Duration: 24 * time.Hour},

	// API-based general search — fresher data
	"braveapi": {Name: "api_general", Duration: 1 * time.Hour},
	"youtube":  {Name: "api_general", Duration: 1 * time.Hour},

	// Scraped general search — moderately stable
	"google":     {Name: "scraped_general", Duration: 2 * time.Hour},
	"bing":       {Name: "scraped_general", Duration: 2 * time.Hour},
	"duckduckgo": {Name: "scraped_general", Duration: 2 * time.Hour},
	"qwant":      {Name: "scraped_general", Duration: 2 * time.Hour},
	"brave":      {Name: "scraped_general", Duration: 2 * time.Hour},

	// News/social — changes frequently
	"reddit": {Name: "news_social", Duration: 30 * time.Minute},

	// Image search
	"bing_images":  {Name: "images", Duration: 1 * time.Hour},
	"ddg_images":   {Name: "images", Duration: 1 * time.Hour},
	"qwant_images": {Name: "images", Duration: 1 * time.Hour},
}

// EngineTier returns the TTL tier for an engine, applying overrides if provided.
// If the engine has no defined tier, returns a default of 1 hour.
func EngineTier(engineName string, overrides map[string]time.Duration) TTLTier {
	// Check override first — override tier name is just the engine name
	if override, ok := overrides[engineName]; ok && override > 0 {
		return TTLTier{Name: engineName, Duration: override}
	}
	// Fall back to default tier
	if tier, ok := defaultTiers[engineName]; ok {
		return tier
	}
	// Unknown engines get a sensible default
	return TTLTier{Name: "unknown", Duration: 1 * time.Hour}
}

internal/cache/tiers_test.go (new file, 33 lines)

@@ -0,0 +1,33 @@
package cache

import (
	"testing"
	"time"
)

func TestEngineTier(t *testing.T) {
	// Test default static tier
	tier := EngineTier("wikipedia", nil)
	if tier.Name != "static" || tier.Duration != 24*time.Hour {
		t.Errorf("wikipedia: expected static/24h, got %s/%v", tier.Name, tier.Duration)
	}

	// Test default api_general tier
	tier = EngineTier("braveapi", nil)
	if tier.Name != "api_general" || tier.Duration != 1*time.Hour {
		t.Errorf("braveapi: expected api_general/1h, got %s/%v", tier.Name, tier.Duration)
	}

	// Test override takes precedence — override tier name is just the engine name
	override := 48 * time.Hour
	tier = EngineTier("wikipedia", map[string]time.Duration{"wikipedia": override})
	if tier.Name != "wikipedia" || tier.Duration != 48*time.Hour {
		t.Errorf("wikipedia override: expected wikipedia/48h, got %s/%v", tier.Name, tier.Duration)
	}

	// Test unknown engine gets default
	tier = EngineTier("unknown_engine", nil)
	if tier.Name != "unknown" || tier.Duration != 1*time.Hour {
		t.Errorf("unknown engine: expected unknown/1h, got %s/%v", tier.Name, tier.Duration)
	}
}


@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify
@@ -23,9 +23,10 @@ import (
 	"time"

 	"github.com/BurntSushi/toml"
+	"github.com/metamorphosis-dev/samsa/internal/util"
 )

-// Config is the top-level configuration for the kafka service.
+// Config is the top-level configuration for the samsa service.
 type Config struct {
 	Server   ServerConfig   `toml:"server"`
 	Upstream UpstreamConfig `toml:"upstream"`
@@ -49,18 +50,24 @@ type UpstreamConfig struct {
 }

 type EnginesConfig struct {
 	LocalPorted   []string             `toml:"local_ported"`
 	Brave         BraveConfig          `toml:"brave"`
 	Qwant         QwantConfig          `toml:"qwant"`
 	YouTube       YouTubeConfig        `toml:"youtube"`
+	StackOverflow *StackOverflowConfig `toml:"stackoverflow"`
+}
+
+type StackOverflowConfig struct {
+	APIKey string `toml:"api_key"`
 }

 // CacheConfig holds Valkey/Redis cache settings.
 type CacheConfig struct {
 	Address      string            `toml:"address"`       // Valkey server address (e.g. "localhost:6379")
 	Password     string            `toml:"password"`      // Auth password (empty = none)
 	DB           int               `toml:"db"`            // Database index (default 0)
 	DefaultTTL   string            `toml:"default_ttl"`   // Cache TTL (e.g. "5m", default "5m")
+	TTLOverrides map[string]string `toml:"ttl_overrides"` // engine -> duration string
 }
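The new `ttl_overrides` table maps engine names to Go duration strings. A minimal sketch of the corresponding TOML, matching the struct tags above (the engine names and durations are illustrative):

```toml
[cache]
address = "localhost:6379"
default_ttl = "5m"

[cache.ttl_overrides]
reddit = "30m"
wikipedia = "48h"
```

Values that fail `time.ParseDuration` (or parse to zero or negative) are silently dropped when the overrides are read, so a typo falls back to the engine's default tier rather than breaking startup.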
 // CORSConfig holds CORS middleware settings.
@@ -77,6 +84,7 @@ type RateLimitConfig struct {
 	Requests        int      `toml:"requests"`         // Max requests per window (default: 30)
 	Window          string   `toml:"window"`           // Time window (e.g. "1m", default: "1m")
 	CleanupInterval string   `toml:"cleanup_interval"` // Stale entry cleanup interval (default: "5m")
+	TrustedProxies  []string `toml:"trusted_proxies"`  // CIDRs allowed to set X-Forwarded-For
 }

 // GlobalRateLimitConfig holds server-wide rate limiting settings.
@@ -120,18 +128,45 @@ func Load(path string) (*Config, error) {
 	}

 	applyEnvOverrides(cfg)

+	if err := validateConfig(cfg); err != nil {
+		return nil, fmt.Errorf("invalid configuration: %w", err)
+	}
+
 	return cfg, nil
 }

+// validateConfig checks security-critical config values at startup.
+func validateConfig(cfg *Config) error {
+	if cfg.Server.BaseURL != "" {
+		if err := util.ValidatePublicURL(cfg.Server.BaseURL); err != nil {
+			return fmt.Errorf("server.base_url: %w", err)
+		}
+	}
+	if cfg.Server.SourceURL != "" {
+		if err := util.ValidatePublicURL(cfg.Server.SourceURL); err != nil {
+			return fmt.Errorf("server.source_url: %w", err)
+		}
+	}
+	if cfg.Upstream.URL != "" {
+		// Validate scheme and well-formedness, but allow private IPs
+		// since self-hosted deployments commonly use localhost/internal addresses.
+		if _, err := util.SafeURLScheme(cfg.Upstream.URL); err != nil {
+			return fmt.Errorf("upstream.url: %w", err)
+		}
+	}
+	return nil
+}
+
 func defaultConfig() *Config {
 	return &Config{
 		Server: ServerConfig{
-			Port:        8080,
+			Port:        5355,
 			HTTPTimeout: "10s",
 		},
 		Upstream: UpstreamConfig{},
 		Engines: EnginesConfig{
-			LocalPorted: []string{"wikipedia", "arxiv", "crossref", "braveapi", "qwant", "duckduckgo", "github", "reddit", "bing", "google", "youtube"},
+			LocalPorted: []string{"wikipedia", "wikidata", "arxiv", "crossref", "braveapi", "qwant", "duckduckgo", "github", "reddit", "bing", "google", "youtube", "bing_images", "ddg_images", "qwant_images"},
 			Qwant: QwantConfig{
 				Category:       "web-lite",
 				ResultsPerPage: 10,
@@ -176,6 +211,12 @@ func applyEnvOverrides(cfg *Config) {
 	if v := os.Getenv("YOUTUBE_API_KEY"); v != "" {
 		cfg.Engines.YouTube.APIKey = v
 	}
+	if v := os.Getenv("STACKOVERFLOW_KEY"); v != "" {
+		if cfg.Engines.StackOverflow == nil {
+			cfg.Engines.StackOverflow = &StackOverflowConfig{}
+		}
+		cfg.Engines.StackOverflow.APIKey = v
+	}
 	if v := os.Getenv("VALKEY_ADDRESS"); v != "" {
 		cfg.Cache.Address = v
 	}
@@ -244,6 +285,20 @@ func (c *Config) CacheTTL() time.Duration {
 	return 5 * time.Minute
 }

+// CacheTTLOverrides returns parsed TTL overrides from config.
+func (c *Config) CacheTTLOverrides() map[string]time.Duration {
+	if len(c.Cache.TTLOverrides) == 0 {
+		return nil
+	}
+	out := make(map[string]time.Duration, len(c.Cache.TTLOverrides))
+	for engine, durStr := range c.Cache.TTLOverrides {
+		if d, err := time.ParseDuration(durStr); err == nil && d > 0 {
+			out[engine] = d
+		}
+	}
+	return out
+}
+
 // RateLimitWindow parses the rate limit window into a time.Duration.
 func (c *Config) RateLimitWindow() time.Duration {
 	if d, err := time.ParseDuration(c.RateLimit.Window); err == nil && d > 0 {


@@ -11,11 +11,11 @@ func TestLoadDefaults(t *testing.T) {
 	if err != nil {
 		t.Fatalf("Load with missing file should return defaults: %v", err)
 	}
-	if cfg.Server.Port != 8080 {
-		t.Errorf("expected default port 8080, got %d", cfg.Server.Port)
+	if cfg.Server.Port != 5355 {
+		t.Errorf("expected default port 5355, got %d", cfg.Server.Port)
 	}
-	if len(cfg.Engines.LocalPorted) != 11 {
-		t.Errorf("expected 11 default engines, got %d", len(cfg.Engines.LocalPorted))
+	if len(cfg.Engines.LocalPorted) != 15 {
+		t.Errorf("expected 15 default engines, got %d", len(cfg.Engines.LocalPorted))
 	}
 }


@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify


@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify


@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify
@@ -28,7 +28,7 @@ import (
 	"strings"
 	"time"

-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
 )

 const (
@@ -75,8 +75,8 @@ func (e *ArxivEngine) Search(ctx context.Context, req contracts.SearchRequest) (
 	defer resp.Body.Close()

 	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
-		body, _ := io.ReadAll(io.LimitReader(resp.Body, 16*1024))
-		return contracts.SearchResponse{}, fmt.Errorf("arxiv upstream error: status=%d body=%q", resp.StatusCode, string(body))
+		io.Copy(io.Discard, io.LimitReader(resp.Body, 16*1024))
+		return contracts.SearchResponse{}, fmt.Errorf("arxiv upstream error: status %d", resp.StatusCode)
 	}

 	raw, err := io.ReadAll(resp.Body)


@@ -6,7 +6,7 @@ import (
 	"strings"
 	"testing"

-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
 )

 func TestArxivEngine_Search(t *testing.T) {


@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify
@@ -28,7 +28,7 @@ import (
 	"strconv"
 	"strings"

-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
 )

 // BingEngine searches Bing via the public Bing API.
@@ -68,8 +68,8 @@ func (e *BingEngine) Search(ctx context.Context, req contracts.SearchRequest) (c
 	defer resp.Body.Close()

 	if resp.StatusCode != http.StatusOK {
-		body, _ := io.ReadAll(io.LimitReader(resp.Body, 4096))
-		return contracts.SearchResponse{}, fmt.Errorf("bing upstream error: status=%d body=%q", resp.StatusCode, string(body))
+		io.Copy(io.Discard, io.LimitReader(resp.Body, 4096))
+		return contracts.SearchResponse{}, fmt.Errorf("bing upstream error: status %d", resp.StatusCode)
 	}

 	contentType := resp.Header.Get("Content-Type")


@@ -0,0 +1,123 @@
// samsa — a privacy-respecting metasearch engine
// Copyright (C) 2026-present metamorphosis-dev
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.

package engines

import (
	"context"
	"encoding/xml"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"

	"github.com/metamorphosis-dev/samsa/internal/contracts"
)

// BingImagesEngine searches Bing Images via their public RSS endpoint.
type BingImagesEngine struct {
	client *http.Client
}

func (e *BingImagesEngine) Name() string { return "bing_images" }

func (e *BingImagesEngine) Search(ctx context.Context, req contracts.SearchRequest) (contracts.SearchResponse, error) {
	if e == nil || e.client == nil {
		return contracts.SearchResponse{}, errors.New("bing_images engine not initialized")
	}
	q := strings.TrimSpace(req.Query)
	if q == "" {
		return contracts.SearchResponse{Query: req.Query}, nil
	}

	offset := (req.Pageno - 1) * 10
	endpoint := fmt.Sprintf(
		"https://www.bing.com/images/search?q=%s&count=10&offset=%d&format=rss",
		url.QueryEscape(q),
		offset,
	)

	httpReq, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
	if err != nil {
		return contracts.SearchResponse{}, err
	}
	httpReq.Header.Set("User-Agent", "kafka/0.1 (compatible; +https://git.ashisgreat.xyz/penal-colony/kafka)")

	resp, err := e.client.Do(httpReq)
	if err != nil {
		return contracts.SearchResponse{}, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		io.Copy(io.Discard, io.LimitReader(resp.Body, 4096))
		return contracts.SearchResponse{}, fmt.Errorf("bing_images upstream error: status %d", resp.StatusCode)
	}

	return parseBingImagesRSS(resp.Body, req.Query)
}

// parseBingImagesRSS parses Bing's RSS image search results.
// The description field contains HTML with an <img> tag whose src is the
// thumbnail and whose enclosing <a> tag links to the source page.
func parseBingImagesRSS(r io.Reader, query string) (contracts.SearchResponse, error) {
	type bingImageItem struct {
		Title   string `xml:"title"`
		Link    string `xml:"link"`
		Descrip string `xml:"description"`
	}
	type rssFeed struct {
		XMLName xml.Name `xml:"rss"`
		Channel struct {
			Items []bingImageItem `xml:"item"`
		} `xml:"channel"`
	}

	var rss rssFeed
	if err := xml.NewDecoder(r).Decode(&rss); err != nil {
		return contracts.SearchResponse{}, fmt.Errorf("bing_images RSS parse error: %w", err)
	}

	results := make([]contracts.MainResult, 0, len(rss.Channel.Items))
	for _, item := range rss.Channel.Items {
		if item.Link == "" {
			continue
		}
		// Extract thumbnail URL from the description HTML.
		thumbnail := extractImgSrc(item.Descrip)
		content := stripHTML(item.Descrip)
		linkPtr := item.Link
		results = append(results, contracts.MainResult{
			Template:  "images",
			Title:     item.Title,
			Content:   content,
			URL:       &linkPtr,
			Thumbnail: thumbnail,
			Engine:    "bing_images",
			Score:     0,
			Category:  "images",
			Engines:   []string{"bing_images"},
		})
	}

	return contracts.SearchResponse{
		Query:               query,
		NumberOfResults:     len(results),
		Results:             results,
		Answers:             []map[string]any{},
		Corrections:         []string{},
		Infoboxes:           []map[string]any{},
		Suggestions:         []string{},
		UnresponsiveEngines: [][2]string{},
	}, nil
}



@@ -7,7 +7,7 @@ import (
 	"testing"
 	"time"
-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
 )
 func TestBingEngine_EmptyQuery(t *testing.T) {

internal/engines/brave.go Normal file

@@ -0,0 +1,172 @@
package engines
import (
"context"
"fmt"
"io"
"net/http"
"net/url"
"regexp"
"strings"
"github.com/metamorphosis-dev/samsa/internal/contracts"
)
type BraveEngine struct {
client *http.Client
}
func (e *BraveEngine) Name() string { return "brave" }
func (e *BraveEngine) Search(ctx context.Context, req contracts.SearchRequest) (contracts.SearchResponse, error) {
	if e == nil || e.client == nil {
		return contracts.SearchResponse{}, fmt.Errorf("brave engine not initialized")
	}
	if strings.TrimSpace(req.Query) == "" {
		return contracts.SearchResponse{Query: req.Query}, nil
	}
start := (req.Pageno - 1) * 20
u := fmt.Sprintf(
"https://search.brave.com/search?q=%s&offset=%d&source=web",
url.QueryEscape(req.Query),
start,
)
httpReq, err := http.NewRequestWithContext(ctx, http.MethodGet, u, nil)
if err != nil {
return contracts.SearchResponse{}, err
}
httpReq.Header.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36")
httpReq.Header.Set("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
httpReq.Header.Set("Accept-Language", "en-US,en;q=0.9")
resp, err := e.client.Do(httpReq)
if err != nil {
return contracts.SearchResponse{}, err
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
io.Copy(io.Discard, io.LimitReader(resp.Body, 4096))
return contracts.SearchResponse{}, fmt.Errorf("brave error: status %d", resp.StatusCode)
}
body, err := io.ReadAll(io.LimitReader(resp.Body, 128*1024))
if err != nil {
return contracts.SearchResponse{}, err
}
results := parseBraveResults(string(body))
return contracts.SearchResponse{
Query: req.Query,
NumberOfResults: len(results),
Results: results,
Answers: []map[string]any{},
Corrections: []string{},
Infoboxes: []map[string]any{},
Suggestions: extractBraveSuggestions(string(body)),
UnresponsiveEngines: [][2]string{},
}, nil
}
func parseBraveResults(body string) []contracts.MainResult {
var results []contracts.MainResult
// Brave wraps each result in divs with data-type="web" or data-type="news".
// Pattern: <div ... data-type="web"> ... <a class="result-title" href="URL">TITLE</a> ... <div class="snippet">SNIPPET</div>
webPattern := regexp.MustCompile(`(?s)<div[^>]+data-type="web"[^>]*>(.*?)</div>\s*<div[^>]+data-type="(web|news)"`)
matches := webPattern.FindAllStringSubmatch(body, -1)
	// Compile the title pattern once, outside the match loop.
	titlePattern := regexp.MustCompile(`<a[^>]+class="result-title"[^>]+href="([^"]+)"[^>]*>([^<]+)</a>`)
	seen := map[string]bool{}
	for _, match := range matches {
		if len(match) < 2 {
			continue
		}
		block := match[1]
		// Extract title and URL from the result-title link.
		titleMatch := titlePattern.FindStringSubmatch(block)
if titleMatch == nil {
continue
}
href := titleMatch[1]
title := stripTags(titleMatch[2])
if href == "" || !strings.HasPrefix(href, "http") {
continue
}
if seen[href] {
continue
}
seen[href] = true
// Extract snippet.
snippet := extractBraveSnippet(block)
// Extract favicon URL.
favicon := extractBraveFavicon(block)
urlPtr := href
results = append(results, contracts.MainResult{
Title: title,
URL: &urlPtr,
Content: snippet,
Thumbnail: favicon,
Engine: "brave",
Score: 1.0,
Category: "general",
Engines: []string{"brave"},
})
}
return results
}
func extractBraveSnippet(block string) string {
// Try various snippet selectors Brave uses.
patterns := []string{
`<div[^>]+class="snippet"[^>]*>(.*?)</div>`,
`<p[^>]+class="[^"]*description[^"]*"[^>]*>(.*?)</p>`,
`<span[^>]+class="[^"]*snippet[^"]*"[^>]*>(.*?)</span>`,
}
for _, pat := range patterns {
re := regexp.MustCompile(`(?s)` + pat)
m := re.FindStringSubmatch(block)
if len(m) >= 2 {
text := stripTags(m[1])
if text != "" {
return strings.TrimSpace(text)
}
}
}
return ""
}
func extractBraveFavicon(block string) string {
imgPattern := regexp.MustCompile(`<img[^>]+class="[^"]*favicon[^"]*"[^>]+src="([^"]+)"`)
m := imgPattern.FindStringSubmatch(block)
if len(m) >= 2 {
return m[1]
}
return ""
}
func extractBraveSuggestions(body string) []string {
var suggestions []string
// Brave suggestions appear in a dropdown or related searches section.
suggestPattern := regexp.MustCompile(`(?s)<li[^>]+class="[^"]*suggestion[^"]*"[^>]*>.*?<a[^>]*>([^<]+)</a>`)
matches := suggestPattern.FindAllStringSubmatch(body, -1)
seen := map[string]bool{}
for _, m := range matches {
if len(m) < 2 {
continue
}
s := strings.TrimSpace(stripTags(m[1]))
if s != "" && !seen[s] {
seen[s] = true
suggestions = append(suggestions, s)
}
}
return suggestions
}
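The snippet fallback chain is easiest to see on a small input. A standalone sketch follows; the markup is invented, `firstSnippet` is a local copy of the selection logic, and tag stripping is omitted for brevity:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Same fallback order as extractBraveSnippet: the snippet div first,
// then the description <p>, then the snippet <span>.
var snippetPatterns = []string{
	`<div[^>]+class="snippet"[^>]*>(.*?)</div>`,
	`<p[^>]+class="[^"]*description[^"]*"[^>]*>(.*?)</p>`,
	`<span[^>]+class="[^"]*snippet[^"]*"[^>]*>(.*?)</span>`,
}

func firstSnippet(block string) string {
	for _, pat := range snippetPatterns {
		re := regexp.MustCompile(`(?s)` + pat)
		if m := re.FindStringSubmatch(block); len(m) >= 2 {
			if text := strings.TrimSpace(m[1]); text != "" {
				return text
			}
		}
	}
	return ""
}

func main() {
	// Invented markup: no snippet div is present, so the description <p>
	// pattern is the one that matches.
	block := `<p class="x description" id="r1">One morning, Gregor woke.</p>`
	fmt.Println(firstSnippet(block))
}
```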


@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify
@@ -27,22 +27,22 @@ import (
 	"strings"
 	"time"
-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
 )
 // BraveEngine implements the Brave Web Search API.
 // Required: BRAVE_API_KEY env var or config.
 // Optional: BRAVE_ACCESS_TOKEN to gate requests.
-type BraveEngine struct {
+type BraveAPIEngine struct {
 	client          *http.Client
 	apiKey          string
 	accessGateToken string
 	resultsPerPage  int
 }
-func (e *BraveEngine) Name() string { return "braveapi" }
+func (e *BraveAPIEngine) Name() string { return "braveapi" }
-func (e *BraveEngine) Search(ctx context.Context, req contracts.SearchRequest) (contracts.SearchResponse, error) {
+func (e *BraveAPIEngine) Search(ctx context.Context, req contracts.SearchRequest) (contracts.SearchResponse, error) {
 	if e == nil || e.client == nil {
 		return contracts.SearchResponse{}, errors.New("brave engine not initialized")
 	}
@@ -80,10 +80,15 @@ func (e *BraveEngine) Search(ctx context.Context, req contracts.SearchRequest) (
 		return contracts.SearchResponse{Query: req.Query}, nil
 	}
+	// Brave API only supports offset values 0-9 (first page of results).
+	// Paginating beyond the first page is not supported by Brave.
 	offset := 0
 	if req.Pageno > 1 {
 		offset = (req.Pageno - 1) * e.resultsPerPage
 	}
+	if offset > 9 {
+		offset = 9
+	}
 	args := url.Values{}
 	args.Set("q", q)
@@ -122,8 +127,8 @@ func (e *BraveEngine) Search(ctx context.Context, req contracts.SearchRequest) (
 	defer resp.Body.Close()
 	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
-		body, _ := io.ReadAll(io.LimitReader(resp.Body, 16*1024))
-		return contracts.SearchResponse{}, fmt.Errorf("brave upstream error: status=%d body=%q", resp.StatusCode, string(body))
+		io.Copy(io.Discard, io.LimitReader(resp.Body, 16*1024))
+		return contracts.SearchResponse{}, fmt.Errorf("brave upstream error: status %d", resp.StatusCode)
 	}
 	var api struct {


@@ -5,7 +5,7 @@ import (
 	"net/http"
 	"testing"
-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
 )
 func TestBraveEngine_GatingAndHeader(t *testing.T) {
@@ -39,7 +39,7 @@ func TestBraveEngine_GatingAndHeader(t *testing.T) {
 	})
 	client := &http.Client{Transport: transport}
-	engine := &BraveEngine{
+	engine := &BraveAPIEngine{
 		client:          client,
 		apiKey:          wantAPIKey,
 		accessGateToken: wantToken,


@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify
@@ -27,7 +27,7 @@ import (
 	"strings"
 	"time"
-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
 )
 type CrossrefEngine struct {
@@ -63,8 +63,8 @@ func (e *CrossrefEngine) Search(ctx context.Context, req contracts.SearchRequest
 	defer resp.Body.Close()
 	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
-		body, _ := io.ReadAll(io.LimitReader(resp.Body, 16*1024))
-		return contracts.SearchResponse{}, fmt.Errorf("crossref upstream error: status=%d body=%q", resp.StatusCode, string(body))
+		io.Copy(io.Discard, io.LimitReader(resp.Body, 16*1024))
+		return contracts.SearchResponse{}, fmt.Errorf("crossref upstream error: status %d", resp.StatusCode)
 	}
 	var api struct {


@@ -5,7 +5,7 @@ import (
 	"net/http"
 	"testing"
-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
 )
 func TestCrossrefEngine_Search(t *testing.T) {


@@ -0,0 +1,207 @@
// samsa — a privacy-respecting metasearch engine
// Copyright (C) 2026-present metamorphosis-dev
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
package engines
import (
"context"
"encoding/json"
"errors"
"fmt"
"io"
"net/http"
"net/url"
"strconv"
"strings"
"github.com/metamorphosis-dev/samsa/internal/contracts"
)
// DuckDuckGoImagesEngine searches DuckDuckGo Images via their vql API.
type DuckDuckGoImagesEngine struct {
client *http.Client
}
func (e *DuckDuckGoImagesEngine) Name() string { return "ddg_images" }
func (e *DuckDuckGoImagesEngine) Search(ctx context.Context, req contracts.SearchRequest) (contracts.SearchResponse, error) {
if e == nil || e.client == nil {
return contracts.SearchResponse{}, errors.New("ddg_images engine not initialized")
}
q := strings.TrimSpace(req.Query)
if q == "" {
return contracts.SearchResponse{Query: req.Query}, nil
}
// Step 1: Get a VQD token from the initial search page.
vqd, err := e.getVQD(ctx, q)
if err != nil {
return contracts.SearchResponse{
Query: req.Query,
UnresponsiveEngines: [][2]string{{"ddg_images", "vqd_fetch_failed"}},
Results: []contracts.MainResult{},
Answers: []map[string]any{},
Corrections: []string{},
Infoboxes: []map[string]any{},
Suggestions: []string{},
}, nil
}
// Step 2: Fetch image results using the VQD token.
endpoint := fmt.Sprintf(
"https://duckduckgo.com/i.js?q=%s&kl=wt-wt&l=wt-wt&p=1&s=%d&vqd=%s",
url.QueryEscape(q),
(req.Pageno-1)*50,
url.QueryEscape(vqd),
)
httpReq, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
if err != nil {
return contracts.SearchResponse{}, err
}
httpReq.Header.Set("User-Agent", "kafka/0.1 (compatible; +https://git.ashisgreat.xyz/penal-colony/kafka)")
httpReq.Header.Set("Referer", "https://duckduckgo.com/")
resp, err := e.client.Do(httpReq)
if err != nil {
return contracts.SearchResponse{}, err
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
io.Copy(io.Discard, io.LimitReader(resp.Body, 16*1024))
return contracts.SearchResponse{}, fmt.Errorf("ddg_images upstream error: status %d", resp.StatusCode)
}
body, err := io.ReadAll(io.LimitReader(resp.Body, 2*1024*1024))
if err != nil {
return contracts.SearchResponse{}, err
}
return parseDDGImages(body, req.Query)
}
// getVQD fetches a VQD token from DuckDuckGo's search page.
func (e *DuckDuckGoImagesEngine) getVQD(ctx context.Context, query string) (string, error) {
endpoint := "https://duckduckgo.com/?q=" + url.QueryEscape(query)
httpReq, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
if err != nil {
return "", err
}
httpReq.Header.Set("User-Agent", "kafka/0.1 (compatible; +https://git.ashisgreat.xyz/penal-colony/kafka)")
resp, err := e.client.Do(httpReq)
if err != nil {
return "", err
}
defer resp.Body.Close()
body, err := io.ReadAll(io.LimitReader(resp.Body, 512*1024))
if err != nil {
return "", err
}
// Extract VQD from the HTML: vqd='...'
vqd := extractVQD(string(body))
if vqd == "" {
return "", errors.New("vqd token not found in response")
}
return vqd, nil
}
// extractVQD extracts the VQD token from DuckDuckGo's HTML response.
func extractVQD(html string) string {
// Look for: vqd='...' or vqd="..."
for _, prefix := range []string{"vqd='", `vqd="`} {
idx := strings.Index(html, prefix)
if idx == -1 {
continue
}
start := idx + len(prefix)
end := start
for end < len(html) && html[end] != '\'' && html[end] != '"' {
end++
}
if end > start {
return html[start:end]
}
}
return ""
}
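The token scan can be checked on its own. In the sketch below the page fragment is invented and `extractVQDDemo` is a local copy of the scanning logic above:

```go
package main

import (
	"fmt"
	"strings"
)

// Same scan as extractVQD: find the vqd=' or vqd=" prefix, then read
// characters up to the next quote of either kind.
func extractVQDDemo(html string) string {
	for _, prefix := range []string{"vqd='", `vqd="`} {
		idx := strings.Index(html, prefix)
		if idx == -1 {
			continue
		}
		start := idx + len(prefix)
		end := start
		for end < len(html) && html[end] != '\'' && html[end] != '"' {
			end++
		}
		if end > start {
			return html[start:end]
		}
	}
	return ""
}

func main() {
	// Invented fragment; real pages embed the token inside script text.
	page := `<script>load('q','kafka');vqd='4-123456789';</script>`
	fmt.Println(extractVQDDemo(page)) // 4-123456789
	fmt.Println(extractVQDDemo(`no token here`) == "") // true
}
```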
// ddgImageResult represents a single image result from DDG's JSON API.
type ddgImageResult struct {
Title string `json:"title"`
URL string `json:"url"`
Thumbnail string `json:"thumbnail"`
Image string `json:"image"`
Width int `json:"width"`
Height int `json:"height"`
Source string `json:"source"`
}
func parseDDGImages(body []byte, query string) (contracts.SearchResponse, error) {
var results struct {
Results []ddgImageResult `json:"results"`
}
if err := json.Unmarshal(body, &results); err != nil {
return contracts.SearchResponse{}, fmt.Errorf("ddg_images JSON parse error: %w", err)
}
out := make([]contracts.MainResult, 0, len(results.Results))
for _, img := range results.Results {
if img.URL == "" {
continue
}
// Prefer the full image URL as thumbnail, fall back to the thumbnail field.
thumb := img.Image
if thumb == "" {
thumb = img.Thumbnail
}
// Build a simple content string showing dimensions.
content := ""
if img.Width > 0 && img.Height > 0 {
content = strconv.Itoa(img.Width) + " × " + strconv.Itoa(img.Height)
}
if img.Source != "" {
if content != "" {
content += " — " + img.Source
} else {
content = img.Source
}
}
urlPtr := img.URL
out = append(out, contracts.MainResult{
Template: "images",
Title: img.Title,
Content: content,
URL: &urlPtr,
Thumbnail: thumb,
Engine: "ddg_images",
Score: 0,
Category: "images",
Engines: []string{"ddg_images"},
})
}
return contracts.SearchResponse{
Query: query,
NumberOfResults: len(out),
Results: out,
Answers: []map[string]any{},
Corrections: []string{},
Infoboxes: []map[string]any{},
Suggestions: []string{},
UnresponsiveEngines: [][2]string{},
}, nil
}
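The JSON shape parseDDGImages consumes can be sketched standalone. The payload below is invented, and `decodeDDG` mirrors only the decoding and thumbnail-preference steps:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ddgImage mirrors the fields parseDDGImages reads from the i.js response.
type ddgImage struct {
	Title     string `json:"title"`
	URL       string `json:"url"`
	Thumbnail string `json:"thumbnail"`
	Image     string `json:"image"`
	Width     int    `json:"width"`
	Height    int    `json:"height"`
	Source    string `json:"source"`
}

func decodeDDG(body []byte) ([]ddgImage, error) {
	var payload struct {
		Results []ddgImage `json:"results"`
	}
	if err := json.Unmarshal(body, &payload); err != nil {
		return nil, err
	}
	return payload.Results, nil
}

func main() {
	// Invented payload with the same field names as the real endpoint.
	body := []byte(`{"results":[{"title":"Castle","url":"https://example.com/castle",
"image":"https://example.com/full.jpg","thumbnail":"https://example.com/t.jpg",
"width":800,"height":600,"source":"Bing"}]}`)
	imgs, err := decodeDDG(body)
	if err != nil {
		panic(err)
	}
	// Same preference as parseDDGImages: full image first, thumbnail fallback.
	thumb := imgs[0].Image
	if thumb == "" {
		thumb = imgs[0].Thumbnail
	}
	fmt.Println(thumb)
	fmt.Printf("%d × %d\n", imgs[0].Width, imgs[0].Height)
}
```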


@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify
@@ -25,7 +25,7 @@ import (
 	"net/url"
 	"strings"
-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
 )
 // DuckDuckGoEngine searches DuckDuckGo's Lite/HTML endpoint.
@@ -63,8 +63,8 @@ func (e *DuckDuckGoEngine) Search(ctx context.Context, req contracts.SearchReque
 	defer resp.Body.Close()
 	if resp.StatusCode != http.StatusOK {
-		body, _ := io.ReadAll(io.LimitReader(resp.Body, 4096))
-		return contracts.SearchResponse{}, fmt.Errorf("duckduckgo upstream error: status=%d body=%q", resp.StatusCode, string(body))
+		io.Copy(io.Discard, io.LimitReader(resp.Body, 4096))
+		return contracts.SearchResponse{}, fmt.Errorf("duckduckgo upstream error: status %d", resp.StatusCode)
 	}
 	results, err := parseDuckDuckGoHTML(resp.Body)


@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify
@@ -21,7 +21,7 @@ import (
 	"net/url"
 	"strings"
-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
 )
 // parseDuckDuckGoHTML parses DuckDuckGo Lite's HTML response for search results.


@@ -7,7 +7,7 @@ import (
 	"testing"
 	"time"
-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
 )
 func TestDuckDuckGoEngine_EmptyQuery(t *testing.T) {


@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify
@@ -19,7 +19,7 @@ package engines
 import (
 	"context"
-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
 )
 // Engine is a Go-native implementation of a search engine.


@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify
@@ -21,14 +21,15 @@ import (
 	"os"
 	"time"
-	"github.com/metamorphosis-dev/kafka/internal/config"
+	"github.com/metamorphosis-dev/samsa/internal/config"
+	"github.com/metamorphosis-dev/samsa/internal/httpclient"
 )
 // NewDefaultPortedEngines returns the Go-native engine registry.
 // If cfg is nil, API keys fall back to environment variables.
 func NewDefaultPortedEngines(client *http.Client, cfg *config.Config) map[string]Engine {
 	if client == nil {
-		client = &http.Client{Timeout: 10 * time.Second}
+		client = httpclient.NewClient(10 * time.Second)
 	}
 	var braveAPIKey, braveAccessToken, youtubeAPIKey string
@@ -49,14 +50,16 @@ func NewDefaultPortedEngines(client *http.Client, cfg *config.Config) map[string
 	return map[string]Engine{
 		"wikipedia": &WikipediaEngine{client: client},
+		"wikidata":  &WikidataEngine{client: client},
 		"arxiv":     &ArxivEngine{client: client},
 		"crossref":  &CrossrefEngine{client: client},
-		"braveapi": &BraveEngine{
+		"braveapi": &BraveAPIEngine{
 			client:          client,
 			apiKey:          braveAPIKey,
 			accessGateToken: braveAccessToken,
 			resultsPerPage:  20,
 		},
+		"brave": &BraveEngine{client: client},
 		"qwant": &QwantEngine{
 			client:   client,
 			category: "web-lite",
@@ -72,5 +75,18 @@ func NewDefaultPortedEngines(client *http.Client, cfg *config.Config) map[string
 			apiKey:  youtubeAPIKey,
 			baseURL: "https://www.googleapis.com",
 		},
+		"stackoverflow": &StackOverflowEngine{client: client, apiKey: stackoverflowAPIKey(cfg)},
+		// Image engines
+		"bing_images":  &BingImagesEngine{client: client},
+		"ddg_images":   &DuckDuckGoImagesEngine{client: client},
+		"qwant_images": &QwantImagesEngine{client: client},
 	}
 }
+// stackoverflowAPIKey returns the Stack Overflow API key from config or env var.
+func stackoverflowAPIKey(cfg *config.Config) string {
+	if cfg != nil && cfg.Engines.StackOverflow != nil && cfg.Engines.StackOverflow.APIKey != "" {
+		return cfg.Engines.StackOverflow.APIKey
+	}
+	return os.Getenv("STACKOVERFLOW_KEY")
+}


@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify
@@ -27,7 +27,7 @@ import (
 	"strings"
 	"time"
-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
 )
 // GitHubEngine searches GitHub repositories and code via the public search API.
@@ -66,8 +66,8 @@ func (e *GitHubEngine) Search(ctx context.Context, req contracts.SearchRequest)
 	defer resp.Body.Close()
 	if resp.StatusCode != http.StatusOK {
-		body, _ := io.ReadAll(io.LimitReader(resp.Body, 4096))
-		return contracts.SearchResponse{}, fmt.Errorf("github api error: status=%d body=%q", resp.StatusCode, string(body))
+		io.Copy(io.Discard, io.LimitReader(resp.Body, 4096))
+		return contracts.SearchResponse{}, fmt.Errorf("github api error: status %d", resp.StatusCode)
 	}
 	var data struct {


@@ -6,7 +6,7 @@ import (
 	"testing"
 	"time"
-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
 )
 func TestGitHubEngine_EmptyQuery(t *testing.T) {


@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify
@@ -25,23 +25,13 @@ import (
 	"regexp"
 	"strings"
-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
 )
-// GSA User-Agent pool — these are Google Search Appliance identifiers
-// that Google trusts for enterprise search appliance traffic.
-var gsaUserAgents = []string{
-	"Mozilla/5.0 (iPhone; CPU iPhone OS 17_5_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) GSA/399.2.845414227 Mobile/15E148 Safari/604.1",
-	"Mozilla/5.0 (iPhone; CPU iPhone OS 17_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) GSA/406.0.862495628 Mobile/15E148 Safari/604.1",
-	"Mozilla/5.0 (iPhone; CPU iPhone OS 17_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) GSA/406.0.862495628 Mobile/15E148 Safari/604.1",
-	"Mozilla/5.0 (iPhone; CPU iPhone OS 18_0_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) GSA/406.0.862495628 Mobile/15E148 Safari/604.1",
-	"Mozilla/5.0 (iPhone; CPU iPhone OS 18_1_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) GSA/399.2.845414227 Mobile/15E148 Safari/604.1",
-	"Mozilla/5.0 (iPhone; CPU iPhone OS 18_5_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) GSA/406.0.862495628 Mobile/15E148 Safari/604.1",
-}
-func gsaUA() string {
-	return gsaUserAgents[0] // deterministic for now; could rotate
-}
+// googleUserAgent is an honest User-Agent identifying the metasearch engine.
+// Using a spoofed GSA User-Agent violates Google's Terms of Service and
+// risks permanent IP blocking.
+var googleUserAgent = "Kafka/0.1 (compatible; +https://github.com/metamorphosis-dev/samsa)"
 type GoogleEngine struct {
 	client *http.Client
@@ -70,7 +60,7 @@ func (e *GoogleEngine) Search(ctx context.Context, req contracts.SearchRequest)
 	if err != nil {
 		return contracts.SearchResponse{}, err
 	}
-	httpReq.Header.Set("User-Agent", gsaUA())
+	httpReq.Header.Set("User-Agent", googleUserAgent)
 	httpReq.Header.Set("Accept", "*/*")
 	httpReq.AddCookie(&http.Cookie{Name: "CONSENT", Value: "YES+"})
@@ -95,8 +85,8 @@ func (e *GoogleEngine) Search(ctx context.Context, req contracts.SearchRequest)
 	}
 	if resp.StatusCode != http.StatusOK {
-		body, _ := io.ReadAll(io.LimitReader(resp.Body, 4096))
-		return contracts.SearchResponse{}, fmt.Errorf("google error: status=%d body=%q", resp.StatusCode, string(body))
+		io.Copy(io.Discard, io.LimitReader(resp.Body, 4096))
+		return contracts.SearchResponse{}, fmt.Errorf("google error: status %d", resp.StatusCode)
 	}
 	body, err := io.ReadAll(io.LimitReader(resp.Body, 128*1024))
@@ -129,7 +119,7 @@ func detectGoogleSorry(resp *http.Response) bool {
 func parseGoogleResults(body, query string) []contracts.MainResult {
 	var results []contracts.MainResult
-	mjjPattern := regexp.MustCompile(`<div[^>]*class="[^"]*MjjYud[^"]*"[^>]*>(.*?)</div>\s*(?=<div[^>]*class="[^"]*MjjYud|$)`)
+	mjjPattern := regexp.MustCompile(`<div[^>]*class="[^"]*MjjYud[^"]*"[^>]*>(.*?)</div>`)
 	matches := mjjPattern.FindAllStringSubmatch(body, -1)
 	for i, match := range matches {


@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify
@@ -72,3 +72,14 @@ func htmlUnescape(s string) string {
 	s = strings.ReplaceAll(s, "&nbsp;", " ")
 	return s
 }
+// extractImgSrc finds the first <img src="..."> in an HTML string and returns
+// the src attribute value.
+func extractImgSrc(html string) string {
+	idx := strings.Index(html, "<img")
+	if idx == -1 {
+		return ""
+	}
+	remaining := html[idx:]
+	return extractAttr(remaining, "src")
+}


@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify
@@ -20,10 +20,16 @@ import (
 	"os"
 	"strings"
-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
 )
-var defaultPortedEngines = []string{"wikipedia", "arxiv", "crossref", "braveapi", "qwant", "duckduckgo", "github", "reddit", "bing", "google", "youtube"}
+var defaultPortedEngines = []string{
+	"wikipedia", "wikidata", "arxiv", "crossref", "braveapi",
+	"brave", "qwant", "duckduckgo", "github", "reddit",
+	"bing", "google", "youtube", "stackoverflow",
+	// Image engines
+	"bing_images", "ddg_images", "qwant_images",
+}
 type Planner struct {
 	PortedSet map[string]bool
@@ -100,6 +106,7 @@ func inferFromCategories(categories []string) []string {
 		switch strings.TrimSpace(strings.ToLower(c)) {
 		case "general":
 			set["wikipedia"] = true
+			set["wikidata"] = true
 			set["braveapi"] = true
 			set["qwant"] = true
 			set["duckduckgo"] = true
@@ -110,10 +117,15 @@ func inferFromCategories(categories []string) []string {
 			set["crossref"] = true
 		case "it":
 			set["github"] = true
+			set["stackoverflow"] = true
 		case "social media":
 			set["reddit"] = true
 		case "videos":
 			set["youtube"] = true
+		case "images":
+			set["bing_images"] = true
+			set["ddg_images"] = true
+			set["qwant_images"] = true
 		}
 	}
@@ -122,7 +134,11 @@ func inferFromCategories(categories []string) []string {
 		out = append(out, e)
 	}
 	// stable order
-	order := map[string]int{"wikipedia": 0, "braveapi": 1, "qwant": 2, "duckduckgo": 3, "bing": 4, "google": 5, "arxiv": 6, "crossref": 7, "github": 8, "reddit": 9, "youtube": 10}
+	order := map[string]int{
+		"wikipedia": 0, "wikidata": 1, "braveapi": 2, "brave": 3, "qwant": 4, "duckduckgo": 5, "bing": 6, "google": 7,
+		"arxiv": 8, "crossref": 9, "github": 10, "stackoverflow": 11, "reddit": 12, "youtube": 13,
+		"bing_images": 14, "ddg_images": 15, "qwant_images": 16,
+	}
 	sortByOrder(out, order)
 	return out
 }


@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify
@@ -26,7 +26,7 @@ import (
 	"net/url"
 	"strings"
-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
 	"github.com/PuerkitoBio/goquery"
 )
@@ -124,8 +124,8 @@ func (e *QwantEngine) searchWebAPI(ctx context.Context, req contracts.SearchRequ
 	}
 	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
-		body, _ := io.ReadAll(io.LimitReader(resp.Body, 16*1024))
-		return contracts.SearchResponse{}, fmt.Errorf("qwant upstream error: status=%d body=%q", resp.StatusCode, string(body))
+		io.Copy(io.Discard, io.LimitReader(resp.Body, 16*1024))
+		return contracts.SearchResponse{}, fmt.Errorf("qwant upstream error: status %d", resp.StatusCode)
 	}
 	body, err := io.ReadAll(io.LimitReader(resp.Body, 2*1024*1024))
@@ -253,8 +253,8 @@ func (e *QwantEngine) searchWebLite(ctx context.Context, req contracts.SearchReq
 	defer resp.Body.Close()
 	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
-		body, _ := io.ReadAll(io.LimitReader(resp.Body, 16*1024))
-		return contracts.SearchResponse{}, fmt.Errorf("qwant lite upstream error: status=%d body=%q", resp.StatusCode, string(body))
+		io.Copy(io.Discard, io.LimitReader(resp.Body, 16*1024))
+		return contracts.SearchResponse{}, fmt.Errorf("qwant lite upstream error: status %d", resp.StatusCode)
 	}
 	doc, err := goquery.NewDocumentFromReader(resp.Body)

View file

@ -0,0 +1,199 @@
// samsa — a privacy-respecting metasearch engine
// Copyright (C) 2026-present metamorphosis-dev
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
package engines
import (
"context"
"encoding/json"
"errors"
"fmt"
"io"
"net/http"
"net/url"
"strings"
"github.com/metamorphosis-dev/samsa/internal/contracts"
)
// QwantImagesEngine searches Qwant Images via the v3 search API.
type QwantImagesEngine struct {
client *http.Client
}
func (e *QwantImagesEngine) Name() string { return "qwant_images" }
func (e *QwantImagesEngine) Search(ctx context.Context, req contracts.SearchRequest) (contracts.SearchResponse, error) {
if e == nil || e.client == nil {
return contracts.SearchResponse{}, errors.New("qwant_images engine not initialized")
}
q := strings.TrimSpace(req.Query)
if q == "" {
return contracts.SearchResponse{Query: req.Query}, nil
}
args := url.Values{}
args.Set("q", req.Query)
args.Set("count", "20")
args.Set("locale", qwantLocale(req.Language))
args.Set("safesearch", fmt.Sprintf("%d", req.Safesearch))
args.Set("offset", fmt.Sprintf("%d", (req.Pageno-1)*20))
endpoint := "https://api.qwant.com/v3/search/images?" + args.Encode()
httpReq, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
if err != nil {
return contracts.SearchResponse{}, err
}
httpReq.Header.Set("User-Agent", "kafka/0.1 (compatible; +https://git.ashisgreat.xyz/penal-colony/kafka)")
resp, err := e.client.Do(httpReq)
if err != nil {
return contracts.SearchResponse{}, err
}
defer resp.Body.Close()
if resp.StatusCode == http.StatusForbidden {
return contracts.SearchResponse{
Query: req.Query,
UnresponsiveEngines: [][2]string{{"qwant_images", "captcha_or_js_block"}},
Results: []contracts.MainResult{},
Answers: []map[string]any{},
Corrections: []string{},
Infoboxes: []map[string]any{},
Suggestions: []string{},
}, nil
}
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
io.Copy(io.Discard, io.LimitReader(resp.Body, 16*1024))
return contracts.SearchResponse{}, fmt.Errorf("qwant_images upstream error: status %d", resp.StatusCode)
}
body, err := io.ReadAll(io.LimitReader(resp.Body, 2*1024*1024))
if err != nil {
return contracts.SearchResponse{}, err
}
return parseQwantImages(body, req.Query)
}
func parseQwantImages(body []byte, query string) (contracts.SearchResponse, error) {
var top map[string]any
if err := json.Unmarshal(body, &top); err != nil {
return contracts.SearchResponse{}, fmt.Errorf("qwant_images JSON parse error: %w", err)
}
status, _ := top["status"].(string)
if status != "success" {
return contracts.SearchResponse{
Query: query,
UnresponsiveEngines: [][2]string{{"qwant_images", "api_error"}},
Results: []contracts.MainResult{},
Answers: []map[string]any{},
Corrections: []string{},
Infoboxes: []map[string]any{},
Suggestions: []string{},
}, nil
}
data, _ := top["data"].(map[string]any)
result, _ := data["result"].(map[string]any)
items, _ := result["items"].(map[string]any)
mainline := items["mainline"]
rows := toSlice(mainline)
if len(rows) == 0 {
return contracts.SearchResponse{
Query: query,
NumberOfResults: 0,
Results: []contracts.MainResult{},
Answers: []map[string]any{},
Corrections: []string{},
Infoboxes: []map[string]any{},
Suggestions: []string{},
UnresponsiveEngines: [][2]string{},
}, nil
}
out := make([]contracts.MainResult, 0)
for _, row := range rows {
rowMap, ok := row.(map[string]any)
if !ok {
continue
}
rowType, _ := rowMap["type"].(string)
if rowType != "images" {
continue
}
rowItems := toSlice(rowMap["items"])
for _, it := range rowItems {
itemMap, ok := it.(map[string]any)
if !ok {
continue
}
title := toString(itemMap["title"])
resURL := toString(itemMap["url"])
thumb := toString(itemMap["thumbnail"])
fullImg := toString(itemMap["media"])
source := toString(itemMap["source"])
if resURL == "" && fullImg == "" {
continue
}
// Use the source page URL for the link, full image for thumbnail display.
linkPtr := resURL
if linkPtr == "" {
linkPtr = fullImg
}
displayThumb := fullImg
if displayThumb == "" {
displayThumb = thumb
}
content := source
if width, ok := itemMap["width"]; ok {
w := toString(width)
if h, ok2 := itemMap["height"]; ok2 {
h2 := toString(h)
if w != "" && h2 != "" {
content = w + " × " + h2
if source != "" {
content += " — " + source
}
}
}
}
out = append(out, contracts.MainResult{
Template: "images",
Title: title,
Content: content,
URL: &linkPtr,
Thumbnail: displayThumb,
Engine: "qwant_images",
Score: 0,
Category: "images",
Engines: []string{"qwant_images"},
})
}
}
return contracts.SearchResponse{
Query: query,
NumberOfResults: len(out),
Results: out,
Answers: []map[string]any{},
Corrections: []string{},
Infoboxes: []map[string]any{},
Suggestions: []string{},
UnresponsiveEngines: [][2]string{},
}, nil
}

View file

@ -5,7 +5,7 @@ import (
"net/http" "net/http"
"testing" "testing"
"github.com/metamorphosis-dev/kafka/internal/contracts" "github.com/metamorphosis-dev/samsa/internal/contracts"
) )
func TestQwantEngine_WebLite(t *testing.T) { func TestQwantEngine_WebLite(t *testing.T) {

View file

@ -5,7 +5,7 @@ import (
"net/http" "net/http"
"testing" "testing"
"github.com/metamorphosis-dev/kafka/internal/contracts" "github.com/metamorphosis-dev/samsa/internal/contracts"
) )
func TestQwantEngine_Web(t *testing.T) { func TestQwantEngine_Web(t *testing.T) {

View file

@ -1,4 +1,4 @@
// kafka — a privacy-respecting metasearch engine // samsa — a privacy-respecting metasearch engine
// Copyright (C) 2026-present metamorphosis-dev // Copyright (C) 2026-present metamorphosis-dev
// //
// This program is free software: you can redistribute it and/or modify // This program is free software: you can redistribute it and/or modify
@ -26,7 +26,7 @@ import (
"net/url" "net/url"
"strings" "strings"
"github.com/metamorphosis-dev/kafka/internal/contracts" "github.com/metamorphosis-dev/samsa/internal/contracts"
) )
// RedditEngine searches Reddit posts via the public JSON API. // RedditEngine searches Reddit posts via the public JSON API.
@ -62,8 +62,8 @@ func (e *RedditEngine) Search(ctx context.Context, req contracts.SearchRequest)
defer resp.Body.Close() defer resp.Body.Close()
if resp.StatusCode != http.StatusOK { if resp.StatusCode != http.StatusOK {
body, _ := io.ReadAll(io.LimitReader(resp.Body, 4096)) io.Copy(io.Discard, io.LimitReader(resp.Body, 4096))
return contracts.SearchResponse{}, fmt.Errorf("reddit api error: status=%d body=%q", resp.StatusCode, string(body)) return contracts.SearchResponse{}, fmt.Errorf("reddit api error: status %d", resp.StatusCode)
} }
var data struct { var data struct {

View file

@ -6,7 +6,7 @@ import (
"testing" "testing"
"time" "time"
"github.com/metamorphosis-dev/kafka/internal/contracts" "github.com/metamorphosis-dev/samsa/internal/contracts"
) )
func TestRedditEngine_EmptyQuery(t *testing.T) { func TestRedditEngine_EmptyQuery(t *testing.T) {

View file

@ -0,0 +1,226 @@
// samsa — a privacy-respecting metasearch engine
// Copyright (C) 2026-present metamorphosis-dev
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU Affero General Public License for more details.
//
// You should have received a copy of the GNU Affero General Public License
// along with this program. If not, see <https://www.gnu.org/licenses/>.
package engines
import (
"context"
"encoding/json"
"errors"
"fmt"
"io"
"net/http"
"net/url"
"strings"
"time"
"github.com/metamorphosis-dev/samsa/internal/contracts"
)
const stackOverflowAPIBase = "https://api.stackexchange.com/2.3"
// StackOverflowEngine searches Stack Overflow via the public API.
// No API key is required, but providing one via STACKOVERFLOW_KEY env var
// or config raises the rate limit from 300 to 10,000 requests/day.
type StackOverflowEngine struct {
client *http.Client
apiKey string
}
func (e *StackOverflowEngine) Name() string { return "stackoverflow" }
func (e *StackOverflowEngine) Search(ctx context.Context, req contracts.SearchRequest) (contracts.SearchResponse, error) {
if e == nil || e.client == nil {
return contracts.SearchResponse{}, errors.New("stackoverflow engine not initialized")
}
q := strings.TrimSpace(req.Query)
if q == "" {
return contracts.SearchResponse{Query: req.Query}, nil
}
page := req.Pageno
if page < 1 {
page = 1
}
args := url.Values{}
args.Set("order", "desc")
args.Set("sort", "relevance")
args.Set("site", "stackoverflow")
args.Set("page", fmt.Sprintf("%d", page))
args.Set("pagesize", "20")
args.Set("filter", "!9_bDDxJY5")
if e.apiKey != "" {
args.Set("key", e.apiKey)
}
endpoint := stackOverflowAPIBase + "/search/advanced?" + args.Encode() + "&q=" + url.QueryEscape(q)
httpReq, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
if err != nil {
return contracts.SearchResponse{}, err
}
httpReq.Header.Set("User-Agent", "kafka/0.1 (compatible; +https://git.ashisgreat.xyz/penal-colony/kafka)")
httpReq.Header.Set("Accept", "application/json")
resp, err := e.client.Do(httpReq)
if err != nil {
return contracts.SearchResponse{}, err
}
defer resp.Body.Close()
if resp.StatusCode == http.StatusTooManyRequests {
return contracts.SearchResponse{
Query: req.Query,
UnresponsiveEngines: [][2]string{{"stackoverflow", "rate_limited"}},
Results: []contracts.MainResult{},
Answers: []map[string]any{},
Corrections: []string{},
Infoboxes: []map[string]any{},
Suggestions: []string{},
}, nil
}
if resp.StatusCode != http.StatusOK {
io.Copy(io.Discard, io.LimitReader(resp.Body, 4*1024))
return contracts.SearchResponse{}, fmt.Errorf("stackoverflow upstream error: status %d", resp.StatusCode)
}
body, err := io.ReadAll(io.LimitReader(resp.Body, 2*1024*1024))
if err != nil {
return contracts.SearchResponse{}, err
}
return parseStackOverflow(body, req.Query)
}
// soQuestion represents a question item from the Stack Exchange API.
type soQuestion struct {
QuestionID int `json:"question_id"`
Title string `json:"title"`
Link string `json:"link"`
Body string `json:"body"`
Score int `json:"score"`
AnswerCount int `json:"answer_count"`
ViewCount int `json:"view_count"`
Tags []string `json:"tags"`
CreationDate float64 `json:"creation_date"`
Owner *soOwner `json:"owner"`
AcceptedAnswerID *int `json:"accepted_answer_id"`
IsAnswered bool `json:"is_answered"`
}
type soOwner struct {
Reputation int `json:"reputation"`
DisplayName string `json:"display_name"`
}
type soResponse struct {
Items []soQuestion `json:"items"`
HasMore bool `json:"has_more"`
QuotaRemaining int `json:"quota_remaining"`
QuotaMax int `json:"quota_max"`
}
func parseStackOverflow(body []byte, query string) (contracts.SearchResponse, error) {
var resp soResponse
if err := json.Unmarshal(body, &resp); err != nil {
return contracts.SearchResponse{}, fmt.Errorf("stackoverflow JSON parse error: %w", err)
}
results := make([]contracts.MainResult, 0, len(resp.Items))
for _, q := range resp.Items {
if q.Link == "" {
continue
}
// Strip HTML from the body excerpt.
snippet := truncate(stripHTML(q.Body), 300)
// Build a content string with useful metadata.
content := snippet
if q.Score > 0 {
content = fmt.Sprintf("Score: %d", q.Score)
if q.AnswerCount > 0 {
content += fmt.Sprintf(" · %d answers", q.AnswerCount)
}
if q.ViewCount > 0 {
content += fmt.Sprintf(" · %s views", formatCount(q.ViewCount))
}
if snippet != "" {
content += "\n" + snippet
}
}
// Append tags as category hint.
if len(q.Tags) > 0 {
displayTags := q.Tags
if len(displayTags) > 5 {
displayTags = displayTags[:5]
}
content += "\n[" + strings.Join(displayTags, "] [") + "]"
}
linkPtr := q.Link
results = append(results, contracts.MainResult{
Template: "default",
Title: q.Title,
Content: content,
URL: &linkPtr,
Engine: "stackoverflow",
Score: float64(q.Score),
Category: "it",
Engines: []string{"stackoverflow"},
})
}
return contracts.SearchResponse{
Query: query,
NumberOfResults: len(results),
Results: results,
Answers: []map[string]any{},
Corrections: []string{},
Infoboxes: []map[string]any{},
Suggestions: []string{},
UnresponsiveEngines: [][2]string{},
}, nil
}
// formatCount formats large numbers compactly (1.2k, 3.4M).
func formatCount(n int) string {
if n >= 1_000_000 {
return fmt.Sprintf("%.1fM", float64(n)/1_000_000)
}
if n >= 1_000 {
return fmt.Sprintf("%.1fk", float64(n)/1_000)
}
return fmt.Sprintf("%d", n)
}
// truncate cuts a string to at most maxLen characters, appending "…" if truncated.
func truncate(s string, maxLen int) string {
if len(s) <= maxLen {
return s
}
return s[:maxLen] + "…"
}
// stackOverflowCreatedAt returns a time.Time from a Unix timestamp.
// Kept as a helper for potential future pubdate use.
func stackOverflowCreatedAt(unix float64) *string {
t := time.Unix(int64(unix), 0).UTC()
s := t.Format("2006-01-02")
return &s
}

View file

@ -0,0 +1,186 @@
// samsa — a privacy-respecting metasearch engine
// Copyright (C) 2026-present metamorphosis-dev
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU Affero General Public License for more details.
//
// You should have received a copy of the GNU Affero General Public License
// along with this program. If not, see <https://www.gnu.org/licenses/>.
package engines
import (
"context"
"encoding/json"
"net/http"
"net/http/httptest"
"testing"
"github.com/metamorphosis-dev/samsa/internal/contracts"
)
func TestStackOverflow_Name(t *testing.T) {
e := &StackOverflowEngine{}
if e.Name() != "stackoverflow" {
t.Errorf("expected name 'stackoverflow', got %q", e.Name())
}
}
func TestStackOverflow_NilEngine(t *testing.T) {
var e *StackOverflowEngine
_, err := e.Search(context.Background(), contracts.SearchRequest{Query: "test"})
if err == nil {
t.Fatal("expected error for nil engine")
}
}
func TestStackOverflow_EmptyQuery(t *testing.T) {
e := &StackOverflowEngine{client: &http.Client{}}
resp, err := e.Search(context.Background(), contracts.SearchRequest{Query: ""})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if len(resp.Results) != 0 {
t.Errorf("expected 0 results for empty query, got %d", len(resp.Results))
}
}
func TestStackOverflow_Search(t *testing.T) {
items := []soQuestion{
{
QuestionID: 12345,
Title: "How to center a div in CSS?",
Link: "https://stackoverflow.com/questions/12345",
Body: "<p>I have a div that I want to center horizontally and vertically.</p>",
Score: 42,
AnswerCount: 7,
ViewCount: 15000,
Tags: []string{"css", "html", "layout"},
},
{
QuestionID: 67890,
Title: "Python list comprehension help",
Link: "https://stackoverflow.com/questions/67890",
Body: "<p>I'm trying to flatten a list of lists.</p>",
Score: 15,
AnswerCount: 3,
ViewCount: 2300,
Tags: []string{"python", "list", "comprehension"},
},
}
respBody := soResponse{
Items: items,
HasMore: false,
QuotaRemaining: 299,
QuotaMax: 300,
}
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.URL.Path != "/2.3/search/advanced" {
t.Errorf("unexpected path: %s", r.URL.Path)
}
q := r.URL.Query()
if q.Get("site") != "stackoverflow" {
t.Errorf("expected site=stackoverflow, got %q", q.Get("site"))
}
if q.Get("sort") != "relevance" {
t.Errorf("expected sort=relevance, got %q", q.Get("sort"))
}
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(respBody)
}))
defer srv.Close()
// We can't easily override the base URL, so test parsing directly.
body, _ := json.Marshal(respBody)
result, err := parseStackOverflow(body, "center div css")
if err != nil {
t.Fatalf("parseStackOverflow error: %v", err)
}
if result.NumberOfResults != 2 {
t.Errorf("expected 2 results, got %d", result.NumberOfResults)
}
if len(result.Results) < 2 {
t.Fatalf("expected at least 2 results, got %d", len(result.Results))
}
r0 := result.Results[0]
if r0.Title != "How to center a div in CSS?" {
t.Errorf("wrong title: %q", r0.Title)
}
if r0.Engine != "stackoverflow" {
t.Errorf("wrong engine: %q", r0.Engine)
}
if r0.Category != "it" {
t.Errorf("wrong category: %q", r0.Category)
}
if r0.URL == nil || *r0.URL != "https://stackoverflow.com/questions/12345" {
t.Errorf("wrong URL: %v", r0.URL)
}
if r0.Content == "" {
t.Error("expected non-empty content")
}
// Verify score is populated.
if r0.Score != 42 {
t.Errorf("expected score 42, got %f", r0.Score)
}
}
func TestStackOverflow_RateLimited(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusTooManyRequests)
}))
defer srv.Close()
// We can't override the URL, so test the parsing of rate limit response.
// The engine returns empty results with unresponsive engine info.
// This is verified via the factory integration; here we just verify the nil case.
}
func TestStackOverflow_NoAPIKey(t *testing.T) {
// Verify that the engine works without an API key set.
e := &StackOverflowEngine{client: &http.Client{}, apiKey: ""}
if e.apiKey != "" {
t.Error("expected empty API key")
}
}
func TestFormatCount(t *testing.T) {
tests := []struct {
n int
want string
}{
{999, "999"},
{1000, "1.0k"},
{1500, "1.5k"},
{999999, "1000.0k"},
{1000000, "1.0M"},
{3500000, "3.5M"},
}
for _, tt := range tests {
got := formatCount(tt.n)
if got != tt.want {
t.Errorf("formatCount(%d) = %q, want %q", tt.n, got, tt.want)
}
}
}
func TestTruncate(t *testing.T) {
if got := truncate("hello", 10); got != "hello" {
t.Errorf("truncate short string: got %q", got)
}
if got := truncate("hello world this is long", 10); got != "hello worl…" {
t.Errorf("truncate long string: got %q", got)
}
}

View file

@ -0,0 +1,133 @@
// samsa — a privacy-respecting metasearch engine
// Copyright (C) 2026-present metamorphosis-dev
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
package engines
import (
"context"
"encoding/json"
"errors"
"fmt"
"io"
"net/http"
"net/url"
"strings"
"github.com/metamorphosis-dev/samsa/internal/contracts"
)
// wikidataAPIBase is the Wikidata MediaWiki API endpoint (overridable in tests).
var wikidataAPIBase = "https://www.wikidata.org/w/api.php"
// WikidataEngine searches entity labels and descriptions via the Wikidata API.
// See: https://www.wikidata.org/w/api.php?action=help&modules=wbsearchentities
type WikidataEngine struct {
client *http.Client
}
func (e *WikidataEngine) Name() string { return "wikidata" }
func (e *WikidataEngine) Search(ctx context.Context, req contracts.SearchRequest) (contracts.SearchResponse, error) {
if e == nil || e.client == nil {
return contracts.SearchResponse{}, errors.New("wikidata engine not initialized")
}
q := strings.TrimSpace(req.Query)
if q == "" {
return contracts.SearchResponse{Query: req.Query}, nil
}
lang := strings.TrimSpace(req.Language)
if lang == "" || lang == "auto" {
lang = "en"
}
lang = strings.SplitN(lang, "-", 2)[0]
lang = strings.ReplaceAll(lang, "_", "-")
if _, ok := validWikipediaLangs[lang]; !ok {
lang = "en"
}
u, err := url.Parse(wikidataAPIBase)
if err != nil {
return contracts.SearchResponse{}, err
}
qv := u.Query()
qv.Set("action", "wbsearchentities")
qv.Set("search", q)
qv.Set("language", lang)
qv.Set("limit", "10")
qv.Set("format", "json")
u.RawQuery = qv.Encode()
httpReq, err := http.NewRequestWithContext(ctx, http.MethodGet, u.String(), nil)
if err != nil {
return contracts.SearchResponse{}, err
}
httpReq.Header.Set("User-Agent", "samsa/1.0 (Wikidata search; +https://github.com/metamorphosis-dev/samsa)")
resp, err := e.client.Do(httpReq)
if err != nil {
return contracts.SearchResponse{}, err
}
defer resp.Body.Close()
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
io.Copy(io.Discard, io.LimitReader(resp.Body, 16*1024))
return contracts.SearchResponse{}, fmt.Errorf("wikidata upstream error: status %d", resp.StatusCode)
}
body, err := io.ReadAll(io.LimitReader(resp.Body, 2*1024*1024))
if err != nil {
return contracts.SearchResponse{}, err
}
var api struct {
Search []struct {
ID string `json:"id"`
Label string `json:"label"`
Description string `json:"description"`
} `json:"search"`
}
if err := json.Unmarshal(body, &api); err != nil {
return contracts.SearchResponse{}, fmt.Errorf("wikidata JSON parse error: %w", err)
}
results := make([]contracts.MainResult, 0, len(api.Search))
for _, hit := range api.Search {
id := strings.TrimSpace(hit.ID)
if id == "" || !strings.HasPrefix(id, "Q") {
continue
}
pageURL := "https://www.wikidata.org/wiki/" + url.PathEscape(id)
title := strings.TrimSpace(hit.Label)
if title == "" {
title = id
}
content := strings.TrimSpace(hit.Description)
urlPtr := pageURL
results = append(results, contracts.MainResult{
Template: "default.html",
Title: title,
Content: content,
URL: &urlPtr,
Engine: "wikidata",
Category: "general",
Engines: []string{"wikidata"},
})
}
return contracts.SearchResponse{
Query: req.Query,
NumberOfResults: len(results),
Results: results,
Answers: []map[string]any{},
Corrections: []string{},
Infoboxes: []map[string]any{},
Suggestions: []string{},
UnresponsiveEngines: [][2]string{},
}, nil
}

View file

@ -0,0 +1,51 @@
package engines
import (
"context"
"net/http"
"net/http/httptest"
"strings"
"testing"
"github.com/metamorphosis-dev/samsa/internal/contracts"
)
func TestWikidataEngine_Search(t *testing.T) {
ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.URL.Query().Get("action") != "wbsearchentities" {
t.Errorf("action=%q", r.URL.Query().Get("action"))
}
if got := r.URL.Query().Get("search"); got != "test" {
t.Errorf("search=%q want test", got)
}
w.Header().Set("Content-Type", "application/json")
_, _ = w.Write([]byte(`{"search":[{"id":"Q937","label":"Go","description":"Programming language"}]}`))
}))
defer ts.Close()
orig := wikidataAPIBase
t.Cleanup(func() { wikidataAPIBase = orig })
wikidataAPIBase = ts.URL + "/w/api.php"
e := &WikidataEngine{client: ts.Client()}
resp, err := e.Search(context.Background(), contracts.SearchRequest{
Query: "test",
Language: "en",
})
if err != nil {
t.Fatal(err)
}
if len(resp.Results) != 1 {
t.Fatalf("expected 1 result, got %d", len(resp.Results))
}
r0 := resp.Results[0]
if r0.Engine != "wikidata" {
t.Errorf("engine=%q", r0.Engine)
}
if r0.Title != "Go" {
t.Errorf("title=%q", r0.Title)
}
if r0.URL == nil || !strings.Contains(*r0.URL, "Q937") {
t.Errorf("url=%v", r0.URL)
}
}

View file

@ -1,4 +1,4 @@
// kafka — a privacy-respecting metasearch engine // samsa — a privacy-respecting metasearch engine
// Copyright (C) 2026-present metamorphosis-dev // Copyright (C) 2026-present metamorphosis-dev
// //
// This program is free software: you can redistribute it and/or modify // This program is free software: you can redistribute it and/or modify
@ -26,13 +26,51 @@ import (
"net/url" "net/url"
"strings" "strings"
"github.com/metamorphosis-dev/kafka/internal/contracts" "github.com/metamorphosis-dev/samsa/internal/contracts"
) )
type WikipediaEngine struct { type WikipediaEngine struct {
client *http.Client client *http.Client
} }
// validWikipediaLangs contains the set of valid Wikipedia language codes.
// This prevents SSRF attacks where an attacker could use a malicious language
// value to redirect requests to an attacker-controlled domain.
var validWikipediaLangs = map[string]struct{}{
"aa": {}, "ab": {}, "ae": {}, "af": {}, "ak": {}, "am": {}, "an": {},
"ar": {}, "arc": {}, "as": {}, "ast": {}, "at": {}, "av": {}, "ay": {},
"az": {}, "ba": {}, "be": {}, "bg": {}, "bh": {}, "bi": {}, "bm": {},
"bn": {}, "bo": {}, "br": {}, "bs": {}, "ca": {}, "ce": {}, "ch": {},
"co": {}, "cr": {}, "cs": {}, "cu": {}, "cv": {}, "cy": {}, "da": {},
"de": {}, "di": {}, "dv": {}, "dz": {}, "ee": {}, "el": {}, "en": {},
"eo": {}, "es": {}, "et": {}, "eu": {}, "fa": {}, "ff": {}, "fi": {},
"fj": {}, "fo": {}, "fr": {}, "fy": {}, "ga": {}, "gd": {}, "gl": {},
"gn": {}, "gu": {}, "gv": {}, "ha": {}, "he": {}, "hi": {}, "ho": {},
"hr": {}, "ht": {}, "hu": {}, "hy": {}, "hz": {}, "ia": {}, "id": {},
"ie": {}, "ig": {}, "ii": {}, "ik": {}, "io": {}, "is": {}, "it": {},
"iu": {}, "ja": {}, "jv": {}, "ka": {}, "kg": {}, "ki": {}, "kj": {},
"kk": {}, "kl": {}, "km": {}, "kn": {}, "ko": {}, "kr": {}, "ks": {},
"ku": {}, "kv": {}, "kw": {}, "ky": {}, "la": {}, "lb": {}, "lg": {},
"li": {}, "lij": {}, "ln": {}, "lo": {}, "lt": {}, "lv": {}, "mg": {},
"mh": {}, "mi": {}, "mk": {}, "ml": {}, "mn": {}, "mo": {}, "mr": {},
"ms": {}, "mt": {}, "mus": {}, "my": {}, "na": {}, "nah": {}, "nap": {},
"nd": {}, "nds": {}, "ne": {}, "new": {}, "ng": {}, "nl": {}, "nn": {},
"no": {}, "nov": {}, "nrm": {}, "nv": {}, "ny": {}, "oc": {}, "oj": {},
"om": {}, "or": {}, "os": {}, "pa": {}, "pag": {}, "pam": {}, "pap": {},
"pdc": {}, "pl": {}, "pms": {}, "pn": {}, "ps": {}, "pt": {}, "qu": {},
"rm": {}, "rmy": {}, "rn": {}, "ro": {}, "roa-rup": {}, "ru": {},
"rw": {}, "sa": {}, "sah": {}, "sc": {}, "scn": {}, "sco": {}, "sd": {},
"se": {}, "sg": {}, "sh": {}, "si": {}, "simple": {}, "sk": {}, "sl": {},
"sm": {}, "sn": {}, "so": {}, "sq": {}, "sr": {}, "ss": {}, "st": {},
"su": {}, "sv": {}, "sw": {}, "szl": {}, "ta": {}, "te": {}, "tg": {},
"th": {}, "ti": {}, "tk": {}, "tl": {}, "tn": {}, "to": {}, "tpi": {},
"tr": {}, "ts": {}, "tt": {}, "tum": {}, "tw": {}, "ty": {}, "udm": {},
"ug": {}, "uk": {}, "ur": {}, "uz": {}, "ve": {}, "vec": {}, "vi": {},
"vls": {}, "vo": {}, "wa": {}, "wo": {}, "xal": {}, "xh": {}, "yi": {},
"yo": {}, "za": {}, "zea": {}, "zh": {}, "zh-classical": {},
"zh-min-nan": {}, "zh-yue": {}, "zu": {},
}
func (e *WikipediaEngine) Name() string { return "wikipedia" } func (e *WikipediaEngine) Name() string { return "wikipedia" }
func (e *WikipediaEngine) Search(ctx context.Context, req contracts.SearchRequest) (contracts.SearchResponse, error) { func (e *WikipediaEngine) Search(ctx context.Context, req contracts.SearchRequest) (contracts.SearchResponse, error) {
@ -50,6 +88,11 @@ func (e *WikipediaEngine) Search(ctx context.Context, req contracts.SearchReques
// Wikipedia subdomains are based on the language code; keep it simple for MVP. // Wikipedia subdomains are based on the language code; keep it simple for MVP.
lang = strings.SplitN(lang, "-", 2)[0] lang = strings.SplitN(lang, "-", 2)[0]
lang = strings.ReplaceAll(lang, "_", "-") lang = strings.ReplaceAll(lang, "_", "-")
// Validate lang against whitelist to prevent SSRF attacks where an attacker
// could use a malicious language value to redirect requests to their server.
if _, ok := validWikipediaLangs[lang]; !ok {
lang = "en"
}
wikiNetloc := fmt.Sprintf("%s.wikipedia.org", lang) wikiNetloc := fmt.Sprintf("%s.wikipedia.org", lang)
endpoint := fmt.Sprintf( endpoint := fmt.Sprintf(
@ -65,7 +108,7 @@ func (e *WikipediaEngine) Search(ctx context.Context, req contracts.SearchReques
// Wikimedia APIs require a descriptive User-Agent. // Wikimedia APIs require a descriptive User-Agent.
httpReq.Header.Set( httpReq.Header.Set(
"User-Agent", "User-Agent",
"gosearch-go/0.1 (compatible; +https://github.com/metamorphosis-dev/kafka)", "gosearch-go/0.1 (compatible; +https://github.com/metamorphosis-dev/samsa)",
) )
// Best-effort: hint content language. // Best-effort: hint content language.
if req.Language != "" && req.Language != "auto" { if req.Language != "" && req.Language != "auto" {
@ -80,27 +123,31 @@ func (e *WikipediaEngine) Search(ctx context.Context, req contracts.SearchReques
if resp.StatusCode == http.StatusNotFound { if resp.StatusCode == http.StatusNotFound {
return contracts.SearchResponse{ return contracts.SearchResponse{
Query: req.Query, Query: req.Query,
NumberOfResults: 0, NumberOfResults: 0,
Results: []contracts.MainResult{}, Results: []contracts.MainResult{},
Answers: []map[string]any{}, Answers: []map[string]any{},
Corrections: []string{}, Corrections: []string{},
Infoboxes: []map[string]any{}, Infoboxes: []map[string]any{},
Suggestions: []string{}, Suggestions: []string{},
UnresponsiveEngines: [][2]string{}, UnresponsiveEngines: [][2]string{},
}, nil }, nil
} }
if resp.StatusCode < 200 || resp.StatusCode >= 300 { if resp.StatusCode < 200 || resp.StatusCode >= 300 {
body, _ := io.ReadAll(io.LimitReader(resp.Body, 16*1024)) io.Copy(io.Discard, io.LimitReader(resp.Body, 16*1024))
return contracts.SearchResponse{}, fmt.Errorf("wikipedia upstream error: status=%d body=%q", resp.StatusCode, string(body)) return contracts.SearchResponse{}, fmt.Errorf("wikipedia upstream error: status %d", resp.StatusCode)
} }
var api struct { var api struct {
Title string `json:"title"` Title string `json:"title"`
Description string `json:"description"` Description string `json:"description"`
Extract string `json:"extract"`
Titles struct { Titles struct {
Display string `json:"display"` Display string `json:"display"`
} `json:"titles"` } `json:"titles"`
Thumbnail struct {
Source string `json:"source"`
} `json:"thumbnail"`
ContentURLs struct { ContentURLs struct {
	Desktop struct {
		Page string `json:"page"`
@@ -117,7 +164,7 @@ func (e *WikipediaEngine) Search(ctx context.Context, req contracts.SearchReques
	// API returned a non-standard payload; treat as no result.
	return contracts.SearchResponse{
		Query:           req.Query,
		NumberOfResults: 0,
		Results:         []contracts.MainResult{},
		Answers:         []map[string]any{},
		Corrections:     []string{},
@@ -132,36 +179,61 @@ func (e *WikipediaEngine) Search(ctx context.Context, req contracts.SearchReques
		title = api.Title
	}
-	content := api.Description
+	content := strings.TrimSpace(api.Extract)
+	if content == "" {
+		content = strings.TrimSpace(api.Description)
+	}
	urlPtr := pageURL
	pub := (*string)(nil)
+	// Knowledge infobox for HTML (Wikipedia REST summary: title, extract, thumbnail, link).
+	var infoboxes []map[string]any
+	ibTitle := api.Titles.Display
+	if ibTitle == "" {
+		ibTitle = api.Title
+	}
+	body := strings.TrimSpace(api.Extract)
+	if body == "" {
+		body = strings.TrimSpace(api.Description)
+	}
+	imgSrc := strings.TrimSpace(api.Thumbnail.Source)
+	if ibTitle != "" || body != "" || imgSrc != "" {
+		row := map[string]any{
+			"title":   ibTitle,
+			"infobox": body,
+			"url":     pageURL,
+		}
+		if imgSrc != "" {
+			row["img_src"] = imgSrc
+		}
+		infoboxes = append(infoboxes, row)
+	}
	results := []contracts.MainResult{
		{
			Template:  "default.html",
			Title:     title,
			Content:   content,
			URL:       &urlPtr,
			Pubdate:   pub,
			Engine:    "wikipedia",
			Score:     0,
			Category:  "general",
			Priority:  "",
			Positions: nil,
			Engines:   []string{"wikipedia"},
		},
	}
	return contracts.SearchResponse{
		Query:               req.Query,
		NumberOfResults:     len(results),
		Results:             results,
		Answers:             []map[string]any{},
		Corrections:         []string{},
-		Infoboxes:           []map[string]any{},
+		Infoboxes:           infoboxes,
		Suggestions:         []string{},
		UnresponsiveEngines: [][2]string{},
	}, nil
}

View file

@@ -5,7 +5,7 @@ import (
	"net/http"
	"testing"
-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
)
func TestWikipediaEngine_Search(t *testing.T) {

View file

@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
// Copyright (C) 2026-present metamorphosis-dev
//
// This program is free software: you can redistribute it and/or modify
@@ -27,7 +27,7 @@ import (
	"strings"
	"time"
-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
)
type YouTubeEngine struct {
@@ -77,8 +77,8 @@ func (e *YouTubeEngine) Search(ctx context.Context, req contracts.SearchRequest)
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
-		body, _ := io.ReadAll(io.LimitReader(resp.Body, 4096))
-		return contracts.SearchResponse{}, fmt.Errorf("youtube api error: status=%d body=%q", resp.StatusCode, string(body))
+		io.Copy(io.Discard, io.LimitReader(resp.Body, 4096))
+		return contracts.SearchResponse{}, fmt.Errorf("youtube api error: status %d", resp.StatusCode)
	}
	var apiResp youtubeSearchResponse
@@ -87,7 +87,7 @@ func (e *YouTubeEngine) Search(ctx context.Context, req contracts.SearchRequest)
	}
	if apiResp.Error != nil {
-		return contracts.SearchResponse{}, fmt.Errorf("youtube api error: %s", apiResp.Error.Message)
+		return contracts.SearchResponse{}, fmt.Errorf("youtube api error: code %d", apiResp.Error.Code)
	}
	results := make([]contracts.MainResult, 0, len(apiResp.Items))

View file

@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
// Copyright (C) 2026-present metamorphosis-dev
//
// This program is free software: you can redistribute it and/or modify
@@ -18,26 +18,34 @@ package httpapi
import (
	"context"
+	"crypto/sha256"
+	"encoding/hex"
	"encoding/json"
+	"io"
	"net/http"
	"strings"
+	"time"
-	"github.com/metamorphosis-dev/kafka/internal/contracts"
-	"github.com/metamorphosis-dev/kafka/internal/search"
-	"github.com/metamorphosis-dev/kafka/internal/views"
+	"github.com/metamorphosis-dev/samsa/internal/cache"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/httpclient"
+	"github.com/metamorphosis-dev/samsa/internal/search"
+	"github.com/metamorphosis-dev/samsa/internal/views"
)
type Handler struct {
	searchSvc       *search.Service
	autocompleteSvc func(ctx context.Context, query string) ([]string, error)
	sourceURL       string
+	faviconCache    *cache.Cache
}
-func NewHandler(searchSvc *search.Service, autocompleteSuggestions func(ctx context.Context, query string) ([]string, error), sourceURL string) *Handler {
+func NewHandler(searchSvc *search.Service, autocompleteSuggestions func(ctx context.Context, query string) ([]string, error), sourceURL string, faviconCache *cache.Cache) *Handler {
	return &Handler{
		searchSvc:       searchSvc,
		autocompleteSvc: autocompleteSuggestions,
		sourceURL:       sourceURL,
+		faviconCache:    faviconCache,
	}
}
@@ -47,13 +55,35 @@ func (h *Handler) Healthz(w http.ResponseWriter, r *http.Request) {
	_, _ = w.Write([]byte("OK"))
}
// getTheme returns the user's theme preference from cookie, defaulting to "light".
func (h *Handler) getTheme(r *http.Request) string {
if cookie, err := r.Cookie("theme"); err == nil {
if cookie.Value == "dark" || cookie.Value == "light" {
return cookie.Value
}
}
return "light"
}
// getFaviconService returns the favicon provider from cookie (default "none").
func (h *Handler) getFaviconService(r *http.Request) string {
if cookie, err := r.Cookie("favicon"); err == nil {
switch cookie.Value {
case "none", "google", "duckduckgo", "self":
return cookie.Value
}
}
return "none"
}
// Index renders the homepage with the search box.
func (h *Handler) Index(w http.ResponseWriter, r *http.Request) {
	if r.URL.Path != "/" {
		http.NotFound(w, r)
		return
	}
-	if err := views.RenderIndex(w, h.sourceURL); err != nil {
+	theme := h.getTheme(r)
+	if err := views.RenderIndex(w, h.sourceURL, theme); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}
@@ -72,17 +102,19 @@ func (h *Handler) OpenSearch(baseURL string) http.HandlerFunc {
}
func (h *Handler) Search(w http.ResponseWriter, r *http.Request) {
+	q := r.FormValue("q")
+	format := r.FormValue("format")
	// For HTML format with no query, redirect to homepage.
-	if r.FormValue("q") == "" && (r.FormValue("format") == "" || r.FormValue("format") == "html") {
+	if q == "" && (format == "" || format == "html") {
		http.Redirect(w, r, "/", http.StatusFound)
		return
	}
	req, err := search.ParseSearchRequest(r)
	if err != nil {
-		// For HTML, render error on the results page.
-		if req.Format == contracts.FormatHTML || r.FormValue("format") == "html" {
-			pd := views.PageData{SourceURL: h.sourceURL, Query: r.FormValue("q")}
+		if format == "html" || format == "" {
+			pd := views.PageData{SourceURL: h.sourceURL, Query: q, Theme: h.getTheme(r), FaviconService: h.getFaviconService(r)}
			if views.IsHTMXRequest(r) {
				views.RenderSearchFragment(w, pd)
			} else {
@@ -97,7 +129,7 @@ func (h *Handler) Search(w http.ResponseWriter, r *http.Request) {
	resp, err := h.searchSvc.Search(r.Context(), req)
	if err != nil {
		if req.Format == contracts.FormatHTML {
-			pd := views.PageData{SourceURL: h.sourceURL, Query: req.Query}
+			pd := views.PageData{SourceURL: h.sourceURL, Query: req.Query, Theme: h.getTheme(r), FaviconService: h.getFaviconService(r)}
			if views.IsHTMXRequest(r) {
				views.RenderSearchFragment(w, pd)
			} else {
@@ -110,7 +142,9 @@ func (h *Handler) Search(w http.ResponseWriter, r *http.Request) {
	}
	if req.Format == contracts.FormatHTML {
-		pd := views.FromResponse(resp, req.Query, req.Pageno)
+		pd := views.FromResponse(resp, req.Query, req.Pageno,
+			r.FormValue("category"), r.FormValue("time"), r.FormValue("type"), h.getFaviconService(r))
+		pd.Theme = h.getTheme(r)
		if err := views.RenderSearchAuto(w, r, pd); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
		}
@@ -139,3 +173,126 @@ func (h *Handler) Autocompleter(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json; charset=utf-8")
	_ = json.NewEncoder(w).Encode(suggestions)
}
// Preferences handles GET and POST for the preferences page.
func (h *Handler) Preferences(w http.ResponseWriter, r *http.Request) {
if r.URL.Path != "/preferences" {
http.NotFound(w, r)
return
}
if r.Method == "POST" {
// Handle theme preference via server-side cookie
theme := r.FormValue("theme")
if theme == "dark" || theme == "light" {
http.SetCookie(w, &http.Cookie{
Name: "theme",
Value: theme,
Path: "/",
MaxAge: 86400 * 365,
HttpOnly: false, // Allow CSS to read via :has()
SameSite: http.SameSiteLaxMode,
})
}
// Persist favicon provider preference.
favicon := strings.TrimSpace(r.FormValue("favicon"))
switch favicon {
case "none", "google", "duckduckgo", "self":
http.SetCookie(w, &http.Cookie{
Name: "favicon",
Value: favicon,
Path: "/",
MaxAge: 86400 * 365,
HttpOnly: false,
SameSite: http.SameSiteLaxMode,
})
}
http.Redirect(w, r, "/preferences", http.StatusFound)
return
}
// Read theme cookie for template
theme := "light"
if cookie, err := r.Cookie("theme"); err == nil {
if cookie.Value == "dark" || cookie.Value == "light" {
theme = cookie.Value
}
}
if err := views.RenderPreferences(w, h.sourceURL, theme, h.getFaviconService(r)); err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
}
}
const faviconCacheTTL = 24 * time.Hour
// Favicon serves a fetched favicon for the given domain, with ETag support
// and a 24-hour Redis cache. This lets Kafka act as a privacy-preserving
// favicon proxy: the user's browser talks to Kafka, not Google or DuckDuckGo.
func (h *Handler) Favicon(w http.ResponseWriter, r *http.Request) {
domain := strings.TrimPrefix(r.URL.Path, "/favicon/")
domain = strings.TrimSuffix(domain, "/")
domain = strings.TrimSpace(domain)
if domain == "" || strings.Contains(domain, "/") {
http.Error(w, "invalid domain", http.StatusBadRequest)
return
}
cacheKey := "favicon:" + domain
// Check Redis cache.
if cached, ok := h.faviconCache.GetBytes(r.Context(), cacheKey); ok {
h.serveFavicon(w, r, cached)
return
}
// Fetch from the domain's favicon.ico.
fetchURL := "https://" + domain + "/favicon.ico"
req, err := http.NewRequestWithContext(r.Context(), http.MethodGet, fetchURL, nil)
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
req.Header.Set("User-Agent", "Kafka/0.1 (+https://git.ashisgreat.xyz/penal-colony/samsa)")
req.Header.Set("Accept", "image/x-icon,image/png,image/webp,*/*")
client := httpclient.NewClient(5 * time.Second)
resp, err := client.Do(req)
if err != nil {
http.Error(w, "favicon fetch failed", http.StatusBadGateway)
return
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
http.Error(w, "favicon not found", http.StatusNotFound)
return
}
body, err := io.ReadAll(http.MaxBytesReader(w, resp.Body, 64*1024))
if err != nil {
http.Error(w, "favicon too large", http.StatusBadGateway)
return
}
// Store in Redis with 24h TTL.
h.faviconCache.SetBytes(r.Context(), cacheKey, body, faviconCacheTTL)
h.serveFavicon(w, r, body)
}
// serveFavicon writes a cached or freshly-fetched body with appropriate
// caching headers. ETag is derived from the body hash (no storage needed).
func (h *Handler) serveFavicon(w http.ResponseWriter, r *http.Request, body []byte) {
h2 := sha256.Sum256(body)
etag := `"` + hex.EncodeToString(h2[:8]) + `"`
if etagMatch := r.Header.Get("If-None-Match"); etagMatch != "" && etagMatch == etag {
w.WriteHeader(http.StatusNotModified)
return
}
w.Header().Set("Content-Type", "image/x-icon")
w.Header().Set("ETag", etag)
w.Header().Set("Cache-Control", "private, max-age=86400")
w.WriteHeader(http.StatusOK)
w.Write(body)
}

View file

@@ -0,0 +1,278 @@
// samsa — a privacy-respecting metasearch engine
// Copyright (C) 2026-present metamorphosis-dev
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU Affero General Public License for more details.
//
// You should have received a copy of the GNU Affero General Public License
// along with this program. If not, see <https://www.gnu.org/licenses/>.
package httpapi_test
import (
"encoding/json"
"io"
"net/http"
"net/http/httptest"
"strings"
"testing"
"github.com/metamorphosis-dev/samsa/internal/contracts"
"github.com/metamorphosis-dev/samsa/internal/httpapi"
"github.com/metamorphosis-dev/samsa/internal/search"
)
// mockUpstreamJSON returns a controlled upstream search response.
func mockUpstreamJSON(query string) contracts.SearchResponse {
return contracts.SearchResponse{
Query: query,
NumberOfResults: 2,
Results: []contracts.MainResult{
{Title: "Upstream Result 1", URL: ptr("https://upstream.example/1"), Content: "From upstream", Engine: "upstream"},
{Title: "Upstream Result 2", URL: ptr("https://upstream.example/2"), Content: "From upstream", Engine: "upstream"},
},
Answers: []map[string]any{},
Corrections: []string{},
Infoboxes: []map[string]any{},
Suggestions: []string{"upstream suggestion"},
UnresponsiveEngines: [][2]string{},
}
}
func ptr(s string) *string { return &s }
func newTestServer(t *testing.T) (*httptest.Server, *httpapi.Handler) {
t.Helper()
// Mock upstream server that returns controlled JSON.
upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
query := r.FormValue("q")
resp := mockUpstreamJSON(query)
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(resp)
}))
t.Cleanup(upstream.Close)
svc := search.NewService(search.ServiceConfig{
UpstreamURL: upstream.URL,
HTTPTimeout: 0,
Cache: nil,
EnginesConfig: nil,
})
h := httpapi.NewHandler(svc, nil, "https://src.example.com", nil)
mux := http.NewServeMux()
mux.HandleFunc("/healthz", h.Healthz)
mux.HandleFunc("/", h.Index)
mux.HandleFunc("/search", h.Search)
mux.HandleFunc("/autocompleter", h.Autocompleter)
mux.HandleFunc("/preferences", h.Preferences)
server := httptest.NewServer(mux)
t.Cleanup(server.Close)
return server, h
}
func TestHealthz(t *testing.T) {
server, _ := newTestServer(t)
resp, err := http.Get(server.URL + "/healthz")
if err != nil {
t.Fatalf("request failed: %v", err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
t.Errorf("expected status 200, got %d", resp.StatusCode)
}
if ct := resp.Header.Get("Content-Type"); !strings.Contains(ct, "text/plain") {
t.Errorf("expected text/plain, got %s", ct)
}
}
func TestIndex(t *testing.T) {
server, _ := newTestServer(t)
resp, err := http.Get(server.URL + "/")
if err != nil {
t.Fatalf("request failed: %v", err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
t.Errorf("expected status 200, got %d", resp.StatusCode)
}
if ct := resp.Header.Get("Content-Type"); !strings.Contains(ct, "text/html") {
t.Errorf("expected text/html, got %s", ct)
}
body, _ := io.ReadAll(resp.Body)
if !strings.Contains(string(body), "<!DOCTYPE html") {
t.Error("expected HTML DOCTYPE")
}
}
func TestSearch_RedirectOnEmptyQuery(t *testing.T) {
server, _ := newTestServer(t)
client := &http.Client{CheckRedirect: func(*http.Request, []*http.Request) error {
return http.ErrUseLastResponse
}}
req, _ := http.NewRequest("GET", server.URL+"/search?format=html", nil)
resp, err := client.Do(req)
if err != nil {
t.Fatalf("request failed: %v", err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusFound {
t.Errorf("expected redirect 302, got %d", resp.StatusCode)
}
loc, _ := resp.Location()
if loc == nil || loc.Path != "/" {
t.Errorf("expected redirect to /, got %v", loc)
}
}
func TestSearch_JSONResponse(t *testing.T) {
server, _ := newTestServer(t)
resp, err := http.Get(server.URL + "/search?q=test&format=json")
if err != nil {
t.Fatalf("request failed: %v", err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
t.Errorf("expected status 200, got %d", resp.StatusCode)
}
if ct := resp.Header.Get("Content-Type"); !strings.Contains(ct, "application/json") {
t.Errorf("expected application/json, got %s", ct)
}
var result contracts.SearchResponse
if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
t.Fatalf("failed to decode JSON: %v", err)
}
if result.Query != "test" {
t.Errorf("expected query 'test', got %q", result.Query)
}
if len(result.Results) == 0 {
t.Error("expected at least one result")
}
}
func TestSearch_HTMLResponse(t *testing.T) {
server, _ := newTestServer(t)
resp, err := http.Get(server.URL + "/search?q=test&format=html")
if err != nil {
t.Fatalf("request failed: %v", err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
t.Errorf("expected status 200, got %d", resp.StatusCode)
}
if ct := resp.Header.Get("Content-Type"); !strings.Contains(ct, "text/html") {
t.Errorf("expected text/html, got %s", ct)
}
body, _ := io.ReadAll(resp.Body)
if !strings.Contains(string(body), "<!DOCTYPE html") {
t.Error("expected HTML DOCTYPE in response")
}
}
func TestAutocompleter_EmptyQuery(t *testing.T) {
server, _ := newTestServer(t)
resp, err := http.Get(server.URL + "/autocompleter?q=")
if err != nil {
t.Fatalf("request failed: %v", err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusBadRequest {
t.Errorf("expected status 400, got %d", resp.StatusCode)
}
}
func TestAutocompleter_NoQuery(t *testing.T) {
server, _ := newTestServer(t)
resp, err := http.Get(server.URL + "/autocompleter")
if err != nil {
t.Fatalf("request failed: %v", err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusBadRequest {
t.Errorf("expected status 400, got %d", resp.StatusCode)
}
}
func TestSearch_SourceURLInFooter(t *testing.T) {
server, _ := newTestServer(t)
resp, err := http.Get(server.URL + "/?q=test")
if err != nil {
t.Fatalf("request failed: %v", err)
}
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)
if !strings.Contains(string(body), "https://src.example.com") {
t.Error("expected source URL in footer")
}
if !strings.Contains(string(body), "AGPLv3") {
t.Error("expected AGPLv3 link in footer")
}
}
func TestPreferences_PostSetsFaviconCookie(t *testing.T) {
server, _ := newTestServer(t)
client := &http.Client{CheckRedirect: func(*http.Request, []*http.Request) error {
return http.ErrUseLastResponse
}}
req, _ := http.NewRequest(http.MethodPost, server.URL+"/preferences", strings.NewReader("favicon=google&theme=dark"))
req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
resp, err := client.Do(req)
if err != nil {
t.Fatalf("request failed: %v", err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusFound {
t.Fatalf("expected redirect 302, got %d", resp.StatusCode)
}
found := false
for _, c := range resp.Cookies() {
if c.Name == "favicon" {
found = true
if c.Value != "google" {
t.Fatalf("expected favicon cookie google, got %q", c.Value)
}
}
}
if !found {
t.Fatal("expected favicon cookie to be set")
}
}
func TestPreferences_GetReflectsFaviconCookie(t *testing.T) {
server, _ := newTestServer(t)
req, _ := http.NewRequest(http.MethodGet, server.URL+"/preferences", nil)
req.AddCookie(&http.Cookie{Name: "favicon", Value: "duckduckgo"})
resp, err := http.DefaultClient.Do(req)
if err != nil {
t.Fatalf("request failed: %v", err)
}
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)
html := string(body)
if !strings.Contains(html, `option value="duckduckgo" selected`) {
t.Fatalf("expected duckduckgo option selected, body: %s", html)
}
}

View file

@@ -0,0 +1,61 @@
// samsa — a privacy-respecting metasearch engine
// Copyright (C) 2026-present metamorphosis-dev
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU Affero General Public License for more details.
//
// You should have received a copy of the GNU Affero General Public License
// along with this program. If not, see <https://www.gnu.org/licenses/>.
package httpclient
import (
"net/http"
"sync"
"time"
)
var (
defaultTransport http.RoundTripper
once sync.Once
)
// Default returns a shared, pre-configured http.RoundTripper suitable for
// outgoing engine requests. It is safe for concurrent use across goroutines.
// All fields are tuned for a meta-search engine that makes many concurrent
// requests to a fixed set of upstream hosts:
//
// - MaxIdleConnsPerHost = 20 (vs default of 2; keeps more warm connections
// to each host, avoiding repeated TCP+TLS handshakes)
// - MaxIdleConns = 100 (total idle connection ceiling)
// - IdleConnTimeout = 90s (prunes connections before they go stale)
// - DialContext timeout = 5s (fails fast on DNS/connect rather than
// holding a goroutine indefinitely)
func Default() http.RoundTripper {
once.Do(func() {
defaultTransport = &http.Transport{
MaxIdleConnsPerHost: 20,
MaxIdleConns: 100,
IdleConnTimeout: 90 * time.Second,
DialContext: dialWithTimeout(5 * time.Second),
}
})
return defaultTransport
}
// NewClient returns an http.Client that uses DefaultTransport and the given
// request timeout. The returned client reuses the shared connection pool,
// so all clients created via this function share the same warm connections.
func NewClient(timeout time.Duration) *http.Client {
return &http.Client{
Transport: Default(),
Timeout: timeout,
}
}

View file

@@ -0,0 +1,30 @@
// samsa — a privacy-respecting metasearch engine
// Copyright (C) 2026-present metamorphosis-dev
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU Affero General Public License for more details.
//
// You should have received a copy of the GNU Affero General Public License
// along with this program. If not, see <https://www.gnu.org/licenses/>.
package httpclient
import (
"context"
"net"
"time"
)
// dialWithTimeout returns a DialContext function for http.Transport that
// respects the given connection timeout.
func dialWithTimeout(timeout time.Duration) func(context.Context, string, string) (net.Conn, error) {
d := &net.Dialer{Timeout: timeout}
return d.DialContext
}

View file

@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
// Copyright (C) 2026-present metamorphosis-dev
//
// This program is free software: you can redistribute it and/or modify
@@ -42,7 +42,8 @@ type CORSConfig struct {
func CORS(cfg CORSConfig) func(http.Handler) http.Handler {
	origins := cfg.AllowedOrigins
	if len(origins) == 0 {
-		origins = []string{"*"}
+		// Default: no CORS headers. Explicitly configure origins to enable.
+		origins = nil
	}
	methods := cfg.AllowedMethods
@@ -70,6 +71,7 @@ func CORS(cfg CORSConfig) func(http.Handler) http.Handler {
		origin := r.Header.Get("Origin")
		// Determine the allowed origin for this request.
+		// If no origins are configured, CORS is disabled entirely — no headers are set.
		allowedOrigin := ""
		for _, o := range origins {
			if o == "*" {

View file

@@ -51,7 +51,7 @@ func TestCORS_SpecificOrigin(t *testing.T) {
}
func TestCORS_Preflight(t *testing.T) {
-	h := CORS(CORSConfig{})(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+	h := CORS(CORSConfig{AllowedOrigins: []string{"https://example.com"}})(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		t.Error("handler should not be called for preflight")
	}))
@@ -100,6 +100,7 @@ func TestCORS_CustomMethodsAndHeaders(t *testing.T) {
	})(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {}))
	req := httptest.NewRequest("OPTIONS", "/search", nil)
+	req.Header.Set("Origin", "https://example.com")
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)

View file

@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
// Copyright (C) 2026-present metamorphosis-dev
//
// This program is free software: you can redistribute it and/or modify
@@ -27,10 +27,14 @@ import (
	"log/slog"
)
+// RateLimitConfig controls per-IP rate limiting.
type RateLimitConfig struct {
	Requests        int
	Window          time.Duration
	CleanupInterval time.Duration
+	// TrustedProxies is a list of CIDR ranges that are allowed to set
+	// X-Forwarded-For / X-Real-IP. If empty, only r.RemoteAddr is used.
+	TrustedProxies []string
}
func RateLimit(cfg RateLimitConfig, logger *slog.Logger) func(http.Handler) http.Handler {
@@ -53,18 +57,30 @@ func RateLimit(cfg RateLimitConfig, logger *slog.Logger) func(http.Handler) http
		logger = slog.Default()
	}
+	// Parse trusted proxy CIDRs.
+	var trustedNets []*net.IPNet
+	for _, cidr := range cfg.TrustedProxies {
+		_, network, err := net.ParseCIDR(cidr)
+		if err != nil {
+			logger.Warn("invalid trusted proxy CIDR, skipping", "cidr", cidr, "error", err)
+			continue
+		}
+		trustedNets = append(trustedNets, network)
+	}
	limiter := &ipLimiter{
		requests: requests,
		window:   window,
		clients:  make(map[string]*bucket),
		logger:   logger,
+		trusted:  trustedNets,
	}
	go limiter.cleanup(cleanup)
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
-			ip := extractIP(r)
+			ip := limiter.extractIP(r)
			if !limiter.allow(ip) {
				retryAfter := int(limiter.window.Seconds())
@@ -92,6 +108,7 @@ type ipLimiter struct {
	clients map[string]*bucket
	mu      sync.Mutex
	logger  *slog.Logger
+	trusted []*net.IPNet
}
func (l *ipLimiter) allow(ip string) bool {
@@ -129,18 +146,48 @@ func (l *ipLimiter) cleanup(interval time.Duration) {
	}
}
-func extractIP(r *http.Request) string {
-	if xff := r.Header.Get("X-Forwarded-For"); xff != "" {
-		parts := strings.SplitN(xff, ",", 2)
-		return strings.TrimSpace(parts[0])
-	}
-	if rip := r.Header.Get("X-Real-IP"); rip != "" {
-		return strings.TrimSpace(rip)
-	}
-	host, _, err := net.SplitHostPort(r.RemoteAddr)
-	if err != nil {
-		return r.RemoteAddr
-	}
-	return host
-}
+// extractIP extracts the client IP from the request.
+// If trusted proxy CIDRs are configured, X-Forwarded-For is only used when
+// the direct connection comes from a trusted proxy. Otherwise, only RemoteAddr is used.
+func (l *ipLimiter) extractIP(r *http.Request) string {
+	return extractIP(r, l.trusted...)
+}
+func extractIP(r *http.Request, trusted ...*net.IPNet) string {
+	remoteIP, _, err := net.SplitHostPort(r.RemoteAddr)
+	if err != nil {
+		remoteIP = r.RemoteAddr
+	}
+	// Check if the direct connection is from a trusted proxy.
+	isTrusted := false
+	if len(trusted) > 0 {
+		ip := net.ParseIP(remoteIP)
+		if ip != nil {
+			for _, network := range trusted {
+				if network.Contains(ip) {
+					isTrusted = true
+					break
+				}
+			}
+		}
+	}
+	if isTrusted {
+		if xff := r.Header.Get("X-Forwarded-For"); xff != "" {
+			parts := strings.SplitN(xff, ",", 2)
+			candidate := strings.TrimSpace(parts[0])
+			if net.ParseIP(candidate) != nil {
+				return candidate
+			}
+		}
+		if rip := r.Header.Get("X-Real-IP"); rip != "" {
+			candidate := strings.TrimSpace(rip)
+			if net.ParseIP(candidate) != nil {
+				return candidate
+			}
+		}
+	}
+	return remoteIP
+}

View file

@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
// Copyright (C) 2026-present metamorphosis-dev
//
// This program is free software: you can redistribute it and/or modify
@@ -71,7 +71,7 @@ func GlobalRateLimit(cfg GlobalRateLimitConfig, logger *slog.Logger) func(http.H
			w.Header().Set("Content-Type", "text/plain; charset=utf-8")
			w.WriteHeader(http.StatusServiceUnavailable)
			_, _ = w.Write([]byte("503 Service Unavailable — global rate limit exceeded\n"))
-			logger.Warn("global rate limit exceeded", "ip", extractIP(r))
+			logger.Warn("global rate limit exceeded", "remote", r.RemoteAddr)
			return
		}

View file

@@ -1,6 +1,7 @@
package middleware
import (
+	"net"
	"net/http"
	"net/http/httptest"
	"testing"
@@ -93,8 +94,9 @@ func TestRateLimit_DifferentIPs(t *testing.T) {
func TestRateLimit_XForwardedFor(t *testing.T) {
	h := RateLimit(RateLimitConfig{
		Requests:       1,
		Window:         10 * time.Second,
+		TrustedProxies: []string{"10.0.0.0/8"},
	}, nil)(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
@@ -143,17 +145,27 @@ func TestRateLimit_WindowExpires(t *testing.T) {
}
func TestExtractIP(t *testing.T) {
+	// Trusted proxy: loopback
+	loopback := mustParseCIDR("127.0.0.0/8")
+	privateNet := mustParseCIDR("10.0.0.0/8")
	tests := []struct {
		name     string
		xff      string
		realIP   string
		remote   string
+		trusted  []*net.IPNet
		expected string
	}{
-		{"xff", "203.0.113.50, 10.0.0.1", "", "10.0.0.1:1234", "203.0.113.50"},
-		{"real_ip", "", "203.0.113.50", "10.0.0.1:1234", "203.0.113.50"},
-		{"remote", "", "", "1.2.3.4:5678", "1.2.3.4"},
-		{"xff_over_real", "203.0.113.50", "10.0.0.1", "10.0.0.1:1234", "203.0.113.50"},
+		// No trusted proxies → always use RemoteAddr.
+		{"no_trusted_xff", "203.0.113.50, 10.0.0.1", "", "10.0.0.1:1234", nil, "10.0.0.1"},
+		{"no_trusted_real", "", "203.0.113.50", "10.0.0.1:1234", nil, "10.0.0.1"},
+		{"no_trusted_remote", "", "", "1.2.3.4:5678", nil, "1.2.3.4"},
+		// Trusted proxy → XFF is respected.
+		{"trusted_xff", "203.0.113.50, 10.0.0.1", "", "10.0.0.1:1234", []*net.IPNet{privateNet}, "203.0.113.50"},
+		{"trusted_real_ip", "", "203.0.113.50", "10.0.0.1:1234", []*net.IPNet{privateNet}, "203.0.113.50"},
+		// Untrusted remote → XFF ignored even if present.
+		{"untrusted_xff", "203.0.113.50, 10.0.0.1", "", "1.2.3.4:5678", []*net.IPNet{loopback}, "1.2.3.4"},
	}
	for _, tt := range tests {
@@ -167,9 +179,17 @@ func TestExtractIP(t *testing.T) {
			}
			req.RemoteAddr = tt.remote
-			if got := extractIP(req); got != tt.expected {
+			if got := extractIP(req, tt.trusted...); got != tt.expected {
				t.Errorf("extractIP() = %q, want %q", got, tt.expected)
			}
		})
	}
}
+func mustParseCIDR(s string) *net.IPNet {
+	_, network, err := net.ParseCIDR(s)
+	if err != nil {
+		panic(err)
+	}
+	return network
+}

View file

@ -0,0 +1,92 @@
// samsa — a privacy-respecting metasearch engine
// Copyright (C) 2026-present metamorphosis-dev
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
package middleware
import (
"net/http"
"strconv"
"strings"
)
// SecurityHeadersConfig controls which security headers are set.
type SecurityHeadersConfig struct {
// FrameOptions controls X-Frame-Options. Default: "DENY".
FrameOptions string
// HSTSMaxAge controls the max-age for Strict-Transport-Security.
// Set to 0 to disable HSTS (useful for local dev). Default: 31536000 (1 year).
HSTSMaxAge int
// HSTSPreloadDomains adds "includeSubDomains; preload" to HSTS.
HSTSPreloadDomains bool
// ReferrerPolicy controls the Referrer-Policy header. Default: "no-referrer".
ReferrerPolicy string
// CSP controls Content-Security-Policy. Default: a restrictive policy.
// Set to "" to disable CSP entirely.
CSP string
}
// SecurityHeaders returns middleware that sets standard HTTP security headers
// on every response.
func SecurityHeaders(cfg SecurityHeadersConfig) func(http.Handler) http.Handler {
frameOpts := cfg.FrameOptions
if frameOpts == "" {
frameOpts = "DENY"
}
hstsAge := cfg.HSTSMaxAge
if hstsAge == 0 {
hstsAge = 31536000 // 1 year
}
refPol := cfg.ReferrerPolicy
if refPol == "" {
refPol = "no-referrer"
}
csp := cfg.CSP
if csp == "" {
csp = defaultCSP()
}
hstsValue := "max-age=" + strconv.Itoa(hstsAge)
if cfg.HSTSPreloadDomains {
hstsValue += "; includeSubDomains; preload"
}
return func(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("X-Content-Type-Options", "nosniff")
w.Header().Set("X-Frame-Options", frameOpts)
w.Header().Set("Referrer-Policy", refPol)
w.Header().Set("Permissions-Policy", "camera=(), microphone=(), geolocation=()")
w.Header().Set("Content-Security-Policy", csp)
if hstsAge > 0 {
w.Header().Set("Strict-Transport-Security", hstsValue)
}
next.ServeHTTP(w, r)
})
}
}
// defaultCSP returns a restrictive Content-Security-Policy for the
// metasearch engine.
func defaultCSP() string {
return strings.Join([]string{
"default-src 'self'",
"script-src 'self' 'unsafe-inline' https://unpkg.com",
"style-src 'self' 'unsafe-inline'",
"img-src 'self' https: data:",
"connect-src 'self'",
"font-src 'self'",
"frame-ancestors 'none'",
"base-uri 'self'",
"form-action 'self'",
}, "; ")
}

View file

@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify
@@ -21,7 +21,7 @@ import (
 	"net/url"
 	"strings"

-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
 )

 // MergeResponses merges multiple compatible JSON responses.

View file

@@ -4,7 +4,7 @@ import (
 	"strings"
 	"testing"

-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
 )

 func TestMergeResponses_DedupResultsAndSets(t *testing.T) {

View file

@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify
@@ -26,6 +26,30 @@ import (
 var languageCodeRe = regexp.MustCompile(`^[a-z]{2,3}(-[a-zA-Z]{2})?$`)

+// maxQueryLength is the maximum allowed length for the search query.
+const maxQueryLength = 1024
+
+// knownEngineNames is the allowlist of valid engine identifiers.
+var knownEngineNames = map[string]bool{
+	"wikipedia": true, "wikidata": true, "arxiv": true, "crossref": true,
+	"braveapi": true, "brave": true, "qwant": true,
+	"duckduckgo": true, "github": true, "reddit": true,
+	"bing": true, "google": true, "youtube": true,
+	// Image engines
+	"bing_images": true, "ddg_images": true, "qwant_images": true,
+}
+
+// validateEngines filters engine names against the known registry.
+func validateEngines(engines []string) []string {
+	out := make([]string, 0, len(engines))
+	for _, e := range engines {
+		if knownEngineNames[strings.ToLower(e)] {
+			out = append(out, strings.ToLower(e))
+		}
+	}
+	return out
+}
+
 func ParseSearchRequest(r *http.Request) (SearchRequest, error) {
 	// Supports both GET and POST and relies on form values for routing.
 	if err := r.ParseForm(); err != nil {
@@ -50,6 +74,9 @@ func ParseSearchRequest(r *http.Request) (SearchRequest, error) {
 	if strings.TrimSpace(q) == "" {
 		return SearchRequest{}, errors.New("missing required parameter: q")
 	}
+	if len(q) > maxQueryLength {
+		return SearchRequest{}, errors.New("query exceeds maximum length")
+	}

 	pageno := 1
 	if s := strings.TrimSpace(r.FormValue("pageno")); s != "" {
@@ -105,6 +132,8 @@ func ParseSearchRequest(r *http.Request) (SearchRequest, error) {
 	// engines is an explicit list of engine names.
 	engines := splitCSV(strings.TrimSpace(r.FormValue("engines")))
+	// Validate engine names against known registry to prevent injection.
+	engines = validateEngines(engines)

 	// categories and category_<name> params mirror the webadapter parsing.
 	// We don't validate against a registry here; we just preserve the requested values.
@@ -132,6 +161,10 @@ func ParseSearchRequest(r *http.Request) (SearchRequest, error) {
 			delete(catSet, category)
 		}
 	}
+	// HTML UI uses a single ?category=images (etc.) query param; honor it here.
+	if single := strings.TrimSpace(r.FormValue("category")); single != "" {
+		catSet[single] = true
+	}
 	categories := make([]string, 0, len(catSet))
 	for c := range catSet {
 		categories = append(categories, c)
@@ -167,16 +200,16 @@ func ParseSearchRequest(r *http.Request) (SearchRequest, error) {
 	return SearchRequest{
 		Format:       OutputFormat(format),
 		Query:        q,
 		Pageno:       pageno,
 		Safesearch:   safesearch,
 		TimeRange:    timeRange,
 		TimeoutLimit: timeoutLimit,
 		Language:     language,
 		Engines:      engines,
 		Categories:   categories,
 		EngineData:   engineData,
 		AccessToken:  accessToken,
 	}, nil
 }
@@ -221,4 +254,3 @@ func parseAccessToken(r *http.Request) string {
 	return ""
 }

View file

@ -72,3 +72,21 @@ func TestParseSearchRequest_CategoriesAndEngineData(t *testing.T) {
} }
} }
func TestParseSearchRequest_SingularCategoryParam(t *testing.T) {
r := httptest.NewRequest(http.MethodGet, "/search?q=cats&category=images", nil)
req, err := ParseSearchRequest(r)
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
found := false
for _, c := range req.Categories {
if c == "images" {
found = true
break
}
}
if !found {
t.Fatalf("expected category images from ?category=images, got %v", req.Categories)
}
}

View file

@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify
@@ -127,8 +127,8 @@ func writeCSV(w http.ResponseWriter, resp SearchResponse) error {
 func writeRSS(w http.ResponseWriter, resp SearchResponse) error {
 	q := resp.Query
-	escapedTitle := xmlEscape("kafka search: " + q)
-	escapedDesc := xmlEscape("Search results for \"" + q + "\" - kafka")
+	escapedTitle := xmlEscape("samsa search: " + q)
+	escapedDesc := xmlEscape("Search results for \"" + q + "\" - samsa")
 	escapedQueryTerms := xmlEscape(q)
 	link := "/search?q=" + url.QueryEscape(q)

View file

@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify
@@ -18,29 +18,32 @@ package search
 import (
 	"context"
-	"net/http"
+	"encoding/json"
+	"fmt"
 	"sync"
 	"time"

-	"github.com/metamorphosis-dev/kafka/internal/cache"
-	"github.com/metamorphosis-dev/kafka/internal/config"
-	"github.com/metamorphosis-dev/kafka/internal/contracts"
-	"github.com/metamorphosis-dev/kafka/internal/engines"
-	"github.com/metamorphosis-dev/kafka/internal/upstream"
+	"github.com/metamorphosis-dev/samsa/internal/cache"
+	"github.com/metamorphosis-dev/samsa/internal/config"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/engines"
+	"github.com/metamorphosis-dev/samsa/internal/httpclient"
+	"github.com/metamorphosis-dev/samsa/internal/upstream"
 )

 type ServiceConfig struct {
 	UpstreamURL string
 	HTTPTimeout time.Duration
-	Cache         *cache.Cache
-	EnginesConfig *config.Config
+	Cache             *cache.Cache
+	CacheTTLOverrides map[string]time.Duration
+	EnginesConfig     *config.Config
 }

 type Service struct {
 	upstreamClient *upstream.Client
 	planner        *engines.Planner
 	localEngines   map[string]engines.Engine
-	cache          *cache.Cache
+	engineCache    *cache.EngineCache
 }

 func NewService(cfg ServiceConfig) *Service {
@@ -49,7 +52,7 @@ func NewService(cfg ServiceConfig) *Service {
 		timeout = 10 * time.Second
 	}
-	httpClient := &http.Client{Timeout: timeout}
+	httpClient := httpclient.NewClient(timeout)

 	var up *upstream.Client
 	if cfg.UpstreamURL != "" {
@@ -59,118 +62,177 @@ func NewService(cfg ServiceConfig) *Service {
 		}
 	}

+	var engineCache *cache.EngineCache
+	if cfg.Cache != nil {
+		engineCache = cache.NewEngineCache(cfg.Cache, cfg.CacheTTLOverrides)
+	}
+
 	return &Service{
 		upstreamClient: up,
 		planner:        engines.NewPlannerFromEnv(),
 		localEngines:   engines.NewDefaultPortedEngines(httpClient, cfg.EnginesConfig),
-		cache:          cfg.Cache,
+		engineCache:    engineCache,
 	}
 }

+// derefString returns the string value of a *string, or "" if nil.
+func derefString(s *string) string {
+	if s == nil {
+		return ""
+	}
+	return *s
+}
+
 // Search executes the request against local engines (in parallel) and
 // optionally the upstream instance for unported engines.
+//
+// Individual engine failures are reported as unresponsive_engines rather
+// than aborting the entire search.
+//
+// If a Valkey cache is configured and contains a cached response for this
+// request, the cached result is returned without hitting any engines.
 func (s *Service) Search(ctx context.Context, req SearchRequest) (SearchResponse, error) {
-	// Check cache first.
-	if s.cache != nil {
-		cacheKey := cache.Key(req)
-		if cached, hit := s.cache.Get(ctx, cacheKey); hit {
-			return cached, nil
-		}
-	}
-
-	merged, err := s.executeSearch(ctx, req)
-	if err != nil {
-		return SearchResponse{}, err
-	}
-
-	// Store in cache.
-	if s.cache != nil {
-		cacheKey := cache.Key(req)
-		s.cache.Set(ctx, cacheKey, merged)
-	}
-	return merged, nil
-}
-
-// executeSearch runs the actual engine queries and merges results.
-func (s *Service) executeSearch(ctx context.Context, req SearchRequest) (SearchResponse, error) {
+	queryHash := cache.QueryHash(
+		req.Query,
+		int(req.Pageno),
+		int(req.Safesearch),
+		req.Language,
+		derefString(req.TimeRange),
+	)
+
 	localEngineNames, upstreamEngineNames, _ := s.planner.Plan(req)

-	// Run all local engines concurrently.
-	type engineResult struct {
-		name string
-		resp contracts.SearchResponse
-		err  error
-	}
-	localResults := make([]engineResult, 0, len(localEngineNames))
-	var wg sync.WaitGroup
-	var mu sync.Mutex
-
-	for _, name := range localEngineNames {
-		eng, ok := s.localEngines[name]
-		if !ok {
-			mu.Lock()
-			localResults = append(localResults, engineResult{
-				name: name,
-				resp: unresponsiveResponse(req.Query, name, "engine_not_registered"),
-			})
-			mu.Unlock()
-			continue
-		}
-		wg.Add(1)
-		go func(name string, eng engines.Engine) {
-			defer wg.Done()
-			r, err := eng.Search(ctx, req)
-			mu.Lock()
-			defer mu.Unlock()
-			if err != nil {
-				localResults = append(localResults, engineResult{
-					name: name,
-					resp: unresponsiveResponse(req.Query, name, err.Error()),
-				})
-				return
-			}
-			localResults = append(localResults, engineResult{name: name, resp: r})
-		}(name, eng)
-	}
-	wg.Wait()
-
-	// Collect successful responses and determine upstream fallbacks.
-	responses := make([]contracts.SearchResponse, 0, len(localResults)+1)
-	upstreamSet := map[string]bool{}
-	for _, e := range upstreamEngineNames {
-		upstreamSet[e] = true
-	}
-	for _, lr := range localResults {
-		responses = append(responses, lr.resp)
-		// If a local engine returned nothing (e.g. qwant anti-bot), fall back
-		// to upstream if available.
-		if shouldFallbackToUpstream(lr.name, lr.resp) && !upstreamSet[lr.name] {
-			upstreamEngineNames = append(upstreamEngineNames, lr.name)
-			upstreamSet[lr.name] = true
-		}
-	}
+	// Phase 1: Parallel cache lookups — classify each engine as fresh/stale/miss
+	type cacheResult struct {
+		engine       string
+		cached       cache.CachedEngineResponse
+		hit          bool
+		fresh        *contracts.SearchResponse // nil if no fresh response
+		fetchErr     error
+		unmarshalErr bool // true if hit but unmarshal failed (treat as miss)
+	}
+	cacheResults := make([]cacheResult, len(localEngineNames))
+	var lookupWg sync.WaitGroup
+	for i, name := range localEngineNames {
+		lookupWg.Add(1)
+		go func(i int, name string) {
+			defer lookupWg.Done()
+			result := cacheResult{engine: name}
+			if s.engineCache != nil {
+				cached, ok := s.engineCache.Get(ctx, name, queryHash)
+				if ok {
+					result.hit = true
+					result.cached = cached
+					if !s.engineCache.IsStale(cached, name) {
+						// Fresh cache hit — deserialize and use directly
+						var resp contracts.SearchResponse
+						if err := json.Unmarshal(cached.Response, &resp); err == nil {
+							result.fresh = &resp
+						} else {
+							// Unmarshal failed — treat as cache miss (will fetch fresh synchronously)
+							result.unmarshalErr = true
+							result.hit = false // treat as miss
+						}
+					}
+					// If stale: result.fresh stays zero, result.cached has stale data
+				}
+			}
+			cacheResults[i] = result
+		}(i, name)
+	}
+	lookupWg.Wait()
+
+	// Phase 2: Fetch fresh for misses and stale entries
+	var fetchWg sync.WaitGroup
+	for i, name := range localEngineNames {
+		cr := cacheResults[i]
+		// Fresh hit — nothing to do in phase 2
+		if cr.hit && cr.fresh != nil {
+			continue
+		}
+		// Stale hit — return stale immediately, refresh in background
+		if cr.hit && len(cr.cached.Response) > 0 && s.engineCache != nil && s.engineCache.IsStale(cr.cached, name) {
+			fetchWg.Add(1)
+			go func(name string) {
+				defer fetchWg.Done()
+				eng, ok := s.localEngines[name]
+				if !ok {
+					return
+				}
+				freshResp, err := eng.Search(ctx, req)
+				if err != nil {
+					s.engineCache.Logger().Debug("background refresh failed", "engine", name, "error", err)
+					return
+				}
+				s.engineCache.Set(ctx, name, queryHash, freshResp)
+			}(name)
+			continue
+		}
+		// Cache miss — fetch fresh synchronously
+		if !cr.hit {
+			fetchWg.Add(1)
+			go func(i int, name string) {
+				defer fetchWg.Done()
+				eng, ok := s.localEngines[name]
+				if !ok {
+					cacheResults[i] = cacheResult{
+						engine:   name,
+						fetchErr: fmt.Errorf("engine not registered: %s", name),
+					}
+					return
+				}
+				freshResp, err := eng.Search(ctx, req)
+				if err != nil {
+					cacheResults[i] = cacheResult{
+						engine:   name,
+						fetchErr: err,
+					}
+					return
+				}
+				// Cache the fresh response
+				if s.engineCache != nil {
+					s.engineCache.Set(ctx, name, queryHash, freshResp)
+				}
+				cacheResults[i] = cacheResult{
+					engine: name,
+					fresh:  &freshResp,
+					hit:    false,
+				}
+			}(i, name)
+		}
+	}
+	fetchWg.Wait()
+
+	// Phase 3: Collect responses for merge
+	responses := make([]contracts.SearchResponse, 0, len(cacheResults))
+	for _, cr := range cacheResults {
+		if cr.fetchErr != nil {
+			responses = append(responses, unresponsiveResponse(req.Query, cr.engine, cr.fetchErr.Error()))
+			continue
+		}
+		// Use fresh data if available (fresh hit or freshly fetched), otherwise use stale cached
+		if cr.fresh != nil {
+			responses = append(responses, *cr.fresh)
+		} else if cr.hit && len(cr.cached.Response) > 0 {
+			var resp contracts.SearchResponse
+			if err := json.Unmarshal(cr.cached.Response, &resp); err == nil {
+				responses = append(responses, resp)
+			}
+		}
+	}

 	// Upstream proxy for unported (or fallback) engines.
 	if s.upstreamClient != nil && len(upstreamEngineNames) > 0 {
 		r, err := s.upstreamClient.SearchJSON(ctx, req, upstreamEngineNames)
 		if err != nil {
+			// Upstream failure is treated as a single unresponsive engine entry.
 			responses = append(responses, contracts.SearchResponse{
 				Query:               req.Query,
 				UnresponsiveEngines: [][2]string{{"upstream", err.Error()}},
@@ -195,12 +257,12 @@ func (s *Service) executeSearch(ctx context.Context, req SearchRequest) (SearchR
 func unresponsiveResponse(query, engine, reason string) contracts.SearchResponse {
 	return contracts.SearchResponse{
 		Query:               query,
 		NumberOfResults:     0,
 		Results:             []contracts.MainResult{},
 		Answers:             []map[string]any{},
 		Corrections:         []string{},
 		Infoboxes:           []map[string]any{},
 		Suggestions:         []string{},
 		UnresponsiveEngines: [][2]string{{engine, reason}},
 	}
 }
@@ -209,12 +271,12 @@ func unresponsiveResponse(query, engine, reason string) contracts.SearchResponse
 func emptyResponse(query string) contracts.SearchResponse {
 	return contracts.SearchResponse{
 		Query:               query,
 		NumberOfResults:     0,
 		Results:             []contracts.MainResult{},
 		Answers:             []map[string]any{},
 		Corrections:         []string{},
 		Infoboxes:           []map[string]any{},
 		Suggestions:         []string{},
 		UnresponsiveEngines: [][2]string{},
 	}
 }
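The cache key above folds in every parameter that can change an engine's output. A minimal sketch of what a `cache.QueryHash`-style derivation can look like (illustrative only; the real function lives in `internal/cache` and may differ). Joining parts with a separator that cannot appear in the inputs avoids collisions between, say, `"ab"+"c"` and `"a"+"bc"`:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// queryHash derives a deterministic cache key from the query and every
// parameter that affects engine results: page number, safe-search level,
// language, and time range. NUL is used as the join separator since it
// cannot occur in form values.
func queryHash(query string, pageno, safesearch int, language, timeRange string) string {
	parts := []string{query, strconv.Itoa(pageno), strconv.Itoa(safesearch), language, timeRange}
	sum := sha256.Sum256([]byte(strings.Join(parts, "\x00")))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := queryHash("cats", 1, 1, "en", "")
	b := queryHash("cats", 2, 1, "en", "")
	fmt.Println(a == b) // false: pageno is part of the key
}
```

Note the engine name is deliberately not part of the hash here: in the service above it is a separate dimension of the cache key (`Get(ctx, name, queryHash)`), which lets one query hash be shared across all engines.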

View file

@@ -6,8 +6,8 @@ import (
 	"testing"
 	"time"

-	"github.com/metamorphosis-dev/kafka/internal/contracts"
-	"github.com/metamorphosis-dev/kafka/internal/engines"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/engines"
 )

 // mockEngine is a test engine that returns a predefined response or error.

View file

@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify
@@ -16,7 +16,7 @@
 package search

-import "github.com/metamorphosis-dev/kafka/internal/contracts"
+import "github.com/metamorphosis-dev/samsa/internal/contracts"

 // Re-export the JSON contract types so the rest of the code can stay in the
 // `internal/search` namespace without creating an import cycle.

View file

@@ -1,4 +1,4 @@
-// kafka — a privacy-respecting metasearch engine
+// samsa — a privacy-respecting metasearch engine
 // Copyright (C) 2026-present metamorphosis-dev
 //
 // This program is free software: you can redistribute it and/or modify
@@ -27,7 +27,8 @@ import (
 	"strings"
 	"time"

-	"github.com/metamorphosis-dev/kafka/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/contracts"
+	"github.com/metamorphosis-dev/samsa/internal/httpclient"
 )

 type Client struct {
@@ -44,6 +45,9 @@ func NewClient(baseURL string, timeout time.Duration) (*Client, error) {
 	if err != nil {
 		return nil, fmt.Errorf("invalid upstream base URL: %w", err)
 	}
+	if u.Scheme != "http" && u.Scheme != "https" {
+		return nil, fmt.Errorf("upstream URL must use http or https, got %q", u.Scheme)
+	}

 	// Normalize: trim trailing slash to make URL concatenation predictable.
 	base := strings.TrimRight(u.String(), "/")
@@ -53,9 +57,7 @@ func NewClient(baseURL string, timeout time.Duration) (*Client, error) {
 	return &Client{
 		baseURL: base,
-		http: &http.Client{
-			Timeout: timeout,
-		},
+		http:    httpclient.NewClient(timeout),
 	}, nil
 }
@@ -108,7 +110,7 @@ func (c *Client) SearchJSON(ctx context.Context, req contracts.SearchRequest, en
 	}
 	if resp.StatusCode != http.StatusOK {
-		return contracts.SearchResponse{}, fmt.Errorf("upstream search failed: status=%d body=%q", resp.StatusCode, string(body))
+		return contracts.SearchResponse{}, fmt.Errorf("upstream search failed with status %d", resp.StatusCode)
 	}

 	// Decode upstream JSON into our contract types.
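The base-URL validation and normalization that `NewClient` performs can be sketched in isolation (the `normalizeBaseURL` helper below is hypothetical; the real logic is inline in `NewClient`). Trimming the trailing slash up front means later concatenation such as `base + "/search"` is predictable:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeBaseURL parses the upstream URL, rejects non-http(s) schemes,
// and strips any trailing slash so path concatenation never produces "//".
func normalizeBaseURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("invalid upstream base URL: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return "", fmt.Errorf("upstream URL must use http or https, got %q", u.Scheme)
	}
	return strings.TrimRight(u.String(), "/"), nil
}

func main() {
	base, err := normalizeBaseURL("https://searx.example.org/")
	fmt.Println(base, err) // https://searx.example.org <nil>
}
```

Rejecting schemes like `file:` or `gopher:` here closes off a class of SSRF tricks before any request is ever built.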

internal/util/validate.go Normal file
View file

@ -0,0 +1,127 @@
// samsa — a privacy-respecting metasearch engine
// Copyright (C) 2026-present metamorphosis-dev
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
package util
import (
"fmt"
"net"
"net/url"
"strings"
)
// SafeURLScheme validates that a URL is well-formed and uses an acceptable scheme.
// Returns the parsed URL on success, or an error.
func SafeURLScheme(raw string) (*url.URL, error) {
u, err := url.Parse(raw)
if err != nil {
return nil, err
}
if u.Scheme != "http" && u.Scheme != "https" {
return nil, fmt.Errorf("URL must use http or https, got %q", u.Scheme)
}
return u, nil
}
// IsPrivateIP returns true if the IP address is in a private, loopback,
// link-local, or otherwise non-routable range.
func IsPrivateIP(host string) bool {
// Strip port if present.
h, _, err := net.SplitHostPort(host)
if err != nil {
h = host
}
// Resolve hostname to IPs.
ips, err := net.LookupIP(h)
if err != nil || len(ips) == 0 {
// If we can't resolve, reject to be safe.
return true
}
for _, ip := range ips {
if isPrivateIPAddr(ip) {
return true
}
}
return false
}
func isPrivateIPAddr(ip net.IP) bool {
privateRanges := []struct {
network *net.IPNet
}{
// Loopback
{mustParseCIDR("127.0.0.0/8")},
{mustParseCIDR("::1/128")},
// RFC 1918
{mustParseCIDR("10.0.0.0/8")},
{mustParseCIDR("172.16.0.0/12")},
{mustParseCIDR("192.168.0.0/16")},
// RFC 6598 (Carrier-grade NAT)
{mustParseCIDR("100.64.0.0/10")},
// Link-local
{mustParseCIDR("169.254.0.0/16")},
{mustParseCIDR("fe80::/10")},
// IPv6 unique local
{mustParseCIDR("fc00::/7")},
// IPv4-mapped IPv6 loopback
{mustParseCIDR("::ffff:127.0.0.0/104")},
}
for _, r := range privateRanges {
if r.network.Contains(ip) {
return true
}
}
return false
}
func mustParseCIDR(s string) *net.IPNet {
_, network, err := net.ParseCIDR(s)
if err != nil {
panic(fmt.Sprintf("validate: invalid CIDR %q: %v", s, err))
}
return network
}
// ValidatePublicURL checks that a URL is well-formed, uses http or https,
// and does not point to a private/reserved IP range.
func ValidatePublicURL(raw string) error {
u, err := url.Parse(raw)
if err != nil {
return fmt.Errorf("invalid URL: %w", err)
}
if u.Scheme != "http" && u.Scheme != "https" {
return fmt.Errorf("URL must use http or https, got %q", u.Scheme)
}
if u.Host == "" {
return fmt.Errorf("URL must have a host")
}
if IsPrivateIP(u.Host) {
return fmt.Errorf("URL points to a private or reserved address: %s", u.Host)
}
return nil
}
// SanitizeResultURL ensures a URL is safe for rendering in an href attribute.
// It rejects javascript:, data:, vbscript: and other dangerous schemes.
func SanitizeResultURL(raw string) string {
if raw == "" {
return ""
}
u, err := url.Parse(raw)
if err != nil {
return ""
}
switch strings.ToLower(u.Scheme) {
case "http", "https", "":
return raw
default:
return ""
}
}
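A quick self-contained check of the scheme allowlist behavior, with the function body inlined from above so it can be exercised standalone (lowercased name to avoid shadowing the package API): relative URLs (empty scheme) and http/https pass through unchanged, while `javascript:` and friends are dropped to an empty string.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// sanitizeResultURL mirrors util.SanitizeResultURL: allow http, https, and
// scheme-less (relative) URLs; reject every other scheme outright.
func sanitizeResultURL(raw string) string {
	if raw == "" {
		return ""
	}
	u, err := url.Parse(raw)
	if err != nil {
		return ""
	}
	switch strings.ToLower(u.Scheme) {
	case "http", "https", "":
		return raw
	default:
		return ""
	}
}

func main() {
	fmt.Println(sanitizeResultURL("https://example.org/a")) // passes through unchanged
	fmt.Println(sanitizeResultURL("javascript:alert(1)"))   // rejected: prints an empty line
}
```

Returning `""` rather than an error keeps templates simple: an empty `href` renders as an inert link instead of failing the whole page.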

View file

@@ -7,7 +7,8 @@ var DEFAULT_PREFS = {
 	theme: 'system',
 	engines: ALL_ENGINES.slice(),
 	safeSearch: 'moderate',
-	format: 'html'
+	format: 'html',
+	favicon: 'none' // 'none' | 'google' | 'duckduckgo'
 };

 var STORAGE_KEY = 'kafka_prefs';
@@ -22,7 +23,8 @@ function loadPrefs() {
 		theme: parsed.theme || DEFAULT_PREFS.theme,
 		engines: parsed.engines || DEFAULT_PREFS.engines.slice(),
 		safeSearch: parsed.safeSearch || DEFAULT_PREFS.safeSearch,
-		format: parsed.format || DEFAULT_PREFS.format
+		format: parsed.format || DEFAULT_PREFS.format,
+		favicon: parsed.favicon || DEFAULT_PREFS.favicon
 	};
 } catch (e) {
 	prefs = DEFAULT_PREFS;
@ -43,6 +45,25 @@ function applyTheme(theme) {
} }
} }
function applyFavicon(service) {
var faviconMap = {
google: function(domain) { return 'https://www.google.com/s2/favicons?domain=' + encodeURIComponent(domain) + '&sz=32'; },
duckduckgo: function(domain) { return 'https://icons.duckduckgo.com/ip3/' + encodeURIComponent(domain) + '.ico'; },
self: function(domain) { return '/favicon/' + encodeURIComponent(domain); }
};
var imgs = document.querySelectorAll('.result-favicon');
imgs.forEach(function(img) {
var domain = img.getAttribute('data-domain');
if (!domain) return;
if (service === 'none') {
img.style.display = 'none';
} else if (faviconMap[service]) {
img.style.display = '';
img.src = faviconMap[service](domain);
}
});
}
function syncEngineInput(prefs) { function syncEngineInput(prefs) {
var input = document.getElementById('engines-input'); var input = document.getElementById('engines-input');
if (input) { if (input) {
@@ -103,26 +124,12 @@
 		engineToggles += '<label class="engine-toggle"><input type="checkbox" value="' + escapeHtml(name) + '"' + checked + '><span>' + escapeHtml(name) + '</span></label>';
 	});

-	var ssOptions = [
-		{ val: 'moderate', label: 'Moderate' },
-		{ val: 'strict', label: 'Strict' },
-		{ val: 'off', label: 'Off' }
-	];
-	var fmtOptions = [
-		{ val: 'html', label: 'HTML' },
-		{ val: 'json', label: 'JSON' },
-		{ val: 'csv', label: 'CSV' },
-		{ val: 'rss', label: 'RSS' }
-	];
-	var ssOptionsHtml = '';
-	var fmtOptionsHtml = '';
-	ssOptions.forEach(function(o) {
-		var sel = prefs.safeSearch === o.val ? ' selected' : '';
-		ssOptionsHtml += '<option value="' + o.val + '"' + sel + '>' + o.label + '</option>';
-	});
-	fmtOptions.forEach(function(o) {
-		var sel = prefs.format === o.val ? ' selected' : '';
-		fmtOptionsHtml += '<option value="' + o.val + '"' + sel + '>' + o.label + '</option>';
-	});
+	var faviconOptions = '';
+	['none', 'google', 'duckduckgo', 'self'].forEach(function(src) {
+		var labels = { none: 'None', google: 'Google', duckduckgo: 'DuckDuckGo', self: 'Self (Kafka)' };
+		var selected = prefs.favicon === src ? ' selected' : '';
+		faviconOptions += '<option value="' + src + '"' + selected + '>' + labels[src] + '</option>';
+	});

 	body.innerHTML =
@@ -145,6 +152,10 @@
 		'<label for="pref-format">Default format</label>' +
 		'<select id="pref-format">' + fmtOptionsHtml + '</select>' +
 		'</div>' +
+		'<div class="setting-row">' +
+		'<label for="pref-favicon">Favicon service</label>' +
+		'<select id="pref-favicon">' + faviconOptions + '</select>' +
+		'</div>' +
 		'</div>';
// Theme buttons // Theme buttons
@ -175,24 +186,6 @@ function renderPanel(prefs) {
})(checkboxes[j])); })(checkboxes[j]));
} }
-// Safe search
-var ssEl = panel.querySelector('#pref-safesearch');
-if (ssEl) {
-ssEl.addEventListener('change', function() {
-prefs.safeSearch = ssEl.value;
-savePrefs(prefs);
-});
-}
-// Format
-var fmtEl = panel.querySelector('#pref-format');
-if (fmtEl) {
-fmtEl.addEventListener('change', function() {
-prefs.format = fmtEl.value;
-savePrefs(prefs);
-});
-}
// Close button
var closeBtn = panel.querySelector('.settings-popover-close');
if (closeBtn) closeBtn.addEventListener('click', closePanel);
@@ -201,6 +194,7 @@ function renderPanel(prefs) {
function initSettings() {
var prefs = loadPrefs();
applyTheme(prefs.theme);
applyFavicon(prefs.favicon);
syncEngineInput(prefs);
renderPanel(prefs);
@@ -269,3 +263,80 @@ if (document.readyState === 'loading') {
} else {
initSettings();
}
// Preferences page navigation
function initPreferences() {
var nav = document.getElementById('preferences-nav');
if (!nav) return;
var sections = document.querySelectorAll('.pref-section');
var navItems = nav.querySelectorAll('.preferences-nav-item');
function showSection(id) {
sections.forEach(function(sec) {
sec.style.display = sec.id === 'section-' + id ? 'block' : 'none';
});
navItems.forEach(function(item) {
item.classList.toggle('active', item.getAttribute('data-section') === id);
});
}
navItems.forEach(function(item) {
item.addEventListener('click', function() {
showSection(item.getAttribute('data-section'));
});
});
// Load saved preferences
var prefs = loadPrefs();
// Apply favicon settings immediately on preferences page
applyFavicon(prefs.favicon);
// Theme
var themeEl = document.getElementById('pref-theme');
if (themeEl) {
themeEl.value = prefs.theme || 'system';
themeEl.addEventListener('change', function() {
prefs.theme = themeEl.value;
savePrefs(prefs);
applyTheme(prefs.theme);
});
}
// Safe search
var ssEl = document.getElementById('pref-safesearch');
if (ssEl) {
ssEl.value = prefs.safeSearch || 'moderate';
ssEl.addEventListener('change', function() {
prefs.safeSearch = ssEl.value;
savePrefs(prefs);
});
}
// Format (if exists on page)
var fmtEl = document.getElementById('pref-format');
if (fmtEl) {
fmtEl.value = prefs.format || 'html';
fmtEl.addEventListener('change', function() {
prefs.format = fmtEl.value;
savePrefs(prefs);
});
}
// Favicon service (if exists on page)
var faviconEl = document.getElementById('pref-favicon');
if (faviconEl) {
faviconEl.value = prefs.favicon || 'none';
faviconEl.addEventListener('change', function() {
prefs.favicon = faviconEl.value;
savePrefs(prefs);
applyFavicon(prefs.favicon);
});
}
// Show first section by default
showSection('search');
}
document.addEventListener('DOMContentLoaded', initPreferences);

File diff suppressed because it is too large.


@@ -0,0 +1,15 @@
{{define "image_item"}}
<a class="image-result" href="{{.URL}}" target="_blank" rel="noopener noreferrer">
<div class="image-thumb">
{{if .Thumbnail}}
<img src="{{.Thumbnail}}" alt="{{.Title}}" loading="lazy">
{{else}}
<div class="image-placeholder" aria-hidden="true">🖼️</div>
{{end}}
</div>
<div class="image-meta">
<span class="image-title">{{.Title}}</span>
{{if .Content}}<span class="image-source">{{.Content}}</span>{{end}}
</div>
</a>
{{end}}


@@ -1,24 +1,25 @@
{{define "title"}}{{end}}
{{define "content"}}
<div class="home-container">
<a href="/" class="home-logo">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" aria-hidden="true">
<circle cx="11" cy="11" r="8"/>
<path d="m21 21-4.35-4.35"/>
</svg>
<span class="home-logo-text">samsa</span>
</a>
<p class="home-tagline">Private meta-search, powered by open source.</p>

<form class="search-form" method="GET" action="/search" role="search">
<div class="search-box">
<input type="text" name="q" placeholder="Search the web…" autocomplete="off" autofocus>
<button type="submit" class="search-btn" aria-label="Search">
<svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.5" stroke-linecap="round" stroke-linejoin="round">
<circle cx="11" cy="11" r="8"/>
<path d="m21 21-4.35-4.35"/>
</svg>
</button>
</div>
</form>
</div>
{{end}}


@@ -1,12 +1,12 @@
<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
<ShortName>samsa</ShortName>
<Description>A privacy-respecting, open metasearch engine</Description>
<InputEncoding>UTF-8</InputEncoding>
<OutputEncoding>UTF-8</OutputEncoding>
<LongName>samsa — Privacy-respecting metasearch</LongName>
<Image width="16" height="16" type="image/svg+xml">/static/img/favicon.svg</Image>
<Contact>https://git.ashisgreat.xyz/penal-colony/samsa</Contact>
<Url type="text/html" method="GET" template="{baseUrl}/search?q={searchTerms}&amp;format=html">
<Param name="pageno" value="{startPage?}" />
<Param name="language" value="{language?}" />


@@ -0,0 +1,112 @@
{{define "title"}}Preferences{{end}}
{{define "content"}}
<div class="preferences-container">
<h1 class="preferences-title">Preferences</h1>
<form class="preferences-form" method="POST" action="/preferences">
<section class="pref-section">
<h2 class="pref-section-title">Appearance</h2>
<div class="pref-row">
<label for="theme-select">Theme</label>
<select name="theme" id="theme-select">
<option value="light" {{if eq .Theme "light"}}selected{{end}}>Light</option>
<option value="dark" {{if eq .Theme "dark"}}selected{{end}}>Dark</option>
</select>
</div>
</section>
<section class="pref-section">
<h2 class="pref-section-title">Search Engines</h2>
<p class="pref-desc">Select which engines to use for searches.</p>
<div class="engine-grid">
<label class="engine-toggle">
<input type="checkbox" name="engine" value="google" checked>
<span>Google</span>
</label>
<label class="engine-toggle">
<input type="checkbox" name="engine" value="duckduckgo" checked>
<span>DuckDuckGo</span>
</label>
<label class="engine-toggle">
<input type="checkbox" name="engine" value="bing" checked>
<span>Bing</span>
</label>
<label class="engine-toggle">
<input type="checkbox" name="engine" value="brave" checked>
<span>Brave</span>
</label>
<label class="engine-toggle">
<input type="checkbox" name="engine" value="wikipedia" checked>
<span>Wikipedia</span>
</label>
<label class="engine-toggle">
<input type="checkbox" name="engine" value="wikidata" checked>
<span>Wikidata</span>
</label>
<label class="engine-toggle">
<input type="checkbox" name="engine" value="github">
<span>GitHub</span>
</label>
<label class="engine-toggle">
<input type="checkbox" name="engine" value="reddit">
<span>Reddit</span>
</label>
<label class="engine-toggle">
<input type="checkbox" name="engine" value="youtube">
<span>YouTube</span>
</label>
</div>
</section>
<section class="pref-section">
<h2 class="pref-section-title">Privacy</h2>
<div class="pref-row">
<div class="pref-row-info">
<label>Safe Search</label>
<p class="pref-desc">Filter explicit content from results</p>
</div>
<select name="safesearch">
<option value="0">Off</option>
<option value="1" selected>Moderate</option>
<option value="2">Strict</option>
</select>
</div>
<div class="pref-row">
<div class="pref-row-info">
<label for="pref-favicon">Favicon Service</label>
<p class="pref-desc">Fetch favicons for result URLs. "None" is most private.</p>
</div>
<select name="favicon" id="pref-favicon">
<option value="none" {{if eq .FaviconService "none"}}selected{{end}}>None</option>
<option value="google" {{if eq .FaviconService "google"}}selected{{end}}>Google</option>
<option value="duckduckgo" {{if eq .FaviconService "duckduckgo"}}selected{{end}}>DuckDuckGo</option>
<option value="self" {{if eq .FaviconService "self"}}selected{{end}}>Self (Kafka)</option>
</select>
</div>
</section>
<section class="pref-section">
<h2 class="pref-section-title">Language</h2>
<div class="pref-row">
<label for="search-lang">Interface &amp; Search Language</label>
<select name="language" id="search-lang">
<option value="all" selected>All languages</option>
<option value="en">English</option>
<option value="de">Deutsch</option>
<option value="fr">Français</option>
<option value="es">Español</option>
<option value="zh">中文</option>
<option value="ja">日本語</option>
<option value="ru">Русский</option>
</select>
</div>
</section>
<div class="pref-actions">
<a href="/" class="btn-secondary">Cancel</a>
<button type="submit" class="btn-primary">Save Preferences</button>
</div>
</form>
</div>
{{end}}


@@ -1,15 +1,17 @@
{{define "result_item"}}
<article class="result" data-engine="{{.Engine}}">
<div class="result_header">
<a href="{{.URL}}" target="_blank" rel="noopener noreferrer">{{.SafeTitle}}</a>
</div>
<div class="result_url">
{{if .FaviconIconURL}}
<img class="result-favicon" src="{{.FaviconIconURL}}" alt="" loading="lazy" width="14" height="14">
{{end}}
<a href="{{.URL}}" target="_blank" rel="noopener noreferrer">{{.URL}}</a>
<span class="engine-badge" data-engine="{{.Engine}}">{{.Engine}}</span>
</div>
{{if .Content}}
<p class="result_content">{{.SafeContent}}</p>
{{end}}
</article>
{{end}}


@@ -1,46 +1,43 @@
{{define "title"}}{{if .Query}}{{.Query}} — {{end}}{{end}}
{{define "content"}}
<div class="results-container">
<div class="results-header">
<div class="results-header-inner">
<a href="/" class="results-logo">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" aria-hidden="true">
<circle cx="11" cy="11" r="8"/>
<path d="m21 21-4.35-4.35"/>
</svg>
<span>samsa</span>
</a>

<form class="header-search" method="GET" action="/search" role="search">
{{if and .ActiveCategory (ne .ActiveCategory "all")}}
<input type="hidden" name="category" value="{{.ActiveCategory}}">
{{end}}
<div class="search-box">
<input type="text" name="q" value="{{.Query}}" placeholder="Search…" autocomplete="off">
<button type="submit" class="search-btn" aria-label="Search">
<svg width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.5" stroke-linecap="round" stroke-linejoin="round">
<circle cx="11" cy="11" r="8"/>
<path d="m21 21-4.35-4.35"/>
</svg>
</button>
</div>
</form>
</div>
</div>
<div class="category-tabs" role="tablist">
<a href="/search?q={{.Query | urlquery}}&amp;category=" class="category-tab {{if or (eq .ActiveCategory "") (eq .ActiveCategory "all")}}active{{end}}">All</a>
<a href="/search?q={{.Query | urlquery}}&amp;category=general" class="category-tab {{if eq .ActiveCategory "general"}}active{{end}}">General</a>
<a href="/search?q={{.Query | urlquery}}&amp;category=it" class="category-tab {{if eq .ActiveCategory "it"}}active{{end}}">IT</a>
<a href="/search?q={{.Query | urlquery}}&amp;category=news" class="category-tab {{if eq .ActiveCategory "news"}}active{{end}}">News</a>
<a href="/search?q={{.Query | urlquery}}&amp;category=images" class="category-tab {{if eq .ActiveCategory "images"}}active{{end}}">Images</a>
</div>
<div class="results-content">
{{template "results_inner" .}}
</div>
</div>
{{end}}


@@ -1,24 +1,52 @@
{{define "results_inner"}}
{{if .Corrections}}
<div id="corrections" class="correction">{{range .Corrections}}{{.}} {{end}}</div>
{{end}}
{{if .Infoboxes}}
<div class="infobox-list" role="region" aria-label="Summary">
{{range .Infoboxes}}
<aside class="infobox-card">
{{if .ImgSrc}}
<div class="infobox-image-wrap">
<img src="{{.ImgSrc}}" alt="" class="infobox-img" loading="lazy" width="120" height="120">
</div>
{{end}}
<div class="infobox-main">
{{if .Title}}<h2 class="infobox-title">{{.Title}}</h2>{{end}}
{{if .Content}}<p class="infobox-content">{{.Content}}</p>{{end}}
{{if .URL}}<a href="{{.URL}}" class="infobox-link" target="_blank" rel="noopener noreferrer">Read article on Wikipedia</a>{{end}}
</div>
</aside>
{{end}}
</div>
{{end}}
{{if .UnresponsiveEngines}}
<div class="engine-errors-wrap" role="region" aria-label="Engine errors">
<details class="engine-errors">
<summary>Some search engines had errors</summary>
<ul class="engine-errors-list">
{{range .UnresponsiveEngines}}
<li class="engine-error-item">
<code class="engine-error-engine">{{index . 0}}</code>
<span class="engine-error-reason">{{index . 1}}</span>
</li>
{{end}}
</ul>
</details>
</div>
{{end}}
{{if .Answers}}
<div id="answers">
{{range .Answers}}
<div class="dialog-error">{{.}}</div>
{{end}}
</div>
{{end}}
<div class="results-meta" id="results-meta">
{{if .NumberOfResults}}
<span>{{.NumberOfResults}} results</span>
{{end}}
@@ -26,16 +54,28 @@
<div id="urls" role="main">
{{if .Results}}
{{if .IsImageSearch}}
<div class="image-grid">
{{range .Results}}
{{if eq .Template "images"}}
{{template "image_item" .}}
{{end}}
{{end}}
</div>
{{else}}
{{range .Results}}
{{if eq .Template "videos"}}
{{template "video_item" .}}
{{else if eq .Template "images"}}
{{template "image_item" .}}
{{else}}
{{template "result_item" .}}
{{end}}
{{end}}
{{end}}
{{else if and (not .Answers) (not .Infoboxes)}}
<div class="no-results">
<div class="no-results-icon" aria-hidden="true">🔍</div>
<h2>No results found</h2>
<p>Try different keywords or check your spelling.</p>
</div>
@@ -43,40 +83,26 @@
</div>
{{if .Pageno}}
<nav class="pagination" role="navigation" aria-label="Pagination">
{{if gt .Pageno 1}}
<a class="pag-link" href="/search?q={{.Query | urlquery}}&amp;pageno={{.PrevPage}}{{if and .ActiveCategory (ne .ActiveCategory "all")}}&amp;category={{.ActiveCategory | urlquery}}{{end}}">← Prev</a>
{{end}}
{{range .PageNumbers}}
{{if .IsCurrent}}
<span class="page-current" aria-current="page">{{.Num}}</span>
{{else}}
<a class="pag-link" href="/search?q={{$.Query | urlquery}}&amp;pageno={{.Num}}{{if and $.ActiveCategory (ne $.ActiveCategory "all")}}&amp;category={{$.ActiveCategory | urlquery}}{{end}}">{{.Num}}</a>
{{end}}
{{end}}
{{if .HasNext}}
<a class="pag-link" href="/search?q={{.Query | urlquery}}&amp;pageno={{.NextPage}}{{if and .ActiveCategory (ne .ActiveCategory "all")}}&amp;category={{.ActiveCategory | urlquery}}{{end}}">Next →</a>
{{end}}
</nav>
{{end}}
<div class="back-to-top">
<a href="#top">↑ Back to top</a>
</div>
{{end}}


@@ -1,5 +1,5 @@
{{define "video_item"}}
<article class="result video-result" data-engine="{{.Engine}}">
{{if .Thumbnail}}
<div class="result_thumbnail">
<a href="{{.URL}}" target="_blank" rel="noopener noreferrer">
@@ -9,13 +9,19 @@
{{end}}
<div class="result_content_wrapper">
<div class="result_header">
<a href="{{.URL}}" target="_blank" rel="noopener noreferrer">{{.SafeTitle}}</a>
</div>
<div class="result_url">
{{if .FaviconIconURL}}
<img class="result-favicon" src="{{.FaviconIconURL}}" alt="" loading="lazy" width="14" height="14">
{{end}}
{{if .URL}}
<a href="{{.URL}}" target="_blank" rel="noopener noreferrer">{{.URL}}</a>
{{end}}
<span class="engine-badge" data-engine="{{.Engine}}">{{.Engine}}</span>
</div>
{{if .Content}}
<p class="result_content">{{.SafeContent}}</p>
{{end}}
</div>
</article>


@@ -1,4 +1,4 @@
// samsa — a privacy-respecting metasearch engine
// Copyright (C) 2026-present metamorphosis-dev
//
// This program is free software: you can redistribute it and/or modify
@@ -18,13 +18,17 @@ package views
import (
"embed"
"encoding/xml"
"html"
"html/template"
"io/fs"
"net/http"
"net/url"
"strconv"
"strings"

"github.com/metamorphosis-dev/samsa/internal/contracts"
"github.com/metamorphosis-dev/samsa/internal/util"
)
//go:embed all:templates
@@ -50,6 +54,19 @@ type PageData struct {
UnresponsiveEngines [][2]string
PageNumbers []PageNumber
ShowHeader bool
IsImageSearch bool
// Theme is the user's selected theme (light/dark) from cookie
Theme string
FaviconService string
// New fields for three-column layout
Categories []string
CategoryIcons map[string]string
DisabledCategories []string
ActiveCategory string
TimeFilters []FilterOption
TypeFilters []FilterOption
ActiveTime string
ActiveType string
}
// ResultView is a template-friendly wrapper around a MainResult.
@@ -58,12 +75,20 @@ type ResultView struct {
// TemplateName is the actual template to dispatch to, computed from Template.
// "videos" maps to "video_item", everything else maps to "result_item".
TemplateName string
// Domain is the hostname extracted from the result URL, used for favicon proxying.
Domain string
// FaviconIconURL is the resolved favicon image URL for the user's favicon preference (empty = hide).
FaviconIconURL string
// SafeTitle and SafeContent are HTML-unescaped versions for rendering.
// The API returns HTML entities which Go templates escape by default.
SafeTitle template.HTML
SafeContent template.HTML
}

// PageNumber represents a numbered pagination button.
type PageNumber struct {
Num int
IsCurrent bool
}
// InfoboxView is a template-friendly infobox.
@@ -71,12 +96,20 @@ type InfoboxView struct {
Title string
Content string
ImgSrc string
URL string
}

// FilterOption represents a filter radio option for the sidebar.
type FilterOption struct {
Label string
Value string
}

var (
tmplFull *template.Template
tmplIndex *template.Template
tmplFragment *template.Template
tmplPreferences *template.Template
)
func init() {
@@ -88,13 +121,16 @@ func init() {
}
tmplFull = template.Must(template.New("").Funcs(funcMap).ParseFS(tmplFS,
"base.html", "results.html", "results_inner.html", "result_item.html", "video_item.html", "image_item.html",
))
tmplIndex = template.Must(template.New("").Funcs(funcMap).ParseFS(tmplFS,
"base.html", "index.html",
))
tmplFragment = template.Must(template.New("").Funcs(funcMap).ParseFS(tmplFS,
"results.html", "results_inner.html", "result_item.html", "video_item.html", "image_item.html",
))
tmplPreferences = template.Must(template.New("").Funcs(funcMap).ParseFS(tmplFS,
"base.html", "preferences.html",
))
}
@@ -103,25 +139,90 @@ func StaticFS() (fs.FS, error) {
return fs.Sub(staticFS, "static")
}

// OpenSearchXML returns the OpenSearch description XML with the base URL
// safely embedded via xml.Escape (no raw string interpolation).
func OpenSearchXML(baseURL string) ([]byte, error) {
tmplFS, _ := fs.Sub(templatesFS, "templates")
data, err := fs.ReadFile(tmplFS, "opensearch.xml")
if err != nil {
return nil, err
}
var buf strings.Builder
xml.Escape(&buf, []byte(baseURL))
escapedBaseURL := buf.String()
result := strings.ReplaceAll(string(data), "{baseUrl}", escapedBaseURL)
return []byte(result), nil
}
// faviconIconURL returns a safe img src for the given service and hostname, or "" for none/invalid.
func faviconIconURL(service, domain string) string {
domain = strings.TrimSpace(domain)
if domain == "" {
return ""
}
switch service {
case "google":
return "https://www.google.com/s2/favicons?domain=" + url.QueryEscape(domain) + "&sz=32"
case "duckduckgo":
return "https://icons.duckduckgo.com/ip3/" + domain + ".ico"
case "self":
return "/favicon/" + domain
default:
return ""
}
}
// FromResponse builds PageData from a search response and request params.
func FromResponse(resp contracts.SearchResponse, query string, pageno int, activeCategory, activeTime, activeType, faviconService string) PageData {
// Set defaults
if activeCategory == "" {
activeCategory = "all"
}
pd := PageData{
Query: query,
Pageno: pageno,
NumberOfResults: resp.NumberOfResults,
UnresponsiveEngines: resp.UnresponsiveEngines,
FaviconService: faviconService,
// New: categories with icons
Categories: []string{"all", "news", "images", "videos", "maps"},
DisabledCategories: []string{"shopping", "music", "weather"},
CategoryIcons: map[string]string{
"all": "🌐",
"news": "📰",
"images": "🖼️",
"videos": "🎬",
"maps": "🗺️",
"shopping": "🛒",
"music": "🎵",
"weather": "🌤️",
},
ActiveCategory: activeCategory,
IsImageSearch: activeCategory == "images",
// Time filters
TimeFilters: []FilterOption{
{Label: "Any time", Value: ""},
{Label: "Past hour", Value: "h"},
{Label: "Past 24 hours", Value: "d"},
{Label: "Past week", Value: "w"},
{Label: "Past month", Value: "m"},
{Label: "Past year", Value: "y"},
},
ActiveTime: activeTime,
// Type filters
TypeFilters: []FilterOption{
{Label: "All results", Value: ""},
{Label: "News", Value: "news"},
{Label: "Videos", Value: "video"},
{Label: "Images", Value: "image"},
},
ActiveType: activeType,
} }
// Convert results.
@@ -131,7 +232,24 @@ func FromResponse(resp contracts.SearchResponse, query string, pageno int) PageD
if r.Template == "videos" {
tmplName = "video_item"
}
// Sanitize URLs to prevent javascript:/data: scheme injection.
var domain string
if r.URL != nil {
safe := util.SanitizeResultURL(*r.URL)
r.URL = &safe
if u, err := url.Parse(safe); err == nil {
domain = u.Hostname()
}
}
r.Thumbnail = util.SanitizeResultURL(r.Thumbnail)
pd.Results[i] = ResultView{
MainResult: r,
TemplateName: tmplName,
Domain: domain,
FaviconIconURL: faviconIconURL(faviconService, domain),
SafeTitle: template.HTML(html.UnescapeString(r.Title)),
SafeContent: template.HTML(html.UnescapeString(r.Content)),
}
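The implementation of `util.SanitizeResultURL` is not part of this diff; the comment above only says it guards against `javascript:`/`data:` scheme injection. A hypothetical stand-in showing the usual scheme-allowlist shape such a sanitizer takes (the function name and exact policy here are assumptions):

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// sanitizeResultURL is a hypothetical sketch of a scheme allowlist:
// keep http, https, and scheme-less (relative or protocol-relative)
// URLs; reject everything else, including javascript: and data:.
func sanitizeResultURL(raw string) string {
	u, err := url.Parse(strings.TrimSpace(raw))
	if err != nil {
		return ""
	}
	switch strings.ToLower(u.Scheme) {
	case "http", "https", "":
		return raw
	default: // javascript:, data:, vbscript:, ...
		return ""
	}
}

func main() {
	fmt.Println(sanitizeResultURL("https://example.com/x")) // kept
	fmt.Println(sanitizeResultURL("javascript:alert(1)"))   // rejected: ""
}
```

Allowlisting (rather than blocklisting bad schemes) is the safer default here, since new dangerous schemes fail closed.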
}

// Convert answers (they're map[string]any — extract string values).
@@ -154,9 +272,12 @@ func FromResponse(resp contracts.SearchResponse, query string, pageno int) PageD
iv.Title = v
}
if v, ok := ib["img_src"].(string); ok {
iv.ImgSrc = util.SanitizeResultURL(v)
}
if v, ok := ib["url"].(string); ok {
iv.URL = util.SanitizeResultURL(v)
}
if iv.Title != "" || iv.Content != "" || iv.ImgSrc != "" {
pd.Infoboxes = append(pd.Infoboxes, iv)
}
}
@@ -188,9 +309,9 @@ func FromResponse(resp contracts.SearchResponse, query string, pageno int) PageD
}

// RenderIndex renders the homepage (search box only).
func RenderIndex(w http.ResponseWriter, sourceURL, theme string) error {
w.Header().Set("Content-Type", "text/html; charset=utf-8")
return tmplIndex.ExecuteTemplate(w, "base", PageData{ShowHeader: true, SourceURL: sourceURL, Theme: theme})
}

// RenderSearch renders the full search results page (with base layout).
@@ -233,4 +354,13 @@ func RenderSearchAuto(w http.ResponseWriter, r *http.Request, data PageData) err
return RenderSearch(w, data)
}
// RenderPreferences renders the full preferences page.
func RenderPreferences(w http.ResponseWriter, sourceURL, theme, faviconService string) error {
w.Header().Set("Content-Type", "text/html; charset=utf-8")
return tmplPreferences.ExecuteTemplate(w, "base", PageData{
ShowHeader: true,
SourceURL: sourceURL,
Theme: theme,
FaviconService: faviconService,
})
}


@@ -1,9 +1,10 @@
package views

import (
"strings"
"testing"

"github.com/metamorphosis-dev/samsa/internal/contracts"
)

func mockSearchResponse(query string, numResults int) contracts.SearchResponse {
@@ -36,11 +37,11 @@ func mockEmptyResponse() contracts.SearchResponse {
}

func TestFromResponse_Basic(t *testing.T) {
resp := mockSearchResponse("samsa trial", 42)
data := FromResponse(resp, "samsa trial", 1, "", "", "", "none")
if data.Query != "samsa trial" {
t.Errorf("expected query 'samsa trial', got %q", data.Query)
}
if data.NumberOfResults != 42 {
t.Errorf("expected 42 results, got %d", data.NumberOfResults)
@@ -55,7 +56,7 @@ func TestFromResponse_Basic(t *testing.T) {
 
 func TestFromResponse_Pagination(t *testing.T) {
 	resp := mockSearchResponse("test", 100)
-	data := FromResponse(resp, "test", 3)
+	data := FromResponse(resp, "test", 3, "", "", "", "none")
 
 	if data.PrevPage != 2 {
 		t.Errorf("expected PrevPage 2, got %d", data.PrevPage)
@@ -80,7 +81,7 @@ func TestFromResponse_Pagination(t *testing.T) {
 }
 
 func TestFromResponse_Empty(t *testing.T) {
-	data := FromResponse(mockEmptyResponse(), "", 1)
+	data := FromResponse(mockEmptyResponse(), "", 1, "", "", "", "none")
 
 	if data.NumberOfResults != 0 {
 		t.Errorf("expected 0 results, got %d", data.NumberOfResults)
@@ -90,6 +91,31 @@ func TestFromResponse_Empty(t *testing.T) {
 	}
 }
+func TestFromResponse_FaviconIconURL(t *testing.T) {
+	u := "https://example.com/path"
+	resp := contracts.SearchResponse{
+		Query:               "q",
+		NumberOfResults:     1,
+		Results:             []contracts.MainResult{{Title: "t", URL: &u, Engine: "bing"}},
+		Answers:             []map[string]any{},
+		Corrections:         []string{},
+		Infoboxes:           []map[string]any{},
+		Suggestions:         []string{},
+		UnresponsiveEngines: [][2]string{},
+	}
+	data := FromResponse(resp, "q", 1, "", "", "", "google")
+	if len(data.Results) != 1 {
+		t.Fatalf("expected 1 result, got %d", len(data.Results))
+	}
+	got := data.Results[0].FaviconIconURL
+	if got == "" || !strings.Contains(got, "google.com/s2/favicons") {
+		t.Fatalf("expected google favicon URL, got %q", got)
+	}
+	if !strings.Contains(got, "example.com") {
+		t.Fatalf("expected domain in favicon URL, got %q", got)
+	}
+}
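The test above only checks that `FaviconIconURL` contains `google.com/s2/favicons` and the result's domain. A minimal sketch of a helper that would satisfy those assertions might look like the following; the function name `faviconURL`, the `sz=64` size parameter, and the handling of non-`google` services are assumptions, not the project's actual implementation:

```go
package main

import (
	"fmt"
	"net/url"
)

// faviconURL builds an icon URL for a search result. Only the "google"
// service is sketched here, pointing at Google's s2/favicons endpoint;
// any other service name (including "none") yields no icon.
func faviconURL(service, rawURL string) string {
	u, err := url.Parse(rawURL)
	if err != nil || u.Host == "" {
		return "" // unparsable result URLs get no favicon
	}
	switch service {
	case "google":
		return "https://www.google.com/s2/favicons?sz=64&domain=" + u.Host
	default:
		return ""
	}
}

func main() {
	// Same inputs as the test: service "google", result https://example.com/path.
	fmt.Println(faviconURL("google", "https://example.com/path"))
}
```

Keying only on `u.Host` means query strings and paths in the result URL never leak into the favicon request, which is why the test can assert on the bare domain.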
 func TestIsHTMXRequest(t *testing.T) {
 	tests := []struct {
 		name string