Commit graph

11 commits

Author SHA1 Message Date
a7f594b7fa feat: add YouTube engine with config file and env support
YouTube Data API v3 engine:
- Add YouTubeConfig to EnginesConfig with api_key field
- Add YOUTUBE_API_KEY env override
- Thread *config.Config through search service to factory
- Factory falls back to env vars if config fields are empty
- Update config.example.toml with youtube section

Also update default local_ported to include google and youtube.
2026-03-22 01:57:13 +00:00
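
The config-then-environment fallback described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the shape of YouTubeConfig and the helper name resolveYouTubeKey are assumptions; only the api_key field and the YOUTUBE_API_KEY variable come from the commit message.

```go
package main

import (
	"fmt"
	"os"
)

// YouTubeConfig mirrors the api_key field named in the commit message.
// The struct layout itself is an assumption made for this sketch.
type YouTubeConfig struct {
	APIKey string
}

// resolveYouTubeKey (hypothetical helper) prefers the value from the config
// file and falls back to the YOUTUBE_API_KEY environment variable when the
// config field is empty. getenv is injected so the lookup is easy to test.
func resolveYouTubeKey(cfg YouTubeConfig, getenv func(string) string) string {
	if cfg.APIKey != "" {
		return cfg.APIKey
	}
	return getenv("YOUTUBE_API_KEY")
}

func main() {
	key := resolveYouTubeKey(YouTubeConfig{}, os.Getenv)
	if key == "" {
		fmt.Println("youtube engine disabled: no API key configured")
		return
	}
	fmt.Println("youtube engine enabled")
}
```

Passing the lookup function in, rather than calling os.Getenv directly, keeps the fallback testable without mutating the process environment.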
1689cab9bd feat: add YouTube engine via Data API v3
Uses the official YouTube Data API v3. Requires YOUTUBE_API_KEY
environment variable (free from Google Cloud Console).

Returns video results with title, description, channel, publish
date, and thumbnail URL. Falls back gracefully if no API key.
2026-03-22 01:53:19 +00:00
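
As a rough sketch of what a request to the YouTube Data API v3 search endpoint might look like, including the "no API key" case mentioned above: the function name buildSearchURL, the sentinel error, and the choice of parameters are assumptions for illustration, not the engine's actual implementation.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// errNoAPIKey lets the caller skip the engine instead of failing the search.
var errNoAPIKey = errors.New("youtube: no API key configured")

// buildSearchURL (hypothetical helper) assembles a search.list request URL.
// part=snippet returns title, description, channel title, publish date,
// and thumbnails for each hit; type=video restricts results to videos.
func buildSearchURL(apiKey, query string, maxResults int) (string, error) {
	if apiKey == "" {
		return "", errNoAPIKey
	}
	params := url.Values{}
	params.Set("part", "snippet")
	params.Set("type", "video")
	params.Set("q", query)
	params.Set("maxResults", fmt.Sprint(maxResults))
	params.Set("key", apiKey)
	return "https://www.googleapis.com/youtube/v3/search?" + params.Encode(), nil
}

func main() {
	u, err := buildSearchURL("", "franz kafka", 10)
	if errors.Is(err, errNoAPIKey) {
		fmt.Println("skipping youtube engine:", err)
		return
	}
	fmt.Println(u)
}
```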
31fdd5e06f Merge branch 'feat/google-engine', remote-tracking branch 'origin/main' 2026-03-22 01:35:20 +00:00
4be9cf2725 feat: add Google engine using GSA User-Agent scraping
SearXNG approach: use a pool of Google Search App (GSA) User-Agent
strings. GSA here is Google's own iOS app, not the Google Search
Appliance; requests carrying these identifiers seem less likely to
be blocked.

Key techniques:
- GSA User-Agent (iPhone OS + GSA/ version) instead of Chrome desktop
- CONSENT=YES+ cookie to bypass EU consent wall
- Parse /url?q= redirector URLs (unquote + strip &sa= params)
- div.MjjYud class for result containers (SearXNG selector)
- data-sncf divs for snippets
- detect sorry.google.com blocks
- Suggestions from ouy7Mc class cards
2026-03-22 01:29:46 +00:00
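
Two of the techniques listed above, unwrapping /url?q= redirector links and detecting sorry.google.com blocks, can be sketched as below. Both function names are hypothetical, and the sketch assumes the target URL is percent-encoded inside the q parameter; it is not the parser from the commit.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// unwrapGoogleRedirect (hypothetical helper) extracts the real target from
// an href of the form "/url?q=<target>&sa=...". Parsing the query string
// both unquotes the target and drops tracking parameters such as sa and ved.
// Hrefs that are not redirector links are returned unchanged.
func unwrapGoogleRedirect(href string) string {
	if !strings.HasPrefix(href, "/url?") {
		return href
	}
	u, err := url.Parse(href)
	if err != nil {
		return href
	}
	if target := u.Query().Get("q"); target != "" {
		return target
	}
	return href
}

// isSorryPage (hypothetical helper) reports whether the final URL of a
// response points at Google's block page host.
func isSorryPage(finalURL string) bool {
	u, err := url.Parse(finalURL)
	if err != nil {
		return false
	}
	return u.Hostname() == "sorry.google.com"
}

func main() {
	fmt.Println(unwrapGoogleRedirect("/url?q=https%3A%2F%2Fexample.com%2Fa%3Fb%3D1&sa=U&ved=xyz"))
	fmt.Println(isSorryPage("https://sorry.google.com/sorry/index"))
}
```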
fcd9be16df refactor: remove SearXNG references and rename binary to kafka
- Rename cmd/searxng-go to cmd/kafka
- Remove all SearXNG references from source comments while keeping
  "SearXNG-compatible API" in user-facing docs
- Update binary paths in README, CLAUDE.md, and Dockerfile
- Update log message to "kafka starting"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 01:47:03 +01:00
0d3f3c19d7 fix: add missing engines to defaultPortedEngines
duckduckgo, github, reddit, and bing were registered in factory.go
and config.go but missing from planner.go, so they were silently
skipped when LOCAL_PORTED_ENGINES was not set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 00:13:57 +01:00
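
The failure mode fixed above (engines silently skipped when LOCAL_PORTED_ENGINES is unset) comes down to which default list is used when the variable is empty. A minimal sketch, assuming a comma-separated variable format and a hypothetical portedEngines helper; the default list is the nine engines named in the earlier feature commit:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// defaultPortedEngines is used when LOCAL_PORTED_ENGINES is unset or empty.
// Any engine missing from this list would be skipped without an error.
var defaultPortedEngines = []string{
	"wikipedia", "arxiv", "crossref", "braveapi", "qwant",
	"duckduckgo", "github", "reddit", "bing",
}

// portedEngines (hypothetical helper) parses a comma-separated engine list
// and falls back to the defaults when no names are given.
func portedEngines(envValue string) []string {
	var out []string
	for _, name := range strings.Split(envValue, ",") {
		if name = strings.TrimSpace(name); name != "" {
			out = append(out, strings.ToLower(name))
		}
	}
	if len(out) == 0 {
		return defaultPortedEngines
	}
	return out
}

func main() {
	fmt.Println(portedEngines(os.Getenv("LOCAL_PORTED_ENGINES")))
}
```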
6346fb7155 chore: update Go module path to github.com/metamorphosis-dev/kafka
Module path now matches the GitHub mirror location.
All internal imports updated across 35+ files.
2026-03-21 19:42:01 +00:00
e5295fa69d chore: rename project from gosearch to kafka
A search engine named after a man who proved answers don't exist.

Renamed everywhere user-facing:
- Brand name, UI titles, OpenSearch description, CSS filename
- Docker service name, NixOS module (services.kafka)
- Cache key prefix (kafka:), User-Agent strings (kafka/0.1)
- README, config.example.toml, flake.nix descriptions

Kept unchanged (internal):
- Go module path: github.com/ashie/gosearch
- Git repository URL: git.ashisgreat.xyz/penal-colony/gosearch
- Binary entrypoint: cmd/searxng-go
2026-03-21 19:20:47 +00:00
a8ab29b23a fix: fix DDG and Bing parsers — verified with live tests
DuckDuckGo:
- Fixed parser to handle single-quoted class attributes (class='result-link')
- Decode DDG tracking URLs (uddg= parameter) to extract real URLs
- Match snippet extraction to actual DDG Lite HTML structure (</td> terminator)

Bing:
- Switched from HTML scraping (blocked by JS detection) to RSS endpoint
  (?format=rss) which returns parseable XML
- Added JSON API response parsing as fallback
- Returns graceful unresponsive_engines entry when blocked

Live test results:
- DuckDuckGo: 9 results
- GitHub: 10 results (14,768 total)
- Bing: 10 results via RSS
- Reddit: skipped (403 from sandbox, needs browser-like context)
2026-03-21 16:57:02 +00:00
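
The switch to an RSS endpoint means results can be decoded with encoding/xml instead of scraping HTML. A minimal sketch of that idea, assuming a standard RSS 2.0 layout (channel, then item with title, link, description); the type and function names are illustrative, not taken from the repository:

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// rssFeed captures only the parts of an RSS 2.0 document needed for results.
type rssFeed struct {
	Items []rssItem `xml:"channel>item"`
}

// rssItem maps one <item> to a title, URL, and snippet.
type rssItem struct {
	Title       string `xml:"title"`
	Link        string `xml:"link"`
	Description string `xml:"description"`
}

// parseRSS (hypothetical helper) decodes an RSS body into result items.
func parseRSS(data []byte) ([]rssItem, error) {
	var feed rssFeed
	if err := xml.Unmarshal(data, &feed); err != nil {
		return nil, err
	}
	return feed.Items, nil
}

func main() {
	sample := `<rss version="2.0"><channel><title>q</title>
<item><title>Example</title><link>https://example.com/</link><description>An example result.</description></item>
</channel></rss>`
	items, err := parseRSS([]byte(sample))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, it := range items {
		fmt.Printf("%s <%s>: %s\n", it.Title, it.Link, it.Description)
	}
}
```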
df8fe9474b feat: add DuckDuckGo, GitHub, Reddit, and Bing engines
- DuckDuckGo: scrapes Lite HTML endpoint for results
  - Language-aware region mapping (de→de-de, ja→jp-jp, etc.)
  - HTML parser extracts result links and snippets from DDG Lite markup
  - Shared html_helpers.go with extractAttr, stripHTML, htmlUnescape

- GitHub: uses public Search API (repos, sorted by stars)
  - No auth required (10 req/min unauthenticated)
  - Shows stars, language, topics, last updated date
  - Paginated via GitHub's page parameter

- Reddit: uses public JSON search API
  - Respects safesearch (skips over_18 posts)
  - Shows subreddit, score, comment count
  - Links self-posts to the thread URL

- Bing: scrapes web search HTML (b_algo containers)
  - Extracts titles, URLs, and snippets from Bing's result markup
  - Handles Bing's tracking URL encoding

- Updated factory, config defaults, and config.example.toml
- Full test suite: unit tests for all engines, HTML parsing tests,
  region mapping tests, live request tests (skipped in short mode)

9 engines total: wikipedia, arxiv, crossref, braveapi, qwant,
duckduckgo, github, reddit, bing
2026-03-21 16:52:11 +00:00
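
The language-aware region mapping for DuckDuckGo can be sketched as a small lookup table. Only the de and ja entries come from the commit message; the function name ddgRegion, the handling of tags like "ja-JP", and the "wt-wt" (no region) default are assumptions for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// ddgRegions maps a primary language subtag to a DuckDuckGo region code.
// The two entries are the examples given in the commit message.
var ddgRegions = map[string]string{
	"de": "de-de",
	"ja": "jp-jp",
}

// ddgRegion (hypothetical helper) normalizes a language tag such as "ja-JP"
// to its primary subtag and looks up the region. Unknown languages fall
// back to "wt-wt", assumed here to mean "no specific region".
func ddgRegion(lang string) string {
	primary := strings.ToLower(lang)
	if i := strings.IndexAny(primary, "-_"); i >= 0 {
		primary = primary[:i]
	}
	if region, ok := ddgRegions[primary]; ok {
		return region
	}
	return "wt-wt"
}

func main() {
	for _, lang := range []string{"de", "ja-JP", "xx"} {
		fmt.Printf("%s -> %s\n", lang, ddgRegion(lang))
	}
}
```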
dc44837219 feat: build Go-based SearXNG-compatible search service
Implement an API-first Go rewrite with local engine adapters, upstream fallback, and Nix-based tooling. Searches can run without replicating the original UI while preserving response compatibility.

Made-with: Cursor
2026-03-20 20:34:08 +01:00