Update LICENSE file and add AGPL header to all source files.
AGPLv3 ensures that if someone runs Kafka as a network service and
modifies it, they must release their source code under the same license.
DuckDuckGo:
- Fixed parser to handle single-quoted class attributes (class='result-link')
- Decode DDG tracking URLs (uddg= parameter) to extract real URLs
- Match snippet extraction to actual DDG Lite HTML structure (</td> terminator)
Bing:
- Switched from HTML scraping (blocked by JS detection) to RSS endpoint
(?format=rss) which returns parseable XML
- Added JSON API response parsing as fallback
- Returns graceful unresponsive_engines entry when blocked
Live test results:
- DuckDuckGo: 9 results ✅
- GitHub: 10 results (14,768 total) ✅
- Bing: 10 results via RSS ✅
- Reddit: skipped (403 from sandbox, needs browser-like context)
- DuckDuckGo: scrapes Lite HTML endpoint for results
- Language-aware region mapping (de→de-de, ja→jp-jp, etc.)
- HTML parser extracts result links and snippets from DDG Lite markup
- Shared html_helpers.go with extractAttr, stripHTML, htmlUnescape
- GitHub: uses public Search API (repos, sorted by stars)
- No auth required (10 req/min unauthenticated)
- Shows stars, language, topics, last updated date
- Paginated via GitHub's page parameter
- Reddit: uses public JSON search API
- Respects safesearch (skips over_18 posts)
- Shows subreddit, score, comment count
- Links self-posts to the thread URL
- Bing: scrapes web search HTML (b_algo containers)
- Extracts titles, URLs, and snippets from Bing's result markup
- Handles Bing's tracking URL encoding
- Updated factory, config defaults, and config.example.toml
- Full test suite: unit tests for all engines, HTML parsing tests,
region mapping tests, live request tests (skipped in short mode)
9 engines total: wikipedia, arxiv, crossref, braveapi, qwant,
duckduckgo, github, reddit, bing