# Matitos

- URLs Fetcher -> Inserts raw URLs
  - Fetch parsing URL host
  - Fetch from RSS feed
  - Fetch keyword search (Google search & news, DuckDuckGo, ...)
  - ++ Sources -> Robustness to TooManyRequests block
    - Selenium based
      - Sites change their logic, request captcha, ...
    - Brave Search API
      - Free up to X requests per day. Need credit card association (no charges)
    - Bing API
      - Subscription required
    - Yandex. No API?
  - ++ Proxy / VPN?
    - TooManyRequests, ...
  - ++ Search per locale (nl-NL, fr-FR, en-GB)

- URLs Processing -> Updates raw URLs
  - Extracts title, description, content, image and video URLs, main image URL, language, keywords, authors, tags, published date
  - Determines if it is a valid article content
  - ++ Proxy / VPN?
    - Bypass geoblock

- Visualization of URLs
  - Filter URLs
    - By status, search, source, language, ...
  - Charts

- Valid URLs
  - Generate summary
    - One paragraph
    - At most three paragraphs
  - Classification
    - 5W: Who, What, When, Where, Why of a Story
    - Related to child abuse?
    - ...

- Content generation
  - URLs Selection
    - Valid content
    - Language of interest
    - Published (or fetch) date during last_week
    - Fetched by at least N sources
  - Use classifications and summaries
  - Merge summaries, ...
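
The URLs Selection criteria above could be sketched roughly as follows. This is only an illustration, not the project's actual code: the `UrlRecord` fields and the `select_urls` function are hypothetical names, `last_week` is read here as "the trailing seven days", and the fetch date is assumed to be used only when no published date is available.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class UrlRecord:
    """Hypothetical processed-URL record; field names are assumptions."""
    url: str
    is_valid: bool                    # "Valid content"
    language: str                     # e.g. "en", "fr"
    published_at: Optional[datetime]  # may be missing after processing
    fetched_at: datetime
    sources: frozenset                # names of the sources that fetched this URL


def select_urls(records, languages, min_sources, now=None, window=timedelta(days=7)):
    """Return the records that satisfy all four selection criteria."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - window
    selected = []
    for record in records:
        # Assumption: fall back to the fetch date when no published date exists.
        date = record.published_at or record.fetched_at
        if (
            record.is_valid
            and record.language in languages
            and cutoff <= date <= now
            and len(record.sources) >= min_sources
        ):
            selected.append(record)
    return selected


# Usage with made-up records and a fixed "now" so the result is reproducible.
now = datetime(2024, 1, 15, tzinfo=timezone.utc)


def day(d, month=1, year=2024):
    return datetime(year, month, d, tzinfo=timezone.utc)


records = [
    UrlRecord("https://example.com/a", True, "en", day(12), day(13), frozenset({"rss", "search"})),
    UrlRecord("https://example.com/b", True, "en", day(1, 12, 2023), day(13), frozenset({"rss", "search"})),
    UrlRecord("https://example.com/c", True, "nl", None, day(14), frozenset({"rss"})),
    UrlRecord("https://example.com/d", False, "en", day(14), day(14), frozenset({"rss", "search"})),
    UrlRecord("https://example.com/e", True, "fr", None, day(10), frozenset({"rss", "host"})),
]

selected = select_urls(records, languages={"en", "fr"}, min_sources=2, now=now)
print([r.url for r in selected])
# → ['https://example.com/a', 'https://example.com/e']
```

Here `b` is too old, `c` is in a language outside the set and has a single source, and `d` is not valid content; `e` passes through the fetch-date fallback.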