Valid content filter, language detect on min chars, fetch missingkids.org
34
README.md
@@ -1 +1,33 @@
# Matitos

- Scheduled tasks
  - Fetcher -> Inserts raw URLs
    - Fetch by parsing the URL host
    - Fetch from RSS feeds
    - Fetch by searching (Google search & news, DuckDuckGo, ...)
  - Process URLs -> Updates raw URLs
    - Extracts title, description, content, image and video URLs, main image URL, language, keywords, authors, tags, published date
    - Determines whether the content is a valid article
  - Valid URLs
    - Generate summary
    - Classification
      - 5W: Who, What, When, Where, Why of a Story
      - Related to child abuse?
      - ...

Georgia Institute of Technology
https://comm.gatech.edu › resources › writers

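As an illustration of the "Process URLs" step, here is a minimal sketch of a valid-content check with language detection gated on a minimum character count. Everything in it is an assumption made for the example: the `ProcessedUrl` record, the status names, both thresholds, and the pluggable `detect_language` callable are hypothetical and not taken from the Matitos code.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical thresholds; the README does not state the real values.
MIN_CHARS_VALID_CONTENT = 200
MIN_CHARS_LANGUAGE_DETECT = 50


@dataclass
class ProcessedUrl:
    url: str
    content: str
    status: str = "raw"             # assumed status names: raw -> valid / invalid
    language: Optional[str] = None  # stays None when the text is too short


def process_url(item: ProcessedUrl,
                detect_language: Callable[[str], str]) -> ProcessedUrl:
    """Update a raw URL record in place and return it."""
    text = item.content.strip()
    # Only run language detection when there is enough text for it to be meaningful.
    if len(text) >= MIN_CHARS_LANGUAGE_DETECT:
        item.language = detect_language(text)
    # Treat very short extractions as invalid article content.
    item.status = "valid" if len(text) >= MIN_CHARS_VALID_CONTENT else "invalid"
    return item
```

Passing the detector in as a callable keeps the sketch independent of any particular language-detection library.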
- Visualization of URLs
  - Filter URLs
    - By status, search, source, language
  - Charts

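The URL filters listed above could be combined roughly as follows. This is only a sketch: the `UrlRecord` fields and the `filter_urls` helper are hypothetical names, and matching the search term against the title and URL is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class UrlRecord:
    url: str
    title: str
    status: str
    source: str
    language: str


def filter_urls(records: Iterable[UrlRecord],
                status: Optional[str] = None,
                search: Optional[str] = None,
                source: Optional[str] = None,
                language: Optional[str] = None) -> List[UrlRecord]:
    """Keep records that match every filter that is set; None means no filter."""
    needle = search.lower() if search else None
    result = []
    for record in records:
        if status is not None and record.status != status:
            continue
        if source is not None and record.source != source:
            continue
        if language is not None and record.language != language:
            continue
        # Assumed behaviour: case-insensitive substring match on title or URL.
        if needle is not None and needle not in record.title.lower() \
                and needle not in record.url.lower():
            continue
        result.append(record)
    return result
```

For example, `filter_urls(records, status="valid", language="en")` would return only valid English records.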
- Content generation
  - Select URLs:
    - Valid content
    - language=en
    - published_date during last_week
  - Use classifications
  - Merge summaries, ...
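A rough sketch of the selection and merge steps above, under stated assumptions: `Article`, `select_articles`, and `merge_summaries` are hypothetical names, "last_week" is read as the trailing seven days, and summaries are merged by simple concatenation per classification label.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


@dataclass
class Article:
    url: str
    status: str
    language: str
    published_date: datetime
    category: str  # hypothetical classification label
    summary: str


def select_articles(articles: Iterable[Article], now: datetime,
                    days: int = 7) -> List[Article]:
    """Valid content, language=en, published within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [
        a for a in articles
        if a.status == "valid"
        and a.language == "en"
        and cutoff <= a.published_date <= now
    ]


def merge_summaries(articles: Iterable[Article]) -> Dict[str, str]:
    """Group summaries by classification label and join them with spaces."""
    grouped: Dict[str, List[str]] = {}
    for a in articles:
        grouped.setdefault(a.category, []).append(a.summary)
    return {category: " ".join(parts) for category, parts in grouped.items()}
```

Passing `now` explicitly, instead of calling `datetime.now()` inside the function, keeps the selection reproducible.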