
A crawler and indexer for major Nigerian Newspaper websites | Categorize articles | Summarize articles | Rewrite articles in TV, Radio, or Online article format


uche-exe/judas


Judas

A simple crawler that crawls news articles and analyzes their content.

Setup

  1. Navigate into the judas directory.
  2. Create a virtual environment named my-env:
python3 -m venv my-env
  3. Activate your virtual environment my-env:
source my-env/bin/activate
  4. Install dependencies:
pip install -r requirements.txt
  5. Paste your OpenAI API key into the .env.example file, then rename the file to .env:
OPENAI_API_KEY=<YOUR_API_KEY>
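The README does not show how the app reads this key. Below is a minimal, stdlib-only sketch of that step, assuming a simple KEY=VALUE .env format. The helpers `load_env_file` and `require_openai_key` are hypothetical names, and the repository may instead rely on a package such as python-dotenv.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Hypothetical helper: read KEY=VALUE pairs from a .env file.

    Blank lines, comment lines starting with '#', and lines without '='
    are skipped. Each pair is also copied into os.environ, but only if
    the variable is not already set there.
    """
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Remove optional surrounding quotes, e.g. OPENAI_API_KEY="sk-..."
        value = value.strip().strip('"').strip("'")
        values[key] = value
        os.environ.setdefault(key, value)
    return values


def require_openai_key() -> str:
    """Hypothetical helper: return OPENAI_API_KEY or fail with a clear message."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; check your .env file.")
    return key
```

Calling `load_env_file()` once at startup and then `require_openai_key()` surfaces a missing or misnamed .env file immediately, rather than at the first OpenAI request.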

Usage

  1. The keywords file is in the data folder. You can add as many additional context files as you like for GPT to index and use. (N.B.: Each extra file costs more money, because OpenAI usage is billed by the number of tokens processed.)
  2. To run the crawler, run index.py and pass the URL to crawl/categorize as its first command-line argument (read in the script as sys.argv[1]). For example:
python3 index.py <url_to_be_crawled>
  3. Alternatively, you can modify index.py to crawl a URL directly by replacing this line with a hard-coded URL string:
url = sys.argv[1]
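As written, `url = sys.argv[1]` raises an IndexError when the script is run without an argument. A minimal sketch of a more forgiving replacement follows, assuming index.py only needs a single absolute URL. `DEFAULT_URL`, `USAGE`, and `get_target_url` are hypothetical names, not part of the repository.

```python
from urllib.parse import urlparse

# Hypothetical default used when no URL is passed on the command line
# (the "hard-code the URL in index.py" option); empty means "no default".
DEFAULT_URL = ""

USAGE = "usage: python3 index.py <url_to_be_crawled>"


def get_target_url(argv: list[str]) -> str:
    """Return argv[1] (else DEFAULT_URL); exit with a usage message if invalid."""
    url = argv[1] if len(argv) > 1 else DEFAULT_URL
    parsed = urlparse(url)
    # Accept only absolute http(s) URLs that include a host name.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise SystemExit(USAGE)
    return url


# In index.py this would replace `url = sys.argv[1]`:
#     import sys
#     url = get_target_url(sys.argv)
```

Raising `SystemExit` with a string prints the usage line to stderr and exits with status 1, which is friendlier than a traceback.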

Running and using the API

  1. Run the Flask app:
python3 app.py

The Flask server will start on port 5000.

----------------------------- WARNING ---------------------------------------

NOTE: -- These instructions are outdated. The README will be updated soon. --

-----------------------------------------------------------------------------

  2. Send a POST request to localhost:5000/category/ with the following JSON body:
{
    "link": "<str:article_url>"
}

A successful request should return a 200 status code with the following JSON:

{
    "category": "<str:news_category>"
}
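A minimal stdlib client sketch for the endpoint above follows. It assumes the server is running locally on port 5000 and accepts the JSON body shown; given the warning that these instructions are outdated, the route and field names may have changed. `API_URL`, `build_category_request`, and `fetch_category` are hypothetical names.

```python
import json
import urllib.request

# Assumed address of the local Flask server, per the instructions above.
API_URL = "http://localhost:5000/category/"


def build_category_request(article_url: str, api_url: str = API_URL) -> urllib.request.Request:
    """Build a POST request whose JSON body is {"link": article_url}."""
    body = json.dumps({"link": article_url}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_category(article_url: str, api_url: str = API_URL, timeout: float = 60.0) -> str:
    """Send the request and return the "category" field of the JSON reply."""
    request = build_category_request(article_url, api_url)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["category"]
```

With `python3 app.py` running, `fetch_category(article_url)` returns the category string from the response. The generous timeout is a guess to allow for crawling plus a GPT call on the server side.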
