A simple crawler that crawls news articles and analyzes their content.
- Navigate into the `judas` directory.
- Create a virtual environment named `my-env`:

```shell
python3 -m venv my-env
```
- Activate your virtual environment `my-env`:

```shell
source my-env/bin/activate
```
- Install dependencies:

```shell
pip install -r requirements.txt
```
- Paste your OpenAI API key into the `.env.example` file and rename the file to `.env`:

```shell
OPENAI_API_KEY=<YOUR_API_KEY>
```
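Once the key is in `.env`, the app can read it from the environment at startup. A minimal sketch, assuming the project loads it via the environment (the python-dotenv call is commented out since it is an assumption about what `requirements.txt` installs):

```python
import os

# If the project uses python-dotenv (an assumption), the .env file is
# typically loaded like this before reading the variable:
# from dotenv import load_dotenv
# load_dotenv()

# Read the key from the environment; empty string if it is not set.
api_key = os.environ.get("OPENAI_API_KEY", "")
```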
- The keywords file is in the `data` folder. You can add more context files for GPT to index and use. (N.B.: this will cost more money, since usage is billed by the number of tokens/characters.)
- To run the crawler, run the `index.py` file with the URL to be crawled/categorized as a command-line argument. For example:

```shell
python3 index.py <url_to_be_crawled>
```
- Alternatively, you can modify the `index.py` file and crawl a URL directly by replacing this line:

```python
url = sys.argv[1]
```
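For instance, the replacement could hardcode the target URL (the URL below is a placeholder, not one from the project):

```python
# Original line, reading the URL from the command line:
# url = sys.argv[1]

# Hardcoded replacement (placeholder URL; substitute your own):
url = "https://example.com/news/some-article"
```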
- Run the Flask app:

```shell
python3 app.py
```

The Flask server will start running on port 5000.
- Send a POST request to `localhost:5000/category/` with the following JSON body:

```json
{
  "link": "<str:article_url>"
}
```

A successful request should return a 200 status code with the following JSON:

```json
{
  "category": "<str:news_category>"
}
```
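As a sketch, the request above can be sent from Python with the `requests` library (the article URL is a placeholder, and the network call is commented out so the snippet runs without a live server):

```python
import json

# JSON body expected by the /category/ endpoint (placeholder URL).
payload = {"link": "https://example.com/news/some-article"}
body = json.dumps(payload)

# To send it against a running server, uncomment:
# import requests
# resp = requests.post("http://localhost:5000/category/", json=payload)
# print(resp.status_code, resp.json())  # expect 200 and a "category" key

print(body)
```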