Skip to content

Construction of a model to determine whether summaries of some news article were written by humans or machine generated.

License

Notifications You must be signed in to change notification settings

paultheron-X/INF582-NLP_challenge

Repository files navigation

INF582-NLP_challenge

Paper associated to this git repository : https://www.researchgate.net/publication/361569713_INF582_NLP_Challenge_Summary_Source_Prediction
Readme associated to the git repository : https://github.com/paultheron-X/INF582-NLP_challenge.git
Authors : Jérémie Dentan, Louis Gautier, Paul Théron

Getting started

In order to set up the project, please use a virtual environment that can be created and activated with

python3 -m venv .venv
source ./.venv/bin/activate

Then, install the required libraries with

pip install --upgrade pip
pip3 install -r requirements.txt

Documentation

An academic description of our work is available :

Dowload data and compute features

To reproduce the results of the paper, please run :

bash sh/data_download.sh
bash preprocessing.sh

The preprocessing time is about 1h. However, we provide the computed .csv files in processed_data directory.

Prediction

Simply run:

python main.py

or, for the same algorithm with fine-tuned parameters (same performance yet) :

python main_xgbtuned.py

About

Construction of a model to determine whether summaries of some news article were written by humans or machine generated.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •