GitHub - chandrusekar47/quora-dedup: Taking a stab at the Quora Question pairs Kaggle Challenge

Quora Question Pairs Kaggle Challenge

We tried several machine learning approaches on this Kaggle challenge. The goal was to compare their effectiveness and to understand why a given approach works (or doesn't).

Feed-forward neural network trained on features derived from the sentence vectors of both questions.
Siamese convolutional neural network trained on the word vectors of both questions.
Using TF-IDF scores as weights when combining word vectors into sentence vectors.
Using different sources for word vectors (Wikipedia, Google News and the training data set) to see whether one representation works better than the others.
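A minimal sketch of TF-IDF-weighted sentence vectors and a few pair features, using a plain dict of word vectors so it runs without gensim. The function names, the smoothed IDF formula (borrowed from scikit-learn's default) and the feature choice (cosine similarity plus element-wise absolute difference) are illustrative assumptions, not necessarily what this repository's code does.

```python
import math
from collections import Counter


def idf_weights(documents):
    """Smoothed IDF for every token in a list of tokenized documents.

    Uses ln((1 + n) / (1 + df)) + 1, the same smoothing as scikit-learn's
    TfidfVectorizer with smooth_idf=True.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each token once per document
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}


def sentence_vector(tokens, word_vectors, idf, dim):
    """Weighted mean of word vectors; a token repeated k times adds k * idf weight (tf * idf)."""
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        vec = word_vectors.get(tok)
        if vec is None:
            continue  # skip out-of-vocabulary words
        w = idf.get(tok, 1.0)
        for i, x in enumerate(vec):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known words: return the zero vector
    return [x / weight_sum for x in total]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def pair_features(v1, v2):
    """Hypothetical feature row for one question pair."""
    return [cosine(v1, v2)] + [abs(a - b) for a, b in zip(v1, v2)]


# Toy example with made-up 2-d vectors.
questions = [["how", "learn", "python"], ["how", "study", "python"]]
word_vectors = {"how": [1.0, 0.0], "learn": [0.0, 1.0], "study": [0.0, 1.0], "python": [1.0, 1.0]}
idf = idf_weights(questions)
v1 = sentence_vector(questions[0], word_vectors, idf, dim=2)
v2 = sentence_vector(questions[1], word_vectors, idf, dim=2)
features = pair_features(v1, v2)
```

Because "learn" and "study" share a vector in this toy vocabulary, the two questions collapse to the same sentence vector, which is exactly the kind of paraphrase an averaged-embedding feature is meant to catch.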

Datasets

Training datasets can be downloaded from https://www.kaggle.com/c/quora-question-pairs/data
Wikipedia word vectors can be downloaded from https://github.com/idio/wiki2vec/
Google News word vectors can be downloaded from https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit
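A minimal sketch for reading the Kaggle training file with only the standard library, assuming the column names of the competition's train.csv (id, qid1, qid2, question1, question2, is_duplicate). The function name is hypothetical.

```python
import csv


def load_question_pairs(path):
    """Read the Kaggle training CSV into (question1, question2, is_duplicate) tuples."""
    pairs = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Guard against empty question fields: keep them as "" rather than None.
            q1 = row.get("question1") or ""
            q2 = row.get("question2") or ""
            pairs.append((q1, q2, int(row["is_duplicate"])))
    return pairs
```

Using the csv module (rather than splitting on commas) matters here because many questions contain commas inside quoted fields.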

Installation & setup

We used the following third-party libraries: gensim, nltk, scipy and scikit-learn (sklearn). We also used the editdistance package (to compute Levenshtein distance) and the fuzzywuzzy package (to compute the partial ratio). They can be installed using:

sudo pip install gensim
sudo pip install nltk
sudo pip install scipy
sudo pip install scikit-learn
sudo pip install editdistance
sudo pip install fuzzywuzzy
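To show what the two string-similarity features measure, here is a standard-library sketch. The Levenshtein function is the classic dynamic-programming edit distance; the partial ratio is a brute-force approximation of fuzzywuzzy's idea (best match of the shorter string against equal-length windows of the longer one), not fuzzywuzzy's exact algorithm. Both function names are hypothetical stand-ins.

```python
from difflib import SequenceMatcher


def levenshtein(a, b):
    """Edit distance where insertion, deletion and substitution each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row short; distance is symmetric
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        previous = current
    return previous[-1]


def partial_ratio(a, b):
    """0-100 score: best SequenceMatcher ratio of the shorter string vs. any same-length window."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0
    best = 0.0
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
        if best == 1.0:
            break  # exact substring found; cannot do better
    return int(round(100 * best))
```

With the packages above installed, `editdistance.eval(q1, q2)` and `fuzz.partial_ratio(q1, q2)` (from `fuzzywuzzy import fuzz`) provide the real, faster implementations.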

How to run the code

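Example: classify the test pairs with the neural network, using the precomputed features for 40k training pairs, and write the predictions to latest_predictions-2.csv:

python main.py classify neural_network data/train_40k_qn_pairs_features.csv data/test_qn_pairs_features.csv latest_predictions-2.csv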
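Kaggle scored this competition by log loss on the predicted probability that a pair is a duplicate. To check predictions against held-out labels locally, a minimal sketch of the metric, assuming you have already parsed the true labels and predicted probabilities into two lists (the exact layout of the predictions file written by main.py is not described here):

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to [eps, 1 - eps] to avoid log(0)."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)
```

Because log loss punishes confident wrong answers heavily, clipping matters: a single prediction of exactly 0.0 for a true duplicate would otherwise make the score infinite.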

Repository contents

data/
models/
results/
.gitattributes
.gitignore
README.md
TfidfEmbeddingVectorizer.py
common.py
convolutionalNN.py
generate_data.py
generate_data_dumps.py
main.py
neural_network.py
other_classifiers.py
quora_cnn_siamese.ipynb
tf_example.py
threshold_classifier.py
