AMD

This project was done for the course Algorithms for Massive Datasets, taught by Prof. Dario Malchiodi at the University of Milan.

The code was executed in Google Colab. To pull the dataset from Kaggle, you need to upload your Kaggle JSON file.
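A minimal sketch of installing the uploaded token, assuming the file is the standard Kaggle API token (`kaggle.json`) and that the Kaggle CLI reads it from `~/.kaggle/`. The helper name `install_kaggle_token` and its `config_dir` parameter are hypothetical and are not taken from the project notebook.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(token_path, config_dir=None):
    """Copy a Kaggle API token into the directory the Kaggle CLI reads from.

    Hypothetical helper: the project notebook may handle this differently.
    """
    # Assumption: the default location is ~/.kaggle/kaggle.json.
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)
    target = config_dir / "kaggle.json"
    shutil.copyfile(token_path, target)
    # Make the token readable and writable by the owner only.
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target
```

In Colab, the token can first be uploaded with `files.upload()` from `google.colab`, after which `install_kaggle_token("kaggle.json")` would place it where the CLI expects it. The dataset identifier is not stated in this README, so the download step is left out.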

Description:

- Cleaned the data (removed unnecessary strings, code patterns, and stopwords)
- Tokenized the questions
- Implemented TF-IDF to extract feature vectors (of 8 vectors each)
- Passed the feature vectors to MinHashLSH (of 5 buckets)
- Used `approxNearestNeighbors` to group similar questions
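The idea behind the steps above can be illustrated with a small pure-Python sketch. This is not the project's code: it skips TF-IDF, hashes token sets directly, and compares signatures by brute force instead of using LSH buckets. The stopword list, function names, and sample questions are hypothetical, and the choice of 5 hash functions only loosely mirrors the "5 buckets" mentioned above.

```python
import random
import re
import zlib

# Tiny hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "is", "in", "to", "of", "how", "do", "i", "what"}
PRIME = (1 << 61) - 1  # Mersenne prime used as the hash modulus


def tokenize(text):
    """Lowercase the text, keep alphanumeric tokens, drop stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) coefficients for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)]


def minhash_signature(tokens, params):
    """Return one minimum hash value per hash function."""
    ids = [zlib.crc32(t.encode("utf-8")) for t in tokens]
    if not ids:
        # Sentinel signature for empty token sets.
        return [PRIME] * len(params)
    return [min((a * x + b) % PRIME for x in ids) for a, b in params]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions, which estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def nearest(query, questions, params):
    """Brute-force nearest neighbour by estimated Jaccard similarity."""
    q_sig = minhash_signature(tokenize(query), params)
    scored = [
        (estimated_jaccard(q_sig, minhash_signature(tokenize(q), params)), q)
        for q in questions
    ]
    return max(scored)[1]
```

For example, `nearest("sort list python", questions, make_hash_params(5))` returns the entry of `questions` whose signature agrees with the query's in the most positions. With only 5 hash functions the estimate is coarse; more hash functions give a tighter estimate at a higher cost.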
