GitHub - tirthsvora/Data_Extraction_and_NLP: Automated scraping of article text from 100 URLs, with sentiment and readability analysis of each article

Hello! Welcome to this beautiful repository (coz I have used BeautifulSoup 😉)

Outline:

Automate the entire process: read an Excel file containing 100 article URLs, scrape each article's content from the web, perform sentiment analysis, calculate readability scores, and save the output to an Excel file.
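The scraping step in the outline above could look roughly like the sketch below. It is not taken from data_analysis.py: the function name `extract_article_text` and the choice to keep only the `<title>` and `<p>` elements are assumptions made for illustration, and real article pages may need site-specific selectors.

```python
from bs4 import BeautifulSoup


def extract_article_text(html: str) -> str:
    """Return the page title plus paragraph text from raw HTML.

    Hypothetical helper: the repository's script may select
    different elements.
    """
    soup = BeautifulSoup(html, "html.parser")

    # Drop elements that never hold article prose.
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()

    title = soup.title.get_text(strip=True) if soup.title else ""

    # Join the text fragments of each paragraph with single spaces.
    paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
    body = "\n".join(p for p in paragraphs if p)

    return f"{title}\n{body}".strip()


sample_html = (
    "<html><head><title>Demo</title><script>var x = 1;</script></head>"
    "<body><p>First para.</p><p></p><p>Second <b>bold</b> para.</p></body></html>"
)
print(extract_article_text(sample_html))
```

In a full run, the HTML for each URL listed in Input.xlsx would presumably be downloaded first (for example with an HTTP client such as `requests`) and the extracted text passed on to the analysis step.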

Uses 7 text files of stop words along with a master dictionary of positive and negative words
Uses BeautifulSoup extensively for parsing and TextBlob for tokenization
Sentiment analysis: positive score, negative score, polarity score, and subjectivity score
Readability analysis: Fog index, complex words, average word length, etc.
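As a rough illustration of the scores listed above, here is a minimal, dependency-free sketch. The formulas are assumptions, not confirmed from data_analysis.py: polarity is taken as (positive - negative) / (positive + negative), subjectivity as (positive + negative) / word count after stop-word removal, and the Fog index as the standard Gunning formula, 0.4 × (average sentence length + percentage of complex words). The word lists are tiny stand-ins for the repository's text files, a regex tokenizer replaces TextBlob, and the syllable counter is a crude vowel-group heuristic that can overcount.

```python
import re

# Stand-in word lists (assumption); the repository loads these from
# positive-words.txt, negative-words.txt and the StopWords_*.txt files.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}
STOPWORDS = {"the", "is", "a", "and", "but", "was"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_syllables(word: str) -> int:
    """Crude heuristic: count runs of consecutive vowels (at least 1)."""
    return max(len(re.findall(r"[aeiouy]+", word.lower())), 1)


def sentiment_scores(text: str) -> dict[str, float]:
    words = [w for w in tokenize(text) if w not in STOPWORDS]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    eps = 1e-6  # avoids division by zero on empty or neutral text
    return {
        "positive_score": pos,
        "negative_score": neg,
        "polarity_score": (pos - neg) / (pos + neg + eps),
        "subjectivity_score": (pos + neg) / (len(words) + eps),
    }


def readability_scores(text: str) -> dict[str, float]:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = tokenize(text)
    n_words = max(len(words), 1)
    # "Complex" here means three or more (estimated) syllables.
    complex_count = sum(count_syllables(w) >= 3 for w in words)
    avg_sentence_length = len(words) / max(len(sentences), 1)
    pct_complex = 100 * complex_count / n_words
    return {
        "avg_sentence_length": avg_sentence_length,
        "complex_word_count": complex_count,
        "fog_index": 0.4 * (avg_sentence_length + pct_complex),
        "avg_word_length": sum(len(w) for w in words) / n_words,
    }


sample = "The service was excellent. The food was terrible and bad."
print(sentiment_scores(sample))
print(readability_scores(sample))
```

Keeping the two score groups in separate functions makes it straightforward to write each result as its own set of columns in the output spreadsheet.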

Instructions to run

Create a virtual environment
Clone the repository
pip install -r requirements.txt
python data_analysis.py

Folders and files

.gitignore
Input.xlsx
LICENSE
README.md
StopWords_Auditor.txt
StopWords_Currencies.txt
StopWords_DatesandNumbers.txt
StopWords_Generic.txt
StopWords_GenericLong.txt
StopWords_Geographic.txt
StopWords_Names.txt
data_analysis.py
final_output.xlsx
negative-words.txt
positive-words.txt
