Automated scraping of article text from 100 URLs, with sentiment and readability analysis of each article

tirthsvora/Data_Extraction_and_NLP

Hello! Welcome to this beautiful repository (because I used BeautifulSoup 😉)

Outline:

Automate the entire process: read an Excel file containing 100 article URLs, scrape each article's content from the web, perform sentiment analysis, calculate readability scores, and save the output to an Excel file.

  • Uses 7 text files of stop words, along with a master dictionary of positive and negative words
  • Relies heavily on BeautifulSoup for HTML parsing and TextBlob for tokenization
  • Sentiment analysis: positive score, negative score, polarity score, and subjectivity score
  • Readability analysis: Fog index, complex words, average word length, etc.
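
The scores listed above could be computed roughly as follows. This is only a sketch, not the code in data_analysis.py: the function names `analyze` and `count_syllables` are hypothetical, a regular expression stands in for TextBlob tokenization, and the polarity and subjectivity formulas are assumed dictionary-based definitions. The Fog index uses the standard Gunning formula, 0.4 × (average sentence length + percentage of complex words), with "complex" approximated as three or more vowel groups.

```python
import re

VOWEL_GROUPS = re.compile(r"[aeiouy]+")


def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    return max(1, len(VOWEL_GROUPS.findall(word.lower())))


def analyze(text, positive_words, negative_words, stop_words):
    """Return dictionary-based sentiment and readability scores for text."""
    # Regex tokenization is a stand-in for TextBlob here.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    content = [w for w in words if w not in stop_words]

    # Sentiment: count hits against the positive/negative word sets.
    pos = sum(w in positive_words for w in content)
    neg = sum(w in negative_words for w in content)
    eps = 1e-6  # avoids division by zero on empty input
    polarity = (pos - neg) / (pos + neg + eps)        # assumed formula
    subjectivity = (pos + neg) / (len(content) + eps)  # assumed formula

    # Readability: Gunning fog index over all words.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_len = len(words) / max(1, len(sentences))
    pct_complex = 100 * len(complex_words) / max(1, len(words))
    fog_index = 0.4 * (avg_sentence_len + pct_complex)
    avg_word_len = sum(len(w) for w in words) / max(1, len(words))

    return {
        "positive_score": pos,
        "negative_score": neg,
        "polarity_score": polarity,
        "subjectivity_score": subjectivity,
        "complex_word_count": len(complex_words),
        "fog_index": fog_index,
        "avg_word_length": avg_word_len,
    }
```

In a pipeline like the one outlined, `analyze` would be called once per scraped article, with the word sets loaded from the stop-word files and the master dictionary, and each returned dict would become one row of the output Excel file.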

Instructions to run

  1. Create a virtual environment
  2. Clone the repository
  3. pip install -r requirements.txt
  4. python data_analysis.py
