Database-Management-System-for-Library

Sentiment Analysis Text Mining using R

The project is a Database Management System for a library, built with PHP, SQL, and Bootstrap, and it applies querying techniques.

A few popular hashtags:

`#Library` `#PHP` `#SQL`

`#Database System` `#MYSQLi` `#Management Systems`

Motivation

This project is a prototype of a simple Library Management System. The librarian can add book details such as ISBN, book title, author name, edition, and publication details through the web page. In addition, the librarian or any user can search the library's available books by book name. If the book is present in the database, its details are displayed on the web page.
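The add-and-search flow described above can be sketched as follows. This is a minimal illustration in Python with an in-memory SQLite database, not the project's PHP/MySQLi code; the table name `books`, its columns, and the helper names `add_book` and `search_by_title` are assumptions made for the example.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS books (
    isbn        TEXT PRIMARY KEY,
    title       TEXT NOT NULL,
    author      TEXT NOT NULL,
    edition     TEXT,
    publication TEXT
)
"""


def add_book(conn, isbn, title, author, edition, publication):
    """Insert one book; placeholders keep user input out of the SQL text."""
    conn.execute(
        "INSERT INTO books (isbn, title, author, edition, publication) "
        "VALUES (?, ?, ?, ?, ?)",
        (isbn, title, author, edition, publication),
    )
    conn.commit()


def search_by_title(conn, name):
    """Return every book whose title contains `name`, ordered by title."""
    cur = conn.execute(
        "SELECT isbn, title, author, edition, publication "
        "FROM books WHERE title LIKE ? ORDER BY title",
        (f"%{name}%",),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Sample values, invented for the example.
add_book(conn, "0000000000001", "Sample Book on Databases", "A. Author", "1st", "Sample Press")
results = search_by_title(conn, "databases")
missing = search_by_title(conn, "astronomy")
```

Parameterized queries (the `?` placeholders) are used so that a search term typed into a web form is never concatenated into the SQL string. An empty result list corresponds to the case where the book is not in the database.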

About the Project

What is Text Mining?

Text classification (or text categorization) is the task of labelling natural-language texts with relevant predefined categories. The idea is to organize texts into classes automatically, which can drastically simplify and speed up searching through documents.

Steps involved in this project

Three major steps in the Database-Management-System-for-Library code:

When training and building a model, keep in mind that the first model is never the best one, so the best practice is trial and error. To make that process simpler, create a function for training, and in each attempt save the results and accuracies.
I sorted the EDA process into two categories: general pre-processing steps that were common across all vectorizers and models, and optional pre-processing steps used to measure model performance with and without them.
Accuracy was chosen as the measure for comparing models: the greater the accuracy, the better the model performs on test data.
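The trial-and-error loop above, with one training function that records every attempt, might look like the sketch below. It is written in Python for illustration and is not the project's code; `run_attempt`, `fit_majority`, and the toy data are hypothetical, and the majority-class "model" is only a stand-in for a real classifier.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of test items whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def run_attempt(name, fit, train, test, log):
    """Train one model, score it on the test split, and append the result to `log`.

    `fit(train_x, train_y)` must return a function that maps one input to a label.
    """
    train_x, train_y = train
    test_x, test_y = test
    predict = fit(train_x, train_y)
    preds = [predict(x) for x in test_x]
    entry = {"name": name, "accuracy": accuracy(test_y, preds)}
    log.append(entry)
    return entry


def fit_majority(train_x, train_y):
    """Placeholder model: always predicts the most common training label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return lambda x: most_common


log = []
train = (["doc a", "doc b", "doc c"], [1, 1, 2])  # toy data, invented
test = (["doc d", "doc e"], [1, 2])
run_attempt("majority-baseline", fit_majority, train, test, log)

# After several attempts, the best one is a single lookup over the log.
best = max(log, key=lambda e: e["accuracy"])
```

Keeping every attempt in one list makes it easy to compare variants later, for example the same model with and without an optional pre-processing step.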

Explanation

First, I created a Bag of Words file. The file clean_data.R contains all the methods to preprocess the text and generate the bag of words. We use the Corpus library to handle preprocessing and to generate the Bag of Words.
The following general pre-processing steps were carried out, since any document passed to a model must be in a consistent format:

Converting to lowercase
Removal of stop words
Removing alphanumeric characters
Removal of punctuation
Vectorization: TfVectorizer was used. Its model accuracy was compared with that of models using TfIDFVectorizer. In all cases TfVectorizer gave better results, so it was chosen as the default vectorizer.
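The general pre-processing steps and a term-frequency bag of words can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the contents of clean_data.R: the stop-word list is a tiny sample, and the "alphanumeric" step is interpreted here as dropping digits and any other non-letter characters.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}


def preprocess(text):
    """Lowercase, strip punctuation and digits, then drop stop words."""
    text = text.lower()
    # Assumption: anything that is not a letter or whitespace becomes a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def bag_of_words(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts vocab[j] in docs[i]."""
    token_lists = [preprocess(d) for d in docs]
    vocab = sorted({tok for toks in token_lists for tok in toks})
    matrix = []
    for toks in token_lists:
        counts = Counter(toks)
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix


docs = ["The library has 3 books.", "Books, books and more BOOKS!"]
vocab, matrix = bag_of_words(docs)
# vocab  -> ['books', 'has', 'library', 'more']
# matrix -> [[1, 1, 1, 0], [3, 0, 0, 1]]
```

Each row is the raw term-frequency vector for one document. A TF-IDF variant would additionally down-weight terms that appear in many documents.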

The following optional steps were added to the pre-processing to see how model performance changed with and without them:

1. Stemming
2. Lemmatization
3. Using Unigrams/Bigrams
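Of the optional steps, the unigram/bigram choice is the easiest to show without extra libraries. The sketch below is in Python and is only an illustration; the function names `ngrams` and `unigrams_and_bigrams` are hypothetical, and stemming and lemmatization are left out because they depend on a language resource.

```python
def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined with a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def unigrams_and_bigrams(tokens):
    """Feature list that keeps single words and adds adjacent word pairs."""
    return ngrams(tokens, 1) + ngrams(tokens, 2)


tokens = ["library", "management", "system"]
features = unigrams_and_bigrams(tokens)
# features -> ['library', 'management', 'system',
#              'library management', 'management system']
```

Bigrams let a model distinguish phrases whose meaning is lost when the words are counted separately, at the cost of a larger vocabulary.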

Confusion Matrix for Support Vector Machine using Bag of Words Generated using clean_data.r

> confusionMatrix(table(predsvm,data.test$folder_class))
Confusion Matrix and Statistics

       
predsvm  1  2  3  4
      1 31  0  0  0
      2  0 29  6  0
      3  0  3 28  0
      4  0  0  0 23

Overall Statistics
                                          
               Accuracy : 0.925           
                 95% CI : (0.8624, 0.9651)
    No Information Rate : 0.2833          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.8994          
                                          
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: 1 Class: 2 Class: 3 Class: 4
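The overall statistics printed above can be cross-checked directly from the confusion matrix. The sketch below is in Python rather than R and assumes the usual layout for this kind of output, with predictions in rows and true classes in columns; the function name `summarize` is hypothetical.

```python
# Confusion matrix copied from the output above (rows: predicted, columns: actual).
matrix = [
    [31, 0, 0, 0],
    [0, 29, 6, 0],
    [0, 3, 28, 0],
    [0, 0, 0, 23],
]


def summarize(cm):
    """Return (accuracy, no-information rate, Cohen's kappa) for a square matrix."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]

    # Accuracy: share of items on the diagonal.
    acc = sum(cm[i][i] for i in range(k)) / n
    # No-information rate: share of the largest true class.
    nir = max(col_totals) / n
    # Kappa: agreement beyond what the row/column totals predict by chance.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (acc - expected) / (1 - expected)
    return acc, nir, kappa


acc, nir, kappa = summarize(matrix)
print(round(acc, 4), round(nir, 4), round(kappa, 4))  # 0.925 0.2833 0.8994
```

The three values match the Accuracy, No Information Rate, and Kappa lines in the output: 111 of 120 test documents sit on the diagonal, and the only confusion is between classes 2 and 3.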

-The most interesting deduction is that the more specific the newsgroup topic, the more accurately the Naïve Bayes classifier can determine which newsgroup a document belongs to. The converse also holds: the less specific the newsgroup, the further the accuracy rate falls.

-We can see this in the accuracy figures: every newsgroup that isn’t a misc group has an accuracy rate of at least 50%. The newsgroups with the lowest accuracy rates are all misc groups, including a 0.25% accuracy rate for talk.politics.misc.

-One reason is that posts written in misc newsgroups are rarely related to the root topic of the newsgroup. The misc section caters to topics other than the “root newsgroup”, so it is much easier for the classifier to confuse a document from a misc newsgroup with another newsgroup, and much harder for it to even consider the root newsgroup, since topics regarding the root newsgroup are posted there instead.

-For example, a post about guns in talk.religion.misc can easily be classified as talk.politics.guns because it would use words similar to those found in talk.politics.guns posts. Likewise, posts about politics in talk.politics.misc are less likely because you are more likely to post in or talk.politics.guns (where wildcard is the relevant section for the type of politics to be discussed).

Libraries Used

Installation

Install randomForest from the R console: install.packages("randomForest")
Install caret from the R console: install.packages("caret")
Install mlr from the R console: install.packages("mlr")
Install MASS from the R console: install.packages("MASS")

How to run?

Project Reports

Download for the report.

Useful Links

Related Work

Text Mining Analyzer - A Detailed Report on the Analysis

Contributing

Clone this repository:

git clone https://github.com/iamsivab/Database-Management-System-for-Library.git

Check out any issue from here.
Make your changes and send a pull request.

Need help?

📧 Feel free to contact me @ balasiva001@gmail.com
