Opinion Mining Corpus for Colloquial Variety of Arabic language

AhmedObaidi/omcca

omcca

Opinion Mining Corpus for Colloquial Variety of Arabic language. This corpus is part of my PhD thesis. Since there was no publicly available data for opinion mining in colloquial varieties of Arabic, I had to build this corpus. The data was collected from the Jeeran website (Jeeran.com, 2013). Jeeran is a review platform for the Arab world, launched in 2010, where users can add reviews of various kinds of public places, such as hotels, shops, restaurants and libraries.

To write a review on Jeeran, a user provides a textual opinion about the place being reviewed, together with a numerical rating for the place. The rating ranges from 1 to 5, where 1 represents an extremely negative opinion and 5 an extremely positive one. Most opinions are written in the dialect of the country in which the place is located. The usable data covers places in the Hashemite Kingdom of Jordan and the Kingdom of Saudi Arabia. Reviewers from other countries are few compared with those from Jordan and Saudi Arabia, which makes their reviews unusable for a machine learning process.

The result is 28,576 reviews, representing the sentiments of 5,422 different reviewers and covering 27 different categories.
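To illustrate how the 1 to 5 rating described above might be turned into labels for a classifier, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Review` field names, the country labels in `USABLE_COUNTRIES`, and the thresholds in `rating_to_polarity` (1-2 negative, 3 neutral, 4-5 positive) are hypothetical and do not describe the actual file layout or labeling scheme of this corpus. The sample records are placeholders, not corpus data.

```python
from dataclasses import dataclass

# Hypothetical country labels; the corpus files may encode this differently.
USABLE_COUNTRIES = {"Jordan", "Saudi Arabia"}


@dataclass(frozen=True)
class Review:
    """One review record (field names are assumptions, not the corpus schema)."""
    text: str     # textual opinion, usually written in the local dialect
    rating: int   # numerical rating from 1 to 5
    country: str  # country in which the reviewed place is located


def rating_to_polarity(rating: int) -> str:
    """Map a 1-5 rating to a coarse polarity label (assumed thresholds)."""
    if not 1 <= rating <= 5:
        raise ValueError(f"rating must be between 1 and 5, got {rating}")
    if rating <= 2:
        return "negative"
    if rating >= 4:
        return "positive"
    return "neutral"


def keep_usable(reviews):
    """Keep only reviews of places in the two countries with enough data."""
    return [r for r in reviews if r.country in USABLE_COUNTRIES]


if __name__ == "__main__":
    # Placeholder records for demonstration only.
    sample = [
        Review("placeholder review A", 5, "Jordan"),
        Review("placeholder review B", 1, "Saudi Arabia"),
        Review("placeholder review C", 3, "Other"),
    ]
    for r in keep_usable(sample):
        print(r.country, r.rating, rating_to_polarity(r.rating))
```

Treating a rating of 3 as neutral is only one possible choice; a binary setup could instead drop those reviews or fold them into one of the two classes.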
