This repository hosts the datasets we have published for Dravidian languages. Some datasets are not publicly available yet; please reach out to us if you want to use them.
- KANCMD: Kannada Code-Mixed Dataset for Sentiment Analysis and Offensive Language Detection
- Dataset [To be Updated]
- Hope Speech detection in under-resourced Kannada language
- Dataset [Can be imported directly from Hugging Face Datasets]
- A sentiment analysis dataset for code-mixed Malayalam-English
- Dataset [To be Updated]
- DravidianCodeMix: Sentiment Analysis and Offensive Language Identification Dataset for Dravidian Languages
- Corpus creation for sentiment analysis in code-mixed Tamil-English text
- Dataset [To be Updated]
- HopeEDI: A multilingual hope speech detection dataset for equality, diversity, and inclusion
- Dataset [To be Updated]
- A Dataset for Troll Classification of TamilMemes
- Dataset [To be Updated]
- Multimodal meme dataset (MultiOFF) for identifying offensive content in image and text
- Dataset [To be Updated]
- TrollsWithOpinion: A Dataset for Predicting Domain-specific Opinion Manipulation in Troll Memes
- Dataset [To be Updated]
- DravidianMultiModality: A Dataset for Multimodal Sentiment Analysis in Tamil and Malayalam
- Dataset [To be Updated]
- Dataset for Identification of Homophobia and Transophobia in Multilingual YouTube Comments
- Dataset [To be Updated]
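
The list above notes that the Kannada hope speech dataset can be imported directly from Hugging Face Datasets. A minimal sketch of what that could look like follows. The identifier `kan_hope` and the helper name `load_hope_speech` are assumptions for illustration and are not stated in this README, so please check the identifier on the Hugging Face Hub before relying on it.

```python
# Minimal sketch: load the Kannada hope speech dataset via the `datasets` library.
# ASSUMPTION: "kan_hope" is a guessed Hub identifier, not confirmed by this README.
ASSUMED_DATASET_ID = "kan_hope"


def load_hope_speech(dataset_id=ASSUMED_DATASET_ID, split="train"):
    """Return one split of the dataset (hypothetical helper, for illustration)."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    # Downloads the data on first use, then reads it from the local cache.
    return load_dataset(dataset_id, split=split)
```

Calling `load_hope_speech()` requires `pip install datasets` and network access on the first run. If the identifier differs, pass the correct one as `dataset_id`.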
Reach out to us if you need any of these datasets for your work.
Mail us at dravidianlangtech@gmail.com or bharathiraja.akr@gmail.com