ShareChat
Moj

ADIMA


A multilingual, expertly annotated and well-balanced dataset for abuse detection in audio chatrooms.

ADIMA comprises 11,775 audio utterances (roughly 5 to 60 seconds each) in 10 Indic languages, spanning 65 hours and spoken by 6,446 unique users.
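As a rough sanity check on these figures, the sketch below derives the mean utterance length and the mean number of utterances per user. It only uses the totals quoted above. The 65-hour figure is presumably rounded, so the results are approximate. The constant and function names are illustrative and are not part of any ADIMA release.

```python
# Totals quoted on this page; the 65-hour figure is presumably rounded,
# so every derived average below is approximate.
TOTAL_UTTERANCES = 11_775
TOTAL_HOURS = 65
UNIQUE_USERS = 6_446


def summary_stats(utterances: int, hours: float, users: int) -> dict:
    """Derive rough per-utterance and per-user averages from dataset totals."""
    total_seconds = hours * 3600
    return {
        # Mean clip length in seconds across all utterances.
        "mean_utterance_seconds": total_seconds / utterances,
        # Mean number of utterances contributed by each unique speaker.
        "utterances_per_user": utterances / users,
    }


stats = summary_stats(TOTAL_UTTERANCES, TOTAL_HOURS, UNIQUE_USERS)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

This prints a mean of about 19.87 seconds per utterance, which sits comfortably inside the stated 5 to 60 second range. It also prints about 1.83 utterances per user, so most speakers contribute only one or two clips.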

10 Indic Languages

ShareChat in Hindi
ShareChat in Gujarati
ShareChat in Punjabi
ShareChat in Telugu
ShareChat in Malayalam
ShareChat in Tamil
ShareChat in Bengali
ShareChat in Kannada
ShareChat in Bhojpuri
ShareChat in Haryanvi

ADIMA is sourced from real-life conversations and can be used to develop content moderation algorithms that enable safe and healthy interactions.

Publications

Please cite the following papers if you use the dataset.

ADIMA: Abuse Detection In Multilingual Audio

ICASSP, 2022

Vikram Gupta, Rini Sharon, Ramit Sawhney, Debdoot Mukherjee

Paper | Code

Multilingual and Multimodal Abuse Detection

INTERSPEECH, 2022

Rini Sharon, Heet Shah, Debdoot Mukherjee, Vikram Gupta

Paper | Code

License

The ADIMA dataset is available for research purposes only. Any commercial use of the dataset is strictly prohibited.

Copyright © 2026 Mohalla Tech Private Limited.