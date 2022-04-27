The problem of context and slang grows worse given the range of languages spoken in India. Although the Centre for Internet and Society (CIS) has annotators working in different languages, they run into the same difficulties, because a word's connotation, or a particular usage of it, is not necessarily problematic in every language.

CIS's project to identify problematic language on social media began with English-language datasets, from which it drew up a set of guidelines for future annotators to follow.

"We had a set of guidelines and then we have a bunch of annotators, six for each language and they were supposed to – from their own experience and with what our guidelines are saying – mark a post for it being problematic," a researcher working on the project told The Quint.