Read also: Writing a Master's Thesis in Language Technology
Properly identifying hate speech is a pressing issue for social media sites as well as for smaller companies, clubs, and organisations that allow user-generated content. Many such sites currently rely on slow, manual moderation, which means either that abusive posts stay online for too long before appropriate action is taken or that content is published with a delay (which might be unacceptable to users, e.g., in online chat rooms).
The thesis project would survey previous efforts to identify hate speech and cyberbullying, review available flame-annotated datasets from chat rooms, online games, Wikipedia, and Twitter, and investigate various machine learning methods for identifying such language.