Department of Computer and Information Science


Gender and/or age based author profiling

Until a few years ago, gender-based language studies concentrated mainly on speech. Social media texts now provide plenty of data for extracting author profiles based on parameters such as gender, age and geolocation. At the same time, they pose new challenges for language analysis because of the often unconventional and abbreviated language used, as well as other characteristics of social media text such as hashtags, emojis, emoticons and code-switching (mixing languages). In addition, many users do not volunteer their true profiles. The theme for this thesis project would thus be to investigate automatic (machine learning) methods either to extract and classify author profiles in online texts, or to determine whether one specific user (or type of user) could have written a given text (i.e., cyber forensics).
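
As a rough illustration of the social media characteristics mentioned above, the following minimal Python sketch counts a few surface features (hashtags, emoticons, emojis) that could serve as input to a machine learning classifier. The function name, the feature set, the emoticon pattern and the emoji code point ranges are all assumptions made for this example. They are not part of the project description, and a real system would need far richer features.

```python
import re

# Hashtags: '#' followed by word characters (an assumed, simplified definition).
HASHTAG_RE = re.compile(r"#\w+")
# A deliberately small emoticon pattern covering forms such as :) ;-) :D :(
EMOTICON_RE = re.compile(r"[:;=][-']?[)(DPp]")


def is_emoji(ch):
    """Rough emoji test based on two common Unicode blocks (an approximation)."""
    cp = ord(ch)
    return 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF


def extract_features(text):
    """Return simple surface counts for one social media post (hypothetical helper)."""
    tokens = text.split()  # whitespace tokenisation, chosen for simplicity
    return {
        "n_tokens": len(tokens),
        "n_hashtags": len(HASHTAG_RE.findall(text)),
        "n_emoticons": len(EMOTICON_RE.findall(text)),
        "n_emojis": sum(1 for ch in text if is_emoji(ch)),
    }


if __name__ == "__main__":
    print(extract_features("so happy today :) #sunny #weekend \U0001F600"))
```

Feature dictionaries of this kind could be combined with word or character n-grams and passed to any standard classifier. Which features actually help to predict gender or age is an open question that the project itself would investigate.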


Björn Gambäck
315 IT-bygget
735 93354 