Pro-eating disorder groups (pro-ED) are social media sub-cultures that encourage disordered and dangerous eating behaviours, e.g., Pro-Ana (pro-anorexia), Pro-Mia (pro-bulimia) and Thinspo (Thinspiration, a combination of “thin” and “inspiration”). Automatic detection of users sharing, supporting or following pro-ED content can provide information for understanding and preventing eating disorders, as well as for social media moderation. Data on some such users on Twitter have already been annotated, but to fully apply machine learning algorithms such as deep learning to the problem, more data need to be gathered, tentatively from various sites such as Twitter, Reddit and Tumblr. The thesis work would then experiment with applying various machine learners to this data.
When two individuals who are bi- or multi-lingual in an overlapping set of languages communicate, they tend to switch seamlessly and effortlessly between the languages (codes) they share. Such code-switching is most prominent in spoken language conversations, but also occurs frequently in social media texts that are fairly informal and conversational in nature. The aim of this project is to apply various machine learning methods to such code-switched texts from Twitter, Facebook or WhatsApp, for example to identify the language of each word or to annotate the texts with part-of-speech tags or utterance boundaries.
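As a minimal sketch of what word-level language identification involves, each token in a mixed English and Norwegian sentence could be labelled by simple dictionary lookup. The tiny wordlists and the function name `tag_languages` are illustrative assumptions, not project data; a thesis system would instead learn from annotated code-switched text.

```python
# Toy wordlists: illustrative assumptions only, far too small for real use.
WORDLISTS = {
    "en": {"i", "am", "going", "to", "the", "store", "today"},
    "no": {"jeg", "skal", "på", "butikken", "i", "dag", "ikke"},
}


def tag_languages(tokens, default="unk"):
    """Label each token with the language whose wordlist contains it."""
    tags = []
    for token in tokens:
        word = token.lower()
        matches = sorted(lang for lang, words in WORDLISTS.items() if word in words)
        if len(matches) == 1:
            tags.append(matches[0])
        elif matches:
            # The word occurs in more than one list (e.g. "i").
            tags.append("ambiguous")
        else:
            tags.append(default)
    return tags


if __name__ == "__main__":
    sentence = "Jeg skal to the store i dag".split()
    for token, tag in zip(sentence, tag_languages(sentence)):
        print(token, tag)
```

The "ambiguous" label shows why lookup alone falls short: words shared between the languages can only be resolved from context, which is where sequence models trained on annotated data come in.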
To be creative, we need to produce something which is new, meaningful and has some sort of value. Computers are able to support humans in creative processes, but also to be creative themselves or to assess whether an idea or a product is creative. A master thesis project on computational creativity can investigate any creative field matching the interests and backgrounds of the student or students (language, design, music, art, mathematics, computer programming, etc.), and concentrate on one or several aspects of computational creativity, such as the production, understanding or evaluation of creativity, or on computer systems that support human creativity.
Creativity can be found in nature and in humans, but also in computers, and entails producing something which is new. However, “newness” alone isn’t a sufficient condition for us to consider an idea to be creative; it also has to have some value and meaning. If a two-year-old draws some lines on a paper, we rarely consider it to be art, while if a grown-up does the same, we interpret it as having some deeper meaning, and if the grown-up signs the paper with a well-known artist name, we attribute both an underlying meaning and a monetary value to it. Creativity is thus not only a result of the effort of a producer, but also very much a result of how the product is viewed by the consumer.
Computational creativity can involve computer programmes that in themselves are creative, but also systems that are able to recognise and assess creativity, as well as programmes that assist humans in creative tasks. There are many creativity-supporting systems (e.g., Adobe Photoshop), and a few systems that themselves (possibly) are creative, such as “The Painting Fool” and “AARON” (two artificial artists). There are also systems that draw art based on textual or musical input, or generate music based on images. A master thesis on the topic could address any of these strands and approaches, depending on the background and interests of the student(s).
Computational linguistic creativity can be aimed at creating systems that either are creative themselves (e.g., chatterbots, or systems that generate poetry, write lyrics to music, or produce analogies or metaphors), or try to understand creativity (e.g., identify sarcasm, understand humour or interpret rhymes), or support humans in creative processes (as Photoshop does in the image domain), or evaluate creativity.
A master thesis project in the field could concentrate on one or several of these different aspects of computational linguistic creativity (e.g., generate and evaluate computational poetry, or translate on-line jokes between two languages).
Computers have been used in music both as support for creativity and as creative agents themselves, and both for the composition of the music scores and for writing lyrics. The first algorithmic composition system appeared as early as the 1950s (the Illiac suite, Hiller & Isaacson 1958), and since then rule-based systems, stochastic methods, grammar-based methods, neural networks, and evolutionary methods have all been utilised to compose music and/or to generate lyrics. A master thesis on the topic could address any of these strands and approaches, depending on the background and interests of the student(s).
The project will investigate automatic methods for emphasis selection (choosing candidates for emphasis) in short texts, and should be based on work towards the SemEval 2020 shared task "Emphasis Selection for Written Text in Visual Media". The organisers describe the topic of the shared task as follows: "word emphasis is used to better capture the intent, removing the ambiguity that may exist in plain text. Word Emphasis can clarify or even change the meaning of a sentence by drawing attention to some specific information, and it can be done with Colors, Backgrounds, or Fonts, Italic and Boldface."
A key aspect of sentiment analysis is identifying the target(s) of the opinion, that is, to determine which entities in a text the expressed sentiment relates to. Exploring how textual entities are related to a text’s overall sentiment can yield information on how given entities are portrayed in social media, e.g., on Twitter. This requires the application of sentiment analysis techniques as well as named entity recognition and linking, and the use of heuristic or grammatical features to determine entity relevance and sentiment strength.
Natural language processing grapples with an ever-changing and moving target. The focus of study, natural language, is natural because it changes, interacts and evolves in various directions. The bio-inspired computational methods described as evolutionary computation and/or genetic algorithms create computational models that evolve a population of individuals to find a solution to a given problem. This project will investigate how evolutionary computation can be employed in natural language processing tasks, ranging from grammar induction to models of language.
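To make the idea of evolving a population of individuals concrete, here is a minimal sketch of a genetic algorithm. It assumes a deliberately simple toy problem (evolving random strings towards a fixed target phrase); the target, the alphabet, the parameter values and all function names are illustrative assumptions and not a method prescribed by the project.

```python
import random

TARGET = "natural language"          # toy goal, chosen for illustration only
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def fitness(candidate):
    """Number of positions where the candidate matches the target."""
    return sum(a == b for a, b in zip(candidate, TARGET))


def mutate(candidate, rate, rng):
    """Replace each character by a random one with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else ch for ch in candidate
    )


def crossover(a, b, rng):
    """Single-point crossover of two equal-length strings."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def evolve(pop_size=50, generations=100, rate=0.05, seed=0):
    """Return (best fitness per generation, best individual found)."""
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(ALPHABET) for _ in TARGET) for _ in range(pop_size)
    ]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]   # truncation selection
        children = [population[0]]              # elitism: keep the best as-is
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rate, rng))
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return history, best


if __name__ == "__main__":
    history, best = evolve()
    print(best, history[-1], "of", len(TARGET))
```

In a language processing setting the individuals would be, for instance, candidate grammars or feature sets rather than strings, and the fitness function would measure performance on the task at hand.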
Until a few years ago, gender-based language studies mainly concentrated on speech. However, social media texts now provide plenty of data for extracting author profiles based on parameters such as gender, age and geolocation, while on the other hand posing new challenges for language analysis due to the often unconventional and abbreviated language used, as well as other characteristics of social media text such as usage of hashtags, emojis, emoticons, code-switching (mixing languages), etc. In addition, many users do not volunteer their actual and true profiles. The theme for this thesis project would thus be to investigate automatic (machine learning) methods to either extract and classify author profiles in online texts, or to figure out whether one specific user (or type of user) could have written a given text (i.e., cyber forensics).
Extremist groups take to social media since they facilitate cheap, quick, and broad dissemination of messages, and allow for unfettered communication with an audience without the filter or ‘selectivity’ of mainstream news outlets. There have in recent years been substantial efforts to identify members already belonging to extremist organisations and track their Internet activities. The present proposal, however, is primarily aimed at the individuals targeted by the extremists, i.e., persons susceptible to their ideas. The goals would be to profile persons vulnerable to extremism and intercept them before they fully turn to the extremist organisations, and to identify sources of extremism and hate-speech in order to preventively destabilize extremist networks.
During the spring of 2017, parliamentary committees in Germany and the UK strongly criticised leading social media sites such as Facebook, Twitter and YouTube for failing to take sufficient and quick enough action against hate speech, with the German government threatening to fine the social networks up to 50 million euros if they continue to fail to remove hateful postings within a week.
With legislation in other countries set to follow, properly identifying hate speech is a pressing issue, not only for the major players, but also for smaller companies, clubs, and organisations that allow for user-generated content on their sites. Many such sites currently use slow, manual moderation, which means either that abusive posts will be left online for too long without appropriate action being taken, or that content will be published with a delay (which might be unacceptable to the users, e.g., in online chat rooms).
The thesis project would look into previous efforts to identify hate speech and cyber bullying, as well as available flame-annotated datasets from chat rooms, online games, Wikipedia and Twitter, and investigate various machine learning methods to identify such language.
Native Language Identification is the task of identifying the native language of a writer based solely on a sample of their writing in another language. The task is typically framed as a classification problem where the set of native languages is known beforehand. Most work has focused on identifying the native language of writers learning English as a second language. The master thesis work connects to previous work in IDI's AI group and potentially involves participation in a "shared task competition" on Native Language Identification where training and test data is made available by the organisers (such as https://sites.google.com/site/nlisharedtask2013/).
The Customer-Insight unit at DNB consists of data analysts, data scientists and business analysts who are interested in using machine learning techniques to improve predictions for use cases such as churn prevention and up-sales. To do that, the suggested Master's thesis would investigate sentiment analysis of chat texts in Norwegian (assigning positive, negative or neutral sentiment) and then the use of this sentiment as input to a supervised classifier that predicts churn.
DNB PULS is DNB's product for corporate banking and includes a rule-based recommendation system which generates advice for a variety of customer types. The suggested Master's thesis topic would be to explore machine learning alternatives to this rule-based solution, working together with the PULS team.
In recent years, micro-blogging has become prevalent, and the Twitter API allows users to collect a corpus from their micro-blogosphere. The posts, named tweets, are limited to 140 characters, and are often used to express positive or negative emotions towards a person or product. In this project, the goal is to use the Twitter corpus to do sentiment analysis, that is, to classify tweets as to whether they express positive or negative opinions, or are neutral/objective. The work could build on previous master theses at NTNU, and potentially aim to participate in a shared task competition on Twitter Sentiment Analysis.
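The three-way classification described above can be illustrated with a very simple lexicon-based baseline. This is only a sketch under stated assumptions: the word polarities, the negation list and the function name `classify` are invented for illustration, and competitive systems in the shared tasks are trained on annotated tweets rather than built from hand-written lists.

```python
import re

# Toy sentiment lexicon: an illustrative assumption, not a published resource.
LEXICON = {
    "love": 1, "great": 1, "good": 1, "happy": 1,
    "hate": -1, "awful": -1, "bad": -1, "terrible": -1,
}
NEGATIONS = {"not", "no", "never"}


def classify(tweet):
    """Return 'positive', 'negative' or 'neutral' from summed word polarities."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
            continue
        polarity = LEXICON.get(token, 0)
        if polarity:
            # A preceding negation flips the next sentiment word only.
            score += -polarity if negate else polarity
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for text in ["I love my new phone!", "This is not good", "Flight lands at 10pm"]:
        print(text, "->", classify(text))
```

A baseline of this kind is mainly useful as a point of comparison for learned classifiers, since it ignores context, misspellings, hashtags and emoticons.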
The master thesis project is aimed at the automatic classification of tweets containing figurative language, that is, language which intentionally conveys secondary or extended meanings (such as sarcasm, irony and metaphor). Such figurative language creates a significant challenge for sentiment analysis systems, as direct approaches based on words and their lexical semantics often are inadequate in the face of indirect meanings. One goal of the project is to find a set of tweets that are rich in figurative language; another is to determine whether the writer of each such tweet has expressed a positive or negative sentiment, and possibly the degree to which this sentiment has been communicated.
For the data collection part, the project could tentatively build on data sets from the Semantic Evaluation (SemEval) shared task exercises, in particular the tweets annotated for figurative language in SemEval-15 (Task 11) and those annotated for sarcasm in the SemEval tasks on Twitter sentiment analysis (SemEval-14 Task 9, SemEval-15 Task 10, SemEval-16 Task 4).
Several models can be used to find out how users’ social media networks, behaviour and language are related to their ethical practices and personalities. Such models include Schwartz’ values and ethics model and Goldberg's Big 5 model, which defines personality traits such as openness, conscientiousness, extraversion, agreeableness and neuroticism. The thesis project would investigate applying such models to social media text and how the user personalities are reflected by the social networks that they participate in and develop.
DNB has implemented a platform for collecting user activity logs across websites and mobility apps. Specific parts of the logs are manually tagged with rules (business meanings) that are used by the bank to make decisions (e.g., to grant a loan). The process of defining these rules is called "contextualisation", which is a time-consuming, iterative and error-prone task. Hence DNB wishes to evaluate various unsupervised (and other) approaches that can help to automate or semi-automate the contextualisation.
The aim of this project will be to use evolutionary algorithms as a vehicle to investigate the main dynamics in language evolution. Language evolution is a highly multi-disciplinary research field with the main theories involving biological evolution, language learning, and cultural evolution. When trying to understand the origins of languages, we can try to compensate for the lack of empirical evidence by utilizing evolutionary computational methods to create simulations of how language may have evolved over time, e.g., by creating "language games" to simulate communication between agents in a social setting. In general, simulations on language evolution tend to have relatively small and fixed population sizes, something this study could aim to change.
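As one hedged illustration of a "language game", the sketch below implements a minimal naming game: pairs of agents try to agree on a word for a single object, inventing words when they have none. The update rule follows the commonly described minimal naming game, while the population size, number of rounds and the function name `naming_game` are assumptions made for this example. Note that the population here is small and fixed, which is exactly the limitation the project could aim to relax.

```python
import random


def naming_game(n_agents=20, rounds=2000, seed=0):
    """Simulate a minimal naming game for a single object.

    Returns the final word inventories and a list with one boolean
    per round telling whether that interaction succeeded.
    """
    rng = random.Random(seed)
    inventories = [set() for _ in range(n_agents)]
    successes = []
    next_word = 0
    for _ in range(rounds):
        speaker, hearer = rng.sample(range(n_agents), 2)
        if not inventories[speaker]:
            # The speaker has no word yet and invents a new one.
            inventories[speaker].add(f"w{next_word}")
            next_word += 1
        word = rng.choice(sorted(inventories[speaker]))
        if word in inventories[hearer]:
            # Success: both agents drop all competing words.
            inventories[speaker] = {word}
            inventories[hearer] = {word}
            successes.append(True)
        else:
            # Failure: the hearer learns the speaker's word.
            inventories[hearer].add(word)
            successes.append(False)
    return inventories, successes


if __name__ == "__main__":
    inventories, successes = naming_game()
    vocabulary = set().union(*inventories)
    recent = successes[-200:]
    print("distinct words in use:", len(vocabulary))
    print("success rate, last 200 rounds:", sum(recent) / len(recent))
```

Tracking the success rate and the number of distinct words over time gives a simple view of whether and how quickly a shared vocabulary emerges; varying `n_agents` during a run would be one way to study changing population sizes.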