Department of Computer and Information Science


Topic modelling vs keyword extraction in terms of forming document search queries

RQ: Which of the two, or a combination, is better when it comes to summarizing a text into keywords to be used in search queries?

Iris AI metric:
Given an existing corpus of documents, the person evaluating should use the documents to form search queries in Google Scholar. The corpus will consist of tens of papers specified by the Iris AI team, all on the topic of topic modelling and keyword extraction. After selecting a document, he/she should run both a keyword extraction algorithm and a topic modelling algorithm, get the topics of the current document and their describing words, and use these as a search query in Google Scholar. The goal is to find, on the first page of Google Scholar results, the papers in the corpus. This is a required evaluation metric for the algorithms.
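As a rough illustration of the keyword-extraction half of this procedure, the sketch below scores the terms of one document with a smoothed TF-IDF and joins the top terms into a query string. TF-IDF is only a stand-in here; the project does not prescribe a particular algorithm. The function names, the tiny stopword list and the parameter k are assumptions made for the example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real run would use a fuller one).
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "from", "in", "is",
             "of", "on", "the", "to", "with", "we", "that", "this"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords and very short tokens."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOPWORDS and len(t) > 2]


def tfidf_keywords(doc, corpus, k=5):
    """Return up to k terms of `doc` with the highest TF-IDF score against `corpus`."""
    doc_tokens = tokenize(doc)
    tf = Counter(doc_tokens)
    corpus_sets = [set(tokenize(d)) for d in corpus]
    n_docs = len(corpus)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for s in corpus_sets if term in s)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF
        scores[term] = (count / len(doc_tokens)) * idf
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def build_query(keywords):
    """Join keywords into a plain search query string."""
    return " ".join(keywords)
```

The resulting string would then be entered into Google Scholar by the evaluator. A topic modelling algorithm could feed the same build_query step with the describing words of the document's dominant topic.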

The goal is an evaluation of the quality of the two categories of algorithms - and/or a combination - in terms of getting papers that are close to the presented papers.
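One possible reading of the required metric is the fraction of corpus papers that show up on the first results page for a given query. The sketch below computes that fraction per method, assuming the first-page titles have already been collected (for example by hand); it does not query Google Scholar itself. The names first_page_hit_rate and compare_methods, and the title normalization, are assumptions made for the example.

```python
import re


def normalize(title):
    """Lowercase a title and collapse punctuation so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def first_page_hit_rate(corpus_titles, first_page_titles):
    """Fraction of corpus titles that appear among the first-page result titles."""
    if not corpus_titles:
        return 0.0
    found = {normalize(t) for t in first_page_titles}
    hits = sum(1 for t in corpus_titles if normalize(t) in found)
    return hits / len(corpus_titles)


def compare_methods(corpus_titles, results_by_method):
    """Map each method name to its first-page hit rate."""
    return {method: first_page_hit_rate(corpus_titles, titles)
            for method, titles in results_by_method.items()}
```

With results_by_method holding one list of first-page titles per approach (keyword extraction, topic modelling, or a combination), the returned dictionary gives a simple side-by-side comparison.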

The students are free to propose new metrics and compare them to the proposed one. They also could suggest different keyword extraction or topic modelling algorithms that can perform optimally in the presented setting.

Your profile
Coding experience: Ideally Python, but C or Java experience could be accepted too.

Experience in Machine Learning: Preferably, you have taken courses in Machine Learning and Big Data (or Data Science).

We are a startup company, and we’d love to engage with students who are at least intrigued by the idea of working with a small, fast-moving team with big ambitions.

Solid English skills are expected. Where you want to write your thesis from is irrelevant as we’re already a distributed team and work mainly on Hangout and Slack.

If we like each other and Iris AI progresses well, there will be opportunities after completion of your master's thesis.

About Iris AI
Iris AI is an Artificial Intelligence that will read all of the world's research and help us connect the dots.

Our first goal is an AI assistant to help tech entrepreneurs and innovators navigate the world of science. It addresses the problem of the massive amount of knowledge we have, which is impossible to navigate for someone who is not a deep expert, doesn't know the terminology of the fields and doesn't quite know what they're looking for yet as they're exploring opportunities. You should be able to input any scientific text over 500 words and immediately find all relevant research, across disciplines.

The first baby step product is live on our site, where you can explore the science around a TED talk. The next version of our tool is scheduled for launch in early September. The basic version will always be available for free for individual users, and we are targeting corporations as clients.

We are a fairly young startup; the team was formed in August 2015 at Singularity University at NASA Ames Research Park in Silicon Valley. We are a Norwegian-registered company, but our team is entirely distributed across Norway, Sweden, Finland, Spain and Ukraine. In these first 9 months we have built our first tool, sold it to our first customer, been part of 500 Startups’ Nordic accelerator program, secured a seed investment of €300,000, launched an AI Training program with more than 300 trainers and initiated pilot partnerships with several multinational corporations.

We’re moving very fast, we’re sincerely ambitious and we believe we can have a positive impact on the world. And we’d like you to be part of this adventure.



Anders Kofod-Petersen
Adjunct Professor
360 IT-bygget