RQ: What is the best way to evaluate topic modelling and keyword extraction algorithms in terms of quality?
When you extract a set of keywords from documents, how do you evaluate the quality of the algorithm?
What methods currently exist, which ones are applicable and useful for achieving higher quality, and which are comprehensible to humans?
Students are also encouraged to propose new methods on top of what they find in their study.
The work requires a literature review of what metrics exist, followed by an analysis of the Iris AI keyword extraction algorithm and topic modelling, implemented in Python, together with the currently used dataset. The expected output is an assessment of the metrics, a summary of the results, and a proposal of either an existing metric, a new metric, or a combination with adjustments, to be used for assessing the quality of current and future algorithms.
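To give a feel for the kind of existing metric the literature review is likely to turn up, here is a minimal sketch of exact-match precision, recall and F1 between extracted keywords and a human-annotated reference list. It is only an illustration, not the metric Iris AI uses: the function name `keyword_prf` and the sample keywords are hypothetical, and exact matching after lowercasing is an assumption that a real study would probably relax (for example with stemming or partial matches).

```python
def keyword_prf(extracted, gold):
    """Exact-match precision, recall and F1 for one document.

    `extracted` is the list of keywords produced by an algorithm and
    `gold` is a human-annotated reference list (both hypothetical here).
    """
    # Normalise case and whitespace so trivial differences do not count as misses.
    extracted_set = {k.strip().lower() for k in extracted}
    gold_set = {k.strip().lower() for k in gold}

    true_positives = len(extracted_set & gold_set)
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example data, for illustration only.
extracted = ["topic model", "Keyword Extraction", "neural network", "evaluation"]
gold = ["keyword extraction", "topic model", "coherence"]

precision, recall, f1 = keyword_prf(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A metric like this is easy for humans to interpret but depends on having reference keywords and penalises valid synonyms, which is one reason the question of better or combined metrics is open.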
Coding experience: Ideally Python, but C or Java experience could be accepted too.
Experience in Machine Learning: Preferably taken courses in Machine Learning and Big Data (or Data Science).
We are a startup company, and we’d love to engage with students who are at least intrigued by the idea of working with a small, fast-moving team with big ambitions.
Solid English skills are expected. Where you write your thesis from is irrelevant, as we’re already a distributed team and work mainly on Hangout and Slack.
If we like each other and Iris AI progresses well, there will be opportunities after completion of your master's thesis.
About Iris AI
Iris AI is an Artificial Intelligence that will read all of the world's research and help us connect the dots.
Our first goal is an AI assistant to help tech entrepreneurs and innovators navigate the world of science. It addresses the problem that the massive amount of knowledge we have is impossible to navigate for someone who does not know the terminology of the fields in depth and doesn't quite know what they're looking for yet as they explore opportunities. You should be able to input any scientific text over 500 words and immediately find all relevant research, across disciplines.
The first baby step product is live on our site ted.iris.ai, where you can explore the science around a TED talk. The next version of our tool is scheduled for launch in early September. The basic version will always be available for free for individual users, and we are targeting corporations as clients.
We are a fairly young startup: the team was formed in August 2015 at Singularity University at NASA Ames Research Park in Silicon Valley. We are a Norwegian-registered company; however, our team is entirely distributed across Norway, Sweden, Finland, Spain and Ukraine. In these first 9 months we have built our first tool, sold it to our first customer, been part of 500 Startups’ Nordic accelerator program, secured a seed investment of €300.000, launched an AI Training program with more than 300 trainers, and initiated pilot partnerships with several multinational corporations.
We’re moving very fast, we’re sincerely ambitious and we believe we can have a positive impact on the world. And we’d like you to be part of this adventure.