Department of Computer and Information Science


Machine learning to identify consistent biological signals

Contemporary biomedical research generates large quantities of data, and each design choice in an experiment can introduce bias into the data. Because these biases are difficult to describe accurately, most analyses sidestep the issue by analysing only a single dataset at a time. Ironically, combining datasets with different biases can be used to identify and eliminate those biases, keeping only the genuine biological variation.

The goal of this project is to create a machine-learning algorithm that automatically identifies consistent expression profiles in biological data, as well as the biases associated with individual experimental designs. The student will learn how to apply machine learning to real-world data. The problem may be approached either by designing an estimator or by creating a classifier. The expression profiles can be treated as signals at a higher level of abstraction, so a thorough understanding of the underlying biology is not necessary. Nevertheless, a basic understanding of the experimental designs will be required to prepare the data adequately for the machine-learning model.
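To make the estimator view concrete, here is a minimal sketch under strong simplifying assumptions that are not part of the project description: every dataset measures the same genes on a log scale, each dataset's bias is a single additive offset, and the offsets average to zero. The function name `decompose` and the additive model are hypothetical illustrations; a real solution would need a far richer bias model.

```python
from statistics import mean


def decompose(datasets):
    """Split measurements into a shared profile and per-dataset offsets.

    datasets[d][g] is the (log-scale) expression of gene g in dataset d.
    Assumed model (illustrative only):
        x[d][g] = signal[g] + bias[d] + noise,  with mean(bias) == 0.
    """
    n_genes = len(datasets[0])
    if any(len(row) != n_genes for row in datasets):
        raise ValueError("all datasets must cover the same genes")

    # Grand mean over all datasets; equals mean(signal) under the model.
    grand = mean(mean(row) for row in datasets)
    # A dataset's offset is how far its own mean sits from the grand mean.
    bias = [mean(row) - grand for row in datasets]
    # The consistent profile is the per-gene average across datasets.
    signal = [mean(row[g] for row in datasets) for g in range(n_genes)]
    # Whatever is left is neither shared signal nor a constant offset.
    residual = [
        [row[g] - signal[g] - b for g in range(n_genes)]
        for row, b in zip(datasets, bias)
    ]
    return signal, bias, residual
```

With noise-free input built from a known profile and zero-mean offsets, the function recovers both exactly and the residuals vanish. On real data, the residuals are where dataset-specific effects beyond a constant offset would show up.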

The focus is mainly on RNA expression data, which should include both microarray and RNA-seq data. Relevant experimental designs are time-series experiments (measurements at time intervals following the introduction of, or release from, a treatment) and experiments comparing two experimental conditions. An example of the former is measuring expression throughout the cell cycle after releasing a cell culture from a treatment that blocks it from progressing past a certain point in the cell cycle. An example of the latter is an siRNA screen, where a given RNA is post-transcriptionally degraded by introducing a specifically designed siRNA.
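As a rough illustration of how the two designs might be brought onto a comparable scale before modelling, the sketch below uses two common conventions: a log2 fold change for two-condition data, and expression relative to the first time point for a time series. The function names and the pseudocount of 1.0 are assumptions made for this example, not requirements of the project.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """Per-gene log2 ratio of treated over control expression.

    The pseudocount (an assumed value) avoids division by zero and
    log(0) for genes with no measured expression.
    """
    if len(treated) != len(control):
        raise ValueError("conditions must cover the same genes")
    return [
        math.log2((t + pseudocount) / (c + pseudocount))
        for t, c in zip(treated, control)
    ]


def relative_to_first(series):
    """Express a log-scale time series relative to its first time point."""
    first = series[0]
    return [value - first for value in series]
```

Both functions return values centred on zero for "no change", so a two-condition comparison and a time-series profile can be read on the same scale.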

The topic is research-oriented, and students can choose whichever machine-learning approach and programming language they deem convenient. Possible approaches include deep neural networks or SVMs, both of which are available in existing Python libraries. GPU programming may be an advantage, but is not necessary.

For more info, contact Antonin Klima or Pål Sætrom.



Pål Sætrom
416 IT-bygget