Cluster analysis is one of the fundamental tasks in machine learning and data mining. It is used in a wide range of fields such as biology, medicine, the World Wide Web, chemistry, climatology, finance, and social science. In practice, data often lie on curved manifolds, and the number of clusters is unknown. Many conventional clustering methods, such as k-means, require the number of clusters in advance and favor compact, convex clusters, so they handle such situations poorly. In this project we will attack the problem with visualization techniques, especially modern nonlinear dimensionality reduction, to keep a human in the loop of cluster discovery.
This is a research-oriented topic. The student will gain hands-on experience with state-of-the-art machine learning methods, including stochastic neighbor embedding, nonnegative matrix factorization, and large-scale and non-convex optimization, as well as with implementing them in Python or Matlab. Good programming skills and university-level mathematics are required.