Latent Classification models for Binary data

This page gives some supplemental information for the paper

Helge Langseth and Thomas D. Nielsen: Latent Classification models for Binary data
Publication details: Pattern Recognition 42:2724-2736, 2009.
Download: [Paper], [Preprint]
Source code (matlab): Available upon request


One of the simplest, and yet most consistently well-performing set of classifiers is the naive Bayes models (a special class of Bayesian Network models). However, these models rely on the (naive) assumption that all the attributes used to describe an instance are conditionally independent given the class of that instance. To relax this independence assumption, we have in previous work proposed a family of models, called latent classification models (LCMs). LCMs are defined for continuous domains and generalize the naive Bayes model by using latent variables to model class-conditional dependencies between the attributes. In addition to providing good classification accuracy, the LCM model has several appealing properties, including a relatively small parameter space making it less susceptible to over-fitting. In this paper we take a first-step towards generalizing LCMs to hybrid domains, by proposing an LCM model for domains with binary attributes. We present algorithms for learning the proposed model, and we describe a variational approximation-based inference procedure. Finally, we empirically compare the accuracy of the proposed model to the accuracy of other classifiers for a number of different domains, including the problem of recognizing symbols in black and white images.

Generative properties of the classifier

The paper describes classification of high-dimentional data, where all data are binary. We exemplify its usage on black-and-white (binary-pixel) images of handwritten digits. The movies (link below) show how the classifier tries to classify a partial image, and impute the remaining (undisclosed) part of the picture based on what has been seen so far. An example image from these movies are shown in Figure 1.

Figure 1: Partially disclosed images

Movies of partially disclosed images: Digit 0, Digit 1, Digit 2, Digit 3, Digit 4, Digit 5, Digit 6, Digit 7, Digit 8, Digit 9

Internal representation during learning

In the paper we describe a classification problem where the first class contains images of digits 0 and 1, the other class represents images of the digits 6 and 7. We use a mixture model internally in the classifier to represent this, and here provide a movie which shows how the internal representation is learned iteratively during the EM iterations.

The movie contains 6 images that are updated in parallel, a screen-shot from the process early in the iterations is shown below:

Figure 2: Internal representation of the {0,1} vs. {6,7} problem -- Early phase of learning

In Figure 2, the two images to the left (Numbered 1 and 2) show the current approximation to the mean image in each class (Image 1 for Class 1 (digits 0 and 1), Image 2 for Class 2 (digits 6 and 7). The images labelled 3 and 4 show the representation of the first mixture, images labelled 5 and 6 give the representation of the second mixture.

A later state of the EM iteration is shown in Figure 3. Notice how the mixture model has learned (unsupervised) that it is easier to learn a good classifier by separating the images of class 1 into two groups of images (the first containing zeros, the other holds the ones); the other class is handled similarly, with a separation of the images in that class into two subgroups; the 6's and the 7's.

Figure 3: Internal representation of the {0,1} vs. {6,7} problem -- Later phase of learning
Movie of EM iterations: [.MPEG]