One of the simplest, and yet most consistently well-performing set of
classifiers is the naive Bayes models (a special class of Bayesian Network models). However, these models rely on the
(naive) assumption that all the attributes used to describe an instance are
conditionally independent given the class of that instance.
To relax this independence assumption, we have in previous work proposed a family of models, called
latent classification models (LCMs). LCMs are defined for
continuous domains and generalize the naive Bayes model by using latent variables to model class-conditional
dependencies between the attributes. In addition to providing good
classification accuracy, the LCM model has several
appealing properties, including a relatively small parameter space making
it less susceptible to over-fitting. In this paper we take a
first-step towards generalizing LCMs to hybrid domains, by proposing
an LCM model for domains with binary attributes.
We present algorithms for learning
the proposed model, and we describe a variational
approximation-based inference procedure. Finally, we empirically compare the
accuracy of the proposed model to the
accuracy of other classifiers for a number of different
domains, including the problem of recognizing
symbols in black and white images.
Generative properties of the classifier
The paper describes classification of high-dimentional data, where all data are binary. We exemplify its usage on black-and-white (binary-pixel) images of handwritten digits. The movies (link below) show how the classifier tries to classify a partial image, and impute the remaining (undisclosed) part of the picture based on what has been seen so far.
An example image from these movies are shown in Figure 1.
Movies of partially disclosed images: Digit 0, Digit 1, Digit 2, Digit 3, Digit 4, Digit 5, Digit 6, Digit 7, Digit 8, Digit 9
Internal representation during learning
In the paper we describe a classification problem where the first class contains images of digits 0 and 1, the other class represents images of the digits 6 and 7. We use a mixture model internally in the classifier to represent this, and here provide a movie which shows how the internal representation is learned iteratively during the EM iterations.
The movie contains 6 images that are updated in parallel, a screen-shot from the process early in the iterations is shown below:
A later state of the EM iteration is shown in Figure 3. Notice how the mixture model has learned (unsupervised) that it is easier to learn a good classifier by separating the images of class 1 into two groups of images (the first containing zeros, the other holds the ones); the other class is handled similarly, with a separation of the images in that class into two subgroups; the 6's and the 7's.