Bayesian networks are a statistical method for Data Mining, that is, for discovering valid, novel and potentially useful patterns in data. A description of probabilistic networks and machine learning techniques is given in [15].
Bayesian networks are used to represent essential information in databases in a network structure. The network consists of edges and vertices, where the vertices are events and the edges are relations between events. A simple Bayesian network is illustrated in Fig 3.2, where symptoms depend on a disease, and the disease depends on age, work and work environment. Introductions to Bayesian networks can be found in [13], [25], [17], [14] and [18].
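The structure described for Fig 3.2 can be written down directly as a directed graph. The following is a minimal sketch in Python, with node names assumed from the description above (the labels in the figure itself may differ):

    # A Bayesian network structure as a mapping from each vertex (event)
    # to the list of its parents (the events it directly depends on).
    # Node names are assumed from the description of Fig 3.2.
    network = {
        "Age": [],
        "Work": [],
        "WorkEnvironment": [],
        "Disease": ["Age", "Work", "WorkEnvironment"],
        "Symptoms": ["Disease"],
    }

    # The directed edges of the graph: one edge from each parent to its child.
    edges = [(parent, child)
             for child, parents in network.items()
             for parent in parents]
    print(edges)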
Bayesian networks are easy for humans to interpret, and are able to store causal relationships, that is, relations between causes and effects. The networks can be used to represent domain knowledge, and they make it possible to control inference and to produce explanations from a network.
A simple use of Bayesian networks is known as naive Bayesian classification [15]. Such networks consist of a single parent node and several child nodes, as in Fig 3.3. Classification is done by treating the parent node as a hidden variable (H in the figure) whose value states which class each object in the database should belong to, while the child nodes represent the observed attributes of the objects. An existing system using naive Bayesian classification is AutoClass [29].
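The classification step itself is simple: the posterior probability of each class is proportional to the prior of the class times the product of the attribute probabilities given that class. The following is a minimal sketch in Python, not the AutoClass algorithm; the attribute names and all probabilities are invented for the example:

    # Naive Bayesian classification: the class variable H is the parent,
    # the attributes are child nodes, assumed conditionally independent given H.
    # All numbers below are invented for illustration.
    prior = {"healthy": 0.7, "ill": 0.3}              # P(H)
    likelihood = {                                    # P(attribute present | H)
        "healthy": {"fever": 0.1, "cough": 0.2},
        "ill":     {"fever": 0.8, "cough": 0.6},
    }

    def classify(observed):
        """Return the most probable class and the full posterior distribution."""
        scores = {}
        for h in prior:
            score = prior[h]
            for attribute, present in observed.items():
                p = likelihood[h][attribute]
                score *= p if present else (1.0 - p)
            scores[h] = score
        total = sum(scores.values())
        posterior = {h: s / total for h, s in scores.items()}   # normalise
        return max(posterior, key=posterior.get), posterior

    print(classify({"fever": True, "cough": False}))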
The theoretical foundation for Bayesian networks is Bayes rule, which states:

$$P(H \mid e) = \frac{P(e \mid H)\,P(H)}{P(e)}$$

where H is a hypothesis and e an event (the evidence). $P(H \mid e)$ is the posterior probability, and $P(H)$ is the prior probability. A proof of Bayes rule is given in appendix A.
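As a small numerical illustration (with invented numbers): if a disease H has prior probability $P(H) = 0.01$, a symptom e occurs with probability $P(e \mid H) = 0.9$ when the disease is present, and $P(e) = 0.05$ overall, then

$$P(H \mid e) = \frac{0.9 \cdot 0.01}{0.05} = 0.18,$$

so observing the symptom raises the probability of the disease from 1% to 18%.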
To give a formal definition of Bayesian networks, we introduce some terminology which is taken from [25]:
If a subset Z of the nodes in a graph G intercepts all paths between the nodes X and Y (written $\langle X \mid Z \mid Y \rangle_G$), then this corresponds to conditional independence between X and Y given Z:

$$\langle X \mid Z \mid Y \rangle_G \Rightarrow I(X, Z, Y)_M$$

and conversely:

$$I(X, Z, Y)_M \Rightarrow \langle X \mid Z \mid Y \rangle_G$$

with respect to some dependency model M.
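When the dependency model M is given as a joint probability table, the statement $I(X, Z, Y)_M$ can be checked directly by testing whether $P(x, y \mid z) = P(x \mid z)\,P(y \mid z)$ for all value combinations. The following is a minimal sketch in Python over three binary variables, using an invented distribution constructed to satisfy the independence:

    from itertools import product

    # Invented joint distribution P(X, Y, Z) over binary variables,
    # built as P(z) * P(x|z) * P(y|z) so that X and Y are
    # conditionally independent given Z.
    pz = {0: 0.4, 1: 0.6}
    px_given_z = {0: 0.2, 1: 0.7}     # P(X=1 | Z=z)
    py_given_z = {0: 0.5, 1: 0.1}     # P(Y=1 | Z=z)
    P = {}
    for x, y, z in product((0, 1), repeat=3):
        px = px_given_z[z] if x == 1 else 1 - px_given_z[z]
        py = py_given_z[z] if y == 1 else 1 - py_given_z[z]
        P[(x, y, z)] = pz[z] * px * py

    def independent_given_z(P, tol=1e-9):
        """Check I(X, Z, Y): P(x, y | z) = P(x | z) * P(y | z) for all values."""
        for z in (0, 1):
            p_z = sum(p for (_, _, zz), p in P.items() if zz == z)
            for x, y in product((0, 1), repeat=2):
                p_xz = sum(p for (xx, _, zz), p in P.items() if xx == x and zz == z)
                p_yz = sum(p for (_, yy, zz), p in P.items() if yy == y and zz == z)
                if abs(P[(x, y, z)] / p_z - (p_xz / p_z) * (p_yz / p_z)) > tol:
                    return False
        return True

    print(independent_given_z(P))     # True for this distribution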
A Directed, Acyclic Graph (DAG) D is said to be an I-map of a dependency model M if for every three disjoint sets of vertices X, Y and Z we have:

$$\langle X \mid Z \mid Y \rangle_D \Rightarrow I(X, Z, Y)_M$$
A DAG is a minimal I-map of M if none of its arrows can be deleted without destroying its I-mapness.
Given a probability distribution P on a set of variables U, a DAG D is called a Bayesian network of P iff D is a minimal I-map of P.
A Bayesian network is shown in Fig 3.3, representing the probability distribution P:
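Writing the parent node of Fig 3.3 as H and its child nodes as $A_1, \ldots, A_n$ (the labels used in the figure itself are not reproduced here), such a network represents the factorization

$$P(H, A_1, \ldots, A_n) = P(H) \prod_{i=1}^{n} P(A_i \mid H),$$

that is, each child node depends only on its parent H.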