Last Updated: 7 July 1999
My interest in the problem of source estimation has arisen from work performed with Herbert G. Vaughan, Jr. and others in the Dynamic Brain Imaging Laboratory on combining information from several different non-invasive human brain imaging techniques to delineate the neural processes that underlie human cognition and sensorimotor function. These techniques include electroencephalography (EEG), magnetoencephalography (MEG), anatomical magnetic resonance imaging (MRI) and functional MRI (MRI). While each of these techniques is useful, there is no single technique that provides the spatial and temporal resolution necessary to make inferences about the intracranial sources of activity. We have recently demonstrated (Knuth and Vaughan 1998) that exciting new algorithms in the area of blind source separation (BSS) are related to those currently used in electromagnetic source localization (ESL) by expressing both in a Bayesian inferential framework. This framework provides a methodology by which several different types of information can be combined to aid in making inferences about a problem.
This html manuscript contains the following basic information on Bayesian Source Estimation:
To understand human neurophysiology, we rely on several types of non-invasive neuroimaging techniques. The techniques used in this laboratory are primarily high-density (128 channel) electroencephalography (EEG) and anatomical and functional magnetic resonance imaging (MRI and fMRI). Through collaborations with other neuroimaging laboratories we often are able to obtain magnetoencephalographic (MEG) recordings in addition to the other imaging techniques. The difficulty arises when one has data from several different imaging techniques and wishes to use all of the available data to make inferences about the underlying neurophysiology of the human subject.
We begin by describing the situation for EEG or MEG recordings.
Please keep in mind that this is a brief and simplified explanation.
More details can be found in our publications at the end of
this html document.
More recent material can be found here, and my complete
publications here.
When a region of neural tissue (consisting of about 100,000 neurons) is synchronously active, detectable extracellular electric currents and magnetic fields are generated. These regions of activity can be modeled as "current dipoles" because they generate a dipolar electric current field in the surrounding volume of the head. These extracellular currents flow throughout the volume of the head and create potential differences on the surface of the head that can be detected with surface electrodes in a procedure called electroencephalography (EEG). One can also place superconducting coils above the head and detect the magnetic fields generated by the activity in a procedure called magnetoencephalography (MEG).
If one knows the positions and orientations of the sources in the brain, one can calculate the patterns of electric potentials or magnetic fields on the surface of the head. This is called the forward problem.
If instead all one has are the patterns of electric potential or magnetic fields then one must try to calculate the locations and orientations of the sources. This is called the inverse problem. Inverse problems are notoriously more difficult to solve than forward problems. In this case, given only the electric potentials and magnetic fields on the surface there is no unique solution to the problem. The only hope is that there is additional information available that can be used to constrain the infinte set of possible solutions to a single unique solution. This is where Bayesian statistics will be useful.
The idea is that one must use all the information available to solve the problem. We will demonstrate this by focusing on the inverse problem where we have information from one technique only, say EEG. (MEG is very similar to EEG and is actually easier to deal with in the sense that the forward problem is easier to deal with for MEG).
In the cartoon below, we show three neural sources, represented in this case by equivalent current dipoles (blue arrows), in the cortical grey matter of the brain. The electrodes on the surface of the head detect the potential differences due to the extracellular currents generated by these active sources. The purple arrows merely demonstrate that each electrode detects some of the current flow from each neural source. (The currents do not flow directly from the sources to the electrodes, but instead flow throughout the volume of the entire head.)
![]() |
The problem of electromagnetic source localization deals with using the spatial patterns of potential difference on the scalp to estimate the most likely positions and orientations of the neural sources. This works quite well when there is only one source active. As soon as several sources are simultaneously active, the spatial patterns can become very complicated and can suggest many possible combinations of neural sources.
There is some information that is neglected by ESL techniques, and that is that the neural sources exhibit temporal behavior. In fact, each electrode detects a mixture of the time-varying activity of each neural source. If one succeeds in localizing the sources, one has also succeeded in separating their signals. Similarly, if one succeeds in separating their signals, localization will have been made much easier. This suggests that the problem of electromagnetic source localization (ESL) is related to the problem of blind source separation (BSS) where one separates the signals using the recorded mixtures and a little a priori information about the source signals. This relationship is described in more detail in Knuth and Vaughan 1998.
The diagram below demonstrates the similarity of our problem above to the problem of BSS. We have neural sources which exhibit a time course of activity described by s(t). This neural activity generates electric currents and magnetic fields which propagate through the volume of the head and are recorded by the detectors . The three detectors in this cartoon record mixtures of those source signals, described by x(t). These mixtures are related to the original source signals by the linear transformation A, called the mixing matrix, such that x(t) = A s(t). The linear transformation A depends on the geometry of the sources and detectors and the laws governing the propagation of the signals in the medium. The problem of BSS is to separate those mixed signals by finding an inverse transformation W, called the separation matrix into the estimates of the original source waveforms, described by y(t).
![]() |
BSS utilizes the higher-order statistics of the signals themselves to estimate the mixing matrix A, up to a permutation and rescaling. On the other hand, ESL utilizes knowledge about the signal propagation and the geometry to estimate the parameters such as source locations and orientations, on which the elements of the mixing matrix explicitly depend.
Below we describe the methodology, based on Bayesian inference, with which we combine the problems of BSS and ESL into a single problem which we call Bayesian Source Estimation (BSE).
Description of the Methodology
The problems of source separation and localization are by their very natures problems of inductive inference.
There is not enough information available to deduce a solution, so one must infer the most probable solution.
The natural starting point to solving an inference problem is the application of Bayes' Theorem:
| (1) | . |
Bayes' theorem can be viewed as describing how our prior knowledge changes due to the acquisition of new information (in this case the recording of some data).
We now recast Bayes' Theorem into a form useful for the source estimation problem. Our model of the
physical situation will consist of the mixing matrix A and the original source waveforms s(t).
The data we record consists of the sampled mixtures, x(t). We rewrite Bayes' Theorem as a
proportionality by absorbing the evidence into a proportionality constant:
| (2) | . |
| (3) | . |
For now we will look at the problem of blind source separation (BSS). The term "blind" refers to the fact
that nothing is known about the mixing process. This is a statement of ignorance about the possible
values and relationships among values of the elements of the mixing matrix. To express this ignorance, we
assign a uniform probability density to the mixing matrix prior P(A | I ). This
assignment reflects the fact that we believe any matrix is as likely to be the correct matrix as any other.
The proportionality above reduces to:
| (4) | . |
Histograms of Digitized Source Waveform Amplitudes
|
|
Super-Gaussian Source Density Model
|
|
Sub-Gaussian Source Density Model
|
|
Speech sounds, as shown in the figure above, have a tendency to have sharply peaked amplitude densities. These can be well-modeled by a Lapacian or other super-Gaussian functions, such as sigmoidal (or logistic), inverse tangent or hyperbolic tangent functions. The lower frequency waveforms are more difficult to model because they tend to have multi-modal probability densities. Modelling the individual peaks in these densities, especially when one does not know where they are or how many there are, proves to be very difficult. In these cases, a relatively flat sub-Gaussian model best describes one's prior knowledge. To deal with cases where sources may have either super- or sub-Gaussian amplitude densities, one effective solution utilizes a source amplitude model parameterized by the sign of the kurtosis, which is a binary-valued parameter. A positive kurtosis indexes a super-Gaussian density, and a negative kurtosis indexes a sub-Gaussian density. This was demonstrated by Lee and Sejnowski [1].
The difficulty is to find a probability distribution that provides the "most honest" representation of one's knowledge. There are techniques for doing so, namely Jaynes' principles of Group Invariance [2] and Maximum Entropy [3].
Once one's prior knowledge has been expressed as a prior probability, this can be substituted into the
Equation above. The next trick is to assign the likelihood function. Assignment of a Gaussian could be
used to express one's doubt as to the predictive power of the model in the case where it is known that there
is additional noise or where simplifying approximations have been made. In the case of the
Bell and Sejnowski
Independent Component Analysis (ICA) algorithm [4], one assumes that the signals are noise-free.
This is equivalent to
having complete confidence that the linear mixing model, x(t) = A s(t), accurately describes the
physical situation. To express this expectation as a probability, we use the delta function:
| (5) | ![]() |
The final step, we will discuss here is marginalization. In our example, the model consists of a description of both
the mixing matrix A and the source waveforms s(t). Because our model is linear, we don't really need to
determine both aspects of the model. The mixing matrix is easiest to obtain, and we can eliminate the source waveforms
from the model by marginalizing over them. This can be performed using the fact that for two hypotheses, A and B, we can write:
| (6) | ![]() |
Marginalizing our model, Eqn. (3) over all possible waveforms s(t), we obtain
| (7) | ![]() |
It is often more convenient to work with the logarithm of the probability. Taking the logarithm of the equation above
results in
| (8) | ![]() |
Now we can substitute in our likelihood and priors. Since the mixing matrix prior is a constant for the BSS problem, we
are left with
| (9) | ![]() |
The final step is to find the matrix that maximizes the posterior probability. This is called the Maximum A Posteriori or MAP criterion. It results in a single solution. However, it is not always the best way to solve the problem, since the posterior probability is the real answer to our question. For simple problems, one can sometimes plot or write a formula for the posterior probability. This provides a complete description of the solution. Monte Carlo sampling of the posterior probability is useful when the dimensionality of the model space is too large to visualize.
Most BSS applications necessitate a single solution, and while it is not necessarily the best thing to do, we can provide one using the MAP criterion. The remainder of the work presented here falls along these lines, although we are currently working on utilizing Monte Carlo techniques to answer more difficult questions.
There are many ways to find a maximum of the posterior probability or, equivalently, its logarithm in Eqn. 9. The method implemented in the Bell-Sejnowski ICA is that of stochastic gradient ascent. The derivative of the logarithm of the probability is used to create an update rule for the matrix A or its inverse W. The data is presented to the update rule and corrections to the current guess are made which constantly increase the probability that the guess is correct. We will not discuss this derivation here since it is discussed in Bell and Sejnowski's paper [4], MacKay's paper [5], as well as several of ours [6-9].
The main idea here is that these problems are inference problems and are most effectively approached using Bayesian inference. Through the Bayesian methodology we are provided with a natural and logically consistent method by which we can include all the information available about a problem. Blind separation techniques work nicely in a wide varienty of cases, but if we have additional information, it would be very unwise not to utilize it. In the context of speech separation, of which several demonstrations are provided below, our brains utilize many types of addtional information. This information includes knowledge of the format frequencies of the speaker's voice, phonetic and syntactic information about the language being spoken, content of the conversation and visual information about the motions and positions of the mouth and lips. In addition, it is well known that primary auditory cortex is not provided with time-series data of the amplitudes of the sound waveforms. These considerations demonstrate the necessity of a formalism that allows for the incorporation of any type of information.
References for the Methodology
Examples of the Application of BSS and BSE
This section contains a brief but growing example of the Bayesian Source Estimation algorithm.
The algorithm utilizes information about the signal propagation through the medium in addition to some basic
information about the source locations to aid in the estimation of the original source signals.
In this case, we assume that we know that the signals propagate according to an inverse square attenuation law.
Information about the propagation of the sound signals equates to information about the possible form of the mixing matrix
or values of its elements. Specifically, we expect that the elements of the mixing matrix will have the particular form:
| (10) | ![]() |
is the amplitude of source j and
is the
distance between detector i and source j.
In a recent paper [1], I derive the following prior probability for the mixing matrix elements
Aij based on assumptions regarding the minimum and maximum possible amplitudes of the sources,
mean expectations of the source positions and the corresponding expected square deviation from the mean:
| (11) | ![]() |
is the Gamma function,
is the incomplete Gamma function,
b1j denotes the minimum expected amplitude of source j,
b2j denotes the maximum expected amplitude of source j and
the mean distance between the source and detector is given by
with
an expected square deviation from the mean given by
.
A final approximation consisting of an assumption of independence of the matrix elements simplifies the
prior probability of the entire mixing matrix, A to:
| (12) | ![]() |
By substituting Eqn. (12) above into the general Eqn. (3) for the posterior probability. One can derive a stochastic gradient algorithm analagous to the Bell-Sejnowski ICA algorithm, but utilizing our additional information about the signal propagation.
Demonstration
The true source locations are illustrated in the figure below by the RED numbered points.
The detector locations are illustrated by the BLUE numbered points.
We assume some knowledge about the source locations. The believed positions of the sources are described by the centers of the
spheres in the figure, where the sphere radius describes the expected standard deviation of the believed locations from the true locations.
Note that in this example, we have rather precise knowledge of the location of source 3.
Geometry used in the Inverse Square Mixture Simulation
![]() |
Click on the waveform to play the sound file ...![]() |
The figure above shows the original waveforms emitted by sources 1 through 5.
The mixed waveforms represent the waveforms recorded by the detectors 1 through 5.
The results of applying the Bell-Sejnowski ICA algorithm with a super-Gaussian (sigmoidal) source amplitude density and the Bayesian Source Estimation algorithm with the same source amplitude density are shown in the next figure. ICA using the super-Gaussian density cannot separate the sources 1 and 2 because, as demonstrated above for source 1, "Hail", the super-Gaussian density does not accurately describe the source's true amplitude density. This lack of accurate information results in a diagonal solution in which the two separated signals consist of the sum and difference of re-scaled versions of the original source waveforms.
However, the additional accurate information allowed BSE algorithm to separate the two source signals even though the source amplitude density prior presumed incorrect information about the source signals. Overall the separated signals contained slightly more noise than ICA, but the diagonal solution was averted. We stress that additional prior information about the mixing is not a substitute for an inaccurate representation of prior information regarding source amplitude densities.
Click on the waveform to play the sound file ...
Examples utilizing both super-Gaussian and sub-Gaussian source density models will be included soon.
References for the Example
Our Publications on Bayesian Source Estimation
Links to other Researchers involved in BSS, ESL and Bayesian Statistics
BSS and ICA Links
(This is a short list of meta-links)
Bayesian Links
(Again a short list)
ESL Links
(Again a short list)