DEMIX Python/NumPy implementation


DEMIX is an algorithm that counts the sources in a mixture from their spatial cues and returns the estimated parameters, which are related to the relative amplitudes and the relative time shifts between the channels. The full description is given in [Arberet2010]:

Arberet, S.; Gribonval, R. & Bimbot, F. "A Robust Method to Count and Locate Audio Sources in a Multichannel Underdetermined Mixture", IEEE Transactions on Signal Processing, 2010, 58, 121-133.

This implementation is based on the MATLAB Toolbox provided by the authors of the above article.

In addition, this implementation allows time-frequency representations other than the short-term Fourier transform (STFT).


class pyfasst.demixTF.DEMIX(audio, nsources=2, wlen=2048, hopsize=1024, neighbors=20, verbose=0, maxclusters=100, tfrepresentation='stft', tffmin=25, tffmax=18000, tfbpo=48, winFunc=sqrt_blackmanharris)

DEMIX algorithm, for 2 channels.


Computes, for each cluster in self.clusters, a threshold that depends on the other clusters, in order to keep only those points of the cluster that are close to its own centroid but not close to the centroids of other clusters.

The returned clusters are the original clusters after thresholding.


Computes the time-frequency clusters, along with their centroids, which contain the parameters of the mixing process: theta, which parameterizes the relative amplitude, and delta, which has the dimension of a delay in samples between the two channels.


Computes the PCA features.
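
As a rough illustration of what a local PCA feature can look like, the sketch below takes the two-channel TF coefficients of one neighbourhood, forms their empirical covariance, and reads theta off the principal eigenvector. The function name `local_pca`, the array layout and the use of `numpy.linalg.eigh` are assumptions made for this example; it is not the pyfasst code.

```python
import numpy as np

def local_pca(X):
    """PCA of one neighbourhood of two-channel TF coefficients.

    X: complex array of shape (2, n_points) (hypothetical layout).
    Returns the principal unit vector, the eigenvalues in ascending
    order, and theta such that tan(theta) = |u[1]| / |u[0]|.
    """
    X = np.asarray(X, dtype=complex)
    # Empirical 2x2 covariance of the neighbourhood.
    R = X @ X.conj().T / X.shape[1]
    # eigh returns eigenvalues in ascending order for Hermitian input.
    eigvals, eigvecs = np.linalg.eigh(R)
    u = eigvecs[:, -1]  # principal direction
    theta = np.arctan2(np.abs(u[1]), np.abs(u[0]))
    return u, eigvals, theta

# Noise-free neighbourhood of 20 points generated by a single source.
rng = np.random.default_rng(0)
a = np.array([np.cos(0.6), np.sin(0.6) * np.exp(-1j * 0.3)])
s = rng.standard_normal(20) + 1j * rng.standard_normal(20)
u, eigvals, theta = local_pca(np.outer(a, s))
```

With a single source and no noise, the smallest eigenvalue is numerically zero and the principal vector matches the source direction up to a phase factor.
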


Computes the time-frequency representation of the signal (the STFT).
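
For orientation, a minimal single-channel STFT can be written with NumPy alone, as sketched below. The name `stft_sketch` is hypothetical, and a square-root Hann window stands in for the `sqrt_blackmanharris` default of the class, so this only approximates what the method computes. The signal is assumed to be at least `wlen` samples long.

```python
import numpy as np

def stft_sketch(x, wlen=2048, hopsize=1024):
    """Return a (wlen // 2 + 1, nframes) complex STFT of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    # Stand-in window (assumption): square root of a Hann window.
    window = np.sqrt(np.hanning(wlen))
    nframes = 1 + (len(x) - wlen) // hopsize
    # One windowed frame per column.
    frames = np.stack(
        [x[i * hopsize:i * hopsize + wlen] * window for i in range(nframes)],
        axis=1,
    )
    # Real-input FFT along the time axis of each frame.
    return np.fft.rfft(frames, axis=0)

# A sinusoid with exactly 64 cycles per analysis window.
n = np.arange(8192)
S = stft_sketch(np.sin(2 * np.pi * 64 * n / 2048))
```

For a two-channel signal, the same function would be applied to each channel separately.
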

compute_temporal(ind_cluster_pts, zoom)

Computes the inverse Fourier transform of the estimated steering vectors, weighted by their inverse variance.

The result is a detection function that peaks at the most likely delta, the delay in samples.
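
The idea can be sketched as follows, under several assumptions: the inverse transform is evaluated directly on a grid of candidate delays rather than with an FFT, the input is the unit-modulus cross-channel term of the steering vectors, and frequencies are reduced (cycles per sample). The name `delay_detection` and its arguments are hypothetical, and the fine grid step only mimics what a `zoom` factor might do.

```python
import numpy as np

def delay_detection(steer, variance, freqs, delays):
    """Inverse-variance-weighted detection function over candidate delays.

    steer:    complex cross-channel terms, one per frequency
    variance: variance of each estimate (same length as steer)
    freqs:    reduced frequencies, in cycles per sample
    delays:   candidate delays, in samples
    """
    weights = 1.0 / np.asarray(variance, dtype=float)
    # Inverse Fourier kernel evaluated at every (delay, frequency) pair.
    kernel = np.exp(2j * np.pi * np.outer(delays, freqs))
    return np.abs(kernel @ (weights * steer)) / weights.sum()

freqs = np.arange(1, 257) / 512.0
steer = np.exp(-2j * np.pi * freqs * 3.0)   # simulated delay of 3 samples
delays = np.arange(-10, 10.25, 0.25)        # quarter-sample grid
detection = delay_detection(steer, np.ones_like(freqs), freqs, delays)
best_delay = delays[np.argmax(detection)]
```

In this toy case the detection function reaches its maximum at the simulated delay, because all weighted terms add coherently only there.
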


Reconfigures the cluster indices in self.clusters so that all time-frequency points that appear in more than one cluster are dismissed from all computations.
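
One way to express this step, assuming each cluster is stored as an array of integer TF-point indices (the actual layout of self.clusters is not documented here), is sketched below. The function name `drop_shared_points` is hypothetical.

```python
import numpy as np

def drop_shared_points(clusters):
    """Remove every index that occurs in more than one cluster.

    clusters: non-empty list of integer index sequences (assumed layout).
    """
    # Deduplicate within each cluster so that only cross-cluster
    # repetitions are counted below.
    clusters = [np.unique(c) for c in clusters]
    values, counts = np.unique(np.concatenate(clusters), return_counts=True)
    shared = values[counts > 1]
    return [c[~np.isin(c, shared)] for c in clusters]

cleaned = drop_shared_points([[0, 1, 2, 3], [3, 4, 5], [5, 6]])
```

Here indices 3 and 5 each belong to two clusters, so both are dropped from every cluster.
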

estimDAOBound(confidence, confidenceVal=None)

Computes the maximum distance between the centroid and the points.


Returns a TF mask that is True for the points whose delta is close enough to the delta of the centroid.


Returns the TF points whose theta is close to that of the centroid, among the points considered in index_pts_to_classify.

TODO: make the function work for different scales, as in the MATLAB toolbox.


Distance between the centroids.

identify_deltaT(ind_cluster_pts, centroid, threshold=0.8)

Returns the delay maxDelta, in samples, that corresponds to the largest peak of the cluster defined by the provided cluster index.


Re-estimates the cluster centroids.

Considering all the cluster masks, re-estimates the centroids, discarding the clusters for which there was no well-defined delta.


Refines the clusters in order to verify that they are possible. Additionally, if self.nsources is defined, this method keeps only the required number of clusters. Otherwise, the number is decided by choosing the most likely centroids.


DJL: this never happened in the DEMIX MATLAB version; the authors need to be contacted for an explanation.


Uses optimal spatial filters to obtain the separated signals.

This is a beamformer implementation: either MVDR, or a variant that assumes the sources are normal, independent and of equal variance (it is unclear whether this assumption means that the sources cannot be separated).


Maazaoui, M.; Grenier, Y. & Abed-Meraim, K. "Blind Source Separation for Robot Audition using Fixed Beamforming with HRTFs", in Proc. of INTERSPEECH, 2011.

The filter vector for source p at frequency f, with one coefficient per channel, is:

\[b(f,p) = \frac{R_{aa,f}^{-1} a(f,p)}{a^{H}(f,p) R_{aa,f}^{-1} a(f,p)}\]
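
The formula can be checked numerically for a single frequency bin, as in the sketch below. The function name `spatial_filter` is hypothetical, R stands for the Hermitian matrix written R_{aa,f} above (its exact definition is not given on this page), and a linear solve replaces the explicit matrix inverse.

```python
import numpy as np

def spatial_filter(R, a):
    """b = R^{-1} a / (a^H R^{-1} a) for one frequency bin and one source."""
    # Solve R v = a instead of forming the inverse of R explicitly.
    Rinv_a = np.linalg.solve(R, a)
    # np.vdot conjugates its first argument, giving a^H R^{-1} a.
    return Rinv_a / np.vdot(a, Rinv_a)

# Toy 2x2 Hermitian positive-definite matrix built from random data.
rng = np.random.default_rng(1)
X = rng.standard_normal((2, 50)) + 1j * rng.standard_normal((2, 50))
R = X @ X.conj().T / 50
a = np.array([np.cos(0.4), np.sin(0.4) * np.exp(-0.7j)])
b = spatial_filter(R, a)
```

By construction the filter is distortionless in the direction of a, that is, b^H a equals 1.
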

Generates the steering vectors a(p,f,c) for source p, (reduced) frequency f and channel c.

\[a[p,f,0] = \cos(\theta_p)\]
\[a[p,f,1] = \sin(\theta_p) \exp(- 2 j \pi f \delta_p)\]
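
The two expressions above translate directly into NumPy broadcasting, as sketched below. The function name `steering_vectors` and its argument layout are assumptions; frequencies are taken to be reduced (cycles per sample), so that delta is expressed in samples.

```python
import numpy as np

def steering_vectors(theta, delta, freqs):
    """Return a[p, f, c] for sources p, reduced frequencies f, channels c."""
    theta = np.asarray(theta, dtype=float)[:, None]   # shape (P, 1)
    delta = np.asarray(delta, dtype=float)[:, None]   # shape (P, 1)
    freqs = np.asarray(freqs, dtype=float)[None, :]   # shape (1, F)
    a = np.empty((theta.shape[0], freqs.shape[1], 2), dtype=complex)
    a[:, :, 0] = np.cos(theta)                        # first channel
    a[:, :, 1] = np.sin(theta) * np.exp(-2j * np.pi * freqs * delta)
    return a

# Two sources, five reduced frequencies starting at 0.
a = steering_vectors([0.3, 1.0], [2.0, -1.5], np.arange(5) / 8.0)
```

Each vector has unit norm, since cos^2(theta) + sin^2(theta) = 1 whatever the phase term.
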

pyfasst.demixTF.confidenceFromVar(variance, neighbors)

Computes the confidence, in dB, for a given number of neighbours and a variance.

pyfasst.demixTF.get_indices_peak(sequence, ind_max, threshold=0.8)

Returns the indices of the peak around ind_max, with values down to threshold * sequence[ind_max].
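
A plausible reading of this description is sketched below: starting from ind_max, the peak is extended to the left and to the right for as long as the values stay at or above the threshold. The name `peak_indices_sketch` is hypothetical, and both the inclusive comparison and the assumption of a positive peak value are guesses rather than documented behaviour.

```python
import numpy as np

def peak_indices_sketch(sequence, ind_max, threshold=0.8):
    """Contiguous indices around ind_max with values >= threshold * peak."""
    sequence = np.asarray(sequence, dtype=float)
    limit = threshold * sequence[ind_max]
    lo = ind_max
    # Extend to the left while the neighbour stays above the limit.
    while lo > 0 and sequence[lo - 1] >= limit:
        lo -= 1
    hi = ind_max
    # Extend to the right in the same way.
    while hi < len(sequence) - 1 and sequence[hi + 1] >= limit:
        hi += 1
    return np.arange(lo, hi + 1)

idx = peak_indices_sketch([0, 1, 5, 9, 10, 8.5, 3, 9.5], ind_max=4)
```

In this example the limit is 8, so the peak covers indices 3 to 5; the value 9.5 at index 7 is excluded because it is separated from the peak by a value below the limit.
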
