You can find more about the technique and how to use this module in the documentation provided in doc/ (shipped with the Python package).
Adapted from the Matlab toolbox available at: http://bass-db.gforge.inria.fr/fasst/
Jean-Louis Durrieu, EPFL-STI-IEL-LTS5
jean DASH louis AT durrieu DOT ch
2012-2013 http://www.durrieu.ch
This software is distributed under the terms of the GNU General Public License (http://www.gnu.org/licenses/gpl.txt)
FASST: Flexible Audio Source Separation Toolbox
This is the superclass that implements the core functions for the framework for audio source separation as introduced in [Ozerov2012]
A. Ozerov, E. Vincent and F. Bimbot, “A General Flexible Framework for the Handling of Prior Information in Audio Source Separation,” IEEE Transactions on Audio, Speech and Signal Processing, 20(4), pp. 1118-1133, 2012. Available: http://hal.inria.fr/hal-00626962/
In order to use it, one should sub-class this class, and in particular define several elements that are assumed by the core functions for estimation and separation in this class, see below for a list.
GEM iteration: one iteration of the Generalized Expectation-Maximization algorithm, updating the various parameters whose FASST.spec_comp[spec_ind]['frdm_prior'] is set to 'free'.
Returns: loglik (double) – the log-likelihood of the data, given the updated parameters.
Compute the sum of the spectral powers corresponding to the spatial components as provided in the list spat_comp_ind
NB: because this does not take into account the mixing process, the resulting power does not, in general, correspond to the observed signal’s parameterized spectral power.
Matlab FASST Toolbox help:
% V = comp_spat_comp_power(mix_str, spat_comp_ind,
% spec_comp_ind, factor_ind);
%
% compute spatial component power
%
%
% input
% -----
%
% mix_str : mixture structure
% spat_comp_ind : spatial component index
% spec_comp_ind : (opt) spectral component index (def = [], use all components)
% factor_ind : (opt) factor index (def = [], use all factors)
%
%
% output
% ------
%
% V : (F x N) spatial component power
Note: since this is a method of the class, there is no need to pass the structure containing all the parameters; the instance has direct access to them.
Note 2: this may not completely work, because factor_ind should actually also depend on the index of the spectral component. TODO?
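The factor-wise power computation can be sketched as follows in NumPy. This is a simplified sketch, not pyfasst's implementation: the dictionary layout (the 'spat_comp_ind' key and a single FB/TW pair per factor) is an assumption, and the full FASST factor structure contains additional matrices.

```python
import numpy as np

def spat_comp_power(spec_comps, spat_comp_ind):
    """Sum the spectral powers of the spectral components attached
    to spatial component `spat_comp_ind`.  Each component's power is
    the element-wise product, over its factors, of FB @ TW
    (frequency basis times time weights)."""
    V = None
    for comp in spec_comps.values():
        if comp['spat_comp_ind'] != spat_comp_ind:
            continue
        comp_power = None
        for factor in comp['factor'].values():
            p = factor['FB'] @ factor['TW']   # (F x K) @ (K x N)
            comp_power = p if comp_power is None else comp_power * p
        V = comp_power if V is None else V + comp_power
    return V

# one component with a single rank-1 factor
spec_comps = {0: {'spat_comp_ind': 0,
                  'factor': {0: {'FB': np.array([[1.0], [2.0]]),
                                 'TW': np.array([[3.0, 4.0]])}}}}
V = spat_comp_power(spec_comps, 0)   # (F x N) power, here 2 x 2
```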
Computes the signal representation, according to the provided signal representation flag, in FASST.sig_repr_params['transf']
Matlab FASST Toolbox help:
% WG = comp_WG_spat_comps(mix_str);
%
% compute Wiener gains for spatial components
%
%
% input
% -----
%
% mix_str : input mix structure
%
%
% output
% ------
%
% WG : Wiener gains [M x M x F x N x K_spat]
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Flexible Audio Source Separation Toolbox (FASST), Version 1.0
%
% Copyright 2011 Alexey Ozerov, Emmanuel Vincent and Frederic Bimbot
% (alexey.ozerov -at- inria.fr, emmanuel.vincent -at- inria.fr,
% frederic.bimbot -at- irisa.fr)
%
% This software is distributed under the terms of the GNU Public
% License version 3 (http://www.gnu.org/licenses/gpl.txt)
%
% If you use this code please cite this research report
%
% A. Ozerov, E. Vincent and F. Bimbot
% "A General Flexible Framework for the Handling of Prior
% Information in Audio Source Separation,"
% IEEE Transactions on Audio, Speech and Signal Processing 20(4),
% pp. 1118-1133 (2012).
% Available: http://hal.inria.fr/hal-00626962/
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
NB: only for nb channels = 2.
sigma_comps_diag: shape (ncomp, nchan, nfreq, nframes).
NB: only for the stereo case (self.audioObject.channels == 2).
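For a single time-frequency bin, the multichannel Wiener gains can be sketched as follows. This is a generic Wiener filter sketch, not pyfasst's routine; the function name and the (K, M, M) array layout are assumptions.

```python
import numpy as np

def wiener_gains(sigma_comps):
    """Multichannel Wiener gains for one time-frequency bin.
    sigma_comps: (K, M, M) per-component spatial covariances
    (M = 2 channels in the stereo case).  For each component k,
    WG_k = Sigma_k @ inv(sum_k' Sigma_k'), so that the gains of
    all components sum to the identity matrix."""
    sigma_x = sigma_comps.sum(axis=0)          # mixture covariance
    sigma_x_inv = np.linalg.inv(sigma_x)
    return np.einsum('kij,jl->kil', sigma_comps, sigma_x_inv)

# two components with diagonal spatial covariances
sigma_comps = np.array([[[3.0, 0.0], [0.0, 1.0]],
                        [[1.0, 0.0], [0.0, 1.0]]])
WG = wiener_gains(sigma_comps)
```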
Computes the sufficient statistics, used to update the parameters.
Estimates the a posteriori model for the provided audio signal. In particular, this runs the Generalized Expectation-Maximisation (GEM) algorithm FASST.GEM_iteration() self.iter_num times, updating the various parameters of the model so as to maximize the likelihood of the data given these parameters.
From these parameters, the posterior expectation of the “hidden” or latent variables (here the spatial and spectral components) can be computed, leading to the estimation of the separated underlying sources.
Consider using FASST.separate_spat_comps() or FASST.separate_spatial_filter_comp() to obtain the separated time series, once the parameters have been estimated.
Returns: logliks – the log-likelihoods computed after each GEM iteration.
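The outer driver loop has the shape sketched below. The ToyModel is a hypothetical stand-in, not pyfasst code: it only mimics the interface (GEM_iteration() updating parameters and returning the log-likelihood) on a trivial 1-D mean-fitting problem, so that the monotone increase of the log-likelihood across iterations is visible.

```python
import numpy as np

class ToyModel:
    """Stand-in with the same driver shape as FASST:
    GEM_iteration() updates the parameters and returns the
    log-likelihood; estim_param_a_post_model() runs it iter_num
    times.  Here the 'model' just fits the mean of 1-D data."""
    def __init__(self, data, iter_num=10):
        self.data = np.asarray(data, dtype=float)
        self.iter_num = iter_num
        self.mu = 0.0
    def GEM_iteration(self):
        # partial M-step: move mu halfway toward the ML estimate
        self.mu += 0.5 * (self.data.mean() - self.mu)
        resid = self.data - self.mu
        return -0.5 * np.sum(resid ** 2)  # log-lik up to constants
    def estim_param_a_post_model(self):
        return [self.GEM_iteration() for _ in range(self.iter_num)]

model = ToyModel([1.0, 2.0, 3.0])
logliks = model.estim_param_a_post_model()
# each (G)EM iteration can only increase the log-likelihood
assert all(b >= a for a, b in zip(logliks, logliks[1:]))
```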
Uses the cross-spectrum in self.Cx[1] to compute the time-difference-of-arrival (TDOA) detection function: the Generalized Cross-Correlation (GCC), with the phase transform (GCC-PHAT) weighting of the cross-spectrum.
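The GCC-PHAT computation can be sketched as follows. This is a textbook sketch operating on raw two-channel signals rather than on self.Cx; the function name is illustrative only.

```python
import numpy as np

def gcc_phat(x1, x2, nfft=None):
    """TDOA detection function via GCC-PHAT: normalize the
    cross-spectrum to unit magnitude, so that only the phase
    (i.e. the delay) information remains, then transform back
    to the lag domain."""
    nfft = nfft if nfft is not None else len(x1)
    X1, X2 = np.fft.rfft(x1, nfft), np.fft.rfft(x2, nfft)
    cross = X1 * np.conj(x2 if x2.ndim > 1 else X2)
    cross /= np.abs(cross) + 1e-12       # PHAT weighting
    return np.fft.irfft(cross, nfft)     # peaks at the delay (samples)

# second channel leads the first by 5 samples (circular shift)
rng = np.random.default_rng(0)
s = rng.standard_normal(256)
r = gcc_phat(np.roll(s, 5), s)
delay = int(np.argmax(r))
```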
Sets the spatial parameters.
Parameters: initMethod (str) – initialization method; one of 'demix' or 'rand'. If 'demix', the spatial parameters are initialized with the anechoic steering vectors corresponding to the first directions estimated by the DEMIX algorithm [Arberet2010], using the implementation in pyfasst.demixTF.
Computes an NMF on the one-channel mix (averaging the diagonals of self.Cx, which are the power spectra of the corresponding channels).
Then, for each spec_comp in self.spec_comps, we set:
spec_comp['FB'] = W
spec_comp['TW'] = H
Initializes the spectral components with an NMF decomposition, with an individual decomposition of the monophonic signal's TF representation.
TODO make keepFBind and keepTWind, in order to provide finer control on which indices are updated. Also requires a modified NMF decomposition function.
Initialize all the components with the same amplitude and spectral matrices W and H.
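Such an NMF initialization can be sketched with Itakura-Saito multiplicative updates, V ≈ W H, where W would play the role of the frequency basis FB and H the role of the time weights TW. This is a generic sketch, not pyfasst's own NMF routine; the names are illustrative.

```python
import numpy as np

def is_nmf(V, n_comps, n_iter=300, seed=0):
    """Itakura-Saito NMF via multiplicative updates: V ~ W @ H,
    with W (F x K) and H (K x N) kept nonnegative throughout."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, n_comps)) + 1.0
    H = rng.random((n_comps, N)) + 1.0
    eps = 1e-12
    for _ in range(n_iter):
        Vh = W @ H + eps
        W *= ((V / Vh ** 2) @ H.T) / ((1.0 / Vh) @ H.T + eps)
        Vh = W @ H + eps
        H *= (W.T @ (V / Vh ** 2)) / (W.T @ (1.0 / Vh) + eps)
    return W, H

# rank-2 synthetic power "spectrogram"
rng = np.random.default_rng(1)
V = (rng.random((6, 2)) @ rng.random((2, 8))) + 1e-3
W, H = is_nmf(V, n_comps=2)
```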
mvdr_2d(self, theta, distanceInterMic=0.3), with theta in radians and distanceInterMic in meters.
MVDR (Minimum Variance Distortionless Response) spatial filter, for a given angle theta and a given distance between the microphones.
self.Cx is supposed to provide the necessary covariance matrix, for the “Capon” filter.
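A per-frequency-bin sketch of the MVDR ("Capon") filter, w = R^{-1} a / (a^H R^{-1} a), with an anechoic far-field steering vector. The linear-array geometry and all names below are assumptions for illustration, not the mvdr_2d implementation.

```python
import numpy as np

def steering_vector(theta, freq, n_mics=2, d=0.3, c=340.0):
    """Anechoic far-field steering vector for a linear array:
    inter-mic delay d * sin(theta) / c (geometry assumed here)."""
    taus = np.arange(n_mics) * d * np.sin(theta) / c
    return np.exp(-2j * np.pi * freq * taus)

def mvdr_weights(R, a):
    """Capon/MVDR weights for one frequency bin:
    w = R^{-1} a / (a^H R^{-1} a), i.e. minimum output power
    under a unit-gain (distortionless) constraint toward a."""
    Ri_a = np.linalg.solve(R, a)
    return Ri_a / (np.conj(a) @ Ri_a)

a = steering_vector(np.pi / 6, freq=1000.0)
R = np.eye(2, dtype=complex)   # white-noise spatial covariance
w = mvdr_weights(R, a)
# distortionless constraint: unit response in the look direction
assert np.isclose(np.conj(w) @ a, 1.0)
```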
Computes the various quantities necessary for the estimation of the main parameters.
Separate the sources as defined by the spectral components provided in spec_comp_ind.
This function differs from separate_spat_comps in that it does not assume the sources are defined by their spatial positions.
Note: this attempts to merge Ozerov's separate_spec_comps and separate_spat_comps into a single method.
This separates the sources for each spatial component.
Separates the sources using only the estimated spatial filter (i.e. the mixing parameters in self.spat_comps[j]['params']).
In particular, we consider here the corresponding MVDR filter, as exposed in [Maazaoui2011].
Per channel, the filter for source p is the steering-vector-based MVDR beamformer w_p = Sigma_x^{-1} a_p / (a_p^H Sigma_x^{-1} a_p), with a_p the steering vector of source p and Sigma_x the observed covariance matrix.
It also corresponds to the FASST model, assuming that all the spectral powers are equal across sources: computing the Wiener Gain WG to get the images then yields a numerator proportional to a_p a_p^H Sigma_x^{-1}, and the denominator is the trace of that "numerator" (indeed, trace(a_p a_p^H Sigma_x^{-1}) = a_p^H Sigma_x^{-1} a_p).
[Maazaoui2011] Maazaoui, M., Grenier, Y. and Abed-Meraim, K., "Blind Source Separation for Robot Audition using Fixed Beamforming with HRTFs," in Proc. INTERSPEECH, 2011.
A helper function to set a FASST.spec_comp[spec_ind]['factor'][fact_ind][partLabel] to the given value.
TODO 20130522 finish this function to make it general purpose...
Update the mixing parameters, according to the current estimated spectral component parameters.
The input parameters should be computed by compute_suff_stat() and retrieve_subsrc_params(), done automatically in GEM_iteration().
Conveniently adds methods to transform a MultiChanNMFConv object such that the time structure is configured as a hidden Markov model (HMM)
Takes the multichannel NMF instantaneous class, and makes it convolutive!
Simply adds a method makeItConvolutive() in order to transform instantaneous mixing parameters into convolutive ones.
Example:
>>> import pyfasst.audioModel as am
>>> filename = 'data/tamy.wav'
>>> # initialize the model
>>> model = am.MultiChanNMFConv(
...     audio=filename,
...     nbComps=2, nbNMFComps=32, spatial_rank=1,
...     verbose=1, iter_num=50)
>>> # to be more flexible, the user _has to_ make the parameters
>>> # convolutive by hand. This way, she can also start to estimate
>>> # parameters in an instantaneous setting, as an initialization,
>>> # and only after "upgrade" to a convolutive setting:
>>> model.makeItConvolutive()
>>> # estimate the parameters
>>> log_lik = model.estim_param_a_post_model()
>>> # separate the sources using these parameters
>>> model.separate_spat_comps(dir_results='data/')
The following example shows the results for a synthetic example (a synthetic anechoic mixture of the voice and the guitar, with a delay of 0 for the voice and of 10 samples from the left to the right channel for the guitar):
>>> import pyfasst.audioModel as am
>>> filename = 'data/dev1__tamy-que_pena_tanto_faz___thetas-0.79,0.79_delays-10.00,0.00.wav'
>>> # initialize the model
>>> model = am.MultiChanNMFConv(
...     audio=filename,
...     nbComps=2, nbNMFComps=32, spatial_rank=1,
...     verbose=1, iter_num=200)
>>> # to be more flexible, the user _has to_ make the parameters
>>> # convolutive by hand. This way, she can also start to estimate
>>> # parameters in an instantaneous setting, as an initialization,
>>> # and only after "upgrade" to a convolutive setting:
>>> model.makeItConvolutive()
>>> # we can initialize these parameters with the DEMIX algorithm:
>>> model.initializeConvParams(initMethod='demix')
>>> # and estimate the parameters:
>>> log_lik = model.estim_param_a_post_model()
>>> # separate the sources using these parameters
>>> model.separate_spat_comps(dir_results='data/')
This class implements the Multi-channel Non-Negative Matrix Factorisation (NMF)
Example:
>>> import pyfasst.audioModel as am
>>> filename = 'data/tamy.wav'
>>> # initialize the model
>>> model = am.MultiChanNMFInst_FASST(
...     audio=filename,
...     nbComps=2, nbNMFComps=32, spatial_rank=1,
...     verbose=1, iter_num=50)
>>> # estimate the parameters
>>> log_lik = model.estim_param_a_post_model()
>>> # separate the sources using these parameters
>>> model.separate_spat_comps(dir_results='data/')
Sets the spectral component's frequency basis.
Multichannel source/filter model: nbcomps components, of which nbcomps-1 are source/filter (SF) models and 1 is a residual component.
Initialize the spectral components with the instrument labels as well as with the components stored in the provided dictionary in instru2modelfile
NB: requires the gmm-gsmm module to be installed and on the Python path.
Multiple Channel Source Separation, with Lead/Accompaniment initial separation
This instantiation of multiChanSourceF0Filter provides convenient methods (multichanLead.runDecomp() for instance) to separate the lead instrument from the accompaniment, as in [Durrieu2011], and then use the obtained parameters/signals in order to initialize the more general source separation algorithm.
Tentative plan for estimation:
- estimate the Lead/Accompaniment using SIMM
- estimate the spatial parameters for each of the separated signals
- plug the SIMM params and the spatial params into pyFASST, and re-estimate
- write the estimated signals and enjoy success!
NB: as of now, the sole Lead/Accompaniment separation achieves better separation than the combination of all the possibilities, probably because the framework is more flexible for the former than for the latter. Some results have been published at the SiSEC 2013 evaluation campaign.
This method runs the SIMM estimation on the provided audio file.
The lead source is assumed to be self.spec_comps[0]
Separates the audio signal into lead + accompaniment, including more noise components for the lead than self.estimSIMM does.