# audioModel¶

## Description¶

FASST (Flexible Audio Source Separation Toolbox) class
subclass it to obtain your own flavoured source separation model!

You can find more about the technique and how to use this module in the provided documentation in doc/ (using the python package)

Adapted from the Matlab toolbox available at: http://bass-db.gforge.inria.fr/fasst/

## Contact¶

Jean-Louis Durrieu, EPFL-STI-IEL-LTS5

jean DASH louis AT durrieu DOT ch

2012-2013 http://www.durrieu.ch

## Reference¶

class pyfasst.audioModel.FASST(audio, transf='stft', wlen=2048, hopsize=512, iter_num=50, sim_ann_opt='ann', ann_PSD_lim=[None, None], verbose=0, nmfUpdateCoeff=1.0, tffmin=25, tffmax=18000, tfWinFunc=None, tfbpo=48, lambdaCorr=0.0)[source]

FASST: Flexible Audio Source Separation Toolbox

This is the superclass that implements the core functions for the framework for audio source separation as introduced in [Ozerov2012]

A. Ozerov, E. Vincent and F. Bimbot “A General Flexible Framework for the Handling of Prior Information in Audio Source Separation,” IEEE Transactions on Audio, Speech and Signal Processing 20(4), pp. 1118-1133 (2012) Available: http://hal.inria.fr/hal-00626962/

In order to use it, one should sub-class this class, and in particular define several elements that are assumed by the core functions for estimation and separation in this class, see below for a list.

Parameters: audio – the audio filename transf – a string describing the desired Time-Frequency (TF) representation wlen – length of the analysis windows, mostly for STFT representation hopsize (integer) – the size of samples between two analysis frames iter_num (integer) – number of GEM iterations for parameter esitmation sim_ann_opt (str) – type of annealing strategy (‘ann’. ‘no_ann’) ann_PSD_lim (list) – list of 2 elements, ann_PSD_lim[0] is the amount of added noise to the PSD at beginning, and ann_PSD_lim[1] is this amount at the end of the estimation. verbose (integer) – level of verbose: 0 for almost nothing, greater than 1 for debug messages nmfUpdateCoeff (double) – the exponent for the Non-Negative Matrix Factorization-type updates within the GEM iteration tffmin – minimum frequency for the TF representation tffmax – maximal frequency tfWinFunc – window function (please provide a python function here) tfbpo (integer) – number of bins per octave, for Constant-Q-based representations lambdaCorr (double) – penalization term to control the correlation between the sources.

Some important attributes of this class are:

Variables: spat_comps (dict) – a dictionary containing the spatial component parameters and variables. In particular, for a given component numbered spat_ind, spec_comps (dict) – the spectral component parameters dictionary. audioObject (pyfasst.audioObject.AudioObject) – the audio object that is to be processed. See pyfasst.audioObject.AudioObject for details. sig_repr_params – Parameters for the computation of the signal TF representation. The keys in this dictionary are: ‘transf’ - the type of TF representation‘wlen’ - the window length in samples ‘fsize’ - the size of the Fourier transform (in samples) ‘hopsize’ - the hop-size in samples between two consecutive frames ‘tffmin’, ‘tffmax’, ‘tfbpo’, ‘tfWinFunc’, ‘hopfactor’ - variables related to specific TF representations. tft (pyfasst.tftransforms.tft.TFTransform) – The object that implements the TF transform. See pyfasst.tftransforms.tft.TFTransform Cx (numpy.ndarray) – The computed transform, after FASST.omp_transf_Cx() has been called. For memory management, as of 20130820, FASST.Cx, for a given frame and given frequency, is supposed to be Hermitian: only the upper diagonal is therefore kept.

For examples, see also:

GEM_iteration()[source]

GEM iteration: one iteration of the Generalized Expectation- Maximization algorithm to update the various parameters whose FASST.spec_comp[spec_ind]['frdm_prior'] is set to 'free'.

Returns: loglik (double): the log-likelihood of the data, given the updated parameters
comp_spat_cmps_powers(spat_comp_ind, spec_comp_ind=[], factor_ind=[])[source]

Compute the sum of the spectral powers corresponding to the spatial components as provided in the list spat_comp_ind

NB: because this does not take into account the mixing process, the resulting power does not, in general, correspond to the the observed signal’s parameterized spectral power.

comp_spat_comp_power(spat_comp_ind, spec_comp_ind=[], factor_ind=[])[source]

Matlab FASST Toolbox help:

% V = comp_spat_comp_power(mix_str, spat_comp_ind,
%                          spec_comp_ind, factor_ind);
%
% compute spatial component power
%
%
% input
% -----
%
% mix_str           : mixture structure
% spat_comp_ind     : spatial component index
% spec_comp_ind     : (opt) factor index (def = [], use all components)
% factor_ind         : (opt) factor index (def = [], use all factors)
%
%
% output
% ------
%
% V                 : (F x N) spatial component power
Parameters: spat_comp_ind (integer) – index of the spatial component spec_comp_ind (list) – list of indices for the spectral components whose spatial component corresponds to the provided spat_comp_ind factor_ind (list) – list of indices of factors to be included.

Note: thanks to object-oriented programming, no need to provide the structure containing all the parameters, the instance has direct access to them.

Note2: this may not completely work because the factor_ind should actually also depend on the index of the spectral component. TODO?

comp_transf_Cx()[source]

Computes the signal representation, according to the provided signal representation flag, in FASST.sig_repr_params['transf']

compute_Wiener_gain_2d(sigma_comp_diag, sigma_comp_off, inv_sigma_mix_diag, inv_sigma_mix_off, timeInvariant=False)[source]

Matlab FASST Toolbox help:

% WG = comp_WG_spat_comps(mix_str);
%
% compute Wiener gains for spatial components
%
%
% input
% -----
%
% mix_str           : input mix structure
%
%
% output
% ------
%
% WG                : Wiener gains [M x M x F x N x K_spat]
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Flexible Audio Source Separation Toolbox (FASST), Version 1.0
%
% Copyright 2011 Alexey Ozerov, Emmanuel Vincent and Frederic Bimbot
% (alexey.ozerov -at- inria.fr, emmanuel.vincent -at- inria.fr,
%  frederic.bimbot -at- irisa.fr)
%
% This software is distributed under the terms of the GNU Public
% License version 3 (http://www.gnu.org/licenses/gpl.txt)
%
% If you use this code please cite this research report
%
% A. Ozerov, E. Vincent and F. Bimbot
% "A General Flexible Framework for the Handling of Prior
% Information in Audio Source Separation,"
% IEEE Transactions on Audio, Speech and Signal Processing 20(4),
% pp. 1118-1133 (2012).
% Available: http://hal.inria.fr/hal-00626962/
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
compute_inv_sigma_mix_2d(sigma_comps_diag, sigma_comps_off)[source]

only for nb channels = 2

sigma_comps_diag ncomp x nchan x nfreq x nframes

compute_sigma_comp_2d(spat_ind, spec_comp_ind)[source]

only for stereo case self.audioObject.channels==2

compute_suff_stat(spat_comp_powers, mix_matrix)[source]

Computes the sufficient statistics, used to update the parameters.

Inputs:

Parameters: spat_comp_powers (numpy.ndarray) – (total_spat_rankxnbFreqsSigReprxnbFramesSigRepr) ndarray. the estimated power spectra for the spatial components, as computed by FASST.retrieve_subsrc_params(). mix_matrix (numpy.ndarray) – (total_spat_rank x nchannels x nbFreqsSigRepr) ndarray. the mixing parameters, as a rank x n_channels x n_freqs ndarray. Computed from FASST.retrieve_subsrc_params()

Outputs:

Returns: hat_Rxx hat_Rxs hat_Rss hat_Ws loglik
estim_param_a_post_model()[source]

Estimates the a posteriori model for the provided audio signal. In particular, this runs self.iter_num times the Generalized Expectation-Maximisation algorithm FASST.GEM_iteration(), to update the various parameters of the model, so as to maximize the likelihood of the data given these parameters.

From these parameters, the posterior expectation of the “hidden” or latent variables (here the spatial and spectral components) can be computed, leading to the estimation of the separated underlying sources.

Consider using FASST.separate_spat_comps() or FASST.separate_spatial_filter_comp() to obtain the separated time series, once the parameters have been estimated.

Returns: logliks: The log-likelihoods as computed after each GEM iteration.
gcc_phat_tdoa_2d()[source]

Using the cross-spectrum in self.Cx[1] to estimate the time difference of arrival detection function (the Generalized Cross-Correlation GCC), with the phase transform (GCC-PHAT) weighing function for the cross-spectrum.

initializeConvParams(initMethod='demix')[source]

setting the spatial parameters

Parameters: initMethod (str) – initialization method. Can be either of: ‘demix’, ‘rand’. If ‘demix’, then the spatial parameters are initialized by the anechoic steering vector corresponding to the first directions estimated by the DEMIX algorithm [Arberet2010], using the algorithm implemented in pyfasst.demixTF.
initialize_all_spec_comps_with_NMF(sameInitAll=False, **kwargs)[source]

Computes an NMF on the one-channel mix (averaging diagonal of self.Cx, which are the power spectra of the corresponding channel)

$C_x \approx W H$

then, for all spec_comp in self.spec_comps, we set:

spec_comp['FB'] = W
spec_comp['TW'] = H

initialize_all_spec_comps_with_NMF_indiv(niter=10, updateFreqBasis=True, updateTimeWeight=True, **kwargs)[source]

initialize the spectral components with an NMF decomposition, with individual decomposition of the monophonic signal TF representation.

TODO make keepFBind and keepTWind, in order to provide finer control on which indices are updated. Also requires a modified NMF decomposition function.

initialize_all_spec_comps_with_NMF_same(niter=10, **kwargs)[source]

Initialize all the components with the same amplitude and spectral matrices W and H.

mvdr_2d(theta, distanceInterMic=0.3)[source]

mvdr_2d(self, theta, # in radians distanceInterMic=.3, # in meters )

MVDR minimum variance distortion-less response spatial filter, for a given angle theta and given distance between the mics.

self.Cx is supposed to provide the necessary covariance matrix, for the “Capon” filter.

renormalize_parameters()[source]

Re-normalize the components

retrieve_subsrc_params()[source]

Computes the various quantities necessary for the estimation of the main parameters:

Outputs

Returns: spat_comp_powers - (total_spat_rank x nbFreqsSigRepr x nbFramesSigRepr) ndarray the spatial component power spectra. Note that total_spat_rank is the sum of all the spatial ranks for all the sources. mix_matrix - (total_spat_rank x nchannels x nbFreqsSigRepr) ndarray the mixing matrices for each source rank_part_ind - dictionary: each key is one source, and the values are the indices in spat_comp_powers and mix_matrix that correspond to that source. If the spatial rank of source j is 2, then its spectra will appear twice in spat_comp_powers, with mixing parameters (potentially different one from the other) appearing in two sub-matrices of mix_matrix.
separate_comps(dir_results=None, spec_comp_ind=None, suffix=None)[source]

Separate the sources as defined by the spectral components provided in spec_comp_ind.

This function differs from separate_spat_comps in the way that it does not assume the sources are defined by their spatial positions.

Parameters: dir_results – provide the (existing) folder where to write the results spec_comp_ind (dict) – a dictionary telling which spectral component to include in which separated source. If None, the default is to assume each spectral component is one source. Note that this is different to the behaviour of separate_spat_comps() which assumes that each spatial component corresponds to one source. suffix (dict) – a dictionary containing the labels for each source. If None, then no suffix is appended to the file names and the files are simply numbered XXX_nbComps.

Note: Trying to bring into one method ozerov’s separate_spec_comps and separate_spat_comps

separate_spat_comps(dir_results=None, suffix=None)[source]

This separates the sources for each spatial component.

Parameters: dir_results – provide the (existing) folder where to write the results suffix (dict) – a dictionary containing the labels for each source. If None, then no suffix is appended to the file names and the files are simply numbered XXX_nbComps.
separate_spatial_filter_comp(dir_results=None, suffix=None)[source]

Separates the sources using only the estimated spatial filter (i.e. the mixing parameters in self.spat_comps[j][‘params’])

In particular, we consider here the corresponding MVDR filter, as exposed in [Maazaoui2011].

per channel, the filter steering vector, source p:

$b(f,p) = \frac{R_{aa,f}^{-1} a(f,p)}{a^{H}(f,p) R_{aa,f}^{-1} a(f,p)}$

with

$R_{aa,f} = \sum_q a(f,q) a^{H}(f,q)$

It corresponds also to the given model in FASST, assuming that all the spectral powers are equal across all sources. Here, by computing the Wiener Gain WG to get the images, we actually have

$b(f,p) a(f,p)^H$

and the denominator therefore is the trace of the “numerator”.

 [Maazaoui2011] Maazaoui, M.; Grenier, Y. and Abed-Meraim, K. Blind Source Separation for Robot Audition using Fixed Beamforming with HRTFs, in proc. of INTERSPEECH, 2011.
setComponentParameter(newValue, spec_ind, fact_ind=0, partLabel='FB', prior='free', keepDimensions=True)[source]

A helper function to set a FASST.spec_comp[spec_ind]['factor'][fact_ind][partLabel] to the given value.

TODO 20130522 finish this function to make it general purpose...

update_mix_matrix(hat_Rxs, hat_Rss, mix_matrix, rank_part_ind)[source]

Update the mixing parameters, according to the current estimated spectral component parameters.

Parameters: hat_Rxs – (nbFreqsSigRepr x 2 x nbspatcomp) ndarray. Estimated intercorrelation between the observation and the sources, for each frequency bin of the TF representation. hat_Rss – (nbFreqsSigRepr x nbspatcomp x nbspatcomp) ndarray. Estimated auto-correlation matrix for each frequency bin, mix_matrix – (total_spat_rank x nchannels x nbFreqsSigRepr) ndarray. the mixing parameters, as a rank x n_channels x n_freqs ndarray. rank_part_ind – a dictionary giving, for each spectral component, which spatial component should be used.

The input parameters should be computed by compute_suff_stat() and retrieve_subsrc_params(), done automatically in GEM_iteration().

update_spectral_components(hat_W)[source]

Update the spectral components, with hat_W as the expected value of power (and computed from )

class pyfasst.audioModel.MultiChanHMM(audio, nbComps=3, nbNMFComps=4, spatial_rank=2, **kwargs)[source]

Conveniently adds methods to transform a MultiChanNMFConv object such that the time structure is configured as a hidden Markov model (HMM)

makeItHMM()[source]

Turns the required parameters into HMM time constraints

makeItSHMM()[source]

Turns the required parameters into SHMM time constraints

class pyfasst.audioModel.MultiChanNMFConv(audio, nbComps=3, nbNMFComps=4, spatial_rank=2, **kwargs)[source]

Takes the multichannel NMF instantaneous class, and makes it convolutive!

Simply adds a method makeItConvolutive() in order to transform instantaneous mixing parameters into convolutive ones.

Example:

>>> import pyfasst.audioModel as am
>>> filename = 'data/tamy.wav'
>>> # initialize the model
>>> model = am.MultiChanNMFConv(
audio=filename,
nbComps=2, nbNMFComps=32, spatial_rank=1,
verbose=1, iter_num=50)
>>> # to be more flexible, the user _has to_ make the parameters
>>> # convolutive by hand. This way, she can also start to estimate
>>> # parameters in an instantaneous setting, as an initialization,
>>> # and only after "upgrade" to a convolutive setting:
>>> model.makeItConvolutive()
>>> # estimate the parameters
>>> log_lik = model.estim_param_a_post_model()
>>> # separate the sources using these parameters
>>> model.separate_spat_comps(dir_results='data/')


The following example shows the results for a more synthetic example (synthetis anechoic mixture of the voice and the guitar, with a delay of 0 for the voice and 10 samples from the left to the right channel for the guitar):

>>> import pyfasst.audioModel as am
>>> filename = 'data/dev1__tamy-que_pena_tanto_faz___thetas-0.79,0.79_delays-10.00,0.00.wav'
>>> # initialize the model
>>> model = am.MultiChanNMFConv(
audio=filename,
nbComps=2, nbNMFComps=32, spatial_rank=1,
verbose=1, iter_num=200)
>>> # to be more flexible, the user _has to_ make the parameters
>>> # convolutive by hand. This way, she can also start to estimate
>>> # parameters in an instantaneous setting, as an initialization,
>>> # and only after "upgrade" to a convolutive setting:
>>> model.makeItConvolutive()
>>> # we can initialize these parameters with the DEMIX algorithm:
>>> model.initializeConvParams(initMethod='demix')
>>> # and estimate the parameters:
>>> log_lik = model.estim_param_a_post_model()
>>> # separate the sources using these parameters
>>> model.separate_spat_comps(dir_results='data/')

makeItConvolutive()[source]

If the spatial parameters are instantaneous, then it will be turned into a convolutive version of it. In this case, it duplicates the instantaneous parameter on all the frequencies and spatial rank.

class pyfasst.audioModel.MultiChanNMFInst_FASST(audio, nbComps=3, nbNMFComps=4, spatial_rank=2, **kwargs)[source]

This class implements the Multi-channel Non-Negative Matrix Factorisation (NMF)

Inputs:

Parameters: audio – as in FASST, audio is the filename of the file to be processed or directly a pyfasst.audioObject.AudioObject nbComps (integer) – the number of desired sources/components nbNMFComps (integer) – the number of NMF components for each of the components. TODO: allow to pass a list so that the user can control the number of elements source by source, individually spatial_rank (integer or list) – the spatial rank of all the components. If it’s a nbComps-long list, then spatial_rank[n] will be the spatial rank for the n-th source.

Example:

>>> import pyfasst.audioModel as am
>>> filename = 'data/tamy.wav'
>>> # initialize the model
>>> model = am.MultiChanNMFInst_FASST(
audio=filename,
nbComps=2, nbNMFComps=32, spatial_rank=1,
verbose=1, iter_num=50)
>>> # estimate the parameters
>>> log_lik = model.estim_param_a_post_model()
>>> # separate the sources using these parameters
>>> model.separate_spat_comps(dir_results='data/')

setSpecCompFB(compNb, FB, FB_frdm_prior='fixed')[source]

sets the spectral component’s frequency basis.

Parameters: compNb (integer) – the component to be initialized FB (numpy.ndarray) – the initial array to put in spec_comp[compNb]['factor'][0]['FB'] FB_frdm_prior (str) – either ‘fixed’ or ‘free’.
class pyfasst.audioModel.multiChanSourceF0Filter(audio, nbComps=3, nbNMFResComps=1, nbFilterComps=20, nbFilterWeigs=[4], minF0=39, maxF0=2000, minF0search=80, maxF0search=800, stepnoteF0=16, chirpPerF0=1, spatial_rank=1, sparsity=None, **kwargs)[source]

multi channel source/filter model nbcomps components, nbcomps-1 SF models, 1 residual component

estim_param_a_post_model()[source]

Estimation of model parameters, using the sparsity constraints.

initSpecCompsWithLabelAndFiles(instrus=[], instru2modelfile={}, freqBasisAdaptive='fixed')[source]

Initialize the spectral components with the instrument labels as well as with the components stored in the provided dictionary in instru2modelfile

instrus is a list with labels:
‘SourceFilter’:
keep the intialized source filter model
‘Free_<nb_comp>’:
initialize the model with an adaptable spectral component using nb_comp elements in the NMF frequency basis
<key_in_instru2modelfile>:
initialize with the :py:class:GSMM available and stored in the archive npz with filename instru2modelfile[key_in_instru2modelfile]

NB: needs the gmm-gsmm module to be installed and in the pythonpath

initializeFreeMats(niter=10)[source]

initialize free matrices, with NMF decomposition

makeItConvolutive()[source]

Takes the spatial parameters and sets them to a convolutive mixture, in case the parameter has not yet been changed to ‘conv’ mode.

reweigh_sparsity_constraint(sigma)[source]
setSpecCompFB(compNb, FB, FB_frdm_prior='fixed')[source]

SetSpecCompFB

sets the spectral component’s frequency basis.

class pyfasst.audioModel.multichanLead(*args, **kwargs)[source]

Multiple Channel Source Separation, with Lead/Accompaniment initial separation

This instantiation of multiChanSourceF0Filter provides convenient methods (multichanLead.runDecomp() for instance) to separate the lead instrument from the accompaniment, as in [Durrieu2011], and then use the obtained parameters/signals in order to initialize the more general source separation algorithm.

Tentative plan for estimation:

1. estimate the Lead/Accompaniment using SIMM
2. estimate the spatial parameters for each of the separated signals
3. plug the SIMM params and the spatial params into pyFASST, and
4. re-estimate
5. write the estimated signals and enjoy success!

NB: as for now, the sole Lead/Accompaniment separation achieves better separation than the combination of all the possibilities, probably because of a more flexible framework for the former than for the latter. Some results have been published at the SiSEC 2013 evaluation campaign.

demixOnGivenFile(filename, nsources=1)[source]

running the DEMIX algorithm from :demix.DEMIX:

demixOnSepSIMM(unvoiced=True)[source]

run DEMIX on the separated signals resulting from SIMM model

estimSIMM(maxFrames=4000, dir_results='tmp/', simmIterNum=30)[source]

This method runs the SIMM estimation on the provided audio file.

The lead source is assumed to be self.spec_comps[0]

estimSUIMM(maxFrames=4000, **kwargs)[source]

separates the audio signal into lead+accompaniment, including more noisy components for the lead than self.estimSIMM

initConvDemixOnSepSrc(suffix)[source]

initialize the convolutive parameters with DEMIX, running on each of the separated sources

runDecomp(instrus=[], instru2modelfile={}, dir_results='tmp/', maxFrames=4000, niter_nmf=20, niter_simm=30)[source]

Running the scheme that should make me famous.

Reference

audioObject