SeparateLeadStereo, with Time-Frequency choice
Provides a class (SeparateLeadProcess) within which several processing steps can be run on an audio file, in order to extract the lead instrument/main voice from a (stereophonic) audio mixture.
copyright (C) 2011 - 2013 Jean-Louis Durrieu
SeparateLeadProcess
Class which implements the source separation algorithm, separating the ‘lead’ voice from the ‘accompaniment’. It can handle the task automatically (the ‘lead’ voice is taken to be the most energetic one), or it can be told explicitly what the ‘lead’ is (through the melody line).
N : the number of analysis input frames
Dictionary containing the filenames of the output files for the separated signals, with the following keys (available after initialization):
‘inputAudioFilename’ : input filename
‘mus_output_file’ : output filename for the estimated ‘accompaniment’, obtained by appending ‘_acc.wav’ to the radical
‘outputDirSuffix’ : the subfolder name appended to the path of the directory of the input file; the output files are written in that subfolder
‘outputDir’ : the full path of the output files directory
‘pathBaseName’ : base name for the output files (full path + radical for all output files)
‘pitch_output_file’ : output filename for the estimated melody line, obtained by appending ‘_pitches.txt’ to the radical
‘voc_output_file’ : output filename for the estimated ‘lead instrument’, obtained by appending ‘_voc.wav’ to the radical
Additionally, when the unvoiced parts are also estimated, the corresponding estimated ‘accompaniment’ and ‘lead’ signals are written to the above filenames with ‘_VUIMM.wav’ appended to the radical.
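A minimal sketch, for illustration, of how the output names above can be derived from the input filename; the helper name and the exact directory handling are assumptions, not the class’s actual code:

    import os

    def deriveOutputNames(inputAudioFilename, outputDirSuffix='_output'):
        # radical = input file name without its directory and extension
        radical = os.path.splitext(os.path.basename(inputAudioFilename))[0]
        outputDir = os.path.join(os.path.dirname(inputAudioFilename),
                                 outputDirSuffix)
        pathBaseName = os.path.join(outputDir, radical)
        return {
            'inputAudioFilename': inputAudioFilename,
            'outputDirSuffix': outputDirSuffix,
            'outputDir': outputDir,
            'pathBaseName': pathBaseName,
            'mus_output_file': pathBaseName + '_acc.wav',
            'voc_output_file': pathBaseName + '_voc.wav',
            'pitch_output_file': pathBaseName + '_pitches.txt',
        }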
stftParams : dictionary with the parameters for the time-frequency representation (Short-Time Fourier Transform - STFT), with the keys:
‘hopsize’ : the step, in number of samples, between analysis frames for the STFT
‘NFT’ : the number of Fourier bins on which the Fourier transforms are computed.
‘windowSizeInSamples’ : analysis frame length, in samples
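For example, this dictionary could be filled as follows; the values are illustrative choices, not the defaults of the class (a 2048-sample window is roughly 46 ms at a 44.1 kHz sampling rate):

    stftParams = {
        'windowSizeInSamples': 2048,  # analysis frame length, in samples
        'hopsize': 256,               # step between analysis frames, in samples
        'NFT': 2048,                  # number of Fourier bins
    }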
SIMMParams : dictionary with the parameters of the SIMM model (Smoothed Instantaneous Mixture Model [DRDF2010]), with the following keys:
- ‘alphaL’, ‘alphaR’ : double
  stereo model, panoramic parameters for the lead part
- ‘betaL’, ‘betaR’ : (R,) ndarray
  stereo model, panoramic parameters for each component of the accompaniment part
- ‘chirpPerF0’ : integer
  number of F0s between two ‘stable’ F0s, modelled as chirps
- ‘F0Table’ : (NF0,) ndarray
  frequency, in Hz, of each of the F0s appearing in WF0
- ‘HF0’ : (NF0*chirpPerF0, N) ndarray, estimated
  amplitude array corresponding to the different F0s (this is what you want for a visual representation of the pitch saliences)
- ‘HF00’ : (NF0*chirpPerF0, N) ndarray, estimated
  the amplitude array HF0, zeroed everywhere outside the scope given by the estimated melody
- ‘HGAMMA’ : (P, K) ndarray, estimated
  amplitude array corresponding to the different smooth shapes: the decomposition of the filters on the smooth shapes in WGAMMA
- ‘HM’ : (R, N) ndarray, estimated
  amplitude array corresponding to the decomposition of the accompaniment on the spectral shapes in WM
- ‘HPHI’ : (K, N) ndarray, estimated
  amplitude array corresponding to the decomposition of the filter part on the filter spectral shapes in WPHI, defined as np.dot(WGAMMA, HGAMMA) (see the sketch after this list)
- ‘K’ : integer
  number of filters for the filter part decomposition
- ‘maxF0’ : double
  the highest F0 candidate
- ‘minF0’ : double
  the lowest F0 candidate
- ‘NF0’ : integer
  total number of F0s
- ‘niter’ : integer
  number of iterations for the estimation algorithm
- ‘P’ : integer
  number of smooth spectral shapes for the filter part (in WGAMMA)
- ‘R’ : integer
  number of spectral shapes for the accompaniment part (in WM)
- ‘stepNotes’ : integer
  number of F0s between two semitones
- ‘WF0’ : (F, NF0*chirpPerF0) ndarray, fixed
  ‘dictionary’ of harmonic spectral shapes for the F0 candidates, generated with the KLGLOTT88 model [DRDF2010]
- ‘WGAMMA’ : (F, P) ndarray, fixed
  ‘dictionary’ of smooth spectral shapes for the filter part
- ‘WM’ : (F, R) ndarray, estimated
  array of spectral shapes estimated directly on the signal
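To make the roles of these arrays concrete, here is a minimal sketch of how they combine into the SIMM model power spectra. Only the relation WPHI = np.dot(WGAMMA, HGAMMA) is stated above; the rest follows the source/filter model of [DRDF2010], and the function name is illustrative:

    import numpy as np

    def simmPowerSpectra(WF0, HF0, WGAMMA, HGAMMA, HPHI, WM, HM):
        """Returns the model power spectrograms (lead, accompaniment)."""
        WPHI = np.dot(WGAMMA, HGAMMA)  # filter dictionary, (F, K)
        SPHI = np.dot(WPHI, HPHI)      # filter part, (F, N)
        SF0 = np.dot(WF0, HF0)         # source part, (F, N)
        SV = SF0 * SPHI                # lead: source/filter product, (F, N)
        SM = np.dot(WM, HM)            # accompaniment part, (F, N)
        return SV, SM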
Methods
- Constructor : reads the input audio file, computes the STFT, and generates the different dictionaries (for the source part, the harmonic patterns WF0, and for the filter part, the smooth patterns WGAMMA)
- automaticMelodyAndSeparation : launches the sequence of methods that estimate the parameters, estimate the melody, re-estimate the parameters and, at last, separate the lead from the rest, considering the lead to be the most energetic source of the mixture (with some continuity regularity); a usage sketch is given after this list
- estimSIMMParams : estimates the parameters of the SIMM, i.e. HF0, HPHI, HGAMMA, HM and WM
- estimStereoSIMMParams : estimates the parameters of the stereo version of the SIMM, i.e. the same parameters as estimSIMMParams, plus the alphas and betas
- estimStereoSUIMMParams : same as above, but first adds ‘noise’ components to the source part
- initiateHF0WithIndexBestPath : computes the initial HF0, before the estimation, given the melody line (estimated or provided)
- runViterbi : estimates the melody line from HF0, the energies of the F0 candidates
- setOutputFileNames : triggered when the text fields are changed, updating the output filenames
- writeSeparatedSignals : computes and writes the adaptive Wiener filtered separated files
- writeSeparatedSignalsWithUnvoice : computes and writes the adaptive Wiener filtered separated files, including the unvoiced parts
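A hypothetical usage sketch for the automatic mode described above; the module name and the constructor arguments are assumptions, to be checked against the actual package:

    from separateLeadStereoTF import SeparateLeadProcess  # module name assumed

    # Constructor: reads the audio file, computes the STFT, builds WF0/WGAMMA
    separator = SeparateLeadProcess('mixture.wav')
    # Fully automatic: estimate the melody, re-estimate the parameters,
    # then write the separated 'lead' and 'accompaniment' files:
    separator.automaticMelodyAndSeparation()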
References
This class encapsulates our work on source separation, published as:
[DDR2011] J.-L. Durrieu, B. David and G. Richard, “A Musically Motivated Mid-Level Representation for Pitch Estimation and Musical Audio Source Separation”, IEEE Journal of Selected Topics in Signal Processing, October 2011, Vol. 5 (6), pp. 1180-1191.
and
[DRDF2010] J.-L. Durrieu, G. Richard, B. David and C. Févotte, “Source/Filter Model for Main Melody Extraction From Polyphonic Audio Signals”, IEEE Transactions on Audio, Speech and Language Processing, special issue on Signal Models and Representations of Musical and Environmental Sounds, March 2010, Vol. 18 (3), pp. 564-575.
As of 3/1/2012, available at http://www.durrieu.ch/research
Fully automated estimation of melody and separation of signals.
Computes the number of chunks of size maxFrames, and adjusts maxFrames in case it would not provide long enough chunks (especially for the last chunk).
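A minimal sketch of that chunking logic; the exact adjustment rule in the class may differ:

    import math

    def computeChunks(nFrames, maxFrames):
        nChunks = int(math.ceil(nFrames / float(maxFrames)))
        # rebalance the chunk size so that the last chunk is not too short:
        maxFrames = int(math.ceil(nFrames / float(nChunks)))
        return nChunks, maxFrames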
Computes and returns SX, the power spectrum of the signal: the mono channel, or the mean over the channels.
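A sketch of that computation, assuming X holds the per-channel transforms with shape (F, N, nChannels):

    import numpy as np

    def computeSX(X):
        """Mean over the channels of the power spectrum of X."""
        return np.mean(np.abs(X) ** 2, axis=-1)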
Computes the transform on each of the channels.
TODO: this function should be modified so that we only use the pyfasst.tftransforms.tft.TFTransform framework. This could prove complicated, though (especially for multiple-chunk processing). Current state (20130820): a hack mainly focused on the STFT as TF representation.
Computes the frequency basis for the source part of the SIMM. If tfrepresentation is a CQT, it also computes the cqt/hybridcqt transform object.
Determines the tuning by checking the peaks corresponding to all possible patterns.
Estimates and stores only HF0 for the whole excerpt, with only …
Estimates the parameters chunk by chunk and writes the corresponding separated signals sequentially. At the end, concatenates all these separated signals into the desired output files.
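A sketch of the final concatenation step, assuming each chunk was written as a WAV file; the helper and the file handling are illustrative:

    import numpy as np
    import scipy.io.wavfile as wav

    def concatenateChunks(chunkFilenames, outputFilename):
        fs, pieces = None, []
        for name in chunkFilenames:
            fs, data = wav.read(name)   # assumes all chunks share one rate
            pieces.append(data)
        wav.write(outputFilename, fs, np.concatenate(pieces, axis=0))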
Same as estimStereoSIMMParamsWriteSeps, but first adds the unvoiced element to HF0.
If a WAV file has already been loaded, this allows redefining, at this point, where the output files should be written.
It could be used, for instance, between the first estimation (or the Viterbi-smoothed estimation of the melody) and the re-estimation of the parameters.