Jean-Louis DURRIEU
PhD Candidate at the Ecole Nationale Supérieure des Télécommunications (ENST)





Below are some sound examples produced by the source separation algorithm we have designed, which first detects the melody and then applies a spectral Wiener filter to the original signal. The songs come from the ISMIR 2004 Audio Melody Extraction Contest database (see the dedicated ISMIR 2004 website for more details and to download the test set and the reference files). We suggest listening with headphones in order to hear the slight differences and artifacts in the resulting sounds.
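As a rough illustration of the filtering step, here is a minimal Python sketch of time-frequency Wiener masking. It assumes the voice and accompaniment power spectra (P_voice and P_music, hypothetical names) have already been estimated for every time-frequency bin, e.g. from the detected melody; the scipy-based STFT and the function below are illustrative only, not the exact implementation used for these examples.

    # Minimal sketch of the spectral Wiener filtering step, assuming the
    # voice and accompaniment power spectra (P_voice, P_music) are already
    # estimated for every time-frequency bin (e.g. from the detected melody).
    import numpy as np
    from scipy.signal import stft, istft

    def wiener_separate(x, P_voice, P_music, fs=44100, nperseg=2048, hop=256):
        """Return (voice, music) estimates obtained by Wiener masking."""
        _, _, X = stft(x, fs=fs, nperseg=nperseg, noverlap=nperseg - hop)
        # Wiener gain: ratio of the voice power to the total power in each bin.
        mask = P_voice / (P_voice + P_music + 1e-12)
        _, voice = istft(mask * X, fs=fs, nperseg=nperseg, noverlap=nperseg - hop)
        _, music = istft((1.0 - mask) * X, fs=fs, nperseg=nperseg, noverlap=nperseg - hop)
        return voice, music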

The parameters used for the analysis of the ISMIR 2004 database songs are:
  • sampling rate: 44100 Hz
  • length of the analysis windows: 46.44 ms (2048 samples)
  • hop size: 5.8 ms (256 samples)
  • frequency range for melody detection: Fmin = 100 Hz, Fmax = 800 Hz (except where stated otherwise)
  • frequency quantization for the melody: eighths of a tone, i.e. 48 frequencies per octave (see the sketch after this list)
  • length of the songs: between 14 s and 25 s
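As a small illustration of the quantization above, the candidate melody frequencies can be generated as a geometric grid with 48 steps per octave between Fmin and Fmax (the function name below is hypothetical):

    # Candidate melody-frequency grid: 48 frequencies per octave (eighth-tone
    # steps) between Fmin and Fmax, as listed above.
    import numpy as np

    def melody_frequency_grid(fmin=100.0, fmax=800.0, steps_per_octave=48):
        n = int(np.floor(steps_per_octave * np.log2(fmax / fmin))) + 1
        return fmin * 2.0 ** (np.arange(n) / steps_per_octave)

    grid = melody_frequency_grid()    # 145 candidates over the 3 octaves from 100 Hz to 800 Hz
    print(grid[0], grid[-1])          # 100.0 800.0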
The columns in the following table are:
  • Title: title of the song in the original test set
  • Original: the original song
  • Sep. Singer: estimated/separated singer signal
  • Sep. Music: estimated/separated music signal
  • Remix: left channel = estimated singer, right channel = estimated music
  • PitchMatch: percentage of "correct" pitch estimates over the song's pitched frames (see the ISMIR 2004 website for a description)
  • TotalMatch: percentage of "correct" pitch estimates over the whole song, also taking the silences in the singer reference track into account (a sketch of how these scores can be computed is given after this list)
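For reference, the two scores can be computed along the following lines from per-frame reference and estimated pitch tracks (0 Hz meaning "no melody"). The quarter-tone tolerance is an assumption based on the usual melody-extraction convention, not the exact ISMIR 2004 scoring code.

    # Sketch of PitchMatch / TotalMatch from per-frame pitch tracks.
    import numpy as np

    def match_scores(f0_ref, f0_est, tol_octaves=1.0 / 24.0):  # quarter-tone tolerance (assumed)
        f0_ref = np.asarray(f0_ref, dtype=float)
        f0_est = np.asarray(f0_est, dtype=float)
        voiced = f0_ref > 0
        both = voiced & (f0_est > 0)
        correct = np.zeros(f0_ref.shape, dtype=bool)
        correct[both] = np.abs(np.log2(f0_est[both] / f0_ref[both])) <= tol_octaves
        pitch_match = 100.0 * correct[voiced].mean()          # pitched frames only
        # TotalMatch also credits reference silences where no melody is estimated.
        correct_silence = (~voiced) & (f0_est <= 0)
        total_match = 100.0 * (correct | correct_silence).mean()
        return pitch_match, total_match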

Title                | Original | Sep. Singer | Sep. Music | Remix | PitchMatch (%) | TotalMatch (%)
opera_fem2           | mp3      | mp3         | mp3        | mp3   | 70.2           | 63.7
opera_fem4           | mp3      | mp3         | mp3        | mp3   | 78.8           | 80.2
opera_male3          | mp3      | mp3         | mp3        | mp3   | 65.5           | 66.8
opera_male5          | mp3      | mp3         | mp3        | mp3   | 82.8           | 77.7
daisy1               | mp3      | mp3         | mp3        | mp3   | 85.5           | 72.1
daisy2               | mp3      | mp3         | mp3        | mp3   | 85.3           | 74.6
daisy3               | mp3      | mp3         | mp3        | mp3   | 84.7           | 84.7
daisy4               | mp3      | mp3         | mp3        | mp3   | 90.5           | 90.5
pop1                 | mp3      | mp3         | mp3        | mp3   | 74.2           | 62.7
pop2                 | mp3      | mp3         | mp3        | mp3   | 79.8           | 63.8
pop3                 | mp3      | mp3         | mp3        | mp3   | 76.8           | 62.1
pop4                 | mp3      | mp3         | mp3        | mp3   | 79.4           | 64.8
jazz1                | mp3      | mp3         | mp3        | mp3   | 73.4           | 71.1
jazz2                | mp3      | mp3         | mp3        | mp3   | 71.2           | 67.3
jazz3                | mp3      | mp3         | mp3        | mp3   | 78.8           | 52.3
jazz4                | mp3      | mp3         | mp3        | mp3   | 71.4           | 57.0
midi1                | mp3      | mp3         | mp3        | mp3   | 74.6           | 70.7
midi2                | mp3      | mp3         | mp3        | mp3   | 81.9           | 81.9
midi3                | mp3      | mp3         | mp3        | mp3   | 60.0           | 60.5
midi4 (Fmax=1200 Hz) | mp3      | mp3         | mp3        | mp3   | 84.8           | 73.5

Below are some results on a database obtained from http://www.ee.columbia.edu/~graham/mirex_melody/, which appears to be the set of files that competitors in the MIREX 2005 Melody Extraction Task could use to tune their algorithms.
The parameters are almost the same as above, except that the hop size of the analysis windows is 10 ms (441 samples), so as to match the provided ground truth.

Title   | Original | Sep. Singer | Sep. Music | Remix | PitchMatch (%) | TotalMatch (%)
train01 | mp3      | mp3         | mp3        | mp3   | 80.9           | 55.1
train02 | mp3      | mp3         | mp3        | mp3   | 56.9           | 37.6
train03 | mp3      | mp3         | mp3        | mp3   | 77.7           | 48.3
train04 | mp3      | mp3         | mp3        | mp3   | 71.2           | 61.5
train05 | mp3      | mp3         | mp3        | mp3   | 76.5           | 60.0
train06 | mp3      | mp3         | mp3        | mp3   | 58.7           | 29.3
train07 | mp3      | mp3         | mp3        | mp3   | 72.2           | 56.4
train08 | mp3      | mp3         | mp3        | mp3   | 78.7           | 58.9
train09 | mp3      | mp3         | mp3        | mp3   | 85.1           | 70.4


... and below, another example showing how our model can also be used for other instruments, on excerpts of "Take Five" (being a saxophonist myself, this example was compulsory!):
Title      | Original | Sep. Saxophone | Sep. Background Music | Remix
takefive01 | mp3      | mp3            | mp3                   | mp3
takefive02 | mp3      | mp3            | mp3                   | mp3


