Jean-Louis DURRIEU
PhD Candidate at the Ecole Nationale Supérieure des Télécommunications (ENST or TELECOM ParisTech)





Below are some sounds produced by the source separation algorithm we have designed. It relies on a preliminary melody detection step (described in Durrieu et al., "Singer melody extraction in polyphonic signals using source separation methods"; see the publications page), followed by spectral Wiener filtering of the original signal. The songs we used are from the ISMIR 2004 Audio Melody Extraction Contest database (see the dedicated ISMIR 2004 website for more details and to download the test set and the reference files). We suggest using headphones, so that slight differences and artifacts in the resulting sounds can be heard. If you experience any problems with the Flash player, you can also check this alternate page with direct links to the files.
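The Wiener filtering step mentioned above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation used to produce these sounds: it assumes that power spectrogram estimates for the singer and for the accompaniment are already available, and the names `wiener_separate`, `singer_power` and `music_power` are hypothetical.

```python
import numpy as np

def wiener_separate(mix_stft, singer_power, music_power, eps=1e-12):
    """Split a mixture STFT into singer and music STFTs with Wiener masks.

    All arrays share the same (frequency, frame) shape. The two power
    arrays are non-negative estimates, assumed to be given here.
    """
    # eps keeps the division defined where both estimates are zero.
    total = singer_power + music_power + eps
    # Soft mask in [0, 1]: the share of the energy attributed to the singer.
    singer_mask = singer_power / total
    singer_stft = singer_mask * mix_stft
    # The remainder goes to the music, so both estimates sum to the mixture.
    music_stft = mix_stft - singer_stft
    return singer_stft, music_stft
```

Because the music estimate is taken as the remainder, the two separated signals add up exactly to the original mixture in the time-frequency domain.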

The parameters for the analysis of the ISMIR 2004 database songs are:
  • sampling rate: 44100Hz
  • length of analysis windows: 46.44 ms (2048 samples)
  • hopsize: 5.8 ms (256 samples)
  • frequency range for melody detection: Fmin = 100Hz, Fmax = 800Hz (unless otherwise specified)
  • frequency quantization for the melody: discretized to eighths of a tone (that is to say, there are 48 frequencies per octave)
  • length of the songs: between 14s and 25s
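As a quick check of the figures above, the following sketch converts the window and hop sizes from samples to milliseconds and builds a candidate frequency grid with 48 steps per octave. The function names are hypothetical, and including both Fmin and Fmax in the grid is an assumption, not something stated on this page.

```python
import math

SAMPLE_RATE = 44100           # Hz
WINDOW_SAMPLES = 2048
HOP_SAMPLES = 256
F_MIN, F_MAX = 100.0, 800.0   # Hz
STEPS_PER_OCTAVE = 48         # one step = 1/8 of a tone

def samples_to_ms(n_samples, sample_rate=SAMPLE_RATE):
    """Duration of n_samples, in milliseconds."""
    return 1000.0 * n_samples / sample_rate

def melody_frequency_grid(f_min=F_MIN, f_max=F_MAX, steps=STEPS_PER_OCTAVE):
    """Log-spaced frequencies from f_min to f_max, both ends included (assumed)."""
    n_steps = int(round(steps * math.log2(f_max / f_min)))
    return [f_min * 2.0 ** (k / steps) for k in range(n_steps + 1)]

window_ms = samples_to_ms(WINDOW_SAMPLES)   # about 46.44 ms
hop_ms = samples_to_ms(HOP_SAMPLES)         # about 5.8 ms
grid = melody_frequency_grid()
```

With Fmin = 100Hz and Fmax = 800Hz the grid spans three octaves, hence 145 candidate frequencies if both end points are included.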
The columns in the following table are:
  • Title: name of the song in the original test set
  • Original: the original song
  • Sep. Singer: estimated/separated singer signal
  • Sep. Music: estimated/separated music signal
  • Remix: left channel = estimated singer, right channel = estimated music
  • PitchMatch: percentage of "correctness" in pitch estimation in the song's pitched frames (see the ismir 2004 website for a description)
  • TotalMatch: percentage of "correctness" in pitch estimation in the whole song (also taking into account the silences in the singer reference track)
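The Remix column described above can be reproduced from the two separated signals with a few lines. This is a minimal sketch assuming both signals are mono arrays at the same sampling rate; the name `make_remix` is hypothetical.

```python
import numpy as np

def make_remix(singer, music):
    """Return an (n_samples, 2) stereo array: left = singer, right = music."""
    singer = np.asarray(singer, dtype=float)
    music = np.asarray(music, dtype=float)
    # Truncate to the shorter signal in case the lengths differ slightly.
    n = min(len(singer), len(music))
    return np.stack([singer[:n], music[:n]], axis=1)
```

Keeping the two estimates on separate channels makes it easy to compare them by ear, for instance by muting one side of the headphones.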
Here is a table showing the results obtained by the system described in our ICASSP'08 article, "Singer melody extraction in polyphonic signals using source separation methods". A. Ehmann, from the University of Illinois at Urbana-Champaign, kindly agreed to run it on the database that was used for MIREX 2006.

System     Vx Recall   Vx False Alarm   Vx d'   Raw Pitch   Raw Chroma   Overall Acc
dressler   89.9%       23.5%            2.00    80.0%       82.9%        77.3%
ryynanen   80.4%       15.4%            1.88    75.5%       78.2%        72.1%
poliner    90.2%       35.7%            1.66    72.6%       75.7%        69.3%
sutton     67.6%       17.0%            1.41    59.1%       62.5%        55.7%
brossier   99.5%       97.1%            0.66    48.3%       61.7%        39.8%
durrieu    99.99%      99.81%           0.72    76.13%      80.02%       61.92%
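The Vx d' column appears to be the usual sensitivity index from signal detection theory, computed from the voicing recall and the voicing false alarm rate. This is an assumption on our part rather than something stated by the evaluators, but the sketch below (with the hypothetical name `d_prime`) reproduces several rows of the table to within rounding.

```python
from statistics import NormalDist

def d_prime(recall, false_alarm):
    """Sensitivity index: z(recall) - z(false alarm rate).

    Both rates are fractions strictly between 0 and 1; inv_cdf raises
    an error at exactly 0 or 1.
    """
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(recall) - z(false_alarm)
```

For the dressler row (recall 0.899, false alarm 0.235) this gives approximately 2.00. Rows where both rates are very close to 100% are much more sensitive to rounding of the published percentages, so they may not be reproduced as closely.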
Here is the OpenDocument spreadsheet with all the results. These results might differ from the ones reported below on this page, because our system is not deterministic: it relies on a random initialization, which may change the results. However, the differences from one run to the next do not seem to be significant.

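Title               Original   Sep. Singer   Sep. Music   Remix     Pitch Match (%)   Total Match (%)
opera_fem2          [audio]    [audio]       [audio]      [audio]   70.2              63.7
opera_fem4          [audio]    [audio]       [audio]      [audio]   78.8              80.2
opera_male3         [audio]    [audio]       [audio]      [audio]   65.5              66.8
opera_male5         [audio]    [audio]       [audio]      [audio]   82.8              77.7
daisy1              [audio]    [audio]       [audio]      [audio]   85.5              72.1
daisy2              [audio]    [audio]       [audio]      [audio]   85.3              74.6
daisy3              [audio]    [audio]       [audio]      [audio]   84.7              84.7
daisy4              [audio]    [audio]       [audio]      [audio]   90.5              90.5
pop1                [audio]    [audio]       [audio]      [audio]   74.2              62.7
pop2                [audio]    [audio]       [audio]      [audio]   79.8              63.8
pop3                [audio]    [audio]       [audio]      [audio]   76.8              62.1
pop4                [audio]    [audio]       [audio]      [audio]   79.4              64.8
jazz1               [audio]    [audio]       [audio]      [audio]   73.4              71.1
jazz2               [audio]    [audio]       [audio]      [audio]   71.2              67.3
jazz3               [audio]    [audio]       [audio]      [audio]   78.8              52.3
jazz4               [audio]    [audio]       [audio]      [audio]   71.4              57.0
midi1               [audio]    [audio]       [audio]      [audio]   74.6              70.7
midi2               [audio]    [audio]       [audio]      [audio]   81.9              81.9
midi3               [audio]    [audio]       [audio]      [audio]   60.0              60.5
midi4 (Fmax=1200)   [audio]    [audio]       [audio]      [audio]   84.8              73.5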

Here are some results on a database I got from http://www.ee.columbia.edu/~graham/mirex_melody/, which seems to be the set of files that "competitors" in the MIREX 2005 Melody Extraction Task could use to tune their algorithms.
The parameters are almost the same as before, except that the hopsize for the analysis windows is equal to 10 ms (441 samples), to fit the given ground truth.

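Title     Original   Sep. Singer   Sep. Music   Remix     Pitch Match (%)   Total Match (%)
train01   [audio]    [audio]       [audio]      [audio]   80.9              55.1
train02   [audio]    [audio]       [audio]      [audio]   56.9              37.6
train03   [audio]    [audio]       [audio]      [audio]   77.7              48.3
train04   [audio]    [audio]       [audio]      [audio]   71.2              61.5
train05   [audio]    [audio]       [audio]      [audio]   76.5              60.0
train06   [audio]    [audio]       [audio]      [audio]   58.7              29.3
train07   [audio]    [audio]       [audio]      [audio]   72.2              56.4
train08   [audio]    [audio]       [audio]      [audio]   78.7              58.9
train09   [audio]    [audio]       [audio]      [audio]   85.1              70.4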


... and below is another example, showing the possible use of our model for other instruments, on excerpts of "Take Five" (being a saxophonist, I could not leave this example out!):
Title        Original   Sep. Saxophone   Sep. Background Music   Remix
takefive01   [audio]    [audio]          [audio]                 [audio]
takefive02   [audio]    [audio]          [audio]                 [audio]