Jean-Louis DURRIEU
PhD Candidate at the Ecole Nationale Supérieure des Télécommunications (ENST)





Below are some sound examples produced by the source separation algorithm we have designed, which first detects the melody and then applies a spectral Wiener filter to the original signal. The songs come from the ISMIR 2004 Audio Melody Extraction Contest database (see the dedicated ISMIR 2004 website for more details and to download the test set and the reference files). We suggest listening with headphones in order to hear the slight differences and artifacts in the resulting sounds.
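As a rough illustration of the filtering step, here is a minimal Python sketch of time-frequency Wiener masking. It assumes the voice and accompaniment power spectra (P_voice and P_music, hypothetical names) have already been estimated for every time-frequency bin, e.g. from the detected melody; the scipy-based STFT and the function below are illustrative only, not the exact implementation used for these examples.

    # Minimal sketch of the spectral Wiener filtering step, assuming the
    # voice and accompaniment power spectra (P_voice, P_music) are already
    # estimated for every time-frequency bin (e.g. from the detected melody).
    import numpy as np
    from scipy.signal import stft, istft

    def wiener_separate(x, P_voice, P_music, fs=44100, nperseg=2048, hop=256):
        """Return (voice, music) estimates obtained by Wiener masking."""
        _, _, X = stft(x, fs=fs, nperseg=nperseg, noverlap=nperseg - hop)
        # Wiener gain: ratio of the voice power to the total power in each bin.
        mask = P_voice / (P_voice + P_music + 1e-12)
        _, voice = istft(mask * X, fs=fs, nperseg=nperseg, noverlap=nperseg - hop)
        _, music = istft((1.0 - mask) * X, fs=fs, nperseg=nperseg, noverlap=nperseg - hop)
        return voice, music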

The parameters used for the analysis of the ISMIR 2004 database songs are:
  • sampling rate: 44100 Hz
  • length of the analysis windows: 46.44 ms (2048 samples)
  • hop size: 5.8 ms (256 samples)
  • frequency range for melody detection: Fmin = 100 Hz, Fmax = 800 Hz (except where stated otherwise)
  • frequency quantization for the melody: eighths of a tone, i.e. 48 frequencies per octave (see the sketch after this list)
  • length of the songs: between 14 s and 25 s
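As a small illustration of the quantization above, the candidate melody frequencies can be generated as a geometric grid with 48 steps per octave between Fmin and Fmax (the function name below is hypothetical):

    # Candidate melody-frequency grid: 48 frequencies per octave (eighth-tone
    # steps) between Fmin and Fmax, as listed above.
    import numpy as np

    def melody_frequency_grid(fmin=100.0, fmax=800.0, steps_per_octave=48):
        n = int(np.floor(steps_per_octave * np.log2(fmax / fmin))) + 1
        return fmin * 2.0 ** (np.arange(n) / steps_per_octave)

    grid = melody_frequency_grid()    # 145 candidates over the 3 octaves from 100 Hz to 800 Hz
    print(grid[0], grid[-1])          # 100.0 800.0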
The columns in the following table are:
  • Title: title of the song in the original test set
  • Original: the original song
  • Sep. Singer: estimated/separated singer signal
  • Sep. Music: estimated/separated music signal
  • Remix: left channel = estimated singer, right channel = estimated music
  • PitchMatch: percentage of "correct" pitch estimates over the song's pitched frames (see the ISMIR 2004 website for a description)
  • TotalMatch: percentage of "correct" pitch estimates over the whole song, also taking the silences in the singer reference track into account (a sketch of how these scores can be computed is given after this list)
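For reference, the two scores can be computed along the following lines from per-frame reference and estimated pitch tracks (0 Hz meaning "no melody"). The quarter-tone tolerance is an assumption based on the usual melody-extraction convention, not the exact ISMIR 2004 scoring code.

    # Sketch of PitchMatch / TotalMatch from per-frame pitch tracks.
    import numpy as np

    def match_scores(f0_ref, f0_est, tol_octaves=1.0 / 24.0):  # quarter-tone tolerance (assumed)
        f0_ref = np.asarray(f0_ref, dtype=float)
        f0_est = np.asarray(f0_est, dtype=float)
        voiced = f0_ref > 0
        both = voiced & (f0_est > 0)
        correct = np.zeros(f0_ref.shape, dtype=bool)
        correct[both] = np.abs(np.log2(f0_est[both] / f0_ref[both])) <= tol_octaves
        pitch_match = 100.0 * correct[voiced].mean()          # pitched frames only
        # TotalMatch also credits reference silences where no melody is estimated.
        correct_silence = (~voiced) & (f0_est <= 0)
        total_match = 100.0 * (correct | correct_silence).mean()
        return pitch_match, total_match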

Title                | Original | Sep. Singer | Sep. Music | Remix | PitchMatch (%) | TotalMatch (%)
opera_fem2           | mp3      | mp3         | mp3        | mp3   | 70.2           | 63.7
opera_fem4           | mp3      | mp3         | mp3        | mp3   | 78.8           | 80.2
opera_male3          | mp3      | mp3         | mp3        | mp3   | 65.5           | 66.8
opera_male5          | mp3      | mp3         | mp3        | mp3   | 82.8           | 77.7
daisy1               | mp3      | mp3         | mp3        | mp3   | 85.5           | 72.1
daisy2               | mp3      | mp3         | mp3        | mp3   | 85.3           | 74.6
daisy3               | mp3      | mp3         | mp3        | mp3   | 84.7           | 84.7
daisy4               | mp3      | mp3         | mp3        | mp3   | 90.5           | 90.5
pop1                 | mp3      | mp3         | mp3        | mp3   | 74.2           | 62.7
pop2                 | mp3      | mp3         | mp3        | mp3   | 79.8           | 63.8
pop3                 | mp3      | mp3         | mp3        | mp3   | 76.8           | 62.1
pop4                 | mp3      | mp3         | mp3        | mp3   | 79.4           | 64.8
jazz1                | mp3      | mp3         | mp3        | mp3   | 73.4           | 71.1
jazz2                | mp3      | mp3         | mp3        | mp3   | 71.2           | 67.3
jazz3                | mp3      | mp3         | mp3        | mp3   | 78.8           | 52.3
jazz4                | mp3      | mp3         | mp3        | mp3   | 71.4           | 57.0
midi1                | mp3      | mp3         | mp3        | mp3   | 74.6           | 70.7
midi2                | mp3      | mp3         | mp3        | mp3   | 81.9           | 81.9
midi3                | mp3      | mp3         | mp3        | mp3   | 60.0           | 60.5
midi4 (Fmax=1200 Hz) | mp3      | mp3         | mp3        | mp3   | 84.8           | 73.5

Below are some results on a database obtained from http://www.ee.columbia.edu/~graham/mirex_melody/, which appears to be the set of files that competitors in the MIREX 2005 Melody Extraction Task could use to tune their algorithms.
The parameters are almost the same as above, except that the hop size of the analysis windows is 10 ms (441 samples), so as to match the provided ground truth.

Title   | Original | Sep. Singer | Sep. Music | Remix | PitchMatch (%) | TotalMatch (%)
train01 | mp3      | mp3         | mp3        | mp3   | 80.9           | 55.1
train02 | mp3      | mp3         | mp3        | mp3   | 56.9           | 37.6
train03 | mp3      | mp3         | mp3        | mp3   | 77.7           | 48.3
train04 | mp3      | mp3         | mp3        | mp3   | 71.2           | 61.5
train05 | mp3      | mp3         | mp3        | mp3   | 76.5           | 60.0
train06 | mp3      | mp3         | mp3        | mp3   | 58.7           | 29.3
train07 | mp3      | mp3         | mp3        | mp3   | 72.2           | 56.4
train08 | mp3      | mp3         | mp3        | mp3   | 78.7           | 58.9
train09 | mp3      | mp3         | mp3        | mp3   | 85.1           | 70.4


... and below, another example showing how our model can also be used for other instruments, on excerpts of "Take Five" (being a saxophonist myself, this example was compulsory!):
Title      | Original | Sep. Saxophone | Sep. Background Music | Remix
takefive01 | mp3      | mp3            | mp3                   | mp3
takefive02 | mp3      | mp3            | mp3                   | mp3


