A Musically Motivated Mid-Level Representation For Pitch Estimation And Musical Audio Source Separation

Introduction


This page presents some results and media related to the article "A musically motivated mid-level representation for pitch estimation and musical audio source separation", J.-L. Durrieu, B. David and G. Richard, IEEE Journal of Selected Topics in Signal Processing, Music Signal Processing, October 2011 (first submission 29th Sept. 2010, revised 2nd Feb. 2011), Vol. 5 (6), pp. 1180-1191.

Annotation files

We have annotated the 5 songs from the development database of the SiSEC 2010 "Professionally Produced Music Recordings" evaluation campaign. The annotation for each file is the melody, evaluated on frames of 46.44 ms (2048 samples at 44100 Hz), with one frame every 5.8 ms (256 samples). We gathered them in this archive. In each file, each row holds two values: the first is the time stamp (s) and the second is the fundamental frequency (Hz) of the corresponding frame.
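As an illustration of the format described above, here is a minimal Python sketch that checks the frame arithmetic and reads one annotation file. The function names (`frame_durations_ms`, `read_annotation`) are hypothetical, and the parser assumes the two values on each row are separated by whitespace or a comma, which this page does not specify.

```python
import io

SAMPLE_RATE = 44100   # Hz, as stated above
WINDOW_SIZE = 2048    # samples per analysis frame
HOP_SIZE = 256        # samples between two consecutive frames


def frame_durations_ms(sample_rate=SAMPLE_RATE, window=WINDOW_SIZE, hop=HOP_SIZE):
    """Return (frame length, hop length) in milliseconds."""
    return 1000.0 * window / sample_rate, 1000.0 * hop / sample_rate


def read_annotation(stream):
    """Parse rows of 'time_stamp f0' into a list of (seconds, hertz) tuples.

    Assumption: fields are separated by whitespace or commas.
    Rows with fewer than two fields are skipped.
    """
    rows = []
    for line in stream:
        fields = line.replace(",", " ").split()
        if len(fields) < 2:
            continue  # blank or malformed row
        rows.append((float(fields[0]), float(fields[1])))
    return rows


if __name__ == "__main__":
    window_ms, hop_ms = frame_durations_ms()
    print(f"frame: {window_ms:.2f} ms, hop: {hop_ms:.2f} ms")

    # Synthetic rows for illustration only, not taken from the archive.
    demo = io.StringIO("0.0000 220.0\n0.0058 221.5\n")
    for time_s, f0_hz in read_annotation(demo):
        print(time_s, f0_hz)
```

To read a real file from the archive, pass an open file object instead of the `io.StringIO` demo, for example `read_annotation(open(path))`, where `path` points to one of the annotation files.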

Source code

Sound examples

The five songs used for the experiments are given in the following table, with typical separation results. The systems are the V(U)IMM systems, with 50 iterations. You can also access the files directly here.
Original VIMM VUIMM
Bearlin
Bearlin Vocals
Bearlin Music
Tamy
Tamy Vocals
Tamy Music
Another
Another Vocals
Another Music
Fort
Fort Vocals
Fort Music
Ultimate
Ultimate Vocals
Ultimate Music

License

All the original WAV files can be found on the website for the evaluation campaign SiSEC2010. Here is an excerpt of the license section you can read on that page:
"All audio files are distributed under the terms different licenses, as listed below for each recording: All the former test and development data (test1 and dev1) are from MTG MASS database by M. Vinyes."

Evolution of estimated HF0

The following video (older color version here) shows the evolution of the matrix HF0 during the first parameter estimation, prior to the estimation of the melody, for the excerpt by J. Pastorius, "Three views of a secret". Time is on the abscissa (in samples), while the ordinate corresponds to a logarithmic fundamental frequency scale.
It is interesting to see how the different contributions of the instruments, the trumpets, separated by one octave, and the harmonica, become clearer and clearer over the iterations. Indeed, it becomes visible that the upper-octave trumpet has a more variable pitch, especially with the effect at 0.09s, while the attacks of the harmonica clearly lag behind those of the trumpet.


Jean-Louis Durrieu
Last modified: Fri Sep 13 11:06:28 CEST 2013