# A Musically Motivated Mid-Level Representation For Pitch Estimation And Musical Audio Source Separation

## Introduction

This page presents results and media related to the article "A musically motivated mid-level representation for pitch estimation and musical audio source separation", J.-L. Durrieu, B. David and G. Richard, IEEE Journal of Selected Topics in Signal Processing, Music Signal Processing, October 2011 (first submission 29th Sept. 2010, revised 2nd Feb. 2011), Vol. 5 (6), pp. 1180-1191.

## Annotation files

We have annotated the 5 songs from the development database of the SiSEC 2010 "Professionally Produced Music Recordings" evaluation campaign. The annotation for each file is the melody, given for frames of 46.44 ms (2048 samples at 44100 Hz), taken every 5.8 ms (256 samples). We gathered the annotations in this archive. In each file, every row holds two values: the first is the time stamp (s) and the second is the fundamental frequency (Hz) of the corresponding frame.
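The row layout described above can be read with a few lines of Python. This is only a minimal sketch: the function name `load_melody_annotation`, the whitespace (or comma) delimiter and the sample rows are assumptions made for illustration, not taken from the archive itself.

```python
import io

FS = 44100      # sampling rate (Hz), as stated above
WINDOW = 2048   # frame length (samples), as stated above
HOP = 256       # hop between successive frames (samples), as stated above


def load_melody_annotation(lines):
    """Parse rows of 'time f0' into a list of (time_s, f0_hz) tuples.

    Assumption: values are separated by whitespace or commas.
    """
    melody = []
    for line in lines:
        fields = line.replace(",", " ").split()
        if len(fields) < 2:
            continue  # skip blank or incomplete rows
        melody.append((float(fields[0]), float(fields[1])))
    return melody


# Frame and hop durations in milliseconds, derived from the values above.
window_ms = 1000.0 * WINDOW / FS
hop_ms = 1000.0 * HOP / FS

# Hypothetical rows, only to show the expected two-column layout.
example = io.StringIO("0.0000 220.0\n0.0058 220.5\n0.0116 221.5\n")
melody = load_melody_annotation(example)
```

To read a real annotation file, pass an open file object instead of the `io.StringIO` example.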

## Source code

• BSSEval.zip: an archive containing a Python/NumPy/Cython implementation of BSSEval. These scripts were used to evaluate our algorithms. We have also tested them on some examples: the results appear to agree with the original Matlab implementation for a delay parameter equal to 0 (no delay allowed). For larger delays, our implementation seems to be rather unstable.
• pitchEval.py: the scripts we used to evaluate the melody estimation.
• separateLeadStereo.zip: the programs and scripts implementing the proposed systems: melody estimation, VIMM and VUIMM to separate the lead instrument from the accompaniment.
• [Experimental] f0salience: a Vamp plug-in which implements the salience function proposed in our article, mainly intended for use with Sonic Visualiser. The Git repository contains compiled versions for Windows 32 bits (.dll), Linux 32 and 64 bits (.so) and Mac OS X 10.6 32 bits (.dylib). On Windows 64 bits, you can also use the 32-bit version; see the README file for details. Note that the compiled Linux libraries do not seem to work, but it should be fairly easy to adapt the makefiles to your environment. The plug-in may still contain errors, and it does not exactly implement the algorithms explained in the article, although the representation displayed in Sonic Visualiser roughly shows what can be expected. When using it, Sonic Visualiser may seem to "freeze" for a rather long time, depending on the chosen parameters: the initialisation needs some time to generate the basis (dictionary) matrices.
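For readers unfamiliar with the BSSEval criteria mentioned in the list above, the following is a minimal sketch of the signal-to-distortion ratio (SDR) in the zero-delay case. There, the target component is the orthogonal projection of the estimated signal onto the reference source. This is not the code from BSSEval.zip: the function names `dot` and `sdr_zero_delay` and the toy signals are assumptions made for illustration.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences of samples."""
    return sum(x * y for x, y in zip(a, b))


def sdr_zero_delay(reference, estimate):
    """SDR in dB when no delay is allowed.

    The target is the projection of `estimate` onto `reference`;
    everything left over is counted as distortion.
    """
    gain = dot(estimate, reference) / dot(reference, reference)
    target = [gain * r for r in reference]
    error = [e - t for e, t in zip(estimate, target)]
    return 10.0 * math.log10(dot(target, target) / dot(error, error))


# Toy signals (hypothetical): the estimate differs from the reference
# in its last sample only.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [1.0, -1.0, 1.0, -0.5]
sdr_db = sdr_zero_delay(reference, estimate)
```

Because of the projection, rescaling the estimate leaves the SDR unchanged. When delays are allowed, the projection is instead taken onto delayed copies of the reference, which this sketch does not cover.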

## Sound examples

The five songs used for the experiments are given in the following table, with typical separation results. The systems are the V(U)IMM systems, run with 50 iterations. You can also access the files directly here.

| Original | VIMM | VUIMM |
|---|---|---|
| Bearlin | Bearlin Vocals | Bearlin Music |
| Tamy | Tamy Vocals | Tamy Music |
| Another | Another Vocals | Another Music |
| Fort | Fort Vocals | Fort Music |
| Ultimate | Ultimate Vocals | Ultimate Music |

## Evolution of estimated $H^{F_0}$
The following video (older color version here) shows the evolution of the matrix $H^{F_0}$ during the first parameter estimation, prior to the estimation of the melody, for the excerpt by J. Pastorius, "Three views of a secret". Time is on the abscissa (in samples), while the ordinate corresponds to a logarithmic fundamental frequency scale.