PhD Thesis: Indexing Content-Based Music Similarity Models for Fast Retrieval in Massive Databases

~ Dealing with the Music of the World ~
Download PhD Thesis
Defense Presentation (Jan. 31, 2012)

This thesis develops a large-scale music recommendation system. It solves three problems that currently prevent the top-performing class of content-based music similarity algorithms from being used as recommendation engines in databases with millions of songs.

It is shown how to correctly use the non-vectorial music similarity features of these algorithms, together with their non-metric divergences, in centroid-computing algorithms.
A method that alleviates the problem of “hubs” is presented.
A method to speed up music recommendation queries is developed.
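
To make the “hubs” item above more concrete: hubs are songs that turn up in the nearest-neighbor lists of very many other songs without being perceptually close to them. The related publications below name Mutual Proximity as the technique used against them. The following is only a minimal sketch of an empirical form of mutual proximity, not the thesis implementation. The function name, the list-of-lists distance matrix, and the normalization by the number of remaining items are assumptions made for illustration.

```python
def mutual_proximity(dist):
    """Rescale a symmetric distance matrix with empirical mutual proximity.

    dist is a list of lists; dist[i][j] is the distance between items i and j.
    Returns a matrix of the same shape holding 1 - MP, so that smaller values
    still mean "more similar".
    (Illustrative sketch; names and normalization are assumptions.)
    """
    n = len(dist)
    out = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(x + 1, n):
            d_xy = dist[x][y]
            # Count the other items that are farther from x AND farther
            # from y than x and y are from each other.
            farther = sum(
                1
                for j in range(n)
                if j != x and j != y
                and dist[x][j] > d_xy and dist[y][j] > d_xy
            )
            # Share of the remaining items for which x and y are mutually
            # closer to each other than to that item.
            mp = farther / (n - 2) if n > 2 else 0.0
            out[x][y] = out[y][x] = 1.0 - mp
    return out
```

Two songs end up close only if each one is near the other relative to its own neighborhood, which is what reduces the influence of a song that is merely close to everything. The double loop with an inner scan is cubic in the number of items, so this form is suitable for illustration on small matrices only.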

All three methods are merged in a large-scale, high-quality music recommendation prototype. The prototype is called “Wolperdinger” and operates on a collection of 2.3 million songs. A query is processed in a fraction of a second on a standard PC.
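
The related publications below describe the query speed-up as a filter-and-refine method. The general idea can be sketched as follows: a cheap approximate distance pre-selects a small candidate set, and the expensive exact distance is computed only for those candidates. This is a generic sketch under stated assumptions, not the Wolperdinger code. The function name, the parameters `cheap_dist`, `exact_dist`, `k`, and `n_candidates`, and the use of plain Python callables are all hypothetical; this page does not specify which distances the thesis uses in either step.

```python
import heapq


def filter_and_refine(query, items, cheap_dist, exact_dist, k, n_candidates):
    """Return the indices of the k best candidates under exact_dist.

    cheap_dist: fast approximate distance used to pre-select candidates.
    exact_dist: slow, accurate distance applied only to those candidates.
    The result is approximate: a true neighbor dropped by the filter step
    cannot be recovered by the refine step.
    (Illustrative sketch; all names are assumptions.)
    """
    # Filter: keep the n_candidates items closest under the cheap distance.
    candidates = heapq.nsmallest(
        n_candidates,
        range(len(items)),
        key=lambda i: cheap_dist(query, items[i]),
    )
    # Refine: re-rank only the candidates with the exact distance.
    return heapq.nsmallest(
        k, candidates, key=lambda i: exact_dist(query, items[i])
    )
```

The trade-off sits in `n_candidates`: a larger value costs more exact-distance computations per query but makes it less likely that a true nearest neighbor is filtered out.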

Errata

~~~

Video of the Prototype

The video shows a music recommendation system operating on 2.3 million songs.

Related Publications

A Fast Audio Similarity Retrieval Method for Millions of Music Tracks, Schnitzer D., Flexer A., Widmer G., Multimedia Tools and Applications, in press, published online December 2010.
A Filter-and-Refine Indexing Method for Fast Similarity Search in Millions of Music Tracks, Schnitzer D., Flexer A., Widmer G., Proceedings of the 10th International Conference on Music Information Retrieval (ISMIR’09), Kobe, Japan, 2009.
Using Mutual Proximity to Improve Content-Based Audio Similarity, Schnitzer D., Flexer A., Schedl M., Widmer G., Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR’11), Miami, FL, USA, 2011.
Islands of Gaussians: The Self Organizing Map and Gaussian Music Similarity Features, Schnitzer D., Flexer A., Widmer G., Gasser M., Proceedings of the 11th International Society for Music Information Retrieval Conference (ISMIR’10), Utrecht, NL, 2010.
Method and a system for identifying similar audio tracks, US Patent (US 8190663), EP (EP2273384)

Relevant Links

Multivariate Normals (MVN) Matlab Toolbox

Dominik Schnitzer