Sony Computer Science Laboratories, 2002-2006
As part of the music research team at Sony CSL (led by Francois Pachet), I designed and developed the first prototype of the MusicBrowser, a software platform for searching a collection of mp3 files based on content information (e.g. "find me songs that sound like The Beatles"). The system incorporated novel technologies such as automatic tempo analysis, playlist generation and timbre similarity. Parts of this project are now integrated in Sony Ericsson mobile phones. In some respects, the MusicBrowser anticipated the technology behind iTunes and music recommendation by a few years.
Joint work with: Francois Pachet, Anthony Beurive, Aymeric Zils, Pierre Roy, Amaury La Burthe.
Pachet, F., La Burthe, A., Zils, A. and Aucouturier, J.-J. Popular Music Access: The Sony Music Browser. Journal of the American Society for Information Science, 55(12):1037-1044, 2004.
Aucouturier, J.-J. and Pachet, F., Finding Songs that Sound the Same. Proceedings of IEEE Benelux Workshop on Model-Based Processing and Coding of Audio, November 2002, Leuven, Belgium.
Aucouturier, J.-J. and Pachet, F. Music Similarity Measures: What's the Use? Proceedings of the International Symposium on Music Information Retrieval (ISMIR), October 2002, Paris, France.
Source code (Matlab)
This archive file (.ZIP, 4.9 MB) contains a set of Matlab files implementing bag-of-frames timbre similarity. The code is not optimized, but should work fine with Matlab R12 and above. Feel free to use or modify it for your projects.
In more detail, it contains:
- some routines to compute MFCC (from a Matlab toolbox by Mike Brookes, voicebox),
- a pattern recognition library (from a Matlab toolbox by Chris Bishop, netlab) to compute GMMs,
- and my own code to link the two and, hopefully, make it work.
Make sure the first two packages are on the Matlab path.
I have put very minimal instructions inline in each function, as well as a demo script (demo.m), which can be used to test whether the paths are set up correctly. With this code, you can do, for example, maximum-likelihood (ML) classification on complete signals, BOF similarity between signals, and nearest-neighbor classification. Feel free to ask for anything more advanced (such as finding a learned class of short sounds in a long signal). Also, very little parameter tweaking is possible for now (namely, the number of MFCC coefficients and the number of Gaussian components). I'm more than happy to help with this and go into greater detail. Just send me an email.
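To make the two classification modes mentioned above more concrete, here is a minimal Python sketch. It is not a port of the Matlab archive. It assumes that each signal has already been turned into a list of per-frame feature vectors, it replaces the GMM by a single diagonal-covariance Gaussian per class, and every function name (fit_diag_gaussian, log_likelihood, ml_classify, nn_classify) is made up for illustration.

```python
import math

def fit_diag_gaussian(frames, floor=1e-6):
    """Fit per-dimension mean and variance to a list of feature vectors.
    Simplifying assumption: one diagonal Gaussian instead of a GMM."""
    n, dims = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    variances = [max(sum((f[d] - means[d]) ** 2 for f in frames) / n, floor)
                 for d in range(dims)]
    return means, variances

def log_likelihood(frames, model):
    """Total log-likelihood of all frames under a diagonal Gaussian,
    treating frames as independent."""
    means, variances = model
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, means, variances):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total

def ml_classify(frames, class_models):
    """Maximum-likelihood classification of a complete signal:
    return the label whose model best explains all of its frames."""
    return max(class_models,
               key=lambda label: log_likelihood(frames, class_models[label]))

def nn_classify(query, labelled_examples, distance):
    """Nearest-neighbor classification: return the label of the example
    closest to the query under the supplied distance function."""
    label, _ = min(labelled_examples, key=lambda pair: distance(query, pair[1]))
    return label

# Toy data: two classes of 2-dimensional "frame features".
class_models = {
    "low": fit_diag_gaussian([[0.0, 0.1], [0.2, -0.1], [0.1, 0.0]]),
    "high": fit_diag_gaussian([[5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]),
}
ml_label = ml_classify([[0.1, 0.05], [0.0, 0.0]], class_models)

# For nearest-neighbor, any distance between two items can be plugged in;
# plain Euclidean distance between vectors is used here as a placeholder.
examples = [("a", [1.0, 1.0]), ("b", [0.1, 0.1])]
nn_label = nn_classify([0.0, 0.0], examples, math.dist)
```

In a timbre setting, the distance passed to nn_classify would be a similarity measure between two signal models rather than a Euclidean distance between vectors.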
Timbre similarity demos
Among other tools, the MusicBrowser includes an innovative "timbre similarity" search engine, which allows the user to ask the computer: "I like this tune, find me all the songs that sound the same". For this we introduced the so-called "bag-of-frames" (BOF) approach, now a standard technique. BOF works by comparing long-term statistical distributions of local spectral features computed on 50 ms frames (see the papers above for more details). Here are a few video demonstrations of an early implementation of the system (2004):
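As a rough illustration of the bag-of-frames idea, the Python sketch below cuts two signals into 50 ms frames, computes a feature vector per frame, summarises each signal by the statistics of its frames (ignoring their order), and compares the two summaries. It makes several simplifying assumptions that are not how the actual system works: log energy and zero-crossing rate stand in for MFCCs, a single diagonal Gaussian stands in for a GMM, and the distance is a closed-form symmetrised Kullback-Leibler divergence. All function names are hypothetical.

```python
import math
import random

def frame_signal(samples, sample_rate, frame_ms=50):
    """Cut a signal into non-overlapping frames of frame_ms milliseconds."""
    size = int(sample_rate * frame_ms / 1000)
    return [samples[i:i + size]
            for i in range(0, len(samples) - size + 1, size)]

def frame_features(frame):
    """Toy stand-in for MFCCs: log energy and zero-crossing rate."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [math.log(energy + 1e-12), crossings / (len(frame) - 1)]

def fit_gaussian(vectors, floor=1e-6):
    """Per-dimension mean and variance of the frames; frame order is lost,
    which is what makes this a 'bag' of frames."""
    n, dims = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    variances = [max(sum((v[d] - means[d]) ** 2 for v in vectors) / n, floor)
                 for d in range(dims)]
    return means, variances

def kl_diag(p, q):
    """KL divergence KL(p || q) between two diagonal Gaussians."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    return 0.5 * sum(math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1
                     for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q))

def bof_distance(sig_a, sig_b, sample_rate):
    """Symmetrised KL divergence between the frame statistics of two signals."""
    model_a, model_b = (
        fit_gaussian([frame_features(f) for f in frame_signal(s, sample_rate)])
        for s in (sig_a, sig_b))
    return kl_diag(model_a, model_b) + kl_diag(model_b, model_a)

# Synthetic test signals: two slightly noisy tones and one white-noise signal.
SAMPLE_RATE = 8000

def noisy_tone(freq, seed):
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
            + 0.05 * rng.uniform(-1, 1) for i in range(SAMPLE_RATE)]

tone_a = noisy_tone(220, seed=1)
tone_b = noisy_tone(230, seed=2)
noise_rng = random.Random(0)
noise = [noise_rng.uniform(-1, 1) for _ in range(SAMPLE_RATE)]

d_tones = bof_distance(tone_a, tone_b, SAMPLE_RATE)
d_tone_noise = bof_distance(tone_a, noise, SAMPLE_RATE)
```

With these toy features, the two tones come out much closer to each other than either is to the noise signal. A similarity search over a collection would then rank all songs by their distance to the query.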
Query: a short musical sequence played by the user on a Korg Karma synthesizer, using a "trip-hop" preset. (audio recorded straight into the computer)
Nearest neighbors in a database of 10,000 mp3 songs include Finley Quaye and Portishead.
Query: a short musical sequence played by the user on a Korg Karma synthesizer, using a "funk" preset. (audio recorded straight into the computer)
Nearest neighbors in a database of 10,000 mp3 songs include hip-hop act De La Soul and funk artist Me'shell Ndegeocello.
Query: a short musical sequence played by the user on a (real) accordion. (audio recorded straight into the computer)
Nearest neighbors in a database of 10,000 mp3 songs include other accordion pieces and an unexpected-but-incredibly-relevant cover of The Sex Pistols.
Query: the song Knives by heavy metal band Therapy (i.e. a pre-existing mp3 file).
Nearest neighbors in a database of 10,000 mp3 songs include other songs by Therapy, as well as The Clash, Skunk Anansie or Pat Benatar.
Query: the song "Forever" by folk/blues artist Ben Harper (i.e. a pre-existing mp3 file).
Nearest neighbors in a database of 10,000 mp3 songs include other songs by Ben Harper, bluesman Keb Mo, songwriter/guitarist Leonard Cohen or acoustic reggae act "Tryo".
Query: the piece "Le moment de Verite" by jazz pianist Ahmad Jamal (i.e. a pre-existing mp3 file).
Nearest neighbors in a database of 10,000 mp3 songs include other songs by Ahmad Jamal, jazz pianists Hank Jones or Alain Jean Marie, as well as Italian pop pianist/singer Paolo Conte and all the way to Schumann and Debussy.
Query: a short musical sequence played by the user on a Korg Karma synthesizer, using an "acoustic guitar" arpeggiator. (audio recorded straight into the computer)
Nearest neighbors in a database of 10,000 mp3 songs include Brazilian guitar legend Baden Powell, as well as other singer/guitarists such as Luz Casal, Paul Simon and Miossec.
Public demonstrations
As part of the IST 6th-FWP European Project Semantic Hifi (granted to a consortium managed by IRCAM, Paris and including SONY CSL Paris), I took part in a week-long public workshop at the Multimedia Library, Cite des Sciences de la Villette, Paris (National Science Museum), 15-18 June, 2004. During the workshop, users tested features of the Music Browser, including some of my doctoral research on pattern recognition. We collected user feedback through forms, GUI capture and informal debriefings. The workshop was initiated by Francois Pachet and conducted under his leadership.