ListenBrainz as a correction engine?

alastairp · March 9, 2016, 12:26pm

This is something that we have had some ideas about. Ideally, everyone would send listen data to ListenBrainz with an MBID. That way we know exactly which recording is being talked about, and we can get good-quality data from MusicBrainz. However, we know that this isn’t always possible: the person listening may not have added MusicBrainz tags to their music, the music may not be in MusicBrainz, or the software they use to play it might not understand MBID tags.

To work around this, we made MessyBrainz (http://messybrainz.org, https://github.com/metabrainz/messybrainz-server) a component of ListenBrainz. In this system, we take all of the metadata given to us by a client (which could include Artist, Title, Album, Year, Track position, MBIDs, for example), and give each unique set of information its own ID (which we call a MessyBrainz ID).
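As a rough illustration of "one ID per unique set of metadata", here is a minimal sketch. It is not necessarily how messybrainz-server does it: the function name `messy_id`, the namespace value and the field names are all assumptions made up for this example. The idea is simply to serialise the submission in a canonical way and derive a deterministic UUID from it.

```python
import json
import uuid

# Hypothetical namespace for this sketch only; not a real MessyBrainz value.
SKETCH_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/messy-sketch")


def messy_id(metadata: dict) -> uuid.UUID:
    """Return a deterministic ID for one exact set of submitted metadata."""
    # Sort keys so that the order of fields does not change the ID.
    canonical = json.dumps(metadata, sort_keys=True, ensure_ascii=False)
    return uuid.uuid5(SKETCH_NAMESPACE, canonical)


# Same fields in a different order: same ID.
a = messy_id({"artist": "Miles Davis", "title": "Jean-Pierre"})
b = messy_id({"title": "Jean-Pierre", "artist": "Miles Davis"})

# A small spelling difference: a different ID.
c = messy_id({"artist": "Miles Davis", "title": "Jean Pierre"})
```

Note that any difference at all in the submitted data, even a missing hyphen, produces a separate ID in this sketch.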

In our MessyBrainz database we currently have many IDs which probably represent the same data. Perhaps one submission includes a track number when another one doesn’t, or one contains a spelling mistake in the artist name or track title. Perhaps one of the submissions also includes an MBID because someone tagged their file with Picard and submitted a listen with an MBID-aware music player.

There is an interesting task here to work out if two data submissions are actually the same. In fact, we think it’s so interesting that we proposed it as a potential SoC project: https://wiki.musicbrainz.org/Development/Summer_of_Code/2016#ListenBrainz:_A_way_to_associate_listens_with_MBIDs
The final goal of a project like this would be to query ListenBrainz with an MBID and find all submissions for that recording, even if the recording title or artist name was spelt incorrectly.
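To make the matching task a bit more concrete, here is a naive sketch of comparing two submissions. It is only a starting point under stated assumptions, not a proposed design: the field names (`artist`, `title`, `recording_mbid`), the helper names and the 0.85 threshold are all invented for this example, and the fuzzy comparison uses `difflib` from the Python standard library.

```python
import difflib
import unicodedata


def normalise(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    kept = "".join(ch if ch.isalnum() else " " for ch in no_marks.lower())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0 of two normalised strings."""
    return difflib.SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def probably_same(x: dict, y: dict, threshold: float = 0.85) -> bool:
    """Guess whether two submissions describe the same recording."""
    # If both submissions carry an MBID, let the MBIDs decide.
    mbid_x, mbid_y = x.get("recording_mbid"), y.get("recording_mbid")
    if mbid_x and mbid_y:
        return mbid_x == mbid_y
    # Otherwise fall back to fuzzy comparison of artist and title.
    return (
        similarity(x["artist"], y["artist"]) >= threshold
        and similarity(x["title"], y["title"]) >= threshold
    )


tagged = {"artist": "Miles Davis", "title": "Jean-Pierre"}
typo = {"artist": "Mils Davis", "title": "Jean Pierre"}
other = {"artist": "Miles Davis", "title": "Summertime"}

same_recording = probably_same(tagged, typo)
different_recording = probably_same(tagged, other)
```

A string comparison like this copes with typos and punctuation, but it has no answer for two different artists with the same name, or for two different recordings with the same title.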

We have thought about some subtle and not-so-subtle issues here that make this a difficult problem:

Artists sometimes have the same name
On Last.fm, they don’t handle this very well

http://www.last.fm/music/James+Morrison
"There are multiple artists called James Morrison: 1) an English singer-songwriter from Rugby 2) an Australian jazz musician who plays numerous instruments; best known for his trumpet playing 3) a notable south Sligo-style Irish fiddler. 4) “Jim” Morrison, lead singer of 1960s American rock group The Doors."

Some artists give several of their albums the same name

And if they do, sometimes people refer to them by different names

http://musicbrainz.org/release-group/7683e0d7-3b1b-3300-8ee4-ee0ad1d6a308
This album is often referred to as Scratch, referring to the album cover by Hipgnosis
http://musicbrainz.org/release-group/055be730-dcad-31bf-b550-45ba9c202aa3
The Beatles, also known as the White Album, is the ninth studio album by English rock group the Beatles

Or people simply don’t know how to write the track title when they tag their audio

http://musicbrainz.org/release/bc081a27-34aa-4a0c-b6ec-9df4d0fd3a0d

Sometimes artists perform different songs with the same name

http://musicbrainz.org/search?query=summertime+AND+arid%3A561d854a-6a28-4aa7-8c99-323e6ce46c2a&type=recording&limit=25&method=advanced

And sometimes they’re on the same album

https://en.wikipedia.org/wiki/We_Want_Miles
The double album contains six tracks. Two versions of “Jean-Pierre”, a long and a short one, are the bookends on the first record

And that’s not even starting to mention how to deal with classical/orchestral recordings.

If you’re interested in this kind of thing, definitely talk to us!