GSoC idea: Adding music content with no MBIDs to AcousticBrainz

Every music track carries a lot of hidden information, which makes it a goldmine for data analysts. Artists who create new music and upload it to SoundCloud or Bandcamp deserve to have their track data recorded in a repository too, since any one of those tracks could turn out to be the most popular song of the year.
To advance the idea of adding music content with no MusicBrainz ID (MBID) to AcousticBrainz, we hope to:

  1. Collect information from existing sites such as the Live Music Archive, as mentioned on the main AcousticBrainz ideas page, and tag the tracks with UUIDs.
  2. Search through websites like SoundCloud and use the uploader's username as the artist name.
  3. Apply machine learning and natural language processing to identify the ‘N’ most probable track names.
  4. Choose the track name with the highest similarity to existing track names on AcousticBrainz/MusicBrainz, OR choose the one with the highest similarity to the name given by the artist. I can provide examples on request to make this clearer.
  5. Use MessyBrainz to generate a random UUID for the artist/track-name pair.
  6. Group the UUID of the current track with an existing track name on AcousticBrainz if the track name was chosen through highest similarity to an existing track name.
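
To make steps 4 to 6 a bit more concrete, here is a rough sketch under some loud assumptions: plain string similarity from Python's `difflib` stands in for the ML/NLP ranking, `uuid.uuid4()` stands in for whatever identifier MessyBrainz would actually assign, and `best_match`, `assign_id`, and the `groups` dictionary are hypothetical names made up for illustration. It is not how AcousticBrainz or MessyBrainz work today.

```python
import difflib
import uuid


def best_match(candidates, known_titles, cutoff=0.6):
    """Return (candidate, known_title, score) for the best pair above cutoff, else None."""
    best = None
    for cand in candidates:
        for title in known_titles:
            # Case-insensitive similarity ratio in [0, 1].
            score = difflib.SequenceMatcher(None, cand.lower(), title.lower()).ratio()
            if score >= cutoff and (best is None or score > best[2]):
                best = (cand, title, score)
    return best


def assign_id(artist, candidates, known_titles, groups):
    """Pick a track name for an upload and give the artist/track pair a random UUID.

    `groups` maps an existing track name to the UUIDs grouped under it (step 6).
    uuid.uuid4() is only a placeholder for an ID issued by MessyBrainz (step 5).
    """
    match = best_match(candidates, known_titles)
    # Step 4: prefer an existing track name; otherwise keep the top candidate.
    title = match[1] if match else candidates[0]
    track_id = str(uuid.uuid4())
    if match:
        groups.setdefault(title, []).append(track_id)
    return title, track_id


known = ["Blue Monday", "Yellow Submarine"]
groups = {}
title, track_id = assign_id("some_user", ["blue monday (demo)", "untitled 3"], known, groups)
print(title, track_id, groups)
```

In this sketch an upload whose candidate names resemble nothing already known keeps its first candidate name and is not grouped with anything, so it would still get an ID of its own.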

By treating the uploader's username as the artist name and grouping new tracks with existing track names, we could avoid the cold-start problem in recommender systems and thus increase the popularity of tracks created by budding, unknown artists.


Your proposal has a lot of open questions and not a lot of detail. Also, you need to use our application template to submit a proposal to us:

You need to add considerably more content to this proposal before we’ll seriously consider it. You might consider discussing this with people in our IRC channel.

Hi Rob, thank you for the reply.
I’m working on a new proposal draft with the correct template. I was under the impression that I had to post a high-level abstract of what the project will achieve on this community forum.
In my draft proposal, I will expand on how the implementation will be broken down week by week, and address the questions posed on the MetaBrainz site.


Yes, but not under this specific category, which is only for applications. When you go to make a new topic under this category, it even includes the application template already.

I’ll move this topic to the AcousticBrainz category for general discussion. Please make a new topic in the “GSoC applications” category with your application template (and link to this when submitting your application on the GSoC site; remember, the deadline’s on Friday).