Every music track carries a lot of hidden information, which makes it a goldmine for data analysts. Artists who create new music and upload it to SoundCloud or Bandcamp deserve to have their track data recorded in a repository; otherwise the data for what could become the most popular song of the year may never be captured.
To advance the idea of adding music content that has no MusicBrainz ID to AcousticBrainz, we propose to:
- Collect information from existing sites such as the Live Music Archive, as mentioned on the main AcousticBrainz ideas page, and tag it with UUIDs.
- Search websites such as SoundCloud and use the uploader's username as the artist name.
- Apply machine learning and natural language processing to identify the N most probable track names.
- Choose the candidate track name with the highest similarity to an existing track name on AcousticBrainz/MusicBrainz, or the candidate with the highest similarity to the name given by the artist. Examples can be provided on request to make this clearer.
- Use MessyBrainz to assign a UUID to the artist/track-name pair.
- If the track name was chosen by similarity to an existing track name, group the current track's UUID with that existing track on AcousticBrainz.
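The name-selection step can be sketched as below. This is a minimal sketch, not the proposed ML/NLP model: it assumes the N candidate names have already been produced by that step, and it uses Python's `difflib.SequenceMatcher` ratio as a stand-in similarity metric. The function names `choose_by_existing` and `choose_by_artist_title` and the sample titles are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1].

    difflib's ratio is only a stand-in metric for this sketch.
    """
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def choose_by_existing(candidates, existing_titles):
    """Return (candidate, existing_title, score) for the most similar pair,
    or None if either list is empty."""
    if not candidates or not existing_titles:
        return None
    return max(
        ((c, e, similarity(c, e)) for c in candidates for e in existing_titles),
        key=lambda triple: triple[2],
    )


def choose_by_artist_title(candidates, artist_title):
    """Return the candidate most similar to the title the artist supplied,
    or None if there are no candidates."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: similarity(c, artist_title))


# Hypothetical example data.
candidates = ["Midnight Drive (demo)", "Midnight Drive", "Night Drive"]
existing = ["Midnight Drive", "Morning Walk"]
print(choose_by_existing(candidates, existing))  # ('Midnight Drive', 'Midnight Drive', 1.0)
print(choose_by_artist_title(candidates, "midnight drive"))  # Midnight Drive
```

In practice a threshold on the returned score would likely be needed before treating a match against an existing title as genuine; the sketch leaves that choice open.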
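The identifier and grouping steps can be sketched as follows. This does not call MessyBrainz; as a local stand-in it derives a deterministic `uuid.uuid5` from the normalised artist/track-name pair, so the same pair always maps to the same ID (`uuid.uuid4()` would give a purely random one instead). The namespace URL, `pair_uuid`, `group_with_existing` and the sample artist name are hypothetical.

```python
import uuid
from collections import defaultdict

# Hypothetical namespace for artist/track-name pairs; NOT a MessyBrainz value.
PAIR_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/artist-track-pairs")


def pair_uuid(artist: str, track_name: str) -> uuid.UUID:
    """Deterministic UUID for a normalised (artist, track name) pair.

    Local stand-in for the MessyBrainz step, not its API.
    """
    key = f"{artist.strip().lower()}\x1f{track_name.strip().lower()}"
    return uuid.uuid5(PAIR_NAMESPACE, key)


def group_with_existing(groups, track_id, matched_title):
    """Append track_id to the group of an existing title.

    Does nothing when matched_title is None, i.e. when the name was not
    chosen by similarity to an existing track name.
    """
    if matched_title is not None:
        groups[matched_title].append(track_id)
    return groups


# Hypothetical usage: an uploader "dj_newcomer" whose track matched "Midnight Drive".
groups = defaultdict(list)
track_id = pair_uuid("dj_newcomer", "Midnight Drive")
group_with_existing(groups, track_id, "Midnight Drive")
print(dict(groups))
```

Keying the groups by the matched existing title keeps the grouping a no-op when the name was chosen by similarity to the artist's own title, which mirrors the condition in the last step.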
Using the uploader as the artist name and grouping tracks with existing track names could help mitigate the cold-start problem in recommender systems, and thus increase the visibility of tracks by budding, unknown artists.