I’m new to AcousticBrainz. I discovered it through MusicBrainz and the beets plugin for submitting AcousticBrainz data. I’m trying to read recording pages, and I have a few questions. First, what is the YouTube link? Where does it come from? Sometimes it is accurate, like here: http://acousticbrainz.org/3d1a7c1c-7445-443a-8afc-31a382bcd88a?n=4 but sometimes it is completely off, like here: http://acousticbrainz.org/5810528d-95ea-4976-b761-c8e4ef4a728f. My other question is about the ‘voice’ parameter: how should I understand it? It seems pretty random on the few examples I looked at. This recording, http://acousticbrainz.org/3d1a7c1c-7445-443a-8afc-31a382bcd88a?n=4, is a choral piece with orchestra (the overture of an oratorio), but the page says it is instrumental, with what I would read at first glance as a pretty high probability. How should I interpret this?
The YouTube links are just a rough guess. All we do is send YouTube the artist name and the track title and take the first result that its search returns. This means that we don’t really have any control over the response.
In the future we’d like to use the YouTube link attached to the recording on MusicBrainz, if one exists.
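To make the "first result" behaviour concrete, here is a minimal sketch of that kind of lookup. It is not the actual AcousticBrainz code: the function names are made up for illustration, and the public YouTube search page URL is only an assumption about how such a query could be formed.

```python
from urllib.parse import urlencode


def build_search_url(artist, title):
    """Build a YouTube search URL from the artist name and track title.

    Assumption: the public search page is used; the real site may call
    a different endpoint.
    """
    query = f"{artist} {title}"
    return "https://www.youtube.com/results?" + urlencode({"search_query": query})


def pick_first_result(results):
    """Return the first search hit, or None if nothing came back.

    Nothing here checks that the hit is the right recording, which is
    why the link can be completely off.
    """
    return results[0] if results else None
```

Because the first hit is taken without any check, a common title or an obscure recording can easily lead to the wrong video.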
There is some more information about the voice/instrumental classifier on our website: https://acousticbrainz.org/datasets/accuracy#voice_instrumental
This kind of system is trained on examples: we provide a lot of recordings with voice and a lot of recordings without voice. However, I’ve just checked this dataset and it’s mostly popular music, so I’m not surprised that it confuses orchestral and choral music. A great addition to AcousticBrainz would be a classifier that can more accurately detect the presence or absence of a choir. I’ll add this to the list of things that we’d like to add!
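For reading the result itself, here is a small sketch of pulling the label and its probability out of a high-level document. The field layout (`highlevel` → `voice_instrumental` → `value`, `probability`, `all`) is my understanding of the AcousticBrainz high-level JSON, so treat it as an assumption, and the numbers in `sample` are invented purely for illustration.

```python
def read_voice_instrumental(highlevel_doc):
    """Return (label, probability) from a high-level document.

    The probability is the model's confidence in the label it picked,
    judged against the examples it was trained on. It is not a measure
    of how much singing the recording contains.
    """
    result = highlevel_doc["highlevel"]["voice_instrumental"]
    return result["value"], result["probability"]


# Illustrative document only; the values are made up.
sample = {
    "highlevel": {
        "voice_instrumental": {
            "all": {"instrumental": 0.9, "voice": 0.1},
            "probability": 0.9,
            "value": "instrumental",
        }
    }
}

label, probability = read_voice_instrumental(sample)
print(f"{label}: {probability:.0%}")
```

A high probability on a choral piece therefore says more about the training examples than about the recording.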
Thanks for your response! I wonder: if someday you choose to add more examples of music with voice, would it then be necessary to scan all recordings again, or are you gathering enough information that you can reprocess them later? Otherwise, as far as I understand, adding classical music wouldn’t help much. Would it be possible to use the submitted data? Many recordings have works linked to them (especially in classical music) that could be classified rather easily and then used to train the system.
The AcousticBrainz data is designed so that we can create new classification systems from what has already been submitted, without distributing new software to re-scan the music.
This is implemented in our work on datasets. You can make your own dataset at https://acousticbrainz.org/datasets/create, giving examples of recording MBIDs with and without vocals. This could be just orchestral music, or just music from a specific era (e.g. only Renaissance, Classical, Romantic, etc.).
With this system you can test the accuracy of the dataset to see how well it distinguishes between the categories. Use the “Evaluate” button after you’ve saved the dataset.
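As a rough picture of what a dataset and an accuracy figure amount to, here is a sketch. The dictionary layout and the placeholder MBIDs are hypothetical, and the real "Evaluate" step runs on the server and does more than this; the code only shows how accuracy and a confusion count follow from pairs of (true class, predicted class).

```python
from collections import Counter

# Hypothetical dataset layout: one entry per class, each listing
# recording MBIDs. The angle-bracket strings are placeholders.
dataset = {
    "name": "Voice vs. instrumental (orchestral music only)",
    "classes": [
        {"name": "voice", "recordings": ["<MBID of a choral recording>"]},
        {"name": "instrumental", "recordings": ["<MBID of an orchestral recording>"]},
    ],
}


def accuracy(pairs):
    """Fraction of (true_class, predicted_class) pairs that agree."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one prediction")
    correct = sum(1 for truth, predicted in pairs if truth == predicted)
    return correct / len(pairs)


def confusion(pairs):
    """Count how often each true class was predicted as each class."""
    return Counter(pairs)
```

The confusion count is the more useful of the two here: it would show directly whether choral recordings tend to be labelled instrumental.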
Do you have any ideas for example recordings that we could use to build such a dataset? If you don’t want to select MBIDs one at a time, we can generate these lists automatically (e.g. by looking at relationship information in MusicBrainz).
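One way the automatic approach could look, as a sketch only: given a recording's JSON from the MusicBrainz web service with artist relationships included, guess a class from the relationship types. The type names used (`vocal`, `instrument`, `performing orchestra`) and the `relations` / `target-type` fields reflect my understanding of the MusicBrainz JSON format, and `guess_class` is a made-up helper, not existing AcousticBrainz code.

```python
VOCAL_TYPES = {"vocal"}
INSTRUMENTAL_TYPES = {"instrument", "performing orchestra"}


def guess_class(recording):
    """Guess 'voice' or 'instrumental' from artist relationships.

    Returns None when the relationships give no hint. A missing vocal
    credit does not prove the recording is instrumental, so the
    'instrumental' guesses still need a manual check.
    """
    types = {
        rel.get("type")
        for rel in recording.get("relations", [])
        if rel.get("target-type") == "artist"
    }
    if types & VOCAL_TYPES:
        return "voice"
    if types & INSTRUMENTAL_TYPES:
        return "instrumental"
    return None
```

Recordings that come back as `None` would simply be left out of the generated lists.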
Actually, what does ‘instrumental/voice’ mean? How would, for example, the opening of a Bach cantata like https://musicbrainz.org/work/07472580-468d-3de2-bb69-8f5917c2e731 fit into this, with its orchestral prelude followed by a choir accompanied by orchestra?