How to find all recordings of a work? Auto-create work?

(Numbered my questions to make responding easier)

I more or less get what recordings are and why different recordings exist for the same “song”. What I’m wondering is how I can find all recordings of a “song”. (1)

Digging deeper in the docs, I came to the understanding that the MB entity for what I call “song” is work. Am I correct on this? (2)

Recordings are well covered in MB, but works often are not. For example, there are 5 recordings for “False Alarm” by The Weeknd, but only one of them is linked to the work.

Ideally, all of these recordings should be linked to the work, right? (3)

It makes sense that the works table is less complete compared to the recordings table, but this makes me wonder whether deriving works and linking recordings should be automated to some extent? (4)

(1) Work = the written music on the page, along with the lyrics. It has a “composer” and a “lyricist”. Or sometimes a combined “writer”.

(2) In MB terms a “song” is a Work that has words. Lyrics to sing. It is still a “Work” with or without words.

Recording = that music put onto a tape. Someone picks up a microphone and sings the song. They make a Recording of the Work.

They can make many different Recordings, different singers, different bands. They all link back to the one Work.

(3) A lot of editors don’t link or add Works, so there is a big disconnect. Every editor has different areas of interest in the database.

In a perfect world, every Recording would link to a Work. In this perfect world you can look at one Work and see every time it has been recorded, and every cover band who has also played that track. Along with all the Live versions.
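
If you want to do that lookup from code rather than on the website, here is a minimal sketch. It assumes the MusicBrainz web service work lookup with `inc=recording-rels` and `fmt=json`, and that linked recordings show up under `relations` with `target-type` set to `recording`; please check the field names against the current API documentation. The helper names, the MBIDs and the sample response are made up for illustration.

```python
from urllib.parse import urlencode

WS_ROOT = "https://musicbrainz.org/ws/2"  # MusicBrainz web service root


def work_lookup_url(work_mbid: str) -> str:
    """Build the lookup URL for one work, asking for its recording relationships."""
    query = urlencode({"inc": "recording-rels", "fmt": "json"})
    return f"{WS_ROOT}/work/{work_mbid}?{query}"


def recordings_of_work(work_json: dict) -> list:
    """Extract (recording id, title, attributes) from a work lookup response.

    Assumes the response shape described above; only recordings that editors
    have actually linked to the work can appear here.
    """
    found = []
    for rel in work_json.get("relations", []):
        if rel.get("target-type") != "recording":
            continue  # skip artist, URL and other relationship targets
        rec = rel.get("recording", {})
        found.append((rec.get("id", ""), rec.get("title", ""), rel.get("attributes", [])))
    return found


# Made-up sample shaped like the assumed response; not real MusicBrainz data.
sample = {
    "title": "Example Work",
    "relations": [
        {"target-type": "recording", "type": "performance", "attributes": [],
         "recording": {"id": "rec-1", "title": "Example Work"}},
        {"target-type": "recording", "type": "performance", "attributes": ["live"],
         "recording": {"id": "rec-2", "title": "Example Work (live)"}},
        {"target-type": "artist", "type": "composer",
         "artist": {"id": "art-1", "name": "Example Writer"}},
    ],
}

for rec_id, title, attrs in recordings_of_work(sample):
    print(rec_id, title, attrs)
```

This only ever returns what has been linked, which is exactly the gap discussed in this thread.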

(4) There are scripts that help guess Works, but they are not perfect and a human has to check the results. It is too easy to make errors. If you use the Guess Works scripts you soon learn the problems with this method. They are about 90% correct, but the remaining 10% tends to be common words where the search fails to find the right match. (As a Pink Floyd fan I laugh at how often it fails to match “Time” and “Money”.)

When you add a CD, or add something from Spotify you are forced to link your Track to a Recording. This is why Recordings are so well covered in the database.

Adding Works, and linking them to Recordings is optional. You are now really relying on the fans of an artist to dive in and fill in these missing links. Not everyone has the time and patience for this.

Thanks for the response!

I don’t expect we reach the “perfect world” state, but I would like to strive for it.

I completely understand that, I’m not asking people to input every missing detail into the DB. That is why I suggested automation. Another idea along these lines:

Whenever a release is added, an existing release group has to be selected or a new one will be created. Wouldn’t it make sense to do the same thing for recordings and works? So whenever a recording is added, an existing work has to be selected or a new one will be created. Furthermore, if a selection was already made by an algorithm*, people would only have to opt out in the minority of cases, minimizing friction. This would ensure that there’s a related work for each recording.

*Algorithm: You mention “Guess Works” script. Is that a concrete script, and is it publicly available? I’m very much interested in developing an algorithm or enhancing an existing one if I can. I think my suggestion is quite rudimentary but it goes like this:

Finding the related work for a given recording:

  1. Take the title of the recording and strip the parenthesized suffix using the first capture group of this or a similar regex: /(^.+) \(.+\)/gi. This is for removing suffixes like “(Remix)”, “(Live)”, etc.
  2. Find the work whose name equals (case insensitive) the stripped title among the works of the first artist of the recording. If a work with these attributes exists, I think it’s the correct match in most cases.
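
The two steps above can be sketched roughly like this in Python. This is an illustration under assumptions, not a tested matcher: the regex is an end-anchored variant of the one quoted above, the function names are hypothetical, and the list of the first artist's work titles is assumed to have been fetched already.

```python
import re

# Trailing parenthesised suffix such as " (Remix)" or " (Live)".
# Assumption: anchored at the end, unlike the regex quoted in the post.
SUFFIX_RE = re.compile(r"^(.+) \(.+\)$")


def strip_suffix(title: str) -> str:
    """Step 1: drop one trailing '(...)' suffix, if there is one."""
    match = SUFFIX_RE.match(title)
    return match.group(1) if match else title


def guess_work(recording_title: str, artist_work_titles: list):
    """Step 2: case-insensitive exact match among the first artist's works.

    Returns the matching work title, or None when nothing matches.
    """
    wanted = strip_suffix(recording_title).casefold()
    for work_title in artist_work_titles:
        if work_title.casefold() == wanted:
            return work_title
    return None


works = ["Starboy", "False Alarm"]  # illustrative list of one artist's work titles
print(guess_work("False Alarm (Remix)", works))
print(guess_work("Some Other Title", works))
```

A title that starts with a parenthesis and has no trailing one is left untouched, because the pattern requires text and a space before the opening bracket.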

I know taking the first artist is not always the correct approach, but I found it to be quite effective, at least for my use (pop/hip hop/rap genres). Also, this wouldn’t really help the cover songs case, but then editors could opt out, or an accidentally created incorrect work could be later merged as well.

What do you think?

This is why we can all do our bit. :slight_smile: But all editors are different, and all have different data they input. They have different interests. I often find it surprising when I see respected editors who say they are not interested in Works, their focus is in different areas. All of these connections fascinate me, but then I am a bit data mad :crazy_face:

I can share the Guess Works userscripts I make use of, but they must be checked. Do not just press a button and trust them 100%. Their use can be a HUGE timesaver though. Better to have than not have.

I share them so you can see how even good algorithms make errors. No computer algorithm can be foolproof. Remember - AI stands for Algorithmic Idiocy, Automated Inaccuracy.

Do you use ViolentMonkey already? If not, I can explain that bit if needed.

Please note - the following is not meant as negative criticism of these scripts. They are HUGE time savers, but have limitations that need to be human checked. These script writers have done a great job, but I am also pretty lousy with the written word and accidentally upset people when I don’t mean to. Seriously - I love :heart: your scripts

Guess Related Works in Batch - GitHub - loujine/musicbrainz-scripts: Collection of greasemonkey scripts for MusicBrainz
For use on the Release \ Edit Relationships page. Try it out and you will see it uses the built-in search, which runs into the limitations of the search algorithm. Try Dark Side of the Moon (or a Pink Floyd concert) and you will see “Money” is always matched to “Money, Money, Money” by ABBA. There are loads of common Work names that always need checking, because a lot of Work titles are not unique. To fix Guess Works, you need to fix Search, and Search can never be 100% perfect. This is a good example of when a fan of the artist really needs to be the one using this tool.

Batch Add Performance of - GitHub - murdos/musicbrainz-userscripts: Collection of userscripts for MusicBrainz, by various authors
This one works from the Artist \ Recordings page. It lets you select a list of recordings and attach Works en masse. BUT it is buggy: it does not always seem to spot when a Work is already attached and marked (live), so you can end up with the Work attached twice.

Again - not a criticism, it is a big time saver, but in the wrong hands can lead to chaos if used without being checked.

Both these scripts are made by some of the best script writers, and have saved me a huge amount of time. Doing these bulk adds is great, as it is easier to clean up the small number of mistakes that are made. But it still requires knowledge of the artist to really work with these. This is why you can’t force this into a standard workflow.

An example test case:
I tried to find a Floyd concert… there aren’t any left that are not already linked. :smiley: So let’s go to Dave Gilmour and test - Log in - MusicBrainz
Guess works here sets track 4 “Smile” to “Smile Smile Smile”, tracks 3 and 8 are off. You don’t really know which version of track 13 Shine-On was played, track 18 “Time” gets “Time After Time”. And 23 is off.

Trying to manually search “Time” is actually pretty hard even for the human who knows what they are looking for. “Money” even worse as it scrolls for a few pages.

This is tough for a human, but the scripts save us a HUGE amount of time on an example like this. 6 / 24 are missed.

MB has a very small paid staff.
And many, many volunteers. And we volunteers have … ah … a diversity of motives and … well … a diversity of patience in getting things exactly right.
Even the requirement to have an MB Artist for a track pushes some, very few, of us towards not taking care/any care/enough care to either enter a new Artist or select the correct MB Artist. Instead an incorrect Artist is sometimes used.

Whilst what you suggest would get many more correct Work-Recording relationships entered, I fear it would also result in many incorrect relationships being entered. Those take a lot of time to find and correct, or they stay and pollute the db.

You cannot automatically guess what work should be selected for a recording.

If we stick to English, and if we only take the case where recording name is the same as work name:

There are several works called I LOVE YOU, and nothing says that this artist will only ever play one particular one of them.
Potentially, an artist could cover a different I LOVE YOU for each of their recordings.
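
To illustrate, here is a small sketch of how a guesser could at least refuse to pick when a title is not unique. It is only an illustration: the function names and the work ids (`w1`, `w2`, `w3`) are hypothetical, and even a unique title match remains a suggestion for a human to confirm, not proof.

```python
from collections import defaultdict


def index_by_title(works):
    """Group work ids by case-insensitive title. `works` is (work_id, title) pairs."""
    index = defaultdict(list)
    for work_id, title in works:
        index[title.casefold()].append(work_id)
    return index


def guess_or_defer(recording_title, index):
    """Suggest a work only when exactly one work carries this title.

    Returns ("match", id), ("none", None) or ("ambiguous", [ids]).
    """
    candidates = index.get(recording_title.casefold(), [])
    if len(candidates) == 1:
        return ("match", candidates[0])
    if not candidates:
        return ("none", None)
    return ("ambiguous", list(candidates))


# Hypothetical works: two different songs share the same title.
index = index_by_title([("w1", "I Love You"), ("w2", "I LOVE YOU"), ("w3", "Sundown")])
print(guess_or_defer("I love you", index))
print(guess_or_defer("Sundown", index))
```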

I never understood how to use the Batch Add Performance script.
It looks powerful but I don’t know how to check edits before submitting.

Usually I know exactly what work I want, so I open artist works tab and copy the work URL and paste it in the release relationship editor.

That’s faster for me. :grin:

Or create the work if not found.

These cover two very different situations. The first script listed above is the easiest to use and check and highly recommended.

Tick a number of Recordings on an Edit Relationships page and it does a simple search on each one for you. The hit rate is usually quite high, but all it is doing is putting the Works into place on the right-hand side for you to check. This makes it easy to spot the errors and swap the correct “I Love You” work into place.

Nothing is finalised until you hit the button at the bottom of the page.

Can be a huge time saver when you have an artist who has lots of unique Works names. That Dave Gilmour example above is filled in in a few seconds, then you just spend time checking the results and correcting the errors. Highly efficient.

Thanks, then I should really try Guess Related Works, someday. Sounds safe(r). :wink:

First script, totally safe as you verify all of its choices before hitting the button. Highly recommended to be in all script toolboxes.

Second script, scary potential for making a HUGE mess that takes weeks to fix. A lot harder to check.

Indeed. The Smithereens wrote a song called “Sundown”. They also covered the Gordon Lightfoot song “Sundown”. An auto-working robot would have almost no way of telling the difference.

Also – it was mentioned that Works be auto-created – this would lead to a large number of duplicate works requiring merging, similar to the case of Recordings now. I prefer that Works take a more “curated” path than a robotic auto-creation would do.

Sorry I went missing.

Thanks for the scripts! Not meaning to criticize either, but in Guess Related Works in Batch, the guessing part seems like merely a sophisticated search. I understand why that doesn’t work for the examples you brought up. My approach would be different: based on the actual relationships of the artist and recordings.

Anyway, I won’t be using these scripts on the site, since I understand they can be dangerous.

@mmirG I understand the issue with the additional friction, you’re right. And of course false negatives and false positives are bad too.


What I’m ultimately interested in is finding an automated way of identifying recordings of the same work. The core of it is really just a principle, technique, algorithm, you name it. Its manifestation could be something that’s part of the MusicBrainz platform, or a script, or simply the code I write and use in my project. I can only control the last one, but I want to share whatever I find that is beneficial to the community.

To the “band covering a song with the same name” problem, I understand, but then again, we, as humans need some signal to know that one is a cover song and another isn’t. For example, “Sundown” by The Smithereens:

Out of the three, I know that one is the cover and another one is not. I’m not totally sure, but I’m guessing the “Sundown (demo)” is not a cover. If we strictly compare titles, these are three different works. On the one hand that is good, because false positives are avoided; on the other hand it creates duplicate works, which I call false negatives.

So I have two questions:

  1. In the case of a band covering a song with the same title: in theory there would always be something recorded in the db from which a human can tell that one is a cover and the other isn’t, right? Otherwise how would anybody know about this difference?

  2. This is more of an opinion-question, but are false positives worse than false negatives?

If you are the one adding a release and the links to works, the booklet usually mentions the writers of the works, so you usually notice covers and link them to the correct (other artist’s) works.

By the way, on the topic of covers, I would love it if we didn’t have to mark every cover recording as a cover. It could either be automatic or we would just mark the original recording.

I don’t quite understand your two cases in the context of linking a recording to the wrong work:

  • false positive: ?
  • false negative: ?

No links are better than bad links.

  • false positive: incorrectly linking an original recording and a cover recording to the same work
  • false negative: creating duplicate works for two or more original recordings that are actually the recordings of the same song
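
To make the two definitions concrete, here is a small sketch that counts both kinds of error over pairs of recordings. It assumes you have a hand-checked mapping of recording to work to compare against; the function name, the recording ids (`r1` to `r3`) and the work labels are all hypothetical.

```python
from itertools import combinations


def pairwise_errors(truth, guess):
    """Compare two recording -> work mappings pair by pair.

    A false positive is a pair the guess puts on one work although they
    belong to different works; a false negative is a pair the guess splits
    although they belong to the same work.
    """
    false_pos, false_neg = [], []
    for a, b in combinations(sorted(truth), 2):
        same_true = truth[a] == truth[b]
        same_guess = guess[a] == guess[b]
        if same_guess and not same_true:
            false_pos.append((a, b))
        elif same_true and not same_guess:
            false_neg.append((a, b))
    return false_pos, false_neg


# Hypothetical data: r1 and r2 are the same song, r3 is a different song.
truth = {"r1": "A", "r2": "A", "r3": "B"}
guess = {"r1": "X", "r2": "Y", "r3": "X"}

false_pos, false_neg = pairwise_errors(truth, guess)
print("false positives:", false_pos)
print("false negatives:", false_neg)
```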

This is not a problem at all (so it’s better), and you will soon notice that they have the same writers and then the same lyrics, if you link that. :slight_smile:
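
As a rough sketch of that check: once writers are linked, works that share both a title and the same set of writers can be flagged as merge candidates for a human to review. The function name, work ids and writer names below are hypothetical, and matching on names alone is an assumption (real data would compare artist ids).

```python
from collections import defaultdict


def merge_candidates(works):
    """Group work ids that share a title and an identical set of writers.

    `works` is an iterable of (work_id, title, writers). Only groups with
    more than one work are returned; they are suggestions, not merges.
    """
    groups = defaultdict(list)
    for work_id, title, writers in works:
        key = (title.casefold(), frozenset(w.casefold() for w in writers))
        groups[key].append(work_id)
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical works: w1 and w2 look like duplicates, w3 is a different song.
works = [
    ("w1", "Sundown", ["Writer A"]),
    ("w2", "sundown", ["writer a"]),
    ("w3", "Sundown", ["Writer B"]),
]
print(merge_candidates(works))
```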
