Entity merging tools “mb. MERGE HELPOR 2”

userscripts

#1

Thanks @culinko for bringing this topic up for discussion. My mind is really not made up, despite my strong personal preference for not merging old entities into recent duplicates. :nerd_face:

Continuing the discussion from Your favourite User Scripts for MusicBrainz?! (so that they can be used while redesigning!):

It’s been a personal obsession to merge newer duplicates into pre‐existing entities rather than the opposite.
Even if the target data has to be fixed at the same time (better title style or missing artist credits).
I had made a user script to make it easier to spot the oldest created MBID (mb. MERGE HELPOR 2) and a wiki page listing the known drawbacks of merging old entities into newer ones.

https://wiki.musicbrainz.org/User:Jesus2099/Merge_Into_Oldest_MBID

But maybe the inconvenience is mostly theoretical for me, as I don’t use flickr machine tags any more and as mb. COLLECTION HIGHLIGHTER is just a user script.

The MERGE HELPOR also has other useful features for merge editors (mostly on the merge page itself), such as:

  • Highlights last clicked entity
  • Shows additional info for each entity about to be merged (an old example for works: https://imgur.com/5AlDF). Each entity type has its own set of added info, such as the number of recordings, works, etc. for artists
  • Indicates oldest MBID
  • “Remove entity from merge” buttons
  • “Clear queue and add to merge” button — doesn’t reload page for nothing when nothing is checked (currently broken feature, apparently)
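
To illustrate the “Indicates oldest MBID” idea, here is a minimal Python sketch (not the actual user script). It assumes that each merge candidate comes with a numeric database row ID and that a lower row ID means an earlier creation; neither assumption is stated in this thread. An MBID is a UUID, so I am also assuming its value alone says nothing about age. `MergeCandidate`, `row_id` and `oldest_candidate` are hypothetical names, and the MBIDs are placeholders.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MergeCandidate:
    row_id: int  # hypothetical: numeric row ID, assumed to grow with creation order
    mbid: str    # placeholder MBIDs below, not real entities
    name: str


def oldest_candidate(candidates: Iterable[MergeCandidate]) -> MergeCandidate:
    """Return the candidate assumed to be the oldest (lowest row ID)."""
    candidates = list(candidates)
    if not candidates:
        raise ValueError("nothing queued for merging")
    return min(candidates, key=lambda c: c.row_id)


queue = [
    MergeCandidate(9001, "00000000-0000-0000-0000-00000000000b", "Adagio"),
    MergeCandidate(42, "00000000-0000-0000-0000-00000000000a", "Symphony no. 1: I. Adagio"),
]
target = oldest_candidate(queue)
print(f"Suggested merge target: {target.name} ({target.mbid})")
```

The suggestion is only a hint: the editor still decides the direction, and the target may need its title or artist credit fixed afterwards.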

#2

Other than the age of the MBID, does it make a difference which direction the merge goes? If I merge a newer (but more complete) entry into an older but sparser one, will anything be lost? (If not, why doesn’t MB just pick for us based on the oldest MBID?)


#3

Yes, source data is partially lost.
Merge edits are destructive.
I can list what is lost, depending on the type of entity that is merged, when I have some more time.


#4

I can list what is lost, depending on the type of entity that is merged, when I have some more time.

Yes please. It would be great to have this documented somewhere.


#5

Hi @jesus2099. Do you have time to expand on your argument? I often read claims and counter-claims in edit notes about the better way to merge. See for example https://musicbrainz.org/edit/52711568, a recording merge edit.


#6

My preference for oldest MBID target for merges is explained by two examples below.
It is not a must; it is more a case of “why not?”:

https://wiki.musicbrainz.org/User:Jesus2099/Merge_Into_Oldest_MBID


#7

I’m generally happy with a merge either way when the entities are equally good. But otherwise, if one has a better name, or artist credit (for a recording or release) or the like, I’d always merge into the most correct one, regardless of which is the oldest. Otherwise, the resulting entity needs to be improved again.


#8

Except for a few images I had uploaded, I don’t really use flickr MB machine tags.
But it’s an example of what is broken by targeting the duplicate instead of the pre‐existing MBID.
I do use COLLECTION HIGHLIGHTER, which is also broken by this.
If I find other problems, I will add them to that wiki page.


#9

Thanks @jesus2099. I get what you’re saying about the IDs and references into MB from outside.

Earlier in this thread, you said:

I don’t read that as the same issue you’re describing in your wiki page. Do you think that merges can be destructive to data that’s entirely within MBz?


#10

Yes, at least to some degree. If I merge “Symphony no. 1 in D major, op. X: I. Adagio” by “Orchestra X, Conductor Y” into “Adagio” by “Composer”, that’s IMO destructive. Sure, it can be reconstructed from a tracklist and a script, but it has made the data worse in the meantime :slight_smile:


#11

Yes, if you merge wrongly, you can no longer tell that this AcoustID was for this track and that AcoustID was for that other track.
Same for ISRCs, unless the Add ISRCs edits were well referenced (which is rare because the ISRC web service does not allow edit notes, so you have to come back to the edit history to add them manually).
Same for relationships, etc.
Splitting (unmerging) recordings is difficult and you will have to wipe out at least the AcoustIDs.
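
A toy Python model of why this is hard to undo (an illustration only, not how MusicBrainz stores or merges data): if a merge simply pools the identifiers of both recordings onto the target, the result no longer says which identifier came from which recording. The dictionary layout, the `merge_recordings` helper and the identifier values are all made up for the example.

```python
def merge_recordings(target: dict, source: dict) -> dict:
    """Toy merge: keep the target's name, pool the identifiers of both."""
    return {
        "name": target["name"],  # the source's name is dropped
        "acoustids": target["acoustids"] | source["acoustids"],
        "isrcs": target["isrcs"] | source["isrcs"],
    }


# Made-up identifiers, for illustration only.
studio = {"name": "Song (studio)", "acoustids": {"acoustid-A"}, "isrcs": {"ISRC-1"}}
live = {"name": "Song (live)", "acoustids": {"acoustid-B"}, "isrcs": {"ISRC-2"}}

merged = merge_recordings(studio, live)
print(sorted(merged["acoustids"]))
# Nothing in `merged` says that "acoustid-B" and "ISRC-2" belonged to the
# live recording, so a later split cannot reassign them with confidence.
```

In this model the only way to split again is to look elsewhere (edit history, edit notes) for the origin of each identifier, which matches the point about well referenced edits above.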