MERGE HELPOR 2 :: Entity merging tools

jesus2099 · January 26, 2018, 8:22am

Current version (click Raw to install): jesus2099/konami-command/ mb. MERGE HELPOR 2.user.js
Known bugs: Issues · jesus2099/konami-command · GitHub

Thanks @culinko for bringing this topic up for discussion. My mind is really not made up, despite my strong personal preference for not merging old entities into recent duplicates.

Continuing the discussion from Your favourite User Scripts for MusicBrainz?! (so that they can be used while redesigning!):

mb. MERGE HELPOR 2

indicates which entity has the oldest MBID, so no data will be lost after the merge

I decided to mention this userscript despite thinking the concept of MBID age would be complicated/confusing for new or inexperienced editors. The most important strength of this userscript is that it potentially prevents the loss of data. I remember that when I was a new editor I merged some recordings and picked the target recording based on which one a) had the most correct name and/or artist credit, b) had the most information attached to it, such as ISRCs or relationships, or c) was used on the most tracks. I didn’t really pay attention to the MBID age, so it’s quite possible that some information was lost, because afaik the edits for recordings older than the ones I merged into can’t be viewed anymore. The most common case is the first one, where people merge entities into the newer MBID simply because the title or artist credit is correct on the newer entity, instead of fixing the older entity and merging into that one. I think I even had to defend one of my edits simply because someone thought I was merging the “correct” recording into the “incorrect” recording, while I was in fact merging the newer MBID into the older MBID and fixing the title and artist credit on the older recording at the same time. Perhaps there could be a warning of some sort, or even a guide on how to fix the “incorrect” entity in these cases, to preserve the entity with the older MBID. This is just some food for thought.

It’s been a personal obsession to merge newer duplicates into pre‐existing entities rather than the opposite.
Even if the target data has to be fixed at the same time (better title style or missing artist credits).
I had made a user script to make it easier to spot the oldest created MBID (mb. MERGE HELPOR 2) and a wiki page listing the known drawbacks of merging old entities into newer ones.

https://wiki.musicbrainz.org/User:Jesus2099/Merge_Into_Oldest_MBID

But maybe the inconvenience is mostly theoretical to me, as I don’t use flickr machine tags any more and as mb. COLLECTION HIGHLIGHTER is just a user script.

The MERGE HELPOR also has other useful features for merge editors (mostly on the merge page itself), such as:

Highlights last clicked entity
Shows additional info for each entity about to be merged (an old example for works: Imgur: The magic of the Internet); each entity type has its own set of added info, such as the number of recordings, works, etc. for artists
Indicates oldest MBID
“Remove entity from merge” buttons
“Clear queue and add to merge” button, which doesn’t needlessly reload the page when nothing is checked (currently broken feature, apparently)

highstrung · January 29, 2018, 1:44am

Other than the age of the MBID, does it make a difference which direction the merge goes? If I merge a newer (but more complete) entry into an older but sparser one, will anything be lost? (If not, why doesn’t MB just pick for us based on the oldest MBID?)

jesus2099 · January 29, 2018, 6:18am

Yes, source data is partially lost.
Merge edits are destructive.
I can list what is lost depending on the type of entity that is merged when I have some more time.

obtext · January 29, 2018, 9:00am

I can list what is lost depending on the type of entity that is merged when I have some more time.

Yes please. It would be great to have this documented somewhere.

monxton · May 13, 2018, 2:23pm

Hi @jesus2099. Do you have time to expand on your argument? I often read claims and counter-claims in edit notes about the better way to merge. See for example https://musicbrainz.org/edit/52711568, a recording merge edit.

jesus2099 · May 13, 2018, 6:42pm

My preference for targeting the oldest MBID in merges is explained by the two examples below.
It is not a must; it is more a case of “why not”:

https://wiki.musicbrainz.org/User:Jesus2099/Merge_Into_Oldest_MBID

reosarevok · May 13, 2018, 6:45pm

I’m generally happy with a merge either way when the entities are equally good. But otherwise, if one has a better name, or artist credit (for a recording or release) or the like, I’d always merge into the most correct one, whichever is the oldest. Otherwise, the resulting entity needs to be improved again.

jesus2099 · May 13, 2018, 7:17pm

Except for a few images I had uploaded, I don’t really use flickr MB machine tags.
But it’s an example of what is broken by targeting the duplicate instead of the pre‐existing MBID.
I do use COLLECTION HIGHLIGHTER, which is also broken by this.
If I find other problems, I will add them to that wiki page.

monxton · May 13, 2018, 9:11pm

Thanks @jesus2099. I get what you’re saying about the IDs and references into MB from outside.

Earlier in this thread, you said:

Yes, source data is partially lost.
Merge edits are destructive.

I don’t read that as the same issue you’re describing in your wiki page. Do you think that merges can be destructive to data that’s entirely within MBz?

reosarevok · May 13, 2018, 10:00pm

Yes, at least to some degree. If I merge “Symphony no. 1 in D major, op. X: I. Adagio” by “Orchestra X, Conductor Y” into “Adagio” by “Composer”, that’s IMO destructive. Sure, it can be reconstructed from a tracklist and a script, but it has made the data worse in the meantime.

jesus2099 · May 14, 2018, 5:41am

Yes, if you wrongly merge you cannot tell this AcoustID was for this track and that AcoustID was for that other track.
Same for ISRCs, unless the Add ISRCs edits were well referenced (which is rare because the ISRC web service does not allow edit notes; you have to come back to the edit history to add them manually).
Same for relationships, etc.
Splitting (unmerging) recordings is difficult, and you will have to wipe out at least the AcoustIDs.
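
To illustrate the AcoustID point, here is a toy model in Python. The `Recording` class, the `merge` function and the `acoustid-…` values are all made up for illustration; this is not how MusicBrainz stores or merges data, only a sketch of why the result cannot simply be split back.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Recording:
    """Hypothetical, simplified recording: a title plus attached AcoustIDs."""
    title: str
    acoustids: Set[str] = field(default_factory=set)


def merge(source: Recording, target: Recording) -> Recording:
    """Toy merge: the target keeps its title and absorbs the source's AcoustIDs."""
    return Recording(target.title, target.acoustids | source.acoustids)


# Placeholder identifiers, not real AcoustIDs.
track_a = Recording("Track A", {"acoustid-1", "acoustid-2"})
track_b = Recording("Track B", {"acoustid-3"})

merged = merge(track_a, track_b)
print(merged.title, sorted(merged.acoustids))
```

After the merge there is a single set of three AcoustIDs and nothing in it says which ones came from “Track A”. If the merge was wrong, that information has to be found again from outside.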

jesus2099 · December 29, 2019, 1:39pm

For the record, I have now, at last, added some checks so that this script is not used to skip edit notes.

Kid_Devine · March 9, 2020, 11:06pm

I don’t like this change, the script wouldn’t let me submit this edit. I’ve downgraded to the 2018 version.

jesus2099 · November 29, 2020, 9:50am

For info, there is a bug that currently prevents info from being displayed on the recording merge page.

This bug has been happening since AcoustIDs were added to the merge page, which is a great addition:

Update

It should be fixed now.
I hope my patch doesn’t bring any side effects.

highstrung · November 29, 2020, 11:01pm

Thank you, the updated version is working for me.

jesus2099 · November 30, 2020, 8:12am

Oh. I’ll check that.

Griomo · January 16, 2022, 1:46pm

How do I tell from two MBIDs themselves (just by looking at them, without going further with the script, edit history or anything else) which one is the oldest, i.e. the lowest?
I would expect, for instance, 16d2e14a < 6c2b3dd3, but I am wrong:

Detail

MBIDs are hexadecimal and written (as usual) in the 8-4-4-4-12 format (32 digits plus 4 hyphens, 36 characters in all). But where does it say (MusicBrainz Identifier - MusicBrainz Wiki does not) that MB maps the characters 0-9 to the values 0-9 in that order, and a-f to 10-15 (which is not always the case)? Also, at User:Jesus2099/Merge Into Oldest MBID - MusicBrainz Wiki, of the example pair given:
https://musicbrainz.org/recording/16d2e14a-2418-4a36-b0d0-3b592d827b84
Recording “Danses Polovtsiennes "Le Prince Igor"” by Royal Philharmonic Orchestra, Sir Thomas Beecham - MusicBrainz
16d2e14a-… is said to be more recent than 6c2b3dd3-…; as I understand it, “1 is a higher value than 6” (also for the last 12-digit part “3 is higher than b”) which should not be. The parenthesized (11,136,141) > (1,590,412) but how are those numbers related to MBID:s?

And yet in a current example, 69c5b36c > 0e012c51 that is,
0e012c51-4507-4d26-96ec-c9c6a8b5400b ← 69c5b36c-f9d1-412d-be31-8e46af5c5678
16,343,547 ← 17,101,496
(N.b. MBID order reversed to have both arrows pointing left.)

I am guessing the solution hides in the “most significant bits” and that the MBID ‘age’ has to be calculated?

jesus2099 · January 16, 2022, 3:46pm

I don’t think MBIDs are generated in sequence.
I don’t think you can order MBIDs to find their age; their characters are rather random, no?
Edit history does not tell either, because it is biased by earlier merges.
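
A quick check of this in Python, using only the MBIDs and parenthesized numbers quoted in the post above. MBIDs are UUIDs, so the standard `uuid` module can read the version field; version 4 means randomly generated, with no timestamp inside. The pairing of each 8-character prefix with its number is my reading of that post, so treat it as an assumption.

```python
import uuid

# Full MBIDs quoted in this thread.
MBIDS = [
    "16d2e14a-2418-4a36-b0d0-3b592d827b84",
    "0e012c51-4507-4d26-96ec-c9c6a8b5400b",
    "69c5b36c-f9d1-412d-be31-8e46af5c5678",
]

for mbid in MBIDS:
    # A version 4 UUID is random and carries no creation time.
    print(mbid, "-> UUID version", uuid.UUID(mbid).version)

# (first MBID group, number shown in parentheses) for the two pairs discussed.
PAIRS = [
    (("16d2e14a", 11136141), ("6c2b3dd3", 1590412)),
    (("0e012c51", 16343547), ("69c5b36c", 17101496)),
]


def orders_agree(a, b) -> bool:
    """True if the hexadecimal order of the prefixes matches the numeric order."""
    (hex_a, num_a), (hex_b, num_b) = a, b
    return (int(hex_a, 16) < int(hex_b, 16)) == (num_a < num_b)


for a, b in PAIRS:
    print(a[0], "vs", b[0], "-> orders agree:", orders_agree(a, b))
```

The two orders agree for one pair and disagree for the other, which is what you would expect if the MBID characters say nothing about age.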

What I use is the database row ID, which you can see in the recording sidebar, both in the rating star links and in the Merge link.
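
As a rough sketch of that rule in Python (this is not the script’s actual code, and `pick_merge_target` is a name made up here): given the row ID of each entity, take the lowest one as the merge target, assuming a lower row ID means the entity was created earlier. The two MBIDs and row IDs are the current example from the post above.

```python
from typing import Dict


def pick_merge_target(row_ids: Dict[str, int]) -> str:
    """Return the MBID with the lowest row ID (assumed to be the oldest entity)."""
    if not row_ids:
        raise ValueError("need at least one entity")
    return min(row_ids, key=row_ids.get)


# MBID -> row ID, as quoted earlier in the thread.
candidates = {
    "0e012c51-4507-4d26-96ec-c9c6a8b5400b": 16343547,
    "69c5b36c-f9d1-412d-be31-8e46af5c5678": 17101496,
}

print(pick_merge_target(candidates))
```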

Griomo · January 16, 2022, 4:05pm

I assumed as much: MBIDs are not visibly “ordered”. So, from the edit link https://musicbrainz.org/edit/86229202 , where can an editor find that database row ID in fewer steps than manually chasing down (and hoping for) the add-edit-with-earliest-timestamp?

Edit: The only search hit in documentation for “row ID” (0 for “database row ID”) is Development / ws / js - MusicBrainz : “Simply provide the tracklist row id in a call to /ws/js/tracklist” which presupposes ID already known (for tracklist, no hits for MB entities). Also, n.b. that “webservice […] isn’t version and may change at any time, please do not rely on it for anything important, use /ws/2 instead. (also note that the documentation here is currently incomplete and outdated).”
What is ‘database row ID’ actually?

Edit2: At a recording MBID page, f.i.

hovering over rating stars shows a link “[…]rate/?entity_type=recording&entity_id=17101496&rating=[…]” .
Is that it? It matches what your script puts into parentheses, but not “row ID”.

Edit3: The same entityID under merge link. This should be listed in plain sight under Details’ tab.
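
For anyone wanting to pull that number out of a copied link, a minimal Python sketch follows. The function name is hypothetical, and the example link keeps only the part visible above; the elided parts (host and rating value) are left out rather than guessed.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def row_id_from_rating_link(href: str) -> Optional[int]:
    """Extract the entity_id query parameter from a rating link, if present."""
    query = parse_qs(urlsplit(href).query)
    values = query.get("entity_id")
    if not values or not values[0].isdigit():
        return None
    return int(values[0])


# Only the visible part of the link quoted above.
href = "rate/?entity_type=recording&entity_id=17101496"
print(row_id_from_rating_link(href))
```

A link without an `entity_id` parameter gives `None` instead of raising an error.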

jesus2099 · January 16, 2022, 6:02pm

From there, you can click the sidebar Raw edit data for this edit link.

Or you can go to both Recordings and look at their rating star links or Merge link, in their sidebar.

I am not sure, but I think this row ID changes each time the entity is merged into another one.
Only the merge target’s row ID is kept.