MERGE HELPOR 2 :: Entity merging tools


Thanks @culinko for bringing this topic up for discussion. My mind is really not set, despite my strong personal preference for not merging old entities into recent duplicates. :nerd_face:

Continuing the discussion from Your favourite User Scripts for MusicBrainz?! (so that they can be used while redesigning!):

It’s been a personal obsession to merge newer duplicates into pre‐existing entities rather than the opposite.
Even if the target data has to be fixed at the same time (better title style or missing artist credits).
I had made a user script to make it easier to spot the oldest-created MBID (mb. MERGE HELPOR 2) and a wiki page listing the known drawbacks of merging old entities into newer ones.

https://wiki.musicbrainz.org/User:Jesus2099/Merge_Into_Oldest_MBID

But maybe the inconvenience is mostly theoretical for me, as I don’t use flickr machine tags any more and mb. COLLECTION HIGHLIGHTER is just a user script.

The MERGE HELPOR also has other useful features for merge editors (mostly on the merge page itself), such as:

  • Highlights last clicked entity
  • Shows additional info for each entity about to be merged (an old example for works was posted on Imgur); each entity type has its own set of added info, like the number of recordings, works, etc. for artists
  • Indicates oldest MBID
  • “Remove entity from merge” buttons
  • “Clear queue and add to merge” button, which doesn’t needlessly reload the page when nothing is checked (this feature is apparently broken at the moment)
4 Likes

Other than the age of the MBID, does it make a difference which direction the merge goes? If I merge a newer (but more complete) entry into an older but sparser one, will anything be lost? (If not, why doesn’t MB just pick for us based on the oldest MBID?)

3 Likes

Yes, source data is partially lost.
Merge edits are destructive.
I can list what is lost depending on the type of entity that is merged when I have some more time.

4 Likes

I can list what is lost depending on the type of entity that is merged when I have some more time.

Yes please. It would be great to have this documented somewhere.

2 Likes

Hi @jesus2099. Do you have time to expand on your argument? I often read claims and counter-claims in edit notes about the better way to merge. See for example https://musicbrainz.org/edit/52711568, a recording merge edit.

3 Likes

My preference for oldest MBID target for merges is explained by two examples below.
It is not a must; it is more a matter of “why not”:

https://wiki.musicbrainz.org/User:Jesus2099/Merge_Into_Oldest_MBID

1 Like

I’m generally happy with a merge either way when the entities are equally good. But otherwise, if one has a better name, or artist credit (for a recording or release) or the like, I’d always merge into the most correct one, regardless of which is the oldest. Otherwise, the resulting entity needs to be improved again.

2 Likes

Except for a few images I had uploaded, I don’t really use flickr MB machine tags.
But it’s an example of what is broken by targeting the duplicate instead of the pre-existing MBID.
I do use COLLECTION HIGHLIGHTER, which is also broken by this.
If I find other problems, I will add them to that wiki page.

Thanks @jesus2099. I get what you’re saying about the IDs and references into MB from outside.

Earlier in this thread, you said:

I don’t read that as the same issue you’re describing in your wiki page. Do you think that merges can be destructive to data that’s entirely within MBz?

Yes, at least to some degree. If I merge “Symphony no. 1 in D major, op. X: I. Adagio” by “Orchestra X, Conductor Y” into “Adagio” by “Composer”, that’s IMO destructive. Sure, it can be reconstructed from a tracklist and a script, but it has made the data worse in the meantime :slight_smile:

Yes, if you merge wrongly, you can no longer tell which AcoustID was for this track and which was for that other track.
Same for ISRCs, unless the Add ISRCs edits were well referenced (which is rare because the ISRC web service does not allow edit notes; you have to come back to the edit history to add them manually).
Same for relationships, etc.
Splitting (unmerging) recordings is difficult and you will have to wipe out at least the AcoustIDs.
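
To illustrate why the pooled identifiers cannot be split again afterwards, here is a minimal toy sketch. It is not MusicBrainz's actual schema or merge code: the `Recording` class, the `merge` function and the identifier strings are all hypothetical, made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Recording:
    """Hypothetical toy model of a recording, not the real MusicBrainz schema."""
    title: str
    acoustids: set = field(default_factory=set)
    isrcs: set = field(default_factory=set)


def merge(source, target):
    """Toy merge: the target keeps its own title, identifiers are pooled."""
    return Recording(
        title=target.title,
        acoustids=target.acoustids | source.acoustids,
        isrcs=target.isrcs | source.isrcs,
    )


# Placeholder identifiers, invented for the illustration.
detailed = Recording("Symphony no. 1 in D major, op. X: I. Adagio", {"acoustid-A"})
sparse = Recording("Adagio", {"acoustid-B"})

merged = merge(source=detailed, target=sparse)

# The result holds both AcoustIDs, but nothing records which one came from
# which original recording, and the source title is gone.
print(merged.title)
print(sorted(merged.acoustids))
```

In this toy model, undoing the merge would need extra provenance (for example well-referenced edit notes), which is the point made above.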

4 Likes

For the record, I have now, at last, added some checks so that this script is not used to skip edit notes.

6 Likes

I don’t like this change, the script wouldn’t let me submit this edit. I’ve downgraded to the 2018 version.

1 Like

For info, there is a bug that currently prevents info from being displayed on the recording merge page.

This bug has been happening since AcoustIDs were added to the merge page (an addition which is great in itself):

Update

It should be fixed now.
I hope my patch doesn’t bring any side effects.

4 Likes

Thank you, the updated version is working for me.

2 Likes

Oh. I’ll check that.

1 Like

How do I tell from two MBIDs themselves (by just looking at them, without resorting to a script, the editing history or anything else) which one is the oldest (= lowest)?
I would expect, for instance, 16d2e14a < 6c2b3dd3, but I am wrong:

Detail

MBIDs are hexadecimal and written (as usual) in the 8-4-4-4-12 format, i.e. 32 hexadecimal digits plus 4 hyphens, 36 characters in all. But where does it say (MusicBrainz Identifier - MusicBrainz Wiki does not) that MB maps the characters 0-9 to the values 0-9 in that order, and a-f to 10-15 (which is not always the case)? Also, at User:Jesus2099/Merge Into Oldest MBID - MusicBrainz Wiki, of the example pair given:
https://musicbrainz.org/recording/16d2e14a-2418-4a36-b0d0-3b592d827b84
Recording “Danses Polovtsiennes "Le Prince Igor"” by Royal Philharmonic Orchestra, Sir Thomas Beecham - MusicBrainz
16d2e14a-… is said to be more recent than 6c2b3dd3-…; as I understand it, that would mean “1 is a higher value than 6” (and, for the last 12-digit part, “3 is higher than b”), which should not be the case. The parenthesized numbers satisfy (11,136,141) > (1,590,412), but how are those numbers related to MBIDs?

And yet in a current example, 69c5b36c > 0e012c51 that is,
0e012c51-4507-4d26-96ec-c9c6a8b5400b ← 69c5b36c-f9d1-412d-be31-8e46af5c5678
16,343,547 ← 17,101,496
(N.b. MBID order reversed to have both arrows pointing left.)

Guessing solution hides in “most significant bits” and MBID ‘age’ has to be calculated?

I don’t think MBIDs are generated in sequence.
I don’t think you can order MBIDs to find their age; their characters are rather random, no?
Edit history does not tell either because it is biased by earlier merges.

What I use is the database row ID, which you can see in the recording sidebar, both in the rating star links and in the Merge link.
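
As a quick check of the point above, here is a small sketch using only the values quoted in this thread. The pairing of each MBID prefix with its row ID is my reading of the earlier post, so treat it as an assumption. It compares the hexadecimal order of the MBID prefixes with the order of the row IDs, and reads the UUID version field of the three full MBIDs quoted here (version 4 in the UUID standard denotes randomly generated identifiers).

```python
import uuid

# (MBID prefix, row ID) pairs as quoted in this thread.
# Which row ID belongs to which prefix is an assumption based on the post.
pairs = [
    (("16d2e14a", 11136141), ("6c2b3dd3", 1590412)),
    (("0e012c51", 16343547), ("69c5b36c", 17101496)),
]


def same_order(a, b):
    """True if the hex order of the prefixes matches the order of the row IDs."""
    (hex_a, row_a), (hex_b, row_b) = a, b
    return (int(hex_a, 16) < int(hex_b, 16)) == (row_a < row_b)


results = [same_order(a, b) for a, b in pairs]

# The three full MBIDs that appear in this thread.
full_mbids = [
    "16d2e14a-2418-4a36-b0d0-3b592d827b84",
    "0e012c51-4507-4d26-96ec-c9c6a8b5400b",
    "69c5b36c-f9d1-412d-be31-8e46af5c5678",
]
versions = [uuid.UUID(mbid).version for mbid in full_mbids]

print(results)   # the orders disagree for the first pair, agree for the second
print(versions)  # UUID version field of each MBID
```

With these values the two orderings agree for one pair and disagree for the other, so sorting MBIDs does not look like a usable way to find the oldest entity.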

2 Likes

I assumed as much: MBIDs are not visibly “ordered”. So, from the edit link https://musicbrainz.org/edit/86229202 , where can an editor find that database row ID in fewer steps than manually chasing down (and hoping for) the add-edit-with-earliest-timestamp?

Edit: The only search hit in documentation for “row ID” (0 for “database row ID”) is Development / ws / js - MusicBrainz : “Simply provide the tracklist row id in a call to /ws/js/tracklist” which presupposes ID already known (for tracklist, no hits for MB entities). Also, n.b. that “webservice […] isn’t version and may change at any time, please do not rely on it for anything important, use /ws/2 instead. (also note that the documentation here is currently incomplete and outdated).”
What is ‘database row ID’ actually?

Edit2: At a recording MBID page, f.i.

hovering over rating stars shows a link “[…]rate/?entity_type=recording&entity_id=17101496&rating=[…]” .
Is that it? It matches what your script puts into parentheses, but it is not labelled “row ID”.

Edit3: The same entity ID appears under the Merge link. This should be listed in plain sight under the Details tab.
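
For anyone wanting to do this by hand, here is a rough sketch of pulling the `entity_id` out of such a rating link and picking the lowest row ID as the merge target. The host and the `rating` value in `href` are placeholders (the real link is elided above); only `entity_type`, `entity_id` and the row IDs come from this thread, and the MBID-to-row-ID pairing is my assumption from the earlier example.

```python
from urllib.parse import parse_qs, urlsplit


def entity_from_rating_link(href):
    """Return (entity_type, row_id) parsed from a rating link's query string."""
    query = parse_qs(urlsplit(href).query)
    return query["entity_type"][0], int(query["entity_id"][0])


# Placeholder host and rating value; not the real link.
href = "https://example.invalid/rate/?entity_type=recording&entity_id=17101496&rating=100"
entity_type, row_id = entity_from_rating_link(href)

# Row IDs quoted earlier for the two recordings of the merge
# (pairing assumed from the order they were listed in).
candidates = {
    "0e012c51-4507-4d26-96ec-c9c6a8b5400b": 16343547,
    "69c5b36c-f9d1-412d-be31-8e46af5c5678": 17101496,
}

# The lowest row ID is taken as the oldest entity, i.e. the preferred target.
oldest = min(candidates, key=candidates.get)

print(entity_type, row_id)
print(oldest)
```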

From there, you can click the sidebar Raw edit data for this edit link.

Or you can go to both Recordings and look at their rating star links or Merge link, in their sidebar.

I am not certain, but I think this row ID changes each time the entity is merged into another one.
It keeps only the merge target row ID.

1 Like