Entity merging tools “mb. MERGE HELPOR 2”

Thanks @jesus2099. I get what you’re saying about the IDs and references into MB from outside.

Earlier in this thread, you said:

I don’t read that as the same issue you’re describing in your wiki page. Do you think that merges can be destructive to data that’s entirely within MBz?

Yes, at least to some degree. If I merge “Symphony no. 1 in D major, op. X: I. Adagio” by “Orchestra X, Conductor Y” into “Adagio” by “Composer”, that’s IMO destructive. Sure, it can be reconstructed from a tracklist and a script, but it has made the data worse in the meantime :slight_smile:

Yes, if you wrongly merge you cannot tell this AcoustID was for this track and that AcoustID was for that other track.
Same for ISRCs, unless the Add ISRCs edits were well referenced (which is rare, because the ISRC web service does not allow edit notes; you have to come back to the edit history to add them manually).
Same for relationships, etc.
Splitting (unmerging) recordings is difficult, and you will have to wipe out at least the AcoustIDs.

For the record, I have now, at last, added some checks so that this script is not used to skip edit notes.

I don’t like this change, the script wouldn’t let me submit this edit. I’ve downgraded to the 2018 version.

For info, there is a bug that currently prevents info from being displayed on the recording merge page.

This bug has been happening since AcoustIDs were added to the merge page (an addition which is great in itself):

Update

It should be fixed now.
I hope my patch doesn’t bring any side effects.

Thank you, the updated version is working for me.

Oh. I’ll check that.

How can I tell from two MBIDs alone (just by looking at them, without going further with a script, the edit history or anything else) which one is the oldest, i.e. the lowest?
I would expect, for instance, 16d2e14a < 6c2b3dd3, but apparently I am wrong:

Detail

MBIDs are hexadecimal and written (as usual) in the 8-4-4-4-12 format (32 hex digits, or 36 characters once the 4 hyphens are counted). But where does it say (MusicBrainz Identifier - MusicBrainz Wiki does not) that MB maps the characters 0-9 to the values 0-9 in that order, and a-f to 10-15 (which is not always the case)? Also, at User:Jesus2099/Merge Into Oldest MBID - MusicBrainz Wiki, of the example pair given:
https://musicbrainz.org/recording/16d2e14a-2418-4a36-b0d0-3b592d827b84
Recording “Danses Polovtsiennes "Le Prince Igor"” by Royal Philharmonic Orchestra, Sir Thomas Beecham - MusicBrainz
16d2e14a-… is said to be more recent than 6c2b3dd3-…; as I understand it, that would mean “1 is a higher value than 6” (and, for the last 12-digit part, “3 is higher than b”), which should not be the case. The parenthesized numbers satisfy (11,136,141) > (1,590,412), but how are those numbers related to the MBIDs?

And yet in a current example, 69c5b36c > 0e012c51 that is,
0e012c51-4507-4d26-96ec-c9c6a8b5400b ← 69c5b36c-f9d1-412d-be31-8e46af5c5678
16,343,547 ← 17,101,496
(N.b. MBID order reversed to have both arrows pointing left.)

Guessing solution hides in “most significant bits” and MBID ‘age’ has to be calculated?

I don’t think MBID are generated in sequence.
I don’t think you can order MBID to find their age, their characters are rather random, no?
Edit history does not tell either because it is biased by earlier merges.
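As a quick check of the “rather random” point, here is a minimal Python sketch using only the standard `uuid` module. It assumes nothing about how MusicBrainz generates MBIDs; it only reads the version field of the three full MBIDs quoted in this thread, and shows that the hex digits a-f parse as 10-15 in the usual way. The helper name `describe` is made up for illustration.

```python
import uuid

# The three full MBIDs quoted in this thread.
mbids = [
    "16d2e14a-2418-4a36-b0d0-3b592d827b84",
    "0e012c51-4507-4d26-96ec-c9c6a8b5400b",
    "69c5b36c-f9d1-412d-be31-8e46af5c5678",
]


def describe(mbid: str) -> dict:
    """Parse an MBID as a UUID and report its version, variant and first group."""
    parsed = uuid.UUID(mbid)
    return {
        "version": parsed.version,  # 4 means randomly generated (RFC 4122)
        "variant": parsed.variant,
        # First 8 hex digits as a plain integer: 0-9 -> 0-9, a-f -> 10-15.
        "first_group": int(mbid.split("-")[0], 16),
    }


info = {mbid: describe(mbid) for mbid in mbids}
for mbid, details in info.items():
    print(mbid, details)

# Numerically, 16d2e14a really is smaller than 6c2b3dd3.
print(int("16d2e14a", 16) < int("6c2b3dd3", 16))
```

All three report version 4, which in the UUID standard is the randomly generated kind, so the hex comparison itself is ordinary but says nothing about which entity is older.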

What I use is the database row ID, which you can see in the recording sidebar, both in the rating star links and in the Merge link.

I assumed as much: MBIDs are not visibly “ordered”. So, from the edit link https://musicbrainz.org/edit/86229202 , where can an editor find that database row ID in fewer steps than manually chasing down (and hoping for) the add-edit-with-earliest-timestamp?

Edit: The only search hit in the documentation for “row ID” (0 for “database row ID”) is Development / ws / js - MusicBrainz: “Simply provide the tracklist row id in a call to /ws/js/tracklist”, which presupposes that the ID is already known (and is about tracklists; there are no hits for MB entities). Also, n.b. that this “webservice […] isn’t versioned and may change at any time, please do not rely on it for anything important, use /ws/2 instead. (also note that the documentation here is currently incomplete and outdated).”
What actually is a ‘database row ID’?

Edit2: At a recording MBID page, f.i.

hovering over rating stars shows a link “[…]rate/?entity_type=recording&entity_id=17101496&rating=[…]” .
Is that it? It matches what your script puts into parentheses, but not “row ID”.

Edit3: The same entity ID appears under the merge link. This should be listed in plain sight under the Details tab.
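For anyone who wants to pull that number out of a copied link rather than read it by eye, here is a small Python sketch. It only parses the query string of the rating link fragment quoted above (with the elided parts left out); the function name `entity_id_from_link` is made up, and nothing here calls MusicBrainz.

```python
from urllib.parse import parse_qs, urlsplit


def entity_id_from_link(link: str) -> int:
    """Return the entity_id query parameter of a rating link as an integer."""
    query = urlsplit(link).query
    params = parse_qs(query)  # empty values such as "rating=" are dropped
    return int(params["entity_id"][0])


# The link fragment quoted in this post, without the elided parts.
link = "rate/?entity_type=recording&entity_id=17101496&rating="
print(entity_id_from_link(link))
```

The value it extracts is the same 17,101,496 that the script shows in parentheses for 69c5b36c-….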

From there, you can click the sidebar Raw edit data for this edit link.

Or you can go to both Recordings and look at their rating star links or Merge link, in their sidebar.

I don’t know but this row ID changes each time the entity is merged into another one.
It keeps only the merge target row ID.

Aha, good to know. Thanks again.

That is not the way to merge. You merge to the item with the most details, best quality details. Often this is the older ones. Please don’t just merge to the oldest without looking.

When you have two different recordings you need to look at them. Find the best quality one to use as the target. Or the one with the most releases associated to it. There is no rule that says “oldest is best”.

There is no rule either to target the one with most details or the one that has the most releases.
The merged recording will contain all details and releases, in the end, whatever the direction.

There is a link in the OP to why I prefer merging to the oldest.

So the most important thing is to make sure that these recordings are indeed the same before merging them, as it is impossible to revert to the previous situation.

A major reason I will check for the most used version as a merge target is due to how other people use the data on the outside world. If you have a recording with 100 tracks associated to it, and another recording that is older and only one track associated, then maintaining that most used MBID makes life easier for anyone else using the data.

I know that an old MBID is kept and redirected, but I still think of users of media centres like KODI that use the MBIDs to link the data. For them I will always target the more accurate and better used item.

To quote guidelines: How To Merge Recordings - MusicBrainz

you should choose the recording with the most correct information. If there is no real difference, the usual choice is the oldest entry.
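Read as a decision rule, that guideline could be sketched like this in Python. This is only an illustration: the `Candidate` class, the `correctness` score (standing in for an editor's own judgement, which no script can compute) and `pick_target` are all hypothetical, and it assumes, as in this thread, that the lower row ID is the older entry. The row IDs are the ones quoted earlier for 0e012c51-… and 69c5b36c-….

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    label: str
    row_id: int       # database row ID; lower is assumed to mean older
    correctness: int  # hypothetical editor judgement, higher is better


def pick_target(a: Candidate, b: Candidate) -> Candidate:
    """Prefer the most correct recording; if there is no real difference, the oldest."""
    if a.correctness != b.correctness:
        return a if a.correctness > b.correctness else b
    return a if a.row_id < b.row_id else b


first = Candidate("0e012c51", row_id=16343547, correctness=1)
second = Candidate("69c5b36c", row_id=17101496, correctness=1)
print(pick_target(first, second).label)

# If the newer entry is judged clearly more correct, it wins despite its age.
better = Candidate("69c5b36c", row_id=17101496, correctness=2)
print(pick_target(first, better).label)
```

Age only acts as the tie-breaker here, which is how the guideline is worded.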

And we do sometimes lose data. There has been a bug, which I don’t think is fixed yet, which means AcoustIDs are not currently merged.

It’s the exact reason why I target the oldest.
Check out my link in OP.
The one that has the most things is maybe due to a recent merge NOT in the correct direction, so, towards a brand new MBID, that cannot be already used outside.

I did read your OP. I just get bored of hearing “must be new to old” as that is a myth. I agree that it is usually the way, but not always. Better to take a quick check.

Hardest things I find to merge are VA releases. Often find myself double checking my edits after using Mass Merge and finding a couple of recordings that need to be flipped in direction due to more common use. Number of attached AcoustIDs is often a good hint. (Can’t remember which script it is that shows the AcoustIDs on the “current edits” page but that is REALLY handy for a quick check)

Certainly, but I have some grip on how to check for best & most detail. The oldest ID is then a good point of favour between 2 equally detailed duplicates, but it is somewhat elusive to the eye.

True, but maybe all these good things were linked to an MBID that recently disappeared in favour of a brand new MBID (into which it is no longer good to merge, as the benefit of age is already lost).
It very often happens that people add duplicates, then merge the existing recordings into their duplicates.

That’s why the older row ID, not the amount of data, is the more reliable indicator of which MBID is actually older.
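A toy model of that situation in Python, for illustration only. It is not MusicBrainz code: the `Recording` class, the `merge` function, the MBID labels and the row IDs are all made up, and it only mimics the behaviour described in this thread (the merged-away MBID redirects to the target, and only the target row ID is kept).

```python
from dataclasses import dataclass, field


@dataclass
class Recording:
    mbid: str
    row_id: int  # lower is assumed to mean older
    tracks: set[str] = field(default_factory=set)


def merge(source: Recording, target: Recording, redirects: dict[str, str]) -> Recording:
    """Move everything onto the target; the source MBID becomes a redirect."""
    target.tracks |= source.tracks
    redirects[source.mbid] = target.mbid
    return target


# An old, widely used recording and a freshly added duplicate.
old = Recording("old-mbid", row_id=100, tracks={f"track-{i}" for i in range(100)})
new = Recording("new-mbid", row_id=900, tracks={"track-new"})

redirects: dict[str, str] = {}
survivor = merge(old, new, redirects)  # merged into the duplicate

print(survivor.mbid, survivor.row_id, len(survivor.tracks))
print(redirects)
```

After this merge the surviving recording carries the most tracks but also the newest row ID, so counting attached data would wrongly suggest it is the long-established entry.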