Thanks @culinko for bringing this topic up for discussion. My mind is really not made up, despite my strong personal preference not to merge old entities into recent duplicates.
It’s been a personal obsession to merge newer duplicates into pre‐existing entities rather than the opposite.
Even if the target data has to be fixed at the same time (better title style or missing artist credits).
I made a user script to make it easier to spot the oldest created MBID (mb. MERGE HELPOR 2) and a wiki page to list the known drawbacks of merging old entities into newer ones.
But maybe the inconvenience is mostly theoretical to me, as I don’t use flickr machine tags any more and mb. COLLECTION HIGHLIGHTER is just a user script.
The MERGE HELPOR also has other useful features for merge editors (mostly on the merge page itself), such as:
Highlights last clicked entity
Shows additional info for each entity about to be merged (an old example for works: Imgur: The magic of the Internet); each entity type has its own set of added info, such as the number of recordings, works, etc. for artists
Indicates oldest MBID
“Remove entity from merge” buttons
“Clear queue and add to merge” button — doesn’t reload page for nothing when nothing is checked (currently broken feature, apparently)
Other than the age of the MBID, does it make a difference which direction the merge goes? If I merge a newer (but more complete) entry into an older but sparser one, will anything be lost? (If not, why doesn’t MB just pick for us based on the oldest MBID?)
Yes, source data is partially lost.
Merge edits are destructive.
I can list what is lost, depending on the type of entity that is merged, when I have some more time.
Hi @jesus2099. Do you have time to expand on your argument? I often read claims and counter-claims in edit notes about the better way to merge. See for example https://musicbrainz.org/edit/52711568, a recording merge edit.
I’m generally happy with a merge either way when the entities are equally good. But otherwise, if one has a better name, or artist credit (for a recording or release) or the like, I’d always merge into the most correct one, whichever is the oldest. Otherwise, the resulting entity needs to be improved again.
Except for a few images I had uploaded, I don’t really use flickr MB machine tags.
But it’s an example of what is broken by targeting the duplicate instead of the pre-existing MBID.
I do use COLLECTION HIGHLIGHTER, which is also broken by this.
If I find other problems, I will add them to that wiki page.
Thanks @jesus2099. I get what you’re saying about the IDs and references into MB from outside.
Earlier in this thread, you said:
I don’t read that as the same issue you’re describing in your wiki page. Do you think that merges can be destructive to data that’s entirely within MBz?
Yes, at least to some degree. If I merge “Symphony no. 1 in D major, op. X: I. Adagio” by “Orchestra X, Conductor Y” into “Adagio” by “Composer”, that’s IMO destructive. Sure, it can be reconstructed from a tracklist and a script, but it has made the data worse in the meantime.
Yes, if you merge wrongly, you can no longer tell that this AcoustID was for this track and that AcoustID was for that other track.
Same for ISRCs, unless the Add ISRCs edits were well referenced (which is rare because the ISRC web service does not allow edit notes; you have to come back to the edit history to add them manually).
Same for relationships, etc.
Splitting (unmerging) recordings is difficult and you will have to wipe out at least the AcoustIDs.
How do I tell from two MBIDs themselves (just by looking at them, without resorting to a script, the editing history or anything else) which one is the oldest, i.e. the lowest?
I would expect, for instance, 16d2e14a < 6c2b3dd3, but I am wrong:
And yet in a current example, 69c5b36c > 0e012c51 that is,
0e012c51-4507-4d26-96ec-c9c6a8b5400b ← 69c5b36c-f9d1-412d-be31-8e46af5c5678
16,343,547 ← 17,101,496
(N.b. MBID order reversed to have both arrows pointing left.)
I’m guessing the solution hides in the “most significant bits” and the MBID ‘age’ has to be calculated?
I don’t think MBIDs are generated in sequence.
I don’t think you can order MBIDs to find their age; their characters are rather random, no?
Edit history does not tell either because it is biased by earlier merges.
What I use is the database row ID, which you can see in the recording sidebar, both in the rating star links and in the Merge link.
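To illustrate the point, here is a minimal Python sketch using the two MBIDs and row IDs quoted above. It only reads the version field that every UUID carries: both report version 4, which is the randomly generated kind, so comparing their characters tells you nothing about age. The helper names (`uuid_version`, `oldest_by_row_id`) are made up for this example, and treating the lowest row ID as the earliest created entity is an assumption of this thread, not something the sketch verifies.

```python
import uuid

# (MBID, database row ID) pairs quoted earlier in this thread.
candidates = [
    ("0e012c51-4507-4d26-96ec-c9c6a8b5400b", 16343547),
    ("69c5b36c-f9d1-412d-be31-8e46af5c5678", 17101496),
]


def uuid_version(mbid):
    """Return the UUID version number encoded in the MBID string."""
    return uuid.UUID(mbid).version


def oldest_by_row_id(pairs):
    """Pick the pair with the lowest row ID.

    Assumption: a lower row ID means the entity was created earlier.
    """
    return min(pairs, key=lambda pair: pair[1])


for mbid, row_id in candidates:
    print(mbid, "is UUID version", uuid_version(mbid), "with row ID", row_id)

print("Suggested merge target:", oldest_by_row_id(candidates)[0])
```

In this pair the alphabetical order of the MBIDs happens to match the row ID order, but with random UUIDs that is only a coincidence.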
I assumed as much about MBIDs not being visibly “ordered”. So, from the edit link https://musicbrainz.org/edit/86229202 , where can an editor find that database row ID in fewer steps than manually chasing down (and hoping for) the add-edit-with-earliest-timestamp?
Edit: The only search hit in documentation for “row ID” (0 for “database row ID”) is Development / ws / js - MusicBrainz : “Simply provide the tracklist row id in a call to /ws/js/tracklist” which presupposes ID already known (for tracklist, no hits for MB entities). Also, n.b. that “webservice […] isn’t version and may change at any time, please do not rely on it for anything important, use /ws/2 instead. (also note that the documentation here is currently incomplete and outdated).”
What is ‘database row ID’ actually?
Edit2: At a recording MBID page, f.i.
hovering over rating stars shows a link “[…]rate/?entity_type=recording&entity_id=17101496&rating=[…]” .
Is that it? It matches what your script puts into parentheses, but not “row ID”.
Edit3: The same entity ID appears under the merge link. This should be listed in plain sight under the Details tab.
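For anyone who wants to pull that number out of a copied link rather than read it by eye, here is a rough Python sketch. It assumes only what is visible in the link fragment above, namely the `entity_type` and `entity_id` query parameters; the function name `entity_from_link` is made up for this example, and the elided parts of the link are simply left out.

```python
from urllib.parse import urlsplit, parse_qs


def entity_from_link(href):
    """Return (entity_type, entity_id) read from a link's query string.

    Assumption: the link carries entity_type and entity_id parameters,
    as seen when hovering over the rating stars.
    """
    query = parse_qs(urlsplit(href).query)
    return query["entity_type"][0], int(query["entity_id"][0])


# Only the part of the link quoted in this post; the rest is elided there.
print(entity_from_link("rate/?entity_type=recording&entity_id=17101496"))
```

A link without those two parameters raises a `KeyError`, which seems a reasonable way to flag that the wrong link was copied.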