A counterpoint here could be the second-largest Wikipedia, the German one, which is infamous for its notability and citation strictness for new articles.
Which keeps a lot of people from contributing.
The suggestion I was making was to create a bot to add these links. Therefore:
- The bot would not add all 64,000 items in one go; it would abide by the bot limits.
- This is only adding a link from a MusicBrainz artist to a Discogs artist, so there are no issues with spelling mistakes etc.; it would just be adding a link.
Are you a db-knowledgeable person?
Do such people agree about whether adding those 64K links is almost certain to go without any problems?
What would be the probability of anything going wrong with the adding process?
Seeing mention of 64,000 links sounds like a worryingly high number to trust to a script to just import en masse to fill some gaps. Especially as it would then not be clear which links were carefully added by a human who was checking their data, and which were just mass imported.
I think we should add all 64,000 Discogs links. This is not a blind process. Don’t forget what the list actually is, and how these specific records were selected. The list was created from very conservative criteria to ensure that the risk of error in any entry is extremely low:
- The artist name is unique in both databases.
- The artist name in each database is associated with a release of exactly the same title.
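The two criteria above could be sketched roughly as follows. This is only an illustration, not the actual code behind the report: the function name `candidate_links` and the dict keys `id`, `name` and `releases` are hypothetical, and titles are compared as exact strings.

```python
from collections import Counter


def candidate_links(mb_artists, discogs_artists):
    """Yield (mb_id, discogs_id, shared_titles) for low-risk matches.

    Each input is a list of dicts with the hypothetical keys
    "id", "name" and "releases" (a list of release titles).
    """
    mb_counts = Counter(a["name"] for a in mb_artists)
    dc_counts = Counter(a["name"] for a in discogs_artists)

    # Index only those Discogs artists whose name occurs exactly once.
    dc_unique = {
        a["name"]: a for a in discogs_artists if dc_counts[a["name"]] == 1
    }

    for mb in mb_artists:
        if mb_counts[mb["name"]] != 1:
            continue  # name is ambiguous in MusicBrainz
        dc = dc_unique.get(mb["name"])
        if dc is None:
            continue  # name is missing or ambiguous in Discogs
        # Criterion 2: at least one release with exactly the same title.
        shared = set(mb["releases"]) & set(dc["releases"])
        if shared:
            yield mb["id"], dc["id"], sorted(shared)
```

Any artist whose name appears more than once on either side is skipped entirely, which is what keeps the list conservative.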
These criteria were created (by a person) because they are especially low risk. The criteria seek to eliminate one of the potential variables: an ambiguous artist name. The release title is a further affirmation that the artist is a match.
Let’s consider some possible scenarios that could result in a mismatch:
Two different artists with the same name also have a release with a shared name. Not impossible, but think about it: In order for this mismatch to happen, MusicBrainz must have Artist A but not Artist B, while Discogs must have Artist B but not Artist A. That coincidence is so improbable, it would actually be really funny if it happened even once!
The artist appears unique in both Discogs and MB because each database has filed releases by different artists under the same entry, and no one has caught the error yet. For example: Imagine that there are actually two artists with the name Xipazzo Q. Onslaught, but everyone thinks there is only one, so they attribute all releases to the same artist in both databases. By linking the artists, MB reinforces the misconception that the record is authoritative. (In library science, this would be called an “authority control” problem.)
In order for this to happen, the two artists would have to be similar enough to confuse editors. Like, they play the same instrument or the same genre, or they were active in the same country during the same time period. In other words, a rather ambiguous entry. I submit that in a case like this, a cursory review by a human editor is unlikely to discover the error, either. It would probably require some careful research or specific knowledge about the music.
These are the kinds of extremes that would result in a mismatch. How many of these do you think there are in the dataset?
Technically, given the limitations on bot edits and the fact the type of edit being made is already done frequently and involves no changes to existing database queries, I would think nil. The possibilities @sibilant describes seem less remote than a technical problem.
I have thought about writing this myself but it is something that could go wrong quite easily.
One thing we could do instead is build a parallel database that is built for bots, not humans.
This database would sit alongside and have the information ready for a human editor to check and import the data.
When someone visits the artist page in MusicBrainz, there would be a prompt suggesting there is potentially a missing release and allowing you to seed the release editor with this information.
There are some sites already doing this but we could add more features to these to make them easier to use and detect when there are releases missing.
We are in danger of going off-topic here, as the question was about a very specific task (linking Discogs artists to MusicBrainz artists).
I don’t understand how your database idea is for bots rather than humans. It sounds like it is for humans, because you are just providing data that a human can then use to seed a release.
I have already done a version of this with albunack.net: you can browse by artist, and it shows you both MusicBrainz and Discogs albums and whether they are already linked. It then makes it easy to link or import a Discogs release into MusicBrainz. However, although it’s easy to seed the release this way, it is still rather slow, because MusicBrainz is very slow at adding a release, and if an attempt is made to add more than a few at a time there is a good chance MusicBrainz will add the basic release but not add the tracks because of a bug in MusicBrainz.
That is basically what we already have with http://reports.albunack.net/mbartist_discogsartist_report2.html thanks to @ijabz
I think we should not conflate different things here. This is about a very specific import, which IMHO is pretty straightforward, rather safe and does no big harm if there are some mistakes in it.
We all know that importing things like releases or entire artist discographies is much more difficult and much more likely to have quality issues. But this is not the case here.
If I’m understanding this post and a link is all that would happen, it sounds like a great tool. I would suggest that the link be shown in a special color (colour) font or wording to help us immediately recognize it for what it is.
A lot of this discussion does seem a little overcautious for something as simple as an external link, I see little reason not to add those and correct them in the cases where they are inaccurate. But for more major edits (e.g. adding entirely new artist or release entities) I still am cautious about mass-auto-edits.
In any case, flagging automated edits in some way would be nice, so that they can be reviewed/monitored as what they are.
Thank you for the patronising comment. So are you saying I should not take part in the conversation if I do not work with databases every day of the week? Wow!
@ijabz The main thing I’d ask is something nice and clear in the comments that can be understood by normal editors. Something that links to a thread/description about the import. Just so people know to double check the link. Discogs links can often be taken as perfect when that isn’t always the case.
@sibilant is thinking along similar lines to me. Some of the more obscure corners of MB have some weird / confused data. Duplicate artists, etc. Wouldn’t want the two sets of confusion to confuse each other.
I know when entering some odd punk bands I have found artists confused on both sides due to how they were shown on the covers. Info I have needed to go and correct in both databases. Information that stood out when comparing both as a whole and cross referencing elsewhere.
It seems a sensible idea, as long as it stands out in the comments. I can also see plenty of checks are already done.
I also trust @ijabz to want to see this done correctly even more than the rest of us. To you it isn’t just some database experiment, but a source of more accurate data. It comes back to being about the music.
You mis-read me. I am trying to gauge how much weight to put on your comments about things going wrong when adding the data. I’ve got absolutely no knowledge in that area.
And congrats on staying cool when you’re responding to a patronising comment.
I prefer having
- At least 3 common releases among singles, albums and EPs only (excluding compilations and live releases, which often have common names)
- Exclude all release titles containing the following words: best, hits, greatest, anthology, songs, ベスト, コレクション, ヒッツ, ゴールド, ゴールデン, gold, golden
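As a rough sketch of how these stricter rules might be checked: the names `is_generic_title`, `usable_titles` and `is_strict_match` are made up for illustration, releases are assumed to be `(title, type)` pairs, and the word test is a plain case-insensitive substring match, which is deliberately over-broad (it would also reject a title like "Goldfinger").

```python
# Words taken from the list above; matching is by substring, which is an
# assumption of this sketch rather than something the proposal specifies.
EXCLUDED_WORDS = (
    "best", "hits", "greatest", "anthology", "songs",
    "ベスト", "コレクション", "ヒッツ", "ゴールド", "ゴールデン",
    "gold", "golden",
)
# Only these release types count; compilations and live releases do not.
ALLOWED_TYPES = {"single", "album", "ep"}


def is_generic_title(title):
    """Return True if the title contains any of the excluded words."""
    lowered = title.casefold()
    return any(word in lowered for word in EXCLUDED_WORDS)


def usable_titles(releases):
    """Return the set of titles that count as evidence for a match."""
    return {
        title
        for title, release_type in releases
        if release_type.casefold() in ALLOWED_TYPES
        and not is_generic_title(title)
    }


def is_strict_match(mb_releases, discogs_releases, minimum=3):
    """Require at least `minimum` shared, non-generic release titles."""
    shared = usable_titles(mb_releases) & usable_titles(discogs_releases)
    return len(shared) >= minimum
```

Lowering `minimum` to 1 roughly reproduces a single-shared-title rule, with the extra type and word filters still applied.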
Last time a bot (initials: L.A.) massively added lots of stuff, we are still fixing it years later.
OK, it was releases, so it was much more damaging than Discogs links.
This is not a bad idea in principle; however, often an artist is well represented in either Discogs or MusicBrainz but not both. Just look at the first page (artists are ordered by total number of releases, Discogs + MB) and you’ll see that in many cases there is only one release in either MB or Discogs.
Go down to page 20 and the total number of releases (MB + Discogs) is only seven.
So the chances of getting three matches are quite low.
I could do this if it was required to get agreement
Not fair to lump me in with this really.
In my opinion, the criteria that ijabz has established are strict enough to combine the arithmetic of Discogs and MB.
You should not be too fussy with such an update, otherwise nothing will ever come of it.
Yes, those statements sound sensible.
Given the specifics of this import, I am absolutely fine with this.
Ideally the bot will be created and used only for this specific import, to make these edits easy to find. The bot’s bio could explain the matching used.
Edit notes could perhaps specifically list the criteria for that match (“same name (X) and unique in both + share these exact release names: A, B, C”, but in nice English and layout). (Including the artist name is important, in case artists get renamed later.)
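An edit note along those lines could be generated per link, something like the sketch below. The function name `build_edit_note`, the exact wording and the `thread_url` argument are placeholders, not an agreed format.

```python
def build_edit_note(artist_name, shared_titles, thread_url):
    """Build a human-readable edit note for one automated Discogs link.

    The artist name is written into the note itself so the note still
    makes sense if the artist is renamed later.
    """
    titles = ", ".join(f'"{title}"' for title in sorted(shared_titles))
    return (
        f'Automated Discogs link for artist "{artist_name}".\n'
        "Match criteria: the artist name is unique in both MusicBrainz "
        f"and Discogs, and both list these exact release titles: {titles}.\n"
        f"Details of this import: {thread_url}"
    )


# Example with made-up values:
print(build_edit_note("Xipazzo Q. Onslaught", ["B", "A"], "https://example.org/thread"))
```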
I would exclude self-titled releases, too.
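For what it is worth, dropping self-titled releases before comparing titles is a small filter. A minimal sketch, assuming a case-insensitive comparison after trimming whitespace (the name `drop_self_titled` is hypothetical):

```python
def drop_self_titled(artist_name, titles):
    """Return the titles that are not identical to the artist name."""
    name = artist_name.casefold().strip()
    return [t for t in titles if t.casefold().strip() != name]
```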
The only thing I can say