Please fix MineoBot

To those operating MineoBot on Wikidata (if not @Mineo), could someone make sure that the bot is not adding incorrect MusicBrainz IDs to items that have been split into separate work and release group items? MineoBot added a release group to The Man With the Child in His Eyes (Q55776482), even though the release group is correctly linked to The Man with the Child in His Eyes / Moving (Q3521758), the item for the single. The bot seems to have inferred the link from the (incorrect) MusicBrainz links to the Wikipedia articles, which on Wikidata are attached to the composition items. (@Moebeus has separated all, I think, of Kate Bush’s singles into composition, track and single. All of the Wikidata items are linked to the respective works, recordings and release groups.)

The problem here is not the bot but wrong data on MusicBrainz’s side. If the Wikipedia links are wrong, we should remove them :slight_smile: (in fact, we should probably remove them anyway, but much more so if they’re wrong!). The bot is actually a great way of finding wrong links that need removing :slight_smile: (but make sure to also remove the wrong info on the Wikidata side).

On the Wikidata side it’s a bit of a mess, and removing “wrong info” would entail a lot more than just changing the MusicBrainz identifiers. There is no “official” convention for how music data is organized, which means that basically every item for 20th- and 21st-century music that wasn’t edited by myself or @Moebeus represents more than one MusicBrainz entity; e.g. albums tend to have identifiers that refer to different editions, and singles are almost always conflated with songs. This is partly a consequence of imports from Wikipedia infoboxes, in which songwriting credits are combined with production credits, audio snippets, singles chronologies and YouTube links.

I did draft an RfC which would be an attempt to formalize a MusicBrainz-esque ontology, but there is a lot of work to do before anything is usable; e.g. there should probably be a separate property for specifying main/featured artist credits, but the property proposal discussion has been stalled (derailed, to some extent).

Going back to the “wrong info” thing, the easiest way to fix the item structuring problem in Wikidata would probably be to import whole items from MusicBrainz (probably release groups, recordings and works), which would also involve editing and/or merging existing items to maintain the current data. This would, of course, involve dealing with a fair amount of other issues at the same time.

(earlier discussion: Links to Wikidata)

One thing I’ve noticed is that it follows links with anchors. It should probably ignore those, since none of them are supposed to be there in the first place.

Would it be possible and/or acceptable to remove all Wikipedia links with anchors through the API?

Not possible at the moment, although a bot could do it. I’d like to know how many there are first; maybe they can just be removed or fixed by hand.
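For anyone who wants a rough count before deciding, here is a minimal Python sketch of the check being suggested: treat any Wikipedia URL with a fragment (the part after `#`) as an anchor link and set it aside. It assumes you already have the URLs as a list of strings (e.g. from a data dump); the function names and example URLs are hypothetical, and it does not talk to the MusicBrainz API or edit anything.

```python
from urllib.parse import urlsplit


def has_anchor(url: str) -> bool:
    """Return True if the URL has a non-empty fragment (text after '#')."""
    return urlsplit(url).fragment != ""


def split_anchor_links(urls):
    """Split a list of URLs into (anchored, plain), preserving order."""
    anchored = [u for u in urls if has_anchor(u)]
    plain = [u for u in urls if not has_anchor(u)]
    return anchored, plain


if __name__ == "__main__":
    # Hypothetical example URLs, not real MusicBrainz data.
    links = [
        "https://en.wikipedia.org/wiki/Example_song",
        "https://en.wikipedia.org/wiki/Example_album#Track_listing",
    ]
    anchored, plain = split_anchor_links(links)
    print(len(anchored), "anchored link(s) out of", len(links))
```

A bot following this approach would simply skip the `anchored` list when inferring Wikidata links, and the same list gives the count needed to decide whether humans can handle the cleanup.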

Maybe I misunderstood, but to me it seems perfectly fine for a WD item to represent a release group (several editions) rather than a specific edition.

Using the same item for a song and its A-side single release is very common and OK, no?
We just link the same WD item to both the song and the single.

Wikidata already has an established system of separate items for different editions of books based on the FRBR model; see WikiProject Books. It definitely makes sense to separate out different release dates and tracklists and such – we have a tracklist property, and not having editions would make it more difficult to indicate e.g. which tracks are only on the deluxe edition and only on the Japan edition and so on. It would also make it more difficult for data consumers to interpret that data.

This doesn’t really work (hence the amount of time spent manually reconstructing MusicBrainz items); there are a lot of cover versions with Wikidata items, which should be linked to MB recordings (and some should also be split into track/single); and there are singles with two articles, two songs covered by one article about a single, songs with multiple release groups (i.e. released as a single more than once), and so on.

Also, it doesn’t really make sense if all of the databases Wikidata links to use a completely different structure from Wikidata itself. Why would we want to do something completely different that conflates distinct entities and is more difficult to interpret? I don’t think it would be useful to explicitly allow items with both an ISRC and an ISWC; that would amount to saying “this isn’t really something that exists, so if you want actually useful music data, please go to these other databases conveniently linked here, and never come back.”

Well, for me WD is just a multilingual WP link, so I don’t mind when a WD item is linked to several relevant MB entities, or when an MB entity is linked to several relevant WD items.
As long as I can get from a work, an artist, an event or a release group to interesting reading in WP through those WD items, that’s fine.

Also, MineoBot is linking a lot of films, TV series and the like to release groups (where the release groups are for the relevant soundtrack album, which doesn’t have its own item). This is causing a lot of constraint violations and I don’t know if MineoBot should be doing that.

That sounds like we had a lot of crappy links :frowning: We were never supposed to link to those from the soundtrack. Hopefully the constraint violations will help us remove them, but that’s kinda annoying.

Noting that I’ve had to revert MineoBot more than once on two different items; not sure what’s happening here.

Another user has complained on Mineo’s Wikidata talk page. Would blocking the bot’s account (on Wikidata or on the MusicBrainz server) be necessary? Mineo has not made any Wikidata edits since 20 December and has not made any MusicBrainz edits since 26 December, and on Wikidata being unresponsive to bot issues would usually be grounds for temporarily blocking the bot.

That Wikidata comment was just added, but I have stopped the bot for a bit (I don’t run it but have access to the Docker container) so that people can catch up and improve stuff as needed. It has more open edits on the MB side than it should, in any case, so giving it some downtime is probably a good idea :slight_smile: