MusicBrainz editor bot: FresoBot

So I’ve been wanting to dive into messing around more with the “raw” data of MusicBrainz for a while, and I’ve finally gotten started on it. One of the things I wanted was to see if it was possible to revive some of the bots that we used to have scour the database and fix up obvious derps and outdated data bits…

This led me to create…

FresoBot!

FresoBot is, code‐wise, a fork of @murdos’ fork of @lukz’ musicbrainz-bot, written in Python (2). FresoBot has a few (so far minor) updates/touch-ups to the codebase, but most importantly, it has a new script/module: spotify_url_cleanup.py.

After some testing, I’m now letting the bot free on MusicBrainz.org. To begin with, I’ve only let it do 25 edits. I looked them over myself, and they all look good to me, but I’m not going to let it do more for now until everyone else has had a chance to look its edits over, so please give https://musicbrainz.org/user/FresoBot/edits a look.

If no one has complained about any of its edits by tomorrow morning (say, about 12 hours from now), I’ll let it do 50 more then. If no one has any complaints/objections 24 hours from that, I’ll let it do 100/day. @yvanzo tells me «there are about 5K URLs to be cleaned up», so it should be doable to get Spotify URLs cleared up in a couple of months at that rate, without flooding the edit queue at the same time. :slight_smile:
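For anyone who wants to check the "couple of months" estimate, here is a rough back-of-the-envelope sketch. The function name `days_to_clear` is made up for illustration, and the 5,000 figure is only the approximate count quoted above, so treat the result as a ballpark.

```python
import math


def days_to_clear(total_urls, per_day, already_done=0):
    """Estimate how many days of bot runs are needed at a fixed daily cap.

    total_urls   -- approximate number of URLs needing cleanup (an estimate)
    per_day      -- number of edits the bot is allowed to enter per day
    already_done -- edits already entered during the ramp-up runs
    """
    remaining = max(total_urls - already_done, 0)
    return math.ceil(remaining / per_day)


# 25 + 50 edits from the ramp-up runs, then 100 edits per day.
print(days_to_clear(5000, 100, already_done=25 + 50))
```

At 100 edits per day this comes out to a bit under two months, which is in line with the estimate above.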

I hope you will all welcome @FresoBot nicely! :heart:

PS. This bot is written on my own time and not as part of my MetaBrainz contract. The MetaBrainz Foundation is not involved in the bot’s development in any way.

Very cool! What other features have you been thinking about?

There’s one already listed at https://musicbrainz.org/user/FresoBot - I’ll try and add more as I come up with them. Murdos used to do a lot of Discogs data matching. Maybe I’ll try and revive some of those. I’ve also been thinking about using the Spotify data to add additional Spotify URLs to MusicBrainz (e.g., if a Release has a Spotify URL and the MusicBrainz and Spotify releases have the same number of tracks and they’re called roughly the same in the same order, add Spotify track URLs to all Recordings; if a Release has a Spotify URL but any Artist(s) involved do(es) not and their names are the same/similar, add Spotify links to artists). Of course, this latter part will be much easier if all the Spotify URLs are (fairly) uniform, which they are not at all right now… :wink:
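To make the track-matching idea a bit more concrete, here is a minimal sketch of how such a check could look. None of this is actual FresoBot code: the function names, the 0.85 similarity threshold, and the example URLs are all assumptions for illustration, and it is written for Python 3 using only the standard library's `difflib`.

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(title.casefold().split())


def titles_similar(a, b, threshold=0.85):
    """Return True if two titles are 'roughly the same' (threshold is a guess)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def match_tracks(mb_titles, spotify_tracks, threshold=0.85):
    """Pair MusicBrainz track titles with Spotify track URLs, in order.

    mb_titles      -- list of track titles from the MusicBrainz release
    spotify_tracks -- list of (title, url) tuples from the Spotify release
    Returns a list of (mb_title, spotify_url) pairs, or None if the track
    counts differ or any title pair at the same position is not similar.
    """
    if len(mb_titles) != len(spotify_tracks):
        return None
    pairs = []
    for mb_title, (sp_title, sp_url) in zip(mb_titles, spotify_tracks):
        if not titles_similar(mb_title, sp_title, threshold):
            return None  # one mismatch is enough to skip the whole release
        pairs.append((mb_title, sp_url))
    return pairs


# Placeholder data; the track IDs below are not real.
mb = ["Intro", "Hello World"]
sp = [
    ("intro", "https://open.spotify.com/track/EXAMPLE1"),
    ("Hello  world", "https://open.spotify.com/track/EXAMPLE2"),
]
print(match_tracks(mb, sp))
```

Bailing out on the first mismatch is deliberately conservative: for a bot, skipping a release that a human could have matched seems safer than adding a wrong URL.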

I just ran another 50 edits of the Spotify URL cleanup script. If no one has any complaints by tomorrow, I’ll start doing 100/day (up to the limits set by the Bot Code of Conduct).

I have voted, they are all good as the old URL does indeed redirect to your target URL.

Here is a nice edit search to vote: FresoBot open edits not already voted by the person reading this post. :wink:

I did also dream of making some bot edits, but never took enough time to do it; it’s quite a lot of work to start the whole thing…

Anyway, very good one, here! :clap:

@Freso: Just one more thing: It should check beforehand that the new URL doesn’t already exist! In that case, the best option is to edit the old URL’s relationships to use the existing clean URL instead.

It is exactly what happened for 26 URLs (over 20K) with MBS-9597: Update VGMdb URLs to use https.

Erratum: Actually such edit is fine even in this case as it merges both URLs. This issue holds for direct database update only, as in MBS-9597, not for edits entered by a bot, as here.

Note that «Yes-voting on bot edits is discouraged unless the voter can 100% confirm they’re correct, since it helps them to go through with less eyes on them. If a bot edit gets rejected, a non-bot user can always re-enter it if he feels it’s correct - reverting the edit is much more difficult, especially for removals and merges.» Of course, in these cases, it is fairly easy to 100% confirm that they are correct, so maybe it’s okay to get voting on them and clear the queue so it doesn’t hit the “max. 2000 open edits at a time” limit (and so the changes go through sooner). :slight_smile:

I’ve made and run a new script, exit_url_cleanup.py, which made, I guess, around 200 edits. I double- and triple- and quadruple-checked the output, incl. a handful of the actually created edits on test and on the live database when I let it loose there. Many of the URLs need further cleaning than what this does, but it’s a step closer to being usable URLs. (There are also lots of relationships that need fixing, which I also hope to be able to make the bot do.) Anyway, just a heads up. :slight_smile:

I still haven’t gotten any complaints about any edits the bot has made, so I’ve been thinking about upping this to 200/day starting next Monday. If no one has any objections, I’ll Make It So™.

Could FresoBot also think about the Amazon Art issue? Would seem a useful task for a bot to handle.

When a CAA cover is missing and an ASIN link is in place, could the bot be the one running around downloading from Amazon and then uploading to CAA?

It could be really handy as it could mark it as “FresoBot Automatically Uploaded Art from Amazon”

It could, but I wouldn’t. Artwork from Amazon is in many instances not correct, and I have no way to verify every edit made by the bot, and I don’t want to make an automated instance upload wrong data.

If the ASIN link is already wrong, the displayed cover is actually also wrong.
It doesn’t get “wronger” if you upload this cover to CAA.
The big advantage would be that such a cover is available “forever” on CAA, not only for the time remaining until Amazon starts to charge MB and others for this service.
With a comment like @IvanDobsky’s “FresoBot Automatically Uploaded Art from Amazon”, everyone knows that the quality and matching for this cover isn’t 100% guaranteed.

Even if the ASIN is right, the cover may be wrong (and, by extension, the wrong ASIN may actually provide the right cover art!). Saying that “this Release corresponds to this ASIN” is making a statement that the Amazon Standard Identification Number corresponds to our MusicBrainz Identifier. Uploading the cover art to CAA is making a statement that that cover art is the correct cover art for the release. Those are two very different statements that are independent of each other.

Either way, I’m not going to do it for FresoBot.

But if below the cover you see “provided by Amazon” you already expect it to be wrong. You’ll think: Ah, that’s about what the cover of this album looks like, but nobody uploaded the correct cover for this very release yet.

I don’t believe that the majority of MB users really think this way.
They assume: What I get/see here is basically correct. It’s an encyclopedia. So every link to an external source is accurate.
Or do you really assume that every link to other external sources like Wikipedia, Facebook, Soundcloud, iTunes, Google Play etc is just a “looks like” or “maybe true”?

No, you’re right. Most users probably don’t think that way, and when they e.g. download the cover art via Picard they expect it to be correct. Then again, most of them probably don’t care much if the cover art is a bit different.
But I meant the editors and not the users. If an editor sees uploaded cover art they probably expect it to be correct, but if they see cover art provided by Amazon they might feel the need to find and upload the correct cover art.

« the … cover is … wrong … such a cover is available “forever” on CAA »

This is even wronger.

Not really, because you can always switch it to the matching release .-) :wink:
The risk that CAA closes down is (IMHO) smaller than the risk that Amazon starts charging for, or prevents, direct linking to artwork on their servers.

I still have 60,000 MB Artist to Discogs links (see http://reports.albunack.net/mbartist_discogsartist_report2.html and the thread “Is there any kind of project to improve holes in MusicBrainz coverage”) that I would like to add but have no mechanism to do so. Since you mention Discogs links, perhaps you could add these; I can make the list available as a CSV file.

It is wronger to have wrong data.
Having no data is good, having wrong data is no good. Here is why:

When data is already there (wrong or not), the chances that a human will edit it are lower than when there is no data.
Automatically setting wrong data will prevent human edits.

Human edits are either the same low level (using the linked cover without checking) or high level (an eye check that it’s the correct version of the cover).

Bot edits will never eye‐check as thoroughly.

How do you do that? :thinking:
