Vote: Policy on rotten links

alex_s7 · July 25, 2020, 7:20pm

This is a follow-up to the topic Policy on rotten links. Since there is no official policy, and there are different opinions, let us vote and create a policy.

What shall be done when a “rotten link” is detected, that is a link from MusicBrainz to an external web site, where the external site is either permanently down, or has been transformed in such a way that it is utterly irrelevant for MusicBrainz purposes?

Multiple choices are possible. The vote remains active for 1 week, until 1.08.2020.

Keep obsolete link “as is”
Keep obsolete link “as is”, add validity date to mark it as obsolete
Remove obsolete link
Replace obsolete link with a link to Wayback Machine (archived version of the original site)
Other (please explain)


jesus2099 · July 25, 2020, 8:13pm

You don’t need to set an end date. Sometimes it’s hard to know. You can just mark it as ended.

regagain · July 26, 2020, 5:26am

If a link relationship is marked as ended, perhaps it would be good if the link would automatically point to the page’s entry in the Wayback Machine.

phonebox · July 26, 2020, 10:02am

Adding any archive url as an official homepage, a purchase url etc. is incorrect, as the archive url was never an official homepage for the given url, nor was the music ever purchasable on the archive url. Following this, removing the correct url doesn’t make sense. Url date editing would become a hassle if there’s no original url to edit.

Domain name ownerships change all the time but it shouldn’t affect whether or not we keep an url on the database. Generally the only reason an url should be deleted is if it was incorrect in the first place and such a service, as the url relationship describes, never existed on the url. How MB and other services want to display the “ended” relationships is one thing but the meaning from the database point of view seems clear.

I think (automatic?) archive urls would be a good thing but not if they aren’t marked as such.

justcheckingitout · July 26, 2020, 3:42pm

adding the archive link will only work if the link was archived beforehand. once a link is dead, it is too late to be archived.
and, i think adding archived links should somehow be noted as ‘archived’ - in addition to ‘ended’.

alex_s7 · July 26, 2020, 5:13pm

An automatic link to the last archived version will work only when the original site has been taken down. If the site has been taken over by another owner, and the old content has been replaced with something else (totally irrelevant for MusicBrainz purposes), the last version on archive.org will be the wrong one. In the case which triggered this discussion, the last genuine version is from 2015; all later versions have nothing to do with the orchestra.

alex_s7 · July 26, 2020, 5:21pm

I believe a source of disagreement is that you consider the primary purpose of the external URL differently from me and some other MusicBrainz users.

If I understand correctly, for you the primary purpose of the URL is to provide a historically accurate address where the information was available in the past, even if it is not available there anymore.

For me the primary purpose is to provide access to the content on a best-effort basis, not necessarily in the original location. If the original location has been taken down but the content was archived, from my point of view an archive location is more useful than a historically accurate dead link. You are right that “the archive url was never an official homepage”, but at the same time you are wrong because by following the link to an archive the user could see the content of the former official homepage.

Both approaches definitely have advantages and disadvantages. A perfect solution would be to be able to provide an alternative “archived” location for obsolete links, explicitly marked as “archived”.

elomatreb · July 27, 2020, 9:18am

Since URLs are entities capable of having relationships, another possibility for clearly indicating archive links while still keeping the original URL around could be having an URL-URL relationship along the lines of “archived version of”? The original URL could still be set as ended (possibly with the date, if known). The archived URL could be automatically displayed if the source URL is marked as ended.

This could also make an automated system easier, since it would be possible for a bot to crawl URLs and add archived versions without having to know if the site really is gone, all without disrupting actual live data.
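
A minimal sketch of how such a relationship could be modelled, assuming hypothetical names throughout (`Url`, `UrlRelationship`, `archived_versions`, `display_address`); none of these come from the actual MusicBrainz schema. The original URL stays in place, and the archived address is only shown once the relationship is marked as ended.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Url:
    """A URL entity (hypothetical model, not the real MusicBrainz schema)."""
    address: str
    # Targets of the proposed "archived version of" URL-URL relationship.
    archived_versions: List[str] = field(default_factory=list)


@dataclass
class UrlRelationship:
    """A link from some entity (artist, release, ...) to a URL."""
    link_type: str                  # e.g. "official homepage"
    url: Url
    ended: bool = False
    end_date: Optional[str] = None  # may stay unknown even when ended


def display_address(rel: UrlRelationship) -> str:
    """Pick the address to show: the archive only if the link has ended."""
    if rel.ended and rel.url.archived_versions:
        # Assumption: the most recently added archive is the preferred one.
        return rel.url.archived_versions[-1]
    return rel.url.address


homepage = Url("http://example.org/")
rel = UrlRelationship("official homepage", homepage)

# A bot can attach an archive while the site is still live;
# nothing changes for readers until the relationship is ended.
homepage.archived_versions.append(
    "https://web.archive.org/web/2015/http://example.org/"
)
live_view = display_address(rel)

rel.ended = True
ended_view = display_address(rel)
```

The point of this design is that the original URL and its dates are never overwritten, so both the historical record and the usable archive link are kept.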

tigerman325 · July 27, 2020, 11:12am

I think if it’s a 404 then just mark it ended. If it brings you to a site that can potentially be virus laden or pornographic, etc., I’d remove it.

jesus2099 · July 27, 2020, 11:28am

Or display ended URLs as plain text, not hyperlinks, any more.
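
A rough sketch of that display rule, assuming a hypothetical `render_url` helper and a made-up `ended-url` CSS class; it is not taken from the MusicBrainz server code. Ended URLs are emitted as escaped text, live ones as ordinary hyperlinks.

```python
from html import escape


def render_url(address: str, ended: bool) -> str:
    """Return an HTML fragment for a URL relationship (illustrative only)."""
    text = escape(address)  # escapes &, <, > and quotes
    if ended:
        # Ended: still visible for the record, but not clickable.
        return f'<span class="ended-url">{text}</span>'
    return f'<a href="{text}">{text}</a>'


ended_html = render_url("http://example.org/?a=1&b=2", ended=True)
live_html = render_url("http://example.org/", ended=False)
```

This keeps the data untouched and changes only the presentation, so a reader can still copy the address but cannot follow it by accident.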

sound.and.vision · July 27, 2020, 12:10pm

I voted +1 for WayBackMachine archive links (they are used on other platforms such as Wikipedia), however justcheckingitout was correct in that WBM is no use if it was never captured in the first place.

One thing I would suggest then is that when people add URLs, they run them quickly through ye olde WBM (with outlinks ticked as that can help others).

jesus2099 · July 27, 2020, 12:55pm

yindesu · July 27, 2020, 3:24pm

I’m not an SEO, but would keeping around broken, spam, or malicious(!) links penalize MusicBrainz?

What value are we providing, and to whom, by serving up bad URLs?

Totosaurio3279 · July 27, 2020, 5:09pm

I think it could serve archiving purposes? Like “at some point in time this artist had their homepage at this url” even if it’s not working anymore.
I also think that leaving those URLs as plain text as @jesus2099 said would be the safest choice.

regagain · July 27, 2020, 8:49pm

Maybe we can rely on the end date of relationship to link to the last relevant snapshot, or something like that.
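
One way this could look, as a sketch only: the Wayback Machine accepts a timestamp in the path (`https://web.archive.org/web/<timestamp>/<url>`) and serves the capture closest to it. The helper name `wayback_url` is hypothetical, and the closest capture may still fall after the end date, so the result would need a manual check in takeover cases.

```python
from datetime import date
from typing import Optional

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, end_date: Optional[date] = None) -> str:
    """Build a Wayback Machine address for a URL (hypothetical helper).

    With an end date, ask for the capture nearest to that day.
    Without one, fall back to the undated form, which resolves to
    the most recent capture.
    """
    if end_date is None:
        return f"{WAYBACK_PREFIX}/{original_url}"
    # Timestamp format is YYYYMMDD (time of day is optional).
    return f"{WAYBACK_PREFIX}/{end_date:%Y%m%d}/{original_url}"


dated = wayback_url("http://example.org/", date(2015, 12, 31))
undated = wayback_url("http://example.org/")
```

The undated form is the one that goes wrong when a domain has been taken over, which is why the end date matters here.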

elomatreb · July 27, 2020, 8:57pm

The whims of search engines shouldn’t be a factor in deciding how we should store our data, IMO. The value we are providing with outdated URLs is exactly the same as that provided by not deleting entries of inactive artists.

yindesu · July 27, 2020, 10:09pm

Are you sure? When I read this, it looked kind of like an admin being interested in improving MusicBrainz rankings in Google. So I have to assume that if storing links to spam/malware harms MusicBrainz, then we wouldn’t want to practice this policy.

How so? MusicBrainz pages can be used for tagging, following relationships, etc. Assuming we’re not talking about Internet Archive links, what can someone do with a URL that points to spam or malware?

bsammon · July 28, 2020, 12:35am

Seeing as this is somewhat independent of the main discussion here (and is relevant even if a decision is not reached about whether to change things) – I’ve created a bug report for this particular idea.

bsammon · July 28, 2020, 12:44am

I would definitely agree that we shouldn’t care about the “whims of search engines”, but I’m not sure I feel the same way about the “behavior of search engines”.
And I would say that dealing with search engines can/should be addressed in how we display our data as opposed to how we store our data.

I see two potential consequences here:

linking to questionable (whatever that means) sites could affect the page-rank of musicbrainz in search engines.
linking to questionable sites could affect the page-rank of those questionable sites in search engines. This raises a sort of ethical question about being a part of inflicting such sites on the world, as well as a practical question of whether we provide an incentive for questionable site-admins to spam musicbrainz with bad links.

bsammon · July 28, 2020, 12:52am

The main thing that occurs to my mind is that they serve as a useful sort of additional disambiguation. In my personal usage patterns, I’ve found myself trying to cross-reference my personal music-information archive with musicbrainz, and sometimes all I have in a particular entry in my archive is a band name and a URL (and usually a date for a concert not yet in musicbrainz), and it may be a 10-year-old entry in my archive. And it’s generally a small-time indie band (or a DJ or a solo artist) who may or may not be very original in choosing a band name.