Vote: Policy on rotten links

An automatic link to the last archived version works only when the original site has been taken down. If the site has been taken over by another owner and the old content has been replaced with something else (totally irrelevant for MusicBrainz purposes), the last version on archive.org will be the wrong one. In the case that triggered this discussion, the last genuine version is from 2015; all later versions have nothing to do with the orchestra.

1 Like

I believe one source of disagreement is that you see the primary purpose of the external URL differently than I and some other MusicBrainz users do.

If I understand correctly, for you the primary purpose of the URL is to provide a historically accurate address where the information was available in the past, even if it is not available there anymore.

For me the primary purpose is to provide access to the content on a best-effort basis, not necessarily in the original location. If the original location has been taken down but the content was archived, from my point of view an archive location is more useful than a historically accurate dead link. You are right that “the archive url was never an official homepage”, but that misses the point that by following the link to an archive the user can see the content of the former official homepage.

Both approaches definitely have advantages and disadvantages. The ideal solution would be the ability to provide an alternative “archived” location for obsolete links, explicitly marked as “archived”.

6 Likes

Since URLs are entities capable of having relationships, another possibility for clearly indicating archive links while still keeping the original URL around could be a URL-URL relationship along the lines of “archived version of”. The original URL could still be set as ended (possibly with the date, if known). The archived URL could be automatically displayed if the source URL is marked as ended.

This could also make an automated system easier, since it would be possible for a bot to crawl URLs and add archived versions without having to know if the site really is gone, all without disrupting actual live data.
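To make the idea a bit more concrete, here is a minimal sketch of how such a relationship might behave. Everything in it is hypothetical: the `Url` class, the `archived_versions` field, and the `link_to_display` helper are names made up for illustration and are not part of the MusicBrainz schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Url:
    """Hypothetical URL entity; not the real MusicBrainz data model."""
    address: str
    ended: bool = False
    end_date: Optional[str] = None  # e.g. "2015", if known
    # Targets of the proposed "archived version of" URL-URL relationship.
    archived_versions: List[str] = field(default_factory=list)


def link_to_display(url: Url) -> str:
    """Pick the address to show: the archive if the original has ended."""
    if url.ended and url.archived_versions:
        return url.archived_versions[0]
    return url.address


# A bot could attach an archived version without touching the live URL.
live = Url("http://example.org/")
live.archived_versions.append(
    "https://web.archive.org/web/2015/http://example.org/"
)
print(link_to_display(live))   # still the original, since it has not ended

live.ended = True
print(link_to_display(live))   # now the archived version
```

The point of the sketch is that adding an archived version and marking the original as ended are two independent steps, which is what would let a bot do the first without having to decide the second.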

5 Likes

I think if it’s a 404, then just mark it ended. If it brings you to a site that is potentially virus-laden, pornographic, etc., I’d remove it.

1 Like

Or display ended URLs as plain text, no longer as hyperlinks.
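As a rough illustration of what that could mean on the display side (a sketch only; `render_url` and the `ended-url` class name are invented here and say nothing about how the MusicBrainz server actually renders pages):

```python
from html import escape


def render_url(address: str, ended: bool) -> str:
    """Return an HTML fragment: plain text for ended URLs, a link otherwise."""
    text = escape(address)  # escapes &, <, > and quotes
    if ended:
        # Hypothetical CSS class so ended URLs can be styled differently.
        return f'<span class="ended-url">{text}</span>'
    return f'<a href="{text}">{text}</a>'


print(render_url("http://example.org/?a=1&b=2", ended=True))
print(render_url("http://example.org/", ended=False))
```

The stored data stays exactly the same in both cases; only the presentation changes.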

9 Likes

I voted +1 for Wayback Machine archive links (they are used on other platforms such as Wikipedia); however, justcheckingitout was correct that the WBM is no use if the page was never captured in the first place.

One thing I would suggest, then, is that when people add URLs, they quickly run them through ye olde WBM (with outlinks ticked, as that can help others).

3 Likes
6 Likes

I’m not an SEO, but would keeping around broken, spam, or malicious(!) links penalize MusicBrainz?

What value are we providing, and to whom, by serving up bad URLs?

1 Like

I think it could serve archiving purposes? Like “at some point in time this artist had their homepage at this url” even if it’s not working anymore.
I also think that leaving those URLs as plain text, as @jesus2099 said, would be the safest choice.

5 Likes

Maybe we can rely on the end date of relationship to link to the last relevant snapshot, or something like that.
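Something like the following sketch, for instance. It relies on the Wayback Machine address format `https://web.archive.org/web/<timestamp>/<url>`, which redirects to the capture closest to the given timestamp. The `wayback_url` helper is a made-up name, and note the assumption baked in: the closest capture may be slightly later than the end date, so the result would still need a human check.

```python
from datetime import date
from typing import Optional


def wayback_url(original_url: str, end_date: Optional[date] = None) -> str:
    """Build a Wayback Machine link near the relationship's end date.

    Without an end date, fall back to the undated form, which leads to
    the most recent capture (possibly the wrong one for a hijacked site).
    """
    if end_date is None:
        return f"https://web.archive.org/web/{original_url}"
    timestamp = end_date.strftime("%Y%m%d")
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


print(wayback_url("http://example.org/", date(2015, 6, 30)))
# https://web.archive.org/web/20150630/http://example.org/
```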

2 Likes

The whims of search engines shouldn’t be a factor in deciding how we should store our data, IMO. The value we are providing with outdated URLs is exactly the same as that provided by not deleting entries of inactive artists.

4 Likes

Are you sure? When I read this, it looked kind of like an admin being interested in improving MusicBrainz rankings in Google. So I have to assume that if storing links to spam/malware harms MusicBrainz, then we wouldn’t want to practice this policy.

How so? MusicBrainz pages can be used for tagging, following relationships, etc. Assuming we’re not talking about Internet Archive links, what can someone do with a URL that points to spam or malware?

2 Likes

Seeing as this is somewhat independent of the main discussion here (and is relevant even if a decision is not reached about whether to change things) – I’ve created a bug report for this particular idea.

3 Likes

I would definitely agree that we shouldn’t care about the “whims of search engines”, but I’m not sure I feel the same way about the “behavior of search engines”.
And I would say that dealing with search engines can/should be addressed in how we display our data as opposed to how we store our data.

I see two potential consequences here:

  1. linking to questionable (whatever that means) sites could affect the page-rank of musicbrainz in search engines.
  2. linking to questionable sites could affect the page-rank of those questionable sites in search engines. This raises a sort of ethical question about being a part of inflicting such sites on the world, as well as a practical question of whether we provide an incentive for questionable site-admins to spam musicbrainz with bad links.

1 Like

The main thing that occurs to my mind is that they serve as a useful sort of additional disambiguation. In my personal usage patterns, I’ve found myself trying to cross-reference my personal music-information archive with musicbrainz, and sometimes all I have in a particular entry in my archive is a band name and a URL (and usually a date for a concert not yet in musicbrainz), and it may be a 10-year old entry in my archive. And it’s generally a small-time indie band (or a DJ or a solo artist) who may or may not be very original in choosing a band name.

5 Likes

I’ve ticked “other” and “remove obsolete links”. Since, as pointed out above, mb controls neither what the Wayback archive stores nor whether a valid capture exists there, what I’d prefer is that broken links get moved “out of sight” and only be visible in something like an edit history. Searching for an obsolete link in the Wayback archive could be a (userscript) option, but it might be better not to return an easily clickable Wayback link by default.

Also, hiding rotten links might prevent problems for metadata grabbers and 3rd party services using mb data. I must say that I’m not sure this would really be an issue; I’m just thinking it could be, and an additional “invalid” flag added to a link is probably not good enough for cases where links now lead to nefarious pages.

1 Like

The vote is closed. The sum is more than 100% since it was possible to choose more than one variant. Results:

  • 60%: Keep obsolete link “as is”, add validity date to mark it as obsolete. As @jesus2099 noted, just mark as “ended” if the date is unknown.
  • 50%: Replace obsolete link with a link to Wayback Machine (archived version of the original site).
  • 21%: Remove obsolete link.
  • 5%: Other.
  • 0%: Keep obsolete link “as is”.

The majority (60%) favors keeping the obsolete link “as is” and marking it as obsolete, either by setting the validity date (if known) or by just marking it as “ended”.

@bsammon has created a ticket to show “ended” links in plain text, to prevent users from accidentally following obsolete links. Please consider voting for the ticket:

5 Likes

Since ended links are no longer clickable, can I get any opinions on my suggestion earlier in this thread?

When I go through the pages saved in the Wayback Machine to find a timestamp that contains the correct information, it would be nice to store this somehow to save others the trouble if they want to look up the information as well.

5 Likes

Makes sense from my perspective. If you create a ticket on MetaBrainz JIRA, I will vote for it.

1 Like

I opened https://tickets.metabrainz.org/browse/STYLE-1450.

7 Likes