Wayback Machine downloads

psychoadept · July 5, 2020, 7:44am

If a recording was once available for free on a website but no longer is, and it is archived by the Wayback Machine, is it OK to still put a “download for free” link on the recording?

Specifically:

jesus2099 · July 5, 2020, 8:29am

You can check the ENDED checkbox of the release-URL relationship.
Click the info link next to the URL, then the small pencil icon.

If you know the end date, you can also set that.

But we should store the original URL, not the archived one.
User scripts take care of providing the matching archived links on ENDED URLs.

psychoadept · July 5, 2020, 9:09am

What userscripts, specifically? And why not link directly to the archived URL rather than relying on userscripts?

jesus2099 · July 6, 2020, 8:04am

It’s mb. ALL LINKS, in my GitHub repository jesus2099/konami-command (power‐ups for various web sites).

Keeping the genuine URL documented is clearer and more historically informative.
Otherwise, we would have to manually edit dead links from the genuine URL to the archive URL.
The web archive’s URL format could change, and we would then have to manually edit all archived links again.

So it’s better to leave the genuine URL in place and let the MB website show you the computed archive link (for all ended URLs).
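A minimal sketch of what “computing” the archive link could look like, assuming the Wayback Machine listing form `https://web.archive.org/web/*/<url>` that comes up later in the thread. The function name is hypothetical and this is not the userscript’s actual code. Because only the original URL is stored, a change in the archive’s URL format would mean changing this one function instead of editing every link.

```python
from urllib.parse import urlsplit

# Assumed listing form: the Wayback Machine page showing all captures of a URL.
WAYBACK_LISTING = "https://web.archive.org/web/*/"


def computed_archive_link(original_url: str) -> str:
    """Hypothetical helper: derive an archive link for an ENDED URL.

    Only the original URL is stored; the archive link is computed on display.
    """
    parts = urlsplit(original_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {original_url!r}")
    # Sketch only: the original URL is appended as-is, without extra escaping.
    return WAYBACK_LISTING + original_url


print(computed_archive_link("http://example.com/some-page"))
# https://web.archive.org/web/*/http://example.com/some-page
```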

psychoadept · July 6, 2020, 4:14pm

I mean, I can add the historical URL based on the Wayback Machine, but this is a newly added recording, so at this point the extra effort goes into updating it with the old link, not the new one.

In any case, my question was really about whether it’s appropriate to use Wayback Machine downloads in place of the originals, so I guess that’s a yes?

bsammon · July 6, 2020, 5:56pm

I really don’t like the idea of designing/configuring MusicBrainz in a way that assumes/requires that people will use user scripts. Isn’t that effectively assuming/requiring that people will use one of a very limited set of browsers?
I don’t even like assuming/requiring that users have JavaScript turned on (which is currently the case for maybe 50% of editing functionality and 5% of viewing functionality).

The ideal solution would be to update the schema and editing UI so that a URL entry can have an “original URL” attribute and an “archive URL” attribute. The “archive URL” field would then pop up on the edit page whenever the “item has ended” checkbox is checked.
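The proposed data shape can be sketched as a small record type. This is a hypothetical illustration of the proposal, not MusicBrainz’s real schema; the class and field names are assumptions. It enforces the rule that an archive URL only exists once the item has ended, and prefers the human-chosen archive link when displaying an ended URL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UrlRelationship:
    """Hypothetical record for the proposal; not the actual MB schema."""

    original_url: str
    ended: bool = False
    archive_url: Optional[str] = None

    def __post_init__(self) -> None:
        # The "archive URL" field is only available once the item has ended.
        if self.archive_url is not None and not self.ended:
            raise ValueError("archive_url requires ended=True")

    def display_url(self) -> str:
        # For ended items, prefer an archive link an editor has verified.
        if self.ended and self.archive_url:
            return self.archive_url
        return self.original_url


# Usage (the snapshot timestamp below is made up for illustration):
rel = UrlRelationship(
    "http://example.com/song.mp3",
    ended=True,
    archive_url="https://web.archive.org/web/20180101000000/http://example.com/song.mp3",
)
print(rel.display_url())
```

Keeping both fields means the genuine URL stays documented (jesus2099’s concern) while a researched archive link can still be shared (bsammon’s concern).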

mmirG · July 7, 2020, 1:52am

I share bsammon’s view.
And I’d add another downside of that approach: it makes MB less friendly to those global citizens who aren’t members of the technological elite.

In the short and medium term, I can see how a website that offers a degraded UX without scripts can be a reasonable way of balancing resources against UX for a wide range of people.
But in the longer term it seems very exclusionary in its effect.

This goes back to the idea of MB deciding who its target populations are,
and then prioritising delivering an encyclopedia those populations can enjoy using.

highstrung · July 7, 2020, 6:33am

Wouldn’t a computed archive link be misleading in the cases where specific URLs aren’t available from the wayback machine? In my experience that’s a significant percentage.
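One way to avoid a blind link, sketched under assumptions: ask the Wayback Machine’s Availability JSON API (`https://archive.org/wayback/available?url=...&timestamp=...`) whether a snapshot exists, and only show an archive link if it does. The response shape assumed here (`archived_snapshots`, then `closest` with `available` and `url`) is my reading of the API documentation and should be verified; the helper names are hypothetical. An existing snapshot still isn’t guaranteed to be a useful one.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build the Availability API query; timestamp is YYYYMMDD[hhmmss]."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_API + "?" + urlencode(params)


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the closest snapshot URL, or None if nothing is archived."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check(url: str, timestamp: Optional[str] = None) -> Optional[str]:
    # Network call: only offer an archive link when a snapshot actually exists.
    with urlopen(availability_query(url, timestamp), timeout=10) as resp:
        return closest_snapshot(json.load(resp))
```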

psychoadept · July 7, 2020, 6:37am

Or it might be there in another location, for that matter. I’ve noticed that website redesigns result in all kinds of screwy archives.

jesus2099 · July 7, 2020, 2:22pm

We should just keep the original URL and MB should compute and display the archive URL when it’s ENDED:

yindesu · July 7, 2020, 3:50pm

What about when somebody else buys the domain and turns it into spam or malware? I don’t think an end date property addresses this very well.

bsammon · July 7, 2020, 4:58pm

My concern with this is that I doubt a computer-calculated archive URL would be as good as one that had been researched/verified by an actual person. “https://web.archive.org/web/*/” URLs can be easily computer-calculated, but they are often just links to a page that says “Here’s 50 times we tried to archive the page–go spend an hour figuring out which ones are useful”.
Determining which of the various snapshots on web.archive.org are useful is a more challenging task (for a computer program) and if someone has already gone to the trouble to manually research it, they should be able to share their results with the world.

Also, anyone interested in this from a technical standpoint should research Wikipedia’s Internet Archive bot (I haven’t) which is itself not 100% foolproof.

bsammon · July 7, 2020, 5:06pm

Also, interesting reading at https://en.wikipedia.org/wiki/Wikipedia:Link_rot for some insight on how others deal with this issue.

bsammon · July 7, 2020, 5:25pm

more interesting reading:

a previous discussion of the topic:

jesus2099 · July 7, 2020, 6:57pm

We can use the date the URL was added, and the link will go to the nearest available archived page.
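As a sketch of the date-of-add idea: the Wayback Machine accepts a timestamp (a partial one such as YYYYMMDD, as I understand it) in `https://web.archive.org/web/<timestamp>/<url>` and redirects to the closest capture it has. The function name is hypothetical. Note that “closest” can be before or after the given date.

```python
from datetime import date


def nearest_snapshot_link(original_url: str, added: date) -> str:
    """Hypothetical helper: link to the capture nearest the date the URL was added.

    The Wayback Machine redirects a timestamped link to the closest capture,
    which may fall before or after that date.
    """
    return f"https://web.archive.org/web/{added:%Y%m%d}/{original_url}"


print(nearest_snapshot_link("http://example.com/some-page", date(2020, 7, 5)))
# https://web.archive.org/web/20200705/http://example.com/some-page
```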

jesus2099 · July 7, 2020, 7:00pm

We can display the URL (which is already tucked away in the relationships tab, away from the sidebar) as plain text only, without a hyperlink. Or maybe as a tooltip.
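A rough sketch of that display rule, assuming hypothetical markup (the `ended-url` class and the “archived” label are made up): the original URL is shown as escaped plain text and reused as the tooltip, so a domain that has since been taken over is never hyperlinked, and only the archive link is clickable.

```python
from html import escape


def render_ended_url(original_url: str, archive_url: str) -> str:
    """Hypothetical rendering of an ENDED URL relationship.

    The original URL is plain text and a tooltip, never a hyperlink;
    only the archive URL is clickable.
    """
    original = escape(original_url, quote=True)
    archive = escape(archive_url, quote=True)
    return (
        f'<span class="ended-url">{original}</span> '
        f'(<a href="{archive}" title="{original}">archived</a>)'
    )
```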

aerozol · July 7, 2020, 9:50pm

Would it solve this to automatically link to the last archived page before the end date?

And the link would only be a direct link if there is an end date. We could also have a broader ‘ended’ checkbox that just makes it link to the Wayback Machine page showing all the archived dates, so the user can choose where to go from there. …or just do that for all of them, I guess.
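The “last archived page before the end date, else the listing page” rule could be sketched with the Wayback CDX server API, assuming its `to`, `filter=statuscode:200` and `output=json` parameters behave as documented (JSON output as rows, with the first row naming the fields). The function names are hypothetical, and the network fetch is left out so the selection logic stands alone.

```python
from datetime import date
from typing import List, Optional
from urllib.parse import urlencode

CDX_API = "https://web.archive.org/cdx/search/cdx"


def cdx_query(original_url: str, end: date) -> str:
    """Query for captures up to the end date that returned HTTP 200."""
    params = {
        "url": original_url,
        "to": f"{end:%Y%m%d}",  # assumes a partial timestamp is accepted
        "filter": "statuscode:200",
        "output": "json",
    }
    return CDX_API + "?" + urlencode(params)


def pick_link(original_url: str, end: Optional[date], rows: List[List[str]]) -> str:
    """Direct link to the last matching capture, else the listing page.

    rows is the parsed JSON from cdx_query; rows[0] is the header row.
    """
    listing = f"https://web.archive.org/web/*/{original_url}"
    if end is None or len(rows) < 2:
        return listing  # no end date, or no usable capture: let the user choose
    header, captures = rows[0], rows[1:]
    ts = header.index("timestamp")
    orig = header.index("original")
    # 14-digit timestamps sort lexically in chronological order.
    last = max(captures, key=lambda row: row[ts])
    return f"https://web.archive.org/web/{last[ts]}/{last[orig]}"
```

This only filters on HTTP status, so a later capture that is an error page served with status 200 would still be picked; that check stays manual.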

highstrung · July 7, 2020, 11:52pm

The end date would presumably be the date when an editor noticed the link is no longer valid; we don’t know how long before that the link stopped working.

When archive.org has multiple snapshots, finding the last meaningful snapshot is a manual process, in my experience. I’ve found numerous cases where later snapshots are just error pages.

aerozol · July 8, 2020, 12:12am

Naturally, the text and buttons would be tweaked to make it clear that the end date is the date you know a site was removed, and that it is used to link to a specific Wayback page.

Your point about later snapshots being error pages is exactly what this would be meant to address.

highstrung · July 8, 2020, 1:01am

I still don’t see how. If I notice in July 2020 that example.com/some-page is no longer working, I put that as the end date of the link. But the last good snapshot may be from 2018 with a bunch of error page snapshots between then and now. I don’t know a way to find that last good snapshot other than manually going to the wayback machine and hunting through all the snapshots they have.