CD Baby store closes on 31 March


Starting on April 1st, our retail store will transition into a download portal where fans can download previous purchases and redeem download cards.

The store wasn’t profitable enough, so CD Baby is closing it down. If there’s anything you want to scrape (or save in the Internet Archive), now is probably the time.

Some courtesy links:


Thanks, there are a number of artist profiles that may not exist anywhere else, which I will need to move or put in the Wayback Machine.


If the Wayback Machine fails on any page, try For me it worked successfully on release pages that Wayback Machine couldn’t process.


Should we alert the IA to this, so we can get some help archiving everything? (I know this was floated in the thread where it was revealed FreeDB would be shutting down the same day.)


Sounds like a good idea to me. I sent an email to let them know about the closure and about the problem that the Wayback Machine has with archiving product pages. (Also made a donation 'cause they rock.) 🙂


Sorry if I’m late in noticing this, but the Internet Archive has made a major update to the Wayback Machine, and it can now save CD Baby album pages.


I’ve been having issues connecting to the IA’s servers to save all these pages. Right now, it’s alternating between 502s and ERR_CONNECTION_REFUSED. I hope this means they’re stressed out from lots of people archiving CD Baby pages before the latter site shuts down…

ETA: These issues are intermittent. I also occasionally get internal server errors, 503s and plain ol’ timeouts.

I hope this post doesn’t dissuade anyone from archiving all the links they can.
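Since these failures are intermittent (502s, 503s, refused connections, timeouts), a script that saves many pages can simply retry with exponential backoff instead of giving up. Below is a minimal Python sketch. It assumes Save Page Now still accepts a plain GET to `https://web.archive.org/save/<url>`. The function names, the set of retryable statuses, and the delay values are illustrative choices, not anything IA documents.

```python
import random
import time
import urllib.error
import urllib.request
from typing import Callable, TypeVar

T = TypeVar("T")

# Statuses reported as intermittent in this thread; treating them as retryable is an assumption.
RETRYABLE_STATUSES = {500, 502, 503, 504}


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Exponential backoff: base * 2**attempt seconds, capped at `cap`."""
    return min(cap, base * (2 ** attempt))


def with_retries(action: Callable[[], T], max_attempts: int = 5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `action`, retrying transient HTTP/connection failures with backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        last_attempt = attempt == max_attempts - 1
        try:
            return action()
        except urllib.error.HTTPError as err:
            # HTTPError subclasses URLError, so it must be handled first.
            if err.code not in RETRYABLE_STATUSES or last_attempt:
                raise
        except (urllib.error.URLError, TimeoutError, ConnectionError):
            # Refused connections and timeouts usually surface here.
            if last_attempt:
                raise
        # Wait longer after each failure; the jitter de-synchronises parallel scripts.
        sleep(backoff_delay(attempt) + random.uniform(0, 1))
    raise AssertionError("unreachable")


def save_page(page_url: str, timeout: float = 60.0) -> int:
    """Request a capture via Save Page Now (plain GET form, assumed still accepted)."""
    with urllib.request.urlopen("https://web.archive.org/save/" + page_url,
                                timeout=timeout) as resp:
        return resp.status
```

For example, call `with_retries(lambda: save_page(url))` for each URL in your list. Injecting `sleep` as a parameter lets the retry logic be tested without actually waiting.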


This came up on the IA forums a while back. IIRC, someone claimed that persistent errors of this sort went away after they cleared their cache (and maybe cookies and site settings/data).

If you already did that and still get 502s or 503s, maybe it has to do with the National Emergency Library going online?


The day is finally upon us…

Anyone have any thoughts on how to scrape Google’s search results and cache without risking an IP ban? Most of the unarchived data was for artists and albums from the past three years. Google has at least 75,000 albums from this time period in their cache, plus more than 500,000 artist pages.

The goal is to rescue artist bios, artist area info, album descriptions, personnel credits, and recording details that can’t be found anywhere else.


@reosarevok, could we disable the CD Baby URL cleanup now?
Or is there another solution to my issue?
A CD was linked to
When I tried to mark this URL as ENDED, it was first automatically changed from to

This makes it impossible to explore the archived version of and leads only to an empty Web Archive page for instead… 😕


IA never fully crawled, so there are only a few saved pages here and there. Before the store’s subdomain changed in May 2017, was crawlable for many years; IA has lots of archives from that period. Also, the URLs for album pages stayed the same for more than ten years.

The best approach might be to use the Wayback Availability JSON API to determine which artist/album pages were archived from each of the two subdomains. Some pages can only be found at, some only at [www.] Since each subdomain represents a different time period, you need both to have a more complete history.
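Here is a minimal Python sketch of that approach. It assumes the Availability API's documented response shape, where `archived_snapshots.closest` carries `available`, `url`, and `timestamp` fields. The helper names are hypothetical. The optional `timestamp` parameter biases which snapshot counts as "closest", which helps when each host covers a different period.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Optional

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build the request URL; `timestamp` is YYYYMMDDhhmmss (truncation allowed)."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the Wayback URL of the closest snapshot, or None if nothing was archived."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check_hosts(path: str, hosts: List[str], timeout: float = 30.0) -> Dict[str, Optional[str]]:
    """Look up the same page path under each host; map host -> snapshot URL or None."""
    results: Dict[str, Optional[str]] = {}
    for host in hosts:
        with urllib.request.urlopen(availability_query(host + path), timeout=timeout) as resp:
            payload = json.load(resp)
        results[host] = closest_snapshot(payload)
    return results
```

For example, `check_hosts("/cd/example-album", ["www.example.com", "store.example.com"])` checks one album path under both hosts. The hostnames and path here are placeholders; substitute the two CD Baby subdomains. A `None` under one host doesn't mean the page was never archived, which is why both are checked.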


I would just like MB not to change my URL when I add the ENDED flag.
As the CD Baby store no longer exists, we should (or could?) remove this normalisation code.


I agree, your link should not have been changed.

In fact, even if a CD Baby URL hasn’t been marked “ended”, I don’t think the URLs should be “updated” to Some artist/album pages only had functional URLs at, while others were created during the era (May 2017 – March 2020).