CD Baby store closes on 31 March

Starting on April 1st, our retail store will transition into a download portal where fans can download previous purchases and redeem download cards.

The store wasn’t profitable enough, so CD Baby is closing it down. If there’s anything you want to scrape (or save in the Internet Archive), now is probably the time.

Some courtesy links:


Thanks, there are a number of artist profiles that may not exist elsewhere, which I will need to move or save in the Wayback Machine.


If the Wayback Machine fails on any page, try For me it worked successfully on release pages that Wayback Machine couldn’t process.


Should we alert the IA to this, so we can get some help archiving everything? (I know this was floated in the thread where it was revealed FreeDB would be shutting down the same day.)


Sounds like a good idea to me. I sent an email to let them know about the closure and about the problem that the Wayback Machine has with archiving product pages. (Also made a donation 'cause they rock.) :slightly_smiling_face:


Sorry if I’m late in noticing this, but the Internet Archive has made a major update to the Wayback Machine, and it can now save CD Baby album pages.


I’ve been having issues connecting to the IA’s servers to save all these pages. Right now, it’s alternating between 502s and ERR_CONNECTION_REFUSED. I hope this means they’re stressed out from lots of people archiving CD Baby pages before the latter site shuts down…

ETA: These issues are intermittent. I also occasionally get internal server errors, 503s and plain ol’ timeouts.

I hope this post doesn’t dissuade anyone from archiving all the links they can.
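For anyone batch-saving pages while the servers are flaky, here is a minimal sketch of retrying with exponential backoff. It assumes the public `https://web.archive.org/save/<url>` form of Save Page Now behaves the same from a script as it does in the browser. The function names, the set of status codes treated as transient, and the delay values are my own picks rather than anything the IA documents.

```python
import time
import urllib.error
import urllib.request

# Assumption: the public "Save Page Now" URL prefix.
SAVE_ENDPOINT = "https://web.archive.org/save/"
# Assumption: status codes treated as temporary and worth retrying.
RETRYABLE = {500, 502, 503, 504}


def backoff_delays(retries, base=5.0, cap=300.0):
    """Seconds to wait before each retry: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * 2 ** i) for i in range(retries)]


def _fetch(target):
    """Request the target URL and read the whole response."""
    with urllib.request.urlopen(target, timeout=120) as resp:
        resp.read()


def save_page(url, attempts=5, fetch=_fetch, sleep=time.sleep):
    """Ask the Wayback Machine to capture `url`.

    Returns True once a request succeeds and False if every attempt hit a
    transient error. Non-transient HTTP errors (e.g. 404) are re-raised.
    """
    delays = backoff_delays(attempts - 1)
    for attempt in range(attempts):
        if attempt:
            sleep(delays[attempt - 1])  # wait longer before each retry
        try:
            fetch(SAVE_ENDPOINT + url)
            return True
        except urllib.error.HTTPError as err:
            if err.code not in RETRYABLE:
                raise
        except OSError:
            # Covers refused connections and timeouts (URLError is an OSError).
            pass
    return False
```

Calling `save_page(url)` for each page in a list, with a pause between pages, should be gentler on their servers than hammering refresh.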

This came up on the IA forums a while back. IIRC, someone claimed that persistent errors of this sort went away after they cleared their cache (and maybe cookies and site settings/data).

If you already did that and still get 502s or 503s, maybe it has to do with the National Emergency Library going online?

The day is finally upon us…

Anyone have any thoughts on how to scrape Google’s search results and cache without risking an IP ban? Most of the unarchived data was for artists and albums from the past three years. Google has at least 75,000 albums from this time period in their cache, plus more than 500,000 artist pages.

The goal is to rescue artist bios, artist area info, album descriptions, personnel credits, and recording details that can’t be found anywhere else.
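Before scraping anything, it may help to narrow the list down to pages the Wayback Machine really doesn't have. A rough sketch, assuming the Wayback availability endpoint at `https://archive.org/wayback/available` keeps returning JSON with an `archived_snapshots` / `closest` structure. The helper names here are hypothetical, and the page list would have to come from somewhere else.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Wayback Machine availability endpoint.
AVAILABILITY_API = "https://archive.org/wayback/available"


def closest_snapshot(payload):
    """Return the snapshot URL from a decoded availability response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url):
    """Query the availability endpoint for one page (makes a network request)."""
    query = urllib.parse.urlencode({"url": page_url})
    with urllib.request.urlopen(f"{AVAILABILITY_API}?{query}", timeout=60) as resp:
        return closest_snapshot(json.load(resp))


def unarchived(page_urls, lookup=lookup):
    """Filter a list of page URLs down to those with no snapshot found."""
    return [url for url in page_urls if lookup(url) is None]
```

The output of `unarchived(...)` would then be the to-do list for whatever rescue method people settle on.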