Internet Archive apparently has (far!) fewer items in their "coverartarchive" collection than MB's stats report

coverartarchive
internetarchive

#1

If you look up all items in the “coverartarchive” collection at Internet Archive you end up with just over 1,300 (1,313 as of this writing), but MusicBrainz’s Cover Art statistics report that 780,717 releases (as of the last update) should have cover art. That is a discrepancy of 779,404 items/releases.
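The comparison above can be reproduced roughly as follows. This is a sketch assuming the `internetarchive` Python package, whose `Search.num_found` property reports the hit count for a query; the MusicBrainz figure has to be copied by hand from the statistics page, and the IA count depends on what the IA search index happens to include when you run it. The helper names are illustrative.

```python
def discrepancy(expected: int, found: int) -> int:
    """Releases MusicBrainz says have cover art, minus items the IA search returns."""
    return expected - found


def ia_collection_count(collection: str) -> int:
    """Number of items the Internet Archive *search index* reports for a collection."""
    # Imported lazily so discrepancy() works without the package installed.
    import internetarchive as ia

    # num_found is the total hit count of the query, not the number of items
    # that actually exist in the collection.
    return ia.search_items(f"collection:{collection}").num_found
```

For example, `discrepancy(780_717, ia_collection_count("coverartarchive"))` gives the gap between the two figures at the time it is run.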

Does anyone know what’s going on here? @bitmap?


#2

It looks like a problem with the Cover Art Archive’s search. For example, searching for “Noisia” gives just one hit, but if you look in MusicBrainz you’ll find that most releases do have cover art.

At least it seems that MusicBrainz’ statistics are correct.
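One way to check a single release without going through the Internet Archive search at all is to ask the Cover Art Archive's own JSON API. The sketch below assumes the public endpoint `https://coverartarchive.org/release/<MBID>`, which returns a listing with an `images` array (each entry carrying `front` and `image` fields), or HTTP 404 when the release has no cover art. `fetch_listing` and `front_image_url` are hypothetical helper names, not part of any library.

```python
import json
import urllib.error
import urllib.request
import uuid
from typing import Optional

# Cover Art Archive endpoint listing all images for a release (assumed URL pattern).
CAA_RELEASE_URL = "https://coverartarchive.org/release/{mbid}"


def front_image_url(listing: dict) -> Optional[str]:
    """Return the URL of the image flagged as the front cover, if any."""
    for image in listing.get("images", []):
        if image.get("front"):
            return image.get("image")
    return None


def fetch_listing(release_mbid: str) -> Optional[dict]:
    """Fetch the CAA JSON listing for a release; None if it has no cover art."""
    # uuid.UUID rejects malformed MBIDs with ValueError before any request is made.
    url = CAA_RELEASE_URL.format(mbid=str(uuid.UUID(release_mbid)))
    try:
        # urllib follows the CAA's redirect to the underlying archive.org file.
        with urllib.request.urlopen(url, timeout=30) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no cover art stored for this release
            return None
        raise
```

For example, `front_image_url(fetch_listing(mbid) or {})` yields the front cover URL for a release, or `None` if there is none.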


#3

Yeah, I think the error is on the Internet Archive side as well, which means that you can’t reliably do something like

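```python
import internetarchive as ia

# Query the IA search index for every item in the collection
search = ia.search_items('collection:coverartarchive')

for result in search:
    # Each result is a dict holding the item's 'identifier'; fetch and download it
    ia.get_item(result['identifier']).download()
```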

(for downloading the entire coverartarchive) or other possibly nifty things via the IA’s own API. :frowning:
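Because the search index is the unreliable part, a bulk download can instead start from a list of release MBIDs (for instance taken from MusicBrainz data) and address the items directly. This sketch assumes the Cover Art Archive names its Internet Archive items `mbid-<release MBID>`, and uses the `internetarchive` package's `get_item()`, `Item.exists` and `Item.download()`; `caa_identifier` and `download_cover_art` are hypothetical helper names.

```python
import uuid


def caa_identifier(release_mbid: str) -> str:
    """Map a MusicBrainz release MBID to its (assumed) Internet Archive identifier."""
    # uuid.UUID validates the MBID (ValueError if malformed) and lowercases it.
    return "mbid-" + str(uuid.UUID(release_mbid))


def download_cover_art(release_mbids, destdir="caa"):
    """Download CAA items by identifier instead of relying on IA search results."""
    # Imported lazily so caa_identifier() works without the package installed.
    import internetarchive as ia

    for mbid in release_mbids:
        item = ia.get_item(caa_identifier(mbid))
        if item.exists:  # skip releases that have no CAA item
            item.download(destdir=destdir)
```

Only the identifier mapping works offline; `download_cover_art(...)` fetches every file in each item, so `destdir` should point somewhere with plenty of space.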


#4

Yep, they’re removing the coverartarchive collection from their public search indexes because, as I was told, the results are causing confusion with their digitized CD results. I guess we should be building our own CAA search eventually.


#5

Right. They are still there and available, but they do not come up in search. There are 834,452 items (covers).

The reason we did this is that they seemed confusing to normal users. We are working on getting our music collection fixed up, and thought it would not hurt MB’s use of the Cover Art Archive to keep the items archived but out of search results. Sound reasonable?

-brewster