MusicBrainz doesn't seem to appear in many search results?


Just wondering, is everything looking alright in MusicBrainz’s Google Search Console? Are the sitemaps and similar data being refreshed within reasonable timeframes?

I’m asking because even when I search specifically for artists on Google, I really haven’t noticed any MusicBrainz results, which IMHO is strange, since MusicBrainz is a good source of information in many cases.

7 Likes

This is definitely an issue. I do a lot of searches for artists and can confirm that sites like Discogs, AllMusic, and Metal Archives often appear on the first page of Google results, while MB is completely invisible, even when it’s the only site of this kind with any info about an artist.
Sitemaps are submitted and Googlebot is crawling, but it seems to me there’s a problem. We know the site is likely ranked down because it isn’t mobile compatible (but the devs are working on the React move, which should let us fix that in the near future).
Still, my feeling is that something is wrong, because searching for a not-very-popular band that has been in MB for years still yields no results.

Any SEO expert around?

10 Likes

I think I usually find MB on the second page in DuckDuckGo (but I usually use the !mb “bang” to search MB directly).

Update, here are a few cases:

Update 2: I’d heard of Qwant and wanted to test it (same set):

My uninformed suspicion is that if MB had Google ads appearing on the page, then MB would appear higher in search results.

In general, finding pages that are not monetized has become, over the last say 5 years, very difficult on Google.

5 Likes

Harder to find without ads, maybe, but MB doesn’t seem to exist or be indexed at all in a majority of cases.

1 Like

If it’s not a secret, what does the “Coverage” page on search console show? And does the “Sitemap” page show the sitemap list and all the sitemaps?

I had trouble with Google ignoring my sitemap list (the sitemaps in it weren’t listed or crawled) and had to make a script to submit every index page into the sitemap list; maybe that’s the case here too?
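Not the script the poster actually ran, but a minimal sketch of such a resubmission script, assuming hypothetical sitemap URLs and using Google’s sitemap ping endpoint (which was available at the time, though it has since been deprecated):

```python
from urllib.parse import urlencode

# Google's (now-deprecated) sitemap resubmission endpoint.
GOOGLE_PING = "https://www.google.com/ping"

def ping_url(sitemap_url: str) -> str:
    """Build the GET URL that asks Google to (re)crawl one sitemap."""
    return GOOGLE_PING + "?" + urlencode({"sitemap": sitemap_url})

# Hypothetical sitemap index pages to resubmit individually.
sitemaps = [
    "https://example.org/sitemap-1.xml",
    "https://example.org/sitemap-2.xml",
]

for sm in sitemaps:
    print(ping_url(sm))
    # A real script would issue an HTTP GET here, e.g. with
    # urllib.request.urlopen(ping_url(sm)), and check for a 200 response.
```

The idea is simply to ping each sitemap file directly rather than relying on Google to discover them via the sitemap index.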

1 Like

Not a secret, here are screenshots from search console:


Few errors, mostly due to transient connectivity issues.

3 Likes

What does it say as the explanation for the excluded pages? (“Crawled but not indexed”?) I’m also wondering whether there are any missing sitemaps, or sitemaps that haven’t been read for years.

| Status | Type | Validation | Pages |
| --- | --- | --- | --- |
| Warning | Indexed, though blocked by robots.txt | Not Started | 55590 |
| Error | Submitted URL seems to be a Soft 404 | Started | 526 |
| Error | Submitted URL has crawl issue | Started | 30 |
| Error | Submitted URL blocked by robots.txt | N/A | 0 |
| Error | Server error (5xx) | N/A | 0 |
| Excluded | Crawled - currently not indexed | N/A | 382728 |
| Excluded | Discovered - currently not indexed | N/A | 303697 |
| Excluded | Duplicate without user-selected canonical | N/A | 218684 |
| Excluded | Alternate page with proper canonical tag | N/A | 67009 |
| Excluded | Blocked by robots.txt | N/A | 66672 |
| Excluded | Soft 404 | N/A | 46018 |
| Excluded | Page with redirect | N/A | 22286 |
| Excluded | Crawl anomaly | N/A | 13767 |
| Excluded | Excluded by ‘noindex’ tag | N/A | 20 |
| Excluded | Duplicate, submitted URL not selected as canonical | N/A | 12 |
| Excluded | Not found (404) | N/A | 4 |
| Valid | Indexed, not submitted in sitemap | N/A | 590560 |
| Valid | Submitted and indexed | N/A | 5437 |

1 Like

This sounds a bit suspicious, especially in these amounts; is this intentional, or has Google missed a bunch of sitemaps?

200k pages is a lot; what does Google count as a duplicate? Is the canonical URL configuration correct?

This looks very familiar though, I have a site with a similar issue, a lot of pages being excluded:

I tried a lot of things and noticed that the only noticeably effective “cure” was reducing the number of errors: for every fixed error page, Google seemed willing to index tens if not hundreds more pages. My hunch is that Googlebot crawls a site until it encounters an error, then enters some massive cooldown period, sometimes skipping the broken page before continuing.

2 Likes

I didn’t take the time to try all your sets, but it looks similar on Bing.

2 Likes

> This sounds a bit suspicious, especially in these amounts; is this intentional, or has Google missed a bunch of sitemaps?

In the sitemaps, we actually only list pages that have embedded JSON-LD markup, to ensure those are fully ingested by Google (we even supply hourly, incremental sitemap updates to them). The only reason we have sitemaps to begin with is that Google contracted us to embed semantic markup (JSON-LD) in our pages, and needed a way for us to ping them when any of the markup changed.

So I’m not surprised if it says a ton of pages aren’t in the sitemaps.

> 200k pages is a lot; what does Google count as a duplicate? Is the canonical URL configuration correct?

I checked which URLs it’s complaining about for this, and the vast majority are random URLs from our FTP site, nothing MusicBrainz related. For example… http://ftp.musicbrainz.org/pub/ros/ros_docs_mirror/electric/api/bmp085/html/structbmp085__smd500__calibration__param__t-members.html

Though there are a few MB ones like https://musicbrainz.org/artist/3ebb2aa0-c5ac-4aaf-b654-d6ad17526508?va=0, I assume because ?va=0 is a no-op there. That’s something we could improve.
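A common fix for parameter variants like this is a `rel=canonical` tag pointing at the parameter-free URL, so Google folds the `?va=0` variant into the main page. MB pages may already carry such a tag; this is just an illustrative fragment:

```html
<!-- In the <head> of the ?va=0 variant of the artist page -->
<link rel="canonical"
      href="https://musicbrainz.org/artist/3ebb2aa0-c5ac-4aaf-b654-d6ad17526508" />
```

With a canonical in place, the variant should land under “Alternate page with proper canonical tag” rather than “Duplicate without user-selected canonical”.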

5 Likes

Maybe worth excluding the ftp share using robots.txt?
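For instance, a `robots.txt` served at the root of `ftp.musicbrainz.org` could disallow crawling of the share entirely (a hypothetical fragment; which paths are actually worth blocking would need checking):

```
User-agent: *
Disallow: /
```

That would keep those mirror pages from competing with musicbrainz.org pages for crawl budget.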

You can use this page and set va and any other no-op parameters there as “No: Doesn’t affect page content (ex: tracks usage)”.

Any chance they could be contacted again to see why Google is indexing so few pages?

Here is an example of a Release that does not appear in my Google results:

The “issue” is that ?va=0 is sometimes not a no-op, so it can affect page content. Maybe the solution would be to not include that link for artists where it wouldn’t display or change anything? Edit: the problem with that solution is that users then wouldn’t be “trained” to expect the link to be there, which might (or might not!) be a UX issue. Whatever we do, it’s always a compromise… :slight_smile:

2 Likes

This seems like a fair compromise? IMHO the lack of visibility on search engines is a bit worse than users not being trained to expect the parameter there.

I’ve made a ticket for this one thing now at least:

3 Likes