MusicBrainz doesn't seem to appear in many search results?

Just wondering: does everything look alright in MusicBrainz’s Google Search Console, and do the sitemaps and such get refreshed in reasonable timeframes?

I’m asking this because even when specifically searching for artists on Google I really haven’t noticed any MusicBrainz results, which is IMHO strange because MusicBrainz is a good source of information in many cases.


This is definitely an issue. I’m doing a lot of searches for artists and can confirm that sites like Discogs, AllMusic, and Metal Archives often appear on the first Google results page, while MB is perfectly invisible, even when it’s the sole site of this kind with info about an artist.
Sitemaps are submitted and Googlebot is crawling, but it seems to me there’s an issue. We know the site is likely ranked lower because it isn’t mobile compatible (but the devs are working on the React move, which should let us fix that in the near future).
But still, my feeling is that something is wrong, because searching for a not-very-popular band that has been in MB for years still doesn’t yield results.

Any SEO expert around?


I think I usually find MB on the second page in DuckDuckGo (but I usually use the !mb “bang” to go directly to MB search).

Update, here are a few cases:

Update 2: I’ve heard of Qwant and wanted to test it (same set):


My uninformed suspicion is that if MB had Google ads appearing on the screen, then MB would appear higher in search results.

In general, finding pages that are not monetized has become, over the last say 5 years, very difficult on Google.


Harder to find without ads, maybe, but it doesn’t seem to exist or be indexed in a majority of cases.


If it’s not a secret, what does the “Coverage” page on search console show? And does the “Sitemap” page show the sitemap list and all the sitemaps?

I had trouble with Google ignoring my sitemap list (the sitemaps in it weren’t listed or crawled) and had to make a script to submit every index page in the sitemap list individually; maybe this is the case here too?
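For reference, a minimal sketch of such a script, assuming Google’s sitemap ping endpoint; the sitemap filenames below are hypothetical examples, not the real MusicBrainz ones:

```python
from urllib.parse import quote

# Hypothetical sitemap files that a sitemap index would normally reference.
SITEMAPS = [
    "https://musicbrainz.org/sitemap-index.xml",
    "https://musicbrainz.org/sitemap-artists-1.xml",
]

def ping_urls(sitemaps):
    """Build one ping URL per sitemap, so each gets submitted individually
    instead of relying on Google to expand the sitemap index itself."""
    return ["https://www.google.com/ping?sitemap=" + quote(s, safe="")
            for s in sitemaps]

for url in ping_urls(SITEMAPS):
    # To actually submit, fetch each URL, e.g. urllib.request.urlopen(url).
    print(url)
```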


Not a secret, here are screenshots from search console:

Few errors, mostly due to transient connectivity issues.


What does it say as the explanation for the excluded pages? (“Crawled but not indexed”?) I’m also wondering whether there are any missing sitemaps, or sitemaps that haven’t been read for years.

| Status | Type | Validation | Pages |
| --- | --- | --- | --- |
| Warning | Indexed, though blocked by robots.txt | Not Started | 55590 |
| Error | Submitted URL seems to be a Soft 404 | Started | 526 |
| Error | Submitted URL has crawl issue | Started | 30 |
| Error | Submitted URL blocked by robots.txt | N/A | 0 |
| Error | Server error (5xx) | N/A | 0 |
| Excluded | Crawled - currently not indexed | N/A | 382728 |
| Excluded | Discovered - currently not indexed | N/A | 303697 |
| Excluded | Duplicate without user-selected canonical | N/A | 218684 |
| Excluded | Alternate page with proper canonical tag | N/A | 67009 |
| Excluded | Blocked by robots.txt | N/A | 66672 |
| Excluded | Soft 404 | N/A | 46018 |
| Excluded | Page with redirect | N/A | 22286 |
| Excluded | Crawl anomaly | N/A | 13767 |
| Excluded | Excluded by ‘noindex’ tag | N/A | 20 |
| Excluded | Duplicate, submitted URL not selected as canonical | N/A | 12 |
| Excluded | Not found (404) | N/A | 4 |
| Valid | Indexed, not submitted in sitemap | N/A | 590560 |
| Valid | Submitted and indexed | N/A | 5437 |

This sounds a bit suspicious, especially in this amount. Is this intentional, or has Google missed a bunch of sitemaps?

200k pages is a lot; what does it count as a duplicate? Is the canonical URL configuration correct?

This looks very familiar, though; I have a site with a similar issue, with a lot of pages being excluded:

I tried a lot of things and noticed that the only noticeably working “cure” was to reduce the number of errors: for every fixed error page, Google seems willing to index tens if not hundreds more pages. My hunch is that Googlebot crawls a site until it encounters an error, then enters some massive cooldown period, sometimes skipping the broken page and continuing.
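A rough way to test that hunch would be to look for long crawl pauses right after error responses in the access logs. This sketch assumes combined-format logs and an arbitrary 30-minute “cooldown” threshold; both are assumptions, not known Googlebot behavior:

```python
import re
from datetime import datetime, timedelta

# Matches the timestamp, request path, and status of a combined-format log line.
LINE = re.compile(r'\[(?P<ts>[^\]]+)\].*"GET (?P<path>\S+)[^"]*" (?P<status>\d{3})')

def gaps_after_errors(lines, gap=timedelta(minutes=30)):
    """Return (error_path, pause_length) pairs where a 5xx response was
    followed by an unusually long pause before the next Googlebot hit."""
    hits = []
    for line in lines:
        m = LINE.search(line)
        if m and "Googlebot" in line:
            ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
            hits.append((ts, m["path"], m["status"]))
    out = []
    for (t1, path, status), (t2, _, _) in zip(hits, hits[1:]):
        if status.startswith("5") and t2 - t1 >= gap:
            out.append((path, t2 - t1))
    return out
```

If the hunch holds, the pauses reported after 5xx hits should be noticeably longer than the site's usual crawl interval.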


I didn’t take the time to try all your sets, but it looks similar on Bing.


This sounds a bit suspicious, especially in this amount. Is this intentional, or has Google missed a bunch of sitemaps?

In the sitemaps, we actually only list pages that have embedded JSON-LD markup, to ensure those are fully ingested by Google (we even supply hourly, incremental sitemap updates to them). The only reason we have sitemaps to begin with is because they contracted us to embed semantic markup (JSON-LD) in our pages, and needed a way for us to ping them when any of the markup changed.

So I’m not surprised if it says a ton of pages aren’t in the sitemaps.
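For illustration, that embedded JSON-LD markup is roughly of this shape; this is a minimal, hypothetical schema.org example (the actual MusicBrainz markup carries more fields, and the values here are placeholders):

```json
{
  "@context": "http://schema.org",
  "@type": "MusicGroup",
  "@id": "https://musicbrainz.org/artist/<mbid>",
  "name": "Example Artist"
}
```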

200k pages is a lot; what does it count as a duplicate? Is the canonical URL configuration correct?

I checked which URLs it’s complaining about for this, and the vast majority are random URLs from our FTP site, nothing MusicBrainz related. For example…

Though there are a few MB ones like Brickfoot - MusicBrainz, I assume because ?va=0 is a no-op there. That’s something we could improve.
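One way to improve it would be to strip no-op parameters when generating the canonical URL, so duplicates collapse onto one address. A minimal sketch, assuming `va` is a pure no-op for the page in question (the helper name is hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters assumed never to change page content for canonical purposes;
# "va" is the one discussed here, and the set could grow over time.
NOOP_PARAMS = {"va"}

def canonical_url(url):
    """Drop no-op query parameters so duplicate URLs share one canonical."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in NOOP_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))
```

The resulting URL would go into a `<link rel="canonical">` tag in the page head.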


Maybe it’s worth excluding the FTP share using robots.txt?
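For example, a hypothetical robots.txt rule, assuming the FTP mirror is exposed under /ftp/ (the actual path may differ):

```
# Keep crawlers out of the FTP mirror so it stops polluting the index.
User-agent: *
Disallow: /ftp/
```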

You can use this page and set `va` (and any other no-op parameters) there as “No: Doesn’t affect page content (ex: tracks usage)”.

Any chance they could be contacted again to see why Google is indexing so few pages?

Here is an example of a Release that does not appear in my Google results:

The “issue” is that ?va=0 is sometimes not a no-op, so it does sometimes affect page content. Maybe the solution would be to not include that link for artists where it wouldn’t display/change anything? Edit: the problem with that solution is that users wouldn’t be “trained” to expect it to be there, which might (or might not!) be a UX issue. Whatever we do, it’s always a compromise… 🙂


This seems like a fair compromise? IMHO the lack of visibility on search engines is a bit worse than users not being trained to expect the parameter to be there.

I’ve made a ticket for this one thing now at least:


It’s kind of amazing how so many pages are not shown by Google, yet there are folks so desperate that they want to remove releases from MB in hopes of influencing Google results.

An editor has concerns that the BBC Music links could negatively affect the indexing of MB.

I’ve searched for that Brazilian artist, including disambiguation and birth year.
Guess what: neither MB nor BBC showed up, but some SEO crap did. And Discogs, of course.


The thing is that Googlebot is fairly active; it’s crawling our websites almost constantly. So no, crawling isn’t the problem: Googlebot (but also BingBot and others) is actively crawling.
But we still don’t appear in results, even for entities that have been in the database for years.

It is indexed.

It appears on the second page for me when searching by the title.


I think a big factor is that most pages have no original textual content: we mostly have links to external resources and bits of data, titles with only a few words (tracklists), birth dates, etc. To most bots, I think our pages look “empty”. And since they are text-empty, the short excerpts shown are rather unattractive; I’m not sure many people click on them.
We don’t provide an audio player or ways to buy the actual music either; the biographies shown come mostly from Wikipedia, and reviews are done elsewhere (CritiqueBrainz).

We don’t even have cover art and/or artist photos and/or label imprint images shown by default (or at all). Also, we have good-quality data, but we lack quantity (if someone wants to know all LP releases of an album, they have a better chance on Discogs, plus they could buy one directly from there).

And the MusicBrainz website isn’t really mobile-compatible yet (this doesn’t help).

In short, I think our bad ranking is due more to the very nature of MusicBrainz than to a lack of indexing.