MusicBrainz doesn't seem to appear in many search results?

I didn’t take the time to try all your sets, but it looks similar with Bing.

This sounds a bit suspicious, especially at this volume. Is this intentional, or has Google missed a bunch of sitemaps?

In the sitemaps, we actually only list pages that have embedded JSON-LD markup, to ensure those are fully ingested by Google (we even supply hourly, incremental sitemap updates to them). The only reason we have sitemaps to begin with is because they contracted us to embed semantic markup (JSON-LD) in our pages, and needed a way for us to ping them when any of the markup changed.
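
To illustrate the idea of an incremental sitemap that only lists pages whose markup changed, here is a minimal Python sketch. It is not the actual MusicBrainz sitemap code: the function name `build_incremental_sitemap`, the example URLs, and the timestamps are assumptions made up for illustration.

```python
from datetime import datetime, timezone
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_incremental_sitemap(changed_pages, since):
    """Return sitemap XML listing only pages modified after `since`.

    `changed_pages` is an iterable of (url, last_modified) pairs; in this
    sketch we assume every page passed in carries JSON-LD markup.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in changed_pages:
        if last_modified <= since:
            continue  # unchanged since the previous run, so leave it out
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical data: one page changed within the last hour, one did not.
since = datetime(2020, 5, 27, 10, 0, tzinfo=timezone.utc)
pages = [
    ("https://musicbrainz.org/artist/example-1",
     datetime(2020, 5, 27, 10, 30, tzinfo=timezone.utc)),
    ("https://musicbrainz.org/artist/example-2",
     datetime(2020, 5, 27, 9, 0, tzinfo=timezone.utc)),
]
print(build_incremental_sitemap(pages, since))
```

Only the first page ends up in the output, which is why a report saying that lots of pages are missing from the sitemaps is expected with this kind of setup.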

So I’m not surprised if it says a ton of pages aren’t in the sitemaps.

200k pages is a lot. What does it count as a duplicate? Is the canonical URL configuration correct?

I checked which URLs it’s complaining about for this, and the vast majority are random URLs from our FTP site, nothing MusicBrainz related. For example… http://ftp.musicbrainz.org/pub/ros/ros_docs_mirror/electric/api/bmp085/html/structbmp085__smd500__calibration__param__t-members.html

Though there are a few MB ones like Brickfoot - MusicBrainz, I assume because ?va=0 is a no-op there. That’s something we could improve.

Maybe worth excluding the ftp share using robots.txt?
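
As a rough sketch of what that could look like, the snippet below checks a candidate robots.txt with Python's standard `urllib.robotparser`. The rules themselves (blocking everything under `/pub/`) and the helper name `is_crawlable` are assumptions for illustration, not the real configuration of ftp.musicbrainz.org.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the FTP host: keep crawlers out of /pub/.
FTP_ROBOTS_TXT = """\
User-agent: *
Disallow: /pub/
"""


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawlable(FTP_ROBOTS_TXT, "http://ftp.musicbrainz.org/pub/ros/index.html"))
print(is_crawlable(FTP_ROBOTS_TXT, "http://ftp.musicbrainz.org/"))
```

Note that robots.txt only stops compliant crawlers from fetching those URLs; pages that are already indexed may take a while to drop out.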

You can use this page and set va and any other no-op parameters there as “No: Doesn’t affect page content (ex: tracks usage)”.

Any chance they could be contacted again to see why Google is indexing so few pages?

Here is an example of a Release that does not appear in my Google results:

The “issue” is that ?va=0 is sometimes not a noop, so it does sometimes affect page content. Maybe the solution would be to not include that link for artists where it wouldn’t display/change anything? Edit: The problem with that solution would then be that users wouldn’t be “trained” to having it be there, which might (or might not!) be UX issue. Whatever we do, it’s always a compromise… :slight_smile:
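
One way to picture the "only drop it where it does nothing" idea is a canonical-URL helper that strips parameters known to be no-ops for a given page. This is just a Python sketch under that assumption: `canonical_url` and the example URL are hypothetical, and deciding whether `va` is a no-op for a particular artist would still have to happen on the server side.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url, noop_params):
    """Return `url` without the query parameters listed in `noop_params`."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in noop_params
    ]
    # The fragment is dropped as well, since it never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# For an artist where va changes nothing, treat it as a no-op...
print(canonical_url("https://musicbrainz.org/artist/example?va=0&page=2", {"va"}))
# ...and for an artist where it does matter, pass an empty set to keep it.
print(canonical_url("https://musicbrainz.org/artist/example?va=0&page=2", set()))
```

The result could be used for a canonical link or to decide whether to render the link at all, so both URL variants stop counting as duplicates.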

This seems like a fair compromise? IMHO the lack of visibility on search engines seems a bit worse than not having users being trained to have the parameter there.

I’ve made a ticket for this one thing now at least:

It’s kinda amazing how so many pages are not shown by Google, yet there are folks so desperate that they want to remove releases from MB in hopes of influencing Google results.

An editor has concerns that the BBC Music links could negatively affect the indexing of MB.

I’ve searched for that Brazilian artist, including disambiguation and birth year.
Guess what, neither MB nor BBC showed up, but some SEO crap did. And Discogs, of course.

The thing is that Googlebot is fairly active; it’s crawling our websites almost constantly. So no, Googlebot (and also BingBot and others) is actively crawling.
But we still don’t appear in results, even for entities that have been in the database for years.

It is indexed.

It appears on second page for me, when searching on title.

I think a big factor is that most pages have no original textual content. We have mostly links to external resources and bits of data: titles with only a few words (tracklists), birth dates, etc. To most bots, I think our pages look “empty”. Also, since they are text-empty, the short excerpts shown are rather unattractive; I’m not sure many people select them.
We don’t provide an audio player or ways to buy the actual music either; the biographies shown come mostly from Wikipedia, and reviews are done elsewhere (CritiqueBrainz).

We don’t even show cover art, artist photos, or label imprint images by default (or, in some cases, at all). Also, we have good-quality data, but we lack quantity (someone who wants to know all the LP releases of an album has a better chance on Discogs, and can also buy one directly from there).

And the MusicBrainz website isn’t really mobile-compatible yet (which doesn’t help).

In short, I think our bad ranking is due more to the very nature of MusicBrainz than to a lack of indexing.

I google:
“Music for Millions: Vol. 1” monada
and get as results this thread, including images, and an Amazon listing.
Nada Musicbrainz.
Google seems to index but not display here.

(I’ve just spent 20 minutes trying to get the browser of my choice, a fat slow incontinent dog that might need to be put down soon, to take that screen shot. Perhaps time for me to step away from the device.)

Googling:
site: musicbrainz.org “Music for Millions: Vol. 1” monada
= still no listing of Release displayed here.

1st or 2nd result for me, despite having France country checked: https://duckduckgo.com/?q=site%3Amusicbrainz.org+“Music+for+Millions%3A+Vol.+1”+monada

If I run the same query, results clearly show up, but if I remove site:musicbrainz.org from the query, I get only one page of results and no MusicBrainz at all…

@mmirG, can you link the release we are supposed to find? Is there a typo?

Come on, please paste a normal MB link.
My browser says I should not go there, security threat or something.

I copy and paste URL.
And something else appears in post.
Interesting.
Let’s try again.

I know of websites that contain only descriptions of specific numbers, plaintext equivalents of hashes, or digits of pi, and they get indexed, as does a massive amount of SEO spam. It’s very weird that it’s MusicBrainz that gets excluded from the index that much, IMO.

Hopefully MBS-10573 gets fixed some time soon and we can see if Google just dislikes duplicates.

This could be a bigger issue than the lack of textual data, it’s hard to predict.

@Zas

I’m thinking, though: even if va=1 has an effect, would it really hurt to mark it as “No, doesn’t affect page content”? The possibility that Google indexes fewer pages because of the duplicates is, at least in my opinion, worse than the chance that a bit of information might get left out on some pages.

Just noticed that for one artist, Google even generates an infobox based on MusicBrainz data but doesn’t index MusicBrainz itself. So shady.

I hope they keep up the donations, because right now they’re hiding MB from potential contributors and donors.

Here is an example of an artist I added 3 days ago that already appears on Google.

Found their name on Spotify in the writer credits section. Before that, it didn’t appear on Google.
