Is anything being done with the MB blog now?

I think this is a side effect of @julian45 protecting the blog from AI scraping with Anubis. We’ll need to figure out how to bring it back :slight_smile: Thanks for noticing!

Same problem when you paste MB links in the Discourse forum:

A link to https://tickets.metabrainz.org/browse/MBS-9558 is rendered with the title “Making sure you’re not a bot!” at the moment.

No problems with Onebox-enabled sites, such as URLs from MusicBrainz.org and wiki.MusicBrainz.org.

@jesus2099

I can get in on this ticket.

But the link text is computed by Discourse as “Making sure you’re not a bot!” instead of “Normalise usernames for Discourse SSO”.
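To illustrate why the link text comes out wrong: a link previewer typically takes its text from the `<title>` of whatever page it receives, so if it gets the challenge page it shows the challenge page's title. Here is a minimal Python sketch of that check, using only the standard library. The names `page_title` and `is_challenge_page` are made up for this example, and matching on the title quoted above is an assumption, not how Discourse or Anubis actually works internally.

```python
from html.parser import HTMLParser

# Title quoted in this thread; assumed here to identify the challenge page.
CHALLENGE_TITLE = "Making sure you're not a bot!"


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html: str) -> str:
    """Return the stripped text of the first <title>, or '' if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


def is_challenge_page(html: str) -> bool:
    """Hypothetical helper: True if the page title is the challenge title."""
    # Normalise the curly apostrophe so both spellings compare equal.
    return page_title(html).replace("\u2019", "'") == CHALLENGE_TITLE
```

Run against the HTML a previewer actually received, `is_challenge_page` would tell you whether it was served the challenge instead of the ticket page.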

Aah, I guess I can’t help with that.

Is this also the reason the website has seemed to spike quite often over the last few days (like every minute or so)?

Yes, the AI spam is causing this.

*I’m on it. The situation has improved over the last 3 hours; the setup is still under test, but response times on MB are now acceptable (I think).*

AI - Annoying Interference

Anubis is also deployed for the ticket system/bug tracker (Jira) as of this Monday’s MetaBrainz meeting.

When I tried pasting a link to a Jira ticket in this reply just now, I got the appropriate metadata to show up in the onebox; that said, since I was able to line up the request that paste generated with Anubis’ logs, I’ve just added a rule to Anubis’ configuration that should make sure the metadata that our Discourse instance asks for is passed through from Jira.[1]


  1. That said, the rule will need to be updated if the Discourse instance is ever moved to another server of ours from its current one.
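For readers wondering why a server move would break such a rule, here is a rough Python sketch of an allow rule that only matches when both the User-Agent and the source address fit. This is an illustration only: `AllowRule` and `decide` are hypothetical names, this is not Anubis' configuration format, and the User-Agent pattern and address below are placeholders rather than the real values.

```python
import ipaddress
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowRule:
    """Hypothetical allow rule; not Anubis' real configuration schema."""

    name: str
    user_agent_pattern: str  # regular expression searched in the User-Agent
    network: str             # CIDR block the trusted client calls from

    def matches(self, user_agent: str, remote_addr: str) -> bool:
        ua_ok = re.search(self.user_agent_pattern, user_agent) is not None
        addr = ipaddress.ip_address(remote_addr)
        ip_ok = addr in ipaddress.ip_network(self.network)
        return ua_ok and ip_ok


def decide(rules, user_agent: str, remote_addr: str) -> str:
    """Return 'ALLOW' if any rule matches, otherwise 'CHALLENGE'."""
    for rule in rules:
        if rule.matches(user_agent, remote_addr):
            return "ALLOW"
    return "CHALLENGE"


# Placeholder values: 192.0.2.0/24 is a documentation-only address range.
rules = [AllowRule("forum-link-previews", r"Discourse", "192.0.2.10/32")]
print(decide(rules, "Discourse Forum Onebox", "192.0.2.10"))  # same server
print(decide(rules, "Discourse Forum Onebox", "192.0.2.11"))  # moved server
```

With a rule shaped like this, the same client calling from a new address would fall back to the challenge until the rule is updated.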

Indeed, and here I’d thought I’d already prepared a rule that would cover the MB blog embed’s exact use pattern.[1] :frowning:

I’ll look into this and see if I can figure out what might be causing Anubis to block MB from fetching blog posts.

All, please bear with me (and the rest of the MB team) while work is ongoing to abate AI scraper traffic :folded_hands:


  1. i.e., by creating an allow rule based on the specific URL MB uses to fetch the blog post feed, as that URL is non-standard for WordPress instances

I believe I’ve figured out a solution; if you go to the front page now, it should show the feed! :slight_smile:

The root cause appears to have been that the MB server software’s requests to the WordPress site looked a bit different to Anubis than the MB source code suggested they would. I have now found a different heuristic for identifying legitimate requests from MB, and it seems to be working so far; to prevent abuse, I’m not disclosing it at this time.

The mitigations also appear to block legitimate RSS readers, by the way; my feed has been broken for a while now.

Which RSS reader is this? Depending on the User-Agent your reader sends in its requests, Anubis might currently be treating it as suspect.

Also, which URL is your reader currently pointed at?
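If it helps with narrowing this down, here is a small Python sketch for comparing how a feed URL responds to different User-Agent strings. It uses only the standard library. The function names and the classification heuristic (content type, plus the "not a bot" wording from the challenge page title) are assumptions made for this example, not an official diagnostic.

```python
import urllib.error
import urllib.request


def build_probe(url: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def classify(content_type: str, body: str) -> str:
    """Rough guess at what came back: 'feed', 'challenge' or 'other'."""
    ct = content_type.lower()
    if "html" in ct:
        # Assumed marker: wording from the challenge page title.
        text = body.replace("\u2019", "'")
        return "challenge" if "not a bot" in text else "other"
    if "xml" in ct:
        return "feed"
    return "other"


def probe(url: str, user_agent: str, timeout: float = 10.0) -> str:
    """Fetch the URL once and classify the response (needs network access)."""
    request = build_probe(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
            body = response.read(65536).decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Error responses still carry headers and a body worth inspecting.
        content_type = err.headers.get("Content-Type", "")
        body = err.read(65536).decode("utf-8", errors="replace")
    return classify(content_type, body)
```

Calling `probe(feed_url, reader_user_agent)` with your reader's exact User-Agent string and again with a browser's would show whether the User-Agent alone changes the outcome.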

I reported this in some detail as a ticket. The feed URL is https://blog.metabrainz.org/feed/ as advertised in the metadata.