Can I see any analytics for MusicBrainz entities?

As an ordinary editor, I have no insight into how other people use the discographies I work on. I realise that there are people who try to exploit MusicBrainz, but are there any public analytics at all? I’m especially interested in failed lookups for particular artists (where a query from Picard or another client fails to match a release), but some page view stats would be really nice as well.

Normally I just work on what I want, and do as much as I’m willing and able to do, but I’ve reached a point where I’d like to be able to prioritise based on what users are trying to retrieve. Also, it’s kind of depressing when the only current sign of life in a discography is my own edit history.

5 Likes

Timely! This was just brought up yesterday in the Discord. I have actually snuck a piece of this feature into the MusicBrainz redesign mockup already:

(Might be worth doing queries + page views separately?)
I agree with everything you’ve said and have often felt the same way, editing in a vacuum. Maybe a developer can chime in on whether this would be hard to do or would add server load.

Note that even though this is in the mockup, a ticket would still be helpful if you want to see this happen.

2 Likes

I knew I wasn’t the only one! And nice work! I’ll create a ticket unless someone else does it first. (If someone else does create it, please post a link to this thread to avoid a duplicate.)

Yes, page views and lookup queries definitely should be separate. Imagine a Picard user matches an album, then clicks “Lookup in Browser”. If page views and album matches are both counted as “queries”, the query count is incremented twice, and the result is a misleading statistic. For editors, the distinction is extra important because a user who visits the web page gets different information than a Picard/AudioRanger/Yate user. Web visitors have access to release- and RG-level relationships, for example.

2 Likes

It would require the website servers and the API servers to store a count of queries in the main database for each entity. It is technically doable, but it isn’t a small change either: it would require some more space, though it would add only a bit to the load.

  • A count of API queries would probably not make any sense because of automated queries by bots and uncountable queries made on mirrors.
  • A count of MusicBrainz.org page views would theoretically make sense but in practice it would be very easy to fool it, so I’m not sure it is worth it either.

Not about usage. The only public numbers are about editing, such as the number of artist subscribers and release collections.

Statistics on failed lookups would be very instructive indeed (even though there are many other ways to search the database), but I’m not sure how we could build public analytics from unique, private search queries.

Alternatively, this forum and the collaborative collections can be used for this purpose of community building. For example, I remember editing events around classical music composers in the past.

As an editor too, it is the kind of editing I would likely get into at some point.

You might want to look into CritiqueBrainz and ListenBrainz, which take sharing your own interest in music to another level. There are also a lot of MetaBrainz supporters who make creative use of MusicBrainz data.

1 Like

Thanks yvanzo!

My thinking was that it would just be nice to have some large number that lets editors know, yup, this is being used, good job. Even if it’s not accurate or is largely automated queries (it should probably have a disclaimer/info box on what it is tracking), it can give an indication of relative usage compared to other editors.

3 Likes

If I understand the goal, this could be figured out with LB as well, no? A list of the most listened to unmatched tracks/releases? Not exactly the same as failed Picard lookups, but not far off I would think. And LB data is public already so no worries about exposing private data.

When artist pages exist on LB that could be something they show?

4 Likes

I’d like to have some way to find popular-but-missing albums too. I’ve been trying to approximate this lately for stuff featured on Bandcamp Daily using this script:

#!/bin/bash -e

# Reads HTML from a page or from stdin and prints Bandcamp album URLs that
# aren't present in MusicBrainz.

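check_urls() {
  # Extract absolute Bandcamp album URLs and relative /album/ paths from the
  # HTML on stdin, then de-duplicate them.
  sed -nre 's!.*[;"]((https://[^.]*\.bandcamp\.com)?/album/[^"&]*).*!\1!p' | \
    sort -u | while read -r album; do
      # A relative /album/ path needs a base artist URL passed as $1.
      if ! [[ "$album" == http* ]]; then
        if ! [[ "$1" == http* ]] ; then
          echo "Need base artist URL for $album" >&2
          exit 2
        fi
        album="${1%/}$album"
      fi

      # curl -f fails on an HTTP error, e.g. when MusicBrainz does not know
      # the URL; in that case print the album as missing.
      if ! curl -s -f --get --data-urlencode "resource=${album}" \
          'https://musicbrainz.org/ws/2/url?inc=release-rels' >/dev/null; then
        echo "$album"
      fi
      # Pause between requests to respect MusicBrainz API rate limiting.
      sleep 1
    done
}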

if [ "$1" = -h ] || [ "$1" = -help ] || [ "$1" = --help ] || [ $# -gt 1 ]; then
  echo "Usage: $0 <url>" >&2
  exit 2
elif [ $# -eq 1 ]; then
  curl -s "$1" | check_urls
else
  check_urls
fi

I pass it a Bandcamp Daily URL and it prints album URLs that are missing from MB:

$ check_bandcamp_urls.sh https://daily.bandcamp.com/lists/instrumental-doom-list
https://mayhemindie.bandcamp.com/album/de-mysteriis-dom-sathanas
https://monstermagnetofficial.bandcamp.com/album/tab
https://sonnyvincenttestors.bandcamp.com/album/sonny-vincent-with-members-of-rocket-from-the-crypt-studio-live

(I think I had support at one point for supplying a base URL if the page only contains relative /album paths, but it looks like I might’ve broken it. I hate shell scripts.)
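
A minimal sketch of how the relative-path case could be handled, assuming the base is simply the scheme and host of the page the HTML came from. The helper names `base_of` and `absolutize` are hypothetical and not part of the script above; this is one possible approach, not necessarily how the original support worked.

```shell
# Hypothetical helper: print the scheme and host of an http(s) URL,
# e.g. https://artist.bandcamp.com/music -> https://artist.bandcamp.com
base_of() {
  case $1 in
    http://*|https://*) ;;
    *) return 1 ;;
  esac
  scheme=${1%%://*}   # text before "://"
  rest=${1#*://}      # text after "://"
  host=${rest%%/*}    # drop the path, if any
  [ -n "$host" ] || return 1
  printf '%s://%s\n' "$scheme" "$host"
}

# Hypothetical helper: turn a relative /album/... path ($2) into an absolute
# URL using the page it was found on ($1). Absolute URLs pass through as-is.
absolutize() {
  case $2 in
    http*) printf '%s\n' "$2" ;;
    *)
      base=$(base_of "$1") || return 1
      printf '%s%s\n' "$base" "$2"
      ;;
  esac
}
```

Inside the loop, something like `album=$(absolutize "$page_url" "$album")` would then replace the manual string concatenation, where `page_url` is an assumed variable holding the URL the HTML was fetched from.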

Sometimes there’s already a digital-media release and it’s just missing the Bandcamp URL, but often the release or artist isn’t in MB yet. Compilation albums can be a lot of work!
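
One caveat with the check itself: `curl -f` fails on any HTTP error, so a rate-limited or failed request would also be printed as a missing album. Below is a minimal sketch that separates those cases by status code, assuming MusicBrainz answers 404 for a URL it does not know and 503 when a client is being rate limited. `classify_status` and `lookup_status` are hypothetical names, and the User-Agent value is a placeholder to be replaced with a real application name and contact.

```shell
# Hypothetical helper: map an HTTP status code to a result.
classify_status() {
  case $1 in
    200) echo present ;;  # the URL is known to MusicBrainz
    404) echo missing ;;  # the URL is not in the database
    *)   echo retry ;;    # rate limit, server error, or no response (000)
  esac
}

# Hypothetical helper: print only the HTTP status code of the URL lookup.
lookup_status() {
  curl -s -o /dev/null -w '%{http_code}' --get \
    -A 'check-bandcamp-urls/0.1 (placeholder contact)' \
    --data-urlencode "resource=$1" \
    'https://musicbrainz.org/ws/2/url?inc=release-rels'
}
```

With these, the loop could print an album only when `classify_status "$(lookup_status "$album")"` yields `missing`, and warn on stderr for `retry` instead of reporting a false positive.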

This is also a cool way to hear a bunch of West African electronic music and South American indie rock that otherwise wouldn’t have been on my radar. 🙂

Tangent: is there any way to get a list of top-selling Bandcamp albums? Pages like https://bandcamp.com/tag/best-selling that turn up in web searches seem to be completely bogus: those are just random albums that have been tagged best-selling by the musician. 🙄 I reported this to Bandcamp but I doubt they’ll do anything about it.

6 Likes

Answering my own question: you can get this from https://bandcamp.com/#discover. Annoyingly, Bandcamp hides it if your browser is too narrow, so it doesn’t really work on phones.

I wrote a silly little command-line program at https://github.com/derat/bandcamp-discover for querying the API so I can pipe the albums into my script to check what’s not in MB:

% bandcamp-discover -format digital -genre soundtrack/video-game-music -ranking top | head -5
https://signalis-ost.bandcamp.com/album/signalis-original-soundtrack
https://radicaldreamland.bandcamp.com/album/celeste-original-soundtrack
https://celestestrawberryjam.bandcamp.com/album/strawberry-jams-vol-1
https://celestestrawberryjam.bandcamp.com/album/strawberry-jams-vol-2-2
https://radicaldreamland.bandcamp.com/album/celeste-b-sides
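
As posted earlier in this thread, the sed pattern in the checking script only matches album URLs that are preceded by `"` or `;`, as they appear in HTML, so a bare list of URLs like the one above would presumably need a small adapter before being piped in. A minimal sketch, assuming GNU sed; `quote_lines` and `extract_albums` are hypothetical names, and the pattern is copied from that script only to demonstrate the effect.

```shell
# Hypothetical adapter: wrap each input line in double quotes so that bare
# URLs look like HTML attribute values to the extraction pattern.
quote_lines() {
  sed 's/.*/"&"/'
}

# The extraction pattern from the checking script, wrapped in a function
# for demonstration.
extract_albums() {
  sed -nre 's!.*[;"]((https://[^.]*\.bandcamp\.com)?/album/[^"&]*).*!\1!p'
}
```

A pipeline along the lines of `bandcamp-discover -format digital -ranking top | quote_lines | check_bandcamp_urls.sh` would then feed the URLs through in a form the pattern accepts.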
1 Like