Any recent MusicBrainz user surveys?

Have there been any recent efforts to collect high-level statistics about why
and how people use MusicBrainz? For example:

  • Why do editors add to or edit MB?
  • What do people do with data from MB?
  • Do most editors add missing releases from their own personal music collections, or add data about their favorite artists, or …?

As I become a more active MB contributor, I’d like to know more about how other people use the site, so that I can tell how common my use cases are and how useful it would be for me to enter data or suggest (or contribute?) changes on the server side.

The thread “Work in progress MB User Survey” discusses a 2017 survey (announced at “MusicBrainz User Survey – MetaBrainz Blog”) in which 1,200 responses were collected. The survey closed long ago, but I think there’s a draft of the questions at “MusicBrainz User Survey - Google Docs”.

I haven’t been able to find a summary of the survey’s results, though. The latest reference I’ve seen was “[<entity>:<mbid>|<name>] links - #9 by yvanzo”, a comment from July 2018 saying that the results were being used to set priorities for development.

It’s not clear to me whether the 2017 MB survey was trying to collect answers from a representative sample of users (I wasn’t following the discussion forums or the blog at the time, so I didn’t hear about it then). Was it also advertised on the main MB landing page?

I’ve found the annual user survey results published by the Go (programming language) team to be quite interesting.

In addition to advertising the survey via things like blog posts, I think that the Go team randomly prompts a subset of users of a popular IDE plugin to try to collect responses from a (somewhat) representative sample.

I was still curious about this, so I downloaded part of a dump and wrote some code to compute edit-related stats. It doesn’t give any insight into why editors contribute or how people use the data, but it at least answers some questions I had about the number of active editors and their behavior.

Disclaimer: I am not an MB developer or statistician, and my code and conclusions may be wrong.


Editors with at least one applied ARTIST_CREATE edit by year

2000      0 editors
2001     21
2002     35
2003    412
2004   1032
2005   2818
2006   4891
2007   6001
2008   7802
2009   7526
2010   8568
2011   9098
2012   8936
2013   8732
2014   8753
2015   9172
2016   9322
2017  10448
2018  11039
2019  10661
2020  11226
2021  11555

I’m just throwing this in since it shows the growth in active editors going back to the beginning of MB.
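A rough sketch of how a per-year count like this could be computed. This is not the actual mbstats code; the row layout, the `EDITS` sample, and the `"applied"` status label are all made up for illustration and do not reflect the dump's real schema.

```python
from collections import defaultdict

# Hypothetical rows: (editor_id, edit_type, status, close_year).
# The layout and values are illustrative, not the real dump format.
EDITS = [
    (1, "ARTIST_CREATE", "applied", 2020),
    (1, "ARTIST_CREATE", "applied", 2020),
    (2, "ARTIST_CREATE", "applied", 2020),
    (2, "ARTIST_CREATE", "failed", 2021),
    (3, "RELEASE_CREATE", "applied", 2021),
    (3, "ARTIST_CREATE", "applied", 2021),
]


def editors_per_year(edits, edit_type):
    """Count distinct editors with at least one applied edit of edit_type, per year."""
    editors = defaultdict(set)
    for editor_id, etype, status, year in edits:
        if etype == edit_type and status == "applied":
            # A set means an editor is counted once per year,
            # no matter how many edits they made.
            editors[year].add(editor_id)
    return {year: len(ids) for year, ids in sorted(editors.items())}


if __name__ == "__main__":
    for year, count in editors_per_year(EDITS, "ARTIST_CREATE").items():
        print(f"{year} {count:6d}")
```

Counting edits instead of editors would only need a per-year integer counter in place of the per-year set.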


Editors with at least one applied RELEASE_CREATE edit by year

2011   7624 editors
2012  11099
2013  11337
2014  11502
2015  12188
2016  12560
2017  13259
2018  13293
2019  13071
2020  15404
2021  15591

No pre-2011 data, I assume because of the NGS (Next Generation Schema) migration in 2011.


Number of applied RELEASE_CREATE edits by year

2011   63599 releases
2012  126300
2013  129121
2014  170078
2015  161045
2016  176809
2017  207404
2018  224088
2019  261990
2020  348981
2021  372402

I’m reassured that these numbers seem plausible when I compare them with the rate-of-change graph at “Database Statistics - Timeline Graph - MusicBrainz”.


Histogram of editors bucketed by number of applied RELEASE_CREATE edits in 2021

  1-10 |######################################## 12917 editors
 11-20 |### 986
 21-30 |# 432
 31-40 |# 241
 41-50 |
 51-60 |
 61-70 |
 71-80 |
 81-90 |
91-100 |
  >100 |## 500

I thought that this one was interesting. The vast majority of active editors added 10 or fewer releases in 2021, but there were 500 editors who added more than 100 releases. Maybe some of these were automated?


Average account age of editors with applied RELEASE_CREATE edits (relative to the end of the edit’s year)

2011  2.1 years
2012  2.3
2013  2.4
2014  2.5
2015  2.6
2016  2.7
2017  2.8
2018  3.0
2019  3.2
2020  3.2
2021  3.4

I interpret the annual increase as saying that editors are sticking around!
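To make the "relative to the end of the edit’s year" definition concrete, here is a minimal sketch. It assumes the average is taken over distinct editors rather than over edits, and that account creation times are available per editor; `MEMBER_SINCE` and the editor IDs are hypothetical sample data.

```python
from datetime import datetime, timezone

# Hypothetical account creation times keyed by editor ID.
MEMBER_SINCE = {
    1: datetime(2019, 1, 1, tzinfo=timezone.utc),
    2: datetime(2021, 1, 1, tzinfo=timezone.utc),
}


def average_account_age(year, editor_ids, member_since):
    """Mean account age in years, measured at the end of the given year.

    editor_ids should be the distinct editors who had an applied edit
    of the type of interest during that year.
    """
    end_of_year = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    ages = [
        (end_of_year - member_since[editor_id]).days / 365.25
        for editor_id in editor_ids
    ]
    return sum(ages) / len(ages)


if __name__ == "__main__":
    print(f"2021 {average_account_age(2021, {1, 2}, MEMBER_SINCE):.1f} years")
```

Note that a rising average can also come from fewer brand-new accounts joining in a given year, so on its own it is only suggestive of retention.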


I briefly looked at some more esoteric stuff like correlations between edit types but didn’t really get anywhere with it. The code is at “GitHub - derat/mbstats: Generates stats about the MusicBrainz database”, in case anyone else wants to play with it.

(Seeing the results of the 2017 user survey would still be cool. :slight_smile: )

I am not a robot, but apparently my average number of releases per year is more than 200, and I am not even in the top 10 editors. :wink:

Generally I am also not adding lots of releases IMO, but e.g. yesterday alone I created 39 of them. That’s because I was importing episodes of a radio series – and yes, these edits were partially automated :hammer_and_wrench:

This is probably not that unusual for volunteer projects; I would expect the distribution for e.g. Wikipedia editors to look similar.

Most MB users fall into one of two groups: one group adds releases as necessary for their personal collection if they find something missing while tagging; the other edits for the sake of editing itself. Those 500 editors almost certainly fall into the latter category.

@jesus2099 - that is called “Quality over quantity”. Never rush to be in that top 10. It takes longer to research and add a single Release than it does to add bulk relationships.

I’m working on an artist discography at the moment where it takes hours of reading a biography to add a few edits, then seconds to click a button and add hundreds of edits for tracks by copying recording dates into performers.

@derat I like your stats breakdown - focusing on the Artists and Releases. This gives a good picture of the harder work. Well done to those 500 editors who added so many releases. Though it is a little weird seeing the gap above 40.

There are so many different types of editors. Some add the skeletons, others add the flesh.

I mostly just add my CD editions that are not yet here.
And have not yet added all of them, specifically.

I thought the gap above 40 was weird too! My histogram code was a little wonky and skipped printing counts for zero-length bars. :thinking: Here’s a better histogram of editors’ RELEASE_CREATE edit counts in 2021:

  1-10 |######################################## 12917
 11-20 |### 986
 21-30 |# 432
 31-40 |# 241
 41-50 | 157
 51-60 | 107
 61-70 | 82
 71-80 | 66
 81-90 | 61
91-100 | 42
  >100 |## 500

And here’s the upper end of that in more detail:

   1-100 |######################################## 15091
 101-200 |# 235
 201-300 | 89
 301-400 | 48
 401-500 | 21
 501-600 | 12
 601-700 | 16
 701-800 | 8
 801-900 | 11
901-1000 | 5
   >1000 | 55
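For anyone who wants to produce the same kind of text histogram, here is a minimal sketch of the bucketing and rendering. It is not the mbstats code; `bucket_counts` and `render` are hypothetical helper names, and rounding each bar to the nearest character is an assumption that happens to reproduce the bar lengths shown above. The count is always printed, even when the bar rounds down to zero characters.

```python
def bucket_counts(per_editor, width=10, top=100):
    """Bucket per-editor edit counts into 1-10, 11-20, ..., plus one '>top' bucket."""
    n_buckets = top // width
    counts = [0] * (n_buckets + 1)
    for n in per_editor:
        if n < 1:
            continue  # editors with no edits are not part of the histogram
        counts[min((n - 1) // width, n_buckets)] += 1
    labels = [f"{i * width + 1}-{(i + 1) * width}" for i in range(n_buckets)]
    labels.append(f">{top}")
    return list(zip(labels, counts))


def render(buckets, max_bar=40):
    """Return histogram lines, scaling the largest bucket to max_bar characters."""
    biggest = max(count for _, count in buckets) or 1
    pad = max(len(label) for label, _ in buckets)
    lines = []
    for label, count in buckets:
        bar = "#" * int(count * max_bar / biggest + 0.5)  # round to nearest
        # Print the count unconditionally so empty bars still show their value.
        lines.append(f"{label:>{pad}} |{bar} {count}")
    return lines


if __name__ == "__main__":
    # Bucket totals taken from the 2021 RELEASE_CREATE histogram above.
    totals = [12917, 986, 432, 241, 157, 107, 82, 66, 61, 42, 500]
    labels = [label for label, _ in bucket_counts([])]
    print("\n".join(render(list(zip(labels, totals)))))
```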

I would’ve liked to find a way to measure tireless editors doing grungy work like tracking down information using web.archive.org or removing incorrect acoustic fingerprints, but that feels beyond the scope of what’s easy to get from a database dump. It looks like there are 168 edit types defined in the code, but so much seems to end up under RELATIONSHIP_CREATE.

Here’s the yearly count of editors with RELEASEGROUP_MERGE edits, which seems like it might be a reasonable proxy for editors who are subscribed to artists and cleaning up bad data:

2009    794
2010   1107
2011    949
2012    780
2013    794
2014    795
2015    845
2016    959
2017    962
2018    942
2019   1032
2020   1171
2021   1222