ListenBrainz listens from 2006 - 2020, broken down by hour of day


I’m playing around with TimescaleDB to see if we can replace InfluxDB, and so far the results are very encouraging. To see how well this new database performs, I queried all listens and plotted them against the time of day when we recorded each listen, and I got this graph:

All of this data is in UTC, since we do not have timezones available for where the listen occurred. I’m struck by how much this data looks like a sine wave.
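
For anyone who wants to try the same breakdown on their own data, here is a minimal sketch of the bucketing step. It is not the query I ran against the database; it assumes the listens are available as plain Unix timestamps, and the function name `listens_by_utc_hour` is made up for this example.

```python
from collections import Counter
from datetime import datetime, timezone


def listens_by_utc_hour(timestamps):
    """Count listens per hour of day (0-23), interpreting each timestamp as UTC."""
    counts = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).hour for ts in timestamps
    )
    # Return a fixed-length list so empty hours show up as 0 in a plot.
    return [counts.get(hour, 0) for hour in range(24)]


# Four made-up listens: one at 00:00, two during the 01:00 hour, one at 23:00 the next day.
sample = [0, 3600, 3700, 86400 + 23 * 3600]
hist = listens_by_utc_hour(sample)
```

The resulting 24-element list can be handed straight to any plotting tool as the bar heights.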

Anyways, we’re working on stats for LB, so I thought I would share this random graph.


That’s really neat. Three questions immediately jump out to me:

  • Do we have information about the timezone that a user is in when they submit the listen, allowing us to break down this kind of data by timezone?
  • Using this data based on UTC, can we use it to predict in which timezone the majority of the data submitters are?
  • Does this curve differ if you make 15 of them, one for each year?
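
One rough way to approach the second question, sketched under strong assumptions: if we had a reference curve of how listening is spread over the local day (the `local_profile` below is invented, not measured), we could try every whole-hour shift of the UTC histogram and keep the one that lines up best. The function name `best_utc_offset` is hypothetical, and with submitters spread over many timezones this would only hint at where the bulk of them are.

```python
def best_utc_offset(utc_hist, local_profile):
    """Return the whole-hour UTC offset (-12..11) whose shift best matches local_profile.

    Both arguments are 24-element lists indexed by hour of day.
    The match score is a circular cross-correlation (a plain dot product per shift).
    """
    best_offset, best_score = None, None
    for offset in range(-12, 12):
        # A listen at UTC hour h happens at local hour (h + offset) mod 24.
        score = sum(
            utc_hist[h] * local_profile[(h + offset) % 24] for h in range(24)
        )
        if best_score is None or score > best_score:
            best_offset, best_score = offset, score
    return best_offset


# Invented reference profile: flat, with an evening peak at 20:00-21:00 local time.
local_profile = [1] * 24
local_profile[20] = 10
local_profile[21] = 5

# Simulated UTC histogram for listeners who are all at UTC+2.
utc_hist = [local_profile[(h + 2) % 24] for h in range(24)]
offset = best_utc_offset(utc_hist, local_profile)
```

In this toy case the simulated listeners are recovered as UTC+2; real data would give a much blurrier answer.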

Not currently, no.

Oh, an interesting idea.

I can try to look at that. One timeslot has some 200,000 more listens than any other, clearly a data anomaly, so I’ll be looking into what caused it.

Thankfully, the potential move to TimescaleDB (from InfluxDB) gives me the chance to do some cleanup on the data. We will already need to sort all the listens before the import into TimescaleDB, which makes it trivial to remove duplicates and the slightly fuzzy duplicates that people have gotten from importing their data from
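
A minimal sketch of why sorting makes the cleanup easy: once the listens are sorted, duplicates end up next to each other and a single pass can drop them. The tuple layout `(user, listened_at, track)`, the sort order, the function name `drop_duplicates`, and the two-second window for "fuzzy" duplicates are all assumptions for illustration, not the actual import code or its rules.

```python
def drop_duplicates(listens, fuzz_seconds=2):
    """Remove exact and near-duplicate listens in one pass over sorted data.

    listens: iterable of (user, listened_at, track) tuples, listened_at in seconds.
    A listen is dropped when the previously kept listen has the same user and
    track and is at most fuzz_seconds older.
    """
    kept = []
    # Sort by user, then track, then time so candidate duplicates are adjacent.
    for item in sorted(listens, key=lambda item: (item[0], item[2], item[1])):
        if kept:
            prev_user, prev_ts, prev_track = kept[-1]
            same_listen = prev_user == item[0] and prev_track == item[2]
            if same_listen and item[1] - prev_ts <= fuzz_seconds:
                continue  # exact or fuzzy duplicate of the last kept listen
        kept.append(item)
    return kept


listens = [
    ("a", 100, "x"),
    ("a", 100, "x"),  # exact duplicate
    ("a", 101, "x"),  # fuzzy duplicate, one second later
    ("a", 500, "x"),  # genuine repeat play
    ("b", 100, "x"),  # different user
]
cleaned = drop_duplicates(listens)
```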
