All my listens got deleted/emptied

Sure! It was rather convoluted and very specific to my setup, so keep that in mind.

Steps taken to restore my listens

To submit my listens from Jellyfin, I’m using the jellyfin-listenbrainz plugin, which also logs each listen to the system log (journald in my case). With a simple grep for Playback stopped, I was able to get a list of listens in the following format:

Nov 02 14:50:45 iridium docker[320246]: [13:50:45] [INF] [23] Emby.Server.Implementations.Session.SessionManager: Playback stopped reported by app Finamp 0.6.19 playing Sad Girls Club. Stopped at 204164 ms
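The filtering step can be sketched like this — here two sample lines stand in for the real journal output, since in practice the input would come from journalctl or docker logs on your own system:

```shell
# Sketch of the filtering step: keep only the "Playback stopped" lines.
# The sample input mimics the journald format shown above; in practice
# you would pipe journalctl (or docker logs) output into the grep.
filtered=$(printf '%s\n' \
  'Nov 02 14:50:45 iridium docker[320246]: [INF] Playback stopped reported by app Finamp 0.6.19 playing Sad Girls Club. Stopped at 204164 ms' \
  'Nov 02 14:49:12 iridium docker[320246]: [INF] Playback started reported by app Finamp 0.6.19' \
  | grep 'Playback stopped')
echo "$filtered"
```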

First, I needed to extract the log timestamp and track title from each log line; I used sed for that.

sed 's/^\(.*\) iridium.*playing \(.\+\)\. Stopped at .*$/\1 \2/' filtered.log > listens.log

I did have some issues with non-ASCII song titles that I had to fix manually. In that case, you can use any text editor whose regex engine actually matches all characters with the dot operator.

Nov 02 14:50:45 Sad Girls Club

Next, I parsed the dates to UNIX timestamps with awk and date:

awk '{
         cmd ="date \"+%s\" -d \""$1" "$2" "$3"\""
         cmd | getline var
         printf var "|"; for (i = 4; i<=NF; i++) printf $i " "; print ""
         close(cmd)
     }' listens.log

This gave me a list of timestamps and track titles like this:

1698933045|Sad Girls Club

Lastly, I needed to map the track titles to the internal Jellyfin GUIDs of each track. With some trial and error, I came up with this monster of an awk script that calls sqlite3 to query the Jellyfin database directly.

awk -F'|' '{
               cmd ="sqlite3 \"file:jellyfin/config/data/library.db?immutable=1\" \"SELECT substr(hguid, 7, 2) || substr(hguid, 5, 2) || substr(hguid, 3, 2) || substr(hguid, 1, 2) || \'-\' || substr(hguid, 11, 2) || substr(hguid, 9, 2) || \'-\' || substr(hguid, 15, 2) || substr(hguid, 13, 2) || \'-\' || substr(hguid, 17, 4) || \'-\' || substr(hguid, 21, 12) AS guid FROM (select lower(hex(Guid)) as hguid, name, type from TypedBaseItems where name = \'"$2"\' and type = \'MediaBrowser.Controller.Entities.Audio.Audio\');\""
               cmd | getline var
               print "{\"ListenedAt\":"$1",\"Id\":\""var"\"}"
               close(cmd)
           }' listens.log | jq -s

A lot of the complexity comes from having to convert the binary-encoded GUIDs to canonical UUIDs while transforming their endianness at the same time. Thanks, Stack Overflow, for that :smile:
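As a standalone illustration of that byte-swapping (the same transformation the chained substr() calls in the SQL perform), here is what it looks like in plain bash, using the sample GUID from the JSON result below:

```shell
# .NET stores the first three GUID groups little-endian, so the raw hex
# from SQLite has to be swapped group-wise to form the canonical UUID.
hguid="413fb0e83e6d916087e5d710c37456bd"   # lower(hex(Guid)) as stored
uuid="${hguid:6:2}${hguid:4:2}${hguid:2:2}${hguid:0:2}"  # 4-byte group, reversed
uuid="$uuid-${hguid:10:2}${hguid:8:2}"                   # 2-byte group, reversed
uuid="$uuid-${hguid:14:2}${hguid:12:2}"                  # 2-byte group, reversed
uuid="$uuid-${hguid:16:4}-${hguid:20:12}"                # last two groups, as-is
echo "$uuid"   # e8b03f41-6d3e-6091-87e5-d710c37456bd
```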

The result is a neat JSON structure:

[
    {
      "ListenedAt": 1698838676,
      "Id": "e8b03f41-6d3e-6091-87e5-d710c37456bd"
    },
    …
]

Inserting the contents of the array into the cache.json of the ListenBrainz plugin, restarting the Jellyfin server, and running the resubmission task then submits all listens as expected, while resolving metadata from the Jellyfin database and MusicBrainz like usual.
The reason this works is that the ListenBrainz plugin has a resubmission cache file that’s used when the network is down or ListenBrainz is having issues. Entering my listens manually slightly abuses the feature, but it’s easier than collecting the metadata yourself (recording data, player info, etc.).
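If the cache file happens to be a plain JSON array of listen objects — an assumption on my part, so check your actual cache.json and back it up first — merging the restored listens in could look like this:

```shell
# Hedged sketch: merge a restored array of listens into an existing cache,
# ASSUMING both files are plain JSON arrays (verify against your real cache.json;
# the file names and the second Id are placeholders for illustration).
printf '[{"ListenedAt":1698838676,"Id":"e8b03f41-6d3e-6091-87e5-d710c37456bd"}]' > cache.json
printf '[{"ListenedAt":1698933045,"Id":"00000000-0000-0000-0000-000000000000"}]' > restored.json
jq -s 'add' cache.json restored.json > cache.merged.json  # concatenate the two arrays
jq 'length' cache.merged.json   # entry count after the merge
```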


All in all, this took me around an hour, and I’m pretty sure that not scripting it would’ve been a lot quicker. But this was a nice exercise and I learned something, so I’ll have to sugarcoat it now :laughing:


That seems to be the case. They both depend on the mapping table, which wasn’t generated after restoring the backup. There were attempts to regenerate it, but that currently fails. lucifer is working on fixing it, though.

See the chat logs for more details.
I fear it might take a while to restore that data, and it’s more complex than previously expected.


Hi @Maxr1998, thank you for the detailed response.

You seem to be extremely well-versed in this matter, so as a fellow ListenBrainz enthusiast and a fellow Linux user, may I request some help?

Reading your post pushed me to ransack my brain as to whether I could find similar logs for my lost listens. Then it hit me: a long time ago, I had set up a notification logger on my phone. And while it’s not as detailed as your logs, the app has provided me with a plaintext list of my listens for the time period (track title, artist, hour:minute).

That, combined with past experience using this tool (GitHub - Coloradohusky/ListenBrainz_File_Parser: Parses database files from different music listen tracker applications, and imports them into ListenBrainz) means that I now have a glimmer of hope in restoring my lost listens.

Unfortunately, the list is in plaintext form, has numerous duplicate entries, and is quite lengthy at 5,300+ lines of text.

And with my complete and utter lack of programming or scripting knowledge, manually converting said plaintext list into the .csv table format used by ListenBrainz_File_Parser will take me days, if not weeks.

It would mean the world to me if you could help out.

Here is what the first few lines of the plaintext file looks like:

KNOCK KNOCK
TWICE (트와이스)
<empty line>
Nov 3, 2023 01:01:00
<empty line>
<empty line>
KNOCK KNOCK
TWICE (트와이스)
<empty line>
Nov 3, 2023 01:01:00
<empty line>
<empty line>
Heart Shaker
TWICE (트와이스)
<empty line>
Nov 3, 2023 00:57:11
<empty line>
<empty line>

The rest of it is the same, with the same pattern of: track title, artist name, empty line, date+time, empty line, empty line, repeated for 5,300+ lines.

As you can see, the entry for ‘KNOCK KNOCK’ at [Nov 3, 2023 01:01:00] is duplicated. That occurs quite frequently in the text file. Some entries are duplicated three or even four times.

I somehow need to come up with the commands necessary to:

  1. delete the duplicate entries (which span multiple lines)
  2. subtract 9 hours from the time (because I’m in UTC+9 timezone), then convert that to the Unix time format
  3. convert the entries (which span multiple lines) into a single-line format consisting of three columns, like so:
uts,artist,track
1600000000,TWICE (트와이스),KNOCK KNOCK
1700000000,TWICE (트와이스),Heart Shaker

Again, I’m sorry to dump all this on you like this, but it would mean the world to me if you could help.

Thank you very much in advance.

Shouldn’t be too complicated. First, you probably want to get rid of the empty lines. Since I don’t know whether those contain any content in some entries, it might be preferable to filter them out based on position. awk can do that for you with the line number variable and a modulus calculation.

~> cat listens.log | awk 'NR%6==1||NR%6==2||NR%6==4' > 2.log
~> head 2.log
KNOCK KNOCK
TWICE (트와이스)
Nov 3, 2023 01:01:00
KNOCK KNOCK
TWICE (트와이스)
Nov 3, 2023 01:01:00
Heart Shaker
TWICE (트와이스)
Nov 3, 2023 00:57:11

The next step would be parsing the date. I believe the date utility uses the current timezone, so you can use the dates as-is.

~> awk '!/Nov /{print $0; next} {
            cmd ="date \"+%s\" -d \""$0"\""
            cmd | getline var
            print var
            close(cmd)
        }' 2.log > 3.log
~> head 3.log 
KNOCK KNOCK
TWICE (트와이스)
1698969660
KNOCK KNOCK
TWICE (트와이스)
1698969660
Heart Shaker
TWICE (트와이스)
1698969431

Note: your timestamps will probably be different for these sample entries due to the timezone. Throw one of them into unixtimestamp.com to check whether they are correct for you.
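If the machine running these commands is not set to UTC+9, you can also pin the timezone explicitly with the TZ environment variable (this assumes GNU date, same as the commands above):

```shell
# Parse the timestamp in a fixed timezone, regardless of the machine's own
# setting. Asia/Seoul is UTC+9 with no DST, matching the poster's situation.
ts=$(TZ=Asia/Seoul date -d 'Nov 3, 2023 01:01:00' +%s)
echo "$ts"   # 1698940860 (= 2023-11-02 16:01:00 UTC)
```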

Now, you only have to merge them onto the same line. You should first check whether any of your track or artist names contain a semicolon, which would break the CSV (I’m using semicolons instead of commas as the separator, since commas are even more likely to appear in track names).
A simple grep will do:

~> grep ';' 3.log

This should print no results. If it does, please tell me and I’ll look into escaping.

To merge the entries, awk will help you again:

~> awk 'NR%3>0?ORS=";":ORS=RS' 3.log > 4.csv
~> head 4.csv 
KNOCK KNOCK;TWICE (트와이스);1698969660
KNOCK KNOCK;TWICE (트와이스);1698969660
Heart Shaker;TWICE (트와이스);1698969431
~> awk -F';' '{print $3 ";" $2 ";" $1}' 4.csv > 5.csv
~> head 5.csv
1698969660;TWICE (트와이스);KNOCK KNOCK
1698969660;TWICE (트와이스);KNOCK KNOCK
1698969431;TWICE (트와이스);Heart Shaker

Lastly, to remove the consecutive duplicates, use uniq:

~> uniq 5.csv > result.csv
~> head result.csv
1698969660;TWICE (트와이스);KNOCK KNOCK
1698969431;TWICE (트와이스);Heart Shaker

Definitely give the final file another look so that there are no surprises.
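One caveat: uniq only removes adjacent duplicates. If the same listen shows up again later in the file (non-consecutively), a sort-based de-dup is a possible alternative, at the cost of reordering the listens by timestamp:

```shell
# sort -u removes duplicates anywhere in the file, not only consecutive ones.
# On the real file that would be: sort -u 5.csv > result.csv
# Here a small inline sample demonstrates the effect.
deduped=$(printf '%s\n' \
    '1698969660;TWICE;KNOCK KNOCK' \
    '1698969431;TWICE;Heart Shaker' \
    '1698969660;TWICE;KNOCK KNOCK' \
  | sort -u)
echo "$deduped"
```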


@rob Thank you so much for resolving the issue. It is unfortunate that this happened, but I am very thankful for all the work you and the rest of the team do!


Thank you very much. You are a life-saver.

Sorry for taking so long to get back to you. Your instructions were incredibly detailed and easy to understand, so thank you for that.

I ran into some problems with the File Parser tool, which after some troubleshooting, I was able to resolve on my own. I didn’t want to bother you with trivial questions so I decided to look it up myself, which took dumb old me way longer than expected.

But the end result being, everything worked out perfectly, and it would have been impossible without your help.

Thank you, sincerely!


Awesome, glad to hear that it was successful!


Yes, entirely successful, thanks to you. So thank you again!! :smiley:


Not sure of the best place to mention this, so I will here. Forgive me if anyone already mentioned it and I missed it (or forgot).


I had missing history for the 1st and 2nd as well. I import mine from Last.fm, so I decided I’d reset my import timestamp and let it run through to recover the missing history.

If you think that a partial import has somehow missed some listens, you may reset your previous import timestamp. This will cause your next import to be a complete import which should add any missing listens while still avoiding adding duplicates to your history.

Normally this is true; I’ve reset mine many times to complete partial imports. But when I tried it now, it appeared to be duplicating all my listens (I looked back at the beginning of my history and everything was showing up twice). I didn’t let it finish (it takes a long time for all of mine), so I’m not sure whether I just never noticed before and it de-dupes afterwards, or whether the dupes are a side effect of the incident?

Edit: This has apparently been fixed. Thank you devs! :smiley:


Just checking back in for two things.

Many (or all, I’m not sure) of my manually-linked listens are still grayed out, i.e. the links to the recording IDs have not been restored.

Is the restoration still in progress, or is it finished? Because if it’s the latter, then I would like to report that the ‘recording-IDs-to-listens-manual-linking’ data restoration is incomplete.

Also, has it been confirmed for certain that listens outside the affected data-loss period are 100% intact? That is, has someone (who perhaps had a 1:1 backup of their listens outside the data-loss period) run a comparison and confirmed that every single listen outside the lost period is safe and accounted for?

Because call me paranoid, but I would like to be 100% sure, just in case.

Thank you in advance.

I asked about this on the discord.

Lucifer Morningstar (~4 days ago):

that particular table of manually linked data was lost during the incident too but I hope I can recover it.
It will take a while though a week at least I think, so if you want to start manually linking again feel free.
