Large "size on disk"

I’m using Picard v1.4.2 on Windows 10. I have a problem where after saving tags to some files, their “size on disk” balloons.

An example (sticking to one track for simplicity):

Before saving
Size: 6,890,543 bytes
Size on disk: 7,340,032 bytes
Album art: None

After saving
Size: 6,993,397 bytes
Size on disk: 15,728,640 bytes
Album art: 500x500 px, 92 KB

Steps taken

  • I add a bunch of files to the left-hand pane. I Cluster them.
  • I select all the clusters and choose to Lookup.
  • In the right-hand pane, I find an album that has been perfectly matched, right-click it and choose “Save”.
  • About 50% of the time, it works as expected. The other 50%, the “size on disk” for each file becomes huge.
  • It’s noticeable when it happens, because the Save is very slow - it takes 1-2 seconds for each track that it saves, whereas when it works correctly it takes maybe 0.2 seconds per track.
  • I can correct it, but not reliably. For example, if I right-click the album, and change it to another version, then save it, then back to the old version and save it again, then this will sometimes fix it. Sometimes it won’t. Sometimes removing all the cover art and saving again will fix it.

Any idea what is happening here?

Just to make sure - you’re not also saving the full-size album art on each file, right? (I think I did that once and the files were huge)

That would also affect the actual file size, not just the “size on disk”.

Whatever the issue here, my guess is that it’s a file system issue. Somehow the file gets fragmented over multiple sectors on the hard disk (the “size on disk” is effectively just how many sectors are being used to store the file). I would suggest trying a defragmentation, but I’m not sure if that will help. If it is a file system issue, I’m also not sure if there’s anything Picard can do to prevent it.

@knuvh Are the files saved on local or remote/network storage? Are you using NTFS? Can you tell what the cluster and sector sizes for the file system are?
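
To show why the allocation unit matters, here is a rough Python sketch of how “size on disk” relates to the real size: the file size rounded up to a whole number of allocation units. The 1 MiB unit is only a guess on my part, based on both reported values being exact multiples of 1,048,576; `size_on_disk` is just an illustrative helper, not anything from Picard or Windows.

```python
MIB = 1024 * 1024  # assumed allocation unit; a guess from the reported figures


def size_on_disk(size_bytes, allocation_unit):
    """Round a file size up to a whole number of allocation units."""
    units = (size_bytes + allocation_unit - 1) // allocation_unit
    return units * allocation_unit


# Figures from the example in the first post.
expected_before = size_on_disk(6_890_543, MIB)
expected_after = size_on_disk(6_993_397, MIB)
reported_after = 15_728_640

print(expected_before)                  # matches the reported "before" value
print(expected_after)                   # what the "after" value should be
print(reported_after - expected_after)  # bytes reported beyond the expected value
```

With that assumption the “before” value is exactly what you would expect, while the “after” value is a whole 8 MiB more than the file should need, which hints at a reporting or allocation quirk rather than the tags themselves.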


Ah, pertinent questions - the files were saved onto network storage on an SMB share. It seems this was the issue, the SMB share was reporting inaccurate values for “size on disk”.

What I don’t understand is that for albums that would result in an inaccurate value of “size on disk”, Picard would take a long time to save the tags. I’m not sure what the difference is here between these saves and other saves (to the same SMB share) that were quick and resulted in more accurate “size on disk” values reported by SMB (but, incidentally, still mostly incorrect values).

I could investigate further, but I’m not sure if this is of interest, being an SMB issue rather than a Picard issue.

It could be that there were slight network hiccups when you saved those files. Do you always experience delays on the same file(s)? The delay and the inaccurate report are likely linked, but as you say, this is also likely an SMB issue that Picard can’t really do anything about, especially if it doesn’t happen consistently for specific files.


This is most likely because the longer saves rewrote the entire file to storage, which probably caused it to be relocated and maybe even fragmented. The fast saves only write the tags themselves.

Most audio formats store the tags at the start of the file, followed by the audio data, so tags can be read early, even when streaming (the original ID3v1 is an exception). If a change makes the tag block larger than before, the entire file, including the audio data, has to be rewritten to disk, which takes much longer than rewriting just the tags, especially on network storage and with large files such as FLAC. To mitigate this, there is usually some free space (padding) between the tags and the audio data, but it is not always enough to hold all the new tags, especially when cover art is added to files that had none before.
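
If you want to check this for a given file, here is a minimal Python sketch using FLAC as the example format. It walks the metadata block headers as laid out in the FLAC format specification (a 4-byte `fLaC` marker, then blocks with a 1-byte flag/type and a 3-byte big-endian length, where type 1 is padding) and adds up the padding. The helper names and the 8 KiB padding in the sample are made up for illustration, and the fit check ignores the 4-byte block headers, so treat it as an approximation and not as how Picard decides this.

```python
import io

PADDING_BLOCK_TYPE = 1  # PADDING block type in the FLAC metadata layout


def flac_padding_bytes(stream):
    """Return the total length of all PADDING blocks in a FLAC stream."""
    if stream.read(4) != b"fLaC":
        raise ValueError("not a FLAC stream")
    total = 0
    while True:
        header = stream.read(4)
        if len(header) < 4:
            raise ValueError("truncated metadata block header")
        is_last = bool(header[0] & 0x80)   # top bit marks the last block
        block_type = header[0] & 0x7F      # lower 7 bits are the block type
        length = int.from_bytes(header[1:4], "big")
        if block_type == PADDING_BLOCK_TYPE:
            total += length
        stream.seek(length, io.SEEK_CUR)   # skip the block body
        if is_last:
            return total


def fits_in_padding(padding_bytes, extra_tag_bytes):
    """Rough check: can the extra tag data be absorbed without a full rewrite?"""
    return extra_tag_bytes <= padding_bytes


# Synthetic header: a 34-byte STREAMINFO block, then 8 KiB of padding (hypothetical).
sample = (
    b"fLaC"
    + bytes([0x00]) + (34).to_bytes(3, "big") + bytes(34)
    + bytes([0x80 | PADDING_BLOCK_TYPE]) + (8192).to_bytes(3, "big") + bytes(8192)
)
padding = flac_padding_bytes(io.BytesIO(sample))
print(padding, fits_in_padding(padding, 92 * 1024))
```

For a real file you would pass `open(path, "rb")` instead of the `io.BytesIO` sample. With a 92 KB cover like the one in the first post, a few kilobytes of padding would not be enough, so the whole file would have to be rewritten.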


That makes sense. Great explanation - thank you.

Going forward, I will copy music onto a local drive when tagging with Picard. I don’t think there are many use cases where that would be impractical to do.