I tried running the AcousticBrainz Extractor for Windows on my entire collection. It worked fine, but it looks like it would need to run for longer than I would like to leave my computer on. If I stop the extractor, will it pick up where it left off next time without creating duplicates or anything like that, even if I feed it my entire collection again? I guess it must, but I’m not sure how that would work.
It should record which files it has submitted somewhere, and the next time you run it, all those files are skipped.
How would it do that, considering it records multiple submissions for the same MBID?
AcousticBrainz dumps all full file paths of analysed files into a file, submitted.log (you can find it in C:\Users\<your user name>\AppData\Local\AcousticBrainz\Submitter\cache on Windows). I believe all files you pass through AcousticBrainz are matched against the file paths in that log file, and those files whose file paths are found in submitted.log are skipped.
So if you move your files they would be analysed again, but MBIDs are not considered.
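To make the idea concrete, here is a minimal Python sketch of path-based skipping. It is not the extractor's actual code: the function names are made up for illustration, and it assumes submitted.log holds one full file path per line, which I have not verified.

```python
from pathlib import Path


def load_submitted(log_path):
    """Return the set of paths already recorded in the log.

    Assumption: one full file path per line. A missing log means
    nothing has been submitted yet.
    """
    log = Path(log_path)
    if not log.exists():
        return set()
    with log.open(encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def files_to_process(candidates, submitted):
    """Keep only candidates whose exact path is not in the log."""
    return [path for path in candidates if str(path) not in submitted]


def record_submitted(log_path, file_path):
    """Append a path to the log once it has been submitted."""
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(f"{file_path}\n")
```

Because the comparison is on the path string only, a file that was moved or renamed no longer matches its old entry and would be picked up again, which is the behaviour described above.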
That’s good to know, thanks! If I did move and scan the same file again, would the server have a way to filter the duplicate submission out? Hopefully it would be the exact same data, but I’m not sure if the number of sources carries any importance.
I have no idea how the server handles it, sorry.
Locally on your computer we only de-duplicate based on the filename. If you move the files then the local app will resubmit them.
However, the submission info contains the MD5 of the audio component of the file. Changing metadata tags does not change the audio content, so that MD5 stays the same. This means that we are able to identify that both submissions are the same and remove one of them when we use the data.
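As a rough illustration of that idea, here is a Python sketch of collapsing submissions that share an audio MD5. This is not the server's actual code: the `audio_md5` field name, the helper names, and the shape of a submission are all hypothetical.

```python
import hashlib


def audio_md5(audio_bytes):
    """Hash only the audio payload, so tag edits do not affect the result."""
    return hashlib.md5(audio_bytes).hexdigest()


def dedupe_by_audio_md5(submissions):
    """Keep the first submission seen for each distinct audio MD5.

    Each submission is assumed to be a dict with a hypothetical
    'audio_md5' key holding the hex digest of its audio content.
    """
    seen = set()
    unique = []
    for submission in submissions:
        digest = submission["audio_md5"]
        if digest in seen:
            continue  # same audio already present, drop the repeat
        seen.add(digest)
        unique.append(submission)
    return unique
```

Two submissions of the same file from different paths would carry the same digest, so only one survives. Submissions for the same MBID made from different audio (another rip or encoding, say) have different digests and are all kept.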