As a result of ripping with morituri, I have a bunch of .cue files, and many of them contain ISRC data. At a quick count, I have 320 CDs with ISRC data, totalling 3845 ISRCs. I don't think any code currently exists to submit these, so I'm thinking of writing some. I'd submit a patch to musicbrainz-isrcsubmit (which has an open ticket requesting it), but:
- My solution is going to be somewhat targeted to my setup—morituri, files not rearranged by Picard, etc.
- It isn't aimed at one-at-a-time submissions.
- I don't actually know Python...
I've looked at a few of the cue sheets, and my ripping drives (mostly LG BD-RE WH14NS40 and Lite-On iHAS124) do not appear to suffer from the adjacent track duplicate ISRC problem some drives do.
I plan to write the bot in Perl, and it'll be free software posted on either GitHub or GitLab. (Note to self: there is some ISRC-submitting code in Perl at https://gist.github.com/njh/9159699)
NOTE: All my rips are one directory per CD. Each directory contains a bunch of FLACs (one per track), a .cue file, and a .log file. There are other files as well, but they're not relevant for this. (All of these are actually symlinks due to git-annex; again, not really relevant.)
At the moment, I'm thinking it should:
- Check with some local, persistent database that the bot hasn't already processed this directory.
- Read the ISRCs from the .cue file. Do the following basic sanity checks:
- ISRCs are unique (I have CDs which break this, such as DEF057301700, but will review them manually before submitting)
- Each track has one and only one ISRC
- Strip out hyphens (e.g., in NL-E42-11-02105), convert to uppercase (both of these are correct according to the ISRC validation bulletin).
- Confirm it's not the obviously invalid ISRC
- Confirm it matches the expected format
- If any check fails, skip the disc until I review the failure.
- Read the CDDB discid from the .cue file, compare it to the CDDB discid in the .log file. Throw a tantrum if they don't match. This should never happen.
- Extract MB disc ID from .log file.
- Extract release MBID and disc number from
- Query MB to confirm the disc ID is associated with the release. If not, skip disc until I deal with it (I expect some where I've failed to submit the discid when I created a new release in a release group)
- Use the disc number tag to find the correct disc of the release.
- For each recording on the given disc of the release (I'll get the recordings currently on the release—not the recording MBIDs tagged in my files):
- Some of my ISRCs have hyphens in them; pretty sure those should be stripped out. (Sometimes the field is quoted too; obviously the quotes aren't part of the ISRC.)
- Confirm the ISRC I'm about to submit is not currently on it
- AFAIK, I should submit even if there is a different ISRC already on it.
- Submit the ISRC.
- Mark the directory processed in the local database.
- Check number of ISRCs submitted today vs. rate limit, then either proceed to the next directory or stop.
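For illustration only, the .cue sanity checks in the list above might look roughly like this. It's sketched in Python just to pin down the logic; the bot itself would still be Perl. The function names (`normalize_isrc`, `read_cue_isrcs`, `check_disc`) and the `blocklist` parameter are made up for this sketch. It assumes ISRCs appear on `ISRC` lines under each `TRACK` line in the cue sheet, and that a valid ISRC is two letters, three alphanumerics, then seven digits. The "obviously invalid" check is left as a caller-supplied set.

```python
import re
from collections import Counter

# Assumed ISRC shape: 2-letter country, 3 alphanumerics, 7 digits.
ISRC_RE = re.compile(r"[A-Z]{2}[A-Z0-9]{3}[0-9]{7}")
TRACK_RE = re.compile(r"^\s*TRACK\s+(\d+)\b", re.IGNORECASE)
ISRC_LINE_RE = re.compile(r'^\s*ISRC\s+"?([^"\s]+)"?\s*$', re.IGNORECASE)


def normalize_isrc(raw):
    """Drop surrounding quotes and any hyphens, then uppercase."""
    return raw.strip().strip('"').replace("-", "").upper()


def read_cue_isrcs(cue_text):
    """Return {track number: [normalized ISRCs]} for every TRACK in the cue."""
    tracks = {}
    current = None
    for line in cue_text.splitlines():
        m = TRACK_RE.match(line)
        if m:
            current = int(m.group(1))
            tracks[current] = []
            continue
        m = ISRC_LINE_RE.match(line)
        if m and current is not None:
            tracks[current].append(normalize_isrc(m.group(1)))
    return tracks


def check_disc(tracks, blocklist=frozenset()):
    """Return a list of problems; an empty list means the disc passed."""
    problems = []
    for number, isrcs in sorted(tracks.items()):
        # Each track must have one and only one ISRC.
        if len(isrcs) != 1:
            problems.append(f"track {number}: expected 1 ISRC, found {len(isrcs)}")
        for isrc in isrcs:
            if not ISRC_RE.fullmatch(isrc):
                problems.append(f"track {number}: {isrc!r} is not in the expected format")
            elif isrc in blocklist:
                problems.append(f"track {number}: {isrc} is on the known-invalid list")
    # ISRCs must be unique across the disc.
    counts = Counter(isrc for isrcs in tracks.values() for isrc in isrcs)
    for isrc, n in sorted(counts.items()):
        if n > 1:
            problems.append(f"{isrc}: appears {n} times on this disc")
    return problems
```

A disc would only go on to the MusicBrainz lookup steps if `check_disc(read_cue_isrcs(cue_text))` comes back empty; anything else gets skipped and logged for manual review.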
I've reviewed the Bot Code of Conduct and of course plan to comply with it.
I'm looking for any comments/suggestions/etc., before I start work on this.
- Is there anything I should know about ISRCs before doing this?
- Are there other sanity checks I should perform?
- Is this code that'd be useful to anyone else?
- I presume this counts as a bot?
- Is there some reason this is a Bad Idea™?