Adding vinyl record track lengths from IA segments.json

Internet Archive’s own vinyl record items include a segments.json file with the length of each track, even for records that can only be heard as samples.

With so many vinyl records missing track lengths both here and on Discogs, and with this data being generally accurate (no such thing as 100% accurate for analogue sources, right?), this could be a great source for that information. I have already found several rare records that I care about which are missing track lengths here. Track lengths are really useful for finding duplicate recordings, for example.

If I understand correctly, each track in the JSON file has the length in milliseconds. Can someone confirm this? Copying the info track by track, converting it into mm:ss and adding it to each track individually can be very burdensome, but presumably it’s easy to write a script to automate this (for the kind of people who find writing scripts easy, not me).
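For anyone curious what the conversion step could look like, here is a minimal Python sketch. It assumes the lengths really are in milliseconds (not confirmed at this point in the thread) and rounds to the nearest second. The function names, the sample titles and the output line layout are made up for illustration; the sketch does not read segments.json itself, since its exact structure is not described here.

```python
def format_track_length(milliseconds: int) -> str:
    """Convert a length in milliseconds to an m:ss string, rounded to the nearest second."""
    total_seconds = (milliseconds + 500) // 1000  # integer rounding, half rounds up
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


def format_tracklist(tracks: list[tuple[str, int]]) -> list[str]:
    """Turn (title, length in ms) pairs into numbered lines with m:ss lengths."""
    return [
        f"{number}. {title} ({format_track_length(length_ms)})"
        for number, (title, length_ms) in enumerate(tracks, start=1)
    ]


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real segments.json file.
    sample = [("First Track", 154_000), ("Second Track", 185_600)]
    for line in format_tracklist(sample):
        print(line)
```

Rounding instead of truncating is a choice made for this sketch; swap the first line of `format_track_length` for `milliseconds // 1000` if truncation is preferred.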

IA’s LP collection: Internet Archive: Digital Library of Free & Borrowable Books, Movies, Music & Wayback Machine
Example of LP available in full: Soul As Sung By : Otis Redding : Free Download, Borrow, and Streaming : Internet Archive (segments.json)
Example of LP available in samples only: Magical Mystery Tour : The Beatles : Free Download, Borrow, and Streaming : Internet Archive (segments.json)


I wonder if this falls into @kellnerd’s area of interest, since he has been doing a lot of work on cueshit lately (a tool to convert between different cue sheet / chapter / tracklist formats).

Hehe, I was just about to reply here @aerozol :smile:

Indeed, I have just added Internet Archive Segment Data as a new supported format to cueshit.
You can now use it to convert that format into a MusicBrainz track parser listing (or something else).
To do this, you can either pass it the path to a downloaded *_segments.json file or simply the URL:

cueshit --to musicbrainz

Wow, @kellnerd, as if I could love you any more…. Thanks, I’ll be sure to test this later.


There is one thing about which I am still a bit undecided: each track has two track number attributes. One is reset per medium (read: never, for the two examples here), the other is reset per medium side.

I am currently using the per side numbering (track 1 to 5 for side A, track 1 to 5 for side B for the Otis Redding example) because it was slightly easier to implement. But now I have prepared the changes for per medium numbering (track 1 to 10 for the example) to compare and I am unsure which one is the better option (as in: the expected output).

For the MusicBrainz format it is plausible to output per-side numbers, since that makes it easier to prepend A and B to the track numbers when using the tracklist for a vinyl release.
On the other hand, I don’t believe that a cue sheet, for example, which resets the track number in between is even valid?
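To make the two numbering options concrete, here is a small Python sketch that derives both from the same input. It is not how cueshit does it; the input shape (a zero-based side index plus the per-side track number) and the function name are assumptions chosen for illustration only.

```python
def number_tracks(side_track_pairs: list[tuple[int, int]]) -> list[tuple[str, int]]:
    """Return (vinyl-style number, per-medium number) for each track in playing order.

    side_track_pairs: hypothetical input of (side index starting at 0, per-side track number).
    """
    numbered = []
    for position, (side_index, per_side_number) in enumerate(side_track_pairs, start=1):
        side_letter = chr(ord("A") + side_index)  # 0 -> "A", 1 -> "B", ...
        numbered.append((f"{side_letter}{per_side_number}", position))
    return numbered


if __name__ == "__main__":
    # Two sides with five tracks each, like the Otis Redding example.
    pairs = [(side, track) for side in range(2) for track in range(1, 6)]
    for vinyl_number, medium_number in number_tracks(pairs):
        print(vinyl_number, medium_number)
```

With per-side numbers as the input, the vinyl-style labels (A1 to B5) and the continuous per-medium numbers (1 to 10) can both be produced, so the choice mostly affects which output format gets which one by default.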

To describe a vinyl release, it makes sense to use side+track, simply because that’s the traditional way to do it. But in terms of a CUE sheet, you should generate two, one per side. If you think about it, a vinyl record is really two “mediums” glued back to back: you can never record or read both at the same time.

This is a different issue, but I’m even thinking it would probably be better for each side of a double-sided disc to be its own medium in MB. A double-sided CD like a DualDisc is two “mediums”, isn’t it? And that means two CUE sheets; you could never have only one.



First edit to add track lengths from IA’s segments data with your help. Won’t be the last. Thanks!