2 CDs with the same disc id - 10 and 12 tracks

Hi all,

I was just validating my rips and discovered two CDs, completely different release groups and track counts (10, 12) with the same disc id hash. :slight_smile:

n82ZDNwei4_6dg48X8nKzKwSFbw- 

The Go-Betweens/2005 Oceans Apart

TOC of the extracted CD

 Track |   Start  |  Length  | Start sector | End sector
---------------------------------------------------------
    1  |  0:00.00 |  3:25.28 |         0    |    15402
    2  |  3:25.28 |  4:02.04 |     15403    |    33556
    3  |  7:27.32 |  3:08.28 |     33557    |    47684
    4  | 10:35.60 |  3:53.59 |     47685    |    65218
    5  | 14:29.44 |  2:45.29 |     65219    |    77622
    6  | 17:14.73 |  6:18.25 |     77623    |   105997
    7  | 23:33.23 |  3:09.45 |    105998    |   120217
    8  | 26:42.68 |  4:25.16 |    120218    |   140108
    9  | 31:08.09 |  4:25.28 |    140109    |   160011
   10  | 35:33.37 |  3:28.16 |    160012    |   175627

and

Garbage/1998 Version 2.0

TOC of the extracted CD

 Track |   Start  |  Length  | Start sector | End sector
---------------------------------------------------------
    1  |  0:00.00 |  4:36.45 |         0    |    20744
    2  |  4:36.45 |  3:38.10 |     20745    |    37104
    3  |  8:14.55 |  3:24.30 |     37105    |    52434
    4  | 11:39.10 |  4:08.07 |     52435    |    71041
    5  | 15:47.17 |  3:43.55 |     71042    |    87821
    6  | 19:30.72 |  4:52.03 |     87822    |   109724
    7  | 24:23.00 |  4:02.32 |    109725    |   127906
    8  | 28:25.32 |  4:11.70 |    127907    |   146801
    9  | 32:37.27 |  3:50.23 |    146802    |   164074
   10  | 36:27.50 |  4:03.35 |    164075    |   182334
   11  | 40:31.10 |  3:43.47 |    182335    |   199106
   12  | 44:14.57 |  5:24.05 |    199107    |   223411

I know disc ID clashes are fairly common, but I would have thought they were rare with this many tracks.

Nice.

I calculated this discid for the first album…

https://musicbrainz.org/cdtoc/AZjxwLtEKPzX6djVpRfLvjSssU0-

Huh. That’s working backwards from the release, yes? My ID calculation from the pasted TOC is correct (I’m pretty sure).

I must have an obscure copy of that album.

Actually, ignore my post. I just found a bug with my script - it wasn’t allowing for track lengths with a decimal point.

Thanks for the nudge!

FWIW, I tested your exact track lengths by adding them to a fake cuesheet and loading it in foobar2000 with the foo_musicbrainz component, which calculates disc IDs.

Yes after I realised my bug I figured you’d done something like that. Thanks.

It appears EAC uses the format
mm:ss.MM

while XLD uses
mm:ss:MM

(where MM is milliseconds)

and my regex didn’t account for EAC’s decimal point.

Actually, this is not common at all. The disc ID is basically a SHA-1 checksum of the TOC. SHA-1 has 160 bits, so given two different arbitrary TOCs, the probability that both generate the same checksum is 1 / 2^160 (~10^-48). Even across a larger data set, say 1 million different TOCs, the probability of any collision is still less than 10^-36, if my math is correct. MB currently has ~825k disc IDs, so a collision between different TOCs is extremely unlikely.
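For anyone who wants to check the arithmetic, here is a rough Perl sketch. It assumes SHA-1 output behaves like a uniformly random 160-bit value and uses the simple "number of pairs times per-pair probability" upper bound; the sub names are made up for this post.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Chance that two specific, different TOCs share a digest of $bits bits,
# assuming the digest behaves like a uniformly random value.
sub pair_collision_probability {
    my ($bits) = @_;
    return 2 ** -$bits;
}

# Upper bound on the chance of any collision among $n distinct TOCs:
# number of pairs times the per-pair probability.
sub any_collision_bound {
    my ($n, $bits) = @_;
    return $n * ($n - 1) / 2 * pair_collision_probability($bits);
}

printf "one pair:        %.2e\n", pair_collision_probability(160);
printf "1,000,000 TOCs:  %.2e\n", any_collision_bound(1_000_000, 160);
printf "825,000 TOCs:    %.2e\n", any_collision_bound(825_000, 160);
```

The single-pair figure comes out a little under 10^-48, and both data-set bounds stay below 10^-36.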

What is actually much more likely is two completely different discs having the same TOC, and there have been a few cases of this as far as I remember.

And of course different releases of the same album often have the same disc ID. That has less to do with probabilities and more to do with production processes :smiley:

It appears EAC uses the format
mm:ss.MM
[…]
(where MM is milliseconds)

MM here in your EAC logs is not in milliseconds. The unit is “sector length” (1/75 of a second). This is typical for applications displaying timestamps from CDs.

There is no need to decode these length fields to calculate disc IDs, as they are merely a human-readable presentation of the information in the start/end sector columns.
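To illustrate, here is a small Perl sketch that converts a timestamp to sectors at 75 per second and compares it with the sector columns. The sub name `msf_to_frames` is made up for this example, and accepting `:` as the last separator relies on the XLD format mentioned earlier in the thread.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use constant FRAMES_PER_SECOND => 75;   # one CD sector (frame) is 1/75 s

# Convert "mm:ss.ff" (EAC style) or "mm:ss:ff" (XLD style) to a sector count.
sub msf_to_frames {
    my ($text) = @_;
    my ($min, $sec, $frames) = $text =~ /^(\d+):(\d{2})[.:](\d{2})$/
        or die "unrecognised timestamp: $text\n";
    return ($min * 60 + $sec) * FRAMES_PER_SECOND + $frames;
}

# Track 2 of the first TOC in this thread: length 4:02.04, sectors 15403..33556.
my $from_length  = msf_to_frames("4:02.04");
my $from_sectors = 33556 - 15403 + 1;
print "length field: $from_length, sector columns: $from_sectors\n";
```

Both values agree, which is why a disc ID script only needs the two sector columns.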

Thanks. I didn’t realise that. I wasn’t using it in my calculations, just in my matching regex:

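#!/usr/bin/env perl
# Compute a MusicBrainz disc ID from an EAC/XLD-style TOC table read on stdin.
use strict;
use warnings;
use Digest::SHA qw/sha1_base64/;

my ($last_track, $leadout, @offsets);
while (<>) {
    # Columns: track | start time | length | start sector | end sector
    if (/^\s*(\d+)\s+\|\s+[0-9:.]+\s+\|\s+[0-9:.]+\s+\|\s+(\d+)\s+\|\s+(\d+)\s*$/) {
        $last_track = $1;
        $leadout    = $3 + 151;     # end sector + 1, plus the 150-sector lead-in
        push @offsets, $2 + 150;    # start sector plus the 150-sector lead-in
    }
}
exit if !$last_track;
push @offsets, 0 while @offsets < 99;    # unused track slots are hashed as zero
my $s = sprintf("%02X%02X%08X" . ("%08X" x 99), 1, $last_track, $leadout, @offsets);
my $id = sha1_base64($s) . "=";          # Digest::SHA omits the base64 padding
$id =~ tr#+/=#._-#;                      # swap +, / and = for ., _ and -
print "$id\n";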