How should the probabilities for high-level features in AcousticBrainz be interpreted?

Is it just the probability that the classification is correct? If so, what kind of distribution of probabilities should one expect for a given feature? Is the distribution related to the accuracy of the model?

I’m still learning statistics, so I am not sure if my questions make sense.



We are using the LibSVM implementation of SVM classifiers, which can provide a score for each class and approximate class probabilities inferred from those scores. For more details see here, or section 8 of the LibSVM manual if you want a heavy dose of math.
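
To give a rough idea of what "inferred from the scores" means in the two-class case: the raw SVM decision value f is passed through a fitted sigmoid, 1 / (1 + exp(A*f + B)). Below is a minimal sketch of that mapping only. The values of A and B are made up for illustration (LibSVM fits them on the training data), and `platt_probability` is a hypothetical helper name, not part of LibSVM.

```python
import math

def platt_probability(decision_value, a, b):
    """Map a raw SVM decision value to an approximate probability
    using the sigmoid 1 / (1 + exp(a * f + b))."""
    return 1.0 / (1.0 + math.exp(a * decision_value + b))

# Hypothetical sigmoid parameters, chosen only for illustration.
A, B = -2.0, 0.0

for f in (-2.0, -0.5, 0.0, 0.5, 2.0):
    p = platt_probability(f, A, B)
    print(f"decision value {f:+.1f} -> probability {p:.3f}")
```

With a negative A, larger decision values give probabilities closer to 1, and a decision value on the boundary (f = 0, with B = 0) gives 0.5. The result is an estimate derived from the distance to the decision boundary, so it is only as good as the fitted sigmoid.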


Thanks for the links!

Maybe someone can comment on whether this seems reasonable? (It is going to take me a while to digest those papers.)

I downloaded the AcousticBrainz data (including increments) and plotted the distribution of probabilities for “relaxed” songs, and there was a sharp peak at 0.8, far higher than at any other value.

Does that mean that for most songs the classifier reports the same probability of having assigned the correct label?
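
One way to check whether the peak is really a single repeated value rather than an artifact of the plot's bin width is to count the rounded values directly. This is only a sketch: it assumes the probabilities have already been extracted from the dump into a plain list, the sample values below are made up, and `probability_histogram` is a hypothetical helper.

```python
from collections import Counter

def probability_histogram(probabilities, decimals=2):
    """Count how many probabilities fall on each rounded value."""
    return Counter(round(p, decimals) for p in probabilities)

# Made-up values for illustration; real ones would come from the data dump.
sample = [0.8, 0.8, 0.8, 0.8000001, 0.62, 0.91, 0.8, 0.55]

hist = probability_histogram(sample)
value, count = hist.most_common(1)[0]
print(f"most common value: {value} ({count} of {len(sample)} songs)")
```

Raising `decimals` shows whether the songs in the peak share exactly the same probability or only fall close to each other.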