I’m experimenting with some stuff and slowly working my way through a list of artist names that are high-risk for problems: recordings misattributed to another artist with the same name, duplicate artist entities, etc. A bunch of these names have pretty high name cardinality (likely 10+ legitimate, known separate artists sharing the same name), and a bunch are just a mess.
Any interest in me sharing this list? It’s about 1,000 artist names. I don’t know if this is old news. If nothing else, it’s an interesting dataset for comparing DSPs: iTunes, Spotify, et al. have problems with these artist names too. If someone wants to see how their merge algo is (or isn’t) working, these are great torture tests.
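In case "name cardinality" is unclear: I mean roughly the number of distinct artist entities that share one name. Here is a minimal Python sketch of that idea, not the actual process behind the list. The `(artist_id, name)` input shape, the function names, the normalization (trim plus case-fold), and the threshold of 10 are all assumptions for illustration.

```python
from collections import defaultdict


def name_cardinality(artists):
    """Count distinct artist IDs per normalized name.

    `artists` is an iterable of (artist_id, name) pairs. This input
    shape is a hypothetical one chosen for the example.
    """
    ids_by_name = defaultdict(set)
    for artist_id, name in artists:
        # Assumed normalization: trim whitespace and case-fold only.
        ids_by_name[name.strip().casefold()].add(artist_id)
    return {name: len(ids) for name, ids in ids_by_name.items()}


def high_risk_names(artists, threshold=10):
    """Return normalized names shared by at least `threshold` artist IDs."""
    counts = name_cardinality(artists)
    return sorted(name for name, count in counts.items() if count >= threshold)
```

Running `high_risk_names` over an export from each service would give per-service lists to compare. Note that this only counts entities as they exist in the data, so unmerged duplicates inflate the number and wrongly merged artists deflate it.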