Cleaning aliases


Aliases are a very useful feature, but the data is currently polluted, which leads to many unexpected results. For example, for the artist “D Sigual” there were records like these:

DSigual Vol. 3
DSigual Vol. 4
DSigual Vol. 7

Removing these aliases one by one is time-consuming; even removing one wrong alias is a multi-click process. It would be great if:

(1) When a new alias is being added, it is checked against a list of suspicious words like “vol”, “volume”, “feat”, etc. It may be helpful for the community to contribute to this list, or to derive it by scanning the current dataset.

(2) A script is run to identify all existing aliases with suspicious words, and, after visual inspection, all of them are removed at once.

For example, a search for “vol” under “Artists” returns over 3,700 results, of which only 10 are legitimate. That’s a lot of clicking to clean them up manually.
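
To make the idea concrete, here is a rough sketch of the check in Python. It is only an illustration: the word list is a made-up starter, the names `SUSPICIOUS_WORDS`, `is_suspicious` and `find_suspicious` are hypothetical, and it works on plain strings rather than on the actual database.

```python
import re

# Hypothetical starter list; the community could extend it.
SUSPICIOUS_WORDS = ["vol", "volume", "feat"]

# Match whole words only, case-insensitively, so that "Vol. 3" is
# flagged but a legitimate name such as "Volbeat" is not.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in SUSPICIOUS_WORDS) + r")\b",
    re.IGNORECASE,
)


def is_suspicious(alias):
    """Return True if the alias contains one of the suspicious words."""
    return _PATTERN.search(alias) is not None


def find_suspicious(aliases):
    """Return the aliases that should be looked at by a human."""
    return [alias for alias in aliases if is_suspicious(alias)]
```

The same `is_suspicious` check could serve both proposals: once when a new alias is entered, and once over the existing aliases to build the list for visual inspection. Matching on whole words will still flag the few legitimate names, which is why the inspection step matters.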


Not strictly helpful, but I have a feeling that that list is the result of old single-medium releases being merged after we updated the release schema a while ago. My understanding is that doing so adds the titles of the subordinate entities to the alias list – which is admittedly more helpful with artists than with releases, but still has utility at times for anybody who searches for the old names. Basically, that’s not just someone who went through and added overly-specific aliases.


Also, for the reason @WovenTales outlined, I would probably not remove such aliases, but instead just make sure that they get marked properly with the type “Search hint”. If the aliases ended up there via merging, that probably means that somewhere out there (Last.FM, other old datasets, badly tagged files, etc.), the artist is given as “DSigual Vol. 3” (as horrible as that is). Having that as a “Search hint” makes it easier to match up that specific string to that artist. (But that’s just me, I’m sure there are other people who prefer to prune the aliases a lot more. :slight_smile: )


I am not a fan of automation. I like partial automation. It makes for fewer errors in the end.
So, for me, I would rather see a script used to compile a list, but make the changes manually - or at least manually script assisted, where you may not have to do each one manually, but you do need to approve each change manually.
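
Something along these lines is what I mean by script-assisted, as a rough Python sketch. The `review` function and the `(artist, alias)` pairs are hypothetical, and it deliberately changes nothing itself; it only collects the changes a person said yes to.

```python
def review(candidates, ask=input):
    """Ask for a decision on each candidate; return only the approved ones.

    candidates is an iterable of (artist, alias) pairs produced by some
    earlier scan. Anything other than an explicit "y" counts as a no.
    """
    approved = []
    for artist, alias in candidates:
        answer = ask(f"Remove alias {alias!r} from {artist!r}? [y/N] ")
        if answer.strip().lower() == "y":
            approved.append((artist, alias))
    return approved
```

Defaulting to "no" means a stray Enter key never approves a change, and the `ask` parameter lets the prompt be swapped out for testing.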

I have seen machines screw up far too many times to trust them, and when they do, the errors are always big.
Yes, they handle the bulk of the work fine, but correcting the errors they make is often more work than the original task would have been.