More Stupid Newbie Questions

Does doing this from a DOS prompt make any difference? Because Windows says no errors.

The rename juggling I just got you to do shows that somehow those two titles are different. They look identical to us, yet the OS is seeing them as genuinely different text.

What is most puzzling to me is that you say that Picard made BOTH of those folders? So it is not as if Picard would suddenly want to pick a new character set. And everything is plain boring ASCII in what you are doing.

Weird… I’ll keep thinking…

Yeah, when this first started happening I thought it was something to do with my scripting, but it also happens on regular files for English-language artists such as Marvin Gaye, whose names have no exotic characters (and this has happened with other Western artists too).

From my experience wrangling this over the last week it acts as though there’s some invisible timestamp metadata in the folders where, if I tell Picard to create two identically named folders more than a few minutes apart, it makes two separate ones.

But NTFS would not care about time stamps in a folder name. So I am still blaming pixies.

Try this. Make a new folder. And inside that folder make two new folders.

For “Folder A” COPY\PASTE the “Mai Yamane” from the first folder and rename Folder A.
For “Folder B” COPY\PASTE the “Mai Yamane” from the second folder and rename Folder B.

Does this allow you to still do this? If yes, then there is something magical about the text the OS is seeing as unique.

Worked like a charm. So it’s definitely something about the names themselves.

So you must have two unique strings.

And I assume you have carefully looked for an extra SPACE on the end? (Yeah, I know I ask the obvious, but just need to check)

Maybe something funky in the scripts? I can’t see anything, but I don’t do anything with file renames in Picard.

Yeah, no extra spaces or anything like that.

I should note that my assumption that this is based on timestamps isn’t just some random crap I guessed at; it’s based on how I’ve noticed Picard behaving in other contexts. For instance: at one point I brute-changed my naming script to /Aphex Twin/%originalyear% etc. because I didn’t feel like dealing with all of his hoity-toity aliases.

But then I forgot to change it back to %albumartist% when I did my next batch of saving in the A range and a bunch of them ended up in that folder. And even when I changed it back, they didn’t move. Until a few minutes later, when I came back from a bathroom break, hit “Save” again as a lark, and then magically, they moved out of the /Aphex Twin/ folder and into their own proper places.

Ever since, I’ve noticed this behavior. If I hard-code a directory to get through that cluster (e.g., I have all of k.d. lang’s stuff in the k.d. lang folder regardless of whether it’s k.d. lang and the reclines), and forget to set it back, and then the next batch goes in that folder because I’m a dummy and forgot to set it back to %albumartist%, I just have to wait a while and hit save again then it starts working again. It’s a weird quirk that seems like it might be related to the current mystery.

Why does that last script have a \1 on the end?

$set(_album,$rreplace(%_album%,\([A-Za-z]\)\\.,\\1))

I am not used to these scripts, so not sure about all the double slashes, but it looks like it is adding extra hidden bytes. Take that one line out and see if it starts to behave.

And I’ve been dealing with NTFS for a few decades, so know there are no hidden time stamps in folder names. But you can have hidden characters from elsewhere on the character set, including ones you can’t see. I just can’t remember at this moment what is and isn’t legal.

The \1 is a regex backreference. The idea is that it will take any string where a letter is followed by . and strip the . while returning the letter. This is so I could return acronyms without periods (T.I.T.L.E. becomes TITLE) but not numbers (so 3.14 stays 3.14). It’s my own peculiar taste.
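
For anyone else who doesn’t do regex: a minimal Python sketch of the same idea, assuming the tagger-script pattern un-escapes to `([A-Za-z])\.` with `\1` as the replacement. The helper name `strip_acronym_dots` is made up for illustration and is not part of Picard.

```python
import re

# Hypothetical helper; the pattern is assumed to be the un-escaped form of the
# tagger-script regex quoted above: one letter, captured, followed by a period.
ACRONYM_DOT = re.compile(r"([A-Za-z])\.")


def strip_acronym_dots(text: str) -> str:
    """Drop a period that directly follows a letter, keeping the letter (\\1)."""
    return ACRONYM_DOT.sub(r"\1", text)


if __name__ == "__main__":
    for sample in ("T.I.T.L.E.", "3.14", "R.E.M. Live"):
        print(sample, "->", strip_acronym_dots(sample))
```

Note that the same pattern would also strip the period from abbreviations such as “Vol.”, since those are a letter followed by a period too.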

None of these scripts touch on the %albumartist% variable (they are exclusively %album% and %title% as I have been processing these on an artist-by-artist basis) so I don’t think they’d make a difference with regards to the folder names.

Okay. Told you I don’t do regex scripts. Haha. As long as that works as expected and you don’t have too many \.

But you clearly have SOMETHING in that second name that is different to the first.

In the test folder, what happens if you rename and delete the last two characters off of both? Cursor at end of the text with END key, then delete the “ne”. Are these still unique? (I’m hunting for hidden text)

ALSO try opening an old CMD command prompt in there (SHIFT + RIGHT CLICK and it will be on the menu). Now do a DIR /X and you’ll see the short versions of the folder names. I guess these are different?

Edited with a researched update:
Found someone else who managed to do something like this ( https://www.tenforums.com/general-support/170933-two-folders-same-name-same-folder.html ) and they found hidden Unicode chars in their folder names by looking via DOS.

And if you want to get mega geeky, there is a PowerShell script to hunt out the characters in filenames, but that would also mean copy\pasting these into the names of two txt files or similar. See the Stack Overflow question “Windows / NTFS: Two files with identical long-names in the same directory?”

My money is on something invisible like that.

What I pick up on too is that no one else has reported this. It is something “special” about your setup. Again I blame pixies. Try leaving chocolate out for them; I often find that works.

Maybe you could see something if you copy those folder names into something else. Notepad++ allows you to set the encoding and also has a “show all characters” mode, so try pasting them into an ANSI file and a UTF8 or UTF16 file and see if anything shows up differently.
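
If Notepad++ is not at hand, the same check can be sketched in a few lines of Python using the standard `unicodedata` module. This is only an illustration: the function names are made up, and treating the “Cf” (format) and “Cc” (control) categories as “hidden” is an assumption that covers marks like U+200E but not every invisible character.

```python
import unicodedata


def describe_chars(name: str) -> list[str]:
    """Return one line per character: code point, Unicode category, official name."""
    lines = []
    for ch in name:
        label = unicodedata.name(ch, "<unnamed>")
        lines.append(f"U+{ord(ch):04X} {unicodedata.category(ch)} {label}")
    return lines


def hidden_chars(name: str) -> list[str]:
    """List the code points that fall in the format (Cf) or control (Cc) categories."""
    return [
        f"U+{ord(ch):04X}"
        for ch in name
        if unicodedata.category(ch) in ("Cf", "Cc")
    ]


if __name__ == "__main__":
    # Paste the two folder names here; the second one is a made-up example
    # with a left-to-right mark in front.
    for folder in ("Mai Yamane", "\u200eMai Yamane"):
        print(repr(folder), len(folder), hidden_chars(folder))
```

Two names that look the same on screen but differ in length or in the `hidden_chars` output are different strings as far as the file system is concerned.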

Figured it out. Looking at the directories in the command line, for some godforsaken reason my one line of artist code (delprefixing “the”) had acquired a U+200E left-to-right mark, so my files and folders were racking up invisible Unicode characters at the beginning every time I executed my script. I have absolutely no idea how it got there. But one bulk file rename later and no harm done.
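
For anyone landing here with the same problem, a bulk rename along those lines could be sketched in Python as below. This is an assumption about one way to do it, not the tool actually used above; the function name `strip_lrm_from_tree` is made up, and it defaults to a dry run so nothing is touched until the plan has been reviewed.

```python
import os

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK


def strip_lrm_from_tree(root: str, dry_run: bool = True) -> list[tuple[str, str]]:
    """Rename every file and folder under root whose name contains U+200E.

    Returns the (old_path, new_path) pairs. Nothing is renamed when dry_run is True.
    """
    planned = []
    # Walk bottom-up so children are renamed before their parent folders.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            if LRM not in name:
                continue
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, name.replace(LRM, ""))
            if os.path.exists(new):
                # A clean twin already exists; merging is left to a human.
                print(f"skipped (target exists): {old!r}")
                continue
            planned.append((old, new))
            if not dry_run:
                os.rename(old, new)
    return planned
```

Where a clean folder and its invisible-character twin sit side by side, as with the two “Mai Yamane” folders, the sketch skips the rename rather than guessing how to merge them.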

Awesome, glad you found it

Actually I have a strong suspicion: we place those characters in the built-in scripting documentation before the functions. This fixed text direction issues with e.g. the Hebrew translation.

So if you copy pasted from there it could explain the character in your script.

Need to think how to prevent this, I guess that could be rather common. The better solution would be to use the HTML dir attribute, but Qt 5 doesn’t support this. Maybe we can limit inserting this character and use it only if Picard’s display language is set to a right-to-left language.

Excellent!! So you could also have had three or four folders created by your script? That would be funny to see if it kept adding more. :laughing:

And even better to see @outsidecontext spotting a possible source with the copy\paste from a website hack.

How many of those types of characters could be on a website for copy\pasta errors like this? If you know the website has this character, can it be looked for and stripped from file names? Or a warning popped up?

I added https://tickets.metabrainz.org/browse/PICARD-2200 to track this. I see it as an issue that copy and pasting from official docs can easily bring you into this situation.

Just clarifying that this is not from a website, but from Picard’s built-in documentation view.

There are near endless possibilities for this. There are a huge number of invisible control characters for all kinds of purposes. It’s also not safe to just discard all those characters, e.g. on pasting, as one might want to insert such characters on purpose. Maybe tagger script needs a syntax to represent such characters in a visible way; in that case such characters could be converted to this syntax. E.g. in Python one could write the left-to-right mark as "\u200E", and I think a similar syntax would work for Picard’s scripting language. Maybe if we extend Picard syntax to interpret \u200E this would:

  1. Allow people to include arbitrary unicode characters in the script for various purposes, without being stuck with invisible stuff or complicated copy and paste
  2. Make such characters visible when copy and pasting

I think if @shamboni had seen the script like this, e.g.

\u200E$set(_album,$replace(%album%,/,-)) 

it would have been easy to spot the culprit.
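
To illustrate the conversion step, here is a rough Python sketch of turning invisible characters into such a visible escape. It is not Picard code: the function name `show_invisible` is hypothetical, and the choice to escape only the Unicode “Cf” (format) and “Cc” (control) categories while leaving newlines and tabs alone is an assumption.

```python
import unicodedata


def show_invisible(text: str) -> str:
    """Replace format (Cf) and control (Cc) characters with a visible escape.

    Code points up to U+FFFF become \\uXXXX, larger ones \\UXXXXXXXX.
    Newlines and tabs are left alone so multi-line scripts stay readable.
    """
    out = []
    for ch in text:
        if unicodedata.category(ch) in ("Cf", "Cc") and ch not in "\n\t":
            code = ord(ch)
            out.append(f"\\u{code:04X}" if code <= 0xFFFF else f"\\U{code:08X}")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    pasted = "\u200e$set(_album,$replace(%album%,/,-))"
    print(show_invisible(pasted))
```

Running something like this over pasted text before it lands in the script editor would make the stray mark show up as plain text.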

This is one reason I personally always paste into Notepad++ first to strip out these kinds of chars. I pick them up on Websites and MS Word as well. Notepad++ takes it back down to “just the text” and I can spot the rogue additions.

I do understand the mess of spotting the hidden and how hidden is also sometimes required. Anything that could be done to shine a light on these hidden chars would be good. I know you love your puzzles. :smiley:

Minor gripe but is there a way I can tell Picard to always show a field in the bottom pane regardless of whether it is populated or not? Right now “date” and the various date-related metadata simply don’t show up if they are not populated in either my original mp3s or the MB release entry. I would like to simply double click the “date” field and type one in rather than futz with the clunky “add new tag” interface.

I could probably add a bit of intake script where it populates date with “0000” after a $len check but wanted to make sure it wasn’t a built-in setting somewhere I’m missing.

Slightly off-topic, but when I want to strip text I paste it into the navigation bar in my browser if I’m working there, or into the Windows search field in the taskbar, and copy it out again. V. quick! :smiley:

Ditto - also the TO\SUBJECT fields on an email program…

Exactly why I don’t trust a browser to do anything - even if I have DuckDuckGo as the default

Glad I am not the only one doing this. But you need to be aware that the browser might send this out to a search engine if it is looking for auto completion suggestions.

Yep :smiley:
