Most of the time this works perfectly, but once in a great while someone puts a file in the directory whose name contains an EM Dash character, which is Unicode. This causes a ton of issues with SetLength(FFolderItemInfo.Name, vFileInfo^.FileNameLength div 2);
The div 2 produces the correct length when the filename is ASCII, but not when it is Unicode.
If the filename contains the EM Dash, removing the div 2 makes it work.
I seriously doubt that, given that:
- dividing the FileNameLength by 2 assumes the string data consists of 2-byte elements. Indeed, Unicode on Windows is handled using UTF-16, which does use 2-byte elements.
- the EM Dash char (U+2014) takes up only one 2-byte WideChar in UTF-16.
Even for Unicode characters that require two 2-byte WideChars (i.e. surrogate pairs), dividing the FileNameLength by 2 would still work just fine, since FileNameLength counts the total number of bytes used for the encoded elements, not the number of Unicode characters. So whether a given character takes up 1 or 2 elements, the FileNameLength accounts for that as expected.
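The byte arithmetic can be checked outside of Delphi. This illustrative Python sketch (the function name is my own, not from the code in question) confirms that for ASCII, for a BMP character like the EM Dash, and for a surrogate-pair character, the UTF-16 byte count always divides evenly by 2 into WideChar elements:

```python
# FileNameLength in the Windows directory-info structures is a byte
# count of UTF-16 data, so "div 2" yields the WideChar element count.

def utf16_code_units(s: str) -> int:
    """Number of 2-byte UTF-16 code units, i.e. FileNameLength div 2."""
    byte_len = len(s.encode("utf-16-le"))  # analogous to FileNameLength
    assert byte_len % 2 == 0               # always even: each unit is 2 bytes
    return byte_len // 2

print(utf16_code_units("report.txt"))          # 10: one unit per ASCII char
print(utf16_code_units("report\u2014v2.txt"))  # 13: EM Dash is still one unit
print(utf16_code_units("\U0001F600.txt"))      # 6: surrogate pair (2) + ".txt" (4)
```

In every case the division is exact, so the EM Dash by itself cannot make `div 2` produce a wrong length.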
How can I get this to work for ascii and Unicode filenames?
The code shown works perfectly fine with Unicode filenames, especially with the EM Dash character, which occupies only 1 WideChar element. So the problem has to be something else. Please provide a hex dump of the raw FileName data and the corresponding FileNameLength. Also, make sure that FFolderItemInfo.Name is a 16-bit (Wide|Unicode)String and not an 8-bit (Ansi|UTF8)String.
If FFolderItemInfo.Name is an 8-bit string, then passing the FileNameLength as-is to SetLength() is wrong; it would need to be translated from a 16-bit length to an 8-bit length first. But why is the code calling SetLength() at all? WideCharLenToString() already returns a String that takes the FileNameLength into account, so the SetLength() afterwards is completely unnecessary.
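If Name really is an 8-bit string, the length mismatch is easy to quantify. This illustrative sketch (not the asker's code) shows why a UTF-16 byte count cannot be reused as an 8-bit string length:

```python
# A UTF-16 byte count (what FileNameLength reports) is not a valid
# length for an 8-bit representation of the same text.
name = "report\u2014v2.txt"  # 13 characters, including an EM Dash

utf16_bytes = len(name.encode("utf-16-le"))  # what FileNameLength holds
utf8_bytes  = len(name.encode("utf-8"))      # same text as 8-bit UTF-8

print(utf16_bytes)  # 26: 13 characters * 2 bytes each
print(utf8_bytes)   # 15: the EM Dash alone takes 3 bytes in UTF-8
```

Using 26 (or even 13) as the length of a 15-byte UTF-8 string would over- or under-allocate, which is exactly the kind of corruption that only shows up once a non-ASCII character appears in the filename.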