The problem now is that people use different words for the same thing, and the same word (character) for different things.
ASerge used "character" for the Pascal Char type.
fedkad used it for ... I am not sure what.
In Unicode a "character" can mean about 7 different things. The word must be used very carefully and its meaning should be made clear each time.
As far as I know, in Lazarus or Free Pascal we are using UTF-8 characters ( https://en.wikipedia.org/wiki/UTF-8 )
Lazarus uses UTF-8 encoding in AnsiString. Don't use "characters" here, it is wrong.
FPC aims for Delphi-compatible UTF-16 strings with mode DelphiUnicode.
The default encoding of AnsiString is not Unicode at all.
... which means that whatever the number of bytes (i.e., 1, 2, 3, or 4) used to represent the Unicode "code point" for that characters, the utf8length function should return the number of "code points" (characters).
"Character" should not be used to mean "codepoint". Unicode is confusing enough without such extra confusion. What do you call combining codepoints then?
What does this mean:
the Unicode "code point" for that characters?
I think the best meanings for "character" are:
1. Pascal Char, for historical reasons. In Unicode terms it represents a codeunit and is useful in many situations.
2. User perceived character. This involves combining codepoints, glyphs, ligatures and whatever.
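To make the difference concrete, here is a small sketch in Python (not Pascal, only because its str type is indexed by codepoints, which makes the counting easy to show). The function names are my own, for illustration only. The last function is just a rough approximation of a user-perceived character: it skips combining marks and does not implement the full Unicode grapheme cluster rules.

```python
import unicodedata


def utf8_code_units(s: str) -> int:
    # Number of bytes in the UTF-8 encoding (what Length() sees on a UTF-8 AnsiString).
    return len(s.encode("utf-8"))


def utf16_code_units(s: str) -> int:
    # Number of 16-bit units in the UTF-16 encoding (WideChar count).
    return len(s.encode("utf-16-le")) // 2


def code_points(s: str) -> int:
    # Python's len() on str counts codepoints.
    return len(s)


def approx_user_perceived(s: str) -> int:
    # ASSUMPTION: crude approximation, counts every codepoint that is not a combining mark.
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


# 'e' + COMBINING ACUTE ACCENT (U+0301) + U+1F600 (outside the BMP)
sample = "e\u0301\U0001F600"

print(utf8_code_units(sample), utf16_code_units(sample),
      code_points(sample), approx_user_perceived(sample))
```

The same string gives 7, 4, 3 and 2 depending on what is counted, which is why "character" alone says nothing.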
Because they rely on the Windows API, which does not count in the "code point", but in double-byte characters. The SelectAll does not access the Windows API, but does the operation itself, in the "code point". Hence the difference.
OK, "double-byte character" here means a UTF-16 codeunit, i.e. a Pascal WideChar.
The problem comes with codepoints outside the Unicode BMP, which are encoded as surrogate pairs in UTF-16.
The LCL-Win32 binding code should take care of it and pass the right values to WinAPI. It does not; somebody should fix it.
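A rough sketch of the kind of conversion meant here, in Python for illustration. The helper name is hypothetical and this is not the actual LCL code. It only shows how a position counted in codepoints maps to a position counted in UTF-16 codeunits once a surrogate pair is involved.

```python
def codepoint_to_utf16_offset(s: str, cp_index: int) -> int:
    """Return the UTF-16 codeunit offset of the codepoint at index cp_index.

    Hypothetical helper: every codepoint above U+FFFF takes two codeunits
    (a surrogate pair), everything else takes one.
    """
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in s[:cp_index])


text = "a\U0001F600b"  # 'a', one codepoint outside the BMP, 'b'

# Offsets for codepoint indexes 0..3 (3 = end of string).
offsets = [codepoint_to_utf16_offset(text, i) for i in range(len(text) + 1)]
print(offsets)
```

For this string the codepoint positions 0, 1, 2, 3 become codeunit positions 0, 1, 3, 4. Passing the codepoint positions unchanged to an API that counts in codeunits puts a selection one unit short for every surrogate pair before it.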