
Author Topic: Problem Casting String/pWideChar with {$MODE DelphiUnicode}  (Read 4170 times)

kevin.black

  • Full Member
  • ***
  • Posts: 121
Problem Casting String/pWideChar with {$MODE DelphiUnicode}
« on: March 19, 2019, 11:51:19 pm »
Hi all,

I thought I posted this, but maybe I wrote it and forgot to post. Apologies if you find the other one :o

Firstly, I need to do this because:

  • It's Delphi to FPC, and now Strings are Unicode strings and pChars are actually pWideChars
  • It's in a DYLIB that is used by C/C++, so the only parameters that are safe are integer, boolean and pWideChar
  • The C++ calling routine uses Char IIRC, which maps to pWideChar

So again, I'm not narked, but I don't need any questions like 'why?'; it just is.

I have looked at the posts and the Wiki, and what I am doing 'should' work. I'm trying to be as compatible as possible with the current (10.3.xx) version of Delphi, which is what I am converting from. So here's the theory: strings are of type UnicodeString, so declaring x: string means x is a Unicode string, and if I have an Edit then its text, Edit.Text, is also a string; ergo it should be a Unicode string (see {$MODE DelphiUnicode}).

My issue:
  • I have set the compiler directive {$MODE DELPHIUNICODE}
  • Strings are AFAICT Unicode strings
  • I thought I could just cast a unicode String to a pWideChar
Code: Pascal
var
  loginemail: pWideChar;
...
...
WriteLog('DEBUG', 'Unicode String checkUserStatusEmailEdit.Text: ' + checkUserStatusEmailEdit.Text); // logs fine
loginemail := pWideChar(checkUserStatusEmailEdit.Text);   // loginemail is pWideChar; xxx.Text is (I assumed) a Unicode (wide) string
WriteLog('DEBUG', 'pWideChar Loginemail: ' + loginemail); // has been cast to pWideChar and is crap
WriteLog('DEBUG', 'Unicode String Loginemail: ' + string(loginemail)); // also crap


Bear in mind that string 'should' be a Unicode string, as should checkUserStatusEmailEdit.Text (checkUserStatusEmailEdit is a TEdit, just an input field). This is what I get:

Log output:
[2019-03-19 11:45:23] DEBUG [EMPServerTest] Unicode String checkUserStatusEmailEdit.Text: dbid:AAD4_bi4MP1qQDCvIciNh3x-9sa3bciOtRA
[2019-03-19 11:45:26] DEBUG [EMPServerTest] pWideChar Loginemail: 扤摩䄺䑁弴楢䴴ㅐ共䍄䥶楣桎砳㤭慳戳楣瑏䅒栀䔮偍敓畣敲
[2019-03-19 11:45:27] DEBUG [EMPServerTest] Unicode String Loginemail: 扤摩䄺䑁弴楢䴴ㅐ共䍄䥶楣桎砳㤭慳戳楣瑏䅒栀䔮偍敓畣敲

BUT if I first copy checkUserStatusEmailEdit.Text to a string variable (LE) and cast that instead, everything is fine:

Code: Pascal
var
  loginemail: pWideChar;
  LE: string;
...
...
WriteLog('DEBUG', 'Unicode String checkUserStatusEmailEdit.Text: ' + checkUserStatusEmailEdit.Text); // logs fine
LE := checkUserStatusEmailEdit.Text;                      // copy into a string (UnicodeString in this unit) first
loginemail := pWideChar(LE);
WriteLog('DEBUG', 'pWideChar Loginemail: ' + loginemail); // has been cast to pWideChar and is now fine
WriteLog('DEBUG', 'Unicode String Loginemail: ' + string(loginemail));


I get this (perfectly):

Log output:
[2019-03-19 11:43:09] DEBUG [EMPServerTest] Unicode String checkUserStatusEmailEdit.Text: dbid:AAD4_bi4MP1qQDCvIciNh3x-9sa3bciOtRA
[2019-03-19 11:43:11] DEBUG [EMPServerTest] pWideChar Loginemail: dbid:AAD4_bi4MP1qQDCvIciNh3x-9sa3bciOtRA
[2019-03-19 11:43:12] DEBUG [EMPServerTest] Unicode String Loginemail: dbid:AAD4_bi4MP1qQDCvIciNh3x-9sa3bciOtRA

So my question (in several parts):

If I have {$MODE DelphiUnicode}, does that apply ONLY to the type 'string' and NOT to the result of, say, a component property? I.e. is a TEdit's text, like Edit.Text, AN ANSISTRING OR A WIDESTRING, but not a Unicode string (I understand WideString and UnicodeString are not the same)?

And is that because TEdit has not been built with {$MODE DelphiUnicode} (clutching at straws here)?

And could I build the IDE etc. with {$MODE DelphiUnicode}?

Thanks

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Problem Casting String/pWideChar with {$MODE DelphiUnicode}
« Reply #1 on: March 20, 2019, 08:11:22 am »
1- {$mode DelphiUnicode} affects only the unit it is used in.

2- TEdit.Text is of type TCaption. TCaption is AnsiString because the unit that holds its declaration uses {$mode objfpc}{$H+}.

3- The LCL (and its TEdit), as it is today, expects String to be UTF-8. For instance, GetSelText:
Code: Pascal
function TCustomEdit.GetSelText : string;
begin
  Result := UTF8Copy(Text, SelStart + 1, SelLength)
end;

4- The RTL is planned to support UTF-16. AFAIK it is not there yet, not 100%.
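
The gobbledygook in the log is consistent with points 2 and 3: the UTF-8 bytes of the AnsiString appear to be read two at a time as UTF-16 code units ('d' and 'b' become 扤, 'i' and 'd' become 摩). Below is a minimal sketch of that effect, assuming a little-endian machine. Utf8Text is a hypothetical stand-in for TEdit.Text so that the example runs without the LCL, and the explicit Pointer cast is there only to force the reinterpretation:

```pascal
program ReinterpretDemo;
{$mode delphiunicode}

var
  Utf8Text: AnsiString;  // hypothetical stand-in for TEdit.Text (TCaption)
  P: PWideChar;
begin
  Utf8Text := 'dbid';
  // Reinterpret the single-byte buffer as 16-bit characters.
  // No conversion takes place; only the pointer type changes.
  P := PWideChar(Pointer(Utf8Text));
  // 'd' = $64 and 'b' = $62 are read together as one code unit.
  WriteLn(HexStr(Ord(P[0]), 4));  // 6264 on a little-endian machine
  // 'i' = $69 and 'd' = $64 form the second code unit.
  WriteLn(HexStr(Ord(P[1]), 4));  // 6469 on a little-endian machine
end.
```

U+6264 and U+6469 are the first two characters of the garbled log line, which is what you would expect if the cast changes the pointer type without converting the data.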

PascalDragon

  • Hero Member
  • *****
  • Posts: 5446
  • Compiler Developer
Re: Problem Casting String/pWideChar with {$MODE DelphiUnicode}
« Reply #2 on: March 20, 2019, 09:46:35 am »
Quote from: kevin.black
"It's Delphi to FPC and now Strings are Unicode strings and pChars are actually pWideChars
It's in a DYLIB that is used by C/C++ so the only parameters to be safe, are integer, boolean and pWideChar
The C++ calling routine uses Char IIRC which maps to pWideChar"
You should really check the declaration in the C++ code. The base types are char for single-byte characters and wchar_t for wide characters, though in the C++ world wchar_t might hold UTF-32 characters, not UTF-16 ones like FPC's/Delphi's WideChar (e.g. on every non-Windows system). If the C++ code uses some other type, you should check which base type it maps to.

Quote from: kevin.black
"If I have {$MODE DelphiUnicode} does that ONLY apply to type 'string' and NOT to the result of say any component, ie a TEdit text like Edit.text IS AN ANSISTRING OR A WIDESTRING, but not a Unicode string (I understand Widestring and Unicodestring are not the same)?"
Correct. Compiled code is not influenced by modeswitches in another unit.

Quote from: kevin.black
"And is that because the TEDIT has not been built with {$MODE DelphiUnicode} (clutching at straws here)?"
Correct.

Quote from: kevin.black
"And could I build the IDE with etc with {$MODE DelphiUnicode}?"
I doubt it. The LCL is currently geared towards UTF-8; checking for UTF-16 will probably only be done once FPC also begins to switch targets to UTF-16.

Long story short: assign the Text property of the edit to a UnicodeString variable (or use a typecast) and then cast that to PWideChar (just as you did in your example).
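
A minimal sketch of that advice, assuming the edit's text is UTF-8 as the LCL expects. Utf8Text is a hypothetical stand-in for the edit's Text property so that the example runs without the LCL; UTF8Decode from the System unit is one way to make the UTF-8 to UTF-16 conversion explicit:

```pascal
program CastDemo;
{$mode delphiunicode}

var
  Utf8Text: AnsiString;   // hypothetical stand-in for Edit.Text
  Wide: UnicodeString;    // owns the UTF-16 buffer
  P: PWideChar;
begin
  Utf8Text := 'dbid:AAD4';
  Wide := UTF8Decode(Utf8Text);   // real conversion: UTF-8 bytes -> UTF-16
  P := PWideChar(Wide);           // the cast now points at genuine UTF-16 data
  WriteLn(Length(Wide));          // 9 UTF-16 code units
  WriteLn(HexStr(Ord(P[0]), 4));  // 0064, i.e. 'd' as a single code unit
end.
```

Note that P is only valid while Wide is alive and unmodified, so the UnicodeString variable should outlive the call that receives the pointer.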

kevin.black

  • Full Member
  • ***
  • Posts: 121
Re: Problem Casting String/pWideChar with {$MODE DelphiUnicode}
« Reply #3 on: March 20, 2019, 11:21:39 pm »
Quote from: engkin
"1- {$mode DelphiUnicode} affects only the unit it is used in."
Of course it does (BFO - Blinding Flash of the Obvious).

Quote from: engkin
"2- TEdit.Text is of type TCaption. TCaption is AnsiString because the unit that holds its declaration uses {$mode objfpc}{$H+}."
As above, and a BFO....

Quote from: engkin
"3- The LCL (and its TEdit), as it is today, expects String to be UTF-8."
OK, I get that now.

Quote from: engkin
"4- The RTL is planned to support UTF-16. AFAIK it is not there yet, not 100%."
I think that would be good. And had I not been in the Delphi world converting code to FPC/Lazarus, it probably would not be an issue.

Quote from: PascalDragon
"You should really check the declaration of the C++ code. The base types are char for single byte characters and wchar_t for multibyte characters, though in the C++ world wchar_t might be for UTF-32 characters, not UTF-16 ones as FPC's/Delphi's WideChar (e.g. on every non-Windows system). If the C++ code uses some other type you should check what base types it matches to."
I have. The C++ developer and I did a dance a number of years ago when he was C++ and I was Delphi. Delphi is Unicode/pChar (pWideChar), so the C++ parameters he expects are pWideChar (TChar in his speak), integers and booleans. If a pChar is necessary, as in pAnsiChar, it is explicitly called out.

Quote from: PascalDragon
"I doubt it. The LCL is currently geared towards UTF-8, checking for UTF-16 will probably only be done once FPC also begins to switch targets to UTF-16."
Yes, I suspect that's pretty solid advice.

Quote from: PascalDragon
"Long story short: assign the Text property of the edit to a UnicodeString variable (or use a typecast) and then cast that to PWideChar (just as you did in your example)."
Yes, absolutely agree. I think (sic) I was overthinking the whole exercise. Your and engkin's advice now makes it quite clear. At least my strings will be Unicode in the unit where I use {$MODE DELPHIUNICODE}, which reduces the pain significantly.

Thanks guys, clear in my head now.
« Last Edit: March 20, 2019, 11:31:47 pm by kevin.black »

 
