
Author Topic: UTF8 question  (Read 2030 times)

k1attila1

  • Full Member
  • ***
  • Posts: 105
UTF8 question
« on: September 19, 2018, 07:39:51 am »
Hi

   s:='AŐABCDEF';

   ShowMessage('x'+UTF8Trim(s)+'x');            //trim ok
   ShowMessage(UTF8LeftStr(s,3));               //leftstr  ok
   ShowMessage(UTF8Copy(s,1,3));                //copy ok
   ShowMessage(IntToStr(UTF8Length(s)));        //  ok =8
   ShowMessage(s[1]);     // 'A' - ok
   ShowMessage(s[2]);     // shows nothing, why?
   ShowMessage(s[3]);     // shows nothing, why?
   ShowMessage(s[4]);     // 'A' - but it is the 3rd character!
   ShowMessage(s[5]);
   ShowMessage(s[6]);
   ShowMessage(s[7]);
   ShowMessage(s[8]);
   ShowMessage(s[9]);    // why can I address the 9th character when the length is 8?


I understand the UTF-8 system and I know that 'Ő' is 2 bytes.
But how can I get it to display correctly?

Thank you



JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4459
  • I like bugs.
Re: UTF8 question
« Reply #1 on: September 19, 2018, 08:56:28 am »
Quote
   ShowMessage(s[9]);    // why can I address the 9th character when the length is 8?
No, UTF8Length is 8; Length is larger. Length counts bytes (code units), and 'Ő' takes 2 of them, so Length(s) is 9.
You actually answered your own question later with "I know that 'Ő' is 2 bytes".
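
A quick way to see the difference between the two lengths is to print both for the same string. The sketch below is a small console program, not something from the original posts. It assumes package LazUtils is among the project's dependencies, and it spells 'Ő' as its two UTF-8 bytes ($C5 $90) so the result does not depend on how the source file is saved.

```pascal
program LengthDemo;
{$mode objfpc}{$H+}
uses
  LazUTF8; // from package LazUtils

var
  s: String;
begin
  // 'Ő' (U+0150) written as its two UTF-8 bytes, $C5 $90
  s := 'A'#$C5#$90'ABCDEF';
  WriteLn('Length     = ', Length(s));      // bytes (code units): 9
  WriteLn('UTF8Length = ', UTF8Length(s));  // codepoints: 8
end.
```

So s[1] to s[9] are all valid byte indexes. s[2] and s[3] are the two halves of 'Ő', which is why neither of them displays anything on its own.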

Quote
I understand the UTF-8 system and I know that 'Ő' is 2 bytes.
But how can I get it to display correctly?
Try this:
Code: Pascal
uses LazUnicode;
...
var s, ch: String;
...
s:='AŐABCDEF';
// ch is a String, so it can hold a whole multi-byte character, not just one byte
for ch in s do
  ShowMessage(ch);
Unit LazUnicode is in package LazUtils.
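
If you need the codepoint at one given position rather than a loop over all of them, UTF8Copy(s, i, 1) from unit LazUTF8 returns it as a string. A minimal console sketch, assuming LazUtils is available; the byte escapes stand in for 'Ő', and the program prints how many bytes each codepoint occupies:

```pascal
program CodepointDemo;
{$mode objfpc}{$H+}
uses
  LazUTF8; // from package LazUtils

var
  s, cp: String;
  i: Integer;
begin
  // 'Ő' (U+0150) written as its two UTF-8 bytes, $C5 $90
  s := 'A'#$C5#$90'ABCDEF';
  for i := 1 to UTF8Length(s) do
  begin
    cp := UTF8Copy(s, i, 1);   // the i-th codepoint, returned as a string
    WriteLn(i, ': ', Length(cp), ' byte(s)');
  end;
end.
```

Each UTF8Copy call has to walk the string from the start to find position i, so for long strings the for-in loop from LazUnicode is the cheaper choice.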
« Last Edit: September 19, 2018, 09:34:23 am by JuhaManninen »
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

k1attila1

  • Full Member
  • ***
  • Posts: 105
Re: UTF8 question
« Reply #2 on: September 19, 2018, 10:04:48 am »
Thank you

But how can I change one character in a UTF-8 string?
Previously I would write: s[2]:='A';


JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4459
  • I like bugs.
Re: UTF8 question
« Reply #3 on: September 19, 2018, 10:50:19 am »
Quote
But how can I change one character in a UTF-8 string?
Previously I would write: s[2]:='A';
First, be careful with the term "character". It can mean many things in Unicode. One meaning is the "code unit", which here is the Pascal "Char" type (one byte of the UTF-8 string).
Another meaning is the "user-perceived character", which is different from a "codepoint".

In any case, "characters" or codepoints must be treated as strings when replacing them, because their byte lengths may differ.
StringReplace works because it accepts search and replacement strings of different lengths.
Unit LazUTF8 also has UTF8StringReplace, which takes care of the upper-/lowercase rules of Unicode.
Otherwise you must use a few Copy() calls to construct a new string from the start, middle and end parts.
These examples may give you ideas:
 http://wiki.freepascal.org/UTF8_strings_and_characters
Remember, you can also write encoding-agnostic code with unit LazUnicode.
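
The "start, middle and end parts" approach can be sketched with UTF8Copy, which takes codepoint positions instead of byte positions. This is only an illustration under stated assumptions: ReplaceCodepoint is a hypothetical helper name, not a LazUTF8 function, LazUtils must be among the project's dependencies, and combining sequences (several codepoints forming one user-perceived character) are not handled.

```pascal
program ReplaceDemo;
{$mode objfpc}{$H+}
uses
  LazUTF8; // from package LazUtils

// Hypothetical helper: returns s with the codepoint at position Index
// (1-based, counted in codepoints) replaced by NewText.
function ReplaceCodepoint(const s: String; Index: Integer;
  const NewText: String): String;
begin
  Result := UTF8Copy(s, 1, Index - 1)               // start part
          + NewText                                 // new middle part
          + UTF8Copy(s, Index + 1, UTF8Length(s));  // end part
end;

var
  s: String;
begin
  // 'Ő' (U+0150) written as its two UTF-8 bytes, $C5 $90
  s := 'A'#$C5#$90'ABCDEF';
  s := ReplaceCodepoint(s, 2, 'A');
  WriteLn(s);          // AAABCDEF
  WriteLn(Length(s));  // 8: the two-byte codepoint was replaced by one byte
end.
```

Because the result is built from three strings, it does not matter that the old and the new codepoint have different byte lengths, which is exactly what a plain s[2]:='A' cannot cope with.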
« Last Edit: September 19, 2018, 10:54:15 am by JuhaManninen »

k1attila1

  • Full Member
  • ***
  • Posts: 105
Re: UTF8 question
« Reply #4 on: September 19, 2018, 10:53:16 am »
Thank you again

 
