Author Topic: Special characters directly from MemoryStream  (Read 3470 times)

ezlage

  • Guest
Special characters directly from MemoryStream
« on: May 04, 2016, 02:13:56 pm »
I have a MemoryStream that holds an HTML file.
I can convert it to a SQLite table, but characters like Ç, À, É, Õ and Ô are lost.
If I insert text containing these characters from a manually filled TEdit, it works perfectly, but when the text comes directly from a MemoryStream the characters are lost.

How can I prevent this?

Check the attachment.
« Last Edit: May 04, 2016, 02:23:50 pm by ezlage »

Michl

  • Full Member
  • ***
  • Posts: 226
Re: Special characters directly from MemoryStream
« Reply #1 on: May 05, 2016, 12:37:55 am »
Are you talking about HTML entities? If so, maybe this piece of code will help you:
Code: Pascal
uses ..., HTMLDefs;
...
// Replaces HTML entities such as &Ccedil; in s with their UTF-8 characters.
function EntityToStr(const s: String): String;
const
  EntityStartByte = 38; // '&'
  EntityEndByte = 59;   // ';'
var
  StreamIn: TStringStream;
  StreamOut: TStringStream;
  Entity: String;
  wc: WideChar;
  b: Byte;
  aPos: Int64;            // current read position in StreamIn
  StartCopy: Int64;       // start of the text not yet copied to StreamOut
  DummyOffset: Int64;     // position of the first byte after '&'
  EntityEndSignal: Integer; // bytes read after '&', including ';'

  procedure SetStreamInPosition(Value: Int64);
  begin
    aPos := Value;
    StreamIn.Position := Value;
  end;

begin
  Result := '';
  StreamIn := TStringStream.Create(s);
  StreamOut := TStringStream.Create('');
  try
    if StreamIn.Size = 0 then Exit;

    SetStreamInPosition(0);
    StartCopy := 0;

    while aPos < StreamIn.Size do
    begin
      b := StreamIn.ReadByte;
      Inc(aPos);

      if (b = EntityStartByte) then begin
        DummyOffset := aPos;
        EntityEndSignal := 0;
        // Look for the closing ';' within the next 100 bytes.
        while (EntityEndSignal < 100) and (b <> EntityEndByte)
        and (aPos < StreamIn.Size) do
        begin
          b := StreamIn.ReadByte;
          Inc(aPos);
          Inc(EntityEndSignal);
        end;

        // No ';' found: treat '&' as plain text and keep scanning.
        if b <> EntityEndByte then
        begin
          SetStreamInPosition(DummyOffset);
          Continue;
        end;

        // Read the entity name (without '&' and ';').
        StreamIn.Position := DummyOffset;
        Entity := StreamIn.ReadString(EntityEndSignal - 1);

        // Unknown entity: leave it untouched and keep scanning.
        if not ResolveHTMLEntityReference(UnicodeString(Entity), wc{%H-}) then
        begin
          SetStreamInPosition(DummyOffset);
          Continue;
        end;

        // Copy the pending text in front of '&' (CopyFrom with Count = 0
        // would copy the whole stream, hence the guard).
        StreamIn.Position := StartCopy;
        aPos := DummyOffset;
        if (aPos - StartCopy - 1 > 0) then
          StreamOut.CopyFrom(StreamIn, aPos - StartCopy - 1);

        // Write the resolved character as UTF-8.
        Entity := UTF8Encode(WideCharLenToString(@wc, 1));
        StreamOut.WriteString(Entity);

        // Continue right after the ';'.
        StartCopy := DummyOffset + EntityEndSignal;
        SetStreamInPosition(StartCopy);
      end;
    end;

    // Copy whatever follows the last entity, including the final byte.
    if aPos > StartCopy then
    begin
      StreamIn.Position := StartCopy;
      StreamOut.CopyFrom(StreamIn, aPos - StartCopy);
    end;

    StreamOut.Position := 0;
    Result := StreamOut.DataString;
  finally
    StreamOut.Free;
    StreamIn.Free;
  end;
end;
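To see what the entity lookup step does on its own, here is a minimal sketch that resolves a single entity name with ResolveHTMLEntityReference from the HTMLDefs unit. The program name EntityDemo and the sample entity name are only illustrations, and the sketch assumes the name is passed without the leading '&' and the trailing ';', as in the function above.

```pascal
program EntityDemo;
{$mode objfpc}{$H+}

uses
  HTMLDefs;

var
  wc: WideChar;
begin
  wc := #0;
  // Entity name without '&' and ';' (sample value chosen for illustration).
  if ResolveHTMLEntityReference('Ccedil', wc) then
    WriteLn('Resolved to code point U+', HexStr(LongInt(Ord(wc)), 4))
  else
    WriteLn('Unknown entity');
end.
```

If the lookup fails, the function returns False and the calling code can leave the original text untouched, which is what EntityToStr does.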
Code:
type
  TLiveSelection = (lsMoney, lsChilds, lsTime);
  TLive = Array[0..1] of TLiveSelection;

ezlage

  • Guest
Re: Special characters directly from MemoryStream
« Reply #2 on: May 05, 2016, 05:05:36 am »
Almost, but thank you for trying to help me!

In my case, this piece, for example:
Code: Text
<tr><td>NÃO FAÇO MAL A NINGUÉM</td><td>...</td></tr>

Is extracted from the MemoryStream this way:
Code: Text
<tr><td>N?O FA?O MAL A NINGU?M</td><td>...</td></tr>

I need to read the data without losing the Brazilian Portuguese special characters.

Can anyone else help me?

Sorry for my poor English. Thank you!
« Last Edit: May 05, 2016, 08:59:25 pm by ezlage »

Michl

  • Full Member
  • ***
  • Posts: 226
Re: Special characters directly from MemoryStream
« Reply #3 on: May 05, 2016, 07:50:41 am »
Oh, then it seems to be a codepage issue. Changing the codepage of your strings should do the job. Something like:
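Code: Pascal
s := SomeStringStream.DataString;
// Label the bytes with the code page they actually use (no conversion yet).
SetCodePage(RawByteString(s), 1252, False); // the codepage you get
// Now convert the string from that code page to UTF-8.
SetCodePage(RawByteString(s), CP_UTF8, True);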
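Applied to a TMemoryStream, the same two steps could look roughly like the sketch below. It is only an illustration under a few assumptions: the stream content is Windows-1252 (the sample bytes spell 'NÃO'), the names CodePageDemo and SampleBytes are made up for this example, and on Unix-like systems a console program needs a widestring manager such as the cwstring unit for the conversion to work.

```pascal
program CodePageDemo;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cwstring,{$ENDIF} // assumption: needed for conversions on Unix
  Classes;

const
  // 'NÃO' as Windows-1252 bytes: N, Ã ($C3), O (hypothetical sample data)
  SampleBytes: array[0..2] of Byte = ($4E, $C3, $4F);

var
  ms: TMemoryStream;
  s: RawByteString;
begin
  ms := TMemoryStream.Create;
  try
    ms.WriteBuffer(SampleBytes, SizeOf(SampleBytes));

    // Copy the raw bytes out of the stream; nothing is converted here.
    SetLength(s, ms.Size);
    ms.Position := 0;
    if ms.Size > 0 then
      ms.ReadBuffer(s[1], ms.Size);

    // Step 1: declare that the bytes are CP1252, without changing them.
    SetCodePage(s, 1252, False);
    // Step 2: convert the string from CP1252 to UTF-8.
    SetCodePage(s, CP_UTF8, True);

    WriteLn('Length in bytes after conversion: ', Length(s));
  finally
    ms.Free;
  end;
end.
```

Ã takes one byte in Windows-1252 and two bytes in UTF-8, so the string grows by one byte when the conversion succeeds. If the source file uses a different code page, replace 1252 with that one.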

ezlage

  • Guest
Re: Special characters directly from MemoryStream
« Reply #4 on: May 05, 2016, 08:52:24 pm »
Michl, your suggestion solved my problem!

Thank you very much!

 
