
Topic: thtmldocument (DOM_HTML), get dom string ?

BubikolRamios

thtmldocument (DOM_HTML), get dom string ?
« on: February 19, 2018, 09:10:04 am »
What goes wrong here?
Code: Pascal
var
  doc: thtmldocument;
  domNodeList: tdomnodelist;
  domNode: TDOMNode;
begin
  // This works (tested): gets doc from some HTML page.
  doc := GetDoc(FileUtil.CreateAbsolutePath(SynEdit1.Lines[i], siteRoorUrl));

  // Project xxx raised exception class 'External: SIGFPE'.
  // At address 1002569A3
  domNode := TDOMNode(doc.GetElementsByTagName('body')[0]);
« Last Edit: February 19, 2018, 09:13:28 am by BubikolRamios »
lazarus 3.2-fpc-3.2.2-win32/win64

molly

Re: thtmldocument (DOM_HTML), get dom string ?
« Reply #1 on: February 19, 2018, 09:14:11 am »
What goes wrong here?
Code:
GetDoc(FileUtil.CreateAbsolutePath(SynEdit1.Lines[i],siteRoorUrl));
What does GetDoc actually do? Download an HTML file from the interwebs for you? I don't think so ;D


BubikolRamios

Re: thtmldocument (DOM_HTML), get dom string ?
« Reply #2 on: February 19, 2018, 09:16:27 am »
Not cleaned up:

Code: Pascal
function TForm1.GetDoc (url: String): thtmldocument;
var
  HTTPGetResult: Boolean;
  HTTPSender: THTTPSend;
  //outString: string;

  doc: thtmldocument;
  //els: tdomnodelist;

  //i: integer = 0;

begin
  //downloading html
  //http://wiki.lazarus.freepascal.org/Synapse

  //URL := Edit1.Text;
  //Result := False;
  HTTPSender := THTTPSend.Create;
  try
    HTTPGetResult := HTTPSender.HTTPMethod('GET', url);
    siteRoorUrl := HTTPSender.TargetHost;

    if (HTTPSender.ResultCode >= 100) and (HTTPSender.ResultCode <= 299) then begin
      //SetString(outString, HTTPSender.Document.Memory, HTTPSender.Document.Size);
      //showmessage(outString);
      //Result := True;

      //tstringstream to doc
      readhtmlfile(doc, HTTPSender.Document);
    end;
  finally
    HTTPSender.Free;
  end;

  result := doc;
end;

molly

Re: thtmldocument (DOM_HTML), get dom string ?
« Reply #3 on: February 19, 2018, 09:31:20 am »
Cool, guessing games.  :)

Next on the list: is your doc variable initialized somewhere?

Better to check that before parsing.

Code:
doc := GetDoc(FileUtil.CreateAbsolutePath(SynEdit1.Lines[i],siteRoorUrl));
if assigned(doc) then
begin
  // parse the document
end
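To illustrate the initialization point: in GetDoc the local doc is only written when the status check passes, and class-typed locals are not initialized in Free Pascal, so on a failed request the function returns whatever the local happened to hold, and an Assigned(doc) check cannot be trusted. A minimal sketch of the usual fix, assuming returning nil on failure is acceptable: GetDocSketch and Report are hypothetical names, and the Boolean parameter merely stands in for the HTTP result check (no Synapse or parsing involved).

```pascal
program NilResultSketch;

{$mode objfpc}{$H+}

uses
  DOM_HTML;

{ Hypothetical stand-in for GetDoc. RequestOk plays the role of the
  HTTP status check; nothing is downloaded or parsed here. }
function GetDocSketch(RequestOk: Boolean): THTMLDocument;
begin
  Result := nil;  // defined value for the failure path
  if RequestOk then
    Result := THTMLDocument.Create;  // real code would parse the response into it
end;

procedure Report(Doc: THTMLDocument);
begin
  if Assigned(Doc) then
    WriteLn('document assigned')
  else
    WriteLn('document is nil');
end;

var
  Doc: THTMLDocument;
begin
  Doc := GetDocSketch(False);
  Report(Doc);  // failed "request": nil, so Assigned() is meaningful

  Doc := GetDocSketch(True);
  try
    Report(Doc);
  finally
    Doc.Free;
  end;
end.
```

With Result set to nil up front, the caller's Assigned check actually distinguishes a failed download from a parsed document.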

BubikolRamios

Re: thtmldocument (DOM_HTML), get dom string ?
« Reply #4 on: February 19, 2018, 09:47:45 am »
As I said before, it is assigned. Nevertheless, I put your check in.

Quote
//this works, tested, gets doc from some HTML page

The problem is in the last line of the code in my first post. I checked the page's HTML; the 'body' element is there.
« Last Edit: February 19, 2018, 09:51:20 am by BubikolRamios »

molly

Re: thtmldocument (DOM_HTML), get dom string ?
« Reply #5 on: February 19, 2018, 09:53:59 am »
Before getting too frustrated, make sure you do some defensive programming when parsing HTML files; otherwise you'll give up in the end.
Code:
TDOMNode(doc.GetElementsByTagName('body')[0]);
is not defensive: it assumes that GetElementsByTagName will always return a valid node list and that the list contains at least one entry.

Mind you, I cannot see or reproduce your retrieved HTML contents. Hence the guessing :)
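A defensive version of that lookup might look like the sketch below. FirstElementByTag is a hypothetical helper, not part of the DOM unit: it returns nil when the document is missing or has no matching element, instead of indexing [0] blindly. It takes a TDOMDocument, so a THTMLDocument can be passed as well; the demo uses a small hand-built TXMLDocument so no download or HTML parser is needed.

```pascal
program FirstElementSketch;

{$mode objfpc}{$H+}

uses
  DOM;

{ Hypothetical helper: first element named TagName, or nil when the
  document is missing or contains no such element. }
function FirstElementByTag(Doc: TDOMDocument; const TagName: DOMString): TDOMNode;
var
  List: TDOMNodeList;
begin
  Result := nil;
  if not Assigned(Doc) then
    Exit;
  List := Doc.GetElementsByTagName(TagName);
  // This sketch does not free List; check the DOM unit documentation
  // of your FPC version for who owns the returned list.
  if Assigned(List) and (List.Count > 0) then
    Result := List[0];
end;

procedure Show(Node: TDOMNode);
begin
  if Assigned(Node) then
    WriteLn('found ', Node.NodeName)
  else
    WriteLn('no body element found');
end;

var
  Doc: TXMLDocument;
begin
  Doc := TXMLDocument.Create;  // empty, like a page that parsed to nothing
  try
    Show(FirstElementByTag(Doc, 'body'));

    Doc.AppendChild(Doc.CreateElement('body'));
    Show(FirstElementByTag(Doc, 'body'));
  finally
    Doc.Free;
  end;

  if FirstElementByTag(nil, 'body') = nil then
    WriteLn('nil document handled');
end.
```

The caller then tests the result with Assigned before using it, which turns an empty parse into a readable message rather than a crash further down.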

BubikolRamios

Re: thtmldocument (DOM_HTML), get dom string ?
« Reply #6 on: February 19, 2018, 10:10:45 am »
I get frustrated every time I do something from scratch with Lazarus. Debugging is terrible. I remember Delphi 3.0 or something like that; it was much better.
I mostly work with Java, where the debugger shows things as it should.

If you feel like it, you can test the actual page:
Quote
http://www.pomurske-lekarne.si/javna-narocila/


Defensive programming always comes last (-: I want some results first (-:

molly

Re: thtmldocument (DOM_HTML), get dom string ?
« Reply #7 on: February 19, 2018, 11:30:46 am »
Debugging structures can sometimes be a bit problematic with Lazarus, so I'd rather skip it. I do most of my work with FPC.

It can't hurt to at least write out some debug lines sometimes :)

I think your problem might be that, once the stream has been filled with the HTML data, its position is left at the end of the stream. Therefore parsing it into a THTMLDocument delivers nothing at all.

I do not have experience with Synapse though, so I am assuming that HTTPSender.Document is a stream and faces the same issue?
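The suspected fix can be tried without the network: fill a TMemoryStream (which leaves Position at the end), rewind it, then hand it to ReadHTMLFile. This sketch assumes, as suggested above, that the parser reads from the stream's current position; whether Synapse's THTTPSend already rewinds Document is not settled here, but setting Position := 0 before readhtmlfile in GetDoc is harmless either way. The HTML string is made up for the demo.

```pascal
program RewindSketch;

{$mode objfpc}{$H+}

uses
  Classes, DOM, DOM_HTML, SAX_HTML;

var
  Html: AnsiString;
  Stream: TMemoryStream;
  Doc: THTMLDocument;
begin
  // Made-up page content standing in for the downloaded HTML.
  Html := '<html><head><title>test</title></head><body><p>hello</p></body></html>';
  Stream := TMemoryStream.Create;
  try
    // Writing leaves Position at the end, as after filling a download buffer.
    Stream.WriteBuffer(Html[1], Length(Html));
    WriteLn('position after write: ', Stream.Position, ' of ', Stream.Size);

    // Rewind so the parser starts reading at the first byte.
    Stream.Position := 0;
    ReadHTMLFile(Doc, Stream);
    try
      WriteLn('body elements found: ', Doc.GetElementsByTagName('body').Count);
    finally
      Doc.Free;
    end;
  finally
    Stream.Free;
  end;
end.
```

In GetDoc the equivalent would be a `HTTPSender.Document.Position := 0;` line just before the readhtmlfile call.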

 
