
Topic: Malformed XML parser (Read 3395 times)

gicla

  • New Member
  • Posts: 10
Malformed XML parser
« on: February 06, 2018, 11:57:50 pm »
Help me please! I have searched the web for days but could not find any help or code examples on how to parse a so-called "malformed" XML file.
I have to read nodes from a log file that contains XML-style messages but has no root node.
This is an example:
Code: XML
<Request><Seq>000241</Seq><Type>23</Type><Value>0</Value></Request>
<Request><Seq>000241</Seq><Type>23</Type><Resp>00</Resp><Text>Approved</Text></Request>
Currently I'm using:
Code: Pascal
ReadXMLFile(Doc, fs);
ParsingNodes := Doc.GetElementsByTagName('Request');
But it only works if I add a root element to the log file before opening it.

Any help will be appreciated.
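A minimal sketch of the "add a root element" idea done in memory instead of editing the log file: the text is wrapped in a synthetic <root> element and parsed from a TStringStream with ReadXMLFile. It assumes the rootless sample from the question (inlined as a constant rather than read from disk); ParseWithSyntheticRoot is a hypothetical helper name.

```pascal
program WrapRoot;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DOM, XMLRead;

// Hypothetical helper: wraps rootless XML-style content in a synthetic
// <root> element so it becomes a well-formed document, then parses it.
function ParseWithSyntheticRoot(const Fragment: string): TXMLDocument;
var
  Stream: TStringStream;
begin
  Stream := TStringStream.Create('<root>' + Fragment + '</root>');
  try
    ReadXMLFile(Result, Stream);
  finally
    Stream.Free;
  end;
end;

const
  // The two sample messages from the question, with no root element.
  SampleLog =
    '<Request><Seq>000241</Seq><Type>23</Type><Value>0</Value></Request>' + LineEnding +
    '<Request><Seq>000241</Seq><Type>23</Type><Resp>00</Resp><Text>Approved</Text></Request>';

var
  Doc: TXMLDocument;
  Node: TDOMNode;
  Count: Integer;
begin
  Doc := ParseWithSyntheticRoot(SampleLog);
  try
    Count := 0;
    // Walk the direct children of the synthetic root; checking the name
    // skips any whitespace text node between the two lines.
    Node := Doc.DocumentElement.FirstChild;
    while Node <> nil do
    begin
      if Node.NodeName = 'Request' then
        Inc(Count);
      Node := Node.NextSibling;
    end;
    WriteLn('Request elements found: ', Count);
  finally
    Doc.Free;
  end;
end.
```

Walking FirstChild/NextSibling avoids having to manage the list object returned by GetElementsByTagName; for a real file the constant would be replaced by the file contents (for example via TStringList.LoadFromFile and its Text property).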

molly

  • Hero Member
  • Posts: 2330
Re: Malformed XML parser
« Reply #1 on: February 07, 2018, 01:10:14 am »
Quote from gicla: "But it's working only if I add root element to the log file before to open it."

You would still have to add the root node by hand, because the XML still needs to be well-formed :)

Here is another way to accomplish that:
Code: Pascal
  RootNode := Doc.CreateElement('root');
  Doc.AppendChild(RootNode);
  ReadXMLFragment(RootNode, 'data.xml');

  ParsingNodes := Doc.GetElementsByTagName('Request');
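For reference, a self-contained sketch of the same ReadXMLFragment approach as a complete program. It uses the stream overload of ReadXMLFragment (the one gicla calls with a TFileStream later in the thread), with the sample fragment held in a TStringStream instead of data.xml; LoadFragment is a hypothetical helper name.

```pascal
program FragmentDemo;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DOM, XMLRead;

// Hypothetical helper: creates a document with a hand-made <root> element
// and lets ReadXMLFragment parse the rootless XML into it.
function LoadFragment(const Xml: string): TXMLDocument;
var
  RootNode: TDOMNode;
  Stream: TStringStream;
begin
  Result := TXMLDocument.Create;
  try
    Stream := TStringStream.Create(Xml);
    try
      RootNode := Result.CreateElement('root');
      Result.AppendChild(RootNode);
      ReadXMLFragment(RootNode, Stream);
    finally
      Stream.Free;
    end;
  except
    // Do not leak the document if the fragment cannot be parsed.
    Result.Free;
    raise;
  end;
end;

const
  Fragment =
    '<Request><Seq>000241</Seq><Type>23</Type><Value>0</Value></Request>' +
    '<Request><Seq>000241</Seq><Type>23</Type><Resp>00</Resp><Text>Approved</Text></Request>';

var
  Doc: TXMLDocument;
  Node, TextNode: TDOMNode;
begin
  Doc := LoadFragment(Fragment);
  try
    Node := Doc.DocumentElement.FirstChild;
    while Node <> nil do
    begin
      // FindNode returns nil when the child element is missing.
      TextNode := Node.FindNode('Text');
      if TextNode <> nil then
        WriteLn(Node.NodeName, ': ', TextNode.TextContent)
      else
        WriteLn(Node.NodeName, ': (no Text element)');
      Node := Node.NextSibling;
    end;
  finally
    Doc.Free;
  end;
end.
```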

gicla

  • New Member
  • Posts: 10
Re: Malformed XML parser
« Reply #2 on: February 07, 2018, 11:11:07 am »
Thanks molly,
your hint works perfectly with the XML in my previous example. The problem now is that the real XML I have to read is slightly different: each message has an XML declaration, and the file looks like this:
Code: XML
2018-01-30 10:30:43,862 INFO  [Transaction-1] [REQUEST  ] message:[<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Request><Seq>000241</Seq><Type>23</Type><Value>0</Value><Text>Not Approved</Text></Request>]
2018-01-30 10:30:43,862 INFO  [Transaction-3] [REQUEST  ] message:[<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Request><Seq>000241</Seq><Type>23</Type><Resp>00</Resp><Text>Approved</Text></Request>]

When I try to open it, ReadXMLFragment raises the following error: "XML declaration is not allowed here".
Is there any parameter I can set so that ReadXMLFragment ignores the XML declaration?
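Whether or not such a parameter exists, one workaround is to cut each log line down to its XML payload and drop the leading <?xml ...?> declaration before handing the result to ReadXMLFragment. A minimal sketch, assuming every line ends with message:[...] exactly as in the sample; ExtractPayload is a hypothetical helper, and the two sample lines are inlined (in a real program they would come from e.g. TStringList.LoadFromFile).

```pascal
program StripDeclaration;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DOM, XMLRead;

const
  Marker = 'message:[';
  Lines: array[0..1] of string = (
    '2018-01-30 10:30:43,862 INFO  [Transaction-1] [REQUEST  ] message:[<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Request><Seq>000241</Seq><Type>23</Type><Value>0</Value><Text>Not Approved</Text></Request>]',
    '2018-01-30 10:30:43,862 INFO  [Transaction-3] [REQUEST  ] message:[<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Request><Seq>000241</Seq><Type>23</Type><Resp>00</Resp><Text>Approved</Text></Request>]');

// Hypothetical helper: returns the XML between "message:[" and the final
// "]" of a log line, minus any leading <?xml ...?> declaration.
// Returns '' when the line does not have that shape.
function ExtractPayload(const Line: string): string;
var
  StartPos, DeclEnd: Integer;
begin
  Result := '';
  StartPos := Pos(Marker, Line);
  if (StartPos = 0) or (Line[Length(Line)] <> ']') then
    Exit;
  StartPos := StartPos + Length(Marker);
  Result := Copy(Line, StartPos, Length(Line) - StartPos);
  // A declaration is only legal at the very start of a document, so it
  // must go before the payloads are merged under one root.
  if Copy(Result, 1, 5) = '<?xml' then
  begin
    DeclEnd := Pos('?>', Result);
    if DeclEnd > 0 then
      Delete(Result, 1, DeclEnd + 1);
  end;
end;

var
  Doc: TXMLDocument;
  RootNode, Node: TDOMNode;
  Stream: TStringStream;
  Payloads, Line: string;
  Count: Integer;
begin
  Payloads := '';
  for Line in Lines do
    Payloads := Payloads + ExtractPayload(Line);

  Doc := TXMLDocument.Create;
  Stream := TStringStream.Create(Payloads);
  try
    RootNode := Doc.CreateElement('root');
    Doc.AppendChild(RootNode);
    ReadXMLFragment(RootNode, Stream);
    Count := 0;
    Node := RootNode.FirstChild;
    while Node <> nil do
    begin
      Inc(Count);
      Node := Node.NextSibling;
    end;
    WriteLn('Request elements parsed: ', Count);
  finally
    Stream.Free;
    Doc.Free;
  end;
end.
```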

This is the current piece of code:
Code: Pascal
var
  I: Integer;
  fs: TFileStream;
  PassNode: TDOMNode;
  Doc: TXMLDocument;
  ParsingNodes: TDOMNodeList;

begin
  if OpenDialog1.Execute then
    begin
      fs := TFileStream.Create(Utf8ToAnsi(OpenDialog1.FileName), fmOpenRead or fmShareDenyNone);
      Doc := TXMLDocument.Create;
      try
        PassNode := Doc.CreateElement('root');
        Doc.AppendChild(PassNode);
        ReadXMLFragment(PassNode, fs);  // <-- LINE GENERATING THE ERROR
        ParsingNodes := Doc.GetElementsByTagName('TransactionResponse'); // select only responses

        for I := 0 to Doc.DocumentElement.GetChildCount - 1 do
          begin
            // ... anything I want to do with the XML ...
          end;
      finally
        // Release the document and the file stream even if parsing fails.
        Doc.Free;
        fs.Free;
      end;
    end;
end;

Thanks in advance!

Cyrax

  • Hero Member
  • Posts: 836
Re: Malformed XML parser
« Reply #3 on: February 07, 2018, 08:08:38 pm »
You need to learn and use regular expressions to extract the XML portion from the data.

http://wiki.freepascal.org/Regexpr
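A small sketch of the regular-expression route, using the TRegExpr class from the RegExpr unit that the wiki page describes. The pattern assumes one <Request>...</Request> element per log line, as in the sample; ExtractRequest is a hypothetical helper name.

```pascal
program RegexExtract;

{$mode objfpc}{$H+}

uses
  SysUtils, RegExpr;

// Hypothetical helper: returns the text from the first <Request> to the
// last </Request> on a log line, or '' if the line contains none.
function ExtractRequest(const Line: string): string;
var
  Re: TRegExpr;
begin
  Result := '';
  Re := TRegExpr.Create;
  try
    Re.Expression := '<Request>.*</Request>';
    if Re.Exec(Line) then
      Result := Re.Match[0];
  finally
    Re.Free;
  end;
end;

const
  SampleLine = '2018-01-30 10:30:43,862 INFO  [Transaction-3] [REQUEST  ] message:[<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Request><Seq>000241</Seq><Type>23</Type><Resp>00</Resp><Text>Approved</Text></Request>]';

begin
  // Prints the Request element without the log prefix, the XML
  // declaration and the closing bracket.
  WriteLn(ExtractRequest(SampleLine));
end.
```

Because the match starts at <Request>, the declaration is left behind, so the extracted strings can be concatenated and fed to ReadXMLFragment under a hand-made root.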


molly

  • Hero Member
  • Posts: 2330
Re: Malformed XML parser
« Reply #4 on: February 07, 2018, 11:38:06 pm »
Quote from gicla: "Problem now is that the real xml I'm going to read is slightly different..."

Indeed, and alas, in that case ReadXMLFragment cannot help you either.

If I were you, I would take a moment to think about exactly what you wish to accomplish, as there are several possible approaches.

First, your file appears to be a log file: each entry sits on a single line, and part of that line contains XML.

What do you want to do with the data stored in the XML part (and, for that matter, with the rest of the information on each line)?

Is your goal to end up with a single XML document that lists every entry, so that you can query it for the data you need? Or do you only use XML parsing because the data happens to be stored as XML, and you do not care about the format as long as you can extract the parts you are interested in?

If the former, you could use a TStringList to read the file line by line, use ExtractWord (and related functions) to break each line apart, and store the pieces in an XML document. In that case the embedded XML tags would have to be removed, skipped, ignored, or replaced.
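A rough sketch of that first option. It assumes the log layout from the sample (date, time, level, [thread], ...), splits lines with ExtractWord from StrUtils using space and square brackets as delimiters, and keeps the message itself as plain character data, so its tags are escaped on output instead of parsed. The element and attribute names (log, entry, date, time, level, thread) are made up for illustration.

```pascal
program LogToXml;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, StrUtils, DOM, XMLWrite;

const
  Marker = 'message:[';
  // ExtractWord treats a run of delimiters as a single separator.
  Delims: TSysCharSet = [' ', '[', ']'];
  Lines: array[0..1] of string = (
    '2018-01-30 10:30:43,862 INFO  [Transaction-1] [REQUEST  ] message:[<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Request><Seq>000241</Seq><Type>23</Type><Value>0</Value><Text>Not Approved</Text></Request>]',
    '2018-01-30 10:30:43,862 INFO  [Transaction-3] [REQUEST  ] message:[<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Request><Seq>000241</Seq><Type>23</Type><Resp>00</Resp><Text>Approved</Text></Request>]');

var
  Doc: TXMLDocument;
  RootNode, Entry: TDOMElement;
  Output: TStringStream;
  Line: string;
  P: Integer;
begin
  Doc := TXMLDocument.Create;
  Output := TStringStream.Create('');
  try
    RootNode := Doc.CreateElement('log');
    Doc.AppendChild(RootNode);
    for Line in Lines do
    begin
      Entry := Doc.CreateElement('entry');
      // Words 1-4: date, time, level and thread name (brackets dropped).
      Entry.SetAttribute('date', ExtractWord(1, Line, Delims));
      Entry.SetAttribute('time', ExtractWord(2, Line, Delims));
      Entry.SetAttribute('level', ExtractWord(3, Line, Delims));
      Entry.SetAttribute('thread', ExtractWord(4, Line, Delims));
      // Store the raw message (without the final "]") as a text node;
      // its '<' characters are escaped when the document is written.
      P := Pos(Marker, Line);
      if P > 0 then
        Entry.AppendChild(Doc.CreateTextNode(
          Copy(Line, P + Length(Marker), Length(Line) - P - Length(Marker))));
      RootNode.AppendChild(Entry);
    end;
    WriteXMLFile(Doc, Output);
    WriteLn(Output.DataString);
  finally
    Output.Free;
    Doc.Free;
  end;
end.
```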

If the latter, regular expressions might be more helpful, as Cyrax suggested (you could use regex for the former as well, but nothing there that a simple search-and-replace or ExtractWord couldn't handle).

Second, the file contains non-compliant XML. You could try an event-driven, on-the-fly XML parser (better known as SAX), but keep in mind that in the end a SAX parser might not solve the reading/parsing for you either.

Another possible approach would be to use benibela's Internet Tools.

I'm too inexperienced with XML (and the approaches mentioned) to give you better advice at the moment. I was unable to find an easier solution, but perhaps other readers have an idea or two.

gicla

  • New Member
  • Posts: 10
Re: Malformed XML parser
« Reply #5 on: February 08, 2018, 12:50:34 am »
Thanks a lot, molly, for your very detailed comments. I appreciate it.
In the end, if no better solution comes from other readers, I guess I will read the log file, search for and extract the XML portions, and store them in an array.
Then I will validate each XML message one at a time, taking them from the array.
I'll update the post if successful.
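The plan above might look roughly like this. Because each message is parsed as a complete document of its own, the <?xml ...?> declaration is legal and does not need to be stripped; a malformed message raises an exception, which this sketch turns into a description. ExtractMessage and DescribeMessage are hypothetical helpers, the log lines are added by hand instead of loading the file, and only the Seq and Text fields are read.

```pascal
program ParseEachMessage;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DOM, XMLRead;

const
  Marker = 'message:[';

// Hypothetical helper: the text between "message:[" and the final "]".
function ExtractMessage(const Line: string): string;
var
  P: Integer;
begin
  Result := '';
  P := Pos(Marker, Line);
  if (P > 0) and (Line[Length(Line)] = ']') then
  begin
    P := P + Length(Marker);
    Result := Copy(Line, P, Length(Line) - P);
  end;
end;

// Hypothetical helper: parses one message as a standalone document and
// returns 'Seq: Text', or an error description if it is not well-formed.
function DescribeMessage(const Msg: string): string;
var
  Doc: TXMLDocument;
  Stream: TStringStream;
  SeqNode, TextNode: TDOMNode;
begin
  Stream := TStringStream.Create(Msg);
  try
    try
      ReadXMLFile(Doc, Stream);
      try
        Result := '';
        SeqNode := Doc.DocumentElement.FindNode('Seq');
        TextNode := Doc.DocumentElement.FindNode('Text');
        if SeqNode <> nil then
          Result := SeqNode.TextContent;
        if TextNode <> nil then
          Result := Result + ': ' + TextNode.TextContent;
      finally
        Doc.Free;
      end;
    except
      on E: Exception do
        Result := 'invalid message (' + E.Message + ')';
    end;
  finally
    Stream.Free;
  end;
end;

var
  Log: TStringList;
  Messages: array of string;
  Msg: string;
  I: Integer;
begin
  Log := TStringList.Create;
  try
    // In the real program: Log.LoadFromFile(FileName);
    Log.Add('2018-01-30 10:30:43,862 INFO  [Transaction-1] [REQUEST  ] message:[<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Request><Seq>000241</Seq><Type>23</Type><Value>0</Value><Text>Not Approved</Text></Request>]');
    Log.Add('2018-01-30 10:30:43,862 INFO  [Transaction-3] [REQUEST  ] message:[<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Request><Seq>000241</Seq><Type>23</Type><Resp>00</Resp><Text>Approved</Text></Request>]');
    // Step 1: collect the XML portion of every line in an array.
    for I := 0 to Log.Count - 1 do
    begin
      Msg := ExtractMessage(Log[I]);
      if Msg <> '' then
      begin
        SetLength(Messages, Length(Messages) + 1);
        Messages[High(Messages)] := Msg;
      end;
    end;
  finally
    Log.Free;
  end;
  // Step 2: parse and check each message on its own.
  for Msg in Messages do
    WriteLn(DescribeMessage(Msg));
end.
```

Parsing every message separately costs one small document per line, but it keeps a single broken message from stopping the whole file.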

 
