XML Module

From This Prolog Life
Revision as of 19:22, 26 March 2020

Terms and Conditions

This program is offered free of charge as unsupported source code. You may use it, copy it, distribute it, modify it or sell it without restriction.

I hope that it will be useful to you, but it is provided "as is" without any warranty express or implied, including but not limited to the warranty of non-infringement and the implied warranties of merchantability and fitness for a particular purpose.

Download

Current Version:
      XML Module 3.7 released 2014/07/09
      Windows application plxml 5.0 released 2020/03/27

Download the source code in tar.gz format (21KB)

Download the source code and Windows application as a ZIP file (436KB). The Windows application writes UTF-8 XML output, which supports non-ASCII Unicode characters in comments and CDATA sections.

Extracting the files

Unzip the files to create a folder structure as follows:

+---bin
|   |   plxml.exe               : Application
|   |   libpl.dll               : Quintus Prolog support DLL
|   |   libqp.dll               :    "       "      "     "
|   |   qpconsole.dll           :    "       "      "     "
|   |   qpeng.dll               :    "       "      "     "
|
+---source
    |   xml.pl                  : Quintus Prolog module wrapper
    |   xml.iso.pl              : ISO Prolog module wrapper
    |   xml.lpa.pl              : LPA Prolog wrapper
    |   xml_driver.pl           : Driver
    |   xml_acquisition.pl      : Chars -> Document parsing
    |   xml_generation.pl       : Document -> Chars generation
    |   xml_diagnosis.pl        : Document -> Chars generation failure diagnosis
    |   xml_pp.pl               : Document pretty-printing
    |   xml_utilities.pl        : Shared code

The Source Code (xml.pl)

xml.pl is designed to be modular: it should be easy to build a program that can output XML but not read it, or vice versa. Similarly, you may be happy to dispense with diagnosis once you are sure that your code will only make valid calls to xml_parse/2.

It is intended that the code should be very portable too. Clearly, some small changes will be needed between platforms, but these should be limited to the top-level wrapper file which contains the potentially non-portable code.

Whichever wrapper file you need, it is suggested that you name it xml.pl.

Using plxml.exe

The application and the DLLs should reside in the same directory, unless you have a good reason to do something different.

The application is invoked with two file names as operands, optionally preceded by options, i.e.

 plxml [-(c|p|a)*] INPUT OUTPUT

If INPUT contains a Prolog xml/2 clause, OUTPUT is written as the corresponding XML.

If INPUT contains a Prolog malformed/2 clause, OUTPUT is written as the corresponding XML with the unparsed/1 and out_of_context/1 terms written as CDATA.

If INPUT is an XML file, OUTPUT is written as a Prolog xml/2 or malformed/2 clause.

INPUT and/or OUTPUT may be "-" indicating stdin/stdout respectively.
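
As a rough illustration of the correspondence, an XML file containing a single greeting element with one attribute and some text would map to a clause of approximately the following shape. The element/3 and pcdata/1 wrappers follow xml.pl's document-term conventions, which are not described on this page, and the greeting element is hypothetical; treat the layout as a sketch rather than verbatim plxml output.

```prolog
% Hypothetical INPUT for plxml: one xml/2 clause describing a document.
% First argument : the attributes of the XML declaration.
% Second argument: the content, as nested element(Tag, Attributes, Content)
%                  terms, with text held in pcdata/1.
xml( [version="1.0"],
    [
    element( greeting, [lang="en"],
        [pcdata("Hello, world")] )
    ] ).
```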

The -a option
allows unescaped & (ampersand) characters to occur in PCDATA.
The -p option
preserves whitespace.
The -c option
causes prefixes to be removed from attribute names if the explicitly denoted namespace is the same as that of the containing tag (XML input only).

plxml's Prolog output is compatible with LPA Prolog. Specifically, strings containing ~ (tilde) characters are output as lists of character codes.

Using plxml as a development tool

A common use of xml.pl is to populate template XML documents with answers from a Prolog application. A nice approach is to design a prototype document and then translate this into an xml/2 term with plxml.

For example, a prototype HTML page could be produced with a WYSIWYG XHTML editor. Alternatively, if your editor produces plain HTML, you can use plxml in combination with HTML Tidy e.g.

 tidy -asxhtml -ascii [HTMLFile] | plxml - [PLFile]

Similarly, during prototyping/early development it may be convenient to use plxml as the interface, rather than integrating xml.pl.
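
The template approach described above might look like the following sketch once the prototype page has been translated into a term. The predicate names page_document/3 and items_elements/2 are hypothetical, as is the page layout, and the element/3 and pcdata/1 wrappers are assumed from xml.pl's document-term conventions.

```prolog
% page_document(+Title, +Items, -Document)
% Fills a fixed XHTML-like template with answers from the application.
% Title and each item are text in whatever representation your wrapper
% file expects; Document is an xml/2 term.
page_document( Title, Items, xml([version="1.0"], [Html]) ) :-
    Html = element( html, [],
        [ element( head, [], [element(title, [], [pcdata(Title)])] ),
          element( body, [], [element(ul, [], ListItems)] )
        ] ),
    items_elements( Items, ListItems ).

% items_elements(+Items, -Elements)
% Wraps each answer in an li element.
items_elements( [], [] ).
items_elements( [Item|Items], [element(li, [], [pcdata(Item)])|Elements] ) :-
    items_elements( Items, Elements ).
```

With xml.pl loaded, the resulting Document could then be passed to xml_parse/2 to obtain the XML text; that step is omitted here so that the sketch stands alone.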

Using plxml to repair XML

plxml can sometimes repair broken XML:

 plxml -ac [Broken XML] - | plxml - [Fixed XML]

Character Encoding

By default, plxml accepts input encoded as UTF-8 (which encompasses plain 7-bit ASCII) or UTF-16. The iso-8859-1, iso-8859-2, iso-8859-15 and windows-1252 8-bit encodings are also supported, but they must be identified correctly in the signature (the XML declaration) of the document.

With the exception of comments and CDATA sections, xml.pl and plxml output a plain ASCII encoding with the following properties:

  • In all character data, the characters & < and > are encoded as &amp; &lt; and &gt; respectively.
  • In non-parsed character data, such as Attribute Values, the characters " and ' are encoded as &quot; and &apos; respectively.
  • Any character codes > 127 are output as decimal character entities e.g. 160 as &#160;.
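
The three rules above can be sketched as a small stand-alone predicate. This is not the code used by xml.pl: escape_codes/3 and escape_code/3 are hypothetical names, the pcdata/attribute context argument is an illustrative device, and library(lists) is assumed to supply append/3 (as it does in SWI-Prolog).

```prolog
:- use_module( library(lists) ).   % append/3

% escape_codes(+Context, +Codes, -Escaped)
% Context is pcdata or attribute; Codes and Escaped are lists of
% character codes.
escape_codes( _, [], [] ).
escape_codes( Context, [Code|Codes], Escaped ) :-
    escape_code( Context, Code, Encoding ),
    append( Encoding, Rest, Escaped ),
    escape_codes( Context, Codes, Rest ).

% escape_code(+Context, +Code, -Encoding)
% Encoding is the list of codes written for a single input code.
escape_code( _, 38, Cs ) :- !, atom_codes( '&amp;', Cs ).            % &
escape_code( _, 60, Cs ) :- !, atom_codes( '&lt;', Cs ).             % <
escape_code( _, 62, Cs ) :- !, atom_codes( '&gt;', Cs ).             % >
escape_code( attribute, 34, Cs ) :- !, atom_codes( '&quot;', Cs ).   % "
escape_code( attribute, 39, Cs ) :- !, atom_codes( '&apos;', Cs ).   % '
escape_code( _, Code, Cs ) :-
    Code > 127, !,
    number_codes( Code, Digits ),
    append( [38, 35|Digits], [59], Cs ).     % &#NNN;
escape_code( _, Code, [Code] ).              % everything else unchanged
```

For example, escaping the codes of a<b in the pcdata context gives the codes of a&lt;b, and the single code 160 becomes &#160; in either context.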

Only character codes allowed by the XML specification are encoded by xml_parse/[2,3].

From version 5.0, plxml XML output is encoded as UTF-8.