XML Query Use Cases with xml.pl

From This Prolog Life
Revision as of 14:32, 8 November 2014 by John (talk | contribs)
Jump to navigation Jump to search

The following is a complete example to illustrate how the xml.pl module can be used. It exercises both the input and output parsing modes of xml_parse/[2,3], and illustrates the use of xml_subterm/2 to access the nodes of a "document value model". It's written for Quintus Prolog, but should port to other Prologs easily.

test( +QueryId )

The test/1 predicate is the entry-point of the program and executes a Prolog implementation of a Query from Use Case "XMP": Experiences and Exemplars, in the W3C's XML Query Use Cases, which "contains several example queries that illustrate requirements gathered from the database and document communities".

QueryId is one of q1...q12 selecting which of the 12 use cases is executed.

The XML output is written to the file [QueryId].xml in the current directory. xml_pp/1 is used to display the resulting "document value model" data-structures on the user output (stdout) stream. <syntaxhighlight lang="prolog">test( Query ) :-

   xml_query( Query, ResultElement ),
   % Parse output XML into the Output chars
   xml_parse( Output, xml([], [ResultElement]) ),
   absolute_file_name( Query, [extensions(xml)], OutputFile ),
   % Write OutputFile from the Output list of chars
   tell( OutputFile ),
   put_chars( Output ),
   told,
   % Pretty print OutputXML
   write( 'Output XML' ), nl,
   xml_pp( xml([], [ResultElement]) ).</syntaxhighlight>

xml_query( +QueryNo, ?OutputXML )

when OutputXML is an XML Document Value Model produced by running an example, identified by QueryNo, taken from the XML Query "XMP" use case.

Q1

List books published by Addison-Wesley after 1991, including their year and title. <syntaxhighlight lang="prolog">xml_query( q1, element(bib, [], Books) ) :-

   element_name( Title, title ),
   element_name( Publisher, publisher ),
   input_document( 'bib.xml', Bibliography ),
   findall(
       element(book, [year=Year], [Title]),
       (
       xml_subterm( Bibliography, element(book, Attributes, Content) ),
       xml_subterm( Content, Publisher ),
       xml_subterm( Publisher, Text ),
       text_value( Text, "Addison-Wesley" ),
       member( year=Year, Attributes ),
       number_codes( YearNo, Year ),
       YearNo > 1991,
       xml_subterm( Content, Title )
       ),
       Books
   ).</syntaxhighlight>

Q2

Create a flat list of all the title-author pairs, with each pair enclosed in a "result" element. <syntaxhighlight lang="prolog">xml_query( q2, element(results, [], Results) ) :-

   element_name( Title, title ),
   element_name( Author, author ),
   element_name( Book, book ),
   input_document( 'bib.xml', Bibliography ),
   findall(
       element(result, [], [Title,Author]),
       (
       xml_subterm( Bibliography, Book ),
       xml_subterm( Book, Title ),
       xml_subterm( Book, Author )
       ),
       Results
   ).</syntaxhighlight>

Q3

For each book in the bibliography, list the title and authors, grouped inside a "result" element. <syntaxhighlight lang="prolog">xml_query( q3, element(results, [], Results) ) :-

   element_name( Title, title ),
   element_name( Author, author ),
   element_name( Book, book ),
   input_document( 'bib.xml', Bibliography ),
   findall(
       element(result, [], [Title|Authors]),
       (
       xml_subterm( Bibliography, Book ),
       xml_subterm( Book, Title ),
       findall( Author, xml_subterm(Book, Author), Authors )
       ),
       Results
   ).</syntaxhighlight>

Q4

For each author in the bibliography, list the author's name and the titles of all books by that author, grouped inside a "result" element. <syntaxhighlight lang="prolog">xml_query( q4, element(results, [], Results) ) :-

   element_name( Title, title ),
   element_name( Author, author ),
   element_name( Book, book ),
   input_document( 'bib.xml', Bibliography ),
   findall( Author, xml_subterm(Bibliography, Author), AuthorBag ),
   sort( AuthorBag, Authors ),
   findall(
       element(result, [], [Author|Titles]),
       (
       member( Author, Authors ),
       findall( Title, (
           xml_subterm( Bibliography, Book ),
           xml_subterm( Book, Author ),
           xml_subterm( Book, Title )
           ),
           Titles
           )
       ),
       Results
   ).</syntaxhighlight>

Q5

For each book found at both bn.com and amazon.com, list the title of the book and its price from each source. <syntaxhighlight lang="prolog">xml_query( q5, element('books-with-prices', [], BooksWithPrices) ) :-

   element_name( Title, title ),
   element_name( Book, book ),
   element_name( Review, entry ),
   input_document( 'bib.xml', Bibliography ),
   input_document( 'reviews.xml', Reviews ),
   findall(
       element('book-with-prices', [], [
           Title,
           element('price-bn',[], BNPrice ),
           element('price-amazon',[], AmazonPrice )
           ] ),
       (
       xml_subterm( Bibliography, Book ),
       xml_subterm( Book, Title ),
       xml_subterm( Reviews, Review ),
       xml_subterm( Review, Title ),
       xml_subterm( Book, element(price,_, BNPrice) ),
       xml_subterm( Review, element(price,_, AmazonPrice) )
       ),
       BooksWithPrices
   ).</syntaxhighlight>

Q6

For each book that has at least one author, list the title and first two authors, and an empty "et-al" element if the book has additional authors. <syntaxhighlight lang="prolog">xml_query( q6, element(bib, [], Results) ) :-

   element_name( Title, title ),
   element_name( Author, author ),
   element_name( Book, book ),
   input_document( 'bib.xml', Bibliography ),
   findall(
       element(book, [], [Title,FirstAuthor|Authors]),
       (
       xml_subterm( Bibliography, Book ),
       xml_subterm( Book, Title ),
       findall( Author, xml_subterm(Book, Author), [FirstAuthor|Others] ),
       other_authors( Others, Authors )
       ),
       Results
   ).</syntaxhighlight>

Q7

List the titles and years of all books published by Addison-Wesley after 1991, in alphabetic order. <syntaxhighlight lang="prolog">xml_query( q7, element(bib, [], Books) ) :-

   element_name( Title, title ),
   element_name( Publisher, publisher ),
   input_document( 'bib.xml', Bibliography ),
   findall(
       Title-element(book, [year=Year], [Title]),
       (
       xml_subterm( Bibliography, element(book, Attributes, Book) ),
       xml_subterm( Book, Publisher ),
       xml_subterm( Publisher, Text ),
       text_value( Text, "Addison-Wesley" ),
       member( year=Year, Attributes ),
       number_codes( YearNo, Year ),
       YearNo > 1991,
       xml_subterm( Book, Title )
       ),
       TitleBooks
   ),
   keysort( TitleBooks, TitleBookSet ),
   range( TitleBookSet, Books ).</syntaxhighlight>

Q8

Find books in which the name of some element ends with the string "or" and the same element contains the string "Suciu" somewhere in its content. For each such book, return the title and the qualifying element. <syntaxhighlight lang="prolog">xml_query( q8, element(bib, [], Books) ) :-

   element_name( Title, title ),
   element_name( Book, book ),
   element_name( QualifyingElement, QualifyingName ),
   append( "Suciu", _Back, Suffix ),
   input_document( 'bib.xml', Bibliography ),
   findall(
       element(book, [], [Title,QualifyingElement]),
       (
       xml_subterm( Bibliography, Book ),
       xml_subterm( Book, QualifyingElement ),
       atom_codes( QualifyingName, QNChars ),
       append( _QNPrefix, "or", QNChars ),
       xml_subterm( QualifyingElement, TextItem ),
       text_value( TextItem, TextValue ),
       append( _Prefix, Suffix, TextValue ),
       xml_subterm( Book, Title )
       ),
       Books
   ).</syntaxhighlight>

Q9

In the document "books.xml", find all section or chapter titles that contain the word "XML", regardless of the level of nesting. <syntaxhighlight lang="prolog">xml_query( q9, element(results, [], Titles) ) :-

   element_name( Title, title ),
   append( "XML", _Back, Suffix ),
   input_document( 'books.xml', Books ),
   findall(
       Title,
       (
       xml_subterm( Books, Title ),
       xml_subterm( Title, TextItem ),
       text_value( TextItem, TextValue ),
       append( _Prefix, Suffix, TextValue )
       ),
       Titles
   ).</syntaxhighlight>

Q10

In the document "prices.xml", find the minimum price for each book, in the form of a "minprice" element with the book title as its title attribute. <syntaxhighlight lang="prolog">xml_query( q10, element(results, [], MinPrices) ) :-

   element_name( Title, title ),
   element_name( Price, price ),
   input_document( 'prices.xml', Prices ),
   findall( Title, xml_subterm(Prices, Title), TitleBag ),
   sort( TitleBag, TitleSet ),
   element_name( Book, book ),
   findall(
       element(minprice, [title=TitleString], [MinPrice]),
       (
       member( Title, TitleSet ),
       xml_subterm( Title, TitleText ),
       text_value( TitleText, TitleString ),
       findall( PriceValue-Price, (
           xml_subterm( Prices, Book ),
           xml_subterm( Book, Title ),
           xml_subterm( Book, Price ),
           xml_subterm( Price, Text ),
           text_value( Text, PriceChars ),
           number_codes( PriceValue, PriceChars )
           ),
           PriceValues
           ),
       minimum( PriceValues, PriceValue-MinPrice )
       ),
       MinPrices
   ).</syntaxhighlight>

Q11

For each book with an author, return the book with its title and authors. For each book with an editor, return a reference with the book title and the editor's affiliation. <syntaxhighlight lang="prolog">xml_query( q11, element(bib, [], Results) ) :-

   element_name( Title, title ),
   element_name( Author, author ),
   element_name( Book, book ),
   element_name( Editor, editor ),
   element_name( Affiliation, affiliation ),
   input_document( 'bib.xml', Bibliography ),
   findall(
       element(book, [], [Title,FirstAuthor|Authors]),
       (
       xml_subterm( Bibliography, Book ),
       xml_subterm( Book, Title ),
       findall( Author, xml_subterm(Book, Author), [FirstAuthor|Authors] )
       ),
       Books
   ),
   findall(
       element(reference, [], [Title,Affiliation]),
       (
       xml_subterm( Bibliography, Book ),
       xml_subterm( Book, Title ),
       xml_subterm( Book, Editor ),
       xml_subterm( Editor, Affiliation )
       ),
       References
   ),
   append( Books, References, Results ).</syntaxhighlight>

Q12

Find pairs of books that have different titles but the same set of authors (possibly in a different order). <syntaxhighlight lang="prolog">xml_query( q12, element(bib, [], Pairs) ) :-

   element_name( Author, author ),
   element_name( Book1, book ),
   element_name( Book2, book ),
   element_name( Title1, title ),
   element_name( Title2, title ),
   input_document( 'bib.xml', Bibliography ),
   findall(
       element('book-pair', [], [Title1,Title2]),
       (
       xml_subterm( Bibliography, Book1 ),
       findall( Author, xml_subterm(Book1, Author), AuthorBag1 ),
       sort( AuthorBag1, AuthorSet ),
       xml_subterm( Bibliography, Book2 ),
       Book2 @< Book1,
       findall( Author, xml_subterm(Book2, Author), AuthorBag2 ),
       sort( AuthorBag2, AuthorSet ),
       xml_subterm( Book1, Title1 ),
       xml_subterm( Book2, Title2 )
       ),
       Pairs
   ).</syntaxhighlight>

Auxiliary Predicates

<syntaxhighlight lang="prolog">other_authors( [], [] ). other_authors( [Author|Authors], [Author|EtAl] ) :-

   et_al( Authors, EtAl ).

et_al( [], [] ). et_al( [_|_], [element('et-al',[],[])] ).

text_value( [pcdata(Text)], Text ). text_value( [cdata(Text)], Text ).

element_name( element(Name, _Attributes, _Content), Name ).</syntaxhighlight>

range( +Pairs, ?Range )

when Pairs is a list of key-datum pairs and Range is the list of data. <syntaxhighlight lang="prolog">range( [], [] ). range( [_Key-Datum|Pairs], [Datum|Data] ) :-

   range( Pairs, Data ).</syntaxhighlight>

minimum( +List, ?Min )

is true if Min is the least member of List in the standard order. <syntaxhighlight lang="prolog">minimum( [H|T], Min ):-

   minimum1( T, H, Min ).

minimum1( [], Min, Min ). minimum1( [H|T], Min0, Min ) :-

   compare( Relation, H, Min0 ),
   minimum2( Relation, H, Min0, T, Min ).

minimum2( '=', Min0, Min0, T, Min ) :-

   minimum1( T, Min0, Min ).

minimum2( '<', Min0, _Min1, T, Min ) :-

   minimum1( T, Min0, Min ).

minimum2( '>', _Min0, Min1, T, Min ) :-

   minimum1( T, Min1, Min ).</syntaxhighlight>

input_document( +File, ?XML )

reads File and parses the input into the "Document Value Model" XML. <syntaxhighlight lang="prolog">input_document( File, XML ) :-

   % Read InputFile as a list of chars
   see( File ),
   get_chars( Input ),
   seen,
   % Parse the Input chars into the term XML
   xml_parse( Input, XML ).</syntaxhighlight>

Load the XML Module. <syntaxhighlight lang="prolog">:- use_module( xml ).</syntaxhighlight> Load a small library of Puzzle Utilities. <syntaxhighlight lang="prolog">

- ensure_loaded( misc ).

</syntaxhighlight>

Download a 5Kb tar.gz format file containing this program with input and output data.