Extensible Markup Language (XML)

CS4254 - Prof. Abrams

Note:  You should view this document in Internet Explorer 5.0.
The document contains examples of XML and XSL, and Internet Explorer is the first browser to support these technologies.

Note:  There appears to be a bug in IE5 (at least version 5.00.2614.3500). If you display the html files linked to by this document in IE5, you will get a scripting error.  The error suggests that the xml document referenced by the html document doesn't exist.  You can get around the problem by saving the html and any xml files that it references to your local file system, then opening the local copy of the file.

(The load method of the ActiveX component microsoft.xmldom, when given a URL, returns immediately, before the URL has actually been transferred to the browser.  You should be able to set the ActiveX object's async property to false so that the load blocks, but this does not appear to work correctly.  Click here for an html file that does work properly but contains a workaround: it displays alert boxes to slow down the rendering until the URL arrives.  Use menu item "Source" under the "View" menu to see the html and JavaScript.)
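Another possible approach, sketched here under assumptions, is to delay the fill-in code until the document reports that loading has completed.  The sketch assumes the document object exposes a readyState property (4 meaning completed) and an onreadystatechange hook, as the MSXML document object does.  The whenLoaded helper and the fakeDoc stand-in are hypothetical names used only for illustration, and this has not been tried against the IE5 build mentioned above.

```javascript
// Runs callback once doc reports that loading has completed.
// Assumes doc has readyState (4 = completed) and an onreadystatechange hook.
// whenLoaded is a hypothetical helper, not part of any library.
function whenLoaded(doc, callback) {
  var done = false;
  function check() {
    if (!done && doc.readyState == 4) {
      done = true;
      callback(doc);
    }
  }
  doc.onreadystatechange = check;
  check(); // covers the case where loading finished before the hook was set
}

// Stand-in object so the sketch can run outside the browser.
var fakeDoc = { readyState: 1, onreadystatechange: null };
var calls = 0;
whenLoaded(fakeDoc, function () { calls++; });

// Simulate the document finishing its load and firing the event twice.
fakeDoc.readyState = 4;
fakeDoc.onreadystatechange();
fakeDoc.onreadystatechange(); // a repeated event does not re-run the callback
console.log(calls);
```

In a page, the idea would be to call whenLoaded(xml, fillIn) before xml.load("Autos.xml"), where fillIn is whatever function populates the HTML.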

Contents

1. Some Benefits of XML
2. XML Is a Meta-Language
Exercise 1:  What does a Web Browser do with an XML Doc?
Exercise 2:  What If an XML Doc is Missing an End Tag?
Exercise 3:  How Do We Create an HTML Doc from the XML Doc?
Exercise 4:  What if You Modify XML Doc, But Insert a Typo?
Exercise 5:  Add Style to the HTML Doc
Exercise 6: Using JavaScript Loops

References


1. Some Benefits of XML

Data files become self-describing

XML permits standard tools - like parsers and structured text editors

The application author no longer needs to write a custom parser.

XML can be represented by a tree - making traversal easy

This idea is embodied in the Document Object Model (DOM).

DOM returns 26630 when asked "Give me the second PRICE in AUTOS"
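To make the tree idea concrete, here is a minimal sketch in plain JavaScript.  It does not use the real DOM: the element helper and the hand-built tree are hypothetical stand-ins for a parsed document, and the first price is a made-up value (only 26630 comes from the example above).  The search visits the tree depth-first in document order, in the spirit of the DOM's getElementsByTagName.

```javascript
// Hypothetical in-memory node; a real DOM node has many more properties.
function element(tag, children, text) {
  return { tag: tag, children: children || [], text: text || "" };
}

// Hand-built stand-in for a parsed AUTOS document.
var autos = element("AUTOS", [
  element("AUTO", [element("PRICE", [], "19995")]), // made-up first price
  element("AUTO", [element("PRICE", [], "26630")])
]);

// Collect all descendants with the given tag name, in document order.
function getElementsByTagName(node, tag, found) {
  found = found || [];
  for (var i = 0; i < node.children.length; i++) {
    var child = node.children[i];
    if (child.tag === tag) {
      found.push(child);
    }
    getElementsByTagName(child, tag, found);
  }
  return found;
}

// "Give me the second PRICE in AUTOS" (index 1, counting from 0).
var secondPrice = getElementsByTagName(autos, "PRICE")[1].text;
console.log(secondPrice);
```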

XML can be used for any data interchange!

XML may become the language for all data interchange among applications on the Internet, whether for web pages, database queries, remote procedure calls, or anything else.

You can automatically populate a document template with XML

Here's an example of applying this html template to the above xml document:
(Note:  if you click on "this" in the last sentence and get an error message, try saving the last two links to local files in the same directory, then opening the local copy of the html file.)

Here are 10 more points describing XML


2. XML Is a Meta-Language


Exercise 1:  What does a Web Browser do with an XML Doc?

  1. Start Internet Explorer 5 (IE5).
  2. Click on this link to view the xml doc above.
  3. Click on the + and - symbols in the IE5 display to see what happens.
  4. Note that IE5 nicely formats the xml tags for you.

Exercise 2:  What If an XML Doc is Missing an End Tag?
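
Background for this exercise: XML requires every start tag to have a matching end tag, and a conforming XML parser must report an error rather than guess at a repair, unlike the lenient handling typical of HTML.  The sketch below shows the idea with a stack of open tags.  It is not a real XML parser: checkTags is a hypothetical helper that ignores comments, CDATA sections, and DOCTYPE declarations, and the element content in the usage lines is made up.

```javascript
// Returns null if every start tag has a matching end tag,
// otherwise a short message describing the first problem found.
function checkTags(xmlText) {
  // Groups: 1 = "/" for an end tag, 2 = tag name, 3 = "/" for an empty tag.
  var tagPattern = /<(\/?)([A-Za-z_][\w.-]*)[^>]*?(\/?)>/g;
  var open = []; // stack of currently open tag names
  var match;
  while ((match = tagPattern.exec(xmlText)) !== null) {
    var isEnd = match[1] === "/";
    var name = match[2];
    var isEmpty = match[3] === "/";
    if (isEmpty) {
      continue; // <TAG/> opens and closes itself
    }
    if (!isEnd) {
      open.push(name);
      continue;
    }
    var expected = open.pop();
    if (expected !== name) {
      return "end tag </" + name + "> does not match <" + (expected || "nothing") + ">";
    }
  }
  if (open.length > 0) {
    return "missing end tag for <" + open[open.length - 1] + ">";
  }
  return null;
}

console.log(checkTags("<AUTOS><AUTO><DOORS>4</DOORS></AUTO></AUTOS>"));
console.log(checkTags("<AUTOS><AUTO><DOORS>4</AUTO></AUTOS>"));
```

The first call finds no problem; the second reports that </AUTO> arrived while <DOORS> was still open.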


Exercise 3:  How Do We Create an HTML Doc from the XML Doc?

This is done by creating an HTML file with missing parts (to be filled in from the XML), then doing one of two things:
    1. Insert scripting into the HTML file that populates the missing parts.
    2. Create an XSL (Extensible Stylesheet Language) file, which transforms XML to HTML (discussed here).


    Here is part of an html file for our xml example:
     

      <HTML>

      <HEAD>
      <TITLE>Automobile Buyer's Guide</TITLE>
      </HEAD>

      <BODY>

      <TABLE width=100% BORDER=1>
      <TR>
      <TD>Automobile Buyer's Guide for </TD>
      <TD>Model: </TD>
      </TR>

      <TR>
      <TD>Body style</TD>
      <TD>-door</TD>
      </TR>

      <TR>
      <TD>Engine displacement</TD>
      <TD> liters</TD>
      </TR>

      </TABLE>

      </BODY>
      </HTML>


    Click here to view this file in IE5.0.

    Now we need a way to populate it.  So we do this:
     

      <HTML>
      <HEAD>
      <TITLE>Automobile Buyer's Guide</TITLE>
      <SCRIPT>
      function init() {
        xml = new ActiveXObject("Microsoft.XMLDOM");
        xml.load("Autos.xml");

        //Some stuff to fill in the missing HTML parts

      }
      </SCRIPT>

      </HEAD>

      <BODY onload="init()">
      ...
      </BODY>
      </HTML>

    But how do we insert "variables" into the BODY section so we can refer to them in the script?  Here's how:
     
      <TR>
      <TD>Automobile Buyer's Guide for <SPAN ID=Year></SPAN></TD>
      <TD>Model: <SPAN ID=Name></SPAN></TD>
      </TR>

      <TR>
      <TD>Body style</TD>
      <TD><SPAN ID=Doors></SPAN>-door</TD>
      </TR>

      <TR>
      <TD>Engine displacement</TD>
      <TD><SPAN ID=Displacement></SPAN> liters</TD>
      </TR>


    So a <SPAN> with an empty body is a placeholder, and its ID gives it a name that the script can refer to.

    We then make the script's init() function this...
     

      function init() {
        xml = new ActiveXObject("Microsoft.XMLDOM");
        xml.load("Autos.xml");

        // Get the AUTOS object
        autos = xml.documentElement;
        Year.innerHTML = autos.getAttribute("year");

        // Get the AUTO object
        auto  = xml.getElementsByTagName("AUTO").item(0);
        Name.innerHTML = auto.getAttribute("name");

        // Get the number of doors
        Doors.innerHTML = xml.getElementsByTagName("DOORS").item(0).text;

        // Get engine info
        Engine = xml.getElementsByTagName("ENGINE").item(0);
        Displacement.innerHTML = Engine.getAttribute("displacement") + " ";
      }

    Some notes:

    Result...

      The result is this html file.  (Use the View->Source menu item to see the full html file!)

      Hint:  If you want to create xml and html files like this on your server, be sure that your server's configuration files return a mime type of "text/xml" for files ending with the .xml extension.  Otherwise, when IE5 interprets your html file, the call xml.load("Autos.xml"); will either cause an access violation or return an illegal object.


Exercise 4:  What if You Modify XML Doc, But Insert a Typo?

Suppose that you have a bad memory for tag names, and try to create the xml file from memory, and you use <CARS> instead of <AUTOS>.

Will an XML parser like that in IE5 complain?  (Try it!)
To catch illegal tag names, illegal attribute names, or tag bodies of the wrong data type, you need a grammar.
 

How Grammars are Specified in XML

The grammar of our XML example can be specified something like this:


In XML the grammar is represented by a Document Type Definition (DTD):

<!ELEMENT AUTOS (AUTO)>
<!ATTLIST AUTOS
          year CDATA #REQUIRED>
<!ELEMENT AUTO (DOORS, ENGINE, PERFORMANCE, PRICE)>
<!ATTLIST AUTO
          name CDATA #REQUIRED>
<!ELEMENT DOORS EMPTY>
<!ELEMENT ENGINE EMPTY>
<!ATTLIST ENGINE
          displacement CDATA #REQUIRED
          horsepower CDATA #REQUIRED>
<!ELEMENT PERFORMANCE (ZEROTO60, QUARTERMILE)>
<!ELEMENT ZEROTO60 EMPTY>
<!ELEMENT QUARTERMILE EMPTY>
<!ATTLIST QUARTERMILE
          second CDATA #REQUIRED
          mpg CDATA #REQUIRED>
<!ELEMENT PRICE EMPTY>

[Above DTD is in this file...]
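
To see what checking a document against such a grammar involves, here is a minimal sketch in plain JavaScript.  The two tables are transcribed by hand from the DTD above; the validate, has, and el helpers and the node shape are hypothetical, and this is not how IE5 or any real validating parser is implemented.  It only checks that each element is declared, has its required attributes, and has exactly the declared sequence of children.

```javascript
// Declared child sequences, transcribed by hand from the DTD.
var contentModel = {
  AUTOS: ["AUTO"],
  AUTO: ["DOORS", "ENGINE", "PERFORMANCE", "PRICE"],
  DOORS: [],
  ENGINE: [],
  PERFORMANCE: ["ZEROTO60", "QUARTERMILE"],
  ZEROTO60: [],
  QUARTERMILE: [],
  PRICE: []
};

// #REQUIRED attributes, transcribed by hand from the DTD.
var requiredAttributes = {
  AUTOS: ["year"],
  AUTO: ["name"],
  ENGINE: ["displacement", "horsepower"],
  QUARTERMILE: ["second", "mpg"]
};

function has(table, key) {
  return Object.prototype.hasOwnProperty.call(table, key);
}

// Hypothetical node shape: { tag, attributes, children }.
function el(tag, attributes, children) {
  return { tag: tag, attributes: attributes || {}, children: children || [] };
}

// Returns a list of error messages; an empty list means no problem was found.
function validate(node, errors) {
  errors = errors || [];
  if (!has(contentModel, node.tag)) {
    errors.push("undeclared element <" + node.tag + ">");
    return errors;
  }
  var required = has(requiredAttributes, node.tag) ? requiredAttributes[node.tag] : [];
  for (var i = 0; i < required.length; i++) {
    if (!has(node.attributes, required[i])) {
      errors.push("<" + node.tag + "> is missing attribute " + required[i]);
    }
  }
  var actual = [];
  for (var j = 0; j < node.children.length; j++) {
    actual.push(node.children[j].tag);
  }
  if (actual.join(",") !== contentModel[node.tag].join(",")) {
    errors.push("<" + node.tag + "> has children (" + actual.join(", ") + ")");
  }
  for (var k = 0; k < node.children.length; k++) {
    validate(node.children[k], errors);
  }
  return errors;
}

// The typo from this exercise: CARS instead of AUTOS (the year is made up).
console.log(validate(el("CARS", { year: "1999" }, [])));
```

Without a grammar the CARS document is perfectly well-formed; only a check like this can flag it.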

The final step is to add a second line to the XML document that names the DTD.  The name after DOCTYPE must match the document's root element, which here is AUTOS:

<?xml version="1.0"?>
<!DOCTYPE AUTOS SYSTEM "Autos.dtd">

Notes on the example DTD:


Exercise 5:  Add Style to the HTML Doc

  1. Modify the html doc to link to a CSS style sheet that dresses up the doc.
  2. Add a few more rows to the table with more info from the XML doc (0-60 time, etc.)

Exercise 6: Using JavaScript Loops

Suppose we want our HTML document to list the names of all automobiles from the XML file. The .length property of the list returned by the getElementsByTagName method of Microsoft.XMLDOM is useful for writing a JavaScript loop that processes all tags with a given tag name. Here is an example of how to list the names of all the automobiles in an XML file:

<HTML>
  <HEAD>
  <TITLE>Automobile Buyer's Guide</TITLE>
  </HEAD>
  <BODY>
    <p>Here is a list of all autos:
    <SCRIPT>
      xml = new ActiveXObject("Microsoft.XMLDOM");
      xml.load("Autos.xml");
      auto_array = xml.getElementsByTagName("AUTO");
      i = 0;
      while (i < auto_array.length) {
        document.write("<p>Auto #" + i + " is " + auto_array.item(i).getAttribute("name"));
        i++;
      }
    </SCRIPT>
  </BODY>
</HTML>

Click here to download this HTML file. Note in the HTML above that the JavaScript writes the HTML markup directly. This is because the number of lines of output is now variable and depends on the XML file. Therefore the JavaScript cannot easily use the .innerHTML notation from Exercise 3 to fill in the body of a <SPAN> element.
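
One possible alternative, sketched here rather than taken from the downloadable file, is to build all of the markup as a single string inside the loop and hand it over in one step, either to document.write or to the innerHTML of one placeholder.  The autoListHtml function and the sample names are hypothetical, and the sketch does no escaping of special characters in the names.

```javascript
// names: array of strings, for example the "name" attribute of each AUTO.
// Returns one string of HTML with a paragraph per automobile.
function autoListHtml(names) {
  var html = "";
  for (var i = 0; i < names.length; i++) {
    html += "<p>Auto #" + i + " is " + names[i] + "</p>";
  }
  return html;
}

// Made-up names standing in for values read from the XML file.
var listing = autoListHtml(["Hypothetical Coupe", "Hypothetical Sedan"]);
console.log(listing);
```

Because the loop only builds a string, the same function works however many AUTO tags the XML file contains.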


References