Practical Xml Part 2


Namespace: WIN_COM_API
Transcript of a lecture hosted by Erik Moore on 2000.09.10
[Irrelevant material has been trimmed.]

[20:53] *** Topic is 'Practical XML Part II with Erik Moore (9 PM EST, 6PM PST, 0100 GMT)'

[21:00] {ErikMoore} Hello, all. Am I late?

[21:00] {Evan Delay} Just on time. Let me do an intro.

[21:00] {ErikMoore} Shoot

[21:00] {Evan Delay} Erik is a Microsoft MVP, and is a Sysop on the Universal Thread in the Visual FoxPro forum. He is the author of several public domain tools, including eView, a functional replacement and enhancement for the dysfunctional VFP View Designer.

[21:01] {Evan Delay} Tonight is part 2 of his "Practical XML" lecture.

[21:01] {Evan Delay} Take it away.

[21:01] {ErikMoore} Thanks. We have a lot to cover. This lecture will outline how to use the Microsoft XML parser, MSXML.

[21:01] {ErikMoore} Several tools out there let you use XML without really getting your hands dirty. Well, tonight we're going to get our hands dirty parsing XML. The MSXML parser can be used to both construct and parse XML, but in most cases, you wouldn't want to use MSXML to create an XML document from scratch because it would be much faster to just build the document in VFP.

[21:02] {ErikMoore} Parsing XML is the key to getting information out of it. To use XML in your applications, you have to know how to find that information in the XML document. MSXML makes it pretty easy.

[21:02] {ErikMoore} There are generally two types of XML parsers in the world: DOM and SAX. DOM stands for Document Object Model. DOM parsers load the entire document at once, and build sort of an internal document tree that the developer can explore like an object hierarchy. IOW, the entire document is loaded into memory all at once.

[21:03] {ErikMoore} The parser builds a hierarchically arranged collection of nodes, each representing a beginning and ending tag-pair from the document. Each node is represented as an object with properties and methods to let you discover its characteristics and query its children.

[21:04] {ErikMoore} SAX stands for Simple API for XML. SAX parsers parse an XML document from top to bottom, and allow the developer to hook code into events that fire as each node is parsed. SAX is generally better suited for extremely large XML documents where it would be impractical or too slow to load the whole thing at once.

[21:06] * SimonGaudiuso Laughs - Out - Loud!!

[21:06] {ErikMoore} The new MS parser, MSXML 3, has SAX interfaces that you can use from C++ or VB, but since VFP does not support interface implementation, we cannot use the SAX capabilities of the new MS parser until VFP 7. MSXML is a DOM parser, and the stuff we'll do tonight uses the DOM.

[21:07] {ErikMoore} I hope that most of you either attended last Wednesday's lecture, or read the log. If you did, you know that there are a couple of broken versions of the MSXML parser out there, and there's a good chance that you've got one. The bugs in these versions of the parser make them almost unusable from VFP, so it's important that you upgrade (or downgrade) to a version that works. If you have the broken version, you can still follow along tonight, but some of the samples might cause errors for you.

[21:08] {ErikMoore} Ok, let's get started with the basics. I'm going to post lines of code, and you can cut and paste them to your command window and run them as we go along, or just watch, and take my word for what is happening. :-)

[21:09] {ErikMoore} Create the xml parser like this:

    oXML = CREATEOBJECT("Microsoft.XMLDOM") 

[21:09] {ErikMoore} After creating the parser there's a couple of things that you will do almost every time you use it. First, by default the loading and parsing methods of the parser are asynchronous - they return control to the calling routine before they are actually finished loading. So, in almost all cases, the first thing you want to do after creating the parser is to:

    oXML.Async = .F. 

[21:10] {ErikMoore} This causes the loading and parsing methods to finish before returning control to you.

[21:10] {ErikMoore} The alternative is to hook to events fired by the parser when it finishes, but this is much more complicated, and really would only be useful for applications that use XML alongside a UI and process fairly large documents. Now we're ready to load an XML document. To do this you call the parser's Load method, and pass it a file name or URL that points to an XML document.

    oXML.Load("c:\myfile.xml") 

[21:12] {ErikMoore} There is also a LoadXML method that takes a string that represents an XML document, so the following line would be equivalent to the above:

    oXML.LoadXML(FILETOSTR("c:\myfile.xml"))

[21:13] {ErikMoore} For my first parsing sample, I'm going to use the NASA newsreel link that I showed you Wednesday:

    oXML.Load("http://liftoff.msfc.nasa.gov/Content.xml")
(See Note #1 At End)

[21:14] {ErikMoore} By the time that this method returns, MSXML has loaded and processed the entire document, and is now ready to let you access it. The document object and all node objects have an xml property that holds the XML within. This property is useful for debugging, or parsing from the command window to see that your XML has been loaded, or to make sure you've got a reference to the right node. All well-formed XML documents have a couple of rules that they must follow (I hope you know these from your suggested reading. :-)) An important one comes into play here: every XML document must have exactly one root node - that is, a top-level node that has no duplicates and which serves as a parent to all other nodes.

[21:15] {ErikMoore} Knowing this, the next thing we do when parsing any XML document is get a reference to that root node:

    oRoot = oXML.DocumentElement

[21:16] {ErikMoore} To get the tag name of the root, check its NodeName property:

    ?oRoot.NodeName

[21:16] {ErikMoore} To look at all the XML under the root, look at the XML property. To look at the text between the tags, use the text property. Every node has a set of properties and methods that we use to manipulate it. The most important for parsing are:

  • NodeName: the name of the node
  • ChildNodes: a collection of all this node's children.
  • ParentNode: a reference to the node that contains this node
  • FirstChild: a reference to the first child node
  • LastChild: a reference to the last child node
  • SelectSingleNode: returns a reference to a node specified by criteria in the parameter
  • SelectNodes: returns a collection of nodes that fit the criteria in the parameter
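
    [Editor's note: a minimal sketch of the navigation properties listed above, using LoadXML on a small inline document so it does not depend on the NASA feed. The document content, the example.com URLs and the variable names are made up for illustration; only the element names follow the newsreel format described in this lecture.]

        * Hypothetical sample document in the newsreel/channel/item shape
        lcXML = '<newsreel>' + ;
            '<channel id="Science" name="Science News">' + ;
            '<item><title>First story</title><href>http://example.com/1</href></item>' + ;
            '<item><title>Second story</title><href>http://example.com/2</href></item>' + ;
            '</channel>' + ;
            '</newsreel>'

        oXML = CREATEOBJECT("Microsoft.XMLDOM")
        oXML.Async = .F.
        oXML.LoadXML(lcXML)

        oRoot = oXML.DocumentElement
        oChannel = oRoot.FirstChild

        ? oRoot.NodeName                        && newsreel
        ? oChannel.NodeName                     && channel
        ? oChannel.ParentNode.NodeName          && newsreel
        ? oChannel.ChildNodes.Length            && 2
        ? oChannel.FirstChild.FirstChild.Text   && First story
        ? oChannel.LastChild.FirstChild.Text    && Second story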

    [21:17] {ErikMoore} There are many more PEMs, but most of them are useful for building and manipulating XML, and are beyond the scope of this lecture.

    [21:18] {ErikMoore} Go ahead and navigate another instance of your browser to Content.xml so you can see the XML we will be manipulating. (Local Wiki copies of the file: Try #1, Try #2, Try #3.)

    [21:18] {ErikMoore} Everybody there?

    [21:19] {NadyaNosonovsky} Cannot view XML input using XSL style sheet. Please correct the error and then click the Refresh

    [21:19] {ErikMoore} Aww, shoot. They posted an invalid document!!!

    [21:19] {ErikMoore} It looks like they did not escape the ampersand...

    [21:25] {ErikMoore} Ok, I saved an old copy of that same document to one of my servers... NASAContent.xml. Does that work? [Wiki note: no, that link gives no XML document.]

    [21:25] {Evan Delay} Appears to.

    [21:27] {Evan Delay} I got some data. One cool thing. Browse oRoot in the debugger.

    [21:28] {ErikMoore} Ok, do this: Click Content.xml and view source, find the "&" character, change it to "and" and save it on your disk. Then open that XML file from your disk

    [21:30] {ErikMoore} Let me know when you are caught up. Just my luck, NASA posts an invalid version of their file on my lecture day!

    [21:33] {ErikMoore} Looking at this document you can see that it is fairly simple in construction:

    [21:33] {ErikMoore} It consists of a root node named "newsreel."

    [21:33] {ErikMoore} The root contains only child nodes named "channel."

    [21:33] {ErikMoore} Channel nodes have attributes named "id", "name" and "URL."

    [21:33] {ErikMoore} Channel nodes have children nodes named "item."

    [21:34] {ErikMoore} Item nodes can contain several text-only (no attributes) nodes named "title", "Date", "href" and "intro."

    [21:34] {ErikMoore} If you haven't already, in the VFP command window, start the parser and navigate to your copy of the document with the Load method:

        oXML = CREATEOBJECT("Microsoft.XMLDOM")
        oXML.Async = .F.
        oXML.Load("c:\NASA.xml")
        oRoot = oXML.DocumentElement
        ?oRoot.xml	&& To see the XML of the Root node 

    [21:36] {ErikMoore} See it?

    [21:36] {MarkusVoellmy} Got newsfeed :)

    [21:36] {ErikMoore} So let's say we want to loop through all of the Channel objects, and count how many children each one has. Since the ChildNodes property of oRoot is a collection, we can iterate through it with FOR EACH.

        FOR EACH oChannel IN oRoot.ChildNodes 
            ?oChannel.ChildNodes.Length
        ENDFOR

    [21:37] {ErikMoore} Oh yeah, all collections in MSXML have a Length property that is a count of how many items are in the collection.

        FOR EACH oChannel IN oRoot.ChildNodes
            FOR EACH oItem IN oChannel.ChildNodes
                FOR EACH oChild IN oItem.ChildNodes
                    ?oChild.NodeName, oChild.Text
                ENDFOR
            ENDFOR
        ENDFOR 

    [21:41] {ErikMoore} The ChildNodes collection lets you iterate through every node, but most of the time you will only be interested in certain nodes in the document, and you need to narrow your collections down, or need a direct way to get a reference to that node. Doing this requires that you use a pattern matching syntax called XSL Pattern matching. The syntax is probably unlike anything you've ever used before, but it's not that complicated, and one easy way to learn it is by example.

    [21:43] {ErikMoore} There's a document in MSDN that lists a bunch of pattern samples and explains what they do: XSL Pattern Examples. Additionally, there are several introductory articles on the MS XML website that explain a lot more. You can start with this one: Authoring Match Patterns.

    [21:43] {ErikMoore} The methods that we use to get directly to a node or set of nodes that we're looking for are SelectNodes() and SelectSingleNode().

    [21:44] {ErikMoore} Both of these methods take a pattern string as a parameter. The only difference is that SelectSingleNode returns a reference to the first node it finds that meets the criteria, and SelectNodes returns a collection of all nodes that meet the criteria. Say I want to get the collection of all "item" nodes in the document:

        oItems = oRoot.SelectNodes("//item") 

    [21:45] {ErikMoore} or, I want to get a reference to the "channel" node in our document that has an attribute named "id" with the value of "Science":

        oChannel = oRoot.SelectSingleNode("channel[@id='Science']")

    [21:46] {ErikMoore} All of XSL pattern matching is too complicated to go through in the lecture; you can work from the samples in the MS document, or read the other articles at MS. Notice that the first example using SelectNodes gets you the same thing as the earlier nested FOR EACHes, but you didn't have to traverse all of the parents. Note that all XSL 'queries' are relative to the node that they are executed on.
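
    [Editor's note: a minimal sketch of SelectNodes and SelectSingleNode, including a query run relative to a channel node. The inline document, its story titles and the "Missions" and "NoSuchId" ids are hypothetical; the patterns are the two from the lecture plus a plain child-name pattern.]

        * Hypothetical two-channel document in the newsreel shape
        lcXML = '<newsreel>' + ;
            '<channel id="Science" name="Science News">' + ;
            '<item><title>Story A</title></item>' + ;
            '<item><title>Story B</title></item>' + ;
            '</channel>' + ;
            '<channel id="Missions" name="Mission News">' + ;
            '<item><title>Story C</title></item>' + ;
            '</channel>' + ;
            '</newsreel>'

        oXML = CREATEOBJECT("Microsoft.XMLDOM")
        oXML.Async = .F.
        oXML.LoadXML(lcXML)
        oRoot = oXML.DocumentElement

        * Every item in the document, regardless of which channel holds it
        oItems = oRoot.SelectNodes("//item")
        ? oItems.Length                         && 3

        * One specific channel, picked by attribute value
        oChannel = oRoot.SelectSingleNode("channel[@id='Science']")

        * Run on the channel node, the pattern only sees that channel's children
        FOR EACH oItem IN oChannel.SelectNodes("item")
            ? oItem.SelectSingleNode("title").Text
        ENDFOR

        * When nothing matches, SelectSingleNode hands back a null reference
        oMissing = oRoot.SelectSingleNode("channel[@id='NoSuchId']")
        ? ISNULL(oMissing)                      && expected .T.

    Checking the result with ISNULL before touching .Text avoids an error when a document does not contain the node you asked for.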

    [21:49] {ErikMoore} The above two examples executed the query on the Root node, but if we already have a reference to a channel node, executing that node's SelectSingleNode method runs the query in the context of that node, and does not search its siblings or parent nodes. Once you have a reference to a node, you can get a list of its attributes:

        FOR EACH oAttribute IN oChannel.Attributes
            ?oAttribute.NodeName, oAttribute.Text
        ENDFOR

    [21:50] {ErikMoore} or, you can check the value of any of its attributes by name:

    ?oChannel.Attributes.GetNamedItem("id").text
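
    [Editor's note: a minimal sketch of reading attributes defensively. GetNamedItem returns a null reference when the attribute is not present, so calling .Text on it directly would fail. The one-channel document is hypothetical, and the "url" attribute is deliberately left out of it to show the missing case. GetAttribute is the element-level shortcut for the same lookup.]

        oXML = CREATEOBJECT("Microsoft.XMLDOM")
        oXML.Async = .F.
        oXML.LoadXML('<newsreel><channel id="Science" name="Science News"/></newsreel>')
        oChannel = oXML.DocumentElement.FirstChild

        * Attribute that exists
        ? oChannel.Attributes.GetNamedItem("id").Text    && Science
        ? oChannel.GetAttribute("name")                   && Science News

        * Attribute that may be missing: test the reference first
        oAttr = oChannel.Attributes.GetNamedItem("url")
        IF ISNULL(oAttr)
            ? "channel has no url attribute"
        ELSE
            ? oAttr.Text
        ENDIF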

    [21:51] {ErikMoore} It's cool to take the XSL pattern samples, translate them to the document you're working with, and watch how they work. You'll quickly see that conquering any XML document is pretty easy. What's more, learning XSL patterns also gets you a long way toward being able to author your own XSL stylesheets to transform XML into HTML or anything else. We'll cover that in part 3.

    [21:51] {ErikMoore} The XML menu sample I showed last Wednesday demonstrates some simple pattern matching, as well as some of the other concepts I've explained here. The XML part of the menu class is really pretty simple - the hard part, IMO, was authoring the DHTML that the class generates that actually gets the menu to work. :-)

    [21:53] {ErikMoore} Let's open up the XMLMenu class from the download last Wednesday. If you don't have it, you can get it from here: (maybe :-)) XmlMenu.prg.

    [21:53] {ErikMoore} If your browser renders the prg instead of offering a "Save As" dialog, just right click, select "view source", and save to a file. Anyway, looking at the XMLMenu class, you'll see that it has one main method that receives an XML string. It returns DHTML that defines a dropdown menu. But I just wanted to step through the code and show how it's parsing the menu definition file. After the BuildMenu method receives the XML, it creates the parser, sets Async to .F., and loads the XML with LoadXML.

    [22:00] {ErikMoore} You'll notice that it checks the return value of the LoadXML method. If this method returns .F., there was a problem with the XML. The parser returns detailed error information by way of the ParseError object. This object has several properties that describe the nature and cause of the error. Those are demonstrated in the ProcessXMLError method. The method then gets a reference to the collection of all "menuheader" items.

    WITH oRoot.SelectNodes("menuheader")
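
    [Editor's note: a minimal sketch of the return-value check described at 22:00, not the actual ProcessXMLError code from XmlMenu.prg. It feeds the parser a hypothetical string with an unescaped ampersand, the same kind of mistake found in the NASA file earlier, and prints what the parser's error object (the ParseError property in MSXML) reports. CHR(38) is used for the ampersand so VFP does not try to treat it as a macro.]

        * Hypothetical invalid document: the ampersand is not escaped
        lcBad = "<newsreel><title>Sun " + CHR(38) + " Moon</title></newsreel>"

        oXML = CREATEOBJECT("Microsoft.XMLDOM")
        oXML.Async = .F.

        IF NOT oXML.LoadXML(lcBad)
            * ParseError describes what went wrong and where
            WITH oXML.ParseError
                ? "Error code:", .ErrorCode
                ? "Reason:", .Reason
                ? "Line, position:", .Line, .LinePos
                ? "Source text:", .SrcText
            ENDWITH
        ENDIF

        * Escaping the ampersand as an amp entity makes the same document load
        lcGood = STRTRAN(lcBad, CHR(38), CHR(38) + "amp;")
        ? oXML.LoadXML(lcGood)                  && expected .T.

    The blanket STRTRAN only suits this toy string; on a real file it would also double-escape entities that are already correct.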

    [22:03] {ErikMoore} It iterates through each "menuheader" (which is equivalent to a pad in VFP). Since each menuheader node has a subnode named "padname", I grab the name of the pad with:

    lcPadName = oMenuHeader.SelectSingleNode("padname").Text

    [22:05] {ErikMoore} I do the same with "caption" and "url."

    [22:05] {ErikMoore} You don't need to worry about the string building stuff. The method then loops through the same collection again to perform some other stuff. Then for each pad, it iterates through each "item" under the "items" node.

    oItemList = oMenuHeader.SelectNodes("menuitems/menuitems")

    [22:07] {ErikMoore} It then grabs the value of each subnode: "caption," "url," and "description." And that's it. I defined a custom XML format, and completely parsed it to build what I needed to build. Are there questions on what happened?

    [22:09] {MarkusVoellmy} Nope.

    [22:09] {ErikMoore} So everybody now knows exactly how to parse any XML document with MSXML?

    [22:10] {Evan Delay} The code in menu.htm appears to be poorly supported in Netscape.

    [22:10] {ErikMoore} I doubt if I was _that_ clear... :-)

    [22:10] {MikeHelland} Yes, oXML = CreateObject('wwXML')

    [22:10] {MikeHelland} Right?

    [22:10] {ErikMoore} ED, yeah, I don't know anybody who has gotten DHTML menus to work in Netscape.

    [22:11] {Evan Delay} Okay, so no sweat then.

    [22:11] {ErikMoore} MH, are you looking for trouble?

    [22:11] {MikeHelland} Nah, sorry.

    [22:11] {Evan Delay} A trout will cure that.

    [22:12] {ErikMoore} C'mon, I know that not all of you grasped every bit of that. Either I lost everybody, or that was an extremely effective session.

    [22:12] {Evan Delay} Erik I will admit to being overwhelmed. But the ideas seem simple. Just gotta sit down with the code for a while.

    [22:12] {CindyWinegarden} I need time to take it in and see where it fits into what I do.

    [22:12] {MarkusVoellmy} Hm the stuff is quite clear to me ... But I wouldn't say I can parse every document now :)

    [22:13] {Evan Delay} Erik, I skimmed in this morning only. My fault.

    [22:13] {ErikMoore} Once you see how easy it is, you'll see uses for it everywhere.

    [22:13] {ErikMoore} XML has already solved like a dozen distinctly different problems for me. The XMLMenu class was one. That's production code. I described a couple of others Wednesday.

    [22:15] {MarkusVoellmy} Erik: I think I need some further reading about the object model, but you're right, basically it's real simple.

    [22:15] {ErikMoore} MSDN has excellent docs on MSXML. Also, the site www.vbxml.com has some great code samples, but you'll have to translate from VB.

    [22:15] {MikeHelland} What non-MS parsers are available?

    [22:16] {MarkusVoellmy} That we are used to :(

    [22:16] {ErikMoore} There's a Java parser from DataChannel, and IBM has one. Also, I think that Sun has published one. There is another COM one, but it's not free.

    [22:16] {MikeHelland} Cool, have you tried any of those? Who's the other COM one by?

    [22:17] {ErikMoore} I can't remember.

    [22:17] {ErikMoore} :-)

    [22:17] {ErikMoore} I haven't used any others. Haven't had a need to.

    [22:17] {MikeHelland} How do the SUN and IBM's work, do you know?

    [22:17] {ErikMoore} The MS parser is extremely fast, and supports a lot of standards.

    [22:17] {MikeHelland} They probably aren't COM then.

    [22:17] {ErikMoore} No, they aren't COM

    [22:18] {ErikMoore} Quiz- when would you want to use a SAX parser?

    [22:18] {MarkusVoellmy} When I have VFP 7?

    [22:19] {ErikMoore} lol.

    [22:19] {DavidStevenson} MH: yep

    [22:19] {MikeHelland} Alto - SAX or Tenor - SAX?

    [22:19] {ErikMoore} Or when you have a VERY large document.

    [22:23] {Evan Delay} Erik, do you have a part 3 date/time in mind?

    [22:23] {ErikMoore} How about next Sunday?

    [22:23] {Evan Delay} Fine with me.

    [22:34] {DenisChasse} Is it possible to download the HTML help compiler from MS site? if yes does anyone know where?

    [22:35] {ErikMoore} DC, you can but I don't know where exactly.

    [22:39] {Evan Delay} Just a reminder, Craig's lecture is this Wednesday.

    [22:39] {CindyWinegarden} Thanks Erik for a great session!

    [22:39] {ErikMoore} g'night

    [22:39] {Evan Delay} Thanks Erik

    Note #1: I tried following this example (2000.9.16), and the NASA XML is STILL invalid: IE choked on this line: < title >La Niña's Ghost< /title > .. Apparently the ñ character isn't allowed in "official" XML... Are XML documents REALLY that easy to break? How could NASA get it wrong without more complaints? Are other parsers more resilient than MSXML? - wgcs


    Contributors: Erik Moore Evan Delay Cindy Winegarden
    Category XML Category Training Category Wednesday Night Lectures
  • ( Topic last updated: 2000.09.16 01:01:56 PM )