(June 20) jMM had an interesting point: XML is hierarchical, but our thinking is naturally relational. Are the structural principles different for hierarchical data compared to relational data? Answer: probably (I don't really know), but surely there are some Hierarchical Data Design Principles we can use for guidance? -- Steven Black
I'm struggling a bit with the question of when a field should be a separate element and when it should be an attribute of another element. One recommendation I've seen is that attributes are for "metadata" and elements are for data. But is this a good threshold?
Looking at the example XML posted in Fox Forum XML Message, the following are used as attributes:
date format (US, European, ...)
time format (12, 24)
bio type (URL, ...)
party type (sender, receiver, cc, bcc)
location status (active, inactive)
But if you started with this as a data structure and showed it to a group of people, how much agreement would there be on which items are the metadata? I imagine most would agree on the first 3 above, but what about the last 2? Why not:
and similarly for status? It really seems to me that the decision point for choosing attributes over elements has to do with:
default values; and
enumerated (valid) choices
for an item.
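To make those two criteria concrete, here is a minimal Python sketch (standard library only) that treats party type as an attribute with a default value and a fixed set of valid choices. The four valid values come from the list above; the element and attribute names, the default of "receiver", the helper `party_type`, and the sample names are assumptions made only for illustration, not the actual Fox Forum message format.

```python
import xml.etree.ElementTree as ET

# Enumerated choices taken from the list above.
PARTY_TYPES = {"sender", "receiver", "cc", "bcc"}
# Assumed default, chosen only for this illustration.
DEFAULT_PARTY_TYPE = "receiver"

def party_type(party):
    """Return the party's type, applying the default and checking the enumeration."""
    value = party.get("type", DEFAULT_PARTY_TYPE)
    if value not in PARTY_TYPES:
        raise ValueError(f"invalid party type: {value!r}")
    return value

# Hypothetical sample document: the second party omits the attribute.
doc = ET.fromstring(
    "<message>"
    "<party type='sender'><name>Alice</name></party>"
    "<party><name>Bob</name></party>"
    "</message>"
)

types = [party_type(p) for p in doc.findall("party")]
print(types)
```

An item with a natural default and a closed list of values fits this pattern easily; free-form content such as a name does not, which may be a more workable test than asking whether something counts as "metadata".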
I'm also wondering how this whole topic fits in with automated XML-producing tools (maybe a separate topic XMLDataConversionTools) like the Rick Strahl wwXML class. Maybe there is no place for those tools in this discussion, but they are a big time saver, even though they treat every field as an element. -- Randy Pearson
wwXML is pretty flat, so it won't help much, although it's granular enough that you can create sub-objects. For example, you can create an object and then use createobjectxml to generate just the object's XML (no headers, no root tag, etc.) at a certain indentation level. You can then string these fragments together to build multiple hierarchical levels, at least for output. Input can work the same way, but you then have to use the parser to pre-specify the node at which to start parsing the object. Basically you tell it which node is the top node, and that node's children are the child data. It won't be totally generic, but that probably won't be necessary here anyway, as we'll be dealing with known data structures. --
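The fragment-stitching idea described above can be sketched in Python. This is not wwXML's API: `object_to_fragment`, its parameters, and the sample field names are hypothetical, and the sketch only shows the general approach of rendering each object as a headerless fragment at a given indentation level, nesting the fragments for output, and then picking a known node as the top node on input.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def object_to_fragment(tag, fields, children=(), level=0):
    """Render one object as an XML fragment (no declaration, no root wrapper).

    `children` holds fragments that were already rendered at level + 1.
    """
    pad = "  " * level
    lines = [f"{pad}<{tag}>"]
    for name, value in fields.items():
        lines.append(f"{pad}  <{name}>{escape(str(value))}</{name}>")
    lines.extend(children)
    lines.append(f"{pad}</{tag}>")
    return "\n".join(lines)

# Output: build the child first, then string it into its parent.
location = object_to_fragment("location", {"city": "Kailua"}, level=1)
party = object_to_fragment(
    "party", {"name": "Example Sender"}, children=[location], level=0
)

# Input: name the top node, then treat its children as the object's fields.
root = ET.fromstring(party)
top = root.find("location")
data = {child.tag: child.text for child in top}
print(party)
print(data)
```

As noted in the post, this only works because the structure is known in advance: the reader has to be told that `location` is the node to start from.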
Well, basically the rule is very simple. Attributes are meant to be used for meta-data and tags for real data. In other words, data about a party would be something like the name. Data about data ("meta-data") would be additional information about this data. The fact that the party-type is "sender" would be a good example for that. Therefore, it should be an attribute.
Now in addition to these considerations, there are performance issues (when it comes to parsing) or simple logistical issues that make good reasons to violate these rules. For example, in Web Builder, we use an element that has an infinite number of sub-items, with different (undefined) tag names (and that isn't good XML design to begin with) so we can very quickly iterate over them to get all the information we need. Then we needed to add some more information and our way of doing it wouldn't have worked anymore if we kept going this route since we couldn't have added another tag without losing a lot of our parsing performance. So we simply made this piece of information an attribute, even though it wasn't meta-data.
Attributes also have another advantage: You can define validation rules and other stuff in the DTD. In other words: You have a lot more control over attributes than over tags. For this reason, tools/technologies such as ADO are mainly using attributes and they are violating many XML guidelines doing so. But there is a good reason for that, I guess. -- MarkusEgger
I don't think the ADO thing is quite the right analogy. Describing the data entirely with attributes is not any slower than using elements, because you're accessing everything out of the same collection. What's slow is repeated context switches and accesses to different objects.
As Markus points out, attributes are fine for accessing individual data but they're horrible if you use them for searching. Kind of like a field that you can't apply an index to and have to scan for... -- Rick Strahl
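The "field you can't index" analogy can be illustrated with a small Python sketch using the standard library's ElementTree. The document shape and the helper names are assumptions for illustration, and real costs depend on the parser in use. The first function scans every element on each search; the second builds an index once, which is roughly what you end up doing by hand when you need to search on attribute values.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data.
doc = ET.fromstring(
    "<message>"
    "<party type='sender'><name>A</name></party>"
    "<party type='cc'><name>B</name></party>"
    "<party type='cc'><name>C</name></party>"
    "</message>"
)

def find_by_type_scan(root, wanted):
    """Visit every party element and inspect its attribute on each search."""
    return [p for p in root.iter("party") if p.get("type") == wanted]

def build_type_index(root):
    """Scan once and group the party elements by their type attribute."""
    index = defaultdict(list)
    for p in root.iter("party"):
        index[p.get("type")].append(p)
    return index

index = build_type_index(doc)
cc_names = [p.findtext("name") for p in index["cc"]]
scan_names = [p.findtext("name") for p in find_by_type_scan(doc, "cc")]
print(cc_names, scan_names)
```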
Proposal: Maybe attributes are best for element Hot Spots. In other words, if an element is expected to vary in the future, use an attribute, since attributes are naturally more extensible than the definition of an element. Thoughts? -- Steven Black
Isn't this just the kind of basic database design issue we've faced for years? I think I'm just rephrasing Steve's comment - If an element is rather static, we create a separate table column in our database. If it's likely to change, it becomes one value in a multi-valued column.
No I was only meaning to compare single-valued situations. -- Randy Pearson
For instance, you may have an application that simply needs to know whether or not a person is of Latin extraction (say, a contributors list for a social service agency serving the Hispanic community). This could be expressed with a simple logical/boolean column [latino]yes[/latino].
But this could also be an attribute at the higher entity level. [Contributor latino="no"]. This was really more the distinguishing nature of the examples above. -- Randy Pearson
However, you may have an application that needs more granularity, and the cultural/ethnic identification "latino" then becomes an attribute of an "ethnic" element. -- Steve Sawyer
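The three encodings discussed in this exchange can be put side by side in a short Python sketch. The first two shapes come straight from the posts above; the third, an `ethnic` child element carrying a `latino` attribute, is only one possible reading of the last post and is an assumption, as is the helper name `is_latino`.

```python
import xml.etree.ElementTree as ET

def is_latino(contributor):
    """Return True or False from whichever encoding is present, else None."""
    # 1. Attribute on the entity itself: <Contributor latino="no">
    if "latino" in contributor.attrib:
        return contributor.get("latino") == "yes"
    # 2. Child element: <latino>yes</latino>
    text = contributor.findtext("latino")
    if text is not None:
        return text.strip() == "yes"
    # 3. Attribute of an <ethnic> child element (assumed shape)
    ethnic = contributor.find("ethnic")
    if ethnic is not None and "latino" in ethnic.attrib:
        return ethnic.get("latino") == "yes"
    return None

samples = [
    "<Contributor><latino>yes</latino></Contributor>",
    '<Contributor latino="no"/>',
    '<Contributor><ethnic latino="yes"/></Contributor>',
]
results = [is_latino(ET.fromstring(s)) for s in samples]
print(results)
```

All three carry the same single-valued fact; what differs is where a consumer has to look for it and how easily the structure can grow later.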
In my initial testing with XML generation, and developing HTML with XML/XSL, I've found it helpful to think of an XML stream as a "view" rather than a data repository. As a general purpose view, many XSL stylesheets can use the same XML data, without the expense of opening database connections. Of course, this assumes that the data isn't too dynamic. -- Steve Lackey
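A rough Python sketch of the "view" idea: the XML is produced once and then rendered several ways without going back to the database. Python's standard library has no XSLT processor, so two plain functions stand in for the XSL stylesheets here; the element names and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# The XML "view": generated once (for example from a query), then reused.
view = ET.fromstring(
    "<contributors>"
    "<contributor><name>Ana</name><city>Austin</city></contributor>"
    "<contributor><name>Ben</name><city>Boston</city></contributor>"
    "</contributors>"
)

def render_list(root):
    """First presentation: a bulleted list of names."""
    items = "".join(f"<li>{escape(c.findtext('name'))}</li>" for c in root)
    return f"<ul>{items}</ul>"

def render_table(root):
    """Second presentation: a table of names and cities."""
    rows = "".join(
        f"<tr><td>{escape(c.findtext('name'))}</td>"
        f"<td>{escape(c.findtext('city'))}</td></tr>"
        for c in root
    )
    return f"<table>{rows}</table>"

print(render_list(view))
print(render_table(view))
```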
Category XML, Category Fox Forum XML. See also Schemas For XML.
( Topic last updated: 2000.03.03 04:54:37 PM )