Human- versus machine-readable XML
From my last tip, you already know that I'm working on revising that entirely mortal anti-classic of computer trade literature, XML For Dummies, into a third edition. In trying to agree on some revisions to the book's discussion of XML Style last week, I had an interesting revelation of sorts about the whole topic. Here goes: it's too easy to get caught up in that small portion of the vast body of XML data and documents that is designed primarily for human reading and consumption, and to overlook the fact that the vast majority of XML documents and data seldom, if ever, appear anywhere near a computer display.
Everyone knows that XML is a godsend for developers who must build distributed applications, especially when producers and consumers of documents may use different platforms to send and receive such data. Because XML is inherently self-describing (and SOAP is a great example of this kind of technology), it creates a way to build applications that can readily exchange data and other information, so long as they can process the document description that defines a typical XML message payload.
Then, too, XML excels at plain old-fashioned data exchange between and among applications. The recipe for success in this case works like this:
- Build a document description that captures all important data elements, attributes, content models, value constraints, relationship metadata, and so forth. Let's call the resulting markup language a "canonical representation" of the data.
- For each application that needs to exchange data, design two transforms: (1) to the canonical form from the application form; and (2) to the application form from the canonical form.
- To enable any two or more applications that somehow relate to the canonical data to exchange information, all that needs to happen is this: the sending application transforms data from its application form to the canonical XML form, and the receiving application then transforms data from the canonical XML form to its application form.
The beauty of this approach is that the phrase "Repeat as needed" works for an arbitrary number of applications: each new application needs only its own pair of transforms, no matter how many other applications already use the canonical form. This by no means minimizes the hard work and thought required to define the XML that represents such data, and to implement two-way transforms for every application that wants to use the canonical XML form for its data. But it does provide a fantastic way for applications to exchange information.
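The recipe above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `customer` markup, the two application forms (a dictionary for "application A" and a semicolon-delimited line for "application B"), and every function name are hypothetical and were invented for this example. A real canonical form would also be backed by a DTD or schema.

```python
import xml.etree.ElementTree as ET


def _build_customer(cust_id: str, name: str, email: str) -> str:
    """Serialize one record in the (hypothetical) canonical XML form."""
    customer = ET.Element("customer", id=cust_id)
    ET.SubElement(customer, "name").text = name
    ET.SubElement(customer, "email").text = email
    return ET.tostring(customer, encoding="unicode")


# Application A (hypothetical) stores records as dictionaries.
def a_to_canonical(record: dict) -> str:
    return _build_customer(str(record["cust_id"]), record["full_name"], record["mail"])


def canonical_to_a(xml_text: str) -> dict:
    customer = ET.fromstring(xml_text)
    return {
        "cust_id": int(customer.get("id")),
        "full_name": customer.findtext("name"),
        "mail": customer.findtext("email"),
    }


# Application B (hypothetical) stores records as "id;name;email" lines.
def b_to_canonical(line: str) -> str:
    cust_id, name, email = line.split(";")
    return _build_customer(cust_id, name, email)


def canonical_to_b(xml_text: str) -> str:
    customer = ET.fromstring(xml_text)
    return ";".join(
        [customer.get("id"), customer.findtext("name"), customer.findtext("email")]
    )


def send_a_to_b(record: dict) -> str:
    """A sends to B: application form -> canonical XML -> application form."""
    return canonical_to_b(a_to_canonical(record))
```

Adding a third application to this sketch would mean writing one more pair of functions; neither A's nor B's transforms would need to change.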
In fact, I know of numerous multi-application systems where XML is more of a "virtual format" than a real format for data storage. In this environment, data is converted to XML only momentarily from a source application, then immediately converted to a different format suitable for a target application. I suspect a great many patched-together data exchanges occur this way in the real world.
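To make the "virtual format" idea concrete, here is a minimal Python sketch in which XML exists only as an in-memory string between two conversions and is never stored. The CSV source, the JSON target, the `orders` markup, and all function names are assumptions chosen for illustration; they do not describe any particular system.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text: str) -> str:
    """Convert the source application's CSV export to transient XML."""
    root = ET.Element("orders")
    for row in csv.DictReader(io.StringIO(csv_text)):
        order = ET.SubElement(root, "order", sku=row["sku"])
        ET.SubElement(order, "quantity").text = row["qty"]
    return ET.tostring(root, encoding="unicode")


def xml_to_json(xml_text: str) -> str:
    """Convert the transient XML to the JSON the target application expects."""
    root = ET.fromstring(xml_text)
    orders = [
        {"sku": order.get("sku"), "quantity": int(order.findtext("quantity"))}
        for order in root.iter("order")
    ]
    return json.dumps(orders)


def relay(csv_text: str) -> str:
    """The XML lives only for the duration of this call; nothing is written to disk."""
    return xml_to_json(csv_to_xml(csv_text))
```

A call such as `relay("sku,qty\nA-100,3\n")` hands the target application its JSON without any XML file ever being created.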
Thus, for an ironic coda to my last XML tip: although it may be trickier than it should be nowadays to view native XML content on the Web through a browser, no one can say that XML isn't heavily used. It plays a key role in many environments, albeit away from the view of human eyes!
Have questions, comments, or feedback about this or other XML-related topics? Please e-mail me care of firstname.lastname@example.org; I'm always glad to hear from readers.
Ed Tittel is a principal at LANWrights, Inc., a wholly owned subsidiary of LeapIt.com. LANWrights offers training, writing, and consulting services on Internet, networking, and Web topics (including XML and XHTML), plus various IT certifications (Microsoft, Sun/Java, and Prosoft/CIW).