Web Syndication and the RDF

Ed Tittel discusses the evolution of RDF from its conception and how it provides a well-documented way to represent metadata about Web resources.

The notion of Web content has kept expanding ever since Tim Berners-Lee and the CERN crew began using the earliest versions of HTML in the early 1990s as a handy way to share research results in-house. The boundaries around what a document is, what kinds of information it can deliver, the kinds of behavior it manifests and the interactivity it supports or displays have stretched steadily as various forms of active content, dynamic behavior and metadata-driven capabilities (most of the good stuff therein built using XML nowadays) have appeared online.

Web syndication is a good example of how document behavior and boundaries are stretching as you read this. Using either the Really Simple Syndication (RSS) or Atom XML formats, it's possible to create a list of headlines and content abstracts in a readily readable form (that can also encapsulate binary data for software update delivery, among many other things, by the way) and then to make that data available for consumption by other programs. This notion of consumption explains why syndicated feeds are where the action is, and why headlines, descriptions, abstracts and snippets form the foundation on which such feeds are built. Ultimately, syndication permits individual users or other Web sites to automatically read and/or publish links to new information items more or less as soon as they appear (or rather, as soon as they receive the feeds that go out at the same time they appear).
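To make the idea of a feed "consumed by other programs" concrete, here is a minimal Python sketch that reads headlines, links and abstracts out of a small RSS 2.0 document using only the standard library. The feed itself is made up for illustration (the site name, titles and example.com links are assumptions, not a real feed), and the helper name read_headlines is hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical RSS 2.0 feed; the site, titles and links are invented.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <description>Made-up headlines for illustration</description>
    <item>
      <title>First headline</title>
      <link>http://example.com/items/1</link>
      <description>Short abstract of the first item.</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/items/2</link>
      <description>Short abstract of the second item.</description>
    </item>
  </channel>
</rss>"""


def read_headlines(feed_xml):
    """Return a (title, link, description) tuple for each <item> in the feed."""
    root = ET.fromstring(feed_xml)
    return [
        (
            item.findtext("title"),
            item.findtext("link"),
            item.findtext("description"),
        )
        for item in root.iter("item")
    ]


# A consuming program would typically fetch the feed on a schedule and
# republish or display the entries; here we simply print them.
for title, link, summary in read_headlines(FEED):
    print(f"{title} <{link}>: {summary}")
```

A real aggregator would fetch the feed over HTTP and cope with Atom as well as RSS; this sketch only shows why a predictable list of headlines and abstracts is so easy for software to pick up.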

The most common XML syndication languages in use today are RSS and Atom, but both of these applications rest on work undertaken for the Resource Description Framework, or RDF, developed to describe resources available on the Web. RDF provides a model for the data related to such resources and a formal syntax so that independent producers and consumers of such descriptions can readily exchange and use this type of information. That said, RDF is truly a form of metadata (data designed specifically and primarily to describe other data) in that it is built for software programs to use rather than for humans to read. This explains why you don't often run into RDF descriptions in Web pages, nor find any readers for such information outside the development environments where such descriptions are crafted, implemented, tested and maintained.

Although RDF has been around for some time (since the mid-1990s), its current suite of specifications only became W3C Recommendations in February of 2004. In fact, RDF is the subject of numerous W3C recommendation documents, as a quick look at the W3C RDF home page will illustrate. RDF is also part of the W3C's Semantic Web activity, which is designed to provide "a common framework that allows data to be shared and reused across application, enterprise and community boundaries." This quote comes directly from the W3C Semantic Web home page. That same page goes on to describe RDF as follows: "RDF is used to represent information and to exchange knowledge in the Web."

All this hoopla notwithstanding, RDF actually has a fairly simple mission: it provides a well-documented way to represent metadata about Web resources. Such metadata includes title, author and modification date for a Web page; copyright or licensing information for Web documents; availability schedules for shared resources; and categorization of resources by type, content or currency. The syntax model for RDF uses the formal-language notion that subjects have characteristics called predicates, which take values called objects. Thus, if a specific Web page (the subject) has a creator (the predicate) named Cathy Johnson (the object), the actual syntax boils down to using uniform resource identifiers for the subject and the object, along with a labeled link from subject to object that identifies the type of predicate relationship between the two. The fundamental construct is a graph that links these two types of entities together with predicate labels, and it can be extended to permit a single subject to take an arbitrary number of objects, each with its own predicate and label.
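The subject-predicate-object model described above can be sketched in a few lines of Python. This is only an illustration of the graph idea, not RDF syntax or any RDF library's API: the page, person and license URIs are hypothetical, the two predicate URIs are borrowed from the Dublin Core element set as an assumption about which vocabulary one might use, and the helper names build_graph and objects are invented for this example.

```python
from collections import defaultdict

# Hypothetical URIs; only the name "Cathy Johnson" comes from the text.
PAGE = "http://example.com/index.html"               # subject
CATHY = "http://example.com/staff/cathy-johnson"     # object
LICENSE = "http://example.com/licenses/standard"     # object

# Predicates taken from the Dublin Core element set (an assumed choice).
CREATOR = "http://purl.org/dc/elements/1.1/creator"
RIGHTS = "http://purl.org/dc/elements/1.1/rights"

# Each statement is one (subject, predicate, object) triple.
triples = [
    (PAGE, CREATOR, CATHY),
    (PAGE, RIGHTS, LICENSE),
]


def build_graph(statements):
    """Group triples by subject: subject -> list of (predicate, object) links."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return graph


def objects(graph, subject, predicate):
    """Return every object linked from a subject by the given predicate."""
    return [obj for pred, obj in graph.get(subject, []) if pred == predicate]


graph = build_graph(triples)
print(objects(graph, PAGE, CREATOR))
```

Adding another statement about the same page is just one more triple in the list, which is how a single subject comes to carry an arbitrary number of labeled links.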

If you keep this simple but powerful approach to identifying and labeling subjects and objects in mind when you dig deeper into the RSS and Atom Web syndication languages, much of what these tools seek to accomplish becomes clear. That's what we'll dig into as we pursue this topic further in our next series of XML Tips.

About the author

Ed Tittel is a full-time writer and trainer whose interests include XML and development topics, along with IT certification and information security topics. Among his many XML projects are XML For Dummies, 4th edition (Wiley, 2005) and Schaum's Easy Outline of XML (McGraw-Hill, 2004). E-mail Ed with comments, questions or suggested topics or tools for review.
