
SOA principles drive content practices

SOA has long focused on structured data, but the door is also open to incorporating unstructured, document-based data.

The growth and visibility of SOA have brought the basic principles of service orientation to countless domains outside of application development. In truth, it's application development that borrowed the concepts of service orientation from other domains. For example, compliance with standards, design for reuse, loose coupling and authoritative sources of record are all ideas that have been alive and well in manufacturing operations for ages. But it's SOA, in the sense we know it, that has brought these concepts to the app dev mainstream. And now, everywhere you look, SOA principles are put into practice.

A great example of this is the world of content authoring and publishing.

SOA has changed the way we think about and develop applications. Rather than creating monolithic applications that are built for a specific purpose, SOA focuses on the creation of standards-based, reusable components that can be combined and recombined to create many new applications. The components are fine grained and built with reuse in mind.

These concepts are now widely adopted within authoring and publishing circles—folks who write product documentation, technical manuals and maintenance procedures, for example. Documents used to be written as monolithic artifacts with no thought to content reuse. A document's beginning, middle, end, pages, chapters, and other elements rarely worked as independent, reusable chunks of content. As a result, writers duplicated effort and the information organizations published was often inconsistent and out-of-date.

Writing for reuse
Increasingly, organizations are writing for reuse. They're creating fine-grained, topic-oriented components that are written not with the end document in mind, but with an eye toward how they'll be reused over time. The working assumption is that these content components will be combined and composed many times over, into many different types of documents, deliverables and consuming applications.

Moreover, this form of "structured authoring" makes content canonical and authoritative—in a sense, similar to master data management, but for content. Content becomes centrally managed and controlled as a set of components, which are trusted and authoritative as the sources of the truth.

These components are then included by reference within documents, a practice often called "transclusion." Transclusion does away with reuse via copy-and-paste, an insidious practice that leads to a spiraling loss of control and forces a choice between two evils: (1) a compounding maintenance burden, or (2) inconsistent and out-of-date information.

Imagine making manual edits to a disconnected array of documents and deliverables, in different formats and languages, all of which repeat the same wording. With transclusion, each document instead holds a pointer to the centrally managed content chunk. At change time, revisions to individual components are made centrally and propagated to all consuming documents and deliverables. Very SOA-like, indeed.
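To make the idea concrete, here is a minimal sketch of transclusion in Python. The component store, the document names and the `{{ref:...}}` placeholder syntax are all assumptions made up for illustration; real structured authoring systems use their own reference mechanisms.

```python
import re

# Hypothetical central store: component ID -> authoritative text.
components = {
    "safety-warning": "Disconnect power before servicing.",
    "support-contact": "Contact support for assistance.",
}

# Documents hold pointers to components, not copies of their text.
# The {{ref:ID}} syntax is an illustrative assumption.
documents = {
    "user-manual": "Setup guide. {{ref:safety-warning}} {{ref:support-contact}}",
    "maintenance-procedure": "Step 1. {{ref:safety-warning}}",
}

REF = re.compile(r"\{\{ref:([a-z0-9-]+)\}\}")


def render(doc_id):
    """Resolve every reference in a document against the central store."""
    return REF.sub(lambda m: components[m.group(1)], documents[doc_id])


before = render("maintenance-procedure")

# One central edit to the authoritative component...
components["safety-warning"] = "Disconnect power and wait 30 seconds before servicing."

# ...reaches every consuming document the next time it is rendered.
after_manual = render("user-manual")
after_procedure = render("maintenance-procedure")
```

Because the documents store only references, nothing in them has to be touched when the warning changes. That is the property copy-and-paste reuse cannot offer.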

Uniting data and documents
As SOA principles continue to impact content practices, organizations are taking a closer look at the role of unstructured information assets in the SOA-based applications themselves. While most organizations may have adopted this structured authoring approach to streamline content publishing, some are recognizing an unintended benefit with relevance well outside of tech pubs. Suddenly, content looks a whole lot like data and it can be easily consumed by information-hungry applications that once struggled to make use of documents.

Most would agree that documents and document-centric processes have been conspicuously absent from the SOA agenda. In part, this is because structured data often represents the most critical assets of a business: the data driving the high-volume, high-value transactional processes that run it. But it's also because structured data is well formed and well defined. Documents and other unstructured data are just harder to access and control in a granular way. XML and component-based content are changing that, giving content the kind of rich definition and structure that used to be reserved for the data sitting in the rows and columns of a database.
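As a rough illustration of content behaving like data, the sketch below parses a small XML content component with Python's standard library and pulls out fields an application could consume. The element and attribute names (`topic`, `applies-to`, `step`) are hypothetical and do not follow any particular schema.

```python
import xml.etree.ElementTree as ET

# A hypothetical content component; element names are illustrative only.
topic_xml = """
<topic id="replace-filter">
  <title>Replace the air filter</title>
  <applies-to product="X100"/>
  <step>Power off the unit.</step>
  <step>Remove the old filter.</step>
  <step>Insert the new filter.</step>
</topic>
""".strip()

topic = ET.fromstring(topic_xml)

# Because the content is structured, an application can address its parts
# directly instead of scraping free-form text.
record = {
    "id": topic.get("id"),
    "title": topic.findtext("title"),
    "product": topic.find("applies-to").get("product"),
    "steps": [step.text for step in topic.findall("step")],
}
```

The resulting `record` is an ordinary dictionary, so it could sit alongside inventory or order data in the same application without special handling.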

Uniting data and documents gives organizations a more accurate picture of how they actually operate. Business is done at the intersection of data and documents, where the facts of structured data — financial information, inventory, etc. — meet the context of documents — manuals, policies, reports, analysis, etc. Many organizations now see SOA as the bridge to the longstanding data/document divide.

Moving beyond tech docs
Taking full advantage of the data/document union depends on expanding the reach of structured authoring within the organization. Traditionally, structured authoring and publishing solutions have been the domain of technical documentation teams. But going forward, organizations that extend structured authoring to the rest of the enterprise (engineering, marketing, customer support and so on) will see yesterday's monolithic, trapped content transformed into highly accessible knowledge that flows unabated across applications and end users.

About the author
Jake Sorofman is senior vice president of marketing and business development North America and EMEA for JustSystems, the largest ISV in Japan and a vendor of XML and information management technologies. Learn more about JustSystems, and contact Jake at

