SOA Data Integration Tutorial

This tutorial shows you how to think about data integration inside a service-oriented architecture. It covers data services, data governance, XML, REST, data mashups, business intelligence (BI), XQuery, Service Data Objects (SDO) and ADO.NET.

In the early days of Web services, most of the attention centered on application integration and workflow. Standards like SOAP, WSDL, UDDI and BPEL were created and the enterprise service bus (ESB) category emerged as a new technology capable of performing these types of integrations.

Yet data got lost in the shuffle. Now architects are finding themselves dealing with performance issues caused by an inability to access data and to move it around in the same agile way they are handling their application logic.

   Data Services
   Enterprise data management
   Data Standards
   More Learning Guides

 Data Services

Fundamental to SOA-style data integration is the concept of data services. This involves representing data, often with XML, and then creating a way for the application/Web service to access the represented data in an on-demand fashion.

Data integration basics
If you want to avoid data integration headaches, you have to be consistent, meaning that every developer does things the same way rather than defining their own set of data representations. That requires a process that is:

A) Very easy for developers to use

B) Very quick to give developers consistent representations

A simple example: if developer A and developer B both need to share information about user accounts, and one uses a variable-length text field while the other uses a fixed-length numeric field, you have created needless inconsistency through the lack of a clearly defined process.

Taking this one step further, if your process has future flexibility baked in, making it possible for some future program to process today's N-digit account codes as well as tomorrow's N+M-digit account codes, you can carry that flexibility along into the specific implementation.
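One way to bake that flexibility in is to centralize the representation in a single shared definition that every developer calls. The sketch below is illustrative only: the 8-digit "today" codes and the 12-digit ceiling are invented assumptions, not part of any real standard.

```python
import re

# A single, shared definition of the "account code" representation.
# The lengths here are hypothetical: today's codes are 8 digits, but the
# rule already tolerates longer codes (up to 12 digits), so a future
# N+M-digit code passes the same validation without code changes.
ACCOUNT_CODE = re.compile(r"^\d{8,12}$")

def is_valid_account_code(code: str) -> bool:
    """Every developer calls this one check instead of inventing a field type."""
    return ACCOUNT_CODE.fullmatch(code) is not None

print(is_valid_account_code("12345678"))      # today's 8-digit code: True
print(is_valid_account_code("123456789012"))  # a future 12-digit code: True
print(is_valid_account_code("ABC123"))        # not a numeric code: False
```

Because both developers validate against the one shared pattern, the variable-length-text versus fixed-length-numeric inconsistency described above simply cannot arise.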

In fact, there is an entire data integration lifecycle of which users need to be aware:

  • Access
  • Discovery
  • Cleansing
  • Integration
  • Delivery
  • Development
  • Auditing

Given its explicit, formal, and highly readable structure and syntax, XML has become the technology of choice for representing data that must pass from a producer of information to a corresponding consumer. XML is now the foundation for data interchange of all kinds, from accounting and authentication to zoology and taxonomies.
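The producer/consumer handoff can be sketched in a few lines. The account record below is invented for illustration; the point is that the consumer depends only on the agreed-upon XML representation, never on the producer's internal storage.

```python
import xml.etree.ElementTree as ET

# A hypothetical account record as it might pass from producer to consumer.
payload = """
<account>
  <id>10042</id>
  <owner>J. Smith</owner>
  <balance currency="USD">250.75</balance>
</account>
"""

# The consumer parses the shared representation, not the producer's database.
root = ET.fromstring(payload)
print(root.findtext("owner"))                # J. Smith
print(root.find("balance").get("currency"))  # USD
```

The same payload could have come from a relational row, a mainframe file or a packaged application; the consumer's code is unchanged.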

SOA changes the requirements
For too many years, data integration initiatives, undertaken without the foundation of a data services layer, have resulted in a further proliferation of the siloed systems they were meant to integrate. For instance, a retailer might have deployed an extraction, transformation, and loading (ETL) tool to synchronize point-of-sale data from retail outlets into an SAP financials application. A second instance of the tool might serve to move SAP financials information into a DB2 data warehouse for analysis. And a third instance might work on the front end of the value chain to feed product procurement data to an operational data store. SOA is intended to remove barriers of siloed development, but you have to proceed with general objectives in mind in order to gain such benefits.

So say Ivan Chong and Ashutosh Kalkarni of IBM: In a modular SOA, a data integration platform serves as another component-based service. Its functionality can be packaged and reused across multiple projects to reduce development and deployment costs. It can help your organization leverage data assets currently locked in mainframe, packaged, and homegrown systems through open standards.

If you look for pre-built connectivity and visual mapping environments, you will find they can provide IT architects and developers with a mechanism to tap into information from a variety of sources, including packaged and homegrown applications such as SAP, mainframe and midrange systems such as IMS and VSAM, relational databases such as Oracle and Sybase, and unstructured and semi-structured data.

A common data model
The best among experts hold that it is essential to pursue a common data model in order to succeed with services. According to Bradley F. Shimmin, principal analyst at Current Analysis LLC, successful SOA data integration requires a clearly conceived level of abstraction for creating reusable services from the disparate points of data an application requires. He points out that there has been a gap between the data services layer and the transport layer. For more, see the analysis of uniform methods of representing and handling data and of the data services platform (DSP) concept.

Shimmin and others advise you to look to useful existing data models for guidance. Examples of the common data model come from the telecommunications industry, which has a standard: the Shared Information/Data (SID) specification, writes Philip Howard, director of research - technology at Bloor Research. SID is an industry-specific version of the common data model.

For further clues to understanding this trend, read Howard's analysis, "The Importance of a Common Data Model." It looks at "a data model that spans an enterprise's applications and data sources. In other words it defines all the data relationships and meanings that exist within an organization."

Vendors have been building data services platforms for a few years now. They are even plugging those data services capabilities into their ESBs, a good thing according to Forrester Research Inc.'s Larry Fulton.

Yet, helpful as all this sounds, Burton Group Inc. Research Director Anne Thomas Manes reminds architects that DSPs are not a silver bullet and that SOA developers still have to put in real work to make data services succeed.

Handy as the DSP may be, it does not automate the creation of data services; nor should a DSP be thought of as a replacement for a data warehouse or traditional data integration system. Realize that it is a complement to them.

Return to Table of Contents

 Enterprise data management

A new way of handling data creates new concerns and opportunities at the enterprise level. First, something needs to be done to ensure that the data services platform doesn't give rise to more problems than it solves. Done properly, an SOA initiative can become a springboard for a new wave of targeted, real-time BI initiatives. It can also allow developers to create data mashups: composite applications that give knowledge workers new views of formerly disjointed data sets.

Data governance
In addition to using the right tools and applying the right business rules, it's also important for organizations to recognize that XML-based descriptions work only as well as they are designed. This means that talented, capable data architects and data governance professionals must be involved in creating the right kind of representations, as well as the processes whereby data is captured, validated, maintained and distributed. You must define a formal data service approach to handling data within the SOA framework, and that often means outlining a data governance policy.

Data governance seeks to define the policies needed to make sure enterprise data is in a consistent and usable form when it is consumed by SOA applications.

You do well to look at SOA as a key enabler of data governance in large multi-system enterprises with multiple databases. These efforts are best considered in the context of service-level agreements established within the organization.

Data services mashups
As Web 2.0 data services mashups move into the enterprise, major vendors including IBM, Progress Software Corp. and BEA, which is now part of Oracle Corp., are offering mashup products for IT professionals, rather than consumers or business users. They join smaller vendors such as WSO2 Inc. and XAware Inc. in this space.

If you look at popular mashup servers like those from Yahoo! and Google, you discover that they fall into the consumer category and offer a palette of pre-built components for consumer applications. In the enterprise, the issues are much more complex, involving access to a wide range of sources from data warehouses to departmental spreadsheets. Architects must sort through those data sources and apply policies before they can be added to a palette that a business user can access through a portal.

What is a data service mashup? It is a means of pulling data from multiple sources into a logical unit. When you have multiple Web browsers open simultaneously, you get a feel for the mashup mixture. With a Web services-enabled SOA, you can pull data from virtually anywhere into a logical unit. One description calls it "a bunch of backend data systems, which are typically very complex," combined in terms of an XML Schema. But you must take the time to design such a schema.
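Stripped to its essence, a mashup joins records from disjoint sources into one logical unit. The sketch below assumes two hypothetical sources, a warehouse feed and a departmental spreadsheet, with invented field names and values.

```python
# Two disjoint sources a mashup might pull together; the SKU, field names
# and values are invented for illustration.
warehouse = {"sku-1": {"name": "Widget", "units_sold": 1200}}
spreadsheet = {"sku-1": {"forecast": 1500, "owner": "east-region"}}

def mashup(sku: str) -> dict:
    """Join both sources into one logical record keyed by SKU."""
    record = {"sku": sku}
    record.update(warehouse.get(sku, {}))
    record.update(spreadsheet.get(sku, {}))
    return record

print(mashup("sku-1"))
# {'sku': 'sku-1', 'name': 'Widget', 'units_sold': 1200,
#  'forecast': 1500, 'owner': 'east-region'}
```

A real enterprise mashup would add the governance steps described above (access policies, cleansing, a schema for the combined record), but the shape of the result is the same: one unit, many origins.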

Return to Table of Contents

 Data Standards

XML is a well-known data standard at this juncture, but data services make use of other standards and technologies such as REST, XQuery, SDO, ADO.NET and JavaScript Object Notation (JSON).

There is already a Representational State Transfer (REST) tutorial for those interested in learning more about that subject. Yet architects and developers should be aware that REST holds out great promise in the arena of data services.

Because the REST style, unlike object-oriented programming styles, is all about naming things with Uniform Resource Identifiers (URIs) so they can be retrieved, it is uniquely suited to creating data services applications.
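The "name it, then retrieve it" idea can be shown without any HTTP machinery at all. This toy in-memory service is a sketch only; the URI paths and account records are invented, and a real service would sit behind an HTTP server.

```python
# A toy data service: each resource is named by a URI-style path and
# retrieved through one uniform operation. Paths and records are invented.
resources = {
    "/accounts/10042": {"owner": "J. Smith", "balance": 250.75},
    "/accounts/10043": {"owner": "A. Jones", "balance": 99.10},
}

def get(uri: str) -> dict:
    """The whole interface is retrieval by name -- the essence of REST."""
    if uri not in resources:
        raise KeyError(f"404: {uri}")
    return resources[uri]

print(get("/accounts/10042")["owner"])  # J. Smith
```

Contrast this with an object-oriented remote interface full of bespoke method names: here a consumer needs to know only the naming scheme and the single uniform operation.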

Approved alongside its companion standard, XSL Transformations (XSLT) 2.0, XQuery offers the potential to simplify data handling in Web services applications and speed development, according to members of the W3C who worked on it.

What SQL has been for the relational database world of the client-server era, XQuery can be for the world of Web services and service-oriented architecture, said Liam Quin, XML Activity lead at W3C. He pointed out that XQuery is from the lineage of SQL.

The basis for XQuery document processing is the FLWOR expression, which stands for:

  • For, to drive the handling of item or node sequences within an input document or stream.
  • Let, to declare and initialize variables you'll use during processing.
  • Where (optional), to specify conditions for selecting items or nodes on which to operate.
  • Order by (optional), to sort the selected nodes or items into some type of order.
  • Return, to return a specified value (which can be the result of computation, comparison or other transformations or selections) for each node or item selected from the input.

This type of structure permits detailed, explicit and specific processing controls. Any complete FLWOR expression must include at least one For or Let clause and a Return clause; details will vary according to the type of document interaction and outputs required. The core syntax comes from the XQuery specification itself; individual XQuery engines add their own extension functions on top of it.
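Put together, the clauses read almost like a sentence. The query below is an illustrative sketch only: the document name (orders.xml) and its order, quantity and price elements are assumptions invented for this example, not part of any standard.

```xquery
(: Find large orders, biggest first; every FLWOR clause appears in order. :)
for $order in doc("orders.xml")//order
let $total := $order/quantity * $order/price
where $total > 100
order by $total descending
return <big-order id="{$order/@id}">{$total}</big-order>
```

Each clause maps directly to the list above: For iterates the order nodes, Let computes a per-order total, Where filters, Order by sorts, and Return constructs one result element per surviving node.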

Service Data Objects (SDO)
At a technical level, SDO defines its building blocks as data graphs: containers that hold data tree structures, each with its own data types, metadata, parent-child relationships, cardinality relationships, default values and any other property commonly related to data structures. The important thing to realize about these data graphs is that they can be constructed from any type of data store -- relational, XML or any proprietary format -- the main benefit being that data can be inspected and modified through a uniform approach irrespective of its origin.

Complementing the use of data graphs in SDO are data mediator services, which are charged with constructing data graphs from the many data repository variations and access technologies supporting SDO.

You can find examples of how to use SDO in this tip from columnist Daniel Rubio.

Perhaps one of the most important sets of classes in the .NET Framework is ADO.NET. Developers ready to work with that architecture will find that its updates are well in tune with service-oriented data thinking. With ADO.NET as it is now configured, developers can work with disconnected data more easily than in earlier versions. It also allows the data to maintain relationships and referential integrity.

So it would seem only natural that a .NET Web service that is working with a database would want to take advantage of the disconnected data features of ADO.NET, and specifically the DataSet class. Many .NET Web service demos have done just that, creating a Web service that returns a DataSet to be consumed by a UI of some sort, also written in .NET, which can easily bind the data to some controls.

Read this tip for ADO.NET code examples.

You can employ JSON as a simple data format, but one that is a more natural fit for browser data consumption. The reason is that JSON is a subset of JavaScript, the de facto programming language of all browsers. By structuring a data payload as a JSON response, you effectively bypass the need to parse an XML document in the browser -- typically done via JavaScript, of course -- to get to the actual data.

In this sense, JSON uses a stripped down syntax compliant with the native JavaScript interpreter provided on all browsers. Your application's access and navigation to JSON data is done through the same standard JavaScript notation used to access string, array or hashtable values in a typical JavaScript application.
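The same holds on the server side. The sketch below uses Python's standard json module with an invented account payload: once parsed, fields are reached with ordinary dictionary access, with no DOM traversal in sight.

```python
import json

# A hypothetical account record, this time as a JSON payload.
payload = '{"account": {"id": 10042, "owner": "J. Smith", "balance": 250.75}}'

data = json.loads(payload)

# Fields are reached with plain dict access, mirroring how a browser reads
# JSON values through native JavaScript notation.
print(data["account"]["owner"])    # J. Smith
print(data["account"]["balance"])  # 250.75
```

Compare this with the XML example earlier in the tutorial: the information is identical, but the consuming code needs no parser API beyond the language's native data structures.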

Read this tip for code examples of how to use JSON.

Return to Table of Contents

 More Learning Guides
