Data services mashups would be easy if all enterprise information came in well-formatted XML, but of course that is not reality.
Kirstan Vandersluis, founder and chief scientist for XAware Inc., is discussing how to deal with that reality today at the MySQL Conference & Expo in Santa Clara. In a talk titled "Crazy Data Formats and Multiple Data Sources? Taming Your Messy Mashups," he will demonstrate how to create data services from multiple data sources. He will be mashing up data from MySQL, a format he terms "not too crazy." On the crazier side, he will use data from a COBOL copybook, which he describes as "somewhat crazy."
While the COBOL copybook is essentially a text file dumped from a mainframe, it is representative of the non-XML files developers, data architects and database administrators must deal with to create data services that conform to an XML schema.
"We see a lot of different types of text files," Vandersluis said in an interview previewing today's talk. "People deal with messages where the data is basically just in a Microsoft Word document. We support Web services, and also HTTP for REST services or POX services, FTP, e-mail. We can also hook into Java classes, EJBs."
Typically, he said, users of the tools have applications that need to exchange XML data in conformance with an XML schema. For example, companies in the insurance industry are required to conform to the ACORD standard.
"The technology problem is exposing their data conforming to that XML schema," Vandersluis said.
Most insurance companies, for example, have data stored in a variety of formats, not all of which are XML. Mashing up all that information into data services that conform to a specific XML schema is a challenge, he said.
Describing the problems with the data, Vandersluis said, "To summarize, it's ugly. It's in a lot of different places. It's not very clean. It's duplicated where it shouldn't be duplicated."
The trick is to take data from multiple sources and then expose it according to a required XML Schema, he said.
"That's really what we mean by data mashup," he explained. "You are taking data sets from multiple locations, combining it into a logical form, which for us means conforming to an XML schema, and then exposing that as a service."
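The idea Vandersluis describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not XAware's implementation: an in-memory SQLite database stands in for MySQL, a CSV string stands in for a second data source, and the `policy` and `driver` element names are hypothetical rather than taken from ACORD.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical second source: a CSV export listing drivers per policy.
DRIVERS_CSV = "policy_id,name\nP-100,Ann Lee\nP-100,Bob Lee\nP-200,Cy Ng\n"

def build_policy_xml(conn, drivers_csv, policy_id):
    """Combine one database row and matching CSV rows into a single XML document."""
    row = conn.execute(
        "SELECT id, premium FROM policy WHERE id = ?", (policy_id,)
    ).fetchone()
    policy = ET.Element("policy", id=row[0], premium=row[1])
    for rec in csv.DictReader(io.StringIO(drivers_csv)):
        if rec["policy_id"] == policy_id:
            ET.SubElement(policy, "driver", name=rec["name"])
    return ET.tostring(policy, encoding="unicode")

# SQLite is used here only as a stand-in for a MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policy (id TEXT, premium TEXT)")
conn.execute("INSERT INTO policy VALUES ('P-100', '450.00')")

mashup_xml = build_policy_xml(conn, DRIVERS_CSV, "P-100")
print(mashup_xml)
```

In a real service, the resulting document would also be validated against the required XML schema before being exposed.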
At the MySQL conference, he will demonstrate how this can be done using the XAware data integration environment, which he designed for building composite data services. The design environment is Eclipse-based. Its engine is a plug-in for a J2EE app server or servlet engine, with a Java API for linking to applications.
To bring data from different sources together for a data mashup, XAware provides adapters, including ones for copybooks, text files, MQ, JMS, Excel, FTP and LDAP. The adapters convert the data into standard XML, automatically applying the tags.
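As a rough illustration of what such an adapter does for a copybook-style text file, the Python sketch below slices a fixed-width record and wraps each field in a tag. The field layout, tag names and sample record are all hypothetical; a real adapter would derive the layout from the copybook definition, and this is not XAware's code.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed-width layout: (field name, start offset, length).
# A real layout would come from the COBOL copybook itself.
LAYOUT = [
    ("policy_id", 0, 8),
    ("holder", 8, 20),
    ("premium", 28, 7),
]

def record_to_xml(line, layout=LAYOUT, root_tag="policy"):
    """Slice one fixed-width record and wrap each field in an XML tag."""
    root = ET.Element(root_tag)
    for name, start, length in layout:
        child = ET.SubElement(root, name)
        child.text = line[start:start + length].strip()
    return root

# Sample record built to match the hypothetical layout above.
record = "P0001234" + "Jane Doe".ljust(20) + "0045000"
xml_text = ET.tostring(record_to_xml(record), encoding="unicode")
print(xml_text)
```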
Getting disparate data into an XML format as complex as the insurance industry's ACORD structure is done through a process called "shredding," where information is put into multiple database tables, Vandersluis said.
"If you are talking about an auto policy," he explained, "there's going to be information about the policy itself. What's the term? What's the amount you're paying every six months? Who are the drivers? What vehicles are covered? So for shredding an XML structure like that, very likely each of those major subcomponents of the policy are going into different tables. So the shredding of that structure would be taking the driver information and storing that in the driver table. Taking the policy information and storing that in the policy table. Taking the vehicle information and storing that in the vehicle table."
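The auto policy example can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the XML element and attribute names are invented for the example and are not ACORD, the table definitions are hypothetical, and an in-memory SQLite database stands in for MySQL.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical policy document; element and attribute names are illustrative only.
POLICY_XML = """
<policy id="P-100" term_months="6" premium="450.00">
  <driver name="Ann Lee" license="D123"/>
  <driver name="Bob Lee" license="D456"/>
  <vehicle vin="1HGCM82633A004352" make="Honda"/>
</policy>
"""

def shred(xml_text, conn):
    """Store each major subcomponent of the policy in its own table."""
    root = ET.fromstring(xml_text)
    pid = root.get("id")
    conn.execute(
        "INSERT INTO policy VALUES (?, ?, ?)",
        (pid, int(root.get("term_months")), float(root.get("premium"))),
    )
    for d in root.findall("driver"):
        conn.execute(
            "INSERT INTO driver VALUES (?, ?, ?)",
            (pid, d.get("name"), d.get("license")),
        )
    for v in root.findall("vehicle"):
        conn.execute(
            "INSERT INTO vehicle VALUES (?, ?, ?)",
            (pid, v.get("vin"), v.get("make")),
        )
    conn.commit()

# SQLite is used here only as a stand-in for MySQL tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE policy (id TEXT PRIMARY KEY, term_months INTEGER, premium REAL);
CREATE TABLE driver (policy_id TEXT, name TEXT, license TEXT);
CREATE TABLE vehicle (policy_id TEXT, vin TEXT, make TEXT);
""")
shred(POLICY_XML, conn)
driver_count = conn.execute("SELECT COUNT(*) FROM driver").fetchone()[0]
print(driver_count)
```

The policy identifier is copied into the driver and vehicle rows so the pieces can be joined back together when the policy is reassembled as XML.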
He said IBM, Oracle and Microsoft have advanced XML capabilities in their information products, but in MySQL those capabilities are "pretty light." So XAware is working to provide advanced XML capabilities to MySQL, so that the open source database, recently acquired by Sun Microsystems Inc., will be equivalent to what the major commercial vendors provide for creating data mashups.
"What we're able to do is bring in an XML schema and use our design environment to map to either the different tables in MySQL or potentially across different data sources," Vandersluis explained. "In a lot of cases, MySQL might provide some portion of the data sets, but very often there are other data sets within the organization that have to participate in that same service."
This past November, when XAware went open source, David Linthicum, managing partner of ZapThink LLC, praised the XAware technology for its ability to take data from disparate formats and present it as if it were coming from a single database.
Linthicum said SOA developers looking for a way to integrate data coming from a variety of legacy sources will find XAware tools helpful. He described XAware as "a heterogeneous data abstraction environment that adds a tremendous amount of value to SOA because it puts schema volatility into a single configurable domain."
The tool gives the developer a graphical view of the XML format, Vandersluis said.
"Then as a designer you can focus in on one section of the XML at a time and then map that to a backend data source," he explained.