
Expert advice on transmitting data: A pros and cons comparison of data transfer formats

Recent discussions on Web service programming forums suggest that a review of the various formats for transporting data around networks is useful for developers new to the field. Read this article to learn the pros and cons of data transmission formats, and stay tuned to learn more about the various transport mechanisms in an upcoming article.

Recent discussions on Web service programming forums suggested to me that a review of the various formats for transporting data around networks would be useful for developers new to the field. In this article, I am only going to talk about formats used for language-independent, machine-to-machine transmission; the various transport mechanisms will be a topic for another article.

First, let's take a look at text-based data transfer formats.

XML is a flexible text format for data representation which has solved many problems for developers, while creating some new ones. The standard for XML document syntax and the many related standards are maintained by the W3C working groups, while domain-specific formats are maintained by a variety of organizations.

The development and standardization of JavaScript has made the Web browser a powerful tool for dynamic presentation of data by manipulating the appearance and content of HTML elements. In recent years, it has become possible to assemble the JavaScript component of a webpage from multiple sources which can be updated repeatedly with data objects, hence JavaScript Object Notation, or JSON. JavaScript recognizes the usual set of variable types: strings, numbers, arrays and simple objects. The data structures that JSON excels at representing are collections of name/value pairs and ordered lists of data values. Since JavaScript is transmitted as plain text, JSON can be read by other languages, so the uses extend far beyond the Web browser. Thus, JSON is strong competition for data transmission in many areas. Recognizing this, RESTful Web service frameworks such as Jersey and Restlet put a lot of effort into supporting JSON.

It is rather easy to represent some sorts of data as lines of plain text in which one line corresponds to a single data item. Spreadsheet rows can be expressed this way using "comma separated values" or CSV. Another common approach is a list of "properties" where each line contains a name/value pair.

So much for formats based on text; next, let's look at some binary formats.

The Common Object Request Broker Architecture, or CORBA, was the first serious effort to provide for communication of complex data objects between completely different systems. Much of CORBA is concerned with aspects of communication that we are not talking about here. The CORBA standard, now at version 3.1 (2008), is maintained by the Object Management Group.

Back in the days when dinosaurs roamed the earth and 300 baud modems were as good as you could get for a remote system, programmers put a lot of ingenuity into packing maximum information into the minimum number of bits. If we only needed integers between 0 and 31, we used only 5 bits, which could share a byte with 3 true-or-false bits. I suspect that only programmers of deep space probes do much packed binary by hand these days.

Google has made the idea of packed binary more practical for real applications with the introduction of Protocol Buffers. This toolkit evolved as a replacement for hand-coded packed binary for exchanging requests and responses with Google index servers. The tools were released to open source distribution just 2 years ago. The intent of this API is to provide a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more. This API produces an encoding of typical data values almost as compact as a hand-optimized packed binary. The project documentation serves as an introduction to the concept and is suitable for modern programmers. Programmers must use a "proto" syntax to specify the data types to be transmitted, and the toolkit takes care of generating the support code for packing and unpacking.

Text-based data transfer formats:


XML

PROS:
  • Readable and editable by developers
  • Error checking by means of Schema and DTDs
  • Can represent complex hierarchies of data
  • Unicode gives flexibility for international operation
  • Plenty of tools in all computer languages for both creation and parsing

CONS:
  • Bulky text with low payload/formatting ratio (but can be compressed)
  • Both creation and client side parsing are CPU intensive
  • Some common word processing characters are illegal (MS Word "smart" punctuation, for example)
  • Images and other binary data require extra encoding
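To make the XML trade-offs a little more concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The order record, its element names and the helper functions are all hypothetical, invented only for illustration. It builds a small hierarchy, parses it back, and compares the size of the actual data values with the size of the document that carries them.

```python
import xml.etree.ElementTree as ET

# Hypothetical order record, used only for illustration.
order = {"id": "1001", "customer": "Zoë", "items": [("widget", 2), ("gadget", 1)]}

def order_to_xml(rec):
    """Serialize the record as a small XML hierarchy (returns a str)."""
    root = ET.Element("order", id=rec["id"])
    ET.SubElement(root, "customer").text = rec["customer"]
    items = ET.SubElement(root, "items")
    for name, qty in rec["items"]:
        ET.SubElement(items, "item", qty=str(qty)).text = name
    return ET.tostring(root, encoding="unicode")

def xml_to_order(text):
    """Parse the XML text back into the same dictionary shape."""
    root = ET.fromstring(text)
    return {
        "id": root.get("id"),
        "customer": root.findtext("customer"),
        "items": [(el.text, int(el.get("qty"))) for el in root.find("items")],
    }

xml_text = order_to_xml(order)

# Rough payload/formatting comparison: characters of real data vs. whole document.
payload_chars = len(order["id"]) + len(order["customer"]) + sum(
    len(name) + len(str(qty)) for name, qty in order["items"]
)
print(xml_text)
print(f"payload {payload_chars} of {len(xml_text)} characters")
```

The text is easy to read and edit by hand, and the non-ASCII customer name survives the round trip, but most of the characters on the wire are tags rather than data.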

JSON

PROS:
  • Readable and editable by developers
  • Plenty of JavaScript developers
  • Highly developed browser toolkits such as Dojo and jQuery

CONS:
  • Bulky text with low payload/formatting ratio, but not as bad as XML
  • Client CPU time required to parse
  • Not as flexible as XML for some data structures and binary data
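As a rough illustration of JSON's name/value pairs and ordered lists being read outside the browser, here is a minimal sketch using Python's standard `json` module. The record and its field names are hypothetical. The second half shows one common workaround for the binary data limitation, Base64 text encoding; that is my choice for the example, not something JSON itself prescribes.

```python
import base64
import json

# Hypothetical record: name/value pairs plus an ordered list of items.
record = {
    "id": 1001,
    "customer": "Zoë",
    "items": [{"name": "widget", "qty": 2}, {"name": "gadget", "qty": 1}],
    "paid": True,
}

json_text = json.dumps(record, ensure_ascii=False)  # plain, readable text
decoded = json.loads(json_text)                     # back to dicts and lists

# JSON has no binary type, so raw bytes need extra encoding before they fit.
thumbnail = bytes([0, 255, 16, 32])
wrapped = json.dumps({"thumbnail": base64.b64encode(thumbnail).decode("ascii")})
restored = base64.b64decode(json.loads(wrapped)["thumbnail"])

print(json_text)
print(wrapped)
```

Strings, numbers, booleans, lists and simple objects round-trip directly; anything else, such as the raw bytes here, has to be mapped onto those types by agreement between both ends.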
Plain Text

PROS:
  • Readable and editable by developers
  • Fairly compact representation for simple types

CONS:
  • Possible confusion introduced by punctuation in values
  • Limited to very simple structures

Note on text-based formats:
All of the text-based formats share the virtue of being readable and editable by developers. This means that you can create and test both ends of a data exchange with fake data. As discussed in my article on testing Web services, this makes a tremendous difference during development.
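The "punctuation in values" problem listed for plain text is easy to reproduce with CSV. The sketch below uses made-up rows; it first joins and splits on commas by hand, then uses Python's standard `csv` module, which quotes troublesome values so they survive the trip.

```python
import csv
import io

# Made-up rows; the name contains a comma and the note contains quotes.
rows = [["name", "city", "note"], ["Smith, John", "Austin", 'said "hi"']]

# Naive approach: join with commas, split on commas.
naive_line = ",".join(rows[1])
naive_fields = naive_line.split(",")  # the embedded comma splits the name in two

# The csv module quotes values that contain delimiters or quote characters.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
parsed = list(csv.reader(io.StringIO(csv_text, newline="")))

print(naive_fields)
print(csv_text)
```

Both ends have to agree on the quoting rules, which is exactly the kind of confusion a hand-rolled split invites.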
Binary formats


CORBA

PROS:
  • Language and operating system independence
  • Compact data representation
  • Built in mapping in Java covers almost all features
  • Open-source versions are available

CONS:
  • The complete standard is quite complex
  • Interfacing to non-object-oriented languages not easy
  • Incomplete implementation on many systems
Packed Binary

PROS:
  • Very compact, approaching theoretical maximum

CONS:
  • Computation intensive
  • Fragile, dropped or damaged data bits are hard to detect and correct
  • Modern programmers not familiar with the idea
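For readers who have never packed bits by hand, here is a minimal sketch of the idea: a 5-bit integer (0 through 31) sharing one byte with three true/false flags. The layout and the function names are my own assumptions for illustration, not any standard.

```python
def pack_byte(value, flag_a, flag_b, flag_c):
    """Pack a 5-bit integer and three flags into a single byte (0..255)."""
    if not 0 <= value <= 31:
        raise ValueError("value must fit in 5 bits (0..31)")
    # Assumed layout: bits 7-3 hold the value, bits 2-0 hold the flags.
    return (value << 3) | (int(flag_a) << 2) | (int(flag_b) << 1) | int(flag_c)

def unpack_byte(byte):
    """Recover the integer and the three flags from a packed byte."""
    return (byte >> 3) & 0x1F, bool(byte & 4), bool(byte & 2), bool(byte & 1)

packed = pack_byte(19, True, False, True)
print(packed, unpack_byte(packed))

# Fragility: flipping a single bit in transit yields another valid-looking value.
damaged = packed ^ 0x08
print(damaged, unpack_byte(damaged))
```

Every bit carries data, which is why the format is so compact, and also why a damaged byte decodes without complaint unless a checksum or similar is added on top.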
Google Protocol Buffers

PROS:
  • Very compact representation, approaching theoretical maximum
  • Tools for many languages
  • Not sensitive to version changes
  • Open source license

CONS:
  • Not readable or editable by developers
  • Yet another data definition syntax (proto) to learn
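One reason the encoding is so compact is that the Protocol Buffers wire format stores integers as base-128 "varints", where small numbers take fewer bytes. The sketch below is a hand-written illustration of that one idea in Python, not the toolkit's generated code or API; the function names are hypothetical, and it handles non-negative integers only.

```python
def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint (low 7 bits first)."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)

def decode_varint(data):
    """Decode one varint from the start of data; returns (value, bytes_used)."""
    result = 0
    shift = 0
    for i, b in enumerate(data):
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, i + 1
        shift += 7
    raise ValueError("truncated varint")

for number in (1, 300, 1_000_000):
    encoded = encode_varint(number)
    print(number, encoded.hex(), len(encoded), "bytes")
```

In real use you would never write this yourself; the toolkit generates the packing and unpacking code from the "proto" definition.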
