External Data Representation And Marshalling

Sangeevan Siventhirarajah
Jun 21, 2020

External data representation is an agreed standard for the representation of data structures and primitive values. The information stored in a running program is represented as data structures, whereas the information in messages consists of sequences of bytes. Irrespective of the form of communication used, a data structure must be flattened into a sequence of bytes before transmission and rebuilt on arrival. The individual primitive data items transmitted in a message can be values of many different types, and not all computers store primitive values in the same order. The representation differs between architectures, for example in the byte order used for integers.

Two computers can exchange binary data values in one of two ways. In the first, the values are converted to an agreed external format before transmission and converted to the local form on receipt; if the two computers are known to be of the same type, the conversion to the external format can be omitted. In the second, the values are transmitted in the sender’s format, together with an indication of the format used, and the recipient converts the values if necessary.
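
To see why a conversion step is needed at all, the following minimal sketch encodes the same 32-bit integer in the two common byte orders using the standard java.nio.ByteBuffer class. The class name ByteOrderDemo and the sample value are chosen only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class ByteOrderDemo {
    // Encode a 32-bit integer using the given byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    // Decode four bytes, interpreting them in the given byte order.
    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        int value = 0x0A0B0C0D;
        byte[] big = encode(value, ByteOrder.BIG_ENDIAN);
        byte[] little = encode(value, ByteOrder.LITTLE_ENDIAN);
        System.out.println(Arrays.toString(big));    // [10, 11, 12, 13]
        System.out.println(Arrays.toString(little)); // [13, 12, 11, 10]
        // Reading big-endian bytes as if they were little-endian yields a different number.
        System.out.printf("%08X%n", decode(big, ByteOrder.LITTLE_ENDIAN)); // 0D0C0B0A
    }
}
```

The bytes on the wire are the same in both directions; only the interpretation changes, which is why sender and receiver must agree on it.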

Note, however, that the bytes themselves are never altered during transmission. To support RMI or RPC, any data type that can be passed as an argument or returned as a result must be able to be flattened, and the individual primitive data values must be represented in an agreed format. An agreed standard for the representation of data structures and primitive values is called an external data representation.

Marshalling is the process of taking a collection of data items and assembling them into a form suitable for transmission in a message. Unmarshalling is the process of disassembling them on arrival to produce an equivalent collection of data items at the destination. Marshalling therefore consists of translating structured data items and primitive values into an external data representation, and unmarshalling consists of generating primitive values from their external data representation and rebuilding the data structures.

Three alternative approaches to external data representation and marshalling are CORBA’s common data representation, Java’s object serialization and XML (Extensible Markup Language).

CORBA’s Common Data Representation

CORBA’s common data representation (CDR) is an external data representation that can represent all of the data types that can be used as arguments and return values in remote invocations in CORBA. These consist of primitive types and a range of composite types. Each argument or result in a remote invocation is represented by a sequence of bytes in the invocation or result message. CDR can be used by a variety of programming languages.

For primitive types, CDR defines a representation for both big-endian and little-endian ordering. Values are transmitted in the sender’s ordering, which is specified in each message. The recipient translates the values if it requires a different ordering.
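
The idea of sending values in the sender's ordering and flagging that ordering in the message can be sketched as follows. This is a simplified illustration, not the actual CORBA message layout: the one-byte flag at the start of the message (0 for big-endian, 1 for little-endian) and the class name TaggedOrderDemo are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class TaggedOrderDemo {
    // Build a message: an assumed one-byte order flag, then the value in the sender's order.
    static byte[] send(int value, ByteOrder senderOrder) {
        ByteBuffer buf = ByteBuffer.allocate(1 + Integer.BYTES);
        buf.put((byte) (senderOrder == ByteOrder.LITTLE_ENDIAN ? 1 : 0));
        buf.order(senderOrder).putInt(value);
        return buf.array();
    }

    // Read the flag first, then interpret the remaining bytes in the flagged order.
    static int receive(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        ByteOrder order = buf.get() == 1 ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN;
        return buf.order(order).getInt();
    }

    public static void main(String[] args) {
        // The receiver recovers the value whichever ordering the sender used.
        System.out.println(receive(send(1984, ByteOrder.BIG_ENDIAN)));    // 1984
        System.out.println(receive(send(1984, ByteOrder.LITTLE_ENDIAN))); // 1984
    }
}
```

With this scheme the sender never converts anything, and the receiver converts only when its own ordering differs from the flagged one.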

For constructed types, the primitive values that comprise each constructed type are added to a sequence of bytes in a particular order.

CORBA CDR for constructed types

The type of a data item is not given with its data representation in a CORBA CDR message. This is because it is assumed that the sender and the receiver have common knowledge of the order and types of the data items in a message. In particular, for RMI or RPC, each method invocation passes arguments of particular types, and the result is a value of a particular type.
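
A rough sketch of this style of marshalling is shown below for a hypothetical Person structure with two strings and an integer. It is a simplification written for illustration: each string is written as a 4-byte length followed by its characters and zero padding up to a 4-byte boundary, and details of real CDR (such as the terminating null that CDR strings carry) are left out. Note that the message contains no type information, so the receiver must already know the order and types of the fields.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CdrSketch {
    // Hypothetical structure used only for this example.
    static class Person {
        final String name;
        final String place;
        final int year;
        Person(String name, String place, int year) {
            this.name = name; this.place = place; this.year = year;
        }
    }

    // Write a string as: 4-byte length, characters, zero padding to a 4-byte boundary.
    static void putString(ByteBuffer buf, String s) {
        byte[] chars = s.getBytes(StandardCharsets.US_ASCII);
        buf.putInt(chars.length);
        buf.put(chars);
        while (buf.position() % 4 != 0) buf.put((byte) 0);
    }

    // Read a string written by putString, skipping the padding.
    static String getString(ByteBuffer buf) {
        byte[] chars = new byte[buf.getInt()];
        buf.get(chars);
        while (buf.position() % 4 != 0) buf.get();
        return new String(chars, StandardCharsets.US_ASCII);
    }

    // Flatten the fields in a fixed, agreed order: name, place, year.
    static byte[] marshal(Person p) {
        int capacity = (4 + p.name.length() + 3) + (4 + p.place.length() + 3) + 4;
        ByteBuffer buf = ByteBuffer.allocate(capacity).order(ByteOrder.BIG_ENDIAN);
        putString(buf, p.name);
        putString(buf, p.place);
        buf.putInt(p.year);
        return Arrays.copyOf(buf.array(), buf.position());
    }

    // Rebuild the structure; this works only because the receiver knows the field order and types.
    static Person unmarshal(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message).order(ByteOrder.BIG_ENDIAN);
        String name = getString(buf);
        String place = getString(buf);
        int year = buf.getInt();
        return new Person(name, place, year);
    }

    public static void main(String[] args) {
        byte[] message = marshal(new Person("Smith", "London", 1984));
        System.out.println(message.length); // 28
        Person copy = unmarshal(message);
        System.out.println(copy.name + " " + copy.place + " " + copy.year); // Smith London 1984
    }
}
```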

Marshalling operations in CORBA can be generated automatically from the specification of the types of the data items to be transmitted in a message. The types of the data structures and of the basic data items are described in CORBA IDL, which provides a notation for describing the types of the arguments and results of RMI methods. The CORBA interface compiler generates the appropriate marshalling and unmarshalling operations for the arguments and results of remote methods from the definitions of the types of their parameters and results.

Java’s Object Serialization

Java’s object serialization is concerned with the flattening and external data representation of any single object or tree of objects that may need to be transmitted in a message or stored on a disk. It is used only by Java.

In Java RMI, both objects and primitive data values may be passed as arguments and results of method invocations. An object is an instance of a class. Stating that a class implements the Serializable interface, which is provided in the java.io package, has the effect of allowing its instances to be serialized.

In Java, serialization refers to the activity of flattening an object or a connected set of objects into a serial form that is suitable for storing on disk or transmitting in a message. Deserialization consists of restoring the state of an object or a set of objects from their serialized form. It is assumed that the process that does the deserialization has no prior knowledge of the types of the objects in the serialized form. Therefore, some information about the class of each object is included in the serialized form. This information enables the recipient to load the appropriate class when an object is deserialized.
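
A minimal round trip through the standard ObjectOutputStream and ObjectInputStream classes might look like the sketch below. The Person class and the helper names serialize and deserialize are made up for the example; the last lines check that the class name really does travel inside the serialized bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

public class SerializationDemo {
    // Implementing Serializable is all that is needed to allow instances to be serialized.
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String place;
        final int year;
        Person(String name, String place, int year) {
            this.name = name; this.place = place; this.year = year;
        }
    }

    // Flatten an object (and everything it references) into a byte array.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuild the object; the class is located from the information in the stream.
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Person("Smith", "London", 1984));
        Person copy = (Person) deserialize(data);
        System.out.println(copy.name + " " + copy.place + " " + copy.year); // Smith London 1984
        // The serialized form carries the class name alongside the field values.
        String raw = new String(data, StandardCharsets.ISO_8859_1);
        System.out.println(raw.contains("SerializationDemo$Person")); // true
    }
}
```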

Serialization and deserialization in Java

Java objects can contain references to other objects. When an object is serialized, all the objects that it references are serialized together with it to ensure that when the object is reconstructed, all of its references can be fulfilled at the destination. References are serialized as handles. In this case, the handle is a reference to an object within the serialized form.
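
One visible consequence of handles is that an object referenced twice is written once and shared again after deserialization. The sketch below illustrates this with two hypothetical classes, Place and Trip, and a small roundTrip helper written for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SharedReferenceDemo {
    static class Place implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Place(String name) { this.name = name; }
    }

    // Holds two references, which may point at the same Place object.
    static class Trip implements Serializable {
        private static final long serialVersionUID = 1L;
        final Place from;
        final Place to;
        Trip(Place from, Place to) { this.from = from; this.to = to; }
    }

    // Serialize to memory and read the object graph straight back.
    static Trip roundTrip(Trip trip) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(trip);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Trip) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Place london = new Place("London");
        Trip copy = roundTrip(new Trip(london, london));
        // Both references still point at one shared object in the copy.
        System.out.println(copy.from == copy.to); // true
        // The copy is a new object graph, not the original objects.
        System.out.println(copy.from == london);  // false
    }
}
```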

Serialization and deserialization of the arguments and results of remote invocations are generally carried out automatically by the middleware, without any participation by the application programmer. If necessary, programmers with special requirements may write their own versions of the methods that read and write objects. Another way in which a programmer may modify the effects of serialization is by declaring variables that should not be serialized as transient.
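
The effect of the transient modifier can be seen in a short sketch. The Account class and its fields are hypothetical; the point is only that the transient field comes back with its default value (null) after a round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {
    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        final String user;
        // Marked transient, so it is left out of the serialized form.
        final transient String password;
        Account(String user, String password) {
            this.user = user; this.password = password;
        }
    }

    // Serialize to memory and read the object straight back.
    static Account roundTrip(Account account) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(account);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Account) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account copy = roundTrip(new Account("smith", "secret"));
        System.out.println(copy.user);     // smith
        System.out.println(copy.password); // null
    }
}
```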

Java supports reflection, the ability to inquire about the properties of a class. Reflection makes it possible to do serialization and deserialization in a completely generic manner. This means that there is no need to generate special marshalling functions for each type of object, as described for CORBA.
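
To give a feel for what "generic" means here, the sketch below uses reflection to collect the field names and values of an arbitrary object without knowing its class in advance. It is not how Java's serialization is implemented internally, only an illustration of the principle; the flatten helper and the Person class are invented for the example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

public class ReflectionDemo {
    // Hypothetical class; flatten() below has no compile-time knowledge of it.
    static class Person {
        String name = "Smith";
        String place = "London";
        int year = 1984;
        transient String note = "not part of the flattened form";
    }

    // Collect the instance fields of any object by name, skipping static and transient ones.
    static Map<String, Object> flatten(Object obj) {
        Map<String, Object> fields = new TreeMap<>(); // sorted by field name
        for (Field field : obj.getClass().getDeclaredFields()) {
            int modifiers = field.getModifiers();
            if (Modifier.isStatic(modifiers) || Modifier.isTransient(modifiers)
                    || field.isSynthetic()) {
                continue;
            }
            field.setAccessible(true);
            try {
                fields.put(field.getName(), field.get(obj));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(flatten(new Person())); // {name=Smith, place=London, year=1984}
    }
}
```

The same flatten method works for any class, which is the property that removes the need for per-type marshalling code.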

XML (Extensible Markup Language)

XML is a markup language for general use on the web. The term markup language refers to a textual encoding that represents both a text and details of its structure or appearance. XML was designed for writing structured documents for the web. It is now also used by clients and servers in web services to represent the data sent in the messages they exchange.

XML data items are tagged with markup strings. The tags are used to describe the logical structure of the data and to associate attribute-value pairs with logical structures.

XML tag
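
As a small illustration of tags and attributes, the sketch below parses a hypothetical person element with the DOM parser that ships with Java. The element names, the id attribute and the helper methods are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class XmlTagDemo {
    // Hypothetical document: tags describe the structure, id is an attribute-value pair.
    static final String PERSON_XML =
          "<person id=\"123456789\">"
        + "<name>Smith</name>"
        + "<place>London</place>"
        + "<year>1984</year>"
        + "</person>";

    // Parse the text and return the root element.
    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Return the text inside the first descendant element with the given tag name.
    static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        Element person = parseRoot(PERSON_XML);
        System.out.println(person.getTagName());         // person
        System.out.println(person.getAttribute("id"));   // 123456789
        System.out.println(childText(person, "name"));   // Smith
        // Values arrive as text; the application converts them to the type it expects.
        System.out.println(Integer.parseInt(childText(person, "year")) + 1); // 1985
    }
}
```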

XML is used to enable clients to communicate with web services and to define the interfaces and other properties of web services. XML is also used in many other ways, including in archiving and retrieval systems. Although an XML archive may be larger than a binary one, it has the advantage of being readable on any computer.

XML is extensible in the sense that users can define their own tags. If an XML document is intended to be used by more than one application, then the names of the tags must be agreed between them.

Some external data representations, such as CORBA CDR, don’t need to be self-describing, because it is assumed that the client and server exchanging a message have prior knowledge of the order and the types of the information it contains. XML was intended to be used by multiple applications for different purposes. The provision of tags, together with the use of namespaces to define the meaning of the tags, has made this possible. The use of tags enables each application to select just those parts of a document that it needs to process, so it will not be affected by the addition of information relevant to other applications.
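
The following sketch illustrates that last point with a namespace-aware DOM parser. The two namespace URIs and the element names are hypothetical. One application reads only the name element in its own namespace and gets the same answer whether or not a second application has added its own elements to the document, even one with the same local name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlNamespaceDemo {
    // Hypothetical namespace URIs, used only to tell the two vocabularies apart.
    static final String PERSON_NS = "http://example.org/person";
    static final String PAYROLL_NS = "http://example.org/payroll";

    static final String ORIGINAL =
          "<p:person xmlns:p=\"" + PERSON_NS + "\">"
        + "<p:name>Smith</p:name>"
        + "</p:person>";

    // Same document after another application added an element in its own namespace.
    static final String EXTENDED =
          "<p:person xmlns:p=\"" + PERSON_NS + "\" xmlns:pay=\"" + PAYROLL_NS + "\">"
        + "<pay:name>Monthly payroll</pay:name>"
        + "<p:name>Smith</p:name>"
        + "</p:person>";

    // Read only the name element that belongs to the person namespace.
    static String readPersonName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace-based lookups
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagNameNS(PERSON_NS, "name").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readPersonName(ORIGINAL)); // Smith
        System.out.println(readPersonName(EXTENDED)); // Smith
    }
}
```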

XML documents, being textual, can be read by humans. In practice, most XML documents are generated and read by XML processing software, but the ability to read XML can be useful when things go wrong. The use of text also makes XML independent of any particular platform. However, the use of a textual rather than a binary representation, together with the use of tags, makes messages large, so they require longer processing and transmission times, as well as more space to store. Messages encoded in CORBA CDR are therefore more efficient than the same messages in the SOAP XML format.

Marshalling and unmarshalling in the first two cases are intended to be carried out by a middleware layer without any involvement on the part of the application programmer. Even in the case of XML, which is textual and therefore more accessible to hand-encoding, software for marshalling and unmarshalling is available for all commonly used platforms and programming environments. Because marshalling requires the consideration of all the finest details of the representation of the primitive components of composite objects, the process is likely to be error-prone if carried out by hand. Another issue is compactness, which can be addressed in the design of automatically generated marshalling procedures.

In the first two approaches, the primitive data types are marshalled into a binary form. In XML, the primitive data types are represented textually. The textual representation of a data value will generally be longer than the equivalent binary representation.
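
A quick, informal way to check this is to count the bytes needed for one integer in each form. The sketch below compares a 4-byte binary encoding with the decimal text and with the same text wrapped in a hypothetical year tag; the helper names are invented for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SizeComparison {
    // Size of the value as a fixed-width binary integer.
    static int binarySize(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array().length;
    }

    // Size of the value written as decimal text.
    static int textSize(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.US_ASCII).length;
    }

    // Size of the value written as text inside an opening and closing tag.
    static int xmlSize(String tag, int value) {
        String element = "<" + tag + ">" + value + "</" + tag + ">";
        return element.getBytes(StandardCharsets.US_ASCII).length;
    }

    public static void main(String[] args) {
        int value = 1234567890;
        System.out.println(binarySize(value));      // 4
        System.out.println(textSize(value));        // 10
        System.out.println(xmlSize("year", value)); // 23
    }
}
```

Small numbers can be shorter as text than as a fixed-width integer, which is why the comparison holds in general rather than for every value.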

Another issue in the design of marshalling methods is whether the marshalled data should include information concerning the type of its contents. CORBA’s representation includes only the values of the objects transmitted and nothing about their types. Java’s object serialization and XML do include type information, but in different ways. Java puts all of the required type information into the serialized form, whereas XML documents may refer to externally defined sets of names called namespaces.

Although external data representation is used for the arguments and results of RMIs and RPCs, it also has a more general use for representing data structures, objects or structured documents in a form suitable for transmission in messages or storage in files.
