What Is Serialization?

In computer programming, serialization is the process of taking a data structure stored in local memory and turning it into a stream of bytes that can be transmitted over a network or stored on a disk, then reassembled and used by another program. Serialization can also be used to save the state of an object so it can be reloaded later by the same program. A more complex use is the remote procedure call (RPC), in which serialized data is sent over a network so that a procedure effectively runs on another computer. The same mechanism also allows data objects to be distributed across a large networked system.

Nearly every modern programming language has either native support for serialization or a library that adds it. When an object is serialized, all of its fields are flattened: the data is turned into a one-dimensional sequence of bytes that can be written to any output stream. This process is also known as deflating or marshalling. The type of output stream does not matter; it could be a file or a network socket.
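As a rough illustration of flattening, the sketch below packs the fields of a small object into a run of bytes and writes them to a generic binary stream. The Point class, the serialize_point function, and the byte layout are all assumptions made up for this example, not a standard format; Python's built-in struct module does the packing.

```python
import io
import struct


class Point:
    """Hypothetical object with two numeric fields and a short label."""

    def __init__(self, x, y, label):
        self.x = x
        self.y = y
        self.label = label


def serialize_point(point, stream):
    """Flatten the fields into bytes and write them to any binary stream."""
    label_bytes = point.label.encode("utf-8")
    # Layout (an assumption for this sketch): two little-endian doubles,
    # a 2-byte unsigned label length, then the label bytes themselves.
    stream.write(struct.pack("<ddH", point.x, point.y, len(label_bytes)))
    stream.write(label_bytes)


# BytesIO stands in for the output stream; an open binary file or a
# socket's file object would accept the same write() calls.
buffer = io.BytesIO()
serialize_point(Point(1.5, -2.0, "origin"), buffer)
flat = buffer.getvalue()
```

Because serialize_point only calls write(), the same function could target a file opened in binary mode without any change.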

Once the data has been serialized and sent to its destination, deserialization begins. The program that reads the byte stream restores all the information and places it in a new instance of the original class, creating an exact copy of the object's state. It is important to understand that only the data the object was holding is marshalled; the object's methods and other implementation details are not. This means the program that deserializes the data must be able to create an instance of the class that was originally serialized.
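The point that only data travels can be sketched as follows. The Sensor class and the serialize and deserialize helpers are hypothetical, and JSON encoded as UTF-8 is used here purely for readability in place of a binary format.

```python
import json


class Sensor:
    """Hypothetical class; the sender and the receiver both need it."""

    def __init__(self, name, reading):
        self.name = name
        self.reading = reading

    def is_hot(self):
        # Behavior lives in the class definition, not in the byte stream.
        return self.reading > 30.0


def serialize(sensor):
    # Only the field values are written out.
    fields = {"name": sensor.name, "reading": sensor.reading}
    return json.dumps(fields).encode("utf-8")


def deserialize(data):
    # The receiver builds a new instance from its own copy of the class.
    fields = json.loads(data.decode("utf-8"))
    return Sensor(fields["name"], fields["reading"])


original = Sensor("boiler", 42.5)
payload = serialize(original)
copy = deserialize(payload)
```

The payload holds the two field values but nothing about is_hot; the restored object can call that method only because the receiving code has the Sensor class available.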

Data structure serialization can be used for a variety of purposes. Object information can be stored on physical media so the exact state of every object can be restored to the point it was at when program execution halted. It can be used to send messages to another computer that cause a remote procedure to run. Serialization can even be used to compare state changes efficiently in real-time applications.
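The remote procedure use case can be sketched without a real network: one side serializes the procedure name and arguments, and the other side deserializes the message, runs the procedure, and serializes the result. The message shape, the PROCEDURES registry, and the function names below are assumptions for illustration, not any particular RPC protocol.

```python
import json


def add(a, b):
    return a + b


# Procedures the receiving side is willing to run (hypothetical registry).
PROCEDURES = {"add": add}


def encode_call(method, *args):
    """Caller side: serialize the procedure name and its arguments."""
    return json.dumps({"method": method, "args": list(args)}).encode("utf-8")


def handle_call(message):
    """Receiver side: deserialize the request, run it, serialize the result."""
    request = json.loads(message.decode("utf-8"))
    procedure = PROCEDURES[request["method"]]  # KeyError for unknown names
    outcome = procedure(*request["args"])
    return json.dumps({"result": outcome}).encode("utf-8")


request_bytes = encode_call("add", 2, 3)
# In a real system these bytes would cross the network in both directions.
response_bytes = handle_call(request_bytes)
result = json.loads(response_bytes.decode("utf-8"))["result"]
```

Looking the procedure up in an explicit registry, rather than resolving arbitrary names, keeps the receiver from running anything it did not intend to expose.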

Before using object serialization, it is important to understand some of the limitations it imposes. The most important is that converting an object into a byte stream exposes fields that are declared private. While the stream is in transit, this data can be captured and decoded, which creates a security hole. Most languages allow the serialization format to be externalized, so a proprietary encoding can be used to help mitigate this risk.
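A small sketch of the exposure problem follows. The Account class is hypothetical, and Python marks fields private only by the leading-underscore convention. The second function shows a different mitigation from the proprietary encoding mentioned above: simply leaving private fields out of the stream.

```python
import json


class Account:
    """Hypothetical class with a field that is private by convention."""

    def __init__(self, owner, pin):
        self.owner = owner
        self._pin = pin  # leading underscore: not part of the public API


def serialize(account):
    # vars() returns every instance field, including the private ones.
    return json.dumps(vars(account)).encode("utf-8")


def serialize_public(account):
    # One possible mitigation: write only fields meant to leave the process.
    public = {k: v for k, v in vars(account).items() if not k.startswith("_")}
    return json.dumps(public).encode("utf-8")


account = Account("alice", "4921")
leaky = serialize(account)          # the PIN is readable in these bytes
safer = serialize_public(account)   # the PIN never enters the stream
```

Anyone who captures the first stream can read the PIN directly, which is the risk the paragraph above describes.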

Another factor to bear in mind is that serialization will, in general, work only when the class that reads the data is exactly the same as the class that wrote it. If new fields or methods are added to the class, its signature changes. Attempting to restore the stored object will then cause an exception, and the data will remain unrecoverable until a program using the original, unmodified class attempts to restore it.
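The mismatch can be imitated with two fixed binary layouts. This is only an analogy for a class signature check: the layouts, field meanings, and the load_v2 function are assumptions invented for the example. Data written with the old layout raises an exception when read with the new one, yet remains readable with the original layout.

```python
import struct

# Hypothetical layouts: version 1 stored an id and a balance;
# version 2 of the class added a 4-byte "flags" field.
LAYOUT_V1 = "<Id"
LAYOUT_V2 = "<IdI"

stored = struct.pack(LAYOUT_V1, 7, 99.5)  # written by the old program


def load_v2(data):
    """Reader built against the newer layout."""
    return struct.unpack(LAYOUT_V2, data)


try:
    load_v2(stored)
    error = None
except struct.error as exc:
    # The 12-byte record does not fit the 16-byte version 2 layout.
    error = exc

# A reader that still uses the original layout recovers the data.
recovered = struct.unpack(LAYOUT_V1, stored)
```

Formats that need to evolve often store a version number at the start of each record so that a newer reader can pick the matching layout instead of failing.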