Tutorial: Object Serialization
This article explains how you can save and restore complicated object hierarchies. Using these techniques, you can manipulate data structures of extreme complexity with ease.

This article originally appeared in the September 1998 issue of Microsoft Office & Visual Basic for Applications Developer (MOD) magazine. If you like the article, visit the magazine's Web site to see if you would like to subscribe.

 

Object Serialization

By Rod Stephens

Sections
Serialization
VBA Code
Conclusion
Downloads

Many programs save objects and rebuild them later. For example, an inventory system might use objects to represent the items in inventory, and some of those objects might be made up of other objects. The program can save the information it needs to rebuild the inventory objects in a file or database. When it starts, the program reads the data and recreates the objects.

One approach to saving and loading objects is to write their data values into a file in some predefined order. Suppose an item in inventory can consist of several parts, each of which is also an item. The following code shows how the InventoryItem class might define variables representing the item's name and its parts.

    Public ItemName As String
    Public ItemParts As New Collection

The class can provide FileWrite and FileInput subroutines that write an object's values to a file and read them back. FileWrite first writes data specific to this object. In this case that includes only the item's name; in a more realistic application, it would include such values as serial number, item ID, price, and so forth. FileWrite then saves the number of parts in the item's ItemParts collection. Finally, for each part in the collection, the routine calls the part's FileWrite subroutine to make the part write itself into the file.

FileInput reverses the process. It first reads the item's name from the file. It then reads the number of parts that make up the item. The routine then creates new items to represent the parts, and calls the FileInput subroutine of each to make the parts read themselves from the file.
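The same count-prefixed, recursive scheme can be sketched in Python. This is a port for illustration only, not the article's download:

- `file_write` and `file_input` are hypothetical stand-ins for FileWrite and FileInput.
- A list of strings plays the role of the records that `Write #` and `Input #` would put in the file.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class InventoryItem:
    item_name: str = ""
    item_parts: List["InventoryItem"] = field(default_factory=list)

    def file_write(self, out: List[str]) -> None:
        # Object-specific data first.
        out.append(self.item_name)
        # Then the part count, so the reader knows how many parts follow.
        out.append(str(len(self.item_parts)))
        # Each part writes itself recursively.
        for part in self.item_parts:
            part.file_write(out)

    def file_input(self, lines: Iterator[str]) -> None:
        # Values must be read back in exactly the order they were written.
        self.item_name = next(lines)
        num_parts = int(next(lines))
        self.item_parts = []
        for _ in range(num_parts):
            part = InventoryItem()
            self.item_parts.append(part)
            part.file_input(lines)


root = InventoryItem("Multi-Media Package",
                     [InventoryItem("CD-ROM"), InventoryItem("Speakers")])
records: List[str] = []
root.file_write(records)
print(records)  # ['Multi-Media Package', '2', 'CD-ROM', '0', 'Speakers', '0']

restored = InventoryItem()
restored.file_input(iter(records))
```

The reader depends entirely on the order in which values were written and has no way to tell one field from another. The disadvantages discussed later in the article follow from that.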

Figure 1 shows FileWrite and FileInput subroutines for this simple class. The ItemForm1 UserForm, available for download, uses these routines to load and display a small list of inventory items. Figure 2 shows ItemForm1 in action. This program is meant to be run from Microsoft Word. The form loads its data file from the directory that holds the active document. You may need to change the code slightly to load the file from another location if you run the program from some other Office application.

In this example, the total inventory is composed of several items including a Standard Computer, Multi-Media Workstation, and Billing History Server. The Multi-Media Workstation item is made up of four parts: CPU, Monitor, Hard Disk, and Multi-Media Package. The Multi-Media Package is made up of four parts of its own: CD-ROM, Speakers, Sound Card, and Graphics Accelerator.

    ' Write the object's information into fnum,
    ' a file already open for output.
    Public Sub FileWrite(fnum As Integer)
    Dim part As InventoryItem

        ' Write the object-specific data.
        Write #fnum, ItemName

        ' Write the number of parts.
        Write #fnum, ItemParts.Count

        ' Make the parts write themselves.
        For Each part In ItemParts
            part.FileWrite fnum
        Next part
    End Sub

    ' Read the object's information from fnum,
    ' a file already open for input.
    Public Sub FileInput(fnum As Integer)
    Dim i As Integer
    Dim num_parts As Integer
    Dim part As InventoryItem

        ' Read the object-specific data.
        Input #fnum, ItemName

        ' Read the number of parts.
        Input #fnum, num_parts

        ' Create the parts and make them read themselves.
        For i = 1 To num_parts
            Set part = New InventoryItem
            ItemParts.Add part
            part.FileInput fnum
        Next i
    End Sub
Figure 1. These routines save InventoryItem objects to a file and restore them from it.


Figure 2. The ItemForm1 UserForm displaying a small inventory hierarchy.

This approach is simple, but it has several disadvantages. First, the data is stored in a rather unintuitive format. Figure 3 shows the data representing the inventory shown in Figure 2. With a little work you can puzzle out its meaning, but what each line means is far from obvious. Changing this data, or repairing it if it were damaged, would be difficult.

    "All Inventory Items"
    3
    "Standard Computer"
    4
    "CPU"
    0
    "Monitor"
    0
    "Hard Disk"
    0
    "CD-ROM"
    0
    "Multi-Media Workstation"
    4
    "CPU"
    0
    "Monitor"
    0
    "Hard Disk"
    0
    "Multi-Media Package"
    4
    "CD-ROM"
    0
    "Speakers"
    0
    "Sound Card"
    0
    "Graphics Accelerator"
    0
    "Billing History Server"
    5
    "CPU"
    0
    "Monitor"
    0
    "Hard Disk"
    0
    "CD-ROM Tower"
    5
    "CD-ROM"
    0
    "CD-ROM"
    0
    "CD-ROM"
    0
    "CD-ROM"
    0
    "CD-ROM"
    0
    "Printer"
    0
Figure 3. This data describes the inventory shown in Figure 2.

This method is also relatively inflexible. If you decide to change the program so it stores object information in the system registry instead of a file, you must rewrite these routines. If you later decide to send the data over the Internet and reconstruct the objects in a program at the far end, you must rewrite the routines again.

This approach is also not easily extensible. Suppose you want to add a new SerialNumber value to the InventoryItem class. In that case you need to rewrite the FileWrite and FileInput subroutines to account for the new value. Even worse, any old data files you have that store inventory data in the old format are now useless because the new FileInput subroutine cannot read the old format. To use the old data, you must write a data conversion program and convert all of the old data files to the new format. You have the same problems if you later decide to remove a value from the InventoryItem class.
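A quick sketch shows why positional data breaks. The hypothetical Python function `read_new_format` expects a serial number right after each name. Given records written in the old name-then-count layout, it reads a part's name where it expects a count, and it fails.

```python
from typing import Iterator, List, Tuple


def read_new_format(lines: Iterator[str]) -> Tuple[str, str, list]:
    # Hypothetical "new" layout: name, serial number, part count, parts...
    name = next(lines)
    serial = next(lines)          # new field inserted after the name
    num_parts = int(next(lines))  # in old data this position holds a name
    parts = [read_new_format(lines) for _ in range(num_parts)]
    return name, serial, parts


def can_read(records: List[str]) -> bool:
    # True if the records parse under the new layout.
    try:
        read_new_format(iter(records))
        return True
    except (ValueError, StopIteration):
        return False


# Records in the old layout: name, part count, then each part.
old_data = ["Multi-Media Package", "2", "CD-ROM", "0", "Speakers", "0"]
print(can_read(old_data))  # False: "CD-ROM" is read where a count belongs
```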

The rest of this article describes an alternative scheme for saving and restoring objects that uses a more intuitive data format, allows greater flexibility, and is easily extended.

Serialization

Instead of writing itself directly into a file, the class creates a string representation of itself. Instead of loading its values from a file, the class reads its values from a string. Because this string represents the object in a serial format, it is called the object's serialization.

Once an object is serialized, a program can store the serialization in a file, database, or the system registry. It can transmit the serialization across a network or store it on a Web page. A remote program can read the serialization and use it to reconstruct the object.

A program can also apply other string operations to the serialization. For example, one program might serialize an object and then encrypt the serialization. It could then send the result to a remote program that would decrypt the serialization and use it to recreate the original object.

To make the serialization extensible, the object stores itself as a series of token name/value pairs. For example, to store the ItemName token with value Standard Computer, a serialization would contain the string ItemName(Standard Computer). This indicates that the token named "ItemName" has the value "Standard Computer".
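Producing a token is just string concatenation. The Python helper below is a hypothetical illustration, not part of the article's class. It also shows that a token's value can itself contain tokens, which is how Part tokens nest.

```python
def format_token(name: str, value: str) -> str:
    # Wrap a value in the name(value) token syntax.
    return f"{name}({value})"


print(format_token("ItemName", "Standard Computer"))  # ItemName(Standard Computer)

# A part's value is itself a serialization, so tokens nest naturally.
part = format_token("Part", format_token("ItemName", "CPU"))
print(part)  # Part(ItemName(CPU))
```

A reader finds the end of a value by matching parentheses. As a result, a value containing an unbalanced `(` or `)` would be misparsed; the format as described does not escape such characters.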

Before it begins reading a serialization, the object initializes its variables to default values. Any values that are not set in the serialization keep their default values. This allows you to easily add new values to the class. If the program loads a serialization stored in the older format, the newly added variables keep their default values.

Using default values can also shrink the serialization. If one of an object's variables holds its default value, the program does not need to save it. When an object later reads the serialization, it automatically assigns the default value to the omitted variable. The more variables that hold default values, the more tokens are omitted and the shorter the serialization becomes.

Using default values allows the class to handle new variables simply. The class can also easily ignore old variables that have been removed. When the class encounters an unknown token name, it simply ignores that data.
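The rules above can be sketched in Python on already-split (name, value) pairs, leaving parsing aside:

- Start from defaults.
- Omit values equal to their defaults.
- Ignore unknown names.

`SerialNumber` plays a hypothetical newly added field. `Obsolete` stands in for a token left over from an older version of the class.

```python
from typing import Dict, List, Tuple

# Hypothetical field defaults; SerialNumber is the "newly added" value.
DEFAULTS: Dict[str, str] = {"ItemName": "", "SerialNumber": ""}


def to_pairs(values: Dict[str, str]) -> List[Tuple[str, str]]:
    # Omit any value still equal to its default to keep the output short.
    return [(k, v) for k, v in values.items() if v != DEFAULTS[k]]


def from_pairs(pairs: List[Tuple[str, str]]) -> Dict[str, str]:
    # Start from defaults so values missing from old data keep them.
    values = dict(DEFAULTS)
    for name, value in pairs:
        if name in values:
            values[name] = value
        # Unknown names (for example, a removed field) are ignored.
    return values


old_pairs = [("ItemName", "CPU"), ("Obsolete", "x")]
print(from_pairs(old_pairs))  # {'ItemName': 'CPU', 'SerialNumber': ''}
print(to_pairs({"ItemName": "CPU", "SerialNumber": ""}))  # [('ItemName', 'CPU')]
```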

VBA Code

Figure 4 shows the GetToken subroutine. GetToken extracts the name and value of the first token at the front of a serialization. It returns the name and value through its parameters and removes the token from the serialization.

GetToken begins by calling the TrimInvisible subroutine to remove any invisible characters from the beginning of the string. This allows the serialization to contain characters such as carriage returns and spaces that make it easier for a person to read. For example, the file can use indentation to show which items are contained in others.
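A Python equivalent of TrimInvisible might look like the sketch below; it is not a line-for-line port. It treats every character outside the range `!` through `~` as invisible, following the range test in the VBA routine. It therefore strips spaces, tabs, carriage returns, and line feeds, but also any non-ASCII characters at the front of the string.

```python
def trim_invisible(txt: str) -> str:
    """Remove leading characters outside the visible range '!'..'~'."""
    i = 0
    # Same range test as the VBA: ch > " " And ch <= "~".
    while i < len(txt) and not (" " < txt[i] <= "~"):
        i += 1
    return txt[i:]


print(repr(trim_invisible("\r\n\t    ItemName(CPU)")))  # 'ItemName(CPU)'
```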

Next, GetToken uses InStr to find the opening parenthesis that marks the start of the first token value. It then looks through the string to find the corresponding closing parenthesis. Each time it encounters an open parenthesis, the routine adds one to its count of open parentheses. When it finds a close parenthesis, it subtracts one from the count. When the count reaches zero, the routine has found the matching close parenthesis.

Finally, GetToken sets the return values for the token name, token value, and serialization with the token removed.
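The parenthesis-counting step translates directly into Python. This hypothetical `get_token` differs from the VBA in a few ways:

- It returns the name, value, and remaining text as a tuple instead of through ByRef parameters.
- It uses `str.lstrip()` in place of TrimInvisible.
- It trims both ends of the name and value, where the VBA trims only the front.
- It returns empty strings when nothing is left, so a caller can stop on an empty name.

```python
from typing import Tuple


def get_token(txt: str) -> Tuple[str, str, str]:
    """Split the first name(value) token off the front of txt.

    Returns (name, value, rest), or ("", "", "") when only whitespace remains.
    """
    txt = txt.lstrip()  # stand-in for TrimInvisible
    if not txt:
        return "", "", ""

    open_pos = txt.find("(")
    if open_pos < 0:
        raise ValueError(f"Error parsing serialization {txt!r}")

    # Count parentheses until the one that opened the value is closed.
    num_open = 1
    for i in range(open_pos + 1, len(txt)):
        if txt[i] == "(":
            num_open += 1
        elif txt[i] == ")":
            num_open -= 1
            if num_open == 0:
                close_pos = i
                break
    else:
        # Reached the end without closing the value.
        raise ValueError(f"Error parsing serialization {txt!r}")

    name = txt[:open_pos].strip()
    value = txt[open_pos + 1:close_pos].strip()
    return name, value, txt[close_pos + 1:]


name, value, rest = get_token("  ItemName(CPU)Part(ItemName(RAM))")
print(name, value, rest)  # ItemName CPU Part(ItemName(RAM))
```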

    ' Find and remove the next token from this string.
    '
    ' Tokens are stored in the format:
    '    name1(value1)name2(value2)...
    ' Invisible characters (tabs, vbCrLf, spaces, etc.)
    ' are allowed before names.
    Private Sub GetToken(txt As String, token_name As String, _
        token_value As String)
    ' Long rather than Integer so strings longer than
    ' 32,767 characters do not overflow.
    Dim open_pos As Long
    Dim close_pos As Long
    Dim txtlen As Long
    Dim num_open As Long
    Dim i As Long
    Dim ch As String

        ' Clear the outputs first. The caller stops when
        ' token_name comes back empty, so leaving the old
        ' name in place would make it loop forever.
        token_name = ""
        token_value = ""

        ' Remove initial invisible characters.
        TrimInvisible txt

        ' If the string is empty, there are no more tokens.
        If txt = "" Then Exit Sub

        ' Find the opening parenthesis.
        open_pos = InStr(txt, "(")
        txtlen = Len(txt)
        If open_pos = 0 Then open_pos = txtlen

        ' Find the corresponding closing parenthesis.
        num_open = 1
        For i = open_pos + 1 To txtlen
            ch = Mid$(txt, i, 1)
            If ch = "(" Then
                num_open = num_open + 1
            ElseIf ch = ")" Then
                num_open = num_open - 1
                If num_open = 0 Then Exit For
            End If
        Next i
        If open_pos = 0 Or i > txtlen Then
            ' There is something wrong.
            Err.Raise vbObjectError + 1, _
                "InventoryItem.GetToken", _
                "Error parsing serialization """ & txt & """"
        End If
        close_pos = i

        ' Get token name and value.
        token_name = Left$(txt, open_pos - 1)
        token_value = Mid$(txt, open_pos + 1, _
            close_pos - open_pos - 1)
        TrimInvisible token_name
        TrimInvisible token_value

        ' Remove the token name and value
        ' from the serialization string.
        txt = Right$(txt, txtlen - close_pos)
    End Sub

    ' Remove leading invisible characters from
    ' the string (tab, space, CR, etc.)
    Private Sub TrimInvisible(txt As String)
    Dim txtlen As Long
    Dim i As Long
    Dim ch As String

        txtlen = Len(txt)
        For i = 1 To txtlen
            ' See if this character is visible.
            ch = Mid$(txt, i, 1)
            If ch > " " And ch <= "~" Then Exit For
        Next i
        If i > 1 Then _
            txt = Right$(txt, txtlen - i + 1)
    End Sub
Figure 4. Subroutine GetToken removes the first token name and value from a serialization.

Figure 5 shows property procedures that get and assign an InventoryItem object's serialization. The Property Get procedure takes an extra parameter: the number of spaces to insert at the start of each line it writes. The class uses this parameter to indent each object beneath the item that contains it.

For each of the object's variables that is not set to its default value, the routine adds the variable's name and value to the serialization string. Then, for each of the objects that make up this object, the routine adds a token named Part. The token's value is the part object's serialization, suitably indented.
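Here is the Property Get logic as a Python sketch, under a few assumptions:

- `DFLT_ITEM_NAME` is an assumed empty default, since the article's dflt_ItemName is not shown.
- The dataclass is a hypothetical stand-in for the InventoryItem class.
- Lines end with `\n` rather than vbCrLf.

```python
from dataclasses import dataclass, field
from typing import List

DFLT_ITEM_NAME = ""  # assumed default; the article's dflt_ItemName isn't shown


@dataclass
class InventoryItem:
    item_name: str = DFLT_ITEM_NAME
    item_parts: List["InventoryItem"] = field(default_factory=list)

    def serialization(self, indent: int = 0) -> str:
        pad = " " * indent
        txt = ""
        # Write the name only if it differs from the default.
        if self.item_name != DFLT_ITEM_NAME:
            txt += f"{pad}ItemName({self.item_name})\n"
        # Wrap each part's own serialization, indented 4 more spaces,
        # in a Part( ... ) token.
        for part in self.item_parts:
            txt += f"{pad}Part(\n{part.serialization(indent + 4)}{pad})\n"
        return txt


item = InventoryItem("Multi-Media Package",
                     [InventoryItem("CD-ROM"), InventoryItem("Speakers")])
print(item.serialization(), end="")
```

An item whose name is still the default and that has no parts serializes to an empty string, which is the size saving described earlier.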

The Property Let procedure initializes an InventoryItem object using the values in a serialization. This routine has no use for the indent parameter. However, Visual Basic requires the parameters of a Property Get procedure and its matching Property Let procedure to agree, so the Property Let procedure must include one anyway.

The procedure begins by setting the object's variables to default values. It then uses GetToken to examine the tokens in the serialization. It uses a Select Case statement to decide what kind of token it is reading. If the token is the name of a simple variable, the procedure sets the variable to the token's value. For example, when the token name is ItemName, the procedure sets the object's ItemName variable to the token's value.

When the item reads a token named Part, it creates a new InventoryItem object and adds it to the ItemParts collection. It then sets the new part's serialization to the Part token's value so the new object can initialize itself.

If the property let procedure encounters a token it does not recognize, such as a variable that has been removed from an older version of the class, it simply ignores the value.
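The Property Let side can be sketched the same way in Python. The block bundles a compact tokenizer with a hypothetical `load` method that:

- resets the item to defaults,
- dispatches on the token name,
- recurses into Part values, and
- skips anything it does not recognize.

The Color token in the sample data stands for such an unrecognized leftover.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

DFLT_ITEM_NAME = ""  # assumed default for ItemName


def get_token(txt: str) -> Tuple[str, str, str]:
    # Returns (name, value, rest), or ("", "", "") when only whitespace is left.
    txt = txt.lstrip()
    if not txt:
        return "", "", ""
    open_pos = txt.find("(")
    if open_pos >= 0:
        depth = 0
        for i in range(open_pos, len(txt)):
            depth += {"(": 1, ")": -1}.get(txt[i], 0)
            if depth == 0:
                return (txt[:open_pos].strip(),
                        txt[open_pos + 1:i].strip(), txt[i + 1:])
    raise ValueError(f"Error parsing serialization {txt!r}")


@dataclass
class InventoryItem:
    item_name: str = DFLT_ITEM_NAME
    item_parts: List["InventoryItem"] = field(default_factory=list)

    def load(self, text: str) -> None:
        # Start from defaults so missing tokens keep default values.
        self.item_parts = []
        self.item_name = DFLT_ITEM_NAME
        while True:
            name, value, text = get_token(text)
            if name == "":
                break
            if name == "ItemName":
                self.item_name = value
            elif name == "Part":
                # Create the part and let it initialize itself.
                part = InventoryItem()
                self.item_parts.append(part)
                part.load(value)
            # Any other token name is ignored.


data = """ItemName(Multi-Media Package)
Part(
    ItemName(CD-ROM)
)
Part(
    ItemName(Speakers)
    Color(beige)
)
"""
item = InventoryItem()
item.load(data)
print(item.item_name, [p.item_name for p in item.item_parts])
# Multi-Media Package ['CD-ROM', 'Speakers']
```

The tokenizer's empty-name result is what ends the loop. A tokenizer that left the previous name in place when the text ran out would make the loop repeat the last token forever.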

    ' Return the object's serialization.
    Public Property Get Serialization(indent As Integer) As String
    Dim txt As String
    Dim part As InventoryItem

        ' Write the object-specific data.
        If ItemName <> dflt_ItemName Then _
            txt = txt & Space$(indent) & _
            "ItemName(" & ItemName & ")" & _
            vbCrLf

        ' Make the parts write themselves.
        For Each part In ItemParts
            txt = txt & Space$(indent) & _
            "Part(" & vbCrLf & _
            part.Serialization(indent + 4) & _
            Space$(indent) & ")" & vbCrLf
        Next part

        Serialization = txt
    End Property

    ' Initialize the object using the serialization.
    Public Property Let Serialization(indent As Integer, _
        new_value As String)
    Dim token_name As String
    Dim token_value As String
    Dim part As InventoryItem

        ' Start with an empty collection.
        Set ItemParts = New Collection

        ' Initialize all values to defaults.
        ItemName = dflt_ItemName

        ' Examine each token in turn.
        Do
            ' Get the token name and value.
            GetToken new_value, token_name, token_value
            If token_name = "" Then Exit Do

            ' Save the value appropriately.
            Select Case token_name
                Case "ItemName"
                    ItemName = token_value
                Case "Part"
                    ' Create a new part and
                    ' make it unserialize itself.
                    Set part = New InventoryItem
                    ItemParts.Add part
                    part.Serialization(0) = token_value
            End Select
        Loop
    End Property
Figure 5. The Serialization property get procedure returns an object's serialization. The property let procedure initializes an object's variables using a serialization.

Figure 6 shows the new serialization for the data shown in Figure 3. This version is easier to understand and modify. It includes carriage returns and spaces to indicate the objects' containment hierarchy. The data format is easy enough to read that you could probably edit it by hand if necessary.

The ItemForm2 UserForm, available for download, demonstrates this technique for serializing and unserializing objects.
    ItemName(All Inventory Items)
    Part(
        ItemName(Standard Computer)
        Part(
            ItemName(CPU)
        )
        Part(
            ItemName(Monitor)
        )
        Part(
            ItemName(Hard Disk)
        )
        Part(
            ItemName(CD-ROM)
        )
    )
    Part(
        ItemName(Multi-Media Workstation)
        Part(
            ItemName(CPU)
        )
        Part(
            ItemName(Monitor)
        )
        Part(
            ItemName(Hard Disk)
        )
        Part(
            ItemName(Multi-Media Package)
            Part(
                ItemName(CD-ROM)
            )
            Part(
                ItemName(Speakers)
            )
            Part(
                ItemName(Sound Card)
            )
            Part(
                ItemName(Graphics Accelerator)
            )
        )
    )
    Part(
        ItemName(Billing History Server)
        Part(
            ItemName(CPU)
        )
        Part(
            ItemName(Monitor)
        )
        Part(
            ItemName(Hard Disk)
        )
        Part(
            ItemName(CD-ROM Tower)
            Part(
                ItemName(CD-ROM)
            )
            Part(
                ItemName(CD-ROM)
            )
            Part(
                ItemName(CD-ROM)
            )
            Part(
                ItemName(CD-ROM)
            )
            Part(
                ItemName(CD-ROM)
            )
        )
        Part(
            ItemName(Printer)
        )
    )
Figure 6. This data describes the same objects as the data shown in Figure 3 but it is easier to read.

Conclusion

This serialization technique provides several benefits. By using simple strings, it allows a program to store the serialization in a file, the system registry, or some other location. It allows the program to send the serialization across the Internet, and it provides an easy opportunity for encryption and decryption.

This method easily adapts to changes in the objects' variables. When variables are added to or removed from a class, the Serialization property procedures must be modified, but the rest of the program remains unchanged. The main program does not need to know the change has occurred. More importantly, existing data files will work with the new structure immediately without any data conversion.

Finally, the serialization is clear enough that you might reasonably be able to modify it directly if necessary. While a program usually modifies the data, it's nice to have the ability to see what's going on and to make changes in an emergency.

Downloads

Download a 6K zip file containing VBA UserForms and classes demonstrating the techniques described in this article. These are designed for use in VBA, but you can adapt them for Visual Basic, too.


The book Ready-to-Run Visual Basic Algorithms explains object serialization and many other traditional and object-oriented algorithms in depth.


 
Copyright © 1997-2001 Rocky Mountain Computer Consulting, Inc.   All rights reserved.
www.vb-helper.com/tut5.htm Updated