As I see it there are a few bits of information that need to be shared for successful serialization of a data structure.
The Shared Schema
This is the most fundamental part, it is the contract that defines how to interpret what is being transmitted or stored. I'm not even being XML specific here, I mean schema in the most abstract sense-- some shared grammar that must be understood by both the code doing the serialization and that doing the deserialization. Maybe a schema is a literal XML schema, maybe it's a database schema, or maybe it's some obscure binary format.
At the very least, you probably want a schema to be version aware. If your data structure changes from a SGL to a DBL, between version 1.0 and 1.1, that's important to know. The schema might wish to carry along version history as well, but I wouldn't say this is a requirement.
In native code, you can think of each .lvclass as the schema for that class, though it doesn't really serve that purpose in the general sense since it doesn't express that information very well to anything other than LabVIEW.
The Class Identifier and Version
The object identifier is the fundamental unit that identifies what is being serialized. It could be anything so long as any given identifier resolves to a single type. Similarly most objects are going to want to be version aware to allow a given type to evolve over time. The native mechanism I believe uses the qualified class name and a four word (4 x U16) version number.
The Object Data
This is the actual value of what is being transmitted. As has been pointed out, simply saying "default" usually doesn't cut it. Strictly speaking though, if your schema maintains a whole version history and defines the default for each version, it is sufficient, but I find it far easier to always serialize all data simply because maintaining a version history in the schema is hard. LabVIEW can handle the "default" problem because the lvclass file holds the version history, all be it in an obfuscated binary format.
Designing a Serialization API
Let's start with deserialization. What are the fundamental steps in deserialization?
At it's heart, we're going to have some stream of data, and our job is to figure out what that data represents, create an instance of some representative object in our application, then properly transfer all or parts of the serialized data to that instance.
So let's define a base class, serializable.lvclass. In our schema, all objects which are serialized inherit from this class. What do we need to know to be able to create a proper instance of this type when we deserialize it?
Whatever our implementation is, at some level we're going to have to examine this data stream and figure out what to actually instantiate. If you're familiar with design patterns, this is should be a big flashing sign for a factory. The factory's job is to take an identifier in, create a concrete instance of whatever type we have to represent that identifier's schema, and return it as a serializable object.
Once the factory has returned us an instance of a serializable object, we can then call that object's dynamic deserialize.vi method, passing to it the remaining data in the stream. This method's job will be to consume as much of the stream as required by our schema. Internally, you can imagine the next piece of information deserialize will require is a version number, followed by any version specific data. It can then pass the data stream to its parent implementation, which will in turn consume what's required from the stream until the end of the line is reached and the base serializable.lvclass:deserialize.vi implementation.
Meanwhile a similar serializable.lvclass:serialize.vi dynamic method exists. It will write its version number, all data for the class, then pass the stream up the inheritance chain so each level in the hierarchy can get a crack at it. This method shouldn't serialize the class identifier, because that is handled at a scope outside of the serialize/deserialize methods. Recall deserialize.vi is not called until after our logic has consumed the identifier from the stream, so similarly we can expect the class identifier has already been committed to the stream before serialize.vi has been called.
There are some subtleties here I won't get into that really are best exposed in an example, which I definitely can't produce tonight. Quickly though:
A dynamic method serializable.lvclass:identifier.vi exists which must be overridden. The responsibility of this method is to return the unique class identifier for a concrete implementation.
A static serializeObject.vi method exists which takes a stream, and a serializable object as input. This method will first commit the value returned from identifier.vi to the stream, then pass the stream onto the object's serialize.vi method.
Similarly a static deserializeObject.vi method exists which consumes a class identifier from the stream, calls a factory method to get an instance of the appropriate serializable object, then passes the remaining stream onto the object's deserialize.vi method.
Since the serialize/deserialize methods are only intended to be called from within the scope of a class hierarchy, or by a static wrapper, I propose the static wrappers be member of serializable.lvclass and the dynamic methods be protected.
This all implies that class identifiers for ancestor classes are never serialized. That is the inheritance of a class is defined in the schema, and becomes hard-coded into a class implementation.
Also implied is versioning is delegated the class implementations as well. Versions can be absent entirely, or partially implemented. For example most of my applications can read most any version for which a schema was defined, but can only write the most recent schema for which the application is aware.
This is a lot of text, but fundamentally quite simple. There's only a single class and a handful of methods. The reason I haven't implemented a clear example so far is there are two "feats" that I think need to be overcome to change this from what's really a design pattern to something with a real utility as a re-use class.
The factory. There are many ways to go about doing this, I haven't settled on how I wish to make it extensible. I have ideas, but I'm tired at this point and this post is already way too long.
The stream. I really want to avoid writing yet another serialization scheme which is hard-coded for a file stream, a TCP/IP stream, a string, or a byte array. I really want to develop an abstraction layer that is fast, such that what the stream is becomes completely transparent to each class being serialized.