Serializing Objects

mje · March 2, 2011

I've finally decided to drink the Kool-Aid and trust LabVIEW to serialize my objects directly. That is, I just wire my object to the binary read/write methods and hope for the best. So far I'm very impressed: it handles mutations of the objects very well. Up until now the mutations have been fairly mundane. They are changes that stand on their own, such that the default values put in place when an old version is loaded into the new code work fine.

However now I have some data I've added which essentially serves as an index to other data. When I load an old serialized object version, the index of course doesn't exist, so the old object data mutates to now have the default value (an empty array), meaning my object is missing its index. No big deal, I just have code in place such that whenever I need to use the index, I first check if it's valid, and build it if necessary.
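The check-then-build approach described above can be sketched outside LabVIEW. This is only a minimal Python illustration, not the actual implementation: the `Document` class, its `records` and `sort_index` members, and the validity rule (index length equals record count) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    # Hypothetical stand-in for the serialized object.
    records: List[float] = field(default_factory=list)
    # Data saved by an older version has no index, so it loads as the default (empty).
    sort_index: List[int] = field(default_factory=list)

    def index_is_valid(self) -> bool:
        # Assumed validity rule: the index must cover every record.
        return len(self.sort_index) == len(self.records)

    def get_index(self) -> List[int]:
        # Build the index on first use if it is missing or incomplete.
        if not self.index_is_valid():
            self.sort_index = sorted(range(len(self.records)),
                                     key=self.records.__getitem__)
        return self.sort_index


old_doc = Document(records=[3.0, 1.0, 2.0])  # as if loaded from an old file
print(old_doc.get_index())                   # positions of records in sorted order
```

The cost of this design is that every accessor has to go through `get_index`, which is the clumsiness in question.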

This seems a bit clumsy, though. I wonder what other strategies people have used for serialization? Up until now I've always "rolled my own", but the documents I'm dealing with are now big enough that I can't ignore the orders-of-magnitude difference in speed from using the native serialization options (the application I'm writing expects a median document size in the 100-500 MB range).

jgcode · March 2, 2011

...I just have code in place such that whenever I need to use the index, I first check if it's valid, and build it if necessary.

Cool thread.

To handle situations such as above, I do the checking at the time the object is read from disk.

Don't know if there are pros\cons between the two methods, or if it doesn't really matter.

Cheers

-JG

Mark Smith · March 2, 2011

When you serialize objects, the version number of the class gets written into a field (at least when using XML serialization - I presume the same is true of binary) so you could check what version of the class loaded and take appropriate action rather than look at the actual object data. I use the native LabVIEW XML serialization and it works quite well - of course, I'm not serializing objects that are 100-500 MB!

Mark

Daklu · March 2, 2011

I've finally decided to drink the Kool-Aid and trust LabVIEW to serialize my objects directly

I'd be very, very careful with that particular flavor of Kool-Aid. There are lots of things that can happen during normal development that can make your saved objects unreadable by the new source code. What makes it all the more dangerous is that nothing in the LV dev environment warns you that a particular edit will change the way a persisted object is loaded (or whether it can be loaded at all). I imagine those large documents will be pretty hard to recreate if they unexpectedly become incompatible with the software.

Is there any way you can improve the speed of your custom serialization method? I've never done any benchmarking, but orders of magnitude difference between directly writing an object to disk and creating a method that unbundles the class data and writes it to disk? That raises the "something's not right" flag in my head...

mje · March 2, 2011

Yes, the version numbers persist in the binary format as in the XML.

As for the orders of magnitude, I was comparing against a generic variant-driven XML library I wrote that uses the native LabVIEW/Xerces parser. Since it's a DOM parser (as opposed to SAX), loading big documents becomes...prohibitive as far as memory consumption goes, which affects speed. So the combination of DOM and variants leads to astronomical memory footprints when large data structures get involved. It works beautifully for small pieces of data, though, because you can literally throw anything at it.
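To illustrate the DOM versus SAX distinction in general terms, here is a small Python sketch. It uses Python's standard-library parsers as stand-ins, not the LabVIEW/Xerces parser, and the `<doc>`/`<item>` document is made up for the example.

```python
import xml.sax
from xml.dom import minidom

# Made-up document with 1000 small elements.
XML = "<doc>" + "".join(f"<item>{i}</item>" for i in range(1000)) + "</doc>"


class ItemCounter(xml.sax.ContentHandler):
    """Streams through the document, keeping only a running count."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1


# DOM: the whole tree is built in memory before any item is touched.
dom = minidom.parseString(XML)
dom_count = len(dom.getElementsByTagName("item"))

# SAX: elements arrive as events and nothing is retained by the parser.
handler = ItemCounter()
xml.sax.parseString(XML.encode("utf-8"), handler)
sax_count = handler.count

print(dom_count, sax_count)
```

Both approaches see the same elements, but only the DOM version has to hold the full tree at once, which is where the memory footprint comes from on large documents.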

As far as what breaks the native serialization, the only thing I'm aware of is changes in the qualified name of the objects. So renaming the class itself or the containing library, moving a class to a new library, etc. will change the qname. Is there something else?

At this stage the product is still in development, and I can write a serialization protocol into it before I ship if the native approach proves to be too much of a risk factor. There's been a reason I've left the Kool-Aid alone until now, and it was the risk Dak brought up. I'd love to know exactly what can break it.

Michael Aivaliotis · March 2, 2011

Don't use any type definitions inside your class data.

I'm not sure I'm following what your final implementation looks like. You're letting LabVIEW load the object with the binary read function and then fixing your index if it's bad? Is the index a scalar value that requires some logic to determine its value? I would probably use the class version number to determine what code to execute on the data to fix the index. At the end of the day, you still need to do some "fixing".

When I've had to do this, I use a multi-stage sequential conversion routine based on what version I'm reading. So if you have 3 versions of your file, you would do 1->2->3, needing 2 conversion routines.
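The 1->2->3 chain can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than anyone's actual code: the dictionary layout, the `version` key, and the fields added in each step (`units`, `index`) are hypothetical.

```python
from typing import Callable, Dict

CURRENT_VERSION = 3


def v1_to_v2(data: dict) -> dict:
    # Hypothetical change: version 2 added a "units" field.
    return {**data, "version": 2, "units": "V"}


def v2_to_v3(data: dict) -> dict:
    # Hypothetical change: version 3 added an (initially empty) index.
    return {**data, "version": 3, "index": []}


# One routine per step: N versions need N - 1 converters.
CONVERTERS: Dict[int, Callable[[dict], dict]] = {1: v1_to_v2, 2: v2_to_v3}


def upgrade(data: dict) -> dict:
    # Apply converters in sequence until the data reaches the current version.
    while data["version"] < CURRENT_VERSION:
        data = CONVERTERS[data["version"]](data)
    return data


upgraded = upgrade({"version": 1, "samples": [1, 2, 3]})
print(upgraded)
```

Adding a fourth version then means writing one new converter (3->4) and leaving the existing ones alone.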

I've stayed away from using the class versioning system for saving object data because I tend to use typedef clusters in my objects. Ya, I know, bad - bad Michael. I'm slowly trying to wean myself from this habit.

mje · March 3, 2011

To handle situations such as above, I do the checking at the time the object is read from disk.

Don't know if there are pros\cons between the two methods, or if it doesn't really matter.

The only reason I didn't do it this time is I'm still trying to wrap my head around the versioning embedded in the native packaging of data. Once you load your object, the previous version info is lost because your object has already mutated. If you want to know the version that's on disk, you need to interpret the byte stream directly (I think?). This can get cumbersome when you realize that each class has a version for itself, every level of ancestor, and of course every contained object in the private data. Seems intimidating.

Of course I could just cheat and have a version saved in the class data as well.
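A rough Python sketch of that "cheat", assuming a JSON payload and a hypothetical `format_version` member; treating data that has no version member as version 1 is also an assumption of the example:

```python
import json

CURRENT_VERSION = 2  # hypothetical current format version


def save(records: list) -> str:
    # Store the version alongside the data it describes.
    return json.dumps({"format_version": CURRENT_VERSION, "records": records})


def load(text: str):
    data = json.loads(text)
    # Data written before the member existed carries no version; assume 1.
    on_disk_version = data.get("format_version", 1)
    return on_disk_version, data["records"]


legacy = json.dumps({"records": [1, 2, 3]})  # written before versioning
print(load(legacy))
print(load(save([4, 5])))
```

Because the version travels with the data, the loader still knows what was on disk after the rest of the structure has been brought up to date.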

When I've had to do this, I use a multi-stage sequential conversion routine based on what version I'm reading. So if you have 3 versions of your file, you would do 1->2->3, needing 2 conversion routines.

That is historically what I've always done.

I'm not sure I'm following what your final implementation looks like. You're letting LabVIEW load the object with the binary read function and then fixing your index if it's bad? Is the index a scalar value that requires some logic to determine its value? I would probably use the class version number to determine what code to execute on the data to fix the index. At the end of the day, you still need to do some "fixing".

Yes, you're very close. Basically my old data had a large array-like piece of data. Searching and sorting that data is slow, so I went ahead and built pre-sorted index arrays that ride along with the original data to enable binary searches etc. So now when I load the document, if the index arrays don't exist, they need to be built. In this case it's not so much a matter of fixing as creating if it doesn't exist.
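For readers unfamiliar with the idea, a pre-sorted index array that rides along with unsorted data might look like the following Python sketch. The function names and sample values are hypothetical, and the real data is described only as "array-like", so a plain list of numbers is an assumption.

```python
from typing import List, Optional


def build_index(values: List[float]) -> List[int]:
    # Positions of `values` ordered by value; the data itself is not moved.
    return sorted(range(len(values)), key=values.__getitem__)


def find(values: List[float], index: List[int], target: float) -> Optional[int]:
    # Binary search through the index; returns a position in `values` or None.
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if values[index[mid]] < target:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(index) and values[index[lo]] == target:
        return index[lo]
    return None


values = [30.0, 10.0, 20.0]
index = build_index(values)
print(index, find(values, index, 20.0), find(values, index, 25.0))
```

Each lookup costs O(log n) comparisons instead of a linear scan, at the price of keeping the index in step with the data.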

As for typedefs...Damn.

Michael Aivaliotis · March 3, 2011

Yes, you're very close. Basically my old data had a large array-like piece of data. Searching and sorting that data is slow, so I went ahead and built pre-sorted index arrays that ride along with the original data to enable binary searches etc. So now when I load the document, if the index arrays don't exist, they need to be built. In this case it's not so much a matter of fixing as creating if it doesn't exist.

I dunno man. It sounds like you need a database instead of a file. Just load the data you need based on a query. But that's a whole other thread perhaps.

jgcode · March 3, 2011

Of course I could just cheat and have a version saved in the class data as well.

Yes, that is what I do. I currently un/flatten the class to/from disk in much the same way Michael describes (I actually developed the technique from copying stuff in an article Jim posted on thinkinging?) to solve the issue you present.

Also, on the darkside Dave/Jack/AQ instigated a great discussion on such issues with this - well worth a read. In summary, blindly trusting NI's implementation is not advised for mission critical stuff but it is the easiest way - but won't fix your issue methinks.

mje · March 3, 2011

Cool, thanks for the tip. To clarify, there are no standing issues; what I have works. I was just wondering what people thought of the pre-canned serialization mechanics. Consensus seems to be to avoid it, so I'll plan to work something else in before final release.

As for databases, we considered them. Well, only SQLite. Ultimately ruled it out due to the frequency with which we need to access the data and the relatively small data set. That and we couldn't justify the risk of introducing an unknown platform on a short development schedule. When I implemented my first binary searches back in the discovery phase of the project and saw the results, it sealed the deal: we decided to keep it all native.

Daklu · March 3, 2011

As for the orders of magnitude, I was comparing a generic variant driven XML library I wrote that uses the native LabVIEW...

THAT part I understood... everything beyond it is Greek. :lol:

Is there something else?

Certain edits to the class data will make the data unretrievable. Example:

1. Put a numeric control in the class, assign it the number 5, and save the object to disk.

2. Load it from disk and verify you get the number 5.

3. Open the class .ctl, delete the numeric control, and apply the change.

4. Now add a numeric control back to the class .ctl, making sure it has the same name and apply the changes.

5. Load the object saved previously and read the number.

Intuitively you think it should be 5, but it has reverted to the default value of 0. The saved data cannot be recovered unless you can revert the class .ctl file to a version prior to the delete. Why? Mutation history--it's a double-edged sword. The class remembers that you deleted the first control and then added the second one. As far as it's concerned, the new numeric control is a completely different data member. Since the saved object doesn't contain data for this "new" member, it is assigned the default value of 0.
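The effect can be mimicked with a toy model in Python. To be clear, this is not how LabVIEW stores objects or mutation history; it only assumes that members are matched by an identity assigned when they are added rather than by name, which is enough to reproduce the behavior in the steps above. `ClassDefinition` and its methods are invented for the illustration.

```python
import itertools


class ClassDefinition:
    """Toy model: each member gets a unique ID at the moment it is added."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.members = {}  # member ID -> (name, default value)

    def add(self, name, default):
        member_id = next(self._next_id)
        self.members[member_id] = (name, default)
        return member_id

    def remove(self, name):
        self.members = {i: m for i, m in self.members.items() if m[0] != name}

    def save(self, values_by_name):
        # Persist values keyed by member ID, not by name.
        return {i: values_by_name[n] for i, (n, _) in self.members.items()}

    def load(self, saved):
        # A member with no saved entry under its ID falls back to its default.
        return {n: saved.get(i, d) for i, (n, d) in self.members.items()}


cls = ClassDefinition()
cls.add("Numeric", 0)
saved = cls.save({"Numeric": 5})
before = cls.load(saved)       # steps 1-2: the value round-trips

cls.remove("Numeric")          # step 3: delete the control
cls.add("Numeric", 0)          # step 4: same name, but a new member
after = cls.load(saved)        # step 5: the saved value is not matched

print(before, after)
```

In this model the saved 5 is still present in `saved`, but nothing in the current definition claims it, so the member named "Numeric" comes back with its default.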

Like the issue with propagating multiple typedef changes to unloaded VIs, this isn't really a bug either. NI had to define some sort of default behavior and this is as good as any. (I do really wish there were a way to override the default behavior though.) Data persistence is one area where I think typedeffed clusters are easier to use than classes. They are much more transparent to the developer. If you know what to look out for, you can be really careful with direct object persistence. Whether or not you can prevent other developers from messing it up is another story.

Stagg54 · September 12, 2013

Yes, that is what I do. I currently un/flatten the class to/from disk in much the same way Michael describes (I actually developed the technique from copying stuff in an article Jim posted on thinkinging?) to solve the issue you present.

Also, on the darkside Dave/Jack/AQ instigated a great discussion on such issues with this - well worth a read. In summary, blindly trusting NI's implementation is not advised for mission critical stuff but it is the easiest way - but won't fix your issue methinks.

Do you happen to have a link to that article?
