Behaviour when loading class data from a flat (binary) file


Hi,

I have a case where I use the power of LV classes and their automatic version mutation capabilities to store data in a flat binary format by just wiring the class wire to the "Write to binary file" primitive. Writing and reading class data of different class versions works as it should. However, when for some reason the user selects a non-class file (e.g. some text file), I would simply expect "Read from binary file" to return an error I could act upon.

Instead, LV throws a "Not enough memory to complete this operation" dialog at me. When I click OK on the dialog, after a really long timeout (tens of seconds) I get error 4, End of file encountered, out of the "Read from binary file" primitive.

I'd qualify this as buggy behaviour. What are your thoughts?

Link to comment

QUOTE (Jeffrey Habets @ Mar 17 2009, 10:14 AM)

I experienced that myself and put it in the "Don't DO THAT" column. It is the same effect as what happens when you (or LV 7.0 when quickly closing and opening files; no, you don't have to believe me ;) ) mix up your file references for files of two different file types. The File primitive trusts us that the file is of the type specified.

Ben

Link to comment

QUOTE (neBulus @ Mar 17 2009, 04:19 PM)

True, and for 'normal' (non class) data I would add a header containing file type and versioning information and read that first before attempting to further process the file. I'm using classes here to make life easier and let them do the versioning. I'd expect LV to be smarter when loading the data, because it has all information it needs in the class datatype.

As a matter of fact, I just tried the same with a cluster wired to the Read binary primitive and there I get a more expected result when opening a file with other (than the cluster format) data: LV returned error 116, Unflatten or byte stream read operation failed due to corrupt, unexpected, or truncated data. It should throw the same error with classes.

Link to comment

QUOTE (Jeffrey Habets @ Mar 17 2009, 11:30 AM)

There is a little difference between those two situations. The cluster can only be of one format. Class data can consist of its own data plus any of the data for the children. Which flavor of the class (vs. the children's versions) it is gets stuffed in the class data (somewhere!). But how would the file primitive be able to pull out the format without reading, at the very least, some type of header? Since the header is of the wrong type, it gets interpreted badly, as you reported.

This does highlight the work that Stephen and his crew put into LVOOP.

Have you considered flattening the data and writing it to a text field following your normal header technique?

I am not saying this is NOT a bug.

I'm saying:

1) I expect it to have trouble, so I am not surprised.

2) If a bug fix for this case slows down LVOOP, I would prefer that it not be fixed.

Just my two cents,

Ben

Link to comment

I assume that if you try it by using the method NI shows in its examples (using the datalog functions) it will work, but it might not. Personally, I flatten to a string and then save that string. You can try doing the same and seeing if the unflatten node throws the out of memory error.

In general, though, I would agree that this is probably a bug and that LV should recognize you simply did not point it at a class file.

Link to comment

QUOTE (Yair @ Mar 17 2009, 06:58 PM)

The unflatten node doesn't give the out of memory error. Instead it returns an error (1527, Attempted to read flattened data of a LabVIEW class that is not currently loaded into LabVIEW.), which seems to be the correct behaviour and is what I would also expect from the read binary primitive. (Since the (un)flatten operation is probably more or less what the binary read/write primitives do under the hood anyway.)

I can actually read the class data written using the binary write by reading it as text and unflattening the string. So there's my workaround for now.

Thank you guys for your thoughts, I'll file a bug report on this issue.

Link to comment

I am not sure I would classify this as a bug. Without knowing the specifics of the headers that NI is using when storing the binary data it is quite possible that enough of the information for the incorrect data gets interpreted as a valid header. In such a case it could be trying to decode the remaining portion of the file using this invalid data as if it were valid. I know when I looked at a flattened variant data it could be fairly easy to feed it garbage and have it misinterpret it as valid data.

If the data headers are simplistic, it would be easy to misinterpret garbage as valid data. To avoid this, the headers would need to contain information such as CRCs or checksums to validate the data in the first place. If NI is doing this, then I would classify it as a bug. If it uses only simplistic data headers, then at best you could request data validation as a feature enhancement. Otherwise it falls into your lap to validate the data before working with it.

I can certainly see your point that you would like consistent behavior, but this falls into a gray area as to who must validate the data in the first place.
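
To make the checksum idea concrete, here is a minimal sketch in Python rather than LabVIEW. It is not NI's format: the 8-byte layout (payload length followed by a CRC32, both big-endian) and the function names `wrap_with_crc` and `unwrap_with_crc` are assumptions made up for this illustration. The same wrapping could be applied to a flattened string before it is written to disk.

```python
import struct
import zlib


def wrap_with_crc(payload: bytes) -> bytes:
    """Prefix the payload with its length and CRC32 (hypothetical 8-byte header)."""
    return struct.pack(">II", len(payload), zlib.crc32(payload)) + payload


def unwrap_with_crc(blob: bytes) -> bytes:
    """Return the payload, or raise ValueError if the blob fails validation."""
    if len(blob) < 8:
        raise ValueError("too short to contain a header")
    length, expected_crc = struct.unpack(">II", blob[:8])
    payload = blob[8:]
    # Compare the claimed length with the bytes actually present
    # before trusting anything else in the file.
    if length != len(payload):
        raise ValueError("length field does not match the data present")
    if zlib.crc32(payload) != expected_crc:
        raise ValueError("checksum mismatch")
    return payload
```

A text file fed to `unwrap_with_crc` fails on the length comparison almost immediately, because its first four bytes are very unlikely to equal the number of bytes that follow the header.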

Link to comment

QUOTE (Jeffrey Habets @ Mar 17 2009, 09:14 AM)

I'd qualify this as buggy behaviour. What are your thoughts?

When LV is told to unflatten a string, we do our best to interpret it as the data type you claim it to be.

If you flatten an eight-byte double as a string, then tell us to unflatten that string as a 4-byte integer, we're going to read the first four bytes. On the other hand, if you flatten a double and try to unflatten it as a string, we're going to treat the first four bytes of that data as the length of the string. Since this is likely a VERY large number, we will then try to allocate an array of that size, and we often run out of memory trying to do that. So depending upon exactly what you are flattening and unflattening, you may get the more helpful "data corrupt" errors, or you may get the "out of memory" errors. Pot luck depending on how close the data matches something that is parsable.
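
The double-versus-string case can be reproduced outside LabVIEW. The Python sketch below assumes big-endian byte order and a 4-byte length prefix for strings, as described above; the value 3.14 and the helper name `first_four_as_length` are arbitrary choices for illustration.

```python
import struct


def first_four_as_length(flattened: bytes) -> int:
    """Interpret the first four bytes as an unsigned big-endian length prefix."""
    (length,) = struct.unpack(">I", flattened[:4])
    return length


# An eight-byte double, flattened big-endian.
flattened_double = struct.pack(">d", 3.14)

# Reading it back as a 4-byte integer "works": it just takes the first four bytes.
(as_int32,) = struct.unpack(">i", flattened_double[:4])

# Reading it back as a string treats those same four bytes as the string length.
claimed = first_four_as_length(flattened_double)
available = len(flattened_double) - 4

print(f"as int32: {as_int32}")
print(f"claimed string length: {claimed} bytes, actually available: {available}")
```

The claimed length is on the order of a gigabyte while only four bytes remain, which is the kind of request that ends in an out-of-memory condition instead of a "data corrupt" error.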

It's not a bug -- it is LV doing exactly what you told it to do. This behavior applies regardless of the data type you're unflattening, including LV classes. And it is not unique to LabVIEW. Try renaming a random file as ".png" and then ask a paint program to open it. You'll get any number of strange behaviors.

The trick is to save your data files with a unique file extension and then restrict your users to only picking files with that extension.

Link to comment

QUOTE (Aristos Queue @ Mar 17 2009, 08:38 PM)

Thanks for your input. After reading this and the LVClass Data Storage Format wiki article, I see that this problem can occur. Until I read the wiki article, I was under the impression that classes saved more information about themselves when flattened.

QUOTE (Aristos Queue @ Mar 17 2009, 08:38 PM)

The trick is to save your data files with a unique file extension and then restrict your users to only picking files with that extension.

Of course, I always use that as an extra check. But I'm not a fan of plain extension checking, because it's pretty easy to change extensions and there are likely to be other file formats with the same extension out there. I actually ran into this by accident while migrating a program's storage format from readable text to binary, while needing to keep the extension the same for both.

Link to comment

QUOTE (Aristos Queue @ Mar 17 2009, 11:38 AM)

When LV is told to unflatten a string, we do our best to interpret it as the data type you claim it to be.

String yes, file no. (in any rational world)

QUOTE (Aristos Queue)

Try renaming a random file as ".png" and then ask a paint program to open it. You'll get any number of strange behaviors.

Both my local graphics editors put up an error: "This is not a valid PNG file." (more or less). There's no reason except for bad file format design that a proprietary format can't have a header that identifies the file type and some kind of data validation scheme.

QUOTE (Aristos Queue)

The trick is to save your data files with a unique file extension and then restrict your users to only picking files with that extension.

So anyone can sabotage the system by feeding it an invalid file with the desired extension, and since a dialog will come up before it's possible to validate the file, there is no way to safely use the binary data file functions in an industrial application. Is my understanding correct?

Link to comment

QUOTE (jdunham @ Mar 17 2009, 03:20 PM)

and since a dialog will come up before it's possible to validate the file, there is no way to safely use the binary data file functions in an industrial application. Is my understanding correct?

No. The way to safely use the binary data file functions is to write a header that lets you recognize the file as one of your own.
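
A rough sketch of that approach, in Python for illustration only. The 4-byte signature `MYAP`, the 2-byte version field, and the function names are all hypothetical; the point is only that the reader checks a few known bytes before it attempts to interpret the rest of the file.

```python
import io
import struct

MAGIC = b"MYAP"        # hypothetical 4-byte signature for "our" files
FORMAT_VERSION = 1     # hypothetical format version written by this program


def write_file(stream, payload: bytes) -> None:
    """Write the signature and version, then the application data."""
    stream.write(MAGIC + struct.pack(">H", FORMAT_VERSION))
    stream.write(payload)


def read_file(stream) -> bytes:
    """Return the application data, or raise ValueError for a foreign file."""
    header = stream.read(6)
    if len(header) != 6 or header[:4] != MAGIC:
        raise ValueError("not one of our files")
    (version,) = struct.unpack(">H", header[4:])
    if version > FORMAT_VERSION:
        raise ValueError(f"file version {version} is newer than this reader")
    # Only now is the remainder handed to the real parser.
    return stream.read()


buffer = io.BytesIO()
write_file(buffer, b"flattened class data goes here")
buffer.seek(0)
print(read_file(buffer))
```

Only six bytes are read before the decision is made, so a foreign file is rejected through the normal error path without any large allocation.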

What behavior would you want from LabVIEW? Should we secretly record some random bits that LV checks that says, "Yep, we wrote this file." That would make it mighty hard to output some specific file format -- for example, a .png file. If LV output those secret bytes in the heading of every file, you'd never be able to write a .png. Or any other format.

QUOTE

There's no reason except for bad file format design that a proprietary format can't have a header that identifies the file type and some kind of data validation scheme.

Except the LV binary prims are NOT designed to output a proprietary file format. They output the binary strings as requested by you, the user.

QUOTE

Both my local graphics editors put up an error: "This is not a valid PNG file." (more or less).

And I guarantee that I can put together a file that wouldn't. As I said, it matters how close to being a valid file it was. PNG is probably not the best example. But there are plenty of graphics formats that are packed up, that assume they need to be unpacked, and you can put data in that will make the system think it needs to unzip as a gigantic system.

Link to comment

QUOTE (Aristos Queue @ Mar 17 2009, 08:31 PM)

No. The way to safely use the binary data file functions is to write down information as a header that you can recognize the correctness of the file as one of your own files.

What behavior would you want from LabVIEW? Should we secretly record some random bits that LV checks that says, "Yep, we wrote this file." That would make it mighty hard to output some specific file format -- for example, a .png file. If LV output those secret bytes in the heading of every file, you'd never be able to write a .png. Or any other format.

Except the LV binary prims are NOT designed to output a proprietary file format. They output the binary strings as requested by you, the user.

I think we're talking about two different things. There are true binary files, which just dump string data to a file, and there are LabVIEW binary files, which used to be called datalog files, which write arbitrary LabVIEW types in the flatten-to-string format, over which I have very little control. I thought this thread was about the latter, which have never been too useful for me, since it's too easy to render the file unreadable after the data type changes. I figured that's why the versioning was added, and it was piquing my interest.

Your comments are definitely valid for true binary files, and that's what we have to use, with our own validation and metadata, since the LabVIEW formats, which could have saved us a lot of effort, were not really robust enough (and I didn't even know they could put up an out of memory dialog box, which makes it that much worse).

It's a real shame, because front panel datalogging could be extremely useful, UNTIL you make any changes to your front panel. Then you can never read the data again unless you can manage to reverse engineer it. It would be a real selling point if NI could fix this, except that it's not something prospective customers know is broken.

Link to comment

QUOTE (Aristos Queue @ Mar 17 2009, 09:38 PM)

So depending upon exactly what you are flattening and unflattening, you may get the more helpful "data corrupt" errors, or you may get the "out of memory" errors. Pot luck depending on how close the data matches something that is parsable.

Jeffrey said he didn't get the out of memory error when using the unflatten primitive. Assuming he tried it on the same file (did you, Jeffrey?) shouldn't the binary file primitives do the same?

Link to comment

QUOTE (Yair @ Mar 18 2009, 07:28 PM)

Jeffrey said he didn't get the out of memory error when using the unflatten primitive. Assuming he tried it on the same file (did you, Jeffrey?) shouldn't the binary file primitives do the same?

Yes, I did try it on the same file. And I was also under the assumption that the binary primitives basically did something like reading the file and then unflattening. But reading AQ's response, I realize that it isn't that simple in all cases.

When I read the file as a string and offer that string to the unflatten primitive, LV already has a lot more information (namely type and size) than when it has to read the information straight from the file. So in this case LV knows how much memory to allocate.

Looking at the nature of a flattened class (and given the fact that my class data is not of variable size, e.g. no arrays or strings in it), and seeing that the first 4 bytes determine the number of inheritance levels, my guess is that this is where LabVIEW chokes. This number should be 1, since my class has no ancestors, but in my particular test case, where I try to read a text file as binary, this number was very high. LabVIEW probably tried allocating memory for a couple of million class data clusters, and this obviously results in an out of memory error.
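
If that guess is right, a reader could reject the file before allocating anything by comparing the count with the number of bytes actually left in the file. The Python sketch below only illustrates that check; the function name `read_count_checked`, the `min_item_size` parameter, and the 4-byte big-endian count are assumptions, not a description of what the LabVIEW primitive does.

```python
import io
import struct


def read_count_checked(stream, total_size: int, min_item_size: int = 1) -> int:
    """Read a 4-byte big-endian count and reject it if the file cannot hold it."""
    raw = stream.read(4)
    if len(raw) != 4:
        raise ValueError("file too short to contain a count")
    (count,) = struct.unpack(">I", raw)
    remaining = total_size - 4
    # Each item needs at least min_item_size bytes, so a count larger than
    # what the remaining bytes could hold cannot be valid.
    if count * min_item_size > remaining:
        raise ValueError(f"count {count} exceeds what {remaining} bytes can hold")
    return count


text_file = b"Some plain text that is not class data"
try:
    read_count_checked(io.BytesIO(text_file), len(text_file))
except ValueError as err:
    print("rejected:", err)
```

With a text file, the first four characters decode to a number far larger than the file itself, so the check fails on the first read instead of after a large allocation attempt.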

Link to comment

QUOTE (jdunham @ Mar 17 2009, 09:47 PM)

Your comments are definitely valid for true binary files, and that's what we have to use, with our own validation and metadata, since the LabVIEW formats, which could have saved us a lot of effort, were not really robust enough (and I didn't even know they could put up an out of memory dialog box, which makes it that much worse).

OK, this has been bugging me. I concede that the unflatten from file functions probably shouldn't have a bunch of validation info stuffed into the file.

But when you unflatten from a string OR from a file, and the memory manager fails, couldn't we get that through the error out wire rather than through a dialog box? AQ?

Link to comment
