OpenG Zip unable to detect file corruption


The decompression in OpenG Zip does not seem to verify the checksums of the files contained in the archive.

After changing a few bytes in the middle of various zip files, I get an error from other tools (tested on Windows and Linux RT), but OpenG Zip churns through them as if nothing is wrong. In one case the change even caused the decompression to generate a file that was a hundred times bigger than the true content.

Is there a quick fix for this, or would the entire library have to be updated (new DLL, etc.) to get such functionality?
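
For comparison, the corruption test described above can be reproduced outside LabVIEW. This is a minimal sketch using Python's standard `zipfile` module, not OpenG Zip; the entry name `data.bin`, the payload, and the helper names are made up for illustration. The entry is stored uncompressed so that a flipped byte can only surface as a CRC-32 mismatch.

```python
import io
import zipfile


def build_archive(name: str, payload: bytes) -> bytes:
    """Return a one-entry zip archive held in memory."""
    buf = io.BytesIO()
    # ZIP_STORED keeps the payload uncompressed, so damage to it shows up
    # as a CRC mismatch rather than as a broken deflate stream.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


def corrupt_byte(archive: bytes, offset: int) -> bytes:
    """Return a copy of the archive with the byte at offset inverted."""
    damaged = bytearray(archive)
    damaged[offset] ^= 0xFF
    return bytes(damaged)


def first_bad_entry(archive: bytes):
    """Return the name of the first entry failing its CRC-32 check, else None."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.testzip()


payload = b"sensor log line\n" * 64
good = build_archive("data.bin", payload)
# Invert one byte in the middle of the stored payload.
bad = corrupt_byte(good, good.find(payload) + len(payload) // 2)
```

Calling `first_bad_entry(good)` returns `None`, while `first_bad_entry(bad)` returns the name of the damaged entry. That is the behaviour the other tools show and that the test above did not see from OpenG Zip.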

I don't see a check in there at all.

It's not much of a change in the bowels of the LabVIEW code, but it does require a new CLFN to call the CRC function, which fortunately does exist in the DLL.

Thanks for the feedback.

On the project I'm working on, I decided to just add a header to the zip files. I am managing the transfer of the files and can just strip off that header anyway.

At first I wrote code to parse the headers of the zip archive and grab the necessary checksums, but I dropped that approach when I saw that the file checksums are calculated on the uncompressed data, meaning I would need to decompress the files first just to figure out whether the data was wrong. A better alternative would be the central directory checksum, I guess, but I went for an even easier solution in this case, as the added header gave me some extra benefits.

It would be nice to have file verification in the OpenG Zip library someday though.
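
The point about the per-file checksum covering the uncompressed data can be checked with a short sketch. It uses Python's standard `zipfile` and `zlib` modules rather than hand-written header parsing; the entry name and payload are illustrative assumptions.

```python
import io
import zipfile
import zlib


def entry_crcs(archive: bytes) -> dict:
    """Map each entry name to the CRC-32 recorded in the central directory."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return {info.filename: info.CRC for info in zf.infolist()}


payload = b"abc" * 1000
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data.bin", payload)
archive = buf.getvalue()

recorded = entry_crcs(archive)["data.bin"]
# The recorded value equals the CRC-32 of the original, uncompressed bytes,
# so it cannot be compared against the compressed stream as it arrives.
matches_uncompressed = recorded == zlib.crc32(payload)
```

Because `matches_uncompressed` is true, verifying an entry against this value means inflating it first, which is the cost mentioned above.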

Well, Rolf is particularly thorough, so in the interim, if it's Windows only, you could try

Zlib Library for LabVIEW 1.1.0

It has a VI that is compatible with the native LabVIEW one, but I didn't check the conpane for the OpenG ones.

I'd be interested to know whether it passes your tests too.

Edited by ShaunR

The project is on an sbRIO running Linux RT, which is partly why I preferred using the OpenG library.

(The device delivering the zip files to the sbRIO gives it an *extremely* short time to reply on whether the data is OK or not, so eliminating slow file operations is a must.
With the correct checksum in the added header, I now run the crc32 calculation continuously on the incoming data, which lets me verify the transfer instantly 🙂. A file size in the header also allows me to preallocate the file space up front, or deny the transfer at startup if there is not enough space for it anyway 👍)
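
The scheme described above (a prepended header carrying size and checksum, with the CRC updated as data arrives) might look roughly like this. The header layout, an 8-byte little-endian pair of payload size and CRC-32, is purely an assumption for illustration, as are all names; the actual header format is not given in this thread.

```python
import struct
import zlib

# Hypothetical header layout: payload size, then CRC-32, both uint32 LE.
HEADER = struct.Struct("<II")


def wrap(payload: bytes) -> bytes:
    """Sender side: prepend the size and CRC-32 of the payload."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


class TransferVerifier:
    """Receiver side: update the CRC chunk by chunk as data comes in."""

    def __init__(self, header: bytes):
        self.expected_size, self.expected_crc = HEADER.unpack(header)
        self.received = 0
        self.crc = 0

    def feed(self, chunk: bytes) -> None:
        # zlib.crc32 accepts the running value, so no buffering is needed.
        self.crc = zlib.crc32(chunk, self.crc)
        self.received += len(chunk)

    def ok(self) -> bool:
        return (self.received == self.expected_size
                and self.crc == self.expected_crc)


def receive(stream: bytes, chunk_size: int = 256) -> bool:
    """Simulate a chunked transfer and report whether it verified."""
    verifier = TransferVerifier(stream[:HEADER.size])
    body = stream[HEADER.size:]
    for start in range(0, len(body), chunk_size):
        verifier.feed(body[start:start + chunk_size])
    return verifier.ok()
```

Since the expected size is known before any payload arrives, the receiver can also reject a transfer up front when there is not enough free space, and the verdict is available the moment the last chunk has been fed.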

That only tells you if the bytestream has been modified in transit rather than if there is a corruption in the archive itself. 

The checksum (well, CRC to be correct) will be generated by the same software that generates the archive in this case - and is then run through tests locally to ensure it is OK.
So I feel confident in trusting the content from that point onwards if the CRC is OK, and the structure of the content is recognisable.

It is the transfer in this case that is highly exposed to corruption (it involves several weak protocols and complex layers which I cannot change, or at least not all of them at this stage) 😲

Edited by Mads

Surely it would have been easier to just add the CRC check in OpenG.

I'm not sure what is happening exactly, but the crc32 is calculated when extracting the data with unzReadCurrentFile() and then checked when closing the file entry with unzCloseCurrentFile(). If the crc32 doesn't match, this function should return UNZ_CRCERROR (-105). The only time this check is not done is with raw extraction, but the OpenG ZIP library only uses that when deleting a file entry from an archive. Deleting requires creating a new archive, and instead of inflating each of the remaining file entries and then deflating it again, which would require the password if an entry is password protected, the library simply retrieves the raw data stream and copies it into the new archive, without unzipping and decrypting and then encrypting and zipping it again.
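
The read-then-check-on-close pattern described above can be illustrated with a small Python analogue. This is not the minizip code and does not call it; `EntryReader`, `extract`, and `deflate_raw` are hypothetical names, and only the value -105 for UNZ_CRCERROR is taken from the post.

```python
import zlib

UNZ_OK = 0
UNZ_CRCERROR = -105  # value quoted in the post above


def deflate_raw(data: bytes) -> bytes:
    """Compress to a raw deflate stream (no zlib header or trailer)."""
    comp = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    return comp.compress(data) + comp.flush()


class EntryReader:
    """Hypothetical stand-in for one open archive entry."""

    def __init__(self, raw_deflate: bytes, expected_crc: int):
        self._inflater = zlib.decompressobj(wbits=-zlib.MAX_WBITS)
        self._data = raw_deflate
        self._pos = 0
        self._expected_crc = expected_crc
        self._crc = 0

    @property
    def exhausted(self) -> bool:
        return self._pos >= len(self._data)

    def read(self, size: int = 64) -> bytes:
        """Inflate the next slice of input and fold the output into the CRC."""
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        out = self._inflater.decompress(chunk)
        self._crc = zlib.crc32(out, self._crc)
        return out

    def close(self) -> int:
        """Compare the accumulated CRC with the recorded one."""
        return UNZ_OK if self._crc == self._expected_crc else UNZ_CRCERROR


def extract(reader: EntryReader):
    """Read an entry to the end and return (data, status from close)."""
    parts = []
    while not reader.exhausted:
        parts.append(reader.read())
    return b"".join(parts), reader.close()
```

In this sketch the caller only learns about a mismatch from the status returned on close, so a wrapper that ignores that return value would extract damaged data without complaint.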

Edited by Rolf Kalbermatter

Is that being done inside the DLL functions, rather than in the LabVIEW code? I don't see any checks in the LabVIEW code. Nor do I see the CRC being passed to a DLL function.

Edited by ShaunR

Yes, that is inside the unzip.c code from the minizip program, so there should be no need to do that again in the caller.

I will check how this code executes, as there are conditionals around this check: raw format is one exception, as the CRC can't be checked at that point, and password-protected entries take a different code path, as the CRC is also used as the encryption seed.

Indeed. In the vanilla unzip.c (minizip 1.2), the CRC is only used for the password so that files can be extracted. I had to calculate the CRC myself during extraction for integrity.
