OpenG Zip unable to detect file corruption

Mads · March 12, 2019

The decompression in OpenG Zip does not seem to verify the checksums of the files contained in the archive.

Having changed a few bytes in the middle of various zip files, I get an error from other tools (tested on Windows and Linux RT), but OpenG Zip will churn on as if nothing is wrong. In one case the change even caused the decompression to generate a file that was a hundred times bigger than the true content.

Is there a quick fix for this, or would the entire library have to be updated (new DLL, etc.) to get such functionality?
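
For reference, the kind of test described above can be reproduced outside LabVIEW. The following is a minimal sketch using Python's standard `zipfile` module as the "other tool"; the entry name, payload, and helper names are made up for illustration, and the entry is stored uncompressed so that a flipped byte can only show up as a CRC mismatch.

```python
import io
import zipfile


def make_archive(payload: bytes) -> bytes:
    """Build a one-entry zip archive in memory (entry stored, not deflated)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("data.bin", payload)  # hypothetical entry name
    return buf.getvalue()


def first_bad_entry(archive: bytes):
    """Return the name of the first entry failing its CRC check, else None."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.testzip()


payload = b"sensor log line\n" * 64
good = make_archive(payload)

# Flip one byte in the middle of the stored file data.
damaged = bytearray(good)
damaged[good.find(payload) + len(payload) // 2] ^= 0xFF
damaged = bytes(damaged)

print(first_bad_entry(good))
print(first_bad_entry(damaged))
```

A tool that verifies the per-file CRC reports the damaged entry here, which is the behaviour that seems to be missing from the OpenG extraction.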

ShaunR · March 12, 2019

I don't see a check in there at all.

It's not much of a change in the bowels of the LabVIEW code, but does require a new CLFN to call the CRC which fortunately does exist in the DLL.

Mads · March 14, 2019

Thanks for the feedback.

On the project I'm working on I decided to just add a header to the zip files. I am managing the transfer of the files myself, so I can just strip off that header anyway.

At first I wrote code to parse out the headers of the zip archive and grab the necessary checksums, but I dropped that approach when I saw that the file checksums are calculated on the uncompressed data, meaning I would need to decompress the files first just to figure out if the data was wrong. A better alternative would be the central directory checksum, I guess... but I went for an even easier solution in this case, as the added header gave me some extra benefits.
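
To illustrate the point about the per-file checksum, here is a small sketch that reads the CRC-32 field from a local file header and shows that it only matches the data after decompression. It assumes an archive written by Python's `zipfile` to a seekable buffer, so the CRC is present in the local header (archives that use a data descriptor leave that field zero); the entry name and payload are invented for the example.

```python
import io
import struct
import zipfile
import zlib

# Fixed 30-byte part of a zip local file header: signature, version, flags,
# method, mod time, mod date, CRC-32, compressed size, uncompressed size,
# file name length, extra field length (all little-endian).
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")

payload = b"measurement,42\n" * 200
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("log.csv", payload)  # hypothetical entry
archive = buf.getvalue()

(signature, _version, _flags, method, _mtime, _mdate,
 stored_crc, comp_size, uncomp_size, name_len, extra_len) = \
    LOCAL_HEADER.unpack_from(archive, 0)

# The compressed bytes follow the header, the file name and the extra field.
start = LOCAL_HEADER.size + name_len + extra_len
compressed = archive[start:start + comp_size]

# Raw deflate stream (no zlib wrapper), hence the negative window bits.
restored = zlib.decompress(compressed, -zlib.MAX_WBITS)

print(hex(stored_crc), hex(zlib.crc32(restored)))
```

The stored value can therefore only be checked once the entry has been inflated, which is why reading the headers alone does not give a cheap integrity test.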

It would be nice to have file verification in the OpenG Zip library someday though.

ShaunR · March 14, 2019

Well, Rolf is particularly thorough. In the interim, if it's Windows only, you could try

Zlib Library for LabVIEW 1.1.0

It has a VI compatible with the native LabVIEW one, but I didn't notice the conpane for the OpenG ones.

I'd be interested to know if it passes your tests too.

Edited March 14, 2019 by ShaunR

Mads · March 14, 2019

The project is on an sbRIO running Linux RT, which is partly why I preferred using the OpenG library.

(The device delivering the zip files to the sbRIO gives it an *extremely* short time to reply on whether the data is OK or not, so eliminating slow file operations is a must...
With the correct checksum in the added header, I now run the crc32 calculation continuously on the incoming data, which enables me to verify the transfer instantly 🙂. A file size in the header also allows me to preallocate the file space up front, or deny the transfer at startup if there is not enough space for it anyway 👍)
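
A rough sketch of that scheme, for anyone wanting to do something similar. The header layout here (little-endian CRC-32 followed by the payload size, 8 bytes in total) and all names are assumptions made for the example, not the format actually used in the project.

```python
import struct
import zlib

# Hypothetical transfer header: CRC-32 of the archive, then its size in bytes.
HEADER = struct.Struct("<II")


def wrap(archive: bytes) -> bytes:
    """Sender side: prepend the header to the archive bytes."""
    return HEADER.pack(zlib.crc32(archive), len(archive)) + archive


class TransferChecker:
    """Receiver side: keeps a running CRC-32 over the chunks as they arrive."""

    def __init__(self, header: bytes, free_space: int):
        self.expected_crc, self.expected_size = HEADER.unpack(header)
        if self.expected_size > free_space:
            # Deny the transfer up front if it cannot fit.
            raise OSError("not enough space for the announced transfer")
        self.crc = 0
        self.received = 0

    def feed(self, chunk: bytes) -> None:
        # zlib.crc32 accepts the previous value, so no buffering is needed.
        self.crc = zlib.crc32(chunk, self.crc)
        self.received += len(chunk)

    def ok(self) -> bool:
        return (self.received == self.expected_size
                and self.crc == self.expected_crc)


def receive(stream: bytes, chunk_size: int = 256,
            free_space: int = 1 << 20) -> bool:
    """Simulate a chunked transfer and report whether it verified."""
    checker = TransferChecker(stream[:HEADER.size], free_space)
    body = stream[HEADER.size:]
    for i in range(0, len(body), chunk_size):
        checker.feed(body[i:i + chunk_size])
    return checker.ok()
```

Because the CRC is updated per chunk, the verdict is available as soon as the last chunk arrives, with no second pass over the file.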

ShaunR · March 14, 2019

That only tells you if the bytestream has been modified in transit rather than if there is a corruption in the archive itself.

Mads · March 15, 2019

The checksum (well, CRC to be correct) will be generated by the same software that generates the archive in this case - and is then run through tests locally to ensure it is OK.
So I feel confident in trusting the content from that point onwards if the CRC is OK, and the structure of the content is recognisable.

It is the transfer in this case that is highly exposed to corruption... (it involves several weak protocols and complex layers which I cannot change, or at least not all of them at this stage) 😲

Edited March 15, 2019 by Mads

ShaunR · March 18, 2019

Surely it would have been easier to just add the CRC check in OpenG.

Rolf Kalbermatter · March 22, 2019

I'm not sure what is happening exactly, but the crc32 is calculated when extracting the data with unzReadCurrentFile() and then checked when closing the file entry with unzCloseCurrentFile(). If the crc32 doesn't match, this function should return UNZ_CRCERROR (-105).

The only time this check is not done is during raw extraction, but the OpenG ZIP library only uses that when deleting a file entry from an archive. Deleting requires creating a new archive, and inflating each of the remaining file entries and then deflating them again would require the password for any password-protected entry. Instead, the library simply retrieves the raw data stream and copies it over into the new archive, without unzipping/decrypting and then encrypting/zipping it again.
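
The flow described above can be modelled roughly as follows. This is a Python model of the pattern only, not the minizip code: the class and helper names are invented, and only the UNZ_CRCERROR value of -105 comes from the description.

```python
import zlib

UNZ_OK = 0
UNZ_CRCERROR = -105  # value quoted in the post above


class EntryReader:
    """Model of an open archive entry: CRC updated on read, checked on close."""

    def __init__(self, data: bytes, expected_crc: int, raw: bool = False):
        self._data = data              # stands in for the inflated entry data
        self._pos = 0
        self._expected_crc = expected_crc
        self._crc = 0
        self._raw = raw                # raw mode: bytes are copied unchecked

    def read(self, n: int) -> bytes:
        chunk = self._data[self._pos:self._pos + n]
        self._pos += len(chunk)
        if not self._raw:
            self._crc = zlib.crc32(chunk, self._crc)
        return chunk

    def close(self) -> int:
        if self._raw:
            return UNZ_OK              # no check is possible in raw mode
        return UNZ_OK if self._crc == self._expected_crc else UNZ_CRCERROR


def extract(data: bytes, expected_crc: int, raw: bool = False):
    """Read an entry to the end and return (bytes, status from close)."""
    reader = EntryReader(data, expected_crc, raw)
    out = bytearray()
    while chunk := reader.read(64):
        out += chunk
    return bytes(out), reader.close()
```

In this model the mismatch is only visible in the value returned at close, so a caller that discards that value would see nothing wrong. Whether that is what happens in OpenG Zip is not established here.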

Edited March 22, 2019 by Rolf Kalbermatter

ShaunR · March 23, 2019

Is that being done inside the DLL functions, rather than in the LabVIEW code? I don't see any checks in the LabVIEW code. Nor do I see the CRC being passed to a DLL function.

Edited March 23, 2019 by ShaunR

Rolf Kalbermatter · March 24, 2019

Yes, that is inside the unzip.c code from the minizip program, so there should be no need to do that again in the caller.

I will check how this code executes, as there are conditionals around it: raw format is one exception, since the CRC can’t be checked at that point, and password-protected entries take a different code path, since the CRC is also used as the encryption seed.

ShaunR · March 24, 2019

Indeed. In the vanilla unzip.c (minizip 1.2), the CRC is only used for the password so that files can be extracted. I had to calculate the CRC myself during extraction for integrity.
