Can we prevent "ZLIB Compress Dir" from replacing accented characters?

Antoine Chalons · February 1, 2013

Hi all,

I'm using OpenG ZLIB Compress Directory.vi to turn a folder (and its subfolders) into a ZIP file.

When some subfolders have characters like é in their names, those characters become something else (Ù) inside the zip file. Is there anything we can do to prevent this?

I guess the reason for this "issue" - or limitation - is that the zlib library uses ANSI strings and does not support multibyte or Unicode strings, fine. My question is: is there a way around that?

The LabVIEW zip VIs probably use zlib too, because they give the same result.

hooovahh · February 1, 2013

Do you want the crappy solution that works?

Attached is a VI that will compress a folder into a zip file, and it preserves characters like the one in your post.

It uses 7-Zip (woot!). It's just a command-line call to add files to an archive. 7-Zip also provides a DLL, but this was the quickest way to give you a solution. I embedded the 7-Zip EXE in the VI (a little shady, I know) just so it is self-contained. At one point I thought about wrapping all of the 7-Zip commands into LabVIEW VIs, but how often do I need added functionality that the LabVIEW native solution or the OpenG solution doesn't have?

7-Zip Zip Folder.vi
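
For readers without the attached VI, the same idea can be sketched as a plain command-line call from a script. This is only a sketch and not what the VI contains: it assumes a 7-Zip executable called `7z` is reachable on the PATH (the executable name and location are assumptions), and the function names are made up for illustration. `a` is 7-Zip's "add to archive" command and `-tzip` asks for the ZIP format.

```python
import subprocess
from pathlib import Path


def build_7zip_command(folder, archive, exe="7z"):
    """Return the argument list for: 7z a -tzip <archive> <folder>.

    'exe' is an assumption: point it at wherever 7-Zip lives on your machine.
    """
    return [exe, "a", "-tzip", str(Path(archive)), str(Path(folder))]


def zip_folder_with_7zip(folder, archive, exe="7z"):
    """Run 7-Zip and raise CalledProcessError if it reports a failure."""
    subprocess.run(build_7zip_command(folder, archive, exe), check=True)
```

Passing the arguments as a list rather than one string avoids quoting problems when the folder name contains spaces.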

Antoine Chalons · February 1, 2013

Do you want the crappy solution that works?

I do, it's for internal use!

Thanks a lot for that, I had no idea we could embed an EXE like that, neat!

:beer_mug:

hooovahh · February 1, 2013

Thanks a lot for that, I had no idea we could embed an EXE like that, neat!

Yeah, I've also tried it with VIs as well (not as useful, but I have a use case). From a user's perspective I can see why this is extremely shady: I could have a VI containing an EXE that is a keylogger, or a virus, or whatever (I didn't, just trust me; you can MD5 the EXE). But it is more encapsulated, so I don't need to make sure the EXE is in the same directory as the VI, and it is included when making a LabVIEW EXE.
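
If you would rather check than trust, hashing the extracted EXE and comparing it with the hash of an official 7-Zip download is straightforward. A minimal sketch, with a made-up function name; the reference hash to compare against is something you have to obtain yourself:

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```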

@neil I guess you're right about the DLL example. In either case I don't do it much.

Edited February 1, 2013 by hooovahh

Rolf Kalbermatter · February 4, 2013

Indeed, the zlib library, or more generally the zip addition to it, does not use MBCS functions. And there is a good reason for that: the maintainers want the library to compile on as many systems as possible, including embedded targets, which often have no MBCS support at all.

However I'll be having a look at it, since the main part of the naming generation is actually done on the LabVIEW diagram, and that may be in fact more relevant here than anything inside zlib. There might be a relatively small fix to the LabVIEW diagram itself or the thin zlib wrapper that could allow for MBCS names in the zip archive.

Antoine Chalons · February 4, 2013

Indeed, the zlib library, or more generally the zip addition to it, does not use MBCS functions. And there is a good reason for that: the maintainers want the library to compile on as many systems as possible, including embedded targets, which often have no MBCS support at all.

I understand the cross-platform concerns.

However I'll be having a look at it, since the main part of the naming generation is actually done on the LabVIEW diagram, and that may be in fact more relevant here than anything inside zlib. There might be a relatively small fix to the LabVIEW diagram itself or the thin zlib wrapper that could allow for MBCS names in the zip archive.

Sounds like good news.

No rush, though, now that I have a workaround.

ShaunR · February 4, 2013

And 64 bit support?

Rolf Kalbermatter · February 4, 2013

The issue is rather complicated. I can fairly easily add support for filenames in whatever codepage your Windows system currently uses as its default OEM codepage (which is how ZIP file names are supposed to be stored, whereas LabVIEW itself uses the ACP), but there is no simple way to support arbitrarily named files that cannot be represented in that codepage. Such files show up correctly on modern Windows systems with an NTFS filesystem, since the filenames are stored as UTF-16 there, but LabVIEW's file functions are still based on 8-bit codepages. If you try to open a file in LabVIEW whose name contains characters that are not representable in the current system codepage, LabVIEW fails fatally since it cannot reference such a file at all.
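
To make the ACP/OEM mismatch concrete, here is a small sketch outside LabVIEW. It assumes a Western European Windows setup, where the ANSI codepage is typically Windows-1252 and the OEM codepage is typically 850; other locales use other pairs. The helper name is made up.

```python
def misread(name, written_as="cp1252", read_as="cp850"):
    """Encode a name in one codepage and decode the same bytes in another."""
    return name.encode(written_as).decode(read_as)


name = "é"
print(name.encode("cp1252"))  # byte used by the ANSI codepage
print(name.encode("cp850"))   # byte used by the OEM codepage
print(misread(name))          # what a reader assuming OEM bytes would display
```

Plain ASCII names survive this unchanged, which is why the problem only shows up with accented characters.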

So in order to allow LVZIP to compress a directory containing such files into a ZIP file and vice versa, the entire directory enumeration and such would need to be done outside of LabVIEW in the C code in order to allow using the UTF filename feature in ZIP files. But adding an entire ZIP/UNZIP utility to the C code of LVZIP seems a bit like overkill to me.
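
For comparison, this is roughly what "doing the enumeration outside of LabVIEW" amounts to when a runtime with Unicode-aware file functions is available. It is a Python sketch, not the C code that LVZIP would need, and the function names are made up. Python's `zipfile` module stores non-ASCII entry names as UTF-8.

```python
import os
import zipfile


def zip_directory(folder, archive):
    """Add every file under 'folder' to 'archive', keeping relative paths.

    Empty directories are skipped in this sketch.
    """
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(folder):
            for filename in files:
                full_path = os.path.join(root, filename)
                # Store names relative to the folder being compressed.
                zf.write(full_path, os.path.relpath(full_path, folder))


def list_names(archive):
    """Return the sorted entry names stored in the archive."""
    with zipfile.ZipFile(archive) as zf:
        return sorted(zf.namelist())
```

The archive should be written somewhere outside the folder being compressed, otherwise the walk may pick up the half-written ZIP file itself.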

So the question is whether it is enough to support foreign characters for the system the file was created on, plus an optional setting to force Unicode filenames in the archive. But if you try to archive or unarchive files with characters in the name that can't be displayed by the current Windows codepage, then LabVIEW itself would fail catastrophically when I pass those names to the LabVIEW file I/O functions.

Also note that the same probably applies to the Mac too, and for Linux I don't even have an idea yet how to solve this. For the cRIO and Pharlap systems it most likely is not even an option.

It's too bad that the LabVIEW developers didn't change the internal file I/O API to use Unicode I/O functions and extend the Path datatype to support Unicode internally. Since it is a private datatype anyway, there would be very few backwards-compatibility issues: whoever relied on internal details of the Path data structure would have been out on a limb already.

Rolf Kalbermatter · February 11, 2013

So I've been fighting a bit over the weekend with this and came across a multitude of issues.

The first one is that most ZIP utilities, at least on Windows, seem to use the OEM codepage to store 8-bit text in the ZIP archive, whereas LabVIEW, as a true GUI application, of course uses the default ANSI codepage. Both are set depending on the language setting in the International Settings control panel but are usually totally different codepages, with similar character glyphs but typically at entirely different code positions. In addition, ZIP files have a feature to store the file name as a UTF-8 string in the archive directory. So far so good.
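
The UTF-8 feature mentioned here is signalled per entry by bit 11 (0x800) of the general purpose flags in the ZIP format. A small sketch that shows the flag in an archive written by Python's `zipfile`, which sets it whenever a name is not pure ASCII; the helper name is made up:

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general purpose bit 11: entry name is UTF-8


def utf8_flags(names):
    """Write empty entries to an in-memory ZIP and report bit 11 per name."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    # Re-open the same bytes for reading and inspect the stored flags.
    with zipfile.ZipFile(buffer) as zf:
        return {info.filename: bool(info.flag_bits & UTF8_NAME_FLAG)
                for info in zf.infolist()}


print(utf8_flags(["plain.txt", "é.txt"]))
```

Readers that ignore this bit fall back to an 8-bit codepage, which is one way accented names end up garbled.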

Implementing the correct translation from the LabVIEW ANSI codepage to the OEM codepage and back is fairly trivial on Windows. It is a bit more complicated on Mac OS X, and possible only with limited accuracy, since the Mac traditionally uses somewhat different character translation tables than Windows. On Linux it is impossible without linking to external libraries like iconv, which might or might not be available on a particular Linux distribution!
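
As an illustration of what such a translation does, and why it can lose information, here is a sketch using Python's built-in codec tables as a stand-in for whatever the operating system provides. The codepage pair (1252 and 850) and the function names are assumptions for a Western European setup.

```python
def ansi_to_oem(raw, ansi="cp1252", oem="cp850"):
    """Re-encode ANSI bytes as OEM bytes; unmappable characters become '?'."""
    return raw.decode(ansi, errors="replace").encode(oem, errors="replace")


def oem_to_ansi(raw, ansi="cp1252", oem="cp850"):
    """Reverse direction, with the same '?' fallback."""
    return raw.decode(oem, errors="replace").encode(ansi, errors="replace")


print(ansi_to_oem("é".encode("cp1252")))  # é exists in both codepages
print(ansi_to_oem("€".encode("cp1252")))  # € has no slot in code page 850
```

Any character present in only one of the two tables cannot survive the round trip, so the translation is lossy in general even on Windows.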

So I'm a bit in limbo here about how to go about this, because adding an entire codepage translator to LVZIP for non-Windows targets seems like rather bad overkill.

While investigating this I also found another issue entirely independent of LVZIP. Suppose you have a file on your disk with a filename that contains characters not present in the current ANSI codepage of your Windows system. There seems to be absolutely no way to access this file from within LabVIEW, since the LabVIEW path internally uses multibyte characters based on the current ANSI codepage, and if a filename contains characters not present in that codepage, the LabVIEW path cannot represent the filename at all.

In case you wonder why such filenames could even exist: unless you use an old FAT file format on your Windows system the filenames are really stored in UTF-16 in the filesystem and Windows Explorer is fully Unicode compliant, so those files happily can exist on the disk and get displayed by Explorer, but not accessed by LabVIEW.
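
Whether a given name is affected can be checked by trying to encode it in the codepage in question. A minimal sketch with a made-up helper name; Windows-1252 is assumed as the default, and 932 is the usual Japanese Windows codepage:

```python
def fits_codepage(name, codepage="cp1252"):
    """True if every character of 'name' exists in the given codepage."""
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


print(fits_codepage("résumé.txt"))       # accented Latin letters fit cp1252
print(fits_codepage("日本語.txt"))        # Japanese text does not
print(fits_codepage("日本語.txt", "cp932"))
```

The same file is therefore reachable on a machine set to a Japanese locale and unreachable on a Western European one, under the assumptions above.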

And in case you wonder whether this is an issue on non-Windows systems: on Linux definitely not nowadays, since all modern Linux systems use UTF-8 as encoding and it seems LabVIEW also uses whatever is the default multibyte encoding on the OS, which would be UTF-8 in those cases. For Mac OS X I'm not entirely sure, since there are about umpteen different possible APIs to access the filesystem, depending on whether you go Carbon, Cocoa, Posix or any mix of them, and each of them has its own particular limits and specialties.

I really wish they would have made the Path format use UTF-16 internally on Windows long ago and avoid such problems altogether, possibly translating the path to a multibyte encoding when needing to flatten a path in order to keep the Flattened format consistent. But at least all existing filepaths on the disk would be valid then within an application. As it is now, the flattened path isn't really standardized in any way anyhow, as it is flattened to whatever local multibyte setting the OS is configured for, on Windows that's one of the local codepages while on Linux and possibly Mac that's UTF-8 nowadays. So passing a Path through VI server between different LabVIEW installations will run into problems already between different platforms and even between Windows versions using different country locales. Making it all consistently UTF-8 in a flattened format would not really make this worse but rather improve the situation, with one single drawback: Flattened paths on Windows systems stored in older versions of LabVIEW would not automatically be compatible with LabVIEW versions using UTF-8 for flattened paths.
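
The difference between flattening to the local codepage and flattening to UTF-8 can be modelled in a few lines. This only models the text encoding, not LabVIEW's actual flattened path layout, and the codepages (1252 on the sender, 1251 on the receiver) are just example locales.

```python
def flatten_name(name, encoding):
    """Stand-in for flattening: turn a name into bytes."""
    return name.encode(encoding)


def unflatten_name(raw, encoding):
    """Stand-in for unflattening on the receiving machine."""
    return raw.decode(encoding)


name = "é"
# Sender and receiver each use their own local codepage: the name changes.
print(unflatten_name(flatten_name(name, "cp1252"), "cp1251"))
# Both sides agree on UTF-8: the name survives.
print(unflatten_name(flatten_name(name, "utf-8"), "utf-8"))
```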

Basically I would like to know two things here:

1) First, what is the feeling about support for translating the filename strings on non-Windows systems? Is that important, and how much effort is it worth? Consider that on embedded targets like VxWorks such translation would only be possible by adding a codepage translator to LVZIP.

2) Has anyone run into trying to access filenames containing characters that the current Windows multibyte table does not support, and if so, what solution did you choose?
