Text descriptions vs Code page

Mefistotelis · December 2, 2019

In LabVIEW 2014, I see that my VI files happily store strings using the current OS code page (I'm testing on Windows).

I assume there's no conversion mechanism, and if I use the VI on another OS, or just on a Windows with a different language, I will get unreadable mish-mash instead of the proper text?

Is that fixed in LV 2019, or does NI still avoid UTF?

I realize this is not an issue, or at least a minimal issue, when you describe things in English and avoid using character codes > 127 (though note that even the original LV libraries contain OS-specific char codes, mostly special variants of the apostrophe and the minus/dash: ` and -).

I thought the world had finished moving to Unicode more than 15 years ago...

EDIT: Verified on a Windows with a different language. I do get invalid characters. There is no conversion; LabVIEW just tries to print the characters using the current Windows code page, resulting in garbage. How is this still an issue today?
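
The effect described above can be reproduced outside LabVIEW. The following is a minimal Python sketch, assuming a string that was saved under code page 1250 and is later interpreted under code page 1252; the sample text and both code pages are only illustrative choices, not taken from an actual VI.

```python
# Text as typed on a system whose ANSI code page is 1250 (Central European).
original = "Zażółć gęślą jaźń"
stored = original.encode("cp1250")   # raw bytes, with no code page recorded alongside

# A system using code page 1252 interprets the very same bytes differently.
wrong = stored.decode("cp1252")
# Only a reader that knows the original code page recovers the text.
right = stored.decode("cp1250")

print(wrong)
print(right)
```

No byte is lost in the process: the wrongly decoded text still carries the original bytes, so the damage is reversible as long as the original code page is known.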

I need to provide the ability to set the code page as an input parameter to my VI reader.

EDIT2: Now I know everything I need - see the "Character Code Issues" chapter:

http://zone.ni.com/reference/en-XX/help/372614J-01/lvconcepts/porting_among_platforms/

It's funny how the lack of proper string conversion is called a "compatibility feature". Reading the chapter gives you the feeling that LabVIEW supports the conversion whenever possible, but that doesn't seem to be the reality.

Edited December 2, 2019 by Mefistotelis

JKSH · December 3, 2019

19 hours ago, Mefistotelis said:

Is that fixed in LV 2019, or does NI still avoid UTF?

LabVIEW NXG uses UTF-8 for all text strings.

I think classic LabVIEW (version 20xx) is unlikely to ever get full Unicode support.

Rolf Kalbermatter · December 4, 2019

No, classic LabVIEW doesn't, and it never will. It assumes a string to be in whatever encoding the current user session has. For most LabVIEW installations out there that is codepage 1252 (over 90% of LabVIEW installations run on Windows and most of them on Western Windows installations).

When LabVIEW classic was developed (around the end of the 1980s), codepages were the best thing out there that could be used for different installations, and Unicode didn't even exist. The first Unicode proposal is from 1988 and proposed a 16-bit Unicode alphabet. Microsoft was in fact an early adopter and implemented it for its Windows NT system as a 16-bit encoding based on this standard. Only in 1996 was Unicode 2.0 released, which extended the Unicode character space to 21 bits. LabVIEW does support so-called multibyte character encodings as used for many Asian codepages, and on systems like Linux, where nowadays UTF-8 (in principle also simply a multibyte encoding) is the standard user encoding, it supports that too, as this is transparent in the underlying C runtime. Windows doesn't let you set your ANSI codepage to UTF-8, however; otherwise LabVIEW would use that too (although I would expect that there could be some artefacts somewhere from assumptions LabVIEW makes when calling certain Windows APIs that might not match how Microsoft would have implemented the UTF-8 emulation for its ANSI codepage).

By the time the Unicode standard was mature and the various implementations on the different platforms were more or less working, LabVIEW's 8-bit character encoding based on the standard encoding was so deeply ingrained that full support for Unicode had turned into a major project of its own. There were several internal projects to work towards that, which eventually turned into a normally hidden Unicode feature that can be turned on through an INI token. The big problem was that the necessary changes touched just about every piece of code in LabVIEW somehow, and hence this Unicode feature does not always produce consistent results for every code path. There are also many unsolved issues where the internal LabVIEW strings need to connect to external interfaces. Most instruments, for instance, won't understand UTF-8 in any way, although that problem is one of the smaller ones, as the character set used is usually strictly limited to 7-bit ASCII, and there the UTF-8 standard is basically byte-for-byte compatible.

So you can dig up the INI key and turn Unicode on in LabVIEW. It will give extra properties for all control elements to set them to use Unicode text interpretation for almost all text (sub)elements, but the support doesn't, for instance, extend to paths and many other internal facilities unless the underlying encoding is already set to UTF-8. Also, strings in VIs, while stored as UTF-8, are not flagged as such, as non-Unicode-enabled LabVIEW versions couldn't read them, creating the same problem you have with VIs stored on a non-Western codepage system and then read on a system with a different encoding.
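
The byte-for-byte compatibility of 7-bit ASCII mentioned above is easy to check. This is a small Python sketch; the `*IDN?` query is just a typical instrument command chosen for illustration, and the list of encodings is an arbitrary sample.

```python
# A typical instrument query that uses only 7-bit ASCII characters.
command = "*IDN?\n"
encodings = ["ascii", "utf-8", "cp1252", "cp932"]

# Encode the same command under each encoding and compare the raw bytes.
payloads = {enc: command.encode(enc) for enc in encodings}
ascii_is_identical = len(set(payloads.values())) == 1

# As soon as a character above 127 appears, the byte sequences diverge.
unit = "µV"
unit_utf8 = unit.encode("utf-8")
unit_cp1252 = unit.encode("cp1252")

print(ascii_is_identical, unit_utf8, unit_cp1252)
```

As long as the instrument traffic stays within ASCII, the choice between these encodings makes no difference on the wire; the trouble starts only with characters outside that range.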

If Unicode support is an important feature for you, you will want to start to use LabVIEW NXG. And exactly because of the existence of LabVIEW NXG there will be no effort put in LabVIEW Classic to improve its Unicode support. To make it really work you would have to rewrite large parts of the LabVIEW code base substantially and that is exactly what one of the tasks for LabVIEW NXG was about.

Mefistotelis · December 4, 2019

Great overview, thank you.

What LabVIEW could have done is just store the information about the code page used for the VI. That would then allow conversion when the string is displayed on screen, if necessary. But since there's no info on the code page within the VI, I implemented it as a parameter (the tool I made exports a VI to XML, and within the XML the encoding is UTF-8):

https://github.com/mefistotelis/pylabview/blob/master/README.md#text-code-pages
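
The idea of passing the code page as a parameter and always emitting UTF-8 XML can be sketched roughly as follows. This is not the actual pylabview code: the function name `export_description`, the element names, and the sample text are hypothetical and only illustrate the approach.

```python
import xml.etree.ElementTree as ET

def export_description(raw: bytes, codepage: str) -> bytes:
    """Decode VI text with a caller-supplied code page and emit UTF-8 XML.

    Hypothetical helper; the real tool's interface may look different.
    """
    # Undecodable bytes become U+FFFD instead of aborting the export.
    text = raw.decode(codepage, errors="replace")
    root = ET.Element("VI")
    ET.SubElement(root, "Description").text = text
    return ET.tostring(root, encoding="utf-8")

# The reader cannot detect the code page, so the user has to supply it.
raw_text = "Pomiar napięcia".encode("cp1250")
xml_bytes = export_description(raw_text, "cp1250")
print(xml_bytes.decode("utf-8"))
```

Once the text is in the XML file, it no longer depends on the code page of the machine that opens it.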

5 hours ago, Rolf Kalbermatter said:

over 90% of LabVIEW installations run on Windows and most of them on Western Windows installations

I think if you don't live in a Western country, you might have a different view on that. Even if the 90% statistic is true (it sounds iffy to me, but I have no data to refute it), people from other countries probably mostly see VIs created near them.

I've seen a lot of VIs with text in an Asian language, probably Chinese. I can't even read the script, so I can't tell for sure which code page is used. Sometimes, when there's a longer description, I can just check with Google Translate; but often there are only 1-2 words, e.g. in names of interfaces. The translator then either gives something plausible for many code pages, or a non-technical term for all of them. When I take a connector name and see "stick noodles" for the Chinese translation (with their code page) and "fluffy neckcloth" for the Japanese one (with their code page), I still can't tell the origin of that file. Nor what the connector does. (I never got these specific words, but what I got wasn't very far off.)
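
The guessing problem can be illustrated with a brute-force attempt. The sketch below is an assumption-laden example, not part of any existing tool: the helper name `plausible_decodings`, the sample label, and the candidate list are all made up. It simply tries several code pages and keeps every one that decodes without error.

```python
def plausible_decodings(raw: bytes, candidates):
    """Return {code page: text} for every candidate that decodes cleanly."""
    results = {}
    for codepage in candidates:
        try:
            results[codepage] = raw.decode(codepage)
        except UnicodeDecodeError:
            pass  # these bytes are not valid in this code page
    return results

# A short label as it might be stored by a system using GBK.
label = "测试".encode("gbk")
guesses = plausible_decodings(label, ["ascii", "gbk", "cp932", "big5", "cp949"])
for codepage, text in guesses.items():
    print(f"{codepage}: {text!r}")
```

A clean decode only proves that the bytes are valid in that code page, not that the result is the intended text, which is why one or two words are rarely enough to identify the origin.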

Anyway, there's a lot of people in China. And a lot of factories, which do semi-automated Quality Assurance. Such tasks might be often handled with help of LabVIEW.

Rolf Kalbermatter · December 5, 2019

22 hours ago, Mefistotelis said:

Great overview, thank you.

What LabVIEW could have done is just store the information about the code page used for the VI. That would then allow conversion when the string is displayed on screen, if necessary. But since there's no info on the code page within the VI, I implemented it as a parameter (the tool I made exports a VI to XML, and within the XML the encoding is UTF-8):

That's not as easy, even if you leave aside platforms other than Windows. In the old days Windows did not have support preinstalled for all possible codepages, and I'm not sure it does even nowadays. Without the necessary translation tables it doesn't help to know which codepage text is stored in, so translation into something else is not guaranteed to work. Also, the codepage support as implemented in Windows does not allow you to display text in a codepage other than the currently active one, and even if you could switch the current codepage on the fly, all text previously printed on screen in another codepage would suddenly look pretty crazy. Microsoft initially had Unicode support only on the Windows NT platform (which wasn't initially supported by LabVIEW at all). They only added a Unicode shim to the Windows 9x versions (which were 32-bit like Windows NT but with a somewhat Windows 3.1 compatible 16/32-bit kernel) around 2000, through a special library called Unicows (probably for Unicode for Windows Subsystem) that you could install. Before that, Unicode was not even available on Windows 95, 98 and ME, which were the majority of platforms LabVIEW was used on once 3.1 was dying out. LabVIEW on Windows NT was hardly used, even though LabVIEW was technically the same binary as for the Windows 9x versions. But the hardware drivers needed were completely different, and most manufacturers other than NI were very slow to start supporting their hardware on Windows NT. Windows 2000 was the first NT version that saw a little LabVIEW use, and Windows XP was the version where most users definitely abandoned Windows 9x and ME for measurement and industrial applications.

That would only have worked if LabVIEW for Windows used the UTF-16 API internally everywhere, which is the only Windows API that allows displaying any text on screen independent of codepage support, and this was exactly one of the difficult parts to change in LabVIEW. LabVIEW is not a simple notepad editor where you can define the compile-time symbol UNICODE and suddenly everything is using the Unicode APIs. There are deeply ingrained assumptions that entered the code base in the initial porting effort, which used the 32-bit DOS-extended Watcom C to target the 16-bit Windows 3.1 system. That system only had codepage support and no Unicode API whatsoever, and neither had the parallel Unix port for SunOS, which was technically Unix SVR4 but with many special Sun modifications, adaptations and quirks built in. That port eventually allowed releasing a Linux version of LabVIEW without having to write an entirely new platform layer, but even Linux didn't have working Unicode support initially. It took many years before that was more or less standard in Linux distributions, and many more years before it was stable enough that Linux distributions started to use UTF-8 as the standard encoding rather than the C runtime locales, so nicely abbreviated as en_US and similar, which had no direct mapping to codepages at all.

But Unix, while not having any substantial Unicode support for a long time, eventually took a completely different path to support Unicode than Microsoft had. And the Mac port only got useful Unicode support after Apple eventually switched to their BSD-based Mac OS X. And neither of them really knew anything about codepages at all, so a VI written on Windows and stored with the actual codepage inside would have been just as unintelligible to those non-Windows LabVIEW versions as it is now. Also, in true Unix (Linux) fashion, they of course couldn't agree on one implementation for a conversion API between different encodings; instead there were multiple competing ones, such as ICU and several others. Eventually libc also implemented a limited conversion facility, although it does not allow you to convert between arbitrary encodings but only between widechar (usually 32-bit Unicode) and the currently active C locale. Sure, you can change the current C locale in your code, but that is process global, so it also affects how libc will treat text in other parts of your program, which can be a pretty bad thing in multithreaded environments.

Basically, your proposed codepage storing wouldn't work at all for non-Windows platforms, and even under Windows it has, and certainly had in the past, only very limited merit. Your reasoning is just as limited as NI's original choice was when they had to come up with a way to implement LabVIEW with what was available then. Nowadays the choice is obvious, and UTF-8 is THE standard to transfer text across platforms and over the whole world, but UTF-8 only became a viable and widely used feature (and, because it was used, also a tested, tried and many times patched one that works as the standard intended) in the last 10 to 15 years. At that time NI was starting to work on a rewrite of LabVIEW, which eventually turned into LabVIEW NXG.

Edited December 5, 2019 by Rolf Kalbermatter

Mefistotelis · December 5, 2019

I don't think I can be completely convinced of your point: I agree refactoring LV doesn't make sense now, but I think something should have been done years ago to allow the support.

Even if no conversion was done at the time, as soon as multi-lingual versions of Windows and Mac OS started popping up, it was obvious that conversion would be an issue. I'm not saying there should have been a conversion right away, just that even then it was obvious that storing just the information about which code page is in use would be a prudent choice.

Now for really implementing the conversion: it wouldn't be necessary for the OS to support anything - `libiconv` can be compiled even with Watcom C (I'm not stating libiconv should have been used, only that doing code page conversion even in DOS was not an issue). Around 1994, I wrote a simple code page conversion routine myself, in Turbo Pascal. Not for Unicode; it converted directly between a few code pages, with a few simple translation tables. It also had a function to convert to pure ASCII - replacing every national character with the closest English symbol (or group of symbols). That would be good enough support for pre-Unicode OSes - it wasn't really necessary to support all Unicode characters, only to allow portability between the platforms which LabVIEW supported. Finally, I don't think LabVIEW uses native controls (buttons etc.) from the OS - it treats the native window as a canvas and draws its own controls. So support for multi-lingual text in controls is not bound to the OS in any way.

For implementation details within LabVIEW: that would be more tricky, and I understand possible issues with that. LabVIEW operates on binary data from various sources, and if the VI tells it to print a string, it doesn't keep track of whether that string came from the VI and has a known code page, or came from a serial port with a device talking in a different encoding. There are still ways to solve such issues, just not completely transparently for the user. Plus, most strings used in the user interface don't really change at runtime.

I didn't actually know that the LabVIEW I'm using is considered the "classic" version and is being replaced by NXG. That is a strong argument against any refactoring of the old code.
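
A table-driven conversion of the kind described above can be sketched in a few lines. This is a Python approximation, not the original Turbo Pascal routine: the helper names `build_table` and `to_ascii` and the `EXTRA_ASCII` mapping are hypothetical, and the table is derived from Python's built-in codecs rather than written by hand.

```python
import unicodedata

def build_table(src: str, dst: str, fallback: bytes = b"?") -> bytes:
    """Build a 256-entry byte translation table between two single-byte code pages."""
    table = bytearray(256)
    for value in range(256):
        try:
            table[value] = bytes([value]).decode(src).encode(dst)[0]
        except UnicodeError:
            table[value] = fallback[0]   # no counterpart in the target code page
    return bytes(table)

# Hypothetical extras for letters that have no Unicode decomposition.
EXTRA_ASCII = {"ł": "l", "Ł": "L", "ß": "ss", "ø": "o", "Ø": "O"}

def to_ascii(text: str) -> str:
    """Replace national characters with the closest plain ASCII letters."""
    out = []
    for char in text:
        char = EXTRA_ASCII.get(char, char)
        for part in unicodedata.normalize("NFKD", char):
            if not unicodedata.combining(part):   # drop accents and similar marks
                out.append(part if part.isascii() else "?")
    return "".join(out)

dos_bytes = "Zażółć gęślą jaźń".encode("cp852")                   # DOS code page
win_bytes = dos_bytes.translate(build_table("cp852", "cp1250"))  # Windows code page
print(win_bytes.decode("cp1250"), "|", to_ascii("Zażółć gęślą jaźń"))
```

The table approach only works between single-byte code pages, and every character without a counterpart in the target collapses to the fallback byte, which is the lossy case raised in the reply below.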

The conversion I introduced to my extractor works well, so this shouldn't be much of an issue for me.

Rolf Kalbermatter · December 5, 2019

1 hour ago, Mefistotelis said:

For implementation details within LabVIEW: that would be more tricky, and I understand possible issues with that. LabVIEW operates on binary data from various sources, and if the VI tells it to print a string, it doesn't keep track of whether that string came from the VI and has a known code page, or came from a serial port with a device talking in a different encoding. There are still ways to solve such issues, just not completely transparently for the user. Plus, most strings used in the user interface don't really change at runtime.

Huh? If that were the case, you wouldn't need a string control or a text table, and just about every other control except booleans and numerics has some form of text somewhere by default, or properties to change the caption of controls, axis labels, etc. In that case adding Unicode support would indeed have been a lot easier. But the reality is very different!

Also, your conversion tool might have been pretty cool, but converting between a few codepages wouldn't have made a difference. If you have text that comes from one codepage, you are bound to have characters that you simply can't convert into other codepages, so what do you do about that? LabVIEW, for instance, links dynamically to files based on file names. How do you deal with that? The OS does one type of conversion depending on the underlying filesystems involved and a few other parameters, and there is no loss only if both filesystems fully support Unicode and any transfer method between the two is fully Unicode transparent. That certainly wasn't always true even a few years back. Then the Unicode (if it was even fully preserved) is translated back to whatever codepage the user has set his system to for applications like LabVIEW which use the ANSI API. Windows helpfully translates every character it can't represent in that codepage into a question mark, except that the question mark is officially not allowed in path names. LabVIEW stores filenames in its VIs, and if LabVIEW used a home-cooked conversion, it would be bound to produce some conversions that differ from what Windows or your Linux system might come up with. Even the Windows Unicode translation tables contained, and still contain, deviations from the official Unicode standard. They are not fully consistent with implementations like ICU or libiconv, for instance. And they probably never will be completely, because Microsoft is bound to legacy compatibility just as much, and changing things now would burn some big customers. And that is just the problem of filenames. There are many, many more such areas for which there are no really clean solutions. In many cases no solution is better than a half-baked one that might make you feel safe only to let you fall badly on your nose.

The only fairly safe solution is to go completely Unicode. Any other solution either falls immediately flat on its nose (e.g. codepage translation) or has been superseded by Unicode and is not maintained anymore. That's the reality. And just for fun, even Unicode can be tricky, for instance when it comes to collation. Comparing strings just on code points is a sure way to fail, as you have so-called non-spacing (combining) code points that, combined with other code points, form a character. Except that Unicode also contains single code points for many of these characters. Looking at the binary representation the strings surely look different, but logically they are not! I'm not even sure Windows uses any collation when trying to locate filenames. If it doesn't, it might be unable to find a file based on a path name even though the name stored on disk and visible for instance in Explorer looks textually exactly the same as the name you passed to the file API. But without proper collation they simply are not the same, and you would get a File Not Found error! WTF, the file is right there in Explorer!

As to solving encoding when interfacing to external interfaces (instrument control, network, file IO, etc.), there are indeed solutions to that, by specifying an encoding at these interfaces. But I haven't seen one that really convinced me as being easy to use. Java (and even .Net, which was initially just a not-from-Sun version of Java from Microsoft), for instance, uses a string to indicate the encoding to use, but that string has traditionally not been very well defined, and there are various variants that mean basically the same but look very different. The actual support that comes standard with Java is also pretty limited, since it has to work on many different platforms that might have very little to no native support for this. .Net has since gained a lot more support, but that hasn't made it simpler to use.

And yes, the fact that LabVIEW used to be multiplatform didn't make this whole business any easier to deal with. While you could sell to customers that ActiveX and .Net were simply technically impossible on platforms other than Windows, that wouldn't have gone down well for things like Unicode support and many other things. Yet the underlying interfaces are very different on the different platforms and in some cases even conceptually different.
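
The question-mark problem for file names can be imitated in Python. This is only an approximation under stated assumptions: `errors="replace"` stands in for the substitution described above, the actual Windows conversion may pick other replacement characters for some input, and both file names are invented.

```python
# Two different Unicode file names, as they might exist on disk.
name_a = "файл.vi"
name_b = "данные.vi"

# Squeeze them into code page 1252; unrepresentable characters become "?".
ansi_a = name_a.encode("cp1252", errors="replace")
ansi_b = name_b.encode("cp1252", errors="replace")

print(ansi_a, ansi_b)
```

After the conversion the original names can no longer be recovered, and the results contain a character that is not allowed in Windows path names, so a path stored this way cannot be used to find the file again.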
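
The mismatch between strings that look the same but differ in their code points can be shown with Python's `unicodedata` module. In Unicode terminology the fix is usually called normalization; the file name below is an arbitrary example.

```python
import unicodedata

# The same visible name, once with a precomposed "é" and once as "e"
# followed by the combining acute accent U+0301.
composed = "caf\u00e9.vi"
decomposed = "cafe\u0301.vi"

same_code_points = composed == decomposed          # naive comparison
same_after_nfc = (unicodedata.normalize("NFC", composed)
                  == unicodedata.normalize("NFC", decomposed))

print(len(composed), len(decomposed), same_code_points, same_after_nfc)
```

A lookup that compares raw code points treats these as two different names, while a lookup that normalizes both sides first treats them as equal.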
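
The loosely defined encoding names are not specific to Java. As an illustration in Python (chosen here only for consistency with the other examples, not because the thread discusses it), several spellings resolve to the same codec through `codecs.lookup`:

```python
import codecs

# Different spellings that are commonly used for the same encodings.
spellings = ["utf8", "UTF-8", "windows-1252", "cp1252", "latin1", "iso-8859-1"]

# Map every spelling to the canonical name of the codec it resolves to.
canonical = {name: codecs.lookup(name).name for name in spellings}
for name, resolved in canonical.items():
    print(f"{name!r} -> {resolved!r}")
```

Which aliases are accepted differs between languages and libraries, so a name that works in one environment is not guaranteed to work in another.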

Edited December 5, 2019 by Rolf Kalbermatter

ShaunR · December 6, 2019

3 hours ago, Mefistotelis said:

it doesn't keep track of whether that string came from the VI and has a known code page, or came from a serial port with a device talking in a different encoding

We can cope with that. In fact, we can cope with UTF-8 everywhere except the front panels.

UTF8 LV80.vi
