
To Upper Case is polymorphic



:blink: Wow, that's a new one for me... I couldn't believe it at first, but it does work. I also just tested it with a cluster containing strings and it works too (only if the cluster contains just string and numeric types, but hey)!

I've never used it for anything other than string or array of string up till now.

Thanks for sharing Neil!

Link to comment

Just been playing around with this; it works with To Lower Case too.

 

Not quite sure where this would be used, though. According to the LabVIEW help:

 

To Upper Case Details

If string is a numeric value or an array of numeric values, each number is evaluated as an ASCII value. The To Upper Case function translates all values in the range of 97 to 122 into values over the range of 65 to 90. It also translates any other value in the extended ASCII character set that has an uppercase counterpart, such as lowercase alphabetic characters with accents.
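For anyone who prefers to see that rule spelled out in text, here is a rough sketch in Java (not LabVIEW, and with made-up data) of the mapping the help describes: each value is treated as an ASCII code and 97..122 is shifted down to 65..90.

    // Sketch only: mimics what the help text says To Upper Case does with numerics.
    import java.nio.charset.StandardCharsets;

    public class ToUpperOnBytes {
        public static void main(String[] args) {
            byte[] data = {104, 101, 108, 108, 111};      // "hello" as ASCII values
            for (int i = 0; i < data.length; i++) {
                int v = data[i] & 0xFF;                   // unsigned value, like a U8
                if (v >= 97 && v <= 122) {                // 'a'..'z'
                    data[i] = (byte) (v - 32);            // map to 'A'..'Z'
                }
            }
            System.out.println(new String(data, StandardCharsets.US_ASCII)); // HELLO
        }
    }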

 

Any ideas where you could use this?

 

Greg

Link to comment

Today I learned you can pass an I32 into the To Upper Case prim. Did this totally by accident.

 

Has anybody ever used this feature before? 

 

I have used it many times, along with the fact that many of the string-comparison nodes accept numbers as well:

[attached screenshot: post-5958-0-00011800-1398422886.png]

 

This can be quite useful when processing strings, e.g. stream data, and you want to stay in the U8-array domain for speed.
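As a rough illustration of that byte-domain idea (a Java sketch rather than G, with invented data): scan the received bytes directly, without ever building a string.

    // Sketch only: find a CR/LF terminator in a received byte stream while
    // staying in the byte domain. Data and names are made up.
    public class ByteDomainScan {
        public static void main(String[] args) {
            byte[] rx = {0x41, 0x42, 0x43, 0x0D, 0x0A, 0x44};   // "ABC\r\nD"
            int term = -1;
            for (int i = 0; i + 1 < rx.length; i++) {
                if (rx[i] == 0x0D && rx[i + 1] == 0x0A) {        // CR followed by LF
                    term = i;
                    break;
                }
            }
            System.out.println("terminator at index " + term);   // prints 3
        }
    }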

 

/J

  • Like 2
Link to comment

Interesting, that's a new one for me.

 

Yesterday I also discovered that the string-number conversion primitives work on arrays as well.

 

[attached image: Convert.png]

 

Sort of blew my mind as well. Never stop learning...

 

I should have been following the micro-nuggets thread on the NI forums; somebody even pointed this out there:

 

http://forums.ni.com/t5/LabVIEW/Micro-Nuggets-Post-em-if-you-got-em/m-p/1880159#M635520

 

This lets me replay a conversation I have had on numerous occasions:

Me: Strings in LV are really byte arrays; you should be able to auto-index them and pass them into any function which accepts an array of bytes.

NI (usually AQ):  Oh no, strings may appear to be byte arrays, but they aren't really.

Me: That is funny, all of the byte protocols such as TCP/IP use strings as the data type. OK, prove it: give me a single example where String->[U8]->Array Size does not equal String Length. Any INI keys, magic strings, Unicode this or that.

NI:  crickets chirping

 

This thread shows some string functions are also very useful for numerics. I think there should be complete interchangeability. The surprises should come when the polymorphism does not exist, not when it does. Besides, there is so much more to searching arrays than finding an exact match; when you unleash the pattern matching normally reserved for strings on numeric arrays, amazing things are possible.
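A hedged sketch of what that can look like outside LabVIEW, in Java: the bytes are mapped one-to-one onto chars via ISO-8859-1, and the regex engine then effectively pattern-matches the numeric array. The data and the "burst" pattern are invented for illustration.

    // Sketch only: regex "pattern matching" on numeric (byte) data by mapping
    // each byte 1:1 onto a char via ISO-8859-1. Data and pattern are made up.
    import java.nio.charset.StandardCharsets;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexOnBytes {
        public static void main(String[] args) {
            byte[] samples = {0, 0, 5, 7, 6, 0, 0, 2, 0};
            String asChars = new String(samples, StandardCharsets.ISO_8859_1);
            // Find a "burst": three or more consecutive non-zero values.
            Matcher m = Pattern.compile("[^\\x00]{3,}").matcher(asChars);
            if (m.find()) {
                System.out.println("burst at " + m.start() + ", length " + m.group().length());
            }
        }
    }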

 

This does bring up a sore spot for me, though: using the Number to HexString function with an array incurs a brutal coercion:

[attached screenshot: post-26690-0-14230500-1398447597.png]

 

It's actually much faster to wrap the scalar version in a For Loop in this case.

Link to comment

It takes clusters too? Wow, think of the environmental impact of all those unnecessary loops and cluster primitives I've used. I've been living in decadence.

 

This thread shows some string functions are also very useful for numerics. I think there should be complete interchangeability. The surprises should come when the polymorphism does not exist, not when it does. Besides, there is so much more to searching arrays than finding an exact match; when you unleash the pattern matching normally reserved for strings on numeric arrays, amazing things are possible.

 

Agreed. Strings and one-dimensional arrays should be far more interchangeable. I have more than once wanted to use the string pattern-matching capabilities on integer arrays, for example.

Link to comment
This lets me replay a conversation I have had on numerous occasions:

Me: Strings in LV are really byte arrays; you should be able to auto-index them and pass them into any function which accepts an array of bytes.

NI (usually AQ):  Oh no, strings may appear to be byte arrays, but they aren't really.

Me: That is funny, all of the byte protocols such as TCP/IP use strings as the data type. OK, prove it: give me a single example where String->[U8]->Array Size does not equal String Length. Any INI keys, magic strings, Unicode this or that.

NI:  crickets chirping

 

I'm not sure if I agree or disagree with the auto-indexing. I see strings in LabVIEW as a class ultimately representing an array of bytes, but not actually being an array of bytes themselves. To this point, adding the auto-indexing ability to strings seems a bit off, because the string itself is just one thing. That said, I understand your side of the argument as well; if it represents a byte array, shouldn't you be able to work on it like it's a byte array? How is this handled in other languages when you iterate on a string class? Even in Java I think you have to convert to a character array, no? Or at least iterate over charAt(i). This is just a guess. It is an interesting question nonetheless.

 

Just curious, how would you implement autoindexing of a string? Would it return a new "char" data type, a u8, either, etc?
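For reference, this is roughly how the Java side looks (my own sketch, not from the thread): charAt(i)/toCharArray() give you UTF-16 code units, while codePoints() gives you whole characters.

    // Sketch only: the ways Java lets you "index" a string, for comparison.
    public class StringIteration {
        public static void main(String[] args) {
            String s = "héllo";                       // non-ASCII to make the point
            for (char c : s.toCharArray()) {          // UTF-16 code units
                System.out.print((int) c + " ");
            }
            System.out.println();
            System.out.println(s.charAt(1));          // explicit indexing: 'é'
            s.codePoints()                            // full Unicode code points
             .forEach(cp -> System.out.print(cp + " "));
            System.out.println();
        }
    }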

Edited by GregFreeman
Link to comment
Just curious, how would you implement autoindexing of a string? Would it return a new "char" data type, a u8, either, etc?

 

Well, while we're talking about pipe dreams, I'd have auto-indexing of a string return a (sub)string, thus potentially allowing multi-byte characters and in-placeness.
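Spelled out as a textual sketch (Java, my own invention), "auto-indexing returns a (sub)string" could look roughly like iterating by code point and handing out one-character substrings, which keeps multi-byte characters intact:

    // Sketch only: iterate a string as a sequence of (sub)strings, one per code
    // point, so multi-byte characters stay intact. Example text is made up.
    public class SubstringIteration {
        public static void main(String[] args) {
            String s = "a€𝄞";                 // 1-, 3- and 4-byte characters in UTF-8
            int i = 0;
            while (i < s.length()) {
                int cp = s.codePointAt(i);
                int next = i + Character.charCount(cp);
                String element = s.substring(i, next);   // what "auto-indexing" would yield
                System.out.println(element);
                i = next;
            }
        }
    }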

Link to comment
Besides, there is so much more to searching arrays than finding an exact match; when you unleash the pattern matching normally reserved for strings on numeric arrays, amazing things are possible.

 

Wow, pattern matching for numeric arrays. I'm loving it!

If only we had a proper regex function that let us do the cool stuff we're used to (or not) from Perl! xkcd

Link to comment
This lets me replay a conversation I have had on numerous occasions:

Me: Strings in LV are really byte arrays; you should be able to auto-index them and pass them into any function which accepts an array of bytes.

NI (usually AQ):  Oh no, strings may appear to be byte arrays, but they aren't really.

Me: That is funny, all of the byte protocols such as TCP/IP use strings as the data type. OK, prove it: give me a single example where String->[U8]->Array Size does not equal String Length. Any INI keys, magic strings, Unicode this or that.

NI:  crickets chirping

 

Currently you are quite right. However, once NI adds Unicode support (if and when they do), you will run into problems if you just assume that string == byte array. So better get used to the idea that they might not be the same.  :D

 

And there is in fact an INI key that adds preliminary Unicode support to LabVIEW; however, it still causes more trouble than it solves, for many reasons, among them the following:

 

The problem for NI here is that the traditional "string == byte array" principle has produced a lot of legacy code that is basically impossible not to break when adding Unicode support. There was once a discussion where AQ proposed a radical change to string handling in LabVIEW to allow proper support of Unicode. All byte-stream nodes such as VISA Read and Write and TCP Read and Write etc. would change to accept byte arrays as input, and there would probably be a new string type that could represent multi-byte and wide-char strings, while the current string type would slowly get deprecated.

 

Difficulties here are that the various LabVIEW platforms support different types of wide chars (UTF-16 on Windows, UTF-32 on Unix, and at most UTF-8 on most real-time systems), so handling those differences in a platform-independent manner is a big nightmare.

Suddenly, string length can mean either byte length, which differs between platforms, or character length, which is quite time-consuming to calculate for longer strings. Most likely, when flattening/converting strings to a byte-stream format, they would have to be translated to UTF-8, which is the lowest common denominator for all LabVIEW platforms (and the standard format for the web nowadays).
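A small illustration of that ambiguity (a Java sketch with invented text): the character count, the UTF-8 byte count and the UTF-16 byte count of the same string all differ.

    // Sketch only: character count vs. byte count of the same text in different
    // encodings. The sample text is made up.
    import java.nio.charset.StandardCharsets;

    public class LengthAmbiguity {
        public static void main(String[] args) {
            String s = "Größe 5µ";
            System.out.println("characters  : " + s.codePointCount(0, s.length()));      // 8
            System.out.println("UTF-8 bytes : " + s.getBytes(StandardCharsets.UTF_8).length);
            System.out.println("UTF-16 bytes: " + s.getBytes(StandardCharsets.UTF_16LE).length);
        }
    }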

 

All in all, it is a very large and complicated undertaking, but one NI has certainly been working on in the background for some years already. Why they haven't started changing the byte-stream nodes to at least also accept byte arrays, or maybe better, to take byte arrays only, I'm not sure.

Link to comment
I'm not sure if I agree or disagree with the auto-indexing. I see strings in LabVIEW as a class ultimately representing an array of bytes, but not actually being an array of bytes themselves. To this point, adding the auto-indexing ability to strings seems a bit off, because the string itself is just one thing. That said, I understand your side of the argument as well; if it represents a byte array, shouldn't you be able to work on it like it's a byte array? How is this handled in other languages when you iterate on a string class? Even in Java I think you have to convert to a character array, no? Or at least iterate over charAt(i). This is just a guess. It is an interesting question nonetheless.

Just curious, how would you implement autoindexing of a string? Would it return a new "char" data type, a u8, either, etc?

This is all nice and good as long as you can assume that you deal with ANSI strings only (with, optionally, an extended character set, which is however codepage-dependent and therefore absolutely not transparently portable from one computer to the next). And it is not even fully right in LabVIEW now, since LabVIEW really uses multi-byte (MBCS) encoding. So auto-indexing over a string has a serious problem: some would expect it to return bytes, while I would expect it to return characters, which is absolutely not the same thing in MBCS and UTF encodings. The only way to represent an MBCS or UTF character as a single numeric on any platform would ultimately be to use UTF-32 encoding, which requires 32-bit characters, but not all platforms on which LabVIEW runs support that out of the box, and adding iconv or ICU support to a real-time platform has some far-reaching consequences in terms of extra dependencies and performance.

 

Java internally uses exclusively Unicode, and yes, you have to iterate over a string by converting it to a character array or by indexing the character position explicitly. And there is a strict separation between byte-stream formats and string formats. Going from one to the other always requires an explicit conversion with an optional encoding specification (most conversions also allow a default encoding, which is usually UTF-8).
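A brief sketch of that explicit boundary in Java (my own example, not from the thread):

    // Sketch only: Java keeps bytes and strings separate; crossing the boundary
    // always goes through a Charset (explicit here, to avoid platform defaults).
    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    public class ByteStringBoundary {
        public static void main(String[] args) {
            byte[] received = {(byte) 0xC3, (byte) 0xA9, 0x74, (byte) 0xC3, (byte) 0xA9}; // UTF-8 for "été"
            String text = new String(received, StandardCharsets.UTF_8);    // bytes -> string, encoding explicit
            byte[] sent = text.getBytes(StandardCharsets.UTF_8);           // string -> bytes, encoding explicit
            System.out.println(text + " / " + Arrays.equals(received, sent)); // été / true
        }
    }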

Link to comment
I'm usually the loudest complainer about people treating strings as byte arrays. In most programming languages, strings are not byte arrays. In some languages, for legacy reasons, you can treat them as byte arrays. In LabVIEW they are byte arrays. That results in interesting things with multi-byte character sets. I don't have any of the Asian-localized LabVIEWs installed, but I think strings in those languages will have a string length reported as twice the number of characters that are actually there. When you have the Unicode INI token set, this is definitely the case. Arguably that's wrong, but there's no way to go back and change it now.

 

I thought MBCS encoding has exactly the property of not using a fixed code size for the different character elements. So I somewhat doubt your claim that Asian-localized LabVIEW would report double the number of bytes in String Length as it has characters. It would most likely be around double if the text consists of the local Asian characters, but likely not exactly. And if the encoding is structured similarly to UTF-8, it might actually report exactly as many bytes as it contains characters if the text contains Western English characters only.
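A quick illustration of that point (a Java sketch with invented mixed text; whether the Shift_JIS charset is available depends on the JRE):

    // Sketch only: in an MBCS codepage like Shift_JIS, mixed text is roughly but
    // not exactly double its character count in bytes. Charset.forName may throw
    // UnsupportedCharsetException if the JRE lacks Shift_JIS.
    import java.nio.charset.Charset;

    public class MbcsLength {
        public static void main(String[] args) {
            String mixed = "テスト abc";                        // 3 double-byte + 4 single-byte characters
            Charset sjis = Charset.forName("Shift_JIS");
            System.out.println("characters: " + mixed.length());              // 7
            System.out.println("MBCS bytes: " + mixed.getBytes(sjis).length); // 10, not 14
        }
    }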

 

Which platforms are supported is entirely related to the language and the compiler for that language. Any time, absolutely any time, strings move from one memory space to another (shared memory, pipes, network, files on disk), the encoding of the text needs to be taken into account. If you know the encoding the other application is expecting, you can usually convert the text to that encoding. There are common methods for handling text that doesn't convert (the source has a character that the destination doesn't have), but they're not perfect. The good thing is that this would be extremely rare, since just about everything NI produces right now "talks" ASCII or Windows-1252, so it's easy to keep existing things working.

 

What you're talking about is how the characters are represented internally in common languages.  In C/C++/C# on Windows they're UTF-16.  In C/C# on Unix systems they're UTF-32.  As long as you're in the same memory space you don't need to worry.  There is no nightmare.  Think of your strings as sequences of characters and it's a lot easier.  As soon as you cross a memory boundary you need to be concerned.  If you know what encoding the destination is expecting you encode your text into that.  If/when LabVIEW gets Unicode support, that will be essential.  Existing APIs will be modified to do as much of that for you as possible.  Where it's not clear, you'll have the option and it'll default to Windows-1252 to mimic old behavior.  Any modern protocol has its text encoding defined or the protocol allows using different encodings.

 

I guess I chose my words somewhat badly to describe what I meant. You are right that crossing memory boundaries is where string encoding needs to be taken care of, and that there are of course always problems with one encoding not translating exactly one-to-one into another. The UTF encodings are the most universal ones, as they support basically all currently known characters, including some that haven't been used for a thousand years or more.

 

While you are right that most NI products currently use ASCII or at most Win1252, since that is the default ACP for most Western Windows installations, there is certainly already a problem with existing LabVIEW applications that run on Windows installations configured for different locales. There, for instance, the string a DLL receives from the LabVIEW diagram can and will contain different extended ASCII characters, and the DLL has to figure out what that locale's encoding is before it can do anything sensible with the string if it wants to be locale-aware. A case in point is the OpenG lvzip library, which needs to translate all strings from CP_ACP to CP_OEM encoding in order to deal correctly (well, "correctly" is saying a bit too much here, but at least the way other ZIP utilities do it) with the file names and comments when adding them to an archive, and vice versa when reading them from it. Also, any string written to disk or anywhere else in such a locale will look different than on a Win1252 locale when it contains extended characters.

 

This is what I mean by the task being a nightmare for NI. Any change you guys make, no matter how smart and backwards-compatible you try to be, has a big potential to break things here. And it must be a huge undertaking, or LabVIEW would have had that support since 7.1 already!!!  :D

 

And one plea from my side: if and when NI adds Unicode string support, please expose a C manager interface for it too!!!!  :cool:

Link to comment
