
Performance boost for Type Cast



Performance public service announcement.

[Screenshot: Type Cast performance benchmark results]

A benchmark is included below for those who want to validate this discovery. Type Cast works by flattening the type to a string and then unflattening. In this case it doesn't recognize the special case of the byte array already being a string, but the Byte Array To String node does recognize that equivalence, so inserting that node first eliminates a significant part of the Type Cast effort.
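For readers who think in text-based code, here is a rough C sketch of the difference being described (my own illustration with made-up function names, not LabVIEW's internals; byte-order handling and allocation checks are omitted): the flatten-to-string path pays for an intermediate buffer and an extra copy, while recognizing that a byte array already is its flattened form reduces the work to a single copy.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Path A: what "flatten to string, then unflatten" costs conceptually. */
double *cast_via_flatten(const uint8_t *bytes, size_t n_bytes, size_t *n_out)
{
    size_t count = n_bytes / sizeof(double);
    uint8_t *flat = malloc(n_bytes);               /* intermediate "string" */
    memcpy(flat, bytes, n_bytes);                  /* flatten (extra copy)  */
    double *out = malloc(count * sizeof(double));
    memcpy(out, flat, count * sizeof(double));     /* unflatten             */
    free(flat);
    *n_out = count;
    return out;
}

/* Path B: treating the byte array as already flattened skips the middleman. */
double *cast_direct(const uint8_t *bytes, size_t n_bytes, size_t *n_out)
{
    size_t count = n_bytes / sizeof(double);
    double *out = malloc(count * sizeof(double));
    memcpy(out, bytes, count * sizeof(double));    /* single copy */
    *n_out = count;
    return out;
}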

This has been reported to NI as a possibility for future optimization.

Attachment: array-typecast-benchmark.vi


What work does Type Cast need to do with arrays that it doesn't do with strings, which are just arrays of bytes?  Seems to me that the fastest cast would be something like an array of U64 to Doubles, as those are the same size and don't require any length checks (unlike, say, a 9-byte string to an array of 8-byte Doubles).  Is the problem not a missing optimization for U8, but instead nonperformant code for numeric arrays?


Thanks for posting this.  This is going to make me go back and change some code.

Is this performance boost agnostic to the "type" input on the typecast primitive?  Your explanation makes this seem to be the case, but I just want to confirm.  Basically, can it be stated as a general rule: "if you are wiring a u8 array into a type cast primitive, then it's always best to go ahead and drop in a byte array to string primitive"

Thanks!

21 hours ago, bjustice said:

Even better, @Aristos Queue, could this be turned into a generic VIM?

Try it out in the benchmark VI that I posted and see what you get.

On 9/25/2022 at 7:06 PM, bjustice said:

"if you are wiring a u8 array into a type cast primitive, then it's always best to go ahead and drop in a byte array to string primitive"

I believe the answer is yes. As with all performance questions, I hesitate to actually say yes... you [or someone] should benchmark a few cases before committing to that theorem. 


Hooovahh,

I find myself running into this situation somewhat frequently.  Usually it's because I am pulling an arbitrary blob of data off either a network connection or a data file, and then have to type cast it into the appropriate numeric array type.  In these situations performance is paramount; the data is often very large or coming in at a high rate.

Unless I'm missing something, my understanding is that this mechanism here represents the most performant way to convert an array of bytes or a string into a strictly typed numeric array.  Thus, the optimization discovered by AQ here is very useful.

Thoughts? Thanks!


Oh, I do wish that the Type Cast allowed me to specify endianness though.

I recognize that Unflatten From String allows me to specify endianness, but it is substantially slower than Type Cast.

So I often have to reverse the byte array before typecasting to get the endianness correct.

17 hours ago, bjustice said:

Unless I'm missing something, my understanding is that this mechanism here represents the most performant way to convert an array of bytes or a string into a strictly typed numeric array.  Thus, the optimization discovered by AQ here is very useful.

Oh don't get me wrong, what AQ posted here is great and anything to improve performance is appreciated.  I'm just saying that the Type Cast function itself is on the slow side when compared to a more low-level option.  Let's say you are converting an array of bytes into an array of booleans.  You can use the Type Cast, but depending on the use case a "Not Equal Zero" would be better.  The same goes for converting to an enum: having a case structure for the numeric, and enum constants for each value, would be more efficient.
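To make the byte-to-boolean example concrete in text form, here is a minimal C sketch (my own, not LabVIEW code): the "Not Equal Zero" approach is just one comparison per element, with none of the flatten/unflatten machinery.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* What "Not Equal Zero" amounts to: each byte maps to a boolean with a
   single comparison. */
static void bytes_to_bools(const uint8_t *bytes, bool *bools, size_t n)
{
    for (size_t i = 0; i < n; i++)
        bools[i] = (bytes[i] != 0);
}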

In the example AQ posted you could do the raw math on the array of bytes and get a double value.  I suspect this would perform better, if someone spent the time to write a conversion from 8 bytes to a double and back.
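For what that raw math would look like, here is a hedged C sketch (my own, not from the post) that assembles one double from 8 bytes given in big-endian order, the byte order Type Cast presents:

#include <stdint.h>
#include <string.h>

/* Sketch only: build one double from 8 big-endian bytes by assembling the
   bit pattern into an integer and reinterpreting it as a double. */
static double double_from_be_bytes(const uint8_t b[8])
{
    uint64_t bits = 0;
    for (int i = 0; i < 8; i++)
        bits = (bits << 8) | b[i];   /* b[0] is the most significant byte */

    double d;
    memcpy(&d, &bits, sizeof d);     /* reinterpret the bit pattern */
    return d;
}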


Really?  You're suggesting that there's potentially a faster way to convert a u8 byte array to an array of DBLs?  I hadn't considered that.  I really thought that TypeCast would have been the fastest option.  But, I also didn't realize that type cast used the string unflatten code under the hood until AQ communicated this.

3 hours ago, bjustice said:

Really?  You're suggesting that there's potentially a faster way to convert a u8 byte array to an array of DBLs?

I am suggesting that, yes, but I can't prove it.  My reasoning is just that every time I've been able to code around the Type Cast, it has improved performance.  I admit that converting to a double is more complicated than my other examples, so I could be wrong.  I'm not exactly sure how a double becomes an array of bytes.  It might be simple, but for some reason it isn't clear to me what the conversion is.  Casting a double "1" becomes 0x3F F0 00 00 00 00 00 00.

Edit: The deeper down the rabbit hole I go, the more I think Type Cast should just be used in these cases.
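To demystify the hex value above: a double is an IEEE 754 binary64 number with 1 sign bit, 11 exponent bits (biased by 1023) and 52 mantissa bits.  For 1.0 the sign is 0, the exponent field is 1023 (0x3FF, meaning 2^0) and the mantissa is 0, which is exactly the 0x3F F0 00 00 00 00 00 00 pattern quoted.  A small C sketch (my own addition) that takes the value apart:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decompose a double into its IEEE 754 binary64 fields.  For 1.0 this
   prints raw=0x3FF0000000000000, sign=0, exponent=1023, mantissa=0. */
int main(void)
{
    double d = 1.0;
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);

    unsigned sign     = (unsigned)(bits >> 63);
    unsigned exponent = (unsigned)((bits >> 52) & 0x7FF);
    uint64_t mantissa = bits & 0xFFFFFFFFFFFFFULL;

    printf("raw=0x%016llX sign=%u exponent=%u mantissa=%llu\n",
           (unsigned long long)bits, sign, exponent,
           (unsigned long long)mantissa);
    return 0;
}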


The relative difference between the two appears to decrease with the size of the array, suggesting this is an initialization step (sanity checks) issue.

On my machine:
 

N    |  no String Conversion    |    with String Conversion  |
--------------------------------------------------------------
10   |              750 ns      |               400 ns       |
100  |              700 ns      |              1000 ns       |
1000 |             5600 ns      |              5100 ns       |

So approximately 300 ns more for the no String Conversion case, which becomes a negligible difference for large arrays.

I would argue that large arrays are more common in this type of conversion task, but this is interesting to know regardless.

I would also argue that there are vastly more irritating oldies in the LabVIEW code base that would deserve attention, but we know how that flies in Austin.

12 hours ago, jacobson said:

I've seen a few RT applications that use MoveBlock over type casting as well. Obviously not very safe, but it is fast.

It depends on your definition of safe! 😃

If the VI enforces proper data types (through its connector pane, for instance) and accounts for the size of the target buffer or adjusts it properly (for instance by using the minimum size setting in the Call Library Node to use a different parameter as size indicator, or by explicitly resizing the target buffer to the required size), this can be VERY safe. Of course it is not safe in the sense that any noob can go into that VI and sabotage it, but hey, making things foolproof requires an immense effort, and that is exactly the overhead the Typecast function carries. 😁

But making things engineer-proof is absolutely impossible! 😀
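For readers who have not wired this up before, the kind of guard Rolf describes looks roughly like the following in C (an assumed sketch, not his code): the raw copy is only as unsafe as the size logic around it.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed sketch of a size-checked raw copy, the moral equivalent of a
   MoveBlock/memcpy "cast" guarded the way Rolf describes: the element
   count is derived from the source size and checked against the target
   buffer before any bytes move, so nothing can overrun. */
static int checked_cast_bytes_to_doubles(const uint8_t *src, size_t src_bytes,
                                         double *dst, size_t dst_capacity,
                                         size_t *out_count)
{
    size_t count = src_bytes / sizeof(double);   /* whole elements only */
    if (count > dst_capacity)
        return -1;                               /* caller's buffer too small */
    memcpy(dst, src, count * sizeof(double));    /* raw copy, no byte swap */
    *out_count = count;
    return 0;
}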

Also, a memcpy() call is only functionally equivalent to a Typecast on Big Endian machines. For LabVIEW that applied "only" to Mac68K, MacPPC, SunSparc, HP-UX PA-RISC, Silicon Graphics IRIX, IBM AIX, DEC Alpha and VxWorks (not all of which were ever officially released). The only LabVIEW platforms that really use Little Endian are the ones based on i386/AMD64 and ARM CPUs, which are the only platforms still shipping today.

On 9/27/2022 at 8:39 PM, hooovahh said:

In general I try to avoid the Type Cast function.  In the example AQ gave I probably wouldn't try to code around it, since that seems like it could be a pain.  But something like a number to an Enum, or to a cluster or array, I may try to code it for better performance.

For me it really depends. I use it often in functions that deal with binary communication (if they use Big Endian binary format; otherwise Flatten/Unflatten is always preferable). Here the additional overhead of the Typecast function is usually insignificant in comparison to the time the overall software has to wait for responses from the other side. Even with typical TCP communications and 1 Gb or higher fiber connections, your Read function generally sits there for several milliseconds waiting to receive the next data packet. Shaving a few nanoseconds or even microseconds off the overall execution time is completely insignificant in this case. With serial communication or similar, things get even more insignificant.
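For scale (my own back-of-envelope arithmetic, not figures from the post): even at a full 1 Gbit/s, a 1 MB payload needs about 8 ms just to cross the wire (8,000,000 bits ÷ 1,000,000,000 bits/s), so a few hundred nanoseconds of Typecast overhead sits several orders of magnitude below the waiting time described here.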

For shared library interfacing and data processing such as image handling, the situation is often different, and here I tend to always use memory copies whenever possible, unless I need to do specific endian handling. Then I use Flatten/Unflatten, as that makes it very convenient to apply a specific endianness.

16 hours ago, bjustice said:

Really?  You're suggesting that there's potentially a faster way to convert a u8 byte array to an array of DBLs?  I hadn't considered that.  I really thought that TypeCast would have been the fastest option.  But, I also didn't realize that type cast used the string unflatten code under the hood until AQ communicated this.

Typecast does a few things to make sure the input buffer is properly sized for the desired output type. For instance, in your byte-array-to-double-array situation, if your input is not a multiple of 8 bytes it can't just reuse the input buffer in place (it might never do that, but I'm not sure; I would expect that it does if that array isn't used anywhere else by a function that wants to modify/stomp it). But if it does, it has to resize the buffer and adjust the array size in any case; if it doesn't, it is a dog slow operation anyway 😃.
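As a concrete (assumed) illustration of that sizing step: with, say, a 20-byte input and 8-byte doubles, only two whole elements fit and the trailing 4 bytes have to be dropped, so the output length must be rewritten no matter what.

#include <stdio.h>

/* Assumed illustration of the size bookkeeping described above (C):
   a 20-byte input yields only 2 whole doubles; 4 trailing bytes are dropped. */
int main(void)
{
    size_t in_bytes  = 20;
    size_t out_count = in_bytes / sizeof(double);   /* -> 2 whole doubles       */
    size_t leftover  = in_bytes % sizeof(double);   /* -> 4 bytes are discarded */
    printf("%zu doubles, %zu leftover bytes\n", out_count, leftover);
    return 0;
}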

An extra complication with Typecast is that it always does Big Endian normalization. This means that on every still-shipping LabVIEW platform it will byte swap every element in the array appropriately. This may be desired, but if it isn't, fixing it by adding a Swap Bytes and a Swap Words function on the resulting array actually has several problems:

1) It costs extra performance for swapping the bytes in Typecast and then again for swapping them back (see the C sketch after this list). A simple memcpy() would for sure be much more performant, even if it requires a memory allocation for the target buffer.

2) If LabVIEW ever gets a Big Endian platform again (we can dream, can't we?), your code will potentially do the wrong thing depending on who created the original byte array in the first place.
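A hedged C sketch of point 1 (my own illustration, not LabVIEW code): if the incoming bytes are already in the machine's native little-endian order, a plain copy is all that is needed, whereas the Typecast route effectively byte-swaps every element twice, once inside Typecast and once in the swap fix-up.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Native-order data: one straight copy, no per-element swapping at all. */
static void native_bytes_to_doubles(const uint8_t *src, double *dst,
                                    size_t count)
{
    memcpy(dst, src, count * sizeof(double));
}

/* A full 64-bit byte reversal per element, roughly what the extra swap
   work amounts to (and Typecast already did the mirror image of this
   internally before the fix-up runs). */
static uint64_t bswap64(uint64_t v)
{
    v = (v >> 32) | (v << 32);
    v = ((v & 0xFFFF0000FFFF0000ULL) >> 16) | ((v & 0x0000FFFF0000FFFFULL) << 16);
    v = ((v & 0xFF00FF00FF00FF00ULL) >> 8)  | ((v & 0x00FF00FF00FF00FFULL) << 8);
    return v;
}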
