Typecast+swap bytes faster than UnFlattenString - Why?


mzu

Posted (edited)

Dear audience,

I have a very long string (1-100 MB) that contains 16-bit integers in little-endian format, and I need to convert it to a proper i16 array. I am using Windows, x86. Two solutions come to mind:

  1. Use Type Cast to an i16 array and then swap the bytes, since Type Cast assumes big-endian order (attachment: post-2886-125101986185_thumb.png).
  2. Use Unflatten From String, specifying the right byte order (attachment: post-2886-125101986711_thumb.png).

In the actual block diagrams there is a constant instead of the control for array2 (LabVIEW 2009 changed this when I created the snippet).
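For readers without LabVIEW at hand, the two approaches can be sketched roughly in C. This is only an illustration under assumptions: the function names are made up, and Type Cast is modeled as a big-endian decode, as described in this thread. It says nothing about how LabVIEW implements either primitive.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Approach 1 (assumed model of Type Cast + Swap Bytes): read every pair of
   bytes as big-endian, then swap the bytes of each resulting element. */
void typecast_then_swap(const unsigned char *s, size_t nbytes, int16_t *out)
{
    for (size_t i = 0; i + 1 < nbytes; i += 2) {
        uint16_t be = (uint16_t)((s[i] << 8) | s[i + 1]); /* big-endian read */
        out[i / 2] = (int16_t)swap16(be);                  /* undo the order  */
    }
}

/* Approach 2 (assumed model of Unflatten From String, little-endian):
   read every pair of bytes directly as little-endian. */
void unflatten_little_endian(const unsigned char *s, size_t nbytes, int16_t *out)
{
    for (size_t i = 0; i + 1 < nbytes; i += 2)
        out[i / 2] = (int16_t)(uint16_t)(s[i] | (s[i + 1] << 8));
}
```

Both functions produce the same array for the same input; the second simply skips the intermediate swap, which is why one would expect it to be at least as fast.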

One might think that avoiding an extra pair of byte swaps in the second case would make it faster.

Not really, see the attached graph. It has size of the string along the X-axis (in MB) and time required to get the conversion done along the Y axis (in ms). 

post-2886-125101971595_thumb.png

The dependence is quite linear, so Unflatten From String must be doing something extra on each byte of the data. What is it? Why is number 2 slower? It does not depend on the LabVIEW version (8.6 vs 2009) and does not depend on 32/64 bit.

(crosspost from http://www.labviewpo...t=1367&start=0)

Edited by mzu
Posted

Small addition. It is actually i16 with Unflatten From String, and the data are actually for i16:

post-2886-125105866944_thumb.png

Posted

Not sure how typecast is implemented in LabVIEW. However, in other languages (such as C) typecast is actually an operator, not a function. Using typecast just tells the compiler that the data in the memory location for variable x should be interpreted ('re-cast') as being of datatype y. That is, there is no function call at runtime associated with performing a typecast.
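A small C sketch of that kind of reinterpretation, assuming nothing about LabVIEW: the bytes are viewed in the host's native byte order and no per-element conversion is done. The function names are hypothetical. `memcpy` stands in for a raw pointer cast only to avoid alignment and strict-aliasing problems; compilers typically reduce it to a single load.

```c
#include <stdint.h>
#include <string.h>

/* View the first two bytes of a buffer as a 16-bit integer in the host's
   native byte order. No byte swapping or other conversion is applied. */
int16_t view_as_i16(const unsigned char *bytes)
{
    int16_t v;
    memcpy(&v, bytes, sizeof v);
    return v;
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On an x86 machine the bytes `34 12` would be seen as 0x1234, which is exactly the little-endian interpretation the original question is after.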

However, I suspect the Unflatten From String primitive does some low-level data interpretation or copying and may have some computational overhead. Swap Bytes sounds like something the CPU instruction set would implement directly, so I can imagine that its cost would be insignificant by comparison.

Mostly speculation but I hope that helps.

~Dan

Posted

Thank you for your reply,

This is exactly what I need: a C-type typecast.

In LabVIEW this kind of typecast swaps bytes (probably because in early versions of LabVIEW flattened data was always big-endian). So I thought that using a direct, specific operation with an explicit instruction "not to swap bytes" would save time. Nope.

Also, note that the time difference between those 2 approaches is O(number of bytes), so "Unflatten From String" actually does some operation on every byte(word) of the string it operates on.

Posted

LabVIEW's typecast is more complex than that. It is in essence a typecast like what you see in C but with the extra twist of byte swapping any multi-byte integer to be in Big Endian format on the byte stream side.

I think the problem here is that Unflatten does other things like checking the input string length to be valid and whatever. The implementation of Unflatten is certainly a lot more complex since it has to work with any data type including highly complicated variable sized types of clusters containing variable sized data, containing ......

Typecast on the other hand only works on flat data which excludes any form of clusters containing variable sized data. Possibly Flatten/Unflatten could be improved since little endian conversion on a little endian machine should certainly not take longer than the Typecast and additional byte swap, but the priority for such a performance boost might be rather low, since it would certainly make the implementation of Flatten/Unflatten even more complex and hence more prone to bugs in the implementation.

But thanks for showing me that the good old Typecast/Swapping still seems to be the better way than using Flatten/Unflatten with the desired endian setting :lol:.

The reason Typecast swaps bytes to big endian is that LabVIEW originates from the Mac with its 68000 CPU, which was always a big endian CPU. While the later PPCs in the PPC Macs had the option to use either big or little endian as the preferred format, Apple chose to use the same big endian format that came from the 68k.

When NI ported LabVIEW to Windows (and later to other architectures like SPARC and PA-RISC) they had to tackle a problem. To send binary data to a GPIB device or over the network, one had always used the Typecast or Flatten operator to convert it into a binary string, and it would have been very nice if data sent over the network or written into a binary file by a LabVIEW program on the Mac could be easily read by a LabVIEW program on Windows. This required the same byte order for flattened data, so the flattened format was specified to be always big endian, independent of the platform LabVIEW is running on.
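The idea of a platform-independent, always big-endian byte stream can be sketched in C as follows. This is a generic illustration with hypothetical function names, not LabVIEW's flatten code. Because the bytes are produced with shifts rather than by copying memory, the output is the same on big- and little-endian hosts.

```c
#include <stddef.h>
#include <stdint.h>

/* Write n 16-bit integers to out (2 * n bytes), most significant byte first,
   regardless of the byte order of the machine running the code. */
void flatten_i16_big_endian(const int16_t *in, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t u = (uint16_t)in[i];
        out[2 * i]     = (unsigned char)(u >> 8);
        out[2 * i + 1] = (unsigned char)(u & 0xFF);
    }
}

/* Read n big-endian 16-bit integers back from a byte stream. */
void unflatten_i16_big_endian(const unsigned char *in, size_t n, int16_t *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (int16_t)(uint16_t)((in[2 * i] << 8) | in[2 * i + 1]);
}
```

A file written this way on one platform reads back identically on another, which is the interoperability goal described above. The price is a byte swap on every little-endian machine.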

A C typecast will be difficult to do in LabVIEW. Trying to do that with a small piece of external code could be an option, but it is quite tricky. It's not enough to simply swap the handles; you also need to adjust the array length in the handle accordingly, so a different function would be required for each integer size.

Rolf Kalbermatter

Posted (edited)

Thank you Rolf,

The issue is even more complex. Checking the string length and similar checks are O(1) operations. What Unflatten does is O(N), where N is the amount of data. But I guess we will never know unless the LV source code leaks out someday ...

Edited by mzu
Posted

rolfk,

made a DLL according to your recipe. Please, find attached VI and DLL. It is 2 orders of magnitude faster on my computer. The code inside the DLL is quite simple:

void TypeCast(LStrHandle *arg1, TD1Hdl *arg2)
{
    void *tmp = *arg1;
    *arg1 = (LStrHandle)*arg2;   /* swap the two handles */
    *arg2 = (TD1Hdl)tmp;
    (**arg2)->dimSize >>= 1;     /* adjust array size: bytes to i16 elements */
}

DLLTypecast.zip

post-2886-125133371813_thumb.png
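To try the idea outside LabVIEW, here is a self-contained mock of the same handle swap. The struct types and function names are hypothetical stand-ins, not the definitions from LabVIEW's extcode.h. The mock uses a single level of indirection and assumes both blocks start with a 32-bit length followed directly by the data.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for a LabVIEW string block and a 1-D i16 array
   block: a 32-bit length followed directly by the data. */
typedef struct { int32_t cnt;     unsigned char str[]; } MockStr;
typedef struct { int32_t dimSize; int16_t       elt[]; } MockI16Array;

/* Allocate a mock string block holding a copy of nbytes bytes.
   Returns NULL on allocation failure. */
MockStr *mock_new_string(const unsigned char *bytes, int32_t nbytes)
{
    MockStr *s = malloc(sizeof *s + (size_t)nbytes);
    if (s) {
        s->cnt = nbytes;
        if (nbytes > 0)
            memcpy(s->str, bytes, (size_t)nbytes);
    }
    return s;
}

/* Exchange the two blocks and halve the length field. The work does not
   depend on the amount of data, because no data byte is read or written. */
void mock_typecast(MockStr **str, MockI16Array **arr)
{
    void *tmp = *str;
    *str = (MockStr *)*arr;       /* string side now owns the old array */
    *arr = (MockI16Array *)tmp;   /* array side now owns the old string */
    (*arr)->dimSize >>= 1;        /* byte count -> i16 element count    */
}
```

After the call the array refers to the very same memory the string used, so the elements come out in the host's native byte order (little-endian on x86).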

Posted

I don't see any order of magnitude faster or anything. The "real" typecast method is simply a flat line independent of the size of the array. :rolleyes: And that is not surprising since the work to be done is always the same.

Also, smart move to use the shift operator for the size adjustment. That way you avoid any possible rounding problems if the incoming array is of odd length. With the division operator you could otherwise get x.5, which could get rounded up to the next number. (I'm not sure about the exact semantics of C in this case if both the divisor and the dividend are integers, though there is a good chance that the 2 in

(**arg2)->dimSize /= 2;

might get expanded to a floating point number first anyhow.)

But why even bother about that if you can use the shift operator instead which will always do the safe thing.
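On the question of C semantics: when both operands are integers, `/` performs integer division and discards the fractional part (truncation toward zero since C99), with no floating-point conversion. The short sketch below, with hypothetical function names, shows that for a non-negative size such as dimSize, division by 2 and a right shift by 1 give the same result.

```c
#include <stdint.h>

/* Integer division: 7 / 2 is 3, never 3.5 and never rounded up. */
int32_t half_by_division(int32_t n)
{
    return n / 2;
}

/* Right shift by one bit. Equal to n / 2 for non-negative n. For negative n
   the result of >> is implementation-defined, so the two may differ there. */
int32_t half_by_shift(int32_t n)
{
    return n >> 1;
}
```

So for an array length, which is never negative, either form is safe; the shift mainly documents the intent.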

I guess someone at NI will soon go and add special-case code to the Flatten and Unflatten functions that does the smart thing when input and output are both simply flat arrays and the desired endianness matches the endianness of the current processor. :cool:

Rolf Kalbermatter

Posted

 

The "real" typecast method is simply a flat line independent of the size of the array. :rolleyes: And that is not surprising since the work to be done is always the same.

I agree, but there is a slight linear dependence. Why? Maybe it is the passing of parameters into the VI containing the DLL call, or maybe it is overhead of the testing method.

Posted

Unflatten has to go through a lot more checks because it doesn't know what format the input is in. It can't possibly just reinterpret_cast the pointer and let you work with the data, because if the string was malformed you would crash (we don't like that). With the typecast primitive we can avoid some of those checks because we know the source type, and from that we know both the layout of the input data and that it is well formed. The reason it's still linear is that there is still a conversion taking place. We still can't do a real reinterpret_cast in most cases because that would be unsafe.
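As a rough illustration of "check first, then convert", here is a C sketch for a hypothetical layout: a 4-byte big-endian element count followed by big-endian i16 data. The layout, the function name and the error convention are assumptions made for this example; this is not NI's code. The checks at the top take constant time, while the conversion loop touches every element.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a length-prefixed buffer of big-endian i16 values into out.
   Returns the element count, or -1 if the buffer is malformed or the
   output array is too small. */
int64_t unflatten_checked(const unsigned char *buf, size_t len,
                          int16_t *out, size_t out_cap)
{
    if (len < 4)
        return -1;                      /* no room for the length prefix */

    uint32_t n = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                 ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];

    if (n > (len - 4) / 2)
        return -1;                      /* claims more data than present */
    if (n > out_cap)
        return -1;                      /* caller's array is too small   */

    /* O(n) part: every element is read, converted and stored. */
    for (uint32_t i = 0; i < n; i++) {
        const unsigned char *p = buf + 4 + 2 * (size_t)i;
        out[i] = (int16_t)(uint16_t)((p[0] << 8) | p[1]);
    }
    return (int64_t)n;
}
```

A malformed string, for example one whose prefix claims three elements while only two are present, is rejected before any element is read.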

Posted
Unflatten has to go through a lot more checks

Adam, would not all those checks be independent of the array size? Like check that the length == size of an area allocated for a handle etc ...

Both typecast primitive and unflatten primitive know the type at "compile-time" (whatever it means for LabVIEW).

We still can't do a real reinterpret_cast in most cases because that would be unsafe.

I got your point.

there is still a conversion taking place

What kind of a conversion: only byteswap? or copying, or something else?

I guess someone at NI will soon go and add a special case code to the Flatten and Unflatten function

:)

Posted

Even if we ignore all conversions and all byte swapping we still have to copy the data because the code that UnFlatten uses cannot take ownership of the data it's reading. On the diagram it might make sense to say "I'm done with this string, so this output array can own it", but the same code handles unflattening from a file stream (that's how we read default data and constants from a VI file, for instance). Copying the data is an O(n) operation so we can't do any better than that.

It may be possible to optimize specific cases, but that comes at the expense of complicating code that currently doesn't care what it's reading from. Since unflattening from a string isn't normally considered a high-performance operation that's not a tradeoff we've been willing to make.
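The ownership argument can be sketched in C. In this hypothetical design (the names `read_fn`, `MemSource`, `mem_read` and `unflatten_i16_le` are made up for the example), the decoder only sees a read callback, so it cannot tell a string in memory from a file stream and has to copy into a buffer of its own.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical byte source: reads up to n bytes into dst, returns the
   number of bytes actually delivered. */
typedef size_t (*read_fn)(void *ctx, void *dst, size_t n);

/* One possible source: a block of memory with a read position. */
typedef struct { const unsigned char *data; size_t len, pos; } MemSource;

size_t mem_read(void *ctx, void *dst, size_t n)
{
    MemSource *m = ctx;
    size_t avail = m->len - m->pos;
    if (n > avail)
        n = avail;
    memcpy(dst, m->data + m->pos, n);
    m->pos += n;
    return n;
}

/* Decode n little-endian i16 values from any source into a freshly
   allocated array. Every byte is copied, so the cost is O(n) even when no
   byte swapping is needed. Returns NULL on a short read or allocation
   failure; the caller frees the result. */
int16_t *unflatten_i16_le(read_fn rd, void *ctx, size_t n)
{
    int16_t *out = malloc(n ? n * sizeof *out : 1);
    if (!out)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        unsigned char b[2];
        if (rd(ctx, b, 2) != 2) {
            free(out);
            return NULL;
        }
        out[i] = (int16_t)(uint16_t)(b[0] | (b[1] << 8));
    }
    return out;
}
```

A file-backed source would plug in through the same callback type, which is what keeps the decoder simple and also what rules out handing the input buffer over to the output.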

Posted

So the real solution would then be to add an endianness selector to the Typecast? :rolleyes:

Running and hiding!

Rolf Kalbermatter
