
Are Boolean expressions short-circuited like they would be in C?


Recommended Posts

In most programming languages, when a logical expression such as "d = a AND (b OR c)" is evaluated, the evaluation stops as soon as "a" is false, and the expression "b OR c" is not processed.

I think the LabVIEW compiler does something similar. Does anyone know?
The problem in LabVIEW is that the processing order is not defined.
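For comparison, here is what that C-style behavior looks like in a textual language. This is a minimal Python sketch (Python's `and` short-circuits the same way C's `&&` does); the function `b_or_c` is just a stand-in for the sub-expression "b OR c":

```python
calls = {"n": 0}

def b_or_c():
    """Stands in for the sub-expression 'b OR c'; has a visible side effect."""
    calls["n"] += 1
    return True

# 'and' short-circuits, just like && in C: when a is False,
# the right-hand side is never evaluated.
a = False
d = a and b_or_c()
assert d is False and calls["n"] == 0   # b_or_c was never called

# When a is True, the right-hand side must be evaluated.
a = True
d = a and b_or_c()
assert d is True and calls["n"] == 1    # b_or_c ran exactly once
```

The question in this thread is whether LabVIEW's AND/OR primitives ever skip work like this.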


Do you really have an application that needs that much optimization?  You likely have other issues if that is the case.  Anyway, you can do that exact logic easily enough without the AND.

post-11268-0-34840100-1449579459.png

 

Note: The FALSE case just has the FALSE value passed through.

Edited by crossrulz

From what I can tell, people like to use that to write hard-to-read code, such that a particular function (with side effects) will or won't get called depending on earlier Boolean expressions:

https://en.wikipedia.org/wiki/Short-circuit_evaluation

 

Either way, it's more realistic to say that the behavior is defined like this, rather than that the compiler does some special optimization. LabVIEW's compiler simply doesn't do the same:

"The short-circuit expression x Sand y (using Sand to denote the short-circuit variety) is equivalent to the conditional expression if x then y else false; the expression x Sor y is equivalent to if x then true else y."
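That equivalence can be checked mechanically. A Python sketch (the thunk-passing style, where `y` is a zero-argument function, is just one way to model "evaluated only on demand"):

```python
def sand(x, y):
    """Short-circuit AND, written exactly as the quoted conditional:
    'if x then y else false'."""
    return y() if x else False

def sor(x, y):
    """Short-circuit OR: 'if x then true else y'."""
    return True if x else y()

# y is passed as a thunk so that it is only evaluated when the
# conditional actually demands it.
evaluated = []
def y_true():
    evaluated.append("y")
    return True

assert sand(False, y_true) is False and evaluated == []   # y never ran
assert sor(True, y_true) is True and evaluated == []      # y never ran
assert sand(True, y_true) is True and evaluated == ["y"]  # y ran once
```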

Edited by smithd

Thank you for your replies on "short-circuit evaluation".
I already tried crossrulz's construction. Unfortunately, it turned out that the case structure itself slows things down.

I also tested the speed when applying TRUE or FALSE to a.
No difference was detected, of course, because the ∧ (AND) node needs input from the ∨ (OR) node, as was pointed out by Craig.
 

Does this mean that short-circuit evaluation is fundamentally impossible? LabVIEW in the end generates C code, and the C compiler, I would think, takes care of the short-circuit evaluation.


Does this mean that short-circuit evaluation is fundamentally impossible? LabVIEW in the end generates C code, and the C compiler, I would think, takes care of the short-circuit evaluation.

 

And here you walk into the mist! :D

 

LabVIEW is written in C(++) for the most part, but it doesn't and has never created C code for the normal targets you and I are using it with. Before LabVIEW 2010 or so, it translated the code first into a directed graph and from there directly into machine code. Since then it has a two-layer approach. First it translates the diagram into DFIR (Dataflow Intermediate Representation), which is a more formalized version of a directed graph representation with additional features. Most of the algorithmic optimizations, including things like dead code elimination, constant folding and many more, are done at this level. From there the data is passed to the open source LLVM compiler engine, which then generates the actual machine code. At no point is any intermediate C code involved, as C code is notoriously inadequate for representing complex relationships in an easy-to-handle way.

 

There is a C generator in LabVIEW that can translate a LabVIEW VI into some sort of C++ code. It was used for some of the early embedded toolkits, such as the AD Blackfin Toolkit, Windows Mobile Toolkit, and ARM Toolkit. But the generated code is pretty unreadable, and the solution proved very hard to support in the long run. You can still buy the C Generator add-on from NI, which gives you a license to use that generator, but its price is pretty exorbitant and active support from NI is minimal. Except under the hood for the Touch Panel Module, in combination with an embedded Visual C Express installation, it is not used in any currently available product from NI AFAIK.



Does this mean that short-circuit evaluation is fundamentally impossible? LabVIEW in the end generates C code, and the C compiler, I would think, takes care of the short-circuit evaluation.

 

Yes - fundamentally impossible, from your perspective.

 

LabVIEW (or more specifically, G) isn't an interpreted language like JavaScript or Python, and it can evaluate the components of your expression in parallel, unlike procedural languages. So "b OR c" might be evaluated 10 hrs before "a", since an expression, or a component of an expression, is evaluated when the data is available, not when a program pointer steps to the next line - that's dataflow for you  :P
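A textual analogue of this eager, dataflow-style behavior: LabVIEW's AND/OR primitives act like the non-short-circuiting bitwise operators, which compute both operands no matter what. A Python sketch (the `side` helper is purely illustrative, and statement order here stands in for "whenever data arrives"):

```python
log = []

def side(name, value):
    """Record that an operand was actually computed."""
    log.append(name)
    return value

a = side("a", False)
bc = side("b", False) | side("c", True)   # '|' evaluates both operands

# Eager (dataflow-like) evaluation: every operand was computed,
# even though a is False and the result was already determined.
d = a & bc
assert d is False
assert log == ["a", "b", "c"]   # nothing was skipped
```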



LabVIEW can do some funny things with constant folding, deleting dead code, and caching results from previous calculations.  That, coupled with the fact that UI elements are updated asynchronously, plus probes, automatic error handling, and debugging, can make code that measures execution speed trickier than initially thought.  You may want to post the code for your speed test so we can determine if it is measuring the speed properly.


I make crossrulz's first solution approx. 9 times faster in the shortcut scenario and 4-5 times faster in the non-shortcut scenario.

 

You have added a Build Array and a Boolean Array to Number into the equation.

 

I think it is the first time I have benchmarked something this low level before!

 

 

Craig

post-15311-0-74886000-1449854463.png


I get very similar results to the ones CraigC has shown. The numeric conversion really is not a good option here, neither for performance nor for memory.

 

In order to find an answer to the initial question, I have performed my own tests based on your earlier benchmarks. My benchmark includes several cases with different ways of solving the same issue. Each case uses the same values, and the VI runs in a loop to get some long-term timings.

 

The AND condition can either be constant TRUE, FALSE or random values. The last case performs the same operation as all the others, but this time without a for-loop, so basically it is fully LabVIEW-optimized. For readability a chart shows the timings in comparison. Oh yeah, and since we use for-loops there are also cases where the array is initialized beforehand and values are replaced each iteration to prevent memory allocations (who knows, might save time :rolleyes:).

 

Here is the snippet (running in LV2015, attached VI is saved for LV2011 though):

 

post-17453-0-56701800-1449861066.png

 

Please tell me if you find anything wrong with how it is implemented. Benchmarking is no easy task, unfortunately.

Now here are some results based on that VI. First with the AND being connected to RANDOM values:

 

post-17453-0-17921700-1449861075.png

 

Next the AND is connected to TRUE values:

 

post-17453-0-69539600-1449861081.png

 

And finally AND connected to FALSE:

 

post-17453-0-20263800-1449861086.png

 

So what do we get from that?

 

Basically, initializing a Boolean array and replacing values in a for-loop did not work very well (all the '(no allocation)' times). My guess is that the compiler optimization is just better than mine; since the number of iterations is pre-determined, LabVIEW can optimize the code much better.

 

All timings with shortcut are worse than the ones without. So using a shortcut does not improve performance; in fact it's just the opposite. Results might change if the Boolean operations are much more complex, however in this case LabVIEW is doing its job far better than we do with our 'shortcut'.

 

The last case without a for-loop ('Fastest solution') gives a clear answer: LabVIEW optimizes the code to work on that amount of data as a whole. My guess is that the operations also use multiple threads internally, so maybe we'd get different results with parallel for-loops enabled in the other cases. And most likely much more stuff.

 

The most interesting point is the comparison of timings between the 'Fastest solution' cases. As you can see, there is no real difference between any of the three. This could either mean my benchmark is wrong, or LabVIEW has no short-circuit optimizations like C.

 

What do you think?

Shortcut Benchmark LV2011.vi

Edited by LogMAN

Looks pretty good to me.  The only improvements I could suggest (which I tried, and they have no real effect) are to disable debugging and automatic error handling, and to replace the compound arithmetic with the native AND and OR.  I would also assume LabVIEW could look upstream, find that the Boolean array output isn't used, and optimize that away.  But adding an indicator (outside the for loop with the other indicators) didn't change the result of the test.

 

So in this situation LabVIEW does its job better than we can.  Or rather, the compiler can use things it knows and create code that performs better than we could create.  And honestly this is one of the staples of LabVIEW programming: "Don't worry about it, the compiler will take care of it."  This statement is both something I've relied on, and something I've hated at times.  The compiler isn't perfect, but I obviously trust it enough to rely on it daily.


I am still surprised that the {Build Array; Convert to Byte} option does so badly, because I think of those nodes as primitive LabVIEW operations. Then again, Convert to Byte is not a frequently used node; maybe it is not such a basic operation after all.
I am also a bit disappointed, because I like the idea of editing a truth table. Of course, performance wins over beauty!


Try to estimate the number of CPU cycles involved.

 

Your code has two issues:

 

1) The bits must be put together to form an array (they originate from three different locations in memory).

2) You have an array of 3 bits, so that makes 1 byte (the smallest amount of memory that can be allocated). The numeric representation is a U32 (4 bytes), so it won't fit without additional work.

 

The system actually has to allocate and initialize additional memory. After that there is a case structure with n cases. The CPU has to compare each case one by one until it finds a match.

 

Now the primitive solution: first the CPU gets a copy of one bit from each of the first and second arrays and performs the OR operation (1 cycle). The result is kept in the CPU cache and used for the following AND operation (another cycle). No additional memory is required.

 

The primitive solution could actually work on 32 or even 64 Bits simultaneously (depending on the bitness of your machine and the CPU instructions used), where the numeric solution must be done one-by-one.

 

Hope this makes things a bit clearer.
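A rough textual sketch of the two approaches being compared here (the names and structure are illustrative, not LabVIEW's actual generated code):

```python
# Truth table for d = a AND (b OR c), indexed by the 3-bit number
# (a << 2) | (b << 1) | c -- this mimics Build Array + Boolean Array
# to Number feeding a case structure.
LUT = [(a and (b or c))
       for a in (False, True)
       for b in (False, True)
       for c in (False, True)]

def via_lookup(a, b, c):
    # Packing the bits into an index is the extra work (and, in
    # LabVIEW, the extra allocation) the post above describes.
    index = (a << 2) | (b << 1) | c
    return LUT[index]

def via_primitives(a, b, c):
    # Two logic operations, no packing, no table.
    return a and (b or c)

# Both compute the same function for all 8 input combinations.
for a in (False, True):
    for b in (False, True):
        for c in (False, True):
            assert via_lookup(a, b, c) == via_primitives(a, b, c)
```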


I think booleans in LabVIEW are stored as bytes, not bits. All bits clear is a false and any other pattern is true.

 

You are right!

I've just tried it. Initializing an array of Boolean vs. array of U8 allocates the exact same amount of memory. :o

I did not know that! I honestly thought LabVIEW would auto-magically merge bits into bytes... Reality strikes again :(

So I revoke my last statement and state the opposite:

 

The primitive solution could actually work on 32 or even 64 Bits simultaneously (depending on the bitness of your machine and the CPU instructions used), where the numeric solution must be done one-by-one.

 

The primitive solution won't work on multiple Bits simultaneously.

 

We could do this ourselves by manually joining Booleans into bytes and using the primitives on bytes instead of Booleans. I'm not sure anything is gained from it (in terms of computational performance), and I think it's not the subject of this topic, so I'll leave it at that.
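For illustration, joining Booleans into a byte and operating on the packed value might look like this in a textual language (a sketch only; the LSB-first packing order is my own choice, and whether this pays off in LabVIEW is another matter):

```python
def pack(bits):
    """Pack a list of Booleans (LSB first) into one integer."""
    word = 0
    for i, bit in enumerate(bits):
        word |= int(bit) << i
    return word

a = pack([True, False, True, True])   # 0b1101
b = pack([True, True, False, True])   # 0b1011

# One bitwise operation now processes all four 'Booleans' at once.
both = a & b
either = a | b

assert both == 0b1001
assert either == 0b1111
```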


I think booleans in LabVIEW are stored as bytes, not bits. All bits clear is a false and any other pattern is true. This can lead to some interesting tricks you can play when typecasting in between boolean and characters.

 

That's right. Before LabVIEW 5, Booleans were 16-bit integers and Boolean arrays were packed into 16-bit integers too. That, however, had performance penalties in several places and was anything but standard compared to everything else except some MacOS precedents, so it was dropped in favor of a more standard implementation with a byte for each Boolean.

 

 

You are right!

I've just tried it. Initializing an array of Boolean vs. array of U8 allocates the exact same amount of memory. :o

I did not know that! I honestly thought LabVIEW would auto-magically merge bits into bytes... Reality strikes again :(

 

Reality is that there is no such thing as a perfect implementation for every possible use case. The packed-array solution you assumed LabVIEW uses had severe performance limitations in LabVIEW 3 and was therefore dropped in favor of a more common approach that consumes more memory for Boolean arrays but makes conversion between Boolean arrays and other formats simpler and generally more performant.

 

But there is a simple rule: if you are concerned about this kind of performance optimization, then don't use Boolean arrays at all! They involve memory manager operations all over the place, and those are magnitudes slower than the few hundred CPU cycles with which you can do just about any Boolean operation you might ever dream up. For such performance optimization you need to look at properly sized integers and do Boolean arithmetic on them. And if your "Boolean array" gets over 64 bits, you should definitely look at your algorithm. Most likely you have chosen the easy path of working with Boolean arrays in order not to have to think about a proper algorithm implementation, but if you go that path, worrying about sub-microsecond optimizations is definitely a lost battle already.
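The "properly sized integers" advice, sketched in a textual language: two bitwise operations per word apply d = a AND (b OR c) to 64 elements at once, instead of looping over a Boolean array (illustrative only; the bit order and packing scheme are my own choices):

```python
import random

N = 64  # one machine word's worth of Boolean "array" elements
random.seed(1)
a_bits = [random.random() < 0.5 for _ in range(N)]
b_bits = [random.random() < 0.5 for _ in range(N)]
c_bits = [random.random() < 0.5 for _ in range(N)]

def pack(bits):
    """Pack a list of Booleans (LSB first) into one integer."""
    word = 0
    for i, bit in enumerate(bits):
        word |= int(bit) << i
    return word

a, b, c = pack(a_bits), pack(b_bits), pack(c_bits)

# d = a AND (b OR c) for all 64 elements in two bitwise operations,
# instead of 64 iterations over a Boolean array.
d = a & (b | c)

# Check each packed bit against the element-by-element computation.
for i in range(N):
    assert bool((d >> i) & 1) == (a_bits[i] and (b_bits[i] or c_bits[i]))
```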

 

One of the worst performance killers with the packed Boolean array implementation in LabVIEW 3 was autoindexing. Suddenly a lot of register shifts and Boolean masking had to be done on every autoindexing terminal for a Boolean array. That made such loops magnitudes slower than the simple address increment used with byte arrays.


Also keep in mind that providing random values to a case structure means a lot of conditional jumps that the processor will often mispredict. You're trading the execution time of an AND or an OR for the time it takes to find the next operation to call in memory, potentially getting the wrong one and having to back up. I'm actually surprised the case structure isn't significantly worse.

  • 2 weeks later...

The way this is coded, these two processes happen in parallel.  If you are on a single core you may get inconsistent results.  Also, with UIs being updated asynchronously, if one finished before the other (which we expect), then your timing won't be consistent, because your computer will be off updating the UI and may take cycles away from the other process.  Basically the timing information from that VI can't be trusted, but the fact that the numbers are so small means it probably doesn't matter anyway.

 

Also, can't that bottom for loop be removed altogether?  AND and OR work on arrays as well as scalars.


By the way, you can use a lookup table / Karnaugh diagram if you start with bytes instead of Booleans. Still slower than the logic (0.027 versus 0.022).

This makes sense if you break them both down:

 

Top option:

-Load int, load int (likely pre-loaded, since I'm assuming that's the point of letting LabVIEW do the autoindexing, so it should be fast)

-Add int, int

-Convert someIntType->int (likely free)

-Add int, int

-Convert result to I32 (maybe free)

-Bounds check on index

-Random access from memory at index

...still waiting

...still waiting...

...still waiting...

...result

-Store result in output array

 

Bottom Option:

-Load int, int (fast)

-Or int, int

-And int, int

-Move result to output array

 

The way this is coded, these two processes happen in parallel.  If you are on a single core you may get inconsistent results.  Also, with UIs being updated asynchronously, if one finished before the other (which we expect), then your timing won't be consistent, because your computer will be off updating the UI and may take cycles away from the other process.  Basically the timing information from that VI can't be trusted, but the fact that the numbers are so small means it probably doesn't matter anyway.

 

Also, can't that bottom for loop be removed altogether?  AND and OR work on arrays as well as scalars.

Some of it happens in parallel but there is a wire wrapping around from the end of the top loop to the start of the bottom.

Edited by smithd
