ArjanWiskerke Posted December 8, 2015

In most programming languages, when a logical expression like "d = a AND (b OR c)" is evaluated, evaluation stops as soon as "a" is false; the sub-expression "b OR c" is never processed. I think the LabVIEW compiler does something similar. Does anyone know? The problem in LabVIEW is that the processing order is not defined.
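In C, for comparison, this short-circuit behavior is guaranteed by the language standard. A minimal sketch (illustrative only, not taken from any code in this thread):

```c
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    bool a = false, b = true, c = false;

    /* C guarantees short-circuiting: because a is false, the
       sub-expression (b || c) is never evaluated at all. */
    bool d = a && (b || c);

    printf("%d\n", d);   /* prints 0 */
    return 0;
}
```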
CraigC Posted December 8, 2015

Hi, the processing order is very much defined. See here: http://www.ni.com/getting-started/labview-basics/dataflow

Craig
crossrulz Posted December 8, 2015

Do you really have an application that needs that much optimization? You likely have other issues if that is the case. Anyway, you can implement that exact logic easily enough without the AND: wrap the "b OR c" part in a case structure driven by "a". Note: the FALSE case just has the FALSE value passed through.
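In a text language, crossrulz's case-structure pattern corresponds to an explicit conditional; roughly, in C:

```c
#include <stdbool.h>

/* Hand-rolled "short circuit": the AND becomes a branch, and the
   FALSE case simply passes FALSE through, so (b || c) is only
   evaluated when a is true. */
bool and_via_branch(bool a, bool b, bool c)
{
    if (a)
        return b || c;   /* TRUE case */
    return false;        /* FALSE case: pass FALSE through */
}
```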
smithd Posted December 8, 2015

From what I can tell, people like to use that to write hard-to-read code in which a particular function (with side effects) will or won't get called depending on earlier boolean expressions: https://en.wikipedia.org/wiki/Short-circuit_evaluation Either way, it's more accurate to say that the behavior is defined that way than that the compiler does some special optimization; LabVIEW's compiler simply doesn't do the same: "The short-circuit expression x Sand y (using Sand to denote the short-circuit variety) is equivalent to the conditional expression if x then y else false; the expression x Sor y is equivalent to if x then true else y."
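The classic example of that side-effect-dependent style is the null-pointer guard; a short C sketch:

```c
#include <stddef.h>

struct node { int value; };

/* The right-hand operand is only evaluated (and only safe to
   evaluate) when the left one is true: with a NULL pointer the
   dereference never happens. */
int is_positive(const struct node *n)
{
    return n != NULL && n->value > 0;
}
```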
ArjanWiskerke Posted December 9, 2015

Thank you for your replies on "short-circuit evaluation". I already tried crossrulz's construction; unfortunately, it turned out that the case structure itself slows things down. I also tested the speed when applying a constant TRUE or FALSE to "a". No difference was detected, of course, because the AND (∧) node needs input from the OR (∨) node, as Craig pointed out. Does that mean short-circuit evaluation is fundamentally impossible? LabVIEW in the end generates C code; the C compiler, I would think, takes care of the short-circuit evaluation.
Rolf Kalbermatter Posted December 9, 2015

> Does it mean that short-circuit evaluation is fundamentally impossible? LabVIEW in the end generates C code. The C compiler, I would think, takes care of the short-circuit evaluation.

And here you walk into the mist! LabVIEW is written in C(++) for most of it, but it doesn't, and never has, created C code for the normal targets you and I use it with. Before LabVIEW 2010 or so, it translated the code first into a directed graph and from there directly into machine code. Since then it has a two-layer approach: first it translates the diagram into DFIR (Dataflow Intermediate Representation), a more formalized version of a directed-graph representation with additional features. Most of the algorithm optimizations, including things like dead code elimination, constant folding, and much more, are done at this level. From there the data is passed to the open-source LLVM compiler engine, which generates the actual machine code. At no point is any intermediate C code involved, as C code is notoriously inadequate for representing complex relationships in an easy-to-handle way.

There is a C generator in LabVIEW that can translate a LabVIEW VI into some sort of C++ code. It was used for some of the early embedded toolkits, such as the AD Blackfin Toolkit, the Windows Mobile Toolkit, and the ARM Toolkit. But the generated code is pretty unreadable, and the solution proved very hard to support in the long run. You can still buy the C Generator add-on from NI, which gives you a license to use that generator, but its price is pretty exorbitant and active support from NI is minimal. Except under the hood for the Touch Panel Module, in combination with an embedded Visual C Express installation, it is not used in any currently available NI product, AFAIK.
ShaunR Posted December 9, 2015

> Does it mean that short-circuit evaluation is fundamentally impossible? LabVIEW in the end generates C code. The C compiler, I would think, takes care of the short-circuit evaluation.

Yes, fundamentally impossible from your perspective. LabVIEW (or more specifically, G) isn't an interpreted language like JavaScript or Python, and it can evaluate the components of your expression in parallel, unlike procedural languages. So "b OR c" might be evaluated 10 hours before "a", since an expression, or a component of an expression, is evaluated when its data is available, not when a program pointer steps to the next line. That's dataflow for you.
ArjanWiskerke Posted December 10, 2015

I tested the "case structure" solution again. In contrast to my previous results, there was a speed improvement. I have put the case structure back into my code, and since the logic there is more complicated, the case structure really improves the speed.
hooovahh Posted December 10, 2015

LabVIEW can do some funny things with constant folding, dead code elimination, and caching of results from previous calculations. That, coupled with the fact that UI elements are updated asynchronously, plus probes, automatic error handling, and debugging, can make measuring execution speed trickier than initially thought. You may want to post the code you used for this speed test so we can determine whether it measures the speed properly.
ThomasGutzler Posted December 10, 2015

You should probably build your code into an executable before doing the benchmarking, to avoid the development-environment overhead.
ArjanWiskerke Posted December 11, 2015

OK, here is the code I used for testing. There is disabled code to generate random Boolean arrays; I ran that code and copied the data into the array constants before testing the alternative solutions. I had to empty the arrays again to reduce the file size. testShortCutEvaluation.vi
CraigC Posted December 11, 2015

I make crossrulz's first solution approximately 9 times faster in the shortcut scenario and 4-5 times faster in the non-shortcut scenario. You have added a Build Array and a Boolean Array To Number into the equation. I think it is the first time I have benchmarked something this low-level!

Craig
LogMAN Posted December 11, 2015

I get very similar results to the ones CraigC has shown. The numeric conversion really is no good option here, neither for performance nor for memory.

To find an answer to the initial question, I have performed my own tests based on your benchmarks. My benchmark includes several cases with different ways of solving the same issue. Each case uses the same values, and the VI runs in a loop to get some long-term timings. The AND condition can be either constant TRUE, constant FALSE, or random values. The last case performs the same operation as all the others, but this time without a for loop, so it is totally LabVIEW-optimized. For readability, a chart shows the timings in comparison. Oh, and since we use for loops, there are also cases where the array is initialized beforehand and values are replaced each iteration to prevent memory allocations (who knows, might save time). Here is the snippet (running in LV2015; the attached VI is saved for LV2011, though). Please tell me if you find anything wrong with how it is implemented; benchmarking is no easy task, unfortunately.

Now here are some results based on that VI: first with the AND connected to RANDOM values, next with the AND connected to TRUE values, and finally with the AND connected to FALSE.

So what do we get from that? Basically, initializing a Boolean array and replacing values in a for loop did not work very well (all the '(no allocation)' times). My guess is that the compiler optimization is simply better than me: since the number of iterations is pre-determined, LabVIEW can optimize the code much better. All timings with the shortcut are worse than the ones without, so using a shortcut does not improve performance; in fact, it's just the opposite. Results might change if the Boolean operations were far more complex, but in this case LabVIEW does its job much better than we do with our 'shortcut'.

The last case without a for loop ('Fastest solution') gives a clear answer: LabVIEW will optimize the code to work on such an amount of data. My guess is that the operations also use multiple threads internally, so we might get different results with parallel for loops enabled in the other cases. And most likely much more is going on. The most interesting point is the comparison of timings for the 'Fastest solution': as you can see, there is no real difference between any of the three cases. This could either mean my benchmark is wrong, or LabVIEW has no shortcut optimizations like C. What do you think?

Shortcut Benchmark LV2011.vi
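For readers without LabVIEW at hand, the spirit of that experiment can be sketched in C (a hypothetical analogue, not a translation of LogMAN's VI): time the plain always-evaluate version against the branching "shortcut" version over random inputs.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 10000000

int main(void)
{
    bool *a = malloc(N * sizeof *a);
    bool *b = malloc(N * sizeof *b);
    bool *c = malloc(N * sizeof *c);
    bool *d = malloc(N * sizeof *d);
    if (!a || !b || !c || !d)
        return 1;

    srand(42);
    for (size_t i = 0; i < N; i++) {
        a[i] = rand() & 1;
        b[i] = rand() & 1;
        c[i] = rand() & 1;
    }

    /* Plain version: both operands always evaluated, no branches. */
    unsigned sum1 = 0;
    clock_t t0 = clock();
    for (size_t i = 0; i < N; i++) {
        d[i] = a[i] & (b[i] | c[i]);
        sum1 += d[i];               /* keep the result live */
    }
    clock_t t1 = clock();

    /* "Shortcut" version: one data-dependent branch per element. */
    unsigned sum2 = 0;
    for (size_t i = 0; i < N; i++) {
        d[i] = a[i] ? (b[i] || c[i]) : false;
        sum2 += d[i];
    }
    clock_t t2 = clock();

    printf("plain:    %.3f s (sum %u)\n", (double)(t1 - t0) / CLOCKS_PER_SEC, sum1);
    printf("shortcut: %.3f s (sum %u)\n", (double)(t2 - t1) / CLOCKS_PER_SEC, sum2);

    free(a); free(b); free(c); free(d);
    return 0;
}
```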
hooovahh Posted December 11, 2015

Looks pretty good to me. The only improvements I could suggest (which I tried, and they have no real effect) are to disable debugging and automatic error handling, and to replace the compound arithmetic with the native AND and OR. I would also assume LabVIEW could look upstream, find that the Boolean array output isn't used, and optimize it away; but adding an indicator (outside the for loop, with the other indicators) didn't change the result of the test. So in this situation LabVIEW does its job better than we can. Or rather, the compiler can use things it knows to create code that performs better than what we could create. And honestly, this is one of the staples of LabVIEW programming: "Don't worry about it, the compiler will take care of it." That statement is both something I've relied on and something I've hated at times. The compiler isn't perfect, but I obviously trust it enough to rely on it daily.
ArjanWiskerke Posted December 12, 2015

I am still surprised that the {Build Array; convert to Byte} option does so badly, because I think of those nodes as primitive LabVIEW operations. Convert to Byte is not a frequently used node; maybe it is not such a basic operation after all. I am also a bit disappointed, because I like the idea of editing a truth table. Of course, performance wins over beauty!
LogMAN Posted December 12, 2015

Try to estimate the number of CPU cycles involved. Your code has two issues: 1) the bits must be put together to form an array (they originate from three different locations in memory); 2) you have an array of 3 bits, which makes 1 byte (the least amount of memory that can be allocated), while the numeric representation is a U32 (4 bytes), so it won't fit without additional work. The system actually has to allocate and initialize additional memory. After that there is a case structure with n cases; the CPU has to compare each case one by one until it finds a match.

Now the primitive solution: first the CPU gets a copy of one bit each from the first and second arrays and performs the OR operation (1 cycle). The result is kept in the CPU cache and used for the following AND operation (another cycle). No additional memory is required. The primitive solution could actually work on 32 or even 64 bits simultaneously (depending on the bitness of your machine and the CPU instructions used), whereas the numeric solution must be done one by one. Hope this makes things a bit clearer.
Neil Pate Posted December 13, 2015

I think Booleans in LabVIEW are stored as bytes, not bits. All bits clear is FALSE, and any other pattern is TRUE. This can lead to some interesting tricks you can play when typecasting between Booleans and characters.
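The same byte-per-element storage shows up in C, assuming a typical platform where sizeof(bool) == 1:

```c
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    bool flags[1000];

    /* On typical platforms sizeof(bool) == 1, so this prints 1000:
       one full byte per element, just like a LabVIEW Boolean array. */
    printf("%zu\n", sizeof flags);
    return 0;
}
```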
LogMAN Posted December 13, 2015

> I think Booleans in LabVIEW are stored as bytes, not bits. All bits clear is FALSE, and any other pattern is TRUE.

You are right! I've just tried it: initializing an array of Booleans vs. an array of U8 allocates exactly the same amount of memory. I did not know that! I honestly thought LabVIEW would auto-magically merge bits into bytes... Reality strikes again. So I revoke my earlier statement and state the opposite:

> The primitive solution could actually work on 32 or even 64 bits simultaneously (depending on the bitness of your machine and the CPU instructions used), whereas the numeric solution must be done one by one.

The primitive solution won't work on multiple bits simultaneously. We could do this ourselves by manually packing Booleans into bytes and using the primitives on bytes instead of Booleans. I'm not sure anything is gained from it (in terms of computational performance), and I think it's not the subject of this topic, so I'll leave it at that.
Rolf Kalbermatter Posted December 13, 2015

> I think Booleans in LabVIEW are stored as bytes, not bits. All bits clear is FALSE, and any other pattern is TRUE. This can lead to some interesting tricks you can play when typecasting between Booleans and characters.

That's right. Before LabVIEW 5, Booleans were 16-bit integers, and Boolean arrays were packed into 16-bit integers too. That had several performance penalties in several places and was anything but standard compared to everything except a few MacOS precedents, so it was dropped in favor of a more standard implementation with one byte per Boolean.

> You are right! I've just tried it: initializing an array of Booleans vs. an array of U8 allocates exactly the same amount of memory. I did not know that! I honestly thought LabVIEW would auto-magically merge bits into bytes... Reality strikes again.

The reality is that there is no such thing as a perfect implementation for every possible use case. The packed-array solution you assumed LabVIEW to use had severe performance limitations in LabVIEW 3 and was therefore dropped in favor of a more common approach that consumes more memory for Boolean arrays but makes conversion between Boolean arrays and other formats simpler and generally more performant. But there is a simple rule: if you are concerned about this kind of performance optimization, then don't use Boolean arrays at all! They involve memory manager operations all over the place, and those are magnitudes slower than the few hundred CPU cycles with which you can do just about any boolean operation you might ever dream up. For such performance optimization, you need to look at properly sized integers and do boolean arithmetic on them. And if your "boolean array" grows beyond 64 bits, you should definitely take another look at your algorithm. Most likely you have chosen the easy path of working with Boolean arrays in order not to have to think about a proper algorithm implementation, but if you go down that path, worrying about sub-microsecond optimizations is a lost battle already.

One of the worst performance killers with the packed Boolean array implementation in LabVIEW 3 was autoindexing: suddenly a lot of register shifts and boolean masking had to be done at every autoindexing terminal for a Boolean array. That made such loops magnitudes slower than the simple address increment needed when Booleans are stored one per byte.
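Rolf's suggestion, sketched in C: pack the "booleans" into an integer and let one pair of instructions do all the work (a minimal illustration with arbitrary bit patterns):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Each bit of these 64-bit words is one "boolean"; a single pair
       of CPU instructions evaluates all 64 lanes at once. */
    uint64_t a = 0xF0F0F0F0F0F0F0F0u;   /* arbitrary example patterns */
    uint64_t b = 0x00FF00FF00FF00FFu;
    uint64_t c = 0x0F0F0F0F0F0F0F0Fu;

    uint64_t d = a & (b | c);           /* d = a AND (b OR c), 64 lanes wide */

    printf("%016llx\n", (unsigned long long)d);
    return 0;
}
```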
ShaunR Posted December 13, 2015

> So what do we get from that?

Memory [re]allocations in diagrams are expensive, and Boolean operator performance is trivial in comparison. If you have to worry about a few nanoseconds of fundamental operations, then a general-purpose desktop is not the platform to use.
smithd Posted December 13, 2015

Also keep in mind that feeding random values to a case structure means a lot of conditional jumps that the processor predicts wrongly. You're trading the execution time of an AND or an OR for the execution time of finding the next operation to call in memory, and potentially getting the wrong one and having to back up. I'm actually surprised the case structure isn't significantly worse.
ArjanWiskerke Posted December 23, 2015

By the way, you can use a lookup table / Karnaugh map if you start with bytes instead of Booleans. It is still slower than the plain logic (0.027 versus 0.022).
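The lookup-table idea in C form (a hypothetical sketch, assuming the three inputs are already bytes holding 0 or 1):

```c
#include <stdint.h>

/* Truth table for d = a AND (b OR c), indexed by the 3-bit
   pattern (a<<2)|(b<<1)|c, i.e. a flattened Karnaugh map. */
static const uint8_t lut[8] = {
    /* a b c : d */
    0,  /* 0 0 0 : 0 */
    0,  /* 0 0 1 : 0 */
    0,  /* 0 1 0 : 0 */
    0,  /* 0 1 1 : 0 */
    0,  /* 1 0 0 : 0 */
    1,  /* 1 0 1 : 1 */
    1,  /* 1 1 0 : 1 */
    1,  /* 1 1 1 : 1 */
};

uint8_t eval(uint8_t a, uint8_t b, uint8_t c)
{
    return lut[(a << 2) | (b << 1) | c];   /* one indexed load */
}
```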
hooovahh Posted January 4, 2016

The way this is coded, the two processes happen in parallel. If you are on a single core you may get inconsistent results. Also, with UIs being updated asynchronously, if one finishes before the other (which we expect), your timing won't be consistent, because your computer will be off updating the UI and may take cycle counts away from the other process. Basically, the timing information from that VI can't be trusted, although the fact that the numbers are so small means it probably doesn't matter anyway. Also, can't that bottom for loop be removed altogether? AND and OR work on arrays as well as scalars.
smithd Posted January 5, 2016

> By the way, you can use a lookup table / Karnaugh map if you start with bytes instead of Booleans. It is still slower than the plain logic (0.027 versus 0.022).

This makes sense if you break them both down.

Top option (lookup table):
- Load int, load int (likely pre-loaded, since I assume that's the point of letting LabVIEW do the autoindexing, so it should be fast)
- Add int, int
- Convert someIntType -> int (likely free)
- Add int, int
- Convert result to I32 (maybe free)
- Bounds check on index
- Random access from memory at index (...still waiting... ...still waiting... ...result)
- Store result in output array

Bottom option (primitives):
- Load int, int (fast)
- Or int, int
- And int, int
- Move result to output array

> The way this is coded, the two processes happen in parallel.

Some of it happens in parallel, but there is a wire wrapping around from the end of the top loop to the start of the bottom one.
hooovahh Posted January 5, 2016

Oh, so sorry, I missed that. The tiny light-green wire is hard to see on a white background.