Oakromulo Posted December 17, 2012

After two years of "leeching" content every now and then from the Lava community, I think it's time to contribute a little bit. Right now I'm working on a project that involves lots of data-mining operations through a neurofuzzy controller to predict future values from some inputs, so the code needs to be as optimized as possible. With that in mind, I've implemented the same controller twice: once with a Formula Node structure and once with standard 1D array operators inside an inlined SubVI. Well... the results have been impressive to me. I had thought the SubVI with the Formula Node would perform a little bit better than the one with standard array operators; in fact, it was quite the opposite. The inlined SubVI was consistently around 26% faster.

[Block diagram screenshots: Inlined Std SubVI, Formula Node SubVI]

Attachments: evalSugenoFnode.vi, evalSugenoInline.vi, perfComp.vi, PerfCompProject.zip
GregFreeman Posted December 17, 2012

I agree, LabVIEW lacks readability when it comes to doing math like this, but there's not much you can do about it. Now with wire labels it helps a little, because you can label your intermediate "variables", so to speak. You've made it about as clean as you can. I'm guessing the formula node has some sort of overhead on a per-call basis, so calling it in the for loop is causing the long execution times as compared to the primitives. I think that, in general, primitives are always the best option in terms of performance due to optimization (but someone else can probably give better detail on the why than I can, so I'll just leave it at that).
Oakromulo Posted December 17, 2012

Yeah... there must be some constant overhead when calling the formula nodes. There's also the option to try array manipulation inside the node, thereby removing the for loop outside. I'll give it a try later.
Oakromulo Posted December 17, 2012

Another surprise over here... I've tried the formula node with C-like array manipulation, with constant dimensions. This time the primitives have been 71% faster than the formula node. The per-call (O(n)) overhead theory seems unlikely... Slight modification, same results.
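(For readers who haven't used the node's text syntax: a Formula Node body doing this kind of C-like array manipulation with constant dimensions looks roughly like the sketch below. The names, dimensions and arithmetic are illustrative only, not the contents of the attached VIs; x and b are assumed to be scalar input terminals and y a scalar output terminal on the node border.)

    /* Illustrative Formula Node text only -- not the attached VI */
    float64 a[8];
    float64 acc;
    int32 i;
    acc = 0;
    for (i = 0; i < 8; i++)
    {
        a[i] = x * i + b;         /* fill a fixed-size local array */
        acc = acc + a[i] * a[i];  /* accumulate inside the node    */
    }
    y = acc;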
GregSands Posted December 17, 2012

What I usually do in these sorts of cases is add an image of the equations to the BD of the VI. Another option might be to use a Math Node.
Oakromulo Posted December 17, 2012

LaTeX --> G... that's awesome. It should definitely become a core LV feature! It'd be nice to meet Darin at the next NI Week... By the way, the equations represent a simplified first-order Sugeno fuzzy inference system. It's always a good idea to add them to the VIs!
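(For reference, the first-order Sugeno output being discussed has the standard textbook form below for a two-input system with N rules of the type "IF x1 is A_i AND x2 is B_i THEN f_i = p_i x1 + q_i x2 + r_i". This is the general formula, not necessarily exactly what the attached VIs compute:

    y = \frac{\sum_{i=1}^{N} w_i \, (p_i x_1 + q_i x_2 + r_i)}{\sum_{i=1}^{N} w_i}

where w_i is the firing strength of rule i and (p_i, q_i, r_i) are the linear consequent parameters.)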
Elset Posted December 17, 2012

FYI. Not sure if the discussion is still applicable 3 years later...
Oakromulo Posted December 17, 2012

Tim, probably not... though it'd be interesting to know a little more about what happens behind the scenes in the node.
ShaunR Posted December 17, 2012

Formula nodes are for C and MATLAB programmers who can't get their heads around LabVIEW (or don't want to learn it). It's well known that they are a lot slower than native LV, and the node is a bit like the "Sequence Frame" in that it is generally avoided. I would guess there are optimisations that LabVIEW is unable to do when using the formula node because they rely on G language semantics (in-placeness?).
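(A rough text-language analogy for the "in-placeness" point, not a description of LabVIEW internals: when the semantics allow a buffer to be reused, the compiler can drop a per-call allocation entirely. The C sketch below is purely illustrative.)

    /* Out-of-place: allocates a fresh result array on every call. */
    #include <stdlib.h>

    double *scale_copy(const double *a, size_t n, double k)
    {
        double *out = malloc(n * sizeof *out);
        if (!out) return NULL;
        for (size_t i = 0; i < n; i++) out[i] = a[i] * k;
        return out;
    }

    /* In-place: the caller's buffer is reused, no allocation at all. */
    void scale_inplace(double *a, size_t n, double k)
    {
        for (size_t i = 0; i < n; i++) a[i] *= k;
    }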
vugie Posted December 18, 2012

Some time ago I performed some benchmarks of the Formula Node, and the conclusion was that the difference from native code is negligible (as long as you don't use arrays in the node). But that was on a single-core machine. I think the difference you observe comes from execution parallelism.
bsvingen Posted December 18, 2012

Formula nodes are slow, in particular when using array manipulations. Things get even worse for RT. This is a pity, because when doing math you want to be able to take one look and recognize the code. I typically refactor as much as possible and include the equivalent text code as a comment.
Ton Plomp Posted December 19, 2012

I only get a speed improvement of 4%. LabVIEW 2012, Win7 32-bit, Intel i5-2410M @ 2.3 GHz. Ton
bsvingen Posted December 19, 2012

[quoting Ton Plomp] I only get a speed improvement of 4%. LabVIEW 2012, Win7 32-bit, Intel i5-2410M @ 2.3 GHz.

Then you are doing something wrong that does not filter out overhead, etc. I also get consistently 20-30% improvements with the diagram vs. the formula node. Try doing 2D array math in a formula node; last I checked it was a 50-100% slowdown (but that was a looong time ago). Writing a DLL in C gives the fastest-running code, but that kind of defeats the purpose of making the code accessible, readable and maintainable. The wire diagram HAS improved in the latest iterations of LV, and IMO that is overall the best solution (given reasonably complex math). I have written a matrix solver using exclusively wires. It's pretty fast for matrices smaller than approximately 200 x 200. For larger matrices the native solver (which uses a DLL) is faster, but I guess one of the main reasons is that it probably uses a more sophisticated algorithm that scales better; parallel maybe, I don't know.
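(On the "filter out overhead" point: one common pattern, sketched here in plain C rather than in the benchmark VIs, is to time an empty reference loop of the same length and subtract it, so that only the work under test is compared. The loop count and the dummy workload below are arbitrary; the same idea applies to the LabVIEW harness by running it once with an empty case.)

    #include <stdio.h>
    #include <time.h>

    #define N 1000000L

    static double elapsed(const struct timespec *a, const struct timespec *b)
    {
        return (b->tv_sec - a->tv_sec) + (b->tv_nsec - a->tv_nsec) * 1e-9;
    }

    int main(void)
    {
        struct timespec t0, t1, t2;
        volatile double sink = 0.0;
        volatile long i;   /* volatile keeps the empty reference loop alive */

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < N; i++) { }                  /* reference: loop only   */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        for (i = 0; i < N; i++) { sink += i * 0.5; } /* loop + work under test */
        clock_gettime(CLOCK_MONOTONIC, &t2);

        printf("net time per iteration: %g s\n",
               (elapsed(&t1, &t2) - elapsed(&t0, &t1)) / N);
        return 0;
    }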
Ton Plomp Posted December 19, 2012

I have no idea what changed, but currently I get a 40% improvement. Ton
Oakromulo Posted December 21, 2012

Ton, same thing here... I ran the first comparison again on my laptop at work and it was just 5% faster there too!

Desktop (home): AMD Phenom II 965BE C3 @ 3.7 GHz (quad core), 8 GB DDR3-2000 CL5
Laptop (work): Intel Core i5 M540 @ 2.53 GHz (dual core, Hyper-Threading enabled), 6 GB DDR3-1333 CL8
Both with Win7 x64 and LV2011.

bsvingen, I think I'm going to try an equivalent DLL to be called from LV. I have little to no experience with DLLs in LV apart from the system ones.

vugie, if I push the code inside a timed loop with manual affinity, is it safe to say it runs on only a single core?
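(If the DLL experiment goes ahead: a minimal sketch of the kind of exported C function a Call Library Function Node can call is shown below. The function name, signature and Sugeno layout are assumptions made for illustration, not code from this project; the node would need to be configured with matching parameter types, with the arrays passed as pointers to data.)

    /* sugeno_dll.c -- hypothetical DLL entry point for a LabVIEW
       Call Library Function Node.  Names and parameters are
       illustrative, not the project's actual code.                 */
    #include <stdint.h>

    #ifdef _WIN32
    #define DLL_EXPORT __declspec(dllexport)
    #else
    #define DLL_EXPORT
    #endif

    /* First-order Sugeno output for nRules rules with two inputs.
       w: firing strengths; p, q, r: linear consequent coefficients. */
    DLL_EXPORT double eval_sugeno(const double *w, const double *p,
                                  const double *q, const double *r,
                                  int32_t nRules, double x1, double x2)
    {
        double num = 0.0, den = 0.0;
        int32_t i;
        for (i = 0; i < nRules; i++) {
            num += w[i] * (p[i] * x1 + q[i] * x2 + r[i]);
            den += w[i];
        }
        return (den != 0.0) ? num / den : 0.0;
    }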
Oakromulo Posted December 21, 2012

Another test: with parallelized For Loops set to 4 instances (number of physical + HT cores) on the i5 laptop, the primitives came out an amazing 89% faster!

Attachment: PerfCompProject2.zip
JackDunaway Posted December 22, 2012

[quoting Oakromulo] After two years of "leeching" content every now and then from the Lava community, I think it's time to contribute a little bit.

Welcome! And +1 for the meticulous style.
ShaunR Posted December 22, 2012

[quoting Oakromulo] Another test: with parallelized For Loops set to 4 instances (number of physical + HT cores) on the i5 laptop, the primitives came out an amazing 89% faster!

Move the indicators out of the for loops.
GregFreeman Posted December 22, 2012

[quoting Ton Plomp] I have no idea what changed, but currently I get a 40% improvement.

Empty arrays on the first test.
Oakromulo Posted December 23, 2012

[quoting ShaunR] Move the indicators out of the for loops.

With output auto-indexing disabled, wouldn't the indicators outside the loop trigger compiler optimizations? Anyway, a queue seems a better option in this case.
Oakromulo Posted December 23, 2012

I just realized that the percentages have been calculated in a very wrong way. I invite you all to check the new comparison below, built with a queue structure. Now, with parallelized For Loops and a queue, the primitives were a full 4 times faster than the formula node SubVI!

Attachment: PerfCompProject3.zip
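(A note on the arithmetic, since the percentages are being corrected here: with t_fn the formula node time and t_prim the primitives time, "X% faster" is usually read as

    \frac{t_{\mathrm{fn}}}{t_{\mathrm{prim}}} = 1 + \frac{X}{100}

so "4 times faster" means the primitives take a quarter of the time, i.e. a 300% improvement, and the earlier 26% / 71% / 89% figures correspond to time ratios of roughly 1.26, 1.71 and 1.89 only if they were computed the same way.)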
ShaunR Posted December 23, 2012

[quoting Oakromulo] With output auto-indexing disabled, wouldn't the indicators outside the loop trigger compiler optimizations? Anyway, a queue seems a better option in this case.

Yes. That's what you want, right? Fast? Also, LV has to task-switch to the UI thread. UI components kill performance, and humans can't see any useful information at those sorts of speeds anyway (tens of ms). If you really want to show some numbers whizzing around, use a notifier or a local variable and update the UI in a separate loop every, say, 150 ms.
Oakromulo Posted December 23, 2012

[quoting ShaunR] Yes. That's what you want, right? Fast? Also, LV has to task-switch to the UI thread. UI components kill performance, and humans can't see any useful information at those sorts of speeds anyway (tens of ms). If you really want to show some numbers whizzing around, use a notifier or a local variable and update the UI in a separate loop every, say, 150 ms.

Sure! I added the indicators just to avoid the "unused code/dangling pin" compiler optimization. You're right, it wasn't very clever; the queue idea is much better. The slow random number generator inside the for loop is there for the same reason, to avoid unfair comparisons between the formula node SubVI and the standard one.
ShaunR Posted December 23, 2012

[quoting Oakromulo] Sure! I added the indicators just to avoid the "unused code/dangling pin" compiler optimization. You're right, it wasn't very clever; the queue idea is much better. The slow random number generator inside the for loop is there for the same reason, to avoid unfair comparisons between the formula node SubVI and the standard one.

A local variable will be the fastest, short of putting the indicator outside (and it won't kick in that particular optimisation as long as you read it somewhere, I think). The queues, however, will have to reallocate memory as the data grows, so they are better if you want all the data, but a local or a notifier would be preferable as they don't grow memory.
Oakromulo Posted December 23, 2012

[quoting ShaunR] A local variable will be the fastest, short of putting the indicator outside (and it won't kick in that particular optimisation as long as you read it somewhere, I think). The queues, however, will have to reallocate memory as the data grows, so they are better if you want all the data, but a local or a notifier would be preferable as they don't grow memory.

I'd forgotten that only RT FIFOs are pre-allocated and thus enable constant-time writes. This time I replaced the queue with a DBL functional global variable.
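(A rough text-language illustration of why pre-allocation buys constant-time writes; this is purely conceptual and not how LabVIEW queues, RT FIFOs or functional globals are implemented. The storage is reserved once up front, so each write is just an index update and a copy, with no reallocation as data accumulates.)

    #include <stdlib.h>

    typedef struct {
        double *buf;
        size_t  cap;    /* fixed capacity, allocated once        */
        size_t  head;   /* next write position (wraps around)    */
        size_t  count;  /* number of valid elements, up to cap   */
    } FixedFifo;

    static int fifo_init(FixedFifo *f, size_t cap)
    {
        f->buf   = malloc(cap * sizeof *f->buf);
        f->cap   = cap;
        f->head  = 0;
        f->count = 0;
        return f->buf != NULL;
    }

    /* O(1): overwrites the oldest sample once the buffer is full. */
    static void fifo_write(FixedFifo *f, double x)
    {
        f->buf[f->head] = x;
        f->head = (f->head + 1) % f->cap;
        if (f->count < f->cap) f->count++;
    }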