Oakromulo

Formula nodes: code readability comes at a price

Recommended Posts

After two years of "leeching" content every now and then from the LAVA community, I think it's time to contribute a little bit.

Right now, I'm working on a project that involves lots of data mining operations through a neurofuzzy controller to predict future values from some inputs. For this reason, the code needs to be as optimized as possible. With that in mind, I've implemented the same controller two ways: with a Formula Node structure, and with standard 1D array operators inside an inlined SubVI.

Well... the results surprised me. I thought the SubVI with the Formula Node would perform a little better than the one with standard array operators. In fact, it was quite the opposite: the inlined SubVI with primitives was consistently around 26% faster.

Inlined Std SubVI

rtp6j9.png

Formula Node SubVI

2ex7vq0.png

evalSugenoFnode.vi

evalSugenoInline.vi

perfComp.vi

PerfCompProject.zip

  • Like 2


I agree, LabVIEW lacks readability when it comes to doing math like this, but there's not much you can do about it. Wire labels now help a little because you can label your intermediate "variables", so to speak. You've made it about as clean as you can. I'm guessing the Formula Node has some sort of overhead on a per-call basis, so calling it in the For Loop is causing the long execution times compared to the primitives. I think, in general, primitives are always the best option in terms of performance due to compiler optimization (but someone else can probably give better detail on the why than I can, so I'll just leave it at that).

Edited by for(imstuck)


Yeah... there must be some constant overhead when calling the Formula Node. There's also the option of doing the array manipulation inside the node, removing the outer For Loop entirely. I'll give it a try later.
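The per-call overhead hypothesis can be sketched in text (C-like, in the same style a Formula Node uses). This is only an illustration of the idea, not the actual VIs; the function names and the `2x + 1` stand-in math are made up for the example:

```c
#include <stddef.h>

/* Illustrative sketch (not the benchmarked code): the same scaling done
 * element-by-element through a function call versus in one batched call.
 * If each call carries a fixed overhead c, the first form pays n*c,
 * the second pays c once, on top of identical arithmetic. */
static double scale_one(double x)
{
    return 2.0 * x + 1.0;           /* stand-in for the node's math */
}

static void scale_per_call(const double *in, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = scale_one(in[i]);  /* overhead paid n times */
}

static void scale_batched(const double *in, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)  /* overhead paid once per batch */
        out[i] = 2.0 * in[i] + 1.0;
}
```

Both forms produce identical results; only the number of call-boundary crossings differs, which is what moving the loop inside the node tests.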


Another surprise over here... I tried the Formula Node with C-like array manipulation and constant dimensions, so the node is called far less often. This time the primitives were 71% faster than the Formula Node, so the per-call (O(n)) overhead theory seems unlikely...

eqti8j.png

Slight modification, same results:

35bdmyh.png

2upuwlw.png

Edited by Oakromulo


What I usually do in these sorts of cases is add an image of the equations to the block diagram of the VI.

Another option might be to use a Math Node.


LaTeX --> G... that's awesome. It should definitely become a core LV feature! It'd be nice to meet Darin at the next NI Week...

By the way, the equations represent a simplified first-order Sugeno fuzzy inference system. It's always a good idea to add them to the VIs!

141uueq.png

Edited by Oakromulo


Formula Nodes are for C and MATLAB programmers who can't get their heads around LabVIEW (or don't want to learn it).

It's well known that the Formula Node is a lot slower than native LV code, and it's a bit like the "Sequence Frame" in that it is generally avoided. I would guess there are optimisations LabVIEW is unable to apply inside a Formula Node that rely on G language semantics (in-placeness?).


Some time ago I ran some benchmarks of the Formula Node, and the conclusion was that the difference from native code is negligible (as long as you don't use arrays in the node). But that was on a single-core machine; I think the difference you observe comes from execution parallelism.

  • Like 1


Formula Nodes are slow, particularly when using array manipulations, and things get even worse on RT. This is a pity, because when doing math you want to take one look and recognize the code. I typically refactor as much as possible and include a comment with the equivalent text code.



I only get a speed improvement of 4%

  • LabVIEW 2012
  • Win7 32-bit
  • Intel i5-2410M @2.3 GHz

Ton


I only get a speed improvement of 4%

Then you are doing something wrong that does not filter out overhead etc. I also consistently get 20-30% improvements with the diagram vs the Formula Node. Try doing 2D array math in a Formula Node; last I checked it was a 50-100% slowdown (but that was a looong time ago). Writing a DLL in C gives the fastest running code, but that somewhat defeats the purpose of making the code accessible, readable and maintainable. The wire diagram HAS improved in the latest iterations of LV, and IMO that is overall the best solution (given reasonably complex math).

I have written a matrix solver using exclusively wires. It's pretty fast for matrices smaller than approximately 200 x 200. For larger matrices the native solver (which uses a DLL) is faster, but I guess one of the main reasons is that it probably uses a more sophisticated algorithm that scales better, maybe a parallel one, I don't know.
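For what it's worth, the C side of the DLL route can be tiny. Here is a sketch of the kind of function one might compile into a DLL/shared library and call from LabVIEW via the Call Library Function Node; the function name and signature are illustrative, not an existing library:

```c
/* Plain C types (double, double*, int) map cleanly onto LabVIEW
 * numerics and 1D array data pointers, which keeps the Call Library
 * Function Node configuration simple. */
#ifdef _WIN32
#define EXPORT __declspec(dllexport)
#else
#define EXPORT
#endif

/* Example exported routine: dot product of two n-element vectors. */
EXPORT double dot_product(const double *a, const double *b, int n)
{
    double acc = 0.0;
    for (int i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}
```

The trade-off bsvingen mentions applies: the C code runs fast, but the math is no longer visible on the diagram.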


Ton,

Same thing here... I ran the first comparison again on my laptop at work and it was just 5% faster too!

Desktop (home):

AMD Phenom II 965BE C3 @ 3.7 GHz (quad core)

8GB DDR3-2000 CL5

Laptop (work):

Intel Core i5 M540 @ 2.53 GHz (dual core, Hyper Threading enabled)

6GB DDR3-1333 CL8

Both with Win7 x64 and LV2011.

bsvingen,

I think I'm going to try an equivalent DLL called from LV. I have little to no experience with DLLs in LV apart from the system ones.

vugie,

If I push the code inside a Timed Loop with manual processor affinity, is it safe to say it runs on only a single core?

Edited by Oakromulo


Another test: the same comparison with parallelized For Loops with 4 instances (the number of physical + HT cores) on the i5 laptop made the primitives an amazing 89% faster!

k16vdv.png

4hv3w1.png

PerfCompProject2.zip

Edited by Oakromulo


After two years "leeching" content every now and then from the Lava community I think it's time to contribute a little bit.

Welcome! And +1 for the meticulous style.


Another test: the same comparison with parallelized For Loops with 4 instances (the number of physical + HT cores) on the i5 laptop made the primitives an amazing 89% faster!

Move the indicators out of the for loops. ;)

Edited by ShaunR
  • Like 1


Move the indicators out of the for loops. ;)

With output auto-indexing disabled, wouldn't indicators outside the loop kick in compiler optimizations? Anyway, a queue seems a better option in this case.

Edited by Oakromulo


I just realized that the percentages were calculated in a very wrong way. I invite you all to check the new comparison below, now using a queue.

9fvfht.png

Now, with parallelized For Loops and a queue, the primitives were a full 4 times faster than the Formula Node SubVI!

PerfCompProject3.zip

Edited by Oakromulo

With output auto-indexing disabled, wouldn't indicators outside the loop kick in compiler optimizations? Anyway, a queue seems a better option in this case.

Yes. That's what you want, right? Fast? Also, LV has to task-switch to the UI thread; UI components kill performance, and humans can't see any useful information at those sorts of speeds anyway (tens of ms). If you really want to show some numbers whizzing around, use a notifier or local variable and update the UI in a separate loop every, say, 150 ms.
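The "update the UI every ~150 ms" idea boils down to a simple time gate in the display loop. A minimal sketch of the logic (names are made up; in LV this would just be an elapsed-time check, and the clock is passed in as a plain millisecond count so the gate stays testable):

```c
/* Throttled UI publishing: forward a value to the indicator only when
 * at least interval_ms has elapsed since the last publish. */
typedef struct {
    long last_ms;     /* time of last publish, in ms */
    long interval_ms; /* minimum gap between publishes, e.g. 150 */
} throttle_t;

/* Returns 1 when the caller should update the indicator now,
 * 0 when this update should be skipped. */
static int should_publish(throttle_t *t, long now_ms)
{
    if (now_ms - t->last_ms >= t->interval_ms) {
        t->last_ms = now_ms;  /* remember when we last published */
        return 1;
    }
    return 0;
}
```

The acquisition loop keeps running at full speed; only the display path is rate-limited.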

Edited by ShaunR


Yes. That's what you want, right? Fast? Also, LV has to task-switch to the UI thread; UI components kill performance, and humans can't see any useful information at those sorts of speeds anyway (tens of ms). If you really want to show some numbers whizzing around, use a notifier or local variable and update the UI in a separate loop every, say, 150 ms.

Sure! I added the indicators just to avoid the "unused code/dangling pin" compiler optimization. You're right, it wasn't very clever; the queue idea is much better. The slow random number generator inside the For Loop is there for the same reason: to avoid unfair comparisons between the Formula Node SubVI and the standard one.

2mpblv4.png

Edited by Oakromulo


Sure! I added the indicators just to avoid the "unused code/dangling pin" compiler optimization. You're right, it wasn't very clever; the queue idea is much better. The slow random number generator inside the For Loop is there for the same reason: to avoid unfair comparisons between the Formula Node SubVI and the standard one.

A local variable will be the fastest apart from putting the indicator outside the loop (and it won't kick in that particular optimisation as long as you read it somewhere, I think). The queues, however, will have to reallocate memory as the data grows, so they are better if you want all the data, but a local or notifier would be preferable as they don't grow memory.
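The constant-memory point can be sketched as a single-element slot, which is roughly what a notifier or local variable amounts to: writes overwrite in place, so memory never grows, at the cost of dropping intermediate values. The type and function names here are illustrative:

```c
/* Single-element "notifier" slot: constant memory, lossy delivery. */
typedef struct {
    double value; /* latest value written */
    int fresh;    /* 1 if value has not been read yet */
} notifier_t;

static void notifier_send(notifier_t *n, double v)
{
    n->value = v;   /* overwrite: the old value is simply lost */
    n->fresh = 1;
}

/* Returns 1 and writes the latest value if there is an unread one,
 * 0 otherwise. */
static int notifier_read(notifier_t *n, double *out)
{
    if (!n->fresh)
        return 0;
    *out = n->value;
    n->fresh = 0;
    return 1;
}
```

A queue, by contrast, would keep every sent value, which is why its memory grows with the data.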


A local variable will be the fastest apart from putting the indicator outside the loop (and it won't kick in that particular optimisation as long as you read it somewhere, I think). The queues, however, will have to reallocate memory as the data grows, so they are better if you want all the data, but a local or notifier would be preferable as they don't grow memory.

I'd forgotten that only RT FIFOs are pre-allocated and so enable constant-time writes. This time I replaced the queue with a DBL functional global variable.

xqb1bd.png

