Oakromulo

Formula nodes: code readability comes at a price

Recommended Posts

After two years of "leeching" content every now and then from the LAVA community, I think it's time to contribute a little bit.

Right now, I'm working on a project that involves lots of data mining operations through a neurofuzzy controller to predict future values from some inputs. For this reason, the code needs to be as optimized as possible. With that in mind, I've implemented the same controller two ways: with a Formula Node structure, and with standard 1D array operators inside an inlined SubVI.

Well... the results surprised me. I thought the SubVI with the Formula Node would perform a little better than the one with standard array operators. In fact, it was quite the opposite: the inlined SubVI with primitives was consistently around 26% faster.

Inlined Std SubVI

rtp6j9.png

Formula Node SubVI

2ex7vq0.png

evalSugenoFnode.vi

evalSugenoInline.vi

perfComp.vi

PerfCompProject.zip

  • Like 2


I agree, LabVIEW lacks readability when it comes to doing math like this, but there's not much you can do about it. Wire labels now help a little because you can label your intermediate "variables", so to speak. You've made it about as clean as you can. I'm guessing the Formula Node has some sort of overhead on a per-call basis, so calling it in the For Loop is causing the long execution times compared to the primitives. I think, in general, primitives are always the best option in terms of performance due to compiler optimization (but someone else can probably give better detail on the why than I can, so I'll just leave it at that).

Edited by for(imstuck)


Yeah... there must be some constant overhead when calling the Formula Node. There's also the option of doing the array manipulation inside the node, removing the outer For Loop entirely. I'll give it a try later.
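The per-call overhead hypothesis can be sketched in text (C-like, in the same style a Formula Node uses). This is only an illustration of the idea, not the actual VIs; the function names and the `2x + 1` stand-in math are made up for the example:

```c
#include <stddef.h>

/* Illustrative sketch (not the benchmarked code): the same scaling done
 * element-by-element through a function call versus in one batched call.
 * If each call carries a fixed overhead c, the first form pays n*c,
 * the second pays c once, on top of identical arithmetic. */
static double scale_one(double x)
{
    return 2.0 * x + 1.0;           /* stand-in for the node's math */
}

static void scale_per_call(const double *in, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = scale_one(in[i]);  /* overhead paid n times */
}

static void scale_batched(const double *in, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)  /* overhead paid once per batch */
        out[i] = 2.0 * in[i] + 1.0;
}
```

Both forms produce identical results; only the number of call-boundary crossings differs, which is what moving the loop inside the node tests.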


Another surprise over here... I tried the Formula Node with C-like array manipulation and constant dimensions, so the node is called far less often. This time the primitives were 71% faster than the Formula Node, so the per-call (O(n)) overhead theory seems unlikely...

eqti8j.png

Slight modification, same results:

35bdmyh.png

2upuwlw.png

Edited by Oakromulo


What I usually do in these sorts of cases is add an image of the equations to the block diagram of the VI.

Another option might be to use a Math Node.


LaTeX --> G... that's awesome. It should definitely become a core LV feature! It'd be nice to meet Darin at the next NI Week...

By the way, the equations represent a simplified first-order Sugeno fuzzy inference system. It's always a good idea to add them to the VIs!

141uueq.png

Edited by Oakromulo


Formula Nodes are for C and MATLAB programmers who can't get their heads around LabVIEW (or don't want to learn it).

It's well known that the Formula Node is a lot slower than native LV code, and it's a bit like the "Sequence Frame" in that it is generally avoided. I would guess there are optimisations LabVIEW is unable to apply inside a Formula Node that rely on G language semantics (in-placeness?).


Some time ago I ran some benchmarks of the Formula Node, and the conclusion was that the difference from native code is negligible (as long as you don't use arrays in the node). But that was on a single-core machine; I think the difference you observe comes from execution parallelism.

  • Like 1


Formula Nodes are slow, particularly when using array manipulations, and things get even worse on RT. This is a pity, because when doing math you want to take one look and recognize the code. I typically refactor as much as possible and include a comment with the equivalent text code.



I only get a speed improvement of 4%

  • LabVIEW 2012
  • Win7 32-bit
  • Intel i5-2410M @2.3 GHz

Ton


I only get a speed improvement of 4%

Then you are doing something wrong that does not filter out overhead etc. I also consistently get 20-30% improvements with the diagram vs the Formula Node. Try doing 2D array math in a Formula Node; last I checked it was a 50-100% slowdown (but that was a looong time ago). Writing a DLL in C gives the fastest running code, but that somewhat defeats the purpose of making the code accessible, readable and maintainable. The wire diagram HAS improved in the latest iterations of LV, and IMO that is overall the best solution (given reasonably complex math).

I have written a matrix solver using exclusively wires. It's pretty fast for matrices smaller than approximately 200 x 200. For larger matrices the native solver (which uses a DLL) is faster, but I guess one of the main reasons is that it probably uses a more sophisticated algorithm that scales better, maybe a parallel one, I don't know.
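For what it's worth, the C side of the DLL route can be tiny. Here is a sketch of the kind of function one might compile into a DLL/shared library and call from LabVIEW via the Call Library Function Node; the function name and signature are illustrative, not an existing library:

```c
/* Plain C types (double, double*, int) map cleanly onto LabVIEW
 * numerics and 1D array data pointers, which keeps the Call Library
 * Function Node configuration simple. */
#ifdef _WIN32
#define EXPORT __declspec(dllexport)
#else
#define EXPORT
#endif

/* Example exported routine: dot product of two n-element vectors. */
EXPORT double dot_product(const double *a, const double *b, int n)
{
    double acc = 0.0;
    for (int i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}
```

The trade-off bsvingen mentions applies: the C code runs fast, but the math is no longer visible on the diagram.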


Ton,

Same thing here... I ran the first comparison again on my laptop at work and it was just 5% faster too!

Desktop (home):

AMD Phenom II 965BE C3 @ 3.7 GHz (quad core)

8GB DDR3-2000 CL5

Laptop (work):

Intel Core i5 M540 @ 2.53 GHz (dual core, Hyper Threading enabled)

6GB DDR3-1333 CL8

Both with Win7 x64 and LV2011.

bsvingen,

I think I'm going to try an equivalent DLL called from LV. I have little to no experience with DLLs in LV apart from the system ones.

vugie,

If I push the code inside a Timed Loop with manual processor affinity, is it safe to say it runs on only a single core?

Edited by Oakromulo


Another test: the same comparison with parallelized For Loops with 4 instances (the number of physical + HT cores) on the i5 laptop made the primitives an amazing 89% faster!

k16vdv.png

4hv3w1.png

PerfCompProject2.zip

Edited by Oakromulo


After two years "leeching" content every now and then from the Lava community I think it's time to contribute a little bit.

Welcome! And +1 for the meticulous style.


Another test: the same comparison with parallelized For Loops with 4 instances (the number of physical + HT cores) on the i5 laptop made the primitives an amazing 89% faster!

Move the indicators out of the for loops. ;)

Edited by ShaunR
  • Like 1


Move the indicators out of the for loops. ;)

With output auto-indexing disabled, wouldn't indicators outside the loop kick in compiler optimizations? Anyway, a queue seems a better option in this case.

Edited by Oakromulo


I just realized that the percentages were calculated in a very wrong way. I invite you all to check the new comparison below, now using a queue.

9fvfht.png

Now, with parallelized For Loops and a queue, the primitives were a full 4 times faster than the Formula Node SubVI!

PerfCompProject3.zip

Edited by Oakromulo

With output auto-indexing disabled, wouldn't indicators outside the loop kick in compiler optimizations? Anyway, a queue seems a better option in this case.

Yes. That's what you want, right? Fast? Also, LV has to task-switch to the UI thread; UI components kill performance, and humans can't see any useful information at those sorts of speeds anyway (tens of ms). If you really want to show some numbers whizzing around, use a notifier or local variable and update the UI in a separate loop every, say, 150 ms.
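The "update the UI every ~150 ms" idea boils down to a simple time gate in the display loop. A minimal sketch of the logic (names are made up; in LV this would just be an elapsed-time check, and the clock is passed in as a plain millisecond count so the gate stays testable):

```c
/* Throttled UI publishing: forward a value to the indicator only when
 * at least interval_ms has elapsed since the last publish. */
typedef struct {
    long last_ms;     /* time of last publish, in ms */
    long interval_ms; /* minimum gap between publishes, e.g. 150 */
} throttle_t;

/* Returns 1 when the caller should update the indicator now,
 * 0 when this update should be skipped. */
static int should_publish(throttle_t *t, long now_ms)
{
    if (now_ms - t->last_ms >= t->interval_ms) {
        t->last_ms = now_ms;  /* remember when we last published */
        return 1;
    }
    return 0;
}
```

The acquisition loop keeps running at full speed; only the display path is rate-limited.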

Edited by ShaunR


Yes. That's what you want, right? Fast? Also, LV has to task-switch to the UI thread; UI components kill performance, and humans can't see any useful information at those sorts of speeds anyway (tens of ms). If you really want to show some numbers whizzing around, use a notifier or local variable and update the UI in a separate loop every, say, 150 ms.

Sure! I added the indicators just to avoid the "unused code/dangling pin" compiler optimization. You're right, it wasn't very clever; the queue idea is much better. The slow random number generator inside the For Loop is there for the same reason: to avoid unfair comparisons between the Formula Node SubVI and the standard one.

2mpblv4.png

Edited by Oakromulo


Sure! I added the indicators just to avoid the "unused code/dangling pin" compiler optimization. You're right, it wasn't very clever; the queue idea is much better. The slow random number generator inside the For Loop is there for the same reason: to avoid unfair comparisons between the Formula Node SubVI and the standard one.

A local variable will be the fastest apart from putting the indicator outside the loop (and it won't kick in that particular optimisation as long as you read it somewhere, I think). The queues, however, will have to reallocate memory as the data grows, so they are better if you want all the data, but a local or notifier would be preferable as they don't grow memory.
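The constant-memory point can be sketched as a single-element slot, which is roughly what a notifier or local variable amounts to: writes overwrite in place, so memory never grows, at the cost of dropping intermediate values. The type and function names here are illustrative:

```c
/* Single-element "notifier" slot: constant memory, lossy delivery. */
typedef struct {
    double value; /* latest value written */
    int fresh;    /* 1 if value has not been read yet */
} notifier_t;

static void notifier_send(notifier_t *n, double v)
{
    n->value = v;   /* overwrite: the old value is simply lost */
    n->fresh = 1;
}

/* Returns 1 and writes the latest value if there is an unread one,
 * 0 otherwise. */
static int notifier_read(notifier_t *n, double *out)
{
    if (!n->fresh)
        return 0;
    *out = n->value;
    n->fresh = 0;
    return 1;
}
```

A queue, by contrast, would keep every sent value, which is why its memory grows with the data.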


A local variable will be the fastest apart from putting the indicator outside the loop (and it won't kick in that particular optimisation as long as you read it somewhere, I think). The queues, however, will have to reallocate memory as the data grows, so they are better if you want all the data, but a local or notifier would be preferable as they don't grow memory.

I'd forgotten that only RT FIFOs are pre-allocated and so enable constant-time writes. This time I replaced the queue with a DBL functional global variable.

xqb1bd.png

