Buffer hunting

bsvingen · March 18, 2009

It occurred to me that the primitives (+, - and so on) always create a buffer. With large datasets this is not good, and it is especially evident when using LVOOP. Basically, this means that the primitives are written as functions returning a (new) value. For instance, + is written as:

double plus(double a, double b)

{ return a + b; }

More often this is actually what I want (I have omitted references for c, because this has nothing to do with it):

void plus(double a, double b, double c)

{ c = a + b; }

What I would like to have is a "+" primitive with three inputs and one output, where the result is written into the third input. This would create no buffers. LV can be tricked into doing this, but it is counterintuitive and not very elegant compared with using a primitive; still, it executes faster. I have included a very simple example. This is on my wish list.
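For readers who think in text languages, here is a minimal C sketch of the two styles being contrasted, with the arrays written out explicitly. The names add_new and add_into are made up for illustration. This is only an analogy and not how LabVIEW implements its primitives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Value-returning style: every call allocates a fresh result buffer.
   The caller owns the returned array and must free() it. */
double *add_new(const double *a, const double *b, size_t n)
{
    double *c = malloc(n * sizeof *c);
    if (c == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
    return c;
}

/* Three-input style: the caller supplies the destination, so nothing
   is allocated inside the call. c must hold at least n elements. */
void add_into(const double *a, const double *b, double *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

With add_into, a buffer allocated once outside a loop can be refilled on every iteration. add_new pays for an allocation on each call.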

Download File:post-4885-1237247031.zip

jdunham · March 18, 2009

QUOTE (bsvingen @ Mar 16 2009, 03:44 PM)

It occurred to me that the primitives + - and so on always create a buffer.

Well, only if you need to keep the inputs and the result, like you are doing. LabVIEW generally reuses buffers when it is safe to do so, according to the "VI Memory Usage" chapter of the LabVIEW help manuals (and according to the vehement declarations of every LabVIEW R&D engineer I've ever met). I don't think the use of lvclasses has any effect (according to the vehement declarations of Aristos Queue).

QUOTE (bsvingen @ Mar 16 2009, 03:44 PM)

More often this is actually what I want (I have omitted references for c, because this has nothing to do with it):

void plus(a,b,c)

{ c = a+b }

Well, in your example, when a and b are added, c is nowhere around, so how can it know to direct the output to c?

[For those of you playing along at home, a, b, and c are all arrays.]

If you want to control memory, you need to use the Memory Control palette. (If I remember correctly, you've been using LV 8.2 until recently, so this is a new feature).

Even so, I don't think you can avoid some kind of copy if you want to preserve A and B and save the results in C. Sure you preallocated C somewhere earlier, but this diagram doesn't know that C is big enough to hold the output of A+B.

Using the inplace structure helps out, but I couldn't get the swap node to make it any faster. In the diagram below, the top left diagram executed about 30% faster than your formula node, and the other three were just a hair slower than the formula node.

mje · March 18, 2009

My original thought was to slap in an inplace structure and see what happens. Sure enough, it's the winner.

The curious part is the allocation that appears, but I remember something AQ said once in a thread a few months back (can't seem to track it down). It amounts to this: buffer allocations are not always used. I'd hazard a guess that in this case the buffer allocation has to be possible, since there's no way to know the size of the arrays ahead of time. However, under the tested conditions they're all matched in size, which allows the in-place structure to reuse the 3 buffers and forgo the allocation completely.

The question that then pops into my mind is: why is the native behavior of the add prim such that, when not operating in place and working with two arrays, the allocation goes on an input? Notice that when operating in place it moves to an output. That little nugget seems to suggest to me what might be preventing the optimization of the original code. LabVIEW is usually pretty smart about such optimizations, so I find it a bit surprising that it doesn't work in this case.

As an aside, I was fiddling around with this for the last twenty minutes or so and got much the same results that jdunham summarized in his image. I added another case though:

Which would be similar to bsvingen's alternative case, but using native LV code. It does invoke an allocation of a non-arrayed DBL. I was surprised at how slow it was: the worst of all the cases (though I never considered the swap bytes prim). I suppose the reason is that although the allocation is likely on the stack (always the same size, just a single DBL value), it still needs to iteratively copy the value from that memory location into the buffer, as opposed to writing directly to it. I'd guess I had traded an array allocation for an array copy in that case.

jdunham · March 18, 2009

QUOTE (MJE @ Mar 16 2009, 08:45 PM)

The curiousness is the allocation that appears, but I remember something AQ said once in a thread a few months back (can't seem to track it down).

I remember, but I couldn't find it either. I blame the NSA.

QUOTE (MJE)

The question that then pops into my mind, is then why is the native behavior of the add prim such that when not operating in place and working with two arrays, the allocation goes on an input? Notice when operating in place it moves to an output. That little nugget seems to suggest to me what might be preventing the optimization of the original code. LabVIEW's pretty smart about such optimizations usually, the fact that it doesn't work in this case I find a bit surprising.

Well I certainly don't understand the ins and outs of the dots' locations. But it's unsurprising to see the copy on the input of A+B without an inplace node. Since A and B have to be preserved in the output cluster, it's not possible to perform the add without a new array to hold the results, created before the add is executed.

Now when the inplace frame is added, I don't know why the buffer dot moves to the outputs.

QUOTE (MJE)

As an aside, I was fiddling around with this for the last twenty minutes or so and got much the same results that jdunham summarized in his image. I added another case though:

I tried a couple of other things, but nothing else was fast.

I guess I should upload my changes to the benchmark in case anyone else wants to play along.

bsvingen · March 19, 2009

I think my example was a bit too simple (try the slightly more complex c = sqrt(a*a + b*b) and make that execute faster than a formula node version). But the point remains: this would be much easier with a three-input "+", or a more generalized version of that principle where no buffers are created in the first place. Nevertheless, a = a + b (in contrast with c = a + b) will also create a buffer because LV has no assignment; there are only functions returning, for instance, a+b, where the result pops out of nowhere.
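To make the "more complex expression" point concrete, here is a hedged C sketch of the a*a + b*b part computed two ways: as a chain of separate array operations with intermediate buffers, and as one fused loop writing straight into a preallocated c. The function names are hypothetical, and the square root is left out only to keep the sketch free of the math library; it would be applied to the sum inside the same loop. Whether LabVIEW's chained primitives or a formula node actually behave like either version is an assumption, not something established in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Chained style: each step fills its own intermediate array.
   Returns 0 on success, -1 if an allocation fails. */
int sum_squares_staged(const double *a, const double *b, double *c, size_t n)
{
    if (n == 0)
        return 0;
    double *aa = malloc(n * sizeof *aa);
    double *bb = malloc(n * sizeof *bb);
    if (aa == NULL || bb == NULL) {
        free(aa);
        free(bb);
        return -1;
    }
    for (size_t i = 0; i < n; i++) aa[i] = a[i] * a[i];
    for (size_t i = 0; i < n; i++) bb[i] = b[i] * b[i];
    for (size_t i = 0; i < n; i++) c[i] = aa[i] + bb[i];
    free(aa);
    free(bb);
    return 0;
}

/* Fused style: one pass, intermediates stay in scalars, no extra arrays. */
void sum_squares_fused(const double *a, const double *b, double *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] * a[i] + b[i] * b[i];
}
```

Both versions produce the same c. The staged one touches two extra arrays of n elements each, which is the kind of overhead that grows with the length of the expression.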

But those in-place structures and things look cool. I have to look more into them.

jdunham · March 19, 2009

QUOTE (bsvingen @ Mar 17 2009, 03:46 PM)

Nevertheless, a = a +b (in contrast with c = a + b) will also create a buffer because LV has no assignment

Nope. A+B will only create a buffer if you need to keep A and B around in addition to the result. I agree LV can get squirrely when you work with huge datasets (I realize that's exactly your concern), but not normally needing to worry about assignment or storage at all is a huge benefit.
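In C terms, the case where the old value of A is not needed afterwards looks roughly like the sketch below: the sum overwrites A's storage and no third array ever exists. The function name add_in_place is made up for illustration, and the comparison to LabVIEW's buffer reuse is only an analogy.

```c
#include <assert.h>
#include <stddef.h>

/* a = a + b, element by element. The previous contents of a are
   discarded, so the result reuses a's buffer. */
void add_in_place(double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        a[i] += b[i];
}
```

If the caller still needs the original A after the add, it has to copy A first, and that copy is the extra buffer.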

If you do a lot of work with huge datasets, you might want to buy IMAQ Vision, and treat your data as images. There are all kinds of inplace math functions and I bet they will be tons faster than native LV for million-point datasets.

bsvingen · March 19, 2009

QUOTE (jdunham @ Mar 18 2009, 07:14 AM)

Nope. A+B will only create a buffer if you need to keep A and B around in addition to the result. I agree LV can get squirrely when you work with huge datasets (I realize that's exactly your concern), but not normally needing to worry about assignment or storage at all is a huge benefit.
If you do a lot of work with huge datasets, you might want to buy IMAQ Vision, and treat your data as images. There are all kinds of inplace math functions and I bet they will be tons faster than native LV for million-point datasets.

Well, I still find chasing buffers to be more of a black art. Maybe some day I'll get it :rolleyes:
