Can the speed of this code be improved?

mooner · October 31, 2023

I have a very large array and need to get data with row indexes 0, 3, 6, 9 ...... to form a new array. Attached is my current code (LabVIEW2021) and I find the runtime a bit slow. Is there a faster way to do this?

demo.vi

ensegre · October 31, 2023

This is about three times faster for me:

demo2.vi

Mads · October 31, 2023

A bit quicker it seems is to use conditional indexing, like this:

On my machine I got the following results (I turned off debugging so it would not interfere with the test, and removed the display of the input because I would occasionally get memory full errors with it):

Original: 492 ms

ensegre original: 95 ms

ensegre with multiply by 3 instead of 3 adds: 85 ms

Conditional indexing: 72 ms

There might be even better ways. The efficiency of conditional indexing is often hard to beat, though.

demo_rev1.vi
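The attached VIs are graphical, so as a rough text-language analogy only, the two approaches compared above can be sketched in plain Python. This is not a transcription of demo2.vi or demo_rev1.vi: the function names are hypothetical, and the timings quoted in the thread do not carry over to Python.

```python
def every_third_conditional(rows):
    """Visit every input row and keep it only when its index is a multiple of 3.

    Loosely mirrors a loop with a conditional output tunnel.
    """
    return [row for i, row in enumerate(rows) if i % 3 == 0]


def every_third_by_index(rows):
    """Loop only over the output rows and read input row 3*i each time.

    Loosely mirrors computing the source index inside the loop
    (i + i + i or 3 * i) and indexing the input array with it.
    """
    n_out = (len(rows) + 2) // 3  # number of indices 0, 3, 6, ... below len(rows)
    return [rows[3 * i] for i in range(n_out)]
```

Both functions return the same rows; the second one runs a third as many iterations because it never touches the rows it is going to discard.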

Edited October 31, 2023 by Mads

ShaunR · October 31, 2023

My go (an improvement of ensegre's solution).

SR_demo.vi

Original: 492 ms

ensegre original: 95 ms

ensegre with multiply by 3 instead of 3 adds: 85 ms

Conditional indexing: 72 ms

Replace array (this one): ~20ms.

Edited October 31, 2023 by ShaunR

Mads · October 31, 2023

You beat us well @ShaunR 🙂

Normally I would use replace in situations like this too, to get the advantage of the preallocated memory. I improved some of the OpenG array functions that exact way (which is now included in Hooovah's version), but lately I've gotten so used to the performance and simplicity of conditional indexing that I did not reach for it here 🤦‍♂️.

Edited October 31, 2023 by Mads

ShaunR · October 31, 2023

What's interesting about ensegre's solution is the unintuitive use of the compound arithmetic in this way. There must be a compiler optimization that it takes advantage of.

ensegre · October 31, 2023

Indeed, it is buffer preallocation that makes the difference. In my first solution I presume that the compiler is implicitly allocating the output buffer, knowing N and the sizes of the input. Shaun's solution beats all the others because it works in place and the allocation of the input is not accounted for, but it destroys the original input.

demo3.vi
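The trade-off described here, a preallocated output buffer versus overwriting the input in place, can be illustrated with a minimal Python sketch. It is an analogy under stated assumptions, not the contents of SR_demo.vi or demo3.vi, and both function names are hypothetical.

```python
def every_third_preallocated(rows):
    """Allocate the output once at its final size, then fill it by replacement.

    The input list is left untouched.
    """
    n_out = (len(rows) + 2) // 3
    out = [None] * n_out          # single allocation, no growth inside the loop
    for i in range(n_out):
        out[i] = rows[3 * i]      # replace element i of the preallocated buffer
    return out


def every_third_in_place(rows):
    """Compact the wanted rows to the front of the input, then truncate it.

    No second buffer is needed, but the original input is destroyed.
    """
    n_out = (len(rows) + 2) // 3
    for i in range(1, n_out):     # row 0 is already in position
        rows[i] = rows[3 * i]     # the source index is always ahead of the write
    del rows[n_out:]              # drop everything after the kept rows
    return rows
```

The in-place variant is safe because each read at index 3*i happens before any write can reach that position; the price is that the caller no longer has the original data.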

ensegre · October 31, 2023

6 minutes ago, ShaunR said:

What's interesting about ensegre's solution is the unintuitive use of the compound arithmetic in this way. There must be a compiler optimization that it takes advantage of.

in my case I don't see appreciable differences between x3 and compound +++. Maybe there is something platform dependent, if at all.

ShaunR · October 31, 2023

6 minutes ago, ensegre said:

in my case I don't see appreciable differences between x3 and compound +++. Maybe there is something platform dependent, if at all.

With the replace array it makes no difference, but in your original it made about 10ms difference (which is why I thought it was a compiler optimization).

ensegre · October 31, 2023

I don't want to be picky, but with that solution I get the same ~117ms with compound +++, twice + x3 and 3x, whereas ~127ms with compound arithmetic 3x or x3. Platform and optimizations 🤷‍♂️

ShaunR · October 31, 2023

Being picky

52 minutes ago, ensegre said:

in my case I don't see appreciable differences between x3 and compound +++

vs

20 minutes ago, ensegre said:

~117ms with compound +++, twice + x3 and 3x, whereas ~127ms with compound arithmetic 3x or x3

Can't be both (and that's 10ms)

However, you have a timing issue in the way you benchmark in your last post. The middle gettickcount needs to be in its own frame before the for loop.
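In LabVIEW's dataflow model, nodes in the same frame that have no data dependency may execute in any order, which is why a separate frame is suggested for the timestamp. In a text language, statement order gives that guarantee directly. A minimal timing harness in Python, offered only as an analogy (the helper name `time_call` and the `repeats` parameter are hypothetical), could look like this:

```python
import time


def time_call(func, data, repeats=5):
    """Return the best wall-clock time in milliseconds over `repeats` runs.

    The start timestamp is read immediately before the measured call and the
    stop timestamp immediately after it, so setup work is excluded.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()   # strictly before the measured work
        func(data)
        stop = time.perf_counter()    # strictly after the measured work
        best = min(best, (stop - start) * 1000.0)
    return best
```

Taking the minimum over several runs reduces the run-to-run scatter of the kind reported in this thread, although it does not remove platform or cache effects.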

Edited October 31, 2023 by ShaunR

Mads · October 31, 2023

When I first saw the compound add I thought it must be used because it is quicker than multiplying by 3, but in the tests on my machine it is the slower one. With the replace logic, though, it does not matter anymore.

A digression: Branch prediction is an interesting phenomenon when dealing with optimizations. In this particular case it did not come into play, but in other cases there might be a benefit in making consecutive operations identical 🙂 :

https://stackoverflow.com/questions/289405/effects-of-branch-prediction-on-performance

ensegre · October 31, 2023

13 minutes ago, ShaunR said:

Can't be both (and that's 10ms)

Maybe I wasn't clear enough: replacing Compound Arithmetic +++ with Multiply x3 in my BD, I got the same timing (in contrast with Mads), whereas using Compound Arithmetic x3 I got 10ms more. To elaborate further, I put several variants of the x3 in a Diagram Disable and, surprise, times became ~150ms for all variants but ~144ms for Multiply x3. But back on demo2.vi, I also now get ~150ms instead of ~120. Say compiler optimizations, cache, or I don't know what.

12 minutes ago, ShaunR said:

However, you have a timing issue in the way you benchmark in your last post. The middle gettickcount needs to be in its own frame before the for loop.

Formally you're right, but in this case I observed no difference - I guess the gettick gets executed as soon as possible when entering the frame, and on my system that's early enough, even if it is not guaranteed to be the first operation.

demo2+.vi

mooner · November 1, 2023

20 hours ago, ensegre said:

I am glad that I have been able to find satisfactory answers to every question on the forum. Thank you all for your helpful responses, I have gained a lot from being here.
