
Can the speed of this code be improved?


mooner


Posted

I have a very large array and need to extract the rows with indexes 0, 3, 6, 9, ... to form a new array. Attached is my current code (LabVIEW 2021); its run time is a bit slow. Is there a faster way to do this?
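LabVIEW code is graphical, so it cannot be reproduced as text here. As a minimal sketch of the task itself (not the attached VI), assuming the 2D array is held as a Python list of rows, selecting rows 0, 3, 6, 9, ... is a strided slice; `every_third_row` is a hypothetical name:

```python
def every_third_row(rows):
    """Return rows 0, 3, 6, 9, ... of a 2D array stored as a list of rows."""
    # Extended slicing with a step of 3 selects indexes 0, 3, 6, ...
    return rows[::3]


data = [[i, i * 10] for i in range(10)]  # 10 rows, 2 columns
print(every_third_row(data))  # [[0, 0], [3, 30], [6, 60], [9, 90]]
```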

spendTime.PNG

demo.vi

Posted (edited)

Using conditional indexing, like this, seems to be a bit quicker:

[Image: block diagram using conditional indexing]

On my machine I got the following results (I turned off debugging so it would not interfere with the test, and removed the input display because it occasionally caused memory-full errors):

Original: 492 ms

ensegre original: 95 ms

ensegre with multiply by 3 instead of 3 adds: 85 ms

Conditional indexing: 72 ms

There might be even better ways. The efficiency of conditional indexing is often hard to beat, though.
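A conditional output tunnel on a For Loop appends a value to the output array only on iterations where the tunnel's condition terminal is TRUE. A rough Python analogue, assuming the same list-of-rows layout (the function name is hypothetical), visits every row and keeps those whose index is a multiple of 3:

```python
def every_third_row_conditional(rows):
    """Keep row i only when i % 3 == 0, mimicking a conditional indexing tunnel."""
    out = []
    for i, row in enumerate(rows):
        # This test plays the role of the tunnel's Boolean condition input.
        if i % 3 == 0:
            out.append(row)  # the output grows only on TRUE iterations
    return out


data = [[i, i * 10] for i in range(7)]
print(every_third_row_conditional(data))  # [[0, 0], [3, 30], [6, 60]]
```

Note that the loop runs once per input row even though only a third of the rows are kept.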

 

demo_rev1.vi

Edited by Mads
Posted (edited)

My go (an improvement on ensegre's solution):

[Two image attachments]

SR_demo.vi

Original: 492 ms

ensegre original: 95 ms

ensegre with multiply by 3 instead of 3 adds: 85 ms

Conditional indexing: 72 ms

Replace array (this one): ~20 ms

Edited by ShaunR
Posted (edited)

You beat us by a wide margin, @ShaunR 🙂

Normally I would use replace in situations like this too, to take advantage of the preallocated memory. I improved some of the OpenG array functions in exactly that way (those changes are now included in Hooovah's version), but lately I've become so used to the performance and simplicity of conditional indexing that I did not reach for it here 🤦‍♂️
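The preallocate-and-replace pattern sizes the output once (in LabVIEW, typically with Initialize Array) and then writes each selected row into its slot (Replace Array Subset) instead of growing the array. A Python sketch of the same pattern, with a hypothetical function name, loops only over the output indexes:

```python
def every_third_row_prealloc(rows):
    """Preallocate the output, then fill slot k with input row 3*k."""
    n_out = (len(rows) + 2) // 3  # ceil(len(rows) / 3)
    out = [None] * n_out          # allocate the whole output once
    for k in range(n_out):
        out[k] = rows[3 * k]      # overwrite slot k; the list never resizes
    return out


data = [[i, i * 10] for i in range(8)]
print(every_third_row_prealloc(data))  # [[0, 0], [3, 30], [6, 60]]
```

Python lists hold references, so this sketch illustrates the structure of the algorithm rather than LabVIEW's memory behavior.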

Edited by Mads
Posted

What's interesting about ensegre's solution is the unintuitive use of the compound arithmetic in this way. There must be a compiler optimization that it takes advantage of.

Posted

Indeed, it is buffer preallocation that makes the difference. In my first solution I presume the compiler implicitly allocates the output buffer, since it knows N and the size of the input. Shaun's solution beats all the others because it works in place and the allocation of the input is not counted in the timing, but it destroys the original input.
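Working in place means the input buffer itself becomes the output, which is why the original data is lost. A Python sketch of that idea, assuming the caller no longer needs the input (the function name is hypothetical), compacts the selected rows to the front and truncates the rest:

```python
def every_third_row_inplace(rows):
    """Compact rows 0, 3, 6, ... to the front of `rows`, then truncate it.

    The input list itself becomes the result, so the original data is destroyed.
    """
    n_out = (len(rows) + 2) // 3
    for k in range(n_out):
        # Only slots below k have been overwritten so far, and 3*k >= k,
        # so row 3*k still holds its original value when it is read.
        rows[k] = rows[3 * k]
    del rows[n_out:]  # drop the leftover tail; the same list object is returned
    return rows


data = [[i, i * 10] for i in range(10)]
result = every_third_row_inplace(data)
print(result)          # [[0, 0], [3, 30], [6, 60], [9, 90]]
print(result is data)  # True
```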
 

d.png

image.png

demo3.vi

Posted
6 minutes ago, ShaunR said:

What's interesting about ensegre's solution is the unintuitive use of the compound arithmetic in this way. There must be a compiler optimization that it takes advantage of.

in my case I don't see appreciable differences between x3 and compound +++. If there is a difference at all, maybe it is platform dependent.

Posted
6 minutes ago, ensegre said:

in my case I don't see appreciable differences between x3 and compound +++. If there is a difference at all, maybe it is platform dependent.

With the replace array version it makes no difference, but in your original it made about a 10 ms difference (which is why I thought it was a compiler optimization).

Posted

I don't want to be picky, but with that solution I get the same ~117ms with compound +++, twice + x3 and 3x, whereas I get ~127ms with compound arithmetic 3x or x3. Platform and optimizations 🤷‍♂️

Posted (edited)

Being picky :D

52 minutes ago, ensegre said:

in my case I don't see appreciable differences between x3 and compound +++

vs

20 minutes ago, ensegre said:

~117ms with compound +++, twice + x3 and 3x, whereas ~127ms with compound arithmetic 3x or x3

Can't be both ;) (and that's 10ms)

However, there is a timing issue in the way you benchmark in your last post: the middle Tick Count needs to be in its own frame before the For Loop.
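In LabVIEW's dataflow, nodes in the same frame that do not depend on each other have no guaranteed execution order, so a Tick Count placed next to the loop is not guaranteed to run just before it; giving it its own frame enforces that. In a text language, statement order gives the same guarantee. A minimal Python sketch using `time.perf_counter` (the helper `time_ms` is hypothetical) reads the clock immediately around the call under test:

```python
import time


def time_ms(func, *args, repeats=5):
    """Return the best wall-clock time of func(*args) in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()            # read the clock right before the work
        func(*args)
        elapsed = time.perf_counter() - start  # and right after it
        best = min(best, elapsed)
    return best * 1000.0


rows = [[i, i * 10] for i in range(300_000)]
print(f"slicing: {time_ms(lambda r: r[::3], rows):.2f} ms")
```

Taking the best of several runs is a choice made for this sketch to reduce noise from one-off effects; it is not something the thread prescribes.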

Edited by ShaunR
Posted

When I first saw the compound add, I thought it must have been used because it is quicker than multiplying by 3, but in the tests on my machine it was the slower option. With the replace logic, though, it no longer matters.

A digression: Branch prediction is an interesting phenomenon when dealing with optimizations. In this particular case it did not come into play, but in other cases there might be a benefit in making consecutive operations identical 🙂 :

https://stackoverflow.com/questions/289405/effects-of-branch-prediction-on-performance

Posted
13 minutes ago, ShaunR said:

Can't be both ;) (and that's 10ms)

Maybe I wasn't clear enough: replacing Compound Arithmetic +++ with Multiply x3 in my block diagram, I got the same timing (unlike Mads), whereas with Compound Arithmetic x3 I got 10 ms more. To elaborate further, I put several variants of the x3 in a Diagram Disable structure and, surprise, the times became ~150 ms for all variants except ~144 ms for Multiply x3. But back in demo2.vi, I now also get ~150 ms instead of ~120 ms. Call it compiler optimizations, cache, or I don't know what.

12 minutes ago, ShaunR said:

However, there is a timing issue in the way you benchmark in your last post: the middle Tick Count needs to be in its own frame before the For Loop.

Formally you're right, but in this case I observed no difference. I guess the Tick Count gets executed as soon as possible when the frame is entered, and on my system that is early enough, even though it is not guaranteed to be the first operation.

demo2+.vi

Posted
20 hours ago, ensegre said:

 

I am glad that I have been able to find satisfactory answers to every question on the forum. Thank you all for your helpful responses; I have gained a lot from being here.

 

