
Can the speed of this code be improved?


mooner


A bit quicker it seems is to use conditional indexing, like this:

image.png.59fdb6049072874cd21d0b99112feec5.png

On my machine I got the following results (I turned off debugging so it would not interfere with the test, and removed the display of the input because I would occasionally get memory-full errors with it):

Original: 492 ms

ensegre original: 95 ms

ensegre with multiply by 3 instead of 3 adds: 85 ms

Conditional indexing: 72 ms

There might be even better ways. The efficiency of conditional indexing is often hard to beat, though.
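For readers without LabVIEW at hand: a conditional auto-indexing output tunnel appends an element to the output array only on iterations where a boolean is true. The block diagram itself is only in the attached image, so the Python below is a rough analogy under assumptions, not a transcription of the VI. The helper name `keep_if` and the even-number predicate are hypothetical.

```python
def keep_if(values, predicate):
    """Return the elements of `values` for which `predicate` is true.

    Rough analogue of a For Loop with a conditional auto-indexing
    output tunnel (hypothetical helper, not taken from the thread).
    """
    return [v for v in values if predicate(v)]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6]
    # Keep only the even values; the predicate is an arbitrary example.
    print(keep_if(data, lambda v: v % 2 == 0))
```

The point of the analogy is only that the loop itself decides what reaches the output, so no separate build-array or delete step is needed afterwards.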

 

demo_rev1.vi

Edited by Mads

You beat us well @ShaunR 🙂

Normally I would use replace in situations like this too, to get the advantage of the preallocated memory. I improved some of the OpenG array functions in exactly that way (which is now included in Hooovah's version), but lately I've gotten so used to the performance and simplicity of conditional indexing that I did not reach for it here 🤦‍♂️

Edited by Mads

Indeed, it is buffer preallocation that makes the difference. In my first solution I presume that the compiler implicitly allocates the output buffer, knowing N and the sizes of the inputs. Shaun's solution beats them all because it works in place, so the allocation of the input is not counted in the timing, but it destroys the original input.
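The three strategies discussed here can be sketched in a text language. This is a minimal illustration, assuming an element-wise operation `op` that stands in for whatever the real VI computes (the thread does not spell it out). The function names are hypothetical, and Python timings would say nothing about LabVIEW's.

```python
def grow(values, op):
    """Append to an initially empty output; the buffer is resized as it grows."""
    out = []
    for v in values:
        out.append(op(v))
    return out


def preallocate(values, op):
    """Allocate the full output once, then replace elements by index."""
    out = [0] * len(values)
    for i, v in enumerate(values):
        out[i] = op(v)
    return out


def in_place(values, op):
    """Overwrite the input buffer itself: no output allocation, input is lost."""
    for i in range(len(values)):
        values[i] = op(values[i])
    return values


if __name__ == "__main__":
    data = [1, 2, 3]
    print(preallocate(data, lambda v: v + 10), data)  # input untouched
    print(in_place(data, lambda v: v + 10), data)     # input overwritten
```

The trade-off is the one described above: the in-place variant saves an allocation, at the cost of the caller no longer having the original data.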
 

d.png

image.png

demo3.vi

6 minutes ago, ShaunR said:

What's interesting about ensegre's solution is the unintuitive use of the compound arithmetic in this way. There must be a compiler optimization that it takes advantage of.

in my case I don't see appreciable differences between x3 and compound +++. Maybe there is something platform dependent, if at all.

6 minutes ago, ensegre said:

in my case I don't see appreciable differences between x3 and compound +++. Maybe there is something platform dependent, if at all.

With the replace array it makes no difference, but in your original it made about 10 ms difference (which is why I thought it was a compiler optimization).


Being picky :D

52 minutes ago, ensegre said:

in my case I don't see appreciable differences between x3 and compound +++

vs

20 minutes ago, ensegre said:

~117ms with compound +++, twice + x3 and 3x, whereas ~127ms with compound arithmetic 3x or x3

Can't be both ;) (and that's 10ms)

However, you have a timing issue in the way you benchmark in your last post. The middle gettickcount needs to be in its own frame before the for loop.

Edited by ShaunR

When I first saw the compound add I thought it must have been used because it is quicker than multiplying by 3, but in the tests on my machine it is the slower of the two. With the replace logic, though, it does not matter anymore.

A digression: Branch prediction is an interesting phenomenon when dealing with optimizations. In this particular case it did not come into play, but in other cases there might be a benefit in making consecutive operations identical 🙂 :

https://stackoverflow.com/questions/289405/effects-of-branch-prediction-on-performance

13 minutes ago, ShaunR said:

Can't be both ;) (and that's 10ms)

Maybe I wasn't clear enough: replacing Compound Arithmetic +++ with Multiply x3 in my BD, I got the same timing (in contrast with Mads), whereas using Compound Arithmetic x3 I got 10 ms more. To elaborate further, I put several variants of the x3 in a Diagram Disable structure and, surprise, times become ~150 ms for all variants except Multiply x3, which takes ~144 ms. But back in demo2.vi, I now also get ~150 ms instead of ~120 ms. Call it compiler optimizations, cache, or I don't know what.
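Differences of 5 to 10 ms that shift from run to run are hard to attribute to one cause. A common way to steady such a comparison is to repeat each variant several times and report the smallest time. The sketch below does that for "multiply by 3" against "three adds" in Python. The function names are hypothetical, and the numbers it prints reflect the Python interpreter only, not LabVIEW's compiler.

```python
import timeit


def triple_by_multiply(values):
    """Triple each element with a single multiplication."""
    return [v * 3 for v in values]


def triple_by_adds(values):
    """Triple each element by adding it to itself twice."""
    return [v + v + v for v in values]


def best_time(func, values, repeats=5, number=10):
    """Smallest total time, in seconds, over `repeats` runs of `number` calls."""
    return min(timeit.repeat(lambda: func(values), repeat=repeats, number=number))


if __name__ == "__main__":
    data = list(range(100_000))
    # Both variants must agree before their timings are worth comparing.
    assert triple_by_multiply(data) == triple_by_adds(data)
    for func in (triple_by_multiply, triple_by_adds):
        print(func.__name__, best_time(func, data))
```

Taking the minimum rather than the mean reduces the influence of one-off disturbances such as other processes or cache warm-up.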

12 minutes ago, ShaunR said:

However, you have a timing issue in the way you benchmark in your last post. The middle gettickcount needs to be in its own frame before the for loop.

Formally you're right, but in this case I observed no difference. I guess the gettick gets executed as soon as possible when entering the frame, and on my system that's early enough, even if it is not guaranteed to be the first operation.
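The concern is ordering: in LabVIEW, nodes in the same frame with no data dependency between them may run in any order, which is why a separate frame is the formal fix. In a sequential text language, statement order gives the same guarantee. The sketch below is a hypothetical harness (not the VI from the thread) that takes the middle timestamp strictly after the setup and strictly before the loop under test.

```python
import time


def timed_phases(n):
    """Time input setup and the loop under test as two separate phases.

    Returns (setup_seconds, loop_seconds, output). The work done in each
    phase is a placeholder chosen for illustration.
    """
    t0 = time.perf_counter()
    data = list(range(n))          # phase 1: build the input
    t1 = time.perf_counter()       # middle timestamp: after setup, before the loop
    out = [v * 3 for v in data]    # phase 2: the loop being measured
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, out


if __name__ == "__main__":
    setup_s, loop_s, _ = timed_phases(1_000_000)
    print(f"setup: {setup_s * 1e3:.1f} ms, loop: {loop_s * 1e3:.1f} ms")
```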

demo2+.vi
