
Improved performance with additional tunnels?



Posted (edited)

Hello everyone,

 

I have been sent here by NI Support, as they were not able to explain the behaviour to me.

I have created a small demonstration VI that shows some weird behavior:

 

When I pass some data into a For Loop, the runtime depends on how much data I transfer/collect through auto-indexing output tunnels.

The weird thing is that it can execute faster when I collect data. For example, I get the following values:

# Tunnels = 0 => mean runtime 37 ms

# Tunnels = 1 => mean runtime 37.5 ms

# Tunnels = 2 => mean runtime 35.2 ms

# Tunnels = 3 => mean runtime 47 ms

The attached VI is LV 2015, but the images should give enough explanation.

If someone could explain this to me, that would be great.

 

Thanks in advance

 

Rüdiger

LAVA Test VI Performance T2.jpg

LAVA Test VI Performance T1.jpg

LAVA Test VI Performance T0.jpg

 

LAVA Test VI Performance.vi

LAVA Test VI Performance T3.jpg

Edited by Rüdiger
Posted (edited)

Obscure compiler optimizations, I would guess. Possibly platform dependent, maybe related to the fact that you create buffers for the empty arrays at the unconnected exit tunnels. Hint: show buffer allocations. I note, for example, that a buffer is shown as created twice on the output logic array for case 0, and once for case 1.

For the limited value that this kind of benchmarking has on a non-RTOS, I get slightly different timings, with a minor decrease between 0 and 1 rather than between 1 and 2:

0 23.96-0.35 24.04-0.20 24-0.29
1 23.6-0.58
2 25.56-0.58 25.52-0.51
3 33.32-0.69

I also remark that timings change slightly if I remove case 3:
0 23.88-0.38
1 23.69-0.49
2 24.41-0.51

and further decrease if I delete the clock wire inside each case connecting to the unused input tunnel of the outermost loop.
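For readers who want to reproduce this style of measurement outside LabVIEW, here is a minimal Python sketch of a repeat-and-summarize timing harness. It is only a generic illustration of reporting a mean together with a spread on a non-real-time OS. The names `benchmark` and `workload`, the repeat count, and the warm-up count are all hypothetical and are not taken from the attached VI.

```python
import statistics
import time


def benchmark(func, repeats=25, warmup=3):
    """Return (mean_ms, stdev_ms) of func() over `repeats` timed runs.

    `repeats` must be at least 2 so that a sample standard deviation exists.
    """
    # Untimed warm-up runs, so one-off startup costs do not skew the samples.
    for _ in range(warmup):
        func()

    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples_ms.append((time.perf_counter() - start) * 1e3)

    return statistics.mean(samples_ms), statistics.stdev(samples_ms)


def workload(n=50_000):
    """Hypothetical stand-in for the code under test."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    mean_ms, stdev_ms = benchmark(workload)
    print(f"{mean_ms:.2f} ms +/- {stdev_ms:.2f} ms")
```

If the differences between two variants are of the same order as the reported spread, they are hard to distinguish from scheduling noise on a desktop OS.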

 

 

 

 

Edited by ensegre
Posted

Note that you should move the # Tunnels input outside the For Loop; otherwise you could change that value while the loop is running, and LabVIEW has to account for that possibility. There are also debugging issues here: the additional tunnels are places where LabVIEW needs to allocate memory in case you put a probe on them during execution, but maybe that's actually helping here by changing the way LabVIEW reuses memory. Of course, if you disable debugging, the VI executes instantaneously because LabVIEW optimizes out the unnecessary For Loops. I am seeing the 2-tunnel case fastest at 29 ms, the 0- and 1-tunnel versions nearly identical and barely slower (30.5 ms), and the 3-tunnel version slowest at 39.5 ms.

I don't think you're learning much from this sort of benchmark when you have debugging enabled and code that could otherwise be optimized out.

Posted
3 hours ago, Rüdiger said:

I have been sent here by NI Support, as they were not able to explain the behaviour to me.

:wacko: I'd expect at least a "clumping optimisations" hand-wave from paid support rather than a "dunno, ask someone else". The gesture controlled quad copter making it too dangerous to walk upstairs to ask R&D?  :D 

Posted
4 hours ago, Rüdiger said:

I have been sent here by NI Support

:lol:

Wild guess: LabVIEW uses the "Delete From Array" function in case 0 to dispose of used elements as soon as possible. For cases 1 and 2 it uses "Index Array", and thus does not free the memory (the original array is just passed to the output tunnel and freed afterwards in one go). Case 3 is less efficient, as the final 2D array takes time to build (it requires re-allocation in memory in the worst case).

Maybe ask NI support? Oh wait... :frusty:
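To make the re-allocation point concrete, here is a toy Python sketch that counts how often a growing buffer has to be re-allocated compared with one that is sized up front. It says nothing about what LabVIEW actually does internally. `GrowableBuffer`, `fill`, and the capacity-doubling policy are hypothetical and chosen purely for illustration.

```python
class GrowableBuffer:
    """Toy array that doubles its capacity when full and counts re-allocations."""

    def __init__(self, capacity=1):
        # Capacity must be at least 1, otherwise doubling would never grow it.
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self._data = [None] * capacity
        self._size = 0
        self.reallocations = 0

    def append(self, value):
        if self._size == len(self._data):
            # Full: allocate a larger block and copy the existing elements over.
            new_data = [None] * (2 * len(self._data))
            new_data[:self._size] = self._data
            self._data = new_data
            self.reallocations += 1
        self._data[self._size] = value
        self._size += 1

    def to_list(self):
        """Return only the elements that were actually appended."""
        return self._data[:self._size]


def fill(buffer, n):
    """Append the integers 0..n-1 to `buffer` and return it."""
    for i in range(n):
        buffer.append(i)
    return buffer


if __name__ == "__main__":
    grown = fill(GrowableBuffer(capacity=1), 1000)
    preallocated = fill(GrowableBuffer(capacity=1000), 1000)
    print("grown:", grown.reallocations, "re-allocations")
    print("preallocated:", preallocated.reallocations, "re-allocations")
```

Both buffers end up with the same contents. The only difference is how many times the storage had to be copied on the way there.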

Posted

I would be interested in what happened with that support interaction if you are okay with PMing me the service request number. Aside from that I would agree with Ned, but I wasn't able to figure out a way to test this where everything wasn't completely optimized out or where the differences in execution timing couldn't just be explained by the changes that I had to make for LabVIEW to not optimize it out.

Posted

Thanks a lot for all those quick answers. I did not expect that. :)

I'm aware that this kind of benchmarking is very quick and dirty on a non-RTOS. As soon as I have access to a colleague's PC (where the original problem VI is), I can upload that VI, as it shows a greater impact on performance than my specially created VI.

@ensegre Thanks for the hint about the buffer allocations. Now I at least see that even if I connect the input elements to an auto-indexing output tunnel, there is no additional buffer allocation.

@ned You're absolutely right about the # Tunnels control. Thanks for the hint about disabling debugging. Usually it hardly influences subVI runtime in my VIs.

@ShaunR In my case, the support team has to cross the Atlantic Ocean, as I asked the German support team. But they offered to escalate it to their US colleagues, saying an answer could take a looooong time. That's why I asked here. :)

@jacobson I guess I will leave it as it is, as the support guys are doing their best and offered to redirect me to people who may know better (US colleagues or LAVAG users). And one of the 2 proposals works fine so far. ;)

 

 
