
Massive performance difference between IDE and executable



Hi Community,

I am working on an app that analyzes an image for multiple features. While testing the executable I found a massive performance difference between running the code in the IDE and running the built executable.

I made an example that shows this problem clearly. Basically I extract small areas out of a much larger image. The example extracts the same 50*50 px area repeatedly. After that, some local thresholding is applied to those extracts.

10.000 extracts, 8 parallel loops, 1  core for Vision:

  • IDE: 0,6 s for the extracts, 0,4 s for the thresholding
  • EXE: 4,5 s for the extracts, 11 s for the thresholding

Why does it take so much longer in the EXE? My actual algorithms are much more complex, which amplifies the problem massively. Playing with the parameters influences the numbers slightly, but the big difference in time between IDE and EXE remains. I tried the code on multiple machines, same problem. The example is saved for LabVIEW 2012.

System Info: Win 10 64, LabVIEW 2020 64, Vision 2020 64 (I have tried the code in 32 bit and observed the same problem)

I hope you can help me out. Thanks in advance!

performance_testing_lv_2012.zip

Edited by Milox

Same behaviour here on LabVIEW 2020 64-bit, but... Do you really need to create and store in memory 10 000 images at once?

[benchmark screenshot]

I'm not even surprised that both the LabVIEW IDE and the RTE go crazy trying to do that. When I take IMAQ Create out of the loops, the situation improves significantly.

[screenshot: IMAQ Create moved out of the loops]

  • IDE: 0,1 s for the extracts, 0,4 s for the thresholding
  • EXE: 0,2 s for the extracts, 0,4 s for the thresholding

Of course, there's no reason to divide the whole processing into two separate loops in this case (you would get the same image slice on all 10 000 iterations anyway). Instead, do the entire processing in one loop and finalize the image after the loop with IMAQ Dispose. With this approach you'd reuse the same memory location on each iteration instead of allocating a new one. If you need to run the processing in several threads, just create N IMAQ images before the loop, do your processing, and dispose of them when all the work is done.
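Since LabVIEW is graphical, here is a rough text-language sketch of the same idea in Python/NumPy (the image size and slice coordinates are made up for illustration): allocating the destination buffer inside the loop versus creating it once before the loop and reusing it, which is the equivalent of moving IMAQ Create out of the for-loop.

```python
import numpy as np

H, W = 2048, 2048
raw = np.random.randint(0, 256, (H, W), dtype=np.uint8)  # stand-in for the raw image

# Anti-pattern: allocate a fresh destination buffer on every iteration
# (the equivalent of calling IMAQ Create inside the loop).
def extract_alloc_each(n):
    results = []
    for _ in range(n):
        dst = np.empty((50, 50), dtype=np.uint8)  # new allocation each time
        dst[:] = raw[100:150, 100:150]            # copy the same 50*50 px area
        results.append(dst.mean())
    return results

# Better: create the buffer once before the loop and reuse it on every
# iteration, disposing of it only after the loop is done.
def extract_reuse(n):
    dst = np.empty((50, 50), dtype=np.uint8)      # one allocation, reused
    results = []
    for _ in range(n):
        dst[:] = raw[100:150, 100:150]
        results.append(dst.mean())
    return results
```

Both variants produce identical results; the only difference is how often memory is allocated, which is exactly where the IDE and the RTE diverge in the timings above.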

Posted (edited)

Thank you guys for the replies. @dadreamer you are correct, placing IMAQ Create outside of the loop improves it, but as I said this is just a simplified example. My real app has to extract 30.000 burls from a wafer clamp surface, analyze them and save them (yes, save the extracts; don't ask me why, it is a requirement). These extracts are spread all over the image.

The reason for the 2 loops is that you "can't" really parallelize the 1st loop, since it depends on the same raw image input. But parallelizing the 2nd loop brings performance gains in the IDE. And I was just wondering why the EXE is so much slower. You mentioned that LabVIEW goes crazy trying to create IMAQs in the for-loop. I can't see that in the IDE. Could you expand on that?

I will try out your suggestions.

  1. Creating N IMAQ's beforehand
  2. Doing it all in series. Raw img -> extract -> threshold (same memory location)

Thanks,

Milox

--

Edit:

I just created all the IMAQs beforehand and it is indeed the IMAQ Create function that takes so long at run time. The actual extracting and thresholding is just a couple of ms slower than in the IDE.

Creating the IMAQs takes a combined 14 seconds at run time; in the IDE it's just 0,6 s. So the question has changed to "Why is IMAQ Create so much slower during Run-Time?".

A workaround is to create all IMAQs once, then keep and reuse them, so there is just one massive slowdown during initialization.
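That workaround is essentially a buffer pool. Sketched in Python (a hypothetical `ImagePool` class for illustration, not a Vision API): pay the allocation cost once at startup, then check buffers out and back in instead of creating new ones per extract.

```python
import numpy as np
from queue import Queue

# Hypothetical buffer pool: all allocations happen once at startup.
# Afterwards, workers borrow and return buffers instead of creating them.
class ImagePool:
    def __init__(self, count, shape, dtype=np.uint8):
        self._free = Queue()
        for _ in range(count):
            self._free.put(np.empty(shape, dtype=dtype))

    def acquire(self):
        return self._free.get()    # blocks if all buffers are in use

    def release(self, buf):
        self._free.put(buf)        # return the buffer for reuse

pool = ImagePool(count=8, shape=(50, 50))  # e.g. one buffer per parallel loop
buf = pool.acquire()
buf.fill(0)                                # ... extract / threshold into buf ...
pool.release(buf)
```

`queue.Queue` is thread-safe, so the same pool can feed several parallel workers, mirroring the "create N IMAQ images before the loop" suggestion above.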

Edited by Milox
1 hour ago, Milox said:

You mentioned that LabVIEW goes crazy trying to create IMAQ's in the for-loop. I can't see that in IDE. Could you expand on that?

Well, it seems I should have said that LabVIEW ought to behave the same as the RTE does, but it does not, for some obscure reason. So, my bad in phrasing there. It's not so easy to answer without knowing how Vision's internals work. I suppose it has something to do with the way Vision's memory manager allocates memory. Perhaps it's more optimized to work in LabVIEW and less (or not) optimized for EXEs. I noticed that in the IDE IMAQ Create takes nearly the same amount of time on each call (0,03 to 0,06 ms), while in the RTE that amount starts at 0,03 ms and rises on each iteration. Here are the shots to illustrate.

IDE: [Perf and Create Time screenshots]

RTE: [Perf and Create Time screenshots]

Maybe someone from NI could elaborate on these differences? By the way, I found two more similar issues [1, 2], and the reason behind each one was never clarified.


Wow, thank you very much for your insights. That definitely looks like memory allocation in the RTE is at least "different" from the IDE, which is strange for a compiled language, I think.

I have an open ticket with the NI support, but no answer yet. I will get back when I have some more info. For now the workaround is to not dynamically call IMAQ Create in RTE apps.

Thanks a lot!

On 3/6/2021 at 12:25 PM, Milox said:

That definitely looks like the memory allocation during RTE is at least "different" from the IDE, which is strange for a compiled language I think.

The problem is that sometimes the compiler gets a bit too aggressive and does something that it thinks won't functionally change the code, but does. For example, what if the compiler mistakenly thinks a close-reference function can't be called? It will think that node can safely be removed and nothing will change. But if the close was actually being called in the IDE, and now it isn't in the RTE, that could be a problem.

The Always Copy function has been known as a band-aid because in some cases it forces the compiler to leave things alone instead of trying to optimize the code. That would then stop the code from leaking memory. It seems to be a real bug, and NI should fix it. But in the meantime you might want to sprinkle in some Always Copies and see if anything changes. IMAQ images are references, so I don't know if it will actually help or not. I don't have Vision to test with.

  • 1 month later...
On 3/6/2021 at 6:25 PM, Milox said:

Wow, thank you very much for your insights. That definitely looks like the memory allocation during RTE is at least "different" from the IDE, which is strange for a compiled language I think.

I have an open ticket with the NI support, but no answer yet. I will get back when I have some more info. For now the workaround is to not dynamically call IMAQ Create in RTE apps.

Thanks a lot!

The Create Image function searches a list of images based on the name you pass in and automatically reuses an image if it finds it. If not, it creates the image and adds it to that list. The linear increase in execution time is a clear indication of this: as the number of images in the list grows, the search takes longer, and it is obviously a simple linear search. Why this increase doesn't seem to happen in the IDE is a bit of a mystery. Something somehow seems to cache information here.
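If that suspicion is right, the total cost of creating N uniquely named images grows quadratically. A toy Python model of such a linear-search name registry (purely illustrative, not the actual Vision code):

```python
# Toy model of the suspected behaviour: IMAQ Create keeps a list of named
# images and scans it linearly on every call. Creating n uniquely named
# images then costs 0 + 1 + ... + (n-1) = n*(n-1)/2 comparisons in total.
def create_images_linear(n):
    registry = []                    # list of names, searched linearly
    comparisons = 0
    for i in range(n):
        name = f"img_{i}"
        for existing in registry:    # linear scan, as the timings suggest
            comparisons += 1
            if existing == name:
                break
        else:
            registry.append(name)    # name not found: "create" a new image
    return comparisons
```

For the 10 000 images in the example that is roughly 5×10⁷ name comparisons in total, which would explain why the per-call time starts at 0,03 ms and keeps rising with each iteration, while creating the images once and reusing them by name avoids the growth.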

Edited by Rolf Kalbermatter
