
Parallel process significantly slower in runtime than in development


Wim

Recommended Posts

Hi,

 

I’m having issues with a LabVIEW application that I made.

 

Quick summary:

The application is used for reliability tests.

Users can configure profiles/experiments. These are translated to a voltage/current output.

Experiments can be started to run parallel on a rack of power supplies.

 

More Details:

The application is OO-based. Each active experiment is an ‘experiment executor’ object, which is a class with an active object. The process VI is a very simple state machine which handles init of the hardware (power supplies connected via Ethernet, VISA TCP/IP), iteration handling of the test, summary log files, the UI and a few other things.

Typical usage of the application is:

16 samples tested in parallel. Most of them run the same experiment (== same output) but on different power supplies.

These samples are tested simultaneously, and started at the same time.

System details:

LabVIEW 2011 SP1 (32-bit) // VISA 5.4.1

 

Observations:

In development:

  • I can simultaneously start 16 samples
  • summary files of the experiment settings are written to disk (16 times in parallel == 16 samples, and all VIs are set to reentrant). Duration is about 10 s for each file (depends on the length of the experiment, the different profile steps used, etc.). It is a simple INI-format file, about 136 KB in size
  • Experiments start execution. (== output and measurement on the power supplies)
  • Parallel reentrant VISA communication with the power supplies works perfectly; the internal loop rate in the ‘active’ state of the process is about 500 ms (with 16 parallel samples)
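The summary-file step above is easy to characterise outside LabVIEW. A minimal Python sketch (not the author's code; the file contents, section counts and the 16-way parallelism are placeholder assumptions) that times 16 parallel writes of an INI-style file, the same shape of workload described in the bullet list:

```python
import configparser
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def write_summary(path: Path, sections: int = 200, keys: int = 10) -> float:
    """Write an INI-style summary file and return the elapsed seconds."""
    cfg = configparser.ConfigParser()
    for s in range(sections):
        cfg[f"step_{s}"] = {f"param_{k}": "3.14159" for k in range(keys)}
    start = time.perf_counter()
    with path.open("w") as fh:
        cfg.write(fh)
    return time.perf_counter() - start

# Launch 16 writers in parallel, mirroring the 16 samples started together.
with tempfile.TemporaryDirectory() as tmp:
    paths = [Path(tmp) / f"sample_{i:02d}.ini" for i in range(16)]
    with ThreadPoolExecutor(max_workers=16) as pool:
        durations = list(pool.map(write_summary, paths))
    print(f"fastest: {min(durations):.3f}s  slowest: {max(durations):.3f}s")
```

On a healthy disk these writes complete in a fraction of a second; the 10 s (dev) vs 30-250 s (runtime) figures reported in the thread suggest something intercepting the file IO rather than the writes themselves.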

 

In runtime:

When I start the samples, the parallel processes are started BUT:

  • Writing of the INI summary file gets slower and slower each time a new clone/sample is launched. I see this because, for debugging, I open the FP of that VI. Writing of the file gets slower and slower… fastest file == 30 s, slowest (== last sample started) == 250 s
  • Parallel reentrant VISA communication: loop rate is 20-30 SECONDS (instead of 500 ms in development)
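The loop-rate symptom is the kind of thing a little instrumentation makes visible. A hedged sketch (a Python stand-in for the LabVIEW state machine's ‘active’ state; the workload and warning threshold are made up for illustration) that records each iteration's period and flags outliers:

```python
import time

def run_monitored_loop(iterations: int, work, warn_after_s: float = 1.0):
    """Run `work()` repeatedly and record each iteration's period.

    Returns the list of per-iteration durations; anything above
    `warn_after_s` is flagged, which is how a 500 ms loop suddenly
    running at 20-30 s per iteration would show up in a log.
    """
    periods = []
    for i in range(iterations):
        t0 = time.perf_counter()
        work()                       # one 'active' state pass: I/O + measurement
        dt = time.perf_counter() - t0
        periods.append(dt)
        if dt > warn_after_s:
            print(f"iteration {i}: {dt:.1f}s -- far above the expected loop rate")
    return periods

# Dummy workload standing in for one VISA query/response cycle.
periods = run_monitored_loop(5, lambda: time.sleep(0.01))
print(f"mean period: {sum(periods) / len(periods) * 1000:.1f} ms")
```

Logging periods per clone would also show whether every parallel sample slows down equally or only the later-started ones, which is useful evidence when filing a support request.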

Can someone help me with this ?

16 parallel processes isn't that much.

I always thought that runtime would be faster than development env.

Link to comment

Wow, that does sound strange.  I'd suggest enabling debugging on the built EXE, and then you can probe around.  Using something like the simple sexy timing probe you should be able to see what steps in your process are slower than in the IDE.  The unfortunate thing is it sounds like all processes are just generally slower.  I could also suggest trying to see how it performs with only one worker loop (compared to source), but it sounds like the problem gets worse with more loops, so this might not tell you much.  Maybe you could do the divide-and-conquer method and start disabling parts of the code and see when it starts to respond like the source.  Very time consuming, but effective for isolating the problem.
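For readers without that probe, the idea behind it translates to any language: wrap each step and log its duration. A minimal Python analogue (the step names below are hypothetical placeholders, not the poster's actual VIs):

```python
import functools
import time

def timed(fn):
    """Log how long each call takes -- a text-based stand-in for dropping
    a timing probe on a wire in LabVIEW."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        result = fn(*args, **kwargs)
        print(f"{fn.__name__}: {(time.perf_counter() - t0) * 1000:.2f} ms")
        return result
    return wrapper

@timed
def init_hardware():
    time.sleep(0.02)   # placeholder for VISA open / configure

@timed
def write_summary_file():
    time.sleep(0.01)   # placeholder for the INI summary write

init_hardware()
write_summary_file()
```

Comparing the per-step logs from the IDE run and the built EXE narrows the search to whichever step's timing diverges.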

  • Like 1
Link to comment

Hooovahh,

 

thanks for your input.

The timing probe looks very useful.

 

However, the issue is already fixed.  Can't tell how... I don't understand it.

 

I was preparing the folders with the code for a service request to NI.

I changed the build path for the EXE to a folder next to the code and built it again. That EXE works fine.

 

I copied it to my test system and VISA is running fine too.

 

Strange solution ....  Was the build spec corrupt?

 

I really can't tell but I'm glad my exe is running as expected now.

 

I have a whole evening and night ahead to worry about what solved this issue  :)

Link to comment

However, the issue is already fixed.  Can't tell how... I don't understand it.

I copied it to my test system and VISA is running fine too.

 

Strange solution ....  Was the build spec corrupt?

 

I really can't tell but I'm glad my exe is running as expected now.

Ooh, this sounds familiar.

 

I have been running into issues with very similar symptoms recently.  My code is on an RT system and I have noticed extremely weird behaviour when deploying.  Sometimes it will simply NOT deploy the current code from the object cache, even if I force a recompile of the top-level VI.  What I have observed is that if I edit some sub-VIs in a different context (a project with only one target so that objects are not locked, for example), these changes do NOT get automatically detected in the original context (even after closing and opening again).  The fact that my sub-VIs are inlined may or may not be relevant.

 

I have had situations where my testbench would run a function absolutely according to spec, but when deployed (save, close testbench project, open main project, compile, deploy) it would behave as an older version, even after forcing a recompile of the entire hierarchy.  I managed to "fix" it by placing a global write in a specific sub-sub-sub.vi in the proper context.  Then it compiled properly and deployed correctly.  I have the feeling that some "other context" edits are getting obscured somehow in the system.  But don't ask me to reproduce it; I've been trying for years.  Each time I try to pare down the code to get an example running, the problem goes away.

 

I'm running LV 2012 SP1 with separated source code.

Link to comment

shoneill,

 

I'm having the issue again.

I made a few successful builds yesterday and have a good exe running on the test system.

 

However, I wanted to know what solved it.

So, I did a commit to svn, to be safe.

I reverted to an older project file (code stays the same) and did a build. Again everything slow.

 

Updated back to the latest project file. I was assuming this would be a success but..... again a slow exe.

 

I tried making a new project file from scratch.... slow exe

 

To be continued.

Link to comment

shoneill,

 

I'm having the issue again.

<snip>

To be continued.

Starting to sound like the object cache. Do you have compiled code separated from source? IIRC that was really flaky in 2011. I had no end of trouble with it, especially with the 64-bit version.

 

If you haven't already done so, turn off the separate compiled code and then delete the object cache. Then try a full compile before building the EXE.

Edited by ShaunR
Link to comment

That's why I thought it sounded familiar.  I'm pretty sure my problems are also linked to the separated compiled code and the cache.

 

I personally tried deleting the cache, but after making one or two edits the problem just came back.

 

If I make it to NI Week this year and I'm bald, faulty deploys on our RT system will be the reason.  I really do feel like tearing my hair out at times, it's extremely frustrating.  Debugging on FPGA is a walk in the park in comparison. :throwpc:


One topic plodding around in the back of my head:

 

Wim, what are your locale settings?  I presume, like here in Switzerland, you have a comma as a decimal separator?  Maybe there's a bug somewhere where some string isn't paying attention to the system decimal point?  If so, I'm switching locale tomorrow.....

Edited by shoneill
Link to comment

I never use the option to separate compiled code.

 

Now it's getting stranger and stranger by the minute.

 

The original dev system can create EXEs again with good performance (I can only check the file IO; I don't have the power supplies there).

 

If I copy this EXE (including the aliases file and INI) to the actual test system, file IO is terrible: 6 s for a file on my dev system vs 30 to 450 s on the test system.

I can't understand this one. OK, it is the same hard disk that files are written to... so why the difference between dev env and runtime?

 

Locale settings: on the dev system the decimal point is a point, and on the test system it is a comma.   But I have the 'UseLocalDecimalPoint=False' option in both the LabVIEW INI and the application INI.
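As an aside on why a comma decimal separator can bite: a naive string-to-number conversion that works on a point-locale machine fails on a comma-locale one. A small Python illustration of the mechanism (not claiming this is what happened here; the thread later pins the slowdown on McAfee):

```python
# With a point decimal separator, parsing a formatted number round-trips;
# with a comma separator (e.g. German/Dutch locales), the same naive
# float() call fails -- the kind of mismatch a locale-dependent string
# path can introduce between a dev machine and a test machine.
point_style = "3.14"
comma_style = "3,14"

assert float(point_style) == 3.14

try:
    float(comma_style)
except ValueError:
    print("comma-formatted number rejected by float() -- "
          "locale-sensitive code must normalise first")

# One defensive normalisation (assumes no thousands separators):
assert float(comma_style.replace(",", ".")) == 3.14
```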

Edited by Wim
Link to comment

So they are not equivalent. When you disable the power supplies on the test system, does the performance improve?

 

No.

 

I tested with the same configuration (simulation), which just generates dummy data (random numbers).

Link to comment

Hi guys,

 

maybe a solution.

 

So I built a new EXE and tested it on different systems with the same configuration (simulation, exactly the same experiments, etc.).

 

The 2 systems at my customer's location (the actual test system and the dev system) were both slow.  The third system (a personal VM) was fast.  With exactly the same EXE, installed runtimes, etc.

 

So I'm thinking it is due to the PCs.  They are on a big network where IT has very strict policies.

 

One of my colleagues had a good idea: if it works from LabVIEW.exe (the dev env), then rename your EXE to LabVIEW.exe.  We think that LabVIEW is a known program which has more permissions on the system than my own built EXE (SwitchablePS.exe).

 

And for now it works.

The dev env is fast in simulation, the actual test system is fast, even with the hardware connected.

 

The test system is running a test now. It will end next week.  Fingers crossed.

Link to comment

Oh man, anti-virus, firewall, or general IT policies have broken my application many times.  Often in ways that seemed unexplainable at the time.  I had a vision system that, when it was being deployed, would sometimes have half my image turn green.  Like the image was being grabbed and halfway through it stopped.  We looked into the capture card, the PC (thinking it couldn't keep up), network settings, a bunch of things.  It was strange because before going onsite it worked just fine, which should have been a red flag.  At the time I thought, well, maybe the PC was just barely fast enough, and since IT added a few programs it now can't keep up.

 

Nope it was corporate firewalls scanning all network traffic, causing slow downs on our ethernet controlled cameras.  But hey, free trip to Alabama.

 

Hope you figure it out, be sure and report back if you do.

Link to comment

Nope it was corporate firewalls scanning all network traffic, causing slow downs on our ethernet controlled cameras.  But hey, free trip to Alabama.

 

So what was the solution? Tell them that their IT department could only FTP into the system and disconnect it from their network? I've done that before :D

 

This is why the attitude exhibited by the IT guy at CERN who gave that presentation was so refreshing. In IT, a solutions provider is as rare as rocking horse droppings. Empire-building problem providers with aspirations of tyranny are ten a penny.

Link to comment

The tests are running for 6 hours now.

 

I talked to an IT guy.

They confirm that it is McAfee that is causing the problem.  But adding an exception is not possible.... They have to do a request to McAfee to add an exception worldwide  :oops:

 

They just don't want to put any effort in it.

 

I'll just call my executable labview.exe  :thumbup1:

Link to comment

They confirm that it is McAfee that is causing the problem.  But adding an exception is not possible.... They have to do a request to McAfee to add an exception worldwide  :oops:

 

 

 

Seriously?  Wow.

We were seeing some weird network slowdowns recently.  Maybe this is the reason.  We have Kaspersky, but maybe it's similar crap slowing our network.

Link to comment

So what was the solution? 

Oh it was great because in this situation they were my customer, paying hourly for me to be there.  I troubleshot the issue and came up with a proposal to fix it: turn off the bloody IT-crippling software.  I had no horse in the race.  The customer could take my advice and have a test system that worked, or they could do nothing and not have a working test system.

 

When a manufacturing line goes down due to IT being too restrictive, things get done quickly.  We were told that IT would not allow the software to be uninstalled.  You actually needed a super secret password to uninstall it.  But they did allow the PC to have an administrator account, where I could turn off services and startup applications.  So the customer had me remove it from the msconfig startup.

 

The details are fuzzy, but I think their IT department ended up paying for my time to be there.  Not that I think that is fair, they were just trying to do their job.

Link to comment

They confirm that it is McAfee that is causing the problem.  But adding an exception is not possible.... They have to do a request to McAfee to add an exception worldwide  :oops:

<snip>

Now I'm sad I didn't see this earlier. As soon as I read your first post my immediate thought was...McAfee?

 

Fun story: on some of the domain computers here at NI (can't figure out which ones or what is different), the LabVIEW help will take about 1-2 minutes to load any time you click on detailed help, because McAfee is...McAfeeing. You go into Task Manager and kill mcshield (which is railing the CPU) and it works in an instant. It's truly amazing.

 

And...yes, you can add exceptions. McAfee doesn't seem interested in making it easy, but it is possible. Renaming to labview.exe is a pretty sweet workaround though.

Link to comment

What's the point of anti-virus software that a virus can get around by renaming itself "labview.exe"?

At times I believe it is less about protecting a PC from viruses, and more about a check box in some corporate form that says "All domain PCs will have anti-virus software installed".

  • Like 1
Link to comment

tests are running for about 30 hours now... everything is working at the expected speed.

 

 

 

At times I believe it is less about protecting a PC from viruses, and more about a check box in some corporate form that says "All domain PCs will have anti-virus software installed".

In this case it is definitely just a checkmark on a form.... This customer is a big, very well-known company... can't mention the name though.

 

And the comment "We have to ask for a worldwide exception for this" is just the same as "My checkmark is set, yours isn't...."

 

Well, the people that are using the application can live with it.

 

Their comment on this was..."So if we want to use software that our company does not allow, we'll just name it LabVIEW or Word or ....  Rather not LabVIEW, we want to keep that reputation clean"  :thumbup1:

Link to comment

tests are running for about 30 hours now... everything is working at the expected speed.

<snip>

 

Expect a call in a few weeks when they close that loophole and the software grinds to a halt once again.

Link to comment

Expect a call in a few weeks when they close that loophole and the software grinds to a halt once again.

Ohh, another fun story time (need to write these down before I forget).  So I'm on site at a place where we are deploying a tester that is designed to replace testers that were already there.  They had 5 testers, and this first visit was to replace one of them with our new design, which was faster, better, more scalable, etc.  There were some things we couldn't test until we were on site, and one of those things was the network authentication and logging.

 

So we got onsite and went to replace the first system, and we asked for login credentials for their network.  We were told we couldn't get any for a couple of days.  So I went over to one of the existing testers and looked at the credentials it was using, since the source was on the old stand.  It had something like User: Tester and Password: Tester1234 on all 5 testers.  So I used those credentials, and the new tester and 4 of the older ones were running just fine.

 

I mentioned to the customer that this seemed like a weak user name and password and that all systems were using the same credentials.  At that moment, out of the corner of my eye, the IT guy ran off quickly.  A few minutes later he came back and said he fixed it.  By that he meant that the user name and password no longer worked.  They didn't seem to care that the new system didn't work yet, until I went over to one of the older 4 systems that all of a sudden could no longer test parts and was holding up the line.

 

I just thought it was comical that they fixed the loophole, which had apparently been running fine for years, without thinking that it would break everything using it.

  • Like 1
Link to comment
