Mike Le, posted August 26, 2014

I'm running into a very strange Actor Framework problem. It's extremely hard to reproduce, and unfortunately I can't share the code (it's a large, proprietary project). I also don't think I can reduce it to a smaller example. I'm going to describe the behavior I'm seeing in the hope that others here have (1) seen something similar or (2) have more debugging suggestions.

I have a Controller Actor that launches four Nested Actors: the UI, a Hardware Interface, a Real-Time Plotter, and an Analysis Plotter. When the UI Actor's panel closes, it sends a Stop message to its Caller (the Controller). The Controller then sends a Stop message to all of its Nested Actors.

Locked Behavior Scenarios

Running the program and then closing the UI window leaves the libraries locked in the following scenarios:
1) The program is launched and data streaming (from HW Interface to Controller to Real-Time Plotter) is started. Then the UI window is closed.
2) The program is launched and the UI window is closed immediately (open for less than 2 seconds).

Description of "Locking" Behavior

When the libraries are locked, the project window also behaves strangely. I can collapse virtual folders and library contents, but I cannot re-expand them. I can click files to select (highlight) them. A right-click freezes the IDE completely, requiring a restart. Closing the Project Window with the "x" closes all windows, but LabVIEW.exe keeps running until I kill the task.

Additionally, if I run Launch Debug Actor after the project window has locked, execution gets held up inside Launch Actor, specifically in the attempt to open the VI reference. It errors the first time and runs through the "VI Interface Type 4" property node into the "Open Reference" primitive. It never gets past "Open Reference" (not even to return an error); it just stays stuck, seemingly indefinitely.
Lastly, the locking is not always consistent. SOMETIMES I can get through a complete run without any locking the FIRST time, but running the code a SECOND time always produces the locked behavior in the scenarios above.

Clean Shutdown Scenarios

Running the program and then closing the UI window does NOT lock the libraries in the following scenarios:
1) The program is launched with only some of the four Nested Actors enabled, and the UI window is closed immediately. Example combinations: GUI + Analysis, GUI + Streaming + Analysis, GUI + Analysis + HW Interface, GUI only.
2) Launch Debug Actor is run and the Debug Actor window comes up. THEN the program is launched with all four Nested Actors. This always shuts down cleanly in every scenario I tested: closing the GUI window immediately, streaming and then closing the GUI, etc.
3) I added a popup dialog to my Parent Actor's overrides of Pre-Launch Init and Stop Core. They simply display "____ Actor starting up" or "____ Actor shutting down", respectively. With this modification, I can watch all the Actors start up and shut down as expected. I can close the UI window immediately, start streaming, etc., and the libraries never lock.
4) I disabled the dialogs and added a 100 ms wait to Pre-Launch Init instead. With this change, everything closes cleanly in both the streaming and "close immediately" scenarios. Dropping the delay to 10 ms brings the locking back.

Next Steps

I'm going to try disabling different parts of each Actor to see whether any particular section is problematic. I may also try adding a small delay between the Launch Actor calls in the Controller, just to see what happens.

Has anyone seen anything like this? Any suggestions?

When I restart, I get a weird error window that says "internal warning 0x occurred in setstate.cpp." This is the first time I've seen an "0x" error without a follow-up identifying string. I attached one of the error logs if anyone wants to look at it.
Mike Le, posted August 28, 2014

An unsatisfying resolution to this problem: I performed a Mass Compile of a few folders, and the behavior went away. Makes me think I need to sacrifice a goat to the LabVIEW gods every time I run into something like this.
shoneill, posted August 29, 2014

Slightly unrelated behaviour: when working on RT, I'd often get bad deploys (but without an error), so the code actually running on the RT target would differ from the code the IDE thinks is running there. It seems the "is running" determination is sometimes less reliable than it should be, especially when dealing with multiple contexts. I think there are race conditions in the code associated with this aspect of the IDE.
Mike Le, posted September 20, 2014

Okay, I UNMARKED my old post as the solution, because the problem came back this week. A mass compile didn't resolve it like before.

I went back and forth with NI about this. We started poking around Launch Actor.vi, based on another CAR they had that sounded similar. The change we ultimately made was to remove the 2-iteration For Loop that tries to reopen the VI if there's an error on the first try. We also explicitly close the VI reference to the Actor. There's a comment in the original Launch Actor that reads:

"We deliberately leak the opened VI reference in order to achieve better performance. We open it once, and then leave the reference open to so that we only depend upon LabVIEW's root loop synchronization on the first call. LabVIEW auto closes the reference when the VI goes idle."

I guess that means LabVIEW was failing to auto-close that reference?

I've definitely noticed that the libraries are now SLOW to unlock after the code executes. For several seconds the project is responsive but the libraries remain locked, then I get a busy cursor for a second, and then the libraries unlock. But they DO unlock, and I can keep working on the code without having to kill the LabVIEW task, so I guess that's a success.
odoylerules, posted September 20, 2014

Thanks for the update. While I'm not using the Actor Framework, I did copy the Launch Actor method you reference for opening up async processes in one of my projects. I'm not exactly sure why I did that, but it appears this might be a bad way to do things now. I'll have to go back and update that launch method. I was always slightly confused about why the Actor Framework used this 2-iteration For Loop method, but assumed it was a better way to do things.
drjdpowell, posted September 20, 2014

A link to the conversation on why the AF uses the pool-of-clones ref in the way it does. Don’t change things until you know what “root loop synchronization” means. I use the same method in non-AF code and haven’t had any issues, so there must be something else involved in the library-locking problem.
Mike Le, posted September 20, 2014

Thanks for the info. I was a bit concerned as we made these changes, because the NI rep I was speaking with was definitely NOT an Actor Framework expert; he didn't know what it was before our conversation. He pulled the suggestions from a similar CAR, and they patched my problem, but I got the feeling I didn't get the "tear off the warranty" explanation I SHOULD have gotten.
Mike Le, posted September 23, 2014

AQ breaks down the issue here and offers a few solutions. The bug affects LabVIEW 2013, 2013 SP1, and 2014, and is supposed to be fixed in 2014 SP1. There's more detail in the linked thread, but here's what it boils down to in terms of workarounds in the meantime:

"You have a top-level VI that launches your top actor. That launcher VI quits, leaving the top actor running, and then the top actor goes off and does its thing, including spawning additional actors.

I believe that the entire problem goes away if you can somehow leave your top-level VI running. As long as the launcher VI stays running, the VI refnum allocated inside Launch Actor stays valid and we do not have to open a second reference to Actor.vi. Avoiding that second reference seems to be critical.

Alternatively, if there is enough of a time gap between the launcher VI quitting and the first call to Launch Nested Actor.vi, that seems to help. I cannot guarantee that, but it seems to be the case looking at the C++ code. I have not actually tried in G to empirically test this theory.

If neither of those works, then you can go back to the 2012 version of Launch Actor.vi and see how the block diagram worked back then. The 'close the reference on every call to Launch Actor' approach is less efficient and subject to root loop pause, as noted earlier in this discussion, but it completely dodges this bug (because it basically forces there to be no overlap of the refnums)."
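LabVIEW block diagrams can't be posted as text, so here is a rough Python model of the trade-off discussed in this thread: the "open once and deliberately leak the reference" strategy versus the 2012-style "open and close the reference on every call to Launch Actor" strategy. Everything in it (the class names, `open_vi_reference`, `close_reference`) is hypothetical and stands in for LabVIEW internals; it only counts how many opens each strategy performs, on the assumption taken from the Launch Actor comment that each open costs a root loop synchronization. It deliberately does not model LabVIEW auto-closing the cached reference when the launcher VI goes idle, which is where AQ places the actual bug.

```python
# Conceptual model only: hypothetical stand-ins for LabVIEW's VI reference handling.

class RootLoopCounter:
    """Counts opens/closes; each open is ASSUMED to need root loop synchronization."""

    def __init__(self):
        self.opens = 0
        self.closes = 0

    def open_vi_reference(self, path):
        self.opens += 1
        return object()  # opaque stand-in for a VI refnum

    def close_reference(self, ref):
        self.closes += 1


class CachedLauncher:
    """Open once, then keep ('leak') the reference for every later launch."""

    def __init__(self, env, path):
        self.env = env
        self.path = path
        self._ref = None

    def launch(self):
        if self._ref is None:
            self._ref = self.env.open_vi_reference(self.path)
        # ... a real launcher would start an actor clone from self._ref here ...


class OpenCloseLauncher:
    """2012-style: open and close the reference on every launch."""

    def __init__(self, env, path):
        self.env = env
        self.path = path

    def launch(self):
        ref = self.env.open_vi_reference(self.path)
        # ... a real launcher would start an actor clone from ref here ...
        self.env.close_reference(ref)


cached_env, per_call_env = RootLoopCounter(), RootLoopCounter()
cached = CachedLauncher(cached_env, "Actor.vi")
per_call = OpenCloseLauncher(per_call_env, "Actor.vi")
for _ in range(5):
    cached.launch()
    per_call.launch()
print(cached_env.opens, per_call_env.opens)  # prints "1 5"
```

After five launches the cached strategy has opened the reference once and the open/close strategy five times, which is the efficiency cost AQ mentions; the upside of the open/close strategy is that no reference outlives the call, so there is never an overlap between an old and a new refnum.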