Tim_S Posted December 11, 2015
I'm hoping someone has some deep insight before I start waving the dead chicken and invoking the arcane.
I have two applications that talk to each other. One executes the test (sequencer) and the other is a user interface. These reside on the same computer. The two applications were originally talking through TCP, where the UI would open a connection, register for messages, and receive messages from the sequencer at about 150 msec intervals. The UI would process the messages and update the front panel accordingly. This ran into an issue where the UI would not update smoothly. I was able to determine the UI was experiencing periods where there wouldn't be a message for up to 6 seconds. On the sequencer side, there were timeouts in sending the message at that point. The code already turns off Nagle. I poked through ShaunR's Transport.lvlib; I tried increasing the transmit and receive buffers on the sequencer side to no avail.
I put a shared variable in the sequencer side and made the buffer 5 deep. This contains the same string that was being sent over TCP. I updated the UI side to read the shared variable instead. The update issues were solved. Or so we thought...
This setup is going on five stations: two running one type of test (let's call these A) and three running another (let's call these B). The stations have, from the computer's perspective, identical hardware ordered at the same time and all updated to the same software level (operating system and all). This all should work the same. But it doesn't. There is an additional part of the message that only occurs once when the test starts. Reports are that the two A stations will run 8 runs in a row with no issue, not get the message at the start of test for 4 tests, get it for 2, miss it again, and so on. We did see a station B do this once, but the three have run hundreds of tests since with no issue.
A coworker has been checking this out on-site and tells me there are no errors or warnings with the UI side of the shared variable. The UI side has a 50 msec wait before a Read Variable with Timeout (timeout of 1000 msec), so it should run circles around the transmit side. To make things more confusing for me, there is an extra part of the message that only occurs once when the test ends. This is always received by the UI application.
The sequencer and UI applications are something we've used before without issue. There is a customer-provided .NET control that talks to a Vector box (FlexRay to a control module) through UDP messages. I've Wiresharked that and don't see anything that would cause a problem with TCP or shared variables.
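For readers outside LabVIEW, the two TCP tweaks mentioned in this post (turning off Nagle and enlarging the transmit and receive buffers) look roughly like this at the socket level. This is only a sketch in Python, not the code from the thread; the function name and the 64 KiB sizes are made up for illustration.

```python
import socket


def make_low_latency_socket(send_buf=65536, recv_buf=65536):
    """Create a TCP socket with Nagle disabled and larger buffers requested.

    The function name and default sizes are illustrative assumptions.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1 disables Nagle's algorithm, so small messages are
    # sent immediately instead of being coalesced.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # These are requests only; the OS may round, double, or cap the values.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, send_buf)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, recv_buf)
    return sock
```

Reading the options back with `getsockopt` is a cheap way to confirm what the OS actually granted, since the buffer sizes in particular are often adjusted.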
ShaunR Posted December 11, 2015
On 12/11/2015 at 3:22 PM, Tim_S said: "There is a customer-provided .NET control"
Eliminate that and see if it goes away (simulate the data if you have to). There is a very good reason why .NET (and its grandpappy, ActiveX) is banned from all my LabVIEW projects, and this stinks of .NET threading and the LabVIEW root loop.
Zyga Posted December 13, 2015
On 12/11/2015 at 4:23 PM, ShaunR said: "...There is a very good reason why .NET (and its grandpappy, ActiveX) is banned from all my LabVIEW projects..."
Could you provide us more details? Any particular reasons? I've used .NET in my project two or three times already, and there were no problems.
Rolf Kalbermatter Posted December 14, 2015
On 12/13/2015 at 8:50 PM, Zyga said: "Could you provide us more details? Any particular reasons? I've used .NET in my project two or three times already, and there were no problems."
.NET is in some ways better than ActiveX in the areas Shaun mentions. ActiveX is an extension of OLE and COM, which have their roots in the old Windows 3.x days. Back then preemptive multitasking was something reserved for high-end Unix workstations, and PCs had to live with cooperative multitasking. So many things in Windows 3.1, OLE, and COM assumed a single-threaded environment or at best what Microsoft called apartment threading. This last one means that an application can have multiple threads, but any particular object coming from an apartment-threaded component always has to be invoked from the same thread.
LabVIEW, having started on the Mac and then been ported to Windows 3.1, heavily inherited those single-threading issues from both OSes. It was "solved" by having a so-called root loop in LabVIEW that dispatched all OS interactions such as mouse, keyboard and OS events to whatever component in LabVIEW needed them. When LabVIEW got real multithreading support in LabVIEW 5, this root loop was maintained and located in the main thread that the OS starts up when launching LabVIEW. It is also the thread in which all GUI operations are executed.
Most ActiveX components never supported anything more than apartment threading, as that kept development of the component simpler. LabVIEW honors that by executing the method and property calls for those ActiveX components from the main thread (usually called the UI thread). That can have certain implications. Out of context or remote ActiveX components are thunked by Windows through the OLE RPC layer, and the corresponding message dispatch for this OLE thunking is executed in the Windows message dispatch routine that LabVIEW calls in its root loop.
Any even slight error in the Windows OLE thunking, the ActiveX component, or the LabVIEW root loop in how the various events are handled can lead to a complete lockup of the message dispatch and with that the root loop of LabVIEW, and absolutely nothing works anymore. Theoretically other threads in LabVIEW can continue to run, and actually do, but without keyboard, mouse and GUI interaction an application is considered pretty dead by most users.
.NET is less susceptible to such problems but not entirely free of them, as it still inherits various technologies from COM and OLE deep down in its belly.
My personal issue with both is that they involve a very complex infrastructure in addition to the LabVIEW runtime that:
1) has to be loaded on application startup, delaying the startup even more
2) while very easy to use when it works, is almost impossible to understand when things go wrong
3) being a Microsoft technology, has a big chance of being obsoleted or discontinued when Microsoft tries to embrace the next hype that hits the road (DDE, while still present, is dead; OLE/COM was superseded by ActiveX; ActiveX is highly discouraged in favor of .NET now; Silverlight has been axed already)
Betting on those technologies has had a very good chance of getting you sidetracked so far, as NI had to find out several times, including with Silverlight the last time.
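The apartment-threading and root-loop idea described above can be illustrated with a small Python sketch: every call is marshaled onto one dedicated thread and run to completion there, so a single call that never returns stalls everything queued behind it. This is an analogy under stated assumptions, not how LabVIEW or COM is actually implemented; the class and method names are invented for the example.

```python
import queue
import threading
from concurrent.futures import Future


class SingleThreadDispatcher:
    """Runs every submitted call on one dedicated thread (apartment style).

    Hypothetical illustration only; not LabVIEW's or COM's real mechanism.
    """

    def __init__(self):
        self._calls = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        # The "root loop": take one call at a time and run it to completion.
        # A call that blocks here delays every call queued after it.
        while True:
            item = self._calls.get()
            if item is None:
                break
            future, fn, args = item
            try:
                future.set_result(fn(*args))
            except Exception as exc:
                future.set_exception(exc)

    def call(self, fn, *args, timeout=None):
        """Marshal fn(*args) onto the dispatcher thread and wait for it."""
        future = Future()
        self._calls.put((future, fn, args))
        return future.result(timeout=timeout)

    def shutdown(self):
        """Stop the loop once all previously queued calls have run."""
        self._calls.put(None)
        self._thread.join()
```

Calling `dispatcher.call(threading.get_ident)` from several different threads returns the same thread ID each time, which is the "always invoked from the same thread" guarantee. The cost is the one Rolf describes: callers of `call` wait in line, and with a `timeout` they start timing out as soon as one earlier call hangs.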
ShaunR Posted December 14, 2015
On 12/13/2015 at 8:50 PM, Zyga said: "Could you provide us more details? Any particular reasons? I've used .NET in my project two or three times already, and there were no problems."
Rolf's given you the technical reasons and history. So, ignoring cross-platform support (a big one for me, especially as I'm now moving away from Windows), performance, and falling to pieces when IT pushes out a security update, here are some real-world, LabVIEW-specific examples: Thread starvation. Obsolescence. Deadlocks (see note at bottom).
Like Rolf says: when they work, fine. When they don't, they are self-contained bundles of nightmares that you can only remove (if you can get into the IDE). I just prefer not to put them in in the first place.
Edited December 14, 2015 by ShaunR
Tim_S Posted December 14, 2015
I appreciate the feedback and the technical explanations. I'm working on a way to prove the .NET control is the issue. Unfortunately, it's the only means we've been provided to control the UUT so this is challenging.
Tim_S Posted December 16, 2015
Rather than remove the .NET control from the test stand, I went back to my demo application. This is something simple that tests the various features on a small scale. It is meant to run on a generic computer and is something a salesman could take on the road with no hardware involved. I added a counter to the front panel that is incremented when the message at the start of test arrives and decremented when the end-of-test message arrives. A timer was added to start a new test after the last is completed.
My first test run went for 467 tests, after which the counter was -43. There were various programs running (Outlook, Lync, Firefox...), the screensaver came on, and the computer locked during that time. For the second run I shut off all the extra programs that I could. This went for 87 tests and had a count of -12. The screensaver came on and the system locked, but I don't think that impacted the test (I recall having issues with such with Win95/98, but not since). I'm not sure I have a fresh system. Trying this after a cold boot is next on my list.
Tim_S Posted December 16, 2015
I put in some more informative logging. It seems the counter was getting off because the message at end of test sometimes shows up twice when my laptop is on battery. Not a quirk I was expecting.
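A single up/down counter cannot tell a lost start-of-test message from a duplicated end-of-test message, since both push the count negative. One way to separate the two cases is to log a sequence number with each message. The sketch below assumes the sender stamps every message with an incrementing integer, which is not part of the message format described in this thread; the function name is also made up.

```python
def classify_messages(sequence_numbers):
    """Return (duplicates, missing) for sequence numbers in arrival order.

    Assumes the sender numbers messages consecutively (hypothetical field).
    """
    seen = set()
    duplicates = []
    for seq in sequence_numbers:
        if seq in seen:
            # Same number arrived again: a duplicate delivery or re-read.
            duplicates.append(seq)
        seen.add(seq)
    if not seen:
        return duplicates, []
    # Any number between the lowest and highest seen that never arrived
    # was lost somewhere along the way.
    missing = [seq for seq in range(min(seen), max(seen) + 1) if seq not in seen]
    return duplicates, missing


duplicates, missing = classify_messages([1, 2, 2, 4])
print(duplicates, missing)  # [2] [3]
```

With a log like this, "end message arrived twice" shows up in `duplicates` and "start message never arrived" shows up in `missing`, instead of both collapsing into one counter value.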
ShaunR Posted December 16, 2015
On 12/16/2015 at 6:11 PM, Tim_S said: "I put in some more informative logging. It seems the counter was getting off because the message at end of test sometimes shows up twice when my laptop is on battery. Not a quirk I was expecting."
It's starting to sound very much like you have a race condition or two somewhere. Extra messages could be due to retries but could also be due to reading the same value twice (especially happens with notifiers). So you are losing messages sometimes and gaining messages sometimes. All seemingly randomly tied to arbitrary and unconnected system conditions. That is a code smell I am very familiar with.
Tim_S Posted December 16, 2015
On 12/16/2015 at 6:51 PM, ShaunR said: "It's starting to sound very much like you have a race condition or two somewhere. Extra messages could be due to retries but could also be due to reading the same value twice (especially happens with notifiers). So you are losing messages sometimes and gaining messages sometimes. All seemingly randomly tied to arbitrary and unconnected system conditions. That is a code smell I am very familiar with."
I had this long paragraph but then stopped to think and started to smell something too. Yeah, this could be a race condition. In staring at the code, I'm getting a smell like I've gotten into a classic blunder ("The most famous of which is 'never get involved in a land war in Asia'") of making this too complicated.
Cat Posted December 17, 2015
Regarding your original problem with TCP loopback, take a look at: https://lavag.org/topic/14609-issues-tcp-ing-with-a-c-program-on-the-same-computer/
I was having what sounds like a very similar problem to yours. Bottom line from NI was that there was some issue with the TCP stack in LabVIEW when running on Windows 7 and doing TCP loopback. That was in LV11. By LV13 it still wasn't fixed. It does seem to be working in LV15, however.
Cat
Tim_S Posted December 17, 2015
On 12/17/2015 at 6:07 PM, Cat said: "Regarding your original problem with TCP loopback, take a look at: https://lavag.org/topic/14609-issues-tcp-ing-with-a-c-program-on-the-same-computer/ I was having what sounds like a very similar problem to yours. Bottom line from NI was that there was some issue with the TCP stack in LabVIEW when running on Windows 7 and doing TCP loopback. That was in LV11. By LV13 it still wasn't fixed. It does seem to be working in LV15, however. Cat"
I recall this one now... that could relate to shared variables as well. Looks like shared variables are based on TCP communication (yea for Wireshark). Not going to get WinXP on the machines. Win10 likely uses the same stack. Updating to LV2015 has logistical issues with trying to maintain the same software at different sites. I'm thinking I need a different communication strategy to overcome this one.
ShaunR Posted December 17, 2015
On 12/17/2015 at 6:07 PM, Cat said: "Regarding your original problem with TCP loopback, take a look at: https://lavag.org/topic/14609-issues-tcp-ing-with-a-c-program-on-the-same-computer/ I was having what sounds like a very similar problem to yours. Bottom line from NI was that there was some issue with the TCP stack in LabVIEW when running on Windows 7 and doing TCP loopback. That was in LV11. By LV13 it still wasn't fixed. It does seem to be working in LV15, however. Cat"
Is there a CAR or some document detailing the findings?
Tim_S Posted December 17, 2015
On 12/17/2015 at 8:23 PM, ShaunR said: "Is there a CAR or some document detailing the findings?"
Back on the third page of the discussion is CAR #313508.
Cat Posted December 18, 2015
I am currently running two separate applications on the same computer that are using TCP loopback to send an aggregate 132 MB/sec between them. So it's definitely finally working in LV15. However, I looked through the LV bug fix lists and couldn't find the CAR anywhere.
As for other options, back when I originally had this problem, I eventually settled on writing files to a RAM disk as the communications path. It wasn't fancy, but it worked out well.
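The post does not say how the RAM disk files were written, so the following is only one possible shape for that kind of file-based exchange, sketched in Python. It writes to a temporary file and then renames it over the target, so a reader sees either the old contents or the new contents but never a half-written file. The function names and the file path are hypothetical.

```python
import os
import tempfile


def publish(path, payload):
    """Write payload (bytes) so a reader never sees a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same volume for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
        os.replace(tmp_path, path)  # swap the finished file into place
    except BaseException:
        # Clean up the temp file if the write or the rename failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_latest(path):
    """Return the most recently published payload, or None if none exists."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None
```

One caveat worth checking before relying on this on Windows: `os.replace` can raise `PermissionError` if the reader has the target file open at that moment, so the writer may need to retry.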
Tim_S Posted December 18, 2015
I'm looking at publishing the current value to a set of shared variables and having the consumer handle the state changes.
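A rough sketch of what "the consumer handles the state changes" could look like, in Python rather than LabVIEW. It assumes the sequencer publishes a snapshot of two values, a run counter that increments at each test start and a running flag; both are assumptions for illustration, as is the function name. The consumer compares consecutive snapshots and derives the start and end events itself, so reading the same value twice produces nothing and a missed intermediate value can still be reconstructed.

```python
def detect_events(previous, current):
    """Derive start/end events from two consecutive (run_id, running) snapshots.

    Assumes run_id increments by one at each test start (hypothetical scheme).
    """
    events = []
    prev_id, prev_running = previous
    cur_id, cur_running = current
    if cur_id != prev_id:
        # A new run began since the last poll.
        if prev_running:
            # We never saw the previous run go idle, so close it now.
            events.append(("end", prev_id))
        events.append(("start", cur_id))
        if not cur_running:
            # The whole run finished between two polls.
            events.append(("end", cur_id))
    elif prev_running and not cur_running:
        events.append(("end", cur_id))
    return events


print(detect_events((1, True), (1, True)))    # []
print(detect_events((1, True), (2, True)))    # [('end', 1), ('start', 2)]
```

If the run counter ever jumps by more than one between polls, an entire run went unseen; the sketch does not report that case, but the gap in `run_id` makes it easy to detect and log.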