BobHamburger Posted February 21, 2009

I have a hairy, complex issue that I'll try to distill down to its basics. The system that I'm working with has three (3) cFP-1808s being controlled from a PXI-8106, running RT, via Modbus over Ethernet. I've taken the NI Modbus Library and written some nice wrappers around it to realize the basic FP functions like read and write, along with some configuration and scaling functions (a nice feature exposed through the Modbus interface). At the very bottom of the comm hierarchy are the familiar TCP primitives.

I've constructed a cFP Polling Engine VI, which handles opening the TCP session and doing the cyclical reading and writing. The polling engine takes a tag list and reads from/writes to a complete cFP stack every 100 mSec. It's been benchmarked to do everything it needs to do in about 55 mSec, so there's some headroom in the system. It's designed to handle any errors (56, 64, 66 and the like) by simply closing the reference and re-initiating a new comm session. The three 1808 stacks are serviced by three re-entrant copies of the Polling Engine; all of their respective subVIs are similarly re-entrant, so that everything can run in parallel. That's the intention, but in practice there are problems...

1. When running under RT, about every hour or two, the system will show an Error 56 timeout (To=50 mSec) and perform a disconnect-reconnect cycle as designed. This works perfectly, with the reconnection cycle taking 10-12 mSec. The timeouts seem to be pretty evenly distributed between the three 1808 stacks. All well and good, BUT:

2. After exactly 62 reconnections (which at the failure rates noted takes 2-3 days of continuous operation), no more reconnections will occur. Neither can I connect to the 8106 RT controller via FTP. It's as if we've completely run out of TCP sockets. And yes, I've checked the very obvious: the old connection refnum is indeed closed before a new one is opened.

3. Variation #1: if I take the parallel, asynchronous operation described above and run it from my PC host under Windows, none of the random disconnect-reconnect cycling occurs. The system is rock-solid.

4. Variation #2: if I restructure the TCP Polling Engines so that they run serially rather than in parallel, and run this under RT, none of the random disconnect-reconnect cycling occurs. The system is rock-solid.

Here's the ugly conclusion that I've arrived at: the TCP API cannot reliably support multiple, parallel, asynchronous calls in RT. Moreover, the TIME_WAIT mechanism for holding and releasing ports/sockets does not seem to be operating reliably. Has anyone else in the community observed this kind of behavior in TCP comms under PXI-RT? Are we asking too much of RT to expect it to work as well as Windows? Other than completely serializing the calls to the TCP API, so that we're only doing one at a time, are there any other methods for making multi-device TCP communications under RT reliable?
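For clarity, here's roughly what each Polling Engine does, sketched in Python since I can't paste a block diagram here. The real code is LabVIEW VIs wrapping the NI Modbus Library and the TCP primitives; the hosts, addresses, and helper names below are purely illustrative.

    # Rough sketch of one cFP Polling Engine (the actual implementation is LabVIEW).
    import socket
    import struct
    import time

    POLL_PERIOD = 0.100   # service a complete cFP stack every 100 mSec
    IO_TIMEOUT  = 0.050   # 50 mSec timeout on each transaction (the "To=50 mSec")

    def read_holding_registers(conn, unit, addr, count, tid=1):
        """One minimal Modbus/TCP 'read holding registers' (function 0x03) transaction."""
        pdu = struct.pack(">BHH", 0x03, addr, count)
        mbap = struct.pack(">HHHB", tid, 0, len(pdu) + 1, unit)   # MBAP header
        conn.sendall(mbap + pdu)
        resp = conn.recv(260)
        # response: 7-byte MBAP + function code + byte count + register data
        return struct.unpack(">" + "H" * count, resp[9:9 + 2 * count])

    def polling_engine(host, tag_list, port=502):
        """Cyclically service one 1808 stack's tag list, reconnecting on timeout."""
        conn = socket.create_connection((host, port), timeout=IO_TIMEOUT)
        while True:
            start = time.monotonic()
            try:
                for unit, addr, count in tag_list:
                    read_holding_registers(conn, unit, addr, count)
            except socket.timeout:
                # The Error-56 case: close the stale session and open a new one,
                # just as the engine does for errors 56/64/66.
                conn.close()
                conn = socket.create_connection((host, port), timeout=IO_TIMEOUT)
            # sleep off the remainder of the 100 mSec cycle
            time.sleep(max(0.0, POLL_PERIOD - (time.monotonic() - start)))

Three copies of that engine, one per 1808 stack, run in parallel; that parallel case is where the socket exhaustion shows up under RT.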
Grampa_of_Oliva_n_Eden Posted February 21, 2009

QUOTE (BobHamburger @ Feb 20 2009, 10:30 AM)

Close but not exactly... If I did not share this and it ended up being the issue, I'd feel bad. The Modbus libraries that NI had on their web site were OK if that was all you were doing, so they make a nice demo. They did not play well with other processes. I went through and refactored them, adding "zero ms waits" to loops and some other stuff. Maybe the instances are fighting with each other and can play nicer together if they just share the CPU better.

Ben
Mark Smith Posted February 21, 2009

QUOTE (neBulus @ Feb 20 2009, 08:47 AM)

Ben,

You may have the answer - IIRC (from my fuzzy recollection of RTOS class), you can starve a thread/process/whatever so that it never executes if higher-priority tasks are still requesting CPU cycles. So, if some of the code is written so that it always operates at a lower priority, adding the zero-msec wait ("Wiring a value of 0 to the milliseconds to wait input forces the current thread to yield control of the CPU" - LV Help) should make sure it executes. So maybe when the close connection is called asynchronously it never really gets executed, because it never gets a high enough priority, but when called serially it's not competing for cycles and does get executed correctly, and the port gets released for re-use. This is all complete speculation, so TIFWIIW.

Mark
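To make the idea concrete, here's a toy Python stand-in. The actual fix would be LabVIEW's Wait (ms) primitive wired with 0 inside the polling loops; time.sleep(0) is only a rough analogue of that yield, and the function names are made up.

    import time

    def poll_without_yield(transactions):
        # Tight loop: never gives up the CPU between Modbus transactions, so on a
        # single-core RT target lower-priority work (e.g. the TCP Close that
        # actually releases the socket) can be starved indefinitely.
        for t in transactions:
            t()

    def poll_with_yield(transactions):
        # Same loop, but yielding once per iteration so other threads get a chance
        # to run - the rough equivalent of wiring 0 ms to Wait (ms) in LabVIEW.
        for t in transactions:
            t()
            time.sleep(0)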
jgcode Posted February 21, 2009

QUOTE (neBulus @ Feb 21 2009, 12:47 AM)

FWIW, I prefer to use the IO Server when communicating with Modbus on RT rather than those VIs.