Stobber

TCP errors 56 and 66 on VxWorks sbRIO


I have a small LV code library that wraps the STM and VISA/TCP APIs to make a "network actor". My library dynamically launches a reentrant VI using the ACBR node that establishes a TCP connection as client or server, then uses the connection to poll for inbound messages and send outbound ones.

 

When I try to establish a connection using my Windows 7 PC as the client and my sbRIO (VxWorks) as the server, it connects and pushes 100 messages from the client if the server was listening first. If the client spins up first, sometimes it works and sometimes I get error 56 from the "TCP Open Connection" primitive repeatedly. Other times I've seen error 66.

 

When I try to make Windows act as the server, the sbRIO client connects without error, but it returns error 56 from "TCP Read" on the first call thereafter (inside "STM.lvlib:Read Meta Data (TCP Clst).vi"). This happens in whichever order I run them.

 

This test, and all others I've written for the API, works just fine when both client and server are on the local machine.

 

-------------------------------

 

I'm tearing my hair out trying to get a reliable connection-establishment routine going. What don't I know? What am I missing? Is there a "standard" way to code the server and client so they'll establish a connection reliably between an RT and a non-RT peer?

 


A listener has to be running for a client to connect via TCP/IP; otherwise you should get error 63 (connection refused). That's just the way TCP/IP works. It is up to the client to retry [open] connection attempts in the event of a connect failure, but the general assumption is that there is an existing listener (service) on the target port of the server that the client is attempting to connect to.

 

N.B.

Regardless of the purpose of the data to be sent or received: in TCP/IP, Client = Open and Server = Listen.
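That retry-until-the-listener-is-up pattern can be sketched outside LabVIEW. Here is a minimal Python illustration of a client that keeps retrying its open while the connection is refused; the function name connect_with_retry, the port number, and the deliberately late listener thread are all invented for the demo.

```python
import socket
import threading
import time

def connect_with_retry(host, port, attempts=20, delay=0.2):
    """Open a TCP connection as a client, retrying while the server's
    listener is not up yet (connection refused, i.e. LabVIEW error 63)."""
    for i in range(attempts):
        try:
            return socket.create_connection((host, port), timeout=2.0)
        except OSError:
            if i == attempts - 1:
                raise
            time.sleep(delay)

def late_listener(port):
    # Simulate a server whose listener starts *after* the client.
    time.sleep(0.5)
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    conn, _ = srv.accept()
    conn.close()
    srv.close()

t = threading.Thread(target=late_listener, args=(50961,), daemon=True)
t.start()
conn = connect_with_retry("127.0.0.1", 50961)   # retries until listener is up
conn.close()
t.join()
```

The client's first attempts fail fast with "connection refused"; once the listener binds, the next retry succeeds.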

Edited by ShaunR


Network streams?

I used them for two years, and they repeatedly failed on their promise. Lots of errors thrown by the API that the application had to be protected from (including error codes documented only for Shared Variables?!), lots of issues with reconnection from the same app instance after an unexpected disconnection, lots of issues with namespacing endpoints, element buffering, etc. I never kept detailed notes on all of it to share with others, but the decision to use raw TCP was actually a decision to get the hell off Network Streams.

 

A listener has to be running for a client to connect via TCP/IP; otherwise you should get error 63 (connection refused). That's just the way TCP/IP works. It is up to the client to retry [open] connection attempts in the event of a connect failure, but the general assumption is that there is an existing listener (service) on the target port of the server that the client is attempting to connect to.

 

N.B.

Regardless of the purpose of the data to be sent or received: in TCP/IP, Client = Open and Server = Listen.

 

Right, thanks. Glad to know that observation makes sense. Now to debug the part where a connection is closed on the VxWorks target between "Listen" and "Read" for no apparent reason...

Edited by Stobber


Error 56 - a timeout - isn't really an error, it just means there wasn't any data. The connection is still valid, and data could arrive later. In most of my TCP code I ignore error 56, especially if I'm polling multiple connections and expect that there won't be data on most of them most of the time. I bundle my TCP references with a timestamp indicating the last time that data was received on that connection, and if it's ever been longer than an hour since I last received data on that connection, then I close that connection. I've used this same approach with the STM (adapting existing code that used it) and it worked fine there too.
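A rough Python analogue of this polling pattern: treat a read timeout as "no data yet" (the moral equivalent of LabVIEW's error 56), remember when data last arrived on each connection, and only close the connection after a long idle period. The poll_once helper and the state dict are invented for illustration and are not part of STM.

```python
import socket
import time

IDLE_LIMIT = 3600.0   # close after an hour with no data, as in the post

def poll_once(state, nbytes=1, timeout=0.05):
    """Read up to nbytes from state['sock']. A timeout is not an error:
    it just means nothing arrived, and the connection stays open unless
    it has been idle longer than IDLE_LIMIT."""
    sock = state["sock"]
    sock.settimeout(timeout)
    try:
        data = sock.recv(nbytes)
    except socket.timeout:
        data = b""                         # no data this poll; not a fault
    if data:
        state["last_rx"] = time.monotonic()
    elif time.monotonic() - state["last_rx"] > IDLE_LIMIT:
        sock.close()                       # peer presumed dead
        state["sock"] = None
    return data

a, b = socket.socketpair()
state = {"sock": b, "last_rx": time.monotonic()}
first = poll_once(state)                   # nothing sent yet -> b""
a.sendall(b"x")
time.sleep(0.05)
second = poll_once(state)                  # -> b"x"
```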



I used them for two years, and they repeatedly failed on their promise. Lots of errors thrown by the API that the application had to be protected from (including error codes documented only for Shared Variables?!), lots of issues with reconnection from the same app instance after an unexpected disconnection, lots of issues with namespacing endpoints, element buffering, etc. I never kept detailed notes on all of it to share with others, but the decision to use raw TCP was actually a decision to get the hell off Network Streams.

 

Certainly the buffering has its oddities, but I have not come across the other issues you mention. I still think you are brave for going back to pure TCP/IP, though!


Error 56 - a timeout - isn't really an error, it just means there wasn't any data. The connection is still valid, and data could arrive later. In most of my TCP code I ignore error 56, especially if I'm polling multiple connections and expect that there won't be data on most of them most of the time. I bundle my TCP references with a timestamp indicating the last time that data was received on that connection, and if it's ever been longer than an hour since I last received data on that connection, then I close that connection. I've used this same approach with the STM (adapting existing code that used it) and it worked fine there too.

Timestamps are very useful. That's the reason I added one to Transport.lvlib, so that I could ascertain the clock skew between the sender and receiver.


Thank you all for the help. It turns out that some of the TCP Read calls inside STM.lvlib's VIs were causing error 56 or returning junk bytes when I set the timeout too low (e.g. 0 ms, in an attempt to create a fast poller that was throttled by a different timeout elsewhere in the loop). When I increased the timeouts on all TCP Read functions, my problems went away. Well, that's how it looks right now, anyway.

 

Incidentally, setting a timeout of 0 ms works fine if I'm asking client and server to talk over the same network interface on the same PC. That kind of makes sense.

 

Update: I'm now having serious problems with backlogged messages. I get messages out of the TCP Read buffer in bursts, and it seems the backlog grows constantly while running my app. This is breaking the heartbeat I'm supposed to send over the link, so each application thinks the other has gone dead after a second or two. Anybody know what might cause the TCP connection to lag so badly?

Edited by Stobber


Update: I'm now having serious problems with backlogged messages. I get messages out of the TCP Read buffer in bursts, and it seems the backlog grows constantly while running my app. This is breaking the heartbeat I'm supposed to send over the link, so each application thinks the other has gone dead after a second or two. Anybody know what might cause the TCP connection to lag so badly?

 

I have not messed about with this kind of thing in years, but perhaps it's Nagle's algorithm at play? You can disable it using Win32 calls, see here.
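For reference, the Win32 call in question sets the standard TCP_NODELAY socket option, which disables Nagle's algorithm so small writes (like a 250 ms heartbeat) go on the wire immediately instead of being coalesced. A minimal Python sketch of the same option, purely for illustration:

```python
import socket

# Create a TCP socket and turn off Nagle's algorithm, so small
# payloads are transmitted immediately rather than batched while
# the stack waits for outstanding ACKs.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
```

The trade-off is more, smaller packets on the wire, which is usually the right choice for low-latency control or heartbeat traffic.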

Edited by Neil Pate


Update: I'm now having serious problems with backlogged messages. I get messages out of the TCP Read buffer in bursts, and it seems the backlog grows constantly while running my app. This is breaking the heartbeat I'm supposed to send over the link, so each application thinks the other has gone dead after a second or two. Anybody know what might cause the TCP connection to lag so badly?

 

As a guess, I would say you probably have a short timeout and the library uses STANDARD mode when reading.

Do you get this behaviour when the read timeout is large, say, 25 seconds?

 

Most TCPIP libraries use a header that has a length value. This makes them application specific, but really easy to write. They rely on reading x bytes and then using that as a length input to read the rest of the message. That's fine and works in most cases as long as the first n bytes of a read cycle are guaranteed to be the length.

 

What happens if you have a very short timeout, though (like your 0 ms)? Now you are not guaranteed to read all x of the length bytes. You might get only one or two bytes of a 4-byte value and then time out. In STANDARD mode the bytes are returned even on a timeout, but they are gone from the input stream (destructive reads). So the next stage doesn't read N bytes as expected, and because you are now out of sync, the next time around you start halfway through the length bytes. This is why Transport.lvlib uses BUFFERED rather than STANDARD: buffered reads aren't destructive after a timeout, so you can circle back and re-read until you have all of the requested bytes.
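The BUFFERED idea can be sketched in Python: keep every received byte in a buffer of your own and only consume a frame once the length header *and* the full payload are present, so a short or partial read can never desynchronize the stream. The FrameBuffer class and the 4-byte big-endian header are assumptions for illustration; STM's actual header layout may differ.

```python
import struct

class FrameBuffer:
    """Accumulate raw TCP bytes and pop complete length-prefixed
    messages (4-byte big-endian length header). Incomplete data stays
    buffered, mimicking a non-destructive (BUFFERED) read."""
    def __init__(self):
        self.buf = b""

    def feed(self, data):
        self.buf += data

    def pop(self):
        if len(self.buf) < 4:
            return None                      # header not complete yet
        (length,) = struct.unpack(">I", self.buf[:4])
        if len(self.buf) < 4 + length:
            return None                      # payload not complete yet
        msg = self.buf[4:4 + length]
        self.buf = self.buf[4 + length:]     # consume exactly one frame
        return msg

fb = FrameBuffer()
frame = struct.pack(">I", 5) + b"hello"
fb.feed(frame[:3])          # a timed-out partial read: only 3 bytes arrived
early = fb.pop()            # None -- nothing consumed, stream stays in sync
fb.feed(frame[3:])          # the rest arrives on the next read
msg = fb.pop()              # b"hello"
```

With a destructive read, those first 3 bytes would already be gone and the next read would start mid-header, which is exactly the desync described above.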

Edited by ShaunR


As a guess, I would say you probably have a short timeout and the library uses STANDARD mode when reading.

Do you get this behaviour when the read timeout is large, say, 25 seconds?

Actually, I'm using a patched version of STM where I fixed that issue by changing all TCP reads to BUFFERED mode. :) So that's not a problem.

 

I do successfully read in a simple test VI with a large timeout, but debugging in my application, which requires a heartbeat in each direction every 250 ms, makes the whole thing lag and choke. I'm going to try to write a more complex test VI that does the heartbeating without any other app logic involved.

I have not messed about with this kind of thing in years, but perhaps it's Nagle's algorithm at play? You can disable it using Win32 calls, see here.

Let me look into that, too. Thanks!


Actually, I'm using a patched version of STM where I fixed that issue by changing all TCP reads to BUFFERED mode. :) So that's not a problem.

 

I do successfully read in a simple test VI with a large timeout, but debugging in my application, which requires a heartbeat in each direction every 250 ms, makes the whole thing lag and choke. I'm going to try to write a more complex test VI that does the heartbeating without any other app logic involved.

Let me look into that, too. Thanks!

250 ms is pushing it, as that is what the default Nagle delay is. That should not make messages back up, though; your heartbeat will just time out sometimes. If you have hacked the library to use BUFFERED instead of STANDARD, then it is probable that you are just not consuming the bytes, because a read no longer guarantees that they are removed. That will cause the Windows buffer to fill and eventually become unresponsive until you remove some bytes.

Edited by ShaunR


250 ms is pushing it, as that is what the default Nagle delay is. That should not make messages back up, though; your heartbeat will just time out sometimes.

That's definitely happening, and it's the first-tier cause of my headache. I need to get heartbeating working consistently again.

 

 

If you have hacked the library to use BUFFERED instead of STANDARD, then it is probable that you are just not consuming the bytes, because a read no longer guarantees that they are removed. That will cause the Windows buffer to fill and eventually become unresponsive until you remove some bytes.

Huh...that's good to know. Is there a way to check the buffer without popping from it? I could add a check-after-read while debugging to make sure I'm getting everything I think I'm getting.


That's definitely happening, and it's the first-tier cause of my headache. I need to get heartbeating working consistently again.

 

 

Huh...that's good to know. Is there a way to check the buffer without popping from it? I could add a check-after-read while debugging to make sure I'm getting everything I think I'm getting.

Well, Neil has given you the VI to turn the Nagle algorithm off, so that shouldn't be a problem.

 

There is no primitive among the standard TCP/IP VIs to find the number of bytes waiting. There is a Bytes At Port property for VISA, but people tend to prefer the simplicity of the TCP/IP VIs.


Another solution, although this may sound overly complicated, is to maintain a buffer of all the received bytes from a given connection yourself. Since I'm already bundling a timestamp with my TCP connection IDs, adding a string to that cluster isn't a big deal. Each time I read bytes from a connection, I append them to the existing buffer, then attempt to process them as a complete packet (or as multiple packets, in a loop). Any remaining bytes after that processing go back in the buffer.

 

There might be another approach to work around the Nagle algorithm. Try doing a read (even of 0 bytes) immediately following each write. I believe this will force the data to be sent, on the assumption that the read is waiting for a response from the just-sent packet. I'm not completely certain of this, but I think I tried it once and it worked.

 

For a heartbeat, if you're just checking if the remote system is running, it might be worth switching to UDP.
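A minimal sketch of that UDP idea in Python, using loopback and an invented 2-byte heartbeat payload. UDP is connectionless, so a lagging TCP stream can't delay the beat, and a lost datagram just means one missed beat rather than a backed-up queue.

```python
import socket

HEARTBEAT = b"HB"   # invented payload; anything small works

# Receiver: bind to an OS-chosen free port and wait briefly for a beat.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
rx.settimeout(0.5)
port = rx.getsockname()[1]

# Sender: fire one heartbeat datagram at the receiver.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(HEARTBEAT, ("127.0.0.1", port))

data, addr = rx.recvfrom(16)    # receives b"HB" from the sender
tx.close()
rx.close()
```

In a real system the receiver would simply note the arrival time of each beat and declare the peer dead after a few missed intervals.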


It was Nagle.

 

Summary:

Connection issues between two machines because of extremely short timeouts on the TCP Open/Listen functions and the STM Metadata functions. Packet buffering issues because of the Nagle algorithm.

 

Thank you guys very very much for all the help!



