
Debugging why TCP disconnects



I'm debugging an issue where the server and client communicate fine for a long period of time but then suddenly stop talking. Eventually the server (LabVIEW on Linux RT) starts seeing error 56 on send, and 15 minutes later the server's read sees error 66. The client runs on Windows (not programmed in LabVIEW), so we can run diagnostics like Wireshark there. It's a 1:1 connection. Is there a way to gather more information about why a TCP connection fails? For example, being able to tell whether (see the sketch after this list):

Client closed connection (normal disconnect)

Someone pulled the cable (can blame user)

Client thread crashes (can blame client developer)
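On the Windows side, the three cases do surface differently at the raw socket level. Here is a rough Python sketch of what I'd expect each failure mode to look like from the client (the timeout value is an arbitrary assumption, and this is an illustration, not our actual client code):

```python
import socket

def classify_disconnect(sock: socket.socket) -> str:
    """Poll a connected socket and report how it died, if it did."""
    sock.settimeout(30.0)  # assumed application-level silence threshold
    try:
        data = sock.recv(4096)
        if data == b"":
            # Peer sent FIN: an orderly shutdown (normal disconnect).
            return "peer closed the connection gracefully"
        return "still alive"
    except ConnectionResetError:
        # Peer sent RST: typically a crashed or aborted client process.
        return "connection reset (peer crashed or aborted)"
    except socket.timeout:
        # No FIN, no RST, just silence. A pulled cable looks exactly like
        # this until TCP retransmissions or keepalives finally give up.
        return "silent link (possible pulled cable or dead network)"
```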

Link to comment

The only time you usually see an error 56 on a send is when the TCP/IP buffer is full. Error 66 is a normal disconnect: the connection was closed by the peer.

There are a couple of reasons you may get error 56 on a send, but the usual one is sending too quickly, say, 2 MB/s over a 10 Mb/s link. Less frequent causes are a connection that has gone deaf and mute while still appearing established (usually seen with transparent proxies) and NIC problems (especially with multiple cards).
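To make the buffer-full case concrete, here is a small self-contained Python sketch (the localhost address, port, and sizes are arbitrary) of how a peer that stops reading fills the TCP buffers until send times out, which is the situation LabVIEW reports as error 56:

```python
import socket

# A "deaf" peer: it accepts the connection but never calls recv().
srv = socket.create_server(("127.0.0.1", 50007))
tx = socket.create_connection(("127.0.0.1", 50007))
rx, _ = srv.accept()

tx.settimeout(2.0)
sent = 0
try:
    while True:
        tx.sendall(b"x" * 65536)  # pile data into the socket buffers
        sent += 65536
except socket.timeout:
    # Both send and receive buffers are full; further sends time out.
    print(f"send timed out after queuing ~{sent} bytes")
```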

Link to comment

On my Linux RT setup I've seen several issues with TCP connections just getting lost, so I'm glad to hear I'm not the only one. For now I've added a retry which closes and reopens the connection, reissues the previous request, and waits for the reply. I've also added code to periodically send a request and read the reply even if I don't need it, just to keep the connection alive and to close and reopen it if the connection went down.
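In rough pseudo-form the scheme looks like this Python sketch (the host, port, and one-shot request/reply framing are placeholders for my actual protocol):

```python
import socket

HOST, PORT = "192.168.1.10", 6340  # placeholder address

class RetryingClient:
    def __init__(self):
        self.sock = None

    def _connect(self):
        self.sock = socket.create_connection((HOST, PORT), timeout=5.0)

    def transact(self, request: bytes) -> bytes:
        for attempt in (1, 2):  # original try plus one retry
            try:
                if self.sock is None:
                    self._connect()
                self.sock.sendall(request)
                return self.sock.recv(4096)
            except OSError:
                if self.sock is not None:
                    self.sock.close()
                self.sock = None  # force a reconnect on the retry
                if attempt == 2:
                    raise

    def heartbeat(self):
        # Periodic no-op request, just to detect and heal a dead link.
        self.transact(b"PING\r\n")
```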

In my setup I'm talking to 4 pieces of TCP equipment through a router on one NIC, and on the other NIC talking directly to the Windows PC. The error I've seen on the RT side is 63 (connection refused by the server), which makes me want to blame the equipment a bit, but it's all new, and it is 1:1 like your setup. I haven't figured out the problem, but the retries are working; they add a bit of jitter, which is acceptable given the timing needs of the equipment. I'm not saying there is something wrong with the Linux RT TCP stack, but I don't really have much to go on.

Link to comment

On Linux, Macs, and 2016+ Windows you can only have 1024 connections, but I believe this includes any in the TIME_WAIT state. If your server side is the one closing connections, maybe you're running out of sockets to talk on?
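If you want to check that theory on the Linux RT side, you can count TIME_WAIT sockets directly; a quick Python sketch (Linux-specific, and reading /proc/net/tcp is my assumption about what's available on the target):

```python
def count_time_wait() -> int:
    """Count IPv4 sockets currently in TIME_WAIT (state code 06)."""
    count = 0
    with open("/proc/net/tcp") as f:
        next(f)                      # skip the header line
        for line in f:
            state = line.split()[3]  # hex connection-state field
            if state == "06":        # 06 == TIME_WAIT
                count += 1
    return count

print(f"{count_time_wait()} sockets in TIME_WAIT")
```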

 

Also, at least on Windows, you can get access to the underlying socket using http://digital.ni.com/public.nsf/allkb/7EFCA5D83B59DFDC86256D60007F5839
You could then add keepalives at the socket level rather than implementing them yourself (it's just an option passed to setsockopt, or the Windows equivalent).
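Once you have the raw socket, enabling OS-level keepalives looks something like this Python sketch (the idle/interval/probe values are arbitrary assumptions; tune them to your link):

```python
import socket
import sys

def enable_keepalive(sock: socket.socket,
                     idle_s: int = 10, interval_s: int = 3, probes: int = 5):
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if sys.platform == "win32":
        # Windows takes (onoff, idle_ms, interval_ms) via an ioctl.
        sock.ioctl(socket.SIO_KEEPALIVE_VALS,
                   (1, idle_s * 1000, interval_s * 1000))
    else:
        # Linux (including NI Linux RT) exposes the knobs as TCP options.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```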

Edited by smithd
lol max=macs
Link to comment
On 12/19/2016 at 7:53 PM, infinitenothing said:

I'd much prefer to get to the root cause of the issue than to try to build my own retry buffer. I just don't have enough experience troubleshooting network issues to know how to isolate that sort of problem.

While finding the root cause is of course always a good thing, networking is definitely not something you can rely on to always work uninterrupted. Any stable networking library will have to implement some kind of retry scheme at some point.

HTTP traditionally handled this by opening a new connection for every request. Wasteful, but very stable!

Newer HTTP communication supports a keep-alive feature, but with the additional provision that the connection may be closed on any error anyway, and the client side reconnects on every possible error, including when the server closes the connection forcefully despite being asked to please keep it alive.
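That client-side pattern looks roughly like this Python sketch (the host and path are placeholders): reuse the connection while it works, and on any error drop it, reconnect, and retry once.

```python
import http.client

conn = http.client.HTTPConnection("example.com", timeout=10)

def get(path: str) -> bytes:
    global conn
    for attempt in (1, 2):
        try:
            conn.request("GET", path)
            return conn.getresponse().read()
        except (OSError, http.client.HTTPException):
            conn.close()  # drop the dead keep-alive connection...
            conn = http.client.HTTPConnection("example.com", timeout=10)
            if attempt == 2:
                raise     # ...but give up after one retry
```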

Most networks, and TCP/IP in particular, were never designed to guarantee uninterrupted connections. What TCP guarantees is a clear success or failure on any packet transmission, and that successful packets arrive in the same order they were sent, but nothing more. UDP doesn't even guarantee that much.

Link to comment

Since it's a 1:1 connection, I'm hoping that disconnects are infrequent enough that in the rare instance one happens I can just let the user click a "reconnect" button. Of course, to get to that point I need to rule out as many possible causes of disconnects (application, OS, etc.) as possible, which is what brought me here.

Buffering on the server is something I'm very nervous about. It's a fairly dense data stream, so it would pile up quickly, or could cause CPU starvation issues if implemented wrong. All this makes me appreciate NI's Network Streams. I just wish there were an open-source library for the client.
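If I do end up buffering, a bounded queue that sheds the oldest samples seems like the safest shape; a rough Python sketch (the cap is an arbitrary assumption, and drop-oldest may or may not be acceptable for the data):

```python
from collections import deque

BUFFER_LIMIT = 10_000                # assumed cap; tune to the stream

buffer = deque(maxlen=BUFFER_LIMIT)  # oldest samples fall off the front

def enqueue(sample: bytes) -> None:
    buffer.append(sample)            # O(1); memory never exceeds the cap

def drain(sock) -> None:
    # Called once the client reconnects; flush whatever survived.
    while buffer:
        sock.sendall(buffer.popleft())
```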

Link to comment
