
Timeout in TCP read


Tim_S


Some of my code is giving me behavior I'm not understanding. I've been talking with NI tech support, but I'm trying to better understand what's going on for a project down the road that is going to tax what TCP can transmit.

 

I have a PC directly connected to a cRIO-9075 with a cable (no switch involved). I've put together a little test application that, on the RT side, creates an array of 6,000 U32 integers, waits for a connection, and then starts transmitting the length of the array (in bytes) and the array itself over TCP every 100 msec. The length and data are sent in a single TCP write.
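
For anyone who wants the gist without opening the VIs, the RT loop amounts to something like this (Python sketch purely for illustration; my actual code is LabVIEW, and the port number and byte order below are placeholders, not pulled from my VIs):

    import socket
    import struct
    import time

    # Illustrative sketch of the RT-side loop (the real code is LabVIEW on the cRIO).
    data = struct.pack('>6000I', *range(6000))        # 6,000 U32 values, big-endian
    message = struct.pack('>i', len(data)) + data     # 4-byte length prefix + the data

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(('', 6340))                           # port is a placeholder
    server.listen(1)
    conn, _ = server.accept()                         # wait for the PC to connect
    while True:
        conn.sendall(message)                         # length and data in one write
        time.sleep(0.1)                               # new message every ~100 msec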

 

On the PC side, I have a TCP open, a read of four bytes to get the length of the data, then a read of the data itself. The second TCP read does not occur if there are any errors from the first TCP read. Both reads have a 100 msec timeout.
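
The PC side amounts to roughly this (again just an illustrative Python sketch; the address and port are placeholders, and LabVIEW's TCP Read applies its timeout to the whole read rather than per recv call as below):

    import socket
    import struct

    def read_exact(conn, nbytes):
        # Accumulate exactly nbytes; socket.timeout here plays the role of error 56.
        buf = b''
        while len(buf) < nbytes:
            chunk = conn.recv(nbytes - len(buf))
            if not chunk:
                raise ConnectionError('connection closed')
            buf += chunk
        return buf

    conn = socket.create_connection(('192.168.0.2', 6340))   # address/port placeholders
    conn.settimeout(0.1)                                      # 100 msec timeout (per recv call here)
    while True:
        length = struct.unpack('>i', read_exact(conn, 4))[0]  # first read: 4-byte length
        payload = read_exact(conn, length)                    # second read: the data itself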

 

The error I'm getting is a sporadic timeout (error 56) at the second TCP read on the PC side. This causes the next read of the data length to pull from my data, so I get invalid data from there on out. The error occurs anywhere from seconds to hours after the start of transmission.

 

As a sanity check, I did some math on how long it should take to transmit the data. Ignoring the overhead of TCP communication, it should take ~2 msec for the write to transmit.
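
(The arithmetic, assuming the cRIO-9075's Ethernet port is running at 100 Mbit/s: 6,000 × 4 bytes + 4 bytes of length = 24,004 bytes = 192,032 bits, and 192,032 / 100,000,000 s ≈ 1.9 msec on the wire.)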

 

A workaround seems to be to have an infinite timeout (value of -1) for the second TCP read. I'm rather leery of having an infinite (or very long) timeout in the second read. Tech support was able to get this working with 250 msec on the second read.

 

Test VIs uploaded...

Test Stream Data.zip


You use buffered mode. This mode only returns:

1) once the timeout has expired, without retrieving any data

2) whenever all the requested bytes have arrived

 

It seems your communication is somehow losing bytes somewhere. With buffered mode you return with a timeout and the data stays in the buffer, where it then gets read and interpreted as another length code, sending you even further off track.
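
In other words, the mode behaves roughly like this sketch (Python purely for illustration, obviously not NI's actual implementation):

    import socket
    import time

    def buffered_read(conn, nbytes, timeout_s):
        # All-or-nothing: return nbytes, or raise a timeout and leave them unread.
        deadline = time.monotonic() + timeout_s
        conn.settimeout(0.01)                                 # poll in small steps
        while True:
            try:
                peeked = conn.recv(nbytes, socket.MSG_PEEK)   # look without consuming
            except socket.timeout:
                peeked = b''
            if len(peeked) >= nbytes:
                return conn.recv(nbytes)                      # now actually consume them
            if time.monotonic() >= deadline:
                raise socket.timeout('requested bytes did not arrive in time')
            time.sleep(0.001)                                 # wait a little for more data

The important part is the timeout path: nothing gets consumed, so the leftover bytes are still sitting there to be misread as the next length code.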


I've just tried that with no improvement. I went back through the emails with tech support and see that they have tried that as well.

 


Correct. Once I'm out of sync, the only recourse with this method is to stop everything and restart from a known point. What's confusing about the loss of bytes is that the length and data are a single write, so they should show up at "the same time". They will be multiple packets, but there should not be collisions or packet loss.


 

Well, the next step in debugging this would be to enable another mode on the read side and see what data actually arrives, then compare that data with what you think you sent on the cRIO side.


Switched to Standard mode to read whatever is available at the timeout. Of what does get received, the data is correct.

 

I've Wiresharked the connection. This is looking like something lower-level, as there are responses to the TCP packets of "Reassembly error, protocol TCP: New fragment overlaps old data (retransmission?)". I'm now getting errors (codes 56 and 66) on the RT side where I wasn't before.

 

A coworker dropped off a PLC I'm supposed to talk Ethernet/IP to. I was able to get Ethernet/IP going for 5+ minutes without any errors using the same cable and PC.

 

For grins, I changed from a 6,000-element array to 50 elements (should fit within one packet). The errors in the test routine went away. I'm seeing Reassembly errors in Wireshark, but those appear to be for the RT front panel (in development mode). Creeping up the array size, the issue returns as soon as the message spans two packets.
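
(Rough numbers: 50 × 4 + 4 = 204 bytes fits easily in a single TCP segment on a standard 1,500-byte Ethernet MTU (~1,460 bytes of payload), while 6,000 × 4 + 4 = 24,004 bytes has to be split across roughly 17 segments.)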

  • 2 weeks later...

Update on this... The issue only appears on my development PC, in both the development environment and the run-time. I was able to set up executables to test non-development PC-to-PC and PC-to-cRIO for extended periods of time; these tests ran overnight without issue. I'm still working with technical support on this issue.


 

If it seems limited to your PC, then the most likely suspect would seem to be the network card and its driver in that PC. It wouldn't be the first time that network card drivers do buggy things.

 

Maybe it has to do with jumbo frame handling. Try to see if you can disable that in the driver configuration. As far as I know, the cRIO doesn't support jumbo frames at all, so there shouldn't be any jumbo frames transmitted, but it could be that jumbo frame handling enabled in the driver tries to be too smart and reassembles multiple TCP packets into such frames to pass to the Windows socket interface.


Amen.

 


I checked the driver on my development PC and have verified that Jumbo Packet is disabled. I haven't found much on Jumbo Frames with cRIO, but what I have found indicates it is only available with Windows and not RT.

 

I am starting to think the 'standard corporate IT load and policies' are causing some instability in long-term TCP communication. Unfortunately, I don't think I can prove that or change it.

  • 3 weeks later...

Update:

Reversing the connection roles (Windows PC as server, cRIO as client) ran overnight without issues.

 

Both NI tech support and I were able to reproduce the behavior with the shipping examples "simple TCP client" and "simple TCP server". NI tested using a dual-processor cRIO, which did not show the behavior.

 

The next test I've been asked to try is to monitor CPU usage on the cRIO. The hypothesis is that CPU usage momentarily goes very high, causing the TCP communication to be delayed. We haven't yet worked out what could be causing CPU usage to spike.

  • 1 month later...

I do have a resolution, but not an explanation. Sending of commands/responses is still through TCP; however, the fast data transfer has been switched to a network stream. I was able to transfer 600,000 U32 values per second overnight without issue (better performance than I need). Network streams are built on TCP, so this has NI tech support and me scratching our heads as to why one works well and the other does not.
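
(For scale: 600,000 U32 per second × 4 bytes = 2.4 MB/s, or about 19 Mbit/s, so that rate is still nowhere near saturating a 100 Mbit/s link.)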
