
Timeout in TCP read


Tim_S


Some of my code is giving me behavior I don't understand. I've been talking with NI tech support, but I'm trying to better understand what's going on before an upcoming project that is going to tax what TCP can transmit.

 

I have a PC directly connected to a cRIO-9075 with a cable (no switch involved). I've put together a little test application that, on the RT side, creates an array of 6,000 U32 integers, waits for a connection, and then starts transmitting the length of the array (in bytes) and the array itself over TCP every 100 msec. The length and data are sent as a single TCP write.
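
For reference, here is a minimal Python sketch of the equivalent sender logic. The actual code is a LabVIEW VI, so the port number and byte order below are illustrative assumptions:

    # Hypothetical Python equivalent of the RT-side sender.
    import socket, struct, time

    payload = struct.pack(">6000I", *range(6000))  # 6,000 U32s = 24,000 bytes

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("", 6340))       # port is an arbitrary assumption
    srv.listen(1)
    conn, _ = srv.accept()     # wait for the PC to connect

    while True:
        # Length prefix and data go out in a single TCP write, as described above.
        conn.sendall(struct.pack(">I", len(payload)) + payload)
        time.sleep(0.1)        # every 100 msec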

 

On the PC side, I have a TCP open, a read of four bytes to get the length of the data, then a read of the data itself. The second TCP read does not occur if there are any errors with the first TCP read. Both reads have a 100 msec timeout.
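
In Python terms, the read loop looks roughly like this (again a sketch, not the actual VI; note that Python's per-recv() timeout only approximates LabVIEW's overall TCP Read timeout):

    # Hypothetical Python equivalent of the PC-side reader.
    import socket, struct

    sock = socket.create_connection(("192.168.1.100", 6340))  # cRIO address assumed
    sock.settimeout(0.1)       # 100 msec timeout

    def read_exact(n):
        # Loop until exactly n bytes arrive; a socket.timeout here is the
        # rough analogue of LabVIEW error 56.
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("connection closed")
            buf += chunk
        return buf

    while True:
        length = struct.unpack(">I", read_exact(4))[0]  # first read: 4-byte length
        data = read_exact(length)                       # second read: the payload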

 

The error I'm getting is a sporadic timeout (error 56) at the second TCP read on the PC side. This causes the next read of the data length to pull from my payload data, so I get invalid data from there on out. The error occurs anywhere from seconds to hours after the start of transmission.

 

As a sanity check, I did some math on how long it should take to transmit the data. Ignoring the overhead of TCP communication, it should take ~2 msec for the write to transmit.
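
Spelled out, assuming the cRIO-9075's 100 Mbit/s Ethernet port:

    # Back-of-the-envelope transmit time for one message.
    payload_bytes = 4 + 6000 * 4          # length prefix + 6,000 U32s = 24,004 bytes
    link_bps = 100e6                      # 100BASE-T
    print(payload_bytes * 8 / link_bps)   # ~0.0019 s, i.e. roughly 2 msec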

 

A workaround seems to be to have an infinite timeout (value of -1) for the second TCP read. I'm rather leery of having an infinite (or very long) timeout in the second read. Tech support was able to get this working with 250 msec on the second read.

 

Test VIs uploaded...

Test Stream Data.zip

Link to post


You use buffered mode. This mode only returns:

1) once the timeout has expired, without retrieving any data

2) whenever all the requested bytes have arrived

 

It seems your communication is somehow losing bytes somewhere. With buffered mode you return with a timeout, the data stays in the buffer, and it then gets read and interpreted as another length code, sending you even further out of sync.
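
To illustrate the failure mode (byte values made up for the example):

    # Once a read times out mid-message, the stream position is off by one
    # partial message, so payload words get misread as length codes.
    import struct

    stream = struct.pack(">I", 8) + struct.pack(">2I", 7, 9)  # length=8, two U32s

    # Healthy cycle: read stream[0:4] as the length (8), then 8 payload bytes.
    # If a timeout leaves the payload unread, the next "length" read lands
    # inside the payload instead:
    misread = struct.unpack(">I", stream[4:8])[0]
    print(misread)  # 7 -- a payload value interpreted as a message length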

Link to post

I've just tried that with no improvement. I went back through the emails with tech support and see that they have tried that as well.

 

1) once the timeout has expired, without retrieving any data

2) whenever all the requested bytes have arrived

 

It seems your communication is somehow losing bytes somewhere. With buffered mode you return with a timeout, the data stays in the buffer, and it then gets read and interpreted as another length code, sending you even further out of sync.

Correct. Once I'm out of sync, the only recourse with this method is to stop everything and restart from a known point. What is confusing about the loss of bytes is that the length and data are a single write, so they should show up at "the same time". They will be multiple packets, but there should not be collisions or packet loss.
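
One common mitigation, not something the test VIs implement, would be to frame each message with a sentinel the reader can scan for after a timeout (a sketch, reusing the hypothetical read_exact from above):

    # Hypothetical resync framing: the sender prepends MAGIC to every message.
    MAGIC = b"\xDE\xAD\xBE\xEF"

    def resync(read_exact):
        # Slide a 4-byte window along the stream until the marker lines up.
        window = read_exact(4)
        while window != MAGIC:
            window = window[1:] + read_exact(1)

    # After an error 56, call resync(read_exact), then resume the normal
    # length/payload reads from the next message boundary.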

Link to post

Correct. Once I'm out of sync, the only recourse with this method is to stop everything and restart from a known point. What is confusing about the loss of bytes is that the length and data are a single write, so they should show up at "the same time". They will be multiple packets, but there should not be collisions or packet loss.

 

Well, the next step in debugging this would be to enable another mode on the read side and see what data actually arrives, then compare that data with what you think you sent on the cRIO side.

Link to post

Well, the next step in debugging this would be to enable another mode on the read side and see what data actually arrives, then compare that data with what you think you sent on the cRIO side.

I switched to Standard mode, which reads whatever is available at the timeout. Of what does get received, the data is correct.

 

I've Wiresharked the connection. This is looking like something lower-level, as there are responses to the TCP packets of "Reassembly error, protocol TCP: New fragment overlaps old data (retransmission?)". I'm now getting errors (codes 56 and 66) on the RT side where I wasn't before.

 

A coworker dropped off a PLC I'm supposed to talk Ethernet/IP to. I was able to get Ethernet/IP going for 5+ minutes without any errors using the same cable and PC.

 

For grins, I changed from a 6,000-element array to 50 elements (which should fit within one packet). The errors in the test routine went away. I'm still seeing reassembly errors in Wireshark, but those appear to be for the RT front panel (in development mode). Creeping up the array size, the issue returns as soon as the message spans two packets.
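
The packet counts line up with that, assuming a standard 1,500-byte Ethernet MTU (about 1,460 bytes of TCP payload per segment):

    import math
    mss = 1500 - 20 - 20                    # MTU minus IP and TCP headers
    print(math.ceil((4 + 50 * 4) / mss))    # 1 segment for 50 elements
    print(math.ceil((4 + 6000 * 4) / mss))  # 17 segments for 6,000 elements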

Link to post
  • 2 weeks later...

Update on this... This issue only appears on my development PC, in both the development environment and the run-time engine. I was able to set up executables to test non-development PC-to-PC and PC-to-cRIO for extended periods of time; these tests ran overnight without issue. I'm still working with technical support on this issue.

Edited by Tim_S
Link to post

Update on this... This issue only appears on my development PC, in both the development environment and the run-time engine. I was able to set up executables to test non-development PC-to-PC and PC-to-cRIO for extended periods of time; these tests ran overnight without issue. I'm still working with technical support on this issue.

 

If it seems limited to your PC, then the most likely suspect would be the network card and its driver in that PC. It wouldn't be the first time that network card drivers do buggy things.

 

Maybe it has to do with jumbo frame handling. Try to see if you can disable that in the driver configuration. As far as I know the cRIO doesn't support jumbo frames at all, so there shouldn't be any jumbo frames transmitted, but it could be that enabled jumbo frame handling in the driver tries to be too smart and reassembles multiple TCP packets into such frames to pass to the Windows socket interface.

Link to post

If it seems limited to your PC, then the most likely suspect would be the network card and its driver in that PC. It wouldn't be the first time that network card drivers do buggy things.

Amen.

 

Maybe it has to do with jumbo frame handling. Try to see if you can disable that in the driver configuration. As far as I know the cRIO doesn't support jumbo frames at all, so there shouldn't be any jumbo frames transmitted, but it could be that enabled jumbo frame handling in the driver tries to be too smart and reassembles multiple TCP packets into such frames to pass to the Windows socket interface.

I checked the driver on my development PC and verified that Jumbo Packet is disabled. I haven't found much on jumbo frames with cRIO, but what I have found indicates the setting is only available with Windows and not RT.

 

I am starting to think the 'standard corporate IT load and policies' are causing some instability in long-term TCP communication. Unfortunately, I don't think I can prove that or change it.

Link to post
  • 3 weeks later...

Update:

Reversing the connection direction (Windows PC as server, cRIO as client) ran overnight without issues.

 

Both NI tech support and I were able to reproduce the behavior with the shipping examples "Simple TCP Client" and "Simple TCP Server". NI also tested using a dual-processor cRIO, which did not show the behavior.

 

The next test I've been asked to try is to monitor CPU usage on the cRIO. The hypothesis is that CPU usage momentarily goes very high, causing the TCP communication to be delayed. We haven't yet figured out what could be causing CPU usage to spike.

Link to post
  • 1 month later...

I do have a resolution, but not an explanation. Sending of commands/responses is still through TCP; however, the fast data transfer has been switched to a network stream. I was able to transfer 600,000 U32s per second overnight without issue (better performance than I need). Network streams are built on TCP, so this has NI tech support and me scratching our heads as to why one works well and the other does not.
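
For scale, assuming the same 100 Mbit/s link, that rate is nowhere near saturating the wire:

    rate_bps = 600_000 * 4 * 8     # 600,000 U32s per second
    print(rate_bps / 100e6)        # ~0.19, i.e. about 19% of a 100 Mbit/s link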

Link to post
