Cat Posted September 4, 2019 (edited)

Hi all! Long time, no talk to. 🙂 I'm supposedly retired, but then decided to go over to the Dark Side and become a contractor. We'll see how long that lasts...

Current issue: a C developer and I are sending data via a MS loopback adapter between a C app and a LV app on the same machine (Windows 10). Past iterations have worked great (after that little bug in LV11 was fixed). The new system, however, needs a much higher continuous data throughput -- somewhere in the neighborhood of 900 MB/s. The loopback adapter is topping out somewhere around 500 MB/s. If we switch from internal TCP to connecting two 10G NICs in the same box, the data gets passed with no problem. So it's not the code, and we know how to configure hardware NICs for high data throughput.

We've tried everything the web seems to offer, including a lot of things that have supposedly been deprecated. We've tried using localhost, 127.0.0.1, a 192.168 address, and various other IP addy options. If you google "how to speed up a loopback adapter" or any variant of that, we've probably tried it. Does anyone have experience with tweaking a loopback adapter, and any suggestions of things to try?

If it matters, the computer is a screaming fast dual Xeon with 128 GB of memory.

Cat

Edited September 4, 2019 by Cat
ShaunR Posted September 5, 2019 (edited)

14 hours ago, Cat said: "We've tried everything the web seems to offer, including a lot of things that have supposedly been deprecated."

Well, if you've turned off all the M$ malarky (like QoS and Nagle) and tried the fast path, then you are CPU bound. There isn't really a lot to tweak.

Edited September 5, 2019 by ShaunR
Mads Posted September 5, 2019

What is the role of the loopback adapter in this case? Do you need it to monitor the traffic through Wireshark, for example? Or is the machine without a single physical network adapter, so you have the loopback installed just to get access to networking? Or is it to handle a routing issue? Otherwise the link could be fully local, with all the shortcuts that allows the network driver to take.
Gribo Posted September 5, 2019 (edited)

Iperf3 gets ~95% of the theoretical speed of a 1G Ethernet link easily and reliably on Windows, without being CPU bound. It requires multiple threads to get there; a single thread will be limited by the scheduler. It doesn't matter much whether it is TCP or UDP.

Edited September 5, 2019 by Gribo
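For anyone wanting to reproduce this, the multi-stream test iperf3 is describing can be run over loopback with the -P (parallel streams) flag; the stream count of 4 is just an example and the numbers will vary by machine:

```shell
# terminal 1: start the iperf3 server
iperf3 -s

# terminal 2: run 4 parallel TCP streams over loopback for 10 seconds
iperf3 -c 127.0.0.1 -P 4 -t 10
```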
Cat Posted September 6, 2019

22 hours ago, ShaunR said: "Well, if you've turned off all the M$ malarky (like QoS and Nagle) and tried the fast path, then you are CPU bound. There isn't really a lot to tweak."

We've set up the loopback adapter the same way we've set up hardware NICs (at least as much as is applicable). I'll confirm that QoS and Nagle specifically have been dealt with. The C dev tried fastpath (even though it was supposedly deprecated a while back), but that didn't help any. The computer has a very fast CPU. Two of them, actually. I guess the implication is that the hardware NICs are doing something the CPUs can't.
Cat Posted September 6, 2019

19 hours ago, Mads said: "What is the role of the loopback adapter in this case? Do you need it to monitor the traffic through Wireshark, for example? Or is the machine without a single physical network adapter, so you have the loopback installed just to get access to networking? Or is it to handle a routing issue? Otherwise the link could be fully local, with all the shortcuts that allows the network driver to take."

The point of the loopback adapter is to use TCP to pass data between two different executables (one C and one LV) on the same machine. The machine has two 1G and four 10G physical network adapters. Can you explain more what you mean by "link could be fully local"?
Cat Posted September 6, 2019

16 hours ago, Gribo said: "Iperf3 gets ~95% of the theoretical speed of a 1G Ethernet link easily and reliably on Windows, without being CPU bound. It requires multiple threads to get there; a single thread will be limited by the scheduler. It doesn't matter much whether it is TCP or UDP."

We could try running iperf, just to confirm it's the loopback adapter and not the code. But as I said, we've run the same code with two hardware NICs (10G) connected on the same computer and it works fine.
Mads Posted September 6, 2019 (edited)

5 hours ago, Cat said: "The point of the loopback adapter is to use TCP to pass data between two different executables (one C and one LV) on the same machine. The machine has two 1G and four 10G physical network adapters. Can you explain more what you mean by 'link could be fully local'?"

Passing data between executables on the same machine, which happens to have the TCP stack loaded because it has a network interface anyway, does not normally require a loopback adapter (unless any of the requirements I listed are in effect). If this were a serial link, then sure -- you would need a physical or virtual null modem installed. But local TCP traffic never passes through any adapter anyway, as described in the first sentence here (where the loopback adapter is needed precisely because they want to capture the truly local traffic): https://wiki.wireshark.org/CaptureSetup/Loopback

You can fire up the client-server examples in LabVIEW and run those with localhost, as long as the machine has a single NIC installed. Any client-server will be able to do that. That's why I was wondering what's different here.

Edited September 6, 2019 by Mads
ShaunR Posted September 6, 2019

5 hours ago, Cat said: "We've set up the loopback adapter the same way we've set up hardware NICs (at least as much as is applicable). I'll confirm that QoS and Nagle specifically have been dealt with. The C dev tried fastpath (even though it was supposedly deprecated a while back), but that didn't help any. The computer has a very fast CPU. Two of them, actually. I guess the implication is that the hardware NICs are doing something the CPUs can't."

NetDMA was removed [deprecated] in Windows 8. NetDMA would definitely make a difference, but probably not as much as you need (up to about 800 MB/s), and it requires BIOS support.
Phillip Brooks Posted September 18, 2019

On 9/5/2019 at 11:41 PM, Cat said: "We could try running iperf, just to confirm it's the loopback adapter and not the code. But as I said, we've run the same code with two hardware NICs (10G) connected on the same computer and it works fine."

Many higher-end NICs offer a feature called a TCP Offload Engine. These NICs take over much of the TCP/IP processing that would otherwise run on the CPU. That may be the difference here. https://en.wikipedia.org/wiki/TCP_offload_engine
Cat Posted September 18, 2019 (edited)

We "upgraded" to Windows 10 version 1903 (from 1803 or 1809, I don't remember) and iperf3 went from ~450 MB/s to ~2 GB/s. Yay!

Unfortunately, 1) the IA gods have not deemed 1903 worthy, so we're not supposed to even have it installed, 2) there is a question of whether this is a fix or something that will disappear in the next version, and 3) ironically, 1903 is causing issues with various types of hardware NICs (not a problem for us -- so far). But, for the moment, we've got something that works.

Thanks to all for the responses, and thanks to Gribo for suggesting iperf3. It's made it a lot easier to test network throughput.

Cat

Edited September 18, 2019 by Cat