
Speed up loopback adapter



Posted (edited)

Hi all!  Long time, no talk to.  🙂

I'm supposedly retired, but then decided to go over to the Dark Side and become a contractor.  We'll see how long that lasts...

Current issue:

A C developer and I are sending data via an MS loopback adapter between a C app and a LV app on the same machine (Windows 10).  Past iterations have worked great (after that little bug in LV11 was fixed).  The new system, however, needs a much higher continuous data throughput -- somewhere in the neighborhood of 900 MB/s.  The loopback adapter is topping out somewhere around 500 MB/s.

If we switch from internal TCP to connecting two 10G NICs in the same box, the data gets passed with no problem.  So it's not the code, and we know how to configure hardware NICs for large data throughput.
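For anyone comparing the figures in this thread, which mix MB/s with NIC ratings in Gbit/s, here is a small conversion sketch. It assumes decimal units (1 MB = 10^6 bytes, 1 Gbit = 10^9 bits) and ignores protocol overhead; the function names are made up for illustration.

```python
# Unit conversions for the rates quoted in this thread.
# Assumes decimal units: 1 MB = 10**6 bytes, 1 Gbit = 10**9 bits.

def mbytes_to_gbits(mb_per_s: float) -> float:
    """Convert MB/s to Gbit/s."""
    return mb_per_s * 8 / 1000

def gbits_to_mbytes(gbit_per_s: float) -> float:
    """Convert Gbit/s to MB/s."""
    return gbit_per_s * 1000 / 8

required = mbytes_to_gbits(900)   # target rate from the post
loopback = mbytes_to_gbits(500)   # observed loopback ceiling
nic_raw = gbits_to_mbytes(10)     # raw line rate of one 10G NIC

print(f"900 MB/s needed   = {required:.1f} Gbit/s")
print(f"500 MB/s loopback = {loopback:.1f} Gbit/s")
print(f"10G NIC line rate = {nic_raw:.0f} MB/s (before protocol overhead)")
```

So the 900 MB/s target is about 7.2 Gbit/s, which fits inside a single 10G link's raw 1250 MB/s.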

We've tried everything the web seems to offer, including a lot of things that have been supposedly deprecated.  We've tried using localhost, 127.0.0.1, a 192.168 address, and various other IP address options. If you google "how to speed up a loopback adapter" or any variant of that, we've probably tried it.

Does anyone have experience with tweaking a loopback adapter and have any suggestions of things to try?  If it matters, the computer is a screaming fast dual Xeon with 128 GB of memory.

Cat

 

Edited by Cat
Posted (edited)
14 hours ago, Cat said:

We've tried everything the web seems to offer, including a lot of things that have been supposedly deprecated.

Well. If you've turned off all the M$ malarky (like QOS and Nagle) and tried the fast path then you are CPU bound. There isn't really a lot to tweak.
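For reference, a minimal sketch of what "Nagle off plus fast path" looks like at the socket level. It is written in Python rather than the C used in the actual app, and the helper name `make_tuned_socket` is made up. `TCP_NODELAY` is the standard option for disabling Nagle. `SIO_LOOPBACK_FAST_PATH` is only exposed by Python on Windows, and as far as I know it has to be requested before `connect()`/`listen()` and on both ends of the connection, so treat that part as an assumption to verify on your Windows build.

```python
import socket

def make_tuned_socket() -> socket.socket:
    """Create a TCP socket with Nagle off and, where available, loopback fast path requested."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Disable Nagle's algorithm so small writes are not held back for coalescing.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Windows-only: request the TCP loopback fast path. The attribute does not
    # exist on other platforms, so guard for it.
    if hasattr(socket, "SIO_LOOPBACK_FAST_PATH"):
        try:
            sock.ioctl(socket.SIO_LOOPBACK_FAST_PATH, True)
        except OSError:
            pass  # not supported on this Windows version; carry on without it
    return sock

sock = make_tuned_socket()
print("TCP_NODELAY:", sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
sock.close()
```

If both options are already in place and the rate still plateaus, that supports the CPU-bound reading above.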

Edited by ShaunR
Posted

What is the role of the loopback adapter in this case? Do you need it to monitor the traffic through Wireshark for example? Or is the machine without a single physical network adapter so you have the loopback installed just to get access to networking? Or is it to handle a routing issue?

Otherwise the link could be fully local, with all the shortcuts that allows the network driver to take.

 

Posted (edited)

Iperf3 gets ~95% of the theoretical speed of a 1G Ethernet easily and reliably on Windows, without being CPU bound. It requires multiple threads to get there; a single thread will be limited by the scheduler. It doesn't matter much whether it is TCP or UDP.
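If iperf3 isn't at hand, a rough stand-in for the single-stream versus multi-stream comparison can be sketched in Python. This is not iperf3 and will not match its numbers. The 1 MiB chunk size, the stream counts, and the function names are arbitrary choices for illustration, and the timing includes connection setup.

```python
import socket
import threading
import time

CHUNK = 1 << 20  # 1 MiB per send/recv call (arbitrary choice)

def _drain(conn: socket.socket, counts: list, idx: int) -> None:
    """Receive until the sender closes, recording the byte count."""
    buf = bytearray(CHUNK)
    total = 0
    with conn:
        while True:
            n = conn.recv_into(buf)
            if n == 0:
                break
            total += n
    counts[idx] = total

def _send(port: int, nbytes: int) -> None:
    """Connect to the local server and push nbytes of zeros."""
    payload = memoryview(bytes(CHUNK))
    with socket.create_connection(("127.0.0.1", port)) as s:
        sent = 0
        while sent < nbytes:
            n = min(CHUNK, nbytes - sent)
            s.sendall(payload[:n])
            sent += n

def measure(streams: int = 4, nbytes_per_stream: int = 32 * CHUNK):
    """Return (total bytes received, MB/s) across parallel loopback connections."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        srv.listen(streams)
        port = srv.getsockname()[1]
        counts = [0] * streams
        senders = [threading.Thread(target=_send, args=(port, nbytes_per_stream))
                   for _ in range(streams)]
        start = time.perf_counter()
        for t in senders:
            t.start()
        receivers = []
        for i in range(streams):
            conn, _ = srv.accept()
            t = threading.Thread(target=_drain, args=(conn, counts, i))
            t.start()
            receivers.append(t)
        for t in senders + receivers:
            t.join()
        elapsed = time.perf_counter() - start
    total = sum(counts)
    return total, total / elapsed / 1e6

if __name__ == "__main__":
    for n in (1, 4):
        total, rate = measure(streams=n)
        print(f"{n} stream(s): {total / 1e6:.0f} MB at {rate:.0f} MB/s")
```

Comparing the 1-stream and 4-stream figures on the same box gives a crude idea of whether one connection is the limit or the loopback path as a whole.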

Edited by Gribo
Posted
22 hours ago, ShaunR said:

Well. If you've turned off all the M$ malarky (like QOS and Nagle) and tried the fast path then you are CPU bound. There isn't really a lot to tweak.

We've set the loopback adapter the same way we've set up hardware NICs (at least as much as is applicable).  I'll confirm that QOS and Nagle specifically have been dealt with.  The C dev tried fastpath (even tho it was supposedly deprecated a while back) but that didn't help any.

The computer has a very fast CPU.  Two of them, actually.  I guess the implication is that the hardware NICs are doing something the CPUs can't.

Posted
19 hours ago, Mads said:

What is the role of the loopback adapter in this case? Do you need it to monitor the traffic through Wireshark for example? Or is the machine without a single physical network adapter so you have the loopback installed just to get access to networking? Or is it to handle a routing issue?

Otherwise the link could be fully local, with all the shortcuts that allows the network driver to take.

 

The point of the loopback adapter is to use TCP to pass data between two different executables (one C and one LV) on the same machine.  The machine has two 1G and four 10G physical network adapters.  Can you explain more what you mean by "link could be fully local"?

Posted
16 hours ago, Gribo said:

Iperf3 gets ~95% of the theoretical speed of a 1G Ethernet easily and reliably on Windows, without being CPU bound. It requires multiple threads to get there; a single thread will be limited by the scheduler. It doesn't matter much whether it is TCP or UDP.

We could try running iperf, just to confirm it's the loopback adapter and not the code.  But as I said, we've run the same code with 2 hardware NICs (10G) connected on the same computer and it works fine.

Posted (edited)
5 hours ago, Cat said:

The point of the loopback adapter is to use TCP to pass data between two different executables (one C and one LV) on the same machine.  The machine has two 1G and four 10G physical network adapters.  Can you explain more what you mean by "link could be fully local"?

Passing data between executables on the same machine normally does not require a loopback adapter (unless one of the requirements I listed applies), because the TCP stack is already loaded for the network interface the machine has anyway. If this were a serial link, then sure - you would need a physical or virtual null modem installed.

The local TCP traffic never passes through any adapter anyway.

 

As described in the first sentence here (where the need for loopback is in place because they want to capture the truly local traffic):
https://wiki.wireshark.org/CaptureSetup/Loopback

You can fire up the client-server examples in LabVIEW and run those with localhost, as long as the machine has at least one NIC installed. Any client-server pair will be able to do that. That's why I was wondering what's different here.

 

Edited by Mads
Posted
5 hours ago, Cat said:

We've set the loopback adapter the same way we've set up hardware NICs (at least as much as is applicable).  I'll confirm that QOS and Nagle specifically have been dealt with.  The C dev tried fastpath (even tho it was supposedly deprecated a while back) but that didn't help any.

The computer has a very fast CPU.  Two of them, actually.  I guess the implication is that the hardware NICs are doing something the CPUs can't.

NetDMA was removed [deprecated] in Windows 8. NetDMA would definitely make a difference, but probably not as much as you need (up to about 800 MB/s), and it requires BIOS support.

 

  • 2 weeks later...
Posted
On 9/5/2019 at 11:41 PM, Cat said:

We could try running iperf, just to confirm it's the loopback adapter and not the code.  But as I said, we've run the same code with 2 hardware NICs (10G) connected on the same computer and it works fine.

Many higher-end NICs offer a feature called a TCP Offload Engine. These NICs perform much of the TCP processing that would otherwise take place on the CPU.

That may be the difference here.

 

https://en.wikipedia.org/wiki/TCP_offload_engine

Posted (edited)

We "upgraded" to  Windows 10 version 1903 (from 1803 or 1809, I don't remember) and iperf3 went from ~450 MB/s to ~2 GB/s.  Yay! 

Unfortunately, 1) the IA gods have not deemed 1903 worthy, so we're not supposed to even have it installed, 2) there is a question of whether this is a fix or something that will disappear in the next version, and 3) ironically, 1903 is causing issues with various types of hardware NICs (not a problem for us -- so far).

But, for the moment, we've got something that works. 

Thanks to all for responses, and thanks to Gribo for suggesting iperf3.  It's made it a lot easier to test network throughput.

Cat

Edited by Cat
