FPGA FIFO performance: width vs. depth

codcoder · January 18, 2023

Hi,

So I have a question about the inner workings of the host-to-target FIFO for a setup with a Windows PC and a PXIe-7820R (in case the specific hardware matters). But it isn't really so much a question as me trying to understand something.

My setup: I transfer data from the host to the target (the 7820 FPGA module). First I simply configured my FIFO to use U8 as the data type and read one element at a time on the FPGA target. It worked, but when I increased the amount of data I ran into a performance issue.

To increase the throughput, I both widened the FIFO, packing four bytes into one U32, and reconfigured the target to read two elements at a time.

This worked, so there really isn't any issue here that needs to be resolved.
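The packing step described above can be sketched in Python: the host packs a U8 stream into U32 words, and the target splits each word back into four bytes with shifts and masks. The big-endian byte order (first byte in the most significant position) and the helper names `pack_u8_to_u32` / `unpack_u32_to_u8` are assumptions for illustration only; whatever order you choose just has to match on both ends.

```python
import struct


def pack_u8_to_u32(data: bytes) -> list[int]:
    """Pack a byte stream into U32 words, first byte in the most significant position.

    Assumes len(data) is a multiple of 4; pad the stream beforehand otherwise.
    """
    if len(data) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    count = len(data) // 4
    # ">" = big-endian (assumed order), "I" = unsigned 32-bit
    return list(struct.unpack(f">{count}I", data))


def unpack_u32_to_u8(words: list[int]) -> bytes:
    """Split each U32 word back into four U8 values (mirror of pack_u8_to_u32)."""
    out = bytearray()
    for w in words:
        # Shift and mask: the same bit slicing the FPGA would do in logic
        out += bytes([(w >> 24) & 0xFF, (w >> 16) & 0xFF, (w >> 8) & 0xFF, w & 0xFF])
    return bytes(out)


words = pack_u8_to_u32(bytes([1, 2, 3, 4, 5, 6, 7, 8]))
print([hex(w) for w in words])  # ['0x1020304', '0x5060708']
```

On the real target the unpacking is just bit slicing, so it costs wiring rather than clock cycles; the host side pays the (small) CPU cost of packing.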

But afterwards a thought occurred: would I have achieved the same result if I had kept the width at U8 but read eight elements at a time on the FPGA? Since 4*2 and 1*8 are both 8 bytes per read, would I have gotten the same throughput? Or is it better to read fewer but wider integers (and then split them into U8s)?
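The arithmetic behind the question can be written out as a small Python sketch: assuming the FPGA loop performs one FIFO read per iteration, the byte rate is elements per read × bytes per element × loop rate. The 40 MHz loop rate and the function name `fifo_read_rate` are hypothetical, chosen only for illustration, not properties of the 7820R.

```python
def fifo_read_rate(elements_per_read: int, bytes_per_element: int, loop_rate_hz: float) -> float:
    """Bytes per second consumed, assuming one FIFO read per loop iteration."""
    return elements_per_read * bytes_per_element * loop_rate_hz


LOOP_RATE_HZ = 40e6  # hypothetical loop rate, for illustration only

u32_x2 = fifo_read_rate(2, 4, LOOP_RATE_HZ)  # two U32 per read
u8_x8 = fifo_read_rate(8, 1, LOOP_RATE_HZ)   # eight U8 per read
u8_x1 = fifo_read_rate(1, 1, LOOP_RATE_HZ)   # original setup: one U8 per read

print(u32_x2 == u8_x8)  # True: both move 8 bytes per iteration
print(u32_x2 / u8_x1)   # 8.0
```

This only models the FPGA-side read rate; whether the DMA link and the host keep up is a separate question.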

I've read NI's white paper but it doesn't cover this specific subject.

Thanks for any thoughts given on the topic! 😊

jacobson · January 18, 2023

On the FPGA side, reading 2 U32s or 8 U8s shouldn't make a difference in terms of throughput. Some old internal info I found basically said that if they don't have the same throughput, it's a bug.
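A small Python model of why the two read patterns should be equivalent: the same 8 bytes, read either as two U32 values or as eight U8 values, carry identical data. The example byte values and the big-endian slicing order are assumptions for illustration.

```python
import struct

# One 8-byte chunk of FIFO data (arbitrary example values)
chunk = bytes([0x10, 0x20, 0x30, 0x40, 0x50, 0x60, 0x70, 0x80])

# View 1: two U32 elements per read (big-endian, assumed byte order)
two_u32 = struct.unpack(">2I", chunk)

# View 2: eight U8 elements per read
eight_u8 = struct.unpack(">8B", chunk)

# Splitting the U32 values back into bytes recovers exactly the U8 view
from_u32 = tuple(b for word in two_u32 for b in word.to_bytes(4, "big"))
print(from_u32 == eight_u8)  # True
```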

I also don't think the DMA throughput should be affected. If I remember correctly, the DMA engine will try to send multiple data items in one transfer to minimize the overhead of PCIe packet headers.
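A rough Python sketch of why batching helps: each PCIe packet carries a fixed overhead, so the fraction of link bandwidth spent on payload grows with the number of bytes per packet, regardless of how those bytes are split into elements. The 24-byte per-packet overhead and the function name `payload_efficiency` are assumptions for illustration, not measured or specified values for this hardware.

```python
def payload_efficiency(items_per_packet: int, bytes_per_item: int, overhead_bytes: int = 24) -> float:
    """Fraction of transferred bytes that are payload.

    overhead_bytes is an assumed per-packet cost (header, framing, CRC), not a spec value.
    """
    payload = items_per_packet * bytes_per_item
    return payload / (payload + overhead_bytes)


print(round(payload_efficiency(1, 1), 3))    # one U8 per packet: 1/25
print(round(payload_efficiency(128, 1), 3))  # 128 U8 batched into one packet
print(payload_efficiency(32, 4) == payload_efficiency(128, 1))  # True: same bytes per packet
```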

Rolf Kalbermatter · January 20, 2023

Basically the same as Jacobson said. The DMA FIFOs are internally 64-bit aligned. If you push data through them that doesn't fit evenly into 64 bits (8 * 8 bit, 4 * 16 bit, 2 * 32 bit or 1 * 64 bit), the FPGA will actually force alignment by stuffing extra filler bytes into the DMA channel. In that case you lose some throughput, because extra data is transferred that is simply discarded on the other side. That loss is, however, typically very low. The worst case would be pushing 5-byte data elements (clusters of 5 bytes, for instance) through the channel: each one is padded out to 8 bytes, so you would waste 3/8 of the DMA bandwidth.
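The padding cost can be modelled with a short Python sketch. The rounding rule here is an assumption, not documented NI behaviour: elements of 1 to 8 bytes are taken to occupy the next power-of-two slot, so 1, 2, 4 and 8 byte types pack densely into a 64-bit word and anything else is padded. The helper names `slot_bytes` and `wasted_fraction` are hypothetical.

```python
from fractions import Fraction


def slot_bytes(element_bytes: int) -> int:
    """Bytes an element occupies in a 64-bit DMA word (assumed rounding rule).

    Assumption: elements of 1..8 bytes are rounded up to the next power of two.
    """
    if not 1 <= element_bytes <= 8:
        raise ValueError("this sketch only models elements of 1 to 8 bytes")
    slot = 1
    while slot < element_bytes:
        slot *= 2
    return slot


def wasted_fraction(element_bytes: int) -> Fraction:
    """Fraction of DMA bandwidth spent on filler bytes."""
    slot = slot_bytes(element_bytes)
    return Fraction(slot - element_bytes, slot)


for n in range(1, 9):
    print(n, wasted_fraction(n))
```

Under this assumed rule the 5-byte element is the worst case among 1 to 8 byte elements, which agrees with the 3/8 figure in the reply above.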

The performance on the FPGA side should not change at all purely because of different data sizes. What could change somewhat is the FPGA resource usage, since the binary data has to be stuffed, shifted, packed/unpacked and otherwise manipulated to push it into or pull it from the DMA interface logic.

The performance on the real-time side could change, however, as more complex packing/unpacking incurs some extra CPU consumption.

Edited January 20, 2023 by Rolf Kalbermatter


