Jump to content

FPGA FIFO performance: width vs. depth


Recommended Posts


So I have question about the inner working of the host to target FIFO for a setup with a Windows PC and a PXIe-7820R (if the specific hardware is important). But it really isn't so much of question as me trying to understand something. 

My setup: I transfer data from host to the target (the FPGA module 7820). First I simply configured my FIFO to use U8 as datatype and read one element at a time on the FPGA target. It worked but when I increased the amount of data I ran into a performance issue.

In order to increase the throughput I both increased the width of the FIFO, packing four bytes into one U32, and also reconfigured the target to read two elements at a time.

This worked, so there really isn't any issue here that needs to be resolved. 

But afterwards I thought occurd: would I have achieved the same result if I kept the width U8 but read eight elements at a time on the FPGA? Since 4*2 and 1*8 both are 8, would I have achieved the same throughput? Or is it better to read fewer but longer integers (and then splitting them up into U8's)?

I've read NI's white paper but it doesn't cover this specific subject.

Thanks for any thoughts given on the topic! 😊


Link to comment

On the FPGA side, reading 2 U32s or 8 U8s shouldn't make a difference from a throughput sense. Some old info I found internally basically said that if they don't have the same throughput it's a bug.

I also don't think the DMA throughput should be effected. If I remember correctly, the DMA engine will try to send multiple data items up at the same time to minimize the overhead of PCIe packet headers.

  • Like 1
Link to comment

Basically the same as Jacobson said. The DMA FIFOs internally are 64-bit aligned. If you try to push data through it that doesn't fit into the 64 bits (8 * 8 bit, 4 * 16 bit, 2 * 32 bit or 1 * 64 bit) then the FPGA will actually force alignment by stuffing extra filler bytes into the DMA channel. In that case you would loose some of the throughput as there is extra data transferred that simply is discarded on the other side. That loss is however typically very low. The worst case would be if you try to push 5 byte data elements (clusters of 5 bytes for instance) through the channel. Then you would waste 3/8 of the DMA bandwidth.

The performance on the FPGA side should not change at all purely from different data sizes. What could somewhat change is the usage of FPGA resources as binary bit data is stuffed, shifted, packed/unpacked and otherwise manipulated to push into or pull from the DMA interface logic.

The performance on the realtime side could change however as more complex packing/unpacking will incur some extra CPU consumption.

Edited by Rolf Kalbermatter
  • Like 1
Link to comment

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

  • Create New...

Important Information

By using this site, you agree to our Terms of Use.