I have a question about high level system design with FPGA-RT-PC. It would be great if I can get some advice about ideal approaches to move data between the 3 components in an efficient manner. There are several steps; DMA FIFO from FPGA to RT, processing the data stream in the RT to derive chunks of useful information, parsing these chunks into complete sets on the RT and sending these sets up to the Host.
In my system, I have the FPGA monitoring a channel of a digitiser and deriving several data streams from events that occur (wave, filtered data, parameters etc). When an event occurs the data streams are sent to the RT through a DMA FIFO in U64 chunks. Importantly, events can be variable length. To overcome this, I reunite the data by inserting unique identifiers and special characters (sets of 0's) into the data streams which I later search for on the RT.
Because the FPGA is so fast, I might fill the DMA FIFO buffer rapidly, so I want to poll the FIFO frequently and deterministically. I use a timed loop on the RT to poll the FIFO and dump the data as U64's straight into a FIFO on the RT. The RT FIFO is much larger than the DMA FIFO, so I don't need to poll it as regularly before it fills.
The RT FIFO is polled and parsed by parallel loop on the RT that empties the RT FIFO and dumps into a variable sized array. The parsing of the array then happens by looking for special characters element wise. A list of special character indices is then passed to a loop which chops out the relevant chunk and, using the UID therein, writes them to a TDMS file.
Another parallel loop then looks at the TDMS group names and when an event has an item relating to each of the data streams (i.e. all the data for the event has been received), a cluster is made for the event and it is sent to the host over a network stream. This UID is then marked as completed.
The aim of the system is to be fast enough that I do not fill any data buffers. This means I need to carefully avoid bottle necks. But I worry that the parsing step, with a dynamically assigned memory operation on a potentially large memory object, an element wise search and delete operation (another dynamic memory operation) may become slow. But I can't think of a better way to arrange my system or handle the data. Does anyone have any ideas?
PS I would really like to send the data streams to the RT in a unified manner straight from the RT, by creating a custom data typed DMA FIFO. But this is not possible for DMA FIFOs, even though it is for target-scoped FIFOs!