FPGA DVR TDMS Log

AlexA · June 21, 2014

Ok,

I've spent a couple of hours on this so far and haven't had much success. I tried two configurations. In the first, I message the DVR reference to my central handler, which keeps track of a timer value. If the timer has reached 40 ms (a display rate of 25 Hz), the handler pulls the DVR from the message, copies the data out of it, sends the copy to the UI for display, and then deletes the DVR. If the timer hasn't reached 40 ms, the DVR is deleted without a copy being made. Alternatively, the user can select to stream to disk, in which case the same copy to the UI is made, but instead of being deleted the DVR is sent on to a File IO engine, which uses the Asynch TDMS VIs to write to disk.
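The display throttling described above can be sketched in text form as follows. This is only a Python analogue of the logic, not LabVIEW code; `UiThrottle` and `count_displayed` are hypothetical names, and the sketch assumes a frame is displayed once 40 ms or more have elapsed since the last displayed frame.

```python
class UiThrottle:
    """Decides whether a frame should be copied to the UI (hypothetical helper)."""

    def __init__(self, min_interval_ms=40):
        self.min_interval_ms = min_interval_ms
        self._last_sent_ms = None  # timestamp of the last frame sent to the UI

    def should_display(self, now_ms):
        # Assumption: display when at least min_interval_ms has elapsed.
        if self._last_sent_ms is None or now_ms - self._last_sent_ms >= self.min_interval_ms:
            self._last_sent_ms = now_ms
            return True
        return False


def count_displayed(frame_times_ms, throttle):
    """Count how many frames in a sequence of arrival times would reach the UI."""
    return sum(1 for t in frame_times_ms if throttle.should_display(t))


if __name__ == "__main__":
    # One second of frames arriving every 10 ms (100 FPS).
    arrivals = range(0, 1000, 10)
    print(count_displayed(arrivals, UiThrottle(40)), "of", len(arrivals), "frames displayed")
```

With frames arriving every 10 ms, every fourth frame passes the check, so the UI sees 25 frames per second while all 100 are still available for streaming to disk.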

With this configuration, I noticed a significant drop in CPU use compared to just messaging the data around, something on the order of 15 to 20%. However, the processing chain as a whole was not able to keep up with images at a rate of 100 FPS: the buffer would slowly fill up, then images would become disjointed as the data started to time out. I did not observe this phenomenon when messaging the actual data around (with similar UI update restrictions).

The second configuration I tried was opening the asynch .tdms file in the same VI that performs the FIFO read. This time the asynch write was performed in the same loop as the FIFO read, and the data for the user was copied out with similar timing restrictions as above, except that the timing came from the timed loop which reads the FIFO.

The second configuration performed even worse: there was no improvement in CPU utilisation and the buffer filled up even faster. I definitely haven't explored all the factors at play here, and I'm most likely doing something stupid, but from my naive investigation so far, it would seem that the DVR release notification mechanism is kind of slow...

I tested the case where the code did nothing except obtain the DVR ref and then delete it again straight away, with no UI updates or anything, and the buffer still slowly filled up at 100 FPS.

Anyone got any insight?

Cheers,

Alex

AlexA · June 21, 2014

I did not observe this phenomenon when messaging the actual data around (with similar UI update restrictions).

Ok, I had to go back and check this because it didn't sound right. It turns out, even messaging the data around results in this buffer phenomenon. I guess I never saw it because I wasn't trying to push the code this hard. Hmmmm. In that case, it would seem that messaging the DVR to an FIO process does give some advantages.

Neil Pate · June 21, 2014

How big are the images you are writing to disk? I would have thought that the bottleneck in most systems would be the speed of physically writing to disk, not the passing around of the data in the software. Modern CPUs/RAM can shovel ridiculous quantities of data around if the software is architectured properly.

ShaunR · June 21, 2014

Ok, I had to go back and check this because it didn't sound right. It turns out, even messaging the data around results in this buffer phenomenon. I guess I never saw it because I wasn't trying to push the code this hard. Hmmmm. In that case, it would seem that messaging the DVR to an FIO process does give some advantages.

I was not aware of this function either (still using 2009 whenever I can).

How big are your images?

This is how I would approach it. It is the way I have always done it with high-speed acquisition, and I have never found a better way, even with all the new-fangled stuff. The hardware gets faster, but the software gets slower.

Once you have grabbed the data, immediately delete the DVR. The output of the Delete DVR primitive will give you the data, and the sub-process will be able to go on to acquire the next frame without waiting. Copy/wire the data from the Delete DVR into a Global Variable (ooooh, shock horror), which is your application buffer that your file and UI processes can just read when they need to. This is the old-fashioned "Global Variable Data Pool"; it is the most efficient method (in LabVIEW) of sharing data between multiple processes and is perfectly safe from race conditions AS LONG AS THERE IS ONLY ONE WRITER. You may need a small message (Acquired; I would suggest the error cluster as the contents) just to tell anyone that wants to know that new data has arrived. That is mainly for your file process; your UI can just poll the Global every N ms.

The process here is that you only have one, deterministic, data copy that affects the acquisition (time to use those Preferred Execution Systems ;) ) and you have THE most efficient method of sharing the data (bar none). But, and this is a BIG but, your TDMS writing has to be faster than your acquisition, otherwise you will lose frames in the file. You will never run out of memory or get performance degradation because of buffers filling up, though, and you can mitigate data loss a bit by buffering the data in a queue (on the TDMS write side, not the acquisition side) if you know the consumer will eventually catch up or you want to save bigger chunks than are being acquired. However, if the real issue is that your producer is faster than your consumer, that is always a losing hand. If it's a choice between memory meltdown and losing frames, the latter wins every time unless you are prepared to throw hardware at it...

I've used the above technique to stream data using TDMS at over 400MB/sec on a PXI rack without losses (I didn't get to use the latest PXI chassis at the time, which could theoretically do more than 700MB/sec). The main software bottleneck was event message flooding (next was memory throughput, but you have no control over that), and the only way to mitigate it is to increase the amount you acquire in one go (reducing the message rate), which looks much, much easier with this function.
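For readers who think more easily in text code, here is a rough Python analogue of the single-writer "data pool" idea described above. It is a sketch, not LabVIEW: `DataPool`, `write`, `read` and `wait_for_new` are hypothetical names, the sequence counter stands in for the small "Acquired" message, and the lock is something the Python version needs rather than a feature of a LabVIEW global.

```python
import threading


class DataPool:
    """Latest-value buffer: one writer overwrites, any number of readers copy out."""

    def __init__(self):
        self._cond = threading.Condition()
        self._frame = b""
        self._seq = 0  # number of frames written so far

    def write(self, frame):
        # Single writer: overwrite the previous frame and wake any waiting readers.
        with self._cond:
            self._frame = frame
            self._seq += 1
            self._cond.notify_all()

    def read(self):
        # Polling reader (e.g. a UI): take whatever is newest right now.
        with self._cond:
            return self._seq, self._frame

    def wait_for_new(self, last_seq, timeout=None):
        # Notified reader (e.g. a file writer): block until a frame newer than last_seq exists.
        with self._cond:
            self._cond.wait_for(lambda: self._seq > last_seq, timeout)
            return self._seq, self._frame


if __name__ == "__main__":
    pool = DataPool()
    for i in range(5):
        pool.write(bytes([i]))  # writer runs ahead of the reader
    seq, frame = pool.read()
    print("reader sees frame", seq, "and missed", seq - 1, "earlier frames")
```

The pool never grows, so a slow reader cannot exhaust memory; the cost is that a reader which falls behind skips frames, which it can detect from the gap in sequence numbers.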

AlexA · June 22, 2014

How big are the images you are writing to disk? I would have thought that the bottleneck in most systems would be the speed of physically writing to disk, not the passing around of the data in the software. Modern CPUs/RAM can shovel ridiculous quantities of data around if the software is architectured properly.

Each image is 1MB; running at a rate of 100 FPS kind of chokes things.

The frame grabber is really a PXI FPGA connected to the PC over a 250MB/s MXI->PCIe x1 connection.

I'm now starting to get curious about how messaging works. Does it wrap the data in some form of pointer? Or is there an explicit copy when you load a message with data?

@ShaunR

I've benchmarked the TDMS writes before. I'm using an SSD, so they average about 4ms with periodic spikes to 10-12ms. As per the thread title, the images are actually written as a U8 array straight into TDMS.
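Putting the numbers quoted in this post together gives a rough time and bandwidth budget. The short Python calculation below is just back-of-the-envelope arithmetic; it assumes decimal megabytes and uses 12 ms as the worst-case write spike.

```python
FRAME_BYTES = 1_000_000         # "Each image is 1MB" (decimal MB assumed)
FRAME_RATE_HZ = 100             # 100 FPS
LINK_BYTES_PER_S = 250_000_000  # 250MB/s MXI->PCIe x1 link
WRITE_AVG_MS = 4.0              # average TDMS write time
WRITE_SPIKE_MS = 12.0           # upper end of the reported 10-12ms spikes

data_rate = FRAME_BYTES * FRAME_RATE_HZ          # bytes per second produced
link_utilisation = data_rate / LINK_BYTES_PER_S  # fraction of the link in use
frame_period_ms = 1000 / FRAME_RATE_HZ           # time available per frame
avg_write_share = WRITE_AVG_MS / frame_period_ms # share of each period spent writing
spike_overrun_ms = max(0.0, WRITE_SPIKE_MS - frame_period_ms)

if __name__ == "__main__":
    print(f"data rate:        {data_rate / 1e6:.0f} MB/s")
    print(f"link utilisation: {link_utilisation:.0%}")
    print(f"frame period:     {frame_period_ms:.0f} ms")
    print(f"average write:    {avg_write_share:.0%} of the frame period")
    print(f"worst spike overruns the period by {spike_overrun_ms:.0f} ms")
```

On these figures the link and the average write both sit at about 40% of budget, but a 12 ms spike is longer than the 10 ms frame period, so a write done in the acquisition loop will occasionally delay the next read unless something buffers in between.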

AlexA · June 22, 2014

Hi Shaun,

I'm kind of confused by the architecture you're suggesting. Say the data I'm working with is an array of U8 values (pixels). Would you suggest making the global an array of the same size? Or a larger array wrapped in an FGV?

ShaunR · June 23, 2014

Hi Shaun,

Would you suggest making the global an array of the same size?

Yes.

JamesMc86 · June 23, 2014

The process here is that you only have one, deterministic, data copy that affects the acquisition

This method may work well for you, but just note that a global variable is not deterministic. From the LabVIEW help:

Use global variables to access and pass small amounts of data between VIs, such as from a time-critical VI to a lower priority VI. Global variables can share data smaller than 32-bits, such as scalar data, between VIs deterministically. However, global variables of larger data types are shared resources that you must use carefully in a time-critical VI. If you use a global variable of a data type larger than 32-bits to pass data out of a time-critical VI, you must ensure that a lower priority VI reads the data before the time-critical VI attempts to write to the global again.

AlexA · June 23, 2014

Yeah, I benchmarked the global approach. I had one writer and two readers. I tried it via deletion of DVR, and with the normal read FIFO method associated with FPGAs.

In both cases, the performance (judging by CPU utilisation) averaged about 10 % worse. It would appear from my very crude tests that DVR is faster.

ShaunR · June 23, 2014

In both cases, the performance (judging by CPU utilisation) averaged about 10 % worse. It would appear from my very crude tests that DVR is faster.

A surprising result, although I am suspicious of you equating CPU utilisation with throughput performance.

AlexA · June 24, 2014

Yeah, I'm not really advocating it. It's just the readiest thing I had on hand to do some crude profiling.

Michael Aivaliotis · June 27, 2014

The DVR is useful for eliminating copies of data. It represents a pointer to the data. So you are not passing the data around. You're passing a pointer to the data. It sounds like you are creating the DVR with every new image and then deleting it after read. If this is the case then there's probably some overhead associated with that process. It might be better to just update the data in the DVR with new data using the IPE structure. However, then you need to figure out how to notify the recipient that the data is new of course.

I don't understand your application completely, so I'm not sure where the bottleneck is. From my experience, queues are pretty fast for messaging. Have you tried that? But perhaps you can't use queues in your setup. I'm also concerned by the 40ms logic on the recipient. How does the 25Hz relate to the 100FPS? Can you clarify that?
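The suggestion of updating the data in place instead of creating and deleting a reference per image can be illustrated with a small Python analogue. This is a sketch only: `SharedFrame`, `update` and `take` are hypothetical names, the lock plays the role of the in-place element structure, and the event is one possible answer to the "how do you notify the recipient" question.

```python
import threading


class SharedFrame:
    """One preallocated buffer, overwritten in place for each new image."""

    def __init__(self, size):
        self._buffer = bytearray(size)   # allocated once, reused for every frame
        self._lock = threading.Lock()
        self._fresh = threading.Event()  # set when unread data is available

    def update(self, data):
        if len(data) != len(self._buffer):
            raise ValueError("frame size must match the preallocated buffer")
        with self._lock:
            self._buffer[:] = data       # in-place overwrite, no new allocation
            self._fresh.set()

    def take(self, timeout=None):
        # Returns a copy of the newest frame, or None if nothing new arrived in time.
        if not self._fresh.wait(timeout):
            return None
        with self._lock:
            self._fresh.clear()
            return bytes(self._buffer)


if __name__ == "__main__":
    shared = SharedFrame(4)
    shared.update(b"abcd")
    print(shared.take(timeout=0.1))
    print(shared.take(timeout=0.01))
```

As with any single shared buffer, a recipient that is slower than the producer only ever sees the most recent image.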
