Jump to content

Fast Fourier Transform


Dirk J.

Recommended Posts

I need really fast Fourier transforms (*). Simple 'millisecond timer value' benchmarking shows an approximate calculation time of 200 us (microsecond) for an 2048 double array on my system, which I would like to be about 10x faster. Has anyone here experience with using external libraries in Labview, such as FFTW or Intel Math Kernel / Intel Performance Libraries? How do they compare to LV's FFT routines?

The benchmarking was done on a 1.7 GHz Pentium M, 1GB Dell Inspiron 510m laptop; LV8.2, winXPsp2.

(*) For those of you interested in the application: its Optical Coherence Tomography, an optical medical imaging technique analogous to ultrasound, you know, with the babies. But we image smaller structures, typically . My raw image data is typically an 2D array (2048 x 600) in which I need to take the 600 FFTs of the columns to get my final image; 2048 being the # of pixels of a line scan camera. The hardware is capable of doing ~20 images/second; the software isn't.

Link to comment
I need really fast Fourier transforms (*). Simple 'millisecond timer value' benchmarking shows an approximate calculation time of 200 us (microsecond) for an 2048 double array on my system.

My Dell D820 laptop with intel Core 2 Duo T7200 & 4GB memory does the same thing in average about 30 us if I run the real FFT in four parallel loops under LV 8.20. So if the FFT is the only task running, my laptop would analyze 55 of your images per second. As T7200 is not even the fastest processor available, you should achieve your goal by simply buying a new better computer. If you need a laptop T7600 is the best available processor for the task. If you can use workstation intel core QX6700 is the best for the job. With best I mean best performing.

Tomi

Link to comment

Thanx Tomi, and I thought the 'buy new hardware' road wouldn't get me there :-)

What do you mean by 4 par. loops, splitting the task (in the example case) up in 4 loops running 150 FFT's each, in one VI?

Laptop is not strictly nessecary, but since we would like to take our system into the clinic, it should be as compact as possible.

55 of your images per second.

wow. that approaches screen refresh rates.

Link to comment
I need really fast Fourier transforms (*). Simple 'millisecond timer value' benchmarking shows an approximate calculation time of 200 us (microsecond) for an 2048 double array on my system, which I would like to be about 10x faster. Has anyone here experience with using external libraries in Labview, such as FFTW or Intel Math Kernel / Intel Performance Libraries? How do they compare to LV's FFT routines?

The benchmarking was done on a 1.7 GHz Pentium M, 1GB Dell Inspiron 510m laptop; LV8.2, winXPsp2.

(*) For those of you interested in the application: its Optical Coherence Tomography, an optical medical imaging technique analogous to ultrasound, you know, with the babies. But we image smaller structures, typically . My raw image data is typically an 2D array (2048 x 600) in which I need to take the 600 FFTs of the columns to get my final image; 2048 being the # of pixels of a line scan camera. The hardware is capable of doing ~20 images/second; the software isn't.

Just a few comments:

1 Which version of LV are you using? I think Version 7.1 or so there was a major re-write of the math functions.

2 Make sure you don't have any coercion dots (data type conversion) into and out of your FFT routines. This will slow things down.

3 For best speed even with newer hardware, desktop version of the processor is faster than laptop. I have a Dell Inspiron 9400 laptop, and a Dell Dimension 9200 desktop; the desktop is much faster than the laptop.

4 A built application (exe) will give you slightly better speed than the code.

Neville.

Link to comment

1 Which version of LV are you using? I think Version 7.1 or so there was a major re-write of the math functions.

--- 8.20 (faster than before, I agree)

2 Make sure you don't have any coercion dots (data type conversion) into and out of your FFT routines. This will slow things down.

--- Has been taken care of.

3 For best speed even with newer hardware, desktop version of the processor is faster than laptop. I have a Dell Inspiron 9400 laptop, and a Dell Dimension 9200 desktop; the desktop is much faster than the laptop.

--- Laptop is not a strict requirement.

4 A built application (exe) will give you slightly better speed than the code.

--- Can you quantify that (approximate percentage or so)? AppBuilder has been ruled 'too expensive' up until now...

-thanks

Link to comment
4 A built application (exe) will give you slightly better speed than the code.

--- Can you quantify that (approximate percentage or so)? AppBuilder has been ruled 'too expensive' up until now...

-thanks

It depends on the size/complexity of your application, but when building the exe, the diagrams are removed, "debug" mode on the VI's is disabled, and only the front panels of top-level VI's are saved. This results in a smaller memory footprint and a slightly better speed... I would hazard a guess of about 5% improvement.

By the way, the app builder is part of the Pro versions of LV, which also throw in a lot of (mostly useless) toolkits. It might make sense to upgrade to a Pro version instead of spending an additional $1000 or so for the app builder.

Try disabling debug mode on your VI's, saving all them and running to see what sort of speed increase you obtain.

Make sure you are not using any property nodes in the fft loop part of the code. Property nodes cause a thread-switch to the UI thread, and slows things down.

Also avoid any graphical updates until all the calculations are done (the "defer panel updates" property is pretty useful here).

You could also play around with multi-threading, priorities, re-entrancy, and code parallelism to speed things up further, but that is a topic for another day!

PS. Posting code is a good way to generate a higher quality discussion.

Neville.

Link to comment

Try disabling debug mode on your VI's, saving all them and running to see what sort of speed increase you obtain.

-- did that.

Make sure you are not using any property nodes

-- none there.

You could also play around with multi-threading, priorities, re-entrancy, and code parallelism to speed things up further, but that is a topic for another day!

-- have been doing that, but not to full extent I guess.

PS. Posting code is a good way to generate a higher quality discussion.

-- I know, but so far testing has been pretty simple, see image of bd. At this point, I don't have any experience with using external FFT code, so nothing to share yet...

post-3523-1170810096.jpg?width=400

Link to comment

Hello,

I've read some articles on the net where the FFTs are calculated on a ATI Graphical Processor(s). The performance seems to be incredible. I'm running on NVIDIA so I didn't test it...

Here is a link that gives an idea what is possible : HW Image Processing

I guess you have to develop C-code to use this in LabVIEW and I have no idea if it is faster than the NI FFTs algorithms (which are very efficient).

Best Regards,

Donald

Link to comment
Has anyone here experience with using external libraries in Labview, such as FFTW or Intel Math Kernel / Intel Performance Libraries? How do they compare to LV's FFT routines?

Would you like to download evaluation version of Math Kernel and test the performance. I'd be interested in the results, especially if you have a Core 2 duo processor available somewhere. It should be failry easy to integrate the library with LabvIEW. We can help you with the integarion issues.

Tomi

Link to comment
-- I know, but so far testing has been pretty simple, see image of bd.

I believe you are not just timing the FFT, but also the build array that is implicit with indexing the output of the for loop.

If you want to just find out how fast the FFT is, turn off auto-indexing on the output. I expect you'll see a little bit of difference.

I'm assuming you do need the 2-D output in your application. In that case, try preallocating an array and use a shift register and replace subset instead of the autoindexed output.

Gary

Link to comment

I took a look at the built-in FFT method shipping with LabVIEW. It opens a VI reference to itself and never uses the reference for anything. Nor does it close the reference properly. Any idea why FFT opens the reference to itself? Is the reference used somehow directly from C code to increase performance?

Tomi

Link to comment
Any idea why FFT opens the reference to itself? Is the reference used somehow directly from C code to increase performance?

I assume it works in a "first call" kind of way, to initialize the 'tbl' input of the FFT library node. There are 2 mysterious inputs to the call: this 'tbl' and an 'org' parameter, both integers.

I believe you are not just timing the FFT, but also the build array that is implicit with indexing the output of the for loop [...] try preallocating an array and use a shift register and replace subset instead of the autoindexed output.

Preallocating an array increases the speed of the above example with about 20% !!

Link to comment
Would you like to download evaluation version of Math Kernel and test the performance. I'd be interested in the results, especially if you have a Core 2 duo processor available somewhere. It should be failry easy to integrate the library with LabvIEW. We can help you with the integarion issues.

Tomi

According to rolfk, LabVIEW does use Math Kernel functionality.

Link to comment
  • 1 month later...

QUOTE(Dirk J. @ Feb 6 2007, 05:52 PM)

I need really fast Fourier transforms (*). Simple 'millisecond timer value' benchmarking shows an approximate calculation time of 200 us (microsecond) for an 2048 double array on my system, which I would like to be about 10x faster. Has anyone here experience with using external libraries in Labview, such as FFTW or Intel Math Kernel / Intel Performance Libraries? How do they compare to LV's FFT routines?

The benchmarking was done on a 1.7 GHz Pentium M, 1GB Dell Inspiron 510m laptop; LV8.2, winXPsp2.

(*) For those of you interested in the application: its Optical Coherence Tomography, an optical medical imaging technique analogous to ultrasound, you know, with the babies. But we image smaller structures, typically . My raw image data is typically an 2D array (2048 x 600) in which I need to take the 600 FFTs of the columns to get my final image; 2048 being the # of pixels of a line scan camera. The hardware is capable of doing ~20 images/second; the software isn't.

Hi Dirk,

Did you ever manage to solve this problem? I'm looking to do something similar with FFTW.

Hematose

Link to comment

Yes, I got it to work.

I'll post example code sometime this week. It may take a couple of days though, I'm quite busy at the moment (implementing the FFTW in my app :-) and it really needs some documentation because it is not 100% straightforward from the wiring what's going on. In the mean time, if you're really interested, I'd suggest you read the whole FFTW documentation from the site, not just chapter 2, because that will clarify a lot of what's going on in the code.

Link to comment
  • 1 month later...

Better late then never...

I'm attaching a project showing my LV implementation of the Fastest Fourier Transform of the West.

The example implements a complex-to-complex transform as outlined in the fftw manual (which is also included).

The example VI is documented; the text is reproduced here.

In FFTW, you create a "FFTW plan" (a pointer to a structure containing all relevant information) by calling an initialization routine.

In the present implementation, the planner takes the FFT size, and a planning method as inputs, see the Block Diagram for details.

The plan is optimized for /your/ hardware configuration. Depending on the method you choose, this can take quite some time so you typically use this 'outside the loop'.

You can then execute the plan as many times as you like on different arrays by calling the execution routine inside the loop. Afterwards, clean up by calling the

destroy routine.

This example shows a 1-dimensional transform. 2D, 3D or actually any-D transforms are also possible by calling different initialization routines; see the FFTW

documentation for more details.

It is also possible to perform FFT's along for example only 1 dimension in a 2D array, which was my origional problem.

You accomplish this by calling yet another initialization routine (for this example, you would configure the CLF node to call fftw_plan_many_dft with the "howmany"

parameter set to 10. The input array would then be a size 20480 1D array, and you wouldn't need a FOR-loop). I'm still working on this one. The way I have it

working needs to reshape the arrays each time which proved to be slower than calling the 2048 array FFT 10 times.

The planner routine has a direction parameter; a value of -1 gives the forward FFT; a value of 1 the iFFT

Of course you are not limited to to complex-to-complex transforms; real-to-complex; real-to-real, ... etc is also possible, again by calling a different planner library

function. See the FFTW manual for the details.

Link to comment

QUOTE(Dirk J. @ May 1 2007, 06:17 PM)

The input array would then be a size 20480 1D array, and you wouldn't need a FOR-loop). I'm still working on this one. The way I have it

working needs to reshape the arrays each time which proved to be slower than calling the 2048 array FFT 10 times.

Thanks Dirk,

how much faster is it on your hardware?

And I don't get your problem (reshape), I thought a reshape from 2D to 1D could be done with a typecast from 2D to 1D!

Ton

Link to comment

QUOTE(tcplomp @ May 1 2007, 07:01 PM)

how much faster is it on your hardware?

And I don't get your problem (reshape), I thought a reshape from 2D to 1D could be done with a typecast from 2D to 1D!

on my laptop its about 1.4 times faster.

I didn't know about the typecast; I'll try that, thanks.

Link to comment

QUOTE(tcplomp @ May 1 2007, 07:01 PM)

And I don't get your problem (reshape), I thought a reshape from 2D to 1D could be done with a typecast from 2D to 1D!

The typecast function doesn't accept 2D arrays as input (see image).

I grab the data as 1D array and at some point need to convert it to a 2D array (image, actually), so its a reshape from 1D to 2D.

Link to comment
  • 1 year later...

Hi all!

In my VI I've got a while loop in which a double is acquired at each iteration. I need to do a FFT on this sequence of double. I'd like to use the "Spectral Measurements" VI, but I need to convert my sequence of double into a signal and the only way I found here is to store the samples into an array and then use "Convert to Dynamic Data" (here I didn't find any way of passing the sampling rate, the program takes 1Hz so I have to modify the multiplier in the frequency scale of the spectrum). A problem here is that Spectral Measurements takes all the samples that build up the signal and does FFT on these, instead I need only a certain number of these (which is dependent on the frequency resolution I want to obtain) to be FFTed while the acquisition goes on; so at each iteration I throw away the first element of the array and append the next, then convert to dynamic data, then FFT with spectral measurements. Is there a simpler way to do this?

Thanks

Link to comment

QUOTE (Phantom_Lord @ Sep 15 2008, 07:19 PM)

Hi all!

In my VI I've got a while loop in which a double is acquired at each iteration. I need to do a FFT on this sequence of double. I'd like to use the "Spectral Measurements" VI, but I need to convert my sequence of double into a signal and the only way I found here is to store the samples into an array and then use "Convert to Dynamic Data" (here I didn't find any way of passing the sampling rate, the program takes 1Hz so I have to modify the multiplier in the frequency scale of the spectrum). A problem here is that Spectral Measurements takes all the samples that build up the signal and does FFT on these, instead I need only a certain number of these (which is dependent on the frequency resolution I want to obtain) to be FFTed while the acquisition goes on; so at each iteration I throw away the first element of the array and append the next, then convert to dynamic data, then FFT with spectral measurements. Is there a simpler/better way to do this?

Thanks

Under Signal processing there VIs for FFT and Powerspectra that uses doubles as input. On thing you need to do is make the frequency axis.

Link to comment

QUOTE (Anders Björk @ Sep 15 2008, 08:48 PM)

Under Signal processing there VIs for FFT and Powerspectra that uses doubles as input. On thing you need to do is make the frequency axis.

I think you are referring to the point by point section: I tried that, but I would prefer using spectral measurement because it supports windowing and averaging without further coding...

Thanks for your reply!

Link to comment

usually you would use a circular buffer. I think you should be able to find plenty of examples, either in LabVIEW examples or on the web. If you are using a National Instruments card, the circular buffer is usually done for you, but I would need more details on what you are reading to give you advice.

It's somewhat unusual to get one sample per loop, and then perform an FFT, because usually interesting FFTs come from pretty fast data streams, and you are often getting many samples with every iteration.

Link to comment

QUOTE (jdunham @ Sep 15 2008, 09:15 PM)

usually you would use a circular buffer. I think you should be able to find plenty of examples, either in LabVIEW examples or on the web. If you are using a National Instruments card, the circular buffer is usually done for you, but I would need more details on what you are reading to give you advice.

Circular buffer.. I didn't know about it (I'm quite new to labview). Very interesting topic, basically is what I was doing with the array but here I can get some priority guarantees which I was not considering in the other way. I'm acquiring data from a sensor via the CANopen protocol with the NI usb-8473. When the sensor is operative it sends messages at a fixed rate (up to 1KHz) and to acquire this data I set up a while loop in which each message is read and decoded when it is ready in the buffer of the NI device.

QUOTE

It's somewhat unusual to get one sample per loop, and then perform an FFT, because usually interesting FFTs come from pretty fast data streams, and you are often getting many samples with every iteration.

In the starting post I haven't been clear about it, actually, I'm performing FFT at a fixed rate, with a window size dependent on the resolution I want to get.

I'm going to try the circular buffer,

Thank you very much

Link to comment

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.