Working with Large Data Arrays

jfazekas · January 27, 2009

I'm in a bit of a pickle and would like to ask for suggestions. My application requires working on a large set of data that is represented in a single array of u8 integers. The array is about 12 megabytes and is fixed in length.

Once the data set is acquired, I have a library of 8 functions that do all sorts of analysis and anomaly checking on the data.

I've studied the GigaLabVIEW examples and see a huge benefit to passing a reference between my subVIs instead of passing the 12-meg wire around my application. This eliminates the inevitable data copies (of a large wire) and I do see a benefit to the application's memory footprint.

My problem is that this is slow. Some of my data analysis functions are iterative and I want them to run 500,000 times. There is a big hit to speed when you have to access the data many times via reference.

To demonstrate the obvious (to myself) I made the quick example below and see a 200x difference in execution speed. It looks to me like I have a choice: either suffer multiple data copies with the byval approach or suffer slow execution with the byref approach.

Maybe LV9 will have native byref functionality (wish wish).

Antoine Chalons · January 27, 2009

Hi,

Your signature says you have LV 8.5, but your VI is saved in LV 8.6. Can you save it back to LV 8.5 and repost, please?

Neville D · January 27, 2009

QUOTE (jfazekas @ Jan 26 2009, 10:43 AM)

I've studied the GigaLabVIEW examples and see a huge benefit to passing a reference between my subVIs instead of passing the 12-meg wire around my application. This eliminates the inevitable data copies (of a large wire) and I do see a benefit to the application's memory footprint.

You're right. By-ref will be slow. I would stick with direct wires as the fastest way, and look at the data copies. Why are they being formed? Can you do anything about it? LV is smart enough to NOT make copies unless absolutely necessary.

Just passing a wire into a subVI does not mean a copy of the data is made for that subVI (unless there is some branching that changes the data).

See if you can use the In Place Element Structure to speed things up if you are replacing elements in a complicated array.

Another approach is to chunk your data into a few manageable sets and work on those (maybe in parallel? Multicore optimization with smaller data! Hey, that's a win-win!)

Neville.
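LabVIEW diagrams are graphical, so the chunking idea above can only be shown here as a rough text analogy. The Python sketch below assumes a read-only analysis step; `chunk_bounds` and `count_above` are hypothetical names, and the buffer is much smaller than the 12 MB array in the thread. It only illustrates walking one large buffer in fixed-size pieces without copying the whole thing.

```python
def chunk_bounds(total_len, chunk_len):
    """Yield (start, stop) index pairs that together cover range(total_len)."""
    for start in range(0, total_len, chunk_len):
        yield start, min(start + chunk_len, total_len)


def count_above(view, threshold):
    """Hypothetical read-only analysis: count bytes greater than threshold."""
    return sum(1 for b in view if b > threshold)


# Small stand-in for the large u8 array (size chosen only for illustration).
data = bytearray(120_000)
data[5] = 200
data[70_000] = 255

# Slicing a memoryview gives a window onto the same buffer, not a copy.
view = memoryview(data)
total = sum(
    count_above(view[start:stop], 128)
    for start, stop in chunk_bounds(len(data), 30_000)
)
```

Because each chunk is an independent read-only window, the per-chunk calls could in principle be handed to separate workers, which is the parallel angle suggested above.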

jfazekas · January 27, 2009

Sorry. Here is VI in 8.5 speak

Thanks for your reply. I find it extremely tedious to try and detect copies. Yes, I remember that "show buffer allocations" does not tell you where copies are made. In the end, you're probably right and I should just pass the array around. Do you know if it would help to typedef the array into a LV Class control?

Neville D · January 27, 2009

QUOTE (jfazekas @ Jan 26 2009, 11:24 AM)

I find it extremely tedious to try and detect copies. Yes, I remember that "show buffer allocations" does not tell you where copies are made. In the end, you're probably right and I should just pass the array around. Do you know if it would help to typedef the array into a LV Class control?

Maybe you can run the profiler and see if the memory usage for a particular subVI seems larger than it should be.

Check that you aren't using any build arrays, and are only extracting small sections of the array at a given time.

Dunno about the LV Class Control. Worth a try, I guess.

N.

jdunham · January 28, 2009

QUOTE (jfazekas @ Jan 26 2009, 11:24 AM)

Do you know if it would help to typedef the array into a LV Class control?

I doubt it would help. There's no magic about classes and memory. From everything I've read on this site, an array and an array inside a class will behave pretty much the same way.

Aristos Queue · January 28, 2009

QUOTE (jfazekas @ Jan 26 2009, 01:24 PM)

Do you know if it would help to typedef the array into a LV Class control?

Shouldn't have any effect -- if the array needed to be copied then the class will need to be copied. Basically, if you're doing something that requires a copy, then a copy is going to be made. Whatever it is, stop doing it. :-) Some things that would cause a copy:

* Using any sort of Global VI to store your data

* a functional global where you use get and set actions to copy the value out of the global and then back into it later

* forking the array wire to two write operations (such as Replace Element or Sort 1D Array etc). As long as you never fork the wire, or you fork only to readers, or you fork to a single writer while all the other branches are pure-functional readers, you shouldn't have any data copies.

What are you doing in those analysis functions? Are they "destructive analysis"? In other words, do they do stuff that replaces values in the array, which would cause a copy to be made so that you can call the next analysis function?
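As a loose text analogy for the fork rule above (Python, not LabVIEW; the variable names are hypothetical), a read-only branch can share the original buffer, while a branch that writes needs its own copy if the original must stay intact:

```python
data = bytearray(b"\x01\x02\x03\x04")

# "Reader" branch: a read-only view shares the original buffer, so no copy.
reader = memoryview(data).toreadonly()

# "Writer" branch: to leave the original untouched, it needs its own copy.
writer_copy = bytearray(data)
writer_copy[0] = 99

# A later change to the original is visible through the reader (shared memory)
# but not through the writer's private copy.
data[1] = 50
```

This does not describe how the LabVIEW compiler decides where to copy; it only shows why a second writer forces a duplicate while extra readers do not.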

Antoine Chalons · January 28, 2009

I have noticed that some analysis functions - from the spectral analysis palette not to mention it :shifty: - are pretty inefficient both in terms of speed and memory usage. Sometimes, a 2-hour refactoring on these can dramatically reduce memory usage and increase calculation speed by up to 30x.

Which leads me to ask the same question as AQ: what kind of analysis are you doing?

jfazekas · January 30, 2009

Basically my approach is this. Create a class. The data object has an array of u8. My class 'INIT' function initializes the array - 12 megs in size. All of the functions either write (using Replace Array Subset) or read (using Array Subset) from the class data object. By the way, I limit the Read/Write functions to 30 KB as input or output (never read or write more than 30 KB at once). I never fork a class wire in any of my use cases. So I think I'm doing the best I can to minimize copies. I have several analysis functions that do iterative reads on different sections of the data (no writes).

If the class wire goes to a shift register, is a copy made? Do tunnels into any specific structures cause one?
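For readers who think in text languages, the design described above might look roughly like the Python sketch below. It is only an analogy under stated assumptions: `DataStore`, `write_subset`, `read_subset`, and `MAX_TRANSFER` are hypothetical names, and the test buffer is far smaller than 12 MB.

```python
MAX_TRANSFER = 30 * 1024  # 30 KB cap per read or write, as described above


class DataStore:
    """Hypothetical stand-in for the class that owns the u8 array."""

    def __init__(self, size):
        # Allocate the fixed-length buffer once ("INIT").
        self._buf = bytearray(size)

    def _check(self, offset, length):
        if length > MAX_TRANSFER:
            raise ValueError("transfer exceeds 30 KB limit")
        if offset < 0 or offset + length > len(self._buf):
            raise IndexError("subset outside the array")

    def write_subset(self, offset, values):
        """Overwrite part of the buffer in place (like Replace Array Subset)."""
        self._check(offset, len(values))
        # Same-length slice assignment never resizes the buffer.
        self._buf[offset:offset + len(values)] = values

    def read_subset(self, offset, length):
        """Return a small copy of part of the buffer (like Array Subset)."""
        self._check(offset, length)
        return bytes(self._buf[offset:offset + length])


store = DataStore(100_000)
store.write_subset(10, b"\x07\x08")
window = store.read_subset(9, 4)
```

Only the small subsets are ever copied; the large buffer is created once and modified in place.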

Neville D · January 30, 2009

QUOTE (jfazekas @ Jan 29 2009, 10:46 AM)

If the class wire goes to a shift register, is a copy made? Do tunnels into any specific structures cause one?

I think AQ has already answered your question. No data copies at a fork unless there is a write at one of the forks.

N.

Rolf Kalbermatter · February 6, 2009

QUOTE (jfazekas @ Jan 26 2009, 02:24 PM)

Thanks for your reply. I find it extremely tedious to try and detect copies. Yes, I remember that "show buffer allocations" does not tell you where copies are made. In the end, you're probably right and I should just pass the array around. Do you know if it would help to typedef the array into a LV Class control?

Not sure about LV Class, but a typedef in itself won't help. What you should try to do is pass your array in and out of VIs. Avoid branching as much as possible, unless you branch off inside a structure to some non-reusing LabVIEW internal nodes such as Index Array, Array Size and similar. Basically, you should try to have the array as one wire going through your entire application. If you need to create a branch, make sure it is in the same structure as the function that consumes the branch. You might branch to determine the size of the array, but if you do that outside of the structure while the Array Size node is inside a structure, LabVIEW will likely create a copy.

If you have loops that operate on the array, create a shift register and wire the array to the left terminal. Wire it from that terminal to the inside of the loop, and make sure to wire it back to the right terminal inside the loop. When the loop finishes, you just get the array from the right terminal and go on to the next function. If you do this right, LabVIEW will usually avoid data copies even without the In Place Element Structure. In fact, the In Place Element Structure does not so much optimize the LabVIEW access (it does some extra optimizations) as enforce this type of wiring more strictly.

With these techniques I have created VI libraries operating on huge multi-MByte arrays at speeds comparable to what fairly optimized C algorithms could achieve, even before the In Place functions existed.
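The shift-register pattern has a rough counterpart in text languages: one buffer goes into each loop iteration and the same buffer comes back out. The Python sketch below is only an analogy (`invert_block` is a hypothetical processing step), not a description of LabVIEW internals.

```python
def invert_block(buf, start, stop):
    """Hypothetical in-place step: modify buf[start:stop] and return the same buffer."""
    for i in range(start, stop):
        buf[i] = 255 - buf[i]
    return buf


data = bytearray(range(8))
original = data  # kept only to show that no new buffer is ever created

# The loop variable plays the role of the shift register: the buffer that
# leaves one iteration is the one that enters the next.
for start in range(0, len(data), 4):
    data = invert_block(data, start, start + 4)
```

After the loop, `data` is still the object that went in, which mirrors taking the array from the right shift-register terminal and wiring it on to the next function.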

Rolf Kalbermatter
