Kerry Posted February 3, 2015

Hello all,

We have an application that needs to run (approximately) at a set frequency. We're having trouble hitting our mark, and we've identified the bottleneck as a read from a 2D array, which happens once per cycle. The array size is something like 800000x6, and once each cycle we extract a row and pass the 6 values along. When we replace the array read with constants, we achieve our 8 msec target.

We've tried a few different methods to access the data, but haven't found any noticeable improvement. Here's what we've tried:

After some searching, we thought an In Place Element Structure might be worth trying:

But still no change in execution speed. We also read that global variables are bad for large arrays, so we replaced the global with a wire (tested with the second and third methods shown above), but the execution speed actually dropped by a factor of 3. I'm wondering what happens to wires when they pass through loops, case structures and sequences?

This is the top level of the application, showing the wired path from "Read Cmnd File" through to "Get Cmnd From Array," which is the sub-VI the above screen captures are from. Is there something else we should be doing here? Maybe some higher-level design issue that we've overlooked?

Also, when the timed while loop doesn't complete in 8 msec, the execution time jumps to 16 msec. Is this normal? Is there a way to have it run "at 8 msec or as fast as possible"?

Thanks in advance,
Kerry
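For readers who can't see the screenshots, the loop being described has roughly this shape. This is a Python/NumPy sketch rather than the actual LabVIEW code; the array contents, the loop count and the variable names are stand-ins for illustration only:

import time
import numpy as np

CYCLE_S = 0.008                            # the 8 msec target period
cmd_table = np.random.rand(800_000, 6)     # stand-in for the command file contents

for i in range(1000):                      # short run, for illustration only
    t_start = time.perf_counter()
    row = cmd_table[i]                     # the once-per-cycle array read in question
    # ...pass the 6 values along to the rest of the cycle...
    remaining = CYCLE_S - (time.perf_counter() - t_start)
    if remaining > 0:
        time.sleep(remaining)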
ned Posted February 3, 2015

It looks like there are some structural issues with your code. You should not be using so many global variables or sequence structures. Indexing a single element out of an array is a fast operation, so I doubt that's actually the problem. More likely, when you replace the Index Array with constants, those constants propagate further down the chain and allow LabVIEW to do some optimizations. For example, the cluster feeding CommandGraph becomes a constant, and the compiler may be smart enough to notice this.

It shouldn't affect the speed at all, but you don't need separate Index Array functions: you can expand the function down to index additional elements. LabVIEW will automatically increment unwired indices (although I'm not sure exactly how that works with 2D arrays, so you may want to test).

I recommend that you eliminate nearly all of your global variables, use wires to pass data between functions, and remove the unnecessary sequence structures. If your code is still problematic and you're allowed to post it, share it here (zip up the whole project with VIs) and we'll try to provide pointers.
smithd Posted February 4, 2015

With a timed loop, I believe the default is to skip missed iterations. In the configuration dialog there should be a setting ("mode", I think) which tells it to run right away. However, if this is a Windows machine you shouldn't be using a timed loop at all, as it will probably do more harm than good. And if this *isn't* a Windows machine, then railing your CPU (which is what changing the timed loop mode will do) is not a good idea. -> Just use a normal loop.

As for the actual problem you're encountering, it's hard to say without a better look at the code. You might use the profiler tool (http://digital.ni.com/public.nsf/allkb/9515BF080191A32086256D670069AB68) to get a better idea of the worst offenders in your code, then focus just on those functions. As ned said, the index should be fast and isn't likely to be the performance issue. Copying a 2D array (reading from a global) or any number of other things could be the problem.
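A plain software-timed loop along these lines gives the "at 8 ms or as fast as possible" behaviour Kerry asked about: if an iteration overruns, the schedule restarts immediately instead of waiting out a second full period. This is a minimal Python sketch of that scheduling idea, not a LabVIEW Timed Loop, and do_one_cycle is a hypothetical placeholder:

import time

PERIOD = 0.008                             # 8 ms target

def do_one_cycle():
    pass                                   # hypothetical placeholder for the real per-cycle work

next_deadline = time.perf_counter() + PERIOD
for _ in range(1000):                      # bounded run for illustration
    do_one_cycle()
    now = time.perf_counter()
    if now < next_deadline:
        time.sleep(next_deadline - now)    # on time: wait out the rest of the period
        next_deadline += PERIOD
    else:
        next_deadline = now + PERIOD       # overran: don't try to catch up, just reschedule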
ensegre Posted February 4, 2015

At the risk of saying something obviously off the mark, couldn't it be that displaying the extracted numbers in front panel indicators is the real bottleneck? (Especially if you have a weak video adapter.)
ShaunR Posted February 4, 2015

Ok, first off: Windows is not a real-time OS, so it isn't deterministic. I know that, you know that, but your project manager probably doesn't, so that was for him.

Second: I bet you are so close that if we can just optimise what we already have a little better, we will be home and dry. Sure. Then in 6 months the spec changes to every 7 ms and the project manager says "you did it before, now do it again - it's only 1 ms".

So. What is the problem? Ahh. Just poking over the 8 ms in places. Let's see if it's what all the girls say: that size does matter and bigger is better.

Ah yes. No problem with under 8 ms. The girls are half wrong and half right. Size does matter, but smaller is better. (In fact it is linear: 400,000 rows will yield about 3 ms, 200,000 about 1.5 ms, etc.)

Easy answer 1: Reduce the data size in the global.

But why so slow? Is it the array disassembling? Is it the reading from the global? Does it need more coffee? Oooh. Microseconds. My spidey sense tells me it's the global.

Easy answer 2: Don't use a global for big arrays if you're time-constrained.

That about wraps it up for reading. In the next issue we will cover: "Oh my god. It all falls to crap when I write data."
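To put rough numbers on that point: pulling one 6-element row out of the array costs microseconds, while copying the whole array (which is effectively what reading a large global does) costs milliseconds and scales linearly with the row count. The sketch below is a Python/NumPy stand-in for that comparison, not the LabVIEW benchmark from the screenshots, and the exact times will depend on the machine:

import time
import numpy as np

for rows in (200_000, 400_000, 800_000):
    data = np.random.rand(rows, 6)

    t0 = time.perf_counter()
    row = data[1234].copy()                # one 6-element row
    t_index = time.perf_counter() - t0

    t0 = time.perf_counter()
    whole = data.copy()                    # full-array copy, as a global read forces
    t_copy = time.perf_counter() - t0

    print(f"{rows:>7} rows: one row {t_index * 1e6:7.1f} us, whole array {t_copy * 1e3:7.2f} ms")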
Gary Rubin Posted February 4, 2015

Aside from the other things previously mentioned, have you tried reading a 6-element row from the big array, then reading the individual elements of that 6-element vector? It's been a while since I've played with this stuff, and I don't currently have LabVIEW installed on this machine, but I found that optimization was greatly aided by turning on the memory allocation dots. You may find that your approach is making lots of copies of the big array.
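In textual form, the suggestion is to stop going back to the big array once per value and instead pull the small row out once, then pick values from that. A NumPy sketch of the two access patterns (the index and sizes are made up):

import numpy as np

data = np.random.rand(800_000, 6)
i = 1234

# one trip to the big array per value
a, b, c = data[i, 0], data[i, 1], data[i, 2]

# pull the 6-element row once, then index the small vector
row = data[i]
a, b, c = row[0], row[1], row[2]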
Kerry Posted February 4, 2015

Thanks for all the responses! We're slowly working our way through, removing globals and sequence structures. The array itself is no longer a global, but there are still some other (much smaller) globals and probably still lots of room for improvement.

After many tests, we were still falling short of our target, until I read Shaun's post. I noticed that in your final example, you're using a shift register for the array, instead of just connecting the array through the loop. I had been connecting it through the loop, and making this change made all the difference for us! So I assume that shift registers avoid copying the data on every cycle, while "normal" connections don't?

Thank you all for your help, I've learned quite a bit in the past two days :-)

-Kerry

Edit: Some more searching shows that I was actually breaking many of the best practices for large data sets: http://zone.ni.com/reference/en-XX/help/371361H-01/lvconcepts/memory_management_for_large_data_sets/
ShaunR Posted February 4, 2015

So I assume that shift registers avoid copying the data on every cycle, while "normal" connections don't?

Not necessarily. If you wire through and LabVIEW can tell the data won't change, then that tunnel may get replaced with a constant by the compiler. If you have dynamic data, then shift registers can sometimes tell LabVIEW enough about the data to kick in some extra optimisations, but it's not as clear-cut as "shift registers good, tunnels bad". I only have an intuitive workflow for choosing shift registers vs tunnels based on experience, but I'm sure an NI guru can tell you specifically what optimisations can and cannot be enabled with shift registers and tunnels.

Glad you got it sorted, though.
smithd Posted February 4, 2015

After many tests, we were still falling short of our target, until I read Shaun's post. I noticed that in your final example, you're using a shift register for the array, instead of just connecting the array through the loop. I had been connecting it through the loop, and making this change made all the difference for us! So I assume that shift registers avoid copying the data on every cycle, while "normal" connections don't?

This is very tricky, and I definitely don't totally understand it. Also, despite the NI under my name, I am not remotely part of R&D, so I may just be making this stuff up, but it seems to be mostly accurate in my experience.

LabVIEW is going to use a set of buffers to store your array. You can see these buffers with the "Show Buffer Allocations" tool, but the tool doesn't show the full story. In the specific image in Shaun's post, there should be no difference between a tunnel and a shift register, because everything in the loop is *completely* read-only, meaning that LabVIEW can reference just one buffer (one copy of the array) from multiple locations. If you modify the array (for example, with your In Place Element structure), there has to be one copy of the array which retains the original values and another copy which stores the new values. That's defined by dataflow semantics. However, LabVIEW can perform different optimizations depending on how you use it. I've attached a picture. The circled dots are what the buffer allocations tool shows you, and the rectangles are my guess at the lifetime of each copy of the array.

There are three versions. In version 1 the array is modified inside the loop, so LabVIEW cannot optimize. It must take the original array buffer (blue), make a copy of it in its original form into a second buffer (red), and then change elements of the red buffer so that the code downstream can access the updated (red) data. Version 2 shows the read-only case: only one buffer is required. Version 3 shows where a shift register can aid in optimization. As in version 1, we are changing the buffer with Replace Array Subset, but because the shift register basically tells LabVIEW "these two buffers are actually the same", it doesn't need to copy the data on every iteration.

This changes with one simple modification. However, you'll note that in order to force a new data copy (note the dot), I had to use a sequence structure to tell LabVIEW "these two versions must be available in memory simultaneously". If you remove the sequence structure, LabVIEW just changes the flow of execution to remove the copy (by performing the Index before the Replace).

For fun, I've also put a global version together. You'll note that the copy is also made on every iteration here (as LabVIEW has to leave the buffer in the global location and must also make sure the local buffer, on the wire, is up to date).

Sorry for the tl;dr, but hopefully this makes some sense. If not, please correct me.
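A loose textual analogy to those three versions, using NumPy arrays instead of LabVIEW buffers (the sizes and loop counts are made up): version 1 must keep the original data intact, so it pays for a full copy every iteration, while version 3 carries one buffer across iterations (the shift-register case) and updates it in place:

import numpy as np

data = np.random.rand(800_000, 6)

# Version 1 analogue: the original must survive, so copy then modify, every iteration
for i in range(3):
    modified = data.copy()                 # second buffer, re-copied each time
    modified[i, :] = 0.0
    # ...downstream code uses `modified`; `data` is still needed unchanged...

# Version 3 analogue: one buffer carried across iterations, modified in place
state = data.copy()                        # one copy up front
for i in range(3):
    state[i, :] = 0.0                      # replace in place, same buffer reused
    # ...downstream code uses `state`...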
Mads Posted February 5, 2015

If global access is a requirement, you might want to use a functional global or DVR instead. Here is a crude example that is about 9000 times faster in LV2013 on my machine, and 4500 times faster in LV2014.
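Roughly, the idea is that one owner holds the array by reference and hands out only the row a caller asks for, so a read never copies the whole 800000x6 table. Below is a Python sketch of that access pattern; the class and method names are made up, and this is not what the LabVIEW FGV/DVR code in Mads' example looks like:

import numpy as np

class CommandStore:
    # Single owner of the command table; readers get one row at a time.
    def __init__(self, table):
        self._table = table                # held by reference, never copied out whole

    def get_row(self, i):
        return np.array(self._table[i])    # copy only the 6 requested values

    def set_row(self, i, values):
        self._table[i, :] = values         # write in place

store = CommandStore(np.random.rand(800_000, 6))
cmd = store.get_row(1234)                  # cheap: 6 elements, not 4,800,000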
ShaunR Posted February 5, 2015

If global access is a requirement, you might want to use a functional global or DVR instead. Here is a crude example that is about 9000 times faster in LV2013 on my machine, and 4500 times faster in LV2014.

That gets covered in "Oh my god. It all falls to crap when I write data."
Mads Posted February 5, 2015

"Oh my god. It all falls to crap when I write data."

The regular global will definitely get into trouble with writes, yes. It depends a bit on the write frequency, but with one write per read the functional global is still fast enough. With writes on the regular global too, the speed relation increases to about 17500x on my machine :-)