
Suggestions for improving 2D array access speed?


Kerry


Hello all,

 

We have an application that needs to run at (approximately) a set frequency.  We're having trouble hitting our mark, and we've identified the bottleneck as a read from a 2D array, which happens once per cycle.  The array is roughly 800000x6; once each cycle we extract a row and pass its 6 values along.  When we replace the array read with constants, we hit our 8 msec target.
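For scale, assuming the values are standard 8-byte doubles (a guess on my part), the table itself works out to tens of megabytes:

    # Python, just back-of-envelope arithmetic for the table size
    rows, cols, bytes_per_value = 800_000, 6, 8        # 8 bytes assumes DBL values
    print(rows * cols * bytes_per_value / 1e6, "MB")   # prints 38.4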

 

We've tried a few different methods to access the data, but haven't found any noticeable improvement.  Here's what we've tried:

[attached screenshot: the array-indexing methods we've tried]

 

After some searching, we thought an "In Place Structure" might be worth trying:

[attached screenshot: the In Place Element structure version]

 

But still no change in execution speed.  We also read that global variables are bad for large arrays, so we replaced the global with a wire (tested with the second and third methods shown above), but the execution speed actually dropped by a factor of 3.  What happens to wires when they pass through loops, case structures, and sequences?  Below is the top level of the application, showing the wired path from "Read Cmnd File" through to "Get Cmnd From Array," the sub-VI the above screen captures are from.

[attached screenshot: top-level block diagram, showing the wire path from "Read Cmnd File" through to "Get Cmnd From Array"]

 

Is there something else we should be doing here?  Maybe some higher-level design issue that we've overlooked?

 

Also, when the timed while loop doesn't complete in 8 msec, the execution time jumps to 16 msec.  Is this normal?  Is there a way to have it run "at 8 msec or as fast as possible?"

 

Thanks in advance,

 

Kerry



It looks like there are some structural issues with your code. You shouldn't be using so many global variables or sequence structures. Indexing a single element out of an array is a fast operation, so I doubt that's actually the problem. More likely, when you replace the Index Array with constants, those constants propagate further down the chain and allow LabVIEW to do some optimizations. For example, the cluster feeding CommandGraph becomes a constant, and the compiler may be smart enough to notice this.

 

It shouldn't affect the speed at all, but you don't need separate Index Array functions; you can expand a single Index Array function down to index additional elements. LabVIEW will automatically increment unwired indices (although I'm not sure exactly how that works with 2D arrays, so you may want to test).

 

I recommend that you eliminate nearly all of your global variables, use wires to pass data between functions, and remove the unnecessary sequence structures. If your code is still problematic and you're allowed to post it, share it here (zip up the whole project with VIs) and we'll try to provide pointers.


I believe a timed loop defaults to skipping missed iterations. In the configuration dialog there should be a setting ("mode", I think) that tells it to run right away. However, if this is a Windows machine you shouldn't be using a timed loop at all, as it will probably do more harm than good. And if this *isn't* a Windows machine, then railing your CPU (which is what changing the timed loop mode will do) is not a good idea. Just use a normal loop.
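In text form, the "run at 8 msec or as fast as possible" scheduling you asked about looks roughly like this (a Python sketch of the idea only, not LabVIEW code; do_one_cycle is a made-up placeholder):

    import time

    PERIOD = 0.008  # 8 ms target

    def do_one_cycle():
        pass  # placeholder for the real per-cycle work

    next_deadline = time.monotonic() + PERIOD
    while True:
        do_one_cycle()
        remaining = next_deadline - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)                      # on time: wait out the rest of the period
            next_deadline += PERIOD
        else:
            next_deadline = time.monotonic() + PERIOD  # late: start the next cycle immediately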

 

As for the actual problem you're encountering, it's hard to say without a better look at the code. You might use the profiler tool (http://digital.ni.com/public.nsf/allkb/9515BF080191A32086256D670069AB68) to get a better idea of the worst offenders in your code, then focus just on those functions. As ned said, the index should be fast and isn't likely to be the performance issue. Copying the 2D array (reading it from a global) or any number of other things could be the problem.
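To get a feel for the difference between the two, here is a quick NumPy sketch you can run yourself (array size taken from your post; the absolute numbers will depend on your machine):

    import numpy as np
    import timeit

    data = np.random.rand(800_000, 6)

    index_one_row = timeit.timeit(lambda: data[1234].copy(), number=1000) / 1000
    copy_whole_array = timeit.timeit(lambda: data.copy(), number=100) / 100

    print(f"index one row:    {index_one_row * 1e6:9.2f} us")
    print(f"copy whole array: {copy_whole_array * 1e6:9.2f} us")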


Aside from the other things previously mentioned, have you tried reading a 6-element row from the big array, then reading the individual elements of that 6-element vector?  

It's been a while since I've played with this stuff, and I don't currently have LabVIEW installed on this machine, but I found that optimization was greatly aided by turning on the memory allocation dots.  You may find that your approach is making lots of copies of the big array.
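In rough textual terms (a Python stand-in, not LabVIEW; the names are invented), the suggestion is:

    import numpy as np

    data = np.random.rand(800_000, 6)   # stand-in for the big command table
    i = 4321

    row = data[i]            # pull the whole 6-element row out once...
    a, b, c, d, e, f = row   # ...then read the individual values from that small vector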


Thanks for all the responses!

 

We're slowly working our way through, removing globals and sequence structures.  The array itself is no longer a global, but there are still some other (much smaller) globals and probably still lots of room for improvement.

 

After many tests we were still falling short of our target, until I read Shaun's post.  I noticed that in your final example you're using a shift register for the array, instead of just connecting the array through the loop.  I had been connecting it through the loop, and making this change made all the difference for us!  So I assume that shift registers avoid copying the data on every cycle, while "normal" tunnels don't?

 

Thank you all for your help; I've learned quite a bit in the past two days :-)

 

-Kerry

 

Edit:  Some more searching shows that I was actually breaking many of the best practices when using large data sets:

http://zone.ni.com/reference/en-XX/help/371361H-01/lvconcepts/memory_management_for_large_data_sets/

Edited by Kerry

So I assume that shift registers avoid copying the data on every cycle, while "normal" tunnels don't?

 

Not necessarily. If you wire through and LabVIEW can tell the data won't change, then that "tunnel" may get replaced with a constant by the compiler. If you have dynamic data then shift registers can sometimes tell LabVIEW enough about the data to kick in some extra optimisations, but it's not as clear-cut as "shift registers good, tunnels bad". I only have an intuitive workflow for choosing shift registers vs tunnels based on experience, but I'm sure an NI guru can tell you specifically which optimisations can and cannot be enabled with shift registers and tunnels.

 

Glad you got it sorted, though.

Edited by ShaunR

After many tests we were still falling short of our target, until I read Shaun's post.  I noticed that in your final example you're using a shift register for the array, instead of just connecting the array through the loop.  I had been connecting it through the loop, and making this change made all the difference for us!  So I assume that shift registers avoid copying the data on every cycle, while "normal" tunnels don't?

This is very tricky, and I definitely don't totally understand it. Also, despite the NI under my name I am not remotely part of R&D and so may just be making this stuff up. But it seems to be mostly accurate in my experience.

 

LabVIEW is going to use a set of buffers to store your array. You can see these buffers with the "Show Buffer Allocations" tool, but the tool doesn't show the full story. In the specific image in Shaun's post there should be no difference between a tunnel and a shift register, because everything in the loop is *completely* read-only, meaning LabVIEW can reference just one buffer (one copy of the array) from multiple locations. If you modify the array (for example, with your In Place Element structure), there has to be one copy of the array which retains the original values and another copy which stores the new values. That's required by dataflow semantics. However, LabVIEW can perform different optimizations depending on how you wire things. I've attached a picture. The circled dots are what the buffer allocations tool shows you, and the rectangles are my guess at the lifetime of each copy of the array:

[attached image: the three versions, with the buffer-allocation dots circled and my guess at each buffer's lifetime drawn as rectangles]

 

There are three versions. In version 1 the array is modified inside the loop, so LabVIEW cannot optimize. It must take the original array buffer (blue), copy it in its original form into a second buffer (red), and then change elements of the red buffer so that downstream code can access the updated (red) data.

Version 2 shows the read-only case. Only one buffer is required.

Version 3 shows where a shift register can aid in optimization. As in version 1 we are changing the buffer with Replace Array Subset, but because the shift register basically tells LabVIEW "these two buffers are actually the same", it doesn't need to copy the data on every iteration. This changes with one simple modification:

[attached image: version 3 modified so that a new copy is forced]

However, you'll note that in order to force a new data copy (note the dot), I had to use a sequence structure to tell LabVIEW "these two versions must be available in memory simultaneously". If you remove the sequence structure, LabVIEW just changes the flow of execution to remove the copy (by performing the index before the replace):

[attached image: the same code with the sequence structure removed]

 

For fun, I've also put a global version together.

[attached image: a version using a global variable]

You'll note that the copy is also made on every iteration here (as LabVIEW has to leave the buffer in the global location and must also make sure the local buffer, on the wire, is up to date).
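If it helps to see the same idea outside of G, here is a loose Python/NumPy analogy of the three versions (explicit copies and in-place writes stand in for LabVIEW's buffer behavior; it's only an analogy, not what the compiler literally does):

    import numpy as np

    data = np.random.rand(800_000, 6)

    # Version 1: the original and the modified array must both stay alive,
    # so a full copy is made inside the loop on every iteration.
    for i in range(10):
        modified = data.copy()        # second buffer, re-created each time
        modified[i, 0] = 42.0

    # Version 2: read-only access, one buffer is enough.
    for i in range(10):
        row = data[i]                 # no copy of the big array

    # Version 3: "shift register" style, one working buffer reused across iterations.
    working = data.copy()             # one copy up front
    for i in range(10):
        working[i, 0] = 42.0          # modified in place, no per-iteration copy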

 

Sorry for the tl;dr, but hopefully this makes some sense. If not, please correct me :)


 Oh my god. It all falls to crap when I write data.   ;) 

 

The regular global will definitely get into trouble with writes, yes.

It depends a bit on the write frequency, but with one write per read the functional global is still fast enough. With writes on the regular global too, the speed ratio increases to about 17500x on my machine :-)
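(For anyone reading along: the functional global idea, very loosely translated to Python, is that the big array lives behind an accessor and only the small row you asked for ever crosses the boundary. All names below are made up.)

    import numpy as np

    def make_command_store(table):
        # Loose analogy of a functional global: the array stays in here,
        # and only a 6-element row is ever handed out.
        def get_row(i):
            return table[i].copy()
        return get_row

    get_cmnd = make_command_store(np.random.rand(800_000, 6))
    row = get_cmnd(1234)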

Edited by Mads
