Jump to content


Photo
- - - - -

OpenG filter array revised


  • Please log in to reply
53 replies to this topic

#1 Wouter

Wouter

    Very Active

  • Members
  • PipPipPip
  • 103 posts
  • Version:LabVIEW 2011
  • Since:2006

Posted 26 July 2012 - 12:53 PM

The current implementation is very slow, so I checked if I can do it any faster and well I can :-) al lot also.

This is my revised version.

Differences:
# elements, current array filter, mine implementation array filter
10000 - 30 ms - 0 ms
100000 - 2154 ms - 5 ms
1000000 - after 7 minuts it was still running - 56 ms


The code:
snippit.png

Attached File  wgtk_filterArray.vi   14.5K   64 downloads

The only difference is that I do not remove duplicates from the "items to filter". Further in this implementation the "filtered items indices" are returned sorted. Also the code could (theoratically) be even faster by first sorting the "items to filter" and then use a binary search instead of the labview linear search.

-edit- small improvement, the second loop iterator for indexing the "filtered items indices" array wasn't necessary. Because it can be calculated by substracting the loop iterator by the iterator for the other array.

Attached File  wgtk_filterArray.vi   14.29K   53 downloads

Edited by Wouter, 26 July 2012 - 01:03 PM.


#2 crossrulz

crossrulz

    Extremely Active

  • Members
  • PipPipPipPip
  • 253 posts
  • Location:Cincinnati, OH
  • Version:LabVIEW 2011
  • Since:2004

Posted 26 July 2012 - 01:55 PM

The thing that's hurting the current implementation is the abundance of build arrays inside of loops, which you found a way around by preallocating arrays. As long as the functionality is the same, I'm all for the upgrade (I haven't fully tested it yet). I am currious what would happen if we removed the duplicate filtered items.

#3 asbo

asbo

    I have no idea what you're talking about... so:

  • V I Engineering, Inc.
  • 1,273 posts
  • Version:LabVIEW 2011
  • Since:2008

Posted 26 July 2012 - 02:18 PM

Cool! :thumbup1: You better wire that error through, though.

#4 Wouter

Wouter

    Very Active

  • Members
  • PipPipPip
  • 103 posts
  • Version:LabVIEW 2011
  • Since:2006

Posted 26 July 2012 - 02:24 PM

As long as the functionality is the same, I'm all for the upgrade (I haven't fully tested it yet). I am currious what would happen if we removed the duplicate filtered items.

Nothing. But i didnt understand why it was build in the first place. Because why would it contain duplicate items? I think its up to the programmer if he wishes to check if the filter list contains duplicates.

Cool! :thumbup1: You better wire that error through, though.


No because the error inputs are also not used in the terminals. I just put them there because maybe in the future I want to throw a error, then I will wire it. Now LabVIEW will just treat it as deadcode.

Edited by Wouter, 26 July 2012 - 02:28 PM.


#5 crossrulz

crossrulz

    Extremely Active

  • Members
  • PipPipPipPip
  • 253 posts
  • Location:Cincinnati, OH
  • Version:LabVIEW 2011
  • Since:2004

Posted 26 July 2012 - 02:40 PM

Nothing. But i didnt understand why it was build in the first place. Because why would it contain duplicate items? I think its up to the programmer if he wishes to check if the filter list contains duplicates.

I'm inclined to agree. Once I started thinking about it, I was like "what idiot would put duplicates in filtered items anyways?" And even if they did, there should be to few to really care about. And get rid of the error clusters. They are not needed for this function.

#6 Phillip Brooks

Phillip Brooks

    The 500 club

  • Members
  • PipPipPipPipPip
  • 749 posts
  • Location:Boston, MA
  • Version:LabVIEW 8.6
  • Since:1999

Posted 26 July 2012 - 04:49 PM


No because the error inputs are also not used in the terminals. I just put them there because maybe in the future I want to throw a error, then I will wire it. Now LabVIEW will just treat it as deadcode.


Be careful about error clusters that do nothing. I updated some low-level internal libraries a couple of years ago and had a bunch of developers get mad at me.

Basically, if you do add the error in / error out terminals to your connector pane for 'future use', make sure you place an Always Copy node on the Error wire or any caller could require a recompile if you make changes...

Edited by Phillip Brooks, 26 July 2012 - 04:52 PM.

Now is the right time to use %^<%Y-%m-%dT%H:%M:%S%3uZ>T


#7 Wouter

Wouter

    Very Active

  • Members
  • PipPipPip
  • 103 posts
  • Version:LabVIEW 2011
  • Since:2006

Posted 26 July 2012 - 04:54 PM


Be careful about error clusters that do nothing. I updated some low-level internal libraries a couple of years ago and had a bunch of developers get mad at me.

Basically, if you do add the error in / error out terminals to your connector pane for 'future use', make sure you place an Always Copy node on the Error wire or any caller could require a recompile if you make changes...


yes but they are not even wired now so as said, and as far as I know LaBVIEW will see it as deadcode and remove it when the code is compiled.

#8 Mellroth

Mellroth

    The 500 club

  • Members
  • PipPipPipPipPip
  • 535 posts
  • Version:LabVIEW 2011
  • Since:1995

Posted 26 July 2012 - 05:32 PM

yes but they are not even wired now so as said, and as far as I know LaBVIEW will see it as deadcode and remove it when the code is compiled.


Personally I feel that unwired terminals just add confusion to the BD (why are these not wired, should they have been etc.).

Then regarding the code I think that a function like this should reuse the input buffer if possible. If the input array terminates in your function it still requires an equally large memory portion to be allocated to store the result, instead I prefer to save the result in the input array.

/J

#9 ShaunR

ShaunR

    LabVIEW Archetype

  • Members
  • PipPipPipPipPipPip
  • 2,223 posts
  • Version:LabVIEW 2009
  • Since:1994

Posted 26 July 2012 - 06:20 PM

yes but they are not even wired now so as said, and as far as I know LaBVIEW will see it as deadcode and remove it when the code is compiled.

Wiring through error clusters (if they exist) is recommended since if someone wires them up it will clear any errors on the wire from previous operations which is unexpected bahaviour.
A positive attitude may not solve all your problems, but it will annoy enough people to make it worth the effort. (Herm Albright 1876-1944).

Founder and general mischief maker on www.labview-tools.com.
SQlite aficionado and websocket zealot.
If it 'aint in LabVIEW, then you 'aint got a clue!

#10 Wouter

Wouter

    Very Active

  • Members
  • PipPipPip
  • 103 posts
  • Version:LabVIEW 2011
  • Since:2006

Posted 26 July 2012 - 06:31 PM

Wiring through error clusters (if they exist) is recommended since if someone wires them up it will clear any errors on the wire from previous operations which is unexpected bahaviour.

Yes but they aren't wired to the terminal ;) anyhow this is a bit offtopic lets stay ontopic. If you want you can remove them it indeed does not matter.

#11 Wouter

Wouter

    Very Active

  • Members
  • PipPipPip
  • 103 posts
  • Version:LabVIEW 2011
  • Since:2006

Posted 26 July 2012 - 07:39 PM

I thought of something extra aswell. This is nearly the same function as the previous one however this function has a callback as input instead of a list of items. So you can define your own filter function, same terminal type as the wgtk_filterArrayCallback template. Example: now it is not possible to filter all odd elements. If you now build a seperate VI which would check if a element is odd and you give that strict VI ref to the filterArray it will filter all odd elements.

Attached File  wgtk_filterArray (callback).vi   18.4K   39 downloads

Attached File  wgtk_filterArrayCallback.vit   6.24K   40 downloads

Edited by Wouter, 26 July 2012 - 07:41 PM.


#12 Mads

Mads

    Very Active

  • Members
  • PipPipPip
  • 171 posts
  • Location:Bergen, Norway
  • Version:LabVIEW 2012
  • Since:1997

Posted 27 July 2012 - 08:03 AM

The OpenG array functions are some of the most useful VIs there are I think. I'm very grateful for them. I've done quick jobs where those functions have been used extensively.
In the cases where those applications have ended up being used on very large chunks of data I've had to revisit the design though, to remove or redesign the OpenG functions to get an acceptable performance.

So, when I saw this thread now I just picked the next array function I could find on the palette and threw together something I would expect to run faster (i.e. it is not necessarily the optimal solution, but a better one). Attached is the OpenG Remove Duplicates function revised (as mentioned, it was thrown together so I've just used the auto-cleanup on it, not very nice, but it does outperform the original quite nicely). The improvement varies a lot with the size of the array and number of duplicates. I got anything from 2 to 5 times the speed of the original in my quick tests.

I've kept the original's input and outputs, but the performance would have been better if the indexes output was dropped. For optimal performance it would be better to sort the input array and do a binary search for the duplicates. That would add another (optional?) step to recreate the original order of the input array, but still,- on large arrays it would have a huge impact on the performance.

PS. To actually publish an improved version there is of course still a lot of work needed. If deemed of interest I could always redo this one properly and apply it to the full polymorph VI set.

MTO

Attached Files


MTO

#13 Wouter

Wouter

    Very Active

  • Members
  • PipPipPip
  • 103 posts
  • Version:LabVIEW 2011
  • Since:2006

Posted 27 July 2012 - 12:14 PM

I've kept the original's input and outputs, but the performance would have been better if the indexes output was dropped. For optimal performance it would be better to sort the input array and do a binary search for the duplicates. That would add another (optional?) step to recreate the original order of the input array, but still,- on large arrays it would have a huge impact on the performance.


I would rather first sort the array and then iterate over the array, I think its faster because of how LabVIEW works. It still has the same theoratical speed, yours would be O(n*log(n) + n*log(n)) = O(2*n*log(n)) = O(n*log(n)) (because you do a binary search, log(n), in a loop thus multiply by n, becomes n*log(n). Mine is O(n*log(n) + n) = O(n*log(n)). But sure it would be faster then the current implementation which is O(n^2).

-edit-

Funny a simple implementation of the sort and then linear approach is still slower then yours. funny.png

Edited by Wouter, 27 July 2012 - 01:03 PM.


#14 crossrulz

crossrulz

    Extremely Active

  • Members
  • PipPipPipPip
  • 253 posts
  • Location:Cincinnati, OH
  • Version:LabVIEW 2011
  • Since:2004

Posted 27 July 2012 - 02:12 PM

I'm a little surprised how effective that In Place Element Structure helped with the split array.

My attempt at the sort first came out 2x slower than the current implementation. Maybe somebody can find somewhere to improve it.

Attached Files



#15 egraham

egraham

    More Active

  • Members
  • PipPip
  • 33 posts
  • Location:Walnut Creek, CA
  • Version:LabVIEW 2012
  • Since:2007

Posted 27 July 2012 - 02:38 PM

I've realized there are a few caveats to keep in mind when looking at optimizing array manipulation functions.

1. Optimization of code for one data type does not necessarily affect other data types the same way.
2. The type of computer/processor can have a HUGE affect on how effective your optimized code performs.
3. There are likely significant differences in the code performance between different versions of LabVIEW.

Do some benchmark tests with different data types, different methods of processing your arrays, on different types of processors. I was astounded at the differences and realized there's a lot of low level processing going on behind the scenes that I would need to understand to know why performance differences existed between the different scenarios.

#16 Mads

Mads

    Very Active

  • Members
  • PipPipPip
  • 171 posts
  • Location:Bergen, Norway
  • Version:LabVIEW 2012
  • Since:1997

Posted 28 July 2012 - 11:06 AM

If the starting point is a function that repeatedly resizes the array unnecessarily then I would expect there to be plenty of room for improvement before the variations due to data types etc. become significant (or?). If we could make sure that that part of the optimization is applied to all the OpenG functions then we would surely have achieved a lot - and maybe as much as is possible without having different code for different data types, operative systems etc. (The former might not be too bad though...)
MTO

#17 Wouter

Wouter

    Very Active

  • Members
  • PipPipPip
  • 103 posts
  • Version:LabVIEW 2011
  • Since:2006

Posted 28 July 2012 - 12:49 PM

I've realized there are a few caveats to keep in mind when looking at optimizing array manipulation functions.

1. Optimization of code for one data type does not necessarily affect other data types the same way.
2. The type of computer/processor can have a HUGE affect on how effective your optimized code performs.
3. There are likely significant differences in the code performance between different versions of LabVIEW.

Do some benchmark tests with different data types, different methods of processing your arrays, on different types of processors. I was astounded at the differences and realized there's a lot of low level processing going on behind the scenes that I would need to understand to know why performance differences existed between the different scenarios.


1. This is true but I think differences may be very small I think. We can design 2 different algorithms, 1 algorithm which swaps elements in a array inplace, 1 algorithm which copies contents to another array. Case 1) when we swap elements in a array its just changing pointers, so constant time. Case 2) when we retrieve a element from a certain datatype of array A and copy it to array B it may indeed be the case that a algorithm implementation of case 1 is faster because copying and moving that element for that certain datatype in the memory consumes more then constant time, it would have a cost function which depends on the amount of memory the datatype consumes. (I'm not very sure if case 2, also works like this internally as I explained but I think it does?)
2. This will always be the case.
3. True but that is not our failure. I think its fair to assume that the algorithm runs the fastest on the newest version of LabVIEW and the slowest on first version of LabVIEW. The goal, imo, is to achieve a algorithm that minimizes the amount of operations and the memory space.

#18 Mellroth

Mellroth

    The 500 club

  • Members
  • PipPipPipPipPip
  • 535 posts
  • Version:LabVIEW 2011
  • Since:1995

Posted 28 July 2012 - 04:26 PM

So, when I saw this thread now I just picked the next array function I could find on the palette and threw together something I would expect to run faster (i.e. it is not necessarily the optimal solution, but a better one). Attached is the OpenG Remove Duplicates function revised (as mentioned, it was thrown together so I've just used the auto-cleanup on it, not very nice, but it does outperform the original quite nicely). The improvement varies a lot with the size of the array and number of duplicates. I got anything from 2 to 5 times the speed of the original in my quick tests.


We have a similar approach but we didn't make use of the "In Place Element Structure" (but we will from now on), and I think you can optimize this further by getting rid of that array-copying before you enter the loop and by converting to a boolean before the case structure (the case evaluation works faster on booleans than integers, at least in my experience).

RemoveDuplicates.png

/J

#19 Aristos Queue

Aristos Queue

    LV R&D: I write C++/# so you don't have to.

  • Premium Member
  • 2,620 posts
  • Location:Austin, TX
  • Version:LabVIEW 2011
  • Since:2000

Posted 29 July 2012 - 09:57 AM

*
POPULAR

Basically, if you do add the error in / error out terminals to your connector pane for 'future use', make sure you place an Always Copy node on the Error wire or any caller could require a recompile if you make changes...

Right idea, wrong actual solution. Instead of "Always Copy", use the "Mark As Modifier" option on the Inplace Element Structure instead for this purpose. It signals the same "this subVI could modify this value" on the conpane without introducing an unnecessary copy on No Error cases.

#20 Mads

Mads

    Very Active

  • Members
  • PipPipPip
  • 171 posts
  • Location:Bergen, Norway
  • Version:LabVIEW 2012
  • Since:1997

Posted 30 July 2012 - 08:35 AM

You are right Mellroth - you save quite a bit of memory that way. The difference in speed is not noticable at first (in fact it was negative on my machine), but it surfaces when debugging is disabled. Nice. :thumbup1:



We have a similar approach but we didn't make use of the "In Place Element Structure" (but we will from now on), and I think you can optimize this further by getting rid of that array-copying before you enter the loop and by converting to a boolean before the case structure (the case evaluation works faster on booleans than integers, at least in my experience).

RemoveDuplicates.png

/J


MTO