jkflying

LV OOP Kd-Tree much slower than .Net, brute force


I am writing a performance-sensitive application which requires a nearest-neighbor lookup. Originally I used a brute-force method, but unfortunately this gets very slow as the data size increases. I have a point cloud of ~100k points in 2D, and need around 50k nearest-neighbor lookups per second as a minimum performance requirement.

As a solution I wrote a kd-tree in .Net and used LabView to call the .Net dll. However, I discovered that each .Net transaction carries with it about a 0.5ms delay. I've tried bunching the data up into groups, but this only helps so much, as I am using an iterative process.
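To put the 0.5 ms per-call delay next to the 50k lookups per second target, here is a rough back-of-the-envelope sketch. The function name `min_batch_size` is made up for illustration, and the calculation deliberately ignores the time the lookups themselves take, so it only gives a lower bound on the batch size.

```python
def min_batch_size(lookups_per_sec: int, overhead_us: int) -> int:
    """Smallest number of lookups per call such that call overhead alone
    fits within one second of wall time (lookup time itself is ignored)."""
    # Overhead in microseconds if every lookup were its own call.
    total_overhead_us = lookups_per_sec * overhead_us
    # Ceiling division by one second expressed in microseconds.
    return -(-total_overhead_us // 1_000_000)

# 0.5 ms = 500 us of overhead per call, 50k lookups per second required.
batch = min_batch_size(50_000, 500)
```

With one lookup per call, 0.5 ms of overhead caps throughput at 2,000 lookups per second. Even at the minimum batch size computed here, the overhead would use up the whole second, so in practice the batches would need to be considerably larger, which is hard to arrange for an iterative process.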

Armed with my new-found kd-tree knowledge, I then wrote the kd-tree in LV-OOP, using DVRs for both subtrees and leaf values. However, my LV implementation is still 100x slower than the .Net implementation, and 20x slower than brute force. And this is with just 10k points.

I'm fairly new to LV (about 6 months in an academic environment) and I'm fairly sure I've made a massive blunder somewhere, but I don't have any idea what it might be.

http://robowiki.net/...d-tree_Tutorial is the tutorial that I used for writing both trees - note that I've only implemented a 1-NN lookup method, so have no need for the priority queue.

Just some notes:

  • I found using in-place data structure unbundle-bundle was much faster than using normal unbundle-bundle for all of the read/writes.
  • The tree started out with pure by-value subtrees and data, so was even slower before I changed to DVRs.
  • The lookup uses a queue as a stack, rather than being recursive. Not sure if this is good or bad.
  • The add element uses recursion. Again, not sure if this is good or bad.

I've written speed tests for the brute force, .Net and LV lookups, with names like test_...... .vi, if you want to compare performance.

kd_tree Folder.zip

Thanks in advance for any help

Julian Kent


Depending on the size of your point data, does it make sense to use LVOOP? I have no benchmarks to point to, but I imagine for small data, the overhead of LVOOP will outweigh the performance of, say, a cluster of {prev, data, next} that you don't get the nice OOP-ness with. There will probably be more legwork with that route (and you may have to be more careful with memory allocations) but I think for the performance spec you're looking to hit, the simple route may be more effective.


Irrespective of the size of your point data, does it make sense to use LVOOP? I have no benchmarks to point to, but I imagine for small data, the overhead of LVOOP will outweigh the performance of, say, a for loop that you don't get the nice OOP-ness with. There will probably be less legwork with that route (and you won't have to be more careful with memory allocations) but I think for the performance spec you're looking to hit, the simple route may be more effective.

....and get rid of the queues.


Just my 2¢: you can always translate a recursive algorithm into a while loop. With the above suggestion of clusters instead of classes, plus a while loop, you should gain significant speed.

Never mind my post; I looked at the code after I posted, and I don't see a recursive implementation.

Edited by sam


Being a chemist rather than an engineer, I can't say I really know the finer details of what differentiates a k-d from a red-black or a spruce, but those are all very good tips which can take years of experience to learn. They are valuable to know in any context where performance matters.

I for one never thought of the ramifications of using the timed loop. Interesting.


asbo, ShaunR:

How would you suggest I place clusters within clusters within clusters within... (dynamically)? I'm desperate for speed, and if it means sacrificing OOP design I'm willing to try it, but as far as I can tell LV specifies cluster contents pretty rigidly. Doing a prev/next doesn't work in a kd tree because we have a large number of dimensions... and implicit kd trees will give this type of structure but use huge amounts of memory (curse of dimensionality and all that). As you'll see in my project, I've already tried a simple brute force method with a simple array subtract/square/sum/min for finding the nearest point, and if it isn't fast enough with raw arrays it definitely won't be fast enough with clusters. Unless I've completely misunderstood you...
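For reference, the subtract/square/sum/min brute-force lookup mentioned above can be sketched in text form roughly as follows. This is a plain Python illustration rather than the LabVIEW code in the attached project, and the name `nearest_brute_force` is made up for the example.

```python
def nearest_brute_force(points, query):
    """Return (index, squared_distance) of the point closest to query."""
    best_i, best_d = -1, float("inf")
    for i, p in enumerate(points):
        # Subtract, square and sum over every dimension.
        d = sum((a - b) ** 2 for a, b in zip(p, query))
        # Keep the running minimum.
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d

points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
idx, sq = nearest_brute_force(points, (0.9, 1.2))
```

Every lookup touches every point, so the cost grows linearly with the size of the cloud, which is what motivates the tree in the first place.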

sam:

The recursive implementation is simply because I can't think of any other way to move into a tree of unknown depth in a data-flow environment without copying each subtree as we go. Sure, I could use a while loop, but it would be really slow as it would require copying each subtree onto the stack as I go.

Aristos Queue

Thanks for the feedback. I've gone through and implemented the ideas you've given, except for the parent_kdtree class, and I've almost doubled my performance, but it's still 50x slower than the .Net. As I'm not really concerned about concurrent access, would you recommend getting rid of the DVR-on-a-stack technique and instead going for a recursive unbundle/bundle without any DVRs at all?

I thought the casting might be slow so I tried changing the data type in the queue to just straight kdtree DVRs and it slowed me down about 4x. Not sure why, but it seems that is the wrong thing to do.

mje: A kd-tree is a tree that works in more than one dimension, i.e. for a k-dimensional point cloud. It is typically used for fast nearest-neighbor lookup. Check out http://en.wikipedia.org/wiki/K-d_tree. The one described on Wikipedia isn't quite what I'm using, as I don't store points on lower parts of the tree, just the leaves, and at each leaf I keep an array of points. I also have some optimizations, such as keeping track of the hyperrect containing each subtree and only traversing that subtree if its hyperrect could contain a closer point. A bit abstract, I know :rolleyes:
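To make that description less abstract, here is a minimal Python sketch of a bucket kd-tree with hyperrect pruning and a 1-NN lookup driven by an explicit stack. It is not the code from the attached project: the bulk `build` (instead of adding elements one at a time), the median split on the widest dimension, and the `BUCKET_SIZE` value are all assumptions made for the example.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]
BUCKET_SIZE = 4  # hypothetical leaf capacity, chosen arbitrarily

@dataclass
class Node:
    lo: Point                             # lower corner of the hyperrect
    hi: Point                             # upper corner of the hyperrect
    points: Optional[List[Point]] = None  # only leaves hold points
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sq_dist_to_rect(q, lo, hi):
    """Squared distance from q to the closest point of the box [lo, hi]."""
    return sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi))

def build(points: List[Point]) -> Node:
    dims = len(points[0])
    lo = tuple(min(p[d] for p in points) for d in range(dims))
    hi = tuple(max(p[d] for p in points) for d in range(dims))
    node = Node(lo, hi)
    if len(points) <= BUCKET_SIZE:
        node.points = list(points)        # small enough: make a leaf
        return node
    # Assumed split rule: median along the widest dimension.
    axis = max(range(dims), key=lambda d: hi[d] - lo[d])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    node.left = build(ordered[:mid])
    node.right = build(ordered[mid:])
    return node

def nearest(root: Node, q: Point):
    """Return (point, squared_distance) of the nearest neighbour of q."""
    best, best_d = None, float("inf")
    stack = [root]                        # explicit stack, no recursion
    while stack:
        node = stack.pop()
        if sq_dist_to_rect(q, node.lo, node.hi) >= best_d:
            continue                      # hyperrect cannot hold a closer point
        if node.points is not None:
            for p in node.points:
                d = sq_dist(p, q)
                if d < best_d:
                    best, best_d = p, d
        else:
            # Push the farther child first so the nearer one is visited first.
            kids = sorted((node.left, node.right),
                          key=lambda n: sq_dist_to_rect(q, n.lo, n.hi),
                          reverse=True)
            stack.extend(kids)
    return best, best_d

rng = random.Random(0)
cloud = [(rng.random(), rng.random()) for _ in range(1000)]
tree = build(cloud)
```

Visiting the nearer child first tends to shrink the best distance early, which lets the hyperrect test discard more subtrees without opening them.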

I just tried profiling, and I spend more time in Exemplar:getSqDist than I do in the entire .Net lookup algorithm. So maybe I'll try a C++ .dll instead... sigh.


You've now heard the position of the "OO is bad" crowd.

Wow. I'm a crowd? I really can be in two places at once (multiple salaries :) )

So maybe I'll try a C++ .dll instead... sigh.

For raw speed, it's the only way. LabVIEW is fast (in comparison to, say, Java), but not the fastest. A DLL call will probably only add about 1-3 µs of overhead.


asbo, ShaunR:

How would you suggest I place clusters within clusters within clusters within... (dynamically)? I'm desperate for speed, and if it means sacrificing OOP design I'm willing to try it, but as far as I can tell LV specifies cluster contents pretty rigidly. Doing a prev/next doesn't work in a kd tree because we have a large number of dimensions... and implicit kd trees will give this type of structure but use huge amounts of memory (curse of dimensionality and all that). As you'll see in my project, I've already tried a simple brute force method with a simple array subtract/square/sum/min for finding the nearest point, and if it isn't fast enough with raw arrays it definitely won't be fast enough with clusters. Unless I've completely misunderstood you...

Yes, I rather meant that prev/next would be indices in a parent array, where data would actually be the data cluster. The indices would be used for lookup, which could make traversal a little more work (thus my legwork comment). So, as a C programmer would write:

cluster data { ... };

cluster node {int prev, cluster data, int next}; (whatever metadata is appropriate, this is really more of a linked list)

cluster[] tree;
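As a rough text-language illustration of that layout, the sketch below stores every node in one flat array and links nodes by integer index. It is only a sketch under assumptions: it uses a plain binary search tree on single numbers rather than a kd-tree, it treats prev/next as left/right child indices, and the names `make_node`, `insert` and `contains` are invented for the example.

```python
NONE = -1  # index value meaning "no child"

def make_node(data):
    # [prev, data, next] mirrors the {prev, data, next} cluster; prev/next
    # are used here as left/right child indices (an assumption).
    return [NONE, data, NONE]

def insert(tree, data):
    """Append a node to the flat array and link it in by index, iteratively."""
    tree.append(make_node(data))
    new_i = len(tree) - 1
    if new_i == 0:
        return new_i                  # the first node is the root
    i = 0
    while True:
        slot = 0 if data < tree[i][1] else 2
        if tree[i][slot] == NONE:
            tree[i][slot] = new_i     # link the new node under node i
            return new_i
        i = tree[i][slot]             # descend by index, nothing is copied

def contains(tree, data):
    i = 0 if tree else NONE
    while i != NONE:
        if data == tree[i][1]:
            return True
        i = tree[i][0] if data < tree[i][1] else tree[i][2]
    return False

tree = []
for value in [5, 2, 8, 1, 9]:
    insert(tree, value)
```

Because traversal only follows indices into a single array, a while loop can walk a tree of unknown depth without recursion and without copying subtrees.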

I should have warned in my first post, I didn't look at your implementation. I don't consider myself experienced enough to critique LVOOP methodologies, I was just throwing an idea in the ring. I'm not an OOP nay-sayer, but if you can't get the performance you need there may be other options. I'm certainly interested in what implementation reaches your goal for you.

