Hi, jkflying. Welcome to LabVIEW.
You've now heard the position of the "OO is bad" crowd. If you'd like to rewrite without classes, go for it, but that's not the bulk of your performance problem. If you're like 90% of the world's programmers, you'd probably prefer some help getting the OO solution up to speed. So let's look at what you've got.
There's a lot of text in this post, but I've tried to break it into short paragraphs -- each one is independent, so you can read each one and absorb it by itself without having to take in all of this in one go.
First of all, remove the Timed Loops from your test harnesses. Timed Loops have side effects on the thread scheduling and prioritization of LabVIEW. Lots of things that could be faster are not in order to get determinism within the timed loop (in other words, repeatable schedule is more important than fast schedule on any given iteration). I do not understand the details well enough to teach them to you -- it's an area of LV outside my normal baliwick -- but the long and short of it is that Timed Loops are meant to provide fixed scheduling for real-time deployments, and using them for other purposes will have undesirable side-effects. Instead of timed loops, you can use a Flat Sequence Structure, with a Get Date/Time In Seconds node in the first frame, your code in the second frame, and another Get Date/Time In Seconds node in the third frame.
Now, about the benchmark itself... any language has problems where it just cannot outcompete another given language, but I do not think we are dealing with one of those in your case. One major item to consider: is the .NET implementation thread safe? A lot of the overhead in your code implemented in LV is going into thread safety for that kd tree to be used in parallel branches simultaneously... I'm going to guess that the .NET code is not thread safe if only because almost nothing that C# programmers write is thread safe (haven't looked at your implementation). Unless the .NET solution is doing the same level of mutex locking, you're benchmark comparisons are skewed. We can get the LV implementation up to speed, but this is one of the common problems with comparisons between LV and other languages: as long as everything is dataflow, our compares match theirs. Once you introduce references, we implicitly buy the cost of thread safety, where as they leave you to implement that yourself if you decide you need it. What that often means is that small test apps used for benchmarks suffer because we've got overhead, but real world apps are faster because we then use that thread safety to achieve parallel performance. Please keep that in mind when evaluating.
Looking at your code... you use LabVIEW Object as the type of your DVRs, which is the reason you need all the To More Specific and Preserve Run-Time Class primitives. Introduce a parent class (kdtreeparent.lvclass) that has the methods on it that you need (dynamic dispatch). Have kdtree.lvclass inherit from it, and use kdtreeparent.lvclass inside the DVRs. Now when you access a DVR inside the Inplace Element (IPE) structures you avoid the fairly expensive dynamic type checking. The dynamic type checking is getting a speed boost in LV 2012, but it is better to remove it entirely.
Using the queues as stacks is fine... they're optimized for that use case. However, you should definitely not name the queue... you've got the name "Stack" wired in. That means that you're using the same queue as anyone else who uses the name "Stack". It's just like putting a variable in the global namespace. Bad things happen. There are legitimate reasons for using this feature -- ok, one legitimate reason -- but this isn't it.
Inside addExemplar.vi, you're using Read Left.vi, Read Right.vi and other accessors. Although it is normally fine practice to use your accessors inside the other methods of your class, in this case, I suggest using a plain Unbundle node (not the IPE) to access those fields. The reason is that you're extracting multiple fields and you'll be able to unbundle multiple at once.
Read Domain.vi, as written and used inside addExemplar.vi, is very expensive (and, side note to previous posters, it would also be expensive with plain clusters). You are calling Read Domain.vi and then wiring its output directly to an Index Array node. That's producing a duplicate of the array every time. Instead, create an accessor on Exemplar.lvclass that accesses the domain inside the IPE, indexes one element, and returns that out of the subVI. With that, you'll only be copying a single floating point number instead of allocating a whole array.
In the isTree.vi, you are doing the "!= infinity" check outside of the IPE. This is copying the floating point integer. You can avoid that by moving the "!= infinity" computation inside the IPE. Similar performance improvement can be done in other VIs in your hierarchy.
I am confused about why AddToTree is destroying the DVR and then creating a new one instead of just accessing the DVR through an IPE. If this is a trick to avoid initializing the class (which is my guess as to why you're doing that), you've just introduced a significant thread synchronization problem. If you fork your tree wire on the top-level VI, when you call Add on one branch, you'll invalidate the DVR refnum that is still inside the tree on the bottom branch. And now you'll get errors from that other branch, or just lose all your data. Probably not what you want. This should just access the DVR using an IPE and you should have a Create Tree.vi that returns the object populated with the top-level DVR and a matching Delete Tree.vi which knows how to traverse and deallocate the entitre tree and throw away the top-level DVR.
On that same note, you really should include this in your test cases:
That will show you a big problem with your current implementation. It is a bad idea to put the meta data about the contents of a DVR as by value fields of your top-level class (or cluster) and have the refnums shared. Putting the whole data block into a single top-level DVR makes this truly behave like a refnum type. What that means is that all of your public VIs will all start off by accessing a DVR in order to get access to all the fields that are currently in your kdtree.lvclass private data cluster. Oh... and change your wire color to refnum cyan (it's a predefined color in the color picker) to let people know that your wire contains (and therefore behaves like) a refnum.
Those are the big points. You didn't pick an easy problem for your first out-of-the-gate project. A basic KD tree is kids stuff... a thread safe KD tree? That takes a bit more to reason through. You've got a good start here, though. I have seen DVR-with-classes implementations for other data structures that can beat or exceed .NET thread-safe structures performance, so if you keep chipping at it, I'm confident you can get there.
Happy wiring!